Data Management Plan (DMP)
Federal and other funding agencies require a data management plan (DMP) to accompany applications for research funding; if a DMP is requested but not included, the application may not be accepted or, if accepted, may be promptly rejected. We strongly recommend use of the Data Management Plan Tool to assist with data management plan preparation for research grant applications.
The Data Management Plan Tool, found at DMPTool.org, is managed by the University of California Curation Center (UC3). It is an open-source, international project intended to help researchers fulfill data management plan requirements. Currently, you will need to open an independent account at DMPTool. We will continue to work with UC3 as we attempt to make single sign-on (SSO) to DMPTool.org possible using ETSU credentials.
Data Management Plans and Grant Applications (what to consider)
Each grant-making agency (research funding agency or federal department) provides specific guidelines and requires specific information to be contained in the data management plan. Agencies generally request information in 5-10 specific areas and set a maximum page count, generally between 2 and 5 pages; do not exceed the maximum for the agency to which you are responding. It is important to note that the requirements may vary between divisions (offices and directorates) within an individual agency.
The Institute of Education Sciences (IES) within the U.S. Department of Education (ED), for example, requires information in 9 areas, with a maximum of 5 pages. IES DMP requirements specifically revolve around data sharing, the implication being that data sharing is important and must be adequately addressed for a grant application to be successful. However, an emphasis on data sharing is evident in all federal grants.
Data sharing is important in 2 of the 5 major areas in the overall NSF data management plan, which has a maximum of 2 pages for the entire plan. The generic categories in the DMP for NSF are reflective of the content required by other federal agencies and so are discussed below (as posted by NSF, Feb. 2019); again, note that DMP requirements may vary between offices within NSF. Also note that the third and fourth items in the generic NSF data plan involve data sharing.
- the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;
- the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
- policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
- policies and provisions for re-use, re-distribution, and the production of derivatives; and
- plans for archiving data, samples, and other research products, and for preservation of access to them.
What sorts of information should you include in the 5 NSF categories above?
Information specific to ETSU is noted below.
- In plain terms, what do the data describe? Example: The data are height (m), girth (cm), and estimated age (yrs) of live trees. Now, proceed with specifics. Are you collecting the data or using an existing dataset? What is the source of the dataset? Will the new data be added to existing data? What are the instruments for data collection/measurement? Are the data to be collected in a conventional manner? If not, why not? Are the instruments calibrated at certain intervals, and otherwise cleaned and maintained for quality-control purposes? Who is in charge of quality control? What will you do if a machine involved in measuring data breaks down mid-collection? Where will the data be collected? Who will be collecting the data? Who will be supervising data collection, storage, and backup?

  What file types are being created or used? What is the general nature of the data (numeric, text, 2D or 3D image, audio, surveys, student records, patient records, video, other)? What is the predicted magnitude of the dataset(s) (megabytes, gigabytes, terabytes)? How many files will you be dealing with (dozens, hundreds, thousands, millions)?

  Where are the data stored and backed up? Refer to the ETSU Digital Research Data Storage and Backup Policy, which requires all research data to be backed up to one of 3 locations: ETSU network drives, ETSU OneDrive for Business accounts, or ETSU-managed AWS accounts. Identify where your data are stored and backed up. Are original data protected from being over-written (versioning on), altered, or deleted (via authorization levels and activity logging)? All of these protections can be set up if your data are backed up on AWS.

  Is there any additional data processing (cleaning, conversion, coding)? What data analysis software tools are used? Are proprietary files exported in an alternate format prior to analysis? Is the analysis performed with a proprietary tool? Are variables measured by a machine converted to an alternate variable by a software program? If so, are the original measured variables also saved?
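The dataset-magnitude and file-count questions above can be answered empirically rather than estimated. The following is a minimal sketch, assuming your research files sit under a single top-level directory; the function name and directory layout are illustrative, not part of any agency requirement:

```python
# Illustrative sketch: inventory a data directory to support the DMP
# questions about dataset size and number of files. Only the Python
# standard library is used.
import os

def inventory(root):
    """Return (file_count, total_bytes, counts_by_extension) for root."""
    count, total, by_ext = 0, 0, {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Accumulate overall file count and total size on disk.
            count += 1
            total += os.path.getsize(os.path.join(dirpath, name))
            # Tally files by extension; files with no extension are
            # grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            by_ext[ext] = by_ext.get(ext, 0) + 1
    return count, total, by_ext
```

Running `inventory("/path/to/data")` yields the file count, total size in bytes, and a per-extension tally that can be summarized in the DMP and re-checked as the project grows.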
- In plain terms, are the data expressed in conventional units? Example: Data are measured using methods and units of measure recommended by the International Union of Forest Research Organizations. If there are no standards, document your solution. Now, proceed with specifics. Is the funding agency likely to recognize those as conventional methods and units? Are the file formats you are using conventional, or proprietary and not easily read by others? If you will not be using conventional file formats for the discipline, why not?

  What is the naming convention for your files? What metadata are found within the file name? What metadata are found within the file properties? What metadata are contained within the body of the file? Are there additional metadata contained within associated files (possibly files following a parallel naming convention but with an alternate extension and securely stored alongside the data files)? Are there field-specific metadata that should be collected along with the data? Have you developed a "readme" file that summarizes all of the metadata in a single file? Metadata basics include the who, what, when, where, and how of your research, along with any other information that might be of use to others in interpreting your data. Who is responsible for metadata for the study?
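As a hypothetical illustration of a file naming convention and a summary readme (the project code, site numbers, personnel, and dates below are invented; the tree-measurement example and IUFRO methods are drawn from this page), such files might look like:

```text
File naming convention: [project]_[site]_[collection date]_[version].csv
Example file name:      trees_site03_2025-06-14_v02.csv

readme.txt (one file summarizing metadata for the study):
  Who:   J. Doe (data collection); R. Roe (quality control and metadata)
  What:  Height (m), girth (cm), and estimated age (yrs) of live trees;
         one CSV file per site visit
  When:  Collected June-August 2025
  Where: Study sites 01-05; backed up per the ETSU Digital Research
         Data Storage and Backup Policy
  How:   Methods and units of measure recommended by the International
         Union of Forest Research Organizations
```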
- Carefully review requirements for data sharing; be aware that federal funding agencies favor data sharing. Data sharing means that you will make the raw data underlying a publication available to others who wish to examine the data, for any reason. If policies, regulations, agreements, and applicable laws permit you to make datasets and publications available for continuous, free, public access, you should do so. An effective means to meet these standards is via the Digital Commons managed by the ETSU Library. Information about the Digital Commons is at https://libraries.etsu.edu/research/digilib/digitalcommons. The Digital Commons is accessed here - https://dc.etsu.edu/. Use of the Digital Commons is paid for by ETSU Libraries and Academic Affairs. If your data will be de-identified by an IRB-approved method prior to sharing, or modified from a proprietary format to make the data more accessible for those without specific software, note these modifications here. If you are willing to make any additional metadata or readme files available, you should note that here. If you choose to identify obstacles to sharing, explain how you will overcome the obstacles, not how they will prevent you from sharing data. If there is a specific reason that you cannot share data immediately upon publication - for example, you are examining artifacts donated to an archive under the condition that they will not be made publicly available for 50 years - explain the reason and indicate when the data will become available. If release of data is subject to any other limitations from funding agencies or data owners (e.g., if you are studying a dataset owned by an external organization), or is subject to a waiting period following publication of your study, for any logical reason, explain that here. Do not argue that the cost of data sharing is prohibitive or that others in your discipline do not share data; such arguments may eliminate your application from consideration. While the cost of sharing in a field-specific archive might be considered excessive, data sharing through the Digital Commons managed by the ETSU Library is still possible.
- In plain terms, are you placing any provisions or requirements on re-use or re-distribution of the data or on the production of any derivatives? Example: Use of the dataset, alone or in combination with any other data and for any purpose, should acknowledge, by name and institution, the creator of the dataset; the dataset may be re-distributed if accompanied by the same acknowledgement. If your goal is to have people cite and acknowledge your work, keep the provisions simple. If you believe that people working in many areas are likely to have interest in the data, note the areas here. If the data may only be provided in de-identified format, note that.
- If you have indicated that, following publication, data will be stored in the Digital Commons managed by the ETSU Library, you should note in your DMP that this is part of your long-term archival solution. If you plan to store the data in any other archive, note that here. If you intend to archive additional metadata files that help with interpretation of the data, you should note that here, in addition to the archival method for the metadata files. Minimum data retention standards are addressed in the ETSU Research Data Ownership and Retention Policy - these are minimum requirements. ITS-RCS strongly recommends that you do not delete your research data unless there is a specific security or privacy protection requirement to do so by law, policy, regulation, or funding agency requirement. ITS-RCS also maintains a long-term research data archive on behalf of the Vice Provost for Research and Sponsored Programs; this long-term archive resides on AWS, is available to all ETSU researchers, and can be used to store any type of data file; the current plan is to maintain data from any given study for as long as there is an identifiable data custodian for the data from that study. Departing researchers, including faculty, staff, and student researchers, are required to leave a copy of their data at ETSU upon departure. For staff and students, this would generally mean that a copy must be left with the PI for the study. For faculty, this would generally mean that a copy of the data must be left with their Department Chair. Transfer of data to another institution is possible subject to requirements specified in the ETSU Research Data Ownership and Retention Policy.