This article provides a comprehensive, step-by-step guide to Element 3 of the National Institutes of Health (NIH) Data Management and Sharing (DMS) Policy: Data Standards.
Suggestions: Review the Overview of 2023 NIH Data Management and Sharing Policy for information on all areas and requirements for submitting an NIH DMS Plan with your grant application.
For example plans, see List of Sample Data Management and Sharing Plans.
Element 3: Data Standards – Content
Element 3: Requirement
State what common data standards will be applied to the scientific data and associated metadata to enable interoperability of datasets and resources, and provide the name(s) of the data standards that will be applied and describe how these data standards will be applied to the scientific data generated by the research proposed in this project. If applicable, indicate that no consensus standards exist.
According to the NNLM data glossary, a data standard is:
a type of standard, which is an agreed-upon approach to allow for consistent measurement, qualification, or exchange of an object, process, or unit of information. Data standards refer to organizing, documenting, and formatting data to aid in data aggregation, sharing, and reuse.
Data standards help to promote the FAIR (Findable, Accessible, Interoperable, Reusable) principles. When you use an accepted data standard, you make your work more findable and interoperable. This also increases the potential for data reuse, because datasets that follow the same standard can be pooled for meta-analysis. For example, you might use a standardized survey when collecting data, which means that your data can easily be compared and combined with data from others using the same instrument. However, most researchers are not trained to find and use data standards. In addition, many fields have yet to develop commonly used standards or have several competing standards.
Tip: It is possible that your work will employ multiple formal standards or a mix of formal standards and other data management strategies. Be sure to be as specific as possible when describing the standards used for each type of data included in your proposal.
Here are some examples of data standards:
- File type: When curating a dataset to share, researchers should convert their data to an open file format, if possible. For instance, a spreadsheet should be made available as a CSV rather than an Excel document (XLSX). Using standardized open file types is a data standard.
- Controlled vocabularies/ontologies: A controlled vocabulary defines the specific terms that may be used in a given data field, e.g., MeSH for medical subject terms or the Gene Ontology (GO) for genomics. Controlled vocabularies ensure interoperability across studies.
- Minimum information: Minimum information standards, such as MINSEQE (Minimum Information about a high-throughput nucleotide SEQuencing Experiment), specify the minimum amount of metadata and data required for different data types. This facilitates reuse and keeps undocumented "mystery" datasets out of repositories.
- Metadata schema: A metadata schema defines the elements of metadata for an object and how those elements can be used to describe a specific resource.
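To make the list above more concrete, here is a minimal Python sketch of how a controlled vocabulary and a minimum-information checklist might be checked before data are exported to an open format (CSV). The vocabulary terms, field names, and sample records are all hypothetical and chosen only for illustration; a real project would take them from the standard or repository it has named in its plan.

```python
import csv
import io

# Hypothetical controlled vocabulary for a "tissue" field (illustrative terms only).
ALLOWED_TISSUES = {"liver", "kidney", "heart"}

# Hypothetical minimum-information checklist: fields every record must fill in.
REQUIRED_FIELDS = ("sample_id", "tissue", "collection_date")


def validate_record(record):
    """Return a list of problems found in one record (empty list means it passed)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field, "").strip():
            problems.append(f"missing required field: {field}")
    tissue = record.get("tissue", "").strip()
    if tissue and tissue not in ALLOWED_TISSUES:
        problems.append(f"term not in controlled vocabulary: {tissue}")
    return problems


def to_csv(records):
    """Serialize records to CSV text (an open format) with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up example records: the second one breaks both rules.
records = [
    {"sample_id": "S001", "tissue": "liver", "collection_date": "2023-01-25"},
    {"sample_id": "S002", "tissue": "Liver tissue", "collection_date": ""},
]

for record in records:
    print(record["sample_id"], validate_record(record))

# Only records that pass both checks are exported.
valid = [r for r in records if not validate_record(r)]
print(to_csv(valid))
```

Checking records before export is one possible design choice; the point is that a free-text value such as "Liver tissue" is caught before it reaches a shared dataset, where it would not match the same concept recorded as "liver" in another study.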
One strategy for finding appropriate data standards is to work backward from your chosen repository. Most repositories have documentation about the standards they use, and much of the information can be copied from the repository’s documentation into Element 3 of your DMS Plan. For example, the NIMH National Data Archive has a page describing its data standards. If your repository does not document its standards, see the Additional Resources section below.
Tip: If you are still unable to find a suitable data standard for your data, state in your plan that no consensus data standard exists.
Element 3: Examples
Sample Plan Text
To improve the interoperability of datasets, we will use open file formats (e.g., CSV, TXT, MP4, PDF) whenever possible and convert a proprietary file format (.APV) to an open file format (.CSV). The neuroscience community has yet to agree on a single standard data format that is generated by all acquisition systems, so we will use .CSV where possible for data that will be preserved and shared. We will collect metadata using common standards (e.g., ISO 8601 for date/time, instrument settings, software name and version) and PIDs (e.g., ORCID iDs for researchers, funder ID, grant number, DOI for protocols, RRID for resources) to facilitate interpretation of data and interoperability. (Source)
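As an illustration of the metadata practices named in the sample text, the following Python sketch builds a small metadata record with an ISO 8601 timestamp and an ORCID iD. It is a sketch under stated assumptions, not part of the sample plan: the field names and the software name "AcqSuite" are hypothetical, and the iD shown is the example iD that ORCID uses in its own documentation. The checksum follows the ISO 7064 MOD 11-2 scheme that ORCID describes for the final character of an iD.

```python
import json
import re
from datetime import datetime, timezone


def orcid_checksum_ok(orcid):
    """Check the final character of an ORCID iD using ISO 7064 MOD 11-2."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    chars = orcid.replace("-", "")
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected


def build_metadata(acquired_at, software, version, orcid):
    """Build a metadata record (hypothetical field names) for one data file."""
    if acquired_at.tzinfo is None:
        raise ValueError("acquired_at must be timezone-aware")
    if not orcid_checksum_ok(orcid):
        raise ValueError(f"invalid ORCID iD: {orcid}")
    return {
        # ISO 8601 timestamp, normalized to UTC.
        "acquired_at": acquired_at.astimezone(timezone.utc).isoformat(),
        "software": {"name": software, "version": version},
        "creator_orcid": orcid,
    }


metadata = build_metadata(
    datetime(2023, 1, 25, 14, 30, tzinfo=timezone.utc),
    "AcqSuite",  # hypothetical software name
    "1.0",
    "0000-0002-1825-0097",  # example iD from ORCID's documentation
)
print(json.dumps(metadata, indent=2))
```

Storing a record like this as a JSON file next to each data file is one way to keep the timestamp, software version, and identifiers attached to the data they describe.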
Fill-in-the-Blank Template
Data will be stored in common and open formats, such as ____ [list formats] for our ____ [type of data] data. Information needed to make use of this data [e.g., the meaning of variable names, codes, information about missing data, other metadata, etc.], along with references to the sources of those standardized names and metadata items, will be included wherever applicable.
If there are formal data standards for some/all of the data:
Whenever possible, we will use ______ [common data elements, standardized survey instruments, etc.] to structure and organize our data. Our ____ [type of data] data will be structured and described using the ____ [specify standard] standard, which has been widely adopted in the ____ [research field] community. [Add additional information about this standard, if applicable – e.g., implementation in data repositories, utility in combining/reusing datasets.]
If there are no formal data standards for some/all of the data:
Formal standards for ____ [type of data] data have not yet been widely adopted. However, our data and other materials will be structured and described according to best practices, which are as follows: _________ [list appropriate best practices].
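Where no formal standard exists, one widely used best practice is a data dictionary that records the meaning of variable names, codes, and missing-data conventions mentioned in the template above. The Python sketch below shows one possible shape for such a dictionary and a check that every column in a data file is documented. The variables, codes, and column names are hypothetical and serve only as an example.

```python
import csv
import io

COLUMNS = ("variable", "description", "type", "allowed_values", "missing_code")

# Hypothetical variables for an illustrative survey dataset.
DATA_DICTIONARY = [
    {"variable": "participant_id", "description": "Study-assigned identifier",
     "type": "string", "allowed_values": "", "missing_code": ""},
    {"variable": "age_years", "description": "Age at enrollment, in years",
     "type": "integer", "allowed_values": "18-99", "missing_code": "-9"},
    {"variable": "smoker", "description": "Current smoking status",
     "type": "integer", "allowed_values": "0=no; 1=yes", "missing_code": "-9"},
]


def undocumented_columns(data_header, dictionary):
    """Return the data columns that have no entry in the data dictionary."""
    documented = {entry["variable"] for entry in dictionary}
    return [name for name in data_header if name not in documented]


def dictionary_to_csv(dictionary):
    """Serialize the data dictionary to CSV text so it can be shared with the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(dictionary)
    return buffer.getvalue()


# Header of a made-up data file; "bmi" has not been documented yet.
header = ["participant_id", "age_years", "smoker", "bmi"]
print(undocumented_columns(header, DATA_DICTIONARY))
print(dictionary_to_csv(DATA_DICTIONARY))
```

Sharing the dictionary as its own CSV file alongside the dataset keeps the documentation in the same open format as the data.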
Element 3: NIH Guidance
While many scientific fields have developed and adopted common data standards, others have not. In such cases, the plan may indicate that no consensus data standards exist for the scientific data and metadata to be generated, preserved, and shared.
Tip: Using the DMPTool
There are currently no specific formatting requirements included in the NIH DMS Application Guide. However, the DMPTool, a free online wizard, walks you through the process of creating an NIH-compliant DMS Plan. The information in this article includes examples from the DMPTool.
NIH Guide Notice: As outlined in the NIH Guide Notice Supplemental Policy Information: Elements of an NIH Data Management and Sharing Plan, DMS Plans should address six elements (areas): Data Type; Tools, Software, and Code; Data Standards; Data Preservation, Access, and Associated Timelines; Access, Distribution, or Reuse Considerations; and Oversight of Data Management and Sharing, as described in the Application Guide. The NIH suggests that a DMS Plan be no more than two pages. The plan should be attached to the application as a PDF file, as outlined in the NIH’s Format Attachments page.
Additional Resources
- Research Data Alliance metadata standards catalog
- FAIRsharing has a registry of standards.
- BioPortal catalogs ontologies, with a biomedical focus.
- DCC catalogue of disciplinary metadata standards
- NIH Common Data Elements Repository. Though this resource is outside the scope of this article, some NIH Institutes, Centers, and Offices (ICOs) ask specifically about common data elements (CDEs), which are an NIH initiative. More information on using CDEs is available here.
Related Articles
- NIH DMS Policy Overview: This article includes an overview of the NIH DMS Policy and links to all other Element articles.
- Sample NIH Data Management and Sharing Plans: Examples of completed DMS Plans.
Sources
NIH Template Working Group*. (2023). DMPTool NIH-Default DMSP template, v9. In California Digital Library (Ed.), DMPTool [DMP authoring software]. Retrieved from https://dmptool.org/template_export/118304408.pdf
* More information on the NIH Template Working Group history and membership can be found at https://blog.dmptool.org/2022/08/18/supporting-the-upcoming-nih-data-sharing-requirements-with-the-dmptool/