Creating Data Dictionaries for Research Projects

A guide to describing the variables in your datasets to facilitate communication and reusability.

 

What Is a Data Dictionary?

According to the NNLM Data Glossary, “a data dictionary is a document that outlines the structure, content, and meaning of a given variable.” More generally, a data dictionary describes each variable included in a dataset. Data dictionaries can take any format, but they are most commonly created as spreadsheets.

Why Are Data Dictionaries Important?

A data dictionary helps ensure that all members of the research team use the same standardized definition for each variable collected. Data dictionaries also make a dataset easier to reuse, both inside and outside the team: when data is shared, the dictionary helps other researchers interpret the dataset correctly.

What Should I Include in My Data Dictionary?

Basic fields

  • Variable name: This is the variable name exactly as it appears in your dataset.
  • Variable description: A short explanation of the variable and the way you are using it. Avoid circular definitions.
  • Variable units
  • Allowed values: The acceptable values for this variable, such as a minimum and maximum for a numeric variable or the list of permitted categories for a categorical one.
  • Known issues
  • Relationship to other variables 
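As a minimal sketch, the basic fields above could be recorded as the columns of a spreadsheet-style CSV file, with one row per variable. The example uses only Python's standard library. The column names and the two variables (`age_years`, `smoker`) are hypothetical and chosen for illustration only.

```python
import csv
import io

# Column headers follow the basic fields listed above.
FIELDS = [
    "variable_name",
    "description",
    "units",
    "allowed_values",
    "known_issues",
    "related_variables",
]

# Hypothetical entries for an illustrative dataset.
ENTRIES = [
    {
        "variable_name": "age_years",
        "description": "Participant age at enrollment",
        "units": "years",
        "allowed_values": "18-99",
        "known_issues": "Self-reported",
        "related_variables": "birth_year",
    },
    {
        "variable_name": "smoker",
        "description": "Whether the participant currently smokes",
        "units": "",
        "allowed_values": "yes; no",
        "known_issues": "",
        "related_variables": "",
    },
]


def write_dictionary(entries, handle):
    """Write data dictionary entries as CSV, one row per variable."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)


# Write to an in-memory buffer; pass an open file handle to save to disk.
buffer = io.StringIO()
write_dictionary(ENTRIES, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The resulting file opens in any spreadsheet program, and fields that do not apply to a variable (such as units for a yes/no variable) are simply left blank.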

Optional fields that you may need depending on your project

  • Null value for the variable
  • Variable format
  • Synonyms for variable name

You can also include anything else that would help another researcher understand your project.
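Once allowed values and null values are written down, they can also be used to check a dataset for entries that do not match the dictionary. The following is a minimal sketch under stated assumptions: the dictionary structure, the variable names, and the null value codes (`-999`, `"NA"`) are all hypothetical, and a real project would adapt them to its own fields.

```python
# Hypothetical data dictionary: each variable lists its allowed values
# and the code used for missing data (its "null value").
DATA_DICTIONARY = {
    "age_years": {"min": 18, "max": 99, "null_value": -999},
    "smoker": {"categories": {"yes", "no"}, "null_value": "NA"},
}


def check_record(record, dictionary):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for name, value in record.items():
        spec = dictionary.get(name)
        if spec is None:
            problems.append(f"{name}: not defined in the data dictionary")
            continue
        if value == spec.get("null_value"):
            continue  # missing data recorded with the documented code
        if "categories" in spec:
            if value not in spec["categories"]:
                problems.append(f"{name}: {value!r} is not an allowed value")
        elif not (spec["min"] <= value <= spec["max"]):
            problems.append(
                f"{name}: {value!r} is outside {spec['min']}-{spec['max']}"
            )
    return problems


# Illustrative records, not real data.
records = [
    {"age_years": 42, "smoker": "no"},
    {"age_years": 17, "smoker": "sometimes"},
    {"age_years": -999, "smoker": "NA", "height": 170},
]

for index, record in enumerate(records):
    for problem in check_record(record, DATA_DICTIONARY):
        print(f"record {index}: {problem}")
```

A check like this flags out-of-range values, unexpected categories, and variables that appear in the dataset but are missing from the dictionary.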

Example of a data dictionary. (Source)

Sources

Briney, Kristin. “Data Dictionaries.” Data Ab Initio (blog), August 5, 2014. http://dataabinitio.com/?p=454.

NNLM. “Data Dictionary.” Accessed January 26, 2024. https://www.nnlm.gov/guides/data-glossary/data-dictionary.

OSF Support. “How to Make a Data Dictionary,” May 5, 2023. https://help.osf.io/article/217-how-to-make-a-data-dictionary.

 
