Guide to Organizing Research Datasets with READMEs

A guide to using README files to help organize and explain research datasets

What is a README?

A README is a text file included with a dataset that describes the other files in the directory. It serves as a reference document for other members of your team and for researchers who want to reuse your data. A README is usually a plain text file named README.txt, located in the root (top-level) folder of the dataset. Supplementary READMEs can be provided for groups of files or even for a single file. READMEs are traditionally used in software development, where they accompany the other files to explain what the software does and how to install it. Scientific researchers have since widely adopted them because they are useful for explaining complicated datasets.

READMEs do not have a standardized format, and they can be customized to your research. Some researchers even prefer to use PDF/A or another open file format instead of plain text.

What should a README include?

The following general list of information to consider including in a README is adapted from the Geneva Graduate Institute. Add or omit sections as needed to make the README work for your project.

General information

  • Dataset title
  • Investigators’ names, roles, institutions, and contact information (include ORCID if available)
  • Project title (if any)
  • Grant information

Your data and the world

  • Licenses and restrictions placed on the dataset
  • Relationship to other datasets
  • Other resources used as sources for data collection (books, articles, etc.)
  • Links to publications based on the dataset

Data collection

  • Collection date (or date range)
  • Geographic location of collection (if appropriate)
  • Methods used for data collection (including references, documentation, links)
  • Experimental and environmental conditions of collection (if appropriate)
  • Standards and calibration for data collection (if applicable)
  • Uncertainty, precision, and accuracy of measurements (if appropriate)
  • Known problems and caveats (sampling, blanks, etc.)

Organization

  • Folder structure
  • File naming system (with examples)
  • Relationships and dependencies between files
  • Other documentation files of interest within dataset (notes, companion files)
  • For each major file, a short description of its contents and date of creation
  • Description of file versioning system (if appropriate)
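
The folder structure entry does not have to be typed by hand. The following is a minimal sketch in Python, using only the standard library, that builds an indented listing you could paste into a README. The function name folder_tree is a hypothetical helper invented for this example, and the two-space indent is just one possible layout.

```python
from pathlib import Path


def folder_tree(root, indent="  "):
    """Return an indented text listing of everything under root.

    folder_tree is a hypothetical helper name, not part of any library.
    """
    root = Path(root).resolve()
    lines = [root.name + "/"]

    def walk(folder, depth):
        # Folders are listed first, then files, each group alphabetically.
        entries = sorted(
            folder.iterdir(),
            key=lambda p: (p.is_file(), p.name.lower()),
        )
        for entry in entries:
            suffix = "/" if entry.is_dir() else ""
            lines.append(indent * depth + entry.name + suffix)
            if entry.is_dir():
                walk(entry, depth + 1)

    walk(root, 1)
    return "\n".join(lines)
```

Calling print(folder_tree("path/to/your/dataset")) prints the listing. For very large datasets you may prefer to describe the structure at the folder level only, rather than listing every file.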

Codebook

  • Definition of codes, symbols, and abbreviations used in files
  • List of variables with full name and definition
  • Definition of column headings and row labels for tabular data
  • Treatment of missing data (code, etc.)
  • Units used in measurements
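
For tabular data, a codebook can be started from the column headings of the file itself. The sketch below assumes a CSV file with a header row. The names codebook_stub and missing_code, the "NA" default, and the sample data are all made up for illustration. The script only lists the variables; the definitions and units still have to be filled in by someone who knows the data.

```python
import csv
import io


def codebook_stub(csv_text, missing_code="NA"):
    """Build a plain-text codebook skeleton from the header row of a CSV.

    codebook_stub and missing_code are illustrative names only.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row holds the column headings
    lines = ["Variable | Full name and definition | Units"]
    for column in header:
        lines.append(f"{column.strip()} | TODO | TODO")
    lines.append(f"Missing data code: {missing_code}")
    return "\n".join(lines)


# Made-up sample data, for illustration only.
sample = "site_id,temp_c,depth_m\nA1,12.5,3.0\nA2,NA,4.5\n"
print(codebook_stub(sample))
```

Each TODO marks a definition or unit to write in by hand before the codebook goes into the README.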

Processing and quality assurance

  • Methods used for data processing
  • Software used in data collection and processing, including version numbers
  • File formats used in the dataset and recommended software
  • Quality control procedure(s) applied
  • Dataset changelog
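
If you create READMEs often, the sections above can be turned into a reusable blank form. The following Python sketch is one possible way to do this. The names README_SECTIONS and readme_skeleton are hypothetical, and only a few prompts per section are included to keep the example short.

```python
# Section headings follow the list in this guide; only a few prompts per
# section are included here to keep the sketch short.
README_SECTIONS = {
    "General information": [
        "Dataset title",
        "Investigators and contact information",
        "Grant information",
    ],
    "Your data and the world": [
        "Licenses and restrictions",
        "Links to publications based on the dataset",
    ],
    "Data collection": [
        "Collection date (or date range)",
        "Methods used for data collection",
    ],
    "Organization": [
        "Folder structure",
        "File naming system (with examples)",
    ],
    "Codebook": [
        "List of variables with full name and definition",
        "Treatment of missing data",
    ],
    "Processing and quality assurance": [
        "Software used, including version numbers",
        "Dataset changelog",
    ],
}


def readme_skeleton(sections=README_SECTIONS):
    """Return README text with one heading per section and blank prompts."""
    lines = []
    for heading, prompts in sections.items():
        lines.append(heading.upper())
        for prompt in prompts:
            lines.append(f"  {prompt}: ")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


print(readme_skeleton())
```

The returned text can be saved as README.txt in the root folder of the dataset and filled in from there. Edit the dictionary to add or drop prompts so the form matches your project.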

Tip: In addition to the main README at the root of the dataset, you may want to create supplemental READMEs in larger datasets. This might mean one README for every subfolder or group of subfolders.
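
In a large dataset it can be hard to see which subfolders already have a supplemental README. The Python sketch below lists the subfolders that contain none. The function name folders_missing_readme is hypothetical, and the rule that any file whose name starts with "readme" (in any letter case) counts as a README is an assumption of this example.

```python
from pathlib import Path


def folders_missing_readme(root):
    """Return subfolders of root (at any depth) that contain no README file.

    Assumption of this sketch: a file counts as a README if its name
    starts with "readme", ignoring case (README.txt, readme.md, ...).
    """
    root = Path(root)
    missing = []
    # rglob("*") visits everything below root; root itself is not included.
    for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
        has_readme = any(
            f.is_file() and f.name.lower().startswith("readme")
            for f in folder.iterdir()
        )
        if not has_readme:
            missing.append(folder.relative_to(root).as_posix())
    return missing
```

A folder on this list is not necessarily a problem, since one README can cover a group of subfolders. Treat the result as a prompt to check, not as a list of errors.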

README Templates

README Examples

Additional Resource

README Checklist from Harvard Libraries

Sources

Cornell Data Services. “Guide to Writing ‘Readme’ Style Metadata.” Accessed January 10, 2023. https://data.research.cornell.edu/data-management/sharing/readme/.

Pasquier, Guillaume. “Research Data Management: README.txt.” Geneva Graduate Institute Kathryn and Shelby Cullom Davis Library, November 16, 2023. https://libguides.graduateinstitute.ch/rdm/readme.