Choosing Your Generalist Repository

This guide discusses how to select a generalist repository for your research data, with a focus on the repositories that are part of the GREI initiative.

What is a Generalist Repository?

Put simply, a generalist repository accepts almost all types of data and supports researchers from all disciplines. These repositories are most commonly used by researchers whose data cannot go into a domain- or discipline-specific repository, either because no such repository exists for their field or data type, or because the data is too large for the domain-specific repository to support.

Most generalist repositories charge a fee for depositing large amounts of data. Check with your repository ahead of time to find out the costs, and include them in your grant budget.

What is GREI?

The Generalist Repository Ecosystem Initiative (GREI) is an initiative of the National Institutes of Health (NIH) that brings together seven generalist repositories in a collaborative working group. Its goal is to establish “a common set of cohesive and consistent capabilities, services, metrics, and social infrastructure” and to encourage adoption of the FAIR (Findable, Accessible, Interoperable, Reusable) principles.

So far, GREI has undertaken projects such as creating a new metadata standard for generalist repositories, which will make it easier for researchers to search across multiple repositories; compiling a comparison chart for GREI repositories; and educating researchers about the use of generalist repositories.

Generalist Repositories

For each GREI repository below, the opening description is taken from the GREI Generalist Repository Comparison Chart.

Harvard Dataverse

Harvard Dataverse Repository is a free data repository open to all researchers from any discipline, both inside and outside of the Harvard community, where you can share, archive, cite, access, and explore research data.

Harvard Dataverse is the generalist repository arm of the Dataverse Project, which distributes open source repository software. The Dataverse software also powers a number of other domain-specific and institutional repositories.

Harvard Dataverse guidance for complying with the NIH Data Management and Sharing (DMS) Policy using their repository.

Dryad

Dryad is an open data publishing platform and community committed to the open availability and routine re-use of all research data. Dryad fully curates all data and metadata and publishes exclusively under a Creative Commons Public Domain License (CC0).

Dryad focuses on data from the sciences. It does not support controlled or managed access.

Dryad guidance for complying with the NIH DMS Policy using their repository.

Figshare

Figshare is a freely available open data publishing platform for all researchers where they can share and get credit for all types of scholarly output, including any file type from any research discipline. The Figshare+ repository supports sharing of larger datasets.

Figshare is a product of the company Digital Science, which also makes Altmetric and Dimensions.

Figshare guidance for complying with the NIH DMS Policy using their repository.

Mendeley Data

Mendeley Data is a free repository specialized for research data. Search more than 20 million datasets indexed from 1000s of data repositories and collect and share datasets with the research community following the FAIR data principles.

Mendeley is a product of the publisher Elsevier. It has a 10GB limit on datasets.

OSF (Open Science Framework)

OSF is a free and open source project management tool that supports researchers throughout their entire project lifecycle in open science best practices.

OSF is a repository and a project management tool for scientists run by the Center for Open Science. While public projects are limited to 50GB and private projects to 5GB, OSF includes a number of add-ons and integrations with popular tools, including Google Drive and OneDrive, where more data can be stored.

OSF guidance for complying with the NIH DMS Policy using their repository.

Vivli

Vivli is an independent, nonprofit organization that has developed a global data-sharing and analytics platform. Our focus is on sharing individual participant-level data from completed clinical trials to serve the international research community.

Vivli focuses on data from clinical trials. It also features a secure cloud-computing environment where researchers can share and access individual-level data without downloading it onto their local machines. However, data must still be de-identified before it can be uploaded to Vivli. Vivli charges individual depositors $5,000 to $10,000, but it waives fees for researchers who are part of a member institution.

Vivli guidance for complying with the NIH DMS Policy using their repository.

Zenodo

Powering Open Science, built on Open Source. Built by researchers for researchers. Run from the CERN Data Centre, whose purpose is long-term preservation of digital objects. CERN maintains one of the largest scientific datasets in the world for high-energy physics.

Zenodo has a dataset limit of 50GB, but higher quotas can be requested and granted on a case-by-case basis.

Zenodo guidance for complying with the NIH DMS Policy using their repository.

IEEE Dataport (not a GREI member)

IEEE Dataport is a product from the publisher IEEE. Dataport focuses on large datasets, with limits of 2TB per dataset for individual researchers or 10TB for those who are part of an institution with a subscription. Depositing a dataset is free, but depositors who are not part of a member institution must pay a $1,950 fee to make a dataset open access.
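As a rough aid for comparing the size limits quoted in the descriptions above, the sketch below collects them into a small lookup and lists the repositories whose stated limit would accommodate a dataset of a given size. The function name, the dictionary, and the conversion of 1 TB to 1000 GB are illustrative assumptions, not part of any repository's tooling. Repositories for which this guide gives no figure (Harvard Dataverse, Dryad, Figshare, Vivli) are left out, and limits change over time, so confirm current numbers with the repository itself.

```python
# Illustrative sketch only: limits are the figures quoted in this guide,
# not values fetched from the repositories. Names here are hypothetical.
GB_PER_TB = 1000  # assumption: decimal units

STATED_LIMITS_GB = {
    "Mendeley Data": 10,
    "OSF (public project)": 50,
    "OSF (private project)": 5,
    "Zenodo": 50,  # higher quotas can be requested case by case
    "IEEE Dataport (individual)": 2 * GB_PER_TB,
    "IEEE Dataport (institutional subscription)": 10 * GB_PER_TB,
}


def repositories_that_fit(dataset_size_gb, limits=STATED_LIMITS_GB):
    """Return names of repositories whose stated limit is at least
    dataset_size_gb, ordered from smallest limit to largest."""
    fitting = [
        (limit, name)
        for name, limit in limits.items()
        if dataset_size_gb <= limit
    ]
    return [name for _limit, name in sorted(fitting)]


if __name__ == "__main__":
    for size in (8, 40, 500):
        print(f"{size} GB ->", repositories_that_fit(size))
```

Size is only one criterion. Fees, access controls, licensing, and disciplinary fit, all noted in the descriptions above, still need to be weighed separately.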

Additional Resource

Generalist Repository Selection Flowchart: The chart, a product of the Generalist Repository Ecosystem Initiative (GREI), is designed to guide users through a series of considerations for selecting an appropriate data repository.

Sources

Longwood Research Data Management. “Share & Publish | Data Management.” Accessed May 14, 2024. https://datamanagement.hms.harvard.edu/share-publish.

Stall, Shelley, Maryann E. Martone, Ishwar Chandramouliswaran, Lisa Federer, Julian Gautier, Jennifer Gibson, Mark Hahnel, et al. “Generalist Repository Comparison Chart,” May 17, 2023. https://doi.org/10.5281/ZENODO.7946938.