The National Library of Medicine (NLM) has released a new Dataset Catalog for beta testing by the public.
A data catalog serves as a directory of datasets. Unlike a repository, a data catalog does not store datasets; rather, it gives information about each dataset and where to find it in a public repository. Using a data catalog, researchers can search across different data repositories and find datasets of interest without knowing in which specific repository they are housed.
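To make the catalog/repository distinction concrete, here is a minimal Python sketch of the idea. It is not the NLM's implementation: the `CatalogEntry` fields, the repository names, and the example URLs are all hypothetical. The catalog holds only descriptive records and a pointer to where each dataset lives, and a single search runs across every repository at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """A descriptive record only; the dataset itself stays in its repository."""
    title: str
    description: str
    repository: str  # hypothetical repository name
    url: str         # hypothetical link out to the dataset's home


# Hypothetical entries drawn from two different repositories.
CATALOG = [
    CatalogEntry("Pediatric asthma outcomes survey",
                 "Hypothetical survey responses on asthma control",
                 "Repository A", "https://repo-a.example/datasets/1"),
    CatalogEntry("Air quality and asthma admissions",
                 "Hypothetical hospital admission counts",
                 "Repository B", "https://repo-b.example/datasets/7"),
    CatalogEntry("Sleep study actigraphy",
                 "Hypothetical wearable sensor readings",
                 "Repository A", "https://repo-a.example/datasets/2"),
]


def search(catalog, term):
    """Return entries whose title or description contains the term (case-insensitive)."""
    needle = term.lower()
    return [
        entry for entry in catalog
        if needle in entry.title.lower() or needle in entry.description.lower()
    ]


for entry in search(CATALOG, "asthma"):
    # The searcher learns which repository holds each dataset only from the result.
    print(f"{entry.title} -> {entry.repository} ({entry.url})")
```

The search returns matches from both repositories without the searcher needing to know beforehand which one houses each dataset.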
In a blog post announcing the new catalog, the NLM states that the goal of this project is to create an easy-to-use, all-in-one tool. The new catalog could eventually do for datasets what PubMed does for research literature: act as a clearinghouse that links out to the full resource, whether that is the full text of an article or the dataset itself.
The catalog also serves as a test for the NLM’s new datasets metadata model (DATMM). All information about the datasets included in this catalog is converted to the DATMM format. According to the NLM, “By harmonizing and standardizing the structure of descriptive data, the Dataset Catalog facilitates discovery and reuse of biomedical datasets and will eventually make it easier to find and connect datasets to related objects on the Semantic Web.”
Currently, the repositories included in the beta are:
The NLM invites feedback on the beta version via the blue “Give Feedback” button on the right side of the Dataset Catalog website.