A guide to finding and using standardized terms to make research data easier to reuse.
What Is a Controlled Vocabulary?
According to the book Introduction to Controlled Vocabularies, “a controlled vocabulary is an organized arrangement of words and phrases used to index content and/or to retrieve content.... It typically includes preferred and variant terms and has a defined scope or describes a specific domain.”
Put simply, a controlled vocabulary is a list of preferred or standardized terms in a particular domain. Often, the phrase “controlled vocabulary” is used interchangeably with “ontology” or “taxonomy,” though each term has a slightly different technical definition. Usually, a controlled vocabulary also includes a definition and alternate terms for each entry.
Why Use a Controlled Vocabulary?
As a researcher, you have probably run into a situation where a concept you were searching for in the literature existed under multiple terms. As an example, let’s say that you want to try an intervention where patients with anxiety take nature walks. You want to search the literature to find articles with similar studies. You would need to search for terms such as “forest bathing,” “nature therapy,” and “nature intervention,” because terminology for this type of intervention is not standardized.
Controlled vocabularies help to mitigate this problem by standardizing the terms used to categorize research articles and datasets. If datasets use standard vocabularies, those who want to reuse data can search for and retrieve it more easily. Moreover, because controlled vocabularies include definitions of each term, researchers can be sure they are using the words the same way.
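To make the idea concrete, here is a minimal Python sketch of a controlled vocabulary as a data structure: each entry has a preferred term, a definition, and variant terms, and a lookup maps any variant to the preferred term. The entry itself is hypothetical. In particular, treating “nature therapy” as the preferred term is an assumption made only for illustration, not a claim about any published vocabulary.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One vocabulary entry: a preferred term, its definition, and variants."""
    preferred: str
    definition: str
    variants: tuple = ()


def build_index(entries):
    """Map every preferred and variant term (lowercased) to its entry."""
    index = {}
    for entry in entries:
        for term in (entry.preferred, *entry.variants):
            index[term.lower()] = entry
    return index


def normalize(term, index):
    """Return the preferred term for `term`, or None if it is not listed."""
    entry = index.get(term.strip().lower())
    return entry.preferred if entry else None


# Hypothetical entry: the choice of preferred term is for illustration only.
VOCAB = [
    Entry(
        preferred="nature therapy",
        definition="An intervention in which participants spend time in natural settings.",
        variants=("forest bathing", "nature intervention"),
    ),
]

INDEX = build_index(VOCAB)
print(normalize("Forest Bathing", INDEX))  # nature therapy
```

A dataset tagged with the normalized term can then be found by anyone searching under any of the listed variants.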
Examples of Controlled Vocabularies
These examples are general-purpose controlled vocabularies. There are many other controlled vocabularies that are specific to a field or domain.
Names
- Library of Congress Name Authority File (LCNAF) includes names for titles, persons, groups, and geographic locations.
- Virtual International Authority File (VIAF) combines name authority files from national libraries and research institutions around the world.
- Countries and their subdivisions: ISO 3166
- Dates: ISO 8601
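As a small illustration of why a standard such as ISO 8601 helps, the Python sketch below converts a locally formatted date string into the ISO 8601 calendar-date form (YYYY-MM-DD) using only the standard library. The helper name `to_iso_date` and the sample dates are assumptions made for this example.

```python
from datetime import date, datetime


def to_iso_date(text, fmt):
    """Parse `text` using the strptime format `fmt`; return YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()


# The same string means different days depending on the local convention.
print(to_iso_date("03/04/2024", "%m/%d/%Y"))  # 2024-03-04 (month first)
print(to_iso_date("03/04/2024", "%d/%m/%Y"))  # 2024-04-03 (day first)

# Once stored in ISO 8601 form, the date parses back without ambiguity.
print(date.fromisoformat("2024-03-04").year)  # 2024
```

Recording dates in the ISO form at the time of data collection avoids having to guess the original convention later.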
Concepts, Objects, Topics, etc.
- Library of Congress Subject Headings (LCSH) is a massive list of terms used to describe resources of various formats.
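- Wikidata is a free, collaboratively edited knowledge base from the Wikimedia Foundation that assigns a unique identifier to each concept, person, place, and object it describes.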
Languages
- ISO 639 Codes for the Representation of Names of Languages provides two-letter (ISO 639-1) and three-letter (ISO 639-2) codes for identifying languages.
Using a Controlled Vocabulary: MeSH
MeSH (Medical Subject Headings), created and maintained by the National Library of Medicine, is the most common controlled vocabulary for medicine. It is used to catalog articles in PubMed, the widely used database of medical abstracts.
Let’s say you are a cancer researcher and you are submitting a dataset to a repository that uses MeSH to categorize its datasets. You want to find the controlled term MeSH uses for “cancer.”
First, go to the homepage for MeSH. Using the search bar, search for “cancer.”
The top result is “Neoplasms.” This is the MeSH standardized term for cancer. If you click on the result, you can see the full entry.
Since the most common usage of MeSH is in PubMed, many of the items on this page have to do with building a PubMed search. However, you can still find a definition of the term and synonyms for this term (called “Entry Terms” here). MeSH is structured as a tree, with a hierarchy of terms, which you can see shown at the bottom.
In our example, you now know that the preferred term for “cancer” is “Neoplasms.”
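The entry-term lookup and the tree structure described above can be sketched in a few lines of Python. This is a hand-built illustration, not a copy of the MeSH record: it includes only two entry terms and three headings, and the tree numbers shown should be checked against the MeSH browser before you rely on them. The names `ENTRY_TERMS`, `TREE`, and `ancestors` are assumptions made for this example.

```python
# Illustrative subset only: a few variant terms mapped to a preferred heading.
ENTRY_TERMS = {
    "cancer": "Neoplasms",
    "tumors": "Neoplasms",
}

# Illustrative subset only: tree numbers mapped to headings.
# Each dot-separated segment adds one level to the hierarchy.
TREE = {
    "C04": "Neoplasms",
    "C04.588": "Neoplasms by Site",
    "C04.588.180": "Breast Neoplasms",
}


def preferred_heading(term):
    """Return the preferred heading for a variant term, or None if unknown."""
    return ENTRY_TERMS.get(term.strip().lower())


def ancestors(tree_number, tree):
    """List the headings above `tree_number`, broadest first."""
    parts = tree_number.split(".")
    prefixes = [".".join(parts[:i]) for i in range(1, len(parts))]
    return [tree[p] for p in prefixes if p in tree]


print(preferred_heading("Cancer"))     # Neoplasms
print(ancestors("C04.588.180", TREE))  # ['Neoplasms', 'Neoplasms by Site']
```

Because each tree number begins with the numbers of its broader headings, a dataset tagged with a narrow term can also be retrieved by a search on any heading above it.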
Sources
Bolam, Mike. “Metadata & Discovery @ Pitt: Taxonomies and Controlled Vocabularies.” University of Pittsburgh Library System, September 26, 2023. https://pitt.libguides.com/metadatadiscovery/controlledvocabularies.
Cofield, Melanie. “Metadata Basics: Controlled Vocabularies.” University of Texas Libraries, August 28, 2023. https://guides.lib.utexas.edu/metadata-basics/controlled-vocabs.
Harpring, Patricia, and Murtha Baca. Introduction to Controlled Vocabularies: Terminology for Art, Architecture, and Other Cultural Works. Updated edition. Los Angeles, California: Getty Research Institute, 2013.
Pasquier, Guillaume. “Research Data Management: Standard Vocabularies.” Geneva Graduate Institute Kathryn and Shelby Cullom Davis Library, January 26, 2024. https://libguides.graduateinstitute.ch/rdm/vocabularies.