Research Collaboration and Data Management with Version Control

This guide defines version control and discusses how to use it in your research.

What Is Version Control?

Version control is the practice of recording the changes made to a file over time. For example, a researcher might write a draft of a paper (version 1) and then revise it (version 2). Each new iteration of the file is saved as a new version with its own number. Depending on the version-control system used, the researcher can also record metadata such as the author, the date, and the reason for each change. A common convention is to increment the whole number for major revisions (e.g., v2.0) and the decimal for minor revisions (e.g., v2.1).
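To make the numbering convention concrete, here is a minimal Python sketch that bumps major and minor numbers and stores the author, date, and rationale alongside each version. The `Version` and `VersionRecord` classes are hypothetical illustrations, not part of any version-control tool, and the names, dates, and messages are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True, order=True)
class Version:
    """A major.minor version number such as v2.1 (hypothetical helper class)."""
    major: int
    minor: int

    def bump_major(self) -> "Version":
        # Major revision: increment the whole number and reset the decimal part.
        return Version(self.major + 1, 0)

    def bump_minor(self) -> "Version":
        # Minor revision: increment only the decimal part.
        return Version(self.major, self.minor + 1)

    def __str__(self) -> str:
        return f"v{self.major}.{self.minor}"


@dataclass(frozen=True)
class VersionRecord:
    """Metadata stored alongside each version: who, when, and why."""
    version: Version
    author: str
    when: date
    rationale: str


# Illustrative history of one paper draft; names, dates, and messages are made up.
history = [VersionRecord(Version(1, 0), "A. Researcher", date(2023, 12, 1), "First draft")]
v = history[-1].version.bump_minor()
history.append(VersionRecord(v, "A. Researcher", date(2023, 12, 5), "Fixed typos"))
v = v.bump_major()
history.append(VersionRecord(v, "B. Coauthor", date(2023, 12, 12), "Rewrote results section"))

for rec in history:
    print(rec.version, rec.when.isoformat(), rec.author, "-", rec.rationale)
```

Keeping the major and minor parts as separate integers (compared in that order) means v2.10 sorts after v2.9; read as a plain decimal, “2.10” would equal 2.1.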

Why Is Version Control Important?

Version control lets researchers “roll back” to earlier versions, much like an undo button that covers the full duration of a project. This is useful when a researcher makes a mistake or pursues a line of analysis that turns out to be fruitless.

Version control is also useful for collaboration. It lets you merge changes that different people made to the same files at the same time, much as several people can edit one Google Docs document simultaneously.

Finally, version control provides a trail of your work over time. You can use this trail to review your methodology when writing your final publication or if you are ever subject to an audit.
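Reviewing that trail usually means comparing two recorded versions line by line. As a small illustration, Python's standard-library `difflib` module produces a unified diff, the same general format Git uses to show changes; the file names and text below are invented for the example.

```python
import difflib

# Two hypothetical versions of the same methods text, one line per list item.
methods_v1_0 = [
    "Participants: 40 undergraduates.\n",
    "Analysis: linear regression.\n",
]
methods_v1_1 = [
    "Participants: 42 undergraduates.\n",
    "Analysis: linear regression.\n",
]

# unified_diff marks removed lines with "-", added lines with "+",
# and unchanged context lines with a leading space.
diff_lines = list(
    difflib.unified_diff(
        methods_v1_0,
        methods_v1_1,
        fromfile="methods_v1.0.txt",
        tofile="methods_v1.1.txt",
    )
)
print("".join(diff_lines), end="")
```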

Using Version Control

Manually

The simplest form of version control is to add a version number to each file name by hand, for example linearregression1_v4.1.csv. Avoid words like “final” or “complete” in file names; they are ambiguous, because a “final” file often needs further revision.
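If you version files by hand, a small helper can keep the naming consistent. This sketch assumes names follow the pattern `<stem>_v<major>.<minor><extension>`, as in linearregression1_v4.1.csv; the functions `parse_version`, `latest_version`, and `next_minor_name` are hypothetical. It compares version numbers as integers, so v4.10 is treated as newer than v4.9, which plain alphabetical sorting of file names gets wrong.

```python
from __future__ import annotations

import re

# Assumed naming pattern: stem, "_v", major, ".", minor, extension.
VERSIONED_NAME = re.compile(
    r"^(?P<stem>.+)_v(?P<major>\d+)\.(?P<minor>\d+)(?P<ext>\.[^.]+)$"
)


def parse_version(filename: str) -> tuple[str, int, int, str] | None:
    """Split a versioned file name into (stem, major, minor, extension)."""
    m = VERSIONED_NAME.match(filename)
    if m is None:
        return None
    return m["stem"], int(m["major"]), int(m["minor"]), m["ext"]


def latest_version(filenames: list[str], stem: str) -> str | None:
    """Return the highest-numbered version of `stem`, comparing numbers, not text."""
    candidates = []
    for name in filenames:
        parsed = parse_version(name)
        if parsed is not None and parsed[0] == stem:
            candidates.append(((parsed[1], parsed[2]), name))
    return max(candidates)[1] if candidates else None


def next_minor_name(filename: str) -> str:
    """Build the file name for the next minor revision, e.g. v4.1 -> v4.2."""
    parsed = parse_version(filename)
    if parsed is None:
        raise ValueError(f"not a versioned file name: {filename}")
    stem, major, minor, ext = parsed
    return f"{stem}_v{major}.{minor + 1}{ext}"


files = [
    "linearregression1_v4.1.csv",
    "linearregression1_v4.10.csv",
    "linearregression1_v4.9.csv",
    "notes_final.txt",  # ambiguous name: ignored because it has no version number
]
print(latest_version(files, "linearregression1"))  # -> linearregression1_v4.10.csv
print(next_minor_name("linearregression1_v4.1.csv"))  # -> linearregression1_v4.2.csv
```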

Git

Git is a popular version-control system that runs locally on your computer. It is the tool most often associated with version control, but it is not the only program that can automate the process. Git is often used together with GitHub, a web platform for hosting and sharing Git repositories. Programmers use Git mostly for code, but it can track other types of files as well.

Git is generally controlled through the command line, though GitHub offers a desktop application, GitHub Desktop, for those who prefer a graphical interface. If you would like to learn to use Git, a number of introductory tutorials are available.
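A typical command-line cycle is `git init` once, then `git add` and `git commit` each time you want to record a version, with `git log` to list the history and `git show <commit>:<file>` to view a file as it was at an earlier commit. In a terminal you would type these commands directly; the sketch below drives them from Python's `subprocess` module so the sequence can be run as one script. It assumes Git is installed and on the PATH, and the file name `analysis.csv`, the author name, and the email address are placeholders.

```python
from __future__ import annotations

import shutil
import subprocess
import tempfile
from pathlib import Path

# Placeholder identity, passed per command with "git -c" so no global config is changed.
AUTHOR = ["-c", "user.name=A. Researcher", "-c", "user.email=researcher@example.org"]


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def demo() -> tuple[str, str]:
    """Record two versions of a data file, then read back the log and the older version."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init")  # turn the folder into a Git repository

        data = repo / "analysis.csv"  # hypothetical data file
        data.write_text("x,y\n1,2\n")
        git(repo, "add", "analysis.csv")  # stage the file for the next commit
        git(repo, *AUTHOR, "commit", "-m", "Add raw data")

        data.write_text("x,y\n1,2\n3,4\n")
        git(repo, "add", "analysis.csv")
        git(repo, *AUTHOR, "commit", "-m", "Add second observation")

        log = git(repo, "log", "--format=%s")  # commit messages, newest first
        old = git(repo, "show", "HEAD~1:analysis.csv")  # file as of the previous commit
        return log, old


if __name__ == "__main__" and shutil.which("git"):
    log, old = demo()
    print(log)
    print(old)
```

Passing the identity with `git -c` affects only that one command; for everyday use you would set `user.name` and `user.email` once with `git config`.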

Cloud Storage and Sharing Platforms

More and more cloud collaboration platforms, such as Google Drive and Microsoft OneDrive, are adding built-in version-control features. Some platforms keep past versions automatically and make them available through a “version history” feature; others require you to follow a specific process to save versions. Check the documentation of the platform you use for details on how it handles versions.

Sources

Darragh, Jen. “Research Data Management: File Versioning.” Duke University Libraries, October 10, 2023. https://guides.library.duke.edu/c.php?g=633433&p=4429292.

Huck, Jennifer. “Research Data Management: File Formats & Version Control.” University of Virginia Library. Accessed December 28, 2023. https://guides.lib.virginia.edu/c.php?g=515290&p=3522221.

Labou, Stephanie. “Data Science: Version Control & GitHub.” UC San Diego Library, November 14, 2023. https://ucsd.libguides.com/data-science/version-control.

UC Merced Library. “Version Control.” Accessed December 28, 2023. https://library.ucmerced.edu/version-control.