Choosing Stable File Formats: A Guide for Data Preservation

Find the best format to preserve your data after you share it or send it to cold storage.

Why Is Using a Stable File Format Important?

Anyone who has tried to load a file created in an obsolete software program knows the pain of unstable file formats. For example, you may have attempted to open an old WordPerfect document saved in the obsolete .wpd format. File formats can become obsolete or orphaned, and the software needed to read them can become abandonware when its creator stops maintaining it. Stable file formats are formats that are unlikely to suffer from these issues. Using a stable file format helps to preserve your data for you and for other researchers who may want to use it in the future.

What File Format Should I Use?

Stable file formats have these characteristics:

  • Non-proprietary: A proprietary format is created and controlled by a single company. For instance, .ppt, the legacy PowerPoint binary format, is a proprietary format of Microsoft. By contrast, non-proprietary formats are openly documented and can be used across multiple operating systems and pieces of software without restriction. When using proprietary software, you can often choose to export to a non-proprietary format.
  • Uncompressed: Lossy compression algorithms make files smaller by discarding “nonessential” information, so the original data cannot be recovered exactly. (Lossless compression, as used by ZIP and GZIP, restores the original bytes in full.) If your data analysis was done on uncompressed data, sharing only a lossy compressed copy can make your results nonreproducible.
  • Unencrypted: The encryption methods that software supports change over time, and passwords or keys can be lost, making older encrypted files inaccessible.
  • Commonly used in your research community: Using common file types makes your work accessible to a wider set of researchers.
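
One way to check whether a particular compression step alters your data is to compress it, decompress it, and compare checksums of the original and restored bytes. The following is a minimal sketch, assuming Python's standard gzip and hashlib modules; the function name and the sample data are hypothetical and only for illustration.

```python
import gzip
import hashlib


def survives_gzip_round_trip(data: bytes) -> bool:
    """Return True if data is byte-identical after a gzip round trip."""
    original_digest = hashlib.sha256(data).hexdigest()
    # Compress, then immediately decompress, entirely in memory.
    restored = gzip.decompress(gzip.compress(data))
    return hashlib.sha256(restored).hexdigest() == original_digest


# Hypothetical sample data standing in for a small CSV file.
sample = b"site,temperature_c\nA,21.5\nB,19.0\n"
print(survives_gzip_round_trip(sample))
```

Because gzip is lossless, the check should pass for any input. The same compare-the-checksums approach does not apply directly to lossy formats such as JPEG or MP3, which change the data by design.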

  Data type        Preferred file format examples
  ---------------  ----------------------------------
  Containers       TAR, GZIP, ZIP
  Databases        XML, CSV, SQLITE
  Geospatial       SHP, DBF, GeoTIFF, NetCDF
  Moving images    MOV, MPEG, AVI, MXF
  Sounds           WAVE, AIFF, MP3, MXF
  Statistics       ASCII, DTA, POR, SAS, SAV
  Still images     TIFF, JPEG2000, PDF, PNG, GIF, BMP
  Tabular data     CSV
  Text             XML, PDF/A, HTML, ASCII, UTF-8
  Web archive      WARC
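
Before depositing a dataset, it can help to list the files whose extensions do not correspond to any format in the table above. The sketch below is one possible approach, not an authoritative check: the set of extensions is an assumed, partial mapping from the format names in the table to common file extensions, and the function name and file names are hypothetical.

```python
from pathlib import Path

# Assumed, partial mapping of the table's format names to common
# file extensions; extend or trim it to match your own community's norms.
PREFERRED_EXTENSIONS = {
    ".tar", ".gz", ".zip",          # containers
    ".xml", ".csv", ".sqlite",      # databases, tabular data, text
    ".shp", ".dbf", ".nc",          # geospatial
    ".tif", ".tiff", ".png",        # still images
    ".wav", ".aiff",                # sounds
    ".html", ".txt", ".warc",       # text and web archives
}


def flag_nonpreferred(filenames):
    """Return the file names whose extension is not in the preferred set."""
    return [
        name
        for name in filenames
        if Path(name).suffix.lower() not in PREFERRED_EXTENSIONS
    ]


# Hypothetical file names for illustration.
print(flag_nonpreferred(["survey.csv", "slides.pptx", "scan.TIFF", "notes.wpd"]))
# → ['slides.pptx', 'notes.wpd']
```

An extension is only a hint about a file's contents, so a flagged file still deserves a manual look before you convert or exclude it.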


(Table Source)


In addition to this general advice, some repositories, such as Dryad, give researchers guidance on which stable file formats to use. Your funder or research information technology department may also have preferred file formats.

Some data has no file format that meets the standards laid out here and must be saved in a proprietary format. When sharing data in a proprietary format, document in your README file the name of the program (and version number, if applicable) that can be used to read the data.
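
If you have many proprietary files, generating the README lines from a small script keeps the wording consistent. This is a minimal sketch under stated assumptions: the function name, the line format, and the example file and program names are all hypothetical, and your repository may prescribe a different README layout.

```python
def readme_entry(filename, program, version=None):
    """Build one README line naming the software needed to open a file."""
    # Include the version number only when one is supplied.
    software = f"{program} {version}" if version else program
    return f"{filename}: proprietary format; open with {software}"


# Hypothetical file and program names for illustration.
print(readme_entry("spectra.xyz", "ExampleSpec", "4.2"))
# → spectra.xyz: proprietary format; open with ExampleSpec 4.2
print(readme_entry("trace.abc", "ExampleViewer"))
# → trace.abc: proprietary format; open with ExampleViewer
```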

Some data types – for instance, GIS files – require multiple files working together to be read. In this case, make sure you supply all the files needed and document the file structure in your README.
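
As one illustration, an Esri shapefile needs at least three files with the same base name: .shp (geometry), .shx (index), and .dbf (attributes). The sketch below checks a list of file names for incomplete sets. The function name and file names are hypothetical, and the check compares base names only, so it assumes all the files sit in a single directory.

```python
from pathlib import Path

# The three mandatory parts of a shapefile.
REQUIRED_SHAPEFILE_PARTS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(filenames):
    """Map each shapefile base name to the required extensions it lacks."""
    # Group the extensions seen for each base name.
    present = {}
    for name in filenames:
        path = Path(name)
        present.setdefault(path.stem, set()).add(path.suffix.lower())

    missing = {}
    for stem, suffixes in present.items():
        # Only consider base names that actually have a .shp file.
        if ".shp" in suffixes:
            lacking = [ext for ext in REQUIRED_SHAPEFILE_PARTS if ext not in suffixes]
            if lacking:
                missing[stem] = lacking
    return missing


# Hypothetical file listing for illustration.
files = ["roads.shp", "roads.dbf", "rivers.shp", "rivers.shx", "rivers.dbf"]
print(missing_shapefile_parts(files))
# → {'roads': ['.shx']}
```

Optional companions such as .prj (the coordinate reference system) are not checked here but are usually worth including and listing in the README as well.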



Additional Resources

Sources

Bischof, Steve. “Managing Your Data: Use Stable File Formats.” University of Massachusetts Amherst Libraries, November 28, 2023. https://guides.library.umass.edu/data/care/use.

Darragh, Jen. “Research Data Management: File Formats.” Duke University Libraries, October 10, 2023. https://guides.library.duke.edu/c.php?g=633433&p=4429351.