Basic data management guidelines: How do you organise and document your data?

It pays to organise and document your data!

Case 1: Many a thesis writer has been there: after a break, you find it difficult to start again because you no longer remember all the details about the material.

► Metadata, or information that describes the data, helps you understand the nature of the data.

Case 2: It is difficult to share data with other members in a group if everyone has produced and processed the data independently, without a shared plan.

► Selecting file formats carefully facilitates the sharing and joint use of data in the long term.

► Naming files and organising folders consistently makes the data easier to find.

An editable table template with heading rows, designed for shared or individual use, helps you collect data systematically. A structured table is handy when analysing data with statistical software, such as SPSS or R.

Case 3: You have made changes to your data that prove to be wrong – but there is no returning to the old version.

► Version control makes data processing safe.

Selection of file format

The selection of the file format has an impact on both research work and data usability in the long run. While there is no clear-cut recommendation for formats, keep the following basic principles in mind when selecting one:

  • Make your selection at an early stage to avoid a variety of file formats and format conversions.
  • The main selection criterion is that the format is fit for purpose.
  • Prefer file formats that are widely used in your field.
  • A simple list of the recommended formats is available on the UK Data Service website.

Naming files and organising folders

By naming your files and organising your folders appropriately, you make your data easier to find and the data content easier to comprehend. There are a few rules of thumb for naming:

  • Ideally, naming conventions should be determined at the outset of each project.
  • Consistency and clarity are the key naming principles.
  • A good file name is logically constructed (e.g., based on the date) and describes the content (for an example, see the Purdue University website). You can also find helpful tips for naming on the University of Edinburgh website.
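The naming principles above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name build_file_name, the order of the name parts and the example values are hypothetical and not taken from any of the linked guides. Adapt the pattern to the convention agreed in your own project.

```python
from datetime import date


def build_file_name(created: date, project: str, content: str,
                    version: int, extension: str) -> str:
    """Combine date, project, content description and version into one file name."""
    # ISO 8601 dates (YYYY-MM-DD) sort chronologically in a file listing.
    date_part = created.isoformat()
    # Lower case and hyphens instead of spaces keep names portable across systems.
    project_part = project.strip().lower().replace(" ", "-")
    content_part = content.strip().lower().replace(" ", "-")
    # A zero-padded version number keeps v02 ahead of v10 when sorted.
    return f"{date_part}_{project_part}_{content_part}_v{version:02d}.{extension}"


# Hypothetical example values.
example = build_file_name(date(2024, 3, 15), "thesis", "interview transcripts", 2, "csv")
print(example)  # 2024-03-15_thesis_interview-transcripts_v02.csv
```

Generating names with one function, rather than typing them by hand, is one way to keep them consistent across a whole project.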

Plan your folder structure based on your needs. For example, store raw data, edited data, methods, documentation, the manuscript and presentations in separate folders.
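If you set up the same structure for several projects, you may want to create the folders with a short script. The sketch below assumes Python 3.9 or later; the folder names and the function create_project_folders are hypothetical and simply mirror the example categories mentioned above.

```python
from pathlib import Path

# Hypothetical folder names that mirror the example categories in the text.
PROJECT_FOLDERS = [
    "raw_data",
    "edited_data",
    "methods",
    "documentation",
    "manuscript",
    "presentations",
]


def create_project_folders(project_root: Path) -> list[Path]:
    """Create one subfolder per category and return the created paths."""
    created = []
    for name in PROJECT_FOLDERS:
        folder = project_root / name
        # exist_ok=True makes the function safe to run again on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling create_project_folders(Path("my_project")) would create the six folders under my_project in the current working directory.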

Version control

Version control is an important part of organising data. Data processing results in various versions, and you want to be able to return to an earlier version if required. Version control can be automated (the recommended option) or done manually. In either case, store the original file or raw data separately to avoid accidentally editing it.

In automated version control, the system creates and keeps track of the different versions.

► Tools such as Git are also available for more advanced version control (see also Instructions for using Git).

In manual version control, the user is responsible for creating and managing versions. Note that consistent file naming is particularly important here.

► This is suitable for small amounts of data which the producer manages alone.
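Manual version control can be made less error-prone with a small helper that copies the working file under the next free version number. The following is a minimal sketch, not a replacement for an automated system: the function save_new_version, the versions folder and the _v01 naming pattern are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def save_new_version(working_file: Path, versions_dir: Path) -> Path:
    """Copy the working file into versions_dir under the next free version number."""
    versions_dir.mkdir(parents=True, exist_ok=True)
    stem, suffix = working_file.stem, working_file.suffix
    # Collect the version numbers already in use, e.g. survey_v01.csv -> 1.
    used = []
    for existing in versions_dir.glob(f"{stem}_v*{suffix}"):
        number = existing.stem.rsplit("_v", 1)[-1]
        if number.isdigit():
            used.append(int(number))
    next_version = max(used, default=0) + 1
    target = versions_dir / f"{stem}_v{next_version:02d}{suffix}"
    # copy2 copies the content and also preserves the modification time.
    shutil.copy2(working_file, target)
    return target
```

Each call leaves the working file untouched and adds one numbered copy, so an earlier state can be restored by copying the relevant version back.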

Metadata – descriptive data

Documenting data means describing data.

Metadata ('data about data'), or descriptive data, makes data understandable, findable and usable. It indicates

  • The type of data
  • What has been done to the data
  • The location of the data

The name of the dataset is the simplest kind of metadata. Descriptive data can also be related to the

  • Content of data
  • Collection of data (methods, devices, software)
  • Processing of data
  • Storage locations and terms of use, or the licence needed to access the data

Matters to consider when producing metadata

  • The earlier you begin describing the data, the easier it is; metadata produced after the fact tends to be of lower quality.
  • If possible, use a metadata model: data can be described freely or according to a specific model (a metadata format). Different disciplines use different metadata formats.
  • Create descriptive files: metadata is usually provided in a readme.txt file, data dictionary or codebook.
  • Further instructions for comprehensive data description and documentation are available in the FSD's guide.
  • Making a research project understandable - Guide for data documentation
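A readme.txt file of the kind mentioned above can also be generated from a fixed list of fields, so that no part of the description is forgotten. The sketch below assumes Python 3.9 or later. The field names, the function write_readme and the 'Field: value' layout are hypothetical; they are loosely based on the kinds of descriptive data listed earlier and do not follow any particular metadata format.

```python
from pathlib import Path

# Hypothetical fields, loosely based on the kinds of descriptive data listed in the text.
README_FIELDS = [
    "Title",
    "Content",
    "Collection",
    "Processing",
    "Storage location",
    "Terms of use",
]


def write_readme(folder: Path, metadata: dict[str, str]) -> Path:
    """Write the metadata as 'Field: value' lines into readme.txt in the given folder."""
    missing = [field for field in README_FIELDS if field not in metadata]
    if missing:
        # Failing early keeps incomplete descriptions from going unnoticed.
        raise ValueError(f"Missing metadata fields: {', '.join(missing)}")
    lines = [f"{field}: {metadata[field]}" for field in README_FIELDS]
    readme_path = folder / "readme.txt"
    readme_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme_path
```

Calling write_readme with a dictionary that lacks one of the fields raises an error, which serves as a reminder to describe the data before it is stored or shared.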

What is metadata?