Describing your data

Data Documentation

Data need to be documented to be understood and managed. Data documentation describes the content, context and structure of data. It records the conditions and processes involved in creating or collecting the data, how the data were processed, and the context of the research. Documenting your data will help you and your collaborators understand your data during the project and in the future, and will allow others to replicate and verify the research. Documentation is also needed to identify datasets unambiguously, to make them searchable and retrievable in repositories, and to allow them to be cited.

Metadata

The terms metadata and data documentation are often used interchangeably. Metadata literally means ‘data about data’. In the context of research data management, metadata often refers to a highly structured, machine-readable subset of data documentation that may be indexed and stored in a database. A catalogue record is a good example of metadata. See Metadata for more details.

What to document?

Record the information another researcher in your discipline would need to understand the data. This typically includes:

Project level

  • Basic description of the research – why the data were collected
  • Methodologies, protocols, sampling techniques – how the data were collected and processed
  • Equipment used – settings and calibrations
  • Software, code and algorithms used
  • Survey and questionnaire text
  • Codebooks, classification systems and abbreviations used
  • Details of third-party data used

Dataset / File level

  • Basic description of the dataset – catalogue metadata: title, creator, date, subject, format, keywords
  • Information about terms of use or restrictions
  • Data structure – relationships between different files of a dataset or tables in a database

Data item / Variable level

  • Detailed description of variables, field names in spreadsheets and databases, units of measurement used
  • Anomalies, irregularities, questionable results

Documentation is best created when data are collected and processed, as it may be difficult to remember details later on. At the start of the research, decide what information to record and build the documentation process into the research workflow.
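One way to build documentation into the research workflow is to log each processing step at the moment it is carried out. The sketch below appends timestamped entries to a text file in JSON Lines format (one JSON object per line). The `log_step` function, the log file name and the example steps are hypothetical, not a required standard.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_step(log_path, action, details):
    """Append one processing step, with a UTC timestamp, to a JSON Lines log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "details": details,
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


# Usage: record steps while processing (file and field names are hypothetical).
log_file = Path(tempfile.mkdtemp()) / "processing_log.jsonl"
log_step(log_file, "import", {"source": "raw_measurements.csv"})
log_step(log_file, "clean", {"dropped_rows": 3, "reason": "missing site_id"})
```

Because entries are written as the work happens, the log captures details that would be hard to reconstruct at the end of the project.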

Where to record documentation?

  • Embedded within a data file – for example, in a worksheet of a spreadsheet or the header section of an HTML document. Many file formats can hold metadata as well as data content (e.g. .jpg, .tiff, .wav files)
  • In an unstructured document – such as a ‘readme.txt’ file providing basic information as free text
  • In a metadata file – providing information in a structured form. See Metadata for more details
  • In a database file – providing information with links to the data files described
  • In a lab notebook or research journal – these, or documents derived from them, will need to be digital or digitised to facilitate data reuse
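The difference between an unstructured readme.txt and a structured metadata file can be shown by writing the same catalogue information both ways. In this Python sketch, the field names (taken loosely from the dataset-level list above), the example values, the helper functions and the file names `readme.txt` and `metadata.json` are assumptions; a repository may require a specific metadata schema instead.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical catalogue-level information for an illustrative dataset.
metadata = {
    "title": "Example river temperature survey",
    "creator": "A. Researcher",
    "date": "2024-01-15",
    "subject": "Hydrology",
    "format": "text/csv",
    "keywords": ["water temperature", "rivers"],
    "terms_of_use": "CC BY 4.0",
}


def write_readme(folder, meta):
    """Write a free-text readme.txt summarising the dataset for human readers."""
    lines = [
        meta["title"],
        "=" * len(meta["title"]),
        f"Creator: {meta['creator']}",
        f"Date: {meta['date']}",
        f"Subject: {meta['subject']}",
        f"Format: {meta['format']}",
        f"Keywords: {', '.join(meta['keywords'])}",
        f"Terms of use: {meta['terms_of_use']}",
    ]
    path = Path(folder) / "readme.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def write_metadata_file(folder, meta):
    """Write the same information as structured, machine-readable JSON."""
    path = Path(folder) / "metadata.json"
    path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return path


dataset_dir = Path(tempfile.mkdtemp())
readme = write_readme(dataset_dir, metadata)
meta_file = write_metadata_file(dataset_dir, metadata)
```

The readme is written for people, while the JSON file can be parsed by software. That machine readability is what makes structured metadata suitable for indexing and searching.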

How to provide documentation?

  • In a research data repository – documentation files may be grouped with the associated data files in a fileset and deposited together
  • As supplementary material for a journal article – although it may be possible to provide enough information in the article itself to understand the underlying data
  • In webpages on a project website – here the context of the research can be provided and research reports published. This route is only useful if the website's long-term persistence is guaranteed.

For more information about data documentation, please see the UK Data Archive guidance, Documenting your data.

For further information, please contact rdm@sheffield.ac.uk