Metadata

Metadata literally means ‘data about data’. In the context of research data management, metadata may refer to a highly structured, machine-readable subset of data documentation that may be indexed and stored within a database. The terms metadata and data documentation are often used interchangeably.

Creating metadata

Metadata may be created manually, for example by filling in fields in a spreadsheet or database, or automatically, by being recorded by an instrument. Manual metadata creation costs significantly more than automatic creation. Various tools are available for automating metadata capture; some workflow and lab notebook management tools are listed by the DCC.

Storing metadata
  • Metadata may be held within the data file itself. Many digital file formats include metadata fields, which are often automatically recorded by the instruments collecting the data. For example, digital cameras record the creation time and date in a .jpg file.
  • Metadata describing a data file may be held in a separate, structured document (often an XML file). This metadata file may be associated with the data file by being appropriately named and located in the same folder.
  • Metadata describing a collection of data files may be aggregated into a single, structured spreadsheet or database file. This makes the metadata easier to search and update.
  • Metadata held in semi-structured or unstructured forms (such as a readme.txt file) are described here as data documentation.
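
The second storage option above, a separate structured file kept alongside the data file, can be sketched in Python as follows. This is a minimal illustration only: the `.metadata.xml` suffix, the `metadata` root element and the field names are assumptions made for the example, not a convention prescribed by any schema.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_sidecar(data_file: Path, fields: dict[str, str]) -> Path:
    """Write an XML metadata file next to data_file and return its path.

    The sidecar reuses the data file's name with '.metadata.xml' appended
    (a hypothetical naming convention) so the two files sort together
    in the same folder.
    """
    root = ET.Element("metadata")  # hypothetical root element name
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    sidecar = data_file.with_name(data_file.name + ".metadata.xml")
    ET.ElementTree(root).write(str(sidecar), encoding="utf-8", xml_declaration=True)
    return sidecar


def read_sidecar(sidecar: Path) -> dict[str, str]:
    """Read the flat field/value pairs back from a sidecar file."""
    root = ET.parse(str(sidecar)).getroot()
    return {child.tag: child.text or "" for child in root}
```

Calling `write_sidecar(Path("survey.csv"), {"title": "Survey"})` would create `survey.csv.metadata.xml` in the same folder as the data file.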

Metadata schemas

Metadata needs to be standardised to be useful: common aspects such as language, spelling and date formats need agreement. A metadata schema defines how metadata are structured, specifying the content, format and organisation of metadata elements. It specifies fields (or parameters) with standardised formats for their content (or values). Schemas range from the generic, such as Dublin Core, through generic schemas for research data, such as the DataCite Metadata Schema and the Data Documentation Initiative (DDI), to those developed for specific disciplines and the sorts of data they produce. For more information on these, see the DCC page on Disciplinary Metadata.
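
As a rough illustration of what "fields with standardised formats" means in practice, the sketch below checks a record against a tiny, made-up schema. The field names (`title`, `collected`, `language`) and the rules attached to them are assumptions for the example; they are not taken from Dublin Core, DataCite or DDI.

```python
from datetime import date


def is_non_empty(value: str) -> bool:
    return bool(value.strip())


def is_iso_date(value: str) -> bool:
    """Accept dates written in ISO 8601 form, e.g. 2021-03-04."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def is_language_code(value: str) -> bool:
    """Accept two-letter lower-case codes such as 'en' (format check only)."""
    return len(value) == 2 and value.isalpha() and value.islower()


# Hypothetical mini-schema: field name -> (required?, format check)
SCHEMA = {
    "title": (True, is_non_empty),
    "collected": (True, is_iso_date),
    "language": (False, is_language_code),
}


def validate(record: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, (required, check) in SCHEMA.items():
        if field not in record:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        if not check(record[field]):
            problems.append(f"invalid value for {field}: {record[field]!r}")
    for field in record:
        if field not in SCHEMA:
            problems.append(f"unknown field: {field}")
    return problems
```

A record with a date written as `04/03/2021` would be reported as invalid here, which is the kind of disagreement over date formats that a schema is meant to prevent.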

Types of metadata
  1. Catalogue metadata – for identifying and describing a data object (see below)
  2. Reuse / discipline-specific metadata – richer metadata, including subject-specific terms and descriptions. May involve controlled lists and specialist ontologies.
  3. Administrative metadata – information required to manage a resource: technical metadata (describing file type, size and creation date), rights management metadata (describing intellectual property rights, IPR), and preservation metadata (describing archiving processes).
  4. Structural metadata – describes relationships between the different components of a resource.

Catalogue metadata

This is the information required to identify a dataset, to search for and discover a dataset in a repository catalogue, and to cite a dataset. It is also the information a data repository requires when a dataset is deposited.

The DataCite schema has five mandatory elements necessary for a citation:

  1. Identifier – a unique string that identifies a resource (e.g. a DOI)
  2. Creator – the name, identifier and affiliation of the researcher(s) producing the data
  3. Title – the name of the resource
  4. Publisher – the entity that makes the resource accessible
  5. Publication year – when the resource becomes accessible

The schema also defines several optional elements:

  1. Subject – subject classification, keywords or phrases describing the resource
  2. Contributor – name of person or institution contributing to the development of the resource with the type of contribution made
  3. Date – different dates and date ranges relevant to the work and the type of date (e.g. data collection dates)
  4. Language – primary language of resource
  5. Resource type – form of resource (e.g. dataset, collection, audiovisual, software, workflow etc.)
  6. Alternative identifier – an identifier other than a DOI
  7. Related identifier – reference to related resources (e.g. DOI of an article based on the data)
  8. Size – free-text information about the size of the resource or its components
  9. Format – technical format of the resource or components
  10. Version – version number
  11. Rights – rights management statement giving information about ownership, copyright, access and embargoes
  12. Description – free-text description and technical information about the resource (e.g. abstract, methods, table of contents and structure of the resource)
  13. Geolocation – spatial region (point or area) or named place where the data was collected or which is the focus of the data

All repositories require the mandatory metadata elements; many also make description, subject, resource type and rights metadata mandatory. Most will require additional information, in particular the funder and grant code. Some repositories employ specialised metadata schemas appropriate to their subject area, which require discipline-specific information.
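
The sketch below shows one way a record could be checked for the five mandatory elements before deposit, and how those elements combine into a citation. It is illustrative only: the dictionary keys, the `also_required` parameter and the citation layout (creator, year, title, publisher, identifier) are assumptions for the example, and the record values are placeholders rather than a real dataset.

```python
# The five mandatory elements listed above, as hypothetical dictionary keys.
MANDATORY = ("identifier", "creator", "title", "publisher", "publication_year")


def missing_elements(record: dict, also_required: tuple = ()) -> list:
    """List required elements that are absent or empty in the record.

    also_required lets a repository add its own mandatory elements,
    such as 'rights' or 'subject'.
    """
    return [name for name in MANDATORY + tuple(also_required) if not record.get(name)]


def citation(record: dict) -> str:
    """Build a citation string from the five mandatory elements."""
    missing = missing_elements(record)
    if missing:
        raise ValueError("cannot cite, missing: " + ", ".join(missing))
    creators = "; ".join(record["creator"])
    return (
        f"{creators} ({record['publication_year']}): {record['title']}. "
        f"{record['publisher']}. {record['identifier']}"
    )


# Placeholder record; none of these values refer to a real dataset.
example = {
    "identifier": "doi:10.1234/example",
    "creator": ["Smith, J.", "Jones, A."],
    "title": "Example soil survey",
    "publisher": "Example University",
    "publication_year": 2020,
}
print(citation(example))
print(missing_elements(example, also_required=("rights", "subject")))
```

The second call reports `rights` and `subject` as missing, mirroring the case where a repository treats some optional elements as mandatory.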

Discipline specific / Reuse metadata

Specialised schemas employed by some research data repositories include additional fields, controlled vocabularies and specialised ontologies, allowing a dataset record to provide enough information to make the data understandable and reusable. For more information on these, see the DCC page on Disciplinary Metadata.

Basic ‘catalogue’ metadata schemas, such as the DataCite schema, may be adequate for providing this richer metadata: information about the methods used to create, collect and process the data may be included in the ‘description’ field. Alternatively, it may be more convenient to give detailed information in a separate ‘data document’ (which will also require its own catalogue metadata description). Such data documents may be associated with the dataset file by being included in the same file set, or by being identified in the Related identifier / Reference field of the dataset metadata. For more information, see the page on Describing your data.
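
Linking a dataset to its data document through a related identifier might look like the sketch below. The record layout and all identifiers are hypothetical placeholders; the relation names `IsDocumentedBy`, `Documents` and `IsCitedBy` are used here on the assumption that they match the repository's relation vocabulary, so check the schema version your repository uses.

```python
# Hypothetical catalogue records keyed by identifier (placeholders only).
records = {
    "doi:10.1234/dataset": {
        "title": "Example dataset",
        "related_identifiers": [
            {"identifier": "doi:10.1234/datadoc", "relation": "IsDocumentedBy"},
            {"identifier": "doi:10.1234/article", "relation": "IsCitedBy"},
        ],
    },
    "doi:10.1234/datadoc": {
        "title": "Methods and processing notes",
        "related_identifiers": [
            {"identifier": "doi:10.1234/dataset", "relation": "Documents"},
        ],
    },
}


def documentation_for(dataset_id: str, catalogue: dict) -> list:
    """Return titles of data documents that a dataset record points to."""
    related = catalogue[dataset_id]["related_identifiers"]
    return [
        catalogue[link["identifier"]]["title"]
        for link in related
        if link["relation"] == "IsDocumentedBy" and link["identifier"] in catalogue
    ]


print(documentation_for("doi:10.1234/dataset", records))
```

Because the data document has its own catalogue record, it can be found and cited independently as well as through the dataset that refers to it.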

For more on metadata, see the ANDS Metadata Guide (working level).

For further information, please contact rdm@sheffield.ac.uk