Making data discoverable

After selecting research data to be preserved, you will need to decide which of these should be shared publicly in a data centre or repository.

Shared data should be made as discoverable and accessible as permitted by funders and other stakeholders. This is not only beneficial for other researchers; you may need to find and use the data yourself.

A simple way to do this is to make your shared data FAIR (findable, accessible, interoperable and reusable).


To ensure your data is deposited and shared in a FAIR and appropriate way, use your data management plan and the links below.

Metadata

Metadata means ‘data about data’ or ‘information about information’. An example of this is a library catalogue record containing information about a physical or digital resource. Research metadata works in the same way, enabling researchers to assess whether the data might be useful to them, and to access them if possible.

When placing data in a repository, you will usually need to complete a number of metadata fields. Repositories normally specify the format for this metadata and often follow an established schema with a controlled vocabulary. Examples of these schemas are Dublin Core and the DataCite Metadata Schema. For more information, see the Digital Curation Centre guidance on disciplinary metadata.

While basic metadata such as title, creator, publisher and publication date are usually mandatory, individual repositories may require other descriptive and technical details. You should check these requirements when you choose a repository and ensure you retain the relevant information throughout your project.
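As a rough illustration of keeping the basic metadata together during a project, the sketch below checks a record for the four fields named above. The field names, the `missing_fields` helper and the sample values are hypothetical, and a real repository will define its own required fields and formats.

```python
# A minimal sketch: the field names are assumptions based on the basic
# metadata listed in the text, not any repository's actual schema.
REQUIRED_FIELDS = ("title", "creator", "publisher", "publication_date")


def missing_fields(record):
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical record with one field left blank.
record = {
    "title": "Example survey responses",
    "creator": "A. Researcher",
    "publisher": "",
    "publication_date": "2024",
}

print(missing_fields(record))
```

Running a check like this before deposit can flag gaps while the information is still easy to recover.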

Find more information about metadata from the UK Data Service.

NB The terms ‘data documentation’ and ‘metadata’ are sometimes used interchangeably. ‘Data documentation’ generally refers to information stored with or as part of a dataset that enables understanding and reuse of data, whereas ‘metadata’ usually refers to details in a repository record that enable data discovery and access.

DOIs

A digital object identifier (DOI) is a persistent identifier for a digital object, such as a research paper, monograph or dataset. A DOI is designed to keep resolving to the object for the foreseeable future, so the object can still be found using the DOI even if it moves to a different website (eg when a journal moves to a new publisher). DOIs can save time when adding outputs to databases, for example, and there are also online tools that use DOIs to format citations and track media attention.

A DOI is automatically allocated to datasets deposited in ORDA and a number of other data repositories.
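A DOI is usually shared as a link of the form https://doi.org/ followed by the DOI itself. The sketch below turns a DOI written in a few common ways into that link form. The `doi_to_url` helper and the example DOI are made up for illustration, and the check is only a rough plausibility test (DOIs start with "10." and contain a "/"), not full validation.

```python
def doi_to_url(doi):
    """Return the https://doi.org/ link for a DOI given in a common form."""
    doi = doi.strip()
    lowered = doi.lower()
    # Strip a leading resolver address or "doi:" label if one is present.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Rough plausibility check only; this does not confirm the DOI exists.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"Not a plausible DOI: {doi!r}")
    return "https://doi.org/" + doi


# "10.1234/example-dataset" is a placeholder, not a real dataset DOI.
print(doi_to_url("doi:10.1234/example-dataset"))
```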

Linking publications to data

The best way to link a publication to related data is to include a DOI to the dataset(s), if available, in the references section. Citing data in this way establishes a two-way link between publication and dataset, which many discovery and metrics tools will recognise. This also encourages others to cite your data if they reuse it, ensuring that you get credit for the work you’ve done.

If a citation of this type isn’t possible, you should include a short paragraph in your publication stating how the data may be obtained and under what conditions or restrictions. EPSRC and a number of other funders require a data access statement such as this to be included in research-related publications. Example statements include:

  • Data supporting this publication can be freely downloaded from the University of Sheffield Research Data Repository at https://doi.org/<your doi here>, under the terms of the Creative Commons Attribution (CC BY) licence.
  • Data supporting this publication include personal information, and may be obtained by contacting <group email address>@sheffield.ac.uk. A signed Data Sharing Agreement may be required to comply with patient consent.
  • Data supporting this publication are confidential, and can only be supplied by our industrial partner, <name>.

In many cases, data should be available for at least ten years after publication. An individual email address, or advice to ‘contact the corresponding author’, is therefore unsuitable for handling data requests. A department or research group address should be used instead.
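If you prepare statements for several publications, the wording of the first example above can be filled in from a small template. This is only a sketch: the `open_data_statement` helper and the sample DOI are hypothetical, and the final wording should follow your funder's and publisher's requirements.

```python
def open_data_statement(repository, doi, licence):
    """Build an access statement for openly available data."""
    return (
        "Data supporting this publication can be freely downloaded from "
        f"the {repository} at https://doi.org/{doi}, "
        f"under the terms of the {licence} licence."
    )


# The DOI below is a placeholder, not a real dataset identifier.
statement = open_data_statement(
    repository="University of Sheffield Research Data Repository",
    doi="10.1234/example-dataset",
    licence="Creative Commons Attribution (CC BY)",
)
print(statement)
```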

Further examples of data access statements can be found on the University of Manchester website.

Data deposit checklist

When depositing and sharing data, you should always check that:

  • Permission to deposit and share has been granted by all holders of IPR in the data, as well as rights holders of third-party material
  • Data is shared in compliance with funders’ data sharing requirements and terms of participants’ consent
  • Data is deposited separately under different access conditions if required
  • Files are organised in a meaningful structure, and use open or commonly used formats
  • Accurate metadata and documentation make data discoverable, understandable and reusable
  • Software code developed to generate or process data is submitted to a repository and cross-referenced to the dataset
  • An embargo period is applied if required by your funder or other stakeholders
  • An appropriate reuse licence is selected
  • Data is submitted to your funder’s data centre or a suitable data repository, and the details are registered in ORDA
  • Related publications are linked to the data, ideally using a repository-allocated DOI

For further information, please contact rdm@sheffield.ac.uk