Ensuring your data are well organised and documented allows them to be located and understood more easily. This will help you and your collaborators during the project, and others who may wish to replicate and verify the research in future.
Choosing data formats
Formats used to create and collect data may vary according to discipline and practical requirements, and may include proprietary formats readable only with specific software. For sharing and long-term preservation, however, data should be stored in standard or open formats. This will help to ensure the data remain accessible as technology changes. Deciding on these formats at the beginning of a research project reduces the risk of data being locked into a proprietary format, and the formats chosen should be detailed in your data management plan.
The UK Data Service provides advice on formatting your data.
Organising and naming files
Devising a folder structure and file naming convention at the start of a project makes it easier to manage and keep track of data. Elements within file names should be informative and consistently ordered, so that versions are clearly distinguished and the risk of errors is reduced. They should also be standardised in terms of vocabulary, punctuation and numerical format. Useful elements to incorporate in file names include:
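As an illustration of a consistently ordered, standardised file name, the sketch below composes a name from a project label, a short description, a date and a version number. The function name `build_file_name`, the choice and order of elements, and the underscore and hyphen separators are all assumptions made for this example, not a prescribed convention.

```python
import re
from datetime import date


def build_file_name(project: str, description: str, created: date,
                    version: int, extension: str) -> str:
    """Compose a file name from consistently ordered, standardised elements."""

    def slug(text: str) -> str:
        # Lower-case the text and replace runs of other characters with a hyphen
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    elements = [
        slug(project),
        slug(description),
        created.strftime("%Y%m%d"),  # year-month-day order sorts chronologically
        f"v{version:02d}",           # zero-padded so v02 sorts before v10
    ]
    return "_".join(elements) + "." + extension.lower().lstrip(".")


print(build_file_name("Soil Survey", "pH readings", date(2024, 3, 5), 2, "CSV"))
# prints: soil-survey_ph-readings_20240305_v02.csv
```

Generating names through one helper, rather than typing them by hand, is one way to keep vocabulary, punctuation and numerical format uniform across a project.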
A hierarchical structure grouped by topic is recommended, with ongoing and completed work separated. Files that need to be more widely accessible may be copied or moved to higher-level folders, where permissions are easier to set.
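A folder layout of this kind can be set up in one step at the start of a project. The following is a minimal sketch only: the topic names, the `ongoing` and `completed` subfolders and the top-level `shared` folder are hypothetical choices for illustration, and permissions would still need to be set separately on your storage system.

```python
from pathlib import Path

# Hypothetical topics and stages; replace with ones that suit your project
TOPICS = ["interviews", "survey", "analysis"]
STAGES = ["ongoing", "completed"]


def create_structure(root: Path) -> list[Path]:
    """Create topic folders, each with ongoing and completed work separated."""
    created = []
    for topic in TOPICS:
        for stage in STAGES:
            folder = root / topic / stage
            folder.mkdir(parents=True, exist_ok=True)  # safe to run repeatedly
            created.append(folder)
    # A higher-level folder for files that need to be more widely accessible
    shared = root / "shared"
    shared.mkdir(parents=True, exist_ok=True)
    created.append(shared)
    return created


if __name__ == "__main__":
    for folder in create_structure(Path("my_project")):
        print(folder)
```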
Many versions of the same file may be created during the course of your research. It is important to differentiate between these using version control. Your version control strategy should be consistent throughout the project and files in different locations should be synchronised regularly. Version control can take a number of forms using various tools, including:
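One simple form of version control is a version number carried in the file name. The sketch below works out the name of the next version of a file. It assumes a hypothetical `_vNN` suffix before the extension and takes a bare file name rather than a full path. For code and other text files, a dedicated version control system such as Git may be a better fit.

```python
import re
from pathlib import Path

# Assumed convention: the version tag is "_v" plus digits at the end of the stem
VERSION_PATTERN = re.compile(r"_v(\d+)$")


def next_version_name(file_name: str) -> str:
    """Return the file name to use for the next version of a file."""
    path = Path(file_name)
    match = VERSION_PATTERN.search(path.stem)
    if match is None:
        # No version tag yet, so start numbering at v01
        return f"{path.stem}_v01{path.suffix}"
    width = len(match.group(1))          # keep the existing zero-padding
    number = int(match.group(1)) + 1
    stem = path.stem[:match.start()]
    return f"{stem}_v{number:0{width}d}{path.suffix}"


print(next_version_name("interview-notes_v09.txt"))  # interview-notes_v10.txt
print(next_version_name("codebook.csv"))             # codebook_v01.csv
```

Saving each revision under a new name, instead of overwriting the previous file, keeps earlier versions available for comparison.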
Data documentation should give another researcher in your discipline the information needed to understand and reuse the data provided. Most often this researcher will be you in the future, so good documentation helps you as well as others. It should describe the content, context and structure of the data, along with the conditions and processes involved in their creation, collection and processing. At the start of your research, decide what information you need to record, and then create the documentation as the research progresses.
Project-level documentation is normally stored with associated data files and is often in the form of a plain text file called a README, which is placed in the top level of the dataset. Its contents may include:
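A README can be started from a plain text skeleton and filled in as the research progresses. The sketch below generates such a skeleton. The section headings are assumptions, loosely based on the description of data documentation above, and should be adapted to your discipline or to your repository's guidance.

```python
from datetime import date

# Hypothetical headings; adjust to your discipline or repository's guidance
README_SECTIONS = [
    "Content of the data",
    "Context of the research",
    "Structure of the dataset",
    "Creation, collection and processing",
]


def readme_template(title: str, today: date) -> str:
    """Return a plain text README skeleton with placeholder sections."""
    lines = [f"README for: {title}", f"Last updated: {today.isoformat()}", ""]
    for heading in README_SECTIONS:
        lines.append(heading.upper())
        lines.append("[to be completed]")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the skeleton; save it as README.txt in the top level of the dataset
    print(readme_template("Soil Survey 2024", date.today()))
```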
File or item-level documentation is usually included in individual files and contains details about variables within the file, including table headings, abbreviations, units of measurement and anomalies. This is commonly placed in:
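Variable-level details can also be recorded in a simple data dictionary kept alongside the data file. The sketch below writes one as CSV. The column names (`variable`, `description`, `unit`, `notes`) and the example variables are hypothetical and chosen only to show the idea.

```python
import csv
import io

# Assumed columns for the data dictionary
FIELDS = ["variable", "description", "unit", "notes"]


def write_data_dictionary(variables: list[dict], handle) -> None:
    """Write one row per variable; missing details are left blank."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in variables:
        writer.writerow({field: row.get(field, "") for field in FIELDS})


# Example usage with an in-memory buffer; use open(..., "w", newline="") for a file
buffer = io.StringIO()
write_data_dictionary(
    [
        {"variable": "temp_c", "description": "Air temperature",
         "unit": "degrees Celsius", "notes": "Sensor fault on day 12"},
        {"variable": "site_id", "description": "Site identifier"},
    ],
    buffer,
)
print(buffer.getvalue())
```

Recording abbreviations, units and anomalies in a separate file like this means they survive even if the data file is converted to another format.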
Further documentation to enable data use may include:
NB The terms ‘data documentation’ and ‘metadata’ are sometimes used interchangeably. ‘Metadata’ generally refers to details in a repository record that enable data discovery and access, whereas ‘documentation’ usually refers to information within a dataset that enables data to be understood and reused.
For further information, please contact firstname.lastname@example.org