Text and Data Mining (TDM)

Information for staff and students undertaking text and data mining (TDM) as part of their research.

Introduction

Text and data mining is the process of extracting new information from existing files, usually using computational methods. There are several steps to most TDM activity, including data cleansing and indexing, but the first step is to gather the sources to be used. A temporary copy of the files is often made to enable the content to be processed, and this has implications for copyright.
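As a rough illustration of the cleansing and indexing steps mentioned above, the following minimal Python sketch normalises some text and builds a simple index of which documents contain which words. The function names (cleanse, build_index) and the two sample documents are hypothetical and exist only for this example; real projects typically use more sophisticated tools and much larger collections.

```python
import re
from collections import defaultdict


def cleanse(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Map each token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in cleanse(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical in-memory sample standing in for lawfully accessed sources.
SAMPLE_DOCS = {
    "doc1": "Text mining extracts patterns from text.",
    "doc2": "Data mining finds patterns in data!",
}

if __name__ == "__main__":
    index = build_index(SAMPLE_DOCS)
    print(index["mining"])  # ids of the documents that mention "mining"
```

In this sketch the working copy of each source is the string held in memory. The index records only which words appear in which documents, so it is an example of new analysis data rather than a copy of the underlying works.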


Copyright and TDM

An exception to copyright (section 29A of the Copyright, Designs and Patents Act 1988), introduced into UK law in 2014, allows you to make copies of works to which you have ‘lawful access’ for the purpose of text and data analysis, more commonly referred to as TDM. The analysis must be carried out as part of non-commercial research. The exception does not permit TDM for any directly or indirectly profit-making purpose.


Lawful access

Lawful access to works can take many forms. For example, it may be material we have access to via a Library subscription, or you may own a copy. When using publicly available works for TDM - such as material freely available online - you should ensure the source you use is lawfully available with the rights holder’s consent, and is not an infringing copy posted without authorisation. 

When accessing materials lawfully under contract or licence, content providers can apply reasonable measures to ensure stability and security of their networks, though any terms and conditions that aim to stop you undertaking TDM are unenforceable. 

Note that the section 29A TDM exception does not permit the override or bypassing of any technological protection measures (TPMs) in order to access and copy works.


Transfer of copies

If your research project involves collaboration with researchers at another UK higher education institution, you may be able to rely on our Copyright Licensing Agency licence for limited sharing of materials to be analysed - please contact us if you wish to discuss what is permitted. You should not otherwise share any copies of materials that you have made for the purposes of TDM.

While you cannot share copies of underlying material, the outputs you create from your analysis are under your control. You are usually free to share or publish your new analysis data provided you do not transfer or communicate any copies of the underlying works.

Your funder may require you to publish the results of your research open access. For this reason it is important to look out for, and avoid agreeing to, any contractual restrictions on the reuse of your new analysis data, which some rights holders may attempt to impose as part of granting you lawful access to their collections.


TDM and database right

Database right is a related right to copyright, introduced into UK law by the Copyright and Rights in Databases Regulations 1997. The right protects databases where there has been significant investment in obtaining, verifying or presenting the database’s contents. The right can apply to a database as a whole, irrespective of whether the individual contents themselves qualify for intellectual property rights protection. 

The section 29A copyright exception detailed above does not apply to protected databases. There is a fair dealing exception to the database right at regulation 20 of the Regulations, which allows extraction (i.e. copying to another medium) but not further reutilisation (i.e. no onward communication). This extraction must be by a person with lawful access to the database, must be made solely for a non-commercial teaching or research purpose, and the data source must be properly indicated.

The fair dealing exception might permit TDM of a lawfully accessed database, though note this is an untested grey area and so caution is advised. Unlike section 29A, the regulation 20 exception contains no explicit provision overriding contract terms that restrict its use. Any extraction and reuse of data that harms the investment of the database owner will likely infringe the database right.


TDM and the public domain

Works in the public domain can usually be freely reused and copied for any purpose, as copyright no longer applies to them. Be aware, however, that the exception for TDM does not apply if the materials you wish to use are free of copyright and related rights but are made available to you under contract terms or a licence. In such circumstances your use must keep to the terms of your contract or licensing agreement.


Process for purchasing datasets for research projects

If you need to acquire a dataset for a project, please contact the Library and we will approach the publisher for a price quote. Where possible, the cost of purchasing the dataset should then be included in the grant application. The Library will buy the dataset, where it is able to, if the researcher can meet the cost from the grant award. Our systems team can then make the dataset available on University networks.

This process ensures that purchased datasets have appropriate licence agreements, are made available to the whole University, and are not bought more than once by different research groups.