Prof Paul Clough

Professor of Search and Analytics

BEng (York), PhD (Sheffield)


+44 (0)114 222 2664

My research interests focus on developing effective retrieval technologies that support users as they seek to fulfil their information needs. Specifically, I have carried out research in the areas of multilingual search, retrieval of images, geo-spatial search, analysis of transaction logs, text re-use and plagiarism detection, and the evaluation of search systems. I have published over 100 peer-reviewed articles, including a co-authored Springer book on multilingual information retrieval. My background in natural language processing, gained during my PhD, has allowed me to develop more sophisticated approaches to accessing information. In addition to developing techniques, I have also built up an understanding of the users of information access systems and their information needs, taking a more user-oriented view of my research. A further theme of my research has been to create re-usable evaluation resources (corpora and test collections) for the wider research community, in fields such as computational linguistics and information retrieval. I have been involved in coordinating activities at three international evaluation campaigns: the Cross-Language Evaluation Forum (CLEF) in Europe, the Text Retrieval Conference (TREC) in the US and the Forum for Information Retrieval Evaluation (FIRE) in India.

I am head of the Information Retrieval Research Group.


Current PhD Students

Abdulkareem Alqusair: Product category extraction and linking in the area of semantic web

Monica Paramita: Methods to Build Comparable Corpora.

David Walsh: Supporting information access in digital cultural heritage

Xiaoli Chen: Crowdsourcing solutions for metadata quality improvement in large-scale research digital libraries, with an application to a community-based information platform in High-Energy Physics.


Completed PhD Students

Munirah Abdulhadi: Towards enriching metadata descriptions with tags in a bilingual academic library context.

Faisal Alvi: Plagiarism Detection.

Simon Wakeling: The User Centered Design of a Recommender System for a Universal Library Catalogue.

Azzah Al-Maskari: Evaluation of Interactive Information Retrieval Systems.

Antje Bothin: Analysing Meeting Noters and their Role in Automatic Meeting Summarisation.

Robert Pasley: Defining Vernacular Regions Using Knowledge from Unstructured Web Data Sources.

Sophie Rutter: An investigation into the multi-session searching behaviour of primary-age children.

Johannes Schanda: Breakthroughs and Early Event Detection: Expanding New Event Detection to new Frontiers.

Shahram Sedghi Ilkhanlar: Relevance Criteria for Medical Images Applied by Health Care Professionals; a Grounded Theory Study.

Rita Wan-Chik: Religious information seeking on the web: A case study of Islamic and Qur'anic information searching.

Rao Muhammed Adeel Nawab: Mono-lingual Paraphrased Text reuse and Plagiarism detection.


Research Projects

Relating to Data: understanding data through visualisation

Economic and Social Research Council Investigator £0 1 October 2016 36 months

Relating to Data will develop new knowledge about how people relate to data through their visualisation, the narratives and meanings people attach to visualisations and the potential understanding produced by them. This topic is important because data are proliferating, acquiring new powers and playing an increasingly important role in society, and the main way that people get access to data is through visualisations (the visual representation of data and statistics in charts and graphs). Given this, greater understanding of the social role of visualisations than currently exists is needed.


User Preference vs Performance for Business Intelligence (BI) dashboards

University of Sheffield Principal Investigator £10,000 1 April 2014 6 months

Performance measurement is increasingly important to businesses that want to exploit internal data to improve process efficiency, and external "Big Data" to add competitive advantage. Often this is achieved through the use of Business Intelligence (BI) technologies and services, such as dashboards: graphical user interfaces that contain measures of business performance to support managerial decision making. However, as BI vendors continue to add increasingly complex and imaginative visualisations to their tools, it is unclear whether the end users of such tools are able to understand and use such visualisations, particularly with the varying levels of cognitive abilities and analytical skills found in most organisations. This collaborative research project between a commercial BI partner (Peak Indicators Ltd) and the Information School (University of Sheffield) aims to investigate how different people respond to BI visualisations, allowing the providers of BI tools and services to better support their clients and end users.


Digital Society Network

University of Sheffield Steering Group Member £24,000 1 February 2014 18 months

The Digital Society Network (DSN) draws together an interdisciplinary team of researchers engaged with research at the cutting-edge of society-technology interactions. Underpinning the network is a concern not only with how societies and individuals use digital technologies, but also the social implications and outcomes of an increasingly digitised world on numerous scales. In this way, digital society is understood as being the social aspect of the digital - a concern with who uses and does not use digital technology, for what purposes digital technologies are being used, how effective technologies and platforms are, and the implications and outcomes of these practices.


Developing a Taxonomy of Search Sessions

Google Principal Investigator £41,375 22 July 2013 28 months

The project seeks to develop a categorisation scheme to describe common patterns of user-system interaction behaviour as recorded in search engine log files. In particular, the project is focused on sessions: periods of continued usage that provide multiple units of interaction with which to study how people use search systems. Search (or query) logs are created as the users of search systems (e.g. web search engines and library catalogues) interact with them to find relevant information.

We will use a combination of manual and automated techniques to develop the classification scheme, including applying conceptual models of information seeking as well as query log mining and cluster analysis. We will re-use a wide range of search logs from many domains to identify distinct and common types of session and user behaviour.

Why do we need to know this? Identifying and extracting user-system interaction patterns and mapping them to information seeking behaviour will help search providers, such as Google, to more effectively evaluate and optimise their search systems, resulting in higher system performance that will in turn likely lead to more satisfied users.


VisualSense: Tagging visual data with semantic descriptions

European Commission Investigator £1,080,164 3 March 2013 34 months

The VisualSense project aims to automatically mine the semantic content of visual data to enable "machine reading" of images. In recent years, we have witnessed significant advances in automatic visual concept recognition (VCR). These advances have allowed the creation of systems that can automatically generate keyword-based image annotations. The goal of this project is to move a step forward and predict semantic image representations that can be used to generate more informative sentence-based image annotations, thus facilitating search and browsing of large multi-modal collections. More specifically, the project targets three case studies, namely image annotation, re-ranking for image search, and automatic image illustration of articles.


PROMISE: Participative Research labOratory for Multimedia and Multilingual Information Systems Evaluation

European Commission Investigator £95,208 1 April 2012 17 months

Large-scale worldwide experimental evaluations provide fundamental contributions to the advancement of state-of-the-art techniques through common evaluation procedures, regular and systematic evaluation cycles, comparison and benchmarking of the adopted approaches, and spreading of knowledge. In the process, vast amounts of experimental data are generated that beg for analysis tools to enable interpretation and thereby facilitate scientific and technological progress. PROMISE provided a virtual laboratory for conducting participative research and experimentation to carry out, advance and bring automation into the evaluation and benchmarking of such complex information systems, by facilitating management and offering access, curation, preservation, re-use, analysis, visualization, and mining of the collected experimental data.



Royal Holloway University Library Principal Investigator £29,282 1 February 2012 4 months

Search25 is a regional resource discovery tool for London and the South East, providing one-stop access to the library catalogues of nearly 60 world-renowned institutions and specialist collections within the M25 Consortium of Academic Libraries. Search25 is funded by the Consortium and makes it easier to search, locate and obtain resources at any M25 library, enabling users to benefit from the wealth of materials available to them.


PATHS: Personalised Access to Cultural Heritage Spaces

European Commission Investigator £468,229 1 January 2011 36 months

The vision of the PATHS project is to enable personalised paths through digital library collections, offer suggestions about items to look at and assist in their interpretation, and support the user in knowledge discovery and exploration. We aim to make it easy for users to explore cultural heritage material by taking them along a trail, or pathway, created by experts, by themselves or by other users.


User-Centred Design of a Recommender System for a Universal Library Catalogue (Collaborative Doctoral Award)

Arts and Humanities Research Council Principal Investigator £70,000 1 October 2010 36 months


Improving Information Finding at the National Archives

The National Archives Principal Investigator £70,649 1 April 2010 6 months

The project aimed to improve access to data managed by The National Archives (TNA). The project involved analysing TNA's main web server logs to establish the range of subjects being searched by online visitors to their archives. Additionally, the project analysed separate server logs of the UK Government Web Archive to establish the range of subjects of interest to online visitors and to determine any common patterns of user behaviour. A crowdsourcing-based evaluation methodology was also developed for TNA, allowing them to evaluate their existing and future search products and services.


TrebleCLEF: Evaluation, Best Practice & Collaboration for Multilingual Information Access

European Commission Investigator £73,000 1 January 2008 24 months

TrebleCLEF is an EU-funded Coordination Action (CA) designed to bring together investigators working in the field of evaluation for multilingual information access to consolidate and promote best practice. The project seeks to build upon and extend the results already achieved by the existing Cross-Language Evaluation Forum (CLEF) and continue the development and dissemination of resources for the evaluation of multilingual information systems. The specific target for this project is the European digital library community.


Defining Imprecise Regions Using Knowledge from the Web (CASE Studentship)

Engineering and Physical Sciences Research Council Principal Investigator £82,000 1 October 2006 24 months


Memoir: Learning how technology can help people create and manage long-term personal memories

European Commission Investigator £473,923 1 June 2006 48 months

Memoir is an EU training grant to build expertise in a new multidisciplinary area, encompassing the computer science disciplines of information retrieval and human-computer interaction, as well as design, ethics, history and cognitive science. Memoir will study the new area of personal memories, to better understand the technology, ethics and psychology of storing and accessing personal information. Changes in digital storage technology mean that people are beginning to store huge amounts of digital videos, photographs, music and speech in personal file systems. We will research new techniques to organize, store and retrieve such personal information that focus on user-centric concepts and methods. This research will draw on methods for multimedia retrieval, human-computer interaction and ontologies. We will also explore cultural and social differences in such memories within different EU cultures, as well as socially disadvantaged communities in our local region. Our project may help address the digital divide, as personalisation has been suggested as a way of making technology relevant to sections of society who are reluctant to adopt it.



European Commission Principal Investigator £237,101 1 January 2006 36 months

On the web, cultural heritage content is everywhere: in traditional environments such as libraries, museums, galleries and audiovisual archives, but also in popular magazines and newspapers, in multiple languages and multiple media. The aim of the MultiMatch project is to enable users to explore and interact with online accessible cultural heritage content, across media types and language boundaries.