Information retrieval research group

Information retrieval research looks at how people find and use information and develops new techniques to enhance searching.

Our research themes

The work of the information retrieval (IR) group involves developing effective web-based technologies that support people with accessing, managing and using information.

We approach this from different perspectives, including the study of the interactions between people, information and technologies; and the development of computational methods that support information access and use.

Our multidisciplinary team draws together skills from computer science, information science, and human-computer and information interaction. We collaborate with external academic and non-academic partners, with funding from national and international funding bodies.

Through our research we aim to enrich the user’s search experience and further our understanding of how people access, interact with, use and re-use information.

Key research areas

The research we undertake is generally built upon the consideration of the user, the system and context of use. Current research in the IR group is informed by four core areas of activity:

  • The study of human-computer and information interaction (e.g., in the iLab) to understand user cognition and behaviour with respect to the interactivity involved in information access, use and re-use.
  • The development of novel solutions to information access problems, ranging from the development of specific algorithms to the design of entire prototype systems, with a particular focus on web-scale systems and algorithmic bias.
  • The study and design of methods and techniques for evaluating information access systems for a variety of applications and search scenarios.
  • The development of novel methodologies to study the dynamics of social interactions on social media platforms.

Specific areas of research include:

  • human computation and crowdsourcing
  • information visualisation
  • web science
  • information retrieval
  • data mining
  • big data
  • geo-spatial search
  • artificial intelligence
  • semantic search
  • multimedia retrieval
  • digital cultural heritage
  • natural language processing (NLP)
  • data streams
  • search log analysis
  • task-based information interaction
  • lifelogging
  • exploratory search
  • human-machine interaction
  • recommender systems
  • algorithmic bias
  • user interface design and evaluation

Projects and research areas

See details of our projects below.

Hand hygiene at work

The hand hygiene at work project is jointly funded by Innovate UK, ESRC IAA, Research England QR-Policy Support Funding and Research England Covid Recovery.

The World Health Organization (WHO) recommends that all workplaces and all workplace employees practise good hand hygiene. Improvements in workplace hand hygiene can lead to health benefits, reduce workplace absence from illness and signal a management commitment to employee well-being. This project currently has two strands. Strand one, in collaboration with the University of Leeds and Savortex (an SME specialising in hand hygiene technology), aims to understand how organisations and their employees perceive hand hygiene within the workplace. Strand two investigates mobile workers' everyday experiences of accessing facilities as part of their work.

The Sheffield team include Sophie Rutter, Andrew Madden, Lauren White and Sally Sanger.

Visit the hand hygiene at work website for more information. 

Embedding equality, diversity and inclusion (EDI) in usability testing

The Embedding EDI (equality, diversity and inclusion) in usability testing project is funded through Research England QR-Policy Support Funding. Usability testing is the evaluation of products and services with users. It should be conducted with the people who use those products and services; in practice, however, many people are excluded from usability testing, resulting in inequalities. The aim of the project is to create an agenda for embedding EDI in usability testing.

The Sheffield team include Sophie Rutter, Efpraxia Zamani, Jo McKenna-Aspell and Yuhua Wang.

Visit our project website for more information. 

Towards Building a Resilient Healthcare Supply Chain Using Entity Resolution

The COVID pandemic revealed poor resilience in the current NHS supply chain. Despite the government's temporary emergency measures to respond to the demand spikes for PPE, it has been recognised that long-term reform is essential for building a resilient NHS supply chain. This project develops innovative technologies for enabling an AI-driven marketplace for healthcare procurement. This is based on automatically mining large volumes of public tender documents from heterogeneous sources to build models that can evaluate supplier capacity and credibility and buyer contract conditions, thus improving the NHS's capacity in supply management and better preparing it for the next pandemic.

Vamstar Ltd. has developed a prototype based on the above idea using data from eTendering, consisting of a network graph of 110 million entities (e.g., hospitals, pharmaceutical companies, products and tenders) that allows easier search for and consolidation of structured information in a semi-automated way. Early stakeholder evaluation identified one critical limitation: the widespread presence of redundant information (e.g., the same organisation is named in different ways). This has been a critical hurdle to its practical usage and commercialisation. This project will develop entity resolution methods that can be used to discover and resolve redundant information in large graphs. This addresses the above-mentioned limitation of the network graph, allowing it to reach commercialisation more quickly.
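As a rough illustration of the entity resolution idea described above (the same organisation appearing under different names), the following minimal Python sketch normalises names, compares them pairwise with a string-similarity ratio, and merges matches with union-find. It is not the project's method: the function names, the suffix list, the 0.85 threshold and the sample names are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical list of legal-form words to ignore when comparing names.
IGNORED_TOKENS = {"ltd", "limited", "plc", "inc", "the"}


def normalise(name: str) -> str:
    """Lower-case a name, strip punctuation and drop legal-form words."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in IGNORED_TOKENS)


def resolve(names, threshold=0.85):
    """Group names whose normalised forms are similar (threshold is an assumption)."""
    parent = list(range(len(names)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    keys = [normalise(n) for n in names]
    for i, j in combinations(range(len(names)), 2):
        if SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


# Made-up example names, not taken from any real dataset.
example = ["Acme Pharma Ltd.", "ACME Pharma Limited", "Acme Pharma",
           "Northern General Hospital"]
groups = resolve(example)
```

Pairwise comparison is quadratic in the number of names, so a graph with millions of entities would need a blocking or indexing step before any comparison; that step is omitted here.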

Funder: Higher Education Innovation Fund from Research England

Contact: Ziqi Zhang

Data Science powered healthcare supply chain network monitoring system in the post-COVID and post-Brexit

Supply chains in healthcare are becoming more complex and less predictable due to fast-changing market forces (such as COVID-19-related demand shocks and Brexit uncertainty) and structural deficiencies of the Global Value Chain (GVC) framework that introduce higher levels of supply chain operating risk. There is increasing pressure on healthcare buyers and suppliers to reduce cost and overall inefficiencies in their trading relationships, so that the best care can be delivered to those who need it most. Existing supply chain decision support and performance management solutions are complex to implement within an organisation and impossible to connect across trading networks.

The project will develop a large-scale network for buyers (such as the NHS) and suppliers. It will monitor buyer-supplier relationships and the overall health of those networks by tracking contract particulars (e.g. estimated contract value, demand-supply risk, strength of supplier network, overall network gaps and suppliers’ financial strength) and award documents exchanged between partners, while automatically extracting data that provides insight (scores, reports and automated index weight) into each business relationship. This network will allow suppliers and buyers to be better connected, ultimately enabling better demand-supply forecasting and matching.

Funder: Innovate UK

Contact: Ziqi Zhang

Imagining Robotic Care

Although autonomous systems are considered vital in addressing health-social care needs, research into stakeholder expectations is sparse. Identifying misalignment of expectations early can enhance research programmes, improve prototyping, and embed responsible innovation practices before projects start.

We use LEGO Serious Play as a method for collecting data on socio-technical “imaginaries” (collectively achieved, systemic visions of social transformation through technology) of autonomous care held by diverse stakeholders across the health-social care ecosystem, including roboticists, administrators, carers, and care users. By examining where imaginaries cohere and conflict, we can shape responsible research in trustworthy autonomous systems for care needs.

Funder: UKRI via Trustworthy Autonomous Systems Hub

Contact: David Cameron

Civil War Bluejackets: Race, Class, and Ethnicity in the United States Navy

The project seeks to semi-automatically digitise and link muster rolls, rendezvous (recruitment) rolls, deck logs, and pension records of US Navy sailors during the American Civil War period to better understand the dynamics of racial, ethnic, and class identities in mid-19th century America. It will advance and assess the use of crowdsourcing and semi-automatic machine learning technologies to enable the large-scale transcription of digitised historical records and analysis of automatically linked digitised datasets, opening research avenues that simply are not possible through traditional archival research alone.

The project will be conducted in collaboration with colleagues at Northumbria University, Newcastle.

Funder: Arts and Humanities Research Council

Contact: Morgan Harvey, Frank Hopfgartner

Recently completed projects 

DoubleTapp: Crowdsourcing the Long Tail of Nano-influencers

The project aimed to develop an innovative product for social media marketing based on DoubleTapp's novel 'crowdsourcing nano-influencers' (crowd nano-influencing) model that extends the already successful 'nano-influencer marketing'. Powered by cutting-edge big data technology and mobile computing, the product will, for the first time: empower the 'long-tail' of consumers - those with little influencing power - in their purchase decision-making; and enable affordable nano-influencer marketing for the 'long-tail' of businesses - SMEs often operating in local communities with limited or no access to this powerful marketing channel.

Invented by DoubleTapp, 'crowd nano-influencing' refers to crowdsourcing influencers of any size for any business. In May 2019, DoubleTapp piloted its first (minimum viable product) mobile app, which brings businesses and influencers of any size onto a single platform. Businesses create adverts that describe the type of Instagram interactions they reward (e.g., sharing a photo of dining in the cafe), and reward customers who engage in such interactions through the app. During the pilot in Sheffield alone, it worked with 40 business venues and enabled an estimated reach of 1,000,000 (an industry standard for influencer marketing pricing), 4 times more effective than traditional influencers. With the large amount of user and interaction data collected, the project will develop: 1) data analytics capabilities to discover insights from such data, and 2) a new product powered by such insights to enable customer personalisation at a wider geographical scope. This will help improve the effectiveness of this model and scale it to, and beyond, the national level.

Funder: Innovate UK

Contact: Ziqi Zhang

A new resource for behavioural science - developing tools for understanding the relationship between behaviours

The TURBBO project, led by Prof. Thomas Webb, is a collaboration between the Departments of Psychology and Computer Science and the Information School (Dr. Suvodeep Mazumdar, Co-Investigator). In this project, we aim to develop semantic web solutions to help better understand the relationships between different behaviours. Some associations between behaviours are intuitive (e.g. people who are more physically active may sleep longer, either because they view both as 'healthy' behaviours or because active people need more rest), while others are less intuitive (e.g. relations between driving behaviour and efforts to conserve biodiversity).

Psychologists and other behavioural experts often view behaviours in isolation, for example seeking to improve sleep or increase levels of physical activity. However, everyday life is characterised by a wide range of behaviours, so it is crucial to understand how behaviours are related to one another. Fortunately, many studies looking at the relationships between behaviours already exist. Any study that measures two or more behaviours and reports the correlation between them, or that provides access to data that allows the correlation to be calculated, can provide an estimate of their relationship, which can be pooled across datasets. However, we currently lack the tools to understand and aggregate these data.

We will develop semantic web tools to allow behavioural scientists to define behaviours, along with their similarities and differences, by creating a semantic model. We will then collate data from published papers and large secondary datasets on the relationship between behaviours and develop a set of tools that will allow researchers to enter their own information, enabling easy, rapid and efficient generation of new knowledge.
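To make the idea of pooling correlations across datasets concrete, here is a minimal Python sketch of one standard approach: a fixed-effect pooled estimate using Fisher's z-transformation, with each study weighted by n - 3. This is offered only as an illustration of the general technique; the text above does not say which pooling method the project uses, and the function name and input format are assumptions.

```python
import math


def pool_correlations(studies):
    """Pool Pearson correlations from several studies.

    `studies` is a list of (r, n) pairs: the reported correlation and the
    sample size. Each r is mapped to Fisher's z = atanh(r), whose sampling
    variance is approximately 1 / (n - 3), so each study is weighted by
    n - 3. The weighted mean z is mapped back to r with tanh.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in studies:
        if n <= 3:
            raise ValueError("each study needs a sample size above 3")
        if not -1.0 < r < 1.0:
            raise ValueError("correlations must lie strictly between -1 and 1")
        weight = n - 3
        weighted_sum += weight * math.atanh(r)
        total_weight += weight
    return math.tanh(weighted_sum / total_weight)


# Made-up numbers: two hypothetical studies of the same pair of behaviours.
pooled = pool_correlations([(0.20, 120), (0.35, 60)])
```

A fixed-effect estimate assumes all studies measure the same underlying relationship; where behaviours are defined or measured differently across studies, a random-effects model would usually be the more cautious choice.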

Contact: Suvodeep Mazumdar

CyCAT – Cyprus Centre for Algorithmic Transparency

The main aim of the project was to establish a centre of expertise on algorithmic transparency and information bias in Cyprus, the CyCAT centre. This was achieved by, for example, organising scientific networking events and fostering ongoing research in this area.

Another important aim of the project was the effective promotion of digital and algorithmic literacy to the general public, through collaboration with educators, students and professionals. This information literacy strand of the project was strengthened by the fact that some of the researchers in Cyprus came from the field of education.

One of the concerns that this project sought to address was the rise of ‘proprietary algorithmic processes’ used by giants such as Google and Facebook: the hidden mechanisms that decide what they show you when you use their services.

Funder: Horizon 2020

Contact: Frank Hopfgartner, Paul Clough, Jo Bates

Key publications:

Paramita M, Orphanou K, Christoforou E, Otterbacher J & Hopfgartner F (2021) Do you see what I see? Images of the COVID-19 pandemic through the lens of Google. Information Processing & Management.

Kleanthous S, Otterbacher J, Bates J, Giunchiglia F, Hopfgartner F, Kuflik T, Orphanou K, Paramita ML, Rovatsos M & Shulner-Tal A (2021) Report on the CyCAT winter school on fairness, accountability, transparency and ethics (FATE) in AI. ACM SIGIR Forum, 55(1).

The Legacies of Stephen Dwoskin

Stephen Dwoskin left behind a large and varied archive, physical and digital, which was deposited in the Special Collections of the University of Reading soon after his death in 2012.

The archive includes, as well as paper documents of all kinds, computer hard drives, audio tapes, video cassettes, and thousands of photographs, slides, and negatives – plus many posters, paintings, and designs. Part of the task of the Dwoskin Project is to assist in the cataloguing of this material for the benefit of researchers.

The digital forensics and data exploration branch of the Dwoskin Project was concerned with the preservation and examination of the twenty hard drives that encompass Dwoskin’s digital legacy. Dwoskin’s work straddles the transition of moving images from analogue to digital. He was an early adopter of and experimenter with digital cameras and editing software, and left behind a digital footprint that is both rich and diverse.

Funder: Arts and Humanities Research Council

Contact: Frank Hopfgartner

Key publication:

Bartliff Z, Kim Y, Hopfgartner F & Baxter G (2020) Leveraging digital forensics and data exploration to understand the creative work of a filmmaker: A case study of Stephen Dwoskin's digital archive. Information Processing & Management, 57(6).

FashionBrain: understanding Europe’s fashion data universe

A core business of Europe’s fashion industry is to acquire a deep understanding of customer needs and to predict upcoming trends. Search engines and social networks are often used as a bridge between the customer's potential purchase decision and the retailer.

To reinforce Europe's position in the fashion industry and better exploit its distinctive characteristics (e.g., multiple languages, and fashion and cultural differences), it is pivotal to reduce its dependence on search engines.

This goal can be achieved by harnessing various data channels that retailers can leverage in order to gain greater insights into potential buyers and industry trends as a whole.

The goal of the FashionBrain project is to improve the fashion industry value chain through the creation of novel on-line shopping experiences, the detection of influencers, and the prediction of upcoming fashion trends.

Tangible outcomes will include software, demonstrators, and novel algorithms for a data-driven fashion industry.

Funder: Horizon 2020

Contact: Alessandro Checco, Paul Clough

Understanding indigenous and exogenous knowledge interactions within agricultural communities in rural Bangladesh

GCRF QR Pump Priming Award (£7000)

Co-investigators: Dr Suvodeep Mazumdar and Dr Andrea Jiminez

Agricultural communities in rural areas in the global South have long accessed indigenous knowledge (i.e. knowledge that has existed within families and communities) to provide growing sustenance, income and resilience to climate events.

Much of this tacit knowledge is shared from generation to generation as part of daily living. The introduction of exogenous knowledge (i.e. modern scientific knowledge, techniques and real-time data), such as chemical fertilisers and genetically modified crops, has increased yields manyfold, albeit at the risk of traditional knowledge becoming subordinated and potentially extinct.

Preserving indigenous knowledge is essential to ensuring long-term environmental protection, crop diversity, food security, sustainable development and resilience to climate change. Moreover, adopting such knowledge will lead to protecting and safeguarding the cultural heritage of generations of agricultural communities.

With the proliferation of mobiles and affordable smartphones, agricultural communities in rural Bangladesh can now access external data, resulting in an interesting hybrid situation where indigenous and exogenous knowledge mix.

Previous studies have explored either the use of indigenous techniques for farming and environmental protection or the application of new technologies for agriculture. There is now a critical need to understand how indigenous knowledge that has existed for centuries in Bangladesh can be adopted in conjunction with other forms of data to overcome future environmental and agricultural challenges.

The project brings together the Information School, the Sheffield Institute for International Development, two universities in the global South (ULAB and NSU), an NGO (Tech4Dev, BRAC) and an innovation lab (a2i). It aims to answer the question: ‘how can tacit, indigenous knowledge be combined with exogenous knowledge to help communities better respond to environmental and agricultural challenges?’.

Through participant observations, 20 semi-structured interviews and two field visits (15 days), the study will attempt to gain a deeper understanding of how agricultural communities use, generate and preserve indigenous knowledge, as well as how exogenous knowledge is currently delivered to the communities.

BetterCrowd: human computation for big data

In the last few years we have seen a rapid increase of available data. Digitization has become endemic.

This has led to a data deluge that has left many unable to cope with such large amounts of messy data.

Because content comes from a large number of producers and in many different formats, it is not always easy for machines to process, given its variable quality and the presence of bias. In the current data-driven economy, organisations that can effectively analyse data at scale and use it as decision-support infrastructure at the executive level will gain a key competitive advantage.

To deal with the current data deluge, the BetterCrowd project will define and evaluate Human Computation methods to improve both the effectiveness and efficiency of currently available hybrid Human-Machine systems.

The project comprises two main parts:

  1. Improving crowdsourcing effectiveness using novel techniques to detect malicious workers in crowdsourcing platforms
  2. Scaling up Human Computational techniques such that they can be applied to larger volumes of data
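As a simple illustration of the first part, the Python sketch below scores each crowd worker by how often their label agrees with the majority label for the same item, and flags workers whose agreement falls below a cut-off. This is a common baseline heuristic, not the technique developed in BetterCrowd; the function names, the input format and the 0.5 cut-off are assumptions made for the example.

```python
from collections import Counter, defaultdict


def worker_agreement(labels):
    """Return each worker's rate of agreement with the per-item majority label.

    `labels` is an iterable of (worker_id, item_id, label) triples.
    A worker's own vote counts towards the majority; ties are broken in
    favour of the label seen first for that item.
    """
    votes_by_item = defaultdict(list)
    for worker, item, label in labels:
        votes_by_item[item].append((worker, label))

    agreed = Counter()
    total = Counter()
    for votes in votes_by_item.values():
        majority, _ = Counter(label for _, label in votes).most_common(1)[0]
        for worker, label in votes:
            total[worker] += 1
            if label == majority:
                agreed[worker] += 1
    return {worker: agreed[worker] / total[worker] for worker in total}


def flag_suspicious(labels, min_agreement=0.5):
    """List workers whose agreement rate is below the (assumed) cut-off."""
    scores = worker_agreement(labels)
    return sorted(w for w, score in scores.items() if score < min_agreement)
```

Low agreement alone does not prove malicious intent: a worker may simply be labelling hard or ambiguous items, so in practice flagged workers would need further checks such as gold-standard questions.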

Funder: EPSRC

Project lead: Dr Gianluca Demartini

Personalised access to cultural heritage spaces (PATHS)

The PATHS (Personalised Access To cultural Heritage Spaces) project was funded under the European Commission’s FP7 programme (2011-2014). It aimed to enable personalised paths through digital library collections, offer suggestions about items to look at and assist in their interpretation, and support the user in knowledge discovery and exploration.

The project consisted of partners from multiple disciplines, including Cultural Heritage, Library and Information Science, and Computer Science, from both academic and non-academic institutions.

A selection of artefacts from Europeana was used as the source of cultural heritage content. Additional semantic enrichment was carried out on this content, together with the development of user interfaces to support users in their exploration of digital cultural heritage.

The PATHS project was coordinated by the University of Sheffield and MDR Partners. Paul Clough from the Information School acted as Scientific Director for the project.

PATHS website

PATHS poster

General overview articles:

Clough, P. (2015) Supporting Exploration and Use of Digital Cultural Heritage Materials, EuropeanaTech Insight, Issue 4.

Goodale, P., Clough, P., Hall, M., Stevenson, M., Fernie, K., Griffiths, J. and Agirre, E. (2013) Pathways to Discovery: Supporting Exploration and Information Use in Cultural Heritage Collections. In N. Proctor & R. Cherry (eds). Silver Spring, MD: Museums and the Web. Published October 2, 2013.

Clough, P., Goodale, P., Hall, M., and Stevenson, M. (2015) Supporting Exploration and Use of Digital Cultural Heritage Materials: the PATHS Perspective, In Ruthven, I. and Chowdhury, G.G. (eds) Cultural Heritage Information Access and Management, Facet, pp. 197-220.

Making the Virtual Performer

This project was a collaboration with various departments across the University (English, Music and Computer Science) and Forced Entertainment to explore human-robot interaction in the performing arts. Could robots be actors, could they improvise, and how could this shape the experience of performance for artists and audiences alike?

Group members

Academic staff

Dr Dave Cameron (Head of Group)

Professor Paul Clough

Dr Morgan Harvey

Peter Holdridge

Dr Suvodeep Mazumdar

Monica Lestari Paramita

Dr Sophie Rutter

Dr Ziqi Zhang

Dr Mengdie Zhuang

Research staff

Adam Funk

Dr Fatima Sabiu Maikore

PhD researchers

Ahmed Alnuhayt

Amnah Salamh Alluqmani

Aisha Alshammri

Waad Khalid A Alshuaibi

Juan M Becerril del Toro

Chen (Cassie) Cao

Cui Cui

Jessica Fairbairn

Khalid Umar M Fallatah

Omaima Fallatah

Xinyu (Joseph) Jia

Yuyang Liu

Jie Qi

David Walsh

Mengyisong 'Sookie' Zhao

Zulfadli Zulfadli

Engagement and impact

We are active contributors to our research communities and have been involved in organising international events. For example, in 2014 we organised the Cross Language Evaluation Forum (CLEF) event.

Our researchers regularly speak at international conferences and provide tutorials on topics such as crowdsourcing, multilingual information retrieval and search evaluation.

We also develop resources (e.g., datasets, evaluation benchmarks and software code) for supporting research activities and produce research articles (e.g. journal articles, conference papers and books).

Our research has been funded by organisations such as Google, OCLC Inc., the UK National Archives and Peak Indicators, as well as funding bodies including the European Union, European Science Foundation, Arts and Humanities Research Council and Engineering and Physical Sciences Research Council.

Funders and collaborators

Research carried out within this group is funded by a wide range of organisations.

  • Horizon 2020
  • Engineering and Physical Sciences Research Council
  • European Union
  • Arts & Humanities Research Council
  • Google
  • Research England

A global reputation

Sheffield is a research university with a global reputation for excellence. We're a member of the Russell Group: one of the 24 leading UK universities for research and teaching.