Information retrieval research group

Information retrieval research looks at how people find and use information and develops new techniques to enhance searching.

Our research themes

The work of the information retrieval (IR) group involves developing effective web-based technologies that support people with accessing, managing and using information.

We approach this from different perspectives, including the study of the interactions between people, information and technologies; and the development of computational methods that support information access and use.

Our multidisciplinary team draws together skills from computer science, information science and human computer and information interaction. We have collaborated in our research with external academic and non-academic partners, funded by national and international funding bodies.

Through our research we aim to enrich the user’s search experience and further our understanding of how people access, interact with, use and re-use information.

Key research areas

The research we undertake is generally built upon consideration of the user, the system and the context of use. Current research in the IR group is informed by four core areas of activity:

  • The study of human computer and information interaction (e.g., in the iLab) to understand user cognition and behaviour with respect to the interactivity involved in information access, use and re-use.
  • The development of novel solutions to information access problems, ranging from the development of specific algorithms to the design of entire prototype systems, with a particular focus on web-scale systems and algorithmic bias.
  • The study and design of methods and techniques for evaluating information access systems for a variety of applications and search scenarios.
  • The development of novel methodologies to study the dynamics of social interactions on social media platforms.

Specific areas of research include:

  • human computation and crowdsourcing
  • information visualisation
  • web science
  • information retrieval
  • data mining
  • big data
  • geo-spatial search
  • artificial intelligence
  • semantic search
  • multimedia retrieval
  • digital cultural heritage
  • natural language processing (NLP)
  • data streams
  • search log analysis
  • task-based information interaction
  • lifelogging
  • exploratory search
  • human-machine interaction
  • recommender systems
  • algorithmic bias
  • user interface design and evaluation

Projects and research areas

See details of our projects below.

Understanding indigenous and exogenous knowledge interactions within agricultural communities in rural Bangladesh

GCRF QR Pump Priming Award (£7000)

Co-investigators: Dr Suvodeep Mazumdar and Dr Andrea Jiminez

Agricultural communities in rural areas in the global South have long accessed indigenous knowledge (i.e. knowledge that has existed within families and communities) to provide growing sustenance, income and resilience to climate events.

Much of this tacit knowledge is shared from generation to generation as part of daily living. The introduction of exogenous knowledge (i.e. modern scientific knowledge, techniques and real-time data), such as chemical fertilisers and genetically modified crops, has increased yields manyfold, albeit at the risk of traditional knowledge becoming subordinated and potentially extinct.

Preserving indigenous knowledge is essential to ensuring long-term environmental protection, crop diversity, food security, sustainable development and resilience to climate change. Moreover, adopting such knowledge will help to protect and safeguard the cultural heritage of generations of agricultural communities.

With the proliferation of mobiles and affordable smartphones, agricultural communities in rural Bangladesh can now access external data, resulting in an interesting hybrid situation where indigenous and exogenous knowledge mix.

Previous studies have explored either the use of indigenous techniques for farming and environmental protection or the application of new technologies for agriculture. There is now a critical need to understand how indigenous knowledge that has existed for centuries in Bangladesh can be adopted in conjunction with other forms of data to overcome future environmental and agricultural challenges.

The project brings together the Information School, the Sheffield Institute for International Development, two universities in the global South (ULAB and NSU), an NGO (Tech4Dev, BRAC) and an innovation lab (a2i). It therefore aims to answer the question: ‘How can tacit, indigenous knowledge be combined with exogenous knowledge to help communities better respond to environmental and agricultural challenges?’

Through participant observations, 20 semi-structured interviews and two field visits (15 days), the study will attempt to gain a deeper understanding of how agricultural communities use, generate and preserve indigenous knowledge, as well as how exogenous knowledge is currently delivered to the communities.

BetterCrowd: human computation for big data

In the last few years we have seen a rapid increase in available data. Digitisation has become endemic.

This has led to a data deluge that has left many unable to cope with such large amounts of messy data.

Moreover, because of the large number of content producers and the variety of formats, data is not always easy for machines to process, owing to its variable quality and the presence of bias. Thus, in the current data-driven economy, organisations that can effectively analyse data at scale and use it as decision-support infrastructure at the executive level will gain a key competitive advantage.

To deal with the current data deluge, the BetterCrowd project will define and evaluate Human Computation methods to improve both the effectiveness and efficiency of currently available hybrid Human-Machine systems.

The project comprises two main parts:

  1. Improving crowdsourcing effectiveness using novel techniques to detect malicious workers on crowdsourcing platforms
  2. Scaling up Human Computation techniques so that they can be applied to larger volumes of data

Funder: EPSRC

Project lead: Dr Gianluca Demartini

Big digital archives: investigating entity-centric methods for information exploration and discovery in big digital archives

There is a clear need for cultural heritage institutions (archives, libraries and museums) to provide systems that go beyond keyword-based search and support more diverse information seeking behaviours, such as browsing and exploration of large collections.

Entities, such as people, places, organisations and events, can be extracted from the archive and linked to form a network that users can explore in addition to navigating the content directly.

This project builds on existing work to create linked data annotations and ontologies to support the exploration of content within the UK Government Web Archive.

The project is in collaboration with the UK National Archives.

Funder: AHRC

Project leads: Professor Paul Clough and Dr Gianluca Demartini

Personalised access to cultural heritage spaces (PATHS)

The PATHS (Personalised Access To cultural Heritage Spaces) project was funded under the European Commission’s FP7 programme (2011-2014). It aimed to enable personalised paths through digital library collections, offer suggestions about items to look at and assist in their interpretation, and support the user in knowledge discovery and exploration.

The project consisted of partners from multiple disciplines, including Cultural Heritage, Library and Information Science, and Computer Science, from both academic and non-academic institutions.

A selection of items from Europeana served as the source of cultural heritage artefacts. Additional semantic enrichment was carried out on this content, together with the development of user interfaces to support users in their exploration of digital cultural heritage.

The PATHS project was coordinated by the University of Sheffield and MDR Partners, and Paul Clough from the Information School acted as Scientific Director for the project.

PATHS website

PATHS poster

General overview articles:

Clough, P. (2015) Supporting Exploration and Use of Digital Cultural Heritage Materials, EuropeanaTech Insight, Issue 4.

Goodale, P., Clough, P., Hall, M., Stevenson, M., Fernie, K., Griffiths, J. and Agirre, E. (2013) Pathways to Discovery: Supporting Exploration and Information Use in Cultural Heritage Collections, In N. Proctor and R. Cherry (eds), Silver Spring, MD: Museums and the Web. Published October 2, 2013.

Clough, P., Goodale, P., Hall, M., and Stevenson, M. (2015) Supporting Exploration and Use of Digital Cultural Heritage Materials: the PATHS Perspective, In Ruthven, I. and Chowdhury, G.G. (eds) Cultural Heritage Information Access and Management, Facet, pp. 197-220.

Developing a taxonomy of search sessions

The goal of this project was to develop a categorisation scheme to describe common patterns of user-system interaction behaviour as recorded in search engine log files.

In particular, the project focused on search sessions: periods of continued usage that provide multiple units of interaction with which to study how people use search systems.

Search (or query) logs are created as the users of search systems (e.g. web search engines and library catalogues) interact with them to find relevant information.

The project utilised search logs from multiple systems to study the categorisation of sessions, including the investigation of clustering algorithms and related aspects, such as cluster stability.

Funder: Google

Project lead: Professor Paul Clough

Making the Virtual Performer

A collaboration with various departments (English, Music, & Computer Science) across the University and Forced Entertainment to explore human-robot interaction in the performing arts. Could robots be actors, could they improvise, and how could this shape the experience of performance for artists and audiences alike?

Human versus robotic interactions for children in a therapeutic context

A collaborative project with the Education Department and Sheffield Hallam addressing the emerging use of robots for therapeutic support for children. This project examines the current prevalence, practices and purported benefits of robot-assisted therapies. In partnership with local education sites, we explore what people would want from these services and whether the services have the potential to deliver it.


Group members

Academic staff

Dr Frank Hopfgartner (Head of Group)

f.hopfgartner@sheffield.ac.uk
Tel: 0114 2222658

My research to date sits at the intersection of information systems (e.g., information retrieval and recommender systems), content analysis and data science. I have (co-)authored over 150 publications in the above-mentioned research fields, including a book on smart information systems, various book chapters and papers in peer-reviewed journals, conferences and workshops. To date, I have successfully acquired over £1 million in research funding from national and international sources to support my research.

See Frank's full staff profile

Dr Dave Cameron

d.s.cameron@sheffield.ac.uk
Tel: 0114 2222644
Room 232

See Dave's full staff profile

Dr Alessandro Checco

a.checco@sheffield.ac.uk
Tel: 0114 2222674
Room 232

See Alessandro's full staff profile

Professor Paul Clough

p.d.clough@sheffield.ac.uk
Tel: 0114 2222664
Room 226

I research the development of effective retrieval technologies that support users as they seek to fulfil their information needs. Specifically I have carried out research in the areas of multilingual retrieval, image search, geographic information retrieval, search log analysis and the evaluation of search systems. Another area of my research focuses on originality and attribution in digital media, in particular on text re-use and plagiarism detection.

See Paul's full staff profile

Paula Goodale

p.goodale@sheffield.ac.uk

My main research interest is in the needs and seeking behaviours of users in digital environments, including digital libraries, cultural heritage collections, and other information spaces. I am also interested in how people use and curate the information they find.

See Paula's full staff profile

Dr Morgan Harvey

m.harvey@sheffield.ac.uk
Tel: +44 (0)114 222 6337
Room 306

I conduct research in the fields of information retrieval, information behaviour, recommender systems and in the wider area of information and data science and have published more than 60 peer-reviewed conference papers and journal articles.

Much of my work aims to bridge the gap between “systems” and “user-centred” Information Retrieval (IR) and Recommender Systems (RS), with recent work in particular focussing on mobile IR and RS for health and nutrition.

I also research e-government/digital services and their hidden costs, digital literacy, and the relation of these to the growing digital divide in our society.

See Morgan's full staff profile

Peter Holdridge

p.g.holdridge@sheffield.ac.uk
Tel: 0114 2222698
Room 227

I focus on Educational Informatics and e-Learning - including the creation/application of learning technologies. I am interested in accounting for cognitive style in networked learning systems.

See Peter's full staff profile

Dr Suvodeep Mazumdar

s.mazumdar@sheffield.ac.uk
Tel: 0114 2222697
Room 210

My research explores techniques and mechanisms for lowering the barriers that user communities face in understanding and enriching very large, complex, multidimensional datasets.

I conduct inter-disciplinary research on highly engaging, interactive and visual mechanisms in conjunction with complex querying techniques for seamless navigation, exploration and understanding of complex datasets.

I have carried out research in a variety of domains such as aerospace engineering, emergency response, event management and smart cities.

See Suvodeep's full staff profile

Dr Sophie Rutter

s.rutter@sheffield.ac.uk
Tel: 0114 2222659
Room 234

I am particularly interested in how the environment influences the way people interact with information, what techniques people use to search for information, and how information use can be evaluated in different environments.

My research so far has been broadly focused on school children, search interfaces and health communication.

See Sophie's full staff profile

Dr Ziqi Zhang

ziqi.zhang@sheffield.ac.uk
Tel: 0114 2222657
Room 209

My research addresses methods that enable machines to extract human knowledge from text and to represent such knowledge in a structured form that is understandable and usable by machines.

This ultimately enhances our capability to process and make sense of very large-scale data, improving decision making. Specifically, this includes, but is not limited to, Information Extraction, Semantic Web and Linked Data, knowledge graphs, and social media analytics.

See Ziqi's full staff profile

Research staff

Monica Lestari Paramita

m.paramita@sheffield.ac.uk

I am working on a project to evaluate search capabilities of Europeana. I am also working on a PhD to investigate methods for identifying cross-lingual similarity in Wikipedia.

PhD researchers

Ahmed Alnuhayt

AAlnuhayt1@sheffield.ac.uk

I am researching the role of computational intelligence and aggregation systems in decision making.

Abdulkareem Alqusaid

AOAlqusair1@sheffield.ac.uk

I am researching product category extraction and linking in the area of semantic web.

Chen (Cassie) Chao

ccao5@sheffield.ac.uk

I am researching the affordances of gamified online education in the context of the Chinese market.

Cui Cui

CCui3@sheffield.ac.uk

My research works towards creating a small-scale, topic-based digital library of web archives of Chinese studies.

See Cui's full staff profile

Omaima Fallatah

oafallatah1@sheffield.ac.uk

I am researching the mapping and aligning of large knowledge bases.

See Omaima's full staff profile

Paula Goodale

I am researching the construction of personal narratives through exploration of cultural spaces online.

See Paula's full staff profile

Yuyang Liu

YLiu369@sheffield.ac.uk

I am researching machine learning and health informatics.

See Yuyang's full staff profile

Haoyu Xie

HXie5@sheffield.ac.uk

I am researching the self-organisation of crowdsourcing workers.

Zhixue Zhao

zzhao33@sheffield.ac.uk

My research interests are focused on handling unbalanced data and limited data in the task of hateful speech classification. I am examining the effects of different transfer learning strategies on improving models when using limited data. I am also interested in applying my research to other natural language processing and deep learning methods with a view to mitigating the destructive effects of poor training data.


Engagement and impact

We are active contributors to our research communities and have been involved in organising international events. For example, in 2014 we organised the Cross Language Evaluation Forum (CLEF) event.

Our researchers regularly speak at international conferences and have provided tutorials on topics such as crowdsourcing, multilingual information retrieval and search evaluation.

We also develop resources (e.g., datasets, evaluation benchmarks and software code) to support research activities and produce research articles (e.g., journal articles, conference papers and books).

Our research has been funded by organisations such as Google, OCLC Inc., the UK National Archives and Peak Indicators, as well as funding bodies including the European Union, European Science Foundation, Arts and Humanities Research Council and Engineering and Physical Sciences Research Council.


Funders and collaborators

Research carried out within this group is funded by a wide range of organisations.

  • Horizon 2020
  • Engineering and Physical Sciences Research Council
  • European Union
  • Arts & Humanities Research Council
  • Google
  • Research England
