Artificial Intelligence and data analytics

We are identifying the determinants and predictors of multimorbidity and frailty (including socio-economic and environmental factors and biological processes) using Artificial Intelligence approaches.


Aims and objectives

Our aim is to use artificial intelligence approaches to identify and follow clusters of multimorbidity to improve the design, testing and selection of more effective preventive interventions. The main implication and novelty of a cluster medicine approach is that, instead of considering one cause and one treatment for one disease, it considers causes, and thus possible prevention and treatments, for the entire cluster of long-term chronic conditions.

Using state-of-the-art machine learning approaches (e.g. agglomerative clustering and language modelling), we aim to identify clusters of multimorbidity and their trajectories, as well as possible predictive biomarkers and risk factors, using health data from a variety of sources. We aim to apply new natural language processing approaches we have developed for sequence discovery and representation learning to validate the clusters and augment the data, allowing the identification of new, rare and indirect clusters. We aim to establish causal links between patient characteristics (e.g. age, comorbidities, gender, ethnicity), potential drivers of the clusters (e.g. inflammation) and the evolution of multimorbidity.
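To illustrate the general idea of agglomerative clustering on multimorbidity data, the following is a minimal sketch only, not the group's actual pipeline. It assumes each patient is represented as a set of condition labels, uses Jaccard distance with average linkage (both choices are assumptions made for this example), and runs on a small invented toy dataset. The function names `jaccard_distance` and `agglomerative_clusters` are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of condition labels."""
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def agglomerative_clusters(profiles, n_clusters):
    """Average-linkage agglomerative clustering (illustrative, O(n^3)).

    Starts with one cluster per patient and repeatedly merges the two
    clusters with the smallest average pairwise distance until only
    n_clusters remain. Returns lists of patient indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            # Average distance over all cross-cluster patient pairs.
            dists = [jaccard_distance(profiles[i], profiles[j])
                     for i in clusters[x] for j in clusters[y]]
            avg = sum(dists) / len(dists)
            if best is None or avg < best[0]:
                best = (avg, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return [sorted(c) for c in clusters]


# Invented toy data: each set lists one hypothetical patient's conditions.
patients = [
    {"diabetes", "hypertension", "ckd"},
    {"diabetes", "hypertension"},
    {"copd", "asthma"},
    {"copd", "asthma", "depression"},
    {"diabetes", "ckd"},
]

clusters = agglomerative_clusters(patients, n_clusters=2)
print(clusters)
```

On this toy data the cardiometabolic and respiratory profiles end up in separate clusters. Real analyses would typically rely on an established library implementation and on far richer patient representations than sets of labels.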

Our expertise

Machine learning

The University of Sheffield has an excellent track record in machine learning, in the development of unsupervised and reinforcement learning models, and in their application to neurodegenerative and cardiovascular diseases. It also has a strong international reputation in Natural Language Processing (NLP), hosting one of the largest NLP groups in Europe and one of the most successful in the UK. Its internationally recognised natural language processing and speech technology research spans more than 30 years and has been supported by over £45m of UK and European funding since 1990. It covers most aspects of NLP research, such as language learning, machine translation, text simplification, social media analysis and information extraction.

This group is supported by biologists, experts in ageing and clinicians. It has an international reputation in social gerontology, in the mining of data on deprivation and health, in the impact of social and behavioural factors on the prevalence of disease, and in tools that allow statistical analysis and modelling of the onset of multimorbidity. The group also has expertise in information governance and in the management of large national datasets, including data from NHS Digital, to address multimorbidity questions.

We work closely with Sheffield Teaching Hospitals NHS Foundation Trust, The Rotherham NHS Foundation Trust and Connected Yorkshire.


Our capabilities

High performance compute clusters and GPU clusters

Our computational infrastructure includes:

  • Two High Performance Compute (HPC) clusters at the University of Sheffield (ShARC and Bessemer)
  • 91 individual rackmount servers, of which at least 11 are GPU servers
  • One major HPC grid of 25 nodes, with up to 32 CPU cores and 256GB RAM per node, and around 28 GPU cards spread across various nodes
  • One smaller 4-node grid and a 5-node Cloud cluster used to provide IaaS
  • 6 DAS and NAS RAIDs of between 10TB and 40TB each, a 160TB Lustre file system for the cluster running on an InfiniBand backbone, and a Ceph file system of around 300TB

The University also has 1000+ managed desktops. Some of these are high-performance machines and some are also equipped with GPUs. In addition, we have access to external Tier 2 GPU clusters:

  • N8 Tier 2 GPU cluster Bede (based in Durham)
  • Tier 2 GPU cluster Jade 2 (University of Oxford)

These clusters are available for specific machine/deep learning projects that require extensive GPU compute.


Name | Speciality
Dr Mauricio Alvarez | Machine Learning
Professor Peter Bath | Information Science
Professor Ilaria Bellantuono | Musculoskeletal Ageing
Professor Simon Heller | Diabetes
Dr Dan Holman | Social Sciences
Professor Allan Lawrie | Cardiopulmonary Science
Dr Chenghua Lin | Natural Language Processing
Dr Carolina Scarton | Natural Language Processing
Dr Mark Stevenson | Natural Language Processing
Tony Stone | Data Architect
Dr Mari-Cruz Villa-Uriol | Clinical data analytics
Professor Aline Villavicencio | Natural Language Processing
Dr Dawn Walker | Computational Biology
Dr Dennis Wang | Bioinformatics
Professor Mark Wilkinson | Orthopaedics
External collaborators
Alessandra Marengoni - Karolinska Institutet | Geriatric Epidemiology

The group is also supported by biologists, experts in ageing and clinicians.
