Computer Science Supports The Sheffield COVID-19 Genomics Group
The group is part of a national consortium, COG-UK, which reports directly to the government through the Scientific Advisory Group for Emergencies (SAGE). Reading the full genomes of the virus enables them to detect changes that act like fingerprints, helping to classify individual viral isolates and track the spread of the disease. To ensure that they can deliver over a hundred viral genomes per week, the Sheffield Bioinformatics Core have developed an automated workflow that takes them from a sequencing run in the laboratory right through to a complete genome sequence. A laboratory information management system, developed by the team, interfaces with the sequencing device (Oxford Nanopore GridION). This allows real-time monitoring of the sequencing run and data transfer. As data are produced, they are automatically processed by Sheffield University's high performance computing (HPC) cluster. Real-time sequence analysis is a key strength of Sheffield’s computational workflow, which uses state-of-the-art Nanopore tools developed by the ARTIC Network, among others.
Data of sufficient quality are uploaded to the Cloud Infrastructure for Microbial Bioinformatics (CLIMB), the MRC’s national infrastructure for the project, where they are shared with the global scientific community. Despite the lockdown, high-performance laptops and access to the HPC, funded by the Department of Computer Science, are helping Sheffield researchers streamline the process to produce complete genomes that are ready in 1-2 hours.
This project is of vital importance to understanding the disease. Without sequencing data we cannot fully understand the spread of the virus. It is vital that we can rapidly sequence and analyse the data so that it can be used, for example, to inform local infection control and also national decisions. Without powerful and cutting-edge computing infrastructure we wouldn’t be able to carry out the project efficiently. Processing hundreds of genomes a week means producing tens of terabytes of data, and this is not an easy task. Having fast infrastructure along with laptops that have GPU capabilities allows us to analyse data quickly.
Dr Matthew Parker
Clinical Bioinformatics Scientist
The genomic analysis of SARS-CoV-2 is led by Dr Matthew Parker, with support from Dr Benjamin Lindsey from IICD and Dr Dennis Wang, through the Department of Computer Science, IT Services, and the Sheffield Biomedical Research Centre (Sheffield BRC). They work closely with the COVID-19 Genomics Group lead, Dr Thushan de Silva of IICD and Sheffield Teaching Hospitals, whose lab has collected and sequenced patient samples.