Dr Marta Milo

Lecturer in Computational Biology
Department of Biomedical Science
The University of Sheffield
Western Bank
Sheffield S10 2TN
United Kingdom

Room: Alfred Denny C224a
Telephone: +44 (0) 114 222 4673
Email: m.milo@sheffield.ac.uk


General

Career history

  • 2016-present: Visiting Scientist, Wellcome Trust Sanger Institute
  • 2015-present: Fellow of the Higher Education Academy (FHEA)
  • 2012-present: Lecturer in Computational Biology
  • 2008-2012: Bioinformatics Research Fellow, NIHR Cardiovascular Biomedical Research Unit, Sheffield Teaching Hospitals NHS Trust.
  • 2004-2008: Wellcome Trust Research Fellow, University of Sheffield, UK.
  • 2003-2004: Postdoctoral Researcher at the Department of Biomedical Science, University of Sheffield, UK.
  • 2001-2003: Postdoctoral Fellow at the Dept of Computer Science, University of Sheffield, UK.
  • 1995-1996: Computer Science and Statistics Consulting at FORMEZ c/o OLIVETTI Research Centre, Arco Felice, Italy.
  • 1994-1995: Qualification in Computer Science and Statistics applied to Public Administrative systems, at FORMEZ c/o OLIVETTI Research Centre, Arco Felice, Italy.

Education

  • 2015: Postgraduate Certificate in Learning and Teaching, The University of Sheffield, UK
  • 2000: PhD in Applied Mathematics and Computer Science, The University of Naples "Federico II", Italy.
  • 1994: "Laurea" in Mathematics, The University of Naples "Federico II", Italy.

Research interests

Understanding and defining the sources of uncertainty in quantitative biology is key to improving sensitivity and accuracy in the analysis of high-throughput genomic data. My research focuses on developing computational tools, pipelines, appropriate experimental designs and protocols that improve the accuracy and sensitivity of biological data analysis. My major research activities are:

  • propagation of uncertainty associated with low-level data into the downstream analysis of microarray data;
  • quantification and inference of gene expression levels using probabilistic models;
  • inference of gene networks using regulatory data and gene expression data;
  • integrated approaches for the analysis of Next Generation Sequencing data;
  • effect of splicing dysfunction on gene expression activity;
  • effect of genetic mutations on selection in embryonic stem cells;
  • single-cell gene expression analysis and noise reduction.

My research group is part of the Centre for Stem Cell Biology (CSCB).

Activities and distinctions

  • Member of the PUMA project (Propagating Uncertainty in Microarray Analysis)
  • Reviewer for leading scientific journals
  • Member of Quantitative Biology Group and Sheffield Bioinformatics Hub
  • Fellow of the Higher Education Academy, FHEA (as part of the University Postgraduate Certificate in Learning and Teaching)
  • Visiting Scientist at the Wellcome Trust Sanger Institute

Awards

  • Wellcome Trust Advanced Training Fellow (2004-2008)
  • Bioinformatics Research Fellow, NIHR Cardiovascular Biomedical Research Unit (2009-2012)

Funding

  • Wellcome Trust
  • British Heart Foundation
  • Royal Society
  • HEFCE

Selected publications

Google Scholar profile is here.

Full publications

Research

The main focus of my professional career has been to develop truly interdisciplinary skills, complementing and refining my bioinformatics expertise with a deep understanding of the biological nature of the data collected. This allows me to better identify limitations in experimental designs and to better quantify variation in data collection and validation. The main stream of my research has focused on the analysis and interpretation of high-throughput biological data, with the aim of producing feasible and robust hypotheses for a deeper understanding of the biological systems under study.

Quantifying Uncertainty in Biology with Probabilistic models:

In quantitative sciences, numerical knowledge alone is not enough to understand and predict the behaviour of systems that are only partially observed. Since the beginning of the 20th century it has been clear that predictions from data require additional “knowledge” to become meaningful. This knowledge needed to be quantified in a way that reflects both our prior knowledge of the system and what we are able to measure. This marked the beginning of the concept of quantified uncertainty.

Advances in technology for the biological sciences enable us to apply the concept of uncertainty to complex biological data. Modern measurements, despite being complex and at times limited and restrictive, offer completely new insights into complex systems. My research mainly focuses on exploring, developing and quantifying the concept of uncertainty in biology. This becomes an important step whenever we want predictions made from complex data to be meaningful and satisfactory.

Here are some examples from my research where modelling uncertainty made a substantial difference in improving the accuracy and sensitivity of the analysis.

Microarray data analysis


Microarrays provided a practical method for measuring the expression of thousands of genes simultaneously. Although next generation sequencing has largely replaced these assays, a large amount of microarray data is still available in public databases, and it can help to design better sequencing experiments by providing the insight of a high-throughput gene expression screen. For this reason, methods developed in the past to analyse microarray gene expression data are still a valuable resource. Microarray technology is associated with many significant sources of experimental uncertainty, which must be considered in order to make confident inferences from the data. Uncertainty cannot be estimated fully from repeat experiments alone. Outliers are often due to flaws in the microarray technique or to problems in the hybridization of the biological material. In both high-density oligonucleotide arrays and cDNA spotted arrays, the aim is to extract an estimate of gene expression levels from pixel intensity signals.

For oligonucleotide arrays such as the Affymetrix GeneChip®, each target is measured by multiple probe pairs: perfect match (PM) probes, designed to capture specific binding, and mismatch (MM) probes, designed to capture non-specific binding. The probe-set is used to measure the expression level of the target gene, and this measurement is then used to detect differentially expressed genes between conditions, or for visualisation, clustering or inference of gene networks. My research has mainly focused on developing computational tools that improve the accuracy of both low-level and downstream analysis of biological data. In collaboration with the PUMA (Propagating Uncertainty in Microarray Analysis) group, we have developed a family of probabilistic models that estimate gene expression levels with credibility intervals, quantifying the measurement variance associated with the estimates of target concentration within a sample.

The puma software is fully integrated into Bioconductor (Open Source Software for Bioinformatics).
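As a rough illustration of what "gene expression levels with credibility intervals" means, the sketch below summarises a single probe-set under deliberately simple assumptions: each PM probe reports a log2 intensity equal to the true log2 expression plus Gaussian noise with a known, probe-specific standard deviation, and the prior on expression is flat. It is not one of the models implemented in puma (for example multi-mgMOS, which also uses the MM probes), and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def probe_set_posterior(log2_pm, noise_sd):
    """Posterior mean and standard deviation of log2 expression for one probe-set.

    Hypothetical toy model (not puma's): log2_pm[i] = mu + e_i with
    e_i ~ N(0, noise_sd[i]**2) and a flat prior on mu. MM probes are ignored.
    """
    if not log2_pm or len(log2_pm) != len(noise_sd):
        raise ValueError("need one noise standard deviation per PM probe")
    precisions = [1.0 / sd ** 2 for sd in noise_sd]
    total_precision = sum(precisions)
    # With a flat prior the posterior of mu is Gaussian, centred on the
    # precision-weighted mean of the probe intensities.
    mean = sum(p * x for p, x in zip(precisions, log2_pm)) / total_precision
    return mean, math.sqrt(1.0 / total_precision)


def credibility_interval(mean, sd, level=0.95):
    """Central credibility interval of a Gaussian posterior."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sd, mean + z * sd


if __name__ == "__main__":
    # Made-up intensities for a four-probe set; the third probe is noisier.
    mu, sd = probe_set_posterior([8.1, 7.9, 8.4, 8.0], [0.3, 0.3, 0.6, 0.3])
    low, high = credibility_interval(mu, sd)
    print(f"log2 expression {mu:.2f}, 95% interval [{low:.2f}, {high:.2f}]")
    # prints: log2 expression 8.03, 95% interval [7.70, 8.36]
```

The noisier probe pulls the estimate far less than the others, and the interval width, not just the point estimate, is what gets carried into downstream analysis.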

Next Generation Sequencing


I am currently working to extend the use of probabilistic models to Next Generation Sequencing (NGS) data, with particular focus on de novo isoform identification and data integration. In both cases we integrate uncertainty into the models using probabilistic approaches, optimising both computational time and accuracy. This research is carried out in close collaboration with the PUMA project, extending it to NGS data applications.

I was involved in the first NGS cross-faculty network in Sheffield, and I am currently working with a consortium of 13 scientists from 8 different countries to study the effects of splicing dysfunction in disease using both RNA-Seq data and proteomics. Both networks provide questions and data for testing these methods as they develop. In Sheffield I actively collaborate with Dr G. Hautbergue and the Sheffield Institute of Translational Neuroscience (SITraN), studying the effect of TDP-43 mutations in Amyotrophic Lateral Sclerosis and other neurodegenerative disorders on the transcriptome and translatome of affected cells.

Effect of splicing dysfunction on disease

I am involved in a network created to develop innovative and multidisciplinary approaches for investigating splicing dysfunction as a common mechanism of disease. The spliceosome removes introns from the precursor messenger RNA (pre-mRNA) produced by transcription, and alternative splicing of the same pre-mRNA generates different forms of mature mRNA. This enables functionally diverse protein isoforms to be expressed according to different regulatory programs. By integrating data and expertise from this network of scientists, we focus on the disruption of normal splicing patterns, which is likely to contribute to the pathology underlying Motor Neuron Disease, Parkinson Disease, Huntington Disease, Rett Syndrome, X-linked agammaglobulinemia and Deafness. Despite the wealth of data generated and available in public databases, only convoluted signals of the biological processes in disease can be identified and measured. Reverse engineering is required to extract informative knowledge from modern data; to this end we are developing novel integrative approaches to splicing dysfunction, based on synergy between computer models and experimental procedures, to:
• analyse RNA transcripts from different cellular compartments;
• identify new isoforms and quantify their expression from sequencing data;
• optimise methods of therapeutic splicing correction.
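Quantifying isoform expression from sequencing data means sharing out reads that are compatible with more than one isoform. A common way to do this is expectation-maximisation (EM), as used by RSEM-style quantifiers. The sketch below is a minimal, hypothetical version, not the network's own pipeline: it assumes each read has already been reduced to the set of isoforms it is compatible with and that isoform lengths are known, and it ignores fragment-length distributions, sequencing bias and any uncertainty modelling.

```python
def em_isoform_abundance(reads, lengths, max_iter=500, tol=1e-10):
    """Estimate relative isoform abundances with EM (hypothetical helper).

    reads:   list of sets; each set holds the isoforms a read is compatible
             with (every read must be compatible with at least one isoform).
    lengths: dict mapping isoform -> (effective) length.
    Returns a dict isoform -> relative abundance (values sum to 1).
    """
    isoforms = list(lengths)
    alpha = {t: 1.0 / len(isoforms) for t in isoforms}  # fraction of reads per isoform
    for _ in range(max_iter):
        counts = {t: 0.0 for t in isoforms}
        # E-step: a read placed uniformly along isoform t has likelihood
        # alpha[t] / length[t]; split each read in proportion to that.
        for compatible in reads:
            weights = {t: alpha[t] / lengths[t] for t in compatible}
            total = sum(weights.values())
            for t, w in weights.items():
                counts[t] += w / total
        # M-step: new read fractions are the expected read counts.
        new_alpha = {t: counts[t] / len(reads) for t in isoforms}
        change = max(abs(new_alpha[t] - alpha[t]) for t in isoforms)
        alpha = new_alpha
        if change < tol:
            break
    # Longer isoforms yield more reads per molecule, so divide by length
    # to turn read fractions into relative molecule abundances.
    per_molecule = {t: alpha[t] / lengths[t] for t in isoforms}
    norm = sum(per_molecule.values())
    return {t: v / norm for t, v in per_molecule.items()}


if __name__ == "__main__":
    # Made-up example: 30 reads unique to A, 10 unique to B, 40 shared.
    reads = [{"A"}] * 30 + [{"B"}] * 10 + [{"A", "B"}] * 40
    print(em_isoform_abundance(reads, {"A": 1500, "B": 1500}))
    # roughly {'A': 0.75, 'B': 0.25}
```

The shared reads end up split in the same 3:1 ratio as the uniquely assigned reads, which is the maximum-likelihood answer for this toy model.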

Effect of genetic mutations on selection in Embryonic Stem Cells

Genomic mutations occur in human pluripotent stem cells (hPSCs) during culture adaptation. These mutations might cause tumor formation when the cells are embedded in a tissue, or may simply confer a selective advantage in culture. There are two important factors in culture adaptation: MUTATION, which generates the variants, and SELECTION, which makes them prevalent in culture. These studies aim to discover and characterise the transcriptomic changes caused by culture adaptation and how those changes affect the adapted cells. Screening for mutations is important to define changes in the lines; predicting selection and modelling functional effects are essential steps towards translational use.
The Pluripotent Stem Cell Platform within the UK Regenerative Medicine programme aims to study the interaction between genetic variants and transcriptomic changes in Pluripotent Stem Cells (PSCs). My research interests in this programme are mainly to integrate whole genome sequencing, RNA-Seq and DNA bisulfite sequencing of clones of PSCs in order to predict their impact on translational uses. Using optimized pipelines from the Cancer Genome Group at the Sanger Institute, we are screening for genetic substitutions and indels and correlating them with changes in RNA-Seq and the epigenome.

Substitution plots

This project is in collaboration with the Centre for Stem Cell Biology (CSCB), the Wellcome Trust Sanger Institute and The Babraham Institute in Cambridge, UK.
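Substitution screening of this kind is often summarised as a substitution spectrum. The sketch below assumes variants are given as (reference, alternative) allele pairs, such as those in a VCF file. It folds single-base substitutions onto a pyrimidine (C or T) reference base, the six-class convention widely used in mutational signature analysis, and counts indels separately. It is illustrative only, not part of the Sanger pipelines, and the function name is hypothetical.

```python
from collections import Counter

# Complementing a purine reference base lets every SNV be written with a
# pyrimidine (C or T) reference, giving six substitution classes.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_spectrum(variants):
    """Return (class counts, number of indels, number of other records).

    variants: iterable of (ref, alt) allele strings, e.g. taken from a VCF.
    """
    spectrum = Counter({c: 0 for c in CLASSES})
    indels = other = 0
    for ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != len(alt):
            indels += 1  # length change: insertion or deletion
        elif len(ref) == 1 and ref != alt and ref in "ACGT" and alt in "ACGT":
            if ref in "AG":  # purine reference: report on the other strand
                ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
            spectrum[f"{ref}>{alt}"] += 1
        else:
            other += 1  # multi-base substitutions, N bases, identical alleles
    return spectrum, indels, other


if __name__ == "__main__":
    # Made-up calls, not project data.
    calls = [("G", "A"), ("C", "T"), ("A", "C"), ("T", "G"), ("AT", "A"), ("C", "CA")]
    spectrum, n_indels, n_other = substitution_spectrum(calls)
    print(dict(spectrum), n_indels, n_other)
    # prints {'C>A': 0, 'C>G': 0, 'C>T': 2, 'T>A': 0, 'T>C': 0, 'T>G': 2} 2 0
```

Comparing such spectra between early- and late-passage clones is one simple way to see whether culture adaptation shifts the type of mutation acquired.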

Analysis of single-cell populations at whole-genome level

My interest in studying cell populations at single-cell level is closely connected with my interest in exploring the complex and obscure relationship between stem cells and their microenvironment, which plays a pivotal role in determining stem cell fate. An important factor in this relationship is the association between molecular changes and the morphogenetic changes induced by the shape, biochemistry and physical forces of the microenvironments where the cells attach and grow.
In stem cell populations, heterogeneity was very difficult to study until the development of single-cell research, which has demonstrated great power in identifying different cellular subsets.

My research in this field spans both computational methodology and experimental biology. On the computational side, I am working in collaboration with the Machine Learning Group in the Department of Computer Science, Sheffield, on new methodologies for the analysis of single-cell depositions prior to sequencing, building methods to: a) quantify the uncertainty of singlet deposition in the Fluidigm C1 system; b) classify the cell-cycle stage of each single cell prior to sequencing.
On the experimental biology side, I am working on “Delineating the developmental trajectory of Human Embryonic Stem Cell (HESC)-derived otic progenitors” in collaboration with Prof. M. Rivolta.

Together with Prof Rivolta, I am also co-director of the Sheffield Single-Cell OMICS facility.

Experimental Biology:

I have also acquired skills in experimental biology that allow me to complement my bioinformatics skills with a deep understanding of the biological nature of the data and of the limitations and variations in the data collection.

Genetic profiling of the mammalian inner ear:
Molecular mechanisms that could stimulate sensory regeneration in the mammalian inner ear are commonly sought in studies of embryonic and post-natal development in animal models. These studies have revealed many genes that regulate the differentiation of sensory cells. A major challenge is to place these genes into the context of functional networks, so that developmental processes can be described in more detail and the chances of identifying useful therapeutic targets increase.

We used high-throughput gene expression assays, specifically microarrays, to identify gene networks related to the transcription factors gata3 and gata2 during development of the mammalian inner ear, as well as to identify networks for sensory neural development in Igf-1 null mice, in collaboration with Prof Varela-Nieto's laboratory. The predictions made with probabilistic models for the data collected in these studies were supported by extensive validation in vivo and in vitro, and opened the way to detailed questions that are still work in progress.

I am actively working on profiling changes affecting sensory neural development in Igf-1 null mice at whole-genome level, with both whole-transcriptome arrays and RNA-Seq, together with Prof I. Varela-Nieto and her laboratory in Madrid. Within the EU programme Targear, we are interested in the effect of aging in the null mice, as well as in exploring the inflammasome in different animal models and how it relates to autophagy in hearing development, hearing physiology and hearing loss.

Figures 4, 5 and 6 in Milo et al., PLoS ONE 4(9): e7144. doi:10.1371/journal.pone.0007144

Gene expression profiling in Acute Coronary Syndrome:
Acute coronary syndrome (ACS) is the cause of over 114 000 UK hospital admissions and of large associated costs to the National Health Service. Advances in microarray technology allow a detailed understanding of the genome-wide expression profiles of pathological processes. We hypothesised that analysis of ACS at the time of an acute event and throughout recovery, up to 90 days post event, would provide insight into pathology, as well as identify genes as potential drug targets and as diagnostic and prognostic markers. Using microarray technology and screening of miRNA from whole blood samples, we aimed to identify specific biological pathways showing the late effects of acute events that can be used to discover biomarkers of coronary heart disease.

This study is still in its infancy and has identified a set of differentially expressed genes associated with relevant pathways, such as Rho GTPase cytoskeletal, endothelin signalling, integrin signalling, G-protein signalling and inflammation-mediated pathways. We used principal component analysis with propagated uncertainty (pumaPCA) to visualise and interpret the data. When clinical information was incorporated, the data discriminated between troponin-positive and troponin-negative patients across all time points.
Appropriate filtering of the data, and the use of probabilistic models to combine replicates and assess differential expression (pumaComb and PPLR), defined a set of relevant genes. Hierarchical clustering, comparing the expression profiles between groups, identified different clusters of genes that increased in expression over time in the troponin-positive group.
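PPLR stands for the probability of positive log-ratio: the posterior probability that a gene's expression is higher in one condition than in another. The sketch below shows the idea under simplifying assumptions, with hypothetical function names. Each replicate provides a Gaussian posterior (mean and variance) for log2 expression; replicates are combined by precision weighting, whereas pumaComb uses a hierarchical Bayesian model that also accounts for between-replicate variability; and PPLR is computed as P(mu_1 > mu_2) for two independent Gaussians.

```python
import math
from statistics import NormalDist


def combine_replicates(means, variances):
    """Precision-weighted combination of replicate posteriors (mean, variance).

    Simplification: treats replicates as noisy measurements of one true
    level and ignores between-replicate variability (unlike pumaComb).
    """
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / total
    return mean, 1.0 / total


def pplr(condition_1, condition_2):
    """Probability that log expression in condition_1 exceeds condition_2.

    Each condition is a (mean, variance) pair of an independent Gaussian
    posterior, so mu_1 - mu_2 is Gaussian with mean m1 - m2 and
    variance v1 + v2.
    """
    (m1, v1), (m2, v2) = condition_1, condition_2
    return NormalDist().cdf((m1 - m2) / math.sqrt(v1 + v2))


if __name__ == "__main__":
    # Made-up log2 values for one gene, not study data.
    positive = combine_replicates([9.2, 9.0, 9.4], [0.04, 0.05, 0.04])
    negative = combine_replicates([8.6, 8.9, 8.7], [0.04, 0.04, 0.06])
    print(f"PPLR = {pplr(positive, negative):.3f}")
```

Values close to 1 or 0 suggest up- or down-regulation in the first condition, while values near 0.5 indicate no evidence either way; because the probe-level variances enter the calculation, noisy measurements are automatically down-weighted.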
Patient cohort:

50 patients presenting with chest pain consistent with ACS were recruited within 48 h of admission. Peripheral whole blood (3 ml) was collected using Tempus RNA tubes at days 1, 3, 7, 30 and 90. Total RNA was extracted, cleared of globin mRNA and arrayed using Affymetrix HG-U133 Plus 2.0 GeneChips. Data were analysed using the open source software PUMA.

Teaching

Teaching experience

I have been actively involved in teaching in my academic career and have taught undergraduate and postgraduate students. In 2015 I obtained my Postgraduate Certificate in Learning and Teaching from the University of Sheffield and became a Fellow of The Higher Education Academy.

I currently teach a new undergraduate Level 3 module (BMS353) that provides Biomedical Science students with knowledge of bioinformatics and programming. This module was developed and designed to give BMS undergraduates competitive and essential skills for handling large amounts of data and transferring knowledge across disciplines. With this in mind, the key issues addressed by the curriculum design of BMS353 are: a) creating an engaging and inclusive environment that enables effective learning of new skills; b) achieving blended learning by combining traditional lecturing and classroom-based activities with technology-enhanced learning in a cloud-based environment; c) designing an effective way to assign, mark and give feedback while enhancing constructive learning.

This module is designed to address a series of issues that are key for BMS students to achieve constructive learning across disciplines:
• An easy working environment that enables students to naturally change their mindset
• Interactive sessions in which learning is monitored with formative feedback
• Assignments able to assess learning
• Marking and feedback from teacher to students
• Peer feedback and problem solving using a dedicated discussion forum.


BMS353 is taught with the support of cloud computing and a user-friendly approach to programming using Jupyter notebooks, complying with the data sharing principles developed in collaboration with Research Software Engineering Sheffield, led by Dr Mike Croucher. We facilitated the assign-mark-feedback process using the SageMathCloud environment for the practical classes and created a website interface for all the module activities, including a forum.

  • BMS353 Bioinformatics for Biomedical Science (Coordinator)
  • BMS6014 Genomic Approaches to Drug Discovery