Dr Marta Milo
Lecturer in Computational Biology
Understanding and defining the sources of uncertainty in quantitative biology is key to improving sensitivity and accuracy in the analysis of high-throughput genomic data. My research focuses on developing computational tools, pipelines, appropriate experimental designs and protocols that help improve accuracy and sensitivity in the analysis of biological data. Major research activities are:
Quantifying Uncertainty in Biology with Probabilistic models:
In quantitative sciences, numerical knowledge alone is not enough to understand and predict the behaviour of systems that are only partially observed. Since the beginning of the 20th century it has been clear that predictions from data require additional “knowledge” to become meaningful. This knowledge needed to be quantified in a way that reflects both our prior knowledge of the system and what we are able to measure. This marked the introduction of the concept of quantified uncertainty.
Advances in technology for the biological sciences enable us to apply the concept of uncertainty to complex biological data. Modern measurements, despite being complex, limited and at times restrictive, offer completely new insights into complex systems. My research mainly focuses on exploring, developing and quantifying the concept of uncertainty in biology. This is an important step when we want predictions made from complex data to be meaningful and satisfactory.
Here are some examples from my research where the use of uncertainty in modelling made a substantial difference in improving the accuracy and sensitivity of the analysis.
Microarray data analysis
In oligonucleotide arrays such as Affymetrix GeneChip®, multiple probes are associated with each target. They are paired as perfect match (PM) probes, designed to capture specific binding, and mismatch (MM) probes, designed to capture non-specific binding. The probe-set is used to measure the expression level of the target gene, and this measurement is then used to detect differentially expressed genes between conditions, or for visualisation, clustering or inference of gene networks. My research has mainly focused on developing computational tools that help improve the accuracy of both low-level and downstream analysis of biological data. In collaboration with the PUMA (Propagating Uncertainty in Microarray Analysis) group we have developed a family of probabilistic models that estimate gene expression levels with credibility intervals, quantifying the measurement variance associated with the estimates of target concentration within a sample.
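To illustrate what an expression estimate with a credibility interval looks like, here is a minimal Python sketch. It is not the PUMA model. It assumes, purely for illustration, that the log2 PM intensities of one probe-set are normally distributed around the true log expression level, with a standard non-informative prior on the mean and variance. The function name and the intensity values are hypothetical.

```python
import math
import random
import statistics


def expression_credible_interval(pm_intensities, level=0.95, draws=20000, seed=0):
    """Posterior mean and credible interval for a log2 expression level.

    Illustrative assumption (not the PUMA model): log2 PM intensities are
    i.i.d. normal with unknown mean and variance, with the usual
    non-informative prior. The posterior is sampled by Monte Carlo.
    """
    logs = [math.log2(x) for x in pm_intensities]
    n = len(logs)
    if n < 2:
        raise ValueError("need at least two probes to estimate a variance")

    mean = statistics.fmean(logs)
    var = statistics.variance(logs)  # sample variance, n - 1 denominator
    rng = random.Random(seed)

    samples = []
    for _ in range(draws):
        # sigma^2 | data is scaled inverse chi-square with n - 1 degrees of freedom
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * var / chi2
        # mu | sigma^2, data is normal around the sample mean
        samples.append(rng.gauss(mean, math.sqrt(sigma2 / n)))

    samples.sort()
    tail = (1 - level) / 2
    lower = samples[int(tail * draws)]
    upper = samples[int((1 - tail) * draws) - 1]
    return mean, lower, upper


# Hypothetical PM intensities for a single probe-set of 11 probes
pm = [210, 250, 190, 300, 230, 270, 220, 240, 260, 205, 245]
estimate, lower, upper = expression_credible_interval(pm)
print(f"log2 expression {estimate:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

The width of the interval is the quantity that downstream steps, such as detecting differential expression, can take into account instead of treating the point estimate as exact.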
Next Generation Sequencing
I was involved in the first NGS cross-faculty network in Sheffield and I am currently working with a consortium of 13 scientists from 8 different countries to study the effects of splicing dysfunction in disease with both RNA-Seq data and proteomics. Both networks provide questions and data for testing the methods under development. In Sheffield I actively collaborate with Dr G. Hautbergue and the Sheffield Institute for Translational Neuroscience (SITraN) on studying the effect of TDP-43 mutations in Amyotrophic Lateral Sclerosis and other neurodegenerative disorders on the transcriptome and translatome of affected cells.
Effect of splicing dysfunction on disease
I am involved in a network that was created to develop innovative and multidisciplinary approaches for investigating splicing dysfunction as a common mechanism of disease. The spliceosome generates different forms of mature RNA by splicing the precursor RNA molecule, called pre-mRNA, that is transcribed from DNA. This enables functionally diverse protein isoforms to be expressed according to different regulatory programmes. By integrating data and expertise from this network of scientists, we focus on the disruption of normal splicing patterns, which is likely to contribute to the pathology underlying Motor Neuron Disease, Parkinson's Disease, Huntington's Disease, Rett Syndrome, X-linked agammaglobulinemia and deafness. Despite the wealth of data generated and available in public databases, only convoluted signals of biological processes in disease can be identified and measured. Reverse engineering is required to produce informative knowledge from modern data. For this reason we are working on novel integrative data approaches for splicing dysfunction, based on synergy between computer models and experimental procedures.
Effect of genetic mutations on selection in Embryonic Stem Cells
Genomic mutations occur in human pluripotent stem cells (hPSCs) during culture adaptation. These mutations might cause tumour formation when cells are embedded in a tissue, or may simply confer a selective advantage in culture. There are two important factors in culture adaptation: MUTATION, which generates the variants, and SELECTION, which makes them prevalent in cultures. These studies aim to discover and characterise the impact that culture adaptation has on transcriptomic changes and how those changes affect the adapted cells. Screening for mutations is important to define changes in the lines; predicting selection and modelling functional effects are essential steps for translational use.
This project is in collaboration with the Centre for Stem Cell Biology (CSCB), the Wellcome Trust Sanger Institute and The Babraham Institute in Cambridge, UK.
Analysis of Single-cell population at whole genome level
My interest in studying cell populations at single-cell level is closely connected with my interest in exploring the complex and obscure relationship between stem cells and their microenvironment, which plays a pivotal role in stem cell fate determination. An important factor in this relationship is the association between molecular changes and the morphogenetic changes induced by the shape, the biochemistry and the physical forces of the microenvironments where the cells attach and grow.
My research in this field covers both computational methodology and experimental biology. On the computational side I am working in collaboration with the Machine Learning Group in the Department of Computer Science, Sheffield, on developing new methodologies for the analysis of single-cell depositions prior to sequencing, building methods to: a) quantify the uncertainty of singlets from the Fluidigm C1 system deposition; b) classify the cell cycle stage of the single cell prior to sequencing.
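As a rough illustration of what quantifying the uncertainty of a singlet rate can mean, the Python sketch below puts a credible interval on the proportion of capture sites that hold exactly one cell. It is not the method under development. It assumes a simple Beta-Binomial model with a uniform prior, and both the function name and the counts are hypothetical.

```python
import random


def singlet_rate_interval(singlets, inspected, level=0.95, draws=20000, seed=0):
    """Posterior mean and credible interval for the singlet proportion.

    Illustrative assumption: each inspected capture site independently holds
    a singlet with unknown probability p, and p has a uniform Beta(1, 1) prior.
    The posterior is then Beta(1 + singlets, 1 + inspected - singlets).
    """
    if not 0 <= singlets <= inspected:
        raise ValueError("singlets must be between 0 and inspected")

    a = 1 + singlets
    b = 1 + inspected - singlets
    rng = random.Random(seed)

    # Monte Carlo quantiles of the Beta posterior
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1 - level) / 2
    lower = samples[int(tail * draws)]
    upper = samples[int((1 - tail) * draws) - 1]
    return a / (a + b), lower, upper


# Hypothetical counts: 70 singlets among 96 inspected capture sites
mean, lower, upper = singlet_rate_interval(70, 96)
print(f"singlet rate {mean:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```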
I am also co-director together with Prof Rivolta of the Sheffield Single-Cell OMICS facility.
I have also acquired skills in experimental biology that allow me to complement my bioinformatics skills with a deep understanding of the biological nature of the data and of the limitations and variations in the data collection.
Genetic profiling of mammalian inner ear:
We used high-throughput gene expression assays, specifically microarray assays, to identify gene networks related to the transcription factors gata3 and gata2 during development of the mammalian inner ear, as well as to identify networks for sensory neural development in Igf-1 null mice, in collaboration with Prof Varela-Nieto's laboratory. The predictions made with probabilistic models for data collected in these studies were supported by extensive validation in vivo and in vitro, and opened the way to detailed questions that are still work in progress.
I am actively working on profiling changes affecting sensory neural development in Igf-1 null mice at whole genome level, with both whole transcriptome arrays and RNA-Seq, together with Prof I. Varela-Nieto and her laboratory in Madrid. We are interested in the effect of aging in the null mice, within the EU programme Targear, as well as in exploring the inflammasome in different animal models and how it relates to autophagy in hearing development, hearing physiology and hearing loss.
Figure 4, 5 and 6 in Milo et al., PLoS ONE 4(9): e7144. doi:10.1371/journal.pone.0007144
Gene expression profiling in Acute Coronary Syndrome:
This study is still in its infancy and has identified a set of differentially expressed genes that are associated with relevant pathways, such as Rho GTPase cytoskeletal, endothelin signalling, integrin signalling, G-protein signalling and inflammation-mediated pathways. We used principal component analysis with propagated uncertainty (pumaPCA) to visualise and interpret the data. With clinical information incorporated, the data were found to discriminate between patients, separating them into troponin-positive and troponin-negative groups across all time points.
50 patients presenting with chest pain consistent with ACS were recruited within 48 h of admission. 3 ml of peripheral whole blood was collected using Tempus RNA tubes at days 1, 3, 7, 30 and 90. Total RNA was extracted, cleared of globin mRNA and arrayed using Affymetrix HG_U133 plusv.2 GeneChips. Data were analysed using open source software PUMA.
I have been actively involved in teaching in my academic career and have taught undergraduate and postgraduate students. In 2015 I obtained my Postgraduate Certificate in Learning and Teaching from the University of Sheffield and became a Fellow of The Higher Education Academy.
I currently teach a new undergraduate Level 3 module (BMS353) that provides Biomedical Science students with knowledge of bioinformatics and programming. This module was developed and designed to provide BMS undergraduates with competitive and essential skills to handle large amounts of data and to transfer knowledge across disciplines. With this in mind, the key issues that the curriculum design for BMS353 addresses are: a) creating an engaging and inclusive environment that enables effective learning of new skills; b) achieving blended learning by combining traditional lecturing techniques and classroom-based activities with technology-enhanced learning in a cloud-based environment; c) designing an effective way to assign, mark and give feedback while enhancing constructive learning.
This module is designed to address a series of issues that are key for BMS students to achieve constructive learning across disciplines:
• An easy working environment that enables students to change their mindset naturally
BMS353 is taught with the support of cloud computing and with a user-friendly approach to programming, using Jupyter notebooks and complying with the data sharing principles developed in collaboration with Research Software Engineering Sheffield, led by Dr Mike Croucher. We facilitated the assign-mark-feedback process by using the SageMathCloud environment for the practical classes, and created a website interface for all the module activities, including a forum.
- Low rates of mutation in clinical grade human pluripotent stem cells under different culture conditions. Nature Communications, 11(1).
- Translating SOD1 Gene Silencing toward the Clinic: A Highly Efficacious, Off-Target-free, and Biomarker-Supported Strategy for fALS. Molecular Therapy: Nucleic Acids, 12, 75-88.
- Multipotency of Adult Hippocampal NSCs In Vivo Is Restricted by Drosha/NFIB. Cell Stem Cell, 19(5), 653-662.
- Comparative gene expression study of the vestibular organ of the Igf1 deficient mouse using whole-transcript arrays. Hearing Research, 330(IEB Kyoto), 62-77.
- Editorial: Aging, neurogenesis and neuroinflammation in hearing loss and protection. Frontiers in Aging Neuroscience, 7.
- Gene expression signatures in motor neurone disease fibroblasts reveal dysregulation of metabolism, hypoxia-response and RNA processing functions. Neuropathology and Applied Neurobiology, 41(2), 201-226.
- Exploiting adaptive Bayesian regression shrinkage to identify exome sequence variants associated with gene expression. Springer Proceedings in Mathematics and Statistics, 63, 135-138.