Professor Peter Willett

Professor of Information Science

MA (Oxford), MSc (Sheffield), PhD (Sheffield), DSc (Sheffield)


+44 (0)114 222 2633

My research has focused principally on the development of novel techniques for a range of important applications in chemoinformatics, but I have also made significant contributions to information retrieval and increasingly to bibliometrics. Many algorithms developed in my research group are embodied in operational chemoinformatics software that is in use throughout the world, with the GOLD, GASP and GALAHAD programs for ligand-protein docking and pharmacophore mapping being widely distributed on a commercial basis. My research has been reported in over 500 articles, books, chapters, reports etc. that have attracted over 22000 citations in Google Scholar (h-index of 69). I have supervised over 70 PhD students and been awarded over 80 research grants and contracts to a total value of ca. £6M. Much of this work has been supported by industry, the collaborations involving GlaxoSmithKline, Johnson and Johnson, Eli Lilly, Novartis, Pfizer, Syngenta and Unilever inter alia.

I am interested in supervising PhD students in Chemoinformatics and Bibliometrics.

I am a member of the Chemoinformatics Research Group.


Current PhD Students

Maram Alajamy: The Role of Academic Librarians in Institutional Information System Strategic Planning: A Grounded Theory Study of Syrian Governmental Universities.

Lucyantie Mazalan: Effect of dimensionality on the effectiveness of chemical similarity searching.


Completed PhD Students

Faisal Altamimi: Total quality management applications in the Saudi academic Information Centres.

Wafaa Al-Motawah: The role of Kuwait University Libraries in supporting graduate students' research.

Edmund Duesbury: Development and Efficacy of Chemical Hyperstructures in Similarity Searching and Virtual Screening.

Halima Egberongbe: Quality Management Approaches in Academic Libraries: A Case Study of South-western Nigeria.

Joao-Pedro Franco: Development of a quantitative model to support the assignment of orphan drug status.

Nor Sani: Use of genetic algorithms and genetic programming to develop methods for the prediction of biological activity.

Hua Xiang: Similarity based virtual screening: Effect of the choice of similarity measure.

Nurul Malim: Enhancing Similarity Searching.

Simon Read: Methods for the improved implementation of the spatial scan statistic when applied to binary labelled point data.

Christoph Mueller: Similarity Based Virtual Screening Using Inference Networks.

Georgios Papadatos: Data Mining for Lead Optimisation.

Shereena Arif: Fragment Weighting Scheme for Similarity Based Virtual Screening.

Chia-Wei Chu: Clustering for 2D chemical structures.

Aryati Bakri: Evaluation of Computer and Information Science in Malaysia: A Bibliometric Analysis.

Yogendra Patel: The Prediction of Molecular Properties Using Similarity Searching and Free-Wilson Analysis.

Mohd Firdaus Raih: Computational Exploration of RNA Three-Dimensional Structures.

David Wood: The Use of Kernel-Based Machine Learning Algorithms in Virtual Screening.

Chidochangu Mpamhanga: Development and Implementation of a Virtual Screening Strategy for the Discovery of CJD Lead Candidates and their Biological Evaluation.

Jayshree Patel: Exploring Protein-Nucleic Acid Complexes at the Atomic Level.

Linda Hirons: Activity fingerprints in DNA based on a structural analysis of sequence information.

Jerome Hert: Two-Dimensional, Similarity-Based Methods for Virtual Screening.

Philippa Levy: 'Living' Theory in networked Learning: A Conceptual Framework for the Design and Facilitation of Professional Development for Networked Learner Support in Higher Education.

Stephen Jelfs: Development of a Novel Descriptor Targeted to High-Throughput Analysis in Lead Exploration and Combinatorial Library Design.

N Salim: Analysis and Comparison of Molecular Similarity Measures.

N Brown: Generation and Application of Activity-Weighted Chemical Hyperstructures.

Rungsang Nakrumpai: Theoretical studies on the complexity of beta-sheets in proteins.

J W Raymond: Applications of Graph-Based Similarity in Cheminformatics.

R V Spriggs: Development of the ASSAM and ASPROTE Programs for Protein Tertiary Structure Searching.

N E Jewell: Novel Molecular Alignments for Three-Dimensional Quantitative Structure-Activity Relationships.

S M Tyrell: Random and Rational Methods for Compound Selection.

E J Barker: John Venn's 'Alumni Cantabrigienses': A Case Study of a Biographical Dictionary and the Automated Conversion of the Printed Text to a Structured Database Format.

P Watson: Calculating the Knowledge-Based Similarity and Complementarity of Functional Groups based on their Non-Bonded Interactions.

Hyo Sook Lee: Automatic text processing for Korean language free text retrieval.

Eleanor Gardiner: Computational Analysis of Protein Binding Sites and Surfaces.

Nega Alemayehu Lakew: Development of a stemming algorithm for Amharic language text retrieval.

Miguel Nunes: The Experiential Dual Layer Model (EDLM): a Conceptual Model Integrating a Constructivist Theoretical Approach to Academic Learning with the Process of Hypermedia Design.

S S Ranade: The Application of Cluster Analysis to Predicting the Cellular Uptake of Foreign Chemicals.

M J Bayley: The Development of a Genetic Algorithm for the Calculation of Three-Dimensional Protein Structures from NMR Data.

C M R Ginn: The Application of Data Fusion to Similarity Searching of Chemical Databases.

F C Ekmekcioglu: Language Processing Techniques for the Implementation of a Document Retrieval System for Turkish Text Databases.

P M Wright: Electrostatic Field Similarity Searching in Databases of Three-Dimensional Conformationally Flexible Chemical Structures.

D B Turner: An Evaluation of a Novel Molecular Descriptor (EVA) for QSAR Studies and the Similarity Searching of Chemical Structure Databases.

D A Thorner: Electrostatic Field Searching in Databases of Three-Dimensional Chemical Structures.

Peter Bath: Similarity Searching in the Cambridge Structural Database.

J Furner-Hines: The Measurement of Inter-Linker Consistency and Retrieval Effectiveness in Manually-Constructed Hypertext Databases.

D J Wild: Structural and Electrostatic Similarity Searching in Three-Dimensional Chemical Databases using Genetic Algorithms and Parallel Computers.

G Jones: Genetic Algorithms for Chemical Structure Handling and Molecular Recognition.

G S Gill: The Automatic Identification of Topological Pharmacophores.

R de Vere: Novel Approaches to Full-Text Retrieval.

D E Clark: Representation and Searching of Conformationally Flexible Molecules.

R D Brown: A Hyperstructure Model for Chemical Structure Handling.

I G Bruno: The Use of Graph-Theoretic Techniques for the Representation and Searching of Structures in the Complex Carbohydrate Structure Database.

H M Grindley: The Use of Graph-Theoretical Techniques for Searching Databases of 3-D Protein Structures.

A R Poirrette: The Use of Angle-Based Fragments for Screening Databases of Three-Dimensional Chemical Structures.

A M Robertson: An Evaluation of Algorithmic Techniques for the Identification of Word Variants in Historical Text Databases.

E C Ujah: A Study of β-sheet Motifs at Different Levels of Structural Abstraction Using Graph-Theoretic and Dynamic Programming Techniques.

M Popovic: Implementation of a Slovene Language-Based Free-Text Retrieval System.

T Wilson: Implementation of Graph Matching Techniques in Chemical Databases Using a Single Instruction Stream, Multiple Data Stream Array Processor.

C A Pepperrell: An Investigation of Methods for Similarity Searching in Databases of Three-Dimensional Chemical Structures.

A Ormerod: Comparison of Fragment Weighting Schemes for Substructural Analysis.

J K Cringean: Text Retrieval from Bibliographic Databases using Transputer Networks.

S Al-Hawamdeh: Paragraph-Based Retrieval in Full-Text Documents.

S J Wade: The Implementation and Evaluation of some Experimental Statistical Techniques for Text Retrieval.

E Rasmussen: Cluster Analysis on a Highly Parallel Array Processor.

E M Mitchell: Protein Secondary and Tertiary Structure Searching in Files of 3-D Atomic Co-ordinates.

A El-Hamdouchi: Using Inter-Document Relationships in Information Retrieval.

K C Mohan: Choice of Retrieval Techniques for a Multi-Strategy Retrieval System.

A J Brint: Matching Algorithms for Handling Three Dimensional Molecular Co-ordinate Data.

C A Pogue: An Evaluation of Parallel Processing Techniques for Document Retrieval Systems.

V Winterman: Use of Automatic Classification Techniques in Drug Development Programmes.


Research Projects

Open-Access Mega Journals and the Future of Scholarly Communication

Arts and Humanities Research Council Investigator £421,465 2 November 2015 24 months

Open-access 'mega-journals' are an emerging publishing trend which has the potential to reshape the way researchers share their findings, remoulding the academic publishing market and radically changing the nature and reach of scholarship. This project will investigate the influence of mega-journals in the academic community and beyond.


Bio-renewable Formulation - 6 month extension

Unilever Investigator £49,171 1 February 2015 12 months


Bio-renewable Formulation Information and Knowledge Management System

Technology Strategy Board Investigator £24,992 1 April 2014 24 months

Innovative ICT can play a crucial role in many innovation processes, but its potential is not always exploited in many industries. A route to innovation in formulated product industries is the exploitation of materials that would otherwise be lost to waste streams from current manufacturing processes. This is exciting both in terms of realising additional value from manufacturing and in terms of reducing the use of unsustainable material sources and exploiting novel feedstocks for novel functional materials with new application benefits. This project will develop an information system based on highly innovative information technologies with the capability to rapidly identify the feedstock and functional material opportunities for formulated products, and demonstrate its value in rapid bio-derived surfactant discovery. It aims to support chemical-using industries where environmental impact, sustainability and materials security are increasingly significant drivers of innovation alongside improved performance in formulated products. Project partners are Unilever, British Sugar, Croda, Cybula, University of Manchester and University of Liverpool.


N8 Biohub Information and Knowledge Management System

Technology Strategy Board Investigator £131,128 1 October 2013 28 months

The overall aim of this project is to build, and demonstrate the value of, an information system (IS) to support the creation of a "Bio-Hub" centred on the N8 university group. The IS will demonstrate how functional ingredients from simple transformations of sustainable plant and waste feedstocks can be identified more quickly, and will recommend the best feedstocks for a particular function. It will address two big data problems using clever algorithms: semantic extraction of the available domain literature (terabytes) and optimised global search algorithms to explore the combinatorially large number of transformation products (up to petabytes). The innovations are in the creation of algorithms robust enough to run semi-autonomously in an information system and in bringing these together with all the other components. The value will be demonstrated for specific feedstocks and applications, but the ICTs will be selected for simple extension to, and maintenance of, the overall information domain. Project partners are Unilever, British Sugar, Croda, Cybula and University of Manchester.


AstraZeneca Collaboration - Pharmacophores

AstraZeneca Investigator £74,512 1 January 2007 12 months

The project involved the development of a new multiobjective optimisation method for pharmacophore identification from sets of active compounds. A pharmacophore describes the three-dimensional arrangement of chemical features required for a small molecule to bind to a receptor and the aim of this project was to deduce the pharmacophore from a series of active compounds in the absence of the structure of the receptor itself. This involves superposing the compounds so that their common features are overlaid.


Sanofi-Sheffield collaboration

Sanofi-Aventis Principal Investigator £92,421 1 January 2007 12 months


Array design for lead optimisation in pharmaceutical research

GlaxoSmithKline Investigator £252,000 23 October 2006 48 months

This EPSRC-funded project focused on the development of tools to assist medicinal chemists in the design of compound arrays during the lead optimisation stage of drug discovery. Lead optimisation is a complex, time-consuming task, in which chemists seek to obtain a promising balance among potency, off-target interactions, toxicity, and pharmacokinetic behaviour, to identify a candidate molecule to progress to clinical trials. The focus has been on inverse QSAR, that is, determining the structural change necessary to achieve a desired change in property. This has been approached through retrospective studies of lead optimisation projects within the GSK archive and the development of computational tools that can be applied in prospective array design to inform decision making by chemists. These included a novel context-sensitive approach to matched molecular-pairs analysis.


Vector Analysis of 2D Fingerprints and Screening Data

Xention Limited Principal Investigator £13,000 1 July 2005 2 months


Support tools for automatic pharmacophore generation

Pfizer Investigator £71,467 1 March 2004 22 months


Johnson and Johnson PhD

Janssen Pharmaceuticals N.V. Principal Investigator £76,305 1 January 2004 36 months


Richmond Continuation

Tripos Principal Investigator £14,800 1 January 2004 3 months


Mining molecular bioassay data

Pfizer Principal Investigator £125,016 1 January 2003 24 months


Cheminformatics methods for HTS and profiling data analysis

Novartis Principal Investigator £50,001 1 December 2002 36 months

PhD studentship


Development of novel methods for protein surface representation and comparison

Biotechnology and Biological Sciences Research Council Investigator £151,376 1 December 2001 24 months


Generation of 3D Hyperstructures

Tripos Principal Investigator £106,000 1 December 2001 24 months


Use of graph-theoretical methods in computational chemistry for pattern identification

Medical Research Council Investigator £49,392 1 May 2001 36 months


Probabilistic prediction of bioactivity

Zeneca Pharmaceuticals Principal Investigator £170,251 1 March 2001 36 months


Genome Analysis Using DNA Structure

Biotechnology and Biological Sciences Research Council Investigator £134,164 1 January 2001 36 months


Discrete Mathematical Approaches to Chemical Information Retrieval

Parke Davis Neuroscience Research Centre Principal Investigator £31,396 1 October 2000 36 months