Michael F. Lynch: an appreciation

Prof Peter Willett
Information School
University of Sheffield

This appreciation was prepared for an issue of the Journal of Documentation in 1998 to honour Michael (Mike) Felix Lynch, who had recently retired from the Department of Information Studies at the University of Sheffield.

Mike obtained B.Sc. and Ph.D. degrees in chemistry from University College, Dublin in 1954 and 1957, respectively, and followed this with post-doctoral research at the Swiss Federal Institute of Technology in Zurich. Following two years in industry in the UK, he joined the staff of Chemical Abstracts Service (CAS) in Columbus, Ohio in the USA, in 1961. He spent four years at CAS at a time when it was carrying out some of the earliest large-scale experiments on the use of computers for the production of both textual and chemical databases [1, 2]. Mike played a prominent rôle in these experiments, which resulted in the design and implementation of the first version of the CAS Registry System, a chemical and textual database that now contains machine-readable records for over 14 million chemical compounds. He returned to the UK in 1965 when he took up a position at what was then the Postgraduate School of Librarianship at the University of Sheffield. The University awarded him a Personal Chair in 1975 and he remained there for the remainder of his professional career. During his time in Sheffield, Mike made significant contributions to the theory and practice of information science. It is these contributions that we honour in this issue of the journal. The focus is on his research, but we must also remember the personal qualities that have made working with him so pleasurable and valuable. Whether giving advice, encouragement and enthusiasm to a colleague, or helping a student experiencing academic or personal difficulty, his seemingly inexhaustible generosity and humanity have formed an indelible impression, not just in this department but in the library and information profession as a whole.

Mike's first area of research on arriving in Sheffield was the development of automatic methods for the production of articulated subject indexes, such as are used in the Chemical Abstracts subject indices. The initial experiments [3] led to the development of an operational software package that was successfully used by several commercial organisations [4]. This period also saw the start of work on the development of automatic methods for the indexing, storage and retrieval of chemical reactions [5]. Particular attention was given to the development of algorithms that could compare the sets of reactant and product molecules to identify those areas which had been changed in the course of the reaction. This proved to be an extremely refractory area, and work continued on it for over a decade before an efficient and effective graph matching procedure was identified [6] that, with further development, has formed the basis for the many public and in-house reaction retrieval systems that are now available [7].

The late sixties saw the start of research into the selection of fragment screens (small atom-centred, bond-centred and ring-centred patterns of atoms and bonds) for chemical substructure searching [8]. The work involved a detailed analysis of the frequencies of occurrence of various types of algorithmic fragment [9]. This statistical information then formed the input to a screen selection procedure that resulted in the selection of a set of screens that occurred approximately equifrequently in the file and that could be used to reduce the computational requirements of the time-consuming graph-matching operations that lie at the heart of chemical substructure searching. Further studies evaluated the effectiveness of substructure search systems based upon the resulting screen sets, the statistical independence of screen assignments, and the relationship between query and structure characteristics, inter alia [10]. The frequency analysis and algorithmic fragment generation procedures developed in Sheffield form the basis for nearly all current systems for substructure searching in databases of two-dimensional chemical structure diagrams, such as CAS Online [11], and related procedures have subsequently been shown to be applicable to searching files for which three-dimensional atomic co-ordinate data are available [12].
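The role of screens as a cheap filter ahead of graph matching can be illustrated with a minimal sketch. It is not taken from the Sheffield work or from any operational system: the fragment labels, the function names `screen_bits` and `screen_candidates`, and the representation of each structure as a set of fragment labels are all assumptions made for the example.

```python
def screen_bits(fragments, screens):
    """Encode which of the chosen screens occur in a structure as a bit mask."""
    bits = 0
    for position, screen in enumerate(screens):
        if screen in fragments:
            bits |= 1 << position
    return bits


def screen_candidates(query_fragments, file_fragments, screens):
    """Return names of structures whose screens include every query screen.

    Only these candidates would need to go on to the expensive
    graph-matching stage; the rest are eliminated by the bit test.
    """
    query_mask = screen_bits(query_fragments, screens)
    return [
        name
        for name, fragments in file_fragments.items()
        if (screen_bits(fragments, screens) & query_mask) == query_mask
    ]


# Hypothetical fragment labels, used purely for illustration.
screens = ["C=O", "C-N", "ring6"]
structure_file = {
    "A": {"C=O", "C-N"},
    "B": {"ring6"},
    "C": {"C=O", "C-N", "ring6"},
}
print(screen_candidates({"C=O", "C-N"}, structure_file, screens))
```

A screen that occurs in nearly every structure, or in almost none, eliminates few candidates on average, which is one way to see why approximately equifrequent screens are attractive.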

For much of the seventies, Mike's main interest was the application of the screen-set techniques to textual databases. Methods were developed for the identification, and subsequent utilisation, of character substrings occurring approximately equifrequently in a range of types of text. Following an initial study of the frequencies of occurrence of n-grams (substrings containing n adjacent characters) [13], it was shown that carefully selected sets of n-grams could be developed for the efficient implementation of a range of tasks, including text compression, search codes for online catalogues, sorting, and searching in serial text files [14].
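The idea of counting n-grams and admitting longer strings only where they are frequent can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of [13] or [14]: the threshold rule, the function names `ngram_counts` and `select_symbols`, and the `max_n` limit are hypothetical, and the counts are not recomputed after a longer string has been selected.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every substring of n adjacent characters in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def select_symbols(text, threshold, max_n=3):
    """Build a symbol set of single characters plus frequent longer n-grams.

    Every single character is kept so that any text can still be encoded;
    an n-gram with n >= 2 is added only if it occurs at least `threshold`
    times, so common character sequences are represented by longer strings.
    """
    symbols = dict(ngram_counts(text, 1))
    for n in range(2, max_n + 1):
        for gram, count in ngram_counts(text, n).items():
            if count >= threshold:
                symbols[gram] = count
    return symbols


symbols = select_symbols("banana", threshold=2)
print(sorted(symbols, key=len))
```

In a fuller treatment the frequencies of the shorter strings would be reduced by the occurrences absorbed into the longer ones, which is what moves the selected set towards equal frequencies.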

The principal focus of interest in the second half of Mike's career was the storage and retrieval of generic structures, the partially defined molecules that occur in chemical patents [15]. This pioneering work, which spanned some fifteen years, resulted in an input language and a machine-readable representation that can be used for the formal and explicit description of generic structures, algorithmic procedures for the assignment of fragments to generic structures, and a range of retrieval mechanisms to allow efficient and flexible searching of files of generic structures [16]. Many of the principles first enunciated during this research have since been embodied in operational systems for the storage and retrieval of generic chemical structures [17]. Most recently, the need to encompass both the chemical and the textual components of chemical patents has led to an interest in methods for information extraction from natural language patent descriptions [18].

His achievements have been widely recognised. In 1977, he was awarded the prize for the best paper in the Journal of the American Society for Information Science (reference [14] in the list below) and in 1980, he received the annual Award of the Institute of Information Scientists in recognition of his services to information science. In 1989, he was awarded the Skolnik Award of the American Chemical Society, which is made annually to recognise outstanding contributions to the theory and practice of chemical information science. He was the President of the Institute of Information Scientists for the year 1995-96, and is the Honorary President of the Chemical Structure Association.

  1. Dyson, G.M. and Lynch, M.F. Chemical-Biological Activities - a computer-produced express digest. Journal of Chemical Documentation, 3, 1963, 81-85.
  2. Cossum, W.E., Krakiwsky, M.L. and Lynch, M.F. Advances in automatic chemical substructure searching techniques. Journal of Chemical Documentation, 5, 1965, 33-35.
  3. Armitage, J.E. and Lynch, M.F. Articulation in the generation of subject indexes by computer. Journal of Documentation, 7, 1967, 170-178.
  4. Lynch, M.F. and Petrie, J.H. A program suite for the production of articulated subject indexes. Computer Journal, 16, 1973, 46-51.
  5. Armitage, J.E. and Lynch, M.F. Automatic detection of structural similarities among chemical compounds. Journal of the Chemical Society (C), 1967, 521-528.
  6. Lynch, M.F. and Willett, P. The automatic detection of chemical reaction sites. Journal of Chemical Information and Computer Sciences, 18, 1978, 154-159.
  7. Barth, A. Status and future developments of reaction databases and online retrieval systems. Journal of Chemical Information and Computer Sciences, 30, 1990, 384-393.
  8. Barnard, J.M. Substructure searching methods: old and new. Journal of Chemical Information and Computer Sciences, 33, 1993, 532-538.
  9. Adamson, G.W., Cowell, J., Lynch, M.F., McLure, A.H.W., Town, W.G. and Yapp, A.M. Strategic considerations in the design of screening systems for substructure searches of chemical structure files. Journal of Chemical Documentation, 13, 1973, 153-157.
  10. Lynch, M.F. Screening large chemical files. In: Ash, J.E. and Hyde, E. eds. Chemical information systems. Chichester: Ellis Horwood, 1974, 177-194.
  11. Dittmar, P.G., Farmer, N.A., Fisanick, W., Haines, R.C. and Mockus, J. The CAS Online search system. I. General system design and selection, generation and use of search screens. Journal of Chemical Information and Computer Sciences, 23, 1983, 93-102.
  12. Willett, P. Searching for pharmacophoric patterns in databases of three-dimensional chemical structures. Journal of Molecular Recognition, 8, 1995, 290-303.
  13. Clare, A.C., Cook, E.M. and Lynch, M.F. The identification of variable-length character strings in a natural language database. Computer Journal, 15, 1972, 259-262.
  14. Lynch, M.F. Variety generation - a reinterpretation of Shannon's mathematical theory of communication and its implications for information science. Journal of the American Society for Information Science, 28, 1977, 19-25.
  15. Barnard, J.M., ed. Computer handling of generic chemical structures. Aldershot: Gower, 1984.
  16. Lynch, M.F. and Holliday, J.D. The Sheffield Generic Structures Project - a retrospective review. Journal of Chemical Information and Computer Sciences, 36, 1996, 930-936.
  17. Lynch, M.F. and Downs, G.M. Chemical patent database systems. In: Ash, J.E., Warr, W.A. and Willett, P., eds. Chemical structure systems. Chichester: Ellis Horwood, 1991, 126-153.
  18. Lawson, M., Kemp, N.M., Lynch, M.F. and Chowdhury, G.G. Automatic extraction of citations from the text of English-language patents - an example of template mining. Journal of Information Science, 22, 1996, 423-436.

This is an amended version of the appreciation prepared for: Journal of Documentation, 54, 1998, 1-14.