Professor Thomas Hain

Department of Computer Science

Professor of Speech and Audio Technology

Director of UKRI CDT in Speech and Language Technologies

Director of Centre for Speech and Language Technology (VoiceBase)

Head of the Speech and Hearing (SpandH) research group

t.hain@sheffield.ac.uk
+44 114 222 1836

Full contact details

Professor Thomas Hain
Department of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP

Profile

Thomas Hain obtained the degree 'Dipl.-Ing.' in Electrical/Communication Engineering from the University of Technology, Vienna, in 1994. He joined the Speech Technology Group at Philips Speech Processing, which he left in a senior position.

In 1997 he joined the Speech, Vision and Robotics (SVR) Group at the Cambridge University Engineering Department as a Research Associate and PhD student. He took up a Lectureship in the SVR Group in 2001.

In 2004 he joined the Speech and Hearing Group as a Lecturer in Computer Science. He was promoted to Senior Lecturer in 2008 and to Reader in 2011.

Research interests

Thomas' research interests cover many areas in natural language processing, speech, audio and multimedia technology, machine learning, and complex system optimisation and design.

His interests include: large vocabulary continuous speech recognition, non-linear methods in speech processing, low bit-rate speech coding, machine learning, multi-modal systems, image classification, microphone arrays, system and resource optimisation.

Publications

Books

  • Young SJ, Evermann G, Gales MJF, Hain T, Kershaw D, Moore GL, Odell JJ, Ollason D, Povey D, Valtchev V & Woodland PC (2004) The HTK Book. Cambridge, England: Cambridge University Engineering Department.
  • Young S, Evermann G, Gales M, Hain T, Kershaw D, Xunying L, Moore G, Odell J, Ollason D, Povey D, Ragni A et al () The HTK Book (for HTK Version 3.5, documentation alpha version). Cambridge University Engineering Department.

Chapters

  • Saenz JAL & Hain T (2021) Use of Speaker Metadata for Improving Automatic Pronunciation Assessment, Statistical Language and Speech Processing (pp. 61-72).
  • Hain T & Garner PN (2012) Speech Recognition In Carletta J, Renals S & Bourlard H (Ed.), Multimodal Signal Processing: Human Interactions in Meetings (pp. 56-83). Cambridge: Cambridge University Press.
  • Moore D, Dines J, Doss MM, Vepa J, Cheng O & Hain T (2006) Juicer: A weighted finite-state transducer speech decoder (pp. 285-296).
  • Carletta J, Ashby S, Bourban S, Guillemot M, Kronenthal M, Lathoud G, Lincoln M, McCowan I, Hain T, Kraaij W, Post W et al (2005) The AMI Meeting Corpus: A Pre-announcement, Machine Learning for Multimodal Interaction, Lecture Notes in Computer Science (pp. 28-39). Edinburgh: Springer.
  • Moore RK (2003) Speech recognition In Frawley W & Bright W (Ed.), International encyclopedia of linguistics.
  • Renals S & Hain T () Speech Recognition, The Handbook of Computational Linguistics and Natural Language Processing (pp. 297-332). Wiley-Blackwell.

Reports

  • el Hannani A & Hain T (2011) Data Dependence of Speech Decoder Parameters
  • Gibson M & Hain T (2011) Confidence-informed unsupervised Minimum Bayes Risk acoustic model adaptation
  • Hain T, Dines J & McCowan I (2006) Conversational multi-party speech recognition using remote microphones
  • Hain T, Woodland PC, Evermann G, Liu X, Moore GL, Povey D & Wang L (2003) Automatic Transcription of Conversational Telephone Speech. Development of the CU-HTK 2002 System

Theses / Dissertations

  • Hain T (2001) Hidden Model Sequence Models for Automatic Speech Recognition.
  • Hain T (1993) On the Use of Iterated Function Systems for Coding of Grayscale Images.

Grants

Current grants

  • Automatic voice conversion for transforming professional adult voice actors to artificial child voice actors, Innovate UK, 01/2021 to 01/2023, £173,605, as PI
  • UKRI Centre for Doctoral Training in Speech and Language Technologies and their Applications, EPSRC, 04/2019 to 09/2027, £5,508,850, as PI
  • VoiceBase Centre, VoiceBase Inc., 04/2018 to 03/2022, £1,499,972, as PI
  • WFST-based integration of ASR and MT in Spoken Language Translation, Google, 03/2014 to 12/2022, £63,588, as PI

Previous grants

  • MAUDIE: Multimedia Analysis for Unsupervised Dubbing In Entertainment, Innovate UK, 05/2018 to 07/2021, £393,115, as PI
  • TUTO II: Reading skills tutoring system, ITSLANGUAGE BV, 08/2017 to 12/2019, £121,439, as PI
  • Sound Source Separation Based on Deep Learning, Industrial, 05/2019 to 04/2020, £48,000, as PI
  • Acoustic correlates of emotions for automatic recognition, Industrial, 10/2018 to 09/2019, £48,900, as PI
  • Bridge Project, VoiceBase Inc., 09/2017 to 03/2018, £61,200, as PI
  • STATUS IV: Speech Technology and Translation Universal Survey, Defence Science and Technology Laboratory, 01/2017 to 10/2017, £60,000, as PI
  • TUTO: Reading skills tutoring system, ITSLANGUAGE BV, 09/2016 to 08/2017, £61,983, as PI
  • STATUS III: Speech Technology and Translation Universal Survey, Defence Science and Technology Laboratory, 01/2015 to 07/2016, £78,684, as PI
  • STATUS II: Speech Technology and Translation Universal Survey, Defence Science and Technology Laboratory, 11/2013 to 05/2014, £98,982, as PI
  • ItsLanguage, ITSLANGUAGE BV, 11/2012 to 03/2015, £68,333, as PI
  • German System Adaptation, ITSLANGUAGE BV, 11/2012 to 03/2015, £42,373, as PI
  • DocuMeet: Transcription, summarisation and documentation of meetings using advanced speech technologies, indexing and browsing capabilities, European Commission - FP7, 11/2012 to 10/2014, £368,433, as PI
  • STATUS: Speech Technology and Translation Universal Survey, Defence Science and Technology Laboratory, 10/2012 to 08/2013, £73,726, as PI
  • A Joint Model of Spoken Language Translation, Google, 09/2011 to 12/2016, £43,014, as PI
  • Natural Speech Technology, EPSRC, 05/2011 to 07/2016, £1,798,665, as PI
  • Unsupervised Domain Adaptation, CISCO, 11/2010 to 04/2012, £121,745, as PI
  • AMIDA: Augmented Multi-party Interaction with Distance Access, European Commission - FP6, 10/2006 to 12/2009, £467,074, as PI
  • AMIDA: Augmented Multi-party Interaction with Distance Access, European Commission - FP6, 10/2006 to 12/2009, £345,350, as PI

Professional activities