Professor Thomas Hain

School of Computer Science

Professor of Speech and Audio Technology

Director of CDT in Speech and Language Technologies

Director of Liveperson Centre

Member of the Speech and Hearing (SpandH) research group

t.hain@sheffield.ac.uk

Regent Court (CS)

Full contact details

Professor Thomas Hain
School of Computer Science
Regent Court (CS)
211 Portobello
Sheffield
S1 4DP

Profile

Thomas Hain obtained the degree 'Dipl.-Ing' in Electrical/Communication Engineering in 1994 from the University of Technology, Vienna. He joined the Speech Technology Group at Philips Speech Processing, which he left in a senior position.

In 1997 he joined the Speech, Vision and Robotics Group at the Cambridge University Engineering Department as Research Associate and PhD Student. He took up a Lectureship at the SVR group in 2001.

In 2004 he joined the Speech and Hearing Group to work as Lecturer in Computer Science. He was promoted to Senior Lecturer in 2008 and Reader in 2011.

Research interests: Thomas' research covers many areas of natural language processing, speech, audio and multimedia technology, machine learning, and complex system optimisation and design.

His interests include: large vocabulary continuous speech recognition, non-linear methods in speech processing, low bit-rate speech coding, machine learning, multi-modal systems, image classification, microphone arrays, system and resource optimisation.

Publications

Books

Young SJ, Evermann G, Gales MJF, Hain T, Kershaw D, Moore GL, Odell JJ, Ollason D, Povey D, Valtchev V & Woodland PC (2004) The HTK Book. Cambridge, England: Cambridge University Engineering Department.
Young S, Evermann G, Gales M, Hain T, Kershaw D, Xunying L, Moore G, Odell J, Ollason D, Povey D, Ragni A et al () The HTK Book (for HTK Version 3.5, documentation alpha version). Cambridge University Engineering Department: Cambridge University Engineering Department.

Journal articles

Sudro PN, Ragni A & Hain T (2025) A comparative study of generative models for child voice conversion. CoRR, abs/2512.12129.
Farooq MU & Hain T (2025) Enhancing Low-Resource Speech Recognition With Non-Linear Cross-Lingual Mappings. IEEE Transactions on Audio, Speech and Language Processing, 33, 4653-4666.
Song H, Zhang L, Gao M, Zhang H, Hain T & Shan L (2025) MS-EmoBoost: a novel strategy for enhancing self-supervised speech emotion representations. Scientific Reports, 15(1).
Hasan M, Jefferson N, Hain T & Dawson J (2022) Automatic detection of behavioural codes in team interactions. Computer Speech & Language, 74, 101339-101339.
Ravenscroft W, Goetze S & Hain T (2022) Att-TasNet: attending to encodings in time-domain audio speech separation of noisy, reverberant speech mixtures. Frontiers in Signal Processing, 2.
Shi Y, Huang Q & Hain T (2021) H-VECTORS: improving the robustness in utterance-level speaker embeddings using a hierarchical attention model. Neural Networks, 142, 329-339.
El Hannani A, Errattahi R, Salmam FZ, Hain T & Ouahmane H (2021) Evaluation of the effectiveness and efficiency of state-of-the-art features and models for automatic speech recognition error detection. Journal of Big Data, 8.
Errattahia R, Hannani AEL, Hain T & Ouahmane H (2019) System-independent ASR error detection and classification using Recurrent Neural Network. Computer Speech and Language, 55, 187-199.
Deena S, Hasan M, Doulaty M, Saz O & Hain T (2019) Recurrent neural network language model adaptation for multi-genre broadcast speech recognition and alignment. IEEE/ACM Transactions on Audio, Speech and Language Processing, 27(3), 572-582.
Saz Torralba O, Deena S, Doulaty M, Hasan M, Khaliq B, Milner R, Ng RWM, Olcoz J & Hain T (2018) Lightly supervised alignment of subtitles on multi-genre broadcasts. Multimedia Systems, 77(23), 30533-30550.
Ng W, Nicolao M & Hain T (2017) Unsupervised crosslingual adaptation of tokenisers for spoken language recognition. Computer Speech and Language, 46, 327-342.
Saz O & Hain T (2017) Acoustic Adaptation to Dynamic Background Conditions with Asynchronous Transformations. Computer Speech & Language, 41, 180-194.
Kamper H, De Wet F, Hain T & Niesler T (2014) Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system. Computer Speech and Language, 28(6), 1255-1268.
Fox C & Hain T (2013) Lightly supervised learning from a damaged natural speech corpus. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 8086-8090.
Gibson M & Hain T (2012) Application of SVM-based correctness predictions to unsupervised discriminative speaker adaptation. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 4341-4344.
Lecorvé G, Dines J, Hain T & Motlicek P (2012) Supervised and unsupervised Web-based language model domain adaptation. 13th Annual Conference of the International Speech Communication Association 2012 Interspeech 2012, 1, 182-185.
Gibson M & Hain T (2012) Correctness-adjusted unsupervised discriminative acoustic model adaptation. IEEE Transactions on Audio, Speech and Language Processing, PP(99).
Furui S, Fiscus J, Friedland G & Hain T (2012) Introduction to the Special Section on New Frontiers in Rich Transcription. IEEE Transactions on Audio, Speech and Language Processing, 20(2), 353-355.
Alharbi G & Hain T (2012) Automatic transcription of academic lectures from diverse disciplines. 2012 IEEE Workshop on Spoken Language Technology SLT 2012 Proceedings, 398-403.
Zhou Yan, Grygorash O & Hain TF (2011) Clustering with minimum spanning trees. International Journal on Artificial Intelligence Tools, 20(01), 139-177.
Hain T, Burget L, Dines J, Garner PN, Grezl F, el Hannani A, Huijbregts M, Karafiat M, Lincoln M & Wan V (2011) Transcribing meetings with the AMIDA systems. IEEE Transactions on Audio, Speech and Language Processing.
El Hannani A & Hain T (2010) Automatic Optimization of Speech Decoder Parameters. IEEE Signal Processing Letters, 17(1), 95-98.
Gibson M & Hain T (2010) Error approximation and minimum phone error acoustic model estimation. IEEE Transactions on Audio Speech and Language Processing, 18(6), 1269-1279.
Karafiát M, Burget L, Hain T & Černocký J (2008) Discriminative training of narrow band - wide band adapted systems for meeting recognition. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 1217-1220.
Hain T, El Hannani A, Wrigley SN & Wan V (2008) Automatic speech recognition for scientific purposes - WebASR. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 504-507.
Karafiát M, Burget L, Hain T & Černocký J (2008) Discriminative training of narrow band - wide band adapted systems for meeting recognition. INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association, 1217-1220.
Fife TD, Iverson DJ, Lempert T, Furman JM, Baloh RW, Tusa RJ, Hain TC, Herdman S, Morrow MJ & Gronseth GS (2008) Practice Parameter: Therapies for benign paroxysmal positional vertigo (an evidence-based review): [RETIRED]. Neurology, 70(22), 2067-2074.
Karafiát M, Burget L, Černocký J & Hain T (2007) Application of CMLLR in narrow band wide band adapted systems. International Speech Communication Association 8th Annual Conference of the International Speech Communication Association Interspeech 2007, 4, 2860-2863.
Renals S, Hain T & Bourlard H (2007) Recognition and understanding of meetings: the AMI and AMIDA projects. 2007 IEEE Workshop on Automatic Speech Recognition and Understanding ASRU 2007 Proceedings, 238-247.
Hain T, Burget L, Dines J, Garau G, Wan V, Karafiat M, Vepa J & Lincoln M (2007) The AMI system for the transcription of speech in meetings. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 4, IV357-IV360.
Wan V & Hain T (2006) Strategies for language model web-data collection. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 1, I1069-I1072.
Bauer JJ, Mittal J, Larson CR & Hain TC (2006) Vocal responses to unanticipated perturbations in voice loudness feedback: An automatic mechanism for stabilizing voice amplitude. The Journal of the Acoustical Society of America, 119(4), 2363-2371.
Hain T, Woodland PC, Evermann G, Gales MJF, Liu X, Moore GL, Povey D & Wang L (2006) Corrections to "Automatic Transcription of Conversational Telephone Speech". IEEE Transactions on Speech and Audio Processing, 14, 727-727.
Hain T, Woodland PC, Evermann G, Gales MJF, Liu XY, Moore GL, Povey D & Wang L (2005) Automatic transcription of conversational telephone speech. IEEE Transactions on Speech and Audio Processing, 13(6), 1173-1185.
Hain TC & Yacovino D (2005) Pharmacologic Treatment of Persons with Dizziness. Neurologic Clinics, 23(3), 831-853.
Gurses S, Dhaher Y, Hain TC & Keshner EA (2005) Perturbation parameters associated with nonlinear responses of the head at small amplitudes. Chaos: An Interdisciplinary Journal of Nonlinear Science, 15(2).
Hain T (2005) Implicit modelling of pronunciation variation in automatic speech recognition. Speech Communication, 46(2), 171-188.
Moon IS & Hain TC (2005) Delayed Quick Spins after Vestibular Nerve Section Respond to Anticonvulsant Therapy. Otology & Neurotology, 26(1), 82-85.
Yacovino DA & Hain TC (2004) Farmacología de las alteraciones vestibulares. Revista de Neurología, 39(04), 381-381.
Xu Y, Larson CR, Bauer JJ & Hain TC (2004) Compensation for pitch-shifted auditory feedback during the production of Mandarin tone sequences. The Journal of the Acoustical Society of America, 116(2), 1168-1178.
Squires TM, Weidman MS, Hain TC & Stone HA (2004) A mathematical model for top-shelf vertigo: the role of sedimenting otoconia in BPPV. Journal of Biomechanics, 37(8), 1137-1146.
Furman JM & Hain TC (2004) “Do try this at home”. Neurology, 63(1), 8-9.
Yacovino DA & Hain TC (2004) Vibración cervical: utilidad neurootológica. Revista de Neurología, 38(11), 1061-1061.
Brenner M, Hoistad DL & Hain TC (2004) Prevalence of Thyroid Dysfunction in Patients With Ménière's Disease. Archives of Otolaryngology–Head & Neck Surgery, 130(2), 226-226.
Hoistad DL & Hain TC (2003) Central Hearing Loss with a Bilateral Inferior Colliculus Lesion. Audiology and Neurotology, 8(2), 111-113.
Hain TC & Uddin M (2003) Pharmacological Treatment of Vertigo. CNS Drugs, 17(2), 85-100.
Simoneau M, Tinker S, Hain T & Lee W (2003) Effects of predictive mechanisms on head stability during forward trunk perturbation. Experimental Brain Research, 148(3), 338-349.
Mamikoglu B, Wiet RJ, Hain T & Check IJ (2002) Increased CD4+ T cells During Acute Attack of Ménière's Disease. Acta Oto-Laryngologica, 122(8), 857-860.
Chen KJ, Keshner EA, Peterson BW & Hain TC (2002) Modeling head tracking of visual targets. Journal of Vestibular Research, 12(1), 25-33.
Chiu B & Hain TC (2002) Periodic Alternating Nystagmus Provoked by an Attack of Ménière's Disease. Journal of Neuro-Ophthalmology, 22(2), 107-109.
Hain TC, Burnett TA, Larson CR & Kiran S (2001) Effects of delayed auditory feedback (DAF) on the pitch-shift reflex. The Journal of the Acoustical Society of America, 109(5), 2146-2152.
(2000) Assessment: Vestibular testing techniques in adults and children [RETIRED]. Neurology, 55(10), 1431-1441.
Hain TC (2000) Mal de debarquement: In reply. Archives of Otolaryngology–Head & Neck Surgery, 126(6), 805-806.
Larson CR, Burnett TA, Kiran S & Hain TC (2000) Effects of pitch-shift velocity on voice F0 responses. The Journal of the Acoustical Society of America, 107(1), 559-564.
Hain TC, Burnett TA, Kiran S, Larson CR, Singh S & Kenney MK (2000) Instructing subjects to make a voluntary response reveals the presence of two components to the audio-vocal reflex. Experimental Brain Research, 130(2), 133-141.
Keshner EA, Hain TC & Chen KJ (1999) Predicting control mechanisms for human head stabilization by altering the passive mechanics. Journal of Vestibular Research, 9(6), 423-434.
Peng GCY, Hain TC & Peterson BW (1999) Predicting vestibular, proprioceptive, and biomechanical control strategies in normal and pathological head movements. IEEE Transactions on Biomedical Engineering, 46(11), 1269-1280.
Riggs LC, Shofner WP, Shah AR, Young MR, Hain TC & Matz GJ (1999) Ototoxicity resulting from combined administration of metronidazole and gentamicin. Am J Otol, 20(4), 430-434.
Burnett TA, Freedland MB, Larson CR & Hain TC (1998) Voice F0 responses to manipulations in pitch feedback. The Journal of the Acoustical Society of America, 103(6), 3153-3161.
Hain TC & Ostrowski VB (1997) Limits of Normal for Pressure Sensitivity in the Fistula Test. Audiology and Neurotology, 2(6), 384-390.
Peng GCY, Hain TC & Peterson BW (1996) A dynamical model for reflex activated head movements in the horizontal plane. Biological Cybernetics, 75(4), 309-319.
Furman JM, Baloh RW, Hain TC, Hirsch BE, Parker SW, Ferguson JH, Altrocchi PH, Brin M, Goldstein ML, Gorelick PB, Hanley DF et al (1996) Assessment: Electronystagmography. Neurology, 46(6), 1763-1766.
Rogers MW, Hain TC, Hanke TA & Janssen I (1996) Stimulus parameters and inertial load: Effects on the incidence of protective stepping responses in healthy human subjects. Archives of Physical Medicine and Rehabilitation, 77(4), 363-368.
Young NM, Mets MB & Hain TC (1996) Early diagnosis of Usher syndrome in infants and children. Am J Otol, 17(1), 30-34.
Rascol O, Hain TC, Brefel C, Benazet M, Clanet M & Montastruc J-L (1995) Antivertigo Medications and Drug-Induced Vertigo. Drugs, 50(5), 777-791.
Young NM, Johnson JC, Mets MB & Hain TC (1995) Cochlear implants in young children with Usher's syndrome. Ann Otol Rhinol Laryngol Suppl, 166, 342-345.
Hain TC (1995) Treatment of vertigo. Neurologist, 1(3), 125-133.
Ward C, Choi CH & Hain TF (1995) A data link control protocol for LEO satellite networks providing a reliable datagram service. IEEE/ACM Transactions on Networking, 3(1), 91-103.
Young NM, Johnson JC, Mets MB & Hain TC (1995) Cochlear Implants in Young Children with Usher's Syndrome. Annals of Otology Rhinology & Laryngology, 104(9_suppl2), 342-345.
Harvey SA, Hain TC & Adamiec LC (1994) Modified liberatory maneuver: Effective treatment for benign paroxysmal positional vertigo. The Laryngoscope, 104(10), 1206-1212.
Ward C, Choi CH & Hain TF (1994) Performance of LAMS‐DLC: A protocol for low altitude satellite networks. International Journal of Satellite Communications, 12(6), 507-524.
Hain TC, Mattox D, Herdman SJ, Zee DS, Holliday M & Byskosh AT (1994) Localizing Value of Optokinetic Afternystagmus. Annals of Otology, Rhinology & Laryngology, 103(10), 806-811.
Herdman SJ, Sandusky AL, Hain TC, Zee DS & Tusa RJ (1994) Characteristics of postural stability in patients with aminoglycoside toxicity. J Vestib Res, 4(1), 71-80.
(1993) Assessment: posturography. Report of the Therapeutics and Technology Assessment Subcommittee of the American Academy of Neurology. Neurology, 43(6), 1261-1264.
Tusa RJ, Zee DS, Hain TC & Simonsz HJ (1992) Voluntary control of congenital nystagmus. Clinical Vision Sciences, 7(3), 195-210.
Zee DS & Hain TC (1992) Clinical implications of otolith-ocular reflexes. Am J Otol, 13(2), 152-157.
Hain TC & Patel G (1992) Slow Cumulative Eye Position to Quantify Optokinetic Afternystagmus. Annals of Otology, Rhinology & Laryngology, 101(3), 255-260.
Hain TC & Zee DS (1991) Abolition of Optokinetic Afternystagmus by Aminoglycoside Ototoxicity. Annals of Otology, Rhinology & Laryngology, 100(7), 580-583.
Ashe J, Hain TC, Zee DS & Schatz NJ (1991) Microsaccadic flutter. Brain, 114(1), 461-472.
Hain TC & Buettner UW (1990) Static roll and the vestibulo-ocular reflex (VOR). Experimental Brain Research, 82(3).
Hain TC & Luebke AE (1990) Phoria adaptation in patients with cerebellar dysfunction. Invest Ophthalmol Vis Sci, 31(7), 1394-1397.
Fletcher WA, Hain TC & Zee DS (1990) Optokinetic nystagmus and afternystagmus in human beings: relationship to nonlinear processing of information about retinal slip. Experimental Brain Research, 81(1).
Tusa RJ, Kaplan PW, Hain TC & Naidu S (1990) Ipsiversive eye deviation and epileptic nystagmus. Neurology, 40(4), 662-662.
Wei D, Hain TC & Proctor LR (1989) Head-shaking Nystagmus: Associations with Canal Paresis and Hearing Loss. Acta Oto-Laryngologica, 108(5-6), 362-367.
Tijssen MAJ, Hain TC, Straathof CSM & Zee DS (1989) Optokinetic Afternystagmus in Humans: Normal Values of Amplitude, Time Constant, and Asymmetry. Annals of Otology, Rhinology & Laryngology, 98(9), 741-746.
Furman JMR, Hain TC & Paige GD (1989) Central adaptation models of the vestibulo-ocular and optokinetic systems. Biological Cybernetics, 61(4), 255-264.
Hain TC & Zee DS (1989) Vergence. Bull Soc Belge Ophtalmol, 237, 145-161.
Hain TC, Zee DS & Maria BL (1988) Tilt Suppression of Vestibulo-ocular Reflex in Patients with Cerebellar Lesions. Acta Oto-Laryngologica, 105(1-2), 13-20.
Lasker AG, Zee DS, Hain TC, Folstein SE & Singer HS (1988) Saccades in Huntington's disease. Neurology, 38(3), 427-427.
Zee DS, Hain TC & Carl JR (1987) Abduction nystagmus in internuclear ophthalmoplegia. Annals of Neurology, 21(4), 383-388.
Lasker AG, Zee DS, Hain TC, Folstein SE & Singer HS (1987) Saccades in Huntington's disease. Neurology, 37(3), 364-364.
Hain TC, Fetter M & Zee DS (1987) Head-shaking nystagmus in patients with unilateral peripheral vestibular lesions. American Journal of Otolaryngology, 8(1), 36-47.
Kapoula Z, Hain TC, Zee DS & Robinson DA (1987) Adaptive changes in post-saccadic drift induced by patching one eye. Vision Research, 27(8), 1299-1307.
Hain TC, Zee DS & Mordes M (1986) Blink‐induced saccadic oscillations. Annals of Neurology, 19(3), 299-301.
Fetter M, Hain TC & Zee DS (1986) Influence of eye and head position on the vestibulo-ocular reflex. Experimental Brain Research, 64(1).
Hain TC (1986) A model of the nystagmus induced by off vertical axis rotation. Biological Cybernetics, 54(4-5), 337-350.
Kapoula ZA, Robinson DA & Hain TC (1986) Motion of the eye immediately after a saccade. Experimental Brain Research, 61(2).
Robinson DA, Zee DS, Hain TC, Holmes A & Rosenberg LF (1984) Alexander's law: Its behavior and origin in the human vestibulo‐ocular reflex. Annals of Neurology, 16(6), 714-722.
Wong RL, Reno JD, Hain TC, Platt RC, Gaynon PS & Joseph DM (1980) Profile of a dictionary compiled from scanning over one million words of surgical pathology narrative text. Computers and Biomedical Research, 13(4), 382-398.
Hain TF (1975) A solid-state “leaky” integrator. Nuclear Instruments and Methods, 127(1), 23-28.
Chang C, Hain TC, Hutton JR & Wetmur JG (1974) Effects of microscopic and macroscopic viscosity on the rate of renaturation of DNA. Biopolymers, 13(9), 1847-1858.

Book chapters

Close G, Hong K, Hain T & Goetze S (2026) WhiSQA: Non-intrusive Speech Quality Prediction Using Whisper Encoder Features, Lecture Notes in Computer Science (pp. 39-51). Springer Nature Switzerland
Saenz JAL & Hain T (2021) Use of Speaker Metadata for Improving Automatic Pronunciation Assessment, Lecture Notes in Computer Science (pp. 61-72). Springer International Publishing
(2016) Mal de débarquement syndrome, Handbook of Clinical Neurology (pp. 391-395). Elsevier
Hain T & Garner PN (2012) Speech recognition, Multimodal Signal Processing (pp. 56-83). Cambridge University Press
Renals S & Hain T (2010) Speech Recognition. In Clark A, Fox C & Lappin S (Eds.), The Handbook of Computational Linguistics and Natural Language Processing (pp. 299-332). Wiley-Blackwell
Moore D, Dines J, Doss MM, Vepa J, Cheng O & Hain T (2006) Juicer: A weighted finite-state transducer speech decoder (pp. 285-296).
Carletta J, Ashby S, Bourban S, Guillemot M, Kronenthal M, Lathoud G, Lincoln M, McCowan I, Hain T, Kraaij W, Post W et al (2005) The AMI Meeting Corpus: A Pre-announcement, Machine Learning for Multimodal Interaction, Lecture Notes in Computer Science (pp. 28-39). Edinburgh: Springer.
Kapoula Z, Robinson DA & Hain TC (1987) Dynamic overshoot and post-saccadic drift, Eye Movements from Physiology to Cognition (pp. 158-159). Elsevier

Conference proceedings

Meghanani A & Hain T (2026) Position-Invariant Fine-Tuning Of Speech Enhancement Models With Self-Supervised Speech Representations. ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 18177-18181), 3 May 2026 - 8 May 2026.
Cassini SR, Hain T & Ragni A (2025) Emphasis Sensitivity in Speech Representations. 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp 1-8)
Close G, Hong K, Hain T & Goetze S (2025) WhiSQA: Non-intrusive speech quality prediction using whisper encoder features. Speech and Computer, Vol. 16187(Part 1) (pp 39-51). Szeged, Hungary, 13 October 2025 - 13 October 2025.
Park C & Hain T (2025) Semi-supervised learning for automatic speech recognition with word error rate estimation and targeted domain data selection. Proceedings of Interspeech 2025 (pp 3663-3667). Rotterdam, The Netherlands, 17 August 2025 - 17 August 2025.
Park C, Lu C, Chen M & Hain T (2025) Fast word error rate estimation using self-supervised representations for speech and text. ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 1-5). Hyderabad, India, 6 April 2025 - 6 April 2025.
Do C-T, Imai S, Doddipatla R & Hain T (2024) Improving accented speech recognition using data augmentation based on unsupervised text-to-speech synthesis. 2024 32nd European Signal Processing Conference (EUSIPCO) (pp 136-140). Lyon, France, 26 August 2024 - 26 August 2024.
Park C, Kang H & Hain T (2024) Character error rate estimation for automatic speech recognition of short utterances. Proceedings of 2024 32nd European Signal Processing Conference (EUSIPCO) (pp 131-135). Lyon, France, 26 August 2024 - 26 August 2024.
Close G, Hain T & Goetze S (2024) Hallucination in perceptual metric-driven speech enhancement networks. Proceedings of 2024 32nd European Signal Processing Conference (EUSIPCO) (pp 21-25). Lyon, France, 26 August 2024 - 26 August 2024.
Sutherland R, Close G, Hain T, Goetze S & Barker J (2024) Using speech foundational models in loss functions for hearing aid speech enhancement. Proceedings of 2024 32nd European Signal Processing Conference (EUSIPCO) (pp 421-425). Lyon, France, 26 August 2024 - 26 August 2024.
Meghanani A & Hain T (2024) LASER: Learning by aligning self-supervised representations of speech for improving content-related tasks. Interspeech 2024 (pp 2835-2839). Kos, Greece, 1 September 2024 - 1 September 2024.
Ravenscroft W, Close G, Goetze S, Hain T, Soleymanpour M, Chowdhury A & Fuhs MC (2024) Transcription-free fine-tuning of speech separation models for noisy and reverberant multi-speaker automatic speech recognition. Proceedings of Interspeech 2024 (pp 4998-5002). Kos Island, Greece, 1 September 2024 - 1 September 2024.
Ma Z, Chen M, Zhang H, Zheng Z, Chen W, Li X, Ye J, Chen X & Hain T (2024) EmoBox: Multilingual multi-corpus speech emotion recognition toolkit and benchmark. Interspeech 2024 (pp 1580-1584). Kos Island, Greece, 1 September 2024 - 1 September 2024.
Mogridge R, Close G, Sutherland R, Hain T, Barker J, Goetze S & Ragni A (2024) Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users Using Intermediate ASR Features and Human Memory Models. ICASSP (pp 306-310)
Ahmad R, Farooq MU & Hain T (2024) Progressive unsupervised domain adaptation for ASR using ensemble models and multi-stage training. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 2024 (pp 11466-11470). Seoul, Korea, 14 April 2024 - 14 April 2024.
Meghanani A & Hain T (2024) SCORE: Self-supervised correspondence fine-tuning for improved content representations. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 2024 (pp 12086-12090). Seoul, Korea, 14 April 2024 - 14 April 2024.
Close G, Ravenscroft W, Hain T & Goetze S (2024) Multi-CMGAN+/+: leveraging multi-objective speech quality metric prediction for speech enhancement. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 2024 (pp 351-355). Seoul, Korea, 14 April 2024 - 14 April 2024.
Farooq MU, Ahmad R & Hain T (2024) MUST: A MUltilingual Student-Teacher learning approach for low-resource speech recognition. Proceedings of 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp 1-6). Taipei, Taiwan, 16 December 2023 - 16 December 2023.
Ravenscroft JW, Goetze S & Hain T (2024) On time domain conformer models for monaural speech separation in noisy reverberant acoustic environments. 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). Taipei, Taiwan, 16 December 2023 - 16 December 2023.
Meghanani A & Hain T (2024) Deriving translational acoustic sub-word embeddings. 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) Proceedings. Taipei, Taiwan, 16 December 2023 - 16 December 2023.
Islam E, Hain T & Nomo Sudro P (2024) Simulation of teacher-learner interaction in English language pronunciation learning. 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). Taipei, Taiwan, 16 December 2023 - 16 December 2023.
Park C, Chen M & Hain T (2024) Automatic speech recognition system-independent word error rate estimation. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp 1979-1987). Torino, Italy, 20 May 2024 - 20 May 2024.
Meghanani A & Hain T (2024) Improving acoustic word embeddings through correspondence training of self-supervised speech representations. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1 (pp 1959-1967). St. Julian’s, Malta, 17 March 2024 - 17 March 2024.
Iakovenko O & Hain T (2024) Methods of automatic matrix language determination for code-switched speech. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp 5791-5800). Miami, Florida, USA, 12 November 2024 - 12 November 2024.
Ravenscroft J, Goetze S & Hain T (2023) On data sampling strategies for training neural network speech separation models. 2023 31st European Signal Processing Conference (EUSIPCO). Helsinki, Finland, 4 September 2023 - 4 September 2023.
Nomo Sudro P, Ragni A & Hain T (2023) Adapting pretrained models for adult to child voice conversion. 2023 31st European Signal Processing Conference (EUSIPCO) Proceedings (pp 271-275). Helsinki, Finland, 4 September 2023 - 4 September 2023.
Ollerenshaw A, Jalal MA & Hain T (2023) Probing statistical representations for End-to-End ASR. 2023 31st European Signal Processing Conference (EUSIPCO) Proceedings (pp 401-405). Helsinki, Finland, 4 September 2023 - 4 September 2023.
Close G, Hain T & Goetze S (2023) The effect of spoken language on speech enhancement using self-supervised speech representation loss functions. Proceedings of 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, USA, 22 October 2023 - 22 October 2023.
Close GL, Ravenscroft W, Hain T & Goetze S (2023) The University of Sheffield CHiME-7 UDASE challenge speech enhancement system. Proc. 7th International Workshop on Speech Processing in Everyday Environments (CHiME 2023) (pp 33-38). Dublin, Ireland, 25 August 2023 - 25 August 2023.
Farooq MU & Hain T (2023) Learning cross-lingual mappings for data augmentation to improve low-resource speech recognition. Interspeech 2023 Proceedings (pp 5072-5076). Dublin, Ireland, 20 August 2023 - 20 August 2023.
Islam E, Park C & Hain T (2023) Exploring speech representations for proficiency assessment in language learning. 9th Workshop on Speech and Language Technology in Education (SLaTE) Proceedings (pp 151-155). Dublin, Ireland, 18 August 2023 - 18 August 2023.
Close G, Hain T & Goetze S (2023) PAMGAN+/-: Improving phase-aware speech enhancement performance via expanded discriminator training. AES Convention Europe 2023: 154th Audio Engineering Society Conference (pp 10656). Espoo, Helsinki, Finland, 13 May 2023 - 13 May 2023.
Ahmad R, Jalal MA, Umar Farooq M, Ollerenshaw A & Hain T (2023) Towards domain generalisation in ASR with elitist sampling and ensemble knowledge distillation. Proceedings of ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Rhodes Island, Greece, 4 June 2023 - 4 June 2023.
Ravenscroft W, Goetze S & Hain T (2023) Deformable temporal convolutional networks for monaural noisy reverberant speech separation. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Rhodes Island, Greece, 4 June 2023 - 4 June 2023.
Close G, Ravenscroft W, Hain T & Goetze S (2023) Perceive and predict: self-supervised speech representation based loss functions for speech enhancement. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Proceedings. Rhodes Island, Greece, 4 June 2023 - 4 June 2023.
Park B, Park C & Li G (2022) DRX mode implementation based on virtual machine. 2022 29th IEEE International Conference on Electronics, Circuits and Systems (ICECS) (pp 1-4), 24 October 2022 - 26 October 2022.
Ravenscroft W, Goetze S & Hain T (2022) Receptive field analysis of temporal convolutional networks for monaural speech dereverberation. Proceedings of 30th European Signal Processing Conference (EUSIPCO 2022) (pp 80-84). Belgrade, Serbia, 29 August 2022 - 29 August 2022.
Ollerenshaw A, Jalal MA & Hain T (2022) Insights of neural representations in multi-banded and multi-channel convolutional transformers for end-to-end ASR. Proceedings of 2022 30th European Signal Processing Conference (EUSIPCO) (pp 434-438). Belgrade, Serbia, 29 August 2022 - 29 August 2022.
Close G, Hain T & Goetze S (2022) MetricGAN+/-: increasing robustness of noise reduction on unseen data. Proceedings of 2022 30th European Signal Processing Conference (EUSIPCO) (pp 165-169). Belgrade, Serbia, 29 August 2022 - 29 August 2022.
Ravenscroft W, Goetze S & Hain T (2022) Utterance weighted multi-dilation temporal convolutional networks for monaural speech dereverberation. Proceedings of 2022 International Workshop on Acoustic Signal Enhancement (IWAENC). Bamberg, Germany, 5 September 2022 - 5 September 2022.
Farooq MU, Haniya Narayana DA & Hain T (2022) Non-linear pairwise language mappings for low-resource multilingual acoustic model fusion. Interspeech 2022 - 23rd Annual Conference of the International Speech Communication Association (pp 4850-4854). Incheon, Korea, 18 September 2022 - 18 September 2022.
Farooq MU & Hain T (2022) Investigating the impact of cross-lingual acoustic-phonetic similarities on multilingual speech recognition. Interspeech 2022 - 23rd Annual Conference of the International Speech Communication Association (pp 3849-3853). Incheon, Korea, 18 September 2022 - 18 September 2022.
Close G, Hollands S, Hain T & Goetze S (2022) Non-intrusive speech intelligibility estimated by metric prediction for hearing impaired individuals for the clarity prediction challenge 1. Interspeech 2022 - 23rd Annual Conference of the International Speech Communication Association (pp 3483-3487). Incheon, Korea, 18 September 2022 - 18 September 2022.
Lopez Saenz JA & Hain T (2022) A Model for Assessor Bias in Automatic Pronunciation Assessment. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 7267-7271), 23 May 2022 - 27 May 2022.
Park C, Ahmad R & Hain T (2022) Unsupervised data selection for speech recognition with contrastive loss ratios. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 8587-8591). Singapore, Singapore, 23 May 2022 - 23 May 2022. View this article in WRRO
Saenz JAL, Jalal MA, Milner R & Hain T (2021) Attention Based Model for Segmental Pronunciation Error Detection. 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp 725-732), 13 December 2021 - 17 December 2021.
Huang S, Chen M, Xu Y, Ke D & Hain T (2021) WINVC: one-shot voice conversion with weight adaptive instance normalization. PRICAI 2021: Trends in Artificial Intelligence 18th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2021, Hanoi, Vietnam, November 8–12, 2021, Proceedings, Part II (pp 559-573). Hanoi, Vietnam (virtual), 8 November 2021 - 8 November 2021. View this article in WRRO
Huang Q & Hain T (2021) Improving audio anomalies recognition using temporal convolutional attention networks. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 6473-6477). Toronto, ON, Canada, 6 June 2021 - 11 June 2021.
Chen M, Shi Y & Hain T (2021) Towards low-resource StarGAN voice conversion using weight adaptive instance normalization. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, ON, Canada, 6 June 2021 - 6 June 2021. View this article in WRRO
Do C-T, Doddipatla R & Hain T (2021) Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 2021 (pp 6978-6982). Toronto, Ontario, Canada, 6 June 2021 - 6 June 2021. View this article in WRRO
Shi Y & Hain T (2021) Supervised speaker embedding de-mixing in two-speaker environment. 2021 IEEE Spoken Language Technology Workshop (SLT) (pp 758-765). Shenzhen, China, 19 January 2021 - 22 January 2021.
Shi Y & Hain T (2021) Contextual joint factor acoustic embeddings. 2021 IEEE Spoken Language Technology Workshop (SLT) (pp 750-757). Shenzhen, China, 19 January 2021 - 19 January 2021. View this article in WRRO
Do C-T, Zhang S & Hain T (2021) Selective Adaptation of End-to-End Speech Recognition using Hybrid CTC/Attention Architecture for Noise Robustness. 2020 28th European Signal Processing Conference (EUSIPCO) (pp 321-325), 18 January 2021 - 21 January 2021.
Friedl K, Rizos G, Stappen L, Hasan M, Specia L, Hain T & Schuller B (2021) Uncertainty Aware Review Hallucination for Science Article Classification. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp 5004-5009), August 2021 - August 2021.
Ollerenshaw A, Jalal MA & Hain T (2021) Insights on neural representations for end-to-end speech recognition. Interspeech 2021 (pp 4079-4083). Brno, Czechia, 30 August 2021 - 30 August 2021. View this article in WRRO
Shi Y, Huang Q & Hain T (2020) Robust speaker recognition using speech enhancement and attention model. The Speaker and Language Recognition Workshop (Odyssey 2020) (pp 451-458). Tokyo, Japan, 1 November 2020 - 5 November 2020.
Chen M & Hain T (2020) Unsupervised acoustic unit representation learning for voice conversion using WaveNet auto-encoders. Interspeech 2020 (pp 4866-4870). Shanghai, China, 25 October 2020 - 25 October 2020. View this article in WRRO
Jalal MA, Milner R & Hain T (2020) Empirical interpretation of speech emotion perception with attention based model for speech emotion recognition. Proceedings of Interspeech 2020 (pp 4113-4117). Shanghai, China (Online), 25 October 2020 - 25 October 2020. View this article in WRRO
Jalal MA, Milner R, Hain T & Moore RK (2020) Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition. Interspeech 2020 (pp 4084-4088). Shanghai, China, 25 October 2020 - 29 October 2020.
Stappen L, Rizos G, Hasan M, Hain T & Schuller BW (2020) Uncertainty-aware machine support for paper reviewing on the Interspeech 2019 submission corpus. Interspeech 2020 (pp 1808-1812). Shanghai, China, 25 October 2020 - 25 October 2020. View this article in WRRO
Shi Y, Huang Q & Hain T (2020) Weakly supervised training of hierarchical attention networks for speaker identification. Proceedings of Interspeech 2020 (pp 2992-2996). Shanghai, China, 25 October 2020 - 29 October 2020.
Huang Q & Hain T (2020) Exploration of audio quality assessment and anomaly localisation using attention models. Proceedings of Interspeech 2020 (pp 4611-4615). Shanghai, China, 25 October 2020 - 29 October 2020.
Shi Y, Huang Q & Hain T (2020) Speaker re-identification with speaker dependent speech enhancement. Proceedings of Interspeech 2020 (pp 1530-1534). Shanghai, China, 25 October 2020 - 29 October 2020.
Shi Y, Huang Q & Hain T (2020) H-vectors: utterance-level speaker embedding using a hierarchical attention model. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 7579-7583). Barcelona, Spain (virtual), 4 May 2020 - 8 May 2020.
Sailor HB, Deena S, Jalal MA, Lileikyte R & Hain T (2019) Unsupervised Adaptation of Acoustic Models for ASR Using Utterance-Level Embeddings from Squeeze and Excitation Networks. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp 980-987), 14 December 2019 - 18 December 2019.
Jalal MA, Moore RK & Hain T (2019) Spatio-Temporal Context Modelling for Speech Emotion Classification. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp 853-859), 14 December 2019 - 18 December 2019.
Milner R, Jalal MA, Ng RWM & Hain T (2019) A Cross-Corpus Study on Speech Emotion Recognition. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp 304-311), 14 December 2019 - 18 December 2019.
Doulaty M & Hain T (2019) Latent Dirichlet Allocation Based Acoustic data selection for automatic speech recognition. Interspeech 2019 (pp 3228-3232). Graz, Austria, 15 September 2019 - 15 September 2019. View this article in WRRO
Jalal MA, Loweimi E, Moore RK & Hain T (2019) Learning temporal clusters using capsule routing for speech emotion recognition. Proceedings of Interspeech 2019 (pp 1701-1705). Graz, Austria, 15 September 2019 - 15 September 2019. View this article in WRRO
Hain T & Schuller B (2019) Message from the technical program chairs. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, Vol. 2019-September (pp 13-15)
Loweimi E, Barker JP & Hain T (2018) Exploring the use of group delay for generalised VTS based noise compensation. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings. Calgary, Alberta, Canada, 15 April 2018 - 15 April 2018. View this article in WRRO
Nicolao M, Sanders M & Hain T (2018) Improved acoustic modelling for automatic literacy assessment of children. Proceedings of Interspeech 2018 (pp 1666-1670). Hyderabad, India, 2 September 2018 - 2 September 2018. View this article in WRRO
Errattahi R, Deena S, El Hannani A, Ouahmane H & Hain T (2018) Improving ASR Error Detection with RNNLM Adaptation. 2018 IEEE Spoken Language Technology Workshop (SLT) (pp 190-196), 18 December 2018 - 21 December 2018.
Errattahi R, El Hannani A, Hain T & Ouahmane H (2018) Towards a generic approach for automatic speech recognition error detection and classification. 2018 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP) (pp 1-6), 21 March 2018 - 24 March 2018.
Loweimi E, Barker J & Hain T (2018) On the usefulness of the speech phase spectrum for pitch extraction. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2018-S (pp 696-700). Hyderabad, India, 2 September 2018 - 2 September 2018. View this article in WRRO
Deena S, Ng RWM, Madhyastha P, Specia L & Hain T (2018) Exploring the use of Acoustic Embeddings in Neural Machine Translation. Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop. Okinawa, Japan. View this article in WRRO
Deena S, Ng RWM, Madhyastha P, Specia L & Hain T (2017) Semi-supervised adaptation of RNNLMs by fine-tuning with domain-specific auxiliary features. Proceedings of INTERSPEECH 2017: Conference of the International Speech Communication Association (pp 2715-2719). Stockholm, 20 August 2017 - 20 August 2017. View this article in WRRO
Loweimi E, Barker J & Hain T (2017) Channel Compensation in the Generalised Vector Taylor Series Approach to Robust ASR. Interspeech 2017 (pp 2466-2470). Stockholm, 20 August 2017 - 20 August 2017. View this article in WRRO
Loweimi E, Barker J, Torralba OS & Hain T (2017) Robust Source-Filter Separation of Speech Signal in the Phase Domain. Proceedings of the Annual Conference of the International Speech Communication Association. Stockholm, 20 August 2017 - 20 August 2017. View this article in WRRO
Ng WM, Kwan ACM, Lee T & Hain T (2017) ShefCE: A Cantonese-English Bilingual Speech Corpus for Pronunciation Assessment. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing. New Orleans, USA. View this article in WRRO
Loweimi E, Barker J & Hain T (2017) Statistical normalisation of phase-based feature representation for robust speech recognition. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings (pp 5310-5314)
Milner R & Hain T (2017) DNN approach to speaker diarisation using speaker channels. Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing (pp 4925-4929)
Wu C, Ng RWM, Torralba OS & Hain T (2017) Analysing acoustic model changes for active learning in automatic speech recognition. International Conference on Systems, Signals and Image Processing (IWSSIP). Poznań, Poland, 22 May 2017 - 22 May 2017. View this article in WRRO
(2017) Interspeech 2017. Interspeech 2017
Saz O, Doulaty M, Deena S, Milner R, Ng RWM, Hasan M, Liu Y & Hain T (2016) The 2015 Sheffield system for transcription of Multi-Genre Broadcast media. 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 Proceedings (pp 624-631)
Olcoz J, Saz Torralba O & Hain T (2016) Error correction in lightly supervised alignment of broadcast subtitles. Proceedings of Interspeech 2016 (pp 2110-2114). San Francisco, CA, 8 September 2016 - 8 September 2016. View this article in WRRO
Hain T, Christian J, Saz O, Deena S, Hasan M, Ng RWM, Milner R, Doulaty M & Liu Y (2016) webASR 2 - Improved cloud based speech technology. Proceedings of Interspeech 2016. San Francisco, CA, 8 September 2016 - 8 September 2016. View this article in WRRO
Casanueva I, Hain T, Nicolao M & Green P (2016) Using phone features to improve dialogue state tracking generalisation to unseen states. Proceedings of SIGDIAL 2016. Los Angeles, USA, 13 September 2016 - 13 September 2016. View this article in WRRO
Loweimi E, Barker J & Hain T (2016) Use of generalised nonlinearity in Vector Taylor Series noise compensation for robust speech recognition. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, Vol. 08-12-September-2016 (pp 3798-3802)
Ng R, Hain T & Chettri B (2016) Combining weak tokenisers for phonotactic language recognition in a resource-constrained setting. Interspeech 2016 (pp 2939-2943), 9 September 2016 - 12 September 2016.
Deena S, Hasan M, Doulaty M, Saz O & Hain T (2016) Combining feature and model-based adaptation of RNNLMs for multi-genre broadcast speech recognition. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp 2343-2347). San Francisco, USA, 8 September 2016 - 8 September 2016. View this article in WRRO
Al-Shareef S & Hain T (2016) Colloquialising modern standard Arabic text for improved speech recognition. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp 1345-1349). San Francisco, USA, 8 September 2016 - 8 September 2016. View this article in WRRO
Liu Y, Fox C, Hasan M & Hain T (2016) The Sheffield Wargame Corpus - Day two and day three. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp 3833-3837). San Francisco, USA, 8 September 2016 - 8 September 2016. View this article in WRRO
Casanueva I, Hain T & Green P (2016) Improving generalisation to new speakers in spoken dialogue state tracking. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp 2726-2730). San Francisco, USA, 8 September 2016 - 8 September 2016. View this article in WRRO
Milner R & Hain T (2016) DNN-based speaker clustering for speaker diarisation. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp 2185-2189). San Francisco, USA, 8 September 2016 - 8 September 2016. View this article in WRRO
Doulaty M, Saz O, Ng RWM & Hain T (2016) Automatic Genre and Show Identification of Broadcast Media. Proceedings of the 17th Annual Conference of the International Speech Communication Association (Interspeech). San Francisco, 8 September 2016 - 8 September 2016. View this article in WRRO
Ng W, Nicolao M, Saz O, Hasan M, Chettri B, Doulaty M, Lee T & Hain T (2016) The Sheffield language recognition system in NIST LRE 2015. Proceedings of The Speaker and Language Recognition Workshop Odyssey 2016 (pp 181-187). Bilbao, Spain, 21 June 2016 - 21 June 2016. View this article in WRRO
Errattahi R, El Hannani A, Ouahmane H & Hain T (2016) Automatic Speech Recognition Errors Detection Using Supervised Learning Techniques. 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA). Agadir, Morocco. View this article in WRRO
Nicolao M, Christensen H, Cunningham S, Green P & Hain T (2016) A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus. Proceedings of LREC 2016. Portorož, Slovenia, 24 May 2016 - 24 May 2016. View this article in WRRO
Ng RWM, Shah K, Specia L & Hain T (2016) Groupwise learning for ASR k-best list reranking in spoken language translation. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Vol. 2016-M (pp 6120-6124). Shanghai, 20 March 2016 - 20 March 2016. View this article in WRRO
Milner R & Hain T (2016) Segment-oriented evaluation of speaker diarisation performance. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Shanghai, China, 20 March 2016 - 20 March 2016. View this article in WRRO
Alharbi G & Hain T (2016) The OpenCourseWare metadiscourse (OCWMD) corpus. Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 (pp 1770-1776)
(2016) Interspeech 2016. Interspeech 2016
Milner R, Saz O, Deena S, Doulaty M, Ng R & Hain T (2015) The 2015 Sheffield System for Longitudinal Diarisation of Broadcast Media. Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) (pp 632-638). Scottsdale, AZ, 13 December 2015 - 13 December 2015. View this article in WRRO
Loweimi E, Barker J & Hain T (2015) Source-filter Separation of Speech Signal in the Phase Domain. 16th Annual Conference of the International Speech Communication Association (Interspeech 2015), Vols 1-5 (pp 598-602). Dresden, Germany, 6 September 2015 - 6 September 2015. View this article in WRRO
Loweimi E, Doulaty M, Barker J & Hain T (2015) Emotion Recognition from the Speech Signal by Effective Combination of Generative and Discriminative Models. USES 2015 - The University of Sheffield Engineering Symposium. The Octagon Centre, University of Sheffield, 24 June 2015 - 24 June 2015. View this article in WRRO
Doulaty Bashkand M, Saz O & Hain T (2015) Unsupervised Domain Discovery Using Latent Dirichlet Allocation for Acoustic Modelling in Speech Recognition. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp 3640-3644). Dresden, Germany, 6 September 2015 - 6 September 2015. View this article in WRRO
Doulaty M, Saz O, Ng RWM & Hain T (2015) Latent Dirichlet Allocation Based Organisation of Broadcast Media Archives for Deep Neural Network Adaptation. Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) (pp 130-136). Scottsdale, AZ, 13 December 2015 - 13 December 2015. View this article in WRRO
Bell P, Gales M, Hain T, Kilgour J, Lanchantin P, Liu A, McParland A, Renals S, Saz O, Wester M & Woodland P (2015) The MGB Challenge: Evaluating Multi-genre Broadcast Media Recognition. Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) (pp 687-693). Scottsdale, AZ, 13 December 2015 - 13 December 2015. View this article in WRRO
Loweimi E, Doulaty M, Barker J & Hain T (2015) Long-Term Statistical Feature Extraction from Speech Signal and Its Application in Emotion Recognition (pp 173-184)
Doulaty Bashkand M, Saz O & Hain T (2015) Data-Selective Transfer Learning for Multi-Domain Speech Recognition. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp 2897-2901). Dresden, Germany, 6 September 2015 - 6 September 2015. View this article in WRRO
Ng RWM, Shah K, Aziz W, Specia L & Hain T (2015) Quality estimation for ASR k-best list rescoring in spoken language translation. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 5226-5230), 19 April 2015 - 24 April 2015.
Liu Y, Karanasou P & Hain T (2015) An Investigation into Speaker Informed DNN Front-end for LVCSR. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE Xplore, 19 April 2015 - 19 April 2015. View this article in WRRO
Nicolao M, Beeston AV & Hain T (2015) Automatic assessment of English learner pronunciation using discriminative classifiers. Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on (pp 5351-5355). Brisbane, Australia, 19 April 2015 - 19 April 2015. View this article in WRRO
Liu Y, Karanasou P & Hain T (2015) An Investigation into Speaker Informed DNN Front-end for LVCSR. 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp 4300-4304)
Ng RWM, Shah K, Aziz W, Specia L & Hain T (2015) Quality estimation for ASR k-best list rescoring in spoken language translation. 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp 5226-5230)
Saz O, Doulaty M, Deena S, Milner R, Ng RWM, Hasan M, Liu Y & Hain T (2015) The 2015 Sheffield system for transcription of Multi-Genre Broadcast media. ASRU (pp 624-631)
Christensen H, Nicolao M, Cunningham S, Green P, Deena S & Hain T (2015) Speech-enabled environmental control in an AAL setting for people with speech disorders: a case study. IET International Conference on Technologies for Active and Assisted Living (TechAAL) (p 6)
Casanueva I, Hain T, Christensen H, Marxer R & Green P (2015) Knowledge transfer between speakers for personalised dialogue management. Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp 12-21), September 2015 - September 2015.
AlHarbi G & Hain T (2015) Using Topic Segmentation Models for the Automatic Organisation of MOOCs resources. EDM (pp 524-527)
Loweimi E, Barker J & Hain T (2014) Compression of Model-based Group Delay Function for Robust Speech Recognition. The University of Sheffield Engineering Symposium Conference Proceedings, Vol. 1. The Octagon Centre, University of Sheffield. View this article in WRRO
Zhang P, Liu Y & Hain T (2014) Semi-Supervised DNN Training in Meeting Recognition. Proceedings of. South Lake Tahoe, California and Nevada, USA, 7 December 2014 - 7 December 2014. View this article in WRRO
Ng RWM, Doulaty M, Doddipatla R, Aziz W, Shah K, Saz O, Hasan M, AlHarbi G, Specia L & Hain T (2014) The USFD SLT System for IWSLT 2014. Proceedings of the International Workshop on Spoken Language Translation. http://workshop2014.iwslt.org/64.php, 4 December 2014 - 4 December 2014. View this article in WRRO
Liu Y, Zhang P & Hain T (2014) Using neural network front-ends on far field multiple microphones based speech recognition. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Zhang P, Liu Y & Hain T (2014) Semi-supervised DNN training in meeting recognition. 2014 IEEE Workshop on Spoken Language Technology, SLT 2014 Proceedings (pp 141-146)
Saz O, Doulaty M & Hain T (2014) Background-Tracking Acoustic Features for Genre Identification of Broadcast Shows. Proceedings of the 2014 Spoken Language Technology (SLT) Workshop (pp 118-123)
Christensen H, Casanueva I, Cunningham S, Green P & Hain T (2014) Automatic selection of speakers for improved acoustic modelling: recognition of disordered speech with sparse data. 2014 IEEE Spoken Language Technology Workshop (SLT) (pp 254-259), 7 December 2014 - 10 December 2014.
Saz O & Hain T (2014) Using contextual information in Joint Factor Eigenspace MLLR for speech recognition in diverse scenarios. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Florence, Italy, 4 May 2014 - 4 May 2014. View this article in WRRO
Ng R, Cohn T & Hain T (2013) Adaptation of lecture speech recognition system with machine translation output. Proceedings of the 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Vancouver, Canada
Saz O & Hain T (2013) Asynchronous factorisation of speaker and background with feature transforms in speech recognition. INTERSPEECH-2013 (pp 1238-1242). Lyon, France, 25 August 2013 - 25 August 2013. View this article in WRRO
Fox C, Liu Y, Zwyssig E & Hain T (2013) The Sheffield Wargames Corpus. 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), Vols 1-5 (pp 1115-1119)
Saz O & Hain T (2013) Asynchronous Factorisation of Speaker and Background with Feature Transforms in Speech Recognition. 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), Vols 1-5 (pp 1237-1241)
Christensen H, Green P & Hain T (2013) Learning speaker-specific pronunciations of disordered speech. 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), Vols 1-5 (pp 1158-1162)
Christensen H, Aniol MB, Bell P, Green P, Hain T, King S & Swietojanski P (2013) Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech. 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), Vols 1-5 (pp 3609-3612)
Lanchantin P, Bell PJ, Gales MJF, Hain T, Liu X, Long Y, Quinnell J, Renals S, Saz O, Seigel MS, Swietojanski P et al (2013) Automatic Transcription of Multi-Genre Media Archives. CEUR Workshop Proceedings (pp 26-31). Marseille, France. View this article in WRRO
Christensen H, Green P & Hain T (2013) Learning speaker-specific pronunciations of disordered speech. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech (pp 1159-1163)
Fox C, Liu Y, Zwyssig E & Hain T (2013) The Sheffield Wargames Corpus. 14th Annual Conference of the International Speech Communication Association (Interspeech 2013). Lyon, France, 25 August 2013 - 29 August 2013.
Christensen H, Casanueva I, Cunningham S, Green P & Hain T (2013) HomeService: Voice-enabled assistive technology in the home using cloud-based automatic speech recognition. SLPAT 2013 4th Workshop on Speech and Language Processing for Assistive Technologies, Workshop Proceedings (pp 29-34)
Christensen H, Cunningham S, Fox C, Green P & Hain T (2012) A comparative study of adaptive, automatic recognition of disordered speech. 13th Annual Conference of the International Speech Communication Association 2012 Interspeech 2012, Vol. 2 (pp 1774-1777)
Al-Shareef S & Hain T (2012) CRF-based diacritisation of colloquial Arabic for automatic speech recognition. 13th Annual Conference of the International Speech Communication Association 2012 Interspeech 2012, Vol. 3 (pp 1822-1825)
Ng RWM, Hain T & Hirose K (2012) An alignment matching method to explore pseudosyllable properties across different corpora. 13th Annual Conference of the International Speech Communication Association 2012 Interspeech 2012, Vol. 1 (pp 862-865)
Kamper H, de Wet F, Hain T & Niesler T (2012) Resource development and experiments in automatic South African broadcast news transcription. 3rd Workshop on Spoken Language Technologies for Under Resourced Languages, SLTU 2012 (pp 102-106)
Wrigley SN & Hain T (2011) Making an automatic speech recognition service freely available on the web. Interspeech’11
Tucker R, Fry D, Wan V, Wrigley S & Hain T (2011) Extending Audio Notetaker to Browse WebASR Transcriptions. Interspeech’11
Wrigley SN & Hain T (2011) Web-based automatic speech recognition service - webASR. Interspeech’11
Kempton T, Moore RK & Hain T (2011) Cross-language phone recognition when the target language phoneme inventory is not known. Interspeech’11. Florence
Al-Shareef S & Hain T (2011) An Investigation in Speech Recognition for Colloquial Arabic. Interspeech’11
Marino D & Hain T (2011) An Analysis of Automatic Speech Recognition with Multiple Microphones. Interspeech’11. Florence
Tucker R, Fry D, Wan V, Wrigley S & Hain T (2011) Extending Audio Notetaker to Browse WebASR Transcriptions. 12th Annual Conference of the International Speech Communication Association 2011 (Interspeech 2011), Vols 1-5 (pp 3336-+)
Hain T, Burget L, Dines J, Garner PN, el Hannani A, Huijbregts M, Karafiat M, Lincoln M & Wan V (2010) The AMIDA 2009 Meeting Transcription System. Interspeech’10 (pp 358-361)
Hain T & Renals S (2010) Meeting Recognition. Tutorial, Interspeech 2010
Hain T, Burget L, Dines J, Garner PN, El Hannani A, Huijbregts M, Karafiat M, Lincoln M & Wan V (2010) The AMIDA 2009 Meeting Transcription System. 11th Annual Conference of the International Speech Communication Association 2010 (Interspeech 2010), Vols 1-4 (pp 358-361)
Garner PN, Dines J, Hain T, El Hannani A, Karafiát M, Korchagin D, Lincoln M, Wan V & Zhang L (2009) Real-time ASR from meetings. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech (pp 2119-2122)
Garner PN, Dines J, Hain T, El Hannani A, Karafiát M, Korchagin D, Lincoln M, Wan V & Zhang L (2009) Real-Time ASR from Meetings. Interspeech 2009: 10th Annual Conference of the International Speech Communication Association 2009, Vols 1-5 (pp 2067-+)
Hain T, El Hannani A, Wrigley SN & Wan V (2008) Automatic speech recognition for scientific purposes - webASR. Interspeech 2008: 9th Annual Conference of the International Speech Communication Association 2008, Vols 1-5 (pp 504-507)
Renals S, Hain T & Bourlard H (2008) Interpretation of multiparty meetings the AMI and AMIDA projects. 2008 Hands Free Speech Communication and Microphone Arrays Proceedings, HSCMA 2008 (pp 115-118)
Hain T, Burget L, Dines J, Garau G, Karafiat M, van Leeuwen D, Lincoln M & Wan V (2008) The 2007 AMI(DA) system for meeting transcription. Multimodal Technologies for Perception of Humans, Vol. 4625 (pp 414-428)
Renals S, Hain T & Bourlard H (2008) Interpretation of multiparty meetings the AMI and AMIDA projects. 2008 Hands-Free Speech Communication and Microphone Arrays (pp 116-+)
Wan V, Dines J, El Hannani A & Hain T (2008) BOB: A Lexicon and Pronunciation Dictionary Generator. 2008 IEEE Workshop on Spoken Language Technology: SLT 2008, Proceedings (pp 217-220)
Gibson M & Hain T (2007) Temporal Masking for Unsupervised Minimum Bayes Risk Speaker Adaptation. Interspeech 2007: 8th Annual Conference of the International Speech Communication Association, Vols 1-4 (pp 1577-1580)
Karafiat M, Burget L, Hain T & Cernocky J (2007) Application of CMLLR in narrow band wide band adapted systems. Interspeech’07 (pp 282-285). Antwerp
Hain T, Burget L, Dines J, Garau G, Karafiat M, Lincoln M, Vepa J & Wan V (2007) The AMI system for the transcription of speech in meetings. 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol IV, Pts 1-3 (pp 357-360)
Al-Hames M, Hain T, Cernocky J, Schreiber S, Poel M, Muller R, Marcel S, van Leeuwen D, Odobez JM, Ba S , Bourlard H et al (2006) Audio-visual processing in meetings: Seven questions and current AMI answers. Machine Learning for Multimodal Interaction, Vol. 4299 (pp 24-35)
Hain T, Burget L, Dines J, Garau G, Karafiat M, Lincoln M, Vepa J & Wan V (2006) The AMI meeting transcription system: Progress and performance. Machine Learning for Multimodal Interaction, Vol. 4299 (pp 419-431)
Dines J, Vepa J & Hain T (2006) The segmentation of multi-channel meeting recordings for automatic speech recognition. Interspeech 2006 and 9th International Conference on Spoken Language Processing Interspeech 2006 ICSLP, Vol. 3 (pp 1213-1216)
Gibson M & Hain T (2006) Hypothesis Spaces For Minimum Bayes Risk Training In Large Vocabulary Speech Recognition. Interspeech 2006 and 9th International Conference on Spoken Language Processing, Vols 1-5 (pp 2406-2409)
Uraga E & Hain T (2006) Automatic Speech Recognition Experiments with Articulatory Data. Interspeech 2006 and 9th International Conference on Spoken Language Processing, Vols 1-5 (pp 353-356)
Wan V & Hain T (2006) Strategies for language model web-data collection. 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol I, Proceedings (pp 1069-1072). Toulouse, France, 14 May 2006 - 19 May 2006.
Wan V & Hain T (2006) Strategies for language model web-data collection. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13 (pp 1069-1072)
Hain TC, Squires TM & Stone HA (2005) Clinical Implications of a Mathematical Model of Benign Paroxysmal Positional Vertigo. Annals of the New York Academy of Sciences, Vol. 1039(1) (pp 384-394)
Hain T, Burget L, Dines J, McCowan I, Garau G, Karafiat M, Lincoln M, Moore D, Wan V, Ordelman R & Renals S (2005) The development of the AMI system for the transcription of speech in meetings. Machine Learning for Multimodal Interaction, Vol. 3869 (pp 344-356)
Hain T, Burget L, Dines J, Garau G, Karafiat M, Lincoln M, McCowan I, Moore D, Wan V, Ordelman R & Renals S (2005) The 2005 AMI system for the transcription of speech in meetings. Machine Learning for Multimodal Interaction, Vol. 3869 (pp 450-462)
Hain TF, Ahmad AL, Racherla SVR & Langan DD (2005) Fast, precise flattening of cubic Bézier path and offset curves. Computers & Graphics, Vol. 29(5) (pp 656-666)
Garau G, Renals S & Hain T (2005) Applying vocal tract length normalization to meeting recordings. 9th European Conference on Speech Communication and Technology (pp 265-268)
Hain T, Dines J, Garau G, Karafiat M, Moore D, Wan V, Ordelman R & Renals S (2005) Transcription of conference room meetings: An investigation. 9th European Conference on Speech Communication and Technology (pp 1661-1664)
Helminski JO, Janssen I, Kotaspouikis D, Kovacs K, Sheldon P, McQueen K & Hain TC (2005) Strategies to Prevent Recurrence of Benign Paroxysmal Positional Vertigo. Archives of Otolaryngology–Head & Neck Surgery, Vol. 131(4) (pp 344-344)
McCowan I, Carletta J, Kraaij W, Ashby S, Bourban S, Flynn M, Guillemot M, Hain T, Kadlec J, Karaiskos V , Kronenthal M et al (2005) The AMI Meeting Corpus. 5th International Conference on Methods and Techniques in Behavioral Research
Evermann G, Chan HY, Gales MJF, Hain T, Liu X, Mrva D, Wang L & Woodland PC (2004) Development of the 2003 CU-HTK conversational telephone speech transcription system. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, Vol. 1 (pp I249-I252)
Kim DY, Gales MJF, Chan HY, Woodland PC, Umesh S & Hain T (2004) Progress in Broadcast News English Transcription. EARS STT Technical Meeting 2004. Montreal, Canada
Woodland PC, Chan HY, Evermann G, Gales MJF, Hain T, Jia B, Kim DY, Liu X, Mrva D, Sim KC , Tranter SE et al (2004) Cambridge STT Overview. EARS Mid-year Meeting 2004
Kim DY, Umesh S, Gales MJF, Hain T & Woodland PC (2004) Using VTLN for Broadcast News Transcription. ICSLP’04. Cambridge University, UK
Evermann G, Chan HY, Gales MJF, Hain T, Liu X, Mrva D, Wang L & Woodland P (2004) Development of the 2003 CU-HTK Conversational Telephone Speech transcription system. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS (pp 249-252)
Hain T (2003) Single Pronunciation Dictionaries - Construction and Performance. EARS STT Technical Meeting 2004
Kim DY, Evermann G, Hain T, Mrva D, Tranter SE, Wang L & Woodland PC (2003) 2003 CU-HTK Broadcast News English System Development. Rich Transcription Workshop 2003
Woodland PC, Chan HY, Evermann G, Gales MJF, Hain T, Kim DY, Liu X, Mrva D, Povey D, Tranter SE , Wang L et al (2003) 2003 CU-HTK English CTS Systems. Rich Transcription Workshop 2003. Boston, Ma
Jia B, Sim KC, Gales MJF, Hain T, Liu X, Woodland PC & Yu K (2003) CU-HTK RT-03 Mandarin CTS System. Rich Transcription Workshop 2003
Woodland PC, Evermann G, Gales MJF, Hain T, Chan HY, Jia B, Kim DY, Liu X, Mrva D, Povey D , Sim KC et al (2003) Recent Experiments with HTK Broadcast News and Conversational Telephone Systems. EARS Mid-year meeting 2003
Kim DY, Evermann G, Hain T, Mrva D, Tranter SE, Wang L & Woodland P (2003) Recent advances in broadcast news transcription. ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03 (pp 105-110)
Hain T (2002) Implicit Pronunciation Modelling in ASR. ITRW PMLA 2002. Estes Park, Colorado
Woodland PC, Evermann G, Gales MJF, Hain T, Liu X, Moore GL, Povey D & Wang L (2002) CU-HTK APRIL 2002 SWITCHBOARD SYSTEM. Rich Transcription Workshop 2002. Vienna, VA
PETERSON BW, CHOI H, HAIN T, KESHNER E & PENG GCY (2001) Dynamic and Kinematic Strategies for Head Movement Control. Annals of the New York Academy of Sciences, Vol. 942(1) (pp 381-393)
Larson CR, Burnett TA, Bauer JJ, Kiran S & Hain TC (2001) Comparison of voice F0 responses to pitch-shift onset and offset conditions. The Journal of the Acoustical Society of America, Vol. 110(6) (pp 2845-2848)
Hain T, Woodland PC, Evermann G & Povey D (2001) New features in the CU-HTK system for transcription of conversational telephone speech. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS (pp 57-60)
Bürkle T, Hain T, Hossain H, Dudeck J & Domann E (2001) Bioinformatics in medical practice: what is necessary for a hospital? Stud Health Technol Inform, Vol. 84(Pt 2) (pp 951-955). Netherlands
Ostrowski VB, Byskosh A & Hain TC (2001) Tullio Phenomenon With Dehiscence of the Superior Semicircular Canal. Otology & Neurotology, Vol. 22(1) (pp 61-65)
Hain T, Woodland PC, Evermann G & Povey D (2000) The CU-HTK March 2000 HUB5E Transcription System. Speech Transcription Workshop 2000. College Park, Maryland
Hain T & Woodland PC (2000) Modelling sub-phone insertions and deletions in continuous speech recognition. ICSLP 2000
Hain TC, Fuller L, Weil L & Kotsias J (1999) Effects of T'ai Chi on Balance. Archives of Otolaryngology–Head & Neck Surgery, Vol. 125(11) (pp 1191-1191)
Woodland PC, Odell JJ, Hain T, Moore GL, Niesler TR, Tuerk A & Whittaker EWD (1999) Improvements in Accuracy and Speed in the HTK Broadcast News Transcription System. Eurospeech’99
Hain TC, Hanna PA & Rheinberger MA (1999) Mal de Debarquement. Archives of Otolaryngology–Head & Neck Surgery, Vol. 125(6) (pp 615-615)
Woodland PC, Hain T, Moore GL, Niesler TR, Povey D, Tuerk A & Whittaker EWD (1999) The 1998 HTK Broadcast News Transcription System: Development and Results. Proc. of the 1999 DARPA Broadcast News Transcription and Understanding Workshop. Herndon, VA
Odell JJ, Woodland PC & Hain T (1999) The CUHTK-Entropic 10xRT Broadcast News Transcription System. 1999 DARPA Broadcast News Transcription and Understanding Workshop (pp 271-275). Herndon, VA
Hain T & Woodland PC (1999) Hidden model sequences. Hub5 Workshop’99
Hain T & Woodland PC (1999) RECENT EXPERIMENTS WITH THE CU-HTK HUB5 SYSTEM. Hub5 Workshop’99
Hain T & Woodland PC (1999) Dynamic HMM selection for continuous speech recognition. Eurospeech’99 (pp 1327-1330). Budapest
Hain T, Woodland PC, Niesler TR & Whittaker EWD (1999) The 1998 HTK system for transcription of conversational telephone speech. ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI (pp 57-60)
Woodland PC, Hain T, Johnson SE, Niesler TR, Tuerk A & Young SJ (1998) Experiments in broadcast news transcription. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6 (pp 909-912)
Hain T & Woodland PC (1998) SEGMENTATION AND CLASSIFICATION OF BROADCAST NEWS AUDIO. ICSLP’98
Woodland PC, Hain T, Johnson SE, Niesler TR, Tuerk A, Whittaker EWD & Young SJ (1998) The 1997 HTK Broadcast News Transcription System. 1998 DARPA Broadcast News Transcription and Understanding Workshop (pp 41-48)
Hain T & Woodland PC (1998) CU-HTK Acoustic modeling experiments. Hub5 Workshop 98
Hain T, Johnson SE, Tuerk A, Woodland PC & Young SJ (1998) Segment Generation and Clustering in the HTK Broadcast News Transcription System. 1998 DARPA Broadcast News Transcription and Understanding Workshop (pp 133-137)
Ostrowski VB, Hain TC & Wiet RJ (1997) Pressure-Induced Ocular Torsion. Archives of Otolaryngology - Head and Neck Surgery, Vol. 123(6) (pp 646-649)
LevyReis I, Uddin MK & Hain TC (1997) Vibration does not improve results of the Epley maneuver. NEUROLOGY, Vol. 48(3) (pp 2136-2136)
Peng GCY, Hain TC & Peterson BW (1997) How is the head held up? Modeling mechanisms for head stability in the sagittal plane. PROCEEDINGS OF THE 18TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOL 18, PTS 1-5, Vol. 18 (pp 627-628)
Hart CW & Hain TC (1995) A novel algorithm for EOG analysis. VERTIGO, NAUSEA, TINNITUS AND HEARING LOSS IN CENTRAL AND PERIPHERAL VESTIBULAR DISEASES, Vol. 1087 (pp 33-39)
KESHNER EA, PENG G, HAIN T & PETERSON BW (1995) Characteristics of head and neck stabilization in two planes of motion. MULTISENSORY CONTROL OF POSTURE (pp 83-94)
Hain TF & Vickery RJ (1995) Analysis of least-time and minimum-hop routing for clustered temporal networks. MILCOM 95 - CONFERENCE RECORD, VOLS 1-3 (pp 1144-1149)
Huertgen B & Hain T (1994) On the convergence of fractal transforms. ICASSP’94 (pp 561-564)
HAIN TC (1994) Multidimensional models of the vestibular-ocular reflex. CONTEMPORARY OCULAR MOTOR AND VESTIBULAR RESEARCH: A TRIBUTE TO DAVID A. ROBINSON (pp 72-79)
HAIN TC & ZEE DS (1992) Velocity Storage in Labyrinthine Disorders. Annals of the New York Academy of Sciences, Vol. 656(1) (pp 297-304)
Hanson DG, Logemann JA & Hain T (1992) Differential diagnosis of spasmodic dysphonia: A kinematic perspective. Journal of Voice, Vol. 6(4) (pp 325-337)
Yan B, Wang Q, Wiesner M, Diwan A, Iakovenko O, Polok A, Hamed I, Shimizu S, Emerman I, Hain T , Mortensen DR et al () CS-YODAS: A Mined Dataset of In-the-Wild Code-Switched Speech. Proceedings of the Language Resources and Evaluation Conference (pp 5776-5784), 11 May 2026 - 16 May 2026.
El Kheir Y, Ibrahim O, Meghanani A, Almarwani N, Toyin H, Alharbi S, Alfadly M, Alkanhal L, Selim I, Elbatal S , Mdhaffar S et al () Towards a Unified Benchmark for Arabic Pronunciation Assessment: Qur’anic Recitation as Case Study. Interspeech 2025 (pp 2410-2414)
Chen M, Zhang H, Li Y, Luo J, Wu W, Ma Z, Bell P, Lai C, Reiss JD, Wang L , Woodland PC et al () 1st place solution to Odyssey Emotion Recognition Challenge Task1: tackling class imbalance problem. The Speaker and Language Recognition Workshop (Odyssey 2024) (pp 260-265). Quebec City, Canada, 18 June 2024 - 18 June 2024.
Ravenscroft W, Goetze S & Hain T () Combining conformer and dual-path-transformer networks for single channel noisy reverberant speech separation. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 2024 (pp 11491-11495). Seoul, Korea, 14 April 2024 - 14 April 2024.
Do C-T, Doddipatla R, Li M & Hain T () Domain Adaptive Self-supervised Training of Automatic Speech Recognition. INTERSPEECH 2023 (pp 4389-4393)
Sailor HB & Hain T () Multilingual Speech Recognition Using Language-Specific Phoneme Recognition as Auxiliary Task for Indian Languages. Interspeech 2020 (pp 4756-4760)
Huang Q & Hain T () Detecting Mismatch Between Speech and Transcription Using Cross-Modal Attention. Interspeech 2019 (pp 584-588)
Hasan M, Doddipatla R & Hain T () Noise-matched training of CRF based sentence end detection models. Interspeech 2015 (pp 349-353)
Ng RWM, Shah K, Specia L & Hain T () A study on the stability and effectiveness of features in quality estimation for spoken language translation. Interspeech 2015 (pp 2257-2261)
Alharbi G, Ng RWM & Hain T () Annotating meta-discourse in academic lectures from different disciplines. Speech and Language Technology in Education (SLaTE 2015) (pp 161-166)
Fox C & Hain T () Extending Limabeam with discrimination and coarse gradients. Interspeech 2014 (pp 2440-2444)
Hasan M, Doddipatla R & Hain T () Multi-pass sentence-end detection of lecture speech. Interspeech 2014 (pp 2902-2906)
Doddipatla R, Hasan M & Hain T () Speaker dependent bottleneck layer training for speaker adaptation in automatic speech recognition. Interspeech 2014 (pp 2199-2203)
Casanueva I, Christensen H, Hain T & Green PD () Adaptive speech recognition and dialogue management for users with speech disorders. Interspeech 2014 (pp 1033-1037)
Christensen H, Aniol MB, Bell P, Green PD, Hain T, King S & Swietojanski P () Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech. Interspeech 2013 (pp 3642-3645)
Hain TF, Racherla SVR & Langan DD () Fast, precise flattening of cubic Bezier segment offset curves. Proceedings. 17th Brazilian Symposium on Computer Graphics and Image Processing (pp 244-249)
Woodland PC, Odell JJ, Hain T, Moore GL, Niesler TR, Tuerk A & Whittaker EWD () Improvements in accuracy and speed in the HTK broadcast news transcription system. 6th European Conference on Speech Communication and Technology (pp 1043-1046)

Reports

Close G, Hollands S, Goetze S & Hain T (2022) Clarity Prediction Challenge 1 Entry: Non-intrusive Speech Intelligibility Metric Prediction - Technical Report
el Hannani A & Hain T (2011) Data Dependence of Speech Decoder Parameters
Gibson M & Hain T (2011) Confidence-informed unsupervised Minimum Bayes Risk acoustic model adaptation
Hain T, Dines J & McCowan I (2006) Conversational multi-party speech recognition using remote microphones
Hain T, Woodland PC, Evermann G, Liu X, Moore GL, Povey D & Wang L (2003) Automatic Transcription of Conversational Telephone Speech. Development of the CU-HTK 2002 System

Theses

Hain T (2001) Hidden Model Sequence Models for Automatic Speech Recognition.
Hain T (1993) On the Use of Iterated Function Systems for Coding of Grayscale Images.

Datasets

Nicolao M, Hain T, Christensen H, Green P & Cunningham S The homeService corpus v1.0.
Deena S, Hasan M, Bashkand MD, Torralba OS & Hain T Experimental results for IEEE/ACM Transactions on Audio, Speech and Language Processing Journal Paper: "Recurrent Neural Network Language Model Adaptation for Multi-Genre Broadcast Speech Recognition and Alignment".
Torralba OS, Hain T & Martinez JO Interspeech 2016 - Experiment results for paper "Error correction in lightly supervised alignment of broadcast subtitles".
Torralba OS, Hain T, Deena S, Bashkand MD, Hasan M, Ng WM, Milner R & Liu Y Interspeech 2016 - Experiment results for paper "webASR 2 - Improved cloud based speech technology".
Torralba OS & Hain T Computer, Speech and Language - Experiment results for paper "Acoustic Adaptation to Dynamic Background Conditions with Asynchronous Transformations".
Torralba OS, Hain T, Deena S, Bashkand MD, Khaliq B, Ng WM, Milner R, Hasan M & Martinez JO Multimedia Tools and Applications - Experiment results for paper "Lightly supervised alignment of subtitles on multigenre broadcasts".
Deena S, Hasan M, Bashkand MD, Torralba OS & Hain T Interspeech 2016 - Experiment results for paper "Combining Feature and Model-Based Adaptation of RNNLMs for Multi-Genre Broadcast Speech Recognition".
Hain T, Liu Y & Hasan M Sheffield Wargame Corpora (SWC1, SWC2, SWC3) - Interspeech 2016 Experiment results.
Specia L, Hain T, Ng W & Shah K ICASSP 2016 - Experiment results for the paper "Groupwise learning for ASR k-best list reranking in spoken language translation".

Other

Ng WM, Kwan ACM, Lee T & Hain T () ShefCE: A Cantonese-English bilingual speech corpus.

Preprints

Meghanani A & Hain T (2026) Position-invariant Fine-tuning of Speech Enhancement Models with Self-supervised Speech Representations, arXiv.
Sudro PN, Ragni A & Hain T (2025) A comparative study of generative models for child voice conversion, arXiv.
Close G, Hong K, Hain T & Goetze S (2025) WhiSQA: Non-Intrusive Speech Quality Prediction Using Whisper Encoder Features.
Kheir YE, Ibrahim O, Meghanani A, Almarwani N, Toyin HO, Alharbi S, Alfadly M, Alkanhal L, Selim I, Elbatal S , Mdhaffar S et al (2025) Towards a Unified Benchmark for Arabic Pronunciation Assessment: Quranic Recitation as Case Study, arXiv.
Iakovenko O & Hain T (2024) Methods of Automatic Matrix Language Determination for Code-Switched Speech, arXiv.
Do C-T, Imai S, Doddipatla R & Hain T (2024) Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis, arXiv.
Ravenscroft W, Close G, Goetze S, Hain T, Soleymanpour M, Chowdhury A & Fuhs MC (2024) Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition, arXiv.
Meghanani A & Hain T (2024) LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks, arXiv.
Chen M, Zhang H, Li Y, Luo J, Wu W, Ma Z, Bell P, Lai C, Reiss J, Wang L , Woodland PC et al (2024) 1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem, arXiv.
Park C, Chen M & Hain T (2024) Automatic Speech Recognition System-Independent Word Error Rate Estimation.
Close G, Hain T & Goetze S (2024) Hallucination in Perceptual Metric-Driven Speech Enhancement Networks, arXiv.
Meghanani A & Hain T (2024) Improving Acoustic Word Embeddings through Correspondence Training of Self-supervised Speech Representations, arXiv.
Meghanani A & Hain T (2024) SCORE: Self-supervised Correspondence Fine-tuning for Improved Content Representations, arXiv.
Ahmad R, Farooq MU & Hain T (2024) Progressive unsupervised domain adaptation for ASR using ensemble models and multi-stage training.
Mogridge R, Close G, Sutherland R, Hain T, Barker J, Goetze S & Ragni A (2024) Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models, arXiv.
Close G, Ravenscroft W, Hain T & Goetze S (2023) Multi-CMGAN+/+: Leveraging Multi-Objective Speech Quality Metric Prediction for Speech Enhancement, arXiv.
Park C, Lu C, Chen M & Hain T (2023) Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text.
Ravenscroft W, Goetze S & Hain T (2023) On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments, arXiv.
Close G, Hain T & Goetze S (2023) The Effect of Spoken Language on Speech Enhancement using Self-Supervised Speech Representation Loss Functions, arXiv.
Close G, Hain T & Goetze S (2023) The Effect of Spoken Language on Speech Enhancement using Self-Supervised Speech Representation Loss Functions.
Ollerenshaw A, Jalal MA, Milner R & Hain T (2023) Empirical Interpretation of the Relationship Between Speech Acoustic Context and Emotion Recognition, arXiv.
Ahmad R, Jalal MA, Farooq MU, Ollerenshaw A & Hain T (2023) Towards domain generalisation in ASR with elitist sampling and ensemble knowledge distillation, arXiv.
Ollerenshaw A, Jalal MA & Hain T (2022) Dynamic Kernels and Channel Attention for Low Resource Speaker Verification, arXiv.
Ollerenshaw A, Jalal MA & Hain T (2022) Probing Statistical Representations For End-To-End ASR, arXiv.
Ravenscroft W, Goetze S & Hain T (2022) Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation, arXiv.
Park C, Ahmad R & Hain T (2022) Unsupervised data selection for Speech Recognition with contrastive loss ratios, arXiv.
Farooq MU, Narayana DAH & Hain T (2022) Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion, arXiv.
Farooq MU & Hain T (2022) Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition.
Milner R, Jalal MA, Ng RWM & Hain T (2022) A cross-corpus study on speech emotion recognition, arXiv.
Ollerenshaw A, Jalal MA & Hain T (2022) Insights on Neural Representations for End-to-End Speech Recognition, arXiv.
Ravenscroft W, Goetze S & Hain T (2022) Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation, arXiv.
Chen M, Zhou Y, Huang H & Hain T (2022) Efficient Non-Autoregressive GAN Voice Conversion using VQWav2vec Features and Dynamic Convolution.
Do C-T, Doddipatla R & Hain T (2021) Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition, arXiv.
Chen M, Shi Y & Hain T (2020) Towards Low-Resource StarGAN Voice Conversion using Weight Adaptive Instance Normalization.
Chen M & Hain T (2020) Unsupervised Acoustic Unit Representation Learning for Voice Conversion using WaveNet Auto-encoders.
Doulaty M, Saz O, Ng RWM & Hain T (2016) Automatic Genre and Show Identification of Broadcast Media, arXiv.
Saz O, Doulaty M, Deena S, Milner R, Ng RWM, Hasan M, Liu Y & Hain T (2015) The 2015 Sheffield System for Transcription of Multi-Genre Broadcast Media, arXiv.
Doulaty M, Saz O, Ng RWM & Hain T (2015) Latent Dirichlet Allocation Based Organisation of Broadcast Media Archives for Deep Neural Network Adaptation, arXiv.
Saz O, Doulaty M & Hain T (2015) Background-tracking Acoustic Features for Genre Identification of Broadcast Shows, arXiv.
Doulaty M, Saz O & Hain T (2015) Data-selective Transfer Learning for Multi-Domain Speech Recognition, arXiv.
Doulaty M, Saz O & Hain T (2015) Unsupervised Domain Discovery using Latent Dirichlet Allocation for Acoustic Modelling in Speech Recognition, arXiv.
Ng RWM, Doulaty M, Doddipatla R, Aziz W, Shah K, Saz O, Hasan M, AlHarbi G, Specia L & Hain T (2014) The USFD Spoken Language Translation System for IWSLT 2014.
Ravenscroft W, Goetze S & Hain T () Utterance weighted multi-dilation temporal convolutional networks for monaural speech dereverberation.

Grants

UKRI Centre for Doctoral Training in Speech and Language Technologies and their Applications, EPSRC, 04/2019 - 09/2027, £5,508,850, as PI
VoiceBase Centre, VoiceBase Inc./Liveperson, 04/2018 - 03/2026, £2,488,691, as PI
WFST-based integration of ASR and MT in Spoken Language Translation, Industrial, 03/2014 - 12/2026, £63,588, as PI
Automatic voice conversion for transforming professional adult voice actors to artificial child voice actors, Innovate UK, 01/2021 - 01/2023, £173,605, as PI
MAUDIE: Multimedia Analysis for Unsupervised Dubbing In Entertainment, Innovate UK, 05/2018 - 07/2021, £393,115, as PI
TUTO II: Reading skills tutoring system, ITSLANGUAGE BV, 08/2017 - 12/2019, £121,439, as PI
Sound Source Separation Based on Deep Learning, Industrial, 05/2019 - 04/2020, £48,000, as PI
Acoustic correlates of emotions for automatic recognition, Industrial, 10/2018 - 09/2019, £48,900, as PI
Bridge Project, VoiceBase Inc., 09/2017 - 03/2018, £61,200, as PI
STATUS IV: Speech Technology and Translation Universal Survey, Defence Science and Technology Laboratory, 01/2017 - 10/2017, £60,000, as PI
TUTO: Reading skills tutoring system, ITSLANGUAGE BV, 09/2016 - 08/2017, £61,983, as PI
STATUS III: Speech Technology and Translation Universal Survey, Defence Science and Technology Laboratory, 01/2015 - 07/2016, £78,684, as PI
STATUS II: Speech Technology and Translation Universal Survey, Defence Science and Technology Laboratory, 11/2013 - 05/2014, £98,982, as PI
ItsLanguage, ITSLANGUAGE BV, 11/2012 - 03/2015, £68,333, as PI
German System Adaptation, ITSLANGUAGE BV, 11/2012 - 03/2015, £42,373, as PI
DocuMeet: Transcription, summarisation and documentation of meetings using advanced speech technologies, indexing and browsing capabilities, EC FP7, 11/2012 - 10/2014, £368,433, as PI
STATUS: Speech Technology and Translation Universal Survey, Defence Science and Technology Laboratory, 10/2012 - 08/2013, £73,726, as PI
A Joint Model of Spoken Language Translation, Google, 09/2011 - 12/2016, £43,014, as PI
Natural Speech Technology, EPSRC, 05/2011 - 07/2016, £1,798,665, as PI
Unsupervised Domain Adaptation, CISCO, 11/2010 - 04/2012, £121,745, as PI
AMIDA: Augmented Multi-party Interaction with Distance Access, EC FP6, 10/2006 - 12/2009, £467,074, as PI
AMIDA: Augmented Multi-party Interaction with Distance Access, EC FP6, 10/2006 - 12/2009, £345,350, as PI

Professional activities and memberships

Head of the Speech and Hearing research group
Editorial Board member, Computer Speech and Language
Associate Editor, ACM Transactions on Speech and Language Processing
Organising committee member, ASRU 2013
Area Chair, Interspeech 2014, Speech Recognition - Signal Processing, Acoustic Modelling, Robustness and Adaptation
Area Chair, ICPR 2014, Track 3: Image, Speech, Signal and Video Processing
Programme Committee, PolTAL 2014
