Natural Language Processing


The Natural Language Processing Research Group, established in 1993, is one of the largest and most successful language processing groups in the UK and has a strong global reputation.

Natural Language Processing (NLP) is an interdisciplinary field that uses computational methods:

  • To investigate the properties of written human language and to model the cognitive mechanisms underlying the understanding and production of written language (scientific focus)
  • To develop novel practical applications involving the intelligent processing of written human language by computer (engineering focus)

Research themes

Contact us
Natural Language Processing Research Group
Department of Computer Science
University of Sheffield
Regent Court
211 Portobello
Sheffield, S1 4DP
+44 (0)114 222 1901

Follow us on Twitter @SheffieldNLP



The group's research interests fall into the broad areas of: 

Information Access: Building applications to improve access to information in massive text collections, such as the web, newswires and the scientific literature. Subtopics include: information extraction, text mining and semantic annotation, question answering, summarization.

Language Resources and Architectures for NLP: Providing resources - both data and processing resources - for research and development in NLP. Includes platforms for developing and deploying real world language processing applications, most notably GATE, the General Architecture for Text Engineering.

Machine Translation: Building applications to translate automatically between human languages, allowing access to the vast amount of information written in foreign languages and easier communication between speakers of different languages.

Human-Computer Dialogue Systems: Building systems to allow spoken language interaction with computers or embodied conversational agents, with applications in areas such as keyboard-free access to information, games and entertainment, and artificial companions.

Detection of Reuse and Anomaly: Investigating techniques for determining when texts or portions of texts have been reused or where portions of text do not fit with surrounding text. These techniques have applications in areas such as plagiarism and authorship detection and in discovery of hidden content.

Foundational Topics: Developing applications with human-like capabilities for processing language requires progress in foundational topics in language processing. Areas of interest include: word sense disambiguation, semantics of time and events.

The NLP group's research has received support from: the EU's Framework Programmes (Frameworks 4, 5, 6 and 7) as well as Horizon 2020 and the European Research Council, the UK Research Councils (EPSRC, BBSRC, MRC and AHRC) and various governmental and industrial sponsors, including GlaxoSmithKline and IBM.

The NLP group has close associations with the Speech and Hearing and Information Retrieval research groups, which carry out research into other areas of computational processing of human language.

We also host the CLUK Website. We no longer host the ICCL website but clicking the link will direct you to the new pages.



These are the current members of the NLP group. Click on a name to see a home page.

Senior Research Scientists

Prof. Kalina Bontcheva

Dr. Diana Maynard

Administrative Support

Lucy Moffatt

Alice Tucker

Former group members



  • The 4-year EU H2020 RISIS2 project started in January 2019. Sheffield PIs: Kalina Bontcheva and Diana Maynard.
  • The 3-year EU H2020 Bergamot project started in January 2019. Sheffield PIs: Lucia Specia and Nikos Aletras.
  • Mark Stevenson gave an invited talk on “Discovering Hidden Knowledge from Scientific Literature: Challenges and (some) Solutions” at the Workshop on Hypothesis Generating in Genetics and Biomedical Text Mining, Lancaster University
  • Tait, J. and Wilks, Y. (2019). Then and Now: 25 years of progress in Natural Language Engineering, Natural Language Engineering Vol 25(3) May 2019.
  • Wilks, Y. (2019). Moral orthoses: a new look at AI and ethics. AI Magazine, 2019.
  • Wilks, Y. (2019). Artificial Intelligence: Modern magic or dangerous future? Icon Books, London.
  • Emeritus Professor Yorick Wilks has been delivering a series of lectures at the Museum of London as Visiting Professor of Artificial Intelligence at Gresham College.
  • Emeritus Professor Yorick Wilks, along with Professor William Clocksin and Dr. Fraser Watts of Cambridge, has been awarded a Synthesis Prize from the Templeton Foundation at UCLA to prepare a major proposal to the Foundation of Computation and Theology.

Papers accepted to ECIR 2019:

  • Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling. Daniel Pfeifer and Jochen L. Leidner

  • The 3-year EU H2020 ELG project started in December 2018. Sheffield PI Kalina Bontcheva.
  • The 3-year EU H2020 WeVerify project started in December 2018. Sheffield PI Kalina Bontcheva.
  • The 3-year EU CEF-Telecom APE-QUEST project started in October 2018. Sheffield PI Lucia Specia.
  • Professor Jochen L. Leidner had a tutorial accepted to IEEE DSAA 2018 entitled Project Management for Data Science - Tutorial.

Paper accepted to KDD 2018:

  • Mark Stevenson gave an invited talk to the Language and Computation group at Essex University. Title: Supporting Evidence-based Medicine with Natural Language Processing
  • Diana Maynard gave a Tutorial on Text analysis with GATE at the British Computer Society Search Solutions Conference in London.
  • Diana Maynard gave a Keynote talk on Artificial Intelligence and Social Media at the Chartered Institute of Public Relations annual conference, London.
  • Diana Maynard gave an invited talk on The Use of Semantic Technologies for Mapping European Research, at the University of Copenhagen, Denmark.
  • Diana Maynard gave an invited talk on Information extraction tools for social media and medical data analysis at GESIS, Cologne.
  • Diana Maynard gave a keynote talk and practical tutorial at the WSTNet Web Science Summer School in Hannover, Germany.
  • Diana Maynard gave tutorials on "Practical Sentiment Analysis" and "Introduction to NLP" at Essex University Summer School on Big Data and Analytics.
  • Diana Maynard gave 2 invited tutorials on tools for text analytics and social media analysis at the Digital Humanities and Technology THATcamp in Valence, France, on 14-15 June 2018.
  • Diana Maynard gave a keynote speech at the Emotion Modeling and Detection in Social Media and Online Interaction symposium at AISB 2018 in Liverpool on 5 April 2018
  • Journal paper: A. Saeed, R. Nawab, M. Stevenson and P. Rayson (2018) A word sense disambiguation corpus for Urdu. Language Resources and Evaluation.
  • Journal paper: A. Duque, M. Stevenson, J. Martinez-Romo and L. Araujo (2018) Co-Occurrence Graphs for Word Sense Disambiguation in the Biomedical Domain. Artificial Intelligence in Medicine, 87:9-19.
  • Book chapter: E. Agirre and M. Stevenson (2018) Word Sense Disambiguation. In Mitkov, R. Oxford Handbook of Computational Linguistics (Second Edition). Oxford University Press.
  • Paper: G. Gorrell, M. Greenwood, I. Roberts, D. Maynard, K. Bontcheva. Twits, Twats and Twaddle: Trends in Online Abuse towards UK Politicians. In Proceedings of the 12th International Conference on Web and Social Media (ICWSM 2018), 25-28 June 2018, Stanford, US.
  • Paper: L. Derczynski, K. Meesters, K. Bontcheva, D. Maynard. Helping Crisis Responders Find the Informative Needle in the Tweet Haystack. In Proceedings of 15th International Conference on Information Systems for Crisis Response and Management (ISCRAM), 20-23 May 2018, Rochester, US.
  • Paper: Sara Torsner, Jackie Harrison, Diana Maynard. "Monitoring violence against journalists: A methodology for comprehensive and systematic data collection", presented at World Press Freedom Day 2018, Accra, Ghana, 3 May 2018.
  • Journal paper: G. Resce and D. Maynard. What matters most to people around the world? Retrieving Better Life Index priorities on Twitter. Journal of Technological Forecasting & Social Change.
  • Conference Paper: Prashant Khare, Gregoire Burel, Diana Maynard and Harith Alani. Cross-Lingual Classification of Crisis Data. International Semantic Web Conference, October 8-12 2018, Monterey, California.
  • Conference Paper: Z. Zhang, J. Petrak, D. Maynard. Adapted TextRank for Term Extraction. SEMANTiCS 2018, Vienna, Austria, 10-13 September 2018.
  • Diana Maynard, together with Juan M. Kanai (Geography), has been awarded a British Academy grant via the call "Tackling the UK’s International Challenges", for a research project entitled "Social Understandings of Scale: The role of print and social media in the EU Referendum debate". The project is running from January 2018 to January 2019.
  • Diana Maynard, together with Jackie Harrison (Journalism Studies), has been awarded funding from Free Press Unlimited for a 6-month project, "Improving the monitoring of violence against journalists".

Papers accepted to COLING 2018:

  • Automated Fact Checking: Task Formulations, Methods and Future Directions. James Thorne and Andreas Vlachos.
  • Topic or Style? Exploring the Most Useful Features for Authorship Attribution. Yunita Sari, Mark Stevenson and Andreas Vlachos.
  • deepQuest: A Framework for Neural-based Quality Estimation. Julia Ive, Frédéric Blain and Lucia Specia.
  • Can Rumour Stance Alone Predict Veracity? Sebastian Dungs, Ahmet Aker, Norbert Fuhr and Kalina Bontcheva.
  • Zeerak Waseem is organising the 2nd Workshop on Abusive Language Online at EMNLP 2018 in Brussels, Belgium.
  • Zeerak Waseem recently co-organised the workshop Widening NLP at NAACL 2018.
  • Professor Jochen Leidner has become a member of the Industrial Liaison Board
  • Conference paper: Igbo Diacritic Restoration using Embedding Models. Ignatius Ezeani, Mark Hepple, Ikechukwu Onyenwe and Enemuo Chioma
  • Andreas Vlachos gave a talk at the University of Sussex on Imitation Learning, Zero-shot Learning and Automated Fact Checking
  • Andreas Vlachos gave a talk on Artificial Intelligence vs Misinformation with James Thorne at the Pint of Science Launch Festival in Sheffield
  • The FEVER shared task on fact checking claims against Wikipedia, using the 200K claims dataset described in our upcoming NAACL paper, has started!
  • Andreas Vlachos gave a talk at Benevolent.AI on automated fact checking
  • Andreas Vlachos gave a talk at the Ubiquitous Knowledge Processing Lab at Darmstadt on automated fact checking and imitation learning

Papers accepted to EACL 2017:

  • Continuous N-gram Representations for Authorship Attribution, Y. Sari, A. Vlachos, M. Stevenson, Proceedings of EACL: Volume 2, Short Papers
  • An Extensible Framework for Verification of Numerical Claims, J. Thorne, A. Vlachos, Proceedings of the Software Demonstrations
  • Book: Natural Language Processing for the Semantic Web, Diana Maynard, Kalina Bontcheva, Isabelle Augenstein. Morgan and Claypool, December 2016. ISBN:97816270590
  • Journal paper: A Framework for Real-time Semantic Social Media Analysis. Diana Maynard, Ian Roberts, Mark A. Greenwood, Dominic Rout and Kalina Bontcheva. Web Semantics: Science, Services and Agents on the World Wide Web, 2017
  • Conference paper: Towards an Infrastructure for Understanding and Interlinking Knowledge Co-Creation in European research, Diana Maynard, Adam Funk and Benedetto Lepori. ESWC 2017 Workshop on Scientometrics, Portoroz, Slovenia, May 2017
  • Diana Maynard taught 2 practical tutorials at the AI Seminar on Social Media Content Analysis, UPC Barcelona, May 2017
  • Diana Maynard gave an invited tutorial at the EU CLARIN-PLUS workshop on "Creation and Use of Social Media Resources", Lithuania, 2017
  • Diana Maynard gave an invited talk at 2017 Joint EC-OECD workshop on Semantic Technologies and Semantic Web: Structuring Data for STI Policy Analysis, 19 June, Brussels
  • Diana Maynard gave an invited talk at 2017 EPSRC The Future of Patent Analytics Workshop, 3 March, Cambridge, UK
  • The KNOWMAK project has started. A 3-year EC H2020 project from 1 Jan '17 - 31 Dec '20. The University of Sheffield PI is Diana Maynard.
  • Diana Maynard was Programme Chair of the ESWC conference in Portoroz, Slovenia in May.
  • Diana Maynard has won an ESRC-funded award from Understanding Society to access and analyse EU Referendum UK household survey data, for the project "Brexit narratives of place and scale: a media environment analysis of the EU Referendum debate". Co-PIs: Jackie Harrison (Journalism), J. Miguel Kanai (Geography).

Papers accepted for COLING 2016:

  • Representation and Learning of Temporal Relations. L. Derczynski (2016). COLING
  • Broad Twitter Corpus: A Diverse Named Entity Recognition Resource. L. Derczynski, K. Bontcheva, I. Roberts (2016). COLING
  • Stance classification in Rumours as a Sequential Task Exploiting the Tree Structure of Social Media Conversations. A. Zubiaga, E. Kochkina, M. Liakata, R. Procter, M. Lukasik (2016). COLING
  • Anita: An Intelligent Text Adaptation Tool. G. Paetzold, L. Specia. (2016). COLING
  • Understanding the Lexical Simplification Needs of non-Native Speakers of English. G. Paetzold, L. Specia. (2016). COLING
  • Collecting and Exploring Everyday Language for Predicting Psycholinguistic Properties of Words. G. Paetzold, L. Specia. (2016). COLING
  • Imitation learning for language generation from unaligned data. G. Lampouras, A. Vlachos. (2016). COLING
  • Carolina Scarton, Gustavo Paetzold and Lucia Specia will give a tutorial at COLING 2016, titled Quality estimation for language output applications
  • We are pleased to announce that Gustavo Paetzold has passed his PhD viva, having submitted only 2 years after joining as a PhD student.
  • Leon Derczynski will give a course at ESSLLI 2017 with Matteo Magnani, titled "Networks and User-generated Content"
  • Book in press in Springer Studies in Computational Intelligence: Automatically ordering events and times in text - Leon Derczynski
  • Diana Maynard has had an article on automatic sarcasm detection published in Quartz Magazine
  • Diana Maynard will give tutorials on NLP and Social Media Analysis at the 1st International Deep Learning, Big Data and Big Compute Camp, Rabat, Morocco, 24-28 October 2016.
  • Paper published in European Psychiatry: Novel psychoactive substances: an investigation of temporal trends in social media and electronic health records - A Kolliakou, M Ball, L Derczynski, D Chandran, G Gkotsis, P Deluca, R Jackson, H Shetty, R Stewart
  • Mark Stevenson and Adam Poulson are collaborating with ScHaRR and Human on a project to visualise emotion in social media at the Festival of the Mind.
  • Paper: An IR-based Approach Utilising Query Expansion for Plagiarism Detection in MEDLINE. R. Nawab, M. Stevenson and P. Clough (2016). IEEE/ACM Transactions on Computational Biology and Bioinformatics.
  • Paper: The Effect of Word Sense Disambiguation Accuracy on Literature Based Discovery. J. Preiss and M. Stevenson (2016). BMC Medical Informatics and Decision Making.
  • Paper: A Corpus of Potentially Contradictory Research Claims from Cardiovascular Research Abstracts. A. Alamri and M. Stevenson (2016). Journal of Biomedical Semantics, 7 (36).

Papers accepted for EMNLP 2016:

  • Stance Detection with Bidirectional Conditional Encoding. Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos and Kalina Bontcheva
  • Leon Derczynski has won an NVIDIA hardware grant for summary generation from collections of text.
  • Prof. Lucia Specia has been awarded an EC H2020 funded ERC Starting Grant. The project on Multimodal Context Modelling for Machine Translation (MultiMT) will start on 1 July 2016 for 5 years.

Papers accepted for ACL 2016:

  • Hawkes Processes for Continuous Time Sequence Classification: an Application to Rumour Stance Classification in Twitter. Michal Lukasik, P. K. Srijith, Duy Vu, Kalina Bontcheva, Arkaitz Zubiaga, Trevor Cohn.
  • Metrics for Evaluation of Word-level Machine Translation Quality Estimation. Varvara Logacheva, Michal Lukasik and Lucia Specia.

Papers accepted for TSD 2016:

  • Automatic Restoration of Diacritics for Igbo Language. Ignatius Ezeani, Mark Hepple and Ikechukwu Onyenwe.
  • Predicting Morphologically-Complex Unknown Words in Igbo. Ikechukwu Onyenwe and Mark Hepple
  • Paper nominated for Best Paper Award at WebSci 2016: Miriam Fernandez, Harith Alani, Lara Piccolo, Christoph Meili, Diana Maynard and Meia Wippoo. Talking Climate Change via Social Media: Communication, Engagement and Behaviour, May 22-25 2016, Hannover, Germany.
  • Diana Maynard taught a 3-hour practical tutorial at the AI Seminar on Social Media Content Analysis, UPC Barcelona, 9-13 May 2016.
  • Leon Derczynski is co-organising a workshop on Noisy User-generated Text (WNUT) at COLING in Osaka, Japan, 10th December 2016.
  • Diana Maynard will teach two 6-hour courses, "Introduction to NLP" and "Practical social media and sentiment analysis" at the University of Essex Big Data and Analytics Summer School in September 2016.
  • Andreas Vlachos will be speaking at the Lisbon Machine Learning Summer School about imitation learning for structured prediction.
  • Andreas Vlachos will be speaking at the Knowledge Representation Workshop at the University of Liverpool on 28th June 2016.
  • Paper: Noise reduction and targeted exploration in imitation learning for Abstract Meaning Representation parsing. James Goodman, Andreas Vlachos and Jason Naradowsky. ACL 2016.
  • Paper: Emergent: A novel data-set for stance classification. William Ferreira and Andreas Vlachos. NAACL 2016.
  • Paper: Large-scale Multitask Learning for Machine Translation Quality Estimation. Kashif Shah and Lucia Specia. NAACL 2016.
  • Paper: Phrase Level Segmentation and Labelling of Machine Translation Errors. Frederic Blain, Varvara Logacheva, and Lucia Specia. In Proc. of Language Resources and Evaluation Conference (LREC), May 2016, Portoroz, Slovenia
  • Paper: Challenges of Evaluating Sentiment Analysis Tools on Social Media. Diana Maynard and Kalina Bontcheva. In Proc. of Language Resources and Evaluation Conference (LREC), May 2016, Portoroz, Slovenia
  • Paper: Complementarity, F-score, and NLP Evaluation. Leon Derczynski. In Proc. of Language Resources and Evaluation Conference (LREC), May 2016, Portoroz, Slovenia
  • Paper: GATE-Time: Extraction of Temporal Expressions and Events. Leon Derczynski, Jannik Strötgen, Diana Maynard, Mark A. Greenwood, Manuel Jung. In Proc. of Language Resources and Evaluation Conference (LREC), May 2016, Portoroz, Slovenia
  • Dr. Diana Maynard has been awarded a grant for a fully-funded 4-year PhD student project by the Grantham Centre for Sustainable Futures, to start in October 2016, on the topic of disaster relief reporting and climate change. The Grantham Scholar will be supervised by Diana Maynard and co-supervised by Prof. Jacqueline Harrison from the Dept of Journalism and Prof. Shaun Quegan from the Centre for Terrestrial Carbon Dynamics.
  • The next annual GATE training course will be held from 6-10 June 2016.
  • Mark Stevenson was awarded a grant from Defence Science and Technology Laboratory: "Hypothesis Generation and Visualisation from Data"
  • Paper: A Graph-based Approach to Topic Clustering for Online News. Ahmet Aker, Emina Kurtic, Balamurali Andiyakkal Rajendran, Monica Paramita, Emma Barker, Mark Hepple and Rob Gaizauskas. ECIR 2016.
  • Paper: Automated Content Analysis: A Sentiment Analysis on Malaysian Government Social Media. Siti Salwa Hasbullah and Diana Maynard. In Proc. of ACM International Conference on Ubiquitous Information Management and Communication (IMCOM), January 2016, Danang, Vietnam.
  • The COMRADES project has started. A 3 year EC H2020 project from 1 Jan'16 - 31 Dec'18. The University of Sheffield PI is Prof. Kalina Bontcheva
  • We are pleased to announce two new NLP Professors: Kalina Bontcheva and Lucia Specia have both been promoted to Personal Chair.

Older news

Older news stories


Scheduled Speakers

2018 - 2019

4 July 2019 - Daniel Beck (University of Melbourne) - Natural Language Generation in the Wild

Traditional research in NLG focuses on building better models and assessing their performance using clean, preprocessed and curated datasets, as well as standard automatic evaluation metrics. From a scientific point-of-view, this provides a controlled environment where different models can be compared and robust conclusions can be made.
However, these controlled settings can drastically deviate from scenarios that happen when deploying systems in the real world. In this talk, I will focus on what happens *before* data is fed into NLG systems and what happens *after* we generate outputs. For the first part, I will focus on addressing heterogeneous data sources using tools from graph theory and deep learning. In the second part, I will talk about how to improve decision making from generated texts through Bayesian techniques, using Machine Translation post-editing as a test case.

Bio: Daniel is a Lecturer at The University of Melbourne. His main research topic is Natural Language Generation, with a focus on Machine Translation. He is particularly interested in using tools from Machine Learning, Theoretical Computer Science and Statistics to address challenges in NLG that go beyond the usual input-output pipeline. He obtained a PhD from The University of Sheffield, United Kingdom, and his thesis on using Gaussian Processes for NLP applications received a Best Thesis Award from the European Association for Machine Translation. Daniel is also an advocate for queer and LGBT+ visibility in STEM, in particular within NLP and Machine Learning. He is currently a board member of the Widening NLP initiative, which fosters inclusivity for underrepresented groups in NLP.

23 May 2019 - Karin Verspoor (University of Melbourne) - Natural Language Processing (NLP) for structuring complex biomedical texts: progress and remaining challenges

The NLP community has been focused on methods for identifying and extracting key concepts and relations from highly specialised and terminology-rich texts; these texts have posed a challenge to general NLP tools as well as providing an opportunity to explore the robustness of relation extraction methods to domain-specific applications. In this talk I will present our recent studies with graph kernels and neural methods for relation extraction from the biomedical literature, present empirical work on core supporting tasks such as syntactic analysis of these texts, and discuss open challenges for work in this direction and beyond.

Bio: Karin Verspoor is a Professor in the School of Computing and Information Systems and Deputy Director of the Health and Biomedical Informatics Centre at the University of Melbourne. Trained as a computational linguist, Karin’s research primarily focuses on extracting information from clinical texts and the biomedical literature using machine learning methods to enable biological discovery and clinical decision support. Karin held previous posts as the Scientific Director of Health and Life Sciences at NICTA Victoria Research Laboratory, at the University of Colorado School of Medicine, and Los Alamos National Laboratory. She also spent 5 years in start-ups during the US Tech bubble, where she helped design an early artificial intelligence system.

11 April 2019 - Ryan Cotterell (University of Cambridge) - Probabilistic Typology: Deep Generative Models of Vowel Inventories

Linguistic typology studies the range of structures present in human language. The main goal of the field is to discover which sets of possible phenomena are universal, and which are merely frequent. For example, all languages have vowels, while most—but not all—languages have an [u] sound. In this paper we present the first probabilistic treatment of a basic question in phonological typology: What makes a natural vowel inventory? We introduce a series of deep stochastic point processes, and contrast them with previous computational, simulation-based approaches. We provide a comprehensive suite of experiments on over 200 distinct languages.

Bio: Ryan is a lecturer (≈assistant professor) of computer science at the University of Cambridge. He specializes in natural language processing, computational linguistics and machine learning, focusing on deep learning and statistical approaches to phonology, morphology, linguistic typology and low-resource languages. He will receive his Ph.D. in Spring 2019 from the computer science department of the Johns Hopkins University, where he was affiliated with the Center for Language and Speech Processing; he was co-advised there by Jason Eisner and David Yarowsky. He has received best paper awards at ACL 2017 and EACL 2017 and two honorable mentions for best paper at EMNLP 2015 and NAACL 2016. Previously, he was a visiting Ph.D. student at the Center for Information and Language Processing at LMU Munich supported by a Fulbright Fellowship and a DAAD Research Grant under the supervision of Hinrich Schütze. His PhD was supported by an NDSEG graduate fellowship, the Frederick Jelinek Fellowship, and a Facebook Fellowship.

4 April 2019 - Walid Magdy (University of Edinburgh) - Online Users' Behaviour Understanding and Prediction with Data Science

Considerable public concern has emerged recently about what social media data can reveal about users. In this talk, some examples are presented of how “public” social media data could be explored with data science to predict users’ behaviour and societal trends, including public interest, individual preferences, and personal information. Example studies on the US election, hate speech, opinion change, and fake accounts are covered in this talk.

Bio: Walid Magdy is an assistant professor at the School of Informatics, University of Edinburgh (UoE), and a faculty fellow at the Alan Turing Institute. His main research interests include computational social science, information retrieval, and data mining. He holds a PhD from the School of Computing at Dublin City University (DCU), Ireland. He has an extensive industrial background from earlier work at IBM, Microsoft, and QCRI. Walid has over 60 peer-reviewed articles published in top-tier conferences and journals. He also has 9 patents filed under his name. Some of his work has been featured in the popular press, including CNN, BBC, Washington Post, National Geographic, and MIT Technology Review.

28 March 2019 - Arpit Mittal (Amazon Research Cambridge) - Learning when not to answer

I will talk about our recent work where we investigate the challenges of using reinforcement learning agents for question-answering over knowledge graphs for real-world applications. We examine the performance metrics used by state-of-the-art systems and determine that they are inadequate for such settings. More specifically, they do not evaluate the systems correctly for situations when there is no answer available and thus agents optimized for these metrics are poor at modelling confidence. We introduce a simple new performance metric for evaluating question-answering agents that is more representative of practical usage conditions, and optimize for this metric by extending the binary reward structure used in prior work to a ternary reward structure which also rewards an agent for not answering a question rather than giving an incorrect answer. We show that this can drastically improve the precision of answered questions while only not answering a limited number of previously correctly answered questions. Employing a supervised learning strategy using depth-first-search paths to bootstrap the reinforcement learning algorithm further improves performance.
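The ternary reward idea in this abstract can be illustrated with a minimal Python sketch. It is not the speaker's implementation: the function names, the use of `None` to mean "no answer", and the reward values (1.0 for a correct answer, 0.5 for declining to answer, 0.0 for a wrong answer) are all assumptions chosen for illustration. The abstract says only that not answering is rewarded relative to answering incorrectly.

```python
from typing import Optional, Sequence, Tuple


def ternary_reward(
    prediction: Optional[str],
    gold: Optional[str],
    r_correct: float = 1.0,   # assumed value
    r_abstain: float = 0.5,   # assumed value: abstaining beats a wrong answer
    r_wrong: float = 0.0,     # assumed value
) -> float:
    """Reward with three outcomes: correct, abstained, or wrong."""
    if prediction is None:          # the agent declined to answer
        return r_abstain
    return r_correct if prediction == gold else r_wrong


def precision_and_answer_rate(
    predictions: Sequence[Optional[str]],
    golds: Sequence[Optional[str]],
) -> Tuple[float, float]:
    """Precision over answered questions, and the fraction answered."""
    answered = [(p, g) for p, g in zip(predictions, golds) if p is not None]
    if not answered:
        return 0.0, 0.0
    precision = sum(p == g for p, g in answered) / len(answered)
    return precision, len(answered) / len(predictions)


# Toy data (hypothetical): the agent abstains on the second question.
preds = ["Paris", None, "Rome", "Oslo"]
golds = ["Paris", "Lima", "Madrid", "Oslo"]
rewards = [ternary_reward(p, g) for p, g in zip(preds, golds)]
precision, answer_rate = precision_and_answer_rate(preds, golds)
```

Under a binary reward the abstention and the wrong answer would score the same, so an agent has no incentive to model its own confidence. The middle reward value is what separates the two cases.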

Bio: Dr Arpit Mittal is a Senior Machine Learning Scientist at Amazon Research Cambridge. He is currently working on projects involving knowledge extraction, information retrieval and question answering. Before joining Amazon, Arpit worked on augmented reality (AR) and made fundamental contributions to an industrial AR SDK: Vuforia. He received his PhD from the University of Oxford in Computer Vision and Machine Learning. Within Amazon, Arpit manages the research internship program for their Cambridge UK office.

21 March 2019 - Vlad Niculae (Instituto de Telecomunicações, Lisbon, Portugal) - Learning with Sparse Latent Structure

Structured representations are a powerful tool in machine learning, and in particular in natural language processing: The discrete, compositional nature of words and sentences leads to natural combinatorial representations such as trees, sequences, segments, or alignments, among others. At the same time, deep, hierarchical neural networks with latent representations are increasingly widely and successfully applied to language tasks. Deep networks conventionally perform smooth, soft computations resulting in dense hidden representations.

We study deep models with structured and sparse latent representations, without sacrificing differentiability. This allows for fully deterministic models which can be trained with familiar end-to-end gradient-based methods. We demonstrate sparse and structured attention mechanisms, as well as latent computation graph structure learning, with successful empirical results on large scale problems including sentiment analysis, natural language inference, and neural machine translation.

Joint work with Claire Cardie, Mathieu Blondel, and André Martins.

Bio: Vlad is a postdoc in the DeepSpin project at the Instituto de Telecomunicações in Lisbon, Portugal. His research aims to bring structure and sparsity to neural network hidden layers and latent variables, using ideas from convex optimization, and motivations from natural language processing. He earned a PhD in Computer Science from Cornell University in 2018, advised by Claire Cardie. He is co-organizing the NAACL 2019 Workshop on Structured Prediction for NLP and the ACL 2019 Tutorial on Latent Structure Models for NLP.

12 March 2019 - Alfredo Kalaitzis (Element AI) - Enabling human rights experts through data-science and machine learning

I will present our lab's joint work with Amnesty International, leveraging crowd-sourcing to study online abuse against women on Twitter. This is the first hand-in-hand collaboration between human rights activists and machine learners. On a technical front, we carefully curate an unbiased yet low-variance dataset of labeled tweets, analyze it to account for the variability of abuse perception, and establish baselines, preparing it for release to community research efforts. On a social impact front, this study provides the technical backbone for a media campaign aimed at raising public and deciders’ awareness and elevating the standards expected from social media companies.


Bio: Alfredo is a Research Engineer in the AI for Good lab in London, working on applications that enable NGOs.
He is one of the primary co-authors of the first technical report made in partnership with Amnesty International, on the large-scale study of online abuse against women on Twitter from crowd-sourced data.
His research interests lie in probabilistic modeling, stochastic inference, dimensionality reduction, and optimization.
Prior to joining Element AI, he was a Senior Data Scientist at Digital Shadows, specializing in cyber-security and digital risk management, and a consulting Data Scientist in Microsoft's Xbox EMEA team, where he also collaborated with Microsoft Research Cambridge.
He has been a research scientist with the Department of Statistical Science at University College London, working on probability models to better understand ordinal data coming from surveys, and later on the interface of Machine Learning and Signal Processing to detect faults in the low-voltage power-line grid. During his time with UCL his team won the first data challenge competition organized by the Royal Statistical Society, for which he designed and developed his team's algorithm for the analysis of resting fMRI time-series data.
He earned his MSc in Artificial Intelligence from the University of Edinburgh, and his PhD in Machine Learning from the University of Sheffield under the supervision of Professor Neil Lawrence. His PhD research led to contributions in probability methods for the dimensionality reduction of data and developed methods for gene-expression time-series to discover genetic factors of disease.

6 March 2019 - Yorick Wilks (The University of Sheffield / IHMC) - Moral Orthoses: a new approach to human and machine ethics

I argue that both human and machine actions are more opaque than is generally realized, and that both will require explanation of the kind an ethical orthosis might provide, as an aspect of artificial Companions for both human and machine actors.
These explanations might well be closer to ethical accounts based on moral sentiment or emotion in the tradition of the primacy of sentiment over reason in this area of human and machine action.

24 January 2019 - Loïc Barrault (LIUM, University of Le Mans) - Some recent work on neural machine translation

Neural Machine Translation systems are more and more effective. However, they are still far from reaching the human level.
One of the reasons is that the machine uses text only and lacks general context. I will present our recent research on integrating visual information as context into an NMT system. I will then discuss the quantitative and qualitative aspects of the results obtained.

29 November 2018 - Adam Tsakalidis (University of Warwick) - Nowcasting User Behaviour with Social Media and Smart Devices

The adoption of social media and smart devices by millions of users worldwide over the last decade has resulted in an unprecedented opportunity for natural language processing and social sciences. Users publish their thoughts and opinions on everyday issues through social media platforms, while they record their digital traces through their smart devices. Mining these rich resources offers new opportunities in sensing real-world events and indices in a longitudinal fashion. This talk will focus on how to utilise such user-generated content in order to "nowcast" (i.e., predict the current state of) user-specific (a) political and (b) mental health indices, under a real-world and longitudinal setting. The talk will be divided into two parts. In the first part, we will focus on mining social media to infer user voting intention. We model social media users based on the content they share and their network structure over time, aiming to nowcast their political stance under a time-constrained setting (i.e., Greek bailout referendum 2015). In the second part, we will also account for heterogeneous information sources about the user (e.g., information derived from users' smart phones, SMS and social media messages), aiming this time to nowcast time-varying and user-specific mental health indices on a longitudinal basis. We will emphasise the importance of sticking to a real-world evaluation setting and present the challenges that current state-of-the-art methods face when tested under such an evaluation framework. Finally, we will outline open challenges in both domains and provide directions for future research.

Bio: Adam Tsakalidis is a final stage PhD candidate at the University of Warwick (Supervisors: A. I. Cristea and M. Liakata) and is currently working as a Research Associate at The Alan Turing Institute. He holds a PG Diploma in Computer and Communications Engineering (University of Thessaly, Greece) and a MSc in Computer Science and Applications (University of Warwick). Before his PhD, he had worked as a Research Assistant in the SocialSensor project (CERTH/ITI, Greece). His research interests lie in the area of natural language processing, with a particular focus on the longitudinal modelling of user-generated information as a step towards real-time monitoring of real-world indices.

7 November 2018 - Yanai Elazar (Bar-Ilan University) - Adversarial Removal of Demographic Attributes from Text Data

Recent advances in Representation Learning and Adversarial Training seem to succeed in removing unwanted features from the learned representation. We show that demographic information of authors is encoded in -- and can be recovered from -- the intermediate representations learned by text-based neural classifiers. The implication is that decisions of classifiers trained on textual data are not agnostic to -- and likely condition on -- demographic attributes. When attempting to remove such demographic information using adversarial training, we find that while the adversarial component achieves chance-level development-set accuracy during training, a post-hoc classifier, trained on the encoded sentences from the first part, still manages to reach substantially higher classification accuracies on the same data. This behavior is consistent across several tasks, demographic properties and datasets. We explore several techniques to improve the effectiveness of the adversarial component. Our main conclusion is a cautionary one: do not rely on the adversarial training to achieve invariant representation to sensitive features.

18 October 2018 - Shadrock Roberts (Ushahidi) - Natural Language Processing for Humanitarian Response: a view from the field

Drawing on real-life case studies from Nepal, Indonesia, and Kenya, I will provide an overview of how crowdsourced and social media data are used or ignored in humanitarian response and the challenges they pose for practitioners. I will then present early-stage software prototypes, designed to respond to these challenges, that use the GATE open source NLP toolkit to identify context, actionability, and veracity in social media and crowdsourced data in order to speed and prioritize the delivery of humanitarian aid. Speaking as a practitioner, I will also propose avenues for impactful research and design to help increase the adoption of new tools and methods.

Bio: Shadrock Roberts is a humanitarian geographer and the Director of resilience and research programs at the Kenyan non-profit, Ushahidi, which builds open source software to crowdsource information for humanitarian response. He has worked for a variety of humanitarian and development organizations in multiple countries and holds a Ph.D. in Geography from the University of Georgia. His career has focused on the intersection of geographic information systems, information and communication technologies, and community engagement to improve the availability of data for humanitarian and development assistance. He has only recently learned what a “chip butty” is, and remains unclear on the concept.

2017 - 2018

7 June 2018 - Peter Cochrane (University of Suffolk) - Self Awareness: The Next BIG Breakthrough in NLP

For more than 50 years the dream of talking to a machine at a (human) conversational level has always been 30 years in the future. However, recent advances in computer, sensor, network, robotic, and mobile device hardware have brought that horizon much closer. In short, transistor density and connectivity per chip, along with network complexity, crossed a critical threshold and accelerated the abilities of AI.
We know that NLP is critically dependent on context and cognition, plus the most vital element - self-awareness; and whilst context is easily established, cognition, and even more so ‘self-awareness’ remain hotly debated and in the future.
Here we present an entropic quantification of AI which is intuitively extended to AL and life in general. Self-awareness is then identified as an emergent property which we attempt to place on a realistic time-line.

22 March 2018 - Marco Damonte (University of Edinburgh) - Natural Language Understanding with Abstract Meaning Representation

Abstract meaning representation (Banarescu et al., 2013), or AMR for short, is a semantic representation that provides sentences with a deep semantic interpretation. AMR includes most of the shallow-semantic NLP tasks that are usually addressed separately, such as named entity recognition, semantic role labeling and coreference resolution. AMR is not an interlingua, but AMR graphs can be exploited for a number of NLP tasks such as machine translation, summarisation and paraphrasing. Text-to-AMR (parsing) and AMR-to-text (generation) are, however, far from providing and using sufficiently accurate graphs for downstream applications. Moreover, not much work has been carried out on AMR for languages other than English. In this talk I’ll present my work on addressing these issues.

1 March 2018 - Wang Ling (Google DeepMind)

1 February 2018 - Johannes Welbl (University College London) - Constructing Datasets for Multi-hop Reading Comprehension Across Documents

Contemporary Reading Comprehension (RC) datasets — SQuAD, TriviaQA, etc. — are dominated by queries that can be answered with a single paragraph or document. However, enabling models to combine pieces of textual information from different sources would drastically extend the scope of RC. In this talk, I will introduce a novel Multi-hop RC task, where a model has to learn how to find and combine disjoint pieces of textual evidence, effectively performing multi-step (alias multi-hop) inference.
I present two datasets, WikiHop and MedHop, from different domains — both constructed using a unified methodology. I will then discuss the behaviour of several baseline models, including two established end-to-end RC models, BiDAF and FastQA. For example, one model is in fact capable of integrating information across documents, but both models struggle to select relevant information.
Overall the end-to-end models outperform multiple baselines, but their best accuracy is still far behind human performance, leaving ample room for model improvement. It is our hope that these new datasets will drive future RC model development, leading to new and improved applications in areas such as Search, Question Answering, and Fact Checking.

18 January 2018 - Horacio Saggion (Universitat Pompeu Fabra) - Mining and Enriching Scientific Text Collections

In the current online Open Science context, scientific datasets and tools for deep text analysis, visualization and exploitation play a major role. I will present a system developed over the past three years for “deep” analysis and annotation of scientific text collections. After a brief overview of the system and its main components, I will present our current work on the development of a bilingual (Spanish and English) fully annotated text resource in the field of natural language processing that we have created with our system. Moreover, a faceted-search and visualization system to explore the created resource will also be discussed.

I will take the opportunity to present further areas of research carried out in our Natural Language Processing group.

7 December 2017 - Miquel Espla-Gomis (Universitat d'Alacant) - Identifying insertion positions in word-level machine translation quality estimation

Machine translation (MT) quality estimation (QE) is the task of predicting the quality of a translation produced by an MT system without having a reference translation. At the level of sentences, quality is usually estimated in terms of the effort required to fix the translation, trying to predict metrics such as translation error rate (TER) or post-editing time. When it comes to word level, QE is usually tackled as the task of identifying which words in the translation need to be replaced or deleted. The main advantage of word-level MT QE over sentence- or document-level MT QE is that it can be used to help post-editors focus their attention on those parts of the translation that need to be fixed. However, with the current approach of only identifying the words that need to be fixed, post-editors using word-level MT QE could be disregarding missing words. In order to improve the performance of such systems, we propose an approach capable of identifying both the words that need to be deleted and the positions where one or more words need to be inserted. The work presented compares different types of simple neural network architectures that build on different sources of bilingual information in order to provide such predictions. The results obtained not only confirm the feasibility of the approach proposed, but also show that a reasonably high performance on both tasks can be obtained using relatively simple architectures.

16 November 2017 - Zeerak Waseem (The University of Sheffield) - Why the F*ck do You Talk Like That?

Over the past year, abusive language detection has received a surge in interest from the NLP community. In spite of this surge in interest, very little work grounds itself in the social scientific theories on abusive language. In addition, little work deals with the social contexts surrounding abusive statements or with bridging the gaps that are introduced by switching to different social contexts.

9 November 2017 - Yorick Wilks (University of Sheffield / Florida Institute for Human & Machine Cognition) - Will there be superintelligence and would it hate us?

The paper examines Bostrom’s notion of Superintelligence and argues that, although we should not be sanguine about the future of AI or its potential for harm, superintelligent AI is highly unlikely to come about in the way Bostrom imagines.

2 November 2017 - Emem Rita Usanga (Bnkability) - Rethinking how deals investment is raised in Africa using NLP

With a $100bn annual infrastructure funding deficit over the next 10 years and a population anticipated to double by 2045, the need for infrastructure across the African continent is pressing. Governments acknowledge that this need can only be met in partnership with private investors. The problem: international private investors often argue there's a lack of bankable projects in Africa.

The issue of bankability, in other words investability, is one that Bnkability aims to solve using NLP to identify and design bankable projects. Some of the key challenges we face include:

  • How do you take knowledge-based information and turn it into data that a machine can assess uploaded project information against?
  • How does a machine identify missing elements within a project plan, that is, not simply a missing word but missing sections, for instance risk mitigation?
  • And even if it is present, how does it assess how robust the risk mitigation strategy is against sector, country and environmental factors?

This is an interactive session where we present our challenges in the application of NLP to our business solution and attendees propose possible solutions.

26 October 2017 - NLP Student Talks

Chiraag Lala - Multimodal Lexical Translation

Inspired by the tasks of Multimodal Machine Translation and Visual Sense Disambiguation we introduce a task called Multimodal Lexical Translation (MLT). The aim of this new task is to correctly translate an ambiguous word given its context - an image and a sentence in the source language. To facilitate the task, we introduce the MLT datasets, where each data point is a 4-tuple consisting of an ambiguous source word, its visual context (an image), its textual context (a source sentence), and its translation that conforms with the visual and textual contexts. The dataset has been created from the Multi30K corpus using word-alignment followed by human inspection for English to German and English to French language directions. These datasets form a very valuable multimodal and multilingual language resource with several potential uses including evaluation of lexical disambiguation within (Multimodal) Machine Translation systems.

Fernando Manchego - Sentence Simplification via Sequence Labeling

Text Simplification aims to modify the content and structure of a text, in order to make it easier to read and understand. At the sentence level, several rewriting operations can be performed to achieve this goal: replacing complex words or phrases with simpler synonyms, deleting unimportant content, splitting the sentence, etc. Most research treats sentence simplification as machine translation (MT), with complex and simple as source and target languages, respectively. In this talk, we will first present an in-depth analysis of the potential and limitations of end-to-end MT-style models using automatic and manual evaluations. To deal with some of the identified problems, we devise a two-step sequence labeling method: (i) identify the simplification operations that need to be performed (if any) on each token of the sentence, and (ii) execute the operation using transformation-specific strategies. We show that this operation-based approach is able to produce simpler texts than end-to-end models.

19 October 2017 - Kris Cao (University of Cambridge) - Latent variable models of language

Behind the observed surface form of language exist underlying structures and themes, such as syntax, topic and utterance intent. In this talk, I will present some work which composes graphical models to learn underlying variables with powerful data likelihood functions to model the observed surface form. One such application is in open-domain dialogue modelling, where the latent variables capture the variation in the possible responses to a user utterance. We show that the latent variable approach generates more acceptable diverse output, as measured by human annotators. Another is extending topic models to instead learn topics underlying entire sentences, rather than just words. This lets the model learn topics which capture compositional meaning, which a standard word-level model has difficulty doing.

12 October 2017 - Shashi Narayan (University of Edinburgh) - Text-to-text Generation Beyond Machine Translation

In recent years we have witnessed the achievements of sequence-to-sequence encoder-decoder models for machine translation.
It is no surprise that these models are also setting a trend in various other generation tasks such as dialogue generation, image caption generation, sentence compression, paraphrase generation, sentence simplification and document summarization. Yet, these deep learning sequence models are often applied off-the-shelf to these text-to-text generation tasks, without tailoring the underlying model to the specific task to improve performance.

In this talk I will discuss two examples, sentence simplification and document summarization, that explore the hypothesis that tailoring the model with knowledge of the task structure and linguistic requirements leads to better performance. In the first part, I will propose a new sentence simplification task (split-and-rephrase) where the aim is to split a complex sentence into a meaning preserving sequence of shorter sentences. I will show that the semantically-motivated split model is a key factor in generating fluent and meaning preserving rephrasings.
In the second part, I will discuss the shortcomings of sequence-to-sequence abstractive methods for document summarization and show that an extractive summarization system trained to globally optimize a common summarization evaluation metric outperforms state-of-the-art extractive and abstractive systems in both automatic and extensive human evaluations.

Bio: Shashi Narayan is a postdoctoral researcher in the School of Informatics at the University of Edinburgh. He obtained his PhD in Computer Science at the University of Lorraine, INRIA under Claire Gardent in 2014. His research focuses on natural language generation and understanding with an aim to develop general frameworks for generation from underlying meaning representation or for text rewriting such as summarization, text simplification and paraphrase generation. He also has experience with parsing and other structured prediction problems.

4 September 2017 - Thushari Atapattu (University of Adelaide) - Discourse Analysis of Educational Big Data

Discourse analysis within the educational context consists of processing natural language data generated from learning and teaching processes including written assessments, transcripts, discussion forums, and micro blogs. Computational approaches for discourse analysis integrate NLP with psychological theories of social interaction, discourse comprehension, and communication. Discourse analysis is a complex problem, particularly within massive classrooms (e.g. Massive Open Online Courses – MOOCs). In this talk, I will discuss two of our research studies on understanding the academic discourse of lecturers as well as learner-generated discourse in MOOCs. Our work aims to detect learners’ video interaction patterns and to inform us of the influence of the quality of lecturers’ discourse. This work analyses millions of video interactions in two MOOCs and finds that transitions in discourse (i.e. lexical diversity, connectivity) affect learners’ video engagement behaviour. Further, I will talk about the association between the quality of learner-generated discourse (i.e. discussion posts) and its impact on learning success. Finally, I will explain how the understanding of discourse enables us to identify interventions for positive student trajectories.

Past seminars

Reading Group

NLP Reading Group

The target audience is all the members of the NLP group and other possible interested participants.

The meeting will take place weekly for one hour usually on Mondays from 1-2:30pm.

The meetings of the group will be informal, and no preparation will be required beyond the moderator reading the current paper and the other participants having at least a brief overview of it.

Full details of the reading group can be found at

Past Meetings

Monday 11 March 2019
Transfer Learning from LDA to BiLSTM-CNN for Offensive Language Detection in Twitter

Friday 1 March 2019

ACL Review Session

Monday 25 February 2019

Explaining Recurrent Neural Network Predictions in Sentiment Analysis
ACL'17 paper by Arras et al.

Monday 18 February 2019

Universal Transformer by Dehghani et al

Related blog post:
OpenReview for ICLR 2019:

Monday 11 February 2019

Parameter-Efficient Transfer Learning for NLP
by Neil Houlsby, Andrei Giurgiu, et al., 2018

Monday 4 February 2019

Universal Language Model Fine-tuning for Text Classification
Howard and Ruder (ACL 2018)

Monday 28 January 2019

Unsupervised Neural Text Simplification

Tuesday 12 June 2018

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Chelsea Finn, Pieter Abbeel, Sergey Levine, ICML 2017
Blog post about the paper by the authors

Tuesday 10 April 2018

Style Transfer from Non-Parallel Text by Cross-Alignment

Shen, T; Lei, T; Barzilay, R; Jaakkola, T.

Tuesday 3 April 2018

Generating Natural Adversarial Examples

Zhengli Zhao, Dheeru Dua and Sameer Singh

Tuesday 20 February 2018

ACL Paper submission feedback session

Tuesday 13 February 2018

Unbounded cache model for online language modeling with open vocabulary

Edouard Grave, Moustapha Cisse & Armand Joulin

Tuesday 6 February 2018

Neural Sequence Learning Models for Word Sense Disambiguation

Alessandro Raganato, Claudio Delli Bovi & Roberto Navigli

Tuesday 30 January 2018

End-to-End Differentiable Proving

Tim Rocktäschel & Sebastian Riedel

Tuesday 23 January 2018

Supervised Learning of Universal Sentence Representations from NLI Data.

Tuesday 28 November 2017

Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Melvin Johnson, Mike Schuster, Quoc V. Le, et al.

Tuesday 14 November 2017

Representations of language in a model of visually grounded speech signal

Grzegorz Chrupała, Lieke Gelderloos & Afra Alishahi

Tuesday 7 November 2017

A Class of Submodular Functions for Document Summarization

Hui Lin & Jeff Bilmes

Tuesday 31 October 2017

Question Generation for Question Answering

Nan Duan, Duyu Tang, Peng Chen & Ming Zhou

Tuesday 24 October 2017

Morphological Inflection Generation with Hard Monotonic Attention

Roee Aharoni & Yoav Goldberg

Tuesday 17 October 2017

A Factored Neural Network Model for Characterizing Online Discussions in Vector Space

Hao Cheng, Hao Fang, Mari Ostendorf

Tuesday 10 October 2017

Understanding Black-box Predictions via Influence Functions

Pang Wei Koh, Percy Liang; Published in Proceedings of International Conference on Machine Learning, 2017

Tuesday 3 October 2017

Zero-Shot Relation Extraction via Reading Comprehension

Omer Levy, Minjoon Seo, Eunsol Choi and Luke Zettlemoyer

Tuesday 19 September 2017

"Men also like shopping: Reducing Gender Bias Amplification Using Corpus Level Constraints"

Tuesday 29 August 2017

Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3309-3318, 2017.

Tuesday 22 August 2017

Split and Rephrase, Accepted for EMNLP 2017

Shashi Narayan, Claire Gardent, Shay B. Cohen and Anastasia Shimorina

Tuesday 15 August 2017

Attention Is All You Need
A new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely

Tuesday 8 August 2017

Learning to Compute Word Embeddings On the Fly

Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio

Tuesday 1 August 2017

Learning to Generate Textual Data, EMNLP 2016
Guillaume Bouchard and Pontus Stenetorp and Sebastian Riedel

Tuesday 11 July 2017

SoundNet: Learning Sound Representations from Unlabeled Video

Yusuf Aytar, Carl Vondrick, Antonio Torralba

Tuesday 4 July 2017

Sentence Simplification with Deep Reinforcement Learning

Xingxing Zhang, Mirella Lapata

Tuesday 27 June 2017

Generation and Comprehension of Unambiguous Object Descriptions

Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, Kevin Murphy

Tuesday 20 June 2017

Understanding the BPE algorithm

Tuesday 13 June 2017

Sequence-to-Sequence Models Can Directly Transcribe Foreign Speech

Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen

Tuesday 6 June 2017

Convolutional Sequence to Sequence Learning

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

Tuesday 30 May 2017

Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems

Wang Ling, Dani Yogatama, Chris Dyer, Phil Blunsom

Tuesday 9 May 2017

Chatterjee et al.: Online Automatic Post-editing for MT in a Multi-Domain Translation Environment

Tuesday 6 May 2017

Convolutional Sequence to Sequence Learning

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

Tuesday 2 May 2017

Coarse-to-Fine Question Answering for Long Documents

Tuesday 25 April 2017

Re-evaluating Automatic Metrics for Image Captioning

Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem

Tuesday 18 April 2017

Neural Tree Indexers, EACL2017

Tuesday 11 April 2017

EACL Recap

Tuesday 4 April 2017

Shakir Mohamed's deep learning overview

Tuesday 28 March 2017

Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond

Tuesday 21 March 2017

Unsupervised AMR-Dependency Parse Alignment

Tuesday 14 March 2017

Kim et al. (2016): Examples are not Enough, Learn to Criticize! Criticism for Interpretability, NIPS 2016

Tuesday 7 March 2017

Latent Variable Dialogue Models and their Diversity

Kris Cao and Stephen Clark

Tuesday 28 February 2017

Zhang et al. EACL2017

Tuesday 21 February 2017

Structured Attention Networks

Tuesday 14 February 2017

CORE: Context-Aware Open Relation Extraction with Factorization Machines

by Fabio Petroni, Luciano Del Corro and Rainer Gemulla

Tuesday 7 February 2017

Adversarial Training Methods for Semi-Supervised Text Classification

Takeru Miyato, Andrew M. Dai, Ian Goodfellow

Tuesday 31 January 2017

Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing

Tim Vieira and Jason Eisner

Tuesday 24 January 2017

Matching Networks for One Shot Learning

Oriol Vinyals, Charles Blundell, Tim Lillicrap, Koray Kavukcuoglu, Daan Wierstra

Tuesday 17 January 2017

Learning Structured Predictors from Bandit Feedback for Interactive NLP. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL). Berlin, Germany

Artem Sokolov, Julia Kreutzer, Christopher Lo, Stefan Riezler

Tuesday 13 December 2016

Optimization and Sampling for NLP from a Unified Viewpoint

Marc Dymetman, Guillaume Bouchard, Simon Carter

Tuesday 6 December 2016

Matrix Completion has No Spurious Local Minimum

Rong Ge, Jason D. Lee, Tengyu Ma

Tuesday 29 November 2016

Compositional Semantic Parsing on Semi-Structured Tables 
Panupong Pasupat and Percy Liang

Tuesday 22 November 2016

Minimum Risk Training for Neural Machine Translation 
Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, Yang Liu

Tuesday 15 November 2016

Generation from Abstract Meaning Representation using Tree Transducers 
Jeffrey Flanigan, Chris Dyer, Noah A. Smith and Jaime Carbonell

Tuesday 1 November 2016

Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels. Transactions of the Association for Computational Linguistics, 2016.
Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Leah Findlater, Jordan Boyd-Graber, and Niklas Elmqvist

Tuesday 25 October 2016

Learning to Search Better than your Teacher

Chang et al. ICML 2015

Tuesday 11 October 2016

A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task 
Danqi Chen, Jason Bolton, Christopher D. Manning

Tuesday 4 October 2016

Ultradense Word Embeddings by Orthogonal Transformation 
Sascha Rothe, Sebastian Ebert, Hinrich Schütze

Tuesday 7 June 2016

Not All Character N-grams Are Created Equal: A Study in Authorship Attribution. 
Upendra Sapkota, Steven Bethard, Manuel Montes-y-Gómez & Thamar Solorio (2015)

Tuesday 31 May 2016

Relation extraction with matrix factorization and universal schemas.

Riedel, S., Yao, L., McCallum, A., & Marlin, B. M. (2013)

Tuesday 10 May 2016

Training Deterministic Parsers with Non-Deterministic Oracles, TACL

Goldberg, Y. and Nivre, J. (2013)

Tuesday 3 May 2016

A New Corpus and Imitation Learning Framework for Context-Dependent Semantic Parsing 
Vlachos, A. and Clark, S.

Tuesday 22 April 2016

Sequence Level Training with Recurrent Neural Networks
Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba

Tuesday 22 March 2016

"Distributed Representations of Sentences and Documents"
Quoc Le and Tomas Mikolov

Tuesday 8 March 2016

AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes 
Sascha Rothe; Hinrich Schütze. ACL2015 (best student paper)

Tuesday 23 February 2016

From Word Embeddings To Document Distances 
Kusner et al.

Tuesday 16 February 2016

"Target-Dependent Twitter Sentiment Classification with Rich Automatic Features"

Tuesday 9 February 2016

"Evaluation methods for unsupervised word embeddings"

Tuesday 25 January 2016

Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks 
Hua He, Kevin Gimpel, and Jimmy Lin. EMNLP2015

Tuesday 19 January 2016

Multilingual Image Description with Neural Sequence Models

Tuesday 12 January 2016

"Improving Distributional Similarity with Lessons Learned from Word Embeddings"

Tuesday 8 December 2015

Using Discourse Structure Improves Machine Translation Evaluation
F Guzmán, S Joty, L Màrquez, P Nakov

And here are the author's slides

Tuesday 1 December 2015

Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems, 2012.
Snoek, J.; Larochelle, H. & Adams, R. P.

Related presentations/lecture slides:

Related Video

My reading group presentation slides

Tuesday 24 November 2015

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. ACL 2015.
LSTMs? Kai Sheng Tai, Richard Socher, Christopher D. Manning

Additional resource about LSTM: "Anyone Can Learn To Code an LSTM-RNN in Python"

Tuesday 17 November 2015

RNNs/LSTMs ConvNets

More details on auto encoders for unsupervised pre-training:

Tuesday 10 November 2015

Multi-Metric Optimization Using Ensemble Tuning. NAACL2013. Video 
Baskaran Sankaran, Anoop Sarkar and Kevin Duh

Tuesday 3 November 2015

NN tutorials by Quoc Le

Josiah's slides

Other resources:

Andrej Karpathy's notes

Different objective functions, multiclass problems

Gradient descent


Discussion about different activation functions

Tuesday 27 October 2015

Three blog posts introducing RNNs for language modelling in equations and code

It might help to read this NLP primer

Additional material:
a thorough explanation of back propagation

Tuesday 20 October 2015

Teaching Machines to Read and Comprehend. NIPS 2015. 
Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, Phil Blunsom

Slides (presented at LXMLS)

Background reading:

Understanding LSTMs

NAACL 2013 Tutorial "Deep Learning without Magic"

EMNLP 2014 Tutorial "Embedding Methods for NLP"

Related Work:

Entailment with Neural Attention (better description of attention models than in the NIPS paper in my opinion)

Memory Networks

Tuesday 13 October 2015

A large annotated corpus for learning natural language inference. Proceedings of EMNLP 2015. 
Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning

Should compare this to work on (multilingual) textual similarity



Funded Research Projects

Current Projects

Currently these group projects are active (in alphabetical order)

  • ALEXA Graduate Fund
    Nikolaos Aletras / Eleni Vasilaki
  • APE-QUEST: Automated Post-Editing & Quality Estimation
    Lucia Specia
  • Automatic Detection of Online Misinformation
    Kalina Bontcheva
  • BERGAMOT: Browser-based Multilingual Translation
    Lucia Specia / Nikolaos Aletras
  • Data Analytics
    Mark Stevenson
  • ELG: European Language Grid
    Kalina Bontcheva
  • GATE: A General Architecture for Text Engineering 
    Hamish Cunningham / Kalina Bontcheva
  • Digital Sensitivity Review
    Mark Stevenson
  • Distinguishing Common and Proper Nouns 
    Mark Stevenson
  • Healtex: UK Healthcare Text Analytics Research Network
    Rob Gaizauskas
  • Information Retrieval Facility
    Hamish Cunningham
  • Journalist-in-the-Loop Machine Learning as a Service for Rumour Analysis
    Kalina Bontcheva / Nikolaos Aletras
  • KNOWMAK: Knowledge in the making in the European society
    Diana Maynard / Kalina Bontcheva
  • Resilient Campus, Resilient City (RC²)
    Hamish Cunningham
  • RISIS 2: European Research Infrastructure for Science, technology and Innovation policy Studies 2
    Diana Maynard / Kalina Bontcheva
  • SIMPATICO: SIMplifying the interaction with Public Administration Through Information technology for Citizens and cOmpanies
    Lucia Specia
  • SoBigData: SoBigData Research Infrastructure
    Hamish Cunningham / Kalina Bontcheva
  • SUMMA: Scalable Understanding of Multilingual MediA 
    Andreas Vlachos
  • Transforming Food Production - Project 2
    Hamish Cunningham
  • WeVerify: Wider and enhanced Verification for you
    Kalina Bontcheva

Previous Projects

Previous projects (in alphabetical order)

  • ACCURAT: Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation 
    Rob Gaizauskas & Paul Clough (Information School)
  • ABRAXAS: Automating Ontology Learning for the Semantic Web 
    Yorick Wilks & Fabio Ciravegna
  • AKT: Advanced Knowledge Technologies 
    Yorick Wilks
  • AMILCARE: An adaptive IE system for the Semantic Web 
    Fabio Ciravegna
  • AMITIES: Automated Multilingual Interaction with Information and Services 
    Yorick Wilks
  • AnnoMarket: Annotation Resource Marketplace in the Cloud 
    Hamish Cunningham
  • ARCOMEM: From Collect-All Archives to Community Memories - Leveraging the Wisdom of the Crowds for Intelligent Preservation 
    Hamish Cunningham
  • AVENTINUS: Advanced Information System for Multinational Drug Enforcement 
    Yorick Wilks & Hamish Cunningham
  • Barista: Non-Parametric Models of Phrase-based Machine Translation 
    Trevor Cohn
  • CA4NLP: Engineering Natural Language Interfaces: can CA help? 
    Mark Hepple & Peter Wallis
  • CASTLE: Computational Adaptive Semantics for Language Engineering 
    Mark Stevenson
  • COMRADES: Collective Platform for Community Resilience and Social Innovation during Crises 
    Kalina Bontcheva / Diana Maynard
  • CLARIN: Common Language Resources and Technology Infrastructure 
    Wim Peters
  • CLARITY: Cross Language Information Retrieval and Organisation of Text and Audio Documents 
    Rob Gaizauskas & Mark Sanderson (Information Studies)
  • CLEF: CLinical E-Science Framework 
    Rob Gaizauskas & Mark Hepple
  • CLUE II: Contextual Learning for detecting Unexpected Events 
    Louise Guthrie
  • COMIC: COnversational Multimodal Interaction with Computers 
    Yorick Wilks
  • COMPANIONS: Intelligent, Persistent, Personalised Multimodal Interfaces to the Internet 
    Yorick Wilks
  • Cracker: Cracking the Language Barrier: Coordination, Evaluation and Resources for European MT Research
    Lucia Specia
  • CRONOPATH: Information Retrieval/Extraction through time 
    Yorick Wilks
  • CONVERSE: A Conversational Companion 
    Yorick Wilks
  • CLUE: Contextual Learning for detecting Unexpected Events 
    Louise Guthrie
  • Cub Reporter: QA and Summarisation for Preparation of Background News Reports 
    Rob Gaizauskas, Yorick Wilks & Jonathan Foster (Journalism Studies)
  • DALOS: DrAfting Legislation with Ontology-based Support 
    Wim Peters
  • DAPPER: Natural Language Processing Tools for Discourse Analysis in Psychology 
    Horacio Saggion
  • DecarboNET: A Decarbonisation Platform for Citizen Empowerment and Translating Collective Awareness into Behavioural Change
    Kalina Bontcheva
  • DILiGENt: Domain-Independent Language Generation
    Andreas Vlachos
  • DOT KOM: Designing Adaptive Information Extraction from Text for Knowledge Management and the Semantic Web 
    Fabio Ciravegna
  • DotRural: A Text Analytic Approach to Rural and Urban Legal Histories 
    Wim Peters
  • Expert: EXPloiting Empirical appRoaches to Translation 
    Lucia Specia
  • Extraction of Content: Research at Near Market 
    Yorick Wilks
  • ELSE: Evaluation in Language and Speech Engineering 
    Rob Gaizauskas
  • EMILLE: Enabling Minority Language Engineering 
    Rob Gaizauskas
  • EMPATHIE: Enzyme and Metabolic Path Information Extraction 
    Rob Gaizauskas
  • EMPIRICAL GRAMMAR: Inducing Adequate Grammars from Electronic Texts 
    Yorick Wilks & Rob Gaizauskas
  • eNeMILP: Non-Monotonic Incremental Language Processing
    Andreas Vlachos
  • EnviLOD: Semantic Enrichment and Search with Linked Open Data: A Case Study on Environmental Science Literature
    Kalina Bontcheva
  • EWN: EuroWordNet 
    Yorick Wilks
  • FASiL: Flexible and Adaptive Spoken Language and Multi-Modal Interfaces 
    Yorick Wilks
  • FLaReNet: Fostering Language Resources Network 
    Yorick Wilks & Wim Peters
  • ForgetIT: Concise Preservation by combining Managed Forgetting and Contextualized Remembering 
    Hamish Cunningham
  • GATE Cloud Exploratory: Adapting the General Architecture for Text Engineering to Cloud Computing 
    Hamish Cunningham
  • GoTag: Real-Time Text Mining for the Biomedical Literature: A Collaboration between Discoverynet & Mygrid 
    Rob Gaizauskas
  • HUMAINE: Research on Emotions and Human-Machine Interaction 
    Yorick Wilks & Daniela Romano
  • Investigating Spoken Dialogue to Support Manufacturing Processes
    Rob Gaizauskas
  • InPuT: Individual Profiling using Text Analysis
    Mark Stevenson
  • KConnect: Khresmoi Multilingual Medical Text Analysis, Search and Machine Translation Connected in a Thriving Data-Value Chain 
    Angus Roberts
  • KHRESMOI: Knowledge Helper for Medical and Other Information users 
    Hamish Cunningham
  • KTA PoC Award: Scaling-up WSD for the Life Sciences 
    Mark Stevenson
  • KnowledgeWeb: Network of excellence on realising the Semantic Web 
    Hamish Cunningham
  • LarKC: Large Scale Semantic Computing Semantic Web Technologies distributed reasoning 
    Hamish Cunningham
  • LaSIE: Large Scale Information Extraction 
    Yorick Wilks & Rob Gaizauskas
  • LEXDIS: Lexical Disambiguation for the Biomedical Domain 
    Mark Stevenson
  • LIRICS: Linguistic Infrastructure for Interoperable Resources and Systems 
    Kalina Bontcheva
  • LOIS: Lexical Ontologies for Legal Information Sharing 
    Wim Peters
  • Low Resource Aquaponic Agriculture in Nepal
    Hamish Cunningham
  • MALT: Mappings, Agglomerations and Lexical Tuning 
    Yorick Wilks
  • MiAkt: Grid enabled knowledge services: collaborative problem solving environments in medical informatics 
    Yorick Wilks & Fabio Ciravegna
  • MediaCampaign: Discovering, inter-relating and navigating cross-media campaign knowledge 
    Hamish Cunningham
  • Medics: Language Processing for Literature Based Discovery in Medicine 
    Mark Stevenson
  • MLi: Towards a MultiLingual Data Services infrastructure 
    Hamish Cunningham
  • MoDiST: Modelling Discourse in Statistical Machine Translation 
    Lucia Specia
  • MULTIFLORA_II: Combining Information Extraction and Knowledge Representation for Biodiversity Informatics 
    Yorick Wilks & Hamish Cunningham
  • MultiMatch: Multilingual/Multimedia Access To Cultural Heritage 
    Paul Clough (Information Studies)
  • MultiMT: Multimodal Machine Translation
    Lucia Specia
  • MUMIS: Multi-Media Indexing and Searching Environment 
    Yorick Wilks & Hamish Cunningham
  • MUSE: Multi-Source Entity finder 
    Yorick Wilks
  • Musing: Multi-Industry, Semantic-based Next Generation Business IntelliGence 
    Kalina Bontcheva
  • MyGrid: Supporting the Biologist E-Scientist 
    Rob Gaizauskas
  • NAMIC: News Agencies Multilingual Information Categorisation 
    Yorick Wilks
  • NEON: Lifecycle support for networked ontologies 
    Hamish Cunningham
  • OpenMinTed: Open Mining INfrastructure for TExt and Data
    Angus Roberts
  • PAROLE/SIMPLE: Preparatory Action for Linguistic Resources Organisation for Language Engineering 
    Yorick Wilks
  • PASTA: Protein Active Site Template Acquisition 
    Yorick Wilks
  • PEEC: Partitioning the Enron Email Corpus 
    Louise Guthrie
  • PEEC II: Partitioning the Enron Email Corpus 
    Louise Guthrie
  • PHEME: Computing Veracity Across Media, Languages, and Social Networks
    Kalina Bontcheva
  • POESIA: Public Open-source Environment for a Safer Internet 
    Mark Hepple
  • POETIC: The POrtable Extendable Traffic Information Collator 
    Rob Gaizauskas
  • Predicting Relevance and Quality of Machine Translation for Product Reviews
    Lucia Specia
  • PrestoSpace: Digital preservation and rich metadata indexing of audio-video collections 
    Hamish Cunningham
  • QT21: Quality Translation 21
    Lucia Specia
  • QTLaunchpad: Preparation and Launch of a Large-Scale Action for Quality Translation Technology 
    Lucia Specia
  • Recommendation Algorithm
    Mark Stevenson
  • RESuLT: Relation Extraction using Semi-Supervised Learning Techniques 
    Mark Stevenson
  • REVEAL: The Identification of Anomalous Segments in Text on a Large Scale 
    Louise Guthrie
  • REVEAL II: The Identification of Anomalous Segments in Text on a Large Scale 
    Louise Guthrie
  • RolTech: Platform for Romanian Language Technology: Resources, Tools and Interfaces 
    Valentin Tablan
  • SEKT: Semantically-Enabled Knowledge Technologies (central page) 
    Hamish Cunningham
  • SENSEI: Making Sense of Human-Human Conversation Data
    Rob Gaizauskas
  • SenseMaking: Information Processing and Sensemaking: An Exploratory Search System for Document Collections 
    Mark Stevenson
  • SERA: Social Engagement with Robots and Agents 
    Peter Wallis
  • ServiceFinder: Realizing Web Service Discovery at Web Scale 
    Kalina Bontcheva
  • SLaTr: A Joint Model of Spoken Language Translation 
    Trevor Cohn / Thomas Hain
  • Sumerian/ETCSL: Tools for linguistic annotation and Web-based analysis of literary Sumerian 
    Hamish Cunningham
  • SOCIS: Scene Of Crime Information System 
    Yorick Wilks
  • SToBS: Structured Transcription of Broadcast Speech 
    Rob Gaizauskas
  • TaaS: Terminology as a Service
    Rob Gaizauskas
  • TAO: Transitioning Applications to Ontologies 
    Kalina Bontcheva
  • h-Techsight: A Knowledge management platform with intelligence and insight capabilities for technology intensive industries 
    Hamish Cunningham
  • TEXTvre: Emerging, collective intelligence for personal, organizational and social use 
    Kalina Bontcheva & Angus Roberts
  • TrendMiner: Large-scale, Cross-lingual Trend Mining and Summarisation of Real-time Media Streams 
    Kalina Bontcheva & Trevor Cohn
  • TRESTLE: Text Retrieval, Extraction and Summarisation for Large Enterprises 
    Rob Gaizauskas & Micheline Beaulieu (Information Studies)
  • uComp: Embedded Human Computation for Knowledge Extraction and Evaluation 
    Wim Peters
    Rob Gaizauskas
  • VIEWGEN: Belief Modelling and Dialogue Systems 
    Yorick Wilks
  • VIKEF: Virtual Information and Knowledge Environment Framework 
    Rob Gaizauskas
  • VisualSense: Tagging visual data with semantic descriptions 
    Rob Gaizauskas

PhD Projects

Current Research Students

Current students listed in alphabetical order

Awarded PhDs

Awarded PhDs by year


Gustavo Henrique Paetzold
Lexical Simplification for Non-Native Speakers
(Award Date: 24 October 2016)

Xingyi Song
Training Machine Translation for Human Acceptability
(Award Date: 16 October 2016)

Roland Roller
Information Extraction from Documents in the Life Sciences
(Award Date: 26 August 2016)


Dominic Rout
A ranking approach to summarising Twitter home timelines
(Award Date: 24 November 2015)


Nikolaos Aletras
Exploring the Semantics of Topic Models
(Award Date: 11 December 2014)

Ayman Alhelbawy
A new approach to information extraction from natural language texts
(Award Date: 23 September 2014)

Daniel Preotiuc-Pietro
Unsupervised learning for time-based clustering of language
(Award Date: 19 June 2014)

Ahmet Aker
Entity Type Modeling for Multi-Document Summarization of Geo-Located Entity Descriptions
(Award Date: 20 February 2014)


Leon Derczynski
Determining the Types of Temporal Relations in Discourse
(Award Date: 2 October 2013)

Samuel Fernando 
Enriching Knowledge Bases Using Relation Extraction
(Award Date: 13 June 2013)

Giuseppe Di Fabbrizio
Automatic Summarization of Opinions in Service and Product Reviews
(Award Date: 8 May 2013)


Angus Roberts
Clinical Information Extraction: Lowering the Barrier
(Award Date: 18 December 2012)

Rao Muhammad Adeel Nawab
Mono-lingual Paraphrased Text reuse and Plagiarism detection
(Award Date: 18 September 2012)

Niraj Aswani
Evolving a General Framework for Text Alignment: Case Studies with Two Asian Languages
(Award Date: 7 August 2012)

Kumutha Swampillai
Information Extraction Across Sentences
(Award Date: 7 March 2012)


Angelo Dalli
Timeline Extraction From Hyperlinked Text Corpora
(Award Date: 10 October 2011)

Danica Damljanovic
Natural Language Interfaces to Conceptual Models
(Award Date: 18 August 2011)


Ben Allison
An Improved Hierarchical Bayesian Model of Language for Document Classification
(Award Date: 21 October 2010)

Nick Webb
Cue-based dialogue act classification
(Award Date: 16 March 2010)

Sanaz Jabbari
A Statistical Model of Lexical Context
(Award Date: 23 February 2010)

Valentin Tablan
Toward Portable Information Extraction
(Award Date: 25 January 2010)


David Guthrie
Unsupervised Detection of Anomalous Text
(Award Date: 3 December 2008)

Joe Polifroni
Enabling Browsing in Interactive Systems
(Award Date: 18 November 2008)

Christopher Brewster
Mind the Gap: Bridging from text to ontological Knowledge
(Award Date: 1 October 2008)

François Mairesse
Learning to Adapt in Dialogue Systems: Data-driven Models for Personality Recognition and Generation
(Award Date: 30 September 2008)

Hrafn Loftsson
Tagging and Parsing Icelandic Text
(Award Date: 5 February 2008)


Michael Conway
Approaches to Automatic Biographical Sentence Classification: An Empirical Study
(Award Date: 27 July 2007)


Mark Greenwood
Open-Domain Question Answering
(Award Date: 13 March 2006)


Fang Huang
Multi-Document Summarization with Latent Semantic Analysis
(Award Date: 19 May 2005)

Ekaterini Pastra
Vision – Language Integration: a Double-Grounding Case
(Award Date: 5 January 2005)


Alexiei Dingli
Annotating the Semantic Web
(Award Date: 6 December 2004)

Wim Peters
Detection and Characterization of Figurative Language Use in WordNet
(Award Date: 29 November 2004)

Diego Uribe
LEEP: Learning Event Extraction Patterns
(Award Date: 18 October 2004)

Brian Mitchell
Prepositional Phrase Attachment using Machine Learning Algorithms
(Award Date: 5 July 2004)


Paul Clough
Measuring Text Reuse
(Award Date: 11 April 2003)


Tomas By
Tears in the Rain
(Award Date: 15 March 2002)

Andrea Setzer
Temporal information in newswire articles: An annotation scheme and corpus study
(Award Date: 15 March 2002)


Kalina Bontcheva
Generating Adaptive Hypertext
(Award Date: 17 September 2001)

Alexander Krotov
Parsing with a Compacted Treebank Grammar
(Award Date: 17 September 2001)


ChunYu Kit 
Unsupervised Lexical Learning as Inductive Inference
(Award Date: 15 November 2000)

Hamish Cunningham
Software Architecture for Language Engineering
(Award Date: 10 July 2000)

H.M. Harmain
Building Object-Oriented Conceptual Models Using Natural Language Processing Techniques
(Award Date: 2000)

Paul Woods
Cognitive Schemas for Chinese Noun Classifiers: A Corpus-Based Investigation
(Award Date: 25 February 2000)


Ted Dunning
Finding Structure In Text, Genome And Other Symbolic Sequences
(Award Date: 29 November 1999)

Mark Stevenson
Multiple Knowledge Sources for Word Sense Disambiguation
(Award Date: 27 September 1999)

Hammid Khosravi
Extracting Pragmatic Content From Email
(Award Date: 9 August 1999)


Mark Lee
Belief, Rationality and Inference
(Award Date: 14 December 1998)

Rob Collier
Automatic Template Creation for Information Extraction
(Award Date: 10 August 1998)

Resources

Group member resources