Natural Language Processing


About

The Natural Language Processing Research Group, established in 1993, is one of the largest and most successful language processing groups in the UK and has a strong global reputation.

Natural Language Processing (NLP) is an interdisciplinary field that uses computational methods:

  • To investigate the properties of written human language and to model the cognitive mechanisms underlying the understanding and production of written language (scientific focus)
  • To develop novel practical applications involving the intelligent processing of written human language by computer (engineering focus)

Research themes


Contact us
nlp-enquiries@shef.ac.uk
Natural Language Processing Research Group
Department of Computer Science
University of Sheffield
Regent Court
211 Portobello
Sheffield, S1 4DP
UK
+44 (0)114 222 1901

Follow us on Twitter @SheffieldNLP

Research

The group's research interests fall into the broad areas of: 

Information Access: Building applications to improve access to information in massive text collections, such as the web, newswires and the scientific literature. Subtopics include: information extraction, text mining and semantic annotation, question answering, summarization.

Language Resources and Architectures for NLP: Providing resources - both data and processing resources - for research and development in NLP. Includes platforms for developing and deploying real world language processing applications, most notably GATE, the General Architecture for Text Engineering.

Machine Translation: Building applications to translate automatically between human languages, allowing access to the vast amount of information written in foreign languages and easier communication between speakers of different languages.

Human-Computer Dialogue Systems: Building systems to allow spoken language interaction with computers or embodied conversational agents, with applications in areas such as keyboard-free access to information, games and entertainment, and artificial companions.

Detection of Reuse and Anomaly: Investigating techniques for determining when texts or portions of texts have been reused or where portions of text do not fit with surrounding text. These techniques have applications in areas such as plagiarism and authorship detection and in discovery of hidden content.

Foundational Topics: Developing applications with human-like capabilities for processing language requires progress in foundational topics in language processing. Areas of interest include: word sense disambiguation, semantics of time and events.

The NLP group's research has received support from: the EU's Framework Programmes (Frameworks 4, 5, 6 and 7) as well as Horizon 2020 and the European Research Council, the UK Research Councils (EPSRC, BBSRC, MRC and AHRC) and various governmental and industrial sponsors, including GlaxoSmithKline and IBM.

The NLP group has close associations with the Speech and Hearing and Information Retrieval research groups which carry out research into other areas of computational processing of human language.

We also host the ICCL and CLUK websites.

People

These are the current members of the NLP group.

Administrative Support

Lucy Moffatt

Joanne Suter

Alice Tucker

Visitors

Prof. Mikel Forcada

Jonathan Foster

Pawandeep Kaur

Luis Mesquita

Former group members

News

2017

Papers accepted to EACL 2017:

  • Continuous N-gram Representations for Authorship Attribution, Y. Sari, A. Vlachos, M. Stevenson, Proceedings of EACL: Volume 2, Short Papers
  • An Extensible Framework for Verification of Numerical Claims, J. Thorne, A. Vlachos, Proceedings of the Software Demonstrations
  • Book: Natural Language Processing for the Semantic Web, Diana Maynard, Kalina Bontcheva, Isabelle Augenstein. Morgan and Claypool, December 2016. ISBN:97816270590
  • Journal paper: A Framework for Real-time Semantic Social Media Analysis. Diana Maynard, Ian Roberts, Mark A. Greenwood, Dominic Rout and Kalina Bontcheva. Web Semantics: Science, Services and Agents on the World Wide Web, 2017
  • Conference paper: Towards an Infrastructure for Understanding and Interlinking Knowledge Co-Creation in European research, Diana Maynard, Adam Funk and Benedetto Lepori. ESWC 2017 Workshop on Scientometrics, Portoroz, Slovenia, May 2017
  • Diana Maynard taught 2 practical tutorials at the AI Seminar on Social Media Content Analysis, UPC Barcelona, May 2017
  • Diana Maynard gave an invited tutorial at the EU CLARIN-PLUS workshop on "Creation and Use of Social Media Resources", Lithuania, 2017
  • Diana Maynard gave an invited talk at 2017 Joint EC-OECD workshop on Semantic Technologies and Semantic Web: Structuring Data for STI Policy Analysis, 19 June, Brussels
  • Diana Maynard gave an invited talk at 2017 EPSRC The Future of Patent Analytics Workshop, 3 March, Cambridge, UK
  • The KNOWMAK project has started. A 3 year EC H2020 project from 1 Jan'17 - 31 Dec’20. The University of Sheffield PI is Diana Maynard.
  • Diana Maynard was Programme Chair of the ESWC conference in Portoroz, Slovenia in May.
  • Diana Maynard has won an ESRC-funded award from Understanding Society to access and analyse EU Referendum UK household survey data, for the project "Brexit narratives of place and scale: a media environment analysis of the EU Referendum debate". Co-PIs: Jackie Harrison (Journalism), J. Miguel Kanai (Geography)
2016

Papers accepted for COLING 2016:

  • Representation and Learning of Temporal Relations. L. Derczynski (2016). COLING
  • Broad Twitter Corpus: A Diverse Named Entity Recognition Resource. L. Derczynski, K. Bontcheva, I. Roberts (2016). COLING
  • Stance classification in Rumours as a Sequential Task Exploiting the Tree Structure of Social Media Conversations. A. Zubiaga, E. Kochkina, M. Liakata, R. Procter, M. Lukasik. (2016). COLING
  • Anita: An Intelligent Text Adaptation Tool. G. Paetzold, L. Specia. (2016). COLING
  • Understanding the Lexical Simplification Needs of non-Native Speakers of English. G. Paetzold, L. Specia. (2016). COLING
  • Collecting and Exploring Everyday Language for Predicting Psycholinguistic Properties of Words. G. Paetzold, L. Specia. (2016). COLING
  • Imitation learning for language generation from unaligned data. G. Lampouras, A. Vlachos. (2016). COLING
  • Carolina Scarton, Gustavo Paetzold and Lucia Specia will give a tutorial at COLING 2016, titled "Quality estimation for language output applications"
  • We are pleased to announce that Gustavo Paetzold has passed his PhD viva, having submitted only 2 years after joining as a PhD student.
  • Leon Derczynski will give a course at ESSLLI 2017 with Matteo Magnani, titled "Networks and User-generated Content"
  • Book in press in Springer Studies in Computational Intelligence: Automatically ordering events and times in text - L Derczynski
  • Diana Maynard has had an article on automatic sarcasm detection published in Quartz Magazine
  • Diana Maynard will give tutorials on NLP and Social Media Analysis at the 1st International Deep Learning, Big Data and Big Compute Camp, Rabat, Morocco, 24-28 October 2016. https://dlwensias.wordpress.com/2016/09/05/3/
  • Paper published in European Psychiatry: Novel psychoactive substances: an investigation of temporal trends in social media and electronic health records - A Kolliakou, M Ball, L Derczynski, D Chandran, G Gkotsis, P Deluca, R Jackson, H Shetty, R Stewart
  • Mark Stevenson and Adam Poulson are collaborating with ScHaRR and Human on a project to visualise emotion in social media at the Festival of the Mind (covered in a Guardian article)
  • Paper: An IR-based Approach Utilising Query Expansion for Plagiarism Detection in MEDLINE. R. Nawab, M. Stevenson and P. Clough (2016). IEEE/ACM Transactions on Computational Biology and Bioinformatics.
  • Paper: The Effect of Word Sense Disambiguation Accuracy on Literature Based Discovery. J. Preiss and M. Stevenson (2016). BMC Medical Informatics and Decision Making.
  • Paper: A Corpus of Potentially Contradictory Research Claims from Cardiovascular Research Abstracts. A. Alamri and M. Stevenson (2016). Journal of Biomedical Semantics, 7 (36).

Papers accepted for EMNLP 2016:

  • Stance Detection with Bidirectional Conditional Encoding, Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos and Kalina Bontcheva
  • Leon Derczynski has won an NVIDIA hardware grant for summary generation from collections of text.
  • Prof. Lucia Specia has been awarded an EC H2020 funded ERC Starting Grant. The project on Multimodal Context Modelling for Machine Translation (MultiMT) will start on 1 July 2016 for 5 years.

Papers accepted for ACL 2016:

  • Hawkes Processes for Continuous Time Sequence Classification: an Application to Rumour Stance Classification in Twitter. Michal Lukasik, P. K. Srijith, Duy Vu, Kalina Bontcheva, Arkaitz Zubiaga, Trevor Cohn.
  • Metrics for Evaluation of Word-level Machine Translation Quality Estimation. Varvara Logacheva, Michal Lukasik and Lucia Specia.

Papers accepted for TSD2016: 

  • Automatic Restoration of Diacritics for Igbo Language. Ignatius Ezeani, Mark Hepple and Ikechukwu Onyenwe.
  • Predicting Morphologically-Complex Unknown Words in Igbo. Ikechukwu Onyenwe and Mark Hepple
  • Paper nominated for Best Paper Award at WebSci 2016: Miriam Fernandez, Harith Alani, Lara Piccolo, Christoph Meili, Diana Maynard and Meia Wippoo. Talking Climate Change via Social Media: Communication, Engagement and Behaviour, May 22-25 2016, Hannover, Germany.
  • Diana Maynard taught a 3-hour practical tutorial at the AI Seminar on Social Media Content Analysis, UPC Barcelona, 9-13 May 2016.
  • Leon Derczynski is co-organising a workshop on Noisy User-generated Text (WNUT) at COLING in Osaka, Japan, 10th December 2016.
  • Diana Maynard will teach two 6-hour courses, "Introduction to NLP" and "Practical social media and sentiment analysis" at the University of Essex Big Data and Analytics Summer School in September 2016. http://www.essex.ac.uk/iads/events/summer-school.aspx
  • Andreas Vlachos will be speaking at the Lisbon Machine Learning Summer School about imitation learning for structured prediction.
  • Andreas Vlachos will be speaking at the Knowledge Representation Workshop at the University of Liverpool on 28th June 2016.
  • Paper: Noise reduction and targeted exploration in imitation learning for Abstract Meaning Representation parsing. James Goodman, Andreas Vlachos and Jason Naradowsky. ACL 2016.
  • Paper: Emergent: A novel data-set for stance classification. William Ferreira and Andreas Vlachos. NAACL 2016.
  • Paper: Large-scale Multitask Learning for Machine Translation Quality Estimation. Kashif Shah and Lucia Specia. NAACL 2016.
  • Paper: Phrase Level Segmentation and Labelling of Machine Translation Errors. Frederic Blain, Varvara Logacheva, and Lucia Specia. In Proc. of Language Resources and Evaluation Conference (LREC), May 2016, Portoroz, Slovenia
  • Paper: Challenges of Evaluating Sentiment Analysis Tools on Social Media. Diana Maynard and Kalina Bontcheva. In Proc. of Language Resources and Evaluation Conference (LREC), May 2016, Portoroz, Slovenia
  • Paper: Complementarity, F-score, and NLP Evaluation. Leon Derczynski. In Proc. of Language Resources and Evaluation Conference (LREC), May 2016, Portoroz, Slovenia
  • Paper: GATE-Time: Extraction of Temporal Expressions and Events. Leon Derczynski, Jannik Strötgen, Diana Maynard, Mark A. Greenwood, Manuel Jung. In Proc. of Language Resources and Evaluation Conference (LREC), May 2016, Portoroz, Slovenia
  • Dr. Diana Maynard has been awarded a grant for a fully-funded 4-year PhD student project by the Grantham Centre for Sustainable Futures, to start in October 2016, on the topic of disaster relief reporting and climate change. The Grantham Scholar will be supervised by Diana Maynard and co-supervised by Prof. Jacqueline Harrison from the Dept of Journalism and Prof. Shaun Quegan from the Centre for Terrestrial Carbon Dynamics.
  • The next annual GATE training course will be held from 6-10 June 2016.
  • Mark Stevenson was awarded a grant from Defence Science and Technology Laboratory: "Hypothesis Generation and Visualisation from Data"
  • Paper: A Graph-based Approach to Topic Clustering for Online News. Ahmet Aker, Emina Kurtic, Balamurali Andiyakkal Rajendran, Monica Paramita, Emma Barker, Mark Hepple and Rob Gaizauskas. ECIR 2016.
  • Paper: Automated Content Analysis: A Sentiment Analysis on Malaysian Government Social Media. Siti Salwa Hasbullah and Diana Maynard. In Proc. of ACM International Conference on Ubiquitous Information Management and Communication (IMCOM), January 2016, Danang, Vietnam.
  • The COMRADES project has started. A 3 year EC H2020 project from 1 Jan'16 - 31 Dec'18. The University of Sheffield PI is Prof. Kalina Bontcheva
  • We are pleased to announce two new NLP Professors: Kalina Bontcheva and Lucia Specia have both been promoted to Personal Chair.
2015
  • A piece was published in the Guardian technology blog on Tuesday 8.12.2015 on our work in the EU-funded SENSEI project.
  • Tutorial given by Diana Maynard at Search Solutions 2015, British Computer Society, London, November 2015: "Text analysis with GATE"
  • Mark Stevenson is co-organising a workshop on Topic Models: Post-processing and Applications at CIKM 2015 with Nikolaos Aletras (UCL), Jey Han Lau (King's College London) and Timothy Baldwin (University of Melbourne).
  • Andrés Duque from UNED in Madrid visited the group for 3 months (October - December 2015)
  • Paper: Understanding climate change tweets: an open source toolkit for social media analysis. D. Maynard and K. Bontcheva. In Proc. of EnviroInfo 2015, Copenhagen, Sep. 2015.
  • Poster: Real-time Social Media Analytics through Semantic Annotation and Linked Open Data. D. Maynard, M. A. Greenwood, I. Roberts, G. Windsor, K. Bontcheva. Proceedings of WebSci 2015, Oxford, UK
  • Paper: "Generalised Brown Clustering and Roll-Up Feature Generation". Leon Derczynski, Sean Chester. AAAI 2016.
  • We are pleased to announce that Dr. Andreas Vlachos has joined the group from 1 September 2015.
  • Paper: Evaluating Topic Representations for Exploring Document Collections. N. Aletras, T. Baldwin, J. Lau and M. Stevenson (to appear), Journal of the Association for Information Science and Technology
  • Paper: Exploring Relation Types for Literature-based Discovery. J. Preiss, M. Stevenson and R. Gaizauskas. (to appear), Journal of the American Medical Informatics Association.
  • Paper: Why are these similar? Investigating item similarity types in a large Digital Library. A. Gonzalez-Agirre, N. Aletras, G. Rigau, M. Stevenson and E. Agirre. (to appear), Journal of the Association for Information Science and Technology
  • Paper: Cognitive Styles within an Exploratory Search System for Digital Libraries. P. Goodale, P. Clough, S. Fernando, N. Ford and M. Stevenson (2014), Journal of Documentation, 70(6):970-996.
  • Paper: Improving Distant Supervision using Inference Learning. R. Roller, E. Agirre, A. Soroa and M. Stevenson (2015). In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015), Beijing, China.
  • Paper: A Hybrid Distributional and Knowledge-based Model of Lexical Semantics. N. Aletras and M. Stevenson (2015). In Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, pages 20--29, Denver, Colorado
  • Paper: Investigating Continuous Space Language Models for Machine Translation Quality Estimation. Kashif Shah, Raymond W. M. Ng, Fethi Bougares and Lucia Specia. EMNLP, 2015 (To Appear)
  • Paper: SHEF-NN: Translation Quality Estimation with Neural Networks. Kashif Shah, Varvara Logacheva, Gustavo Paetzold, Frédéric Blain, Daniel Beck, Fethi Bougares and Lucia Specia. WMT, 2015 (To Appear)
  • Paper: A study on the stability and effectiveness of features in quality estimation for spoken language translation. Raymond W. M. Ng, Kashif Shah, Lucia Specia and Thomas Hain. Interspeech, 2015.
  • Paper: Quality estimation for ASR K-best list rescoring in spoken language translation. Raymond W. M. Ng, Kashif Shah, Wilker Aziz, Lucia Specia and Thomas Hain. ICASSP, 2015.
  • Article: A Bayesian non-linear method for feature selection in machine translation quality estimation. Kashif Shah, Trevor Cohn and Lucia Specia. Journal of Machine Translation, 2015.
  • The Pheme project is co-supporting Clinical TempEval again in 2016, a shared evaluation task with the NIH THYME project and Harvard Children's Hospital, which will run at SemEval.
  • Special issue on "Time and Information Retrieval" in the Information Processing & Management journal was published, with Leon Derczynski as lead guest editor.
  • Martin Leginus from Aalborg University, co-supervised by Leon Derczynski, won the Best Student Paper award at WEBIST with his work improving tag clouds using entity disambiguation in streams.
  • Sean Chester from Aarhus University will visit and give a seminar in late September.
  • Book deal signed with O'Reilly on Temporal Information Processing for Language, by Leon Derczynski working with James Pustejovsky and Marc Verhagen (both from Brandeis).
  • Our entry in the W-NUT entity recognition challenge in tweets won 3rd place for untyped entity recognition.
  • Paper: Extracting Relations Between Non-Standard Entities using Distant Supervision and Imitation Learning. Isabelle Augenstein, Andreas Vlachos, Diana Maynard. EMNLP 2015.
  • Article: Distantly Supervised Web Relation Extraction for Knowledge Base Population. Isabelle Augenstein, Diana Maynard, Fabio Ciravegna. Semantic Web Journal.
  • Tutorial with Barry Norton at ESWC Summer School 2015: "Information Extraction with Linked Data"
  • Article from the group published in the journal Information Processing and Management: Leon Derczynski, Diana Maynard, Giuseppe Rizzo, Marieke van Erp, Genevieve Gorrell, Raphaël Troncy, Johann Petrak, Kalina Bontcheva. 2015. Analysis of Named Entity Recognition and Linking for Tweets.
  • Paper presented at the SemEval workshop: Steven Bethard, Leon Derczynski, Guergana Savova, James Pustejovsky, Marc Verhagen. 2015. SemEval-2015 Task 6: Clinical TempEval.
  • Paper presented at the SemEval workshop: Fatih Uzdilli, Martin Jaggi, Dominic Egger, Pascal Julmy, Leon Derczynski, Mark Cieliebak. 2015. Swiss-Chocolate: Combining Flipout Regularization and Random Forest with Artificially Built Subsystems to Boost Text-Classification for Sentiment.
  • Paper from the group presented at the SemEval workshop: Hegler Tissot, Genevieve Gorrell, Angus Roberts, Leon Derczynski, Marcos Didonet del Fabro. 2015. UFPRSheffield: Contrasting Rule-based and Support Vector Machine Approaches to Time Expression Identification in Clinical TempEval.
  • Book chapter from the group to appear in The Handbook of Linguistic Annotation (edited by Nancy Ide and James Pustejovsky): Kalina Bontcheva, Leon Derczynski, Ian Roberts. 2015. Crowdsourcing Named Entity Recognition and Entity Linking Corpora.
  • Paper from the group presented at the ISA-11 workshop: Hegler Tissot, Angus Roberts, Leon Derczynski, Genevieve Gorrell, Marcos Didonet del Fabro. 2015. Analysis of Temporal Expressions Annotated in Clinical Notes.
  • Paper presented at the WEBIST conference: Martin Leginus, Leon Derczynski, Peter Dolog. 2015. Enhanced Information Access to Social Streams through Word Clouds with Entity Grouping.
  • Paper from the group at the W-NUT workshop: Leon Derczynski, Isabelle Augenstein, Kalina Bontcheva. 2015. USFD: Twitter NER with Drift Compensation and Linked Data.
  • Diana Maynard will give a Tutorial on "Practical Sentiment Analysis" at Essex University Summer School on Big Data and Analytics, 24-28 August 2015
  • Book chapter publication. Diana Maynard and Jonathon Hare. Entity-based Opinion Mining from Text and Multimedia. In "Advances in Social Media Analysis", Mohamed Gaber, Nirmalie Wiratunga, Ayse Goker, and Mihaela Cocea (eds.) 2015, Springer.
  • Diana Maynard gave a keynote speech at 5th International Conference on Web Intelligence, Mining and Semantics (WIMS), July 13-15, 2015, Cyprus. "What you Tweet is What You Get: challenges and opportunities for social media analysis in industry"
  • The annual GATE training course was held in Sheffield from 8-12 June, with 21 participants.
  • Diana Maynard gave a tutorial on "Text Analysis with GATE" at the Reading University Workshop on Big Social Data, 24 April 2015.
2014
  • A paper by Roland Roller and Mark Stevenson (Self-supervised Relation Extraction using UMLS) won the best paper award at CLEF 2014
  • Paper published in the Journal of Biomedical Informatics: B. McInnes and M. Stevenson (2014) Determining the Difficulty of Word Sense Disambiguation. Journal of Biomedical Informatics, 47:83-90.
  • Paper accepted for the journal Studies in the Digital Humanities: M. Hall, P. Goodale, P. Clough and M. Stevenson (2014) The PATHS System for Exploring Digital Cultural Heritage. Studies in the Digital Humanities.
  • Paper published in the journal Information Retrieval: M. Hall, S. Fernando, P. Clough, A. Soroa, E. Agirre and M. Stevenson (2014) Evaluating hierarchical organisation structures for exploring digital libraries. Information Retrieval 17(4):351-379.
  • Paper accepted for the journal Science of Computer Programming: M. Shahbaz, P. McMinn and M. Stevenson (2014) Automatic generation of valid and invalid test data for string validation routines using web searches and regular expressions. Science of Computer Programming.
  • Paper from the group published at ACL 2014: N. Aletras and M. Stevenson (2014) Labelling Topics using Unsupervised Graph-based Methods. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), pages 631--636, Baltimore, Maryland
  • Paper from the group published at Digital Libraries 2014: N. Aletras, T. Baldwin, J. Lau and M. Stevenson (2014) Representing Topics Labels for Exploring Digital Libraries. In Digital Libraries 2014 (ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014) and International Conference on Theory and Practice of Digital Libraries (TPDL 2014)), London, UK
  • Paper from the group published at EACL 2014: N. Aletras and M. Stevenson (2014) Measuring the Similarity between Automatically Generated Topics. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 22--27, Gothenburg, Sweden

Papers from the group published at EMNLP 2014:

  • Wilker Aziz and Lucia Specia. 2014. Exact Decoding for Phrase-Based Statistical Machine Translation. EMNLP, Doha.
  • Daniel Beck, Trevor Cohn and Lucia Specia. 2014. Joint Emotion Analysis via Multi-task Gaussian Processes. EMNLP, Doha.
  • Kashif Shah, Trevor Cohn and Lucia Specia. 2014. A Bayesian non-Linear Method for Feature Selection in Machine Translation Quality Estimation. Machine Translation.
  • The University of Sheffield (Sheffield NLP Group) was ranked 3rd in the list of institutions that have published the most LREC papers.
  • The Clinical TempEval exercise will run at SemEval 2015, a collaboration between researchers at Brandeis University, U. Alabama Birmingham and Leon Derczynski for the University of Sheffield
  • Leon Derczynski will give two guest lectures at a course on Network Science and online Social Network Analysis at Uppsala Universitet in May

Members of the group have chapters in 2 new books:

  • Documenting Contemporary Society by Preserving Relevant Information from Twitter In 'Twitter and Society', edited by K. Weller, A. Bruns, J. Burgess, M. Mahrt and C. Puschmann, 2014. T. Risse, W. Peters, P. Senellart, D. Maynard
  • Crowdsourcing Named Entity Recognition and Entity Linking Corpora in "The Handbook of Linguistic Annotation" edited by Nancy Ide & James Pustejovsky. Kalina Bontcheva, Leon Derczynski, Ian Roberts
  • Matteo Magnani and Leon Derczynski will teach a week-long course at ESSLLI 2014 in Tubingen in August, on "Human Information Networks"

We have 2 demos accepted at EACL 2014:

  • The GATE Crowdsourcing Plugin: Crowdsourcing Annotated Corpora Made Easy. Kalina Bontcheva, Ian Roberts and Leon Derczynski
  • DKIE: Open Source Information Extraction for Danish. Leon Derczynski, Camilla Vilhelmsen Derczynski Field, Kenneth Sejdenfaden Bøgh

The group have 6 papers accepted at LREC 2014

  • Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines. Marta Sabou, Kalina Bontcheva, Leon Derczynski, Arno Scharl
  • An efficient and user-friendly tool for machine translation quality estimation. Kashif Shah, Marco Turchi, Lucia Specia
  • Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. Diana Maynard
  • Bilingual dictionaries for all EU languages. Ahmet Aker, Monica Paramita, Marcis Pinnis, Robert Gaizauskas
  • Bootstrapping Term Extractors for Multiple Languages. Ahmet Aker, Monica Paramita, Emma Barker, Robert Gaizauskas
  • Spatio-temporal grounding of claims made on the web, in Pheme. Leon Derczynski, Kalina Bontcheva
  • A paper has been accepted in the JASIST journal: Generating Descriptive Multi-Document Summaries of Geo-Located Entities Using Entity Type Models. Ahmet Aker, Robert Gaizauskas
  • The PHEME project has started. A 3 year EC FP7 project from 1 Jan'14 - 31 Dec'16 with 9 partners worth a total of € 4,269,938 with an EC contribution of € 2,916,000. The University of Sheffield PI is Dr Kalina Bontcheva
2013

Three full papers from the group have been accepted at RANLP 2013, to be held in the spa town of Hisarya, Bulgaria

  • "Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data" Derczynski, L., Ritter, A., Clarke, S. & Bontcheva, K.
  • "Recognising and Interpreting Named Temporal Expressions" Brucato, M., Derczynski, L., Llorens, H., Bontcheva, K. & Jensen, C.S.
  • "TwitIE: A Fully-featured Information Extraction Pipeline for Microblog Text" Bontcheva, K., Derczynski, L., Funk, A., Greenwood, M.A., Maynard, D. & Aswani, N.
  • The group has had a discussion paper accepted at the International Conference on the Theory of Information Retrieval: "Information Retrieval for Temporal Bounding" Derczynski, L. & Gaizauskas, R.

2 short papers & 3 demonstrations from the group have been accepted at ACL 2013

Short Papers

  • "Reducing Annotation Effort for Quality Estimation via Active Learning" Beck, D., Specia, L. & Cohn, T.
  • "Temporal Signals Help Label Temporal Relations" Derczynski, L. & Gaizauskas, R.

Demonstrations

  • "QuEst - A translation quality estimation framework" Specia, L., Shah, K., Guilherme Camargo de Souza, J. & Cohn, T.
  • "PATHS: A System for Accessing Cultural Heritage Collections" Agirre, E., Aletras, N., Clough, P., Fernando, S., Goodale, P., Hall, M., Soroa, A. & Stevenson, M.
  • "AnnoMarket: An Open Cloud Platform for NLP" Bontcheva, K., Tablan, V., Roberts, I., Cunningham, H. & Dimitrov, M.
  • Two of the three nominations for the ACM SIGWEB Ted Nelson prize at Hypertext 2013, Paris, are from Sheffield's NLP group.

5 papers by the group accepted at ACL 2013

  • "Extracting bilingual terminologies from comparable corpora" Aker, A., Paramita, M. & Gaizauskas, R.
  • "An Infinite Hierarchical Bayesian Model of Phrasal Translation" Cohn, T. & Haffari, G.
  • "Modelling Annotator Bias with Multi-task Gaussian Processes: An Application to Machine Translation Quality Estimation" Cohn, T & Specia, L.
  • "Markov Translation using Non-parametric Bayesian Inference" Feng, Y. & Cohn, T.
  • "A user-centric model of voting intention from Social Media" Lampos, V., Preotiuc-Pietro, D. & Cohn, T.

3 papers by the group accepted for NAACL 2013

  • "Representing Topics Using Images" Aletras, N. and Stevenson, M.
  • "Unsupervised Domain Tuning to Improve Word Sense Disambiguation" Preiss, J. and Stevenson, M.
  • "DALE: A Word Sense Disambiguation System for Biomedical Documents Trained using Automatically Labeled Examples (demo)" Preiss, J. and Stevenson, M.
  • The ForgetIT: Concise Preservation by combining Managed Forgetting and Contextualized Remembering project has started. A 3 year EC FP7 project from 1 Feb'13 - 31 Jan'16. The project has 11 partners worth a total of € 9,085,190 with an EC contribution of € 6,590,000. The University of Sheffield PI is Prof. Hamish Cunningham
  • The VisualSense: Tagging visual data with semantic descriptions project has started. A 3 year EPSRC project from 1 Jan'13 - 31 Dec'15. The project has 4 partners and is part of the Chist-Era EC funding programme. The University of Sheffield PI is Prof. Rob Gaizauskas

Older news

Older news stories

Seminars

Scheduled Speakers

2 November 2017 - Emem Rita Usanga

9 November 2017 - Yorick Wilks

16 November 2017 - Zeerak Waseem

18 January 2018 - Horacio Saggion

15 February 2018 - Wang Ling

2017 - 2018

26 October 2017 - NLP Student Talks

Chiraag Lala - Multimodal Lexical Translation

Inspired by the tasks of Multimodal Machine Translation and Visual Sense Disambiguation, we introduce a task called Multimodal Lexical Translation (MLT). The aim of this new task is to correctly translate an ambiguous word given its context: an image and a sentence in the source language. To facilitate the task, we introduce the MLT datasets, where each data point is a 4-tuple consisting of an ambiguous source word, its visual context (an image), its textual context (a source sentence), and its translation that conforms with the visual and textual contexts. The dataset has been created from the Multi30K corpus using word-alignment followed by human inspection for English to German and English to French language directions. These datasets form a very valuable multimodal and multilingual language resource with several potential uses including evaluation of lexical disambiguation within (Multimodal) Machine Translation systems.
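
As a rough illustration of the 4-tuple described in this abstract, one data point could be represented as below. This is only a sketch: the class name, the field names and the example values are assumptions made for illustration and are not taken from the MLT datasets themselves.

```python
from typing import NamedTuple


class MLTExample(NamedTuple):
    """One data point as a 4-tuple (hypothetical field names)."""
    source_word: str      # ambiguous word in the source language
    image_path: str       # visual context, here a placeholder file path
    source_sentence: str  # textual context in the source language
    translation: str      # translation consistent with both contexts


# Invented example values, not drawn from the actual dataset.
example = MLTExample(
    source_word="bank",
    image_path="images/river.jpg",
    source_sentence="A man is fishing on the bank.",
    translation="Ufer",
)

print(example.source_word, "->", example.translation)
```

A named tuple keeps the tuple shape the abstract mentions while letting each component be read by name.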

Fernando Manchego - Sentence Simplification via Sequence Labeling

Text Simplification aims to modify the content and structure of a text, in order to make it easier to read and understand. At the sentence-level, several rewriting operations can be performed to achieve this goal: replacing complex words or phrases with simpler synonyms, deleting unimportant content, splitting the sentence, etc. Most research treats sentence simplification as machine translation (MT), with complex and simple as source and target languages, respectively. In this talk, we will first present an in-depth analysis of the potential and limitations of end-to-end MT-style models using automatic and manual evaluations. To deal with some of the identified problems, we devise a two-step sequence labeling method: (i) identify the simplification operations that need to be performed (if any) for each token of the sentence, and (ii) execute the operations using transformation-specific strategies. We show that this operation-based approach is able to produce simpler texts than end-to-end models.
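
The two-step idea in this abstract can be sketched roughly as follows. This is not the speaker's implementation: the label set (KEEP, DELETE, REPLACE), the toy lexicon and the function name are all assumptions, and the per-token labels of step (i) are supplied by hand here in place of a trained sequence labeler.

```python
from typing import Dict, List

# Hypothetical operation labels; the talk does not specify its label set.
KEEP, DELETE, REPLACE = "KEEP", "DELETE", "REPLACE"


def apply_operations(tokens: List[str], labels: List[str],
                     lexicon: Dict[str, str]) -> List[str]:
    """Step (ii): execute the operation chosen for each token."""
    if len(tokens) != len(labels):
        raise ValueError("need exactly one label per token")
    output = []
    for token, label in zip(tokens, labels):
        if label == DELETE:
            continue  # drop unimportant content
        if label == REPLACE:
            # Swap in a simpler synonym if the toy lexicon has one.
            output.append(lexicon.get(token.lower(), token))
        else:
            output.append(token)  # KEEP: copy the token unchanged
    return output


tokens = "The committee will subsequently commence the inquiry".split()
# Step (i) stand-in: labels a sequence labeler might predict (hand-written).
labels = [KEEP, KEEP, KEEP, DELETE, REPLACE, KEEP, REPLACE]
lexicon = {"commence": "start", "inquiry": "review"}  # assumed toy lexicon

simplified = apply_operations(tokens, labels, lexicon)
print(" ".join(simplified))
```

Separating the labeling decision from its execution means each operation type can have its own strategy, which is the contrast the abstract draws with end-to-end MT-style models.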

19 October 2017 - Kris Cao (University of Cambridge) - Latent variable models of language

Behind the observed surface form of language exist underlying structures and themes, such as syntax, topic and utterance intent. In this talk, I will present some work which composes graphical models to learn underlying variables with powerful data likelihood functions to model the observed surface form. One such application is in open-domain dialogue modelling, where the latent variables capture the variation in the possible responses to a user utterance. We show that the latent variable approach generates more acceptable diverse output, as measured by human annotators. Another is extending topic models to instead learn topics underlying entire sentences, rather than just words. This lets the model learn topics which capture compositional meaning, which a standard word-level model has difficulty doing.

12 October 2017 - Shashi Narayan (University of Edinburgh) - Text-to-text Generation Beyond Machine Translation

In recent years we have witnessed the achievements of sequence-to-sequence encoder-decoder models for machine translation.
It is no surprise that these models are also setting a trend in various other generation tasks such as dialogue generation, image caption generation, sentence compression, paraphrase generation, sentence simplification and document summarization. Yet, these deep learning sequence models are often applied off-the-shelf to these text-to-text generation tasks, not tailoring the underlying model to the specific task to improve performance.

In this talk I will discuss two examples, sentence simplification and document summarization, that explore the hypothesis that tailoring the model with knowledge of the task structure and linguistic requirements leads to better performance. In the first part, I will propose a new sentence simplification task (split-and-rephrase) where the aim is to split a complex sentence into a meaning preserving sequence of shorter sentences. I will show that the semantically-motivated split model is a key factor in generating fluent and meaning preserving rephrasings.
In the second part, I will discuss the shortcomings of sequence-to-sequence abstractive methods for document summarization and show that an extractive summarization system trained to globally optimize a common summarization evaluation metric outperforms state-of-the-art extractive and abstractive systems in both automatic and extensive human evaluations.

BIO: Shashi Narayan is a postdoctoral researcher in the School of Informatics at the University of Edinburgh. He obtained his PhD in Computer Science at the University of Lorraine, INRIA under Claire Gardent in 2014. His research focuses on natural language generation and understanding with an aim to develop general frameworks for generation from underlying meaning representation or for text rewriting such as summarization, text simplification and paraphrase generation. He also has experience with parsing and other structured prediction problems.

4 September 2017 - Thushari Atapattu (University of Adelaide) - Discourse Analysis of Educational Big Data

Discourse analysis within the educational context consists of processing natural language data generated from learning and teaching processes including written assessments, transcripts, discussion forums, and micro blogs. Computational approaches to discourse analysis integrate NLP with psychological theories of social interaction, discourse comprehension, and communication. Discourse analysis is a complex problem, particularly within massive classrooms (e.g. Massive Open Online Courses – MOOCs). In this talk, I will discuss two of our research studies on understanding the academic discourse of lecturers as well as learner-generated discourse in MOOCs. Our work aims to detect learners’ video interaction patterns and to inform us of the influence of the quality of lecturers’ discourse. This work analysed millions of video interactions in two MOOCs and found that transitions in discourse (i.e. lexical diversity, connectivity) impact learners’ video engagement behaviour. Further, I will talk about the association between the quality of learner-generated discourse (i.e. discussion posts) and its impact on learning success. Thus, I will explain how the understanding of discourse enables us to identify the interventions for positive student trajectories.


Past seminars

Reading Group

NLP Reading Group

The target audience is all members of the NLP group and any other interested participants.

The meeting takes place weekly for one hour, usually on Tuesdays from 11am to 12pm.

The meetings of the group will be informal, and no preparation will be required, except that the moderator should read the current paper and everyone else should have at least a brief overview of it.

Next Meeting

Tuesday 24 October 2017

Morphological Inflection Generation with Hard Monotonic Attention

Roee Aharoni & Yoav Goldberg

Past Meetings

Tuesday 17 October 2017

A Factored Neural Network Model for Characterizing Online Discussions in Vector Space

Hao Cheng, Hao Fang, Mari Ostendorf

Tuesday 10 October 2017

Understanding Black-box Predictions via Influence Functions

Pang Wei Koh, Percy Liang; Published in Proceedings of International Conference on Machine Learning, 2017

Tuesday 3 October 2017

Zero-Shot Relation Extraction via Reading Comprehension

Omer Levy, Minjoon Seo, Eunsol Choi and Luke Zettlemoyer

Tuesday 19 September 2017

"Men also like shopping: Reducing Gender Bias Amplification Using Corpus Level Constraints"

Tuesday 29 August 2017

Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3309-3318, 2017.

Tuesday 22 August 2017

Split and Rephrase, Accepted for EMNLP 2017

Shashi Narayan, Claire Gardent, Shay B. Cohen and Anastasia Shimorina

Tuesday 15 August 2017

Attention Is All You Need
A new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely

Tuesday 8 August 2017

Learning to Compute Word Embeddings On the Fly

Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio

Tuesday 1 August 2017

Learning to Generate Textual Data, EMNLP 2016
Guillaume Bouchard and Pontus Stenetorp and Sebastian Riedel

Tuesday 11 July 2017

SoundNet: Learning Sound Representations from Unlabeled Video

Yusuf Aytar, Carl Vondrick, Antonio Torralba

Tuesday 4 July 2017

Sentence Simplification with Deep Reinforcement Learning

Xingxing Zhang, Mirella Lapata

Tuesday 27 June 2017

Generation and Comprehension of Unambiguous Object Descriptions

Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, Kevin Murphy

Tuesday 20 June 2017

Understanding the BPE algorithm

Tuesday 13 June 2017

Sequence-to-Sequence Models Can Directly Transcribe Foreign Speech

Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen

Tuesday 6 June 2017

Convolutional Sequence to Sequence Learning

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

Tuesday 30 May 2017

Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems

Wang Ling, Dani Yogatama, Chris Dyer, Phil Blunsom

Tuesday 9 May 2017

Chatterjee et al.: Online Automatic Post-editing for MT in a Multi-Domain Translation Environment

Tuesday 6 May 2017

Convolutional Sequence to Sequence Learning

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

Tuesday 2 May 2017

Coarse-to-Fine Question Answering for Long Documents

Tuesday 25 April 2017

Re-evaluating Automatic Metrics for Image Captioning

Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem

Tuesday 18 April 2017

Neural Tree Indexers, EACL2017

Tuesday 11 April 2017

EACL Recap

Tuesday 4 April 2017

Shakir Mohamed's deep learning overview

Tuesday 28 March 2017

Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond

Tuesday 21 March 2017

Unsupervised AMR-Dependency Parse Alignment

Tuesday 14 March 2017

Kim et al. (2016): Examples are not Enough, Learn to Criticize! Criticism for Interpretability, NIPS 2016

Tuesday 7 March 2017

Latent Variable Dialogue Models and their Diversity

Kris Cao and Stephen Clark

Tuesday 28 February 2017

Zhang et al. EACL2017

Tuesday 21 February 2017

Structured Attention Networks

Tuesday 14 February 2017

CORE: Context-Aware Open Relation Extraction with Factorization Machines

by Fabio Petroni, Luciano Del Corro and Rainer Gemulla

Tuesday 7 February 2017

Adversarial Training Methods for Semi-Supervised Text Classification

Takeru Miyato, Andrew M. Dai, Ian Goodfellow

Tuesday 31 January 2017

Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing

Tim Vieira and Jason Eisner

Tuesday 24 January 2017

Matching Networks for One Shot Learning

Oriol Vinyals, Charles Blundell, Tim Lillicrap, Koray Kavukcuoglu, Daan Wierstra

Tuesday 17 January 2017

Learning Structured Predictors from Bandit Feedback for Interactive NLP. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL). Berlin, Germany

Artem Sokolov, Julia Kreutzer, Christopher Lo, Stefan Riezler

Tuesday 13 December 2016

Optimization and Sampling for NLP from a Unified Viewpoint

Marc Dymetman, Guillaume Bouchard, Simon Carter

Tuesday 6 December 2016

Matrix Completion has No Spurious Local Minimum

Rong Ge, Jason D. Lee, Tengyu Ma

Tuesday 29 November 2016

Compositional Semantic Parsing on Semi-Structured Tables 
Panupong Pasupat and Percy Liang

Tuesday 22 November 2016

Minimum Risk Training for Neural Machine Translation 
Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, Yang Liu

Tuesday 15 November 2016

Generation from Abstract Meaning Representation using Tree Transducers 
Jeffrey Flanigan, Chris Dyer, Noah A. Smith and Jaime Carbonell

Tuesday 1 November 2016

Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels. Transactions of the Association for Computational Linguistics, 2016.
Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Leah Findlater, Jordan Boyd-Graber, and Niklas Elmqvist

Tuesday 25 October 2016

Learning to Search Better than your Teacher

Talk 
Chang et al. ICML 2015

Tuesday 11 October 2016

A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task 
Danqi Chen, Jason Bolton, Christopher D. Manning

Tuesday 4 October 2016

Ultradense Word Embeddings by Orthogonal Transformation 
Sascha Rothe, Sebastian Ebert, Hinrich Schütze

Tuesday 7 June 2016

Not All Character N-grams Are Created Equal: A Study in Authorship Attribution. 
Upendra Sapkota, Steven Bethard, Manuel Montes-y-Gómez & Thamar Solorio (2015)

Tuesday 31 May 2016

Relation extraction with matrix factorization and universal schemas.

Riedel, S., Yao, L., McCallum, A., & Marlin, B. M. (2013)

Tuesday 10 May 2016

Training Deterministic Parsers with Non-Deterministic Oracles, TACL

slides 
Goldberg, Y. and Nivre, J. (2013)

Tuesday 3 May 2016

A New Corpus and Imitation Learning Framework for Context-Dependent Semantic Parsing 
Vlachos, A. and Clark, S.

Tuesday 22 April 2016

Sequence Level Training with Recurrent Neural Networks
Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba

Tuesday 22 March 2016

"Distributed Representations of Sentences and Documents"
Quoc Le and Tomas Mikolov

Tuesday 8 March 2016

AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes 
Sascha Rothe; Hinrich Schütze. ACL2015 (best student paper)

Tuesday 23 February 2016

From Word Embeddings To Document Distances 
Kusner et al.

Tuesday 16 February 2016

"Target-Dependent Twitter Sentiment Classification with Rich Automatic Features"

Tuesday 9 February 2016

"Evaluation methods for unsupervised word embeddings"

Tuesday 25 January 2016

Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks 
Hua He, Kevin Gimpel, and Jimmy Lin. EMNLP2015

Tuesday 19 January 2016

Multilingual Image Description with Neural Sequence Models

Tuesday 12 January 2016

"Improving Distributional Similarity with Lessons Learned from Word Embeddings"

Tuesday 8 December 2015

Using Discourse Structure Improves Machine Translation Evaluation
F Guzmán, S Joty, L Màrquez, P Nakov

And here are the author's slides

Tuesday 1 December 2015

Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems, 2012.
Snoek, J.; Larochelle, H. & Adams, R. P.

Related presentations/lecture slides:

http://becs.aalto.fi/en/research/bayes/courses/4613/Vik_Kamath_Presentation.pdf

http://drona.csa.iisc.ernet.in/~indous/Lectures-2014/slides/jasper.pdf

Related Video

My reading group presentation slides

Tuesday 24 November 2015

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. ACL 2015.
LSTMs? Kai Sheng Tai, Richard Socher, Christopher D. Manning

http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/

http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Additional resource about LSTM: "Anyone Can Learn To Code an LSTM-RNN in Python"

Tuesday 17 November 2015

RNNs/LSTMs ConvNets

More details on auto encoders for unsupervised pre-training:

http://deeplearning.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity

http://www.jmlr.org/papers/volume11/erhan10a/erhan10a.pdf

http://www.slideshare.net/billlangjun/simple-introduction-to-autoencoder

Tuesday 10 November 2015

Multi-Metric Optimization Using Ensemble Tuning. NAACL2013. Video 
Baskaran Sankaran, Anoop Sarkar and Kevin Duh

Tuesday 3 November 2015

NN tutorials by Quoc Le

Josiah's slides

Other resources:

Andrej Karpathy's notes

Different objective functions, multiclass problems

Gradient descent

Backpropagation

Discussion about different activation functions

Tuesday 27 October 2015

Three blog posts introducing RNNs for language modelling in equations and code

might help to read this NLP primer

Additional material:
a thorough explanation of back propagation

Tuesday 20 October 2015

Teaching Machines to Read and Comprehend. NIPS 2015. 
Karl Moritz Hermann, Tomáš Kociský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, Phil Blunsom

Slides (presented at LXMLS)

Background reading:

Understanding LSTMs

NAACL 2013 Tutorial "Deep Learning without Magic"

EMNLP 2014 Tutorial "Embedding Methods for NLP"

Related Work:

Entailment with Neural Attention (better description of attention models than in the NIPS paper in my opinion)

Memory Networks

Tuesday 13 October 2015

A large annotated corpus for learning natural language inference. Proceedings of EMNLP 2015. 
Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning

Should compare this to work on (multilingual) textual similarity

Projects

Projects

Funded Research Projects

Current Projects

Currently these group projects are active (in alphabetical order)

  • Career Acceleration Fellowship: Machine Learning Methods for Personalised, Abstractive Summarisation of Consumer-Generated Media
  • Kalina Bontcheva
  • COMRADES: Collective Platform for Community Resilience and Social Innovation during Crises 
  • Kalina Bontcheva
  • Cracker: Cracking the Language Barrier: Coordination, Evaluation and Resources for European MT Research
  • Lucia Specia
  • DILiGENt: Domain-Independent Language Generation
  • Andreas Vlachos
  • GATE: A General Architecture for Text Engineering 
  • Hamish Cunningham
  • GOOGLE Grant: Distinguishing Common and Proper Nouns 
  • Mark Stevenson
  • Healtex: UK Healthcare Text Analytics Research Network 
  • Rob Gaizauskas
  • Investigating Spoken Dialogue to Support Manufacturing Processes
  • Rob Gaizauskas
  • KConnect: Khresmoi Multilingual Medical Text Analysis, Search and Machine Translation Connected in a Thriving Data-Value Chain 
  • Angus Roberts
  • KNOWMAK: Knowledge in the making in the European society
  • Diana Maynard
  • MultiMT: Multimodal Machine Translation 
  • Lucia Specia
  • OpenMinTed: Open Mining INfrastructure for TExt and Data 
  • Angus Roberts
  • Predicting Relevance and Quality of Machine Translation for Product Reviews
  • Lucia Specia
  • QT21: Quality Translation 21 
  • Lucia Specia
  • Recommendation Algorithm
  • Mark Stevenson
  • SIMPATICO: SIMplifying the interaction with Public Administration Through Information technology for Citizens and cOmpanies
  • Lucia Specia
  • SoBigData: SoBigData Research Infrastructure 
  • Hamish Cunningham
  • SUMMA: Scalable Understanding of Multilingual MediA 
  • Andreas Vlachos
Previous Projects

Previous projects (in alphabetical order)

  • ACCURAT: Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation 
  • Rob Gaizauskas & Paul Clough (Information School)
  • ABRAXAS: Automating Ontology Learning for the Semantic Web 
  • Yorick Wilks & Fabio Ciravegna
  • AKT: Advanced Knowledge Technologies
  • Yorick Wilks
  • AMILCARE: An adaptive IE system for the Semantic Web 
  • Fabio Ciravegna
  • AMITIES: Automated Multilingual Interaction with Information and Services 
  • Yorick Wilks
  • AnnoMarket: Annotation Resource Marketplace in the Cloud 
  • Hamish Cunningham
  • ARCOMEM: From Collect-All Archives to Community Memories - Leveraging the Wisdom of the Crowds for Intelligent Preservation 
  • Hamish Cunningham
  • AVENTINUS: Advanced Information System for Multinational Drug Enforcement 
  • Yorick Wilks & Hamish Cunningham
  • Barista: Non-Parametric Models of Phrase-based Machine Translation 
  • Trevor Cohn
  • CA4NLP: Engineering Natural Language Interfaces: can CA help? 
  • Mark Hepple & Peter Wallis
  • CASTLE: Computational Adaptive Semantics for Language Engineering 
  • Mark Stevenson
  • CLARIN: Common Language Resources and Technology Infrastructure 
  • Wim Peters
  • CLARITY: Cross Language Information Retrieval and Organisation of Text and Audio Documents 
  • Rob Gaizauskas & Mark Sanderson (Information Studies)
  • CLEF: CLinical E-Science Framework 
  • Rob Gaizauskas & Mark Hepple
  • CLUE II: Contextual Learning for detecting Unexpected Events 
  • Louise Guthrie
  • COMIC: COnversational Multimodal Interaction with Computers 
  • Yorick Wilks
  • COMPANIONS: Intelligent, Persistent, Personalised Multimodal Interfaces to the Internet 
  • Yorick Wilks
  • CRONOPATH: Information Retrieval/Extraction through time 
  • Yorick Wilks
  • CONVERSE: A Conversational Companion 
  • Yorick Wilks
  • CLUE: Contextual Learning for detecting Unexpected Events 
  • Louise Guthrie
  • Cub Reporter: QA and Summarisation for Preparation of Background News Reports 
  • Rob Gaizauskas, Yorick Wilks & Jonathan Foster (Journalism Studies)
  • DALOS: DrAfting Legislation with Ontology-based Support 
  • Wim Peters
  • DAPPER: Natural Language Processing Tools for Discourse Analysis in Psychology 
  • Horacio Saggion
  • DecarboNET: A Decarbonisation Platform for Citizen Empowerment and Translating Collective Awareness into Behavioural Change
  • Kalina Bontcheva
  • DOT KOM: Designing Adaptive Information Extraction from Text for Knowledge Management and the Semantic Web 
  • Fabio Ciravegna
  • DotRural: A Text Analytic Approach to Rural and Urban Legal Histories 
  • Wim Peters
  • Expert: EXPloiting Empirical appRoaches to Translation 
  • Lucia Specia
  • : Extraction of Content: Research at Near Market 
  • Yorick Wilks
  • ELSE: Evaluation in Language and Speech Engineering 
  • Rob Gaizauskas
  • EMILLE: Enabling Minority Language Engineering 
  • Rob Gaizauskas
  • EMPATHIE: Enzyme and Metabolic Path Information Extraction 
  • Rob Gaizauskas
  • EMPIRICAL GRAMMAR: Inducing Adequate Grammars from Electronic Texts 
  • Yorick Wilks & Rob Gaizauskas
  • EnviLOD: 
  • Kalina Bontcheva
  • EWN: EuroWordNet 
  • Yorick Wilks
  • FASiL: Flexible and Adaptive Spoken Language and Multi-Modal Interfaces 
  • Yorick Wilks
  • FLaReNet: Fostering Language Resources Network 
  • Yorick Wilks & Wim Peters
  • ForgetIT: Concise Preservation by combining Managed Forgetting and Contextualized Remembering 
  • Hamish Cunningham
  • GATE Cloud Exploratory: Adapting the General Architecture for Text Engineering to Cloud Computing 
  • Hamish Cunningham
  • GoTag: Real-Time Text Mining for the Biomedical Literature: A Collaboration between Discoverynet & Mygrid 
  • Rob Gaizauskas
  • HUMAINE: Research on Emotions and Human-Machine Interaction 
  • Yorick Wilks & Daniela Romano
  • InPuT: Individual Profiling using Text Analysis 
  • Mark Stevenson
  • KHRESMOI: Knowledge Helper for Medical and Other Information users 
  • Hamish Cunningham
  • KTA PoC Award: Scaling-up WSD for the Life Sciences 
  • Mark Stevenson
  • KnowledgeWeb: Network on excellence on realising the Semantic Web 
  • Hamish Cunningham
  • LarKC: Large Scale Semantic Computing Semantic Web Technologies distributed reasoning 
  • Hamish Cunningham
  • LaSIE: Large Scale Information Extraction 
  • Yorick Wilks & Rob Gaizauskas
  • LEXDIS: Lexical Disambiguation for the Biomedical Domain 
  • Mark Stevenson
  • LIRICS: Linguistic Infrastructure for Interoperable Resources and Systems 
  • Kalina Bontcheva
  • LOIS: Lexical Ontologies for Legal Information Sharing 
  • Wim Peters
  • M4L: Memories for Life Network 
  • Yorick Wilks, Christopher Brewster & Mark Sanderson (Information Studies)
  • MALT: Mappings, Agglomerations and Lexical Tuning 
  • Yorick Wilks
  • METER: Measuring Text Reuse 
  • Rob Gaizauskas, Yorick Wilks & Jonathan Foster (Journalism Studies)
  • MiAkt: Grid enabled knowledge services: collaborative problem solving environments in medical informatics 
  • Yorick Wilks & Fabio Ciravegna
  • MediaCampaign: Discovering, inter-relating and navigating cross-media campaign knowledge 
  • Hamish Cunningham
  • Medics: Language Processing for Literature Based Discovery in Medicine 
  • Mark Stevenson
  • MLi: Towards a MultiLingual Data Services infrastructure 
  • Hamish Cunningham
  • MoDiST: Modelling Discourse in Statistical Machine Translation 
  • Lucia Specia
  • MULTIFLORA_II: Combining Information Extraction and Knowledge Representation for Biodiversity Informatics 
  • Yorick Wilks & Hamish Cunningham
  • MultiMatch: Multilingual/Multimedia Access To Cultural Heritage 
  • Paul Clough (Information Studies)
  • MUMIS: Multi-Media Indexing and Searching Environment 
  • Yorick Wilks & Hamish Cunningham
  • MUSE: Multi-Source Entity finder 
  • Yorick Wilks
  • Musing: Multi-Industry, Semantic-based Next Generation Business IntelliGence 
  • Kalina Bontcheva
  • MyGrid: Supporting the Biologist E-Scientist 
  • Rob Gaizauskas
  • NAMIC: News Agencies Multilingual Information Categorisation 
  • Yorick Wilks
  • NEON: Lifecycle support for networked ontologies 
  • Hamish Cunningham
  • PAROLE/SIMPLE: Preparatory Action for Linguistic Resources Organisation for Language Engineering
  • Yorick Wilks
  • PASTA: Protein Active Site Template Acquisition 
  • Yorick Wilks
  • PATHS: Personalised Access To cultural Heritage Spaces 
  • Mark Stevenson & Paul Clough (Information School)
  • PEEC: Partitioning the Enron Email Corpus 
  • Louise Guthrie
  • PEEC II: Partitioning the Enron Email Corpus 
  • Louise Guthrie
  • PHEME: Computing Veracity Across Media, Languages, and Social Networks
  • Kalina Bontcheva
  • POESIA: Public Open-source Environment for a Safer Internet 
  • Mark Hepple
  • POETIC: The POrtable Extendable Traffic Information Collator 
  • Rob Gaizauskas
  • PrestoSpace: Digital preservation and rich metadata indexing of audio-video collections 
  • Hamish Cunningham
  • QTLaunchpad: Preparation and Launch of a Large-Scale Action for Quality Translation Technology 
  • Lucia Specia
  • RESuLT: Relation Extraction using Semi-Supervised Learning Techniques 
  • Mark Stevenson
  • REVEAL: The Identification of Anomalous Segments in Text on a Large Scale 
  • Louise Guthrie
  • REVEAL II: The Identification of Anomalous Segments in Text on a Large Scale 
  • Louise Guthrie
  • RolTech: Platform for Romanian Language Technology: Resources, Tools and Interfaces 
  • Valentin Tablan
  • SEKT: Semantically-Enabled Knowledge Technologies (central page) 
  • Hamish Cunningham
  • SENSEI: Making Sense of Human-Human Conversation Data
  • Rob Gaizauskas
  • SenseMaking: Information Processing and Sensemaking: An Exploratory Search System for Document Collections 
  • Mark Stevenson
  • SERA: Social Engagement with Robots and Agents
  • Peter Wallis
  • ServiceFinder: Realizing Web Service Discovery at Web Scale 
  • Kalina Bontcheva
  • SLaTr: A Joint Model of Spoken Language Translation 
  • Trevor Cohn / Thomas Hain
  • Sumerian/ETCSL: Tools for linguistic annotation and Web-based analysis of literary Sumerian 
  • Hamish Cunningham
  • SOCIS: Scene Of Crime Information System 
  • Yorick Wilks
  • SToBS: Structured Transcription of Broadcast Speech 
  • Rob Gaizauskas
  • TaaS: Terminology as a Service 
  • Rob Gaizauskas
  • TAO: Transitioning Applications to Ontologies 
  • Kalina Bontcheva
  • h-Techsight: A Knowledge management platform with intelligence and insight capabilities for technology intensive industries 
  • Hamish Cunningham
  • TEXTvre: Emerging, collective intelligence for personal, organizational and social use 
  • Kalina Bontcheva & Angus Roberts
  • TrendMiner: Large-scale, Cross-lingual Trend Mining and Summarisation of Real-time Media Streams 
  • Kalina Bontcheva & Trevor Cohn
  • TRESTLE: Text Retrieval, Extraction and Summarisation for Large Enterprises 
  • Rob Gaizauskas & Micheline Beaulieu (Information Studies)
  • TRIPOD: TRI-Partite multimedia Object Description 
  • Mark Sanderson (Information Studies) & Rob Gaizauskas
  • uComp: Embedded Human Computation for Knowledge Extraction and Evaluation 
  • Wim Peters
  • VIEWGEN: Belief Modelling and Dialogue Systems 
  • Yorick Wilks
  • VIKEF: Virtual Information and Knowledge Environment Framework 
  • Rob Gaizauskas
  • VisualSense: Tagging visual data with semantic descriptions 
  • Rob Gaizauskas

PhD Projects

Current Research Students

Current students listed in alphabetical order


Awarded PhDs

Awarded PhDs by year


2016

Gustavo Henrique Paetzold
Lexical Simplification for Non-Native Speakers
(Award Date: 24 October 2016)

Xingyi Song
Training Machine Translation for Human Acceptability
(Award Date: 16 October 2016)

Roland Roller
Information Extraction from Documents in the Life Sciences
(Award Date: 26 August 2016)

2015

Dominic Rout
A ranking approach to summarising Twitter home timelines
(Award Date: 24 November 2015)


2014

Nikolaos Aletras
Exploring the Semantics of Topic Models
(Award Date: 11 December 2014)

Ayman Alhelbawy
A new approach to information extraction from natural language texts
(Award Date: 23 September 2014)

Daniel Preotiuc-Pietro
Unsupervised learning for time-based clustering of language
(Award Date: 19 June 2014)

Ahmet Aker
Entity Type Modeling for Multi-Document Summarization of Geo-Located Entity Descriptions
(Award Date: 20 February 2014)


2013

Leon Derczynski
Determining the Types of Temporal Relations in Discourse
(Award Date: 2 October 2013)

Samuel Fernando 
Enriching knowledge bases using relation extraction
(Award Date: 13 June 2013)

Giuseppe Di Fabbrizio
Automatic Summarization of Opinions in Service and Product Reviews
(Award Date: 8 May 2013)


2012

Angus Roberts
Clinical Information Extraction: Lowering the Barrier
(Award Date: 18 December 2012)

Rao Muhammad Adeel Nawab
Mono-lingual Paraphrased Text reuse and Plagiarism detection
(Award Date: 18 September 2012)

Niraj Aswani
Evolving a General Framework for Text Alignment: Case Studies with Two Asian Languages
(Award Date: 7 August 2012)

Kumutha Swampillai
Information Extraction Across Sentences
(Award Date: 7 March 2012)


2011

Angelo Dalli
Timeline Extraction From Hyperlinked Text Corpora
(Award Date: 10 October 2011)

Danica Damljanovic
Natural Language Interfaces to Conceptual Models
(Award Date: 18 August 2011)


2010

Ben Allison
An Improved Hierarchical Bayesian Model of Language for Document Classification
(Award Date: 21 October 2010)

Nick Webb
Cue-based dialogue act classification
(Award Date: 16 March 2010)

Sanaz Jabbari
A Statistical Model of Lexical Context
(Award Date: 23 February 2010)

Valentin Tablan
Toward Portable Information Extraction
(Award Date: 25 January 2010)


2008

David Guthrie
Unsupervised Detection of Anomalous Text
(Award Date: 3 December 2008)

Joe Polifroni
Enabling Browsing in Interactive Systems
(Award Date: 18 November 2008)

Christopher Brewster
Mind the Gap: Bridging from text to ontological Knowledge
(Award Date: 1 October 2008)

Francois Mairesse
Learning to Adapt in Dialogue Systems: Data-driven Models for Personality Recognition and Generation
(Award Date: 30 September 2008)

Hrafn Loftsson
Tagging and Parsing Icelandic Text
(Award Date: 5 February 2008)


2007

Michael Conway
Approaches to Automatic Biographical Sentence Classification: An Empirical Study
(Award Date: 27 July 2007)


2006

Mark Greenwood
Open-Domain Question Answering
(Award Date: 13 March 2006)


2005

Fang Huang
Multi-Document Summarization with Latent Semantic Analysis
(Award Date: 19 May 2005)

Ekaterini Pastra
Vision – Language Integration: a Double-Grounding Case
(Award Date: 5 January 2005)


2004

Alexiei Dingli
Annotating the Semantic Web
(Award Date: 6 December 2004)

Wim Peters
Detection and Characterization of Figurative Language Use in WordNet
(Award Date: 29 November 2004)

Diego Uribe
LEEP: Learning Event Extraction Patterns
(Award Date: 18 October 2004)

Brian Mitchell
Prepositional Phrase Attachment using Machine Learning Algorithms
(Award Date: 5 July 2004)


2003

Paul Clough
Measuring Text Reuse
(Award Date: 11 April 2003)


2002

Tomas By
Tears in the Rain
(Award Date: 15 March 2002)

Andrea Setzer
Temporal information in newswire articles: An annotation scheme and corpus study
(Award Date: 15 March 2002)


2001

Kalina Bontcheva
Generating Adaptive Hypertext
(Award Date: 17 September 2001)

Alexandar Krotov
Parsing with a Compacted Treebank Grammar
(Award Date: 17 September 2001)


2000

ChunYu Kit 
Unsupervised Lexical Learning as Inductive Inference
(Award Date: 15 November 2000)

Hamish Cunningham
Software Architecture for Language Engineering
(Award Date: 10 July 2000)

H.M. Harmain
Building Object-Oriented Conceptual Models Using Natural Language Processing Techniques
(Award Date: 2000)

Paul Woods
Cognitive Schemas for Chinese Noun Classifiers: A Corpus-Based Investigation
(Award Date: 25 February 2000)


1999

Ted Dunning
Finding Structure In Text, Genome And Other Symbolic Sequences
(Award Date: 29 November 1999)

Mark Stevenson
Multiple Knowledge Sources for Word Sense Disambiguation
(Award Date: 27 September 1999)

Hammid Khosravi
Extracting Pragmatic Content From Email
(Award Date: 9 August 1999)


1998

Mark Lee
Belief Rationality and Inference
(Award Date: 14 December 1998)

Rob Collier
Automatic Template Creation for Information Extraction
(Award Date: 10 August 1998)

Resources

Group member resources