Dr Mark Hepple
MSc, PhD
School of Computer Science
Reader
Member of the Natural Language Processing (NLP) research group
+44 114 222 1829
Full contact details
School of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP
Profile
Mark Hepple is a Reader in Computer Science. He studied Psychology at Sheffield University (BSc, 1986), and Cognitive Science at Edinburgh University (MSc, 1987; PhD, 1990). Thereafter, he was a Research Associate at Cambridge University (1990-92), and a Postdoctoral Research Fellow at the University of Pennsylvania (1992-93).
He joined the Department of Computer Science at Sheffield University in 1993 as a Lecturer and a member of the Natural Language Processing group.
Research interests
Dr Hepple has wide-ranging interests across Computational Linguistics and Natural Language Processing, and has published on many topics, including formal grammar and parsing, information extraction, clinical text mining, temporal information processing, robust dialogue processing, and efficient storage of large-scale linguistic data.
Publications
Journal articles
- Toward an effective Igbo part-of-speech tagger. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(4).
- A Basic Language Resource Kit Implementation for the IgboNLP Project. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 17(2).
- Sub-story detection in Twitter with hierarchical Dirichlet processes. Information Processing & Management, 53(4), 989-1003.
- The TempEval challenge: identifying temporal relations in text. Language Resources and Evaluation, 43, 161-179.
- Mining clinical relationships from patient narratives. BMC Bioinformatics, 9(Suppl 11), S3.
- The CLEF corpus: semantic annotation of clinical text. AMIA Annual Symposium Proceedings, 2007, 625-629.
- A web service for biomedical term look-up. Comparative and Functional Genomics, 6(1-2), 86-93.
- Evaluating two methods for Treebank grammar compaction. Natural Language Engineering, 5, 377-394.
- Feature-based formalism for two-level phonology: A description and implementation. Computer Speech and Language, 7(4), 333-358.
Book chapters
- Using Semantic Inferences for Temporal Annotation Comparison, The Language of Time (pp. 575-584). Oxford University Press, Oxford
- Machine Learning Approaches to Human Dialogue Modelling, Text, Speech and Language Technology (pp. 355-370). Springer Netherlands
- Two Functional Approaches For Interpreting D-Tree Grammar Derivations, Studies in Linguistics and Philosophy (pp. 185-204). Springer Netherlands
- Grammatical relations and the Lambek calculus, Discontinuous Constituency (pp. 255-278). De Gruyter
Conference proceedings
- Multi-task projected embedding for Igbo. Text, Speech, and Dialogue: 21st International Conference, Proceedings (pp 285-294). Brno, Czech Republic, 11 September 2018.
- Igbo Diacritic Restoration using Embedding Models. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop (pp 54-60), June 2018.
- Transferred Embeddings for Igbo Similarity, Analogy and Diacritic Restoration Tasks. Proceedings of the COLING 2018 3rd Workshop on Semantic Deep Learning (SemDeep 2018) (pp 30-38)
- The SENSEI Overview of Newspaper Readers’ Comments. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol 10193. Springer (pp 758-761)
- Automatic Label Generation for News Comment Clusters. Proceedings of the 9th International Natural Language Generation Conference (pp 61-69). Edinburgh, UK, 5 September 2016.
- Automatic Restoration of Diacritics for Igbo Language. Text, Speech, and Dialogue, Vol. 9924 (pp 198-205). Brno, Czech Republic, 12 September 2016.
- Predicting Morphologically-Complex Unknown Words in Igbo. Text, Speech, and Dialogue, Vol. 9924 (pp 206-214). Brno, Czech Republic, 12 September 2016.
- The SENSEI Annotated Corpus: Human Summaries of Reader Comment Conversations in On-line News. Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp 42-52). Los Angeles, USA, 13 September 2016.
- What's the issue here?: Task-based evaluation of reader comment summarization systems. Proceedings of LREC 2016, Tenth International Conference on Language Resources and Evaluation (pp 2094-3101). Portorož, Slovenia, 23 May 2016.
- Studying the temporal dynamics of word co-occurrences: An application to event detection. Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016) (pp 4380-4387)
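- Améliorer la précision d’annotation d’un corpus Igbo par reconstruction morphologique et l’apprentissage basé sur la transformation [Improving the annotation accuracy of an Igbo corpus through morphological reconstruction and transformation-based learning]. Proceedings of TALAf 2016 - Traitement automatique des langues africaines [Natural language processing for African languages] (pp 1-10). Paris, France.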
- A Graph-Based Approach to Topic Clustering for Online Comments to News. Advances in Information Retrieval, Vol. 9626 (pp 15-29). Padua, Italy, 20 March 2016.
- Sheffield-Trento System for Sentiment and Argument Structure Enhanced Comment-to-Article Linking in the Online News Domain (Ahmet Aker, Fabio Celli, Adam Funk, Emina Kurtic, Mark Hepple and Rob Gaizauskas). MultiLing 2015 in SIGDIAL. Prague, 2 September 2015 - 4 September 2015.
- Use of Transformation-Based Learning in Annotation Pipeline of Igbo, an African Language. Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (pp 24-33). Hissar, Bulgaria.
- Comment-to-Article Linking in the Online News Domain. Proceedings of the SIGDIAL 2015 Conference (pp 245-249). Prague, 2 September 2015.
- Part-of-speech Tagset and Corpus Development for Igbo, an African Language. Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop (pp 93-98). Dublin, Ireland.
- Reliably evaluating summaries of Twitter timelines. AAAI 2013 Spring Symposium on Analyzing Microtext. Stanford
- Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval. EMNLP (pp 262-272)
- Evaluating Lexical Substitution: Analysis and New Measures. LREC 2010 - Seventh International Conference on Language Resources and Evaluation (pp 3250-3254)
- Evaluation Metrics for the Lexical Substitution Task. HLT-NAACL (pp 289-292)
- Efficient Minimal Perfect Hash Language Models. LREC
- Building a semantically annotated corpus of clinical texts. Journal of Biomedical Informatics, Vol. 42 (pp 950-966)
- Cross-Domain Dialogue Act Tagging. Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008) (pp 1969-1976)
- Combining Terminology Resources and Statistical Methods for Entity Recognition: an Evaluation. LREC
- SemEval-2007 task 15: TempEval temporal relation identification. Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval-2007), ACL 2007 (pp 75-80), 23-24 June 2007.
- USFD: Preliminary exploration of features and classifiers for the TempEval-2007 tasks. Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval-2007), ACL 2007 (pp 438-441)
- Task-Oriented Extraction of Temporal Information: The Case of Clinical Narratives. TIME (pp 188-195)
- The Role of Inference in the Temporal Annotation and Analysis of Text. Language Resources and Evaluation, Vol. 39 (pp 243-265)
- The University of Sheffield's TREC 2005 Q&A Experiments. TREC, Vol. 500-266
- SUPPLE: A practical parser for natural language engineering applications. Proceedings of the Ninth International Workshop on Parsing Technology (IWPT 2005) (pp 200-201), 9-10 October 2005.
- Error analysis of Dialogue Act classification. Text, Speech and Dialogue, Proceedings, Vol. 3658 (pp 451-458)
- Human dialogue modelling using machine learning. Recent Advances in Natural Language Processing III, Vol. 260 (pp 17-28)
- The University of Sheffield's TREC 2004 QA Experiments. TREC, Vol. 500-261
- Information retrieval for question answering: a SIGIR 2004 workshop. SIGIR Forum, Vol. 38 (pp 41-44)
- Human Dialogue Modelling Using Annotated Corpora. LREC
- A Large-Scale Resource for Storing and Recognizing Technical Terminology. LREC
- NLP-enhanced Content Filtering within the POESIA Project. Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC2004)
- The University of Sheffield's TREC 2003 Q&A Experiments. TREC, Vol. 500-255 (pp 782-790)
- Independence and commitment: Assumptions for rapid training and execution of rule-based POS taggers. Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL '00) (pp 278-285), 3-6 October 2000.
- Compacting the Penn Treebank Grammar. CoRR, Vol. cs.CL/9902001
- An Earley-style Predictive Chart Parsing Method for Lambek Grammars. ACL (pp 465-472)
- University of Sheffield TREC-8 Q&A System. TREC, Vol. 500-246
- Memoisation for glue language deduction and categorial parsing. Proceedings of the 17th International Conference on Computational Linguistics (COLING-ACL '98), Vol. 1 (pp 538-544), 10-14 August 1998.
- Linear Categorial Deduction via First-order Compilation. TAPD (pp 108-117)
- Compacting the Penn Treebank Grammar. COLING-ACL (pp 699-703)
- Maximal incrementality in linear categorial deduction. Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the Association for Computational Linguistics (pp 344-351), 7-12 July 1997.
- A Compilation-Chart Method for Linear Categorial Deduction. COLING (pp 537-542)
- Hybrid Categorial Logics. Logic Journal of the IGPL, Vol. 3 (pp 343-355)
- Mixing modes of linguistic description in categorial grammar. Seventh Conference of the European Chapter of the Association for Computational Linguistics (pp 127-132)
- Discontinuity and the Lambek Calculus. COLING (pp 1235-1239)
- Chart Parsing Lambek Grammars: Modal Extensions and Incrementality. COLING (pp 134-140)
- Efficient incremental processing with categorial grammar. 29th Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference (pp 79-86)
- Proof figures and structural operators for categorial grammar. Fifth Conference of the European Chapter of the Association for Computational Linguistics (pp 198-203)
- Normal Form Theorem Proving for the Lambek Calculus. COLING (pp 173-178)
- Parsing and derivational equivalence. Fourth Conference of the European Chapter of the Association for Computational Linguistics (pp 10-18)
- Lexical Disambiguation of Igbo using Diacritic Restoration. Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications (pp 53-60). Valencia, Spain, 3 April 2017.
Preprints
- Toward an effective Igbo part-of-speech tagger. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(4).
Grants
- SENSEI: Making Sense of Human-Human Conversation, EC FP7, 11/2013 - 10/2016, £459,034, as Co-PI
- uComp: Embedded Human Computation for Knowledge Extraction and Evaluation, EPSRC, 11/2012 - 05/2016, £375,621, as Co-PI
- Reveal II, GCHQ, 10/2008 - 03/2010, £141,763, as PI
- CA4NLP: Engineering Natural Language Interfaces: can CA help?, EPSRC, 04/2008 - 03/2009, £49,480, as PI
- CLEF-Services, MRC, 01/2005 - 06/2008, £401,021, as Co-PI
- CLEF: Clinical E-Science Framework, MRC, 10/2002 - 01/2006, £280,725, as Co-PI
- POESIA: Public Open-source Environment for a Safer Internet, EC FP6, 02/2002 - 02/2004, £89,129, as PI
Professional activities and memberships
Member of the Natural Language Processing research group