Dr Judita Preiss

MA (Cambridge), MPhil (Cambridge), PhD (Cambridge)

Information School

Lecturer in Data Science

judita.preiss@sheffield.ac.uk

Full contact details

Dr Judita Preiss
Information School
Room 235
Regent Court (IS)
211 Portobello
Sheffield
S1 4DP
Profile

I have an MA (Cantab) in Mathematics, an MPhil in Computer Speech and Language Processing from Engineering, and a PhD in Natural Language Processing (Computer Science), all from Cambridge. Natural Language Processing was a way to combine my interests in Mathematics and Languages.

After finishing my PhD, I was an RA in the Natural Language Processing group at Cambridge, working on multiple projects. From 2008 to 2010 I was a visiting professor at The Ohio State University, before returning to the UK to undertake a number of research projects in the Natural Language Processing group at the University of Sheffield. The constant need for more and more data fuelled an interest in approaches to gathering data and in big data techniques, and I took up a post as a lecturer in Data Science at the University of Salford, which I held from 2017 to 2022.

Alongside my interest in data, I have worked on knowledge transfer to industry and on applications of my research in real-life settings.
 

Research interests

My interests are broad. Current research topics range from work in the biomedical domain (such as automatic discovery) and its applications in health, through mental health (including work with social media texts and other sources of input) and the automatic organization and presentation of data to users, to approaches involving multiple languages and automatically detectable differences between cultures.

I am very interested in work involving text or speech, particularly when large quantities of data are involved. My current areas for PhD topics include:

  • mining and deriving knowledge, and its applications
  • social media applications
  • automatic arranging of knowledge
  • multilingual models and the differences between them
Publications

Journal articles

Conference proceedings papers

  • Preiss J (2023) Avoiding background knowledge: literature based discovery from important information. BMC Bioinformatics, Vol. 23(S9), 22 October 2021.
  • Preiss J & Stevenson M (2018) HiDE: A Tool for Unrestricted Literature Based Discovery. COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings of System Demonstrations (pp 34-37)
  • Preiss J (2014) Seeking Informativeness in Literature Based Discovery. Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp 112-117)
  • Preiss J & Stevenson M (2013) Distinguishing Common and Proper Nouns. *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity (pp 80-84)
  • Preiss J & Stevenson M (2013) Unsupervised Domain Tuning to Improve Word Sense Disambiguation. Proceedings of the 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 (pp 680-684)
  • Preiss J & Stevenson M (2013) DALE: A Word Sense Disambiguation System for Biomedical Documents Trained using Automatically Labeled Examples. 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 - Demonstration Session (pp 1-4)
  • Preiss J & Stevenson M (2013) Unsupervised domain tuning to improve word sense disambiguation. NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference (pp 680-684)
  • Cheng W, Preiss J & Stevenson M (2012) Scaling up WSD with automatically generated examples. BioNLP@HLT-NAACL 2012 - Workshop on Biomedical Natural Language Processing, Proceedings (pp 231-239)
  • Preiss J (2012) Identifying comparable corpora using LDA. NAACL HLT 2012 - 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp 558-562)
  • Biggins S, Mohammed S, Oakley S, Stringer L, Stevenson M & Preiss J (2012) University of Sheffield: Two approaches to semantic text similarity. *SEM 2012 - 1st Joint Conference on Lexical and Computational Semantics, Vol. 2 (pp 655-661)
  • Preiss J, Briscoe T & Korhonen A (2007) A system for large-scale acquisition of verbal, nominal and adjectival subcategorization frames from corpora. ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (pp 912-919)
  • Preiss J, Gasperin C & Briscoe T (2004) Can anaphoric definite descriptions be replaced by pronouns? Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004 (pp 1499-1502)
  • Preiss J (2004) Probabilistic WSD in SENSEVAL-3. Proceedings of the SENSEVAL@ACL 2004: 3rd International Workshop on the Evaluation of Systems for the Semantic Analysis of Text - Held in cooperation with ACL 2004 (pp 213-216)
  • Preiss J & Korhonen A (2004) WSD for subcategorization acquisition task description. Proceedings of the SENSEVAL@ACL 2004: 3rd International Workshop on the Evaluation of Systems for the Semantic Analysis of Text - Held in cooperation with ACL 2004 (pp 33-36)
  • Preiss J (2003) Using grammatical relations to compare parsers. 10th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2003 (pp 291-298)
  • Korhonen A & Preiss J (2003) Improving subcategorization acquisition using word sense disambiguation. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Vol. 2003-July
  • Preiss J, Korhonen A & Briscoe T (2002) Subcategorization acquisition as an evaluation method for WSD. Proceedings of the 3rd International Conference on Language Resources and Evaluation, LREC 2002 (pp 1551-1556)
  • Preiss J (2001) Anaphora Resolution with Word Sense Disambiguation. Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp 143-146)
  • Preiss J () Predicting Informativeness Of Semantic Triples. Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications
Teaching activities
  • Leading the Big Data module (INF6032)
  • Contributing to Introduction to Programming (INF4002)
  • Contributing to Practical Programming for Data Science (INF111)
Professional activities and memberships

As well as being a Databricks Certified Associate Developer for Apache Spark 3.0 - Python, I am an active member of the Databricks University Alliance. Similarly, I have been involved with Amazon Web Services, where I am a certified SysOps Administrator - Associate and an AWS Academy Educator. I am also a member of the rolling review panel for ACL.