Dr Judita Preiss
MA (Cambridge), MPhil (Cambridge), PhD (Cambridge)
Lecturer in Data Science
Regent Court (IS)
I have an MA (Cantab) in Mathematics, an MPhil in Computer Speech and Language Processing (Engineering) and a PhD in Natural Language Processing (Computer Science), all from Cambridge. Natural Language Processing offered a way to combine my interests in Mathematics and Languages.
After finishing my PhD, I was an RA in the Natural Language Processing group at Cambridge, working on multiple projects. From 2008 to 2010, I was a visiting professor at The Ohio State University, before returning to the UK to undertake a number of research projects in the Natural Language Processing group at the University of Sheffield. The constant need for ever more data fuelled my interest in data-gathering approaches and big data techniques, and I took up a post as Lecturer in Data Science at the University of Salford, which I held from 2017 to 2022.
Alongside my interest in data, I have worked on knowledge transfer to industry and on applying my research in real-life settings.
- Research interests
My research interests are broad. Current topics range from work in the biomedical domain (such as automatic knowledge discovery) and its applications in health, through mental health, including work with social media texts and other sources of input, to the automatic organization of data and its presentation to users, and approaches involving multiple languages and automatically detectable differences between cultures.
I am particularly interested in work involving text or speech, especially where large quantities of data are involved. Current PhD topic areas include:
- mining and deriving knowledge, and its applications
- social media applications
- automatic organization of knowledge
- multilingual models and the differences between them
Journal articles
- Validation through a comparison of physical examination and DNA test results: OLFML3 case study. Meta Gene, 27, 100819.
- Is automatic detection of hidden knowledge an anomaly? BMC Bioinformatics, 20(S10).
- Quantifying and filtering knowledge generated by literature based discovery. BMC Bioinformatics, (Suppl 7):249, 59-67.
- The Effect of Word Sense Disambiguation Accuracy on Literature Based Discovery. BMC Medical Informatics and Decision Making, 16(Suppl 1).
- Exploring relation types for literature-based discovery. Journal of the American Medical Informatics Association, 22(5), 987-992.
- Towards semantic literature based discovery. AAAI Fall Symposium - Technical Report, FS-12-05, 86-87.
- A detailed comparison of WSD systems: an analysis of the system answers for the SENSEVAL-2 English all words task. Natural Language Engineering, 12(3), 209-228.
- Probabilistic word sense disambiguation. Computer Speech & Language, 18(3), 319-337.
- Introduction to the special issue on word sense disambiguation. Computer Speech & Language, 18(3), 201-207.
- Predicting the impact of online news articles – is information necessary? Multimedia Tools and Applications.
Conference proceedings papers
- HiDE: A Tool for Unrestricted Literature Based Discovery. COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings of System Demonstrations (pp 34-37)
- Distinguishing Common and Proper Nouns. *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Vol. 1 (pp 80-84)
- DALE: A Word Sense Disambiguation System for Biomedical Documents Trained using Automatically Labeled Examples. 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 - Demonstration Session (pp 1-4)
- Unsupervised domain tuning to improve word sense disambiguation. NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference (pp 680-684)
- Scaling up WSD with automatically generated examples. BioNLP@HLT-NAACL 2012 - Workshop on Biomedical Natural Language Processing, Proceedings (pp 231-239)
- Identifying comparable corpora using LDA. NAACL HLT 2012 - 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp 558-562)
- University of Sheffield: Two approaches to semantic text similarity. *SEM 2012 - 1st Joint Conference on Lexical and Computational Semantics, Vol. 2 (pp 655-661)
- A system for large-scale acquisition of verbal, nominal and adjectival subcategorization frames from corpora. ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (pp 912-919)
- Can anaphoric definite descriptions be replaced by pronouns? Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004 (pp 1499-1502)
- Probabilistic WSD in SENSEVAL-3. Proceedings of the SENSEVAL@ACL 2004: 3rd International Workshop on the Evaluation of Systems for the Semantic Analysis of Text - Held in cooperation with ACL 2004 (pp 213-216)
- WSD for subcategorization acquisition task description. Proceedings of the SENSEVAL@ACL 2004: 3rd International Workshop on the Evaluation of Systems for the Semantic Analysis of Text - Held in cooperation with ACL 2004 (pp 33-36)
- Using grammatical relations to compare parsers. 10th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2003 (pp 291-298)
- Subcategorization acquisition as an evaluation method for WSD. Proceedings of the 3rd International Conference on Language Resources and Evaluation, LREC 2002 (pp 1551-1556)
- Predicting Informativeness of Semantic Triples. Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications
- Teaching activities
- Leading the Big Data module (INF6032)
- Contributing to Introduction to Programming (INF4002)
- Contributing to Practical Programming for Data Science (INF111)
- Professional activities
As well as being a Databricks Certified Associate Developer for Apache Spark 3.0 (Python), I am an active member of the Databricks University Alliance. I have also been involved with Amazon Web Services, where I am a certified SysOps Administrator - Associate and an AWS Academy Educator. I am also a member of the rolling review panel for ACL.