Dr Carolina Scarton
BSc, MSc, PhD
Department of Computer Science
Outreach, Open Days and Headstart Officer
Member of the Natural Language Processing research group
+44 114 222 1892
Department of Computer Science
Regent Court (DCS)
Carolina Scarton is an Academic Fellow at the Department of Computer Science, University of Sheffield, UK. She is a member of the Natural Language Processing group and part of the GATE team.
Previously, she worked as a Research Associate on two European projects: WeVerify (March 2019 to August 2019) and SIMPATICO (July 2016 to February 2019).
In 2017, she was awarded a PhD in Computer Science by the University of Sheffield, under the supervision of Professor Lucia Specia. Her PhD was funded by the EXPERT project (a Marie Curie Initial Training Network).
She also holds an MSc and a BSc from the University of São Paulo, Brazil (awarded in 2013).
Her MSc supervisor was Dr. Sandra Aluísio, and she was a member of the Interinstitutional Center for Computational Linguistics (NILC). Since 2018, she has been the Secretary of the European Association for Machine Translation (EAMT).
Research interests
Dr Scarton's research area is Natural Language Processing (NLP). She is particularly interested in text adaptation, machine translation, online misinformation detection and verification, evaluation of NLP task outputs, NLP applied to healthcare and robotics, and dialog systems.
- Quality Estimation for Machine Translation. Morgan & Claypool Publishers LLC.
- Special Issue on Disinformation, Hoaxes and Propaganda within Online Social Networks and Media. Online Social Networks and Media, 23.
- Data-driven sentence simplification: Survey and benchmark. Computational Linguistics, 135-187.
- Horacio Saggion, Automatic Text Simplification. Synthesis Lectures on Human Language Technologies, April 2017, 137 pages, ISBN 1627058680 / 9781627058681. Natural Language Engineering, 26, 489-492.
- Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis. Dementia & Neuropsychologia, 8(3), 227-235.
- A Quantitative Analysis of Discourse Phenomena in Machine Translation. Discours, 16.
- Multistage BiCross Encoder: Team GATE Entry for MLIA Multilingual Semantic Search Task 2.
- Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis.
- Measuring What Counts: The case of Rumour Stance Classification.
- The False COVID-19 Narratives That Keep Being Debunked: A Spatiotemporal Analysis.
- Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of COVID-19 Infodemic.
Conference proceedings papers
- Measuring the Impact of Readability Features in Fake News Detection. LREC (pp 1404-1413)
- ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations. ACL 2020 (pp 4668-4679)
- Linguistic Analysis Model for Monitoring User Reaction on Satirical News for Brazilian Portuguese (pp 313-320)
- Deciding When, How and for Whom to Simplify. ECAI, Vol. 325 (pp 2172-2179)
- Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis. AACL/IJCNLP (pp 914-924)
- Measuring What Counts: The Case of Rumour Stance Classification. AACL/IJCNLP (pp 925-932)
- EASSE: Easier Automatic Sentence Simplification Evaluation. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations (pp 49-54). Hong Kong, China, 3-7 November 2019.
- Cross-Sentence Transformations in Text Simplification. WNLP@ACL (pp 181-184)
- Text simplification from professionally produced corpora. LREC 2018 - 11th International Conference on Language Resources and Evaluation (pp 3504-3510)
- Simpa: A sentence-level simplification corpus for the public administration domain. LREC 2018 - 11th International Conference on Language Resources and Evaluation (pp 4333-4338)
- Exploring Gap Filling as a Cheaper Alternative to Reading Comprehension Questionnaires when Evaluating Machine Translation for Gisting. Proceedings of the Third Conference on Machine Translation, Vol. 1 (pp 192-203), 31 October - 1 November 2018.
- Sheffield Submissions for WMT18 Multimodal Translation Shared Task. Proceedings of the Third Conference on Machine Translation: Shared Task Papers, October 2018.
- Sheffield Submissions for the WMT18 Quality Estimation Shared Task. Proceedings of the Third Conference on Machine Translation: Shared Task Papers, October 2018.
- Learning Simplifications for Specific Target Audiences. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), July 2018.
- Bilexical Embeddings for Quality Estimation. Proceedings of the Second Conference on Machine Translation, September 2017.
- Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs. IJCNLP (1) (pp 295-305)
- MUSST: A Multilingual Syntactic Simplification Tool. IJCNLP (System Demonstrations) (pp 25-28)
- Improving Evaluation of Document-level Machine Translation Quality Estimation. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, April 2017.
- Findings of the 2016 Conference on Machine Translation. Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, August 2016.
- Quality estimation for language output applications. COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Tutorial Abstracts (pp 14-17)
- A reading comprehension corpus for machine translation evaluation. Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 (pp 3652-3658)
- Word embeddings and discourse information for Quality Estimation. Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, August 2016.
- SAARSHEFF at SemEval-2016 Task 1: Semantic Textual Similarity with Machine Translation Evaluation Metrics and (eXtreme) Boosted Tree Ensembles. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), June 2016.
- Evaluating Progression of Alzheimer’s Disease by Regression and Classification Methods in a Narrative Language Test in Portuguese (pp 109-114)
- Multi-level Translation Quality Prediction with QuEst++. Proceedings of ACL-IJCNLP 2015 System Demonstrations, July 2015.
- Searching for context: A study on document-level labels for translation quality estimation. EAMT 2015 - Proceedings of the 18th Annual Conference of the European Association for Machine Translation (pp 121-128)
- USHEF and USAAR-USHEF participation in the WMT15 QE shared task. Proceedings of the Tenth Workshop on Statistical Machine Translation, September 2015.
- USAAR-SHEFFIELD: Semantic Textual Similarity with Deep Regression and Machine Translation Evaluation Metrics. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), June 2015.
- Discourse and Document-level Information for Evaluating Language Output Tasks. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, June 2015.
- Findings of the 2015 Workshop on Statistical Machine Translation. Proceedings of the Tenth Workshop on Statistical Machine Translation, September 2015.
- Using Cross-Linguistic Knowledge to Build VerbNet-Style Lexicons: Results for a (Brazilian) Portuguese VerbNet (pp 149-160)
- Exploring Consensus in Machine Translation for Quality Estimation. Proceedings of the Ninth Workshop on Statistical Machine Translation, June 2014.
- Document-level translation quality estimation: Exploring discourse and pseudo-references. Proceedings of the 17th Annual Conference of the European Association for Machine Translation, EAMT 2014 (pp 101-108)
- Verb clustering for Brazilian Portuguese. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 8403 LNCS(PART 1) (pp 25-39)
- Computational Processing of the Portuguese Language
- Revisiting the readability assessment of texts in Portuguese. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 6433 LNAI (pp 306-315)
- SIMPLIFICA: a tool for authoring simplified texts in Brazilian Portuguese guided by readability assessments. NAACL (Demos) (pp 41-44)
- Text readability analysis with natural language processing tools: The adaptation of coh-metrix metrics for Portuguese. STIL 2009 - 2009 7th Brazilian Symposium in Information and Human Language Technology (pp 53-62)
- Probing for idiomaticity in vector space models. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics
- Estimating post-editing effort: a study on human judgements, task-based and reference-based metrics of MT quality
- Assessing Idiomaticity Representations in Vector Models with a Noun Compound Dataset Labeled at Type and Token Levels. Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
Grants
Modelling the link between working memory and language deficits in schizophrenia, Royal Society, 12/2020 - 11/2022, £74,000, as Co-PI
Modeling Idiomaticity in Human and Artificial Language Processing, EPSRC, 06/2020 - 05/2023, £446,163, as Co-PI