Dr Carolina Scarton
BSc, MSc, PhD
Department of Computer Science
Lecturer in Natural Language Processing
Outreach, Open Days and Headstart Officer
Member of the Natural Language Processing research group

+44 114 222 1892
Full contact details
Department of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP
- Profile
Carolina Scarton is a Lecturer in Natural Language Processing at the Department of Computer Science, University of Sheffield, UK. She is a member of the Natural Language Processing group and part of the GATE team.
Previously, she worked as an Academic Fellow (from September 2019 to November 2021) and as a Research Associate for the WeVerify (from March 2019 to August 2019) and SIMPATICO (from July 2016 to February 2019) European projects.
- Qualifications
In 2017, she was awarded a PhD in Computer Science from the University of Sheffield, under the supervision of Professor Lucia Specia. Her PhD was funded by the EXPERT project (a Marie Curie Initial Training Network, ITN).
She also holds an MSc and a BSc from the University of São Paulo, Brazil (awarded in 2013).
Her MSc supervisor was Dr Sandra Aluísio, and she was a member of the Interinstitutional Center for Computational Linguistics (NILC). Since 2018, she has been the Secretary of the European Association for Machine Translation (EAMT).
- Research interests
Dr Scarton's research area is Natural Language Processing (NLP). She is particularly interested in text adaptation, machine translation, online misinformation detection and verification, evaluation of NLP task outputs, NLP applied to healthcare and robotics, and dialog systems.
- Publications
Books
- Quality Estimation for Machine Translation. Morgan & Claypool Publishers LLC.
Journal articles
- The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification. Computational Linguistics, 1-29.
- Special Issue on Disinformation, Hoaxes and Propaganda within Online Social Networks and Media. Online Social Networks and Media, 23.
- Data-driven sentence simplification: Survey and benchmark. Computational Linguistics, 135-187.
- Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis. Dementia & Neuropsychologia, 8(3), 227-235.
- A Quantitative Analysis of Discourse Phenomena in Machine Translation. Discours (16).
Book reviews
- Horacio Saggion, Automatic Text Simplification. Synthesis Lectures on Human Language Technologies, April 2017. 137 pages, ISBN: 1627058680, 9781627058681. Natural Language Engineering.
Conference proceedings papers
- Comparative Analysis of Engagement, Themes, and Causality of Ukraine-Related Debunks and Disinformation (pp 128-143)
- Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022. Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022) (pp 341-350)
- SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding. Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), July 2022.
- GateNLP-UShef at SemEval-2022 Task 8: Entity-Enriched Siamese Transformer for Multilingual News Article Similarity. Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), July 2022.
- Assessing the Representations of Idiomaticity in Vector Models with a Noun Compound Dataset Labeled at Type and Token Levels. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), August 2021.
- AStitchInLanguageModels: Dataset and methods for the exploration of idiomaticity in pre-trained language models. Findings of the Association for Computational Linguistics: EMNLP 2021. Punta Cana, Dominican Republic, 7 November 2021 - 11 November 2021.
- Linguistic Analysis Model for Monitoring User Reaction on Satirical News for Brazilian Portuguese (pp 313-320)
- ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020.
- EASSE: easier automatic sentence simplification evaluation. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations (pp 49-54). Hong Kong, China, 3 November 2019 - 7 November 2019.
- Exploring Gap Filling as a Cheaper Alternative to Reading Comprehension Questionnaires when Evaluating Machine Translation for Gisting. Proceedings of the Third Conference on Machine Translation: Research Papers, Vol. 1 (pp 192-203), 31 October 2018 - 1 November 2018.
- Sheffield Submissions for WMT18 Multimodal Translation Shared Task. Proceedings of the Third Conference on Machine Translation: Shared Task Papers, Vol. 2 (pp 624-631), October 2018.
- Sheffield Submissions for the WMT18 Quality Estimation Shared Task. Proceedings of the Third Conference on Machine Translation: Shared Task Papers, Vol. 2 (pp 794-800), October 2018.
- Learning Simplifications for Specific Target Audiences. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), July 2018.
- Bilexical Embeddings for Quality Estimation. Proceedings of the Second Conference on Machine Translation, September 2017.
- Improving Evaluation of Document-level Machine Translation Quality Estimation. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, April 2017.
- Findings of the 2016 Conference on Machine Translation. Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, August 2016.
- Word embeddings and discourse information for Quality Estimation. Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, August 2016.
- SAARSHEFF at SemEval-2016 Task 1: Semantic Textual Similarity with Machine Translation Evaluation Metrics and (eXtreme) Boosted Tree Ensembles. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), June 2016.
- Evaluating Progression of Alzheimer’s Disease by Regression and Classification Methods in a Narrative Language Test in Portuguese (pp 109-114)
- Multi-level Translation Quality Prediction with QuEst++. Proceedings of ACL-IJCNLP 2015 System Demonstrations, July 2015.
- USHEF and USAAR-USHEF participation in the WMT15 QE shared task. Proceedings of the Tenth Workshop on Statistical Machine Translation, September 2015.
- USAAR-SHEFFIELD: Semantic Textual Similarity with Deep Regression and Machine Translation Evaluation Metrics. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), June 2015.
- Discourse and Document-level Information for Evaluating Language Output Tasks. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, June 2015.
- Findings of the 2015 Workshop on Statistical Machine Translation. Proceedings of the Tenth Workshop on Statistical Machine Translation, September 2015.
- Using Cross-Linguistic Knowledge to Build VerbNet-Style Lexicons: Results for a (Brazilian) Portuguese VerbNet (pp 149-160)
- Exploring Consensus in Machine Translation for Quality Estimation. Proceedings of the Ninth Workshop on Statistical Machine Translation, June 2014.
- Verb clustering for Brazilian Portuguese. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 8403 LNCS(PART 1) (pp 25-39)
- Computational Processing of the Portuguese Language
- Revisiting the readability assessment of texts in Portuguese. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 6433 LNAI (pp 306-315)
- Text readability analysis with natural language processing tools: The adaptation of coh-metrix metrics for Portuguese. STIL 2009 - 2009 7th Brazilian Symposium in Information and Human Language Technology (pp 53-62)
Theses / Dissertations
- VerbNet.Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil.
Preprints
- Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of the COVID-19 Infodemic, Research Square Platform LLC.
- Grants
Current Grants
- VIGILANT: Vital IntelliGence to Investigate ILlegAl DisiNformaTion, Horizon Europe, 11/2022 - 10/2025, £476,955, as PI
- vera.ai: VERification Assisted by Artificial Intelligence, Horizon Europe, 09/2022 - 08/2025, £776,703, as Co-PI
- Modelling the link between working memory and language deficits in schizophrenia, Royal Society, 12/2020 - 11/2022, £74,000, as Co-PI
- Modeling Idiomaticity in Human and Artificial Language Processing, EPSRC, 12/2020 - 11/2024, £446,163, as Co-PI