Simplifying Text with SIMPATICO
- Researchers at the University of Sheffield are building an automatic tool that simplifies online text, making it more accessible to readers.
- The technology will help non-native speakers, the elderly, low-literacy individuals and those with dyslexia by making information on public services websites easier to understand.
In order to reduce cost and speed up access to public services, local councils have been making more information and processes available online.
However, this assumes that users are digitally literate and can process and act upon the digital content they require. Non-native speakers, the elderly, low-literacy individuals and those who suffer from dyslexia, among others, may find it difficult to understand the complex language often used on council and other government web pages.
The SIMPATICO project (a European consortium including three universities, four companies and three public administrations) addresses this challenge by automatically simplifying the words and structure of the information published on these pages.
In the UK, a team of researchers at the Natural Language Processing (NLP) Research Group in the Department of Computer Science at the University of Sheffield is building automatic tools that can be embedded into a website to simplify textual content for its users. Some of these tools are being tested on the Sheffield City Council website.
The SIMPATICO platform simplifies texts in four languages: English, Spanish, Italian and Galician. The Sheffield team coordinates this work, in which two main approaches have been developed:
1 - Lexical Simplification: replaces complex words and phrases with simpler alternatives.
To replace complex words and phrases, the underlying technology extracts synonyms for a complex word from dictionaries and subsequently replaces that word with the synonym that occurs most frequently in a large text collection.
While this strategy can be very effective, dictionaries are not available for every language and may have low coverage. Moreover, the most frequent synonym may not be correct in the context of the original text.
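The dictionary-and-frequency strategy described above can be sketched in a few lines. This is a minimal illustration only, not the project's code: the synonym dictionary, the tiny corpus and the function names are all hypothetical stand-ins for a real lexical resource and a large text collection.

```python
from collections import Counter

# Hypothetical toy synonym dictionary (a real system would use a lexical resource).
SYNONYMS = {
    "commence": ["begin", "start", "initiate"],
    "remuneration": ["pay", "payment", "compensation"],
}

# Hypothetical toy corpus standing in for a large text collection.
CORPUS = (
    "you can start your application online . "
    "payments start next month . "
    "we begin work on monday . "
    "your pay is sent every month . "
    "pay the fee online ."
)

# Word frequencies counted over the corpus.
FREQ = Counter(CORPUS.split())


def simplify_word(word):
    """Return the most frequent dictionary synonym, or the word itself if none is listed."""
    candidates = SYNONYMS.get(word.lower())
    if not candidates:
        return word
    return max(candidates, key=lambda c: FREQ[c])


def simplify_sentence(sentence):
    """Apply word-by-word replacement; context is ignored, which is the weakness noted above."""
    return " ".join(simplify_word(tok) for tok in sentence.split())


print(simplify_sentence("Work will commence in May"))
```

Because the choice depends only on corpus frequency, the same synonym is picked wherever the word appears, whatever the surrounding sentence means.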
SIMPATICO employs more sophisticated machine learning techniques that transform words and phrases into meaningful numerical representations, and then uses these representations to produce synonyms that are ranked by their simplicity in a specific context. These representations can be generated from raw text and require only a large text collection, which is easy to obtain even for languages that do not have synonym dictionaries, such as Galician.
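As a rough illustration of the idea, the sketch below uses hand-made three-dimensional vectors in place of learned representations. The vectors, the frequency table, the equal weighting of context fit and frequency, and all function names are assumptions made for this example; the actual SIMPATICO models are learned from data and are not described here in this detail.

```python
import math

# Hypothetical toy word vectors; real representations are learned from raw text.
EMB = {
    "residence": (0.9, 0.1, 0.0),
    "home": (0.8, 0.2, 0.1),
    "dwelling": (0.85, 0.1, 0.05),
    "tax": (0.1, 0.9, 0.0),
    "river": (0.0, 0.1, 0.9),
}

# Hypothetical corpus frequencies, used here as a crude proxy for simplicity.
FREQ = {"home": 500, "dwelling": 20, "residence": 60, "tax": 300, "river": 80}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidates(target, k=2):
    """Propose the k words whose vectors are closest to the target word's vector."""
    others = [w for w in EMB if w != target]
    others.sort(key=lambda w: cosine(EMB[target], EMB[w]), reverse=True)
    return others[:k]


def rank_substitutes(target, context, k=2):
    """Rank candidates by an equal mix of context fit and the frequency proxy."""
    known = [w for w in context if w in EMB]
    top = max(FREQ.values())

    def score(cand):
        fit = sum(cosine(EMB[cand], EMB[w]) for w in known) / len(known) if known else 0.0
        return 0.5 * fit + 0.5 * FREQ.get(cand, 0) / top

    return sorted(candidates(target, k), key=score, reverse=True)


print(rank_substitutes("residence", ["tax"]))
```

No synonym dictionary is involved: candidates come from vector similarity alone, which is why this kind of approach only needs a large text collection.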
2 - Syntactic (sentence) Simplification: rewrites complex sentences to make them easier to understand.
This can be done using handcrafted rules, machine learning, or both. For example, a rule-based system applies handcrafted simplification rules to transform certain sentences. This includes splitting longer sentences in two for easier comprehension, as well as reordering words so that the sentence sounds more natural.
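A single handcrafted splitting rule might look like the sketch below. It is a deliberately naive example, not one of the project's rules: the pattern, the word-count threshold and the function name are assumptions, and the rule does not check that the second half is a complete clause.

```python
import re

# Hypothetical rule: split at a comma followed by "and" or "but".
SPLIT_RULE = re.compile(r",\s+(?:and|but)\s+", flags=re.IGNORECASE)


def split_sentence(sentence, min_words=12):
    """Split a long sentence in two at the first ', and' / ', but'; otherwise return it unchanged."""
    if len(sentence.split()) < min_words:
        return [sentence]  # short sentences are left alone
    parts = SPLIT_RULE.split(sentence.strip(), maxsplit=1)
    if len(parts) != 2:
        return [sentence]  # the rule does not apply
    first, second = parts
    first = first.rstrip(".") + "."
    second = second[0].upper() + second[1:]
    if not second.endswith((".", "!", "?")):
        second += "."
    return [first, second]


for part in split_sentence(
    "You must complete the application form, and you must send it before the deadline."
):
    print(part)
```

A rule this general will also fire on sentences where the text after the conjunction cannot stand alone, which is the kind of incorrect or artificial output that general rules can produce.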
However, the application of general rules can lead to incorrect simplifications and produce rather artificial output. As an alternative, deep learning is used to create systems that learn how to simplify sentences by looking at examples. These systems are built using data that contains original sentences paired with simplified versions created by humans.
This strategy also has its limitations. For one, the amount of data required to train these systems is not available for most languages. Moreover, machine learning techniques can make mistakes, producing sentences that sometimes contain errors in grammar or meaning, for example by removing important information from a sentence. A confidence model component, which indicates how far a proposed simplification can be trusted, is therefore also important in this case.
Although SIMPATICO has made a significant impact on automatic simplification, important challenges remain, the most important of which is learning how to generate simplifications that are customised to the needs of specific users. Different users have different needs in terms of text simplification. For instance, a text that is complex for someone with dyslexia may be considered simple by a non-native speaker, and vice versa. Non-native speakers with different native languages may also have different needs.
The ultimate goal of this project is to create a platform that continuously adapts itself to the needs of individual users. The platform will use previous interactions between individual citizens and SIMPATICO to personalise simplifications. This requires users to interact with early versions of the platform so that enough data can be collected.
About the NLP group
The Natural Language Processing Research Group is part of the Department of Computer Science at the University of Sheffield, UK. Established in 1993, it has a strong global reputation. It has extensive experience in the fields of NLP infrastructures, text adaptation (including text simplification), machine translation, machine learning, information extraction, dialogue systems, lexicography, and social media.
SIMPATICO (https://www.simpatico-project.eu/): “SIMplifying the interaction with Public Administration Through Information technology for Citizens and cOmpanies” (EU H2020, grant agreement 692819). The goal of the SIMPATICO project is to improve the experience of citizens and companies in their daily interactions with local councils by providing personalised delivery of e-services based on advanced cognitive system technologies. This will be achieved through a solution based on the interplay of language processing, machine learning and the wisdom of the crowd, changing for the better the way citizens interact with local councils. Partners in SIMPATICO: