List of results published directly linked with the projects co-funded by the Spanish Ministry of Economy and Competitiveness under the María de Maeztu Units of Excellence Program (MDM-2015-0502).

List of publications acknowledging the funding in Scopus.

The record for each publication will include access to postprints (following the Open Access policy of the program), as well as to the datasets and software used. Ongoing work with the UPF Library and IT services will soon improve the interface and automate the retrieval of this information.

The MdM Strategic Research Program has its own community in Zenodo for material available in this repository, as well as at the UPF e-repository.

Štajner S, Saggion H, Ponzetto SP. Improving Lexical Coverage of Text Simplification Systems for Spanish. Expert Systems with Applications

The current bottleneck of all data-driven lexical simplification (LS) systems is the scarcity and small size of the parallel corpora (original sentences and their manually simplified versions) used for training. This is especially pronounced for languages other than English. We address this problem, taking Spanish as an example of such a language, by building new simplification-specific datasets of synonyms and paraphrases from freely available resources. We test their usefulness in the LS task by adding them, in various combinations, to the existing text simplification (TS) training dataset in a phrase-based statistical machine translation (PBSMT) approach. Our best systems significantly outperform the state-of-the-art LS systems for Spanish in the number of transformations performed and in the grammaticality, simplicity, and meaning preservation of the output sentences. The results of a detailed manual analysis show that some of the newly built TS resources, although they have good lexical coverage and lead to a high number of transformations, often change the original meaning and do not generate simpler output when used in this PBSMT setup. Good combinations of these additional resources with the TS training dataset and a good choice of language model, in contrast, improve the lexical coverage and produce sentences that are grammatical, simpler than the original, and preserve the original meaning well.

https://doi.org/10.1016/j.eswa.2018.08.034