Sanja Štajner, Horacio Saggion. March 13th 2019

Tutorial - Data-driven text simplification 

  • Date: March 13th 15:30 - 17:30
  • Location: Poblenou Campus, UPF (Roc Boronat 138, Barcelona). Room 55.309
  • Tutorial presenters: Sanja Štajner, Horacio Saggion 
  • Free registration HERE

Material:

Slides available in Zenodo ( https://doi.org/10.5281/zenodo.2593328 )

This tutorial is an updated version of the tutorial presented at COLING 2018.

Recording of the tutorial below.

References:

Sanja Štajner, Horacio Saggion. Data-Driven Text Simplification. Proceedings of the 27th International Conference on Computational Linguistics.

Abstract

In this tutorial, we aim to provide an extensive overview of the automatic text simplification (ATS) systems proposed so far and the methods they use, and to discuss the strengths and shortcomings of each, providing a direct comparison of their outputs. We aim to dispel some common misconceptions about what text simplification (TS) is and is not, and about how much it has in common with text summarisation and machine translation. We believe that a deeper understanding of the initial motivations, together with an in-depth analysis of existing TS methods, will help researchers new to ATS propose even better systems, bringing fresh ideas from related NLP areas. We will describe and explain all of the most influential methods used so far for the automatic simplification of texts, with emphasis on the strengths and weaknesses observed in a direct comparison of the systems' outputs. We will present all existing resources for TS in various languages, including parallel manually produced TS corpora, comparable automatically aligned TS corpora, paraphrase and synonym resources, TS-specific sentence-alignment tools, and several TS evaluation resources. Finally, we will discuss the existing evaluation methodologies for TS and the conditions necessary for using each of them.

Topics covered in the tutorial:

  • Motivation for ATS:
    • Problems for various NLP tools and applications
    • Reading difficulties of various target populations
  • TS projects:
    • Short description of TS projects (PSET, Simplext, PorSimples, FIRST, SIMPATICO)
    • Discussion of the TS projects (what they share and how they differ)
  • TS resources:
    • Resources for lexical simplification
    • Resources for lexico-syntactic simplification
    • Resources for languages other than English
  • Evaluation of TS systems:
    • Automatic evaluation
    • Human evaluation
  • Comparison of non-neural TS approaches:
    • Rule-based systems
    • Data-driven systems (supervised and unsupervised)
    • Hybrid systems
    • Semantically-motivated ATS systems
  • Neural text simplification (NTS) systems:
    • State-of-the-art NTS systems
    • Direct comparison of NTS systems
    • Strengths and weaknesses of NTS systems
  • NTS systems vs. previously proposed (non-neural) ATS systems (direct comparison)
  • Current challenges in ATS