Principal Investigator: Leo Wanner


The TALN group (Tractament Automàtic del Llenguatge Natural, i.e., Natural Language Processing) focuses its research on selected aspects of Natural Language Processing, such as Multilingual Natural Language Generation, Summarization, Language-oriented Machine Learning, and Computational Lexicology. Special focus is put on application-oriented NLP and on synergies with other fields of Computer Science and Artificial Intelligence.


Main Research interests:

Multilingual Document Generation

One of TALN's major research focuses is Multilingual and Multimodal Natural Text Generation (MMNTG). All major aspects of MMNTG are addressed: discourse structure and layout planning, mode selection strategies, sentence planning, and the development of multilingual grammar and morphological resources. Our linguistic framework is the Meaning-Text Theory (MTT). The goal is to develop a large-coverage, open-source, MTT-based MMNT generator. Currently, in cooperation with external collaborators, we are working on Catalan, English, Finnish, French, German, Polish, Portuguese, Swedish and Spanish.


Computational Lexicology / Lexicography

Within Computational Lexicology / Lexicography, we work primarily on the automatic recognition and classification of collocations in text corpora. So far, we have focused on Spanish and German. A further topic in this area is the organization of lexical resources and collocation dictionaries.


Language-oriented Machine Learning

Within this area, we study the application of various paradigms of ML (supervised, unsupervised and reinforcement learning) to the acquisition of linguistic resources.


Automatic Text Summarization and Abstracting

We develop robust multilingual single-document and multi-document summarization technology. We have a set of resources for creating summarization systems adapted to different needs and domains; this set is being developed as part of the SUMMA system. We also work on abstractive text summarization and summarization evaluation.


Sentiment Analysis and Opinion Mining

We work on multilingual (English/Spanish) sentiment analysis using lexical resources. We focus on applying machine learning over linguistic and semantic features to classify opinionated texts. We also apply summarization techniques as a filtering step for opinion classification.