Multilingual Document Generation
One of our major research focuses is Multilingual and Multimodal Natural Text Generation (MMNTG). We address all major aspects of MMNTG: discourse structure and layout planning, mode selection strategies, sentence planning, and surface generation. Our linguistic framework is the Meaning-Text Theory (MTT). The goal is to develop large-coverage, open-source MTT-based MMNT generators in both the rule-based and the statistical paradigm.
Computational Lexicology / Lexicography
Within the field of Computational Lexicology / Lexicography, we work primarily on collocations, understood as idiosyncratic binary word co-occurrences. We are interested in the identification of collocations in text corpora and in their semantic classification. Furthermore, we investigate (in collaboration with the DICE group of the University of La Coruña) the problems that language learners have with collocations, and we develop techniques for the automatic detection, classification, and correction of collocation errors in learners' writing.
Language-oriented Machine Learning
Within this area, we study the application of various paradigms of ML (supervised, unsupervised and reinforcement learning) to the acquisition of linguistic resources.
Sentiment Analysis and Opinion Mining
We work on multilingual (English/Spanish) sentiment analysis using lexical resources, focusing on machine learning over linguistic and semantic features for the classification of opinionated texts. A new line of research addresses irony identification in social networks and extends the irony detection model to humor and sarcasm.
Syntactic Data-driven Parsing
We research statistical approaches to phrase-structure and dependency parsing at various levels of linguistic abstraction – surface-syntactic, deep-syntactic and semantic.
Author Profiling
Author profiling targets the automatic recognition of distinctive characteristics of the author of a given text, including age, native language, and social and educational background. We investigate novel supervised and unsupervised techniques for various applications.
Linguistic Resources Acquisition
Within this area, we work on the annotation of corpora for data-driven NLP, the creation of lexica, and the application of various paradigms of ML (supervised, unsupervised and reinforcement learning) to the acquisition of linguistic resources.