An important characteristic of our research is that it is multilingual. We are working with a number of languages, including Arabic, Catalan, English, French, German, Greek, Italian, and Spanish. Nearly all outcomes of our research are implemented as operational modules of larger systems developed in the scope of European projects.
Our internal working platform, in which all modules are integrated, is based on UIMA.
Natural Language Text Generation
Natural Language Text Generation (NLG) is the oldest research line of our group. Within NLG, we are primarily active in content selection and linguistic generation. We developed a graph transducer-based framework for linguistic generation (MATE), which is grounded in the multilayer linguistic model of the Meaning-Text Theory (MTT), and compiled large-scale lexical and grammatical generation resources for several languages for this framework.
We are also working on deep learning-based sentence generation and regularly co-organize shared tasks on the topic (cf. MSR 18 and MSR 19).
Deep Language Analysis
In the context of Deep Language Analysis, we focus on three areas: (i) syntactic parsing for downstream applications; (ii) derivation of semantic or conceptual structures that can be mapped onto ontological representations (e.g., in RDF format) in order to populate large-scale ontologies; and (iii) concept extraction.
For applied research in language analysis, we developed a robust analysis pipeline that includes syntactic and deep-syntactic dependency parsing, named entity recognition, word sense disambiguation and entity linking.
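As a rough illustration of area (ii), the following Python sketch shows how a single predicate-argument structure could be serialized as RDF triples in N-Triples syntax. It is not our pipeline: the `http://example.org/` namespace, the function names, and the flat role-to-filler input format are all assumptions made for the example.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

Triple = Tuple[str, str, str]


def iri(kind: str, name: str) -> str:
    """Build an IRI term under the illustrative namespace."""
    return f"<{BASE}{kind}/{quote(name)}>"


def structure_to_triples(event_id: str, predicate: str,
                         args: Dict[str, str]) -> List[Triple]:
    """Map one predicate-argument structure to RDF-style triples.

    The event node is typed with the predicate concept, and each
    semantic role becomes a property that links the event to its filler.
    """
    event = iri("event", event_id)
    triples: List[Triple] = [(event, RDF_TYPE, iri("concept", predicate))]
    for role, filler in sorted(args.items()):  # sorted for stable output
        triples.append((event, iri("role", role), iri("entity", filler)))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Assumed analysis of "Mary visits Barcelona".
    example = structure_to_triples(
        "e1", "visit", {"agent": "Mary", "patient": "Barcelona"}
    )
    print(to_ntriples(example))
```

Reifying the event as its own node, rather than emitting a single subject-predicate-object triple, is one common design choice: it leaves room for any number of roles per predicate.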
Computational Lexicography and Lexicology
Within the field of lexicography and lexicology, we are primarily interested in problems related to idiosyncratic word co-occurrences (or collocations): their theoretical description, semantically oriented classification, automatic recognition in text corpora, and sound representation in dictionaries for both human and machine use. In this context, we are also interested in problems related to collocations in computer-assisted language learning (CALL).
We carry out research on several meta-characteristics of textual material, including author profiling and identification as well as text classification. In the context of text classification, we currently focus on aspect-oriented hate speech detection and the related problem of annotating datasets with hate speech categories.
Communicative Structure and Thematic Progression
We study the influence of the Communicative (or Information) Structure on the syntactic structure during text generation, on the interpretation of the speaker's intention, and on the prosody of the generated speech. We furthermore study the use of thematic progression strategies, as defined by Daneš, in monologues and dialogues, and apply the outcome of these studies to different NLP applications.