Below is the list of projects co-funded by the María de Maeztu program, selected via internal calls (the first launched at the beginning of the program, the second in September 2016).
In addition, the program supported:
- joint calls for cooperation between DTIC and the UPF Department of Experimental and Health Sciences (CEXS), also recognised as a María de Maeztu Unit of Excellence. The first call took place in January 2017 and the second in November 2017.
- its own Open Science and Innovation program
- a pilot program to promote educational research collaborations with industry
Details of the internal procedures for the distribution of funds associated with the program can be found here.
Multimodal annotation for expressive communication
This research line builds upon the results and experience gained in the H2020 project KRISTINA (Knowledge-Based Information Agent with Social Competence and Human Interaction Capabilities) in the areas of natural language processing, computer vision, and virtual character design, with the participation of three of DTIC's research groups: Natural Language Processing (TALN, project coordinator, http://www.taln.upf.edu/), Cognitive Media Technologies (CMTech, http://cmtech.upf.edu/), and Interactive Technologies (GTI, http://gti.upf.edu/).
The research and development activities within KRISTINA are expected to bring a deeper understanding of, and know-how in, natural human-computer interaction and affective computing, with the advantage of being embedded in a strong consortium that can boost the international visibility and impact of the results.
The DTIC groups are responsible for the activities related to computer vision and low-level facial expression analysis; language analysis and expressive speech synthesis; and virtual character design and realization. In particular:
- We are building on our previous expertise in algorithms developed for static image processing and extending these to dynamic processing, where our expertise is considerably more recent.
- We are extending our research-oriented natural language processing toolkit. The parsing and generation modules being developed further in KRISTINA will significantly increase the value of the toolkit, and the individual modules will be made available to the community as software libraries.
- We are integrating the KRISTINA developments into our VR toolkit, adding important features such as natural facial modelling, including speech synchronized with lip movements (see the sketch after this list).
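To illustrate the kind of speech-lip synchronization involved, below is a minimal sketch that maps a time-aligned phoneme sequence to viseme (mouth-shape) targets for a virtual character. The mapping table and function names are hypothetical simplifications for illustration, not the project's actual implementation; for the real research on this topic, see the viseme vocabulary work by Fernandez-Lopez and Sukno under Related Assets.

```python
from typing import List, Tuple

# Hypothetical many-to-one phoneme-to-viseme table. Several phonemes share
# the same mouth shape, which is why viseme vocabularies are much smaller
# than phoneme inventories.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "a": "open", "e": "mid", "i": "spread", "o": "round", "u": "round",
}

def phonemes_to_viseme_track(
    phonemes: List[Tuple[str, float, float]]  # (phoneme, start_s, end_s)
) -> List[Tuple[str, float, float]]:
    """Map a time-aligned phoneme sequence to viseme segments, merging
    consecutive identical visemes so the animation gets one target each."""
    track: List[Tuple[str, float, float]] = []
    for ph, start, end in phonemes:
        vis = PHONEME_TO_VISEME.get(ph, "neutral")
        if track and track[-1][0] == vis and track[-1][2] == start:
            # Same mouth shape continues without a gap: extend the segment.
            track[-1] = (vis, track[-1][1], end)
        else:
            track.append((vis, start, end))
    return track

# Example: the syllable "ma" -> a bilabial closure followed by an open vowel.
# phonemes_to_viseme_track([("m", 0.0, 0.08), ("a", 0.08, 0.25)])
# -> [("bilabial", 0.0, 0.08), ("open", 0.08, 0.25)]
```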
The availability of sufficiently large volumes of training material (ground truth) is indispensable for all areas of data-driven scientific research, and the activities within this research line require large amounts of it. To function as training material, data must be annotated: specific features identified in the data (text corpora, videos, time series, etc.) as characteristic, and thus suitable for capturing distinctive patterns, must be highlighted. We are currently designing coherent annotation guidelines that take into account and synchronize all communication modes (gestures, facial expressions, voice), and annotating the material following these guidelines, in order to provide valuable resources not only for this research line, and in particular for the current and future research of the three DTIC groups involved, but also for the multimodal communication research community in general.
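As a concrete illustration of what such synchronized, multi-tier annotation might look like in code, here is a minimal sketch of a time-aligned annotation structure. All names (AnnotationSpan, MultimodalAnnotation, the tier names and labels) are illustrative assumptions, not the actual schema developed in the project; for that, see the MARMI'16 paper by Sukno et al. under Related Assets.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class AnnotationSpan:
    """One labelled span on a single modality tier, time-aligned in seconds."""
    start: float  # span onset, seconds from the start of the recording
    end: float    # span offset
    label: str    # e.g. "head_nod", "rising_pitch", "smile" (illustrative labels)

@dataclass
class MultimodalAnnotation:
    """Parallel, time-aligned tiers for one recording, all annotated
    against the same clock so cross-modal patterns can be studied."""
    recording_id: str
    gesture: List[AnnotationSpan] = field(default_factory=list)
    facial: List[AnnotationSpan] = field(default_factory=list)
    prosody: List[AnnotationSpan] = field(default_factory=list)

    def active_labels(self, t: float) -> Dict[str, List[str]]:
        """Labels active on each tier at time t: the cross-modal alignment
        that shared annotation guidelines are meant to make explicit."""
        tiers = {"gesture": self.gesture, "facial": self.facial,
                 "prosody": self.prosody}
        return {name: [s.label for s in spans if s.start <= t < s.end]
                for name, spans in tiers.items()}
```

Keeping every tier on a shared timeline, rather than annotating each modality in isolation, is what makes it possible to ask, for instance, whether a rising pitch contour co-occurs with a particular gesture.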
To know more:
- Wanner L. et al. (2017) KRISTINA: A Knowledge-Based Virtual Conversation Agent. In: Demazeau Y., Davidsson P., Bajo J., Vale Z. (eds) Advances in Practical Applications of Cyber-Physical Multi-Agent Systems: The PAAMS Collection. PAAMS 2017. Lecture Notes in Computer Science, vol 10349. Springer, Cham
- Presentation of the project at the Data-driven Knowledge Extraction Workshop, June 2016 (slides and information on KRISTINA, an EU-funded research project that aims at developing technologies for a human-like, socially competent and communicative agent; it runs on mobile communication devices and serves migrants with language and cultural barriers in the host country).
Principal researchers
Leo Wanner
Researchers
Xavier Binefa, Josep Blat, Mónica Domínguez, Alun Evans, Mireia Farrús, Federico Sukno, Jens Grivolla, Beatriz Fisas
Related Assets:
- Sukno FM, Domínguez M, Ruiz A, Schiller D, Lingenfelser F, Pragst L, Kamateri E, Vrochidis S. A Multimodal Annotation Schema for Non-Verbal Affective Analysis in the Health-Care Domain. Proceedings of MARMI'16: 1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction.
- Special Session on Multimedia and Multimodal Interaction for Health and Basic Care Applications at MMM2017
- Fernandez-Lopez A, Martinez O, Sukno FM. Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database. Proc. 12th IEEE Conference on Automatic Face and Gesture Recognition, Washington DC, USA, in press, 2017.
- Ruiz A, Martinez O, Binefa X, Sukno FM. Fusion of Valence and Arousal Annotations through Dynamic Subjective Ordinal Modelling. Proc. 12th IEEE Conference on Automatic Face and Gesture Recognition, Washington DC, USA, in press, 2017.
- Praat on the Web
- Dmytro Derkach, Adrià Ruiz and Federico Sukno (CMTech) won the Head Pose Estimation Challenge at FG2017.
- Towards Intelligible and Conversational Speech Synthesis Engines
- Dominguez M, Farrus M, Wanner L. A Thematicity-based Prosody Enrichment Tool for CTS. Interspeech 2017.
- Derkach D, Ruiz A, Sukno FM. Head Pose Estimation Based on 3-D Facial Landmarks Localization and Regression. FG 2017 Workshop on Dominant and Complementary Emotion Recognition Using Micro Emotion Features and Head-Pose Estimation, Washington DC, USA, in press, 2017.
- Fernandez-Lopez A, Sukno FM. Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading. 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017).
- Derkach D, Sukno FM. Local Shape Spectrum Analysis for 3D Facial Expression Recognition. Proc. 12th IEEE Conference on Automatic Face and Gesture Recognition, Washington DC, USA, in press.
- [PhD thesis] The Information Structure-Prosody Interface: On the role of hierarchical thematicity in an empirically-grounded model
- [AUDIOVISUAL] The Visual Lip-Reading Feasibility Database
- Domínguez M, Burga A, Farrús M, Wanner L. Compilation of Corpora for the Study of the Information Structure-Prosody Interface. Language Resources and Evaluation Conference (LREC 2018).
- Derkach D, Ruiz A, Sukno F. 3D Head Pose Estimation Using Tensor Decomposition and Non-linear Manifold Modeling. 2018 International Conference on 3D Vision (3DV).
- Derkach D, Sukno FM. Automatic local shape spectrum analysis for 3D facial expression recognition. Image and Vision Computing.
- Dominguez M, Burga A, Farrús M, Wanner L. Towards expressive prosody generation in TTS for reading aloud applications. Proc. IberSPEECH 2018.
- Fernandez-Lopez A, Sukno FM. Survey on automatic lip-reading in the era of deep learning. Image and Vision Computing.
- [PhD thesis] Incorporating Prosody into Neural Speech Processing Pipelines: Applications on automatic speech transcription and spoken language machine translation
- Morales A, Piella G, Martínez O, Sukno FM. A quantitative comparison of methods for 3D face reconstruction from 2D images. 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition.
- Dominguez M, Farrus M, Wanner L. Thematicity-based Prosody Enrichment for Text-to-Speech Applications. 9th International Conference on Speech Prosody 2018.
- Facial features, illnesses and computer vision