IPCV Image Processing and Computer Vision Group (UPF)

In the Intelligent Multimodal Vision Analysis (IMVA) group (previously the Image Processing and Computer Vision group), we investigate the automatic analysis and understanding of visual content to address real-world problems and applications, often involving modalities beyond vision, such as audio, natural language, ultrasound or magnetic resonance. We develop model-based and data-driven (deep learning) approaches, algorithms and innovative digital technologies, together with their theoretical analysis. Applications include: making multimedia content accessible to people with visual, hearing or reading impairments, which may contribute to the development of more accessible devices; the analysis of the human face in terms of both its morphology and its dynamics (e.g. expressions and emotions), with enormous potential for disciplines such as psychology, linguistics, neuroscience, health or developmental biology; the separation of the different audio sources that make up the audio mixture of a particular video; and the understanding and exploitation of the correlations and complementarities among different modalities.

Some of our research projects are available on GitHub and on Hugging Face. More software and demos are also available here.

Department of Engineering

Edifici Tànger (campus del Poblenou)
Tànger, 122-140
08018 Barcelona