List of published results directly linked to the projects co-funded by the Spanish Ministry of Economy and Competitiveness under the María de Maeztu Units of Excellence Program (MDM-2015-0502).

List of publications acknowledging the funding in Scopus.

The record for each publication will include access to postprints (following the Open Access policy of the program), as well as datasets and software used. Ongoing work with UPF Library and Informatics will improve the interface and automation of the retrieval of this information soon.

The MdM Strategic Research Program has its own community in Zenodo for material available in this repository, as well as at the UPF e-repository.




Large-Scale Multimedia Music Data



Music is a highly multimodal concept, where various types of heterogeneous information are associated with a music piece (audio, musician’s gestures and facial expression, lyrics, etc.). This has recently led researchers to approach music through its various facets, giving rise to multimodal music analysis studies. In this project we investigate the complementarity of audio and image description algorithms for the automatic description and indexing of user-generated video recordings. We address relevant music information research tasks, in particular music instrument recognition, synchronization of audio / video streams, similarity, quality assessment, structural analysis and segmentation, and automatic video mashup generation. To do so, we develop strategies to exploit multimedia repositories and gather human annotations.
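To make the synchronization task more concrete, the following is a minimal sketch of one common baseline: estimating the time offset between two recordings of the same performance by brute-force cross-correlation of their audio samples. It is an illustration under stated assumptions, not the method used in the project. The function name `estimate_offset` and its parameters are hypothetical, and the signals are assumed to be short lists of mono samples at the same sampling rate.

```python
def estimate_offset(reference, other, max_lag):
    """Estimate the lag (in samples) that best aligns `other` with `reference`.

    Hypothetical helper for illustration only. A returned lag of k means
    that other[i] is best matched by reference[i + k].
    """
    best_lag, best_score = 0, float("-inf")
    # Try every candidate lag and keep the one with the highest correlation.
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(other):
            j = i + lag
            # Only samples that overlap with the reference contribute.
            if 0 <= j < len(reference):
                score += value * reference[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    # Toy signals: `late` is the same content, recorded starting 3 samples later.
    reference = [0, 0, 1, 3, -2, 4, 0, 0, 0, 0]
    late = reference[3:] + [0, 0, 0]
    print(estimate_offset(reference, late, max_lag=5))
```

The brute-force loop is quadratic in the signal length, so it only suits short excerpts. For real recordings one would typically correlate compact audio features rather than raw samples and use an FFT-based correlation.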


This line of research is a collaboration between faculty members of two groups: the Music Information Research Lab at the Music Technology Group (E. Gómez) and the Image Processing Group (G. Haro). Coloma Ballester acts as advisor to the project, which will also exploit synergies with the Information Processing for Enhanced Cinematography (IP4EC) Group (Marcelo Bertalmío), drawing on knowledge of cinematography for video generation.


This work builds upon the PHENICX European project, in which we worked on research topics related to audio source separation and acoustic rendering, audio alignment/synchronization and music description in the context of music concerts. During that project, we gathered a set of multimedia recordings from music concerts, with a focus on symphonic music repertoire. From these recordings, we generated human-annotated datasets for different music tasks, including melody extraction, synchronization and instrument detection, following standard strategies for user data gathering (see the full list of publications). At the moment, the project has gathered multimodal recordings of music concerts (audio, video, score and user-generated videos), and this data is accessible through REST API infrastructures on the repovizz repository.


On the image processing side, Gloria Haro has worked on 3D reconstruction from multiple cameras and on multi-image fusion for image enhancement; Coloma Ballester has worked on establishing correspondences across different images. Image correspondences are a basic ingredient for object detection in images, for computing image similarities, and for computing the camera pose and 3D reconstruction.





Principal researchers

Emilia Gómez
Gloria Haro


Julián Urbano
Olga Slizovskaya