Large-Scale Multimedia Music Data

Music is a highly multimodal concept: various types of heterogeneous information are associated with a music piece (audio, musicians' gestures and facial expressions, lyrics, etc.). This has recently led researchers to approach music through its various facets, giving rise to multimodal music analysis studies. In this project, we research the complementarity of audio and image description algorithms for the automatic description and indexing of user-generated video recordings. We address relevant music information research tasks, in particular musical instrument recognition, synchronization of audio/video streams, similarity, quality assessment, structural analysis and segmentation, and automatic video mashup generation. To do so, we develop strategies to exploit multimedia repositories and gather human annotations.

This line of research is mainly a collaboration between faculty members from two groups: the Music Information Research Lab at the Music Technology Group (E. Gómez) and the Image Processing Group (G. Haro). Coloma Ballester acts as an advisor to the project, which will also exploit synergies with the Information Processing for Enhanced Cinematography (IP4EC) Group (Marcelo Bertalmío) by drawing on knowledge of cinematography for video generation.

This work builds upon the PHENICX European project, in which we worked on research topics related to audio source separation and acoustic rendering, audio alignment/synchronization, and music description in the context of music concerts. During that project, we gathered a set of multimedia recordings from music concerts, with a focus on the symphonic repertoire. From these recordings, we generated human-annotated datasets for different music tasks, including melody extraction, synchronization and instrument detection, following standard strategies for user data gathering (see the full list of publications). At the moment, the project has gathered multimodal recordings of music concerts (audio, video, score and user-generated videos), and this data is accessible through a REST API on the repovizz repository.

On the image processing side, Gloria Haro has worked on 3D reconstruction from multiple cameras and on multi-image fusion for image enhancement, while Coloma Ballester has worked on establishing correspondences across different images. Image correspondences are a basic ingredient for object detection, for computing image similarities, and for estimating camera pose and performing 3D reconstruction.

Principal researchers

Emilia Gómez
Gloria Haro

Researchers

Julián Urbano
Olga Slizovskaya