Below is the list of projects co-funded by the María de Maeztu program (selected via internal calls; see this link for the first call, launched at the beginning of the program, and this link for the second one, launched in September 2016).
In addition, the program supported:
- joint calls for cooperation between DTIC and the UPF Department of Experimental and Health Sciences (CEXS), also recognised as a María de Maeztu Unit of Excellence. Here is the link to the second call (November 2017); the first call took place in January 2017.
- its own Open Science and Innovation program
- a pilot program to promote educational research collaborations with industry
The details of the internal procedures for the distribution of funds associated with the program can be found here.
Large-Scale Multimedia Music Data
Music is a highly multimodal concept: various types of heterogeneous information are associated with a music piece (audio, the musician’s gestures and facial expressions, lyrics, etc.). This has recently led researchers to approach music through its various facets, giving rise to multimodal music analysis studies. In this project we research the complementarity of audio and image description algorithms for the automatic description and indexing of user-generated video recordings. We address relevant music information research tasks, in particular musical instrument recognition, synchronization of audio and video streams, similarity, quality assessment, structural analysis and segmentation, and automatic video mashup generation. To do so, we develop strategies to exploit multimedia repositories and gather human annotations.
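As a rough illustration of how audio and image descriptions can complement each other, the sketch below shows a simple late-fusion scheme for clip-level instrument recognition: frame-level class probabilities from an audio model and an image model are averaged over time and mixed with a convex weight. This is a minimal sketch, not the project's actual code; the instrument list, fusion weight, and random stand-in model outputs are hypothetical (the related publications also explore joint multimodal network architectures rather than only score fusion).

```python
# Illustrative late fusion of audio and image classifier outputs for
# clip-level musical instrument recognition. All inputs are stand-ins.
import numpy as np

INSTRUMENTS = ["violin", "cello", "flute", "clarinet", "trumpet"]  # hypothetical label set

def late_fusion(audio_probs: np.ndarray,
                image_probs: np.ndarray,
                audio_weight: float = 0.5) -> str:
    """Combine frame-level class probabilities from two modalities.

    audio_probs: (n_audio_frames, n_classes) softmax outputs of an audio model.
    image_probs: (n_video_frames, n_classes) softmax outputs of an image model.
    Returns the instrument label with the highest fused score for the clip.
    """
    # Aggregate each modality over time, then mix with a convex weight.
    audio_clip = audio_probs.mean(axis=0)
    image_clip = image_probs.mean(axis=0)
    fused = audio_weight * audio_clip + (1.0 - audio_weight) * image_clip
    return INSTRUMENTS[int(np.argmax(fused))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for real model outputs over a short user-generated clip.
    audio_probs = rng.dirichlet(np.ones(len(INSTRUMENTS)), size=100)
    image_probs = rng.dirichlet(np.ones(len(INSTRUMENTS)), size=25)
    print(late_fusion(audio_probs, image_probs))
```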
This line of research involves a collaboration between faculty members from two different groups: the Music Information Research Lab at the Music Technology Group (E. Gómez) and the Image Processing Group (G. Haro). Coloma Ballester acts as advisor to the project, which will also exploit synergies with the Information Processing for Enhanced Cinematography (IP4EC) Group (Marcelo Bertalmío), drawing on its knowledge of cinematography for video generation.
This work builds upon the PHENICX European project, in which we worked on research topics related to audio source separation and acoustic rendering, audio alignment/synchronization, and music description in the context of music concerts. During that project, we gathered a set of multimedia recordings of music concerts, with a focus on the symphonic music repertoire. From these recordings, we generated human-annotated datasets for different music tasks, including melody extraction, synchronization and instrument detection, following standard strategies for user data gathering (see the full list of publications). At the moment, the project has gathered multimodal recordings of music concerts (audio, video, score and user-generated videos), and this data is accessible through REST API infrastructures on the repovizz repository.
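For orientation only, the fragment below sketches how such a REST-based repository might be queried from Python to retrieve the metadata of one multimodal recording. The base URL, endpoint path and response fields are hypothetical placeholders, not the actual repovizz API; consult the repository documentation for the real endpoints.

```python
# Illustrative sketch: fetching metadata for a multimodal concert recording
# over a generic REST API. URL, path and fields are hypothetical placeholders.
import requests

BASE_URL = "https://example.org/api"  # placeholder, not the real service URL

def fetch_datapack_metadata(datapack_id: str) -> dict:
    """Retrieve the JSON metadata describing one recording (audio, video, score)."""
    response = requests.get(f"{BASE_URL}/datapacks/{datapack_id}", timeout=10)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    meta = fetch_datapack_metadata("example-concert-id")  # hypothetical identifier
    print(meta)
```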
From the image processing side, Gloria Haro has worked on 3D reconstruction from multiple cameras and on multi-image fusion for image enhancement; Coloma Ballester has worked on establishing correspondences across different images. Image correspondences are a basic ingredient for object detection in images and computation of image similarities, as well as for computing the camera pose and 3D reconstruction.
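As a minimal, self-contained example of what "image correspondences" means in practice, the sketch below matches ORB keypoints between two frames with standard OpenCV tooling; this is generic off-the-shelf matching, not the group's own correspondence algorithms, and the input file names are hypothetical.

```python
# Minimal sketch of establishing image correspondences with ORB features
# (standard OpenCV tooling). Input frame paths are hypothetical.
import cv2

def match_frames(path_a: str, path_b: str, max_matches: int = 50):
    """Detect ORB keypoints in two frames and return their best matches."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=1000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    # Hamming distance with cross-checking keeps only mutually best matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    return kp_a, kp_b, matches[:max_matches]

if __name__ == "__main__":
    kp_a, kp_b, matches = match_frames("frame_a.png", "frame_b.png")
    print(f"Kept {len(matches)} correspondences between the two frames")
```

Correspondences of this kind can then feed camera pose estimation and 3D reconstruction, or serve as a similarity signal between user-generated videos of the same concert.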
To learn more:
- Presentation of the project at the Data-driven Knowledge Extraction Workshop, June 2016 (Slides)
Related Assets:
- Zinemanas et al. Visual Music Transcription of Clarinet Video Recordings Trained with Audio-based Labelled Data. CVAVM 2017
- New European Training Network on Music Information Retrieval
- 2 years of my PhD in a (very personal) review, by Olga Slizovskaia
- [MSc thesis] Audio Data Augmentation with respect to Musical Instrument Recognition
- Slizovskaia O, Gómez E, Haro G. Musical Instrument Recognition in User-generated Videos using a Multimodal Convolutional Neural Network Architecture. ACM International Conference on Multimedia Retrieval (ICMR 2017)
- Fonseca E, Gong R, Bogdanov D, Slizovskaia O, Gómez E, Serra X. Acoustic Scene Classification by Ensembling Gradient Boosting Machine and Convolutional Neural Networks. Workshop on Detection and Classification of Acoustic Scenes and Events
- Slizovskaia O, Gómez E, Haro G. Correspondence between Audio and Visual Deep Models for Musical Instrument Detection in Video Recordings. 18th International Society for Music Information Retrieval Conference (ISMIR 2017)
- Pons J, Slizovskaia O, Gong R, Gómez E, Serra X. Timbre Analysis of Music Audio Signals with Convolutional Neural Networks. 25th European Signal Processing Conference (EUSIPCO)
- RepoVizz: data repository and visualization tool for structured storage and user-friendly browsing of multimodal recordings
- Urbano J, Marrero M. Toward Estimating the Rank Correlation between the Test Collection Results and the True System Performance. International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016)
- [AUDIO] ORCHSET: a dataset for melody extraction in symphonic music recordings
- Bosch J, Marxer R, Gómez E. Evaluation and Combination of Pitch Estimation Methods for Melody Extraction in Symphonic Classical Music. Journal of New Music Research
- [AUDIO] MTG - Query-by-Humming (QBH)
- Bosch JJ, Bittner RM, Salamon J, Gómez E. A Comparison of Melody Extraction Methods Based on Source-Filter Modelling. 17th International Society for Music Information Retrieval Conference (ISMIR 2016)
- [AUDIO AND TEXT] PHENICX-Anechoic: denoised recordings and note annotations for the Aalto anechoic orchestral database
- McFee B, Humphrey EJ, Urbano J. A Plan for Sustainable MIR Evaluation. 17th International Society for Music Information Retrieval Conference (ISMIR 2016)
- Technology and Music: collaboration with Radio Clasica
- Workshop on music knowledge extraction using machine learning on December 4th
- Bosch J, Gómez E. Melody Extraction Based on a Source-Filter Model Using Pitch Contour Selection. 13th Sound and Music Computing Conference (SMC 2016)
- Slizovskaia O, Gómez E, Haro G. Automatic Musical Instrument Recognition in Audiovisual Recordings by Combining Image and Audio Classification Strategies. 13th Sound and Music Computing Conference (SMC 2016)
- Won M, Chun S, Serra X. Toward Interpretable Music Tagging with Self-Attention. arXiv preprint
- Slizovskaia O, Gómez E, Haro G. A Case Study of Deep-Learned Activations via Hand-Crafted Audio Features. 2018 Joint Workshop on Machine Learning for Music (joint workshop of ICML, IJCAI/ECAI and AAMAS)
- Slizovskaia O, Haro G, Gómez E. Conditioned Source Separation for Music Instrument Performances. arXiv preprint
- Slizovskaia O, Kim L, Haro G, Gómez E. End-to-End Sound Source Separation Conditioned on Instrument Labels. arXiv preprint
- 25th anniversary of the Music Technology Group
- TROMPA: Towards Richer Online Music Public-domain Archives, a new H2020 project coordinated at DTIC