Research

Most of the research I supervise at the MTG focuses on the understanding of sound and music signals by combining signal processing and machine learning methods. We work both on data-driven methodologies, in which the development and use of large data collections is a fundamental aspect, and on knowledge-driven approaches, in which domain knowledge of the problem to be addressed is needed. Combining these research approaches, we are able to tackle practical problems related to automatic sound and music description, music exploration and recommendation, and tools for supporting music education.

In terms of data collections, we maintain a number of corpora that are relevant to both research and practical applications. Especially significant are Freesound, a collaborative database of Creative Commons licensed sounds; AcousticBrainz, a platform for crowdsourcing analysis data of published music recordings; and Dunya, which comprises collections of audio recordings plus complementary information on various musical traditions. From these corpora, we have been creating datasets for specific research tasks. For example, supported by our Freesound Annotator, we are developing the FSD dataset, which is used for sound classification tasks. We have also created datasets, like the CompMusic datasets, to work on specific music analysis problems. We also promote research collaborations by organizing open machine learning challenges, like Freesound Audio Tagging 2019 and Emotion and Theme Recognition in Music.

In our audio signal processing and machine learning work, we develop methodologies and algorithms for the analysis of sound and music signals for a wide variety of tasks, ranging from low-level characterization of audio signals to high-level sound and music classification. Most of the resulting algorithms become part of our software library Essentia, which is extensively used both in research and commercial applications.

We have been studying music repertoires from a variety of music traditions, including those of India (Hindustani and Carnatic), Turkey (Turkish-makam), the Maghreb (Arab-Andalusian), and China (Beijing Opera), and more recently Jazz. We have focused on the study of core musical dimensions, like melody, rhythm, and harmony (in the case of Jazz), which requires musicological knowledge of the music being studied and the involvement of musicians. This work has resulted in a number of tools and exploitable technologies, like Dunya or Riyaz (being exploited by our spin-off company MusicMuni Labs).

We are developing technologies to support music education. For example, we are developing MusicCritic, a service that facilitates the assessment of exercises by evaluating the intonation, rhythm, and timbre of musical performances. This technology is already being exploited in an online course on Hindustani music.

We also carry out projects in collaboration with industrial partners, for example to improve and adapt our existing technologies (like Essentia) to their particular industrial applications. We are interested in long-lasting and mutually beneficial partnerships.

Xavier Serra
