New project at the MTG: IMPA Multimodal AI for Audio Processing
The project started in September 2024 and is funded by the Ministry of Science, Innovation and Universities of the Spanish Government and the Agencia Estatal de Investigación (AEI), and co-financed by the European Union.
The audio industry, which encompasses music, video games, audiovisual production, podcasts, audiobooks, and various other creative industries, is experiencing significant growth both in Spain and internationally. This growth is primarily driven by the development of digital platforms and artificial intelligence (AI). Through advanced signal processing algorithms and machine learning, AI has radically transformed the way we interact with sound, in applications ranging from audio quality enhancement and noise cancellation to voice recognition and music generation. However, the disruptive potential of AI, while bringing about significant advancements, also poses important challenges that need to be addressed.
To fully address all the challenges, we will adopt a cross-cutting approach that considers the ethical, legal, social, economic, and cultural aspects of AI development in the audio sector at every phase of the project. The project focuses on the development of multimodal AI methodologies of relevance to the audio industry, addressing different but complementary research areas.
The audio processing methodologies to be developed in this project will enable many innovative applications for the digital transformation of areas such as artistic creation, music distribution, music education, cultural preservation, and health and well-being. The contributions will be related to the development of automated processes for the generation, search, discovery, and re-use of audio content.
Specifically, within the context of current AI methodologies, the research contributions will be related to:
- Development of methodologies for the curation of multimodal datasets
- Development of pre-training models for audio representation
- Development of task-specific models of relevance for the audio sector
- Development of evaluation metrics of relevance to the defined tasks
- Development of prototypes for each of the defined tasks
Start date: September 1, 2024
Duration: 4 years
PIs: Xavier Serra, Rafael Ramírez, Sergi Jordà, Martín Rocamora, Dmitry Bogdanov, Frederic Font
The IMPA project is funded by: MCIU/AEI/10.13039/501100011033/FEDER, UE