AI models for Sound and Music Applications

Starting date: December 2022

Project duration: 2 years

Project reference: PDC2022-133319-I00

Funded by: MCIN/AEI/10.13039/501100011033 and by the European Union "NextGeneration EU"/PRTR.


In the last decade, the technologies resulting from Artificial Intelligence research, especially machine learning, have revolutionized the field, bringing exploitation opportunities and challenges that require expertise and job profiles that most companies lack and are struggling to fill. These companies cannot develop their own AI technologies; they need technological solutions that can be easily integrated into their products and services and personalized to their use cases and clients.

In the Musical-AI project (Ref: PID2019-111403GB-I00) we have been advancing several research topics related to large-scale corpora and data-driven machine learning, with applications to sound and music. We are developing AI-based data models to support learning, creation, production, distribution, and many other sound and music exploitation tasks. These models can serve as the basis for specific solutions covering most of the needs of the different sectors.

The current landscape of software tools and machine learning models that build on the outcomes of such research and are available for sound and music applications is highly fragmented and incomplete: most of the software lacks portability and has a low technology readiness level.
In this proposal, we address the technical and conceptual gap between the output of our research and the technology packages, frameworks, and APIs that may be directly used by the industry. Specifically, our goal is to prepare our technology for business-to-business (B2B) exploitation in diverse sectors working with sound and music. To this end, we will build on our current experience of licensing Essentia algorithms and models, and we will consider new exploitation use cases to package our technologies into new tools and frameworks. This will help provide quick and flexible access to state-of-the-art technologies to a large variety of companies of any size, allowing them to create new applications and find solutions to their problems using our technology in an agile and rapid fashion.

In contrast to existing approaches to the technology transfer of sound and music AI models, we propose to address the limitations of closed commercial services and develop a solution suitable for B2B commercialization while keeping transparency, customization, scalability, continuous integration, and compliance with data licensing regulations as priorities. This includes flexible customization to different data with different licenses and the possibility for clients to publicly access and assess components of our technology. To that end, during the project, we will: compare existing commercial solutions; collect datasets and train machine learning models on them; develop software tools that streamline the process of building new models and deploying them on different devices and platforms; validate the developed models to produce metrics that are relevant and reliable in an industry context; and define exploitation strategies for the technologies developed within the project.