Another kind of Musical Artificial Intelligence is possible

by Xavier Serra, Frederic Font, Martin Rocamora
19.05.2025

For a collaborative, open, and ethical AI built by and for the musical community

We are at a turning point. Generative artificial intelligence is transforming every creative field, and music is no exception. New tools can generate soundscapes, melodies, or full compositions from a simple text prompt. But these technologies are far from neutral: the models behind them are trained on massive datasets — sounds, music, voices — often obtained without the creators’ consent or recognition. In this context, the challenge is not just technical or legal, but deeply cultural and political. What kind of artificial intelligence do we want for music? What roles should musicians, researchers, and creative communities play?

At the Music Technology Group (MTG) of Universitat Pompeu Fabra in Barcelona, in the context of the UPF-BMAT Chair in AI and Music, we propose to imagine and build a different kind of musical AI: one that is collaborative, open, and ethical. This is not only about criticizing existing practices, but about showcasing and fostering the concrete alternatives being developed within academic, community-based, and creative spheres.

Data as Raw Material

Most current generative audio models require massive amounts of data to learn: recordings, compositions, text, metadata. Without this raw material, the model simply cannot function. But sound alone is not enough; it must come with context. Metadata (tags, descriptions, licenses) allows models to understand, or at least represent, fundamental aspects of sound, from its source to its musical function. The origin of these data and the permissions for their use in training generative models are key issues that must be clarified and respected.

Freesound, a platform created by the MTG in 2005, is a pioneering example of how sound can be shared in an open and responsible way. With over 680,000 sounds available under Creative Commons licenses, Freesound has served millions of creators worldwide and has also supported scientific research and AI model training. Importantly, Freesound empowers users to define how their sounds can be used — whether they can be reused for commercial purposes, or restricted to open contexts.

While Freesound is just one example, it represents a radically different approach from that of many tech companies, which extract data in opaque ways, without explicit permission, and take advantage of regulatory gaps. The result is a growing mistrust among artists towards openly sharing their work, and a risk of impoverishing the creative ecosystem. Freesound aims to prove that another kind of data relationship is possible — one that respects the will of creators and promotes responsible use.

Generative Models: Transparency, Reusability, and Community

The impact of generative AI on music is already clear. Some types of production or stock music are being replaced by automatically generated content. But beyond this utilitarian dimension lies a vast creative potential: these models can serve as tools to expand artists’ expressive possibilities, help them explore new sonic textures, transform compositional processes, and experiment with collaborative approaches.

For this to happen, however, the models must be open. This means that their key components — training data, code to train and run the system, and the trained model parameters — must be available in a transparent and reusable way. Only then can the models be adapted to different contexts, audited, improved, or serve as the basis for new creations.

The difference between an open and a closed model is not merely technical — it is political. Platforms like Suno or Udio operate as black boxes, with models trained in secrecy and offered as paid services with no possibility for adaptation. In contrast, open models allow the community to actively participate in their development. These are technologies that are not imposed, but rather built with and for their users.

At the MTG, we actively promote this approach. Our researchers develop models trained on open datasets like Freesound (available via the Essentia library), and support open-source models like RAVE, which enable artists to build and customize their own tools. In parallel, we maintain a leaderboard of open generative music models, which documents the state of the art and fosters collaborative evaluation.

Ethics, Responsibility, and a Shared Vision of the Future

Artificial intelligence originated in academic and collaborative environments, closely linked to free software and open science. However, the recent boom in generative AI has eroded that spirit: large tech corporations now dominate the field, set the rules, and reap the rewards. In response, the music sector faces both a challenge and an opportunity: to reclaim leadership for the public, creative, and collaborative spheres.

To move in this direction, a number of concrete actions are needed. First, we must create interdisciplinary spaces where musicians, engineers, and researchers can collaborate from the early stages of tool design. Second, we must promote open-source and open-science projects, supported by public infrastructures that do not rely on corporate cloud services. We also need to develop alternative funding mechanisms — from public grants and subsidies to cooperative or distributed models — to ensure the sustainability of these initiatives. And crucially, we must train cultural agents in the critical and creative use of AI, so they are not merely passive consumers of predefined tools, but active protagonists in shaping the future.

All of this must be grounded in a clear ethical foundation: a shared vision in which technological innovation serves the common good, cultural diversity, and collective creativity. An AI that does not replace creators, but empowers them; one that does not compete with artistic works, but opens up new spaces for collaboration and experimentation.

Towards a Musical Artificial Intelligence by and for All

At the MTG, we have been working for years to build this alternative. Not as a distant utopia, but as a concrete and ongoing practice — through our platforms, our models, and our collaborations. We know it is possible to develop a musical AI that respects creators, places them at the center, and gives back to the community what it takes from it.

But this journey cannot be undertaken alone. It requires the active participation of musicians, producers, researchers, collectives, and institutions. We need to imagine together what kind of technologies we want for the future of music. And above all, we need to act: upload sounds, tag them, share them, contribute to model development, demand transparency, create new tools, educate, regulate, collaborate.

Because the future of music with AI should not be defined solely by what is technically feasible or economically profitable — but by what is fair, sustainable, and musically meaningful.

Another kind of AI is possible. And it needs all of us.

 

Cátedra UPF-BMAT en Inteligencia Artificial y Música (TSI-100929-2023-1). Project funded by the Secretaría de Estado de Digitalización e Inteligencia Artificial, the European Union-Next Generation EU, and BMAT Music Innovators, the Music Operating System.