AES Europe 2024 Heyser Lecture by Xavier Serra

The Heyser Series is an endowment for lectures that bring eminent individuals in audio engineering and related fields to AES conventions.



The AES Conventions are the largest gatherings of audio professionals in the world. The comprehensive and unparalleled technical program of workshops, tutorials and research papers, along with the trade show floor, provides attendees with a wealth of learning, networking and business opportunities. Xavier Serra will give a lecture at the 156th Audio Engineering Society Convention (AES Europe 2024), which will take place at the Universidad Politécnica de Madrid from June 15 to 17. The talk will trace his research history, highlighting some of the most relevant developments and offering a view of past and current trends in his area of research.


From Audio Processing to Music Understanding – a Research Journey

Dr. Serra’s PhD research, carried out in the 1980s, focused on modeling complex sounds. Using spectral analysis and synthesis techniques, we developed a deterministic plus stochastic model able to obtain sonically and musically meaningful audio parameterizations. That research found practical applications in synthesizing and transforming a wide variety of sounds, including the human singing voice.
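The deterministic plus stochastic idea can be illustrated with a minimal sketch (an illustration only, not the actual SMS implementation): estimate a dominant partial from a spectral peak, resynthesize it as the deterministic part, and treat whatever remains as the stochastic residual.

```python
import numpy as np

# Minimal sketch of the deterministic-plus-stochastic idea (an
# illustration, not the actual SMS implementation): model a frame as
# one dominant sinusoid (deterministic part) plus a residual
# (stochastic part).

fs = 8000                      # sample rate (Hz)
N = 2048                       # frame length
t = np.arange(N) / fs
rng = np.random.default_rng(0)

# Test frame: a sinusoid placed exactly on DFT bin 112, plus noise.
f0 = 112 * fs / N              # 437.5 Hz
x = 0.8 * np.sin(2 * np.pi * f0 * t) + 0.05 * rng.standard_normal(N)

# Deterministic part: find the strongest spectral peak and recover
# its frequency, amplitude and phase from the windowed DFT.
w = np.hanning(N)
X = np.fft.rfft(x * w)
k = int(np.argmax(np.abs(X)))
freq = k * fs / N
amp = 2 * np.abs(X[k]) / w.sum()
phase = np.angle(X[k])
deterministic = amp * np.cos(2 * np.pi * freq * t + phase)

# Stochastic part: whatever the sinusoidal model does not explain.
residual = x - deterministic
```

Subtracting the resynthesized partial leaves a residual whose energy is close to that of the added noise, which is what makes the parameterization useful for transformation and resynthesis.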


As a natural progression of that research, in the 1990s it became interesting and relevant to analyze collections of sounds, aiming to describe and model the relationships between sound entities. To accomplish this, we incorporated machine learning methodologies to complement the signal processing approaches used until then. This research was the beginning of the Music Information Retrieval (MIR) field, whose aim is to analyze and describe music collections.
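That combination of a signal-processing front end with a machine-learning back end can be sketched as follows (a hypothetical toy example, not the group's actual MIR pipeline): describe each sound in a small collection by simple spectral features, then let a clustering algorithm group similar sounds.

```python
import numpy as np

# Toy illustration (not an actual MIR pipeline): signal processing
# extracts per-sound features; machine learning (k-means) groups the
# collection by similarity.

rng = np.random.default_rng(1)
fs = 8000
N = 4096

def make_tone(f0, noise):
    """Synthesize a test sound: a sinusoid plus broadband noise."""
    t = np.arange(N) / fs
    return np.sin(2 * np.pi * f0 * t) + noise * rng.standard_normal(N)

# A toy "collection": clean low-pitched tones vs noisy high-pitched ones.
sounds = [make_tone(220, 0.01) for _ in range(5)] + \
         [make_tone(1760, 0.3) for _ in range(5)]

def describe(x):
    """Signal-processing front end: two simple spectral descriptors."""
    mag = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    centroid = np.sum(freqs * mag) / np.sum(mag)          # brightness
    flatness = np.exp(np.mean(np.log(mag + 1e-12))) / (np.mean(mag) + 1e-12)
    return np.array([centroid, flatness])

features = np.stack([describe(x) for x in sounds])

# Machine-learning back end: plain k-means with k=2, a few iterations
# from fixed starting points.
centers = features[[0, -1]].copy()
for _ in range(10):
    d = np.linalg.norm(features[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    for j in range(2):
        centers[j] = features[labels == j].mean(axis=0)
```

The clustering recovers the two groups of sounds from the features alone, without ever being told which sound is which.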


In the 2000s, with the growth of the Web, scaling these analysis technologies gained importance. In our research group we embarked on curating and leveraging large audio collections with which to conduct research in this direction and to develop efficient software tools supporting music search, retrieval, and recommendation systems, many of which gained relevance for the music industry.


As web-based music applications became globalized, it became clear that the existing research approaches and systems had important cultural biases. Thus, in the 2010s, we started to work on refining music description methodologies, integrating domain knowledge from diverse music traditions. This research led to the development of culture-specific audio signal processing and machine learning approaches for analyzing music signals. These methodologies are of major relevance in the field of Computational Musicology, emphasizing the music understanding perspective.


In recent years, the emergence of deep learning techniques and large AI models based on self-supervised approaches has reshaped the research landscape. Presently, we are working on the development of large AI models trained on huge amounts of diverse multimodal music data that can capture the complex relationships that make up music. From those models, we can then develop smaller task-specific models to support applications related to the creation, production, distribution, access, analysis, or enjoyment of music. The challenge here is how to drive our research from an ethical perspective, putting the musician at the center while supporting all the stakeholders of the music sector.


June 16 at 5pm




