Seminar by Stefan Lattner on the Advancements in Generative Music Models at Sony CSL Paris
Thursday, July 18 at 4pm at UPF Campus Poblenou room 55.410, and online
16.07.2024
Title: Advancements in Generative Music Models at Sony CSL Paris
Stefan Lattner (research leader at Sony CSL Paris)
Room 55.410 and online at https://zoom.us/j/5911355273
Abstract:
In my talk, I will present recent work from Sony CSL Paris on music generation. First, I will introduce Diff-A-Riff, a latent diffusion model for music accompaniment generation designed specifically for music production use cases. It produces high-quality 48 kHz audio stems that fit a given musical context and can be controlled by either audio references or text input.
Diff-A-Riff is relatively lightweight because it builds upon Music2Latent, a Consistency Autoencoder that achieves a 64x compression ratio for musical audio. The generative decoder of Music2Latent produces high-quality reconstructions by filling in information that gets lost during compression.
Furthermore, I will present Stem-JEPA, a self-supervised learning model that is trained to assess the compatibility between stems and mixes, potentially extending the range of data that can be used to train Diff-A-Riff.
Finally, I will introduce a method to control the information content of generated symbolic music using beam search, which is applicable to any autoregressive language model.
All the works mentioned above were accepted for this year's ISMIR conference, but only one of them is currently available online. This presentation is therefore a sneak preview of works that are expected to be published within the next few months.
Bio:
Stefan Lattner is a research leader in the music team at Sony CSL Paris, where he focuses on generative AI for music production, music information retrieval, and computational music perception. He earned his PhD in 2019 from Johannes Kepler University (JKU) in Linz, Austria, following his research at the Austrian Research Institute for Artificial Intelligence in Vienna and the Institute of Computational Perception Linz. His studies centered on the modeling of musical structure, encompassing transformation learning and computational relative pitch perception. His current interests include human-computer interaction in music creation, live staging, and information theory in music. He specializes in generative sequence models, computational short-term memories, (self-supervised) representation learning, and musical audio generation. Website: https://csl.sony.fr/member/stefan-lattner-phd/