Blai Meléndez defends his PhD thesis
Title: Relative Music Loudness Estimation In TV Broadcast Audio Using Deep Learning
Supervisors: Dr. Emilia Gómez and Dr. Emilio Molina (BMAT Licensing SL)
Jury: Dr. Pedro Cano (BMAT Licensing SL), Dr. Marius Miron (UPF), Dr. Amélie Anglade (Independent MIR)
Abstract:
Under the current copyright management business model, broadcasters are taxed by the corresponding copyright management organization according to the percentage of music they broadcast, and the collected money is then distributed among the copyright holders of that music. In the specific case of TV broadcasts, whether a musical piece is played in the foreground or the background is often a relevant factor that affects the amount of money collected and distributed. In recent years, the music industry has increasingly adopted technological solutions to automate this process. We have conducted this industrial PhD at BMAT, a company that plays an active role in providing these solutions: since 2015, it has offered a service that currently monitors about 4300 radio stations and TV channels to automatically detect the presence of music and to classify it as foreground or background music. We name this task relative music loudness estimation. From an industrial point of view, this thesis focuses on improving the technology behind the service; from an academic point of view, it aims to introduce and promote the task in the research field of music information retrieval, and provides computational approaches to it.
The industrial and academic contributions of this thesis result from logical steps towards these goals. We first create BAT: a new open-source, web-based tool for the efficient annotation of audio events and their partial loudness in the presence of other simultaneous events. We use BAT to annotate two datasets: one private and the other public. We use the private dataset as training data in the development of BMAT's new relative music loudness estimation algorithm, the Deep Music Detector. The Deep Music Detector represents the first application of deep learning within BMAT, and provides a significant boost in performance with respect to its predecessor. The public dataset, called OpenBMAT, is released in order to foster transparent, comparable and reproducible research on the task of relative music loudness estimation. We use OpenBMAT in our proposal of a novel deep learning solution to this task, based on an architecture that combines regular convolutional neural networks and temporal convolutional networks. This architecture is able to extract robust features from a time-frequency representation of an audio file and then model them as temporal sequences, producing state-of-the-art results with an efficient use of the network's parameters. Finally, this thesis also offers a review of the concepts, resources and literature on tasks related to the detection of music.
This thesis defense will take place online. To attend, use this link (meeting ID 821 6744 3931). Microphones and cameras must be turned off, and online access will close 30 minutes after the start of the defense.