Guillem Cortès defends his PhD thesis

Tuesday, February 18th, 2025, at 10:30h - Room 55.309 (3rd floor), Tanger building (UPF Poblenou)
10.02.2025

Title: Music Identification with Audio Fingerprinting. An Industrial Perspective

Supervisors: Dr. Xavier Serra Casals (UPF) and Dr. Emilio Molina Martínez (BMAT Licensing S.L.)

Jury: Dr. Manuel Moussallam (Deezer Research), Dr. Martín Rocamora (UPF), Dr. Pedro Cano Vila (BMAT Licensing S.L.)

Abstract:

Music identification is a mature and well-studied field in the Music Information Retrieval community. In the music industry, it ensures fair distribution of royalties, which are allocated based on usage, such as plays in live venues or airtime in broadcasts. This thesis was conducted as part of an industrial PhD at BMAT, a company specializing in music monitoring and identification services. It explores advancements in Audio Fingerprinting (AFP), a core technology for music identification that recognizes audio by matching compact signatures extracted from audio signals. Since their early development in the 2000s, AFP systems have evolved to address challenges such as robustness to time-frequency modifications and to noise and speech overlays. However, scenarios like background music identification or extreme time-frequency modifications remain challenging for these systems.

To address these gaps, this thesis first introduces a self-contained dataset specifically designed for broadcast monitoring, featuring TV recordings with a high prevalence of background music and reference tracks of production music. Alongside this dataset, it proposes PeakFP, a new baseline method tailored for background music identification. To improve AFP performance, the thesis then explores a two-step approach that combines source separation algorithms with AFP systems. This approach demonstrates substantial performance improvements in background music identification, albeit at the cost of computational overhead. Finally, the thesis presents PeakNetFP, the first hybrid AFP system that integrates the simplicity and scalability of spectral peaks with the abstraction capabilities of neural networks. PeakNetFP achieves performance comparable to state-of-the-art (SOTA) models while being 100 times smaller, offering a scalable and efficient solution for AFP tasks, including severely time-stretched audio.

Despite being conducted in an industrial setting, this work adheres to the principles of open science, with all datasets, code, and evaluations made publicly available. This thesis aims to foster further research in AFP, particularly in underexplored scenarios, and to contribute to the development of more robust and versatile AFP systems.