Guillem Cortès defends his PhD thesis
Title: Music Identification with Audio Fingerprinting. An Industrial Perspective
Supervisors: Dr. Xavier Serra Casals (UPF) and Dr. Emilio Molina Martínez (BMAT Licensing S.L.)
Jury: Dr. Manuel Moussallam (Deezer Research), Dr. Martín Rocamora (UPF), Dr. Pedro Cano Vila (BMAT Licensing S.L.)
Abstract:
Music identification is a mature and well-studied field in the Music Information Retrieval community. In the music industry, it ensures fair distribution of royalties, which are allocated based on usage, such as plays in live venues or airtime in broadcasts. This thesis was conducted as part of an industrial PhD at BMAT, a company specializing in music monitoring and identification services. It explores advancements in Audio Fingerprinting (AFP), a core technology for music identification that recognizes audio by matching compact signatures extracted from audio signals. Since their early development in the 2000s, AFP systems have evolved to address challenges such as robustness to time-frequency modifications and to noise and speech overlays. However, scenarios like background music identification or extreme time-frequency modifications remain challenging for these systems.
To address these gaps, this thesis first introduces a self-contained dataset specifically designed for broadcast monitoring, featuring TV recordings with a high prevalence of background music and reference tracks of production music. Alongside this dataset, it proposes PeakFP, a new baseline method tailored for background music identification. To improve AFP performance, the thesis then explores a two-step approach that combines source separation algorithms with AFP systems. This approach demonstrates substantial performance improvements in background music identification, albeit at the cost of computational overhead. Finally, the thesis presents PeakNetFP, the first hybrid AFP system that integrates the simplicity and scalability of spectral peaks with the abstraction capabilities of neural networks. PeakNetFP achieves performance comparable to state-of-the-art (SOTA) models while being 100 times smaller, offering a scalable and efficient solution for AFP tasks, including severely time-stretched audio.