Furkan Yesiler defends his PhD thesis
Wednesday, January 12th, 2022 at 15.30h (CET) - online
Title: Data-driven Musical Version Identification: Accuracy, Scalability, and Bias Perspectives.
Supervisors: Dr. Emilia Gómez and Dr. Joan Serrà (Dolby Laboratories)
Jury: Dr. Xavier Serra (UPF); Dr. Dan Ellis (Google Research); Dr. Meinard Müller (Friedrich-Alexander Universität Erlangen-Nürnberg)
Abstract:
One of the key practices that enrich the world’s musical heritage is creating versions of existing musical works (e.g., cover songs, acoustic versions, and live performances). In addition to establishing connections between musicians, such versions provide listeners with an opportunity to rediscover a known tune. Because many musical characteristics may vary between versions (e.g., key, tempo, and structure), building computational systems to automatically identify versions of the same musical work is a challenging task. However, such systems open doors to many applications, ranging from musical plagiarism detection on media platforms to assisting musicians in their creative process.
This dissertation aims to develop audio-based musical version identification (VI) systems for industry-scale corpora. To be employed in industrial use cases, such systems must demonstrate high performance on large-scale corpora while not favoring certain musicians or tracks over others. Therefore, the three main aspects we address in this dissertation are accuracy, scalability, and algorithmic bias of VI systems.
We first perform a large-scale analysis on the frequency and extent of the common musical changes between versions, by which we determine the crucial components of our first system. Using the insights from the analysis, we propose a deep learning–based model that incorporates domain knowledge in its network architecture and training strategy. We design explicit modules to handle common transformations between musical versions. We then take two main directions to further improve our model. Firstly, we experiment with data-driven fusion methods to combine information from models that process harmonic and melodic information, which greatly enhances identification accuracy. Secondly, we investigate embedding distillation techniques to reduce the size of the embeddings produced by our model, which reduces the requirements for data storage and, more importantly, retrieval time. After exploring potential improvements in accuracy and scalability, we analyze the algorithmic biases of our systems and point out the impact such systems may have on various stakeholders in the music ecosystem (e.g., musicians, composers) when used in an industrial context. We conclude our research by analyzing the performance of our proposed systems on two industrial use cases, in collaboration with a broadcast monitoring company.
Overall, our work addresses the research challenges of the next generation of VI systems. We show the feasibility of developing systems that are both accurate and scalable by carefully incorporating domain knowledge into data-driven workflows. We believe that our contributions will accelerate the integration of VI systems into industrial scenarios and, thus, that the impact of VI research on musicians and listeners will be more evident than ever.
This thesis defense will take place online. To attend, use this link (meeting ID: 857 7035 6519). Microphones and cameras must be turned off, and online access will close 30 minutes after the start of the defense.