Xavier Favory defends his PhD thesis

Thursday March 4th 2021, at 15.30 - online


Title: Improving Sound Retrieval in Large Collaborative Collections

Supervisors: Dr. Xavier Serra and Dr. Frederic Font

Jury: Dr. Sergi Jordà (UPF), Dr. Gerard Roma (University of Huddersfield), Dr. Emmanouil Benetos (Queen Mary University of London)


Capturing sounds on a recording medium to enable their preservation and reproduction became possible during the industrial revolution of the 19th century, originally through mechanical and acoustic devices, and later through electronic and magnetic ones. Eventually, the digital age of the mid-20th century brought about the democratization of recording and reproduction devices, as well as accessible ways of storing and sharing content. As a consequence, massive collections of audio samples are nowadays increasingly available online, some of which are created collaboratively thanks to sharing platforms. This content has become essential for entertainment media, such as movies, music, and video games, and for human-machine interaction. Nonetheless, given the amount and diversity of the content, exploring, searching and retrieving from collaborative collections becomes increasingly challenging. Methods for automatically organizing content and facilitating its retrieval therefore become more and more necessary, creating an opportunity for novel Information Retrieval approaches.

This thesis aims to improve the retrieval of sounds in large collaborative collections, and does so from different perspectives. We first investigate data collection methodologies for creating large and sustainable audio datasets, including the design and development of a website and an annotation tool to engage users in the collaborative process of dataset creation. Additionally, we focus on improving the manual annotation of audio samples when using large taxonomies. This calls for specialized tools that help users provide exhaustive and consistent annotations.

This work produced a number of publicly available large-scale datasets for developing and evaluating machine listening models. From another perspective, we propose novel methods for learning audio representations, suitable for diverse machine learning applications, by taking advantage of large amounts of online content and its metadata. We then investigate the problem of unsupervised classification by first identifying which types of audio features are suited for clustering the wide variety of sounds present in online collections. Finally, we focus on Search Results Clustering, an approach that organizes the search results into coherent groups. This research improved the retrieval of sounds from large collections, namely by facilitating exploration of and interaction with search results.


This thesis defense will take place online. To attend, use this link (meeting ID 965 3834 6607). Microphones and cameras must be turned off, and online access will close 30 minutes after the start of the defense.