
MTG-UPF and Google collaborate to foster open research in sound event recognition

10.04.2018

The MTG-UPF, through Freesound, and the Google Machine Perception Team, creators of the AudioSet Ontology, join forces to create an open audio dataset and to organize a machine learning competition, two initiatives aimed at stimulating research in sound event recognition. The MTG-UPF has been awarded a Google Faculty Research Award to support this work.

Recognizing all kinds of everyday sounds is an emerging research field with applications in multiple areas, ranging from automatic description of multimedia content to the development of context-aware applications for healthcare. Current machine learning techniques demand substantial amounts of reliably annotated audio data. However, two major shortcomings of existing datasets limit research in sound recognition: their size and their availability.

To address this, the MTG has launched the Freesound Datasets platform. Its goal is the collaborative creation of open, human-labeled audio collections based on Freesound content, built on the principles of transparency, openness, dynamism of the datasets, and sustainability.

Freesound Datasets allows users to explore the contents of datasets built with Freesound content and to contribute to them by providing annotations. It also promotes discussion around the datasets and will allow users to download different timestamped releases of them. All datasets collected through the platform will be openly available under Creative Commons licenses. You can find more information about the Freesound Datasets platform in a paper presented at ISMIR last year (Fonseca et al., 2017).

Our first Dataset: FSD

The first dataset created through the Freesound Datasets platform is FSD, a large-scale, general-purpose dataset composed of Freesound content annotated with labels from Google's AudioSet Ontology. One of the characteristics of Freesound is the heterogeneity of its sounds, uploaded by thousands of users across the globe. We wanted our first dataset to reflect this, and for this reason we decided to use the AudioSet Ontology, a hierarchical collection of over 600 classes of everyday sounds, to annotate sounds in FSD. Hence FSD features a large vocabulary of everyday sounds, ranging from human and animal sounds to music and sounds made by things.
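To give a feel for what hierarchical labeling means in practice, here is a minimal sketch in Python. The ontology fragment below is illustrative only (the real AudioSet Ontology is distributed as a JSON file with hundreds of classes); the idea it shows, that a label on a leaf class implies all of its ancestor classes, is what makes a hierarchical vocabulary useful for annotation.

```python
# Hypothetical miniature fragment of a hierarchical sound ontology,
# mapping each class name to its parent (None for a top-level class).
# Class names are illustrative; the real AudioSet Ontology is much larger.
ONTOLOGY = {
    "Animal": None,
    "Domestic animals, pets": "Animal",
    "Dog": "Domestic animals, pets",
    "Bark": "Dog",
}

def expand_labels(leaf):
    """Return the leaf label plus all of its ancestors, leaf first.

    Annotating a clip with "Bark" implicitly also labels it as "Dog",
    "Domestic animals, pets", and "Animal".
    """
    labels = []
    node = leaf
    while node is not None:
        labels.append(node)
        node = ONTOLOGY[node]
    return labels

print(expand_labels("Bark"))
# → ['Bark', 'Dog', 'Domestic animals, pets', 'Animal']
```

A single fine-grained annotation thus propagates up the hierarchy for free, which is one reason a structured vocabulary scales better than a flat list of tags.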

FSD will grow with the involvement of users in the labeling process. The current target is 100 verified samples per category (for the categories where enough content is available). This crowdsourcing process is carried out via the Freesound Datasets platform. Our goal is to provide the research community with one of the largest freely distributable audio datasets for sound recognition and related tasks.

Collaboration with Google and Challenge

The Freesound team of the MTG has been awarded a Google Faculty Research Award to support the Freesound Datasets project and the creation of FSD. The first outcome of this collaboration is the organization of the Freesound General-Purpose Audio Tagging Challenge, in which participants are challenged to build systems able to recognize 41 diverse categories of everyday sounds. The dataset used for the competition is a small subset of FSD.

The competition is currently taking place on Kaggle (a well-known platform that hosts machine learning competitions) and within the framework of the DCASE Challenge (an academic competition featuring tasks related to the computational analysis of sound events).

In the future we plan to organize further competitions with upcoming releases of FSD. We believe that creating datasets through open, collaborative approaches like the ones described above, and fostering research in sound event recognition by organizing machine learning competitions, will have a significant impact on our research community.


The members of the Freesound Datasets team are Xavier Favory, Eduardo Fonseca, Frederic Font, and Jordi Pons, with contributions from Andres Ferraro and Alastair Porter and the supervision of Prof. Xavier Serra.
