FSDnoisy18k, an open access resource developed by the MTG in collaboration with Google AI

A collection of open access data for research in the recognition and classification of sound events, a field of research in which the Music Technology Group is working and which has many applications ranging from the automatic description of multimedia contents to the development of applications in the area of health.

14.01.2019

 

The Music Technology Group (MTG) of the Department of Information and Communication Technologies (DTIC) at UPF, through Freesound, and the Google Sound Understanding team, creators of the AudioSet Ontology, had previously joined forces to promote research into the recognition of sound events.

The recognition and classification of all types of everyday sounds is an emerging field of research in which the Music Technology Group (MTG) is working and which has applications in many areas, from the automatic description of multimedia content to the development of applications in the area of health.


In the classification of sound events, dataset creation consists of two stages: data acquisition (for example, retrieving data from sites like Freesound or YouTube, or making new recordings) and data curation (organization, cleaning and, most importantly, labelling).

The problem is that, as a sound dataset grows, label noise (that is, clips that carry incorrect labels) becomes inevitable. To date, little research has been done on the impact of these errors.

Members of the MTG, in collaboration with the Sound Understanding team at Google AI (Artificial Intelligence), have developed a dataset to facilitate research into the classification of large volumes of sound data whose labels are noisy. The authors of the study explain that “some websites provide a large volume of audio and metadata contributed by the users, but inferring labels from these metadata introduces errors caused by unreliable data and mapping limitations”.


Hence they have developed FSDnoisy18k, an open access resource for research on label noise with which “we characterize the label noise empirically and provide a frame of reference”, the authors say.
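To make the idea of characterizing label noise empirically more concrete, the following is a minimal sketch and not the authors' procedure. It assumes that a subset of clips has both an automatically inferred (possibly wrong) label and a manually verified one, and it reports the fraction of disagreements overall and per class. The function name `estimate_label_noise` and the class names are hypothetical and used only for illustration.

```python
from collections import Counter


def estimate_label_noise(noisy_labels, verified_labels):
    """Estimate label noise as the share of clips whose inferred label
    disagrees with the manually verified label.

    Returns (overall_rate, per_class_rate), where per_class_rate is keyed
    by the inferred (noisy) label.
    """
    if len(noisy_labels) != len(verified_labels):
        raise ValueError("both label lists must have the same length")
    if not noisy_labels:
        raise ValueError("at least one labelled clip is required")

    # How many clips were assigned each inferred label.
    total_per_class = Counter(noisy_labels)
    # How many of those clips turned out to be wrong after verification.
    wrong_per_class = Counter(
        noisy for noisy, verified in zip(noisy_labels, verified_labels)
        if noisy != verified
    )

    overall_rate = sum(wrong_per_class.values()) / len(noisy_labels)
    per_class_rate = {
        label: wrong_per_class[label] / total_per_class[label]
        for label in total_per_class
    }
    return overall_rate, per_class_rate


# Illustrative data only: hypothetical labels, not taken from FSDnoisy18k.
noisy = ["rain", "rain", "piano", "piano", "piano", "clapping"]
verified = ["rain", "wind", "piano", "piano", "guitar", "clapping"]

overall, per_class = estimate_label_noise(noisy, verified)
print(f"overall label noise: {overall:.1%}")
for label, rate in sorted(per_class.items()):
    print(f"  {label}: {rate:.1%}")
```

A per-class breakdown like this is useful because label noise is rarely uniform: some classes may be labelled almost perfectly while others are frequently confused.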

The dataset contains 42.5 hours of audio from Freesound, another MTG project, spread across 20 sound classes. Each clip carries a single label, and the dataset consists of a small amount of manually labelled data and a much larger amount of real-world data with a high percentage of label noise.

“In this same work we present an evaluation method to measure the impact of noise on labels and mitigate its effects for a labelled sound dataset.” This is the first time that this method has been used in sound classification.

FSDnoisy18k opens the door to the evaluation of a range of measures against noise inherent in the labelling and classification of sounds, as well as in several semi-supervised learning approaches.

Related work:

Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, Xavier Serra (2019), “Learning Sound Event Classifiers from Web Audio with Noisy Labels”, arXiv preprint arXiv:1901.01189.
