We are constantly hearing sounds, and we are pretty good at recognizing them and knowing what they mean. Wouldn’t it be great if machines could do the same? If they could track what is going on in the environment, or summarize for us what is happening in an audio recording? Wouldn’t it be great if machines could help people with hearing problems perceive sounds? If they could tell us when the microwave is ready or when someone has just knocked at the door?
Recognizing all kinds of everyday sounds, such as environmental, urban, or domestic sounds, is an emerging research field with many interesting applications. But teaching machines to recognize sounds requires large amounts of reliably labeled audio data, and open audio data for this type of research is currently scarce. To address this issue, we have started developing a framework to support the creation of datasets for sound recognition research.
In this blog post, we explain the Freesound Annotator, our proposed framework for creating audio datasets; describe the Freesound Dataset (FSD), the main dataset we are building; and discuss the applications this dataset will be useful for.