Music Technology Group

Specialized in audio signal processing, music information retrieval, musical interfaces, and computational musicology


An Interpretable Deep Learning Model for Automatic Sound Classification

This article by Pablo Zinemanas provides insights into how a deep-neural-network sound classifier arrives at its decisions

13.04.2021

 

The popularization of deep learning has led to significant advances across a wide range of scientific fields and practical problems, especially in computer vision (e.g. object detection in images) and natural language processing (e.g. machine translation), but also in the audio domain, improving tasks such as speech recognition and music recommendation. However, it is often hard to provide insights into the decision-making process of deep neural networks.

At the MTG we have recently published a paper entitled “An Interpretable Deep Learning Model for Automatic Sound Classification”, in which we carry out sound recognition experiments with a classifier based on deep neural networks (DNNs) that, unlike most such classifiers, is designed to be interpretable.

The black-box nature of DNNs may produce unintended effects, such as reinforcing inequality and bias. Integrating these algorithms into our daily lives requires broad social acceptance, and potential malfunctioning can erode trust in them. Accordingly, there has been a recent surge of research on machine learning models that explain their decisions at some level of detail, a field commonly known as interpretable machine learning. However, there is still a lack of such research in the audio domain.

In our recent research, we have designed a classifier that recognizes sound sources from three different contexts (urban sounds, musical instruments, and keywords in speech) while also providing insight into how its decisions are made, so that we can learn from them. We achieve results comparable to the state of the art in all three contexts, and we propose methods for model refinement based on the decision-making insights provided by the interpretable classifier, reducing the number of parameters while improving accuracy.
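One common route to this kind of interpretability is prototype-based classification: the model learns prototype embeddings, and a decision is explained by pointing to the most similar prototype, which can itself be inspected or listened to. The sketch below is a minimal, hedged illustration of that general idea (the toy embeddings, class names, and the `PrototypeClassifier` helper are assumptions for illustration, not the paper's actual architecture):

```python
# Minimal sketch of prototype-based classification: each class is represented
# by one or more prototype embeddings, and the nearest prototype both decides
# the class and serves as the explanation. Illustrative only; the paper's
# model may differ.
import numpy as np

class PrototypeClassifier:
    def __init__(self, prototypes, labels):
        # prototypes: (n_prototypes, dim) array of learned embeddings
        # labels: the class label associated with each prototype
        self.prototypes = np.asarray(prototypes, dtype=float)
        self.labels = list(labels)

    def predict(self, x):
        # Euclidean distance from the input embedding to every prototype.
        d = np.linalg.norm(self.prototypes - np.asarray(x, dtype=float), axis=1)
        i = int(np.argmin(d))
        # Return the class, plus which prototype explains the decision.
        return self.labels[i], i, float(d[i])

# Toy 2-D embeddings standing in for audio features (hypothetical values).
clf = PrototypeClassifier(
    prototypes=[[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]],
    labels=["siren", "siren", "dog_bark"],
)
label, proto_idx, dist = clf.predict([4.5, 5.2])
print(label, proto_idx)  # the matched prototype index explains the decision
```

Because every decision names a specific prototype, inspecting which prototypes are rarely or never matched also suggests a natural refinement strategy: pruning them reduces the parameter count without discarding the parts of the model that do the work.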

Our work is a proof of concept that DNNs do not need to be treated as black boxes to achieve strong results. Our experiments show that interpretability allows a model's performance to be evaluated beyond the typical accuracy measure while providing useful insights into its inner workings.

 

Zinemanas, Pablo; Rocamora, Martín; Miron, Marius; Font, Frederic; Serra, Xavier. 2021. "An Interpretable Deep Learning Model for Automatic Sound Classification" Electronics 10, no. 7: 850. https://doi.org/10.3390/electronics10070850
