An Interpretable Deep Learning Model for Automatic Sound Classification
This article by Pablo Zinemanas offers insight into how deep neural networks make their decisions in a sound classifier.
The popularization of deep learning has led to significant advances in a wide range of scientific fields and practical problems, especially in computer vision (e.g. object detection in images) and natural language processing (e.g. machine translation), but also in the audio domain, improving tasks such as speech recognition and music recommendation. However, it is often hard to provide insights into the decision-making process of deep neural networks.
At the MTG we have recently published a paper entitled “An Interpretable Deep Learning Model for Automatic Sound Classification”, in which we carry out sound recognition experiments with a classifier that is based on deep neural networks (DNNs) but, unlike most such classifiers, has been designed to be interpretable.
The black-box nature of DNNs may produce unintended effects, such as reinforcing inequality and bias. The integration of such algorithms into our daily lives requires wide social acceptance, yet potential malfunctions can erode trust in them. In line with this, there is a recent surge of research on machine learning models that explain their decisions at some level of detail, a field commonly known as interpretable machine learning. However, there is still a lack of research on interpretable machine learning in the audio domain.
In our recent research, we have designed a classifier that is able to recognize sound sources from three different contexts (urban sounds, musical instruments and keywords in speech) while providing insight into how its decision-making works, so that we can learn from it. We achieve results comparable to the state of the art in all three contexts, and we also propose methods for model refinement based on the decision-making insights provided by the interpretable classifier, reducing the number of parameters while improving accuracy.
Our work is a proof of concept that DNNs do not necessarily need to be treated as black boxes to achieve strong results. Our experiments show that interpretability allows the model's performance to be evaluated beyond the typical accuracy measure, while providing useful insights into its inner workings.
Zinemanas, Pablo; Rocamora, Martín; Miron, Marius; Font, Frederic; Serra, Xavier. 2021. "An Interpretable Deep Learning Model for Automatic Sound Classification." Electronics 10, no. 7: 850. https://doi.org/10.3390/electronics10070850