Paper on interpretable music classification at ICASSP 2024
Pablo Alonso will present the paper at the conference’s XAI-SA workshop in Seoul on Monday, April 15th.
The article “Leveraging pre-trained autoencoders for interpretable prototype learning of music audio” is the result of a collaboration between the Music Technology Group (Pablo Alonso, Roser Batlle, Pablo Zinemanas, Dmitry Bogdanov, Xavier Serra, and Martín Rocamora) and the Universidad de Buenos Aires (Leonardo Daniel Pepino).
In the article we present PECMAE, an interpretable music classifier based on prototype learning. In prototype learning, class probabilities are derived from the distances between input samples and a set of learnable class prototypes. After training, the prototypes can be analysed to interpret the model’s decisions.
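As a rough illustration, here is a minimal PyTorch sketch of a prototype-learning classification head. The `PrototypeHead` name, the dimensions, and the nearest-prototype-per-class logit are illustrative assumptions, not the paper’s actual implementation:

```python
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    """Illustrative prototype-learning classification head.

    Each class owns `n_protos` learnable prototypes in the embedding
    space; the logit for a class is the negative squared Euclidean
    distance to its closest prototype.
    """

    def __init__(self, embed_dim: int, n_classes: int, n_protos: int = 1):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_classes, n_protos, embed_dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, embed_dim) audio embeddings.
        # Squared distances to every prototype: (batch, n_classes, n_protos).
        diff = z[:, None, None, :] - self.prototypes[None]
        dists = diff.pow(2).sum(dim=-1)
        # Smaller distance to a class prototype -> larger class logit.
        return -dists.min(dim=-1).values

head = PrototypeHead(embed_dim=768, n_classes=10)
logits = head(torch.randn(4, 768))      # (4, 10)
probs = torch.softmax(logits, dim=-1)   # class probabilities
```

Because the prototypes live in the same space as the input embeddings, inspecting (or sonifying) them reveals what each class has learned.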
Our work builds on APNet, by Pablo Zinemanas, a prototype learning classifier that operates in the latent space of an autoencoder so that prototypes can be sonified. One limitation of APNet is that it relies on information from nearby training samples to sonify its prototypes. Instead of jointly training the autoencoder and the prototype network as in APNet, we use a pre-trained denoising decoder conditioned on summarised embeddings from EnCodecMAE, a model developed by Leonardo Daniel Pepino.
We show that decoupling the training of the autoencoder and the prototype network makes it possible to use pre-trained models that improve performance. In addition, a decoder that does not depend on information from real samples yields less biased sonifications, better suited for model developers.
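The practical upshot is that a trained prototype can be decoded to audio directly. The sketch below illustrates the idea with a dummy decoder standing in for the pre-trained denoising model; all names, dimensions, and the random projection are placeholders, not the actual PECMAE interface:

```python
import torch
import torch.nn as nn

class DummyDecoder(nn.Module):
    """Stand-in for the pre-trained denoising decoder.

    The real decoder is conditioned on summarised EnCodecMAE embeddings;
    a random linear map to a one-second waveform keeps this sketch
    self-contained and runnable.
    """

    def __init__(self, embed_dim: int = 768, n_samples: int = 16000):
        super().__init__()
        self.proj = nn.Linear(embed_dim, n_samples)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.proj(z))

decoder = DummyDecoder()
# A trained prototype is just a point in the embedding space, so it can
# be decoded directly, without borrowing information from real samples.
prototype = torch.randn(1, 768)  # placeholder for a learned class prototype
with torch.no_grad():
    waveform = decoder(prototype)  # (1, 16000) synthetic audio
```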
The paper will be presented at the ICASSP Workshop on Explainable AI for Speech and Audio (XAI-SA) in Seoul, Korea, on April 15th, 2024.
Article: http://hdl.handle.net/10230/59220
Examples: https://palonso.github.io/pecmae/
Code: https://github.com/palonso/pecmae
ICASSP 2024: https://2024.ieeeicassp.org/
Video: https://youtu.be/t1Dw6e3hBdg