Pons J, Lidy T, Serra X. Experimenting with Musically Motivated Convolutional Neural Networks. 14th International Workshop on Content-based Multimedia Indexing (CBMI 2016)
A common criticism of deep learning relates to the difficulty of understanding the underlying relationships that neural networks learn, which makes them behave like black boxes. In this article we explore various architectural choices relevant to music signal classification tasks in order to start understanding what the chosen networks are learning. We first discuss how convolutional filters with different shapes can fit specific musical concepts and, based on that, we propose several musically motivated architectures. These architectures are then assessed by measuring the accuracy of the deep learning model in predicting various music classes using a well-known dataset of audio recordings of ballroom music. The classes in this dataset correlate strongly with tempo, which allows assessing whether the proposed architectures are learning frequency and/or time dependencies. Additionally, a black-box model is proposed as a baseline for comparison. With these experiments we have been able to understand what some deep learning based algorithms can learn from a particular set of data.
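To give a flavour of what "musically motivated" filter shapes mean in practice, here is a minimal sketch (not the paper's code, which uses Lasagne-Theano, and not its reported hyper-parameters) written with PyTorch. It combines a filter that is wide in time and narrow in frequency, aimed at rhythmic/tempo cues, with a filter that is tall in frequency and narrow in time, aimed at timbral/spectral cues. The class name, filter sizes, and channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MusicallyMotivatedCNN(nn.Module):
    """Two parallel branches with differently shaped convolutional filters.

    - The "temporal" branch uses a 1 x 60 filter (wide in time, narrow in
      frequency), intended to capture rhythmic/tempo patterns.
    - The "frequency" branch uses a 32 x 1 filter (tall in frequency, narrow
      in time), intended to capture timbral/spectral patterns.
    """
    def __init__(self, n_classes=8):
        super().__init__()
        # Input: (batch, 1, n_mel_bins, n_frames) log-mel spectrogram excerpt.
        self.temporal = nn.Conv2d(1, 16, kernel_size=(1, 60))   # time-only filter
        self.frequency = nn.Conv2d(1, 16, kernel_size=(32, 1))  # frequency-only filter
        self.pool = nn.AdaptiveMaxPool2d((1, 1))                # summarize each feature map
        self.classifier = nn.Linear(16 + 16, n_classes)

    def forward(self, x):
        t = self.pool(torch.relu(self.temporal(x))).flatten(1)
        f = self.pool(torch.relu(self.frequency(x))).flatten(1)
        return self.classifier(torch.cat([t, f], dim=1))

# Usage with a dummy batch of spectrograms (8 classes, as in the Ballroom dataset).
model = MusicallyMotivatedCNN(n_classes=8)
dummy = torch.randn(4, 1, 40, 80)  # 4 excerpts, 40 mel bins, 80 frames
print(model(dummy).shape)          # torch.Size([4, 8])
```

The point of the two filter shapes is that each branch can only learn dependencies along one axis of the spectrogram, which is what makes it possible to test whether a given task (here, ballroom classes that correlate with tempo) is solved via time or frequency structure.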
Additional material:
- Best Paper Award at #CBMI2016
- Version at UPF repository
- Code on GitHub (Python; requires Lasagne-Theano and Essentia)
- Ballroom dataset
- Slides presented at the conference
- Updated record of the publication on the Music Technology Group website