Deep learning, a fast-growing technique that is also contributing to music technology

Jordi Pons and Xavier Serra, researchers at the Music Technology Group, propose models based on this technique for music information research. Their paper received the Best Paper Award at IEEE CBMI 2016.


Deep learning is a technique for extracting, transforming and classifying features from data. Its algorithms operate as a system of layers that simulates the basic functioning of neuronal synapses, and they have been applied in many research areas within artificial intelligence. The technique has recently gained great relevance and is now used in speech recognition, image recognition, computer vision and other fields.
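The layered computation described above can be sketched in a highly simplified form: each layer applies a learned linear map followed by a nonlinearity, loosely analogous to synaptic weighting and neuron activation. The sizes and random weights below are arbitrary illustrations, not anything from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Rectified linear unit: the nonlinearity applied after each layer."""
    return np.maximum(x, 0.0)

# Illustrative input: 8 raw features
x = rng.standard_normal(8)

# Two stacked layers with random (untrained) weights
W1, b1 = rng.standard_normal((16, 8)), np.zeros(16)   # layer 1: 8 -> 16
W2, b2 = rng.standard_normal((4, 16)), np.zeros(4)    # layer 2: 16 -> 4

hidden = relu(W1 @ x + b1)   # first layer extracts low-level features
output = W2 @ hidden + b2    # second layer combines them into task scores

print(output.shape)  # (4,)
```

In a trained network, the weight matrices are adjusted from data so that successive layers extract progressively more abstract features.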

One of the areas in which deep learning still has a long way to go is music information research. A brief review of the advances made with these techniques in this field shows that such algorithms have achieved competitive results in a short time. Moreover, given that the number of audio recordings available today is enormous and constantly growing, combining deep learning with large amounts of data has great potential to yield better models for automatically organizing audio libraries.


A study by Jordi Pons and Xavier Serra, of the Music Technology Group (MTG) at UPF's Department of Information and Communication Technologies (DTIC), with the participation of Thomas Lidy of the Vienna University of Technology, takes this approach and presents deep-learning-based solutions for classifying music. The work received the Best Paper Award at the 14th IEEE International Workshop on Content-Based Multimedia Indexing (CBMI 2016), held in Bucharest (Romania) on 16 June.

As Jordi Pons, first author of the paper, says: “The point is that technology companies are making heavy investments in deep learning, although it is still not really known why it works or what the system learns.” He goes on to explain: “Our main contribution has been to propose deep learning architectures designed to represent musical concepts. Specifically, we work with musical audio, so in our context we have been able to provide some insight into what these networks are learning.”

The authors evaluated their proposal by classifying audio recordings of dance music, a highly rhythmic musical source that makes it possible to assess whether the architectures proposed in the paper are learning frequential or temporal characteristics. In addition, the article presents a representation that reduces the computational cost, making the proposed models highly efficient and arousing great interest among experts.
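The distinction between temporal and frequential characteristics can be illustrated with a toy convolution over a spectrogram: a wide, short filter spans many time frames (sensitive to rhythmic patterns), while a tall, narrow filter spans many frequency bins (sensitive to timbral or pitch structure). The spectrogram size and filter shapes below are illustrative assumptions, not the configurations used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy log-spectrogram: 40 frequency bins x 80 time frames (illustrative sizes)
spectrogram = rng.standard_normal((40, 80))

def conv2d_valid(x, kernel):
    """Naive 2-D 'valid' cross-correlation, enough to show output shapes."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

# "Temporal" filter: 1 bin high, 16 frames wide -> slides along time
temporal_filter = rng.standard_normal((1, 16))
# "Frequential" filter: 32 bins high, 1 frame wide -> slides along frequency
frequential_filter = rng.standard_normal((32, 1))

t_map = conv2d_valid(spectrogram, temporal_filter)
f_map = conv2d_valid(spectrogram, frequential_filter)
print(t_map.shape)  # (40, 65): full frequency resolution, summarized time
print(f_map.shape)  # (9, 80): full time resolution, summarized frequency
```

Because each filter shape constrains what it can respond to, inspecting which filters a trained network relies on gives a hint of whether it is exploiting temporal or frequential structure.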

Reference work:

Pons, J., Lidy, T. and Serra, X. (2016), “Experimenting with Musically Motivated Convolutional Neural Networks”, 14th IEEE International Workshop on Content-Based Multimedia Indexing (CBMI 2016), Bucharest (Romania), 16 June. Best Paper Award.