Artificial neural networks characterize pieces of music

Sergio Oramas, a member of the Music Technology Group, uses deep learning techniques to classify the musical genres of large collections of albums. This work earned him an award at ISMIR 2017 (Suzhou, China) and forms part of the doctoral thesis he is to defend on 29 November at UPF, before a panel that includes Brian Whitman, one of the principal researchers at Spotify.



In large music collections, such as Spotify or Pandora, knowing the genre of a piece of music is essential for classifying it, finding it through search engines and recommending it to users. Doing this manually is not feasible, as it would require a great deal of time and human effort.

Sergio Oramas, a member of the Music Technology Group (MTG) at the UPF Department of Information and Communication Technologies (DTIC), is the first author of an article that addresses the problem of automatically determining the genre of an album. He carried out the work in collaboration with Pandora, one of the biggest companies in the music streaming sector, which is already using some of the findings published in the research.

The article describing the work was presented by Sergio Oramas, Francesco Barbieri and Xavier Serra, researchers with the DTIC, together with Oriol Nieto (NYU-Steinhardt, USA), at the ISMIR 2017 Conference (Suzhou, China), the world's largest conference, in both academia and industry, in the field of computational analysis of musical information. Their presentation won the award for best oral presentation, voted for by the 300 participants at ISMIR 2017.

The research takes an innovative methodological approach, deep learning, an artificial intelligence technique that uses artificial neural networks to analyse the audio, the images and the textual information associated with a large collection of music albums. “In this work, I used the audio of the songs, the album covers and the reviews written by users who bought them on Amazon”, states Oramas.
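As an illustration of how features from several modalities can be combined, the following is a minimal sketch of late fusion by concatenation. The feature vectors and their dimensions are hypothetical toy values, not the ones produced by the networks in the paper:

```python
import numpy as np

# Hypothetical per-modality deep feature vectors for one album
# (in practice, these would come from neural networks trained on each modality).
audio_feat = np.array([0.2, 0.7])   # e.g. from a network over the song audio
cover_feat = np.array([0.5, 0.1])   # e.g. from a network over the album cover
text_feat = np.array([0.9, 0.3])    # e.g. from an embedding of the Amazon reviews

# Late fusion by concatenation: one joint vector is fed to the genre classifier.
fused = np.concatenate([audio_feat, cover_feat, text_feat])
```

The fused vector simply stacks the three modality representations side by side, so the downstream classifier can learn from all of them at once.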

Detecting different genres at the same time in the same piece of music

“In the article, I demonstrate that deep learning techniques improve on the results obtained so far with other techniques, both when using each type of data separately (audio, image and text) and when combining them”, explains Oramas. He adds: “one of the contributions of the study is that the system is capable of detecting several genres at the same time in the same album. This comes closer to reality: an album or song may be pop, for example, but at the same time have elements of jazz, a soul voice and techno percussion. Our system is able to detect all of these genres at once, and few published systems can do so, let alone ones that also combine different types of data”.
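The multi-label idea described above can be sketched as one independent sigmoid score per genre, with every genre whose score passes a threshold assigned to the album. The genre list, weights and feature values below are toy assumptions for illustration, not the model from the paper:

```python
import numpy as np

GENRES = ["pop", "jazz", "soul", "techno"]  # hypothetical label set

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_genres(features, weights, bias, threshold=0.5):
    """Multi-label prediction: one independent sigmoid per genre,
    so an album can be tagged with several genres at once."""
    scores = sigmoid(features @ weights + bias)
    return [g for g, s in zip(GENRES, scores) if s >= threshold]

# Toy fused feature vector for one album (in the paper, deep features
# from audio, cover art and review text would be combined here).
features = np.array([0.9, 0.1, 0.8])
weights = np.array([[ 2.0, -1.0,  1.5, -2.0],
                    [-1.0,  2.0, -1.0,  0.5],
                    [ 1.0,  0.5,  2.0, -1.5]])
bias = np.array([-0.5, -0.5, -0.5, -0.5])

labels = predict_genres(features, weights, bias)
# With these toy values, the album is tagged as both "pop" and "soul".
```

Because each genre gets its own score rather than competing in a single softmax, the same album can legitimately receive several labels, which is the behaviour the quote describes.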

This work is part of the doctoral thesis titled Knowledge Extraction and Representation Learning for Music Recommendation and Classification, which Sergio Oramas is to defend on 29 November under the supervision of Xavier Serra. The doctoral thesis panel will include Brian Whitman, co-founder of The Echo Nest and one of the principal researchers at Spotify, and Markus Schedl, a renowned expert in the field of music recommendation at the Johannes Kepler University Linz (Austria).

Reference works: 

·  Sergio Oramas, Knowledge Extraction and Representation Learning for Music Recommendation and Classification, doctoral thesis supervised by Xavier Serra, 29 November, 11 am, room 55.309, Tànger building, Communication Campus, Pompeu Fabra University.

·  Sergio Oramas, Oriol Nieto, Francesco Barbieri, Xavier Serra (2017), “Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features”, 16 July, ISMIR, Suzhou, China.


