Best presentation award at ISMIR 2017
The work “Multi-label Music Genre Classification from Audio, Text and Images Using Deep Features” has received the Best Oral Presentation Award at the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), which took place at the National University of Singapore Research Institute (NUSRI) in Suzhou, China, on October 23-27, 2017.
The work is co-authored by Sergio Oramas and Xavier Serra (Music Technology Group), Francesco Barbieri (Natural Language Processing Group), and Oriol Nieto (Pandora Media Inc.). It was developed in the context of the project “Natural Language Processing for Music Information Retrieval”, co-funded as part of the María de Maeztu Strategic Research Program.
Abstract:
Music genres allow to categorize musical items that share common characteristics. Although these categories are not mutually exclusive, most related research is traditionally focused on classifying tracks into a single class. Furthermore, these categories (e.g., Pop, Rock) tend to be too broad for certain applications. In this work we aim to expand this task by categorizing musical items into multiple and fine-grained labels, using three different data modalities: audio, text, and images. To this end we present MuMu, a new dataset of more than 31k albums classified into 250 genre classes. For every album we have collected the cover image, text reviews, and audio tracks. Additionally, we propose an approach for multi-label genre classification based on the combination of feature embeddings learned with state-of-the-art deep learning methodologies. Experiments show major differences between modalities, which not only introduce new baselines for multi-label genre classification, but also suggest that combining them yields improved results.
Link to the publication, including postprint and dataset here