Oramas, S, Nieto O, Barbieri F, Serra X. Multi-label Music Genre Classification from Audio, Text and Images Using Deep Features. 18th International Society for Music Information Retrieval Conference (ISMIR 2017)

We have datasets, repositories, frameworks and tools relevant to research and technology transfer initiatives related to knowledge extraction. This section gives an overview of a selection of them, with download links or contact details.

The MdM Strategic Research Program has its own community in Zenodo for material available in this repository as well as at the UPF e-repository. Below is a non-exhaustive list of datasets representative of the research in the Department.

To promote the availability of resources, specific communities have also been created in Zenodo, both for research communities (for instance, MIR and Educational Data Analytics) and for MSc programs (for instance, the Master in Sound and Music Computing).

Music genres allow us to categorize musical items that share common characteristics. Although these categories are not mutually exclusive, most related research is traditionally focused on classifying tracks into a single class. Furthermore, these categories (e.g., Pop, Rock) tend to be too broad for certain applications. In this work we aim to expand this task by categorizing musical items into multiple and fine-grained labels, using three different data modalities: audio, text, and images. To this end we present MuMu, a new dataset of more than 31k albums classified into 250 genre classes. For every album we have collected the cover image, text reviews, and audio tracks. Additionally, we propose an approach for multi-label genre classification based on the combination of feature embeddings learned with state-of-the-art deep learning methodologies. Experiments show major differences between modalities, which not only introduce new baselines for multi-label genre classification, but also suggest that combining them yields improved results.
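As an illustration of combining per-modality embeddings for multi-label prediction, here is a minimal sketch. It is not the method from the paper: the L2 normalization, the concatenation, the single linear layer with independent sigmoid outputs, and every number and weight below are assumptions chosen only to show the idea that one album can receive several genre labels at once.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so no modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(embeddings):
    """Concatenate the normalized embeddings of all modalities."""
    fused = []
    for vec in embeddings:
        fused.extend(l2_normalize(vec))
    return fused

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_labels(fused, weights, biases, labels, threshold=0.5):
    """Score each label independently; keep every label above the threshold."""
    predicted = []
    for label, w, b in zip(labels, weights, biases):
        score = sigmoid(sum(wi * xi for wi, xi in zip(w, fused)) + b)
        if score >= threshold:
            predicted.append(label)
    return predicted

# Hypothetical 2-dimensional embeddings for one album (illustrative values).
audio_emb = [3.0, 4.0]
text_emb = [1.0, 0.0]
image_emb = [0.0, 2.0]
fused = fuse([audio_emb, text_emb, image_emb])

# Hypothetical label set and hand-picked weights (not learned, not from MuMu).
labels = ["Rock", "Indie Rock", "Jazz"]
weights = [
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, -1.0],
]
biases = [0.0, -0.5, 0.0]

predicted = predict_labels(fused, weights, biases, labels)
print(predicted)
```

Because each label has its own sigmoid rather than sharing a softmax, the scores do not compete, which is what lets broad and fine-grained genres be assigned together.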

Additional material:

MuMu is a Multimodal Music dataset with multi-label genre annotations that combines information from the Amazon Reviews dataset and the Million Song Dataset (MSD). The former contains millions of album customer reviews and album metadata gathered from Amazon.com. The latter is a collection of metadata and precomputed audio features for a million songs.

To map the information from both datasets we use MusicBrainz.
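The linking step can be pictured as a join on a shared identifier. The sketch below assumes that both sources have already been annotated with a MusicBrainz release identifier; the field names (`amazon_id`, `mb_release_id`, `msd_track_id`) and all records are hypothetical and do not reflect the actual MuMu schema or mapping procedure.

```python
from collections import defaultdict

# Hypothetical album records from the review side (illustrative only).
amazon_albums = [
    {"amazon_id": "A1", "mb_release_id": "mb-001", "genres": ["Rock", "Indie Rock"]},
    {"amazon_id": "A2", "mb_release_id": "mb-002", "genres": ["Jazz"]},
    {"amazon_id": "A3", "mb_release_id": None, "genres": ["Pop"]},
]

# Hypothetical track records from the audio-feature side (illustrative only).
msd_tracks = [
    {"msd_track_id": "T1", "mb_release_id": "mb-001"},
    {"msd_track_id": "T2", "mb_release_id": "mb-001"},
    {"msd_track_id": "T3", "mb_release_id": "mb-002"},
    {"msd_track_id": "T4", "mb_release_id": "mb-999"},
]

def link_albums_to_tracks(albums, tracks):
    """Join albums and tracks on the shared release identifier."""
    # Index tracks by release so each album lookup is a dictionary access.
    tracks_by_release = defaultdict(list)
    for track in tracks:
        tracks_by_release[track["mb_release_id"]].append(track["msd_track_id"])

    linked = {}
    for album in albums:
        key = album["mb_release_id"]
        # Keep only albums that have an identifier and at least one matching track.
        if key is not None and key in tracks_by_release:
            linked[album["amazon_id"]] = {
                "genres": album["genres"],
                "tracks": tracks_by_release[key],
            }
    return linked

linked = link_albums_to_tracks(amazon_albums, msd_tracks)
print(linked)
```

Albums without an identifier and tracks whose release matches no album are dropped, so the result contains only items present in both sources.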

Tartarus deep learning code on GitHub

Tartarus is a Python module for deep learning experiments on audio, text, and their combination.

Link: http://mtg.upf.edu/node/3803

DTIC MdM Strategic Program: Artificial and Natural Intelligence for ICT and beyond
