Oramas S, Barbieri F, Nieto O, Serra X. Multimodal Deep Learning for Music Genre Classification. Transactions of the International Society for Music Information Retrieval


Music genre labels are useful to organize songs, albums, and artists into broader groups that share similar musical characteristics. In this work, an approach to learn and combine multimodal data representations for music genre classification is proposed. Intermediate representations of deep neural networks are learned from audio tracks, text reviews, and cover art images, and further combined for classification. Experiments on single and multi-label genre classification are then carried out, evaluating the effect of the different learned representations and their combinations. Results on both experiments show that aggregating learned representations from different modalities improves the accuracy of the classification, suggesting that different modalities embed complementary information. In addition, learning a multimodal feature space increases the performance of pure audio representations, which may be especially relevant when the other modalities are available for training, but not at prediction time. Moreover, a proposed approach for dimensionality reduction of target labels yields major improvements in multi-label classification not only in terms of accuracy, but also in terms of the diversity of the predicted genres, which implies a more fine-grained categorization. Finally, a qualitative analysis of the results sheds some light on the behavior of the different modalities in the classification task.
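As a rough illustration of what combining learned representations can look like, the sketch below normalizes one feature vector per modality and concatenates them into a single input for a classifier. It is a minimal example under stated assumptions, not the paper's implementation: the function names `l2_normalize` and `fuse`, the vector sizes, and the toy values are all hypothetical, and the actual fusion strategy used in the experiments may differ.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a feature vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def fuse(*modality_vectors):
    """Concatenate the normalized vectors of all modalities (hypothetical helper)."""
    fused = []
    for vec in modality_vectors:
        fused.extend(l2_normalize(vec))
    return fused


# Toy stand-ins for representations learned from audio, text, and cover art.
audio_vec = [3.0, 4.0]
text_vec = [1.0, 0.0, 0.0]
image_vec = [0.0, 2.0]

fused = fuse(audio_vec, text_vec, image_vec)
print(len(fused))  # 7
```

Normalizing each modality before concatenation keeps one modality from dominating the fused vector simply because its features have a larger scale.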

Additional material:

Both datasets used in the experiments are released as MSD-I and MuMu. The released data includes mappings between data sources, genre annotations, splits, texts, and links to images.
The source code to reproduce the audio, text, and multimodal experiments (Tartarus) and the visual experiments is also available.

Link: http://exemple.com

DTIC MdM Strategic Program: Artificial and Natural Intelligence for ICT and beyond

