ELMDist: A vector space model with words and MusicBrainz entities

Espinosa-Anke, L., Oramas, S., Saggion, H., & Serra, X. ELMDist: A vector space model with words and MusicBrainz entities. Workshop on Semantic Deep Learning (SemDeep), co-located with ESWC 2017.

Music consumption habits and the music market have changed dramatically with the increasing popularity of digital audio and streaming services. Today, users are closer than ever to a vast number of songs, albums, artists and bands. However, the challenge remains of how to make sense of all the data available in the music domain, and of how the current state of the art in Natural Language Processing and semantic technologies can contribute to Music Information Retrieval areas such as music recommendation, artist similarity or automatic playlist generation. In this paper, we present and evaluate a distributional sense-based embeddings model in the music domain, which can easily be used for these tasks and as a device for improving artist or album clustering. The model is trained on a disambiguated corpus linked to the MusicBrainz musical Knowledge Base with an estimated precision above 0.9. Following current knowledge-based approaches to sense-level embeddings, entity-related vectors are provided à la WordNet, concatenating the id of the entity and its mention (in WordNet lingo, the entity's synset and sense). The model is evaluated both intrinsically and extrinsically in a supervised entity typing task, and released for the use and scrutiny of the community.

 

Additional material:

ELMDist is a sense-level embeddings model for the music domain, trained on a music-specific corpus of artist biographies in which musical entities have been automatically annotated with high precision against the musical KB MusicBrainz (MB).
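
As a rough illustration of the token scheme described in the abstract, the sketch below builds a sense-level vocabulary key by concatenating an entity id with its mention, then compares entries by cosine similarity. The separator, the placeholder ids, the helper names (entity_key, cosine, most_similar) and the three-dimensional toy vectors are all assumptions made for the example; the released model may format its keys differently.

```python
import math

# Hypothetical separator: the released vectors may join id and mention differently.
SEPARATOR = "_"


def entity_key(entity_id, mention):
    """Concatenate an entity id and its surface mention into a single token."""
    return entity_id + SEPARATOR + mention.lower().replace(" ", SEPARATOR)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(key, vectors):
    """Return the other vocabulary item closest to `key` by cosine similarity."""
    candidates = (k for k in vectors if k != key)
    return max(candidates, key=lambda k: cosine(vectors[key], vectors[k]))


# Toy vocabulary mixing entity tokens and plain words (values are made up).
band = entity_key("mbid-0001", "Some Band")
singer = entity_key("mbid-0002", "Some Singer")
toy_vectors = {
    band: [0.9, 0.1, 0.0],
    singer: [0.8, 0.2, 0.1],
    "guitar": [0.1, 0.9, 0.3],
}

nearest = most_similar(band, toy_vectors)
```

Because entities and ordinary words share one vector space, the same nearest-neighbour lookup could serve tasks such as artist similarity once real vectors replace the toy values.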

The pretrained sense-level word2vec model, linked to MusicBrainz, can be downloaded from: http://mtg.upf.edu/system/files/projectsweb/elmdist_vectors.zip
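
The page does not say which file format the archive uses. Assuming the vectors are stored in the plain-text word2vec format (a header line with vocabulary size and dimensionality, then one token and its values per line), a minimal reader could look like the sketch below. The function name read_word2vec_text and the sample tokens are hypothetical. If the file turns out to be in binary word2vec format instead, gensim's KeyedVectors.load_word2vec_format with binary=True is the usual alternative.

```python
import io


def read_word2vec_text(handle):
    """Parse plain-text word2vec vectors from an open text file object."""
    # Header line: "<vocabulary size> <dimensionality>"
    declared_size, dim = (int(x) for x in handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim


# In-memory sample standing in for the extracted file (contents are made up).
sample = io.StringIO(
    "2 3\n"
    "mbid-0001_some_band 0.1 0.2 0.3\n"
    "guitar 0.4 0.5 0.6\n"
)
vectors, dim = read_word2vec_text(sample)
```

With a real file, the StringIO object would be replaced by open(path, encoding="utf-8"), where path points to the extracted vectors.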

If you want to retrain the vectors, ELMD 2.0 can be downloaded from here:

http://mtg.upf.edu/download/datasets/elmd

You can then train the model by running train_word2vec.py.