We have datasets, repositories, frameworks and tools relevant to research and technology transfer initiatives in knowledge extraction. This section gives an overview of a selection of them, with download links or contact details.

The MdM Strategic Research Program has its own community in Zenodo for material available in this repository, as well as at the UPF e-repository. Below is a non-exhaustive list of datasets representative of the research in the Department.

To promote the availability of resources, specific communities have also been created in Zenodo, at the level of research communities (for instance, MIR and Educational Data Analytics) or MSc programs (for instance, the Master in Sound and Music Computing).

 

 

Espinosa-Anke, L., Oramas, S., Saggion, H., & Serra, X. ELMDist: A vector space model with words and MusicBrainz entities. Workshop on Semantic Deep Learning (SemDeep), co-located with ESWC 2017.

Music consumption habits as well as the Music market have changed dramatically due to the increasing popularity of digital audio and streaming services. Today, users are closer than ever to a vast number of songs, albums, artists and bands. However, the challenge remains in how to make sense of all the data available in the Music domain, and how the current state of the art in Natural Language Processing and semantic technologies can contribute to Music Information Retrieval areas such as music recommendation, artist similarity or automatic playlist generation. In this paper, we present and evaluate a distributional sense-based embeddings model in the music domain, which can be easily used for these tasks, as well as a device for improving artist or album clustering. The model is trained on a disambiguated corpus linked to the MusicBrainz musical Knowledge Base with an estimated precision of above 0.9, and following current knowledge-based approaches to sense-level embeddings, entity-related vectors are provided à la WordNet, concatenating the id of the entity and its mention (in WordNet lingo, the entity’s synset and sense). The model is evaluated both intrinsically and extrinsically in a supervised entity typing task, and released for the use and scrutiny of the community.

 

Additional material:

ELMDist is a sense-level embeddings model in the music domain, trained on a music-specific corpus of artist biographies in which musical entities have been automatically annotated with high precision against the musical KB MusicBrainz (MB).
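The abstract above describes entity vectors as keyed by the concatenation of an entity id and its mention. The following Python sketch illustrates that idea only. The separator, the lowercasing, the placeholder ids and the function names `sense_token` and `annotate` are all assumptions made for illustration; the actual token convention used in the released vectors may differ and should be checked against the downloaded files.

```python
# Hypothetical separator: the released vocabulary may use another convention.
SEP = "_"


def sense_token(entity_id, mention, sep=SEP):
    """Join an entity id and its surface mention into one vocabulary token."""
    normalized = mention.strip().lower().replace(" ", sep)
    return f"{entity_id}{sep}{normalized}"


def annotate(tokens, entity_spans):
    """Replace annotated spans with sense tokens.

    entity_spans maps (start, end) token offsets (end exclusive, end > start)
    to an entity id. Tokens outside any span are simply lowercased.
    """
    out = []
    i = 0
    while i < len(tokens):
        span = next(((s, e) for (s, e) in entity_spans if s == i), None)
        if span is not None:
            start, end = span
            mention = " ".join(tokens[start:end])
            out.append(sense_token(entity_spans[span], mention))
            i = end
        else:
            out.append(tokens[i].lower())
            i += 1
    return out


# Placeholder ids, not real MusicBrainz identifiers.
sentence = ["The", "Beatles", "released", "Revolver"]
spans = {(0, 2): "mbid-0001", (3, 4): "mbid-0002"}
annotated = annotate(sentence, spans)
```

A corpus rewritten this way lets a standard word-level embedding trainer learn one vector per (entity, mention) pair alongside ordinary word vectors.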

The pretrained sense-level word2vec model, with entities linked to MusicBrainz, can be downloaded from: http://mtg.upf.edu/system/files/projectsweb/elmdist_vectors.zip
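Once the archive is unpacked, the vectors can be queried for similar words or entities. The sketch below is a minimal, dependency-free illustration that assumes the vectors are stored in the plain-text word2vec format (a header line with vocabulary size and dimensionality, then one token and its values per line). That format is an assumption, not something stated on this page; if the archive holds a binary or library-specific model, a dedicated loader such as the one in gensim would be the appropriate route. The function names and the toy data are hypothetical.

```python
import io
import math


def load_text_vectors(handle):
    """Parse word2vec plain-text vectors from an open text handle."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "<vocab_size> <dimensions>"
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed or empty lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, token, topn=5):
    """Return the topn (token, similarity) pairs closest to the given token."""
    query = vectors[token]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != token
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy data standing in for the real file, which would be opened with open(...).
toy_file = io.StringIO("3 2\na 1.0 0.0\nb 0.9 0.1\nc 0.0 1.0\n")
toy_vectors = load_text_vectors(toy_file)
neighbours = most_similar(toy_vectors, "a", topn=1)
```

With the real vectors, the same query would be made with a vocabulary token from the downloaded file in place of the toy key.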

If you want to retrain the vectors, ELMD 2.0 can be downloaded from here:

http://mtg.upf.edu/download/datasets/elmd

You can then train the model by running train_word2vec.py.