ELMD: Entity Linking for the Music Domain Dataset
ELMD is a corpus of annotated named entities from the music domain, drawn from a collection of about 13k Last.fm artist biographies. Entities are linked to DBpedia through a voting system among several state-of-the-art Entity Linking systems (ELVIS), with a precision of at least 0.94. Setting a higher confidence threshold yields a subset of ELMD that favors higher precision at the cost of recall.
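The voting idea and the precision/recall trade-off can be illustrated with a minimal sketch. It is not the ELVIS implementation: the function name `vote`, the use of the agreement fraction as the confidence score, and the default threshold are all assumptions made for illustration.

```python
from collections import Counter

def vote(candidates, threshold=0.5):
    """Pick the URI most systems agree on for a single mention.

    candidates: one entry per Entity Linking system, either a URI string
    or None when that system produced no link (hypothetical input shape).
    Returns (uri, confidence), or None if agreement is below the threshold.
    """
    votes = Counter(uri for uri in candidates if uri is not None)
    if not votes:
        return None
    uri, count = votes.most_common(1)[0]
    # Assumed confidence: fraction of all systems that proposed this URI.
    confidence = count / len(candidates)
    return (uri, confidence) if confidence >= threshold else None
```

Raising `threshold` keeps only mentions on which more systems agree, which is one way a higher-precision, lower-recall subset could be obtained.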
ELMD 2.0
Over the last months we have reviewed and expanded ELMD as follows:
- Most of the entities are now also linked to MusicBrainz (mapping retrieved through the Last.fm API)
- More annotations have been added by propagating existing annotations throughout the document in which they were found, assuming they appear in a one-sense-per-discourse fashion.
- New output formats have been added: NIF and GATE
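The annotation propagation step can be sketched roughly as follows. This is an illustration under assumptions, not the code used to build ELMD 2.0: the function name `propagate`, the `(start, end, uri)` span representation, and exact case-sensitive string matching are all hypothetical choices.

```python
import re

def propagate(text, annotations):
    """Copy each existing link to other unannotated mentions of the same
    surface form in the same document (one sense per discourse).

    annotations: list of (start, end, uri) character spans (assumed format).
    Returns the original spans plus the propagated ones, sorted by position.
    """
    surface_to_uri = {}
    for start, end, uri in annotations:
        # Keep the first URI seen for each surface form.
        surface_to_uri.setdefault(text[start:end], uri)

    covered = [(s, e) for s, e, _ in annotations]
    result = list(annotations)
    # Try longer surface forms first so they take precedence over substrings.
    for surface in sorted(surface_to_uri, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            s, e = match.span()
            # Skip matches that overlap a span that is already annotated.
            if all(e <= cs or s >= ce for cs, ce in covered):
                result.append((s, e, surface_to_uri[surface]))
                covered.append((s, e))
    return sorted(result)
```

For a biography in which only the first mention of an artist is linked, calling `propagate` with that single span would return it together with spans for the later mentions, all carrying the same URI.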
ELMD 2.0 is available in the following formats
In the XML version, entities are annotated inside text using the category of the entity as the XML tag and with 3 attributes: dbp (DBpedia URI), mb (MusicBrainz ID) and lfm (Last.fm URL).
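As a rough illustration of reading this format, the sketch below parses a made-up fragment with Python's standard `xml.etree.ElementTree`. The wrapping `<doc>` element, the category names `Artist` and `Album`, and the placeholder MusicBrainz ID are assumptions for the example; only the three attribute names come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: root element, category tags and IDs are illustrative.
SAMPLE = (
    '<doc>'
    '<Artist dbp="http://dbpedia.org/resource/Nirvana_(band)" '
    'mb="00000000-0000-0000-0000-000000000000" '
    'lfm="http://www.last.fm/music/Nirvana">Nirvana</Artist>'
    ' released '
    '<Album dbp="http://dbpedia.org/resource/Nevermind">Nevermind</Album>.'
    '</doc>'
)

def extract_entities(xml_text):
    """Return one dict per annotated entity found in the XML string."""
    root = ET.fromstring(xml_text)
    entities = []
    for elem in root.iter():
        # Treat any element carrying a dbp attribute as an entity annotation.
        if "dbp" in elem.attrib:
            entities.append({
                "category": elem.tag,             # entity category = tag name
                "text": "".join(elem.itertext()), # annotated surface form
                "dbp": elem.get("dbp"),           # DBpedia URI
                "mb": elem.get("mb"),             # MusicBrainz ID, may be None
                "lfm": elem.get("lfm"),           # Last.fm URL, may be None
            })
    return entities

for entity in extract_entities(SAMPLE):
    print(entity["category"], entity["text"], entity["dbp"])
```

Because not every entity is linked to MusicBrainz, the sketch reads `mb` and `lfm` with `get`, so a missing attribute yields `None` instead of raising an error.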
The NIF version contains the whole dataset in a single file, following the NIF 2.0 specification.
The original ELMD 1.0 is also available for download here.
ELVIS (Entity Linking Framework Voting and Integration System), the source code used to generate ELMD 1.0 and 2.0, is also available for download here: https://github.com/sergiooramas/elvis
ELMDist: A vector space model with words and MusicBrainz entities
In addition, word vectors have been trained from ELMD 2.0 using word2vec. Vectors can be downloaded here.
Scientific References
Please cite the following paper if using ELVIS or any of the datasets (ELMD 1.0 and 2.0).
Please cite the following paper if using ELMDist.
ELMDist: A vector space model with words and MusicBrainz entities. Workshop on Semantic Deep Learning (SemDeep), co-located with ESWC 2017 (2017).