We have datasets, repositories, frameworks and tools relevant to research and technology transfer initiatives in knowledge extraction. This section provides an overview of a selection of them, with download links or contact details.

The MdM Strategic Research Program has its own community in Zenodo for material available in this repository, as well as at the UPF e-repository. Below is a non-exhaustive list of datasets representative of the research in the Department.

To promote the availability of resources, specific communities have also been created in Zenodo, at the level of research communities (for instance, MIR and Educational Data Analytics) and MSc programs (for instance, the Master in Sound and Music Computing).

Espinosa-Anke L, Camacho-Collados J, Delli Bovi C, Saggion H. Supervised Distributional Hypernym Discovery via Domain Adaptation. 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016)

Lexical taxonomies are graph-like hierarchical structures that provide a formal representation of knowledge. Most knowledge graphs to date rely on is-a (hypernymic) relations as the backbone of their semantic structure. In this paper, we propose a supervised distributional framework for hypernym discovery which operates at the sense level, enabling large-scale automatic acquisition of disambiguated taxonomies. By exploiting semantic regularities between hyponyms and hypernyms in embedding spaces, and integrating a domain clustering algorithm, our model becomes sensitive to the target data. We evaluate several configurations of our approach, training with information derived from a manually created knowledge base, along with hypernymic relations obtained from Open Information Extraction systems. The integration of both sources of knowledge yields the best overall results according to both automatic and manual evaluation on ten different domains.
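
To give a rough intuition for what "semantic regularities between hyponyms and hypernyms" combined with domain sensitivity could look like, here is a minimal toy sketch. It is not the model from the paper: it swaps in a simple per-domain mean vector offset for the supervised model, assumes domain labels are already given rather than produced by a clustering algorithm, and uses made-up two-dimensional vectors. All function names, domain names and data below are hypothetical.

```python
from math import sqrt


def mean_offset(pairs, vectors):
    """Average (hypernym - hyponym) vector over a list of training pairs."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for hypo, hyper in pairs:
        for i in range(dim):
            total[i] += vectors[hyper][i] - vectors[hypo][i]
    return [t / len(pairs) for t in total]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train_per_domain(pairs_by_domain, vectors):
    """Learn one offset per domain, so predictions depend on the target domain."""
    return {d: mean_offset(p, vectors) for d, p in pairs_by_domain.items()}


def predict_hypernym(term, domain, offsets, vectors, candidates):
    """Shift the term by its domain offset and return the closest candidate."""
    shifted = [x + o for x, o in zip(vectors[term], offsets[domain])]
    return max(candidates, key=lambda c: cosine(shifted, vectors[c]))


# Made-up toy embeddings (assumption: real sense vectors would be much larger).
vectors = {
    "dog": [1.0, 0.2], "cat": [0.8, 0.3], "horse": [0.9, 0.1],
    "animal": [1.0, 1.0],
    "piano": [-1.0, 0.2], "violin": [-0.8, 0.3], "guitar": [-0.9, 0.1],
    "instrument": [-1.0, 1.0],
}

# Hypothetical (hyponym, hypernym) training pairs grouped by domain.
pairs_by_domain = {
    "biology": [("dog", "animal"), ("cat", "animal")],
    "music": [("piano", "instrument"), ("violin", "instrument")],
}

offsets = train_per_domain(pairs_by_domain, vectors)
candidates = ["animal", "instrument"]
horse_hypernym = predict_hypernym("horse", "biology", offsets, vectors, candidates)
guitar_hypernym = predict_hypernym("guitar", "music", offsets, vectors, candidates)
print(horse_hypernym, guitar_hypernym)
```

Training one offset per domain is the only nod to domain adaptation here; a learned transformation and automatically derived domain clusters would be needed for anything resembling the approach described in the abstract.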

Additional material

The following material is available at this link:

  • Training Data: Wikidata, KB-Unify
  • Nasari Domain Labels
  • SensEmbed Sense Vectors
  • Python API