We maintain datasets, repositories, frameworks and tools relevant to research and technology transfer initiatives related to knowledge extraction. This section gives an overview of a selection of them, with download links or contact details.

The MdM Strategic Research Program has its own community in Zenodo for material available in this repository, as well as at the UPF e-repository. Below is a non-exhaustive list of datasets representative of the research in the Department.

To promote the availability of resources, specific communities have also been created in Zenodo, at the level of research communities (for instance, MIR and Educational Data Analytics) or MSc programs (for instance, the Master in Sound and Music Computing).

 

 

Ferres D, AbuRa'ed A, Saggion H. Spanish Morphological Generation with Wide-Coverage Lexicons and Decision Trees. Procesamiento del Lenguaje Natural

FERRÉS, Daniel; ABURA'ED, Ahmed; SAGGION, Horacio. Spanish Morphological Generation with Wide-Coverage Lexicons and Decision Trees. Procesamiento del Lenguaje Natural, [S.l.], v. 58, p. 109-116, mar. 2017. ISSN 1989-7553

 

Morphological Generation is the task of producing the appropriate inflected form of a lemma in a given textual context and according to some morphological features. This paper describes and evaluates wide-coverage morphological lexicons and a Decision Tree algorithm that perform Morphological Generation in Spanish at a state-of-the-art level. The Freeling, Leffe and Apertium Spanish lexicons, the J48 Decision Tree algorithm, and the combination of J48 with the Freeling and Leffe lexicons have been evaluated with the following datasets for Spanish: i) the CoNLL2009 Shared Task dataset, ii) the Durrett and DeNero dataset of Spanish Verbs (DDN), and iii) the SIGMORPHON 2016 Shared Task (task-1) dataset. The results show that: i) the Freeling and Leffe lexicons achieve high coverage and precision over the DDN and SIGMORPHON 2016 datasets, ii) the J48 algorithm achieves state-of-the-art results on all three datasets, and iii) the combination of Freeling, Leffe and the J48 algorithm outperformed our other approaches on the three evaluation datasets, slightly improved on the CoNLL2009 and SIGMORPHON 2016 results reported in the state-of-the-art literature, and achieved results comparable to those reported in the state-of-the-art literature on the DDN dataset.

Additional material.