We have datasets, repositories, frameworks and tools that are relevant to research and technology transfer initiatives related to knowledge extraction. This section gives an overview of a selection of them, with download links or contact details.

The MdM Strategic Research Program has its own community in Zenodo for material available in this repository, as well as in the UPF e-repository. Below is a non-exhaustive list of datasets representative of the research in the Department.

To promote the availability of resources, specific Zenodo communities have also been created at the level of research communities (for instance, MIR and Educational Data Analytics) and MSc programs (for instance, the Master in Sound and Music Computing).

 

 

Ronzano F, Saggion H. Knowledge Extraction and Modeling from Scientific Publications. Enhancing Scholarly Data Workshop (SAVE-SD 2016)

 


 

During the last decade the amount of scientific articles available online has substantially grown in parallel with the adoption of the Open Access publishing model. Nowadays researchers, as well as any other interested actor, are often overwhelmed by the enormous and continuously growing amount of publications to consider in order to perform any complete and careful assessment of scientific literature. As a consequence, new methodologies and automated tools to ease the extraction, semantic representation and browsing of information from papers are necessary. We propose a platform to automatically extract, enrich and characterize several structural and semantic aspects of scientific publications, representing them as RDF datasets. We analyze papers by relying on the scientific Text Mining Framework developed in the context of the European Project Dr. Inventor. We evaluate how the Framework supports two core scientific text analysis tasks: rhetorical sentence classification and extractive text summarization. To ease the exploration of the distinct facets of scientific knowledge extracted by our platform, we present a set of tailored Web visualizations. We provide on-line access to both the RDF datasets and the Web visualizations generated by mining the papers of the 2015 ACL-IJCNLP Conference.

 

Keywords: scientific knowledge extraction, knowledge modeling, RDF, software framework

 

Additional material:

Ronzano, F., & Saggion, H. (2015). Dr. Inventor Framework: Extracting Structured Information from Scientific Publications. Discovery Science (pp. 209-220). Springer International Publishing.

Fisas, B., Ronzano, F., & Saggion, H. (2015). A Multi-Layered Annotated Corpus of Scientific Papers. To appear in the LREC Conference 2016.

This corpus includes 40 Computer Graphics papers containing 8,877 sentences that have been manually annotated with respect to their scientific discourse rhetorical category. In addition, the corpus includes, for each paper, three handwritten summaries of at most 250 words each.

Saggion, H. (2008). SUMMA: A robust and adaptable summarization tool. Traitement Automatique des Langues, 49(2).