The second María de Maeztu Strategic Research Program (CEX2021-001195-M) of the Department of Information and Communication Technologies (DTIC) runs from 2023 to 2026. The website for this program is under construction. You can find some details in this news item.

The first María de Maeztu Strategic Research Program (MDM-2015-0502) ran from January 2016 to June 2020. It focused on data-driven knowledge extraction, boosting synergistic research initiatives across our different research areas.

Ronzano F, Saggion H. Knowledge Extraction and Modeling from Scientific Publications. Enhancing Scholarly Data Workshop (SAVE-SD 2016)

 

Over the last decade, the number of scientific articles available online has grown substantially, in parallel with the adoption of the Open Access publishing model. Researchers, like any other interested party, are often overwhelmed by the enormous and continuously growing number of publications they must consider to perform a complete and careful assessment of the scientific literature. Consequently, new methodologies and automated tools are needed to ease the extraction, semantic representation and browsing of information from papers. We propose a platform that automatically extracts, enriches and characterizes several structural and semantic aspects of scientific publications, representing them as RDF datasets. We analyze papers by relying on the scientific Text Mining Framework developed in the context of the European Project Dr. Inventor. We evaluate how the Framework supports two core scientific text analysis tasks: rhetorical sentence classification and extractive text summarization. To ease the exploration of the distinct facets of scientific knowledge extracted by our platform, we present a set of tailored Web visualizations. We provide online access to both the RDF datasets and the Web visualizations generated by mining the papers of the 2015 ACL-IJCNLP Conference.
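As a rough illustration of what "representing them as RDF datasets" can mean in practice, the following minimal Python sketch serializes the sentences of one paper, each tagged with a rhetorical category, as N-Triples lines. It is not the Dr. Inventor Framework or its data model: the namespace `http://example.org/sci/`, the property names (`hasSentence`, `text`, `rhetoricalCategory`), the function names and the category label used in the test are all hypothetical and chosen only for this example. The sketch also assumes that `paper_id` and the category labels contain only characters that are safe inside an IRI.

```python
from typing import List, Tuple

# Hypothetical namespace for illustration only; not the Dr. Inventor vocabulary.
EX = "http://example.org/sci/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples forbids inside a quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def paper_to_ntriples(paper_id: str, sentences: List[Tuple[str, str]]) -> List[str]:
    """Return N-Triples lines describing a paper and its annotated sentences.

    `sentences` holds (text, rhetorical_category) pairs, where the category
    is whatever label an upstream classifier or annotator assigned.
    Assumes `paper_id` and the categories are IRI-safe strings.
    """
    paper = f"<{EX}paper/{paper_id}>"
    lines = [f"{paper} <{RDF_TYPE}> <{EX}Paper> ."]
    for index, (text, category) in enumerate(sentences, start=1):
        sentence = f"<{EX}paper/{paper_id}/sentence/{index}>"
        lines.append(f"{sentence} <{RDF_TYPE}> <{EX}Sentence> .")
        lines.append(f"{paper} <{EX}hasSentence> {sentence} .")
        lines.append(f'{sentence} <{EX}text> "{escape_literal(text)}" .')
        lines.append(f"{sentence} <{EX}rhetoricalCategory> <{EX}category/{category}> .")
    return lines
```

Calling `paper_to_ntriples` with a paper identifier and a list of `(text, category)` pairs yields one type statement for the paper plus four statements per sentence. Plain N-Triples keeps the sketch free of dependencies; a real pipeline would more likely use an RDF library and an established vocabulary.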

 

Keywords: scientific knowledge extraction, knowledge modeling, RDF, software framework

 

Additional material:

Ronzano, F., & Saggion, H. (2015). Dr. Inventor Framework: Extracting Structured Information from Scientific Publications. Discovery Science (pp. 209-220). Springer International Publishing.

Fisas, B., Ronzano, F., & Saggion, H. (2015). A Multi-Layered Annotated Corpus of Scientific Papers. To appear in the LREC Conference 2016.

This corpus includes 40 Computer Graphics papers containing 8,877 sentences, each manually annotated with its scientific-discourse rhetorical category. For each paper, the corpus also includes three handwritten summaries of at most 250 words.

Saggion, H. (2008). SUMMA: A robust and adaptable summarization tool. Traitement Automatique des Langues, 49(2).

Department of Information and Communication Technologies, UPF

Grant CEX2021-001195-M funded by MCIN/AEI /10.13039/501100011033


 


  • Àngel Lozano - Scientific director
  • Aurelio Ruiz - Program management