The second María de Maeztu Strategic Research Program (CEX2021-001195-M) of the Department of Information and Communication Technologies (DTIC) runs from 2023 to 2026. The website for this program is under construction. Some details are available in this news item.

The first María de Maeztu Strategic Research Program (MDM-2015-0502) ran from January 2016 to June 2020. It focused on data-driven knowledge extraction, boosting synergistic research initiatives across our different research areas.


[BSc thesis] Mining Zenodo: Data extraction and indexing of a research repository

Author: Sergi Pastor Rochina

Supervisor: Horacio Saggion

The output of scientific publications increases steadily each year. Because of this literature overload, exhaustive research on a given topic becomes overwhelming, and researchers cannot get a solid grasp of all this valuable knowledge.

Natural language processing and text mining have become essential to tackling this issue and providing a solution: a comprehensive, careful and accessible perspective on the knowledge contained in scientific publications. This work aims to take advantage of this opportunity and develop an application that extracts and indexes part of the information contained in Zenodo, an open-access repository developed under the European OpenAIRE program. The work will use the Dr Inventor tool (a text mining framework that enables the automated analysis of scientific publications) to extract specific types of information from Zenodo’s collection of research papers, enabling semantic indexing, discourse classification and discovery. The application is intended as a helpful tool that eases the learning experience of researchers, students and others.

 

Keywords: Natural Language Processing; data mining; open-access repository; web application

Department of Information and Communication Technologies, UPF

Grant CEX2021-001195-M funded by MCIN/AEI/10.13039/501100011033


 




  • Àngel Lozano - Scientific director
  • Aurelio Ruiz - Program management