[BSc thesis] Mining Zenodo: Data extraction and indexing of a research repository
Author: Sergi Pastor Rochina
Supervisor: Horacio Saggion
The output of scientific publications increases steadily each year. Because of this literature overload, exhaustive research on a given topic becomes overwhelming, and researchers cannot get a solid grasp of all this valuable knowledge.
Natural language processing and text mining have become essential for tackling this issue, as they can offer a comprehensive, careful and accessible perspective on the knowledge contained in scientific publications. This work aims to take advantage of this opportunity by developing an application that extracts and indexes part of the information contained in Zenodo, an open-access repository developed under the European OpenAIRE program. The work relies on the Dr Inventor tool, a text mining framework that enables the automated analysis of scientific publications, to extract specific types of information from Zenodo’s collection of research papers and so enable semantic indexing, discourse classification and discovery. The application is intended to ease the learning experience of researchers, students and others.
Keywords: Natural Language Processing; data mining; open-access repository; web application