[BSc thesis] Mining Zenodo: Data extraction and indexing of a research repository
Author: Sergi Pastor Rochina
Supervisor: Horacio Saggion
The output of scientific publications is increasing steadily each year. Because of this literature overload, exhaustively researching a given topic becomes overwhelming, and researchers cannot get a solid grasp of all the valuable knowledge available.
Natural language processing and text mining have become essential for tackling this problem, because they can offer a comprehensive, careful and accessible view of the knowledge contained in scientific publications. This work takes advantage of that opportunity by developing an application that extracts and indexes part of the information contained in Zenodo, an open-access repository developed under the European OpenAIRE program. The work relies on Dr Inventor, a text mining framework that enables the automated analysis of scientific publications, to extract specific types of information from Zenodo’s collection of research papers and so support semantic indexing, discourse classification and discovery. The application is meant to be a helpful tool that eases the learning experience of researchers, students and other readers.
Keywords: Natural Language Processing; data mining; open-access repository; web application
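To make the idea of indexing extracted information concrete, the following is a minimal sketch of a keyword inverted index over a few paper records. It is not the thesis implementation and does not use Dr Inventor or the Zenodo API: the records, their field names (`id`, `title`, `description`) and the helper functions are hypothetical, and plain token matching stands in here for the semantic indexing the abstract describes.

```python
import re
from collections import defaultdict

# Hypothetical records for illustration only; real Zenodo metadata
# has a richer schema and would be fetched from the repository.
RECORDS = [
    {"id": 1, "title": "Text mining of scientific publications",
     "description": "Automated analysis of research papers."},
    {"id": 2, "title": "Open-access repositories in Europe",
     "description": "A survey of repository infrastructure."},
    {"id": 3, "title": "Semantic indexing for research papers",
     "description": "Indexing and discovery of scientific text."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each token to the set of record ids whose text contains it."""
    index = defaultdict(set)
    for record in records:
        text = f"{record['title']} {record['description']}"
        for token in tokenize(text):
            index[token].add(record["id"])
    return index


def search(index, query):
    """Return the ids of records that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


index = build_index(RECORDS)
matches = search(index, "research papers")
```

A query is answered by intersecting the id sets of its tokens, so only records containing all query terms are returned. A system like the one described would replace the plain tokens with the information types extracted by the text mining step.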