The second María de Maeztu Strategic Research Program (CEX2021-001195-M) of the Department of Information and Communication Technologies (DTIC) runs from 2023 to 2026. The website for this program is under construction; some details can be found in this news item.

The first María de Maeztu Strategic Research Program (MDM-2015-0502) took place between January 2016 and June 2020. It was focused on data-driven knowledge extraction, boosting synergistic research initiatives across our different research areas.

Mining the Knowledge of Scientific Publications

Description

During the last decade, the amount of scientific information available online has increased at an unprecedented rate, with recent estimates reporting that a new paper is published every 20 seconds. PubMed includes about 24.6 million papers and grows by about 1,370 new articles per day. Elsevier's Scopus and Thomson Reuters' ISI Web of Knowledge contain more than 57 million and 90 million papers, respectively.

In this scenario of scientific information overload, researchers are overwhelmed by an enormous and continuously growing number of articles to consult in their daily activities. Exploring recent advances on specific topics, methods and techniques, peer reviewing, writing and evaluating research proposals, and in general any activity that requires a careful and comprehensive assessment of the scientific literature have become extremely complex and time-consuming tasks. Text mining tools able to extract, aggregate and summarize unstructured scientific text and turn it into well-organized, interconnected knowledge are therefore fundamental for scientific information access.

To take full advantage of the knowledge present in scientific publications, proper semantic indexing, search and content aggregation approaches are required. In general, the semantic interpretation and enrichment of scientific texts would enable the development of a varied set of interacting applications supporting tasks such as searching for new information on specific scientific problems, semi-automatic assessment of papers and research proposals, hypothesis formulation, tracking of scientific and technological advances, scientific intelligence, assisted report and review writing, and question answering.

The main objective of this project is to extend and develop software and approaches, as well as to create new datasets, that will facilitate the extraction and summarization of knowledge from scientific publications in different disciplines. The project is a collaboration between the Natural Language Processing Research Group (TALN) and the Web Research Group (WRG).

To learn more:

Presentation of the project at the Data-driven Knowledge Extraction Workshop, June 2016. (Slides)

Principal researchers

Horacio Saggion

Researchers

Francesco Ronzano
Francesco Barbieri
Beatriz Fisas
Ahmed Abura’ed
Luis Espinosa-Anke
Ricardo Baeza-Yates
Ana Freire
Diego Sáez-Trumper (EURECAT)

Related Assets

Department of Information and Communication Technologies, UPF

Grant CEX2021-001195-M funded by MCIN/AEI /10.13039/501100011033

DTIC MdM Strategic Program: Artificial and Natural Intelligence for ICT and beyond