Below is the list of projects cofunded by the María de Maeztu program. Projects were selected via internal calls: the first was launched at the beginning of the program, and the second in September 2016.
In addition, the program supported:
- joint calls for cooperation between DTIC and the UPF Department of Experimental and Health Sciences (CEXS), also recognised as a María de Maeztu Unit of Excellence. The first call took place in January 2017 and the second in November 2017.
- its own Open Science and Innovation program
- a pilot program to promote educational research collaborations with industry
The details of the internal procedures for the distribution of funds associated with the program can be found here.
Mining the Knowledge of Scientific Publications
During the last decade the amount of scientific information available online has increased at an unprecedented rate, with recent estimates reporting that a new paper is published every 20 seconds. PubMed includes about 24.6M papers, with a growth rate of about 1,370 new articles per day. Elsevier’s Scopus and Thomson Reuters’ ISI Web of Knowledge contain more than 57 and 90 million papers, respectively.
In this scenario of scientific information overload, researchers are overwhelmed by an enormous and continuously growing number of articles to access in their daily activities. The exploration of recent advances concerning specific topics, methods and techniques, peer reviewing, the writing and evaluation of research proposals and in general any activity that requires a careful and comprehensive assessment of scientific literature has turned into an extremely complex and time-consuming task. The availability of text mining tools able to extract, aggregate, summarize and turn scientific unstructured textual contents into well organized and interconnected knowledge is fundamental in a scientific information access scenario.
To take full advantage of the knowledge present in scientific publications, proper semantic indexing, search and content aggregation approaches are required. In general, the semantic interpretation and enrichment of scientific texts would support the development of a varied set of interacting applications for tasks such as searching for new information on specific scientific problems, semi-automatic assessment of papers and research proposals, hypothesis formulation, tracking of scientific and technological advances, scientific intelligence, assisted report and review writing, and question answering.
The main objective of this project is the extension and development of software and approaches, as well as the creation of new datasets, that will facilitate the extraction and summarization of knowledge from scientific publications in different disciplines. The project is a collaboration between the Natural Language Processing Research Group (TALN) and the Web Research Group (WRG).
To learn more:
- Presentation of the project at the Data-driven Knowledge Extraction Workshop, June 2016. (Slides)
Principal researchers
Horacio Saggion
Researchers
Francesco Ronzano, Francesco Barbieri, Beatriz Fisas, Ahmed Abura’ed, Luis Espinosa-Anke, Ricardo Baeza-Yates, Ana Freire, Diego Sáez-Trumper (EURECAT)
Related Assets:
- Dr. Inventor Text Mining Framework
- [TEXT] Dr. Inventor Multi-layer Scientific Corpus
- SUMMA - Text Summarization Toolkit
- Ronzano F, Saggion H. Knowledge Extraction and Modeling from Scientific Publications. Enhancing Scholarly Data Workshop –SAVE-SD2016
- Ronzano F, Saggion H. An Empirical Assessment of Citation Information in Scientific Summarization. Proceedings of the 21st International Conference on Applications of Natural Language to Information Systems (NLDB 2016), 22-24 June 2016, Manchester, UK
- Natural Language Processing for Intelligent Access to Scientific Information Tutorial
- Saggion H, AbuRa'ed A, Ronzano F. Trainable citation-enhanced summarization of scientific articles. Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2016).
- Ronzano F, Freire A, Saez-Trumper D, Saggion H. Making sense of massive amounts of scientific publications: the scientific knowledge miner project. Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2016)
- Fisas B, Ronzano F, Saggion H. A Multi-Layered Annotated Corpus of Scientific Papers. Language Resource and Evaluation Conference 2016
- Francesco Ronzano wins the 1st Language Technologies Hackathon at 4YFN
- How can Natural Language Processing improve access to scientific literature? Tutorial at COLING 2016
- Ferres D, AbuRa'ed A, Saggion H. Spanish Morphological Generation with Wide-Coverage Lexicons and Decision Trees. Procesamiento del Lenguaje Natural
- Scientific Text Mining and Summarization Services
- Saggion H, Ronzano F, Accuosto P, Ferrés D. MultiScien: a Bi-Lingual Natural Language Processing System for Mining and Enrichment of Scientific Collections. 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2017), at SIGIR 2017
- Accuosto P, Ronzano F, Ferrés D, Saggion H. Multi-level mining and visualization of scientific text collections. Proceedings of the 6th International Workshop on Mining Scientific Publications. Joint Conference on Digital Libraries (JCDL’17)
- KOLUMBA, prototype web-based e-mail client for people with disabilities, awarded at Web for All congress
- Abura’ed A, Chiruzzo L, Saggion H, Accuosto P, Bravo A. LaSTUS/TALN @ CLSciSumm-17: Cross-document Sentence Matching and Scientific Text Summarization Systems. Proceedings of the Second Joint Workshop on Bibliometric Enhanced Information Retrieval and Natural Language Processing for Digital Libraries
- Talk by Horacio Saggion at the Allen Institute for Artificial Intelligence on NLP for the scientific domain
- AbuRa'ed A, Saggion H, Chiruzzo L. What Sentence are you Referring to and Why? Identifying Cited Sentences in Scientific Literature. Recent Advances in Natural Language Processing (RANLP2017)
- Espinosa-Anke L, Tello J, Pardo A, Medrano I, Ureña A, Salcedo I, Saggion H. Savana: A Global Information Extraction and Terminology Expansion Framework in the Medical Domain. Procesamiento del Lenguaje Natural 57: 23-30 (2016)
- [MSc thesis] Term extraction and document similarity in an Integrated Learning Design Environment
- PDF Digest - free online tool to parse PDF files
- Abura'ed A, Bravo A, Chiruzzo L, Saggion H. LaSTUS/TALN+INCO @ CL-SciSumm 2018 - Using Regression and Convolutions for Cross-document Semantic Linking and Summarization of Scholarly Literature. Proceedings of the 3rd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2018)
- Best summarization system at CL-SciSumm 2018 challenge at BIRNDL by the TALN-UPF team
- AbuRa’ed A, Chiruzzo L, Saggion H. Experiments in detection of implicit citations. OSP 2018: 7th International Workshop on Mining Scientific Publications
- [BSc thesis] Mining Zenodo: Data extraction and indexing of a research repository
- Accuosto P, Saggion H. Transferring knowledge from discourse to arguments: A case study with scientific abstracts. 6th ACL Workshop on Argument Mining
- Pablo Accuosto receives the Best Paper Award at ACL ArgMining 2019
- Chiruzzo L, AbuRa’ed A, Bravo A, Saggion H. LaSTUS-TALN+INCO @ CL-SciSumm 2019. 4th Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2019)
- Accuosto P, Saggion H. Discourse-Driven Argument Mining in Scientific Abstracts. Natural Language Processing and Information Systems. NLDB 2019
- [PhD thesis] Automatic generation of descriptive related work reports
- Accuosto P, Saggion H. Improving the accessibility of biomedical texts by semantic enrichment and definition expansion. XXXIV Congreso Internacional de la Sociedad Española para el Procesamiento del Lenguaje Natural; 2018
- Ferrés D, Saggion H, Ronzano F, Bravo À. PDFdigest: an adaptable layout-aware PDF-to-XML textual content extractor for scientific articles. Language Resources and Evaluation Conference (LREC 2018)
- AbuRa’ed A, Saggion H. LaSTUS/TALN at Complex Word Identification (CWI) 2018 Shared Task. Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications