The second María de Maeztu Strategic Research Program (CEX2021-001195-M) of the Department of Information and Communication Technologies (DTIC) runs from 2023 to 2026. The website for this program is under construction. You can find some details in this news item.

The first María de Maeztu Strategic Research Program (MDM-2015-0502) took place between January 2016 and June 2020. It was focused on data-driven knowledge extraction, boosting synergistic research initiatives across our different research areas.

Best summarization system at CL-SciSumm 2018 challenge at BIRNDL by the TALN-UPF team

 

The UPF team, composed of Ahmed Abura’ed, Alex Bravo and Horacio Saggion, in collaboration with Luis Chiruzzo from Universidad de la República (Uruguay), developed several systems to participate in the challenge, described in “LaSTUS/TALN+INCO @ CL-SciSumm 2018 - Using Regression and Convolutions for Cross-document Semantic Linking and Summarization of Scholarly Literature”. They obtained the best performance in Task 2 on summarization against the abstract and human summaries, and the second-best performance against community summaries. The novelty in the systems proposed by UPF this year (the team also participated in the two previous editions) was the use of convolutions and regression to identify which parts of a reference paper have been cited by a set of citing papers, and then to produce a summary of the reference paper based on those citations.

This 2018 task follows up on the successful CLScisumm-17 at the BIRNDL workshop co-located with SIGIR 2017, Tokyo, CLScisumm-16 co-located with JCDL 2016, Rutgers, NJ, USA, and the CL Pilot Task conducted as part of the BiomedSumm Track at the Text Analysis Conference 2014 (TAC 2014). The CL-SciSumm Shared Task is run off the CL-SciSumm corpus, and comprises three sub-tasks in automatic research paper summarization on a new corpus of research papers. A training corpus of forty topics has been released, as well as a test corpus of ten topics. Each topic comprises an ACL Computational Linguistics research paper, its citing papers and three output summaries. The three output summaries are: the traditional self-summary of the paper (the abstract), the community summary (the collection of citation sentences, or ‘citances’) and a human summary written by a trained annotator. Within the corpus, each citance is also mapped to its referenced text in the reference paper and tagged with the information facet it represents (annotated dataset and evaluation scripts available at https://github.com/WING-NUS/scisumm-corpus).

The 2018 challenge had two tasks: Task 1 (A and B), to identify the parts of the reference paper that are being cited by a citing paper and why (the discourse facet they belong to), and Task 2, to generate a summary based on those citations. The organizers provided three types of summaries as output: community (based on the citations), abstract (the traditional self-summary of the reference paper) and human (manually written by an annotator).
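To make the two tasks concrete, here is a minimal Python sketch. It is not the UPF systems, which used convolutions and regression; it swaps in plain Jaccard word overlap as a stand-in for linking a citance to a reference sentence (Task 1A), then keeps the most frequently cited sentences as a toy summary (Task 2). The function names and the example sentences are hypothetical, invented for illustration, and the code does not read the CL-SciSumm corpus format.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_citance(citance, reference_sentences):
    """Task 1A stand-in: index of the reference sentence most similar to the citance."""
    cit = tokens(citance)
    scores = [jaccard(cit, tokens(s)) for s in reference_sentences]
    return max(range(len(scores)), key=scores.__getitem__)


def summarize(citances, reference_sentences, max_sentences=2):
    """Task 2 stand-in: keep the reference sentences that citances point to most often."""
    counts = Counter(link_citance(c, reference_sentences) for c in citances)
    chosen = [idx for idx, _ in counts.most_common(max_sentences)]
    # Return the chosen sentences in their original document order.
    return [reference_sentences[i] for i in sorted(chosen)]


# Toy data, invented for illustration only (not taken from the corpus).
reference = [
    "We propose a graph-based model for word sense disambiguation.",
    "The model is evaluated on three benchmark datasets.",
    "Results show a five point improvement in accuracy.",
]
citing = [
    "Their graph-based model targets word sense disambiguation.",
    "A graph-based model for disambiguation was proposed earlier.",
    "They report an improvement in accuracy on benchmark data.",
]
summary = summarize(citing, reference)
```

A learned similarity model would replace `jaccard` in a real system; the surrounding structure (link each citance, then select the cited text) is only one plausible way to read the task description above.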

Department of Information and Communication Technologies, UPF

Grant CEX2021-001195-M funded by MCIN/AEI /10.13039/501100011033


 




  • Àngel Lozano - Scientific director
  • Aurelio Ruiz - Program management