Thesis linked to the implementation of the María de Maeztu Strategic Research Program.

Open access to PhD thesis carried out at the Department can be found at TDX

Please visit these pages for information on our PhD, MSc and BSc programs.

 

Back [PhD thesis] Automatic generation of descriptive related work reports

[PhD thesis] Automatic generation of descriptive related work reports

Authors: Ahmed Ghassan Tawfiq AbuRa'ed

Supervisor: Horacio Saggion

A related work report is a section in a research paper which integrates key information from a list of related scientific papers providing context to the work being presented. Related work reports can either be descriptive or integrative. Integrative related work reports provide a high-level overview and critique of the scientific papers by comparing them with each other, providing fewer details of individual studies. Descriptive related work reports, instead, provide more in-depth information about each mentioned study providing information such as methods and results of the cited works. In order to write a related work report, scientist have to identify, condense/summarize, and combine relevant information from different scientific papers. However, such task is complicated due to the available volume of scientific papers. In this context, the automatic generation of related work reports appears to be an important problem to tackle. The automatic generation of related work reports can be considered as an instance of the multi-document summarization problem where, given a list of scientific papers, the main objective is to automatically summarize those scientific papers and generate related work reports. In order to study the problem of related work generation, we have developed a manually annotated, machine readable data-set of related work sections, cited papers (e.g. references) and sentences, together with an additional layer of papers citing the references. We have also investigated the relation between a citation context in a citing paper and the scientific paper it is citing so as to properly model cross-document relations and inform our summarization approach. Moreover, we have also investigated the identification of explicit and implicit citations to a given scientific paper which is an important task in several scientific text mining activities such as citation purpose identification, scientific opinion mining, and scientific summarization. We present both extractive and abstractive methods to summarize a list of scientific papers by utilizing their citation network. The extractive approach follows three stages: scoring the sentences of the scientific papers based on their citation network, selecting sentences from each scientific paper to be mentioned in the related work report, and generating an organized related work report by grouping the sentences of the scientific papers that belong to the same topic together. On the other hand, the abstractive approach attempts to generate citation sentences to be included in a related work report, taking advantage of current sequence-to-sequence neural architectures and resources that we have created specifically for this task. The thesis also presents and discusses automatic and manual evaluation of the generated related work reports showing the viability of the proposed approaches.

Link to manuscript: http://hdl.handle.net/10803/669975