The huge number of scientific articles available online, together with the high publication rate of new research, has turned the collection of relevant information about a topic into an extremely complex and time-consuming task for researchers. In this context, scientific summarization emerges as an indispensable tool to automate the aggregation of textual data and help users select relevant content. Traditional approaches to the summarization of scientific literature usually consider the textual contents of one or more articles as the main source of information to be exploited when generating summaries. During the last decade, in parallel with the widespread diffusion of socially generated content all over the Web, new online services to share and discuss findings, connect people, and create groups of interest have reached the research community. Web portals like ResearchGate or LinkedIn are attracting more and more researchers, thus progressively becoming relevant hubs to interact, share, and search for scientific content. Many universities and researchers are extremely active in exchanging opinions both on their websites and on their social media profiles, such as Twitter and Facebook. Repositories like Google Scholar, PubMed, and Microsoft Academic Search index a huge number of scientific publications and track their citations in order to support faceted searches and to provide up-to-date metrics that quantify research output, thereby exposing a considerable amount of valuable data.

Based on current trends in scientific text summarization, this proposal aims at developing new text understanding and summarization approaches for the academic world that take advantage of new data sources and social connections. We will investigate new techniques to complement and extend the textual information present in scientific articles with the research-related data currently available online. As a consequence, we will create an extended semantic network of links among research papers, authors, and institutions (based not only on citations but also on social connections, comments, opinions, etc.) that is useful for improving scientific text understanding and summarization. The work to be carried out requires a candidate with excellent skills in current machine learning methods for NLP, including current neural network architectures for NLP.

In the context of the Maria de Maeztu Strategic Research Program, we are looking for a highly motivated PhD candidate in the area of Natural Language Processing to work on a project dealing with Mining, Understanding, and Summarizing Academic Social Networks.

The PhD will be carried out at the TALN research group of the Department of Information and Communication Technologies (DTIC), Universitat Pompeu Fabra (UPF) in Barcelona.

The PhD student should have a background in Natural Language Processing, with solid knowledge of statistics, mathematics, computer programming, and machine learning. Experience in Information Extraction, Text Summarization, or related areas would be appreciated.