Over the last decades, the scientific literature has experienced exponential growth: a new article is published every thirteen seconds and added to the already huge set of more than 2.5 million papers currently available online. In this scenario, automated approaches to extract, enrich, aggregate and summarize the content of publications have become essential tools that help researchers and other interested parties deal with scientific information overload. Natural Language Processing and Text Mining play a central role as key technologies enabling such automated analyses of scientific literature.
The tutorial "Natural Language Processing for Intelligent Access to Scientific Information", presented at the 26th International Conference on Computational Linguistics (COLING 2016), provides an overview of the main approaches to mine a broad variety of structured, semantic information from scientific articles by focusing on the following four aspects:
The main scientific information extraction challenges proposed during the last few years are also reviewed in the tutorial, together with the most relevant datasets for experimenting with new scientific text mining approaches. Some examples of big data architectures to crawl and process huge volumes of scientific publications are discussed. The Dr. Inventor Text Mining Framework, a Java-based library that integrates a varied set of scientific text mining tools, is presented through an overview of its architecture and a practical demo showing how a scientific publication is processed and what kind of information is extracted. The tutorial was attended by more than thirty people, providing a chance to discuss current approaches and new ideas in more detail.
Relevant links:
Tutorial Web Site (including slides): http://taln.upf.edu/pares/coling2016tutorial/
Dr. Inventor Text Mining Framework: http://driframework.readthedocs.io/