Dr. Inventor Text Mining Framework
Dr. Inventor Text Mining Framework is a Java library that integrates and customizes several Document Engineering and Natural Language Processing tools to ease the analysis of the textual contents of scientific publications.
It is a standalone library that enables users to process the contents of papers in both PDF and JATS XML format. Once a paper has been imported from a local file or a remote URL, the Framework automatically extracts and characterizes several aspects, including:
- Structural elements: title, abstract, hierarchy of sections, sentences inside each section, bibliographic entries
- Bibliographic entries are parsed and enriched by querying external web services (BibSonomy, CrossRef, FreeCite, Google Scholar)
- Inline citations are spotted and linked to the respective bibliographic entry
- A dependency tree is built for each sentence, taking inline citations into account
- The discourse category of each sentence is identified as one of: Background, Challenge, Approach, Outcome or Future Work
- BabelNet synsets are spotted in the contents of each sentence by means of Babelfy
- Subject-Verb-Object graphs are built to represent the contents of paper excerpts (coreference resolution is used to improve the connectedness of these graphs)
- Relevant sentences are selected according to several criteria to build extractive summaries of a paper
- etc.
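The page does not describe how the Framework spots inline citations and links them to bibliographic entries. As a rough illustration of what that step involves, here is a minimal Java sketch that assumes numeric bracketed markers such as [3], [1, 4] or [2-5] and a bibliography indexed by entry number. The names CitationLinker, link and InlineCitation are hypothetical and are not part of the Dr. Inventor API; author-year citations are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (NOT the Dr. Inventor Framework API): spots numeric inline
 * citations such as "[3]", "[1, 4]" or "[2-5]" in a sentence and links every cited
 * number to the bibliographic entry stored under that number.
 */
public class CitationLinker {

    // A bracketed, comma-separated list of numbers or number ranges, e.g. [1], [2, 4-5]
    private static final Pattern NUMERIC_CITATION = Pattern.compile(
            "\\[(\\d+(?:\\s*-\\s*\\d+)?(?:\\s*,\\s*\\d+(?:\\s*-\\s*\\d+)?)*)\\]");

    /** An inline citation marker, its character offsets and the entries it points to. */
    public static final class InlineCitation {
        public final String marker;
        public final int start;
        public final int end;
        public final List<String> linkedEntries;

        InlineCitation(String marker, int start, int end, List<String> linkedEntries) {
            this.marker = marker;
            this.start = start;
            this.end = end;
            this.linkedEntries = linkedEntries;
        }

        @Override
        public String toString() {
            return marker + " @" + start + "-" + end + " -> " + linkedEntries;
        }
    }

    /** Finds every numeric citation in the sentence and resolves it against the bibliography. */
    public static List<InlineCitation> link(String sentence, Map<Integer, String> bibliography) {
        List<InlineCitation> citations = new ArrayList<>();
        Matcher m = NUMERIC_CITATION.matcher(sentence);
        while (m.find()) {
            List<String> linked = new ArrayList<>();
            for (String part : m.group(1).split(",")) {
                // Expand a range like "4-5" into 4, 5; a single number is a range of one
                String[] bounds = part.trim().split("\\s*-\\s*");
                int from = Integer.parseInt(bounds[0]);
                int to = bounds.length > 1 ? Integer.parseInt(bounds[1]) : from;
                for (int n = from; n <= to; n++) {
                    String entry = bibliography.get(n);
                    if (entry != null) {
                        linked.add(entry); // numbers without a matching entry are skipped
                    }
                }
            }
            citations.add(new InlineCitation(m.group(), m.start(), m.end(), linked));
        }
        return citations;
    }

    public static void main(String[] args) {
        // Placeholder bibliography: entry number -> entry text
        Map<Integer, String> bibliography = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            bibliography.put(i, "Entry " + i);
        }
        String sentence = "Earlier work [1] was extended in [2, 4-5].";
        for (InlineCitation citation : link(sentence, bibliography)) {
            System.out.println(citation);
        }
    }
}
```

Keeping each citation's character offsets is what would allow a later step, such as dependency parsing, to treat the marker as a single token. Handling author-year markers would require a second pattern plus matching on author surnames and years, which this sketch leaves out.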
Ronzano, F., & Saggion, H.: Dr. Inventor Framework: Extracting Structured Information from Scientific Publications. Discovery Science (pp. 209-220). Springer International Publishing. (2015)
Related Assets:
- Scientific Text Mining and Summarization Services
- Abura’ed A, Chiruzzo L, Saggion H, Accuosto P, Bravo A. LaSTUS/TALN @ CLSciSumm-17: Cross-document Sentence Matching and Scientific Text Summarization Systems. Proceedings of the Second Joint Workshop on Bibliometric Enhanced Information Retrieval and Natural Language Processing for Digital Libraries.
- Mining the Knowledge of Scientific Publications
- Saggion H, Ronzano F, Accuosto P, Ferrés D. MultiScien: a Bi-Lingual Natural Language Processing System for Mining and Enrichment of Scientific Collections. 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2017), at SIGIR 2017.
- Opening Science to Open Innovation
- How can Natural Language Processing improve access to scientific literature? Tutorial at COLING 2016
- [BSc thesis] Mining Zenodo: Data extraction and indexing of a research repository