The TERMMED project aims to analyze terminological change in medical texts in order to detect clues that explain the evolution of knowledge.

Scientific knowledge in medicine grows constantly through the identification of new diseases, the development of new methods and technologies, the appearance of new treatments and, especially, increasingly detailed descriptions of the phenomena observed. One way to trace this growth is to analyze the oral and written production of doctors and other experts, in which terminology is a key factor. Analysis of the lexicon of medical texts, and in particular of the terminological variation detected over time, is decisive for representing this (hidden) evolution of knowledge. Such analysis pays special attention to the presence or absence of units, to the changes that occur between paradigms, and to changes in content.

We will focus on medical terminology in Spanish. The phenomena of lexical change that we want to address include the selection of variants, polysemy, the fixation of reductions, the lexicalization of phrases, conversions, nominalizations, neology and the disappearance of forms. The methodology combines the exploration of textual corpora, the automatic extraction of terminology, the querying of ontologies and other lexical sources, the statistical analysis of term variants, and experimentation on the automatic detection of semantic neology.
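As an illustration of what the statistical analysis of term variants could look like, the following minimal Python sketch computes the share of each competing variant per decade in a dated corpus. It is not the project's actual pipeline: the function name `variant_shares`, the decade-sized window, the toy documents and the chosen variants are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def variant_shares(docs, variants, period=10):
    """Return {period_start: {variant: share}} for the given variants.

    docs     -- iterable of (year, text) pairs (hypothetical corpus format)
    variants -- competing forms assumed to denote the same concept
    period   -- window size in years (an assumption; 10 = decades)
    """
    # One case-insensitive, whole-word pattern per variant.
    patterns = {
        v: re.compile(r"\b" + re.escape(v) + r"\b", re.IGNORECASE)
        for v in variants
    }
    counts = defaultdict(Counter)
    for year, text in docs:
        start = year - year % period
        for v, pattern in patterns.items():
            counts[start][v] += len(pattern.findall(text))

    shares = {}
    for start in sorted(counts):
        total = sum(counts[start].values())
        if total:  # skip windows in which no variant occurs
            shares[start] = {v: counts[start][v] / total for v in variants}
    return shares


# Invented toy documents, for illustration only (not real corpus data).
docs = [
    (1985, "El síndrome de inmunodeficiencia adquirida fue descrito hace poco."),
    (1988, "Pacientes con síndrome de inmunodeficiencia adquirida (SIDA)."),
    (2005, "El sida se trata con antirretrovirales; los pacientes con sida mejoran."),
]
variants = ["síndrome de inmunodeficiencia adquirida", "sida"]

shares = variant_shares(docs, variants)
for start, dist in shares.items():
    print(start, dist)
```

In this toy corpus the reduced form goes from a minority share in the 1980s window to the only form attested in the 2000s window, which is the kind of pattern that the fixation of a reduction would be expected to produce. A real study would need lemmatization, much larger samples and a significance test before drawing any conclusion.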

The preparation of linguistic resources suitable for developing advanced language technologies adapted to medical documentation in Spanish is one of the challenges of the 21st century. Setting up these resources makes it possible to envisage applications that cover multiple needs, not only of health professionals and researchers in biomedicine (information retrieval systems, data prediction, medical text classifiers, text summarisers, text coders, writing assistants, etc.), but also of patients (text simplification systems, rewriting of texts adapted to different cognitive levels, etc.). Given the natural proliferation of terminological variation, language technologies require verified data that allow them to automate decision making. We propose to provide data from which behavioural patterns of terminological variation in medicine can be inferred. In this project we will not develop technological resources, but we will publish open-access lexical resources derived from our analysis, so that they can be integrated into language technologies adapted to different fields.