We develop a wide range of software tools and hosting infrastructure to support research at the Department. This section will describe the available tools in detail. In the meantime, you can browse the offer in the UPF Knowledge Portal, the innovations created in EU projects and listed in the Innovation Radar, and the software sections of some of our research groups:

Artificial Intelligence

Nonlinear Time Series Analysis

Web Research

Music Technology

Interactive Technologies

Barcelona MedTech

Natural Language Processing

UbicaLab

Wireless Networking

Educational Technologies

GitHub

ColWordNet incorporates and proposes the preferred combinations of words inherent in the practice of language

An extension of WordNet, the most widely used lexical resource, developed by members of the Natural Language Processing research group, who will present it at the international Coling 2016 conference in Osaka (Japan) in December.

28.10.2016

In natural language processing, WordNet is probably the best-known lexical resource. It is an English lexical database that combines lexicographic information of the kind found in dictionaries, such as definitions and synonyms, with semantic information such as hypernyms: general, more abstract terms that subsume more specific ones, as car does for convertible.

In practice, WordNet links related words, such as cat/feline, ford/car and so on. These relations are very important in Artificial Intelligence because they are crucial when training an automatic system to handle language. Taking all of these aspects of language into account is what allows a machine to read a text properly.
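To make the hypernym relation concrete, here is a minimal Python sketch that walks "is-a" links upward. The tiny `HYPERNYMS` table and the `hypernym_chain` and `is_a` helpers are hypothetical and heavily simplified: real WordNet links synsets (word senses) rather than plain strings and has many more levels.

```python
# Hypothetical, simplified fragment of a WordNet-style hypernym graph.
# Real WordNet links synsets (word senses), not plain strings.
HYPERNYMS = {
    "convertible": "car",
    "car": "motor_vehicle",
    "cat": "feline",
    "feline": "carnivore",
}


def hypernym_chain(word):
    """Follow hypernym links upward until a term with no recorded parent."""
    chain = []
    current = word
    while current in HYPERNYMS:
        current = HYPERNYMS[current]
        chain.append(current)
    return chain


def is_a(specific, general):
    """True if `general` is a direct or indirect hypernym of `specific`."""
    return general in hypernym_chain(specific)


print(hypernym_chain("convertible"))  # ['car', 'motor_vehicle']
print(is_a("cat", "feline"))          # True
print(is_a("car", "feline"))          # False
```

With the real database, NLTK's WordNet corpus reader exposes the same relation through `Synset.hypernyms()`, for example `wn.synset('car.n.01').hypernyms()`.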

However, WordNet lacks a very important kind of information: collocations, or preferred combinations of words, a feature of language that humans learn inductively through use and that today's standard dictionaries hardly take into account. Members of the Natural Language Processing Research Group (TALN) have created an extended version of WordNet, ColWordNet, which adds to the database millions of links between lexemes that belong to the same collocation.

Collocations are a type of phraseological unit whose elements show a certain mutual attraction, or combinatorial preference, while keeping a transparent, compositional meaning. Luis Espinosa-Anke, first author of the paper, explains that "collocations are combinations of words in which the use of one is conditioned by the presence of the other". For example, Spanish speakers say "dar un paseo" (literally "give a walk") where English speakers say "take a walk", yet "tomar un paseo" is never used in Spanish.

The authors built this aspect of language into WordNet to create ColWordNet because "we do not want a machine to say 'big rain' or 'colossal rain'; we want it to say 'heavy rain'. Although big, colossal and heavy are very close, almost synonyms, only one of them is correct in combination with 'rain'," adds Espinosa-Anke.
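The article does not say which measure, if any, ColWordNet uses to score this kind of preference. A standard way to quantify the "mutual attraction" between two words is pointwise mutual information (PMI) over co-occurrence counts. The following minimal Python sketch uses invented counts; `UNIGRAMS`, `BIGRAMS` and `pmi` are hypothetical names, and the numbers do not come from any real corpus.

```python
import math

# Invented toy counts for illustration only; they do not come from a real corpus.
N = 1_000_000  # corpus size; for simplicity used for both unigram and bigram probabilities
UNIGRAMS = {"heavy": 2_000, "big": 5_000, "rain": 1_000}
BIGRAMS = {("heavy", "rain"): 150, ("big", "rain"): 2}


def pmi(w1, w2):
    """Pointwise mutual information: log2(P(w1 w2) / (P(w1) * P(w2)))."""
    joint = BIGRAMS.get((w1, w2), 0) / N
    if joint == 0:
        return float("-inf")  # the pair was never observed together
    return math.log2(joint / ((UNIGRAMS[w1] / N) * (UNIGRAMS[w2] / N)))


for adj in ("heavy", "big"):
    print(f"{adj} rain: PMI = {pmi(adj, 'rain'):.2f}")
# heavy rain: PMI = 6.23
# big rain: PMI = -1.32
```

A positive score means the pair co-occurs more often than chance would predict, a negative one less often. PMI alone is known to overrate rare pairs, which is one reason measuring collocational strength is harder than this toy suggests.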

This lexical restriction is characteristic of collocations, and measuring it, as opposed to the cohesion of more grammaticalized phrases, is a complicated task. ColWordNet does not just read collocations from the Macmillan Collocations Dictionary; it also uses a machine learning technique to discover new collocational relationships between WordNet concepts.
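The article only says that a machine learning technique is used to find new collocational relationships. One common supervised distributional recipe of this general kind fits a linear map from the vectors of base words (e.g. rain) to the vectors of their collocates (e.g. heavy) using known pairs, then proposes, for a new base, the candidate closest to its mapped vector. This is a minimal pure-Python sketch of that recipe, not ColWordNet's actual model. The 2-D vectors, the "intense" relation and the function names `fit_linear_map` and `predict_collocate` are all hypothetical.

```python
import math

# Hypothetical toy 2-D vectors, invented for illustration; real distributional
# models learn vectors with hundreds of dimensions from large corpora.
BASES = {"rain": (1.0, 0.0), "wind": (0.0, 1.0), "storm": (1.0, 1.0), "snow": (1.0, 0.2)}
COLLOCATES = {"heavy": (2.0, 0.0), "strong": (0.5, 1.5), "violent": (2.5, 1.5), "big": (1.0, 1.0)}

# Known (base, collocate) pairs for one hypothetical "intense" relation.
TRAIN = [("rain", "heavy"), ("wind", "strong"), ("storm", "violent")]


def fit_linear_map(pairs):
    """Least-squares W minimising sum ||W x - y||^2, i.e. W = (sum y x^T)(sum x x^T)^-1."""
    a = [[0.0, 0.0], [0.0, 0.0]]  # accumulates sum of x x^T
    b = [[0.0, 0.0], [0.0, 0.0]]  # accumulates sum of y x^T
    for base, colloc in pairs:
        x, y = BASES[base], COLLOCATES[colloc]
        for i in range(2):
            for j in range(2):
                a[i][j] += x[i] * x[j]
                b[i][j] += y[i] * x[j]
    # Closed-form inverse of the 2x2 matrix a (requires det != 0).
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    a_inv = [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]
    return [[sum(b[i][k] * a_inv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def predict_collocate(w, base):
    """Map the base vector through W and return the nearest candidate collocate."""
    x = BASES[base]
    target = (w[0][0] * x[0] + w[0][1] * x[1], w[1][0] * x[0] + w[1][1] * x[1])
    return min(COLLOCATES, key=lambda c: math.dist(COLLOCATES[c], target))


W = fit_linear_map(TRAIN)
print(predict_collocate(W, "snow"))  # heavy
```

The toy training pairs were generated from an exact linear map, so the fit recovers it exactly; with real embeddings the fit is only approximate, and the normal equations need at least as many linearly independent training pairs as there are dimensions.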

The research was carried out by TALN researchers Luis Espinosa-Anke, Sara Rodríguez and Horacio Saggion; Leo Wanner, group leader and ICREA researcher at the Department of Information and Communication Technologies (DTIC); and José Camacho-Collados of Sapienza University of Rome (Italy). They will present it at the 26th International Conference on Computational Linguistics (Coling 2016), to be held from 11 to 16 December 2016 in Osaka (Japan). The work was partly funded by the European projects MULTISENSOR (FP7), KRISTINA (H2020) and HARENES, by the Spanish Maria de Maeztu Unit of Excellence programme, and by MINECO/ERDF. It is also very much in line with the DTIC-MDM Strategic Programme's promotion of reproducibility: whenever possible, results are published under Creative Commons and other open licences to encourage their use.

Reference work:

Luis Espinosa-Anke, José Camacho-Collados, Sara Rodríguez-Fernández, Horacio Saggion, Leo Wanner (2016), "Extending WordNet with Fine-Grained Collocational Information via Supervised Distributional Learning", The 26th International Conference on Computational Linguistics (Coling 2016), 11-16 December, Osaka (Japan).