Language Understanding Test Sets

Project PID2019-104512GB-I00 funded by Ministerio de Ciencia e Innovación (Spain).

The objective of LUTEST is to create test sets and an evaluation methodology that provide significant evidence about the linguistic generalization capabilities of deep learning methods applied to natural language processing. In recent years, a number of works have built test sets and evaluation methods to assess the language understanding capabilities of deep neural models and the information they select and encode. However, much work remains to be done, in particular from a linguistically motivated perspective. LUTEST bases the assessment of deep language models' generalization capabilities on a linguistically grounded hypothesis: if a model is actually generalizing, then whatever difference appears between the representations of two sentences that have the same meaning but different structure will appear in the same way across a significant number of sentence pairs exhibiting the same structure-transformation phenomenon, regardless of lexical variation.

More information: https://github.com/nuriabel/LUTEST
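
The LUTEST hypothesis can be made concrete with a small sketch. This is an illustration only, not the project's actual methodology. It assumes that each sentence is already represented as a fixed-size vector, and it uses the mean pairwise cosine similarity between difference vectors as one possible consistency measure. The function names and the toy vectors are hypothetical.

```python
import math


def difference(a, b):
    """Element-wise difference between two sentence vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def transformation_consistency(pairs):
    """Mean pairwise cosine similarity of the difference vectors.

    `pairs` is a list of (vector_a, vector_b) tuples, one per sentence
    pair showing the same structural transformation (for example,
    active vs. passive). Values near 1.0 mean the difference points in
    the same direction for every pair, whatever the lexical content.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two sentence pairs")
    diffs = [difference(a, b) for a, b in pairs]
    sims = [
        cosine(diffs[i], diffs[j])
        for i in range(len(diffs))
        for j in range(i + 1, len(diffs))
    ]
    return sum(sims) / len(sims)


# Toy vectors standing in for model representations (made up for the sketch).
toy_pairs = [
    ([1.0, 2.0, 3.0], [0.0, 2.0, 2.0]),
    ([5.0, 1.0, 0.0], [4.0, 1.0, -1.0]),
]
print(transformation_consistency(toy_pairs))
```

In practice the vectors would come from the model under evaluation, and the consistency score would be compared against a baseline of unrelated sentence pairs before drawing any conclusion.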



The CLARIN ParlaMint project is compiling comparable parliamentary corpora for a number of countries and languages. ParlaMint corpora are interoperable, i.e. encoded to a very constrained common ParlaMint schema, a specialisation of the Parla-CLARIN recommendations, which are a customisation of the TEI Guidelines.

More information: https://github.com/clarin-eric/ParlaMint 



The project Red Estratégica para la promoción de las infraestructuras de tecnologías del lenguaje en eHumanidades y Ciencias Sociales, funded by the Ministerio de Ciencia, Innovación y Universidades (RED2018-102797-E), aims to improve the dissemination of NLP and language technologies among researchers in the eHumanities and Social Sciences.

Founding partners are: Universidad de Vigo, Universidad del País Vasco, Universidad Nacional de Educación a Distancia, Universidad de Jaén, Biblioteca Miguel de Cervantes, Universidad Complutense de Madrid, Universitat Pompeu Fabra.

More information: http://ixa2.si.ehu.es/intele/



IULA-UPF CLARIN Competence Center (http://clarin-es-lab.org/). Co-funded by the FEDER Catalunya 2007-2013 program, Departament d'Economia i Coneixement, Generalitat de Catalunya. CLARIN (www.clarin.eu) is one of the Research Infrastructures selected for the European Research Infrastructures Roadmap by ESFRI, the European Strategy Forum on Research Infrastructures. It is a distributed data infrastructure with sites all over Europe. Typical sites are universities, research institutions, libraries and public archives. What they have in common is that they provide access to digital language data collections, to digital tools for working with them, and to expertise that helps researchers use them.




The CLARIN Competence Centre IULA-UPF and HDLab@UPF, Department of Humanities (Universitat Pompeu Fabra, Barcelona), UNED – LINHD: Laboratorio de innovación en Humanidades Digitales (Universidad Nacional de Educación a Distancia, Madrid) and UPV – Grupo IXA (University of the Basque Country, San Sebastián) have been jointly and officially recognized as the Spanish CLARIN K Centre.

More information: http://clarin-es.org/

All groups offer services to researchers working with Spanish texts. In addition, IXA offers expertise in handling Basque texts, and IULA-UPF-CCC in handling Catalan texts.

Services provided by the Spanish K Centre:

  • Virtual consultancy: e-mail contact, with a commitment to reply within 24 hours, offering directions and references on current practices, standards, tools and resources.
  • Support for self-learning with specialized resources: linked catalogues collecting existing knowledge, video tutorials, MOOCs, etc., with information about tools to support the actual use of the technologies.
  • Organization of teaching and training programs for researchers, students, projects or interest groups.