Tenured Associate Professor (Professora Agregada) at the Departament de Traducció i Ciències del Llenguatge of the Universitat Pompeu Fabra (UPF) and senior researcher at the Institut Universitari de Lingüística Aplicada (IULA), also at the UPF.

My area of research is Natural Language Processing (NLP). My current interests are mainly related to the automatic acquisition and exploitation of Language Resources (LR). Currently, I am leading a research group focused on projects related to Technologies of Language Resources.

I started working in NLP in 1986, when I was awarded a grant to collaborate with the group that was developing Spanish modules for METAL, a machine translation system, at Siemens-Barcelona. In 1987, I moved to the EU project EUROTRA, also in Machine Translation, hosted by the Universitat de Barcelona (UB). In 1993, I became technical director of a small group of researchers at the UB while continuing to work in NLP through different research projects under the Grup d'Investigació en Lingüística Computacional - Universitat de Barcelona. These projects were mainly related to the development of Language Resources and NLP applications (MULTEXT, PAROLE, SIMPLE, TRADE, PEKING).

In 2003, I was awarded a five-year grant by the Ramón y Cajal Program of the Spanish Ministerio de Educación y Ciencia to work at the IULA. In 2009, I was appointed Associate Professor of the Departament de Traducció i Ciències del Llenguatge of the UPF.

Current projects


LPS-BIGGER: Línea de Productos Software para BIG data a partir de aplicaciones innovadoras en Entornos Reales is a CIEN project, funded by the Ministerio de Economía y Competitividad and CDTI (IDI-20141260) and led by Indra Software Labs. HAVAS MEDIA Group, Fractalia and Playence are the other industrial participants (http://www.cienlpsbigger.es).

The main goal of the LPS-BIGGER project is the creation of a development framework for Big Data applications. One of these applications is a Social Media Monitoring system. Our group's participation is devoted to the development of a social media text analysis system for the detection and classification of customer intention and opinion in real time.


The objective of the Red de recursos para tecnologías de la lengua (ReTeLe) Network of Excellence (MINECO, TIN2015-68955-REDT) is to organize activities that support communication among Spanish Language Resources stakeholders in order to: (1) organize and coordinate a roadmap and research on methods to speed up the production of LRs; (2) narrow the gap with English resources so as to guarantee the availability of the LRs required by the most popular and innovative technologies; and (3) guarantee and maximize the availability of these resources by publishing them in the cloud as Linked Open Data (LOD) to make our community visible.


IULA-UPF CLARIN Competence Center. (http://clarin-es-lab.org/) Co-funded by the FEDER Catalunya 2007-2013 program, Departament d'Economia i Coneixement, Generalitat de Catalunya. CLARIN (www.clarin.eu) is one of the Research Infrastructures that were selected for the European Research Infrastructures Roadmap by ESFRI, the European Strategy Forum on Research Infrastructures. It is a distributed data infrastructure, with sites all over Europe. Typical sites are universities, research institutions, libraries and public archives. They all have in common that they provide access to digital language data collections, to digital tools to work with them, and to expertise for researchers to work with them.


The CLARIN Competence Centre IULA-UPF and HDLab@UPF, Department of Humanities (at Universitat Pompeu Fabra, Barcelona), UNED – LINHD: Laboratorio de innovación en Humanidades Digitales (Universidad Nacional de Educación a Distancia, Madrid) and UPV – Grupo IXA (University of the Basque Country, San Sebastián) have been jointly and officially recognized as the Spanish CLARIN K-Centre.

All groups offer services to researchers working with Spanish texts. Additionally, IXA offers expertise in handling Basque texts, and IULA-UPF-CCC in handling Catalan texts.

Services provided by the Spanish K-Centre:

• Virtual consultancy: e-mail contact, with a commitment to reply within 24 hours, offering directions and references on current practices, standards, tools and resources.

• Support for self-learning with specialized resources: linked catalogues collecting existing knowledge, video tutorials, MOOCs, etc., with information about tools to support the actual usage of technologies.

• Organization of teaching and training programs for researchers, students, projects or interest groups.

Past research projects (2004 - 2015)


Scenario Knowledge Acquisition by Text Reading: Terminology Knowledge (SKATER-UPF-IULA), 2013-2016, Ministerio de Economía y Competitividad (TIN2012-38584-C06-05). The aim of SKATeR (Scenario Knowledge Acquisition by Textual Reading) is to advance the state of the art in the integration of textual processing, semantic interpretation, inference and reasoning, detection and generalization of events, scenario induction and its exploitation in a number of advanced content-based domain applications. SKATeR is a coordinated project with Universitat Politècnica de Catalunya (UPC), Universidad del País Vasco/Euskal Herriko Unibertsitatea (UPV/EHU), Universitat de Barcelona (UB), Universitat Oberta de Catalunya (UOC) and Universidad de Vigo (UV) (http://nlp.lsi.upc.edu/skater/).


DASISH brings together all 5 ESFRI research infrastructure initiatives in the social sciences and humanities (SSH): CLARIN, DARIAH, CESSDA, ESS and SHARE. The goal of DASISH is to determine areas of cross-fertilization and synergy in the infrastructure development that all five communities are entering into as of the beginning of 2012, and to work on concrete joint activities related to data, such as data access, data sharing, data quality, and data archiving. Synergy can also be achieved by working together on solutions regarding legal and ethical aspects. Consortium partners are: Göteborgs Universitet; Max Planck Institute; Københavns Universitet; Koninklijke Nederlandse Akademie van Wetenschappen (KNAW); King’s College London; Georg-August-Universität; Österreichische Akademie der Wissenschaften; University of Essex; Tampereen yliopisto; Norsk samfunnsvitenskapelig datatjeneste; The City University; GESIS – Leibniz-Institut für Sozialwissenschaften; National University of Ireland; Universitetet i Bergen; Universität Mannheim; Università Ca’ Foscari Venezia; Stichting CentERdata; Tartu Ülikool (http://dasish.eu/).


CLARA, the Initial Training Network for Common Language Resources and their Applications, started its activities on 1 December within the context of the Marie Curie programme of the EU 7th Framework Programme (Marie Curie Initial Training Network 7FP-ITN-238405). The objective of the CLARA network is to launch the training of a new generation of experts in linguistics who can develop research methods for the construction, use and application of language resources. The scientific objectives of CLARA are to go into greater depth in the creation of linguistic models based on real data that are then analysed with statistical and machine-learning tools, and in the hybridisation of techniques and methods of analysis. CLARA will fund a total of 17 training grants in different areas related to the creation, use and application of language resources. The calls will be made public on the web page of the project (http://clara.uibo.no) and in Euraxess (http://ec.europa.eu/euraxess).


METANET4U: Enhancing the European Linguistic Infrastructure (2011-2013), funded by UNER - Competitiveness and Innovation Framework Program (CIP-PSP-270893). The central objective of the METANET4U project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and supports a new generation of exchange facilities for them. The project will be developed by a consortium of the following partners: Universitat Pompeu Fabra, University of Lisbon, Universitat Politècnica de Catalunya, University of Manchester, University Alexandru Ioan Cuza, Institutul de Cercetări pentru Inteligență Artificială and the University of Malta (http://metanet4u.eu/).


PANACEA: Platform for the Automatic, Normalized Acquisition of Language Resources for Human Language Technologies (2010-2012), funded by the Language Technologies Area, Information and Communication Technologies, of the 7th Framework Programme (7FP-ITC-248064). PANACEA will develop technologies for the automation of all stages involved in the acquisition, production, updating, validation and maintenance of Linguistic Technologies and Resources. The project, coordinated by our group, includes the participation of Cambridge University; the Istituto di Linguistica Computazionale, Italy; the Institute for Language and Speech Processing, Greece; Dublin City University, Ireland; and two companies, the German Linguatec and the French ELDA, Evaluation and Language Resources Distribution Agency (http://www.panacea-lr.eu).


Flarenet: Fostering Language Resources Network, funded by the e-contentplus program of the European Union. Flarenet is a networking organization whose aims are to devise and promote consensual recommendations concerning the future development, deployment and use of LRs. Flarenet will indicate best practices and best policies for coordinating future actions and projects. The major activities of the Network will be to survey, analyse and classify LRs and relevant standards, together with their organisational and economic models, and to discuss with major stakeholders and players new common strategies for a widespread deployment and use of LRs in real-world products (http://flarenet.eu).


Clarin: Common Language Resources and Technologies Infrastructure, in Spain co-funded by the 7FP of the EU (FP7-INFRASTRUCTURES-2007-1-212230), the Spanish Ministerio de Educación y Ciencia (CAC-2007-23) and the Ministerio de Ciencia e Innovación (ICTS-2008-11 and ACI2009-0995). CLARIN is committed to establishing an integrated and interoperable research infrastructure of language resources and their technologies. It aims at overcoming the current fragmentation, offering a stable, persistent, accessible and extendable infrastructure and thereby enabling eHumanities. Websites: clarin-es.iula.upf.edu, online lab at clarin-es-lab.org, and www.clarin.eu


Clarin-CAT, funded by the Departament d'Innovació, Universitats i Empresa of the Generalitat de Catalunya. This project is committed to the integration of the Catalan language in CLARIN through the development of a demonstrator. The demonstrator integrates Catalan resources and exploitation tools into the European infrastructure CLARIN. The online lab is clarin-cat-lab.org

Adquisición automática de información léxica (AAILE y AAILE2), funded by the Ministry of Education and Culture (HUM2004-05111-C02-01/FILO and HUM2007-61067/FILO). The goal of our research is to study the feasibility of automatically acquiring from corpora the information contained in computational lexicons. The methodology uses syntactic restrictions to bias the data and checks the lexical representation against experimental observations. Ultimately, our interest in this area is to understand the role of the syntactic and semantic constraints that operate in texts and the feasibility of acquiring related information. Finding out how Machine Learning methods can capture these constraints will allow us to improve both the applications aimed at the automatic acquisition of lexical information and the representation of the lexicon itself. AAILE web page


Linguistic Infrastructure for Interoperable Resources and Systems (LIRICS), funded by the e-content program of the European Union (EDC-22236). Duration of the project: 2004-2006. The key objective of LIRICS is to provide the European content and language industries with a common and stable set of formats, in the form of ISO standards, enabling interoperability and reuse of multilingual language resources, digital content and language engineering software.


Traducció automàtica de codi obert per al català (TACOC), funded by the Catalan government. Its goal is the development of Catalan-French, Catalan-Aranese and Catalan-English machine translation modules for Apertium, the open source shallow-transfer machine translation platform developed by the Transducens group at the University of Alicante. [finished 02-07] DEMO at http://xixona.dlsi.ua.es/apertium

Current research topics

Technologies of Language Resources

Automatic acquisition of Language Resources: PANACEA web page, AAILE web page and www.flarenet.eu

Corpus and tools: Treebanking and Dependency Parsing

Machine Translation

I teach in the UPF Master's in Theoretical and Applied Linguistics.

I am currently (co-)supervising the following PhD Students:

  • Marina Fomicheva (with Iria da Cunha): Modeling acceptable variation between candidate and reference translations in the context of machine translation evaluation
  • Jingyi Han: Phrase Table Expansion for Statistical Machine Translation with reduced parallel corpora: the Chinese-Spanish Case
  • Marco del Tredici: A distributional approach to the modeling of metaphor and polysemy
PhD Dissertations I recently supervised:

  • Silvia Necsulescu (2016) Automatic acquisition of lexical-semantic relations. Gathering information in a dense representation
  • Silvia Vázquez Suárez (2016) Pattern-Based Automatic Induction of Domain Adapted Resources for Social Media Analysis
  • Lauren Romeo (2015) The structure of the lexicon in the task of the automatic acquisition of lexical information
  • Sheila Queralt (2015) Estudio Piloto para la evaluación de evidencias lingüísticas en la comparación forense de textos mediante distribuciones poblacionales y relaciones de verosimilitudes. Co-supervised with Lawrence M. Solan.
  • Hèctor Martínez (2013) Annotation of regular polysemy. An empirical assessment of the underspecified sense. Co-supervised with Bolette Pedersen, University of Copenhagen.
  • Gabriela Resnik (2012) Los nombres eventivos no deverbales en español. Co-supervised with Àlex Alsina, Universitat Pompeu Fabra.
Publications

