PhD position open! Apply by March 18



[The deadline for this call has passed. The shortlisted candidates have been informed.]

4-year PhD position: Computational/Data Science approaches to reference to objects in visual data

Universitat Pompeu Fabra, Barcelona, Spain

Funded by ERC Starting Grant 715154, AMORE: A distributional MOdel of Reference to Entities

Application deadline: Wednesday March 18 2020



Humans can communicate in part because they share the way they refer to objects. For instance, suppose I see my neighbor's dog, a chihuahua, running in the park: do I refer to it as "the animal", or "the dog", or "the chihuahua"? Or maybe "the chihuahua that's running towards the tree", or "the small dog on the left"? For any given object, there is a large number of different referring expressions that we could choose to use; and yet there are regularities in how people choose, and interpret, such referring expressions, as otherwise we would not be able to communicate. Despite substantial work on this topic in Computational Linguistics, Linguistics, and Cognitive Science, it is still far from clear how reference works.

This project examines reference to objects in visual data (images, perhaps also video) with two methodologies: 

- Data Science for Linguistics / Cognitive Science

- Artificial Intelligence: Computational modeling with Machine Learning

As for the former, the availability of large-scale data resources as well as usable computational representations (in particular distributed representations of the sort used in deep learning) allows us to address linguistic and psycholinguistic questions related to reference using Data Science techniques. The primary questions here are: 1) what kinds of regularities and variation do we find in different referring expressions for the same object? and 2) how do object properties, on the one hand, and contextual information, on the other, affect the choice of referring expression?

As for the latter, research in Computational Linguistics and Language and Vision has made quite a bit of progress, in terms of both data and modeling, in addressing referring expression generation and interpretation; however, there is still a long way to go for models to truly mimic human behavior. The goal of this part of the thesis will be to improve computational models of tasks related to reference, incorporating insights from the analysis mentioned above. 

The emphasis can be placed more on one methodology or the other, depending on the interests and experience of the successful candidate.

Part of the work can be carried out on a dataset developed within the AMORE project:

Silberer, C., S. Zarrieß, G. Boleda. 2020. Object Naming in Language and Vision: A Survey and a New Dataset. In Proceedings of LREC 2020, to appear. (Pre-print version available here.)



The thesis will be carried out in the COLT research group. COLT is a young, dynamic, cohesive group currently consisting of 12 senior, post-doc, and PhD researchers whose interests are related to the thesis topic. Its premises are on the Communication Campus of UPF, with a lively ecosystem of researchers working on Linguistics, Computer Science, and Cognitive Science, and specifically on Computational Linguistics / Natural Language Processing.

Universitat Pompeu Fabra is a small, research-oriented, highly international institution, consistently ranked top in research among Spanish universities and placed 15th worldwide in the Times Higher Education ranking "150 under 50".

Barcelona is a unique and very livable city, with a Mediterranean and cosmopolitan culture.



The position is open to people of all nationalities. The selected candidate will enrol in the PhD program of the department; for this, a Master's (or equivalent; exact conditions under "Admission requirements" here) is required by the starting date of the job (NOT at application time). Interest in the thesis topic and an appropriate academic background are of course also required. 

The ideal candidate is someone who is genuinely interested in finding out how language works, and has a solid background in linguistics, quantitative methods (statistics, Data Science), and computational linguistics. It is going to be difficult to find someone who covers all three areas; please apply if you have a relevant background even if it doesn't include all three aspects.



The PhD position comes with an employment contract.

Salary: Approximately 19,440€ in the first year / 23,000€ in the second, third, and fourth years (gross).

Benefits: Social Security and health insurance provided to all workers by the Spanish state.



The PhD student will teach approximately two labs per academic year in B.A. courses of the department.



Applicants should submit via email to Prof. Gemma Boleda (gemma.boleda AT ...) a single PDF file with:

- CV (max. 2 pages), including the names and e-mail addresses of two academic referees;

- a cover letter (max. 2 pages) explaining why you are interested in this position and how your profile fits the project.

We aim to build a diverse team; all applications are welcome, especially those of female researchers and members of other underrepresented groups. Informal inquiries are welcome (gemma.boleda AT ...).



Application deadline: March 18 2020.

Starting date: October 1 2020 (with some flexibility; an earlier starting date would be welcome).



