29/05/2024 Seminar organised by COLT and IULATERM

Visual grounding of verbs and nominalisations in multimodal models, by Albert Gatt (Utrecht University)




Date: 29 May 2024
Time: 12:00 to 13:00
Venue: room 52.217, UPF Roc Boronat building

What distinguishes a ‘runner’ from someone who is merely running? And what properties allow us to recognise that in one photograph a man is arresting or apprehending someone, whereas in a different but very similar scene, someone is just holding another person by the arm? In this talk, I will focus on two sets of experiments on the grounding capabilities of vision-language transformer models. The common thread underlying these experiments is the interaction of linguistic and world knowledge, and the degree to which such models are able to integrate the two when recognising the correspondence between a visual scene in an image and a simple description involving a verb or a nominalisation.



