29/05/2024 Seminar organised by COLT and IULATERM
Visual grounding of verbs and nominalisations in multimodal models, by Albert Gatt (Utrecht University)
Date: 29 May 2024
Time: 12:00 to 13:00
Venue: Room 52.217, UPF-Roc Boronat Building
Abstract:
What distinguishes a ‘runner’ from someone who is merely running? And what properties allow us to recognise that in one photograph a man is arresting or apprehending someone, whereas in a different but very similar scene it’s just a case of someone holding another person by the arm? In this talk, I will focus on two sets of experiments on the grounding capabilities of vision-language transformer models. The common thread underlying these experiments is the interaction of linguistic and world knowledge, and the degree to which such models are able to integrate the two when recognising the correspondence between a visual scene in an image and a simple description involving a verb or a nominalisation.