Carina Silberer from Universität Stuttgart is visiting our group this week. She will meet with various group members, discuss her work with us, and give a talk this coming Wednesday.


When: April 12th, 2023, 12.00 to 13.00

Where: Room 52.737 or online via Zoom (please reach out to get the passcode!)


Speaker: Carina Silberer, Universität Stuttgart


Title: Language Use in Visual Procedural Tasks



Humans acquire and use language by communicating and interacting within, and with, their perceptual and physical environment. Research on large-scale visually grounded language models and multimodal learning has made substantial progress towards systems that can interact verbally with humans in this rich multimodal environment.
In this talk, I will present work on three aspects that remain challenging in multimodal natural language processing and computational linguistics.
The first part of the talk focuses on commonsense knowledge, which is essential to, but usually left implicit in, human communication, reasoning, and inference. I will present a study that examines the types of commonsense knowledge that (visual-)linguistic models do and do not capture.
Our findings lead to the second part of the talk, on how VL models can learn procedural or physical commonsense knowledge from naturalistic data (noisy transcribed videos). I will focus on the interplay between events and the causal effects of actions applied to objects during procedural tasks, and on its role in affordance learning.
The final part of the talk presents future work that seeks to develop a tutoring system that guides a user towards the completion of everyday tasks. Embedded in this framework, we aim to study the evolving dynamics of verbal and non-verbal references between the teacher and user and the physical environment/context.



