COLT seminar - Thursday 24 January, 10:30h, Herman Kamper

Multimodal learning from images and speech

  • Speaker: Herman Kamper
  • When: Thursday 24 January, 10:30h
  • Where: Tanger building, 55.410


Current speech recognition systems rely on supervised models trained on large amounts of transcribed speech. However, for many languages annotating speech is expensive and sometimes impossible, e.g. when dealing with endangered or unwritten languages. Recent work has started to explore models that can learn from images paired with untranscribed speech. Such models could be useful for understanding language acquisition in humans, for word learning in robotics, and for low-resource speech processing.

In the first part of this talk I will present work on using images paired with spoken captions to ground unlabelled speech. Without seeing any parallel speech and text, the resulting neural network model can be used as a keyword spotter: given a textual keyword, it predicts which utterances in a speech collection contain that keyword. In the second part, I will talk about our recent work on multimodal one-shot learning from images and speech.
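To make the keyword-spotting setup concrete, here is a minimal sketch of the inference step only, under assumed details: a trained network maps each utterance to per-keyword probabilities (in the actual work, training targets would come from an image tagger applied to the paired image, so no transcriptions are needed). The vocabulary, feature dimension, and random weights below are illustrative stand-ins, not the speaker's implementation.

```python
import numpy as np

# Assumed keyword vocabulary (illustrative only).
VOCAB = ["dog", "beach", "ball"]

rng = np.random.default_rng(0)
# Stand-in for trained parameters of the utterance-to-keyword network.
W = rng.normal(size=(len(VOCAB), 16))
b = np.zeros(len(VOCAB))

def keyword_probs(utterance_feats):
    """Map pooled 16-dim utterance features to per-keyword probabilities."""
    logits = W @ utterance_feats + b
    return 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per keyword

def spot(keyword, utterances):
    """Rank utterance indices by probability of containing `keyword`."""
    k = VOCAB.index(keyword)
    scores = [keyword_probs(u)[k] for u in utterances]
    return sorted(range(len(utterances)), key=lambda i: -scores[i])

# Toy "speech collection": five random feature vectors.
utts = [rng.normal(size=16) for _ in range(5)]
ranking = spot("dog", utts)
```

The key point the sketch captures is that the query is textual while the collection is untranscribed audio; the text-audio link is established indirectly through the images seen at training time.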

18.01.2019
