
24/01/19: COLT Seminar "Multimodal learning from images and speech"

20.01.2019

Title: Multimodal learning from images and speech
Speaker: Herman Kamper (Stellenbosch University, South Africa)
Date: Thursday, 24 January 2019
Time: 10:30 am
Location: Tanger building, room 55.410
 
Abstract: Current speech recognition systems rely on supervised models trained on huge amounts of labelled data. However, for many languages, annotating speech is expensive and sometimes impossible, e.g., when dealing with endangered or unwritten languages. Recent work has started to explore models that can learn from images paired with untranscribed speech. Such models could help in understanding language acquisition in humans, in word learning for robots, and in low-resource speech processing.

In the first part of this talk, I will present work on using images paired with spoken captions to ground unlabelled speech. Without seeing any parallel speech and text, the resulting neural network model can be used as a keyword spotter, predicting which utterances in a speech collection contain a given textual keyword. In the second part, I will talk about our recent work on multimodal one-shot learning from images and speech.

Related papers:
https://arxiv.org/abs/1710.01949
https://arxiv.org/abs/1811.03875
