Thesis linked to the implementation of the María de Maeztu Strategic Research Program.

Open access to PhD theses carried out at the Department can be found at TDX.

Please visit these pages for information on our PhD, MSc and BSc programs.

[PhD thesis] Incorporating Prosody into Neural Speech Processing Pipelines. Applications on automatic speech transcription and spoken language machine translation

Author: Alp Öktem

Supervisors: Mireia Farrús and Antonio Bonafonte

In this dissertation, I study the inclusion of prosody in two applications that involve speech understanding: automatic speech transcription and spoken language translation. In the former case, I propose a method that uses an attention mechanism over parallel sequences of prosodic and morphosyntactic features. Results indicate an overall F1 score of 70.3% for punctuation generation. In the latter case, I enhance spoken language translation with prosody. A neural machine translation system trained on movie-domain data is adapted with pause features using a prosodically annotated bilingual dataset. Results show that prosodic punctuation generation as a preliminary step to translation increases translation accuracy by 1% in terms of BLEU score. Encoding pauses as an additional feature gives a further 1% increase. The system is further extended to jointly predict pause features, which can then be used as input to a text-to-speech system.

Keywords: prosody, automatic speech transcription, punctuation restoration, spoken language machine translation, bilingual spoken corpus

Link at TDX: http://hdl.handle.net/10803/666222

Author's GitHub account and personal page: http://alpoktem.github.io/

Video of the defence