Dominguez M, Burga A, Farrús M, Wanner L. Towards expressive prosody generation in TTS for reading aloud applications. Proc. IberSPEECH 2018
Conversational interfaces involving text-to-speech (TTS) applications have improved in expressiveness and overall naturalness to a reasonable extent in recent decades. Conversational features, such as speech acts, affective states and information structure, have been instrumental in deriving more expressive prosodic contours. However, synthetic speech is still perceived as monotonous when a text that lacks those conversational features is read aloud in the interface, i.e., when it is fed directly to the TTS application. In this paper, we propose a methodology for pre-processing raw texts before they reach the TTS application. The aim is to analyze syntactic and information (or communicative) structure, and then use the high-level linguistic features derived from the analysis to generate more expressive prosody in the synthesized speech. The proposed methodology encompasses a pipeline of four modules: (1) a tokenizer, (2) a syntactic parser, (3) a communicative parser, and (4) an SSML prosody tag converter. The implementation has been tested in an experimental setting for German, using web-retrieved articles. Perception tests show a considerable improvement in expressiveness of the synthesized speech when prosody is enriched automatically taking into account the communicative structure.
DOI: 10.21437/IberSPEECH.2018-9
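The four-module pipeline named in the abstract can be pictured with a minimal Python sketch. This is an illustration under loud assumptions, not the authors' implementation: the syntactic parser is replaced by a lookup in a caller-supplied verb list, the communicative parser by a naive split into a pre-verbal "theme" and a "rheme", and the prosody values in `PROSODY` are made up for the example. All function names are hypothetical. Only the SSML elements used (`speak`, `prosody` with `pitch` and `rate`) come from the SSML standard.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical prosody settings per communicative span.
# The values are illustrative and NOT taken from the paper.
PROSODY = {
    "theme": {"pitch": "+10%", "rate": "95%"},
    "rheme": {"pitch": "-5%", "rate": "100%"},
}


def tokenize(text):
    """Module 1: split raw text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def parse_syntax(tokens, verbs):
    """Module 2 (stand-in): index of the first token found in `verbs`, else None.

    A real system would run a syntactic parser here; the verb list is an
    assumption made to keep the sketch self-contained.
    """
    for i, tok in enumerate(tokens):
        if tok.lower() in verbs:
            return i
    return None


def parse_communicative(tokens, verb_index):
    """Module 3 (stand-in): tokens before the verb form the theme, the rest the rheme."""
    if verb_index is None or verb_index == 0:
        return [("rheme", tokens)]
    return [("theme", tokens[:verb_index]), ("rheme", tokens[verb_index:])]


def detokenize(tokens):
    """Join tokens with spaces, then reattach punctuation to the preceding word."""
    return re.sub(r"\s+([^\w\s])", r"\1", " ".join(tokens))


def to_ssml(spans):
    """Module 4: wrap each labeled span in an SSML <prosody> element."""
    parts = []
    for label, toks in spans:
        attrs = " ".join(f'{k}="{v}"' for k, v in PROSODY[label].items())
        parts.append(f"<prosody {attrs}>{escape(detokenize(toks))}</prosody>")
    return "<speak>" + " ".join(parts) + "</speak>"


def text_to_ssml(text, verbs):
    """Run the four stand-in modules in sequence on one sentence."""
    tokens = tokenize(text)
    verb_index = parse_syntax(tokens, verbs)
    spans = parse_communicative(tokens, verb_index)
    return to_ssml(spans)


if __name__ == "__main__":
    print(text_to_ssml("Der Zug kommt heute nicht.", {"kommt"}))
```

The resulting string could then be passed to any TTS engine that accepts SSML input. Keeping the modules as separate functions mirrors the pipeline structure in the abstract, so that each stand-in could be swapped for a real parser without touching the SSML conversion step.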