Dominguez M, Burga A, Farrús M, Wanner L. Towards expressive prosody generation in TTS for reading aloud applications. Proc. IberSPEECH 2018

Conversational interfaces involving text-to-speech (TTS) applications have improved expressiveness and overall naturalness to a reasonable extent in the last decades. Conversational features, such as speech acts, affective states and information structure, have been instrumental in deriving more expressive prosodic contours. However, synthetic speech is still perceived as monotonous when a text that lacks those conversational features is read aloud in the interface, i.e., when it is fed directly to the TTS application. In this paper, we propose a methodology for pre-processing raw texts before they reach the TTS application. The aim is to analyze syntactic and information (or communicative) structure, and then use the high-level linguistic features derived from the analysis to generate more expressive prosody in the synthesized speech. The proposed methodology encompasses a pipeline of four modules: (1) a tokenizer, (2) a syntactic parser, (3) a communicative parser, and (4) an SSML prosody tag converter. The implementation has been tested in an experimental setting for German, using web-retrieved articles. Perception tests show a considerable improvement in the expressiveness of the synthesized speech when prosody is enriched automatically, taking into account the communicative structure.

DOI: 10.21437/IberSPEECH.2018-9

Code: https://github.com/TalnUPF/

Link: https://www.isca-speech.org/archive/IberSPEECH_2018/abstracts/IberS18_P1-5_Dominguez.html

DTIC MdM Strategic Program: Artificial and Natural Intelligence for ICT and beyond

