Catotron, the first free, open speech synthesis system, based on neural networks

Developed by the Col·lectivaT cooperative with the participation of Mireia Farrús, head of the Expressive Speech Laboratory of the TALN research group, and thanks to funding from the Catalan Government’s Department of Culture.

06.11.2020

In recent years, speech synthesis technologies have come a long way thanks to deep learning techniques. The biggest change has been the ability to train the speech synthesis system with neural networks.

Catotron is the first free, open speech synthesis system based on neural networks. The Col·lectivaT cooperative has developed it with the participation of members of the Natural Language Processing research group (TALN) of the UPF Department of Information and Communication Technologies (DTIC) and with the collaboration of members of the UPC.

The aim of the project was to train speech synthesis models for Catalan with neural networks and publish them under open source licences.

Today, speech coding systems are used together with speech synthesis systems, and both are trained using neural networks. Unfortunately, training these systems requires substantial resources, namely data and computing power. Hence, apart from speech systems for English, no models had been published under open licences.

The project “Speech synthesis against the digital divide” was funded by the Catalan Government’s Department of Culture. Thanks to this funding, the researchers were able to train speech synthesis models for Catalan with neural networks and publish them under open source licences.

Mireia Farrús, head of the TALN’s Expressive Speech Laboratory until August 2020, together with Baybars Külebi (Col·lectivaT), Alp Öktem, who holds a PhD from UPF (Col·lectivaT), Alex Peiró-Lilja (UPF) and Santiago Pascual (UPC), presented the system at the Interspeech 2020 international conference, held online from Shanghai (China) from 25 to 29 October.

Coding technologies modified for Catalan

The developers of Catotron built on the Tacotron2 and WaveGlow repositories, which NVIDIA has published under open licences on GitHub. “One of the most important results achieved in this project is the code: our Tacotron2 fork, which is modified for Catalan and is essential for using the Catalan models”, the authors of the study explain.

“In addition, we have developed a second repository, catotron-cpu, which can be run on the most common processors, CPUs. This version of Catotron is a lighter, more efficient alternative to existing ones”, they add.

Training of models and project utility for users

To train the Catalan models, the researchers took advantage of already published open data. The resulting voices, Ona and Pau, were trained with data from Festcat, a Catalan Government project conducted by researchers from the UPC.

Furthermore, “during our tests, we also conducted experiments with the ParlamentParla data set and produced a voice model of Artur Mas, who was the person with the most hours recorded in this data set. We took advantage of this test to estimate the volume and quality of data needed to train a model”, the Col·lectivaT developers leading the project explain.

With the tools published on the project website, i.e., the code and models, it is now possible to adapt the voice to a new speaker by transfer learning, starting from the published models and recordings of that speaker. “Our example, catotron-transfer-learning.ipynb, explains the steps required”. A public demo is available at http://catotron.collectivat.cat/: if you enter written text, the system returns it as speech.

Related work:

Baybars Külebi, Alp Öktem, Alex Peiró-Lilja, Santiago Pascual and Mireia Farrús (2020), "Catotron: A neural text-to-speech System in Catalan", Interspeech 2020, 25 to 29 October, held virtually from Shanghai (China) https://cloud.laklak.eu/s/PTJNAK8ZcX5ZFZX

News published by:

Communication Office

Engineering School. Department of Information and Communications Technologies
