News

Researchers design a singing synthesizer based on state-of-the-art neural networks

The proposed system can model a singing voice from just a few recordings and incorporates an algorithm that generates synthetic singing quickly, making it highly competitive in terms of sound quality and efficiency.

09.06.2017

To date, the best singing synthesizers have been based either on samples or on statistical models. Sample-based systems stitch together small fragments of recordings, like assembling a huge jigsaw puzzle, but they struggle to produce fluid singing free of discontinuities. Statistical models, on the other hand, rely on a careful statistical analysis of the acoustic characteristics of the recordings; they can generate fluid singing without discontinuities, but struggle to reproduce details and nuances.

Jordi Bonada and Merlijn Blaauw, researchers with the Music Information Research Lab (MIRLab) linked to the Music Technology Group (MTG) at UPF, have developed an innovative system that uses state-of-the-art neural networks specialized in acoustic signals. The new model combines the strengths of the two traditional approaches to singing synthesis: it generates fluid singing that is rich in detail and nuance, yet free of discontinuities.

In addition, unlike most neural network approaches, which require many hours of recordings to build a voice model, the proposed system can model a singing voice from just a few recordings: 15 minutes in Spanish and 35 minutes in English. It also includes an algorithm that generates synthetic singing about 20 times faster than real time, which makes it clearly competitive in terms of both sound quality and efficiency.
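To give a sense of what the reported 20× real-time factor implies, a minimal sketch (the factor is the figure from the article; the helper function itself is hypothetical, not the authors' code):

```python
def synthesis_time(audio_seconds: float, real_time_factor: float = 20.0) -> float:
    """Estimate how long synthesis takes for a clip of a given duration.

    A real-time factor of 20 means the system renders audio 20 times
    faster than the clip's playback duration.
    """
    return audio_seconds / real_time_factor

# A 3-minute song (180 s) would take about 9 seconds to synthesize.
print(synthesis_time(180.0))  # -> 9.0
```

At this rate, even full-length songs can be rendered in seconds, which is what makes the system practical for interactive use.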

The new system was evaluated and validated in a listening experiment with 18 participants, who clearly preferred it over existing systems based on samples and on statistical parametric synthesis.

The two researchers, Bonada and Blaauw, are to present the synthesizer at the forthcoming Interspeech 2017 international conference, to be held from 20 to 24 August in Stockholm (Sweden). The meeting aims to provide a comprehensive approach to speech-related communication problems.

Demonstrations of how it works

Reference work: Bonada and Blaauw, "A neural parametric singing synthesizer", arXiv.org
