A study on methods to align lyrics and audio in the singing of the Beijing opera awarded


Conducted by members of the Music Technology Group, the study was named best article at the 6th edition of the Folk Music Analysis (FMA) International Workshop, held from 15 to 17 June in Dublin (Ireland).

18.07.2016

In a study of Beijing opera, recognized as Intangible Cultural Heritage of Humanity since 2010, researchers Georgi Dzhambazov, Yile Yang, Rafael Caro Repetto and Xavier Serra, members of the Music Technology Group (MTG) of the Department of Information and Communication Technologies (DTIC) at UPF, have proposed a new method for aligning lyrics with audio in a cappella performances, that is, singing without instrumental accompaniment, which makes the phonetics of singing easier to study. The aim of the research was to design an algorithm that achieves better results than existing ones. The paper won the prize for best article at the 6th edition of the Folk Music Analysis (FMA 2016) International Workshop, held from 15 to 17 June in Dublin (Ireland).

Some features of the Beijing opera

Jingjù 京剧, also known as Beijing opera, is performed in standard Mandarin with some dialect, and its lyrics follow the principles of poetry. They are structured in two-verse couplets, each divided into three dous, which are in turn composed of two to four written characters. In general, an aria begins with a slower part that gradually accelerates to express more intense moods. To emphasize the semantics of a sentence, or in accordance with the plot, the actor has the option of holding the vowel of the final syllable of a dou.

As explained by Georgi Dzhambazov, first author of the paper, “normally, techniques for aligning lyrics and audio in singing are based on techniques developed for the spoken voice. But there are a couple of problems: the pronunciation of singing is not quite that of the spoken voice, and the length of the vowels, which are longer when sung, affects these techniques”.

To achieve its goal of designing an algorithm with better results, the study uses a statistical model that takes changes in duration into account: the so-called explicit-duration hidden Markov model (DHMM).

This model has been used with phonetic representations extracted directly from samples of singing, rather than from samples of speech, as is usual. That is, the DHMM has been applied to a cappella recordings of jingjù, which by their nature have no instrumental accompaniment that could affect the phonetic analysis.
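As a rough illustration of how an explicit-duration model can drive an alignment, the Python sketch below finds the best split of audio frames among an ordered sequence of phonetic units, scoring each candidate segment by its frame log-likelihoods plus a log-probability for its length. It is not the authors' implementation: the function name `align_with_durations`, the array layout and the toy numbers are assumptions made for this example only.

```python
import numpy as np

def align_with_durations(log_obs, log_dur):
    """Explicit-duration Viterbi alignment for a fixed left-to-right unit sequence.

    log_obs: (T, N) array, log-likelihood of frame t under unit j (assumed given).
    log_dur: (N, D) array, log_dur[j, d-1] = log P(unit j lasts d frames).
    Returns one (start, end) frame pair per unit, end exclusive.
    """
    T, N = log_obs.shape
    D = log_dur.shape[1]
    # cum[t, j] = sum of log_obs[:t, j], so any segment score is a difference.
    cum = np.vstack([np.zeros((1, N)), np.cumsum(log_obs, axis=0)])
    best = np.full((N + 1, T + 1), -np.inf)  # best[j, t]: first j units cover t frames
    back = np.zeros((N + 1, T + 1), dtype=int)  # duration chosen for unit j ending at t
    best[0, 0] = 0.0
    for j in range(1, N + 1):
        for t in range(1, T + 1):
            for d in range(1, min(D, t) + 1):
                prev = best[j - 1, t - d]
                if prev == -np.inf:
                    continue
                seg = cum[t, j - 1] - cum[t - d, j - 1]
                score = prev + log_dur[j - 1, d - 1] + seg
                if score > best[j, t]:
                    best[j, t] = score
                    back[j, t] = d
    if best[N, T] == -np.inf:
        raise ValueError("no alignment fits the given maximum durations")
    # Trace back from the last unit to recover the segment boundaries.
    segments, t = [], T
    for j in range(N, 0, -1):
        d = int(back[j, t])
        segments.append((t - d, t))
        t -= d
    return segments[::-1]

# Toy case: the acoustics are uninformative, so the duration model alone
# decides that the first unit is held for 5 frames and the second for 1.
flat_obs = np.zeros((6, 2))
peaked_dur = np.full((2, 6), -10.0)
peaked_dur[0, 4] = 0.0  # unit 0 prefers 5 frames
peaked_dur[1, 0] = 0.0  # unit 1 prefers 1 frame
held = align_with_durations(flat_obs, peaked_dur)
```

In the toy case the acoustic scores are flat, so the boundary is placed purely by the duration model; this is the kind of behavior that helps when a sung vowel is held much longer than it would be in speech.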

Moreover, to predict the duration of the syllables, a set of rules derived from theoretical knowledge of jingjù was added to the model: the last syllable of a dou is usually longer, and even more so if it is the end syllable of a banshi, “the rhythmic patterns into which the arias are structured and usually go from slower to faster”, explains Rafael Caro Repetto, co-author and expert in this Chinese genre. He also highlighted that “precisely this last aspect is one of the fundamental principles of the CompMusic project: to include specific cultural information of the genre of music being studied in the development of computational tools”.
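The duration rules described above can be pictured with a small Python sketch that assigns each syllable an expected duration, lengthening the last syllable of every dou and lengthening it further when that dou closes a banshi. The lengthening factors (2.0 and 3.0), the function name `expected_durations` and the placeholder syllables are hypothetical values chosen for illustration; the article does not state the actual rules or numbers used in the paper.

```python
def expected_durations(dous, total_seconds, banshi_final_dous=(),
                       dou_final_factor=2.0, banshi_final_factor=3.0):
    """Spread total_seconds over syllables using simple lengthening rules.

    dous: list of dous, each a list of syllables.
    banshi_final_dous: indices of dous whose last syllable ends a banshi.
    Both factors are assumed values, not taken from the study.
    """
    weights = []
    for i, dou in enumerate(dous):
        for k in range(len(dou)):
            weight = 1.0
            if k == len(dou) - 1:  # last syllable of the dou is held longer
                if i in banshi_final_dous:
                    weight = banshi_final_factor
                else:
                    weight = dou_final_factor
            weights.append(weight)
    total = sum(weights)
    # Normalize so that the expected durations add up to total_seconds.
    return [total_seconds * w / total for w in weights]

# Placeholder syllables: two dous, the second one closing a banshi.
example_dous = [["s1", "s2"], ["s3", "s4", "s5"]]
durations = expected_durations(example_dous, total_seconds=8.0,
                               banshi_final_dous={1})
```

Expected durations of this kind could then be turned into the duration distributions of a duration-aware model, so that musical knowledge informs where long syllables are likely to occur.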

CompMusic is a European project that aims to develop automated analysis systems for musical traditions other than Western ones, taking their respective cultural specificities into account. Traditional Chinese music is one of the musical traditions studied by the MTG team led by Xavier Serra, with the support of the European Research Council. The work is continuing thanks to a Proof of Concept grant obtained in 2015 with the aim of commercializing some of the technologies developed within the framework of the CompMusic project.

Reference work:

Georgi Dzhambazov, Yile Yang, Rafael Caro Repetto, Xavier Serra (2016), “Automatic Alignment of Long Syllables in a Cappella Beijing Opera”, Folk Music Analysis: FMA 16: 6th International Workshop, 15-17 June 2016.

