Jyoti Narang defends her PhD thesis
Title: Analyzing Singing Voice Expressivity: Focus on Singing Voice Musical Dynamics
Supervisor: Dr. Xavier Serra
Jury: Dr. Perfecto Herrera (UPF), Dr. Jose Javier Valero Mas (University of Alicante), Dr. Elaine Chew (King's College London)
Abstract:
Musical dynamics, a key expressive dimension of the singing voice, plays a vital role in shaping phrasing and emotional impact. Despite its importance, its study and formalization remain limited. This work addresses that gap by proposing methodologies for analyzing and modeling dynamics from both audio performances and scores. Specifically, we carry out three tasks: (1) conducting a comparative analysis of musical dynamics across different audio performances; (2) studying musical dynamics from scores by analyzing curated real-world audio performances paired with scores featuring rich dynamics labels; and (3) analyzing listener agreement on perceived dynamics to investigate the subjectivity of annotations.
To support these tasks, we curated diverse datasets, including a synthetic dataset for choral singing, score-performance datasets from both the performer and listener perspectives, and karaoke datasets for imitation-based dynamics analysis. Our findings reveal that while synthetic data enable controlled comparisons, real-world performances exhibit musical dynamics absent in synthetic renditions. Starting from Romantic-era Lieder scores, we semi-automatically curated score-performance pairs with state-of-the-art source separation and alignment techniques to train a dynamics prediction model. Collaborating with expert musicians, we annotated scores with synchronized dynamics labels and examined inter-annotator agreement using methods from computational linguistics. Additionally, we developed a system to identify vocal dynamics automatically, employing structural segmentation and machine learning models trained on the Western classical Lieder corpus. Finally, we conducted a preliminary study on Hindustani music, applying our methodology to explore dynamics variations in relation to tradition-specific characteristics.
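As an illustration of the inter-annotator agreement analysis mentioned above, the following minimal sketch computes Cohen's kappa for two annotators. The abstract does not say which agreement measure the thesis uses, so the choice of Cohen's kappa, the function name, and the example labels are all assumptions made for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa).

    Assumption: each annotator assigns exactly one categorical dynamics
    label to the same sequence of score segments.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)
    # Observed agreement: fraction of segments labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if both annotators labelled independently,
    # each following their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of six segments by two listeners.
annotator_1 = ["p", "p", "mf", "f", "f", "mf"]
annotator_2 = ["p", "mf", "mf", "f", "f", "f"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance; for more than two annotators, a measure such as Fleiss' kappa or Krippendorff's alpha would be needed instead.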
Although the perception of musical dynamics is inherently subjective, our findings indicate that a degree of formalization can be achieved by categorizing dynamics into broad perceptual bands such as soft, medium, and loud, providing a structured framework for analysis. These results contribute to a more systematic understanding of musical dynamics, paving the way for improved analysis, modeling, and applications in both research and performance contexts.
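The idea of collapsing dynamics into broad perceptual bands can be sketched as follows. The specific mapping from score markings to the soft, medium, and loud bands is an assumption based on common notation practice, not the mapping defined in the thesis, and the names `MARKING_TO_BAND` and `collapse_to_bands` are hypothetical.

```python
# Assumed mapping from standard score markings to three broad bands.
# The thesis may draw the band boundaries differently.
MARKING_TO_BAND = {
    "pp": "soft",
    "p": "soft",
    "mp": "medium",
    "mf": "medium",
    "f": "loud",
    "ff": "loud",
}


def collapse_to_bands(markings):
    """Map a sequence of dynamics markings to soft/medium/loud bands."""
    bands = []
    for marking in markings:
        if marking not in MARKING_TO_BAND:
            raise ValueError(f"unknown dynamics marking: {marking!r}")
        bands.append(MARKING_TO_BAND[marking])
    return bands


# Hypothetical annotation of four score segments.
bands = collapse_to_bands(["pp", "mf", "ff", "p"])
```

Comparing annotations at the band level rather than at the level of individual markings treats, for example, "p" and "pp" as the same judgment, which is one way coarse categories can absorb fine-grained disagreement between listeners.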