Pedro Ramoneda defends his PhD thesis

Wednesday, April 29th 2026 at 15:00h (CEST) - Room 55.309 (3rd floor) Tanger building (UPF Poblenou) and online
22.04.2026

Title: A Machine Learning Approach to Modelling Piano Performance Difficulty Across Modalities

Supervisor: Dr. Xavier Serra Casals

Jury: Dr. Rafael Ramírez (UPF), Dr. Jorge Calvo (Universitat d'Alacant), Dr. Cynthia Liem (Delft University of Technology)

Abstract:

Assessing the difficulty of performing musical works is a fundamental task in music education. Reliable estimation of difficulty supports teachers and institutions in designing curricula and exploring large music collections. Automating this process can strengthen pedagogical practice by providing computational tools for curriculum design and by empowering students to explore repertoire suited to their abilities, fostering autonomy and engagement in learning new pieces. In essence, this thesis addresses the pedagogical and computational dimensions behind a long-standing question in music education: Can I Play It?

This dissertation approaches these challenges from a machine learning perspective, combining data resources, model development, and evaluation methodologies to study piano performance difficulty systematically. Because high-quality annotated data is a prerequisite for any machine learning task, we curate six datasets covering symbolic, visual, and audio modalities. These include CIPI (652 works), the PSyllabus dataset (7,901 recordings), the PDF Difficulty datasets (about 7,500 annotated scores), and PianoPairs (around 1.1 million easy-hard aligned score pairs). Together they constitute the largest multimodal resource available for the computational study of piano performance difficulty.

Building on these data resources, we developed models that predict piano performance difficulty directly from symbolic, visual, and audio data. In the symbolic domain, we investigated three complementary strategies for explainability and interpretability: separating the technical, notational, and expressive components of difficulty; analyzing architectural behavior through attention-based interpretability; and designing interpretable models inspired by educational rubrics. These approaches are grounded in hypotheses from education, cognition, and piano performance, and draw on recent advances in NLP, treating music as structured sequences comparable to language. On the CIPI dataset, our models achieve accuracies between 39.5% and 41.4%, with MSE ranging from 1.1 to 1.7 on a 9-level scale.
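For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and mean squared error (MSE) can be computed when difficulty is predicted as an integer level. It assumes levels are encoded as integers from 1 to 9 and that MSE is taken directly over those level indices; the abstract does not state the exact evaluation protocol, and the labels below are made-up illustrative values, not thesis data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of pieces whose predicted level equals the annotated level."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average squared distance between predicted and annotated levels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical annotations and predictions on a 9-level scale
# (1 = easiest, 9 = hardest). Illustrative only.
y_true = [1, 3, 5, 7, 9]
y_pred = [1, 4, 5, 5, 9]

acc = accuracy(y_true, y_pred)            # 3 of 5 exact matches
mse = mean_squared_error(y_true, y_pred)  # (0 + 1 + 0 + 4 + 0) / 5

print(f"accuracy = {acc:.1%}, MSE = {mse:.2f}")
```

The two metrics are complementary on an ordinal scale: accuracy counts only exact matches, while MSE also reflects how far off a wrong prediction is, so a model that misses by one level is penalized less than one that misses by several.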

In the visual and audio domains, the datasets are substantially larger, enabling the use of deep learning architectures adapted to sheet music images and performance recordings. Each modality is studied independently and formalized as a multitask learning problem, handling several difficulty rankings at once. For sheet music images evaluated on the CIPI dataset, our best model achieves an accuracy of 40.3% with an MSE of 1.3, while audio-based models reach an accuracy of 37.3% with an MSE of 1.8 on an 11-level difficulty scale. Both are further tested on out-of-distribution subsets, including a collection of works by historically underrepresented composers, to assess model robustness and bias. These experiments indicate that visual and acoustic cues also provide a reliable basis for automatic difficulty estimation.

Beyond difficulty prediction, this thesis also includes research on performance-centered systems that generate and adapt piano scores according to target difficulty levels, integrating difficulty estimation into the generation process. The resulting music is readable in MusicXML format, serving two main educational purposes: generating sight-reading exercises tailored to specific skill levels and simplifying existing works while preserving their musical style and structure. This integration extends the question Can I Play It? from analysis to generation, placing the performer at the center of the generative process.

All models, datasets, and code are made publicly available to support future research and integration into educational and performance learning systems.

 

Video: https://youtu.be/MoSKolnBWtc