David Cabrera Dalmazzo defends his PhD thesis

Wednesday, December 2nd 2020 at 11h - online


Title: Machine Learning and Deep Neural Networks Approach to Modelling Musical Gestures

Supervisor: Dr. Rafael Ramírez

Jury: Dr. Gualtiero Volpe (Università di Genova), Dr. Sergio Giraldo (Universitat Pompeu Fabra), Dr. Atau Tanaka (Goldsmiths, University of London)


Gestures can be defined as a form of non-verbal communication associated with an intention or the articulation of an emotional state. They are not only an intrinsic part of human language but also reveal specific details of how embodied knowledge is executed. Gestures are studied not only in language research but also in dance, sports, rehabilitation, and music, where the term is understood as a “learned technique of the body”. In music education, therefore, gestures are assumed to be automatic motor abilities learned through repetitive practice, which lets performers self-teach and fine-tune their motor actions. Those gestures are intended to become part of the performer’s technical repertoire for taking fast actions and decisions on the fly; they are relevant not only to musical expressiveness but also to developing efficient ‘energy-consumption’ habits that help avoid injuries.

In this thesis, we applied state-of-the-art machine learning (ML) techniques to model the violin bowing gestures of professional players. Concretely, we recorded a database of violin performances by experts and students of different levels, implemented a multi-modal recording system to automatically synchronise audio, video and Inertial Measurement Unit (IMU) sensor data, and developed a custom application to visualise the output of the ML models. We explored three approaches to classify and identify violin gestures in real time: a) We implemented a Hidden Markov Model to detect fingering and bow-stroke gestures from electromyogram and motion data. b) We extracted general time features from gesture samples, creating a dataset of audio and motion data from expert performers, and trained and compared different Recurrent Neural Network models. c) We implemented Mel-spectrogram-based Recurrent Neural Network models for classifying bowing gestures from audio data alone, which allows bow strokes to be recognised without motion capture sensors. These three approaches are complementary and were incorporated into a real-time feedback system to enhance violin learning and practice.


This thesis defense will take place online. To attend, use this link. Microphones and cameras must be turned off, and online access will close 30 minutes after the start of the defense.




