List of published results directly linked to the projects co-funded by the Spanish Ministry of Economy and Competitiveness under the María de Maeztu Units of Excellence Program (MDM-2015-0502).

A list of publications acknowledging the funding is available in Scopus.

The record for each publication will include access to postprints (following the Open Access policy of the program), as well as to the datasets and software used. Ongoing work with the UPF Library and Informatics services will soon improve the interface and further automate the retrieval of this information.

The MdM Strategic Research Program has its own community on Zenodo for material available in that repository, as well as at the UPF e-repository.

[PhD thesis] Machine learning and deep neural networks approach to modelling musical gestures

Author: David Cabrera Dalmazzo

Supervisor: Rafael Ramírez

Gestures can be defined as a form of non-verbal communication associated with an intention or the articulation of an emotional state. They are not only an intrinsic part of human language, but also convey specific details of how a bodily skill is executed. Gestures are studied not only in language research but also in dance, sports, rehabilitation, and music, where the term is understood as a “learned technique of the body”. In music education, therefore, gestures are regarded as automatic motor abilities learned through repeated practice, by which performers self-teach and fine-tune their motor actions. These gestures become part of the performer’s technical repertoire for taking fast actions and decisions on the fly: they are relevant not only to expressive capability in music, but also as a way of developing efficient ‘energy-consumption’ habits that help avoid injuries.

In this thesis, we applied state-of-the-art machine learning (ML) techniques to model violin bowing gestures in professional players. Concretely, we recorded a database of expert performers and students at different levels, and developed three strategies to classify and recognise those gestures in real time:

a) A multimodal synchronisation system that records audio, video and IMU sensor data with a unified time reference, together with a custom C++ application to visualise the output of the ML models. On top of this system, we implemented a Hidden Markov Model to detect fingering disposition and bow-stroke gesture performance (a sketch follows below).

b) A system that extracts general time features from the gesture samples, creating a dataset of audio and motion data from expert performers, modelled with a deep neural network. To do so, we implemented a hybrid CNN-LSTM architecture (a sketch follows below).

c) A Mel-spectrogram-based analysis that reads and extracts patterns from audio data alone, opening the option of recognising the relevant information from audio recordings, without the need for external sensors, with similar results (a sketch follows below).

All of these techniques are complementary, and they are incorporated into an educational application that acts as a computer assistant, enhancing music learners’ practice by providing useful real-time feedback. The application will be tested in a professional music education institution.
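To illustrate strategy (a), below is a minimal sketch of bow-stroke classification with Gaussian HMMs, written with the hmmlearn library. The thesis does not detail the exact topology or feature set, so the class name, the number of hidden states, and the one-HMM-per-gesture design are illustrative assumptions: each gesture class is modelled by its own HMM, and an unseen feature sequence is assigned to the class whose model scores it with the highest log-likelihood.

```python
import numpy as np
from hmmlearn import hmm


class GestureHMMClassifier:
    """Hypothetical per-class HMM classifier for bow-stroke gestures."""

    def __init__(self, n_states=4):
        self.n_states = n_states
        self.models = {}  # one GaussianHMM per gesture label

    def fit(self, sequences_by_class):
        # sequences_by_class: {label: [array of shape (T_i, n_features), ...]}
        for label, seqs in sequences_by_class.items():
            X = np.concatenate(seqs)          # hmmlearn expects stacked sequences
            lengths = [len(s) for s in seqs]  # with their boundaries given here
            model = hmm.GaussianHMM(n_components=self.n_states,
                                    covariance_type="diag", n_iter=100)
            model.fit(X, lengths)
            self.models[label] = model
        return self

    def predict(self, seq):
        # Assign the class whose HMM gives the sequence the highest log-likelihood.
        scores = {label: m.score(seq) for label, m in self.models.items()}
        return max(scores, key=scores.get)
```

A per-class HMM is a common choice for gesture recognition because each model captures the temporal structure of one stroke type, and classification reduces to a likelihood comparison.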
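For strategy (b), here is a minimal Keras sketch of a hybrid CNN-LSTM sequence classifier: 1-D convolutions pick up local patterns within windows of synchronised audio/motion features, and an LSTM summarises how those patterns evolve over time. Layer sizes, window shape and the number of gesture classes are illustrative assumptions, not the hyperparameters used in the thesis.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_cnn_lstm(n_timesteps, n_features, n_classes):
    """Hypothetical CNN-LSTM for gesture classification from feature windows."""
    model = models.Sequential([
        layers.Input(shape=(n_timesteps, n_features)),
        # Convolutional front end: local patterns within the window
        layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        # Recurrent back end: temporal evolution of the extracted patterns
        layers.LSTM(64),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Example: 100-frame windows of 9 sensor/audio features, 7 gesture classes.
model = build_cnn_lstm(n_timesteps=100, n_features=9, n_classes=7)
```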
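For strategy (c), a minimal sketch of Mel-spectrogram feature extraction with librosa; the sample rate, number of Mel bands and hop length are common defaults rather than the settings used in the thesis. The resulting (frames × bands) matrix can be fed to a sequence model such as the CNN-LSTM above, so the same recognition pipeline can run from audio alone.

```python
import numpy as np
import librosa


def mel_features(path, sr=22050, n_mels=128, hop_length=512):
    """Load an audio file and return a log-scaled Mel spectrogram
    of shape (frames, n_mels)."""
    y, _ = librosa.load(path, sr=sr)
    S = librosa.feature.melspectrogram(y=y, sr=sr,
                                       n_mels=n_mels, hop_length=hop_length)
    # Convert power to decibels, referenced to the loudest frame
    return librosa.power_to_db(S, ref=np.max).T
```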

Link to manuscript: http://hdl.handle.net/10803/670399