Pons J, Slizovskaia O, Gong R, Gómez E, Serra X. Timbre Analysis of Music Audio Signals with Convolutional Neural Networks. 25th European Signal Processing Conference (EUSIPCO)



This work studies how to efficiently tailor Convolutional Neural Networks (CNNs) to learn timbre representations from log-mel magnitude spectrograms. We first review current trends in CNN architecture design. Through this literature overview we discuss the crucial points to consider for efficiently learning timbre representations with CNNs. From this discussion we propose a design strategy meant to capture the time-frequency contexts relevant for learning timbre, which permits using domain knowledge when designing architectures. In addition, one of our main goals is to design efficient CNN architectures, which reduces the risk of over-fitting because the number of parameters is minimized. Several architectures based on the proposed design principles are successfully assessed on different research tasks related to timbre: singing voice phoneme classification, musical instrument recognition and music auto-tagging.
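As a rough illustration of the trade-off the abstract describes, the sketch below shows how the shape of a first-layer filter sets both the time-frequency context it covers and the number of parameters it costs. Everything specific here is an assumption made for illustration and is not taken from the paper: the 96-band by 128-frame input, the three filter shapes, the 32 filters per layer, and the helper names `conv_layer_params` and `valid_output_shape`.

```python
# Minimal sketch (illustrative values only, not the paper's architectures):
# compare first-layer filter shapes applied to a log-mel spectrogram.

def conv_layer_params(n_filters, freq_bins, time_frames, in_channels=1):
    """Trainable parameters of one conv layer: weights plus one bias per filter."""
    return n_filters * (freq_bins * time_frames * in_channels + 1)

def valid_output_shape(input_shape, filter_shape):
    """Feature-map size for stride 1 and no padding ('valid' convolution)."""
    n_mels, n_frames = input_shape
    f, t = filter_shape
    return (n_mels - f + 1, n_frames - t + 1)

# Hypothetical input: 96 mel bands x 128 time frames.
INPUT_SHAPE = (96, 128)

# Hypothetical filter shapes as (frequency bins, time frames).
FILTER_SHAPES = {
    "small square": (3, 3),     # local context in both axes
    "frequency-wide": (50, 1),  # spans many mel bands within a single frame
    "time-wide": (1, 60),       # spans many frames within a single mel band
}

for name, (f, t) in FILTER_SHAPES.items():
    params = conv_layer_params(32, f, t)
    out = valid_output_shape(INPUT_SHAPE, (f, t))
    print(f"{name:15s} filter={f}x{t} params={params} output={out}")
```

A filter that spans many mel bands sees a spectral envelope in a single layer, at the cost of more weights per filter than a small square one, so choosing shapes by domain knowledge is one way to keep the parameter count low.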

Additional material:

Postprint in arXiv
Code. The code to reproduce each of the experiments is available online:
- Phoneme classification of Jingju singing: github.com/ronggong/EUSIPCO2017
- Musical instrument recognition: github.com/Veleslavia/EUSIPCO2017
- Music auto-tagging: github.com/jordipons/EUSIPCO2017
Datasets. This work was possible because several benchmarks/datasets are available for research purposes:
- Jingju a cappella singing dataset: github.com/MTG/jingjuPhonemeAnnotation
- IRMAS, a dataset for instrument recognition in musical audio signals: mtg.upf.edu/download/datasets/irmas
- MagnaTagATune dataset: mirg.city.ac.uk/codeapps/the-magnatagatune-dataset and github.com/keunwoochoi/magnatagatune-list
Presentation slides in Zenodo

Link: http://mtg.upf.edu/node/3802

DTIC MdM Strategic Program: Artificial and Natural Intelligence for ICT and beyond
