Pons J, Nieto O, Prockup M, Schmidt EM, Ehmann AF, Serra X. End-to-end learning for music audio tagging at scale. The 18th International Society for Music Information Retrieval Conference (ISMIR17)

The lack of data tends to limit the outcomes of deep learning research, especially when dealing with end-to-end learning stacks that process raw data such as waveforms. In this study we make use of musical labels annotated for 1.2 million tracks. This large amount of data allows us to explore different front-end paradigms without restriction: from assumption-free models, which use waveforms as input with very small convolutional filters, to models that rely on domain knowledge, namely log-mel spectrograms with a convolutional neural network designed to learn temporal and timbral features. Results suggest that, while spectrogram-based models surpass their waveform-based counterparts, the difference in performance shrinks as more data is employed.
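
To make the two front-end paradigms concrete, here is a minimal NumPy sketch of the inputs each one works from. It is an illustration only, not the authors' implementation: the function names, the 16 kHz sample rate, the 512-sample window, the 256-sample hop, the 96 mel bands, the `log1p` compression, and the width-3, stride-3 filters are all assumptions chosen for the example and are not stated in the abstract. The filters in the waveform path are random here, whereas a trained model would learn them.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters on the mel scale, shape (n_mels, n_fft // 2 + 1).
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr=16000, n_fft=512, hop=256, n_mels=96):
    # Domain-knowledge front-end: framing, windowing, STFT power,
    # mel projection, then log compression. All parameters are assumed.
    n_frames = 1 + (len(wave) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([wave[t * hop : t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T     # (n_frames, n_mels)
    return np.log1p(mel)

def small_filter_conv1d(wave, kernels, stride=3):
    # Assumption-free front-end: very small filters applied directly to
    # the raw waveform (strided valid correlation followed by ReLU).
    width = kernels.shape[1]
    n_out = 1 + (len(wave) - width) // stride
    windows = np.stack([wave[t * stride : t * stride + width]
                        for t in range(n_out)])
    return np.maximum(0.0, windows @ kernels.T)           # (n_out, n_filters)

rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)          # one second of noise at 16 kHz
kernels = rng.standard_normal((4, 3))      # four random filters of width 3

spec = log_mel_spectrogram(wave)           # time-frequency input
feats = small_filter_conv1d(wave, kernels) # first-layer waveform features
```

The contrast is in how much is fixed by hand: the spectrogram path hard-codes the window, the frequency warping, and the compression, while the waveform path leaves all of that to be learned by stacking many such small-filter layers.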

Additional material:

Software (GitHub)
Preprint at arXiv https://arxiv.org/abs/1711.02520

Link: https://ismir2017.smcnus.org/lbds/Pons2017.pdf

DTIC MdM Strategic Program: Artificial and Natural Intelligence for ICT and beyond
