Machine learning approaches for structuring large sound and music collections
One fundamental goal in the field of Music Information Research is to automatically structure sound and music collections in order to facilitate access to and retrieval of their audio content. The current state of the art has not yet succeeded in automatically labeling large collections of audio recordings with the most common semantic labels used in practical applications. For music collections we are interested in labeling semantic concepts such as the genre, lead instrument, musical key, or mood of each musical piece. This type of task has only been partly successful for small audio collections with a small number of classes per concept. Two known obstacles to progress in this line of research are the lack of sufficiently large audio collections with appropriate labels for training, and the lack of adequate and robust audio features relevant to the classification tasks to be performed. In this project we tackle both problems.
To address the availability of training and testing datasets, we take advantage of two initiatives developed at the MTG. One is freesound.org, an open and collaborative site of sound samples (currently with more than 300k samples) which offers the possibility of creating datasets for many sound classification tasks. The other is AcousticBrainz.org, an open initiative, run in collaboration with MusicBrainz, that hosts a large collection of music analysis data (currently covering more than 4 million analyzed songs) to which users upload audio features computed from their personal music collections. This approach sidesteps the copyright problems of distributing commercial music recordings and lets us make public a large music collection from which datasets for many MIR tasks can be created.
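The AcousticBrainz data can be pulled programmatically. Below is a minimal sketch, assuming the public REST endpoint (`/api/v1/<mbid>/low-level`) and JSON layout documented on acousticbrainz.org at the time of writing; the MBID shown is a placeholder, not a real recording:

```python
# Minimal sketch: fetch precomputed audio features from AcousticBrainz.
# Assumes the public endpoint /api/v1/<mbid>/low-level documented on
# acousticbrainz.org; verify against the current API before relying on it.
import requests

API_ROOT = "https://acousticbrainz.org/api/v1"

def get_low_level(mbid: str) -> dict:
    """Fetch the low-level feature document for a MusicBrainz recording ID."""
    response = requests.get(f"{API_ROOT}/{mbid}/low-level")
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # Placeholder MBID: replace with a real MusicBrainz recording ID.
    mbid = "00000000-0000-0000-0000-000000000000"
    doc = get_low_level(mbid)
    # Feature documents group Essentia descriptors under keys such as
    # "lowlevel", "rhythm", and "tonal"; e.g. the per-track MFCC means:
    print(doc["lowlevel"]["mfcc"]["mean"])
```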
To address the lack of adequate audio features, we explore two complementary approaches. The first builds on the audio analysis algorithms already available in our Essentia software library, extending it with new algorithms developed at the MTG. The second, which is the main focus of this project, applies deep learning methodologies. We plan to develop musically motivated architectures and exploit the potential of deep learning methods for automatically learning relevant audio features; a sketch of such an architecture appears below. Moreover, deep learning methodologies allow us to efficiently combine supervised and unsupervised learning. Since such models can potentially learn from any music audio excerpt, we aim to improve the current state of the art in genre classification and other semantic annotation tasks.
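To make the idea of musically motivated architectures concrete, here is a minimal sketch in the spirit of Pons et al. (CBMI 2016), written with TensorFlow/Keras. Instead of the square filters common in computer vision, one branch uses filters that are wide in time (capturing temporal and rhythmic cues) and another uses filters that are tall in frequency (capturing timbral cues). The input dimensions, filter sizes, and number of classes are illustrative assumptions, not the exact configuration from the paper:

```python
# Minimal sketch of a musically motivated CNN over a log-mel spectrogram.
import tensorflow as tf
from tensorflow.keras import layers

N_MELS, N_FRAMES, N_CLASSES = 96, 187, 10  # assumed input/output sizes

spec = layers.Input(shape=(N_MELS, N_FRAMES, 1))  # log-mel spectrogram

# Temporal branch: filters short in frequency, long in time.
temporal = layers.Conv2D(32, kernel_size=(1, 60), activation="relu")(spec)
temporal = layers.GlobalMaxPooling2D()(temporal)

# Timbral branch: filters long in frequency, short in time.
timbral = layers.Conv2D(32, kernel_size=(60, 1), activation="relu")(spec)
timbral = layers.GlobalMaxPooling2D()(timbral)

# Merge both views and classify, e.g. into genre labels.
merged = layers.Concatenate()([temporal, timbral])
output = layers.Dense(N_CLASSES, activation="softmax")(merged)

model = tf.keras.Model(spec, output)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```

Shaping the filters this way encodes a musical prior: a filter spanning many frames within one mel band can respond to onset patterns and tempo cues, while a filter spanning many bands within a single frame can respond to the spectral envelope that characterizes an instrument's timbre.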
To learn more:
- Presentation of the project at the Data-driven Knowledge Extraction Workshop, June 2016 (Slides)
- "Deep Learning", a technique that also contributes to mjsic technology (UPF news, 13/07/2016)
Principal researchers
Xavier Serra

Researchers
Jordi Pons, Dmitry Bogdanov, Frederic Font, Alastair Porter

Related Assets:
- Essentia - Software library for audio analysis and audio-based music information retrieval
- Best paper award at CBMI 2016 for work on musically motivated deep learning architectures
- Pons J, Lidy T, Serra X. Experimenting with Musically Motivated Convolutional Neural Networks. 14th International Workshop on Content-Based Multimedia Indexing (CBMI 2016)
- [AUDIO FEATURES] AcousticBrainz - Crowdsourced acoustic information available under CC0 license
- Pons J, Serra X. Designing Efficient Architectures for Modeling Temporal Features with Convolutional Neural Networks. 42nd International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017)
- The role of academia in empowering participatory and collaborative action, plenary session at SIS2016
- Workshop on music knowledge extraction using machine learning on December 4th
- New European Training Network on Music Information Retrieval
- New awards for PhD students at the Department
- Pons J, Nieto O, Prockup M, Schmidt EM, Ehmann AF, Serra X. End-to-end Learning for Music Audio Tagging at Scale. 18th International Society for Music Information Retrieval Conference (ISMIR 2017)
- Gong R, Pons J, Serra X. Audio to Score Matching by Combining Phonetic and Duration Information. 18th International Society for Music Information Retrieval Conference (ISMIR 2017)
- Pons J, Gong R, Serra X. Score-Informed Syllable Segmentation for A Cappella Singing Voice with Convolutional Neural Networks. 18th International Society for Music Information Retrieval Conference (ISMIR 2017)
- Pons J, Slizovskaia O, Gong R, Gómez E, Serra X. Timbre Analysis of Music Audio Signals with Convolutional Neural Networks. 25th European Signal Processing Conference (EUSIPCO 2017)
- Sustainable Freesound
- GPU server for training deep learning models with TensorFlow
- Kim J, Won M, Serra X, Liem CCS. Transfer Learning of Artist Group Factors to Musical Genre Classification. Proceedings of the 2018 World Wide Web Conference
- Videos from the European Music Conference
- Pons J, Nieto O, Prockup M, Schmidt EM, Ehmann AF, Serra X. End-to-end Learning for Music Audio Tagging at Scale. Workshop on Machine Learning for Audio Signal Processing (ML4Audio) at NIPS
- Minz Won wins the WWW 2018 Challenge: Learning to Recognize Musical Genre
- Pons J, Serra X. Randomly Weighted CNNs for (Music) Audio Classification. arXiv preprint
- Rethage D, Pons J, Serra X. A Wavenet for Speech Denoising. 43rd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)
- Deep learning architectures for music audio classification: a personal (re)view - Talk by Jordi Pons at the UPC DL Winter Seminar
- ERC=Science2 and the Music Technology Group gather 22 ERC grantees working in music
- Lluís F, Pons J, Serra X. End-to-end Music Source Separation: Is It Possible in the Waveform Domain? arXiv preprint
- Personal AMA interview for the María de Maeztu program & AI Grant, by Jordi Pons
- Pons J, Serrà J, Serra X. Training Neural Audio Classifiers with Few Data. arXiv preprint
- Kim J, Won M, Liem CCS, Hanjalic A. Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filtering with Playlist Title Information. Proceedings of the ACM Recommender Systems Challenge 2018
- 25th anniversary of the Music Technology Group
- Pons J, Serrà J, Serra X. Training Neural Audio Classifiers with Few Data. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)
- [PhD thesis] Deep Neural Networks for Music and Audio Tagging