Machine learning approaches for structuring large sound and music collections

One fundamental goal in the field of Music Information Research (MIR) is to automatically structure sound and music collections in order to facilitate access to and retrieval of their audio content. The current state of the art has not yet succeeded in automatically labeling large collections of audio recordings with the most common semantic labels used in practical applications. For music collections, we are interested in labeling semantic concepts such as the genre, lead instrument, musical key, or mood of each musical piece. This type of task has been only partly successful for small audio collections with a small number of classes for each concept. Known obstacles to advancing this type of research include the lack of sufficiently large audio collections with appropriate labels that can be used for training, and the lack of adequate and robust audio features relevant to the classification tasks to be performed. In this project we tackle both problems.

To address the limited availability of training and testing datasets, we are taking advantage of two initiatives being developed at the MTG. One is freesound.org, an open and collaborative site for sound samples (currently with more than 300k samples), which offers the possibility of creating datasets for many sound classification tasks. The other is AcousticBrainz.org, an open initiative run in collaboration with MusicBrainz that supports a large collection of music analysis data (currently with more than 4 million songs analyzed), to which users upload audio features obtained by analyzing their personal music collections. With this approach we can get around the copyright restrictions that apply to most commercial music recordings, and we can make public a large music collection from which to create datasets for many MIR tasks.

To address the lack of adequate audio features, we are exploring two complementary approaches. One is based on further exploiting the audio analysis algorithms already available in our Essentia software library, which continues to grow with new algorithms being developed at the MTG. The other approach, which is the main focus of this project, is based on deep learning methodologies. We plan to develop musically motivated architectures and exploit the potential of deep learning methods to automatically learn relevant audio features. Moreover, deep learning methodologies make it possible to efficiently combine supervised and unsupervised learning. By being able to learn from potentially any music audio excerpt, we aim to improve the current state of the art in genre classification and in other semantic annotation tasks.

To learn more: