OpenBMAT: a new open dataset for music detection with loudness annotations
We announce the publication of OpenBMAT, an open dataset for the tasks of music detection and relative music loudness estimation. The dataset contains 27.4 hours of audio from 8 different TV program types broadcast in 4 different countries, cross-annotated by 3 people using 6 different classes. It has been published as a dataset paper in Transactions of the International Society for Music Information Retrieval (TISMIR), the open journal of ISMIR. This research has been carried out as a collaboration between the MTG and BMAT in the context of the Industrial Doctorates program of the Catalan Government.
Music detection refers to the task of finding music segments in an audio file. Thus, the minimum requirement for a dataset to be suitable for this task is to include annotations about the presence of music. However, we find the following two features to be essential to any music detection dataset that aims to provide a certain level of generalization: first, music should appear both isolated and mixed with other types of non-music sounds, because otherwise the dataset may not be representative of many real-life scenarios such as broadcast audio; and second, a significant number of the audio files included in the dataset should be multi-class, i.e., contain class changes, allowing the evaluation of an algorithm's precision in detecting them.
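As a minimal illustration of why class changes matter for evaluation, the sketch below scores how precisely an algorithm locates them. It is not code from the paper: the function name, the tuple-of-seconds representation and the tolerance value are all assumptions made for this example.

```python
# Hypothetical sketch: measure how precisely an algorithm detects class
# changes, given reference and predicted boundary times in seconds.
# The 0.5 s tolerance is an illustrative choice, not a standard.

def boundary_precision(ref_bounds, pred_bounds, tolerance=0.5):
    """Fraction of predicted boundaries that fall within `tolerance`
    seconds of some reference boundary."""
    if not pred_bounds:
        return 0.0
    hits = sum(
        1 for p in pred_bounds
        if any(abs(p - r) <= tolerance for r in ref_bounds)
    )
    return hits / len(pred_bounds)

# Reference file changes class at 12.0 s and 34.5 s; the algorithm
# predicts changes at 11.8 s, 20.0 s and 34.6 s: 2 of 3 are hits.
print(boundary_precision([12.0, 34.5], [11.8, 20.0, 34.6]))
```

A single-class file contributes no boundaries at all, which is why multi-class files are needed for this kind of measurement.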
The two main applications of music detection algorithms are (1) the automatic indexing and retrieval of auditory information based on its audio content, and (2) the monitoring of music for copyright management [1-4]. Additionally, the detection of music can be applied as an intermediate step to improve the performance of algorithms designed for other purposes [5]. In the current copyright management business model, broadcasters are taxed based on the percentage of music they broadcast. It is relevant to know whether this music is used in the foreground or the background, as some collective management organizations consider these cases differently for the distribution of copyright royalties. In this scenario, the music detection task falls short: we need to estimate the loudness of music in relation to other simultaneous non-music sounds, i.e., its relative loudness. We define relative music loudness estimation as the task of finding music segments in an audio file and classifying them into foreground or background music. We use the concept of loudness as defined by Moore in [6]: "that attribute of auditory sensation in terms of which sounds can be ordered on a scale extending from quiet to loud" (p. 133).
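To make the royalty-distribution motivation concrete, the following sketch totals how much airtime falls into each relative-loudness class. The segment representation and labels are illustrative assumptions, not the dataset's actual annotation format.

```python
# Illustrative sketch (not the paper's code): an annotation is assumed
# to be a list of (start_sec, end_sec, label) tuples with hypothetical
# labels "fg-music", "bg-music" and "no-music".
from collections import defaultdict

def class_durations(segments):
    """Total annotated duration per label, in seconds."""
    totals = defaultdict(float)
    for start, end, label in segments:
        totals[label] += end - start
    return dict(totals)

annotation = [
    (0.0, 15.0, "no-music"),
    (15.0, 40.0, "bg-music"),   # music under speech
    (40.0, 60.0, "fg-music"),   # isolated music
]
print(class_durations(annotation))
# → {'no-music': 15.0, 'bg-music': 25.0, 'fg-music': 20.0}
```

A plain music detector would report 45 of these 60 seconds simply as "music"; the foreground/background split is exactly the extra information relative loudness estimation provides.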
Currently, there is no dataset with annotations about the relative loudness of music, and the only publicly available dataset including the aforementioned two features is the one published by Seyerlehner et al. [2]. Unfortunately, despite containing both isolated music and music mixed with other types of sounds, its annotations do not reflect this information and specify only the presence of music. In the case of the dataset published by Scheirer and Slaney [7], which is the first open dataset that included annotations about the presence of music, the types of sounds that appear mixed with music are restricted to speech, and this is reflected in the chosen taxonomy: Music, Speech, Simultaneous Music and Speech, and Other. Other publicly available datasets that include music presence annotations are the MUSAN [8] and GTZAN datasets, but neither of them includes music mixed with other types of sounds, and both consist of single-class instances, i.e., audio files annotated as a single segment of a single class.
OpenBMAT is a dataset containing 27.4 hours of audio sampled from 8 different TV program types that have been broadcast on the most popular TV channels of 4 different countries: France, Germany, Spain and the United Kingdom. It consists of 1647 one-minute multi-class audio files that include music and non-music sounds, both mixed and isolated. OpenBMAT has been cross-annotated by 3 annotators using a taxonomy that combines the presence of music with its loudness relative to other types of simultaneous non-music sounds. This taxonomy contains 6 classes: Music, Foreground Music, Similar, Background Music, Low Background Music and No Music. Despite not being the longest dataset, OpenBMAT is the only one that brings together all the appropriate characteristics for the task of music detection as well as for the estimation of the music's relative loudness.
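Because the 6 classes encode both presence and relative loudness, they can be collapsed to the binary labels of plain music detection. The sketch below shows one such collapse; the mapping (in particular treating Similar and Low Background Music as "music") and the segment format are assumptions for illustration, not the dataset's official tooling.

```python
# Hedged sketch: collapsing the 6-class relative-loudness taxonomy to
# binary music-detection labels. The mapping is an illustrative choice.
TO_BINARY = {
    "Music": "music",
    "Foreground Music": "music",
    "Similar": "music",
    "Background Music": "music",
    "Low Background Music": "music",
    "No Music": "no-music",
}

def collapse(segments):
    """Map each (start, end, label) segment to a binary label and merge
    consecutive segments that end up sharing the same label."""
    merged = []
    for start, end, label in segments:
        binary = TO_BINARY[label]
        if merged and merged[-1][2] == binary and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end, binary)
        else:
            merged.append((start, end, binary))
    return merged

print(collapse([(0, 10, "No Music"),
                (10, 30, "Background Music"),
                (30, 60, "Foreground Music")]))
# → [(0, 10, 'no-music'), (10, 60, 'music')]
```

This is what makes the dataset usable for both tasks: the finer taxonomy can always be reduced, but binary annotations could never be expanded the other way.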
You can find the dataset at Zenodo. To download it, you need to make a request stating the purpose you want to use it for; only non-profit purposes will be accepted. We would appreciate it if scientific publications of works using OpenBMAT referred to:
Meléndez-Catalán, Blai, Molina, Emilio, & Gómez, Emilia. (2019). Open Broadcast Media Audio from TV (OpenBMAT) (Version 1.0.0) [Data set]. Transactions of the International Society for Music Information Retrieval (TISMIR). Zenodo. http://doi.org/10.5281/zenodo.3381249
And also the scientific publication describing it:
Meléndez-Catalán, B., Molina, E. and Gómez, E., 2019. Open Broadcast Media Audio from TV: A Dataset of TV Broadcast Audio with Relative Music Loudness Annotations. Transactions of the International Society for Music Information Retrieval, 2(1), pp. 43–51. DOI: http://doi.org/10.5334/tismir.29
The dataset page on the MTG website: https://www.upf.edu/web/mtg/openbmat
[1] Zhu, Y., Sun, Q., and Rahardja, S. (2006). Detecting musical sounds in broadcast audio based on pitch tuning analysis. In IEEE International Conference on Multimedia and Expo, pages 13–16.
[2] Seyerlehner, K., Pohle, T., Schedl, M., and Widmer, G. (2007). Automatic music detection in television productions. In Proceedings of the 10th International Conference on Digital Audio Effects (DAFx-07).
[3] Izumitani, T., Mukai, R., and Kashino, K. (2008). A background music detection method based on robust feature extraction. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 13–16.
[4] Giannakopoulos, T., Pikrakis, A., and Theodoridis, S. (2008). Music tracking in audio streams from movies. In Proceedings of the IEEE 10th Workshop on Multimedia Signal Processing (MMSP), pages 950–955.
[5] Gfeller, B., Guo, R., Kilgour, K., Kumar, S., Lyon, J., Odell, J., Ritter, M., Roblek, D., Sharifi, M., Velimirović, M., et al. (2017). Now playing: Continuous low-power music recognition. In NIPS 2017 Workshop: Machine Learning on the Phone (NIPS).
[6] Moore, B. C. (2012). An introduction to the psychology of hearing. Brill.
[7] Scheirer, E. and Slaney, M. (1997). Construction and evaluation of a robust multifeature speech/music discriminator. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 2, pages 1331–1334.
[8] Snyder, D., Chen, G., and Povey, D. (2015). MUSAN: A Music, Speech, and Noise Corpus. arXiv:1510.08484v1.