Solos: A Dataset for Audio-Visual Music Analysis. 22nd IEEE International Workshop on Multimedia Signal Processing (MMSP), September 21-24, 2020

  • Authors
  • Montesinos JF, Slizovskaia O, Haro G
  • UPF authors
  • HARO ORTEGA, GLORIA; MONTESINOS GARCIA, JUAN FELIPE; SLIZOVSKAIA, OLGA
  • Type
  • Scholarly articles
  • Journal title
  • arXiv.org
  • Publication year
  • 2020
  • Pages
  • 1-6
  • ISSN
  • ISSN-0611
  • Publication state
  • Published
  • Abstract
  • In this paper, we present a new dataset of music performance videos which can be used for training machine learning methods for multiple tasks, such as audio-visual blind source separation and localization, cross-modal correspondences, cross-modal generation and, in general, any audio-visual self-supervised task. These videos, gathered from YouTube, consist of solo musical performances of 13 different instruments. Compared to previously proposed audio-visual datasets, Solos is cleaner, since a large proportion of its recordings are auditions and manually checked recordings, ensuring there is no background noise or effects added in video post-processing. Moreover, to the best of our knowledge, it is the only dataset that contains the whole set of instruments present in the URMP dataset, a high-quality dataset of 44 audio-visual recordings of multi-instrument classical music pieces with individual audio tracks. URMP was intended to be used for source separation; thus, we evaluate the performance of two different source separation models trained on Solos on the URMP dataset. The dataset is publicly available at https://www.juanmontesinos.com/Solos/
  • Complete citation
  • Montesinos JF, Slizovskaia O, Haro G. Solos: A Dataset for Audio-Visual Music Analysis. 22nd IEEE International Workshop on Multimedia Signal Processing (MMSP), September 21-24, 2020. arXiv.org 2020: 1-6.