MTG-QBH: Query By Humming dataset
Description
This dataset includes 118 recordings of sung melodies. The recordings were made as part of the experiments on Query-by-Humming (QBH) reported in the following article:
J. Salamon, J. Serrà and E. Gómez, "Tonal Representations for Music Retrieval: From Version Identification to Query-by-Humming", International Journal of Multimedia Information Retrieval, special issue on Hybrid Music Information Retrieval, In Press (accepted Nov. 2012).
The recordings were made by 17 different subjects, 9 female and 8 male, whose musical experience ranged from none at all to amateur musicians. Subjects were presented with a list of songs, from which they were asked to select the ones they knew and sing part of the melody. The subjects were aware that the recordings would be used as queries in an experiment on QBH. There was no restriction on how much of the melody should be sung or on which part of the melody should be sung, and the subjects were allowed to sing the melody with or without lyrics. The subjects did not listen to the original songs before recording the queries, and the recordings were all sung a cappella, without any accompaniment or reference tone. To simulate a realistic QBH scenario, all recordings were made using a basic laptop microphone and no post-processing was applied. The duration of the recordings ranges from 11 to 98 seconds, with an average recording length of 26.8 seconds.
In addition to the query recordings, three meta-data files are included: one describing the queries and two describing the music collections against which the queries were tested in the experiments described in the aforementioned article. While the query recordings are included in this dataset, audio files for the music collections listed in the meta-data files are NOT included, as they are protected by copyright law. If you wish to reproduce the experiments reported in the article, it is up to you to obtain the original audio files of these songs.
All subjects have given their explicit approval for this dataset to be made public.