Software & Datasets
Open science and reproducibility are core goals of the MTG: we promote collaboration by making sure that our research results can be used by other researchers and by society at large. Here we highlight some of the software tools and datasets developed as part of our research and maintained by MTG researchers. All our open source projects are available from our GitHub repository, and all our open datasets are available from Zenodo. Please review the terms and conditions of use stated for each software tool and dataset to make sure that they allow your intended use.
Apart from research collaborations, we are also interested in technology transfer, offering commercial licenses for exploiting our technology portfolio in industrial applications. Contact us for any further information.
Software
Software library and AI models for audio and music analysis, description, and synthesis.
GAIA:
Software library to apply similarity measures and classifications on the results of audio analysis.
JavaScript (JS) library for music/audio signal analysis and processing powered by Essentia.
Sound analysis/synthesis tools for music applications.
Modular and extensible desktop application to explore the Dunya corpora.
Corpora
Music corpora of several non-Western music repertoires and related software tools.
Collaborative database of Creative Commons licensed sounds.
Crowdsourced acoustic information of songs available under open licenses.
Dataset platforms
Platform for the collaborative creation of open audio collections from Freesound.
Library of tools for accessing and working with datasets of relevance to the field of Music Information Retrieval (MIR).
Python library for downloading, loading & working with sound datasets.
Datasets
Collection of datasets related to the Dunya/Compmusic corpora.
Datasets using sounds from Freesound.
Monophonic and polyphonic audio files of a set of common Flamenco singing.
Audio recordings of three a cappella pieces with their associated MIDI files.
Multitrack dataset of a cappella choral music.
A Dataset for Cover Song Identification and Understanding.
Annotations of drum events in well-known datasets of music audio recordings.
EEP:
Multimodal recordings of string quartet performances.
Knowledge Base of flamenco music.
Key annotations of a music audio collection.
Tempo annotations of a music audio collection.
Recordings of single notes and scales played by several instruments.
Scores and harmonic annotations of Haydn's String Quartets Op. 20.
Musical audio excerpts with annotations of the predominant instruments.
ISMIR 2004 Genre Identification task dataset.
JAAH:
Audio-aligned jazz harmony dataset.
KBSF:
Knowledge Base automatically extracted from songfacts.com.
Last.fm Dataset 360k users - Last.fm Dataset 1k users:
<user, artist-mbid, artist-name, total-plays> tuples from Last.fm.
Text and accompanying metadata of Amazon customer reviews.
MASS:
Multi-track recordings for audio source separation research.
MAST:
Rhythmic pattern reproductions with grades for a subset of performances.
MEDIAEVAL ACOUSTICBRAINZ GENRE:
AcousticBrainz music features and genre/subgenre annotations extracted from AllMusic, Discogs, Last.fm, and Tagtraum.
148,826 playlists, with 649,091 songs. Genre, tag information plus mel-spectrograms for each song.
55,000 full audio tracks with 195 tags from genre, instrument, and mood/theme categories.
Recordings of sung melodies for Query-by-Humming research.
Benchmark dataset of relative arousal/valence annotations for validation of audio models for music emotion recognition.
Open dataset for the tasks of music detection and relative music loudness estimation.
Orchestral music excerpts with annotations for melody extraction research.
Denoised recordings and note annotations for the Aalto anechoic orchestral database.
Motion capture (MoCap) recordings of conducting movements.
Excerpts of the Eroica Symphony by Beethoven plus audio descriptors.
PHENICX Symphonies Recordings:
Multimodal recordings of orchestra performances.
Multimodal data of string quartet performances.
SAS:
List of artists and biographical information for semantic artist similarity research.
A corpus of audio captions for music and language evaluation.
Flamenco a cappella sung melodies with manual transcriptions.
... more datasets