New release of mirdata with major contributions by the MTG
We're happy to announce a major new release of mirdata, to which the MTG has made significant contributions. mirdata is an open-source Python library that provides tools for accessing and working with a number of publicly available datasets relevant to the field of Music Information Retrieval (MIR). It offers unified tools for downloading datasets and validating them (making sure they are complete and that no files are corrupted), loading annotation files into a common format consistent with mir_eval, and parsing track-level metadata for detailed evaluations.
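mirdata's validation step compares the files on disk against per-file checksums stored in its dataset indexes. The idea can be sketched as follows. This is a simplified illustration, not mirdata's internal code: the `index` layout (a dict mapping relative paths to MD5 digests) and the function names `md5` and `validate` are hypothetical.

```python
import hashlib
import os
import tempfile


def md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate(data_home, index):
    """Check every file listed in a (hypothetical) index under data_home.

    index maps relative paths to expected MD5 hex digests.
    Returns (missing, invalid): paths that do not exist on disk, and
    paths whose checksum does not match the expected value.
    """
    missing, invalid = [], []
    for rel_path, expected in index.items():
        full_path = os.path.join(data_home, rel_path)
        if not os.path.exists(full_path):
            missing.append(rel_path)
        elif md5(full_path) != expected:
            invalid.append(rel_path)
    return missing, invalid


if __name__ == "__main__":
    # Tiny demo: one intact file, one corrupted file, one missing file.
    with tempfile.TemporaryDirectory() as home:
        with open(os.path.join(home, "good.wav"), "wb") as f:
            f.write(b"hello")
        with open(os.path.join(home, "bad.wav"), "wb") as f:
            f.write(b"corrupted")
        expected = hashlib.md5(b"hello").hexdigest()
        index = {"good.wav": expected, "bad.wav": expected, "missing.wav": expected}
        print(validate(home, index))
```

Reporting missing and corrupted files separately lets a user tell an incomplete download apart from a damaged one, which is the distinction the validation step is meant to surface.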
This new release has been a joint effort by researchers from NYU, Spotify and the MTG. On the MTG side, we have added loaders for 10 MTG datasets (so far!) and helped extend the software tools, making them more useful for research reproducibility. These efforts will continue, and support for many more openly available datasets will be added.
mirdata currently includes 26 dataset loaders! You can see them in this table. In the new v0.3.0 release, we have integrated the MTG datasets listed below, all of them shared under Creative Commons licenses and available from Zenodo.
- AcousticBrainz Genre
- Beatport EDM Key
- cante100 (from COFLA research corpus)
- IRMAS
- GiantSteps Key
- GiantSteps Tempo
- Mridangam Stroke
- Saraga Carnatic (from CompMusic research corpus)
- Saraga Hindustani (from CompMusic research corpus)
- Tonality ClassicalDB
Additionally, this release introduces a new API built around a dataset object and annotation types, along with new documentation, including tutorials and notebooks with examples.
We would like to thank Magdalena Fuentes, Rachel Bittner, Marius Miron, Genís Plaja, Pedro Ramoneda, Vincent Lostanlen, David Rubinstein, Andreas Jansson, Thor Kell, Keunwoo Choi, Tom Xi, Kyungyun Lee and Xavier Serra for all the great work they have done on this amazing project.