We provide datasets, repositories, frameworks and tools relevant to research and technology transfer initiatives related to knowledge extraction. This section gives an overview of a selection of them, with download links or contact details.

The MdM Strategic Research Program has its own community in Zenodo for material available in that repository, as well as in the UPF e-repository. Below is a non-exhaustive list of datasets representative of the research in the Department.

To promote the availability of resources, specific communities have also been created in Zenodo, at the level of research communities (for instance, MIR and Educational Data Analytics) or MSc programs (for instance, the Master in Sound and Music Computing).



[TEXT] MARD: Multimodal Album Reviews Dataset


MARD contains texts and accompanying metadata originally obtained from a much larger dataset of Amazon customer reviews, which have been enriched with music metadata from MusicBrainz, and audio descriptors from AcousticBrainz. MARD amounts to a total of 65,566 albums and 263,525 customer reviews. A breakdown of the number of albums per genre is provided here:


Genre                 Amazon  MusicBrainz  AcousticBrainz
Alternative Rock       2,674        1,696             564
Reggae                   509          260              79
Classical             10,000        2,197             587
R&B                    2,114        2,950             982
Country                2,771        1,032             424
Jazz                   6,890        2,990             863
Metal                  1,785        1,294             500
Pop                   10,000        4,422           1,701
New Age                2,656          638             155
Dance & Electronic     5,106          899             367
Rap & Hip-Hop          1,679          768             207
Latin Music            7,924        3,237             425
Rock                   7,315        4,100           1,482
Gospel                   900          274              33
Blues                  1,158          448             135
Folk                   2,085          848             179
Total                 65,566       28,053           8,683
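As a quick consistency check, the per-genre counts can be summed in a few lines of Python. This is only a sketch: the numbers are copied by hand from the table, the function names are illustrative, and the coverage figures assume that the MusicBrainz and AcousticBrainz columns count albums matched in those services. The Amazon column sums to 65,566, which agrees with the album total stated in the description above.

```python
# Per-genre album counts copied from the table:
# (Amazon, MusicBrainz, AcousticBrainz)
COUNTS = {
    "Alternative Rock": (2674, 1696, 564),
    "Reggae": (509, 260, 79),
    "Classical": (10000, 2197, 587),
    "R&B": (2114, 2950, 982),
    "Country": (2771, 1032, 424),
    "Jazz": (6890, 2990, 863),
    "Metal": (1785, 1294, 500),
    "Pop": (10000, 4422, 1701),
    "New Age": (2656, 638, 155),
    "Dance & Electronic": (5106, 899, 367),
    "Rap & Hip-Hop": (1679, 768, 207),
    "Latin Music": (7924, 3237, 425),
    "Rock": (7315, 4100, 1482),
    "Gospel": (900, 274, 33),
    "Blues": (1158, 448, 135),
    "Folk": (2085, 848, 179),
}


def column_totals(counts):
    """Sum each of the three columns across all genres."""
    return tuple(sum(column) for column in zip(*counts.values()))


def coverage(counts):
    """Share of Amazon albums found in MusicBrainz and in AcousticBrainz.

    Assumption: the second and third columns count matched albums.
    """
    amazon, musicbrainz, acousticbrainz = column_totals(counts)
    return musicbrainz / amazon, acousticbrainz / amazon


print(column_totals(COUNTS))  # (65566, 28053, 8683)
mb_share, ab_share = coverage(COUNTS)
print(f"MusicBrainz: {mb_share:.1%}, AcousticBrainz: {ab_share:.1%}")
```

Under that assumption, roughly 43% of the albums have a MusicBrainz match and roughly 13% have AcousticBrainz descriptors.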


A subset of the dataset was created for genre classification experiments. It contains 100 albums per genre, from different artists, across 13 genres. All the albums have been mapped to MusicBrainz and AcousticBrainz. The subset includes semantic, acoustic and sentiment features.
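A balanced subset of this kind could be drawn along the following lines. This is a hypothetical sketch, not the procedure used to build the published subset: the record keys ("id", "genre", "artist"), the function name and the random seed are all assumptions, and "from different artists" is read here as at most one album per artist within each genre.

```python
import random
from collections import defaultdict


def balanced_subset(albums, per_genre=100, seed=0):
    """Pick up to `per_genre` albums per genre, one album per artist.

    `albums` is an iterable of dicts with the hypothetical keys
    "id", "genre" and "artist".
    """
    rng = random.Random(seed)

    # Group the candidate albums by genre.
    by_genre = defaultdict(list)
    for album in albums:
        by_genre[album["genre"]].append(album)

    subset = {}
    for genre, candidates in by_genre.items():
        rng.shuffle(candidates)  # random order, reproducible via the seed
        seen_artists = set()
        chosen = []
        for album in candidates:
            if album["artist"] in seen_artists:
                continue  # skip further albums by an already selected artist
            seen_artists.add(album["artist"])
            chosen.append(album)
            if len(chosen) == per_genre:
                break
        subset[genre] = chosen
    return subset


# Toy usage: 2 genres, 5 artists per genre, 3 albums per artist.
toy = [
    {"id": f"{g}-{a}-{i}", "genre": g, "artist": f"{g}-artist-{a}"}
    for g in ("Jazz", "Folk")
    for a in range(5)
    for i in range(3)
]
picked = balanced_subset(toy, per_genre=4)
print({genre: len(chosen) for genre, chosen in picked.items()})
```

A genre with fewer than `per_genre` distinct artists simply yields a shorter list, so a caller aiming for a strictly balanced subset would need to drop such genres.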

We also provide all the files necessary to reproduce the genre classification experiments in the paper referenced below.

For details on the dataset and to download it, please go to http://mtg.upf.edu/download/datasets/mard

For more details on how these files were generated, we refer to the following scientific publication. We would highly appreciate it if scientific publications of works partly based on the MARD dataset cite this publication:

Oramas, S., Espinosa-Anke, L., Lawlor, A., Serra, X., & Saggion, H. (2016). Exploring Customer Reviews for Music Genre Classification and Evolutionary Studies. 17th International Society for Music Information Retrieval Conference (ISMIR'16).
The MARD dataset will be introduced in the next ISMIR tutorial "Natural Language Processing for MIR" https://wp.nyu.edu/ismir2016/event/tutorials/