KGRec: Sound and Music Recommendation with Knowledge Graphs

Two datasets are provided, one for Music Recommendation (KGRec-music) and one for Sound Recommendation (KGRec-sound). Each contains users, items, implicit feedback interactions between users and items, item tags, and item text descriptions.

Music Recommendation Dataset (KGRec-music)

Number of items: 8,640
Number of users: 5,199
Number of user-item interactions: 751,531

All the data comes from the songfacts.com and last.fm websites. Items are songs, each described by a textual description extracted from songfacts.com and by tags from last.fm.

Files and folders in the dataset:

/descriptions
This folder contains one file per item, holding the textual description of the item. The name of the file is the id of the item plus the ".txt" extension.

/tags
This folder contains one file per item, holding the tags of the item separated by spaces. The words of a multiword tag are joined with a hyphen (-). The name of the file is the id of the item plus the ".txt" extension. Not all items have tags: 401 items have none.
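The tag format described above can be read with a few lines of Python. This is a minimal sketch, not part of the dataset: the function name parse_tags and the sample tags are made up for illustration, and turning hyphens back into spaces assumes that no tag contains a hyphen of its own.

```python
def parse_tags(text):
    """Split the contents of one tag file into a list of tags."""
    # Tags are separated by whitespace; a hyphen joins the words of a
    # multiword tag, so it is turned back into a space here (assumption:
    # tags never contain a literal hyphen).
    return [token.replace("-", " ") for token in text.split()]


# Invented sample content, not taken from the dataset.
print(parse_tags("rock classic-rock 80s"))
```

An item without tags would simply yield an empty list, provided its file exists and is empty; whether the 401 untagged items have empty files or no file at all is not stated here.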

implicit_lf_dataset.txt
This file contains the interactions between users and items. There is one line per interaction (in this case, a user's interaction with a song, as recorded by Last.fm). Fields within a line are separated by tabs, and the second field, labeled sound_id below, holds the id of the song. The format is:

user_id \t sound_id \t 1 \n
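As a rough illustration of the line format above, the following Python sketch turns lines into (user_id, item_id) pairs. The function name parse_interactions and the sample ids are hypothetical, and ids are kept as strings because the format above does not say whether they are numeric.

```python
def parse_interactions(lines):
    """Return a list of (user_id, item_id) pairs from tab-separated lines."""
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Third field is the constant 1 (implicit feedback), so it is dropped.
        user_id, item_id, _flag = line.split("\t")
        pairs.append((user_id, item_id))
    return pairs


# Invented sample lines, not taken from the dataset.
sample = ["u1\t10\t1\n", "u2\t20\t1\n"]
print(parse_interactions(sample))
```

An open file object can be passed in place of the sample list, since iterating over a text file yields its lines.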

Sound Recommendation Dataset (KGRec-sound)

Number of items: 21,552
Number of users: 20,000
Number of user-item interactions: 2,117,698

All the data comes from Freesound.org. Items are sounds, each described by a textual description and tags written by the sound's creator at upload time.

Files and folders in the dataset:

/descriptions
This folder contains one file per item, holding the textual description of the item. The name of the file is the id of the item plus the ".txt" extension.

/tags
This folder contains one file per item, holding the tags of the item separated by spaces. The name of the file is the id of the item plus the ".txt" extension.
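Since both folders use the item id as the file name, either one can be loaded into a dictionary keyed by item id. The sketch below is one possible way to do this in Python; the function name load_item_texts is hypothetical, and UTF-8 encoding is an assumption, as the encoding of the files is not stated here.

```python
from pathlib import Path


def load_item_texts(folder):
    """Map item id -> file contents for every .txt file in a folder."""
    texts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # The file name without ".txt" is the item id.
        # Assumption: files are UTF-8 encoded.
        texts[path.stem] = path.read_text(encoding="utf-8")
    return texts


# Example call, assuming the dataset was extracted to the current directory:
# descriptions = load_item_texts("descriptions")
```

For the tags folder, each returned value is the raw space-separated tag string, which can then be split with str.split().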

downloads_fs_dataset.txt
This file contains the interactions between users and items. There is one line per interaction (in this case, a user who downloaded a sound). Fields within a line are separated by tabs, in the following format:

user_id \t sound_id \t 1 \n
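A quick sanity check after downloading is to count the distinct users, distinct items, and interaction lines, and compare them with the figures listed above (20,000 users, 21,552 items, 2,117,698 interactions). The Python sketch below shows one way to do that; the function name summarize and the sample lines are invented for illustration, and lines that do not have exactly three tab-separated fields are skipped as an assumption about how to treat malformed input.

```python
def summarize(lines):
    """Return (number of users, number of items, number of interactions)."""
    users, items, interactions = set(), set(), 0
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 3:
            continue  # assumption: ignore blank or malformed lines
        users.add(fields[0])
        items.add(fields[1])
        interactions += 1
    return len(users), len(items), interactions


# Invented sample lines, not taken from the dataset.
sample = ["a\t1\t1\n", "a\t2\t1\n", "b\t1\t1\n"]
print(summarize(sample))

# Example call on the real file (path assumed relative to the dataset root):
# with open("downloads_fs_dataset.txt", encoding="utf-8") as f:
#     print(summarize(f))
```

The interaction count is a plain line count, so it matches the published figure only if the file contains no repeated lines.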

Scientific References

For more details on how these files were generated, please refer to the following scientific publication. We would highly appreciate it if scientific publications of works partly based on this dataset cite it:

Sergio Oramas, Vito Claudio Ostuni, Tommaso Di Noia, Xavier Serra and Eugenio Di Sciascio. Sound and Music Recommendation with Knowledge Graphs. ACM Transactions on Intelligent Systems and Technology (TIST), Volume 8 Issue 2, October 2016.

Download

The dataset can be downloaded here

License

Dataset compiled by Sergio Oramas, Vito Claudio Ostuni and Gabriel Vigliensoni. This dataset is licensed under Creative Commons CC BY-NC 3.0, except 3rd party data. Song text descriptions are licensed by Songfacts.com and user interactions and tags by Last.fm. Note that this dataset is considered derivative work according to paragraph 4.1 of Last.fm’s API Terms of Service. The data is made available for non-commercial use.

Feedback

Problems, positive feedback, negative feedback... it is all welcome! You can send your feedback to: [email protected]. If you are reporting a problem, please include as many details as possible.

MTG - Music Technology Group