Recommender Systems Workshop

Public Data Sets

Hi all!

Recommender Systems and Personalization Datasets

Collection of datasets gathered for recommender systems and personalization research

  • Collected by Julian McAuley's group at UCSD
  • Download

MovieLens (Movies)

One of the most popular datasets for collaborative filtering

  • Time range: 1995–2019
  • Type of data: Explicit ratings (1–5 stars)
  • Size: 100K, 1M, 10M, 25M ratings
  • Download
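
To make "explicit ratings" concrete, here is a minimal sketch of reading rating records into a user-to-item mapping. It assumes the tab-separated layout used by the 100K release's u.data file (user id, item id, rating, Unix timestamp); the larger releases use other delimiters and file names, so check the README that ships with the version you download. The sample rows and the parse_ratings helper are made up for illustration.

```python
from collections import defaultdict

# Made-up rows in the assumed layout: user_id, item_id, rating, unix_timestamp.
SAMPLE = (
    "1\t10\t4\t881250949\n"
    "1\t20\t2\t881250950\n"
    "2\t10\t5\t891717742\n"
)


def parse_ratings(text):
    """Return {user_id: {item_id: rating}} from tab-separated rating lines."""
    ratings = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        user_id, item_id, rating, _timestamp = line.split("\t")
        ratings[int(user_id)][int(item_id)] = float(rating)
    return dict(ratings)


ratings = parse_ratings(SAMPLE)
print(ratings)
```

The nested dictionary is only a convenient starting point; for the larger releases a sparse matrix or a dataframe is likely a better fit.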

Amazon Product Reviews (E-commerce)

Large-scale dataset of product reviews and ratings

  • Time range: 1996–2018
  • Type of data: Explicit ratings (1–5 stars), reviews
  • Size: 233M+ reviews across categories
  • Download

Netflix Prize (Movies)

Classic dataset from the Netflix Prize competition

  • Time range: ~1998–2005
  • Type of data: Explicit ratings (1–5 stars)
  • Size: 100M+ ratings from 480K users
  • Download

Yelp Open Dataset (Local Businesses)

Large dataset for local business recommendation

  • Time range: 2004–2022
  • Type of data: Explicit ratings (1–5 stars), reviews
  • Size: 8.6M reviews from 1.3M users
  • Download

Spotify Million Playlist Dataset (Music)

Dataset for playlist-based music recommendation

  • Time range: 2010–2017
  • Type of data: Implicit interaction (playlists)
  • Size: 1M playlists, 2M+ unique tracks
  • Download

MIND (News Recommendation)

Large-scale dataset of MSN News click interactions

  • Type of data: Implicit feedback (clicks on news articles)
  • Includes article titles, abstracts, and categories
  • Download
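
As an illustration of click-based implicit feedback, the sketch below splits one impression log line into clicked and non-clicked articles. It assumes the tab-separated behaviors layout as described in the MIND documentation (impression id, user id, time, click history, and impressions written as NEWSID-1 for a click and NEWSID-0 for no click); verify this against the files you download. The sample line and the parse_behavior_line helper are hypothetical.

```python
# Made-up line in the assumed layout:
# impression_id, user_id, time, history, impressions
SAMPLE_LINE = "1\tU1\t11/11/2019 9:05:58 AM\tN10 N20\tN30-1 N40-0 N50-0"


def parse_behavior_line(line):
    """Split one impression log line into history, clicked and non-clicked ids."""
    _impression_id, user_id, _time, history, impressions = (
        line.rstrip("\n").split("\t")
    )
    clicked, not_clicked = [], []
    for token in impressions.split():
        # Each token is "<news_id>-<label>", where label 1 marks a click.
        news_id, label = token.rsplit("-", 1)
        if label == "1":
            clicked.append(news_id)
        else:
            not_clicked.append(news_id)
    return {
        "user": user_id,
        "history": history.split(),  # empty history yields an empty list
        "clicked": clicked,
        "not_clicked": not_clicked,
    }


parsed = parse_behavior_line(SAMPLE_LINE)
print(parsed)
```

Non-clicked impressions are shown-but-ignored items, which is weaker evidence than an explicit low rating, so treating them as negatives is a modeling choice rather than a given.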

Yahoo! Music Ratings (Music)

Explicit song and album ratings from Yahoo! Music users

  • Type of data: Explicit ratings (0–100 scale)
  • Part of the KDD Cup 2011 challenge
  • Download