This data set contains music play counts from 1,019,318 unique users of the Echo Nest, available at "http://millionsongdataset.com/tasteprofile/". As a basic step, it is interesting to predict the play counts using the song information collected in the Million Song Dataset (Bertin-Mahieux et al. (2011)). After cleaning and feature engineering, the data contains 205,032 observations in total. The covariates duration, loudness, tempo, artist hotness, song hotness, and album hotness are used to model the response, the play count of each song.

Usage

One_Million_Songs

Format

A data frame with 7 columns and 205,032 rows.

Counts

Playback count of the song

Duration

Duration of the song

Loudness

Loudness of the song

Tempo

Tempo of the song

Artist_Hotness

A value between 0 and 1

Song_Hotness

A value between 0 and 1

Album_Hotness

A value between 0 and 1

References

McFee B, Bertin-Mahieux T, Ellis DP, Lanckriet GR (2012). “The million song dataset challenge.” In Proceedings of the 21st International Conference on World Wide Web, 909–916.

Ai M, Yu J, Zhang H, Wang H (2021). “Optimal subsampling algorithms for big data regressions.” Statistica Sinica, 31(2), 749–772.

Examples

nrow(One_Million_Songs)
#> [1] 205032
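
The description mentions modelling the play counts with the six covariates. Below is a minimal sketch of one way to do that, assuming a Poisson regression with a log link; this page does not say which model is intended, so that choice is an assumption. To keep the sketch runnable without the package, it uses a small simulated data frame named `songs` (a hypothetical stand-in with the documented column names and made-up value ranges). With the package loaded, `One_Million_Songs` could take its place.

```r
# Hypothetical stand-in for One_Million_Songs: same column names,
# simulated values (ranges are assumptions, not taken from the data).
set.seed(1)
n <- 500
songs <- data.frame(
  Duration       = runif(n, 60, 600),
  Loudness       = runif(n, -30, 0),
  Tempo          = runif(n, 60, 200),
  Artist_Hotness = runif(n),
  Song_Hotness   = runif(n),
  Album_Hotness  = runif(n)
)
# Simulated count response that depends on Song_Hotness only
songs$Counts <- rpois(n, lambda = exp(0.5 + songs$Song_Hotness))

# Poisson regression (assumed model) of counts on the six covariates
fit <- glm(
  Counts ~ Duration + Loudness + Tempo +
    Artist_Hotness + Song_Hotness + Album_Hotness,
  data = songs,
  family = poisson(link = "log")
)

# Estimated coefficients: intercept plus one per covariate
coef(fit)
```

Replacing `data = songs` with `data = One_Million_Songs` would fit the same model to the full data, since the formula uses only the column names listed under Format.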