This data set contains the music play counts of 1,019,318 unique users in the Echo Nest and is available at "http://millionsongdataset.com/tasteprofile/". As a basic step, it is interesting to predict the play counts using the song information collected in the Million Song Dataset (Bertin-Mahieux et al. (2011)). After cleaning and feature engineering, the data contain 205,032 observations in total. The covariates duration, loudness, tempo, artist hotness, song hotness, and album hotness are used to model the response, the play count of a song.
Format
A data frame with 7 columns and 205,032 rows.
Counts
Number of times the song was played
Duration
Duration of the song
Loudness
Loudness of the song
Tempo
Tempo of the song
Artist_Hotness
Hotness of the artist, a value between 0 and 1
Song_Hotness
Hotness of the song, a value between 0 and 1
Album_Hotness
Hotness of the album, a value between 0 and 1
References
McFee B, Bertin-Mahieux T, Ellis DP, Lanckriet GR (2012). “The million song dataset challenge.” In Proceedings of the 21st International Conference on World Wide Web, 909–916.
Ai M, Yu J, Zhang H, Wang H (2021). “Optimal subsampling algorithms for big data regressions.” Statistica Sinica, 31(2), 749–772.
Examples
nrow(One_Million_Songs)
#> [1] 205032
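The description speaks of modelling the play counts from the six covariates. A minimal sketch of one way to do that is shown here, using a Poisson regression fitted with base R's glm(). The choice of a Poisson model is an assumption for illustration, not something this page states. The sketch also assumes One_Million_Songs is already loaded with the columns listed under Format. When it is not, a small synthetic stand-in (made-up values, not the real data) is generated so the code runs on its own.

```r
# Assumption: `One_Million_Songs` is a data frame with the columns listed
# under Format. If it is not loaded, build a synthetic stand-in so that the
# sketch still runs; these values are simulated, not real play counts.
if (!exists("One_Million_Songs")) {
  set.seed(1)
  n <- 500
  One_Million_Songs <- data.frame(
    Duration       = runif(n, 60, 400),
    Loudness       = runif(n, -30, 0),
    Tempo          = runif(n, 60, 200),
    Artist_Hotness = runif(n),
    Song_Hotness   = runif(n),
    Album_Hotness  = runif(n)
  )
  One_Million_Songs$Counts <-
    rpois(n, lambda = exp(1 + One_Million_Songs$Song_Hotness))
}

# Poisson regression of the play counts on the six covariates (log link).
fit <- glm(
  Counts ~ Duration + Loudness + Tempo +
    Artist_Hotness + Song_Hotness + Album_Hotness,
  family = poisson(link = "log"),
  data   = One_Million_Songs
)

# Estimated coefficients on the log scale: intercept plus six slopes.
round(coef(fit), 3)
```

On the full data, the same call fits the model to all 205,032 rows at once. Scaling the covariates first may make the coefficients easier to compare.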