Version control:
Git
GitHub
Consistent file structure and naming (e.g., 0-dataMunging)
GitHub page that includes all the files
Clear and properly commented code
The data was acquired through the Spotify API in 2020 by TidyTuesday.
The class of the data frame
[1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
[1] 32833
[1] "track_id" "track_name"
[3] "track_artist" "track_popularity"
[5] "track_album_id" "track_album_name"
[7] "track_album_release_date" "playlist_name"
[9] "playlist_id" "playlist_genre"
[11] "playlist_subgenre" "danceability"
[13] "energy" "key"
[15] "loudness" "mode"
[17] "speechiness" "acousticness"
[19] "instrumentalness" "liveness"
[21] "valence" "tempo"
[23] "duration_ms"
edm latin pop r&b rap rock
6043 5155 5507 5431 5746 4951
# A tibble: 5 × 4
track_name track_artist track_album_name track_id
<chr> <chr> <chr> <chr>
1 <NA> <NA> <NA> 69gRFGOWY9OMpFJgFol1u0
2 <NA> <NA> <NA> 5cjecvX0CmC9gK0Laf5EMQ
3 <NA> <NA> <NA> 5TTzhRSWQS4Yu8xTgAuq6D
4 <NA> <NA> <NA> 3VKFip3OdAvv4OfNTgFWeQ
5 <NA> <NA> <NA> 69gRFGOWY9OMpFJgFol1u0
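Rows like the ones above, where the track name is missing, could be located with a simple filter. The following is a minimal base-R sketch on a made-up data frame; the object name `songs` and its values are assumptions for illustration, not the original code.

```r
# Toy stand-in for the real data (values are made up for illustration).
songs <- data.frame(
  track_id     = c("a1", "b2", "c3", "d4"),
  track_name   = c("Song A", NA, "Song C", NA),
  track_artist = c("Artist A", NA, "Artist C", NA),
  stringsAsFactors = FALSE
)

# Keep only the rows where the track name is missing.
missing_rows <- songs[is.na(songs$track_name), ]
n_missing <- nrow(missing_rows)

print(missing_rows)
```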
[1] 4477
Songs are duplicated because they appear on multiple playlists.
Since we are not concerned with how many playlists a song is on, removing the duplicates is justified.
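One way to count and drop such duplicates is base R's `duplicated()`. This is a hedged sketch on toy data, assuming duplicates are identified by `track_id`; the data frame `songs` and its values are hypothetical.

```r
# Toy data: tracks "a1" and "b2" each appear on more than one playlist.
songs <- data.frame(
  track_id    = c("a1", "b2", "a1", "c3", "b2"),
  playlist_id = c("p1", "p1", "p2", "p2", "p3"),
  stringsAsFactors = FALSE
)

# Count rows whose track_id has already been seen earlier in the data.
n_duplicates <- sum(duplicated(songs$track_id))

# Keep only the first occurrence of each track_id.
songs_unique <- songs[!duplicated(songs$track_id), ]

print(n_duplicates)
print(nrow(songs_unique))
```

If the count of 4477 reported above is the number of duplicated rows, then 32833 - 4477 = 28356 rows would remain, which agrees with the total of the per-genre counts after deduplication.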
The class of “key” and “mode” is numeric:
[1] "numeric"
[1] "numeric"
However, they should be categorical variables (factors) with distinct levels:
[1] "factor"
[1] "factor"
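The conversion from numeric to factor can be sketched as follows. The data frame is made up, and the "minor"/"major" labels are an added assumption based on Spotify's convention that mode 0 is minor and mode 1 is major; the original analysis may have kept the raw 0/1 levels.

```r
# Toy data with numeric key (pitch class) and mode columns.
songs <- data.frame(
  key  = c(0, 5, 11, 5),
  mode = c(1, 0, 1, 1)
)

class_before <- class(songs$key)

# Treat key as a categorical variable with one level per observed pitch class.
songs$key <- factor(songs$key)

# Treat mode as a two-level factor (labels are an assumption, see above).
songs$mode <- factor(songs$mode, levels = c(0, 1), labels = c("minor", "major"))

print(class_before)
print(class(songs$key))
print(levels(songs$mode))
```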
The new data with no duplicates
edm latin pop r&b rap rock
4877 4137 5132 4504 5401 4305

How features change within genres
Latin stands out in danceability and valence.
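A comparison like this could be backed by per-genre averages. The sketch below uses `aggregate()` on a made-up data frame; the values are illustrative only and do not reproduce the real results.

```r
# Toy data: two genres with made-up audio feature values.
songs <- data.frame(
  playlist_genre = c("latin", "latin", "rock", "rock"),
  danceability   = c(0.8, 0.7, 0.5, 0.4),
  valence        = c(0.7, 0.6, 0.5, 0.5),
  stringsAsFactors = FALSE
)

# Mean danceability and valence for each genre.
genre_means <- aggregate(
  cbind(danceability, valence) ~ playlist_genre,
  data = songs,
  FUN = mean
)

print(genre_means)
```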
The massive dips in the plot are caused by the small number of observations around 1960, as the bar plot below shows.
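The number of observations per release year can be checked directly. This is a minimal sketch that assumes the year is the first four characters of `track_album_release_date` (some dates may contain only a year); the data are made up.

```r
# Toy data: release dates in mixed precision (full date or year only).
songs <- data.frame(
  track_album_release_date = c("1960-05-01", "2019-06-14", "2019", "2018-11-02"),
  stringsAsFactors = FALSE
)

# Extract the year from the leading four characters.
songs$release_year <- as.integer(substr(songs$track_album_release_date, 1, 4))

# Count tracks per release year; sparse years give unstable averages.
year_counts <- table(songs$release_year)

print(year_counts)
```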
What’s the correlation between energy, loudness and acousticness?
Positive correlation between energy and loudness
Negative correlation between energy and acousticness
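Pairwise correlations of this kind can be computed with `cor()`. The sketch below uses made-up values chosen only to mimic the reported directions; it is not the original analysis.

```r
# Toy data: loudness rises with energy, acousticness falls with energy.
songs <- data.frame(
  energy       = c(0.2, 0.4, 0.6, 0.8, 0.9),
  loudness     = c(-14, -10, -8, -5, -4),
  acousticness = c(0.9, 0.6, 0.4, 0.2, 0.1)
)

# Pearson correlation matrix of the three features.
cor_matrix <- cor(songs)

print(round(cor_matrix, 2))
```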
Does the track duration affect the popularity of the song?
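One hedged way to probe this question is a correlation plus a simple linear fit of popularity on duration. The values below are made up, and the helper column `duration_min` is introduced only for readability.

```r
# Toy data: duration in milliseconds and popularity on a 0-100 scale.
songs <- data.frame(
  duration_ms      = c(180000, 200000, 220000, 240000, 260000, 300000),
  track_popularity = c(55, 40, 62, 48, 57, 45)
)

# Convert milliseconds to minutes so the slope is easier to read.
songs$duration_min <- songs$duration_ms / 60000

# Pearson correlation between duration and popularity.
duration_cor <- cor(songs$duration_min, songs$track_popularity)

# Linear model: expected change in popularity per extra minute.
fit <- lm(track_popularity ~ duration_min, data = songs)
slope <- coef(fit)[["duration_min"]]

print(duration_cor)
print(slope)
```

A correlation near zero and a slope that is small relative to the spread of popularity would be consistent with duration having no clear effect.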
Pop and Latin are the most popular genres.
Higher danceability and valence are associated with higher popularity.
Energy and loudness are positively correlated.
Energy and acousticness are negatively correlated.
Release date does not have a clear effect on the track popularity.
Track duration does not have a clear effect on the track popularity.