
Exploratory Data Analysis (EDA): Taylor Swift Spotify Data Sets

Regina Candra Dewi

Updated: Dec 26, 2022



Taylor Swift is the music industry.

Everyone who knows me personally knows that I love Taylor Swift's songs. But that's not enough. I need to tell everyone: my internet friends, my future employers (*wink*), or just random strangers who come across my page on the internet, that I love her songs!


That's why I couldn't resist using the Taylor Swift Spotify datasets that I discovered on Kaggle as material for my data analyst portfolio.


Okay, so what am I gonna present here? Spoiler alert: there will be a lot. Taylor Swift is a versatile singer/songwriter who is known for her wide range of genres. Even among fans, which songs and albums are her best is still up for debate (fyi she has 10 original albums, excluding the re-recorded and deluxe versions).



Here are the questions that I'm keen to answer using the Spotify datasets:

  • What are her most popular albums on Spotify?

  • Which feature best characterizes each of her albums?

  • Is there any feature that correlates with the popularity of her albums and songs?



Get Close with Taylor Swift Spotify Dataset

Although Taylor officially has "only" 10 original albums, her Spotify profile lists a whopping 46 albums with 916 songs. This number of course includes the deluxe versions, karaoke versions, radio versions, and the famous re-recorded versions (the Taylor's Versions, click here for TV explanation).


For the cleaned version, I only included the original recordings, the deluxe versions, and the Taylor's Version albums. This new dataset has 17 albums and 319 songs.

The Spotify dataset also includes a `popularity` variable to indicate (well, you guessed it) the popularity of each song on a scale from 0 to 100. Based on the two datasets I just described, the average popularity of all her songs available on Spotify is 40.9, while the average popularity for the cleaned version is 65.6. Mind you, I do not know the exact threshold that Spotify uses to call a song popular, but I'm guessing 65.6 is quite high. This is, of course, to be expected from the 2nd most streamed artist on Spotify, duh.
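As a rough sketch of how the two averages can be compared with pandas: the rows below are made-up placeholders (not the real Kaggle data), and the column names `album` and `popularity` as well as the `albums_to_keep` list are assumptions for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; these are NOT the real Spotify values.
songs = pd.DataFrame({
    "album": ["Album A", "Album A", "Album B",
              "Album A (karaoke)", "Album A (karaoke)"],
    "popularity": [90, 88, 80, 20, 22],
})

# Hypothetical list of albums that stay in the cleaned subset
# (original, deluxe, and Taylor's Version albums in the real analysis).
albums_to_keep = ["Album A", "Album B"]
cleaned = songs[songs["album"].isin(albums_to_keep)]

# Simple unweighted mean of the popularity column for each dataset.
mean_all = songs["popularity"].mean()
mean_cleaned = cleaned["popularity"].mean()

print(f"all songs: {mean_all:.1f}, cleaned: {mean_cleaned:.1f}")
```

Dropping the low-popularity karaoke rows pulls the average up, which is the same effect seen when going from the full dataset to the cleaned one.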





#1 What are her most popular albums?


Taylor Swift has a lot of record-breaking albums, but for the purposes of this analysis, I will only rank them based on the Spotify datasets. It's difficult to truly define her best and most popular album (even among fans, it's debatable).



It is not surprising that "Midnights", both the original version and the 3AM Edition, ranked 1st and 2nd among her most popular albums. The album has broken records (some of which were previously held by TaySwift too), such as Midnights songs occupying all top 10 spots on the Billboard Hot 100. Many fans speculate that this is because Taylor had successfully built momentum. It started with the surprise release of her indie-folk sister albums, folklore & evermore, which expanded her fanbase to indie music enthusiasts, followed by the releases of Fearless and Red (Taylor's Version), which boomed on TikTok & garnered a new Gen-Z following. It is then no surprise that many people were anticipating Midnights weeks before its release.




#2 Which 'features' best characterize each of her albums?


We can't pin down a single characteristic for Taylor Swift songs, a fact that the fans delightfully celebrate. Taylor Swift has consistently ventured and reinvented herself with each of her new albums over the course of her >15-year career in the music industry. So, based on the available song features provided by this dataset, can we try to identify at least one feature that best characterizes each of her albums?


I use four features: `acousticness`, `danceability`, `energy`, and `valence`.

As each album has a different number of songs, I calculated the average score of these four features for each album. I tried to illustrate the way I calculate things in this chart:


It is a simple average method in which I consider all songs to be equal regardless of popularity score (I'll be doing analysis at the song level anyway, so I didn't use weighted average).



After determining the average score for all features in each album, I checked which of these four features has the highest score for each album. For example, the album Speak Now has the following average scores: acousticness (0.5), danceability (0.6), energy (0.5), and valence (0.3). Because danceability received the highest score, I conclude that this feature best characterizes the Speak Now album.
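The two steps above (average per album, then pick the highest-scoring feature) can be sketched with a pandas `groupby` and `idxmax`. The album names and numbers are made-up placeholders, and the column names are assumed to match the feature names used in this post.

```python
import pandas as pd

FEATURES = ["acousticness", "danceability", "energy", "valence"]

# Made-up song rows for two placeholder albums (not real Spotify values).
songs = pd.DataFrame({
    "album":        ["Album A", "Album A", "Album B", "Album B", "Album B"],
    "acousticness": [0.2, 0.4, 0.8, 0.9, 0.7],
    "danceability": [0.7, 0.6, 0.5, 0.4, 0.6],
    "energy":       [0.6, 0.5, 0.3, 0.4, 0.2],
    "valence":      [0.3, 0.2, 0.5, 0.4, 0.3],
})

# Step 1: simple (unweighted) average of each feature per album.
album_means = songs.groupby("album")[FEATURES].mean()

# Step 2: the feature with the highest average is the album's "top feature".
album_means["top_feature"] = album_means[FEATURES].idxmax(axis=1)

print(album_means.round(2))
```

In this toy data, Album A comes out as a `danceability` album and Album B as an `acousticness` album. Note that `idxmax` returns the first column in case of an exact tie.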


Here are the definitions that the dataset's owner provided:


I couldn't help but think that most of her albums would have low valence scores (she has a lot, like... a lot of sad songs), especially albums like folklore and evermore. And perhaps reputation would score high in danceability & energy.


Here's the result:


Table.1 Taylor Swift's albums and their features, sorted from most to least popular


A lot of insights can be drawn from the single table above!


First, the easiest one to spot: most of her most popular albums have `danceability` as the top feature. That is understandable, given that the majority of popular songs are something the general public can dance to. This makes folklore the oddball among the top 5 albums, as it is the only one with `acousticness` as the top feature (no wonder, then, that folklore is the least popular of the top 5).


Second, notice that none of these top albums has a valence score greater than 0.5 (reminder: the lower the score, the sadder, more depressed, or angrier the song sounds). This confirms the assumption that... most of her songs are sad songs (and I love sad songs). "Unsurprisingly," Lover (0.48) has the highest valence score. Taylor herself even stated that in Lover she wanted to explore the happiness and joy of being in love, a rare sight in her previous albums. Fans also speculate that Lover could exist thanks to her healthy & happy relationship with the British actor Joe Alwyn (Siri, play London Boy on Spotify now).


Third (or more like 2.1), surprisingly, the album with the lowest valence score is Midnights (0.22 for the original version and 0.28 for the 3AM version). Notice too that reputation is among the three albums with the lowest valence scores (0.29). These three albums are also her most popular albums on Spotify. It's quite interesting how, despite having low valence scores, these albums have high danceability scores.

I think Taylor has a knack for expressing her sadness, anger, depression and even happiness in danceable songs. She's the master of creating pop songs from any emotion that a human can feel, I guess.


Also, to my surprise, evermore and folklore score high on valence! It turns out that although I cried a lot while listening to these two albums, the sound and melody are not sad enough. In my defense, the lyrics are sad enough, something that is not considered in this dataset (don't believe me? TaySwift doesn't write "Friends break up, friends get married. Strangers get born, strangers get buried. Trends change, rumors fly through new skies but I'm right where you left me" for nothing).




#3 Correlation, correlation. Is there any feature that is highly correlated with the popularity of her albums & songs?


Almost all of Taylor Swift's top albums have received high marks in the danceability category. Is it, however, associated with popularity? Using the same datasets as before, I created a heatmap of the feature correlations for all of her albums. I used the Pearson correlation method in this case.


At first glance, we can see that danceability has a moderate positive correlation with popularity (r = 0.4) (see the orange box in the top row, 3rd column) and valence has a high negative correlation with popularity (r = -0.58)! So it appears that danceable albums tend to be more popular, whereas happier albums tend to be less popular.
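For readers who want to reproduce this kind of album-level correlation matrix, here is a minimal sketch using `DataFrame.corr`. The five placeholder albums and their values are invented for illustration, so the coefficients will not match the ones reported above.

```python
import pandas as pd

# Made-up album-level averages (one row per placeholder album).
albums = pd.DataFrame(
    {
        "popularity":   [85, 80, 75, 70, 60],
        "danceability": [0.65, 0.62, 0.60, 0.55, 0.58],
        "valence":      [0.25, 0.30, 0.40, 0.45, 0.50],
    },
    index=["Album A", "Album B", "Album C", "Album D", "Album E"],
)

# Pairwise Pearson correlation between every pair of columns.
corr = albums.corr(method="pearson")

# The row/column for popularity shows how each feature relates to it.
print(corr["popularity"].drop("popularity").round(2))
```

The resulting `corr` matrix is what a heatmap visualizes; for example, it can be passed to `seaborn.heatmap(corr, annot=True)` to get a chart similar to the one in this post.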


However... I think it is more accurate to analyze the correlation in song-level data (that's why I did not test the statistical significance of the album-level data, sorry).



Analyzing Song-Level Data


Okay, let's take a look at some of her most popular songs! I've listed her (sort of) top ten songs below. Because many of her songs are tied in the popularity score, I assigned the same rank to songs with the same score and proceeded from there.
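One way to assign the same rank to tied scores and continue from there is pandas' dense ranking. This is a sketch under assumptions: the song names and scores are placeholders, and `method="dense"` is my reading of "proceeded from there" (the next distinct score gets the next rank, with no gaps).

```python
import pandas as pd

# Placeholder songs with tied popularity scores (not real values).
songs = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C", "Song D", "Song E"],
    "popularity": [95, 93, 93, 90, 90],
})

# Dense ranking: ties share a rank, and the next score gets the next integer.
songs["rank"] = (
    songs["popularity"].rank(method="dense", ascending=False).astype(int)
)

print(songs.sort_values(["rank", "name"]))
```

With `method="min"` instead, the ranks would be 1, 2, 2, 4, 4, so the choice matters when reading a "top ten" list that contains ties.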


Quickly skimming, we can see that songs from the Midnights album dominate the popularity ranking. It is still unclear what characteristics correlate with the popularity of the songs.


Table.2 Taylor Swift's most popular songs on Spotify



Okay, let's recalculate the correlation with the songs-level data using the Pearson Correlation Method (while also checking the statistical significance, I promise).


The heatmap chart for the correlation on Taylor Swift songs is shown below. At first glimpse, all features appear to have a weak correlation with popularity. The highest one is acousticness, which has a correlation coefficient of 0.13. Now, let's get to the important part: calculating the p-values for each of the relationships.





When I checked the p-values, I discovered that acousticness and valence have a statistically significant relationship with popularity at the 95% confidence level.


Acousticness has a positive relationship with popularity (r = 0.12, p < 0.001), which means that the more acoustic a song is, the more likely it is to be popular.


The finding on the relationship between valence and popularity is consistent with the conclusion from the album-level data above, indicating that the happier a song sounds, the less likely it is to be popular (r = -0.06, p = 0.038).



Table.3 Correlation Analysis on Taylor Swift's Songs: Correlation Coefficient and p-value.
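A table like this can be built with `scipy.stats.pearsonr`, which returns both the correlation coefficient and a two-sided p-value. The sketch below runs on synthetic data generated with a fixed seed, so its numbers are unrelated to the real results; the column names and the 0.05 cutoff are assumptions matching the 95% confidence level used here.

```python
import numpy as np
import pandas as pd
from scipy import stats

FEATURES = ["acousticness", "danceability", "energy", "valence"]

# Synthetic song-level data for illustration only (not the Spotify data).
rng = np.random.default_rng(0)
n_songs = 200
songs = pd.DataFrame({f: rng.uniform(0, 1, n_songs) for f in FEATURES})
songs["popularity"] = (
    60 + 10 * songs["acousticness"] + rng.normal(0, 8, n_songs)
)

# Pearson r and two-sided p-value of each feature against popularity.
rows = []
for feature in FEATURES:
    r, p_value = stats.pearsonr(songs[feature], songs["popularity"])
    rows.append({
        "feature": feature,
        "r": r,
        "p_value": p_value,
        "significant": bool(p_value < 0.05),  # 95% confidence level
    })

summary = pd.DataFrame(rows).set_index("feature")
print(summary.round(3))
```

A small p-value only says the correlation is unlikely to be zero; with a few hundred songs, even a weak coefficient such as 0.06 can pass the 0.05 cutoff.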


You may be wondering, where is danceability? Isn't that a feature that came up frequently during my analysis of the album-level data? Well..... here's the explanation for the different findings: when I looked at the standard deviation (SD) of danceability, I discovered that it is the lowest (and quite low) of all the features (see table below).


Table.4 Mean and Standard Deviation for each feature
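A mean and SD summary like this one can be computed in a single `agg` call. The values below are placeholders chosen so that danceability is tightly clustered and valence is spread out; note that pandas' `std` is the sample standard deviation (`ddof=1`), which is an assumption about which SD is reported here.

```python
import pandas as pd

FEATURES = ["acousticness", "danceability", "energy", "valence"]

# Placeholder song-level values (not real Spotify data).
songs = pd.DataFrame({
    "acousticness": [0.2, 0.5, 0.8, 0.4, 0.6],
    "danceability": [0.60, 0.62, 0.58, 0.61, 0.59],
    "energy":       [0.5, 0.7, 0.6, 0.4, 0.8],
    "valence":      [0.1, 0.9, 0.3, 0.7, 0.5],
})

# One row per feature, with its mean and sample standard deviation.
summary = songs[FEATURES].agg(["mean", "std"]).T

print(summary.sort_values("std").round(3))
```

In this toy data danceability has the smallest SD and valence the largest, mirroring the pattern described in the text.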



Take a look at the charts below for a better visualization of the standard deviation. Notice that for danceability the data is clustered near the mean (average) line, whereas for valence it is more spread out. This means that most of Taylor Swift's songs are actually quite danceable (a common feature of pop songs), so danceability barely differs from song to song and cannot do much to separate the popular songs from the less popular ones.





Analyzed with Python in Jupyter Notebook using the libraries pandas, numpy, seaborn, matplotlib, and scipy.stats.

See the .ipynb file on my github repo here









Special Bonus: Fan Edition

  • How popular is the 10-minute version of ATW?

It is the most popular of all the ATW versions!


  • ME! is hated by many fans, but how popular is it?

Among the songs on the Lover album, it ties with Afterglow at a popularity score of 76. That's even higher than Cornelia Street....... (weird, right??)


  • Ranking the saddest Taylor Swift songs: are they truly from the folklore and evermore albums?

I was wrong again... most of the songs with the lowest valence scores are actually from Midnights and reputation!






©2022 by reginacdewi
