Spotify Playlist Classification with Logistic Regression

Original article was published by Marcelo Dias on Artificial Intelligence on Medium

Being a big music fan with a little touch of OCD, I feel the need to organize my songs. I keep two main playlists: one I listen to when I am in a happy mood, and another for when I feel sad. I have done this for about three years, and to decide which playlist each song belongs to, I had to listen to it and classify it by my own standards. However, when I started learning a bit about Data Science, I realized that this could be automated.

First, I had to build the dataset. Then I did some Data Exploration to analyze it and find patterns. Afterward, I did some Feature Selection to pick the features with the most impact on my model. After that, I searched for the best Classification Model for my project. Finally, I had to understand the Result Metrics to properly interpret my results.

Building the Dataset

To build a classification model, a labeled dataset is necessary. Fortunately, I already had one, since I had been labeling my songs for three years by manually sorting them into the two playlists. So we have the label 1 for my happy playlist and the label 0 for the sad one.

I used the Spotify API to build the dataset, fetching each song's audio features, which are used as the inputs for the classification. Learn more about the features here.
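The labeling step can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the `FEATURES` list, the `label_tracks` helper, and the sample values are all made up, and the actual Spotify API call is left out, so the dicts only stand in for what the API would return.

```python
# Hypothetical subset of audio-feature names; adjust to the columns you keep.
FEATURES = ["danceability", "energy", "acousticness", "valence"]


def label_tracks(tracks, label):
    """Turn raw audio-feature dicts into flat rows tagged with a playlist label."""
    rows = []
    for track in tracks:
        # Keep only the numeric features, dropping fields such as the track id.
        row = {name: track[name] for name in FEATURES}
        row["label"] = label  # 1 = happy playlist, 0 = sad playlist
        rows.append(row)
    return rows


# Made-up values standing in for the audio features of one song per playlist.
happy = [{"id": "a", "danceability": 0.8, "energy": 0.9, "acousticness": 0.1, "valence": 0.9}]
sad = [{"id": "b", "danceability": 0.4, "energy": 0.3, "acousticness": 0.8, "valence": 0.2}]

dataset = label_tracks(happy, 1) + label_tracks(sad, 0)
```

Each row in `dataset` then holds the selected features plus a `label` column, which is the shape a classifier expects.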

The dataset ended up like:

Dataset’s first 5 rows

Data Exploration

Data exploration is extremely useful for understanding the dataset. I started by building a heatmap to see how the features relate to each other:

Features heatmap

From here, we can already see that energy and acousticness deserve attention, since they are strongly negatively correlated: songs with high energy tend to have low acousticness.
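Each cell of a heatmap like this is typically a correlation coefficient between two features. As a rough sketch, assuming the heatmap shows Pearson correlations (the article does not say which coefficient was used), one cell could be computed like this. The `energy` and `acousticness` values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values for five songs: energy falls as acousticness rises.
energy = [0.9, 0.8, 0.6, 0.3, 0.2]
acousticness = [0.1, 0.2, 0.35, 0.7, 0.9]

r = pearson(energy, acousticness)  # close to -1: strong negative correlation
```

A value near -1 means one feature goes down as the other goes up, a value near 1 means they rise together, and a value near 0 means there is no linear relationship.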

Besides the heatmap, plotting a histogram for each feature is also a good way to see how it is distributed. For example, valence (which measures the 'happiness' of a song) has a different distribution for each label:

Valence histogram
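The counts behind a per-label histogram can be sketched without any plotting library. This is only an illustration with made-up valence values; the `histogram` helper and the choice of five bins are assumptions, not the article's settings.

```python
def histogram(values, bins=5):
    """Count values in equal-width bins over [0, 1], the range of valence."""
    counts = [0] * bins
    for v in values:
        index = min(int(v * bins), bins - 1)  # a value of exactly 1.0 goes in the last bin
        counts[index] += 1
    return counts


# Made-up (valence, label) pairs; 1 = happy playlist, 0 = sad playlist.
rows = [(0.91, 1), (0.75, 1), (0.66, 1), (0.22, 0), (0.35, 0), (0.15, 0)]

by_label = {
    label: histogram([valence for valence, lab in rows if lab == label])
    for label in (0, 1)
}
```

With these sample values, the sad songs pile up in the low bins and the happy songs in the high bins, which is the kind of separation the histogram makes visible.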

Feature Selection

To select the features that matter the most to the model, I decided to use the chi2 (chi-squared) test. This test checks how strongly each feature is associated with the label by calculating a p-value for each feature. Learn more about the chi2 test here.
The p-value here tells us, in a very rough manner, how surprising the observed relationship would be if the feature had nothing to do with the label. We start from the hypothesis that the feature is not relevant, and a low p-value is evidence against that hypothesis. Therefore, a high p-value suggests that the feature has little influence on the result, and a low p-value suggests that it has a lot. Learn more about p-values here.
So, we must choose a threshold below which the p-value counts as relevant enough. Usually, this threshold is set at 0.05 (a 5% significance level, which corresponds to 95% confidence). So, we filter out every feature with a p-value higher than 0.05.
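The filtering step can be sketched with the standard library alone. This is an assumption-laden illustration rather than the article's code: in practice a library scorer such as scikit-learn's `chi2` would normally be used, and the `chi2_p_value` helper, the `tempo_scaled` feature name, and the sample values are hypothetical. The test only makes sense for non-negative features, and the `erfc` shortcut below is valid only for two classes (one degree of freedom).

```python
from math import erfc, sqrt


def chi2_p_value(values, labels):
    """Chi-squared p-value for one non-negative feature against binary labels."""
    total = sum(values)
    n = len(labels)
    stat = 0.0
    for cls in (0, 1):
        # Observed: how much of the feature's total falls in this class.
        observed = sum(v for v, y in zip(values, labels) if y == cls)
        # Expected: the share it would get if the feature ignored the label.
        expected = total * labels.count(cls) / n
        stat += (observed - expected) ** 2 / expected
    # With two classes there is 1 degree of freedom, so the chi-squared
    # survival function reduces to erfc(sqrt(stat / 2)).
    return erfc(sqrt(stat / 2))


labels = [1] * 20 + [0] * 20
features = {
    "energy": [0.9] * 20 + [0.1] * 20,  # differs strongly between playlists
    "tempo_scaled": [0.5] * 40,         # identical in both playlists
}

ALPHA = 0.05  # the threshold discussed above
p_values = {name: chi2_p_value(vals, labels) for name, vals in features.items()}
selected = [name for name, p in p_values.items() if p <= ALPHA]
```

Here only `energy` survives the filter, because the constant feature carries no information about the label and gets a p-value of 1.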

The p-value for each feature

As we can see, some seemingly key features were kicked out of the model, such as danceability, which is pretty strange, since it's common knowledge that a sad song is usually less danceable than a happy one. That shows how my brain tells those two feelings apart. Sad for me means something more acoustic with less energy, but a song can be sad and danceable at the same time; we can see that in the scores of acousticness and energy.

Classification Model

To choose the model, it was necessary to look at the problem and, from there, work out which model fits best. There are plenty of models for classification, such as Naive Bayes, Logistic Regression, and SVC. Since I am classifying into only two categories, I don't need a more complex algorithm such as SVC, which felt a bit 'overkill' here. Since this is a simple project to be used only by me, I don't need a super-fast model that can work with just a small amount of data, like Naive Bayes. The main limitation of Logistic Regression, in its basic form, is that it only separates two categories, but that is exactly what I need, so I decided to go with Logistic Regression.

To understand this model, the best way, in my opinion, is to compare it to Linear Regression.

Linear Regression and Logistic Regression comparison (source)

As we can see in the image, Logistic Regression works similarly to Linear Regression. However, instead of fitting a straight line, it fits an S-shaped curve (the sigmoid function), whose output lies between 0 and 1 and can be read as the probability of a label.
Since the results we want are discrete and not continuous, Logistic Regression is better suited to predicting this categorical value.
Learn more about the Logistic Regression here.
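To make the sigmoid idea concrete, here is a minimal from-scratch sketch. It is not the article's model: it trains with plain stochastic gradient descent instead of a library solver, and the `fit` and `predict_proba` helpers, the learning rate, the epoch count, and the sample songs are all assumptions chosen for illustration.

```python
from math import exp


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + exp(-z))


def predict_proba(weights, bias, x):
    """Probability of label 1: a linear combination passed through the sigmoid."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def fit(X, y, lr=0.5, epochs=2000):
    """Fit the weights with stochastic gradient descent on the log loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            # (prediction - target) is the gradient of the log loss w.r.t. z.
            error = predict_proba(weights, bias, x) - target
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Made-up [energy, acousticness] rows; 1 = happy playlist, 0 = sad playlist.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.3, 0.8], [0.2, 0.9], [0.4, 0.7]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = fit(X, y)
predictions = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in X]
```

The linear part is the same as in Linear Regression. The only difference is the sigmoid on top, which turns the score into a probability that is then cut at 0.5 to pick a playlist.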

Result Metrics

After training the model, I had to understand the results. I used the classification_report function from sklearn. It reports 4 metrics for each label:

Precision, the true positives divided by all predicted positives, shows how many of the songs assigned to a label really belong to it.
Recall, the true positives divided by all actual positives, shows how many of the songs that really belong to a label were found.
F1-score, the harmonic mean of precision and recall, gives a more general view of the model's performance.
Support, the number of events (in this case, songs) for each label.
Learn more about these metrics here.
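The four metrics can be worked through by hand on a tiny example. This is a sketch with made-up labels and predictions, not the article's results, and the `report` helper is hypothetical; it only mirrors the standard definitions of the per-label numbers in a classification report.

```python
def report(y_true, y_pred, positive):
    """Precision, recall, f1 and support, treating `positive` as the label of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}


# Made-up data: four happy songs, two sad songs, one sad song misclassified as happy.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0]

per_label = {label: report(y_true, y_pred, label) for label in (0, 1)}
```

In this toy case label 1 gets a precision of 0.8 and a recall of 1.0, while label 0 gets a precision of 1.0 but a recall of only 0.5, because one of its two songs was missed.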

Classification report of the model

I chose these metrics for my results because they make the false positives and false negatives of the classification easy to see. Moreover, the f1-score summarizes this information in one number, so we can clearly see the efficiency of the model.


As we can see from the classification report, the model performs amazingly for the playlist labeled as 1. However, the recall for playlist 0 is quite low, bringing the f1-score down a bit. That was probably because of the number of songs in the training set: label 1 has approximately twice as many songs as label 0. In general, the results were pretty good, considering this was my first time messing with Data Science.

Despite the positive results, there are plenty of things to improve. Testing different models, such as Linear SVC or even Naive Bayes, would have been a better way to validate which model is best. More basic statistical exploration of the dataset could have brought more insights. Even clustering the dataset could have provided a more visual approach. All those limitations will be covered in the next release, stay tuned.

This project was an amazing learning experience and really helped me to classify my music. I truly hope that this can help anyone starting with Data Science just like me! Any questions and tips would be greatly appreciated!

Check the whole code in my repository here.

Contact: LinkedIn