How to Extract Any Artist’s Data Using Spotify’s API, Python, and Spotipy

Original article can be found here (source): Artificial Intelligence on Medium

Writing the Code

I highly recommend following along using Repl, it is a simple to use yet powerful online IDE that works great and requires no setup. Perfect for a one-time data extraction or dataset creation project.

Import libraries

Let’s start by calling the necessary libraries that we need. We’ll be using Spotipy, pandas to create a dataframe and save our dataset, and time to pause the execution of the loop.

Connect to the API

Next, we need to authenticate and connect to Spotify’s API. To do so, we need our “Client ID” and “Client Secret”.

In the code above, replace the Client ID and Client Secret variables with your own and make sure they are inside quotes.

Retrieve IDs for each track

As I mentioned earlier, I’ll be extracting Kehlani’s albums and singles from a playlist I created which is a collection of 54 songs (~3 hours), containing every single and every song from every album she has released that is currently on Spotify, updated with her new singles.

Now we’ll write a function to get the IDs for each track of this playlist.

At the bottom of this code, where it says ids = getPlaylistTrackIDs, the two variables separated by a comma in quotes will be the username (found in the URL) of the person who created the playlist, and then the playlist URI which you can find by hitting the setting button on the playlist where you’d find the share link.

Now, let’s check what we have so far by running the length of the ids we grabbed to see if it matches the 54 songs we have on the playlist.


This is the result:

54['7AiMnJSODcJoKDejQ3mnoJ', '73C4vh7W8u41Vll5HvBqv7', '5kYZbBLAGrrhFKNbOs6D95', '18z6OV5lknJmKnZi7aA1zH', '1gHtbcRP4tz1O1NsxPpBea', '3kJudfRjZMItdFYVCCaSi6', '6mzaCRuLTRiz1caGOum3zT', '5h4Uqkh9RpRZwm5ADLh5uj', '3rGew9pmFEmGD9nZ12F1tN', '0dYDmow4l5hbPs5E6QLMSC', '1B3jkf6CyuiF8CQcKlUx9y', '6DkmFhzJrkVhDlcgcEy7Pc', '5dKy6Cgv6xwiRY3j3AJ7Uq', '0oz4ZqHuUaz3uEkP2vD0u8', '389hKTL3ZBPPWP3VuXfEyv', '5cw9s2zGrbny2M2p3WRmGm', '3YaMX9Cf68dxiG6RKo0pSY', '1cAL4sFzXXRMbpZnTPa7Zi', '4w5BVeKJFCj2rrrEy31s0n', '4ta2AWru6ldjg1aHzww0aK', '45DJ0PbKPdbslnyrcM80HN', '7yNu82yd6dYmGQ0H1q0jKo', '4v4HwTfMPslhWAnJxIXchn', '6XptjfnUvLfejptpjPRhCT', '1y2SK8EjL3WSnJvJEMWOoq', '4UMp46x46Zmu9OEr8m3Gl2', '7nb50hgKYhnHJLHKZ7qiKO','5y5OzukBTl0yTRMEdNmApJ', '3Hdl3BEFb1IEbL0Jq53enx', '0lsC0OkBgiLYbSsoHOzMnr', '0Pm1BZp4MpoMKkNxIXCfAu', '2droOB3xlZkhgfUM0owDTq', '2Nd2HLWrIq1DcNMiYPTQUC', '4j644tViOFAf4i0BYT12R8', '4k6hX9RKD096K1NCjjJZLc', '0aSW5EMeNnQSMJQ8QN3zIW', '0tkmYNfaEaH9HpR59ApRtE', '0kas95RruYRVqrOb07rgkh', '4BOikd4oZjOYMde9AXfrTo', '6ZRuF2n1CQxyxxAAWsKJOy', '1EGrDTfEuAiRzRdxlblpET', '32s2Dn9EVvO2f85MrpRoBV', '4jM3c9KLTO9iZPm9A7neiL', '6GCRnf1W9OKxok9fvNp3pz', '1Zm9qGPQkTAOBiVpGSnJUq', '3ucRKbRlikYHyoI17gfR0c', '6HUO25AttZZCoKAY0vUVtc', '5Qr7StTFbXhgHt9JlqJx0I', '1QzC4y8h6WFxHE4KlokhVr', '1xz905v9g71heS0BQQM9re', '5QTdOvIF2ehBMZpSIIGzIo', '7IJiDYPZy2AIJn3YVHhvD4', '23wuZgeX1oyJ43QYOTo9s7', '0tBBihoEWiWKqsO5ZlCbwS']

Create a function used to grab all track info from IDs

Now that we have a list of every track ids we need, we will now write a function that will be used to grab the track information such as track name, album, release date, length, and popularity, along with the list of audio features that Spotify’s API has to offer from each track ids.

Loop over tracks and apply the function

We’ll now loop over the tracks — applying the function we created— and save the dataset to a .csv file using pandas.

My raw dataset looks like this: