Extracting Audio Files from an API & Storing Them in a NoSQL Database

The original article was published in Artificial Intelligence on Medium.


A simple way to store and extract audio files (.wav from an API) in MongoDB

Follow the steps below to store and extract audio files in a MongoDB database using PyMongo. Let's start.

Step 1: Import the Required Libraries

from pymongo import MongoClient
import requests
from io import BytesIO
import numpy as np  # used in the reading step to rebuild the array
from scipy.io.wavfile import read, write

Import all the libraries and classes used in this tutorial. If any are missing, install them first with pip (pymongo, requests, numpy, scipy).

Step 2: Set Up the Database Connection

cluster = MongoClient(
"mongodb+srv://<USERNAME>:<PASSWORD>@cluster04x3li.mongodb.net/test?retryWrites=true&w=majority"
)

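If you are storing the audio in the cloud (MongoDB Atlas), replace <USERNAME> and <PASSWORD> with your database user's credentials. To store it on your local machine instead, use the command below to connect to a local MongoDB server on the default port.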

cluster = MongoClient('localhost', 27017)

If you need help setting up MongoDB in the cloud, follow the link.

Step 3: Create Your Database and Collection

Next, specify which database and collection to use with the commands below. MongoDB creates both lazily: they appear on the server only when the first document is inserted into the collection.

# database
db = cluster["aiproduct"]
# collections
collection = db["audiofile"]

Once the database is set up, we can move on to the audio part.

Step 4: Get the Audio File

In my case, the audio comes from a Twilio call recording, which I fetch with an HTTP GET request (using requests) to Twilio's Recordings API resource.
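The request itself is not shown above, so here is a minimal sketch of what it might look like. It assumes Twilio's REST URL layout for a recording (.../Accounts/{AccountSid}/Recordings/{RecordingSid}) and that appending .wav returns the WAV rendition; recording_url and fetch_recording are hypothetical helper names, and the account SID, auth token and recording SID are placeholders you would take from your own Twilio console.

```python
import requests

# Assumed Twilio endpoint for a recording's media; the ".wav" suffix
# requests the WAV rendition of the recording.
TWILIO_RECORDING_URL = (
    "https://api.twilio.com/2010-04-01/Accounts/{account_sid}"
    "/Recordings/{recording_sid}.wav"
)


def recording_url(account_sid: str, recording_sid: str) -> str:
    """Hypothetical helper: fill in the recording URL template."""
    return TWILIO_RECORDING_URL.format(account_sid=account_sid, recording_sid=recording_sid)


def fetch_recording(account_sid: str, auth_token: str, recording_sid: str) -> requests.Response:
    """Hypothetical helper: GET the recording using HTTP Basic auth (SID + token)."""
    response = requests.get(
        recording_url(account_sid, recording_sid),
        auth=(account_sid, auth_token),
        timeout=30,
    )
    # Fail loudly on 401/404 rather than handing an error page to the WAV parser
    response.raise_for_status()
    return response
```

Calling response = fetch_recording("ACxxxxxxxx", "your_auth_token", "RExxxxxxxx") gives you the response object whose .content the next step decodes.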

Step 5: Extracting the Data from the API

rate, data = read(BytesIO(response.content))

The response body (response.content) is raw bytes, so we wrap it in BytesIO to give SciPy the file-like object that read() expects. read() returns the sample rate as a Python integer (in samples per second) and the audio samples as a NumPy array: 1-D for mono audio, or 2-D with one column per channel for multi-channel audio.
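To try this step without a Twilio account, you can generate a WAV file in memory and decode it with the same read(BytesIO(...)) call. In the sketch below, make_test_wav_bytes and decode_wav are hypothetical helpers, and the 440 Hz tone at 8 kHz simply stands in for response.content.

```python
from io import BytesIO

import numpy as np
from scipy.io.wavfile import read, write


def make_test_wav_bytes(rate: int = 8000, seconds: float = 0.5) -> bytes:
    """Hypothetical helper: build a 16-bit mono WAV (440 Hz tone) as raw bytes."""
    t = np.arange(int(rate * seconds)) / rate
    tone = (0.5 * np.iinfo(np.int16).max * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
    buffer = BytesIO()
    write(buffer, rate, tone)  # write() also accepts a file-like object
    return buffer.getvalue()


def decode_wav(content: bytes):
    """Same call as above: wrap the raw bytes in BytesIO so read() can parse them."""
    rate, data = read(BytesIO(content))
    return rate, data


rate, data = decode_wav(make_test_wav_bytes())
print(rate, data.dtype, data.shape)  # 8000 int16 (4000,)
```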

Step 6: Storing the Data in MongoDB

collection.insert_one({
    "rate": rate,           # sample rate in samples/sec
    "data": data.tolist(),  # NumPy array -> plain Python list
})

Because the recording is smaller than 16 MB, MongoDB's maximum document size, we can store it as a single document with PyMongo's insert_one() method. PyMongo cannot encode a NumPy array as BSON, so we first convert it to a plain Python list with tolist().

The audio data is now stored in the database.

Step 7: Reading the Data from MongoDB

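for result in collection.find({}):
    # Rebuild the NumPy array; int16 assumes 16-bit PCM samples
    audio = np.array(result["data"], dtype=np.int16)
    # Name each file after its document _id so files don't overwrite each other
    write(f"audio_{result['_id']}.wav", result["rate"], audio)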

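The audio is stored in the database as a list, so to turn it back into an audio file we first convert the list into a NumPy array that SciPy can write. SciPy's write() function then saves the array, together with the sample rate, as a WAV file. The dtype must match the original samples (here int16, i.e. 16-bit PCM), or the resulting file will have a different sample format.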

Congratulations, you now know how to store and extract audio files in a NoSQL database.