Source: Deep Learning on Medium
Google Colab: Work with large datasets without downloading them!
Yeah, you read that right! By the end of this post you will be able to train your model on huge datasets without downloading them to your local machine, Google Drive, cloud storage, etc.
When starting out, we usually work with built-in datasets like those shipped with Keras and sklearn (MNIST, CIFAR-10, and so on). But the real challenge is working with messy, real-world datasets and getting the best out of them.
This sounds pretty nice, but let me tell you: we also used to download those large datasets onto our local machines or cloud services, which is very irritating for developers like us.
I am working through Udacity's Deep Learning Nanodegree course. One of its projects, "Dog Breed Classification", requires downloading a dataset of about 1.05 GB, and after that a VGG16 bottleneck-features file named DogVGG16Data.npz of about 860 MB. That is time-consuming and, the author would add, also increases carbon emissions.
So I decided to find a way to avoid downloading these files myself and to train my model quickly, without wasting time.
So you can follow these steps:
Step 1: Open your Jupyter Notebook
Right now I am using Google Colab to train my models, as I am a beginner.
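Since the zip file gets extracted into Google Drive in a later step, the Drive has to be mounted in the Colab runtime first. A minimal sketch; the `google.colab` module only exists inside an actual Colab session, so the import is guarded here:

```python
# Mount Google Drive inside Colab so extracted files can be saved there.
# Note: google.colab is only importable in a real Colab runtime.
mounted = False
try:
    from google.colab import drive
    drive.mount('/content/drive')
    mounted = True
except ImportError:
    print('Not running inside Colab; skipping the Drive mount.')
```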
Step 2: Download the dataset using wget
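The download itself is a single wget cell (in Colab, shell commands are prefixed with `!`). A sketch, assuming the dataset lives in the same Udacity S3 bucket as the DogVGG16Data.npz file used later in this post:

```shell
# Fetch the dog images dataset straight onto the Colab VM.
# (URL is assumed from the project's S3 bucket; the "|| echo" only keeps
# the cell from erroring out if run somewhere without network access.)
wget -q https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip \
  || echo "download failed - check your network"
```

In a Colab cell this would be written as `!wget …`.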
This will download the dataset in 2–3 minutes!
Since it is a zip file, I had to extract it into my Google Drive. For this you can write the following code:
from zipfile import ZipFile

file_name = "dogImages.zip"
with ZipFile(file_name, 'r') as zip:
    print('Extracting all the files now...')
    zip.extractall()  # extract everything into the current directory
    print('Done!')
Easy, isn't it? 😛
Step 3: Download into a specific directory
Here I first created a bottleneck_features folder and then downloaded my DogVGG16Data.npz file into it.
To create a folder in Google Drive:
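A folder can be created right from a notebook cell with a shell command (prefixed with `!` in Colab). A minimal sketch; note that the notebook itself uses the absolute path /bottleneck_features, matching the -P destination in the wget command below:

```shell
# Create the folder that will hold the bottleneck features.
# (The post's Colab cell uses the absolute path /bottleneck_features,
# which matches the -P flag in the next wget command.)
mkdir -p bottleneck_features
```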
And in the next cell I downloaded the file itself:
!wget -P /bottleneck_features https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/DogVGG16Data.npz
Step 4: Train your model
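With everything in place, the bottleneck features can be loaded straight from the .npz file with NumPy and fed to a model. A minimal sketch: the 'train'/'valid'/'test' key names are my assumption about the file's layout, and a tiny stand-in file is generated here so the snippet runs on its own:

```python
import numpy as np

# Stand-in for the real file; in the notebook you would load
# '/bottleneck_features/DogVGG16Data.npz' instead (key names assumed).
np.savez('DogVGG16Data_demo.npz',
         train=np.zeros((5, 7, 7, 512), dtype=np.float32),
         valid=np.zeros((2, 7, 7, 512), dtype=np.float32),
         test=np.zeros((2, 7, 7, 512), dtype=np.float32))

bottleneck_features = np.load('DogVGG16Data_demo.npz')
train_vgg16 = bottleneck_features['train']
valid_vgg16 = bottleneck_features['valid']
test_vgg16 = bottleneck_features['test']

# Each sample is a 7x7x512 VGG16 feature map, ready for a small
# classifier head instead of training the whole network from scratch.
print(train_vgg16.shape)
```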
Here is the link to my GitHub repo:
Hope you enjoyed reading this blog.