Import Data into Google Colaboratory



As we saw in my previous story, we can use a GPU/TPU on Google Colab for free. So now we can run our deep learning programs without paying for expensive GPUs on AWS or other cloud platforms.

How to Import Data to Colab?

1. Using wget

Google Colab machines run Debian-based Linux, so the simplest way to download data over a network is wget.

You can upload your files to any server reachable over HTTP(S), then download them from a notebook code cell with the wget shell command (the leading ! tells Colab to run the line in the shell):

!wget http://your_domain/your_file.zip
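If you would rather stay in Python, or want to unpack the archive in the same step, here is a minimal sketch that uses only the standard library (urllib.request and zipfile) in place of wget. It assumes the download is a .zip archive like the placeholder URL above; the function name download_and_extract is hypothetical.

```python
import urllib.request
import zipfile
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def download_and_extract(url: str, dest_dir: str) -> list[str]:
    """Download a .zip archive from `url`, extract it into `dest_dir`,
    and return the names of the extracted members."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local copy after the last path segment of the URL.
    filename = PurePosixPath(urlparse(url).path).name or "download.zip"
    archive_path = dest / filename
    # Save the response body to archive_path (similar to `wget -O`).
    urllib.request.urlretrieve(url, archive_path)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

In a Colab cell you might call download_and_extract("http://your_domain/your_file.zip", "/content/data"); the /content/data folder is an arbitrary choice.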

2. From Local System

If your data is on your local system, run the lines of code below. They display a file-picker button that lets you browse your local directories and upload your data.

from google.colab import files

# Opens a file picker; returns {file name: file contents as bytes}.
uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))

Here, files.upload() returns a dictionary of the uploaded files. The dictionary is keyed by file name, and each value is the uploaded data as bytes.
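Because the uploaded values are raw bytes, they usually need to be decoded before parsing. A minimal sketch, assuming some of your uploads are UTF-8 CSV files; the helper name load_uploaded_csvs and the sample dictionary in the usage note are hypothetical and only mimic the shape of what files.upload() returns.

```python
import csv
import io


def load_uploaded_csvs(uploaded: dict[str, bytes],
                       encoding: str = "utf-8") -> dict[str, list[dict[str, str]]]:
    """Parse every .csv entry of a files.upload()-style dict into row dicts."""
    tables = {}
    for name, data in uploaded.items():
        if not name.lower().endswith(".csv"):
            continue  # skip uploads that are not CSV files
        # Decode the raw bytes into text so the csv module can read them.
        text = io.StringIO(data.decode(encoding))
        tables[name] = list(csv.DictReader(text))
    return tables
```

For example, load_uploaded_csvs({"data.csv": b"name,score\nada,3\n"}) returns {"data.csv": [{"name": "ada", "score": "3"}]}. If you use pandas, pd.read_csv(io.BytesIO(uploaded["data.csv"])) reads the same bytes straight into a DataFrame.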

3. Google Drive

If your data is stored in Google Drive, you can access files from Drive in a number of ways, including:

a. Mounting Google Drive locally

The example below shows how to mount your Google Drive in your virtual machine using an authorization code. Run these lines in your notebook:

from google.colab import drive

# Mount Drive at /content/gdrive (prompts for authorization).
drive.mount('/content/gdrive')

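Once the cell runs, click the link it prints and log in to your Google account. You will be redirected to a page that shows an authorization code; copy that code, paste it into the input box shown in the notebook, and press Enter. (Newer versions of Colab may show a pop-up permission dialog instead of asking for a code.)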

Congrats, your Google Drive is now mounted at /content/gdrive, and you can read and write its files like ordinary local files.
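Since the mount point behaves like a regular directory, standard file APIs work on it. A minimal sketch using pathlib; the helper name save_and_reload is hypothetical, and the "MyDrive" folder in the usage comment is an assumption (check the actual folder name with !ls /content/gdrive).

```python
from pathlib import Path


def save_and_reload(drive_root: str, relative_path: str, text: str) -> str:
    """Write `text` to a file under the mounted Drive folder and read it back."""
    target = Path(drive_root) / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)  # create sub-folders if missing
    target.write_text(text, encoding="utf-8")
    return target.read_text(encoding="utf-8")


# Hypothetical usage in Colab after mounting (folder name is an assumption):
# save_and_reload("/content/gdrive/MyDrive", "colab_data/notes.txt", "hello from Colab")
```

Files written this way land in your Drive, so they survive after the Colab virtual machine is recycled.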

b. Using a wrapper around the API such as PyDrive

The example below shows 1) authentication, 2) file upload, and 3) file download.

!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# PyDrive reference:
# https://gsuitedevs.github.io/PyDrive/docs/build/html/index.html

# 2. Create & upload a text file.
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))

# 3. Load a file by ID and print its contents.
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))
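Loading a file this way requires its ID. One way to find the ID of an existing file is to search by title. A minimal sketch, assuming PyDrive talks to the Drive API v2, where the file-name field is `title` and single quotes inside query strings are escaped with a backslash; the helper name title_query is hypothetical.

```python
def title_query(title: str, include_trashed: bool = False) -> str:
    """Build a Drive API v2 search query that matches a file by exact title."""
    # String values are wrapped in single quotes: escape backslashes first,
    # then single quotes, so titles like "Bob's data.csv" stay valid.
    escaped = title.replace("\\", "\\\\").replace("'", "\\'")
    query = f"title = '{escaped}'"
    if not include_trashed:
        query += " and trashed = false"
    return query


# Hypothetical usage with the `drive` client from the example above:
# matches = drive.ListFile({'q': title_query('Sample upload.txt')}).GetList()
# file_id = matches[0]['id'] if matches else None
```

Several files can share a title in Drive, so the search may return more than one match.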

For more details on PyDrive, see the PyDrive documentation.

4. Google Sheets

Yes, we can import data from Google Sheets too, using the open-source gspread library for interacting with Sheets. But first, we need to install the package using pip.

!pip install --upgrade -q gspread

Next, we’ll import the library, authenticate, and create the interface to sheets.

from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())

After you run this code, you will see an input box like the one we saw when granting permission for Google Drive. Visit the link, log in to your Google account, and copy and paste the verification code into this box.

Now you can both read data from and write data to Google Sheets. To read data from a sheet, run the following code:

# Open your existing sheet and read some data.
worksheet = gc.open('spreadsheet name').sheet1
# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
print(rows)
# Convert to a DataFrame and render.
import pandas as pd
pd.DataFrame.from_records(rows)
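Note that get_all_values() returns every row, including the header row, so pd.DataFrame.from_records(rows) treats the header as an ordinary data row and labels the columns 0, 1, 2 and so on. A minimal standard-library sketch that uses the first row as column names, assuming your sheet's first row holds the headers; the helper name rows_to_records is hypothetical.

```python
def rows_to_records(rows: list[list[str]]) -> list[dict[str, str]]:
    """Turn get_all_values()-style rows into dicts keyed by the header row."""
    if not rows:
        return []
    header, *body = rows
    # zip() stops at the shorter sequence, so pad short rows with "".
    return [dict(zip(header, row + [""] * (len(header) - len(row))))
            for row in body]
```

With pandas, pd.DataFrame(rows[1:], columns=rows[0]) gives the same header handling in one line.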

For more details, see this Colaboratory notebook.

So, we have learned how to import our data from websites, the local file system, Google Drive, and Google Sheets.

So, programmers, let's go and harness the power of free GPUs and TPUs!

Source: Deep Learning on Medium