Two months ago we started working on an academic project that makes use of deep neural networks. To train our model, we used the Google Colaboratory platform, as it is the only free GPU platform we have found. While looking for a good guide to the platform, we came across this excellent tutorial. Although it is very useful, we still found it difficult to train our model effectively. For that reason, Ori Licht and I are writing this tutorial, which will hopefully help you get the most out of Google Colab. In it, we will go over the flow we used, step by step.
Most of the code in this tutorial can be found here.
As of October 13, 2018, Google Colab provides a single 12GB NVIDIA Tesla K80 GPU that can be used for up to 12 hours continuously. Recently, Colab also started offering a free TPU.
To use Google Colab in GPU mode, you have to make sure the hardware accelerator is set to GPU. To do this, go to Runtime→Change runtime type and change the Hardware accelerator to GPU.
Sometimes, all GPUs are in use and there is no GPU available. If this is the case, you will get the following alert, and you will have to wait for a while and try again.
To make sure Colab uses GPU you can run:
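One minimal check (assuming PyTorch is installed; TensorFlow users can call `tf.test.gpu_device_name()` instead):

```python
import torch

# True when the Colab runtime has a GPU attached
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. a Tesla K80
```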
As noted earlier, the GPU provided has 12GB of memory. Sometimes, however, your GPU's memory is shared with other users, in which case you won't be able to train your model properly. To make sure that the current GPU memory utilization is 0, run the code below:
You should see the following output (the last line with Util 0% is what you should look for):
Requirement already satisfied: numpy in /usr/local/lib/python2.7/dist-packages (from gputil) (1.14.5)
Building wheels for collected packages: gputil
Running setup.py bdist_wheel for gputil ... done
Stored in directory: /root/.cache/pip/wheels/17/0f/04/b79c006972335e35472c0b835ed52bfc0815258d409f560108
Successfully built gputil
Installing collected packages: gputil
Successfully installed gputil-1.3.0
Requirement already satisfied: psutil in /usr/local/lib/python2.7/dist-packages (5.4.7)
Building wheels for collected packages: humanize
Running setup.py bdist_wheel for humanize ... done
Stored in directory: /root/.cache/pip/wheels/69/86/6c/f8b8593bc273ec4b0c653d3827f7482bb2001a2781a73b7f44
Successfully built humanize
Installing collected packages: humanize
Successfully installed humanize-0.5.1
('Gen RAM Free: 12.8 GB', ' | Proc size: 145.1 MB')
GPU RAM Free: 11439MB | Used: 0MB | Util 0% | Total 11439MB
If this is not the case and the utilization is greater than 0%, try to reset your machine by killing all of its processes, using the command below. This might not work on your first attempt, so you may need to try it multiple times.
!kill -9 -1
Using your data
To train your model, you will need not only your code but also your data. Uploading both directly to Colab is possible but has 3 main drawbacks:
- The uploading speed is very low.
- After 12 hours your machine will be shut down, and consequently all your data will be gone.
- The data is saved to the /dev/shm mount point which has only 64MB available (running !df -h will show you that).
For that reason, the easiest way to use your data is to mount your Google Drive account.
Mounting Google Drive
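The mounting code we used (a sketch of the google-drive-ocamlfuse approach common at the time; package names and the PPA may have changed since) runs in a single Colab cell:

```python
# Install google-drive-ocamlfuse (the ! lines are shell commands run from the cell)
!apt-get install -y -qq software-properties-common module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

# Authenticate both Colab and google-drive-ocamlfuse
import getpass
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
creds = GoogleCredentials.get_application_default()
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
```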
After running the code above, you'll have to authorize Colab to access your Drive account using the following steps:
A new link for authentication may appear and you should repeat the steps above.
Once that is done, you should get:
Please enter the verification code: Access token retrieved correctly.
You are now ready to mount your Drive account to Colab:
!mkdir -p drive
!google-drive-ocamlfuse drive
Updating your data
Once your Drive is mounted, changes to your Drive directory (editing code, adding files, etc.) will not be picked up by Colab. This is due to Colab's caching mechanism. To overcome this, you should clear the cache before using your new files, using:
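With the google-drive-ocamlfuse setup, clearing the cache can be done with the tool's clear-cache flag (our assumption; check `google-drive-ocamlfuse -help` in your runtime, as flags may differ between versions):

```shell
!google-drive-ocamlfuse -cc
```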
Note that this is a time-consuming process, so it is best to avoid it if nothing in your Drive has changed.
If you have reached this point, you might think that you are ready to train your model. Although you can indeed start training (after installing the required libraries), it is going to be slow, and the promised GPU speed-up won't be noticeable. This is due to the overhead of transferring the data from your Drive to Colab. To overcome this issue, we suggest using a memory-mapped file format such as HDF5 (aka H5) or LMDB. For your convenience, Python offers the h5py and lmdb libraries to handle these file formats. h5py makes it easy to store and manipulate existing data in the form of NumPy arrays.
This code creates a new .h5 file with placeholders for training, evaluation and test datasets.
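A sketch of what that creation step can look like (the file name, dataset names, shapes and dtypes here are our choices for illustration):

```python
import h5py
import numpy as np

# Create an .h5 file with empty placeholder datasets for each split.
with h5py.File('dataset.h5', 'w') as f:
    f.create_dataset('X_train', shape=(100, 3, 32, 32), dtype=np.float32)
    f.create_dataset('y_train', shape=(100,), dtype=np.int64)
    f.create_dataset('X_eval', shape=(20, 3, 32, 32), dtype=np.float32)
    f.create_dataset('y_eval', shape=(20,), dtype=np.int64)
    f.create_dataset('X_test', shape=(20, 3, 32, 32), dtype=np.float32)
    f.create_dataset('y_test', shape=(20,), dtype=np.int64)
```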
To load your data into these placeholders, you can access them in Python dictionary style. Here we load images into our X_train placeholder.
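For example (we use random arrays as stand-ins for real decoded images; the dataset name and shapes are illustrative):

```python
import h5py
import numpy as np

with h5py.File('dataset.h5', 'a') as f:
    # require_dataset creates the placeholder if it does not exist yet
    X_train = f.require_dataset('X_train', shape=(100, 3, 32, 32), dtype=np.float32)
    for i in range(100):
        image = np.random.rand(3, 32, 32).astype(np.float32)  # stand-in for a decoded image
        X_train[i] = image  # dictionary-style, index-by-index writes
```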
To read your .h5 file, you use the same approach. For more details, see the official h5py documentation.
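Reading looks like this (the setup lines only exist to make the snippet self-contained; names and shapes are again illustrative):

```python
import h5py
import numpy as np

# Setup: make sure a small dataset.h5 exists so the example runs on its own.
with h5py.File('dataset.h5', 'a') as f:
    f.require_dataset('X_train', shape=(100, 3, 32, 32), dtype=np.float32)

# Reading uses the same dictionary-style access:
with h5py.File('dataset.h5', 'r') as f:
    X_train = f['X_train'][:]   # load the whole dataset into memory
    sample = f['X_train'][0]    # or slice lazily, one item at a time

print(X_train.shape)  # (100, 3, 32, 32)
```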
If you are using PyTorch, you will have to write your own .h5 Dataset that will be used by PyTorch DataLoader:
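A sketch of such a Dataset (the file layout and the 'X_train'/'y_train' key names are our assumptions; the lazy file-open gives each DataLoader worker its own handle):

```python
import h5py
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class H5Dataset(Dataset):
    """Serves (x, y) pairs straight from an .h5 file."""

    def __init__(self, path, x_key='X_train', y_key='y_train'):
        self.path, self.x_key, self.y_key = path, x_key, y_key
        with h5py.File(path, 'r') as f:
            self.length = len(f[x_key])
        self._file = None  # opened lazily, once per worker process

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self._file is None:
            self._file = h5py.File(self.path, 'r')
        x = torch.from_numpy(self._file[self.x_key][idx])
        y = int(self._file[self.y_key][idx])
        return x, y


# Self-contained demo: build a tiny file, then iterate over it with a DataLoader.
with h5py.File('toy.h5', 'w') as f:
    f.create_dataset('X_train', data=np.random.rand(8, 3, 32, 32).astype(np.float32))
    f.create_dataset('y_train', data=np.arange(8))

loader = DataLoader(H5Dataset('toy.h5'), batch_size=4, shuffle=True)
for x, y in loader:
    print(x.shape, y.shape)  # torch.Size([4, 3, 32, 32]) torch.Size([4])
```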
Source: Deep Learning on Medium