Setting up your GPU machine to be Deep Learning ready


Hi there,

This tutorial is a loose continuation of my previous article; do take a look.

This is written assuming you have a bare machine with a GPU available; feel free to skip any step your machine came with already set up. I'll also assume you have an NVIDIA card, and we'll only cover setting up TensorFlow in this tutorial, it being the most popular Deep Learning framework (kudos to Google!).

Installing the CUDA drivers

CUDA is a parallel computing platform by NVIDIA and a basic prerequisite for TensorFlow. But as we will see shortly, it is actually better to work in reverse, so let's come back to this part later.

Installing TensorFlow

Fire up your terminal (or an SSH session, if it's a remote machine). Find the version of TensorFlow your particular application needs (if any); if there is no such restriction, let's just go with TensorFlow 1.8.0, which I currently use.

pip install tensorflow-gpu==1.8.0

Let it install. Now move to a Python shell by running,

python

In your Python shell, type in:

import tensorflow as tf

Since we haven't installed CUDA yet, you should see an error similar to this:

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

The version (9.0 here) and the filename may differ depending on the TensorFlow release you chose, but the whole point of going in reverse was to learn which version of CUDA we need: 9.0 in this case. The official documentation is not always clear about which TensorFlow version pairs with which CUDA version, so I've always found this reverse-engineering method more reliable.
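If you want to automate this reverse lookup, the CUDA version can be pulled straight out of the error message with a regular expression. A minimal sketch (the helper name required_cuda_version is my own, not part of TensorFlow):

```python
import re

def required_cuda_version(error_message):
    """Extract the CUDA version TensorFlow asks for from an ImportError
    message like: 'libcublas.so.9.0: cannot open shared object file ...'.
    Returns the version string, or None if no match is found."""
    match = re.search(r"libcublas\.so\.(\d+(?:\.\d+)?)", error_message)
    return match.group(1) if match else None

try:
    import tensorflow as tf  # noqa: F401
    print("TensorFlow imported fine; CUDA is already set up.")
except ImportError as err:
    print("Need CUDA version:", required_cuda_version(str(err)))
```

This only reads the libcublas line TensorFlow happens to mention first; once CUDA is installed, the next missing library (cuDNN, below) produces its own error.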

Let’s go back to installing CUDA.

Installing the CUDA drivers (this time we’ll really do it, promise)

So navigate to https://developer.nvidia.com/cuda-toolkit-archive . Choose the version you just determined above.

Linux->x86_64->Ubuntu->16.04 (or 17.04)->deb (network)

Download the deb to your machine, and follow the installation instructions shown on the download page to install CUDA.
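For CUDA 9.0 on Ubuntu 16.04, the page's instructions boil down to something like the following. Treat this as a sketch: the exact .deb filename and signing-key URL come from NVIDIA's download page and will differ for other versions.

```shell
# Register the repo package you just downloaded (filename is version-specific)
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
# Add NVIDIA's signing key (URL as shown on the CUDA download page)
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
# Refresh package lists and install CUDA
sudo apt-get update
sudo apt-get install cuda
```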

Going back, reopen the Python shell and run:

import tensorflow as tf

We are not done yet: you should see a slightly different error message now. (If you see the same one as earlier, refer to "Troubleshooting" below.)

ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory

We need one more NVIDIA library called cuDNN, which provides GPU-accelerated primitives for deep neural networks. Again, note the version of cuDNN we need: 7.0 in this case.

Navigate to https://developer.nvidia.com/cudnn and register for an account (it's free). Once you've made your account, log in and go to https://developer.nvidia.com/rdp/cudnn-archive

Choose the required cuDNN version, and make sure it is built for the CUDA version you just installed. In this case we need,

Download cuDNN v7.0.5 (Dec 5, 2017), for CUDA 9.0

and in drop down choose,

cuDNN v7.0.5 Library for Linux

The tgz file will start downloading. Move it to your machine (if you downloaded it elsewhere) and extract it with:

tar -xzvf <CUDNN_TAR_FILENAME>

A folder named “cuda” will be extracted; cd into it and execute both of these:

sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/cudnn.h /usr/local/cuda/include/
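If TensorFlow later complains about permissions on these files, NVIDIA's cuDNN install guide also suggests making them world-readable:

```shell
# Ensure all users can read the cuDNN header and libraries
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```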

And we are done (hopefully). Again fire up the Python shell, and you know what to do.

If it doesn’t throw any error this time over, we are good.

Just to make sure our GPU is being detected by TensorFlow, run this in the same Python shell:

tf.test.gpu_device_name()

It should return the name of a GPU device, e.g. /device:GPU:0, if TensorFlow can see one.
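Strictly speaking, tf.test.gpu_device_name() only returns a single device name. If you want to enumerate every device, TensorFlow 1.x ships a device_lib module you can lean on; a sketch (the gpu_names and list_gpus helpers are my own):

```python
def gpu_names(devices):
    """Filter a list of TF DeviceAttributes down to GPU device names."""
    return [d.name for d in devices if d.device_type == "GPU"]

def list_gpus():
    # Imported lazily so the helper can be defined even before TF is set up.
    from tensorflow.python.client import device_lib
    return gpu_names(device_lib.list_local_devices())

# On a working single-GPU setup, list_gpus() would return something
# like ['/device:GPU:0']; an empty list means TF only sees the CPU.
```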

Troubleshooting

If you did everything correctly and TensorFlow still throws the same error, it is probably an issue with the environment path variables. Run these to fix it:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
export PATH=/usr/local/cuda/bin:${PATH}
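These exports only last for the current shell session. To make them permanent you can append them to your shell profile; a sketch, assuming bash and ~/.bashrc:

```shell
# Append the CUDA paths to ~/.bashrc so every new shell picks them up
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}' >> "$HOME/.bashrc"
echo 'export PATH=/usr/local/cuda/bin:${PATH}' >> "$HOME/.bashrc"
```

Open a new terminal (or run `source ~/.bashrc`) for the change to take effect.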

Let me know about any issues, suggestions or criticism.

Cheers.

Saurabh is an undergraduate Computer Science major at National Institute of Technology, Warangal, India and currently a research intern at Indian Institute of Science, Bangalore.

Source: Deep Learning on Medium