Upgrade Your Nvidia GPU Drivers for Better Performance

Original article was published on Artificial Intelligence on Medium


Is your training unstable? Consider upgrading CUDA to the latest version

I have a GPU server running Ubuntu 16.04 with an Nvidia GTX 1080 Ti. I bought it in 2018, and until now it was still running CUDA 9.0 with driver version 384.130.

My setup
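If you want to record the same baseline on your own machine before upgrading, here is a minimal sketch that asks nvidia-smi for the GPU name and driver version. The helper names (`parse_gpu_rows`, `query_gpus`) are mine, not part of any Nvidia tooling, and the script simply returns None when nvidia-smi is missing or fails.

```python
import shutil
import subprocess

# nvidia-smi query flags: one CSV row per GPU, without a header line.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_rows(csv_text):
    """Turn 'name, driver_version' CSV rows into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        if "," not in line:
            continue  # skip anything that is not a 'name, version' row
        name, driver = [field.strip() for field in line.rsplit(",", 1)]
        rows.append({"name": name, "driver_version": driver})
    return rows


def query_gpus():
    """Run nvidia-smi and parse its output; None if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    print(query_gpus())
```

Keeping this output around makes it easy to compare before and after the upgrade.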

I decided to upgrade it to CUDA 10.2 and the latest driver version (440.33.01). I had wanted to do this for a long time, for a couple of reasons:

  • I occasionally noticed unstable training runs and GPU failures with my current setup
  • Some of the latest versions of common deep learning frameworks (mainly PyTorch) run on CUDA 10+ only

So it was the perfect time to do it. I documented the different steps in this post. Let’s go!

  1. Head over to https://developer.nvidia.com/cuda-downloads
  2. Select your:
    – operating system (Linux)
    – architecture (x86_64)
    – distribution (Ubuntu)
    – version (16.04)
    – installer type (runfile (local))
  3. Once you’ve made these selections, the page shows the following two commands. Run them:

wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
sudo sh cuda_10.2.89_440.33.01_linux.run

4. Reboot your machine

sudo reboot now

You should be all set now. However, you may see this error message when running nvidia-smi:

Failed to initialize NVML: Driver/library version mismatch

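This means that the Nvidia kernel module currently loaded does not match the version of the user-space driver libraries. In my case, the previous driver was too old to be detected and removed by the cuda_10.2.89_440.33.01_linux.run script.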
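To see the mismatch for yourself, you can compare the version reported by the loaded kernel module with the version suffix on the installed libcuda library files. This is only a rough sketch: the function names are mine, and the library directory is an assumption for 64-bit Ubuntu, so add any other directory where your driver libraries live.

```python
import glob
import os
import re


def kernel_module_version(proc_text):
    """Extract the driver version from the text of /proc/driver/nvidia/version."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def library_versions(lib_dirs):
    """Collect version suffixes of libcuda.so.<version> files in lib_dirs."""
    versions = set()
    for lib_dir in lib_dirs:
        for path in glob.glob(os.path.join(lib_dir, "libcuda.so.*")):
            suffix = os.path.basename(path)[len("libcuda.so."):]
            # Skip the soname link (libcuda.so.1); keep full driver versions.
            if re.fullmatch(r"\d+(?:\.\d+)+", suffix):
                versions.add(suffix)
    return versions


if __name__ == "__main__":
    proc_file = "/proc/driver/nvidia/version"
    if os.path.exists(proc_file):
        with open(proc_file) as handle:
            print("kernel module:", kernel_module_version(handle.read()))
    # Assumed library location; adjust for your system.
    print("user-space libraries:", library_versions(["/usr/lib/x86_64-linux-gnu"]))
```

If the two sides report different versions, parts of the old driver are still in place.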

That’s why you’ll need to purge the old Nvidia packages:

sudo apt-get update
sudo apt-get --purge remove cuda
sudo apt-get autoremove
dpkg --list |grep "^rc" | cut -d " " -f 3 | xargs sudo dpkg --purge
sudo apt-get purge 'nvidia*'
sudo apt-get update
sudo reboot now
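The dpkg one-liner above purges every package in the "rc" state, meaning removed but with configuration files left behind. If you would rather see that list before purging anything, here is a small sketch that only prints it. The function name `residual_config_packages` is mine, and the script does nothing on a machine without dpkg.

```python
import shutil
import subprocess


def residual_config_packages(dpkg_list_text):
    """Return names of packages whose dpkg status column is 'rc'."""
    names = []
    for line in dpkg_list_text.splitlines():
        fields = line.split()
        # Column 1 is the status, column 2 is the package name.
        if len(fields) >= 2 and fields[0] == "rc":
            names.append(fields[1])
    return names


if __name__ == "__main__":
    if shutil.which("dpkg") is not None:
        listing = subprocess.run(["dpkg", "--list"], capture_output=True, text=True)
        for name in residual_config_packages(listing.stdout):
            print(name)
```

Note that this covers every leftover package on the system, not just the Nvidia ones, which is also what the shell pipeline does.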

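5. Create a symbolic link to the latest version of CUDA

sudo ln -sfn /usr/local/cuda-10.2 /usr/local/cuda

The -f and -n flags replace an existing /usr/local/cuda symlink rather than leaving it pointing at the old version.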
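A quick way to confirm where the link ends up is to resolve it. This is a tiny sketch with a helper name of my own (`resolve_cuda_link`); it returns None when the path is not a symlink at all.

```python
import os


def resolve_cuda_link(link="/usr/local/cuda"):
    """Return the real directory behind the symlink, or None if it is not one."""
    if not os.path.islink(link):
        return None
    return os.path.realpath(link)


if __name__ == "__main__":
    print(resolve_cuda_link())
```

After this step you would expect it to print the cuda-10.2 directory.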

6. Update your .bashrc

old lines:
export PATH=/usr/local/cuda-9.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH

new lines:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Then open a new terminal (or run source ~/.bashrc) to apply the changes. A reboot works too, but it isn’t required.
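Leftover versioned entries are easy to miss, especially if PATH is also set in other startup files. Here is a minimal sketch that flags any entry still pointing at a versioned directory such as /usr/local/cuda-9.0/bin. The function name `stale_cuda_entries` is mine, and the pattern assumes the default /usr/local/cuda-X.Y layout.

```python
import os
import re


def stale_cuda_entries(path_value, sep=":"):
    """Return PATH-style entries that reference a versioned cuda-X.Y directory."""
    stale = []
    for entry in path_value.split(sep):
        if re.search(r"/cuda-\d+(?:\.\d+)*(?:/|$)", entry):
            stale.append(entry)
    return stale


if __name__ == "__main__":
    for variable in ("PATH", "LD_LIBRARY_PATH"):
        print(variable, stale_cuda_entries(os.environ.get(variable, "")))
```

Two empty lists mean both variables go through the /usr/local/cuda symlink only.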

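7. Install cuDNN

Create an Nvidia Developer account, then download the cuDNN release that matches your CUDA version (10.2 here) from the Nvidia Developer website.

Once the file is downloaded, extract the archive and run the following commands from the directory that contains the extracted cuda folder: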

sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
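To check that the header landed in the right place, you can read the version macros out of cudnn.h. The sketch below assumes the cuDNN 7.x header layout, where CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn.h and the combined version is major * 1000 + minor * 100 + patchlevel. The function name is mine.

```python
import os
import re


def cudnn_version_from_header(header_text):
    """Combine the CUDNN_* macros into one integer; None if any macro is missing."""
    parts = {}
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        pattern = r"^\s*#define\s+CUDNN_%s\s+(\d+)" % name
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts[name] = int(match.group(1))
    return parts["MAJOR"] * 1000 + parts["MINOR"] * 100 + parts["PATCHLEVEL"]


if __name__ == "__main__":
    header = "/usr/local/cuda/include/cudnn.h"
    if os.path.exists(header):
        with open(header) as handle:
            print(cudnn_version_from_header(handle.read()))
```

For cuDNN 7.6.5 this gives 7605, the same number PyTorch reports for its cuDNN version.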

8. Check your install

I created a new conda environment with the latest PyTorch version that supports CUDA 10.2. Once it was installed, I ran the following commands:

import torch
torch.cuda.is_available()
# True
torch.version.cuda
# '10.2'
torch.backends.cudnn.version()
# 7605

But you can check with your favorite framework as well.
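For PyTorch specifically, a slightly stronger check than reading version numbers is to run a small computation on the GPU. Here is a minimal sketch; the function name `gpu_smoke_test` is mine, and it returns None instead of failing when PyTorch or a CUDA device is not available.

```python
try:
    import torch
except ImportError:  # PyTorch not installed in this environment
    torch = None


def gpu_smoke_test(size=256):
    """Multiply two random matrices on the GPU and return the result's shape."""
    if torch is None or not torch.cuda.is_available():
        return None
    a = torch.rand(size, size, device="cuda")
    b = torch.rand(size, size, device="cuda")
    c = a @ b
    # Wait for the kernel to finish so any driver error surfaces here.
    torch.cuda.synchronize()
    return tuple(c.shape)


if __name__ == "__main__":
    print(gpu_smoke_test())
```

If this prints a shape rather than None or an error, the driver, CUDA and PyTorch are at least able to run a kernel together.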

Thanks for reading, I hope this was useful!