Setting Up a Multi-GPU Machine and Testing With a TensorFlow Deep Learning Model

Source: Deep Learning on Medium

In the past I built a single-GPU computer using a GeForce GTX 1080 and trained several deep learning models on it. Recently I had the opportunity to build a multi-GPU computer for some deep learning and multi-GPU processing use cases. In this article I will walk you through the setup, and then we will train a TensorFlow model available on TensorFlow's GitHub page.

Hardware specs and OS of the build

The CUDA Toolkit and NVIDIA driver are needed to make use of the graphics cards. I ran into an issue here: I initially installed CUDA Toolkit 10.1, the second-latest version at the time of this article, but it had compatibility issues with TensorFlow, so I downgraded to 10.0. It is very important to check the compatibility charts showing which CUDA Toolkit and driver versions work together, and which TensorFlow releases work with which CUDA versions. The driver I ended up choosing was 435.
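The downgrade from 10.1 to 10.0 can be done with apt on Ubuntu. A minimal sketch, assuming the NVIDIA apt repository is already configured and that the `cuda-toolkit-10-0` and `nvidia-driver-435` package names match your repository (adjust them for your setup):

```shell
# Remove any previously installed CUDA packages (e.g. a 10.1 install)
sudo apt-get --purge remove "cuda*"

# Install the 435 driver and the CUDA 10.0 toolkit
# (package names assume NVIDIA's Ubuntu repository is configured)
sudo apt-get install nvidia-driver-435
sudo apt-get install cuda-toolkit-10-0

# Verify what ended up installed:
nvidia-smi        # driver version appears in the top banner
nvcc --version    # reports the CUDA compiler / toolkit version
```

A reboot is typically required after a driver change before `nvidia-smi` will report the new version.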

Here is a compatibility chart showing each CUDA Toolkit version and the NVIDIA driver version it requires.

CUDA Compatibility Chart

Now that we have the CUDA version and NVIDIA driver picked out, we can start installing everything. Begin with an update and upgrade of the Ubuntu packages, then verify the graphics cards are visible. If you run the command “lspci | grep -i nvidia” and don’t see the cards, reboot the PC and run the command again.
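The steps above can be sketched as the following commands (a minimal sketch for Ubuntu; the exact `lspci` output lines will depend on your cards):

```shell
# Refresh package lists and upgrade existing packages
sudo apt update && sudo apt upgrade -y

# Verify the graphics cards are visible on the PCI bus;
# each installed NVIDIA card should print one line.
# If nothing prints, reboot and run this again.
lspci | grep -i nvidia
```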