Source: Deep Learning on Medium
Setting Up a Multi-GPU Machine and Testing With a TensorFlow Deep Learning Model
In the past I built a single-GPU computer around a GeForce GTX 1080 and trained several deep learning models on it. Recently I had the opportunity to build a multi-GPU computer for some deep learning and multi-GPU processing use cases. In the following article I will walk you through the setup, and we will train a TensorFlow model available from TensorFlow’s GitHub page.
Hardware specs and OS of the build
The CUDA Toolkit and the Nvidia driver are needed to make use of the graphics cards. I ran into an issue here: I initially installed CUDA Toolkit 10.1, the second-most-recent version at the time of this article, but it had compatibility problems with TensorFlow, so I downgraded to 10.0. It is very important to check the compatibility charts to see which CUDA Toolkit and driver versions work together, and which TensorFlow releases support which CUDA versions. The driver I ended up choosing was 435.
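Once the toolkit and driver are in place, it is worth confirming which versions are actually active before going any further. A quick check, assuming the CUDA binaries are on your PATH, looks like this:

```shell
# Show the installed CUDA compiler (Toolkit) version
nvcc --version

# Show the loaded Nvidia driver version and the GPUs it can see
nvidia-smi
```

If `nvcc` reports a different Toolkit version than you expect, or `nvidia-smi` fails to list your cards, resolve that before moving on to TensorFlow.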
Here is a compatibility chart listing each CUDA Toolkit version and the minimum Nvidia driver version it requires.
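As a sketch, the rows of that chart that mattered for this build can be encoded and checked in a few lines of Python. The minimum-driver values below are my transcription of Nvidia’s Linux compatibility table, so verify them against the current documentation before relying on them:

```python
# Minimum Linux driver version required by each CUDA Toolkit release.
# (Values transcribed from Nvidia's compatibility table -- double-check
# against the current docs before relying on them.)
CUDA_MIN_DRIVER = {
    "10.0": (410, 48),
    "10.1": (418, 39),
}

def driver_supports(cuda_version, driver_version):
    """Return True if the installed driver meets the CUDA Toolkit's minimum."""
    minimum = CUDA_MIN_DRIVER[cuda_version]
    installed = tuple(int(part) for part in driver_version.split("."))
    return installed >= minimum

# Driver 435 comfortably clears the bar for CUDA 10.0
print(driver_supports("10.0", "435.21"))  # -> True
```

This is why the 435 driver works fine with the downgraded Toolkit: the driver only sets a floor, so a newer driver remains compatible with an older CUDA release.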
Now that we have picked out the CUDA version and Nvidia driver, we can start the process of installing everything. Start with an update and upgrade of the Ubuntu libraries, and verify that the graphics cards are visible. If you run the command “lspci | grep -i nvidia” and don’t see the cards, reboot the PC and run the command again.
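On a stock Ubuntu install, that first pass might look like the following (these are standard apt and pciutils commands, not anything specific to this build):

```shell
# Refresh the package lists and upgrade the existing Ubuntu packages
sudo apt update && sudo apt upgrade -y

# Verify the Nvidia cards are visible on the PCI bus
lspci | grep -i nvidia

# If no cards are listed, reboot and run the lspci check again
sudo reboot
```

Each GPU should appear as its own line in the `lspci` output; with a multi-GPU build, make sure the count matches the number of cards you installed.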