Software installation for NVIDIA GPU on Fedora 30


In the previous blog post I went over the hardware setup. This blog post covers the software installation part.

Definitions

Driver: A computer program that operates or controls a particular type of device attached to the computer.

Compute Unified Device Architecture (CUDA): A parallel computing platform that enables general-purpose computing on GPUs (widely known as GPGPU) in an efficient and straightforward way.

cuDNN: A library of deep neural network primitives built on CUDA. It provides GPU-accelerated implementations of operations commonly used in neural networks.

There are a couple of ways to use GPUs for training models. The first is to install the driver, CUDA and cuDNN natively; the second is to install only the driver plus nvidia-docker and let Docker handle the version dependencies. The Docker route is easier to set up and get training with.

Order of software installation:

  • OS (Fedora 30)
  • GPU driver (430.34) + CUDA (10.1) + cuDNN (7.6.0)
  • Docker

OS Installation

  • From an existing machine, go to getfedora.org and use the Fedora Media Writer to write a bootable ISO to a USB drive for a live install.
  • On the Linux machine, update the boot order to boot from USB and insert the USB drive from the previous step.
  • From the live session, install Fedora to the hard disk, reclaiming all partitions if you want to format the disk.

GPU driver (430.34) + CUDA (10.1) + cuDNN (7.6.0) + gcc (8.3.0) Installation

I tried different combinations of driver, CUDA, cuDNN and gcc versions before settling on the ones above. I used negativo17 as the installation repository because it packages the drivers nicely and, importantly, ships gcc 8.3.0.

$ sudo dnf config-manager --add-repo=https://negativo17.org/repos/fedora-nvidia.repo
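To confirm the repository was picked up, you can list the enabled repos (optional check):

$ dnf repolist | grep -i nvidia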

Driver Installation

$ sudo dnf install nvidia-driver nvidia-driver-cuda akmod-nvidia nvidia-settings

Verify Driver Installation

$ glxgears -info
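If you prefer a check that does not depend on OpenGL, the loaded kernel module and its version can also be inspected directly (optional):

$ lsmod | grep nvidia
$ modinfo -F version nvidia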

Install CUDA and cuDNN

$ sudo dnf install cuda cuda-devel cuda-gcc cuda-gcc-c++ cuda-cudnn cuda-cudnn-devel cuda-samples

Search for cuda packages

$ dnf search cuda

Verify gcc version is 8.3.0

$ cuda-gcc --version
cuda-gcc (GCC) 8.3.0
Copyright © 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Verify installation and get GPU info using deviceQuery

$ cp -r /usr/share/cuda/samples ~/
$ cd samples/1_Utilities/deviceQuery

Replace

CUDA_PATH ?= /usr/local/cuda

with

CUDA_PATH ?= /usr

in the Makefile
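If you'd rather not edit the Makefile by hand, a one-line substitution should do the same thing (a sketch; it assumes the default spacing of that line, so double-check the Makefile afterwards):

$ sed -i 's|^CUDA_PATH ?= /usr/local/cuda|CUDA_PATH ?= /usr|' Makefile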

$ make run
Device 0: "GeForce RTX 2080 Ti"
  CUDA Driver Version / Runtime Version          10.1 / 10.1
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 11019 MBytes (11554324480 bytes)
  (68) Multiprocessors, ( 64) CUDA Cores/MP:     4352 CUDA Cores
  GPU Max Clock rate:                            1545 MHz (1.54 GHz)
  Memory Clock rate:                             7000 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 5767168 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 65 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS

Compile and run simplePrintf

$ cd 0_Simple/simplePrintf/
$ make run
/usr/bin/nvcc --include-path /usr/include/cuda -ccbin /usr/bin/cuda-g++ -I../../common/inc -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o simplePrintf.o -c simplePrintf.cu
/usr/bin/nvcc --include-path /usr/include/cuda -ccbin /usr/bin/cuda-g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o simplePrintf simplePrintf.o
mkdir -p ../../bin/x86_64/linux/release
cp simplePrintf ../../bin/x86_64/linux/release
./simplePrintf
GPU Device 0: "GeForce RTX 2080 Ti" with compute capability 7.5

Device 0: "GeForce RTX 2080 Ti" with Compute 7.5 capability
printf() is called. Output:
[0, 0]: Value is:10
[0, 1]: Value is:10
[0, 2]: Value is:10
[0, 3]: Value is:10
[0, 4]: Value is:10
[0, 5]: Value is:10
[0, 6]: Value is:10
[0, 7]: Value is:10
[1, 0]: Value is:10
[1, 1]: Value is:10
[1, 2]: Value is:10
[1, 3]: Value is:10
[1, 4]: Value is:10
[1, 5]: Value is:10
[1, 6]: Value is:10
[1, 7]: Value is:10
[3, 0]: Value is:10
[3, 1]: Value is:10
[3, 2]: Value is:10
[3, 3]: Value is:10
[3, 4]: Value is:10
[3, 5]: Value is:10
[3, 6]: Value is:10
[3, 7]: Value is:10
[2, 0]: Value is:10
[2, 1]: Value is:10
[2, 2]: Value is:10
[2, 3]: Value is:10
[2, 4]: Value is:10
[2, 5]: Value is:10
[2, 6]: Value is:10
[2, 7]: Value is:10

Driver version

nvidia-settings
nvidia-smi

CUDA version

nvcc --version

cuDNN version

which nvcc
whereis cuda
cat /usr/include/cuda/cudnn.h | grep CUDNN_MAJOR -A 2 # cudnn version

Docker Installation

sudo dnf -y update
sudo reboot
sudo dnf -y install dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
# docker-ce packages for Fedora 30 were not yet available at the time of writing, so use the Fedora 29 packages
sudo dnf install --releasever=29 docker-ce docker-ce-cli containerd.io
sudo systemctl enable --now docker
sudo systemctl status docker
sudo usermod -aG docker $(whoami)
sudo reboot
docker version
docker pull alpine
docker run -it --rm alpine /bin/sh

Docker NVIDIA Installation

$ sudo dnf -y install dnf-plugins-core
$ sudo dnf config-manager \
--add-repo \
https://download.docker.com/linux/fedora/docker-ce.repo
$ sudo dnf install docker-ce
$ sudo systemctl start docker
$ sudo curl -s -L https://nvidia.github.io/nvidia-docker/centos7/nvidia-docker.repo | \
sudo tee /etc/yum.repos.d/nvidia-docker.repo
$ sudo dnf install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
$ sudo docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
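To confirm Docker has registered the NVIDIA runtime after the daemon reload, you can check the runtimes reported by docker info (optional):

$ docker info | grep -i runtime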

Run docker using nvidia runtime

$ sudo docker run --runtime=nvidia -it tensorflow/tensorflow:latest-gpu-py3 bash
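In practice you will usually want your code and data visible inside the container; a typical variant (the /workspace mount point is just an example) is:

$ sudo docker run --runtime=nvidia -it -v "$PWD":/workspace -w /workspace tensorflow/tensorflow:latest-gpu-py3 bash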

Verify GPU usage
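A simple way to see the GPU actually being exercised is to keep nvidia-smi refreshing on the host in a second terminal while training runs inside the container:

$ watch -n 1 nvidia-smi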

Tensorflow version

pip3 list | grep tensorflow

OR

python3 -c 'import tensorflow as tf; print(tf.__version__)'

Tensorflow helper methods

python
>>> import tensorflow as tf
>>> tf.test.is_built_with_cuda()
True
>>> tf.test.is_gpu_available()
2019-07-21 23:40:27.882020: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-21 23:40:27.903098: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-07-21 23:40:28.022270: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:40:28.022862: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x41bbe30 executing computations on platform CUDA. Devices:
2019-07-21 23:40:28.022905: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-07-21 23:40:28.038203: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3493075000 Hz
2019-07-21 23:40:28.040134: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x45fd760 executing computations on platform Host. Devices:
2019-07-21 23:40:28.040166: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-07-21 23:40:28.040404: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:40:28.041668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:41:00.0
2019-07-21 23:40:28.042004: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-21 23:40:28.043380: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-21 23:40:28.044575: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-21 23:40:28.044756: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-21 23:40:28.045767: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-21 23:40:28.046535: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-21 23:40:28.048659: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-21 23:40:28.048797: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:40:28.049352: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:40:28.049799: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-21 23:40:28.049829: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-21 23:40:28.050719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-21 23:40:28.050733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-07-21 23:40:28.050737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-07-21 23:40:28.050817: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:40:28.051320: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:40:28.051792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 10029 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:41:00.0, compute capability: 7.5)
True
>>> tf.test.gpu_device_name()
2019-07-21 23:40:44.938269: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:40:44.939155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:41:00.0
2019-07-21 23:40:44.939204: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-21 23:40:44.939222: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-21 23:40:44.939236: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-21 23:40:44.939252: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-21 23:40:44.939267: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-21 23:40:44.939282: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-21 23:40:44.939297: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-21 23:40:44.939373: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:40:44.940268: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:40:44.941088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-21 23:40:44.941126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-21 23:40:44.941138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-07-21 23:40:44.941147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-07-21 23:40:44.941257: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:40:44.941811: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:40:44.942295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 10029 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:41:00.0, compute capability: 7.5)
'/device:GPU:0'
>>> a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
>>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
>>> c = tf.matmul(a, b)
>>>
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-07-21 23:41:59.131500: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:41:59.132476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:41:00.0
2019-07-21 23:41:59.132530: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-21 23:41:59.132548: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-21 23:41:59.132571: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-21 23:41:59.132589: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-21 23:41:59.132604: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-21 23:41:59.132617: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-21 23:41:59.132633: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-21 23:41:59.132724: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:41:59.133624: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:41:59.134459: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-21 23:41:59.134499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-21 23:41:59.134509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-07-21 23:41:59.134517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-07-21 23:41:59.134629: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:41:59.135533: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-21 23:41:59.136571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10029 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:41:00.0, compute capability: 7.5)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:41:00.0, compute capability: 7.5
2019-07-21 23:41:59.137987: I tensorflow/core/common_runtime/direct_session.cc:296] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:41:00.0, compute capability: 7.5
>>> print(sess.run(c))
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2019-07-21 23:43:15.876826: I tensorflow/core/common_runtime/placer.cc:54] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-07-21 23:43:15.876890: I tensorflow/core/common_runtime/placer.cc:54] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-07-21 23:43:15.876920: I tensorflow/core/common_runtime/placer.cc:54] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
[49. 64.]]
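The same checks can be run non-interactively from the container shell, which is handier for scripting (a sketch using the same TF 1.x test helpers shown above):

python3 -c 'import tensorflow as tf; print(tf.test.is_built_with_cuda(), tf.test.is_gpu_available(), tf.test.gpu_device_name())'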
