Setup CentOS for Deep Learning

Source: Deep Learning on Medium

Setting up a Linux machine for deep learning is the first step towards a successful model. Although the process is fairly straightforward, a few issues can take some time to fix. This post is a quick note on system setup.

The general idea is to keep the host machine minimal (install only CUDA) and use (nvidia-)docker to provide all other dependencies.

  1. Install the OS

Please refer to the CentOS website: https://www.centos.org/

2. Internet settings (you can skip this section if you are not behind a proxy)

  • Proxy: Companies usually add security layers (a firewall) to their network, so employees need specific proxy settings to reach the external internet. Working behind a firewall often causes certificate-verification issues; you can either fix the certificates with the help of your IT team or disable the certificate check (insecure).
    Initially, the proxy can be set as
export http_proxy=http://<username>:<password>@<proxy server>:<port>
export https_proxy=https://<username>:<password>@<proxy server>:<port>
  • CNTLM: We don’t want to expose the username/password combination in an environment variable, so I use CNTLM, a local proxy that stores hashed credentials and forwards traffic to the corporate proxy.
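As a rough sketch of the CNTLM setup (port 3128 and the config path are CNTLM defaults; the domain, username, and proxy address are placeholders you must replace with your own):

```shell
# Install CNTLM (available from EPEL) and generate password hashes
# interactively; copy the PassNTLMv2 line it prints into the config file.
sudo yum install -y cntlm
cntlm -H -d <domain> -u <username>

# Relevant lines in /etc/cntlm.conf:
#   Username   <username>
#   Domain     <domain>
#   PassNTLMv2 <hash printed above>
#   Proxy      <proxy server>:<port>
#   Listen     3128

# Start CNTLM and point the proxy variables at it --
# no password ever appears in the environment.
sudo systemctl enable --now cntlm
export http_proxy=http://127.0.0.1:3128
export https_proxy=http://127.0.0.1:3128
```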

3. CUDA

Before starting, we need to update the OS and install some basic dependencies. As a sudo user, run the following commands:

yum update
yum groupinstall "Development Tools"
yum install kernel-devel epel-release
yum install dkms

Download the CUDA rpm file from NVIDIA. Basically, google "cuda <version> download" and you will get the URL, then pick your platform on the download page (e.g. for CUDA 10.0: Linux, x86_64, CentOS 7, rpm (local)).
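The exact file name depends on the version you selected; as a sketch for a CUDA 10.0 local rpm on CentOS 7 (the repo package name below is an assumption — copy the real commands shown on the NVIDIA download page):

```shell
# Install the downloaded repo package, refresh yum metadata, then install CUDA.
sudo rpm -i cuda-repo-rhel7-10.0.130-1.x86_64.rpm
sudo yum clean all
sudo yum install -y cuda

# After a reboot, verify the driver is loaded.
nvidia-smi
```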

Sometimes the new driver raises errors in the boot sequence. If you don’t have full graphical access to the OS, it’s safer to switch to the non-graphical (multi-user) target before rebooting:

systemctl isolate multi-user.target
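Note that isolate only switches the running system. To make the text-mode target persist across reboots while debugging driver issues, systemd’s default target can be changed, then restored once the driver works:

```shell
# Boot into text mode by default (no graphical login).
sudo systemctl set-default multi-user.target

# Once the NVIDIA driver works, restore the graphical default.
sudo systemctl set-default graphical.target
```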

4. Docker

Follow the official instructions at the Docker website:

sudo yum remove docker \
                docker-client \
                docker-client-latest \
                docker-common \
                docker-latest \
                docker-latest-logrotate \
                docker-logrotate \
                docker-engine
sudo yum install -y yum-utils \
                    device-mapper-persistent-data \
                    lvm2
sudo yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce docker-ce-cli containerd.io

To make Docker itself work behind the proxy, create a systemd drop-in directory:

sudo mkdir -p /etc/systemd/system/docker.service.d

Then create a file named

/etc/systemd/system/docker.service.d/http-proxy.conf

that sets the proxy environment variables:

[Service]
Environment="HTTP_PROXY=http://user:password@10.10.10.10:8080/"
Environment="HTTPS_PROXY=https://user:password@10.10.10.10:8080/"
Environment="NO_PROXY=hostname.example.com,172.10.10.10"

Reload the systemd daemon

sudo systemctl daemon-reload

Restart docker

sudo systemctl restart docker
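To check that the drop-in was actually picked up, the loaded environment can be inspected (this is the verification step from the Docker documentation):

```shell
# Should list the HTTP_PROXY/HTTPS_PROXY values from http-proxy.conf.
sudo systemctl show --property=Environment docker
```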

Test the installation

sudo docker run hello-world

5. nvidia-docker

The official commands from nvidia-docker are:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
$ sudo yum install -y nvidia-container-toolkit
$ sudo systemctl restart docker

Behind a proxy (the -k flag skips certificate verification):

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -skL https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
$ sudo yum install -y nvidia-container-toolkit
$ sudo systemctl restart docker

nvidia-docker will share the same proxy settings as docker. Test the installation with

docker run --gpus all nvidia/cuda:9.0-base nvidia-smi
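With the toolkit in place, all deep-learning dependencies can live in containers, as the introduction suggested. As an illustration (the image tag is an assumption — pick one compatible with your driver version):

```shell
# Start an interactive PyTorch container with GPU access and the current
# directory mounted as the workspace.
docker run --gpus all -it --rm -v "$PWD":/workspace pytorch/pytorch bash
```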