Source: Deep Learning on Medium
A year ago, I built a 5-GPU rig for crypto mining. Here are my specs:
– Intel Pentium G4400 Skylake Dual-Core 3.3 GHz
– Thermaltake Toughpower 1200W
– MSI Z170A XPOWER GAMING TITANIUM EDITION LGA 1151
– Team Elite Plus 4GB 288-Pin DDR4 SDRAM DDR4 2400
– Silicon Power Slim S55 2.5″ 120GB SATA III TLC Internal Solid State Drive
– Sapphire Radeon NITRO+ RX 580 4GB
– Sapphire PULSE Radeon RX 570 4GB
– PowerColor RED DEVIL Radeon RX 570 4GB
– Sapphire PULSE Radeon RX 580 8GB
– PowerColor RED DEVIL Radeon RX 580 8GB
– 6-Pack Ver006C Mining Dedicated PCIe Riser Cable Card Adapter
I used AMD GPUs because they are inexpensive and have high hash rates. To get set up quickly, I used ethOS to mine Ethereum. Nowadays it’s no longer profitable to mine Ethereum, so I want to rebuild the system so that I can do both crypto mining and deep learning. Luckily, AMD recently released ROCm-enabled TensorFlow v1.8 (https://gpuopen.com/rocm-tensorflow-1-8-release/). Here are the steps and takeaways. I hope they help others who are building a similar system.
- Install Ubuntu. I recommend version 16.04.3 or 18.04.1; both have proven to work. I first tried 16.04.5 but ran into problems installing amdgpu and ROCm.
- Boot with UEFI. In the BIOS, the default setting is “Legacy+UEFI”. We need to change it to UEFI so that we can enable “Above 4G Decoding” for the 5th GPU.
- Create a bootable USB drive for Ubuntu 16.04.3. First download the desktop image from http://old-releases.ubuntu.com/releases/16.04.3/. I recommend Universal-USB-Installer-188.8.131.52 to write this image.
- Before installing, unplug all GPUs. Connect HDMI to the integrated GPU first (this is optional; sometimes connecting to one discrete GPU works fine too).
- When installing Ubuntu, we need to create an EFI partition. Basically, we will have four partitions: 1) EFI 2GB, 2) / 15GB, 3) swap 16GB+, 4) /home with the remaining space. The system should be installed in EFI mode. Otherwise we will face this issue: “grub-efi-amd64-signed package failed to install into /target/. without the grub boot loader, the installed system will not boot.”
- After installation, open a terminal and run:
sudo apt update
sudo apt dist-upgrade
After this, the kernel should be updated to 4.15 or higher. Then we need to open the GRUB config:
sudo nano /etc/default/grub
Change GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" to GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.dc=0". Save the file, then run sudo update-grub and reboot so the change takes effect.
The amdgpu display code (DC) was introduced in kernel 4.15+ to better support AMD RX Vega GPUs. However, it sometimes causes issues for the RX 570/580. Setting it to 0 bypasses those issues. Without it, we may see a black screen or a loop at the login screen. We cannot use the nomodeset option here, because that creates trouble for the amdgpu installation and ROCm. If we forget to do this, we need to boot into recovery mode to fix it (press ESC during boot, then choose Advanced options for Ubuntu -> recovery mode), or edit the GRUB entry directly (press ESC during boot, then press E to edit).
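If you prefer to script the GRUB edit instead of using nano, here is a minimal sketch. It assumes the stock Ubuntu layout where /etc/default/grub has a single GRUB_CMDLINE_LINUX_DEFAULT="..." line; add_kernel_param is a hypothetical helper name, not part of any package.

```shell
# add_kernel_param FILE PARAM
# Hypothetical helper: appends PARAM inside the quotes of
# GRUB_CMDLINE_LINUX_DEFAULT in FILE, unless it is already there.
add_kernel_param() {
    file="$1"
    param="$2"
    # Skip the edit if the parameter is already on that line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example (works on a copy so a mistake cannot break booting):
#   cp /etc/default/grub /tmp/grub.new
#   add_kernel_param /tmp/grub.new amdgpu.dc=0
#   sudo cp /tmp/grub.new /etc/default/grub && sudo update-grub
```

Working on a copy first lets you inspect the result with diff before it replaces the real file.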
- Now we can install the amdgpu driver. This is optional for deep learning, though. Follow this guide: https://www.amd.com/en/support/kb/faq/gpu-635
- Now everything is ready! We can plug in all 5 GPUs (or add them one by one if you prefer). In the BIOS, we need to enable “Above 4G Decoding” to support 5+ GPUs. We may also want to change PEG1/PEG2/PEG3 to “GEN2”, and change the PCI latency timer to “96 cycles”. Then we connect HDMI to the first GPU and should see the login screen. Once logged in, open a terminal and type lspci | grep ' VGA ' to check whether all 5 GPUs are recognized. If not, it is most likely a hardware problem. You may want to test each riser, swap risers between GPUs, or move a GPU to a different PCI slot. I normally need to do this several times to get all 5. (I bought very cheap risers from Newegg, which explains why.)
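Since I end up re-seating risers several times, a small check saves some squinting at lspci output. This is only a sketch: count_amd_vga and check_gpus are hypothetical helper names, and it assumes the AMD cards show up in lspci as “VGA compatible controller” lines containing “AMD”, which also keeps the integrated Intel GPU out of the count.

```shell
# count_amd_vga: hypothetical helper; reads lspci-style text on stdin
# and prints how many VGA controller lines mention AMD.
count_amd_vga() {
    grep -c ' VGA .*AMD'
}

# check_gpus EXPECTED: reads lspci-style text on stdin and reports
# whether the number of AMD VGA devices matches EXPECTED.
check_gpus() {
    expected="$1"
    # grep -c exits non-zero when the count is 0, so ignore its status.
    found=$(count_amd_vga || true)
    if [ "$found" -eq "$expected" ]; then
        echo "OK: $found of $expected GPUs detected"
    else
        echo "MISSING: only $found of $expected GPUs detected"
        return 1
    fi
}

# Example: lspci | check_gpus 5
```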
- Now we can follow this guide to install ROCm: https://rocm.github.io/ROCmInstall.html, starting from “Ubuntu Support - installing from a Debian repository”. If we installed amdgpu-pro first, we will encounter an error, because amdgpu also uses DKMS. We need to force an overwrite:
sudo dpkg -i --force-overwrite /var/cache/apt/archives/rock-dkms_*.*-*_all.deb
sudo apt install -f
You can look in that directory and replace the * wildcards with the correct version number. If we didn’t install amdgpu, we won’t have this issue.
- After a reboot, we can run /opt/rocm/bin/rocminfo in a terminal to see whether the installation succeeded. We should see several agents listed with the GPU info.
- Now we can install the Python packages and TensorFlow:
sudo apt-get update && sudo apt-get install -y python3-pip
pip3 install tensorflow-rocm
- Now we can test using the examples in https://gpuopen.com/rocm-tensorflow-1-8-release/
- Sometimes we want to use Anaconda. Keep in mind that ROCm only supports Python 3.5 and 3.6, so you will either create a virtual environment or just install the archived version Anaconda3-4.2.0-Linux-x86_64.sh. Here we may face this issue: “/usr/lib/libstdc++.so.6: version `GLIBCXX_3.4.20' not found”. To fix it, open the conda prompt and install libgcc: conda install libgcc
- We also want to do crypto mining during free time. Go to https://github.com/ethereum-mining/ethminer/releases to download the Linux version and extract it to a location of your choice. Then open a terminal and type:
sudo ~/bin/ethminer -G stratums://0x(your wallet).(your machine).(pwd)@us1.ethermine.org:5555 --farm-recheck 200
Replace wallet, machine and pwd with your own values, without the brackets.
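To avoid typos when filling in the placeholders, the pool URL can be assembled from variables. A minimal sketch follows; build_pool_url is a hypothetical helper name, and the host and port defaults are simply the ones from the command above.

```shell
# build_pool_url WALLET WORKER PASSWORD [HOST] [PORT]
# Hypothetical helper: prints the stratum URL in the same
# wallet.machine.pwd@host:port layout used in the command above.
build_pool_url() {
    wallet="$1"
    worker="$2"
    password="$3"
    host="${4:-us1.ethermine.org}"
    port="${5:-5555}"
    # Add the 0x prefix if the wallet address was given without it.
    case "$wallet" in
        0x*) ;;
        *) wallet="0x$wallet" ;;
    esac
    printf 'stratums://%s.%s.%s@%s:%s\n' "$wallet" "$worker" "$password" "$host" "$port"
}

# Example (wallet, rig name and password here are placeholders):
#   sudo ~/bin/ethminer -G "$(build_pool_url 0xabc123 rig1 x)" --farm-recheck 200
```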
- We also want to set fan speed. In terminal type:
git clone https://github.com/dominilux/amdgpu-pro-fans
chmod +x amdgpu-pro-fans/amdgpu-pro-fans.sh
Now we can set fan speed to 75%:
sudo ~/amdgpu-pro-fans/amdgpu-pro-fans.sh -s 75
- We may also want to check GPU temperature (using lm-sensors)
In terminal type:
sudo apt-get install lm-sensors
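After installing, running sensors prints the readings. If you want something scriptable (for example to log temperatures while mining), the same values can be read directly from sysfs. This is a sketch under the assumption that the amdgpu kernel driver exposes each card's temperature as temp1_input in millidegrees Celsius under /sys/class/drm/cardN/device/hwmon/; gpu_temps is a hypothetical helper name.

```shell
# gpu_temps [ROOT]
# Hypothetical helper: prints one "cardN: T C" line per GPU whose
# hwmon temperature file is readable. ROOT defaults to /sys/class/drm.
gpu_temps() {
    root="${1:-/sys/class/drm}"
    for f in "$root"/card*/device/hwmon/hwmon*/temp1_input; do
        # Skip unmatched globs and cards without a temperature sensor.
        [ -r "$f" ] || continue
        # Extract the "cardN" path component.
        card=${f#"$root"/}
        card=${card%%/*}
        # hwmon reports millidegrees; integer-divide to whole degrees.
        milli=$(cat "$f")
        echo "$card: $((milli / 1000)) C"
    done
}

# Example: gpu_temps
```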
- Finally, you don’t want to put this huge machine in your bedroom or living room and make your wife, girlfriend or mom mad, so put it in the garage. The machine is actually very helpful for keeping the garage dry and preventing mold. You will need remote desktop, and for that a static IP. Open /etc/network/interfaces and add:
iface eth0 inet static
Then apply the change, either with sudo reboot or by restarting networking:
sudo /etc/init.d/networking restart
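The iface line alone does not say which address to use, so a full stanza needs a few more fields. Here is a sketch that prints one in ifupdown syntax; static_stanza is a hypothetical helper name and the addresses in the example are placeholders for your own LAN. It assumes the 16.04-style /etc/network/interfaces setup (18.04 uses netplan by default), and your interface may not be called eth0, so check with ip link first.

```shell
# static_stanza IFACE ADDRESS NETMASK GATEWAY
# Hypothetical helper: prints an ifupdown static-IP stanza to stdout.
static_stanza() {
    cat <<EOF
auto $1
iface $1 inet static
    address $2
    netmask $3
    gateway $4
EOF
}

# Example (placeholder addresses; review the output before appending):
#   static_stanza eth0 192.168.1.50 255.255.255.0 192.168.1.1 | sudo tee -a /etc/network/interfaces
```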
Now you can install xrdp to enable remote desktop:
sudo apt install xrdp
sudo systemctl enable xrdp
Now on your Windows laptop, open “Remote Desktop Connection”. Type the IP address and connect using the username and password of the remote machine. At first you may find that the remote screen is too big and takes over the whole display. Press Ctrl+Alt+Break to shrink it to a window.
That’s all. I hope this covers most of the issues we faced.