Build your own deep learning box — Bill of materials

Source: Deep Learning on Medium

A trilogy in four parts

This series is a step-by-step walkthrough on how to build a GPU tower dedicated to training deep neural nets.

I relied on the following resources to plan my build; they can give readers a much richer understanding of the component-selection rationale.

The intent here is to document one build executed at a specific point in time (December 2019), with emphasis on detailed steps and challenges encountered along the way.

Why build?

Reason 1: If you are a beginner like me, then running several experiments on well-known models is undoubtedly the best way to jumpstart learning.

  • While the free GPU options from Kaggle and Google Colab are generous, they have strict session, storage and memory limits. (See note at the end of this article.)
  • Paid instances from AWS and GCP can end up costing several hundred dollars every month.

You may still find ways to work within these limits, e.g. by using smaller batch sizes, saving checkpoints, and splitting epochs over multiple sessions. But learning is clearly faster when it is unconstrained by worries about cost and time.
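Splitting epochs over multiple sessions comes down to checkpointing the model and optimizer between sessions. A minimal PyTorch sketch (the tiny model, file name and epoch number are purely illustrative):

```python
# Sketch: checkpointing so training can resume in a later free-tier
# session. Model, optimizer and path are illustrative placeholders.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def save_checkpoint(path, model, optimizer, epoch):
    # Persist everything needed to resume: weights, optimizer state, epoch.
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def load_checkpoint(path, model, optimizer):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1  # the epoch to resume from

save_checkpoint("ckpt.pt", model, optimizer, epoch=4)
start_epoch = load_checkpoint("ckpt.pt", model, optimizer)
print(start_epoch)  # resumes at epoch 5
```

At the start of each Colab or Kaggle session you would load the latest checkpoint and continue the epoch loop from `start_epoch`.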

Reason 2: Building one’s own hardware is a unique experience and helps acquire mechanical sympathy. You can expect this to pay off in better hyper-parameter selection, code performance and training throughput.

Reason 3: CPU, RAM and storage costs are trending down, making them cheaper to upgrade. The GPU is the main cost driver. A self-built machine gives you more options to manage this cost, e.g. adding more GPUs later when you can afford them and/or using lower-cost, older-generation secondary GPUs.

A typical deep learning workload

A typical deep neural net training session involves the following steps:

Step 1: Load and pre-process data. (CPU)

Step 2: Initialize model.

Step 3: Train in epochs, with each epoch roughly following these steps:

  • Load epoch data (the full data set or a partial subset). (CPU)
  • Divide input data into batches / mini-batches. (CPU / GPU)
  • Apply transforms to mini-batch data. (CPU / GPU)
  • Feed data into model and compute layer activations and output. (GPU)
  • Compute losses and metrics. (GPU)
  • Back-propagate losses into weights and biases for each layer. (GPU)
  • Repeat for next mini-batch.
  • At the end of the epoch, feed validation data set into semi-trained model and compute validation losses and metrics (CPU / GPU).

Step 4: Feed test data set into fully trained model to predict outputs (inference) (CPU / GPU)
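In PyTorch terms, steps 1 through 3 can be sketched roughly as follows; synthetic data and a toy model stand in for a real workload, and the shapes are illustrative:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# Step 1: load and pre-process data (CPU). Random tensors stand in
# for a real, pre-processed dataset here.
X, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# Step 2: initialize model.
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Step 3: train in epochs of mini-batches.
for epoch in range(3):
    for xb, yb in loader:                      # mini-batch loading (CPU)
        xb, yb = xb.to(device), yb.to(device)  # hand off to GPU, if present
        logits = model(xb)                     # forward pass (GPU)
        loss = loss_fn(logits, yb)             # losses and metrics (GPU)
        optimizer.zero_grad()
        loss.backward()                        # back-propagation (GPU)
        optimizer.step()
```

Validation (end of each epoch) and inference (step 4) reuse the same forward-pass machinery, just without the backward pass.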

The utilization profile for a well-balanced training job will look like this:

Utilization charts for a sample training job (logged in wandb.com)

The main takeaways are:

  • Data pre-processing is a CPU-intensive workload.
  • Model training, validation and testing are distributed between the CPU (mini-batch data loading and augmentation) and the GPU (forward pass and back-propagation).
  • While GPU is the star of the show, not feeding it data at a fast enough rate will reduce its utilization and overall throughput. Ensure CPU is adequately powered and do as much data pre-processing as possible before the first training epoch.
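In PyTorch, the usual knobs for keeping the GPU fed are the DataLoader's `num_workers` and `pin_memory` parameters; a sketch with illustrative dataset and batch sizes:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative stand-in for an image dataset (512 samples of 3x32x32).
dataset = TensorDataset(torch.randn(512, 3, 32, 32),
                        torch.randint(0, 10, (512,)))

# num_workers spawns CPU processes that prepare batches in parallel,
# and pin_memory speeds up host-to-GPU transfers; together they help
# keep the GPU busy instead of waiting on data.
loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=2,                          # tune to your CPU core count
    pin_memory=torch.cuda.is_available(),   # only useful with a GPU
)

xb, yb = next(iter(loader))
print(xb.shape)  # torch.Size([64, 3, 32, 32])
```

If GPU utilization sags while CPU cores sit idle, raising `num_workers` is usually the first thing to try.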

Component Selection

PCPartPicker.com is easily the best place to compose your BOM. The site allows you to compare features and price and make selections for each component. More importantly it ensures compatibility between your components at every step and keeps track of your wattage. You can also review what others have built and learn from their experiences.

As the GPU is the most difficult and expensive choice, it is logical to select the GPU first, then a compatible motherboard, CPU, RAM, storage and so on.

GPU

EVGA GeForce RTX 2060 6 GB

Tim Dettmers’s blog is probably the last word anyone needs on selecting a GPU. Even though he recommends a 2070 for beginners, I went with an RTX 2060 6 GB for the following reasons:

  • Good price: $340 (vs. $540 for RTX 2070).
  • Best performance per dollar.
  • Supports 16-bit training.

Note: NVIDIA is practically a monopoly when it comes to deep learning GPUs. All popular deep learning frameworks leverage NVIDIA’s CUDA Toolkit for distributing load to GPU. Library support for other manufacturers is very limited. So it is safest to go with an NVIDIA GPU for now.
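For reference, the 16-bit training such cards support is typically done through automatic mixed precision (AMP) in PyTorch; a rough sketch that falls back to full precision when no GPU is present (model and shapes are illustrative):

```python
# Sketch: 16-bit (mixed-precision) training with PyTorch AMP.
# Runs in full precision on CPU-only machines.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

model = nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(32, 128, device=device)
y = torch.randint(0, 10, (32,), device=device)

# autocast runs matmuls in fp16 on supported GPUs (e.g. RTX 2060).
with torch.autocast(device_type=device, enabled=use_amp):
    loss = nn.functional.cross_entropy(model(x), y)

scaler.scale(loss).backward()  # scale the loss to avoid fp16 underflow
scaler.step(optimizer)
scaler.update()
```

On Turing cards like the 2060, this roughly halves activation memory and can speed up training noticeably.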

Post-build lessons learnt:

  • This card does not have the blower-style fan recommended for multi-GPU builds, so it still vents hot air into the case. The EVGA 2-fan model would have been a wiser choice: it is wider but shorter than the 1-fan model. The 1-fan model takes up more vertical space on the motherboard, and now I am not sure if another GPU will fit easily in my case.

Motherboard

Asus ROG STRIX B450-F GAMING ATX AM4

There are a lot of choices out there, with little information on what really differentiates each manufacturer and model, so my selection was based primarily on compatibility, cost and a few other fuzzy criteria.

  • Multi-GPU support. There are 2 PCIe 3.0 x16 slots and 1 PCIe 2.0 x16 slot. However, after installing the RTX 2060, I find space is available for just one more GPU.
  • Up to 64 GB DDR4 RAM
  • No in-built Wi-Fi / Bluetooth adapters. I opted for a separate Wi-Fi adapter (ASUS PCE-AC68 at $69), just so the mobo and Wi-Fi can be upgraded separately.

Post-build lessons learnt:

  • Even though the board has 3 PCIe x16 expansion slots, there is barely enough space for 2 GPUs. Here is a feature idea for PCPartPicker: virtual assembly.
  • An onboard graphics card or CPU with integrated graphics would have completely separated display workload from deep learning workload.

CPU

AMD Ryzen 7 2700X 3.7 GHz 8-Core Processor

CPU (left), Wraith Prism Cooler (right)
  • General consensus that AMD has better price performance compared to Intel.
  • Good price for the number of cores and clock speed: $160 (relative to $310 for the next level 3700X).
  • Sufficient headroom for additional GPUs, with 8 cores and support for up to 64 GB RAM.
  • Comes with its own “Wraith Prism” cooler.

RAM

Corsair Vengeance LPX 16GB (2x8GB) DDR4 DRAM 3000MHz C15 Desktop Memory Kit

  • The selected motherboard has 4 DDR4 slots, each accepting up to a 16 GB module (64 GB total).

Post-build lessons learnt:

  • I later added 2 x 16 GB and now have an oddly sized 48 GB of RAM; with all four slots full, the remaining 16 GB of the board’s 64 GB capacity is unusable. I should have selected 2 x 16 GB in the first place.

Storage

Samsung 970 Evo 1 TB M.2–2280 NVME Solid State Drive

  • Price: $150 (17 cents per GB).
  • Reviews indicate fast boot times and strong all-round performance.

Case

Fractal Design Focus G ATX Mid Tower Case

  • Roomy case with in-built fans.
  • Space to hide cables in the back.
  • Support for liquid cooling (if needed in the future).
  • Good price at $58.

Power Supply

GameMax RGB 850 W 80+ Gold Certified Fully Modular ATX

PCPartPicker also computes the estimated wattage of your build, which in my case was 374 watts.

  • Reasonably priced at $80.
  • High efficiency rating
  • At 850 W, provides adequate room for growth.
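You can sanity-check PSU sizing with some back-of-the-envelope arithmetic; the per-component wattages below are rough, illustrative estimates from spec sheets, not measured values:

```python
# Illustrative PSU sizing. Wattages are rough spec-sheet estimates.
draw_watts = {
    "Ryzen 7 2700X (CPU)": 105,
    "RTX 2060 (GPU)": 160,
    "motherboard + RAM": 60,
    "NVMe SSD": 8,
    "fans + misc": 40,
}

total = sum(draw_watts.values())
headroom = 1.5  # margin for load spikes and a second GPU later

print(total)                    # 373 watts, close to PCPartPicker's 374
print(round(total * headroom))  # 560 W target; 850 W leaves ample room
```

Sizing the PSU well above the estimated draw also keeps it in its most efficient operating range.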

Post-build lessons learnt:

  • Oversized for this build. I never intended to use SATA drives, and the build is constrained to 1 CPU and 2 GPUs. A physically smaller PSU could potentially have created space for a 3rd GPU.

Peripherals, Tools and Supplies

  • PS2 or USB keyboard and mouse for initial startup
  • Monitor with HDMI cable
  • Thermal compound paste (Stock cooler has thermal compound pre-applied, which may come off over multiple installation attempts)
  • Anti-static wrist band and/or mat
  • Cable ties
  • Screwdrivers — 1 Phillips, 1 flathead, preferably with magnetic tips
  • Magnetic pickup
  • Magnifying lens
  • Compressed air duster
  • Flashlight
  • Scissors

You can view the complete BOM and contrast with other deep learning builds on PCPartPicker and read about the installation experience in Part 2: Hardware assembly.