Original article was published on Deep Learning on Medium
How to Properly Use the GPU within
a Docker Container
In this post, we walk through the steps required to access your machine’s GPU within a Docker container.
Configuring the GPU on your machine can be immensely difficult. The configuration steps change based on your machine’s operating system and the kind of NVIDIA GPU that your machine has. To add another layer of difficulty, when Docker starts a container — it starts from almost scratch. Certain things like the CPU drivers are pre-configured for you, but the GPU is not configured when you run a docker container. Luckily, you have found the solution explained here. It is called the NVIDIA Container Toolkit!
Potential Errors in Docker
When you attempt to run your container that needs the GPU in Docker, you might receive any of the following errors.
Error: Docker does not find Nvidia drivers
docker: Error response from daemon: Container command 'nvidia-smi' not found or does not exist..
Error: tensorflow cannot access GPU in Docker
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:150] kernel reported version is: 352.93
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
Error: pytorch cannot access GPU in Docker
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:50
Error: keras cannot access GPU in Docker
The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
You may receive many other errors indicating that your Docker container cannot access the machine’s GPU. In any case, if you have any errors that look like the above, you have found the right place here.
First, Make Sure Your Base Machine Has GPU Drivers
You must first install NVIDIA GPU drivers on your base machine before you can utilize the GPU in Docker. As previously mentioned, this can be difficult given the plethora of distribution of operating systems, NVIDIA GPUs, and NVIDIA GPU drivers. The exact commands you will run will vary based on these parameters. Here are some resources that you might find useful to configure the GPU on your base machine.
Once you have worked through those steps, you will know you are successful by running the nvidia-smi command and viewing an output like the following.
Now that we can assure we have successfully assure that the NVIDIA GPU drivers are installed on the base machine, we can move one layer deeper to the Docker container.
Next, Exposing the GPU Drivers to Docker
In order to get Docker to recognize the GPU, we need to make it aware of the GPU drivers. We do this in the image creation process. Docker image creation is a series of commands that configure the environment that our Docker container will be running in.
The Brute Force Approach — The brute force approach is to include the same commands that you used to configure the GPU on your base machine. When docker builds the image, these commands will run and install the GPU drivers on your image and all should be well. The brute force approach will look something like this in your Dockerfile (Code credit to stack overflow):
MAINTAINER Regan <http://stackoverflow.com/questions/25185405/using-gpu-from-a-docker-container>RUN apt-get update && apt-get install -y build-essential
RUN apt-get --purge remove -y nvidia*ADD ./Downloads/nvidia_installers /tmp/nvidia > Get the install files you used to install CUDA and the NVIDIA drivers on your host
RUN /tmp/nvidia/NVIDIA-Linux-x86_64-331.62.run -s -N --no-kernel-module > Install the driver.
RUN rm -rf /tmp/selfgz7 > For some reason the driver installer left temp files when used during a docker build (i don't have any explanation why) and the CUDA installer will fail if there still there so we delete them.
RUN /tmp/nvidia/cuda-linux64-rel-6.0.37-18176142.run -noprompt > CUDA driver installer.
RUN /tmp/nvidia/cuda-samples-linux-6.0.37-18176142.run -noprompt -cudaprefix=/usr/local/cuda-6.0 > CUDA samples comment if you don't want them.
RUN export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64 > Add CUDA library into your PATH
RUN touch /etc/ld.so.conf.d/cuda.conf > Update the ld.so.conf.d directory
RUN rm -rf /temp/* > Delete installer files.
The Downsides of the Brute Force Approach — First of all, every time you rebuild the docker image you will have to reinstall the image, slowing down development. Second, if you decide to lift the docker image off of the current machine and onto a new one that has a different GPU, operating system, or you would like new drivers — you will have to re-code this step every time for each machine. This kind of defeats the purpose of build a Docker image. Third, you might not remember the commands to install the drivers on your local machine, and there you are back at configuring the GPU again inside of Docker.
The Best Approach — The best approach is to use the NVIDIA Container Toolkit. The NVIDIA Container Toolkit is a docker image that provides support to automatically recognize GPU drivers on your base machine and pass those same drivers to your Docker container when it runs. So if you are able to run nvidia-smi, on your base machine you will also be able to run it in your Docker container (and all of your programs will be able to reference the GPU). In order to use the NVIDIA Container Toolkit, you simply pull the NVIDIA Container Toolkit image at the top of your Dockerfile like so — nano Dockerfile:
This is all the code you need to expose GPU drivers to Docker. In that Dockerfile we have imported the NVIDIA Container Toolkit image for 10.2 drivers and then we have specified a command to run when we run the container to check for the drivers. Now we build the image like so with docker build . -t nvidia-test:
Now we run the container from the image by using the command docker run — gpus all nvidia-test. Keep in mind, we need the — gpus all or else the GPU will not be exposed to the running container.
From this base state, you can develop your app accordingly. In my case, I use the NVIDIA Container Toolkit to power experimental deep learning frameworks. The layout of a fully built Dockerfile might look something like the following (where /app/ contains all of the python files):
CMD nvidia-smi#set up environment
RUN apt-get update && apt-get install --no-install-recommends --no-install-suggests -y curl
RUN apt-get install unzip
RUN apt-get -y install python3
RUN apt-get -y install python3-pipCOPY app/requirements_verbose.txt /app/requirements_verbose.txtRUN pip3 install -r /app/requirements_verbose.txt#copies the applicaiton from local path to container path
COPY app/ /app/
WORKDIR /appENV NUM_EPOCHS=10
ENV TRAIN_TIME_SEC=100CMD ["python3", "train_and_eval.py"]
The above Docker container trains and evaluates a deep learning model based on specifications using the base machines GPU. Pretty cool!
What if I need a different base image in my Dockerfile — Let’s say you have been relying on a different base image in your Dockerfile. Then, you should consider using the NVIDIA Container Toolkit alongside the base image that you currently have by using Docker multi-stage builds.
The Power of the NVIDIA Container Toolkit — Now that you have you written your image to pass through the base machine’s GPU drivers, you will be able to lift the image off the current machine and deploy it to containers running on any instance that you desire.
Congratulations! Now you know how to expose GPU Drivers to your running Docker container using the NVIDIA Container Toolkit.
Want to use your new Docker capabilities to do something awesome? You might enjoy our other posts on training a state of the art object detection model, training a state of the art image classification model, or simply by looking into some free computer vision data!