Deep Learning with docker container from NGC — Nvidia GPU Cloud

Source: Deep Learning on Medium


After sweating over the installation of a Deep Learning environment one too many times, a new way is presented here. No more CUDA and driver installations: welcome to the Docker era. You download a container image from the NGC cloud; it contains Python, TensorFlow, and all the needed CUDA code. There are images with PyTorch, Caffe, trained models, and more. Here I use Python 3 and TensorFlow 1.9. Nvidia, please add TensorFlow 2!

Nvidia cloud

Get yourself a Docker

Go to https://ngc.nvidia.com/catalog/all. Choose the Containers tab to view all available containers. I chose the TensorFlow container:

TensorFlow container on Nvidia cloud

You can use it directly from the cloud, with: nvcr.io/nvidia/tensorflow:19.12-tf2-py3. Or download it, after you create an account and log in.
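For the download route, a minimal sketch (assuming you have already generated an NGC API key on the site):

```shell
# log in to the NGC registry; the username is literally $oauthtoken,
# and the password is the API key generated on ngc.nvidia.com
docker login nvcr.io
# then pull the image to your machine
docker pull nvcr.io/nvidia/tensorflow:19.12-tf2-py3
```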

Start docker container

List all available local docker images

docker images
docker run --rm -it --runtime=nvidia --net=host -v <local dir>:<destination dir> <Image id>

For example, to run a local docker image, use:

docker run --rm -it --runtime=nvidia --net=host -v /my_server_dir/:/naomi 989143febb16

Or use a docker from Nvidia cloud:

docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:19.12-tf2-py3

Where:

  • -it means run in interactive mode
  • --rm deletes the container when it exits
  • -v mounts a directory into the container
  • local_dir is the directory or file on your host system (absolute path) that you want to access from inside your container; container_dir is the target directory when you are inside your container.

For example, in the following path the local_dir is /home/jsmith/data/mnist and the target directory is /data/mnist:

-v /home/jsmith/data/mnist:/data/mnist
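As a quick sanity check of the mount (a sketch; the paths and image id are placeholders from the example above):

```shell
# create a file on the host side of the mount...
touch /home/jsmith/data/mnist/hello.txt
# ...then list the container side; hello.txt should appear there
docker run --rm -v /home/jsmith/data/mnist:/data/mnist <image id> ls /data/mnist
```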

Example taken from: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

Stop a docker container

# list running dockers:
$ docker ps
# find the docker container id, then run:
$ docker kill <container id>

Attach to a running docker container

When you lose the command line window that manages the docker, you need to attach to the running docker container. Use the following command to open a shell inside the docker container.

docker exec -it <container id> /bin/bash

Run Jupyter notebook on docker container

The Nvidia docker container comes without Jupyter. Install it with:

pip install jupyter

Then run it with:

jupyter-notebook --ip 0.0.0.0 --allow-root
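Note that the first `docker run` example uses --net=host, so the notebook's port 8888 is reachable directly from the host. The NGC cloud example does not, so there you would publish the port yourself (a sketch, not from the original article):

```shell
# publish the default notebook port when not using --net=host
docker run --gpus all -it --rm -p 8888:8888 -v local_dir:container_dir \
    nvcr.io/nvidia/tensorflow:19.12-tf2-py3
```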

In your browser, type: http://<server ip>:8888/?token=<token>. In our case:

http://100.200.15.20:8888/?token=2a0630188ca7bc5b99d2c5b7923e2a2ebe391bc15a4bc3f1

Copy the token that is printed in the terminal.

It's convenient to use the notebook for installing packages and other back-end stuff. Just put ! at the beginning of the command, and run each command in a separate cell. For example:

! pip install keras

To save your installations in the docker image, you need to commit the docker.

Commit a docker

Committing a docker container creates a new image with all your installations.

docker commit <docker container id> <new docker name>:<docker tag>

For example, committing a docker with the following command:

docker commit 9571cb71d812 naomi/brats:version2

will create a new docker image, which will then show up in the output of `docker images`.
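The original post showed a screenshot of the result here; illustratively, the committed image appears in the `docker images` listing roughly like this (the image id, date, and size are made-up placeholders):

```
REPOSITORY    TAG        IMAGE ID       CREATED          SIZE
naomi/brats   version2   4a1b2c3d4e5f   10 seconds ago   8.54GB
```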

Reconnect to a running Jupyter notebook

Deep learning training usually takes a long time, so in most cases, when working on a remote server, the connection gets closed and the token to the notebook is lost. To find the token of a running notebook, do:

# use -mtime -30 to list only notebooks that started in the last 30 days
find `jupyter --runtime-dir` -mtime -30 | grep nbserver | xargs cat

You will see something like:

...
<body>
<p>
This page should redirect you to Jupyter Notebook. If it doesn't,
<a href="http://0.0.0.0:8888/tree?token=7caaccce4fb4e670d9053b67c773e93c4a6ceb128fde2f56">click here to go to Jupyter</a>.
</p>
...

Just copy the address (changing 0.0.0.0 to your server's IP, or localhost) into your browser to reopen the running notebook.
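Alternatively, Jupyter can list its own running servers together with their tokens. Run this in the same environment where the notebook server was started:

```shell
# prints the URLs (including tokens) of currently running notebook servers
jupyter notebook list
```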

User issues

In the docker container, you are the king of your kingdom: you are the root user, which I guess is needed for installations. The files you write get root ownership on the disk, even if you don't have the root password. Deleting a file you created in a container can be done only with the root password, or from inside a docker container. I haven't figured it out yet. Any insights will be appreciated.
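One possible workaround (an assumption on my part, not from the original setup): run the container with your host user id, or hand ownership back before exiting.

```shell
# run the container as your host user instead of root, so new files
# keep your ownership (may break installs that genuinely need root)
docker run --rm -it --runtime=nvidia --net=host \
    --user $(id -u):$(id -g) \
    -v /my_server_dir/:/naomi <image id>

# or, from a root shell inside the container, chown the files back to
# your host user before exiting (replace 1000:1000 with your uid:gid)
chown -R 1000:1000 /naomi
```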

Docker Summary

What's an article worth without some copy paste 🙂

# run
docker run --rm -it --runtime=nvidia --net=host -v <local dir>:<destination dir> <docker image id>
# list available docker images
docker images
# list running docker containers
docker ps
# stop a running docker container
docker kill <container id>
# attach to a running docker container
docker exec -it <container id> /bin/bash
# run notebook
jupyter-notebook --ip 0.0.0.0 --allow-root
# commit a docker
docker commit <docker container id> <new docker name>:<docker tag>

More explanations on using dockers and the meaning of each flag can be found here: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow