Source: Deep Learning on Medium
Deep Learning with docker container from NGC — Nvidia GPU Cloud
After sweating over the installation of a deep learning environment one too many times, here is a better way. No more CUDA and driver installations; welcome to the Docker era. You download a docker image from the NGC cloud, and the image contains Python, TensorFlow, and all the needed CUDA code. There are images with PyTorch, Caffe, trained models, and more. Here I use Python 3 and TensorFlow 1.9. Nvidia, please add TensorFlow 2!
Get yourself a Docker
Go to https://ngc.nvidia.com/catalog/all. Choose the Containers tab to view all available containers. I chose the TensorFlow container:
You can use it directly from the cloud with the tag 19.12-tf2-py3, or download it after you create an account and log in.
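Downloading it locally means logging in to the NGC registry first. A minimal sketch — the `$oauthtoken` username and an NGC API key (generated on ngc.nvidia.com) are what the registry expects, and the docker lines are commented out because they need real credentials:

```shell
# Image reference from the NGC catalog page
IMAGE=nvcr.io/nvidia/tensorflow:19.12-tf2-py3

# Log in to the NGC registry, then pull the image (uncomment to run):
# docker login nvcr.io      # username: $oauthtoken, password: your NGC API key
# docker pull "$IMAGE"
echo "would pull: $IMAGE"
```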
Start docker container
List all available local docker images with `docker images`, then start a container with:
docker run --rm -it --runtime=nvidia --net=host -v <local dir>:<destination dir> <Image id>
For example, to run a local image, use:
docker run --rm -it --runtime=nvidia --net=host -v /my_server_dir/:/naomi 989143febb16
Or use a docker from Nvidia cloud:
docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:19.12-tf2-py3
-it means run in interactive mode
--rm will delete the container when it exits
-v mounts a directory
local_dir is the directory or file on your host system (absolute path) that you want to access from inside your container.
container_dir is the target directory inside your container. For example, the local_dir might be /home/jsmith/data/mnist.
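Putting the pieces together, a sketch of a complete mount. The host path is the one from the NGC example; /data is an assumed container-side target (not something the image requires), and the echo just prints the composed command:

```shell
LOCAL_DIR=/home/jsmith/data/mnist   # host path (absolute)
CONTAINER_DIR=/data                 # assumed target inside the container
# Print the full command; drop the echo to actually run it.
echo docker run --gpus all -it --rm \
  -v "${LOCAL_DIR}:${CONTAINER_DIR}" \
  nvcr.io/nvidia/tensorflow:19.12-tf2-py3
```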
Example taken from: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow
Stop a docker container
# list running dockers:
$ docker ps
# Find the docker container id, then run:
docker kill <container id>
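When several containers are running, the two steps can be chained. A sketch with simulated ids — a real pipeline would replace the printf with `docker ps -q`, and `-r` assumes GNU xargs:

```shell
# Simulated output of `docker ps -q` (two illustrative container ids)
printf '9571cb71d812\n989143febb16\n' |
  xargs -r -n1 echo docker kill   # drop the echo to actually kill them
```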
Attach to a running docker container
When you lose the command-line window that manages the docker container, you need to attach to the running container. Use the following command to open a shell inside it.
docker exec -it <container id> /bin/bash
Run Jupyter notebook on docker container
Nvidia docker containers come without Jupyter. Install it with:
pip install jupyter
Then run it with:
jupyter-notebook --ip 0.0.0.0 --allow-root
Copy the token printed in the terminal, then, in your browser, open: http://<server ip>:8888/?token=<token>
It's convenient to use the notebook to install packages and do other back-end stuff. Just put ! at the beginning of the command, and run each command in a separate cell. For example:
! pip install keras
To save your installations in the docker image, you need to commit the docker.
Commit a docker
Committing a docker container creates a new image with all your installations:
docker commit <docker container id> <new docker name>:<docker tag>
For example, committing a docker with the following command:
docker commit 9571cb71d812 naomi/brats:version2
will create a new docker image, which you can see by running `docker images`.
Reconnect to a running Jupyter notebook
Deep learning training usually takes a long time, so when working on a remote server the connection often closes and the token to the notebook is lost. To find the token of a running notebook, run:
# use -mtime -30 to list only notebooks started in the last 30 days
find `jupyter --runtime-dir` -mtime -30 | grep nbserver | xargs cat
You will see something like:
This page should redirect you to Jupyter Notebook. If it doesn't,
<a href="http://0.0.0.0:8888/tree?token=7caaccce4fb4e670d9053b67c773e93c4a6ceb128fde2f56">click here to go to Jupyter</a>.
Just copy the address (changing 0.0.0.0 to your server's IP, or localhost) into your browser to reopen the running notebook.
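If you only want the token itself, it can be grepped out of that redirect page. A sketch on the sample line above — with a real server you would feed in the output of the find-and-cat pipeline instead of the hard-coded string:

```shell
# Sample contents of an nbserver-*.html redirect page (from the article)
html='<a href="http://0.0.0.0:8888/tree?token=7caaccce4fb4e670d9053b67c773e93c4a6ceb128fde2f56">click here to go to Jupyter</a>'
# Extract just the token=... query parameter
echo "$html" | grep -o 'token=[0-9a-f]*'
```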
Inside the docker container you are the king of your kingdom: you are the root user, which I guess is needed for installations. The files you write get root ownership on the disk, even if you don't have the root password. Deleting a file you created in a container can therefore be done only with the root password, or from inside a docker container. I haven't fully figured it out yet; any insights will be appreciated.
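One hedged workaround, since you are root inside the container: chown the files on the mounted volume back to your host user before deleting them. UID/GID 1000:1000 is an assumption (check with `id -u` on the host), the container id and mount path below are illustrative, and the echo just prints the command:

```shell
CONTAINER=9571cb71d812   # illustrative container id
MOUNT=/naomi             # container-side mount path used earlier
# Print the command; drop the echo to run it for real.
echo docker exec "$CONTAINER" chown -R 1000:1000 "$MOUNT"
```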
What's an article worth without some copy-paste 🙂
# run a docker image
docker run --rm -it --runtime=nvidia --net=host -v <local dir>:<destination dir> <docker Image id>
# list available docker images
docker images
# list running docker containers
docker ps
# attach to a running docker container
docker exec -it <container id> /bin/bash
# run notebook
jupyter-notebook --ip 0.0.0.0 --allow-root
# commit a docker container
docker commit <docker container id> <new docker name>
More explanations on using docker containers and the meaning of each flag can be found here: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow