Source: Deep Learning on Medium
Post by Chris van Pelt
There are two kinds of people building deep learning models: those who do everything in Docker, and those who should be doing everything in Docker.
Why is Docker so great for deep learning?
- Docker supports reproducibility. Much has been written about the reproducibility crisis in machine learning, and the difficulty is real. Using Docker removes one major source of variability: the software environment your code runs in.
- Using Docker allows your code to continue to run painlessly in the future. When I clone a GitHub repo of an ML experiment, I always brace for an unknown amount of time spent wrestling with libraries and dependencies. The deep learning “stack” keeps changing, with popular libraries like PyTorch and TensorFlow constantly putting out new versions and breaking dependencies. Using Docker shortens the time it takes to rerun old models and makes it substantially easier for teammates to reproduce your results.
Here at Weights & Biases, we are working towards zero-overhead reproducibility by making it easy to use Docker.
Run this simple command from inside your ML project:
wandb docker
This sets up a machine learning Docker image with standard packages installed, mounts your code, and opens a shell inside the container. You can now train your models exactly the way you were doing before, with the added benefits of Docker. During model training, wandb will save the image digest, which is a permanent record of your Docker image state. This means you will always be able to recover the exact environment your code ran in.
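For a mental model of what that involves, here is a rough sketch of a comparable plain `docker run` call. It is an approximation under stated assumptions, not the command wandb actually executes: the helper name `dev_shell` and the `/app` mount point are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of wandb): start an interactive shell in IMAGE
# with the current directory mounted inside the container.
# Usage: dev_shell IMAGE
dev_shell() {
  local image="$1"
  # --rm  : remove the container when the shell exits
  # -it   : keep stdin open and allocate a terminal
  # -v    : mount the current directory at /app (assumed mount point)
  # -w    : start the shell in the mounted directory
  docker run --rm -it \
    -v "$(pwd):/app" \
    -w /app \
    "$image" /bin/bash
}
```

Calling `dev_shell floydhub/dl-docker:cpu` from the project root would start the container with the project files visible at `/app`. This sketch does not record the image digest, which is the part wandb adds.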
At any time in the future, you can run the following to be taken back to the exact state your code and Docker image were in during your training run:
wandb restore <username>/<project>:<run_id>
Weights & Biases pre-fills your bash history with the original command.
To use a specific image, pass its name:
wandb docker floydhub/dl-docker:cpu
This command loads the FloydHub Deep Learning Docker image for CPUs.
To analyze results or launch runs with Jupyter, run the following:
wandb docker --jupyter
This installs Jupyter and starts JupyterLab on port 8888.
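For comparison, a hand-rolled approximation of that behavior could look like the sketch below. This is not wandb's implementation: `jupyter_in_container` is a hypothetical helper, the `/app` mount point is an assumption, and the sketch assumes the image ships `pip` and runs as root (hence `--allow-root`).

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of wandb): install JupyterLab inside IMAGE
# and serve it on port 8888 of the host.
# Usage: jupyter_in_container IMAGE
jupyter_in_container() {
  local image="$1"
  # -p 8888:8888 publishes the container's port 8888 on the host.
  # --ip=0.0.0.0 makes JupyterLab listen on all interfaces inside the
  # container so the published port is reachable from the host.
  docker run --rm -it \
    -p 8888:8888 \
    -v "$(pwd):/app" \
    -w /app \
    "$image" \
    sh -c 'pip install jupyterlab && jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root'
}
```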
For People Already Using Docker
We automatically track the digest to ensure environment replication in the future. You can pass the digest manually by setting the WANDB_DOCKER environment variable. We also provide a helper to get the digest for workflows that launch Docker images manually: wandb docker image_name --digest. For more advanced users who already have Docker in their workflows, we provide a new command:
Much like nvidia-docker, this command is a simple wrapper that injects the WANDB_DOCKER and WANDB_API_KEY environment variables into your existing Docker run calls. For users running their payloads in Kubernetes, our latest client will populate the digest automatically if the Kubernetes control plane API is exposed to the pod.
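Done by hand, the injection described above might look like the following. This is a minimal sketch, not the wrapper's actual implementation: `run_with_wandb_env` is a hypothetical name, and the sketch assumes the image was pulled from a registry (so that `docker inspect` reports a repo digest) and that `WANDB_API_KEY` is already exported on the host.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the wrapper described above (not wandb's code).
# Usage: run_with_wandb_env IMAGE [COMMAND...]
run_with_wandb_env() {
  local image="$1"
  shift
  local digest
  # RepoDigests is only populated for images pulled from or pushed to a
  # registry; the template prints an empty string when it is missing.
  digest="$(docker inspect --format '{{if .RepoDigests}}{{index .RepoDigests 0}}{{end}}' "$image")"
  # "-e NAME=value" sets a variable explicitly; "-e NAME" with no value
  # forwards the host's current value of NAME into the container.
  docker run \
    -e "WANDB_DOCKER=${digest}" \
    -e WANDB_API_KEY \
    "$image" "$@"
}
```

For example, `run_with_wandb_env floydhub/dl-docker:cpu python train.py` would run a (hypothetical) `train.py` with both variables set inside the container.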
You can learn more about our Docker support in our documentation.