MLflow Projects

Source: Deep Learning on Medium

Whether you create a new project or clone an existing one, you can turn it into an MLflow project simply by adding two YAML files to its root directory: an MLproject file and a Conda environment file.

This step is not obligatory but is highly recommended: it not only enhances the reproducibility of your models but also links each run to a specific version of the code (its git commit hash). This is very useful because, if future changes to the code affect its functionality or results, a user can simply git checkout the commit associated with a run.

An example of an MLproject file for DeepLab on the Cityscapes semantic segmentation dataset:

name: deeplab
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      training_number_of_steps: {type: int, default: 900}
      output_stride: {type: int, default: 16}
      decoder_output_stride: {type: int, default: 4}
      train_batch_size: {type: int, default: 1}
      dataset: {default: 'cityscapes'}
      train_logdir: {default: /home/sumeet/models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train}
      dataset_dir: {default: /home/sumeet/models/research/deeplab/datasets/cityscapes/tfrecord}
    command: "python train.py \
      --logtostderr \
      --training_number_of_steps={training_number_of_steps} \
      --train_split='train' \
      --model_variant='xception_65' \
      --atrous_rates=6 \
      --atrous_rates=12 \
      --atrous_rates=18 \
      --output_stride={output_stride} \
      --decoder_output_stride={decoder_output_stride} \
      --train_crop_size='769,769' \
      --train_batch_size={train_batch_size} \
      --dataset={dataset} \
      --train_logdir={train_logdir} \
      --dataset_dir={dataset_dir}"
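When you launch the project (e.g., mlflow run . -P train_batch_size=2), MLflow merges any -P overrides over the declared defaults and substitutes them into the {param} placeholders of the entry-point command. As a rough illustration only, not MLflow's actual implementation, the substitution behaves like Python's str.format; the parameter names below are taken from the MLproject file above (the command is abbreviated):

```python
# Sketch of MLflow's parameter substitution for an entry-point command.
# This is an illustrative approximation, not MLflow internals.
defaults = {
    "training_number_of_steps": 900,
    "output_stride": 16,
    "decoder_output_stride": 4,
    "train_batch_size": 1,
    "dataset": "cityscapes",
}

command_template = (
    "python train.py"
    " --training_number_of_steps={training_number_of_steps}"
    " --output_stride={output_stride}"
    " --decoder_output_stride={decoder_output_stride}"
    " --train_batch_size={train_batch_size}"
    " --dataset={dataset}"
)

def render_command(template, overrides=None):
    """Merge -P style overrides over the declared defaults, then substitute."""
    params = {**defaults, **(overrides or {})}
    return template.format(**params)

# Equivalent of: mlflow run . -P train_batch_size=2
print(render_command(command_template, {"train_batch_size": 2}))
```

Any parameter not overridden on the command line keeps the default declared in the MLproject file.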

An example of a Conda environment file:

name: production_env
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- ca-certificates=2019.8.28=0
- certifi=2019.9.11=py37_0
- libedit=3.1.20181209=hc058e9b_0
- libffi=3.2.1=hd88cf55_4
- libgcc-ng=9.1.0=hdf63c60_0
- libstdcxx-ng=9.1.0=hdf63c60_0
- ncurses=6.1=he6710b0_1
- openssl=1.1.1d=h7b6447c_2
- pip=19.2.3=py37_0
- python=3.7.4=h265db76_1
- readline=7.0=h7b6447c_5
- sqlite=3.30.0=h7b6447c_0
- tk=8.6.8=hbc83047_0
- xz=5.2.4=h14c3975_4
- zlib=1.2.11=h7b6447c_3
- pip:
  - absl-py==0.8.1
  - alembic==1.2.1
  - astor==0.8.0
  - attrs==19.2.0
  - backcall==0.1.0
  - bleach==3.1.0
  - chardet==3.0.4
  - cityscapesscripts==1.1.0
  - click==7.0
  - cloudpickle==1.2.2
  - configparser==4.0.2
  - cycler==0.10.0
  - databricks-cli==0.9.0
  - decorator==4.4.0
  - defusedxml==0.6.0
  - docker==4.1.0
  - entrypoints==0.3
  - flask==1.1.1
  - gast==0.2.2
  - gitdb2==2.0.6
  - gitpython==3.0.3
  - google-pasta==0.1.7
  - gorilla==0.3.0
  - grpcio==1.24.1
  - gunicorn==19.9.0
  - h5py==2.10.0
  - idna==2.8
  - imdbclassifier==0.6.6
  - importlib-metadata==0.23
  - ipykernel==5.1.2
  - ipython==7.8.0
  - ipython-genutils==0.2.0
  - ipywidgets==7.5.1
  - itsdangerous==1.1.0
  - jedi==0.15.1
  - jinja2==2.10.3
  - joblib==0.14.0
  - jsonschema==3.1.1
  - jupyter==1.0.0
  - jupyter-client==5.3.4
  - jupyter-console==6.0.0
  - jupyter-core==4.6.0
  - keras==2.3.1
  - keras-applications==1.0.8
  - keras-preprocessing==1.1.0
  - kiwisolver==1.1.0
  - mako==1.1.0
  - markdown==3.1.1
  - markupsafe==1.1.1
  - matplotlib==3.1.1
  - mistune==0.8.4
  - mlflow==1.3.0
  - more-itertools==7.2.0
  - nbconvert==5.6.0
  - nbformat==4.4.0
  - notebook==6.0.1
  - numpy==1.17.2
  - opt-einsum==3.1.0
  - pandas==0.25.1
  - pandocfilters==1.4.2
  - parso==0.5.1
  - pexpect==4.7.0
  - pickleshare==0.7.5
  - pillow==6.2.0
  - prettytable==0.7.2
  - prometheus-client==0.7.1
  - prompt-toolkit==2.0.10
  - protobuf==3.10.0
  - ptyprocess==0.6.0
  - pygments==2.4.2
  - pyparsing==2.4.2
  - pyrsistent==0.15.4
  - python-dateutil==2.8.0
  - python-editor==1.0.4
  - pytz==2019.3
  - pyyaml==5.1.2
  - pyzmq==18.1.0
  - qtconsole==4.5.5
  - querystring-parser==1.2.4
  - requests==2.22.0
  - scikit-learn==0.21.3
  - scipy==1.3.1
  - send2trash==1.5.0
  - setuptools==41.4.0
  - simplejson==3.16.0
  - six==1.12.0
  - sklearn==0.0
  - smmap2==2.0.5
  - sqlalchemy==1.3.9
  - sqlparse==0.3.0
  - tabulate==0.8.5
  - tensorboard==1.15.0
  - tensorflow==1.15.0
  - tensorflow-estimator==1.15.1
  - tensorflow-gpu==1.15.0
  - termcolor==1.1.0
  - terminado==0.8.2
  - testpath==0.4.2
  - tornado==6.0.3
  - traitlets==4.3.3
  - urllib3==1.25.6
  - wcwidth==0.1.7
  - webencodings==0.5.1
  - websocket-client==0.56.0
  - werkzeug==0.16.0
  - wheel==0.33.6
  - widgetsnbextension==3.5.1
  - wrapt==1.11.2
  - zipp==0.6.0
prefix: ~/anaconda3/envs/production_env

You can create the above file manually, or, if you already have a stable Conda environment, export it with the following command:

conda env export > conda.yaml
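Conversely, anyone who clones the project can recreate the environment from the exported file; when the MLproject file declares a conda_env, mlflow run also builds the environment for you automatically on first use. A minimal sketch, assuming the file above:

```shell
# Recreate the environment on another machine; the environment name
# is taken from the file's `name:` field (production_env above).
conda env create -f conda.yaml
conda activate production_env
```

Note that conda env export records the machine-specific prefix: line, which conda ignores when recreating the environment elsewhere.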

For more information, refer to the MLflow Projects documentation.