Original article was published by Dimitris Poulopoulos on Artificial Intelligence on Medium
From Jupyter Notebooks to HP Tuning
In this article, I follow a different approach. Instead of showing you how to set up your workspace first, we dive directly into the task at hand: we have a Jupyter Notebook that’s training an ML model, and we want to find the optimal set of the learning algorithm’s hyperparameters automatically, without writing a single line of code. Later, I will provide you with the details about how to do all these yourself.
That being said, we use Kubeflow and a tool called Katib. Katib is a Kubernetes-based system for HP tuning and Neural Architecture Search. Katib supports several ML frameworks, including TensorFlow, Apache MXNet, PyTorch, XGBoost, and others.
Also, for this example, we use the Titanic dataset and Kale (Kubeflow Automated pipeLines Engine). Kale is an open-source project that aims at simplifying the Data Science experience of deploying Kubeflow Pipelines workflows. So, let’s see how simple it is to run an HP tuning experiment from your Jupyter environment using Kale and Katib.
The project example is available on GitHub, but as we discussed setting up your environment comes later. For now, assuming we have a Notebook kernel running, all we have to do is open Kale from the left panel, enable it and set up a Katib job.
When enabled, Kale annotates the cells of the Notebook; each annotation becomes a step in the ML pipeline. You can play around and create your own steps, but don’t forget to also set the dependencies of each cell. In any case, we care about two annotations:
- The pipeline-parameters cell, which defines the hyperparameters of this Notebook
- The pipeline-metrics cell at the end, which defines the metric we try to optimize by changing the hyperparameters
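To make the two annotations concrete, here is a minimal sketch of what the annotated cells might contain. The hyperparameter names and the stubbed evaluation below are hypothetical illustrations, not the contents of the actual Titanic notebook:

```python
# --- cell tagged "pipeline-parameters" ---
# Kale treats plain top-level assignments in this cell as tunable
# hyperparameters (these names are hypothetical examples).
learning_rate = 0.1
n_estimators = 100

# --- ordinary pipeline-step cells would train the model here ---
# Stub standing in for real training/evaluation, for illustration only.
accuracy = 1.0 - learning_rate / n_estimators

# --- cell tagged "pipeline-metrics" ---
# The final cell exposes the metric Katib should optimize by printing it.
print(accuracy)
```

The point is that the Notebook itself stays plain Python; the annotations alone tell Kale which variables are tunable and which value measures success.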
The Katib UI will pick up the parameters automatically from the pipeline-parameters cell. All we have to do is choose the values we would like to try for each parameter, the optimization algorithm, and the metric we want to optimize. After pressing the COMPILE AND RUN KATIB JOB button at the bottom, we can follow the link to watch the experiment live, or wait to get the results via the Kale user interface. This is what the Katib experiment looks like in the end.
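Conceptually, what we are delegating to Katib is a search loop: sample hyperparameter combinations from the space we defined, run a trial for each, and keep the combination with the best metric. The sketch below illustrates that idea with plain random search; it is not how Katib works internally (Katib launches trials as Kubernetes jobs and supports smarter algorithms), and the ranges and the evaluation stub are hypothetical:

```python
import random

# Hypothetical search space, mirroring what we would enter in the Katib UI.
search_space = {
    "learning_rate": (0.01, 0.3),
    "n_estimators": (50, 200),
}

def evaluate(params):
    # Stub standing in for training the model and measuring its accuracy.
    return 1.0 - params["learning_rate"] / params["n_estimators"]

random.seed(0)
best = None
for _ in range(20):  # 20 trials
    params = {
        "learning_rate": random.uniform(*search_space["learning_rate"]),
        "n_estimators": random.randint(*search_space["n_estimators"]),
    }
    score = evaluate(params)
    if best is None or score > best[0]:
        best = (score, params)

print("best accuracy:", best[0])
print("best params:", best[1])
```

Running each trial by hand like this is exactly the tedium the Kale/Katib integration removes.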
Kale will return the best hyperparameter combination for our model. This is as easy as it gets!
How Can I Get All That?
So, let’s see how to set up our environment now. To get an instance of Kubeflow running, we will use MiniKF. Installing it is pretty simple; all we need is a GCP account and the ability to deploy applications from the Marketplace.
- Visit the MiniKF on GCP page
- Choose Launch, set the VM configuration, and click Deploy
For best performance, it is recommended to keep the default VM configuration
That’s it! The deployment takes up to ten minutes, and you can watch the progress by following the on-screen instructions: ssh into the machine, run minikf on the terminal, and wait until you have your endpoint and credentials ready.
Now, we are ready to visit the Kubeflow Dashboard. Click on the URL, enter your credentials, and you’re ready to go!
Running a Jupyter Server
Now, we need a Jupyter Notebook instance. Creating a Jupyter Notebook is relatively easy in MiniKF, using the provided Jupyter Web App:
- Choose notebooks from the left panel
- Choose the New Server option
- Fill in a name for the server, request the amount of CPU and RAM you need, and click LAUNCH
To follow this tutorial, leave the Jupyter Notebook image as is (jupyter-kale:v0.5.0-47-g2427cc9; note that the image tag may differ)
After completing these four steps, wait for the Notebook Server to get ready and connect. You’ll be transferred to your familiar JupyterLab workspace. To get the example from this story, create a new terminal in the JupyterLab environment and clone the following repo:
You can find the Titanic example in
medium > minikf > titanic-katib.ipynb. If you want to see more things you can do with MiniKF and Kale, check out the following stories.
In this story, we saw how to decouple our model training pipelines from the HP tuning process. We demonstrated how to use MiniKF to create a single-node Kubeflow instance. Then, we turned to Kale and Katib to automate the process of running HP tuning experiments from a Jupyter Notebook.
We are now ready to let Katib handle HP tuning for us and use state-of-the-art techniques, like Bayesian optimization and Neural Architecture Search (NAS).
About the Author
My name is Dimitris Poulopoulos and I’m a machine learning engineer working for Arrikto. I have worked on designing and implementing AI and software solutions for major clients such as the European Commission, Eurostat, IMF, the European Central Bank, OECD, and IKEA.
Opinions expressed are solely my own and do not express the views or opinions of my employer.