Training and deploying ML/DL models on AWS SageMaker made simple

As a Data Scientist or Machine Learning Engineer, I’m sure you have faced the following situation: you are working on a Machine Learning or Deep Learning project, and you need to train and deploy your models in production. But you’ve had enough of writing your own recipe scripts to train models on the cloud (AWS, Google Cloud, etc.), or of waiting for ages for them to train locally on your workstation.

This is why you started using AWS SageMaker! However, it is still not straightforward to deploy your own training/prediction code on SageMaker. You know, AWS is like IKEA! They provide you with all the required pre-cut parts, screws and screwdrivers, but at the end of the day it is still your responsibility to follow the instructions correctly and connect all the bits and pieces in the right way. What if you don’t want to go through all this hassle?

For those of you not looking to assemble your next IKEA bookcase, but to deploy your next Machine Learning or Deep Learning project on SageMaker, Sagify comes to the rescue!

So, wouldn’t it be AWESOME to code your own training logic, as you usually do, and then execute it on AWS by calling a simple command from the terminal, like this:

sagify cloud train -d local-src-dir/ -i s3://my-bucket/training-data/ -o s3://my-bucket/model-output/ -e ml.m4.xlarge

There are clear benefits to using the above command:

  1. No need to move data to your code; instead, your code goes to the data (-i s3://my-bucket/training-data/).
  2. Trained models are saved in timestamped subfolders under s3://my-bucket/model-output/, for example: s3://my-bucket/model-output/my-code-2018-04-29-15-04-14-483/model.tar.gz
  3. Easy way to specify the EC2 instance type (-e ml.m4.xlarge).
  4. Code is packaged in a Docker image and pushed to AWS ECR.
  5. Training data from S3 is made available seamlessly on the EC2 instance’s EBS storage.
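The timestamped output location in point 2 follows a predictable pattern, which is handy when downstream code needs to locate the latest artifact. Here is a minimal sketch of that pattern; the exact job-naming scheme shown is an illustration, not Sagify’s actual internals:

```python
from datetime import datetime, timezone

# Illustrative sketch of the timestamped model-output URI pattern from point 2.
# The job-naming scheme here is an assumption, not Sagify's real implementation.
def model_output_uri(output_prefix, job_base_name, when=None):
    when = when or datetime.now(timezone.utc)
    timestamp = when.strftime('%Y-%m-%d-%H-%M-%S-%f')[:-3]  # trim microseconds to milliseconds
    return '{}/{}-{}/model.tar.gz'.format(
        output_prefix.rstrip('/'), job_base_name, timestamp)

print(model_output_uri('s3://my-bucket/model-output/', 'my-code'))
```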

Let’s dive into an example of how to use Sagify. Please follow the Getting Started section in the docs for a complete walkthrough. Here is the gist of it:

Step 1:

Clone a Deep Learning codebase that learns to evaluate additions of up to 3-digit integers: git clone
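To give a feel for the task, here is an illustrative way to generate (question, answer) string pairs for 3-digit addition, in the spirit of the classic seq2seq addition example; the cloned codebase may format its training data differently:

```python
import random

# Illustrative only: generate (question, answer) string pairs for the
# 3-digit addition task. The actual codebase may prepare data differently.
def make_addition_pairs(n, max_digits=3, seed=0):
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        a = rng.randint(0, 10 ** max_digits - 1)
        b = rng.randint(0, 10 ** max_digits - 1)
        pairs.append(('{}+{}'.format(a, b), str(a + b)))
    return pairs

for question, answer in make_addition_pairs(3):
    print(question, '=', answer)
```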

Step 2:

Initialize Sagify by executing the following command in your terminal:

sagify init -d src

Step 3:

Call your training logic from the train(…) function in the sagify/training/train file. The generated skeleton wraps your code in a try/except block, along these lines:

    try:
        # your training logic goes here
        print('Training complete.')
    except Exception as e:
        # re-raise so the SageMaker job is marked as failed
        raise e

Step 4:

Build the Docker image that will contain your code: sagify build -d src -r requirements.txt

Step 5:

Push the Docker image to AWS ECR: sagify push -d src

Step 6 (Optional):

Upload the data to S3, if it is not already there:

sagify cloud upload-data -d src -i data/processed/ -s s3://my-dl-addition/training-data

Step 7:

Finally, train your model on AWS SageMaker:

sagify cloud train -d src/ -i s3://my-dl-addition/training-data/ -o s3://my-dl-addition/output/ -e ml.m4.xlarge

I’m pretty sure some of you have already spotted something important here: the above commands can be orchestrated by tools such as Airflow to automate your training pipeline! Why is this important?

  1. No more manual execution of training.
  2. Keep track of training code, hyperparameters, trained models, etc. in storage such as S3.
  3. Avoid situations like “it works on my laptop”.
  4. Catch issues early rather than late.
  5. Increase visibility, enabling better communication.
  6. Spend less time debugging and more time adding features.
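As a sketch of what such orchestration could look like, here is a plain-Python pipeline that chains the Sagify commands from the steps above; bucket names and paths are the illustrative ones used earlier, and an Airflow DAG would simply wrap each step in an operator instead:

```python
import subprocess

# The Sagify commands from steps 4-7, chained into one pipeline.
# Bucket names and paths are the illustrative ones from the walkthrough.
STEPS = [
    ['sagify', 'build', '-d', 'src', '-r', 'requirements.txt'],
    ['sagify', 'push', '-d', 'src'],
    ['sagify', 'cloud', 'upload-data', '-d', 'src',
     '-i', 'data/processed/', '-s', 's3://my-dl-addition/training-data'],
    ['sagify', 'cloud', 'train', '-d', 'src/',
     '-i', 's3://my-dl-addition/training-data/',
     '-o', 's3://my-dl-addition/output/', '-e', 'ml.m4.xlarge'],
]

def run_pipeline(dry_run=True):
    """Execute each step in order; check=True stops on the first failure."""
    executed = []
    for cmd in STEPS:
        if dry_run:
            print(' '.join(cmd))  # show the command without running it
        else:
            subprocess.run(cmd, check=True)
        executed.append(cmd)
    return executed

run_pipeline(dry_run=True)
```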

Do these points remind you of anything?

Well, these are some of the benefits of Continuous Integration/Continuous Delivery! At the end of the day, you want your Machine Learning and Deep Learning models to be part of a software system, so you still have to follow Software Engineering Best Practices alongside Machine Learning Best Practices!

About Kenza

Sagify is open-sourced by the team behind Kenza, one of the winners of the Product Hunt Global Hackathon. We are currently working on a Continuous Integration solution for Machine Learning, built on top of open-source projects. Sagify is one of them, as we believe in openness, sharing, and a passion for Machine Learning!

Source: Deep Learning on Medium