Original article was published by Nicole Ramirez on Deep Learning on Medium
Step 1: Package the saved model file and upload to S3
Following training, our saved model (Requirement 1 above) needs to be compressed into a model.tar.gz file on our local machine, which is the format SageMaker recognises. We then transfer the compressed model to AWS by uploading it to the S3 bucket we’ve created previously. We copy the S3 address of our model, as we’ll need to reference it later on in our notebook:
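As a sketch, the packaging and upload might look like the following (the bucket, key, and file names are placeholders to adapt):

```python
import os
import tarfile


def package_model(model_file, archive_path="model.tar.gz"):
    """Compress the saved model file into the model.tar.gz format SageMaker expects."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps the archive flat: the file sits at the archive root.
        tar.add(model_file, arcname=os.path.basename(model_file))
    return archive_path


def upload_model(archive_path, bucket, key="model.tar.gz"):
    """Upload the compressed model to S3 and return its address for later use."""
    import boto3  # requires AWS credentials to be configured

    boto3.client("s3").upload_file(archive_path, bucket, key)
    return f"s3://{bucket}/{key}"


# In a real session (bucket name is a placeholder):
# model_data = upload_model(package_model("pytorch_model.bin"), "sagemaker-bert-pytorch")
```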
Step 2: Going into script mode: creating our main entry-point script
In the process of training our model, we’d have written code to read data in, train the model, validate it, run inference with it, and format the results into a final output. We have to transfer these functionalities to AWS by refactoring them into an entry-point script (Requirement 3 above), which is invoked by the model’s Docker container when we first initialise the model in SageMaker. This script holds everything needed for the model to perform inference, or be trained.
When your model is deployed through transform() commands (more on this later), SageMaker starts your model server inside a Docker container, as previously mentioned. The server then loads and uses your model by invoking a series of specific functions that have default implementations, which we override with our own. The first function, model_fn(), handles loading the model onto the server.
Below is our implementation for loading a BertForSequenceClassification model into the PyTorch model server with model_fn(). It returns a model loaded onto the correct device (i.e. GPU or CPU, depending on availability).
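A minimal sketch of such a model_fn(), assuming the fine-tuned weights were saved with save_pretrained() into the model directory:

```python
import torch
from transformers import BertForSequenceClassification


def model_fn(model_dir):
    """Load the fine-tuned BERT classifier from model_dir onto GPU if available, else CPU."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = BertForSequenceClassification.from_pretrained(model_dir)
    model.to(device)
    model.eval()  # inference mode: disables dropout
    return model
```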
Incoming requests to the model may be one of two types: a request for inference (which is our case here), or a request for re-training (which we don’t cover here). For the former, requests are handled by the server in three steps, with the associated functions described below:
- First, the input data within the request is processed by input_fn().
- The result of step 1 is passed on to predict_fn(), which contains the code for inference. This code makes use of the model loaded by model_fn().
- An optional function, output_fn(), gets the result of step 2 and post-processes it prior to transfer to S3 storage, or display somewhere else.
Here’s our implementation for each of the functions above:
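A sketch along these lines (the supported content types, the tokenizer name, and the output format are assumptions, not the article’s exact code):

```python
import json

import torch
from transformers import BertTokenizer


def input_fn(request_body, content_type):
    """Deserialise the request body into a list of texts."""
    if content_type == "application/json":
        return json.loads(request_body)
    if content_type == "text/csv":
        # Assumes one text per line in the batch input file.
        return [line for line in request_body.splitlines() if line.strip()]
    raise ValueError(f"Unsupported content type: {content_type}")


def predict_fn(input_data, model):
    """Run the texts through the model loaded by model_fn() and return class ids."""
    device = next(model.parameters()).device
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # placeholder vocab
    encoded = tokenizer(
        input_data, padding=True, truncation=True, return_tensors="pt"
    ).to(device)
    with torch.no_grad():
        logits = model(**encoded).logits
    return torch.argmax(logits, dim=-1).tolist()


def output_fn(prediction, accept="application/json"):
    """Post-process predictions into the payload written to S3."""
    return json.dumps({"count": len(prediction), "predictions": prediction})
```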
So, to recap: a model deployed for inference must implement predict_fn(), and optionally output_fn(), in its entry-point script.
Step 2.5: Other tips for the entry-point script
- Adding logging events to the script will help debug any issues we might encounter during runtime. In our script, we implement logging through Python’s logging library, via the below pieces of code:
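For instance (the logger configuration and messages here are illustrative):

```python
import logging
import sys

# Send log records to stdout so they surface in CloudWatch
# and in the notebook's output cell during runtime.
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(sys.stdout))

# Example usage inside the entry-point functions:
logger.info("Loading model from the model directory ...")
num_entries = 3  # illustrative value
logger.debug("Generated predictions for %d entries", num_entries)
```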
- Remember how the model server runs inside a container? Inside the container’s /opt/ml directory, the model and its files are expected to be laid out in the following structure:
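Based on the description in this step, the layout looks roughly like this (the artefact file names are examples, not the article’s exact files):

```text
/opt/ml
└── model                    # uncompressed contents of model.tar.gz
    ├── pytorch_model.bin    # example artefact name
    ├── config.json
    └── code
        ├── train_deploy.py  # entry-point script
        ├── requirements.txt
        └── (other model scripts)
```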
- When we first initialise a model, the SageMaker Containers library extracts the artefact from the model.tar.gz file we created in Step 1, and transfers the uncompressed artefact to /opt/ml/model. The entry-point script is then transferred to /opt/ml/model/code. This latter directory is where the containers library looks for Python scripts to run, plus the requirements.txt file. So we add code at the top of our entry-point script (i.e. train_deploy.py) to transfer any other model scripts and the requirements.txt file to this directory.
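A sketch of that set-up code (the default directory names are assumptions):

```python
import os
import shutil


def stage_code(src_dir=".", dest_dir="/opt/ml/model/code"):
    """Copy helper scripts and requirements.txt to the directory where the
    containers library looks for Python scripts to run."""
    os.makedirs(dest_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        if name.endswith(".py") or name == "requirements.txt":
            shutil.copy(os.path.join(src_dir, name), os.path.join(dest_dir, name))
    return sorted(os.listdir(dest_dir))
```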
Step 3: Creating the requirements.txt file
This file specifies any package dependencies, and their versions, required by our model script(s). To populate it, check the release notes for the deep learning container you’re using (here are the release notes for the PyTorch v1.6.0 container image we’ve used) to see which packages are included. Any package, or package version, that’s not included but is used by any of your model script(s) has to be specified in requirements.txt. The container downloads these packages before running your scripts.
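For example, if the model scripts use the transformers library, which is not bundled with the PyTorch container, the file might contain a single pinned line (the version shown is illustrative):

```text
transformers==3.5.1
```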
Step 4: Import packages in Notebook Instance
Whilst we can create a model and a batch transform job using the SageMaker Console, I prefer doing so with SageMaker APIs in a notebook instance. This allows me to have the workflow all in one place, and view log messages created by my scripts in the notebook itself during runtime. The rest of this step is demonstrated via those APIs.
We start set-up by importing necessary packages:
Step 5: Get session, role, and bucket information
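Inside a notebook instance, this set-up is three lines, sketched here as a helper function (the bucket name is a placeholder, and get_execution_role() only works inside SageMaker):

```python
def get_aws_handles(bucket="sagemaker-bert-pytorch"):
    """Return the session, role, and bucket used throughout the notebook.

    In a notebook cell you would simply run the three assignments directly."""
    import sagemaker
    from sagemaker import get_execution_role

    sagemaker_session = sagemaker.Session()
    role = get_execution_role()  # IAM role attached to the notebook instance
    return sagemaker_session, role, bucket
```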
sagemaker_session: The session object that manages interactions with SageMaker APIs and any other AWS service that this inference job uses.
role: The IAM role whose attached policies control access to other AWS services.
bucket: The name of an S3 bucket we’d like results to be stored into.
Step 6: Initialise PyTorch Model
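A sketch of this initialisation (the S3 path, folder name, and version strings are placeholders to adapt):

```python
def build_model(role):
    """Wrap the uploaded artefact and entry-point script in a PyTorchModel."""
    from sagemaker.pytorch import PyTorchModel  # imported here so the sketch stands alone

    return PyTorchModel(
        model_data="s3://sagemaker-bert-pytorch/model.tar.gz",  # address copied in Step 1
        role=role,
        framework_version="1.6.0",    # PyTorch version of the container
        py_version="py3",
        source_dir="bert-sa-scripts",
        entry_point="train_deploy.py",
    )
```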
We initialise a PyTorchModel for inference above, calling the PyTorchModel estimator (documentation here). The path to our model.tar.gz file, copied in Step 1, is specified, as are our IAM role and AWS PyTorch container version (i.e. the version of PyTorch we used to write our model scripts). Specifying source_dir tells SageMaker to look in a folder called bert-sa-scripts in our current working environment and find the entry-point script, train_deploy.py, there. As mentioned in Step 2.5, this script is transferred to the container’s /opt/ml/model/code directory when this code executes.
Step 7: Create a Transformer from the model
SageMaker’s Transformer handles transformations, including inference, on a batch of data. We use it instead of an Estimator when deploying our model because an Estimator makes predictions on a single input, whereas a Transformer does so for multiple inputs. In SageMaker, a model deployed to a real-time endpoint via the deploy() command uses an Estimator, whereas a model deployed for serverless offline predictions with the transform() command uses a Transformer.
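A sketch of creating the transformer directly from the model object (the instance type and output path are assumptions):

```python
def build_transformer(model, bucket):
    """Create a batch Transformer from the PyTorchModel initialised in Step 6."""
    return model.transformer(
        instance_count=1,
        instance_type="ml.m5.xlarge",               # spun up only while a job runs
        output_path=f"s3://{bucket}/batch-output",  # where predictions land in S3
    )
```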
In initialising a Transformer, as the documentation shows, the first parameter is the name of the model we’ve initialised in Step 6. We either provide this name as a string, or call the .transformer(...) method on the model object directly, as we’ve done above. We specify the transformer’s EC2 instance type, which gets started only when a batch transform job runs, making batch transforms truly serverless! You can read about the other parameters in the linked API documentation, but note that the bucket we specified in Step 5 is used here.
Step 8: Run a batch transform job!
Up until this point, we’ve initialised a PyTorch model and used that model to initialise a transformer. We’re now ready to use that transformer to perform batch inference on our data! Our data currently sits inside a .csv file in the sagemaker-bert-pytorch S3 bucket we’ve alluded to in Step 5.
Below is some helper code to list the contents of that bucket:
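A sketch of such a helper (the default bucket name is a placeholder, and the call needs AWS credentials):

```python
def list_bucket_contents(bucket="sagemaker-bert-pytorch"):
    """Print every object key in the bucket so we can copy the data file name."""
    import boto3  # imported here so the sketch stands alone

    s3 = boto3.client("s3")
    for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
        print(obj["Key"])
```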
Once we’ve copied the filename of our data from the output of the above code, we paste it into the .transform(...) call, which starts our batch inference job:
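The call might be sketched as follows (the content type and split type are assumptions for a line-per-record .csv file):

```python
def run_batch_inference(transformer, bucket, data_key):
    """Start the batch transform job on the .csv file and wait for completion."""
    transformer.transform(
        data=f"s3://{bucket}/{data_key}",  # paste the filename copied above
        content_type="text/csv",
        split_type="Line",                 # one record per line
    )
    transformer.wait()  # stream the job's logs into the notebook until it finishes
```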
Here’s the API documentation for the .transform(...) function. The .wait() function causes logs from our scripts to be printed into the output cell of the notebook, which is useful for seeing where our programme is at.
I found it useful to include a log in my model script indicating whether the predictions were successfully generated. This log sits inside my output_fn() implementation and prints the number of entries for which predictions were generated.
Step 9: Check the predictions
Once the .transform(...) call is finished, we can check the output by calling the following command within our notebook:
!aws s3 cp --quiet --recursive $transformer.output_path ./batch_predictions
This tells AWS to copy the output of the transformer, stored at the output_path set in Step 8, to a folder called batch_predictions in the current notebook’s working environment, so we can inspect it without having to go into S3. --quiet disables any printed logs while the process is running, and --recursive indicates that folders are copied as well, if they exist.
Checking that folder, we see the output file, which looks like this:
This confirms that our batch inference job generated predictions successfully!
In this post we’ve covered how to deploy a deep learning model trained outside AWS in an AWS-managed deep learning container. To further automate this set-up, we can configure our batch transform job to start automatically on a schedule, or whenever a new .csv file lands in the S3 bucket, instead of running it manually as we did in this SageMaker notebook. I will cover this in a future post, so stay tuned!
Special thanks to our Senior Dev, Denis Tereshchenko for proof-reading this article in full!