Getting started with TensorFlow Serving

Original article was published by Arunkumar L on Artificial Intelligence on Medium

TensorFlow Serving is a part of TensorFlow Extended (TFX) that makes deploying your machine learning model to a server much easier. Before Google released TensorFlow Serving, you typically had to write your own serving code and package it for production yourself, for example in a custom Docker container. Deploying a model that way is tedious, time-consuming, and error-prone. TensorFlow Serving exposes your model through an API that clients can call with HTTP requests to run inference on the server. In this blog, we will serve an emotion recognition model and, through that, understand the basics of TensorFlow Serving.

PS: I will not cover training the emotion recognition model in this blog; I will only be talking about serving the model. You can find the gist to train the emotion recognition model here. You can find the repository to run the model locally here.

Why serve a model?

Once you have trained your model, it has to be deployed into production so that it can be used. There are various ways to deploy a model: running it locally on phones with TensorFlow Lite, running it on a website with TensorFlow.js, building your own Docker container to deploy it on the cloud, and so on. TensorFlow Serving has the following advantages over these methods.

  1. It is much easier to deploy your model using TensorFlow Serving than by building your own Docker deployment, which saves you time and prevents unnecessary errors.
  2. It is easier to manage different versions of the model than with TensorFlow Lite or TensorFlow.js.
  3. When the model is updated, all clients use the same version of the model, so the results are uniform.
  4. Since the model runs on the server, you can use powerful computational resources like GPUs or TPUs to run inference faster.
  5. Since the model is served as an API, it can be used from programming languages that TensorFlow does not support.

Installing TensorFlow Serving

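TensorFlow Serving can be installed using Docker or the apt package installer for Linux. The pip package manager installs only the tensorflow-serving-api package, a Python client library for talking to a running server, not the model server itself. In this blog, we will focus on using apt to install the server, with pip for the optional client package.

Run the code given below in your production environment’s terminal to install TensorFlow Serving with the apt package installer. Add the pip package if you also want the Python client API.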
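
The original install snippet is not reproduced here, so what follows is a minimal sketch of the apt route, assuming a Debian or Ubuntu machine where the commands run as root (as they do on Google Colab). The repository line and key URL follow the TensorFlow Serving installation guide as I know it; the names APT_INSTALL_COMMANDS, PIP_INSTALL_COMMAND, and install_tensorflow_serving are my own. Note that apt-key is deprecated on recent Ubuntu releases, so you may need to adapt the key step.

```python
import subprocess

# Shell commands for Debian/Ubuntu, run as root (assumption: a Colab-like setup).
# They register the TensorFlow Serving apt repository and install the server binary.
APT_INSTALL_COMMANDS = [
    'echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt '
    'stable tensorflow-model-server tensorflow-model-server-universal" '
    "| tee /etc/apt/sources.list.d/tensorflow-serving.list",
    "curl https://storage.googleapis.com/tensorflow-serving-apt/"
    "tensorflow-serving.release.pub.gpg | apt-key add -",
    "apt-get update",
    "apt-get install -y tensorflow-model-server",
]

# Optional Python client library. It does not include the model server.
PIP_INSTALL_COMMAND = "pip install tensorflow-serving-api"


def install_tensorflow_serving(commands=APT_INSTALL_COMMANDS):
    """Run each install command in a shell, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, shell=True, check=True)
```

Calling install_tensorflow_serving() runs the four apt steps in order. In a notebook you could equally paste each string into a shell cell.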

Installing TensorFlow Serving

Setting up and serving your model

Once you have trained, tested, and validated your model, it is time to serve your model.

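Step 1: Specify the version number for your model and save your model to a tmp directory using the tf.keras.models.save_model() function or an equivalent model-saving function. TensorFlow Serving expects each version of the model in its own numbered subdirectory of the model directory (for example, a subdirectory named 1 for version 1).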
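
Here is a minimal sketch of this step. It assumes TensorFlow 2.x with Keras 2, where saving to a directory path produces a SavedModel; the directory name emotion_model and the helper names versioned_export_path and save_for_serving are hypothetical.

```python
import os
import tempfile


def versioned_export_path(base_dir, version):
    """Return base_dir/<version>, the numbered layout TensorFlow Serving scans."""
    if int(version) < 0:
        raise ValueError("version must be a non-negative integer")
    return os.path.join(base_dir, str(int(version)))


def save_for_serving(model, base_dir, version=1):
    """Save a trained Keras model under a numbered version directory."""
    import tensorflow as tf  # imported lazily so the path helper works without TF

    export_path = versioned_export_path(base_dir, version)
    # Assumption: Keras 2 behaviour, where a directory path yields a SavedModel.
    # With Keras 3, model.export(export_path) is the SavedModel route instead.
    tf.keras.models.save_model(model, export_path)
    return export_path


# Hypothetical model directory inside the system tmp directory.
MODEL_DIR = os.path.join(tempfile.gettempdir(), "emotion_model")
print(versioned_export_path(MODEL_DIR, 1))
```

To publish an updated model, save it with a higher version number under the same MODEL_DIR. By default the server picks up the highest version it finds.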

Saving the model

Step 2: The model will be served through a bash command. For the bash command to access the model directory, store the path of the model directory in an environment variable.
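
In a notebook, one way to do this is from Python itself. This is a small sketch; the variable name MODEL_DIR and the directory name emotion_model are assumptions, not something TensorFlow Serving requires.

```python
import os
import tempfile

# Hypothetical location: the parent directory that holds the numbered version folders.
model_dir = os.path.join(tempfile.gettempdir(), "emotion_model")

# Processes started from this Python session (including shell cells in a
# notebook) inherit the variable, so a later command can refer to ${MODEL_DIR}.
os.environ["MODEL_DIR"] = model_dir
```

Point the variable at the parent directory, not at the version subdirectory, because the server looks for the numbered folders inside it.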

Adding the model directory to the environment

Step 3: Now that the model has been saved, it is ready to be served. Run the bash command given below to serve the model.
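
The same command can be launched from Python. The sketch below uses the tensorflow_model_server flags --rest_api_port, --model_name, and --model_base_path; port 8501, the log file name server.log, and the helper names are my assumptions.

```python
import subprocess


def build_server_command(model_name, model_base_path, rest_api_port=8501):
    """Build the argument list for the tensorflow_model_server binary."""
    return [
        "tensorflow_model_server",
        f"--rest_api_port={rest_api_port}",
        f"--model_name={model_name}",
        f"--model_base_path={model_base_path}",
    ]


def start_server(model_name, model_base_path, rest_api_port=8501, log_path="server.log"):
    """Start the server in the background, sending stdout and stderr to a log file."""
    command = build_server_command(model_name, model_base_path, rest_api_port)
    log_file = open(log_path, "w")
    return subprocess.Popen(command, stdout=log_file, stderr=subprocess.STDOUT)


# Example (requires the server binary to be installed):
# server = start_server("emotion_model", "/tmp/emotion_model")
```

The model base path should be an absolute path. The model name you choose here becomes part of the URL that clients call later.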

Serving the model

Step 4: Now that your model is being served, you can use the tail command to check the server.log file. If you find a “Status: success: OK” line in your log, your model has been loaded and served successfully.
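
If you prefer to do this check from Python rather than with tail, a rough equivalent could look like this. The helper names tail and model_loaded are hypothetical, and the check is a plain substring match on the marker text mentioned above.

```python
from collections import deque


def tail(path, n=20):
    """Return the last n lines of a text file, like the shell's tail -n."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return list(deque(handle, maxlen=n))


def model_loaded(log_path, marker="Status: success: OK", n=20):
    """Return True if the marker appears in the last n lines of the log."""
    return any(marker in line for line in tail(log_path, n))


# Example: model_loaded("server.log")
```

The server needs a moment to load the model, so it is worth retrying the check after a few seconds before concluding that something went wrong.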

Check for the Status: success: OK log

Running inference on the model using an HTTP request

Once your model has been served, it will be accessible through the port that you have defined. Your model can be served anywhere: a cloud service like GCE, AWS Lambda, or AWS EC2, Google Colab, your local computer, etc. Note that when you deploy your model into production, it doesn’t make much sense to serve it on Colab or your local computer. For the sake of simplicity, I will be deploying the model on Google Colab.

You can run inference on the served model using HTTP requests. Once an HTTP request has been made, the server will pass that data to the model, get the prediction, and send it back to the client. The input to the API endpoint is a JSON object with the preprocessed image embedded in it. You can use the requests library to make HTTP requests to the API.

Before calling the API, the input image must be preprocessed.
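
The exact preprocessing depends on how the model was trained, which this blog does not cover. As a dependency-free sketch, assume the model expects a grayscale image scaled to the range 0 to 1 with a trailing channel axis, and that the image has already been resized to the model’s input size. The function name preprocess_image is hypothetical, and in practice you would likely do this with NumPy or OpenCV.

```python
def preprocess_image(pixels, max_value=255.0):
    """Scale a 2-D grid of grayscale intensities to [0, 1] and add a channel axis.

    pixels is a list of rows, each row a list of numbers in [0, max_value].
    The result has shape (height, width, 1) as nested lists, ready for JSON.
    """
    if not pixels or not pixels[0]:
        raise ValueError("image must have at least one row and one column")
    width = len(pixels[0])
    if any(len(row) != width for row in pixels):
        raise ValueError("all rows must have the same length")
    return [[[value / max_value] for value in row] for row in pixels]
```

Whatever steps you use, they must match the preprocessing applied during training, otherwise the served model will receive inputs it has never seen.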

Image Preprocessing

Once the image has been preprocessed, embed it in a JSON object. This JSON object will be the data sent to the API endpoint. It has two items, “signature_name” and “instances”. The “signature_name” item selects which signature of the saved model to call, and the “instances” item contains the image data as a list.
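
A minimal sketch of building that JSON object with the standard json module follows. The helper name build_predict_payload is hypothetical; "serving_default" is the signature name a saved Keras model normally exposes.

```python
import json


def build_predict_payload(images, signature_name="serving_default"):
    """Serialize a batch of preprocessed images into a predict request body."""
    return json.dumps({"signature_name": signature_name, "instances": list(images)})
```

Note that “instances” is a list of inputs, so a single image still has to be wrapped in a list, for example build_predict_payload([image]), to form a batch of one.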

Specify a header for the API request that tells the endpoint that JSON data is being passed as input. Send a POST request to the API endpoint with the URL, JSON data, and headers as the parameters. The API will return the predictions for the image.
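
Putting the request together could look like the sketch below. The URL follows TensorFlow Serving’s REST pattern /v1/models/<model_name>:predict; the host, port 8501, the timeout, and the helper names are assumptions, and requests is a third-party package that must be installed separately.

```python
def predict_url(model_name, host="localhost", port=8501, version=None):
    """Build the REST predict URL, optionally pinned to one model version."""
    version_part = f"/versions/{version}" if version is not None else ""
    return f"http://{host}:{port}/v1/models/{model_name}{version_part}:predict"


def request_predictions(payload, model_name, host="localhost", port=8501, timeout=10.0):
    """POST a JSON payload to the served model and return its predictions."""
    import requests  # third-party; imported lazily so predict_url works without it

    response = requests.post(
        predict_url(model_name, host, port),
        data=payload,
        headers={"content-type": "application/json"},
        timeout=timeout,
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing them
    return response.json()["predictions"]
```

The returned list has one entry per item in “instances”. For a classifier, each entry is typically the vector of class scores for that image.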

Prediction from the served model

Here is the output sample from the model.



TensorFlow Serving makes it much easier to deploy deep learning models into production. You can avoid preprocessing the input image on the client by adding a Lambda layer to your model that preprocesses the image. This makes calling the API easier.
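
As a rough illustration of that idea, the sketch below wraps an already trained model so that scaling happens inside the served graph. The input shape (48, 48, 1), the division by 255, and the function name add_input_scaling are assumptions; use whatever preprocessing your own model was trained with.

```python
def add_input_scaling(base_model, input_shape=(48, 48, 1)):
    """Wrap a trained model so that scaling to [0, 1] happens inside the graph."""
    import tensorflow as tf  # imported lazily; requires TensorFlow 2.x

    inputs = tf.keras.Input(shape=input_shape)
    # The Lambda layer applies the preprocessing that clients would otherwise do.
    scaled = tf.keras.layers.Lambda(lambda x: x / 255.0)(inputs)
    outputs = base_model(scaled)
    return tf.keras.Model(inputs=inputs, outputs=outputs)
```

Save and serve the wrapped model instead of the original one, and clients can send raw pixel values. For simple scaling, tf.keras.layers.Rescaling is an alternative to a Lambda layer.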