TensorFlow 2.0 — From Preprocessing to Serving (Part 4)

Original article was published on Deep Learning on Medium

Welcome to the fourth and last part of a tutorial on TensorFlow and its Keras API. We’ve been discussing everything deep learning: how to preprocess input data, how to model a neural network that encodes that data and produces an output, how to optimize training, and how to serve the model as a REST API.

As you can see from the title, we’ve already had three stories on this topic; in this one we’ll run through all those steps quickly and use the resulting model to serve requests.

Before you start reading this article and its predecessors, you should be aware of the basics of the following subjects so that you don’t feel flustered while reading them:

  1. Calculus
  2. Linear Algebra
  3. Neural Networks
  4. NumPy, Pandas

And as you might have inferred from the topic, this is a programming article, so some prior experience with Python will help.

As this article involves serving models, it’s best to do this on Google Colab. Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

The dataset we’ll be using here is Fashion-MNIST, a dataset of Zalando’s article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image associated with a label from 10 classes. Its creators intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms; it shares the same image size and the same structure of training and testing splits.

Sample from the Fashion-MNIST dataset (each class takes three rows)

You might have prior experience with MNIST’s handwritten-digits dataset, but we won’t use that here, for two simple reasons:

  1. Network design for MNIST is too easy: you can reach about 98% accuracy even with a simple dense network.
  2. It’s overused; we want to learn something new, not regurgitate old material.

But first we should get to know what an API is: an application programming interface (API) is a set of routines, protocols, and tools for building software applications. An API specifies how software components should interact.

We’ll look at a small example:
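To make the idea concrete, here is a minimal, hypothetical REST service built with only Python’s standard library: it answers GET /hello with a JSON message. The names and port are ours; the point is just the request/response shape.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Answers GET /hello with a small JSON payload."""

    def do_GET(self):
        if self.path == "/hello":
            body = json.dumps({"message": "Hello, world!"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

def run(port=8000):
    # Blocks and serves requests until interrupted.
    HTTPServer(("localhost", port), HelloHandler).serve_forever()
```

Call run() in one cell and `curl http://localhost:8000/hello` returns the JSON greeting.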

Our TensorFlow model server is going to do the same, except that it returns the predictions from an ML model. Now let us load and normalize the data just like in the first part of the tutorial:
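A sketch of the loading step, assuming the Keras built-in copy of Fashion-MNIST as in part 1:

```python
import tensorflow as tf

# Fashion-MNIST ships with Keras: 60,000 training and 10,000 test images.
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.fashion_mnist.load_data()

# Pixels arrive as uint8 in [0, 255]; scale them to floats in [0, 1].
train_images = train_images / 255.0
test_images = test_images / 255.0
```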

We’ll then expand dimensions, reshape and get the number of unique classes. Now we go ahead and build out our convolutional model, just like in our second tutorial.

Now we go ahead and train:
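Here is a condensed, self-contained sketch of those steps. The architecture and hyperparameters are stand-ins for the ones built up in parts 1–3, and the loading lines repeat the previous snippet so this cell runs on its own:

```python
import numpy as np
import tensorflow as tf

(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.fashion_mnist.load_data()

# Normalize, then add the trailing channel axis Conv2D expects: (N, 28, 28, 1).
train_images = np.expand_dims(train_images / 255.0, -1)
test_images = np.expand_dims(test_images / 255.0, -1)
num_classes = len(np.unique(train_labels))  # 10

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_images, train_labels, epochs=1,  # increase epochs for accuracy
          validation_data=(test_images, test_labels))
```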

Now that we’ve trained the model, we need to save it to a temporary directory so that it’s ready to be served; if there’s already a saved model there, we remove it from the system first. A SavedModel is a directory containing serialized signatures and the state needed to run them, including variable values and vocabularies.
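A sketch of the export step; the `export_model` helper and the base directory are our hypothetical names, but the call itself is the standard Keras SavedModel export:

```python
import os
import shutil
import tensorflow as tf

def export_model(model, base_dir="/tmp/fashion_model", version=1):
    """Write `model` as a SavedModel under base_dir/<version>.

    If that version directory already exists, remove it first so the
    export starts clean.
    """
    export_path = os.path.join(base_dir, str(version))
    if os.path.isdir(export_path):
        shutil.rmtree(export_path)
    tf.keras.models.save_model(model, export_path)
    return export_path

# e.g. export_path = export_model(model)  # -> /tmp/fashion_model/1
```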

Note the use of versions.

The saved_model.pb file stores the actual TensorFlow program, or model, and a set of named signatures, each identifying a function that accepts tensor inputs and produces tensor outputs.

SavedModels may contain multiple variants of the model (multiple v1.MetaGraphDefs, identified with the --tag_set flag to saved_model_cli).

The saved_model_cli tool reports more class and signature information in its output; it’s well worth looking at the full output of this command.
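For instance, assuming the export path used above, this dumps every MetaGraph and signature in the SavedModel:

```shell
!saved_model_cli show --dir /tmp/fashion_model/1 --all
```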

Now go ahead and install the tensorflow-model-server package from your Colab notebook using the “!” (bang) prefix for shell commands. (Instructions to set up the package locally are also included in the full code.)
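A sketch of the setup, following the official TensorFlow Serving APT instructions (run locally, drop the bangs and prefix the commands with sudo):

```shell
# Register Google's TensorFlow Serving package repository and its signing key.
!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list
!curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -

# Install the server itself.
!apt-get update && apt-get install -y tensorflow-model-server
```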

Now we’ll go ahead and start serving the model as a service.

This is where we start running TensorFlow Serving and load our model. After it loads we can start making inference requests using REST. There are some important parameters:

  • rest_api_port: The port that we’ll use for REST requests.
  • model_name: We’ll use this in the URL of REST requests. It can be anything.
  • model_base_path: This is the path to the directory where we’ve saved our model.

Running the server under nohup keeps it alive in the background, uninterrupted. Let’s check the logs to confirm everything started properly:
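A sketch of the launch, using the model name and base path assumed in the export step above:

```shell
# Launch TensorFlow Serving in the background, kept alive with nohup.
!nohup tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=fashion_model \
  --model_base_path=/tmp/fashion_model > server.log 2>&1 &

# Check the log to confirm the model version loaded without errors.
!tail server.log
```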

If no errors are logged, it means that our model has started serving and we can send requests to it to get predictions.

Now, let’s look at a random example from our dataset:
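For example (class_names is the standard Fashion-MNIST label list; the data is reloaded here so the cell stands alone):

```python
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

_, (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()

# Pick a random test image and show it with its true label.
idx = np.random.randint(len(test_images))
plt.imshow(test_images[idx], cmap="gray")
plt.title(class_names[test_labels[idx]])
plt.show()
```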

Now we’ll create a request, passed in JSON format. It should contain the inputs, formatted exactly like the data we passed for training, because our model is trained to recognize only preprocessed data.
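A sketch of building that body; "serving_default" is the signature Keras exports by default, and the helper name is ours:

```python
import json

def make_request_body(images):
    """JSON body in the layout TensorFlow Serving's REST API expects:
    a signature name plus a list of input instances."""
    return json.dumps({"signature_name": "serving_default",
                       "instances": images.tolist()})

# e.g. body = make_request_body(test_images[0:3])  # three test instances
```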

As you can clearly see, we’re passing in 3 instances to get inferences for.

Now we package this in a request and send it to our model for predictions, and we get the predictions back in the same JSON format.
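A sketch of the round trip, assuming the server from the previous step is up on port 8501 and the model is named fashion_model; the `requests` library performs the POST:

```python
import json
import requests

def predict_url(model_name="fashion_model", version=1):
    # URL layout of TensorFlow Serving's REST predict endpoint.
    return (f"http://localhost:8501/v1/models/{model_name}"
            f"/versions/{version}:predict")

def get_predictions(body):
    """POST the JSON body and return the list of softmax vectors."""
    response = requests.post(predict_url(), data=body,
                             headers={"content-type": "application/json"})
    response.raise_for_status()
    return json.loads(response.text)["predictions"]
```

With the server running, `predictions = get_predictions(body)` returns one softmax vector per instance.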

The predictions are softmax outputs: for each instance, a probability for every class. So to find the class each instance belongs to, we take the argmax of each prediction and map the resulting index back to a string label.
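A sketch of that mapping (class_names is the standard Fashion-MNIST label order):

```python
import numpy as np

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def decode(predictions):
    """Turn softmax vectors into string labels by taking the argmax."""
    return [class_names[int(np.argmax(p))] for p in predictions]
```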

Now let’s see how they match up with the actual labels:

Pretty good results!

Now we look into versioning our models; let’s see how to create and select models by version:

Pay extra attention to the last couple of parts of the URL we’re posting to, specifically versions/1:predict. By default, if there’s only one version of a model, it’s v1, and it’s the only one being served.

We’ll now go ahead and make another version of our model, compile and fit it on our dataset again.

Now save the second version of our model appropriately:
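A sketch, reusing the export pattern from before; model_2 stands for the retrained model, and the server watching /tmp/fashion_model will discover the new directory on its own:

```python
import os
import tensorflow as tf

def export_version(model, version, base_dir="/tmp/fashion_model"):
    """Save a model under base_dir/<version>; the running server picks up
    new version directories without a restart."""
    export_path = os.path.join(base_dir, str(version))
    tf.keras.models.save_model(model, export_path)
    return export_path

# e.g. export_version(model_2, version=2)  # model_2: the retrained model
```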

The coolest thing is, you don’t need to restart the server for it to recognize the second version; it has already been saved as an asset and is ready to be served. We again make a POST request, but this time to the second version of the model.
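The request itself is unchanged; only the version segment of the URL differs (model name and port as assumed earlier):

```python
import json
import requests

URL_V2 = "http://localhost:8501/v1/models/fashion_model/versions/2:predict"

def predict_v2(body):
    """POST to version 2 of the served model."""
    response = requests.post(URL_V2, data=body,
                             headers={"content-type": "application/json"})
    return json.loads(response.text)["predictions"]
```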

See? The URL has now changed to 2:predict. Versioning is easy with TensorFlow Serving.

But a valid question to ask at this point is, “what if we use a version number that does not exist?” Let’s test it out!
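A sketch of the experiment: point the same request at a version that was never exported (say, 3) and inspect the response instead of assuming it succeeds:

```python
import requests

MISSING_URL = "http://localhost:8501/v1/models/fashion_model/versions/3:predict"

def probe_missing_version(body):
    """Return the raw status code and body; with no version 3 on disk the
    server answers with an error rather than predictions."""
    response = requests.post(MISSING_URL, data=body,
                             headers={"content-type": "application/json"})
    return response.status_code, response.text
```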

We encounter an error, just as expected.

And with that, we’re done: from preprocessing to serving.

To see the code in full, please check this out:

Stay tuned for my next series.

Till then, Cheerio!