Tensorflow Lite model inferencing fast and lean!!

Source: Deep Learning on Medium

Tensorflow Lite model inferencing fast and lean!!

This article is intended to talk more about how TFLite achieves inference over all the different types of edge devices in a fast and lean way.

We have a different set of edge devices such as IoT devices, mobile devices, embedded devices, etc. How TFLite is taking inference seamlessly and elegant way. To understand this let us jump into it.

What is an interpreter?

As we know TFLite consists of a set of tools and the TFLite consist of two core components:

  1. Converter
  2. Interpreter

The converter will help us to convert deep learning models into the TFLite format and the interpreter makes our life easier while inferencing.

The TensorFlow Lite interpreter, which runs specially optimized models on many different hardware types, including mobile phones, embedded Linux devices, and microcontrollers

TFLite interpreter people refer to interchangeably as inferencing. The term inference refers to the process of executing a TensorFlow Lite model on edge devices in order to make predictions based on user input. To perform inference with a tensorflow lite model, you must run it through interpreter.

TFLite interpreter is designed to be lean and fast to achieve this it uses a static graph ordering and a custom memory allocator to ensure minimal load, initialization, and execution latency.

Step of inferencing:

TensorFlow inference APIs are provided for most common mobile/embedded platforms such as Android, iOS, & Linux, in multiple programming languages. Across all libraries, the TensorFlow Lite API enables you to load models, feed inputs, and retrieve inference outputs.

TFLite interpreter follows below steps in general:

  1. Loading a model:-

The first and more must step is to load the .tflite model into the memory, which contains the execution graph.

2. Transforming data:-

The model doesn’t understand the raw input data. To make raw compatible into a model understandable format you need to transform the data. For e.g for the computer vision model, you need to resize the input image and then provide that image to model.

3. Running inference:-

Now the model is in memory and data is in the required format let us take the inference. It involves a few steps such as building the interpreter and allocating tensors.

4. Interpreting output:-

After the third step, we will get some output after inference but the end-user won’t understand that. Model results most of the time are probabilities or approximate value. We have interpreted this result into meaningful output.