Source: Deep Learning on Medium
Deploying AI at the Edge with Intel OpenVINO - Part 1
In my last blog, I briefly talked about AI at the edge applications and introduced the Intel OpenVINO toolkit. In this post, I will talk about obtaining a pre-trained model from OpenVINO’s model zoo and how to leverage it in your app. The topics covered in this post are:
- Different Computer Vision model types
- Available Pre-Trained Models in the Software
- Downloading a pre-trained model with the model downloader
- Deploying a Basic App with the Pre-Trained Model
Different Computer Vision Models
There are several types of computer vision models that serve different purposes. Classification, object detection and segmentation are the three major types.
A classification model simply assigns an image to a category. It might be a binary classification like Yes or No, or it might cover thousands of classes like dog, cat, car, plane, ship etc.
An object detection model detects the class of an object in the image and also where in the image that object is present. It returns a rectangular bounding box surrounding the object. For example, when you take a picture with your smartphone, you see a rectangular box surrounding your subject’s face and the camera focuses there. That is an application of a face (object) detection model.
A segmentation model predicts the class that every single pixel belongs to. Instead of just giving a rectangular box around the object, like object detection models do, it masks the object pixel by pixel. There are two subtypes of segmentation models: Semantic Segmentation and Instance Segmentation. Semantic Segmentation treats all the objects of the same class in an image as one single object. Instance Segmentation treats different objects of the same class in the same image as different objects. For example, two cats in the same image will share one mask in Semantic Segmentation, but in Instance Segmentation they will be treated as two different objects.
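To make the difference between the two segmentation subtypes concrete, here is a toy sketch. The tiny masks and the helper name count_labels are made up for illustration and do not come from any real model's output.

```python
# Toy 4x6 "image" containing two separate cats, one on the left and one on the right.
# 0 means background in both masks.

# Semantic segmentation: every cat pixel gets the same class id (1 = cat).
semantic_mask = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

# Instance segmentation: each cat gets its own instance id (1 and 2).
instance_mask = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 2, 2],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]


def count_labels(mask):
    """Count the distinct non-background labels in a mask (hypothetical helper)."""
    return len({value for row in mask for value in row if value != 0})


print(count_labels(semantic_mask))  # 1: a single "cat" class
print(count_labels(instance_mask))  # 2: two separate cat objects
```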
There are several other types of models like text detection, pose detection, face mapping etc. There are also different model architectures used for computer vision, such as SSD, ResNet, YOLO, MobileNet, Faster R-CNN and so on. The documentation of each model also mentions which architecture that particular model uses.
Pre-trained Models in the OpenVINO Model Zoo
Pre-trained models are models that have already been trained to high accuracy. Training a deep learning model needs a large amount of data and a lot of processing power. You also need to tune your model parameters to find the choices that give the highest accuracy. This takes hours and hours of work. Yes, it is exciting to create your own model from scratch, train it and watch its accuracy grow. But if you want to build an app where the AI functionality is just one part, it might not be practical to spend all your time building and training the model. You want to focus more on your app’s development than on training the AI. Also, getting your model to its best accuracy is no easy task. Therefore, leveraging the pre-trained models available in the model zoo, which provide cutting-edge accuracy, is a great way to build a better app in less time and without the need for large amounts of data.
This link contains documentation on the available models in OpenVINO. You will see two categories: Public Model Set and Free Model Set. The public model set has models that are trained but not converted to the intermediate representation. So you need to convert them with OpenVINO’s model optimizer before using them in the inference engine. Part 2 will contain details on how to use the model optimizer. You can also modify these models yourself by further training if you need to.
On the other hand, the free model set has models that have already been converted by the model optimizer, so you can use them directly in the inference engine. But you cannot tune these models any further as the original models are not available.
When you click on one of the models, you will see additional documentation of that particular model which contains information about the architecture, input, output, examples etc.
How to get the models
This link lists all the models with their full name and additional information. You can use OpenVINO’s model downloader to download your desired model to your machine. You can find the downloader in the path used in the commands below.
The downloader Python file needs some arguments. In your terminal (or command prompt on Windows), run the downloader with ‘-h’ to see all the arguments (use ‘/’ instead of ‘\’ on Linux or macOS). In my case, the command looks like this.
python "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\open_model_zoo\tools\downloader\downloader.py" -h
To download a specific model, use the --name <model_name> argument. To specify the location where you want to save the downloaded model, use the -o <download_location> argument. You can also specify the precision that you want using the --precisions argument.
For example, I want to download the INT8 precision of the face-detection-adas-0001 model and save it in the ‘model’ folder. Here is the command I will run in the command prompt:
python "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\open_model_zoo\tools\downloader\downloader.py" --name face-detection-adas-0001 -o model --precisions INT8
You will see the progress of the download in the terminal. After the download is finished, a new folder named ‘intel’ will be created inside the output folder that you specified. Inside the intel folder, you will find the downloaded model.
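If you would rather assemble the same command from a Python script, a minimal sketch could look like the one below. The helper name build_downloader_command is hypothetical, and passing several precisions as one comma-separated value is an assumption about the downloader. The --name, -o and --precisions flags are the ones described above. The sketch only builds the argument list and does not run the download.

```python
import sys


def build_downloader_command(downloader_path, model_name, output_dir, precisions=None):
    """Build the argument list for downloader.py without running it (hypothetical helper)."""
    command = [sys.executable, downloader_path, "--name", model_name, "-o", output_dir]
    if precisions:
        # Assumption: multiple precisions are passed as one comma-separated value.
        command += ["--precisions", ",".join(precisions)]
    return command


command = build_downloader_command(
    r"C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\open_model_zoo\tools\downloader\downloader.py",
    "face-detection-adas-0001",
    "model",
    precisions=["INT8"],
)
print(command[2:])  # the arguments that follow the interpreter and script path
```

To actually start the download, the list can be handed to subprocess.run(command, check=True) from the standard library.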
Pre-processing the input
Different models expect different inputs, so you need to check the model’s documentation carefully to know what input your model expects and pre-process the input image accordingly. For example, the model I downloaded in the previous section expects an input of shape [1x3x384x672]. Here the first number, 1, is the batch size, 3 is the number of color channels, and 384 and 672 are the height and the width of the image respectively. Also note that the color channel order is BGR (Blue Green Red). Some models might use RGB (Red Green Blue). So you need to be careful about your model’s expectations and how you feed your input.
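As a small illustration of the channel order point, the sketch below uses a made-up two-pixel NumPy array as a stand-in for an image in OpenCV's layout. Our face detection model expects BGR, so it would not need this conversion. The reversal is only relevant for a model that expects RGB.

```python
import numpy as np

# A 1x2 dummy image in height x width x channels layout, with channels in BGR order.
bgr_image = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)  # a blue pixel, then a red pixel

# Reversing the last axis swaps BGR and RGB.
rgb_image = bgr_image[:, :, ::-1]

# Unpack the documented input shape [1x3x384x672] into named parts.
batch, channels, height, width = (1, 3, 384, 672)

print(rgb_image[0, 0].tolist())  # the blue pixel, now written in RGB order
print(height, width)
```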
We will use OpenCV for reading and processing the images. OpenVINO installs the OpenCV library, so we don’t need to install it separately even if we didn’t have it before.
We will create two Python files in our project directory, one named app.py and the other inference.py. We will keep all the code related to OpenVINO in the inference.py file, and all the code related to our app will be in app.py.
1. First, we need to import OpenCV in both our app.py file and our inference.py file. So write the following line in both files.
import cv2
2. Then we will define a function named “preprocessing” in our inference.py file. It will take three arguments: the input image, the desired height and the desired width. So define the function as below.
def preprocessing(input_image, height, width):
    # Code here
3. To resize the image, the function will use the “resize” function from OpenCV. Note that the resize function takes the width first, then the height.
    image = cv2.resize(input_image, (width, height))
4. Also notice that the model we are working with expects the color channels first and then the dimensions of the image. But OpenCV reads an image with the color channels as the last axis. So, we will use the transpose method to bring the color channels first.
    # Fix color channel
    image = image.transpose((2, 0, 1))
5. Then we need to reshape the image to add the batch size, which is 1.
    # Adding batch size
    image = image.reshape(1, 3, height, width)
6. Finally, the function will return the image.
    # Returning the processed image
    return image
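The transpose and reshape steps can be checked on a dummy array without loading a real picture. This is only a sketch: the helper name to_model_layout is hypothetical, and a zero-filled NumPy array stands in for an image that OpenCV has already resized to 672x384.

```python
import numpy as np


def to_model_layout(image, height, width):
    """Rearrange an already-resized HxWxC image into 1xCxHxW (hypothetical helper)."""
    image = image.transpose((2, 0, 1))          # HxWxC -> CxHxW
    image = image.reshape(1, 3, height, width)  # add the batch axis
    return image


# Dummy stand-in for a resized OpenCV image: 384 rows, 672 columns, 3 channels.
height, width = 384, 672
dummy = np.zeros((height, width, 3), dtype=np.uint8)
dummy[10, 20] = (1, 2, 3)  # mark one pixel with distinct B, G, R values

blob = to_model_layout(dummy, height, width)
print(blob.shape)
print(blob[0, :, 10, 20].tolist())  # the marked pixel, now addressed as [batch, channel, row, column]
```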
7. Now, import this function in our app file.
from inference import preprocessing
8. We will load the image in our app using the “imread” function from OpenCV and prepare it for our model using the “preprocessing” function. Define a function named “main” in our app file. Inside this function, we will first load our image and do the processing.
def main():
    image_path = "path/to/your/image.jpg"  # placeholder: replace with your own image path
    image = cv2.imread(image_path)
    width = 672
    height = 384
    preprocessed_image = preprocessing(image, height, width)
    ### Code for inference result here (from part 3) ###
For now, we have hard coded the width and height. Later, when we discuss the inference engine in part 3, we will use a function to get the input shape the model expects instead of hard coding it. Our input pre-processing is now complete.
Applying Inference with the Model
The second step of our workflow is to apply inference using the inference engine. The inference process needs a separate post, so I am not going into the details of how to apply the inference here. It will be discussed in part 3, so check that out.
After doing the inference, you will get an output from your model which needs some processing, just like the input. So, let’s define another function to handle the output, assuming we have got the result from the inference. (If you want, you can check the GitHub repository where I have uploaded the complete file. It might be helpful for your experiments.)
Processing the Output
Just as different models have different input requirements, each model produces a different output. Some models might give one output, others might give more than one. So you need to extract the output properly according to your need. For example, the “vehicle-attributes-recognition-barrier-0039” model outputs two blobs, one for the vehicle type and another for the color.
How you process the output depends on what you want your app to do. In our example app, we simply want to draw a bounding box around the face or faces present in a given image. The model we are working with in this post has one output blob of shape [1, 1, N, 7], where N is the number of detected boxes. Here are the steps I followed to process the output and get the bounding boxes in my image.
(We are assuming we have stored the result from the inference engine in the “result” variable.)
1. The output from the model is a “dictionary” object in Python, so the output blob is stored under a key. The documentation doesn’t mention what the key is. So first you need to print the keys to know what it is.
    # Continue inside the "main" function, after getting inference result
    print(result.keys())
For our face detection model, running the file shows that the key is ‘detection_out’. Now you can delete the print statement from your code, or comment it out, because you don’t need it anymore. We will extract the ndarray of our output from the dictionary.
    # Extracting the ndarray from the dictionary object
    result = result['detection_out']
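Until the inference engine is wired in, the key lookup can be tried on a stand-in dictionary. In this sketch, nested lists replace the real ndarray, and the seven numbers are invented. Only the key name and the [1, 1, N, 7] shape come from this post.

```python
# Stand-in for the inference result: a dict holding one output blob.
# Nested lists mimic the [1, 1, N, 7] shape with N = 1; the values are made up.
result = {"detection_out": [[[[0, 1, 0.98, 0.1, 0.2, 0.3, 0.4]]]]}

# Inspect the available keys once, as described above.
print(list(result.keys()))

# When a model has a single output, the key can also be picked up programmatically.
output_key = next(iter(result))
detections = result[output_key]
print(len(detections[0][0]))  # number of boxes in the stand-in blob
```

Picking the key with next(iter(result)) is a design choice that only makes sense for single-output models. With several outputs, naming the key explicitly is safer.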
2. Now we will define a new function for processing the output. As the output processing varies according to the app’s need, I am going to write this function inside the app.py file. So, define a new function named “output_processing” (outside of our main function, of course) which will take three arguments: the result, the original image and a threshold, which defaults to 0.5. We will explain what the threshold is later.
def output_processing(result, image, threshold=0.5):
    # Code here
3. Our app is going to draw a bounding box around the faces. We will first set the color of the bounding box. You could also make this an argument of the function, but for now I will keep it fixed. Then, we will store the height and width of the original image in corresponding variables for later use. So, inside the function, add these three lines.
    # Setting bounding box color, image height and width
    color = (0, 0, 255)
    height = image.shape[0]
    width = image.shape[1]
4. Remember, the output shape is 1x1xNx7, where N is the number of boxes. So, we will first slice the output through the first two axes to get to the boxes. The length of the sliced output will be the number of boxes the net detected. That means “result[i]” will give the i-th box. Each box has 7 numbers. The 3rd (index 2 in Python’s zero-based indexing) is the confidence. The 4th, 5th, 6th and 7th (indices 3, 4, 5 and 6) are the xmin, ymin, xmax and ymax values of the bounding box respectively. The positions of these two diagonal corner points are enough for us to draw the rectangle.
Now, be careful here. Not all the predicted boxes are true. Even humans sometimes make the mistake of finding a face in a random pattern; for example, we see faces in the clouds. And computer vision is still in its primitive stage. I used a picture of mine, so I expected only one bounding box. Guess what, the output contained 200 bounding boxes. But don’t worry. The model identifies the location of the expected box with high confidence, while the confidence values of the rest of the boxes are very low. So, we need to set a confidence threshold. Only the bounding boxes above this threshold will be accepted. I set the threshold to 0.5.
We will iterate over the detected boxes, find the ones whose confidence is above the threshold, and draw those boxes using the “rectangle” function from OpenCV. Because the box coordinates are given relative to the image size, we multiply them by the width and height of the original image.
    # Drawing the box/boxes
    result = result[0][0]  # slice through the first two axes to get the N boxes
    for i in range(len(result)):
        box = result[i]  # i-th box
        confidence = box[2]
        if confidence > threshold:
            xmin = int(box[3] * width)
            ymin = int(box[4] * height)
            xmax = int(box[5] * width)
            ymax = int(box[6] * height)
            # Drawing the box in the image
            cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color, 1)
5. Then, the function will return the image.
    # Returning the image with the drawn boxes
    return image
6. Now we go back inside the “main” function in our app file to use our newly defined “output_processing” function.
    # inside the main function
    output = output_processing(result, image)
    cv2.imwrite("output.png", output)
The “imwrite” function from OpenCV saves the output image to the hard disk with our defined name “output.png”. You can change it to whatever you want.
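The confidence filtering and coordinate scaling described in this section can be tried before the inference engine is in place. The sketch below leaves OpenCV out and returns the pixel coordinates instead of drawing them. The helper name extract_boxes and the synthetic detections are assumptions made for illustration.

```python
def extract_boxes(result, width, height, threshold=0.5):
    """Return pixel-space (xmin, ymin, xmax, ymax) tuples for boxes above the threshold."""
    boxes = []
    for box in result[0][0]:  # slice away the first two axes to reach the N boxes
        confidence = box[2]
        if confidence > threshold:
            xmin = int(box[3] * width)
            ymin = int(box[4] * height)
            xmax = int(box[5] * width)
            ymax = int(box[6] * height)
            boxes.append((xmin, ymin, xmax, ymax))
    return boxes


# Synthetic output with N = 2: one confident detection and one weak one.
# The first two numbers of each box are not used in this post.
fake_result = [[[
    [0, 1, 0.99, 0.25, 0.5, 0.75, 1.0],
    [0, 1, 0.02, 0.0, 0.0, 0.5, 0.5],
]]]

print(extract_boxes(fake_result, width=672, height=384))
```

Only the confident box survives the default threshold of 0.5. Lowering the threshold lets the weak box through as well, which is why a picture with one face can otherwise end up with many boxes.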
Running the file
At the bottom of the app.py file, add these two lines of code and save the file.
if __name__ == "__main__":
    main()
Open the command prompt and run “setupvars” first. Without running this file first, OpenVINO won’t work. Also activate the virtual environment to work in an isolated sandbox. To do so, change your directory in the command prompt to where you created the virtual environment (as discussed in the first part) and then run the commands.
openvinoenv\Scripts\activate
C:\"Program Files (x86)"\IntelSWTools\openvino\bin\setupvars.bat
Change the directory (using the “cd” command) to where you have saved the app.py file, then run the file with the following command.
python app.py
Note that you haven’t implemented the inference engine yet, so the file can’t run as it is because there is no “result” in it. But if you comment out the parts that need the inference, you can play with the pre-processing step for now by running the file. Hopefully, you will be able to run the whole app after part 3. Meanwhile, you can also download my code from my GitHub repository if you need the inference result.
This was my output.
We used a model which was already optimized for the OpenVINO toolkit, so we didn’t need to do any preprocessing of the model. But if you want to use your own model that you might have created using TensorFlow, PyTorch or some other deep learning library, you need to use the ‘model optimizer’ to convert the model into the intermediate representation. I will talk about the model optimizer in Part 2, and then we will talk about how to use the inference engine in Part 3. See you there.