Facial Keypoint Detection with OpenVINO

Source: Deep Learning on Medium

In facial keypoint detection, we try to predict where the key facial features are located. There are 35 points we are trying to predict, and details of what each point represents can be found here. To perform this task, we will use OpenVINO.

OpenVINO is a computer vision toolkit developed by Intel that focuses on deploying models at the edge. This means model inference is performed in a decentralized manner on the end device itself, rather than on a server.

OpenVINO provides pretrained models for various computer vision tasks in its Model Zoo, which can be used right away. The pretrained models in the Model Zoo have been optimized for inference on Intel hardware.

If our model doesn't exist in the Model Zoo, we can still use it by first converting it to its Intermediate Representation (IR) with the Model Optimizer. The Model Optimizer can convert models from various frameworks, such as TensorFlow, Caffe, and ONNX (PyTorch models are typically exported to ONNX first).
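As a rough illustration of that conversion step, the sketch below only assembles a Model Optimizer command line. The helper `build_mo_command`, the location of `mo.py`, and the file names are hypothetical and depend on your installation; the flags `--input_model`, `--data_type`, and `--output_dir` are standard Model Optimizer options.

```python
import shlex
import sys


def build_mo_command(mo_script, input_model, output_dir, data_type="FP16"):
    """Assemble a Model Optimizer command (hypothetical helper).

    mo_script:   path to mo.py in your OpenVINO installation (assumed location)
    input_model: e.g. a frozen TensorFlow .pb or an ONNX file exported from PyTorch
    """
    return [
        sys.executable, mo_script,       # run mo.py with the current Python interpreter
        "--input_model", input_model,
        "--data_type", data_type,        # FP16 weights, matching the precision used later
        "--output_dir", output_dir,      # where the .xml/.bin pair is written
    ]


cmd = build_mo_command("mo.py", "my_model.onnx", "ir_out")
print(shlex.join(cmd))
# To actually run the conversion: subprocess.run(cmd, check=True)
```

Building the command as a list (rather than one string) avoids shell-quoting problems with paths that contain spaces.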

(Figure: OpenVINO overview)

Luckily, for facial keypoint detection there is a model in the Model Zoo that we can use right away: facial-landmarks-35-adas-0002. We will use its 16-bit floating-point (FP16) version and download it with the Model Downloader.

```bash
# Replace $openvino_dir with the model downloader's directory and $output_dir with the desired output path
sudo $openvino_dir/downloader.py --name facial-landmarks-35-adas-0002 -o $output_dir --precisions FP16
```

The downloader produces the model's IR as two files:
  • .xml: Describes the network topology
  • .bin: Contains the weights and biases binary data
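Because the two IR files share a base name, the `.bin` path can be derived from the `.xml` path. The helpers below (`ir_paths` and `missing_ir_files` are hypothetical names, not OpenVINO functions) sketch that and a quick existence check before loading.

```python
import os


def ir_paths(xml_path):
    """Return the (.xml, .bin) pair of an OpenVINO IR model.

    Assumes the .bin sits next to the .xml with the same base name,
    which is how the Model Downloader lays out its output.
    """
    base, ext = os.path.splitext(xml_path)
    if ext.lower() != ".xml":
        raise ValueError(f"expected an .xml file, got {xml_path!r}")
    return xml_path, base + ".bin"


def missing_ir_files(xml_path):
    """List the IR files that are not on disk (empty list if both exist)."""
    return [p for p in ir_paths(xml_path) if not os.path.isfile(p)]


xml, bin_ = ir_paths("FP16/facial-landmarks-35-adas-0002.xml")
print(bin_)  # FP16/facial-landmarks-35-adas-0002.bin
```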

I used the inference code from the Intel® Edge AI Scholarship Foundation Course Nanodegree Program.

To perform inference, we need the IENetwork and IECore classes from the OpenVINO Inference Engine.

```python
import os
import sys
import logging as log
from openvino.inference_engine import IENetwork, IECore


class Network:
    def __init__(self):
        self.plugin = None
        self.network = None
        self.input_blob = None
        self.output_blob = None
        self.exec_network = None
        self.infer_request = None
```

IENetwork: This class contains information about the network model read from the IR and allows you to manipulate some model parameters, such as layer affinity and output layers.

IECore: This class represents an Inference Engine entity and allows you to manipulate plugins using unified interfaces.

Full documentation can be found here.

Next, we need to initialize the Inference Engine and then load the IR model.

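```python
class Network:
    def load_model(self, model, device="CPU", cpu_extension=None):
        # The IR is a pair of files: <name>.xml (topology) and <name>.bin (weights)
        model_xml = model
        model_bin = os.path.splitext(model_xml)[0] + ".bin"
        # Initialize the plugin
        self.plugin = IECore()
        # Add a CPU extension, if applicable
        if cpu_extension and "CPU" in device:
            self.plugin.add_extension(cpu_extension, device)
        # Read the IR as an IENetwork
        self.network = IENetwork(model=model_xml, weights=model_bin)
        # Load the IENetwork into the plugin
        self.exec_network = self.plugin.load_network(self.network, device)
        # Get the names of the input and output layers
        self.input_blob = next(iter(self.network.inputs))
        self.output_blob = next(iter(self.network.outputs))

    def get_input_shape(self):
        # Shape of the input layer, in [N, C, H, W] order
        return self.network.inputs[self.input_blob].shape
```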

For the streaming input, we use asynchronous inference and extract the output once the request has finished.

```python
class Network:
    def async_inference(self, image):
        # Make an asynchronous inference request, given an input image
        self.infer_request = self.exec_network.start_async(
            request_id=0, inputs={self.input_blob: image})

    def wait(self):
        # Block until the request finishes (-1 = no timeout); returns the status code
        status = self.exec_network.requests[0].wait(-1)
        return status

    def extract_output(self):
        # Return the results of the network's output layer
        res = self.exec_network.requests[0].outputs[self.output_blob]
        return res
```

Let's set up inference on the video stream from a webcam. To read from the webcam, we pass 0 as the device index to OpenCV's cv2.VideoCapture.

```python
import cv2

# Path to IR model xml
ir_model = 'D:/facial_keypoint/models/intel/facial-landmarks-35-adas-0002/FP16/facial-landmarks-35-adas-0002.xml'


def capture_stream(args):
    # Network is the class defined above
    plugin = Network()
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    plugin.load_model(ir_model, 'CPU')
    net_input_shape = plugin.get_input_shape()
    # Argument 0 to use video stream from webcam
    cap = cv2.VideoCapture(0)
    # The input shape is [N, C, H, W]: index 3 is the width, index 2 the height
    model_width = int(net_input_shape[3])
    model_height = int(net_input_shape[2])
    # Drawing color in BGR; an assumed value, the original does not show where it comes from
    color = (0, 255, 0)
```
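The Inference Engine reports the input shape in NCHW order (batch, channels, height, width), while cv2.resize expects its target size as (width, height). A tiny helper (hypothetical, not part of OpenVINO) makes that index order explicit:

```python
def input_size_from_nchw(shape):
    """Return the (width, height) pair cv2.resize expects from an NCHW shape.

    NCHW is [batch, channels, height, width], so height is index 2
    and width is index 3.
    """
    if len(shape) != 4:
        raise ValueError(f"expected a 4-D NCHW shape, got {shape!r}")
    _, _, height, width = (int(d) for d in shape)
    return width, height


# A hypothetical non-square input makes the ordering visible
print(input_size_from_nchw([1, 3, 384, 672]))  # (672, 384)
```

For a square input, swapping the two indices makes no visible difference, which is why this kind of mistake can go unnoticed until a non-square model is used.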

Before feeding the video to the Inference Engine, we need to preprocess it. We use a Haar cascade classifier to detect the face region and then feed the cropped face to the pretrained model.

```python
def capture_stream(args):
    while cap.isOpened():
        flag, frame = cap.read()
        # Stop when no frame could be read
        if not flag:
            break
        # Detect faces with the Haar cascade (scaleFactor=1.2, minNeighbors=2)
        faces = face_cascade.detectMultiScale(frame, 1.2, 2)
        for (ix, iy, w, h) in faces:
            # Crop the detected face; copy it so the rectangle below is not drawn into the crop
            face_image = frame[iy:iy+h, ix:ix+w].copy()

            # Draw rectangle around face
            frame = cv2.rectangle(
                frame, (ix, iy), (ix+w, iy+h), color, int(args.lw))

            face_width = face_image.shape[1]
            face_height = face_image.shape[0]

            # Resize cropped face to match the IR input size (cv2.resize takes (width, height))
            face_image = cv2.resize(face_image, (model_width, model_height))
            # HWC -> CHW, then add the batch dimension: (1, C, H, W)
            face_image = face_image.transpose((2, 0, 1))
            face_image = face_image.reshape(1, *face_image.shape)
```
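To see what `transpose((2, 0, 1))` followed by `reshape(1, ...)` does to the pixel layout, here is a pure-Python equivalent on nested lists. It is an illustration only (the hypothetical `hwc_to_nchw` is far slower than NumPy) that moves the channel axis to the front and adds a batch axis.

```python
def hwc_to_nchw(image):
    """Convert an H x W x C nested list into a 1 x C x H x W nested list.

    Mirrors numpy's image.transpose((2, 0, 1)) followed by
    reshape(1, *shape), for illustration only.
    """
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    # out[0][c][y][x] == image[y][x][c]
    chw = [[[image[y][x][c] for x in range(width)] for y in range(height)]
           for c in range(channels)]
    return [chw]  # add the batch dimension


# A 2 x 2 image with 3 channels per pixel
img = [[[1, 2, 3], [4, 5, 6]],
       [[7, 8, 9], [10, 11, 12]]]
out = hwc_to_nchw(img)
print(out[0][0])  # [[1, 4], [7, 10]]  (first channel, i.e. blue in OpenCV's BGR order)
```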

After preprocessing the input stream, we can start performing inference and displaying the predicted keypoints.

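```python
def capture_stream(args):
    while cap.isOpened():
        # (read the frame and detect faces as shown above)
        for (ix, iy, w, h) in faces:
            # (crop, draw and resize face_image as shown above)
            # Start asynchronous inference on the preprocessed face
            plugin.async_inference(face_image)
            # wait() returns status 0 (OK) once the request has finished
            if plugin.wait() == 0:
                result = plugin.extract_output()
                # result[0] holds normalized (x, y) pairs; map them back onto the full frame
                for i in range(0, result.shape[1], 2):
                    x = ix + int(result[0][i] * face_width)
                    y = iy + int(result[0][i + 1] * face_height)
                    # Draw facial keypoints
                    cv2.circle(frame, (x, y), 1, color, int(args.kw))

        cv2.imshow("facial_keypoint_detection", frame)
        # Refresh the window; stop when Esc is pressed
        if cv2.waitKey(60) & 0xFF == 27:
            break
    cap.release()
    cv2.destroyAllWindows()
```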
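The loop above reads the network output as consecutive (x, y) pairs normalized to the face crop; for facial-landmarks-35-adas-0002 that is 70 values, i.e. 35 points. Pulled out into a standalone function (the name `landmarks_to_frame` is hypothetical), the coordinate mapping looks like this, with the face box given as the (ix, iy, w, h) tuple returned by the Haar cascade:

```python
def landmarks_to_frame(output_row, box):
    """Map normalized landmark outputs to full-frame pixel coordinates.

    output_row: flat sequence [x0, y0, x1, y1, ...] with values in [0, 1]
                relative to the face crop
    box:        (ix, iy, w, h) face box in frame coordinates
    """
    ix, iy, w, h = box
    if len(output_row) % 2:
        raise ValueError("expected an even number of values (x, y pairs)")
    points = []
    for i in range(0, len(output_row), 2):
        # Scale by the crop size, then shift by the crop's top-left corner
        x = ix + int(output_row[i] * w)
        y = iy + int(output_row[i + 1] * h)
        points.append((x, y))
    return points


print(landmarks_to_frame([0.5, 0.5, 0.0, 1.0], (100, 50, 80, 60)))  # [(140, 80), (100, 110)]
```

Scaling by the original crop size (not the resized model input) is what puts the points back in the right place on the frame.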

Here are some sample results.