The Concept of Head Pose Estimation (A Tutorial)

Source: Deep Learning on Medium


By Mikeeee

Before we can go deeper into how dlib and OpenCV help us accomplish the
head pose estimation task, we must first understand camera calibration.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — —

1. Camera Calibration

Figure. The concept of camera calibration

Purpose:

Use multiple points in world coordinates together with their corresponding points in image coordinates to solve for the camera's intrinsic parameter matrix and extrinsic parameter matrix.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Intrinsic parameters:

The camera's focal length along the x and y axes of the image plane (in pixel units), the position of the principal point where the optical axis (z axis) intersects the image plane, and the skew angle between the x and y axes of the image plane.

Extrinsic parameters:

With the camera held fixed, these consist of the translation and rotation of the object relative to the world coordinate origin.

Distortion coefficients:

These depend on the camera's lens, modeling how light rays bend as they pass through it.
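
To make this concrete, here is a minimal calibration sketch using OpenCV's chessboard routines; the 9x6 corner count and the ./calib_images folder are assumptions for illustration, not part of the original post:

import cv2
import numpy as np
import glob

pattern_size = (9, 6)  # inner corners per chessboard row and column (assumed)
# World coordinates of the corners: the Z = 0 plane, one unit per square
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in glob.glob('./calib_images/*.jpg'):  # assumed image folder
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Solves for the intrinsic matrix, the distortion coefficients, and the
# per-view extrinsics (rotation and translation vectors) in one call
ret, cam_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)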

— — — — — — — — — — — — — — — — — — — — — — — — 
2. Dlib

We first use dlib to extract the 68 facial feature points.
Two dlib functions give us these points:

dlib.get_frontal_face_detector()
dlib.shape_predictor()
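
A minimal sketch of how these two functions work together; the model path and the input image file are assumptions for illustration:

import cv2
import dlib
from imutils import face_utils

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('./shape_predictor_68_face_landmarks.dat')

img = cv2.imread('./face.jpg')  # assumed input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
rects = detector(gray, 1)  # detect face bounding boxes (1 upsampling pass)
for rect in rects:
    shape = predictor(gray, rect)          # locate the 68 landmarks
    shape = face_utils.shape_to_np(shape)  # convert to a (68, 2) numpy array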

— — — — — — — — — — — — — — — — — — — — — — — —

3. OpenCV

1) Solve for the extrinsic parameters:

cv2.solvePnP()

2) We need to define the world coordinates first; here we use a ready-made set of 3D reference coordinates (the object_pts array in the code below). The dlib 68-landmark model used to locate the corresponding image points can be downloaded from:

http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2

3) We also need to define the camera matrix and distortion coefficients ourselves.
4) Finally, we can compute the Euler angles and perform head pose estimation:

cv2.decomposeProjectionMatrix()

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

The implementation code is attached below:


import cv2
import dlib
import numpy as np
from imutils import face_utils

face_landmark_path = './shape_predictor_68_face_landmarks.dat'

## For camera calibration
# Camera parameters (intrinsic): focal lengths and principal point
K = [6.5308391993466671e+002, 0.0, 3.1950000000000000e+002,
     0.0, 6.5308391993466671e+002, 2.3950000000000000e+002,
     0.0, 0.0, 1.0]
# Distortion coefficients
D = [7.0834633684407095e-002, 6.9140193737175351e-002, 0.0, 0.0, -1.3073460323689292e+000]
cam_matrix = np.array(K).reshape(3, 3).astype(np.float32)
dist_coeffs = np.array(D).reshape(5, 1).astype(np.float32)

# 3D points from another reference model that correspond to the face landmarks
object_pts = np.float32([[6.825897, 6.760612, 4.402142],
                         [1.330353, 7.122144, 6.903745],
                         [-1.330353, 7.122144, 6.903745],
                         [-6.825897, 6.760612, 4.402142],
                         [5.311432, 5.485328, 3.987654],
                         [1.789930, 5.393625, 4.413414],
                         [-1.789930, 5.393625, 4.413414],
                         [-5.311432, 5.485328, 3.987654],
                         [2.005628, 1.409845, 6.165652],
                         [-2.005628, 1.409845, 6.165652],
                         [2.774015, -2.080775, 5.048531],
                         [-2.774015, -2.080775, 5.048531],
                         [0.000000, -3.116408, 6.097667],
                         [0.000000, -7.415691, 4.070434]])

# 3D corners and axes of a cube, for projecting a box onto the image plane
reprojectsrc = np.float32([[10.0, 10.0, 10.0],
                           [10.0, 10.0, -10.0],
                           [10.0, -10.0, -10.0],
                           [10.0, -10.0, 10.0],
                           [-10.0, 10.0, 10.0],
                           [-10.0, 10.0, -10.0],
                           [-10.0, -10.0, -10.0],
                           [-10.0, -10.0, 10.0],
                           [10.0, 0.0, 0.0],
                           [0.0, 10.0, 0.0],
                           [0.0, 0.0, 10.0],
                           [0.0, 0.0, 0.0]])

def get_head_pose(shape):
    # shape holds the 68 face points from the dlib shape_predictor;
    # pick the 14 landmarks that correspond to object_pts
    image_pts = np.float32([shape[17], shape[21], shape[22], shape[26], shape[36],
                            shape[39], shape[42], shape[45], shape[31], shape[35],
                            shape[48], shape[54], shape[57], shape[8]])
    # Solve for the extrinsic parameters (rotation and translation)
    _, rotation_vec, translation_vec = cv2.solvePnP(object_pts, image_pts, cam_matrix, dist_coeffs)
    # Project the 3D cube points onto the image plane, output image points
    reprojectdst, _ = cv2.projectPoints(reprojectsrc, rotation_vec, translation_vec,
                                        cam_matrix, dist_coeffs)
    reprojectdst = tuple(map(tuple, reprojectdst.reshape(12, 2)))
    # Calculate the Euler angles from the [R | t] pose matrix
    rotation_mat, _ = cv2.Rodrigues(rotation_vec)
    pose_mat = cv2.hconcat((rotation_mat, translation_vec))
    _, _, _, _, _, _, euler_angle = cv2.decomposeProjectionMatrix(pose_mat)
    return reprojectdst, euler_angle
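
A minimal sketch of how get_head_pose above can be driven from a webcam; the camera index and the on-screen text are assumptions for illustration:

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(face_landmark_path)

cap = cv2.VideoCapture(0)  # assumed webcam index
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    for rect in detector(frame, 0):
        shape = face_utils.shape_to_np(predictor(frame, rect))
        reprojectdst, euler_angle = get_head_pose(shape)
        # euler_angle rows hold pitch (X), yaw (Y), roll (Z) in degrees
        cv2.putText(frame, 'pitch: %.2f' % euler_angle[0, 0], (20, 20),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow('head pose', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()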

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Deep Learning Method

Here is a deep-learning-based approach I found. Personally I don't think its performance is very good, but it may still be worth a look if you are interested.

DeepGaze(CNN-based)
https://github.com/mpatacchiola/deepgaze

  • TensorFlow
  • Their pretrained model can directly output the Euler angles (a usage sketch follows below)
  • Its estimation performance is poor
Figure. Deep learning method pipeline
Figure. Training and validation results
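
For reference, a minimal usage sketch based on the example in the deepgaze README; the model file paths and the input image are assumptions and should be checked against the repository:

import cv2
import tensorflow as tf  # deepgaze targets TensorFlow 1.x
from deepgaze.head_pose_estimation import CnnHeadPoseEstimator

sess = tf.Session()
estimator = CnnHeadPoseEstimator(sess)
# Pretrained variables shipped with the repo (assumed locations)
estimator.load_pitch_variables('./etc/tensorflow/head_pose/pitch/cnn_cccdd_30k.tf')
estimator.load_yaw_variables('./etc/tensorflow/head_pose/yaw/cnn_cccdd_30k.tf')
estimator.load_roll_variables('./etc/tensorflow/head_pose/roll/cnn_cccdd_30k.tf')

image = cv2.imread('./face.jpg')  # a cropped, roughly square face image (assumed)
pitch = estimator.return_pitch(image)  # Euler angles in degrees
yaw = estimator.return_yaw(image)
roll = estimator.return_roll(image)
print(pitch, yaw, roll)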

If you liked this post, please give it some applause. Thanks!