Original article was published by Samuel Mohebban on Artificial Intelligence on Medium
Above, we see the ROC score, confusion matrix, and loss/accuracy plots for the MobileNet training. Considering the amount of data we used, these metrics are quite good. On the validation set, only 13 images were misclassified. As for the loss and accuracy, the loss dropped below 0.1 and the accuracy stayed well above 94%. Finally, the ROC results show great success: each class achieved a perfect score of 1.0, and the F1 score for every class was greater than 0.98.
As with the last model, we start by extracting the image values and placing them into a NumPy array. As mentioned earlier, we will reuse the get_image_value function within a new function designed to extract only the emotion images. The dataset contains 7 classes: angry, happy, neutral, sad, disgust, fear, and surprise. For this project, we will focus on only the first 3 classes: angry, happy, and neutral. Also, the model we will train takes an input image of size (48, 48, 3), which is much smaller than MobileNet's input dimensions of (224, 224, 3).
As you can see above, we limited each class to a maximum of 4,000 images. This keeps training fast and, because the classes stay balanced, makes performance easier to track.
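The capping-and-labeling step described above can be sketched as follows. This is a minimal, hedged sketch: the function and variable names are assumptions, not the article's exact code, and each selected path would still be passed through the get_image_value function from the earlier section (resized to (48, 48, 3)) before training.

```python
import numpy as np

# Assumed names; the article's own code may differ.
EMOTIONS = ['Angry', 'Happy', 'Neutral']  # the 3 classes kept out of 7
MAX_PER_CLASS = 4000                      # cap for faster, balanced training

def cap_and_label(paths_by_class, max_per_class=MAX_PER_CLASS):
    """Keep at most `max_per_class` file paths per class, pair each path
    with its integer class label, and return shuffled (paths, labels)."""
    paths, labels = [], []
    for label, class_paths in enumerate(paths_by_class):
        kept = class_paths[:max_per_class]   # enforce the per-class cap
        paths.extend(kept)
        labels.extend([label] * len(kept))
    paths, labels = np.array(paths), np.array(labels)
    idx = np.random.permutation(len(paths))  # shuffle before splitting
    return paths[idx], labels[idx]
```

After this, the shuffled arrays can be handed to a train/test split exactly as in the previous model.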
After running the code above, you should see a similar window as before that looks like this:
Now that we have the train/test split arrays, we will build the neural network that detects the emotion on a person’s face. Below is the code for building this neural network. Unlike with MobileNet, we will not apply augmentation to the dataset.
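A network of this kind can be sketched as below. This is an assumed architecture, not necessarily the article's exact one: a small convolutional stack taking the (48, 48, 3) input and ending in a 3-way softmax for angry/happy/neutral. The sparse categorical loss assumes integer labels; if the labels were one-hot encoded, categorical_crossentropy would be used instead.

```python
from tensorflow.keras import layers, models

def build_emotion_model(input_shape=(48, 48, 3), n_classes=3):
    """Minimal CNN sketch for 3-class emotion detection (assumed layers)."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5),                 # regularization in place of augmentation
        layers.Dense(n_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```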
After running the code above, you should see a window like this:
Once training is complete, you should find a .h5 file called Normal_Emotions.h5, which contains the weights for the model. Like the previous model we trained, it will be used for the live video portion below.
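For the live video step, the saved model is reloaded and each detected face crop is resized to the network's (48, 48, 3) input before prediction. The sketch below is hedged: the function and label names are assumptions, the label order must match whatever order was used during training, and the nearest-neighbor resize merely stands in for cv2.resize to keep the example dependency-free.

```python
import numpy as np

EMOTIONS = ('Angry', 'Happy', 'Neutral')  # assumed; must match training order

def resize_nearest(img, size=(48, 48)):
    """Nearest-neighbor resize of an (H, W, 3) image to `size`
    (a stand-in for cv2.resize in this sketch)."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows][:, cols]

def predict_emotion(model, face, labels=EMOTIONS):
    """Resize a face crop to (48, 48, 3), scale to [0, 1], and return
    the predicted emotion label."""
    x = resize_nearest(face).astype('float32') / 255.0
    probs = model.predict(x[np.newaxis, ...], verbose=0)[0]
    return labels[int(np.argmax(probs))]

# At runtime the trained weights would be restored with, e.g.:
# from tensorflow.keras.models import load_model
# model = load_model('Normal_Emotions.h5')
```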