Original article was published on Deep Learning on Medium
This blog is the course project for “Deep Learning with PyTorch: Zero to GANs”, a series of tutorials on building deep learning models with PyTorch, an open-source neural networks library. The course is hosted on Jovian.ML and taught by Aakash, the founder of Jovian.ML.
The project aims to predict a person’s emotion from an image of their face. The image can be anything from a selfie to a frame captured by a phone’s front camera, a webcam, or a video. We will implement everything from scratch here.
PyTorch, with its diverse modules and packages for deep learning, is the main tool used in this project.
Say hi to those lovely smiles before scrolling down. Now we’re ready to go, so let’s get started.
We’ll divide the whole project into the following phases:
- Data selection and data preparation
- Data Augmentation
- Neural Network Architecture
- Hyper-parameter Tuning
- Transfer Learning and Further Training
- Performance and testing results
- What next…
1. Data selection and data preparation
Dataset used: The Facial Expression Recognition 2013 (FER-2013) dataset
The first step in any project is choosing a dataset; we chose the publicly available FER-2013 dataset for our task:
- It has images categorized into one of seven emotions.
- It is publicly available, and its size is suitable for our task.
- Training set: 28,709 examples.
- Test set: 3,589 examples.
- Validation set: 3,589 examples
The data was pulled from a past Kaggle competition. The seven emotions are: Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral.
However, the model is built to work with any dataset; commercial datasets could be used with our model for better results.
After choosing the dataset, we preprocessed the images. The images were already center-cropped with suitable dimensions, so we did not resize them; we just normalized the images and converted them to tensors.
2. Data Augmentation
The distribution of samples per category in the FER-2013 dataset is not balanced: the disgust category is the least represented with only 547 samples, whereas happiness is the most represented with 8,989 samples.
Future scope lies in augmentation: multiple balancing techniques could be used to present an equal number of samples per category, which should result in higher accuracy.
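One such balancing technique is oversampling rare categories with `torch.utils.data.WeightedRandomSampler`. The toy label tensor below is hypothetical; in the real project the weights would come from the FER-2013 class counts:

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical per-image labels; in practice these would be the
# emotion labels of the FER-2013 training set (7 classes).
labels = torch.tensor([0, 1, 1, 1, 2, 2])

# Weight each sample inversely to its class frequency so that
# under-represented categories (e.g. disgust) are drawn as often as common ones.
class_counts = torch.bincount(labels).float()
sample_weights = 1.0 / class_counts[labels]

sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(labels),
                                replacement=True)
# DataLoader(train_ds, batch_size=64, sampler=sampler)
# would then yield approximately balanced batches.
```

Random horizontal flips and small rotations via `torchvision.transforms` would be a natural complement on the image side.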
3. Neural Network Architecture
4. Hyper-parameter Tuning
The pre-trained models are trained on the ImageNet dataset, which has 1,000 classes. Our task is to classify images into one of only 7 emotions, so we had to alter the classification layer. We prepared our own head network to merge with the VGG19 pre-trained layers.
For this task we chose-
- A 1024-unit dense hidden layer
- ReLU activation function
- Dropout layers between the hidden layers with p=0.2
- Adam Optimizer
- 25 Epochs
- Batch Size of 64
5. Transfer Learning and further Training
The motivation for using transfer learning came after we implemented a deep neural network from scratch: the scratch-built model gave an accuracy of only around 18%–20%. We boosted the accuracy with the help of transfer learning.
PyTorch’s torchvision.models subpackage offers a variety of pre-trained networks that can be easily downloaded. We tried multiple networks before settling on VGG19. Initially, we used VGG16, which gave us accuracy below 40%, followed by ResNet50 with 41% and DenseNet101 with 42.5%.
VGG19 yields an accuracy of 46%, better than all the other pre-trained models we tried, so we chose VGG19 for the implementation.
6. Performance and testing results
The model achieves an accuracy of 46%, which is largely limited by the dataset we used. A large commercial dataset with high-resolution pictures could outperform it and yield better accuracy.
Some examples of prediction results:
7. What Next…
Give yourself a clap: you have now completed a whole lot of the project. You can learn further and play with the links shared below to get your hands dirty in the code.
Visit the project repository:
Link to the code on GitHub