How did the Deep Learning model achieve 100% accuracy?

It is important to look through the datasets before solving a Deep Learning problem. This article will take you through a scenario where the test set and validation set contain the same data as the training set. Have you ever wondered whether this can happen? Well, datasets can be flawed in surprising ways, and this article is the result of my personal experience with one such problem.

Introduction

The data set used here is a subset of the Food-101 data set, and can be found in TensorFlow Datasets. My aim is to classify the images into four categories: ‘chicken_curry’, ‘hamburger’, ‘omelette’ and ‘waffles’. The data set consists of 8,345 images belonging to four classes and is divided in the following way:
Training set: 4000 images belonging to 4 classes
Validation set: 2217 images belonging to 4 classes
Testing set: 2128 images belonging to 4 classes
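
As an illustrative sketch, the four classes could be pulled out of Food-101 via TensorFlow Datasets like this. The split used in this article was prepared separately, so the split name and the filtering below are assumptions, not the exact notebook code:

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Load Food-101 and keep only the four classes of interest.
ds, info = tfds.load("food101", split="train", with_info=True,
                     as_supervised=True)
names = info.features["label"].names
wanted = tf.constant([names.index(c) for c in
                      ["chicken_curry", "hamburger", "omelette", "waffles"]],
                     dtype=tf.int64)

def keep_four_classes(image, label):
    return tf.reduce_any(tf.equal(label, wanted))

subset = ds.filter(keep_four_classes)
```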

With this knowledge of the problem, let’s get started.

ResNet-50 Pre-Trained CNN

ResNet-50 is a deep residual network; the number ‘50’ refers to the depth of the network, meaning the network is 50 layers deep. It belongs to a sub-class of Convolutional Neural Networks and has over 23 million trainable parameters. ResNet came into existence to solve the problem of vanishing gradients: it uses skip connections in which a block’s input is added to the block’s output, and this mitigates the vanishing-gradient problem. You can also refer to the original research paper for more details. I assume that you are aware of the architecture if you have chosen to implement a deep learning solution using ResNet. The ResNet-50 model consists of 5 stages, each with a convolution block and identity blocks, where each convolution block has three convolution layers and each identity block also has three convolution layers. The ResNet-50 model looks like this:

ResNet-50 architecture
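
To make the skip connection concrete, here is a toy sketch of a residual block. It is a simplification: the real ResNet-50 blocks use 1×1 bottleneck convolutions and batch normalization, and this sketch assumes the input already has `filters` channels:

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    # Convolutional path.
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    # Skip connection: the block's input is added to its output, giving
    # gradients an identity path around the convolutions.
    y = layers.Add()([y, x])
    return layers.Activation("relu")(y)
```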

Transfer Learning

Transfer learning reuses a model that has already been trained on a large data set for classifying a huge number of classes. ResNet-50 has been trained on the ImageNet database, which consists of millions of images, and its weights have been saved; these weights can be reused when implementing the transfer learning approach. There are two ways of using transfer learning: feature extraction and fine-tuning. The solution to the design problem here focuses on the fine-tuning approach. In fine-tuning, a fully connected layer is added on top of the pre-trained ResNet-50 base model. The entire pre-trained base is first frozen, which means its weights are not updated during training. The fully connected layer added on top of the ResNet-50 base is trained for a small number of epochs, and then the 5c block of the pre-trained model is made trainable by unfreezing the relevant layers. The unfrozen layers, in conjunction with the added fully connected layer, are then trained to obtain the classification result. The initial layers of a Convolutional Neural Network learn low-level features, and as the network goes deeper, higher-level features are learnt. Hence, the higher layers are unfrozen and included in the training process. The input dimension required by ResNet-50 is (224, 224, 3), so all images are resized to this dimension before being fed into the pre-trained model.
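
A minimal sketch of the first phase, loading the ImageNet-pre-trained base and freezing it (the classification head added on top is shown later in this article):

```python
from tensorflow.keras.applications import ResNet50

# Load ResNet-50 with ImageNet weights, without its original classifier head.
base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(224, 224, 3))
base.trainable = False  # phase 1: the base weights are not updated
```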

Having understood the fine-tuning-based transfer learning approach to image classification, let’s move on to the core of this article.

Exploratory Data Analysis

I regard this as the most important section of this article.
Upon starting the training process, I was astonished to see very high accuracy from the earliest stages of training. That suggested something was not right with the network or the images. I inspected the network first and found nothing wrong there; that is when I dug into the data set and, to my surprise, found that all the images in the testing and validation sets were also present in the training set. This seemed very strange, as the validation and test sets should contain images that are not part of the training set: we are supposed to train the model on the training data and evaluate its performance on the validation and test sets. Here, however, since every image in the validation and testing sets is already present in the training set, we are training the model and evaluating its performance on the same images. This is not the proper way to work with a Convolutional Neural Network, or any Deep Learning model. Hence, I wrote a script to show that the files (images) in the testing and training sets, and in the validation and training sets, match each other. Please refer to the Python script in the Google Colaboratory notebook, where I have explicitly compared the matches found in the given data set.
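
The full comparison script is in the Colab notebook; a minimal sketch of such a check, assuming the splits sit in train/, validation/ and test/ folders of JPEG files, could hash file contents so that identical images match even when file names differ:

```python
import hashlib
from pathlib import Path

def content_hashes(root):
    # Hash raw file bytes so renamed copies of the same image still match.
    return {hashlib.md5(p.read_bytes()).hexdigest()
            for p in Path(root).rglob("*.jpg")}

train_hashes = content_hashes("train")
for split in ("validation", "test"):
    overlap = content_hashes(split) & train_hashes
    print(f"{split}: {len(overlap)} images also appear in the training set")
```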

Result of exploratory data analysis

Design of the Convolutional Neural Network and Model Training

As mentioned earlier, I have fine-tuned the pre-trained ResNet-50 model and added a fully connected layer on top to classify the images. The architecture of the CNN model is shown below:

Fully Connected layer added on top of ResNet-50

I have added a Flatten() layer followed by a Dropout layer with a rate of 0.4, so 40% of the neurons are randomly excluded during training. The next Dense layer consists of 2048 neurons with ‘relu’ as the activation function, and the last layer is another Dense layer with 4 neurons, because the aim here is to classify images of four different categories. A ‘softmax’ activation is used in the final layer since this is a multi-class classification problem. The architecture when the entire ResNet-50 base is frozen looks like this:

CNN architecture with frozen ResNet-50 base
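
A minimal sketch of this head, stacked on the frozen `base` from the earlier snippet (the exact layer configuration is in the Colab notebook and may differ slightly):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    base,                                    # frozen ResNet-50 base
    layers.Flatten(),
    layers.Dropout(0.4),                     # randomly drop 40% of activations
    layers.Dense(2048, activation="relu"),
    layers.Dense(4, activation="softmax"),   # one output per food category
])
```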

Notice the number of non-trainable parameters when the entire base is frozen; the weights of frozen layers are not updated. The architecture when the last block is unfrozen and combined with the fully connected layer looks like this:

Architecture of fine-tuned CNN model
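
A sketch of the second phase follows, assuming the ‘5c’ stage corresponds to the layers named `conv5_block3…` in the Keras implementation of ResNet-50. The optimizer settings mirror those described in the next paragraph; the loss assumes integer labels, and `train_ds`/`val_ds` are hypothetical dataset objects standing in for the prepared splits:

```python
from tensorflow.keras.optimizers import RMSprop

# Unfreeze only the 5c block; every other base layer stays frozen.
base.trainable = True
for layer in base.layers:
    layer.trainable = layer.name.startswith("conv5_block3")

# Recompile after changing trainability, then continue training.
model.compile(optimizer=RMSprop(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(train_ds, validation_data=val_ds, epochs=10)
```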

Given the peculiarity of the data set described above, the model recorded 100% accuracy on the training, validation and test sets. I used RMSprop as the optimizer with a learning rate of 1e-4, and I noticed that the training and validation accuracy started to converge towards 100% as soon as the learning rate dropped to 1e-5. The learning curves and confusion matrices are shown below:

Learning curve of training and validation accuracy