Source: Deep Learning on Medium
How to build a CNN model that can predict the classification of the input images using transfer learning
First misconception — Kaggle is a website that hosts machine learning competitions. And I believe this misconception makes a lot of beginners in data science — including me — think that Kaggle is only for data professionals or experts with years of experience. In fact, Kaggle has much more to offer than solely competitions!
There are so many open datasets on Kaggle that we can simply start by playing with a dataset of our choice and learn along the way. If you are a beginner with zero experience in data science and are thinking of taking more online courses before joining, think again! Kaggle even offers some fundamental yet practical programming and data science courses. Besides, you can always post your questions in the Kaggle discussion forums to seek advice or clarification from the vibrant data science community.
One of the quotes that really enlightened me was shared by Facebook founder and CEO Mark Zuckerberg in his commencement address at Harvard:
You just have to get started.
— Mark Zuckerberg
Getting started and taking the very first step is always the hardest part of doing anything, let alone making progress or improving.
There are so many online resources to help us get started on Kaggle, and I’ll list a few here that I think are extremely useful:
In the following section, I hope to share with you the journey of a beginner in his first Kaggle competition (together with his team members) along with some mistakes and takeaways. You can check out the code here. The sections are organized as follows:
- The context of the Competition and Data
- Final Thoughts
Let’s get started and I hope you’ll enjoy it!
Context of the Competition and Data
In my very first post on Medium, My Journey from Physics into Data Science, I mentioned that I joined my first Kaggle machine learning competition, organized by Shopee and the Institution of Engineering and Technology (IET), with my fellow team members: Low Wei Hong, Chong Ke Xin, and Ling Wei Onn. We had a lot of fun throughout the journey and I definitely learned so much from them!!
Shopee gave us merchandise images in 18 categories, and our aim was to build a model that could classify an input image into the correct category.
Great. Now that we understand the context, let’s move on to our approach for image classification, which is the FUN (I mean hardest) part!
Some Images for Classification
As you can see from the images, some of them contained noise (different backgrounds, descriptions, or cropped words), which made the image preprocessing and model building even harder.
In the next section I’ll talk about our approach to tackle this problem until the step of building our customized CNN model.
Whenever people talk about image classification, Convolutional Neural Networks (CNN) will naturally come to their mind — and not surprisingly — we were no exception.
Since I had little knowledge of or experience with CNNs at the time, Google was my best teacher, and I highly recommend this concise yet comprehensive introduction to CNNs written by Adit Deshpande. The high-level explanation broke the once formidable structure of a CNN into simple terms that I could understand.
In our case, image preprocessing mostly meant data augmentation.
The data augmentation step was necessary before feeding the images to the models, particularly because the given dataset was imbalanced and limited. By artificially expanding our dataset with random transformations of the images, such as rescaling and shearing, we increased the amount of training data.
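To make the idea concrete, here is a minimal sketch of this kind of augmentation using Keras' `ImageDataGenerator`. It is an assumption that this utility matches what we used, and every range below is an illustrative value rather than our actual setting. The helper name `augmented_batches` and the random stand-in images are hypothetical.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All ranges below are illustrative assumptions, not the competition settings.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,     # scale pixel values from [0, 255] down to [0, 1]
    shear_range=10.0,      # random shear angle of up to 10 degrees
    zoom_range=0.2,        # random zoom factor between 0.8 and 1.2
    horizontal_flip=True,  # random left-right flips
)


def augmented_batches(images, labels, batch_size=32):
    """Return an iterator that yields randomly transformed (images, labels) batches."""
    return datagen.flow(images, labels, batch_size=batch_size)


# Demo on random stand-in data: 8 RGB images of 64x64 pixels, 18 one-hot classes.
images = np.random.randint(0, 256, size=(8, 64, 64, 3)).astype("float32")
labels = np.eye(18)[np.random.randint(0, 18, size=8)]
batch_images, batch_labels = next(augmented_batches(images, labels, batch_size=4))
print(batch_images.shape, batch_labels.shape)
```

Because the transformations are drawn at random each time a batch is requested, the model sees a slightly different version of every image in every epoch.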
— FIRST Mistake —
I believe every approach is the product of multiple tries and mistakes. So let’s talk about our first mistake before diving into our final approach.
We began by trying to build our CNN model from scratch (Yes literally!) to see how the CNN model performed based on the training and testing images. Little did we know that most people rarely train a CNN model from scratch, for the following reasons:
- Insufficient dataset (training images)
- CNN models are complex and can take weeks, or even months, to train, even with clusters of machines and high-performance GPUs.
- The cost and time invested are not guaranteed to be justified by the model’s performance.
Fortunately, transfer learning came to our rescue.
So… What the heck is transfer learning?
Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task. In our case, it is the method of taking a pre-trained model (the weights and parameters of a network that has been trained on a large dataset previously) and “fine-tuning” the model with our own dataset.
Great. With so many pre-trained models available in Keras, we decided to try different pre-trained models separately (VGG16, VGG19, ResNet50, InceptionV3, DenseNet etc.) and selected the best model.
Apologies for the never-ending comments as we wanted to make sure every single line was correct.
At first glance the code might seem a bit confusing. Let’s break it down to make the logic clearer:
- We first created a base model using the pre-trained InceptionV3 model imported earlier. The fully connected last layer at the top of the neural network was removed so that we could customize it later.
- We then added a Global Spatial Average Pooling layer and the reason can be found here.
- After that we created a new fully connected output layer, followed by a dropout layer for regularization.
- Finally, we added a softmax layer for 18 classes (18 categories of the images) and combined the base model with the new output layers created.
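The four steps above can be sketched roughly as follows. This is a minimal sketch and not our original code: the function name `build_model`, the 1024 hidden units, and the 0.5 dropout rate are assumptions chosen for illustration. Only the InceptionV3 base, the pooling layer, the dense-plus-dropout pair, and the 18-class softmax come from the description.

```python
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model

NUM_CLASSES = 18  # the 18 merchandise categories in the competition


def build_model(weights="imagenet", hidden_units=1024, dropout_rate=0.5):
    """Return (combined_model, base_model); hidden_units and dropout_rate are assumed values."""
    # 1. Pre-trained InceptionV3 without its fully connected top layer.
    base_model = InceptionV3(weights=weights, include_top=False)
    # 2. Global spatial average pooling turns each feature map into one number.
    x = GlobalAveragePooling2D()(base_model.output)
    # 3. New fully connected layer, followed by dropout for regularization.
    x = Dense(hidden_units, activation="relu")(x)
    x = Dropout(dropout_rate)(x)
    # 4. Softmax layer with one output per category.
    predictions = Dense(NUM_CLASSES, activation="softmax")(x)
    # Combine the base model with the new output layers.
    model = Model(inputs=base_model.input, outputs=predictions)
    return model, base_model
```

Calling `build_model()` downloads the ImageNet weights the first time it runs, while `build_model(weights=None)` builds the same architecture with random weights, which is handy for a quick structural check.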
At this stage, we froze all the layers of the base model and trained only the new output layer.
This is the beauty of transfer learning as we did not have to re-train the whole combined model knowing that the base model has already been trained.
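In Keras, freezing comes down to setting `trainable = False` on each base layer before compiling. The sketch below assumes you already have a combined model and its base model. The helper name `freeze_base_and_compile`, the `rmsprop` optimizer, and the categorical cross-entropy loss (which expects one-hot labels) are assumptions for illustration, not necessarily what we used.

```python
from tensorflow import keras


def freeze_base_and_compile(model: keras.Model, base_model: keras.Model) -> keras.Model:
    """Freeze every layer of the base model so only the new top layers are trained."""
    for layer in base_model.layers:
        layer.trainable = False
    # Compile AFTER changing the trainable flags so the change takes effect.
    # Optimizer and loss are assumed choices for a multi-class problem.
    model.compile(
        optimizer="rmsprop",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

After this call, `model.fit(...)` only updates the weights of the newly added layers, while the pre-trained base stays exactly as it was.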
Fine Tuning the Combined Model
Once the top layers were well trained, we fine-tuned a portion of the inner layers.
We did this by unfreezing and training the top 2 inception blocks (all remaining layers after the first 249 layers in the combined model) while keeping the earlier layers frozen. The training process was otherwise the same as before; only the set of trainable layers changed.
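A rough sketch of this second phase is shown below. The cut-off of 249 layers comes from the text. The helper name `unfreeze_top_layers`, the SGD optimizer, and the small learning rate are assumptions, picked because a low learning rate is a common way to avoid wrecking the pre-trained weights during fine-tuning.

```python
from tensorflow import keras


def unfreeze_top_layers(
    model: keras.Model,
    first_trainable: int = 249,
    learning_rate: float = 1e-4,
) -> keras.Model:
    """Freeze layers before `first_trainable`, unfreeze the rest, and recompile."""
    for layer in model.layers[:first_trainable]:
        layer.trainable = False
    for layer in model.layers[first_trainable:]:
        layer.trainable = True
    # Recompile so the new trainable flags take effect.
    # SGD with a low learning rate is an assumed, conservative choice.
    model.compile(
        optimizer=keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

With the default `first_trainable=249` on an InceptionV3-based model, this leaves the early layers untouched and lets the top two inception blocks and the new output layers adapt to the merchandise images.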
YES, and we’re done!
The final accuracy was 78.96%.
We tried different ways of fine-tuning the hyperparameters but to no avail.
When all the results and methods were revealed after the competition ended, we discovered our second mistake…
— SECOND Mistake —
We did not use an ensemble of models with the stacking method.
The common point from all the top teams was that they all used ensemble models.
Instead, we trained different pre-trained models separately and only selected the best model. Relying on a single model made our predictions less robust on the test data and more prone to overfitting.
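To give a feel for why combining models helps, here is a minimal NumPy sketch of the simplest kind of ensemble: averaging the class probabilities predicted by several models (often called soft voting). Note that this is plain averaging, not the stacking method the top teams used, where a second model is trained on the base models' predictions. The function name and the toy probabilities are made up for illustration.

```python
import numpy as np


def average_ensemble(prob_list):
    """Average class probabilities from several models and pick the top class."""
    stacked = np.stack(prob_list, axis=0)  # shape: (n_models, n_samples, n_classes)
    mean_probs = stacked.mean(axis=0)      # shape: (n_samples, n_classes)
    return mean_probs, mean_probs.argmax(axis=1)


# Toy predictions from three hypothetical models for two images and three classes.
model_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
model_b = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
model_c = np.array([[0.2, 0.7, 0.1], [0.2, 0.2, 0.6]])

mean_probs, labels = average_ensemble([model_a, model_b, model_c])
print(labels)  # [1 2]
```

In this toy case, `model_a` on its own would have predicted classes 0 and 1, but the averaged prediction follows the combined evidence of all three models.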
Despite the short period of the competition, I learned so much from my team members and other teams — from understanding CNN models, applying transfer learning, formulating our approach to learning other methods used by other teams.
The process wasn’t easy. The learning curve was steep. The learning journey was challenging but fruitful at the same time. And I’m definitely looking forward to another competition! 😄
Thank you for reading.
As always, if you have any questions or comments feel free to leave your feedback below or you can always reach me on LinkedIn. Till then, see you in the next post! 😄