Product Image Classification using Ensemble Learning

Original article was published on Deep Learning on Medium


Hi guys, it’s been a while since the last time I wrote a post in Medium about Bangkit Academy.

Now, I want to share my experience competing in Shopee Code League 2020 as a team with my college friends. We are all beginners in Deep Learning, which is why we called our team “Numpang Belajar”, haha. The challenge is product detection, which is an image classification problem with 42 categories of Shopee products. This is a sample of the provided training set.

We used Transfer Learning as our approach. With transfer learning, we can reuse the feature extraction part of a pretrained model and retrain only the classification part to learn our dataset. Thus, we don’t need nearly as many computational resources as training from scratch.

Speaking of computational resources, we trained the models on Google Colab GPUs with TensorFlow 2.x. In our experience, the Tesla P100 is the fastest one you can get. You can check which device your session is using by running the following code:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

Next, here are some of the pretrained models that we used.

InceptionV3

The InceptionV3 architecture contains two parts: a feature extraction part and a classification part.

taken from https://cloud.google.com/tpu/docs/inception-v3-advanced

It is 48 layers deep, is commonly used for image classification, and scores 0.779 top-1 accuracy on the ImageNet dataset.

The idea of Inception is to build a sparsely connected network in the convolutional layers, so we are not forced to commit to one specific filter size as in a basic CNN. We can have them all and let Inception combine ‘em!

taken from https://www.joycexu.io/2017/intuitive-architectures/
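To make this concrete, here is a toy Inception-style block I wrote for illustration (a simplification, not the actual InceptionV3 module): parallel branches with different filter sizes run side by side on the same input, and their outputs are concatenated along the channel axis.

```python
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Concatenate, Input
from tensorflow.keras.models import Model

# Toy Inception-style block (illustrative only): instead of choosing one
# filter size, run several branches in parallel and keep them all.
inputs = Input(shape=(32, 32, 64))
branch_1x1 = Conv2D(16, (1, 1), padding='same', activation='relu')(inputs)
branch_3x3 = Conv2D(16, (3, 3), padding='same', activation='relu')(inputs)
branch_5x5 = Conv2D(16, (5, 5), padding='same', activation='relu')(inputs)
branch_pool = MaxPooling2D((3, 3), strides=1, padding='same')(inputs)

# Concatenate along channels: 16 + 16 + 16 + 64 = 112 output channels
outputs = Concatenate(axis=-1)([branch_1x1, branch_3x3, branch_5x5, branch_pool])
toy_module = Model(inputs, outputs)
```

The real InceptionV3 modules also use 1×1 convolutions inside the branches to reduce channel counts before the larger filters, which is what keeps the parameter count manageable.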

After defining the image width and height (we used the model’s standard input size of 299×299), we built on InceptionV3 from TF Keras using this code:

from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D

base_model = InceptionV3(
    input_shape=(image_width, image_height, 3),
    weights='imagenet',
    include_top=False)

# Freeze the first 10 layers
for layer in base_model.layers[:10]:
    layer.trainable = False

# New classification head on top of the pretrained feature extractor
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.4)(x)
predictions = Dense(42, activation='softmax')(x)
model = Model(inputs=base_model.inputs, outputs=predictions)

More about Inception Architecture:

Fun fact: the name Inception comes from the “We Need to Go Deeper” meme from the movie Inception.

Xception

Xception stands for “eXtreme Inception”. The Inception modules are replaced with depthwise separable convolutions, which consist of a depthwise convolution (a spatial convolution performed independently for each channel) followed by a pointwise convolution (a 1×1 convolution across channels). Xception is faster and more accurate than Inception: it is 71 layers deep (but with almost the same number of parameters) and scores 0.790 top-1 accuracy on the ImageNet dataset.

taken from https://www.joycexu.io/2017/intuitive-architectures/
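As a quick sanity check on why depthwise separable convolutions are cheaper, here is some back-of-the-envelope arithmetic with illustrative numbers of my own choosing (biases ignored):

```python
# Weight counts for one layer mapping c_in channels to c_out channels
# with a k×k kernel (biases ignored for simplicity).

def conv_params(k, c_in, c_out):
    # Standard convolution: every output channel has a k×k×c_in kernel
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    depthwise = k * k * c_in          # one k×k spatial filter per input channel
    pointwise = 1 * 1 * c_in * c_out  # 1×1 convolution mixing the channels
    return depthwise + pointwise

# Example: 3×3 kernel, 64 input channels, 128 output channels
print(conv_params(3, 64, 128))       # 73728
print(separable_params(3, 64, 128))  # 8768
```

Factoring the convolution into a spatial step and a channel-mixing step cuts the weights by roughly 8× in this example, which is where the speed-up comes from.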

Here is the code we used:

from tensorflow.keras.applications import Xception
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D

base_model = Xception(
    input_shape=(image_width, image_height, 3),
    weights='imagenet',
    include_top=False)

# Freeze the first 10 layers
for layer in base_model.layers[:10]:
    layer.trainable = False

# New classification head on top of the pretrained feature extractor
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.3)(x)
predictions = Dense(42, activation='softmax')(x)
model = Model(inputs=base_model.inputs, outputs=predictions)

More about Xception architecture:

Anyway, as an intermezzo, I want to shout-out to my final project teammate in Bangk!t that made me learn much about these things.

To save the model directly to Google Drive, we used a ModelCheckpoint callback and saved a checkpoint every time we got a new best validation accuracy.

from tensorflow.keras.callbacks import ModelCheckpoint

# Checkpoint to save the best model per epoch
model_filepath = "/content/drive/My Drive/model-{epoch:02d}-{val_accuracy:.4f}.hdf5"
checkpoint = ModelCheckpoint(
    filepath=model_filepath,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True,
    verbose=1
)

Ensemble Learning

This was the most interesting part of the challenge. After spending a long time training the models (~30 minutes per epoch), it was satisfying to see Ensemble Learning boost our model performance, haha. (Although I’ve read that ensemble learning is mostly useful in competitions and is not recommended in production.)

We used a simple but powerful ensemble technique: averaging the outputs. We just take the average of the predictions from all the models and use it as the final prediction.

taken from https://www.kaggle.com/fengdanye/machine-learning-6-basic-ensemble-learning
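In NumPy terms, output averaging is just the following (the probabilities here are made up, and I use 3 classes instead of the real 42 to keep it short):

```python
import numpy as np

# Made-up softmax outputs from two models for one image over 3 classes
pred_model_1 = np.array([0.7, 0.2, 0.1])
pred_model_2 = np.array([0.3, 0.6, 0.1])

# Ensemble prediction: element-wise mean of the probability vectors
ensemble_pred = (pred_model_1 + pred_model_2) / 2  # [0.5, 0.4, 0.1]

# Final predicted class is the argmax of the averaged probabilities
final_class = int(np.argmax(ensemble_pred))
```

Note that model 2 alone would have predicted class 1, but the ensemble sides with model 1 here because its confidence is higher; averaging tends to smooth out the individual models’ mistakes.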

Here is the code we used for the Ensemble Learning (an example with 2 models):

from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import Input, Average

# Reload the trained models and give them unique names
model_1 = load_model(model_1_path)
model_1 = Model(inputs=model_1.inputs,
                outputs=model_1.outputs,
                name='name_of_model_1')

model_2 = load_model(model_2_path)
model_2 = Model(inputs=model_2.inputs,
                outputs=model_2.outputs,
                name='name_of_model_2')

models = [model_1, model_2]

# Feed one shared input to every model and average their outputs
model_input = Input(shape=(image_width, image_height, 3))
model_outputs = [model(model_input) for model in models]
ensemble_output = Average()(model_outputs)
ensemble_model = Model(inputs=model_input, outputs=ensemble_output, name='ensemble')

More about Ensemble Learning:

However, we only got ~0.81 the first time we used the ensemble, since each of our models was badly overfitting. To tackle this, we used Albumentations for image augmentation. The idea of image augmentation is to add more scenarios to our training images, like in the image below.

It is almost the same as the built-in augmentation from Keras’ ImageDataGenerator, but ImageDataGenerator offers limited augmentation options. You can find more about Albumentations in this link. In our case, we used Albumentations through ImageDataAugmentor. Here is our recipe for the image augmentation:

from ImageDataAugmentor.image_data_augmentor import *
import albumentations

AUGMENTATIONS = albumentations.Compose([
    albumentations.HorizontalFlip(p=0.5),
    albumentations.OneOf([
        albumentations.RandomGamma(),
        albumentations.RandomBrightness(),
        albumentations.RandomContrast()
    ], p=0.3),
    albumentations.OneOf([
        albumentations.ElasticTransform(alpha=120, sigma=120 * 0.05, alpha_affine=120 * 0.03),
        albumentations.GridDistortion(),
        albumentations.OpticalDistortion(distort_limit=2, shift_limit=0.5),
    ], p=0.3),
    albumentations.ShiftScaleRotate(shift_limit=0.1625, scale_limit=0.6, rotate_limit=0, p=0.7)
])

Besides Albumentations, we also used a callback called ReduceLROnPlateau. It automatically reduces the learning rate when no improvement in the monitored quantity is seen for a ‘patience’ number of epochs.

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=10, verbose=0,
    mode='auto', min_delta=0.0001, cooldown=0, min_lr=0
)
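For completeness, here is a sketch of how these callbacks plug into training. This is not our exact training call: train_gen, val_gen, and the parameter values below are placeholders for illustration.

```python
import tensorflow as tf

# Hypothetical wiring sketch: both callbacks go into model.fit's
# `callbacks` list, so Keras invokes them at the end of each epoch.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath='model-{epoch:02d}.hdf5', monitor='val_accuracy',
    mode='max', save_best_only=True)
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1, patience=10)

# model.fit(train_gen, validation_data=val_gen, epochs=30,
#           callbacks=[checkpoint, reduce_lr])
```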

As a result, we got more than 0.80 for each model. After doing Ensemble Learning again, the accuracy score improved from ~0.81 to ~0.82.

We then added EfficientNetB3, which improved our score to ~0.83. We chose EfficientNetB3 because it gives good results, as shown in the image below.

taken from https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html

Knowing that EfficientNet was so powerful in improving our score, we tried more of its variants and also added InceptionResNetV2 to the ensemble. It is kind of “brute force”, I guess. Lol. Anyway, you can find the ImageNet Classification Leaderboard in this link.

For the last touch, we reviewed everything again and did hyperparameter tuning for each model. We tried different dropout rates, applied bilinear pooling, added regularizers, adjusted the learning rate, and reduced the number of frozen layers in the pretrained models. With that, we improved our score to ~0.84.

Here is the accuracy we got on both the training and validation set (20% of the full training set) when we did Ensemble Learning for the last time:

Yeah, you see how long the epoch takes xD

And this is the final result… we are in 19th place out of 600+ participants. Yeay!

Taken from the Shopee Code League #2 Challenge leaderboard in Kaggle

From the score we got, we can see that the test accuracy (0.83839) is not much different from the validation accuracy (0.83877). As for the model’s outputs, you can see in this sample that it still misidentifies shoes as trousers.

Overall, I am pretty sure it’s beginner’s luck. Lol. But after doing this challenge, I think I’d love to learn more about image classification.

I know there are many mistakes we’ve made, and this may be an inefficient training method. Let’s discuss if you have any feedback!
