Soda Bottle Identification

This post details the techniques used for soda bottle identification from a set of images posted as part of the Deep Cognition Soda Bottle Identification Challenge.

The approach is inspired by, and uses, the ResNeXt-101 architecture released by Facebook Research. ResNeXt-101 is a deep CNN, and its ImageNet pre-trained weights are used in a PyTorch environment.

The dataset contains 5769 images covering 8 different classes of soda bottles. Examples are shown below:

Steps taken:

  1. Optimized Learning Rate Identification
  2. Data Augmentation and Stochastic Gradient Descent with Restarts
  3. Differential learning rate
  4. Analyze Results

1. Optimized Learning Rate Identification

Images are resized to 224×224 pixels, which is the standard input size for ImageNet models. A batch size of 32 was used so that training fits within the memory available on the GPU.

Conventional methods offer no systematic way to find a good learning rate, and searching for one is time-consuming. However, in fast.ai's first course, Jeremy Howard showed the learning rate finder technique. Essentially, the learning rate is gradually increased while the loss is plotted against it, and a suitable rate is chosen from the region where the loss falls fastest. For this model, the plot gives a learning rate of 0.03.
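The idea behind the learning rate finder can be sketched without any deep learning framework. The snippet below is a minimal illustration, not the code used for this post: it assumes a toy one-parameter quadratic loss, grows the learning rate exponentially with one SGD step per rate, and then picks the rate at which the loss dropped fastest. The function names `lr_range_test` and `steepest_lr` are hypothetical.

```python
import math

def lr_range_test(loss_fn, grad_fn, w0, lr_min=1e-5, lr_max=10.0, steps=100):
    """Take one SGD step per learning rate, growing the rate exponentially."""
    ratio = (lr_max / lr_min) ** (1.0 / (steps - 1))
    w, lr = w0, lr_min
    lrs, losses = [], []
    for _ in range(steps):
        loss = loss_fn(w)
        lrs.append(lr)
        losses.append(loss)
        # Stop once the loss has clearly diverged.
        if not math.isfinite(loss) or loss > 4 * losses[0]:
            break
        w -= lr * grad_fn(w)
        lr *= ratio
    return lrs, losses

def steepest_lr(lrs, losses):
    """Return the rate whose step gave the steepest loss drop per log10(lr)."""
    best_i, best_slope = 0, 0.0
    for i in range(1, len(lrs)):
        slope = (losses[i] - losses[i - 1]) / (
            math.log10(lrs[i]) - math.log10(lrs[i - 1])
        )
        if slope < best_slope:
            # The drop between i-1 and i was caused by the step taken at lrs[i-1].
            best_i, best_slope = i - 1, slope
    return lrs[best_i]

# Toy problem (an assumption for illustration): minimise (w - 3)^2 from w = 0.
toy_loss = lambda w: (w - 3.0) ** 2
toy_grad = lambda w: 2.0 * (w - 3.0)

lrs, losses = lr_range_test(toy_loss, toy_grad, w0=0.0)
best = steepest_lr(lrs, losses)
print(f"loss falls fastest near lr = {best:.3g}")
```

On a real network the same loop would run over mini-batches, and the choice is usually made by eye from the plot rather than by a fixed rule.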

The output layer is trained first, with the weights of the earlier layers kept fixed. Four epochs are run initially, reporting training loss, validation loss, and accuracy on the validation set:

Average accuracy has increased from 91% to 96%.

2. Data Augmentation and Stochastic Gradient Descent with Restarts

Since the soda bottles shown above are zoomed in and sometimes photographed at different angles, it makes sense to augment the data accordingly. This time the activations are allowed to change. Stochastic Gradient Descent with Restarts (SGDR), a form of cyclical learning rate schedule [1], is used as the optimization method.
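To make the restart schedule concrete, here is a small sketch of how the learning rate could evolve over training steps. It assumes the cosine annealing form commonly used for SGDR; the post does not state the exact schedule, and the function name `sgdr_lr` and its parameters are hypothetical.

```python
import math

def sgdr_lr(step, cycle_len, lr_max, lr_min=0.0, cycle_mult=1):
    """Learning rate at `step` under cosine annealing with restarts.

    Each cycle starts at lr_max and decays towards lr_min; the next cycle
    is cycle_mult times longer than the previous one.
    """
    length = cycle_len
    # Find the position of `step` inside its current cycle.
    while step >= length:
        step -= length
        length *= cycle_mult
    cos_term = 1.0 + math.cos(math.pi * step / length)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term

# Example values only: three cycles of 10 steps with a peak rate of 0.03.
schedule = [sgdr_lr(s, cycle_len=10, lr_max=0.03) for s in range(30)]
print([round(lr, 4) for lr in schedule[:12]])
```

The sudden jump back to the peak rate at the start of each cycle is the "restart". It gives the optimizer a chance to leave a narrow minimum before the rate decays again.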

3. Differential learning rate

Finally, the entire model is unfrozen so that every layer of the neural net is allowed to learn. Differential learning rates are employed: earlier layers get small learning rates because they represent basic features, while later layers get higher learning rates because they must adapt to the new objects that need to be identified.

Also, 3 learning rate cycles are used, which helps the model settle in a region of the loss surface that generalizes better.
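The differential learning rates described above can be illustrated with a few lines of plain Python. This is a sketch under stated assumptions rather than the training code for this post: the three group names, the factor of 10 between neighbouring groups, and the helper names `differential_lrs` and `sgd_step` are all hypothetical.

```python
def differential_lrs(group_names, head_lr, factor=10.0):
    """Map layer groups (earliest first) to geometrically spaced rates."""
    n = len(group_names)
    return {
        name: head_lr / factor ** (n - 1 - i)
        for i, name in enumerate(group_names)
    }

def sgd_step(params, grads, lrs):
    """One plain SGD update where each group uses its own learning rate."""
    return {
        name: [p - lrs[name] * g for p, g in zip(params[name], grads[name])]
        for name in params
    }

# Hypothetical layer groups, ordered from the input side to the output side.
groups = ["early", "middle", "head"]
lrs = differential_lrs(groups, head_lr=0.01)

# Toy parameters and gradients, identical in every group for comparison.
params = {name: [1.0, 1.0] for name in groups}
grads = {name: [1.0, 1.0] for name in groups}
updated = sgd_step(params, grads, lrs)
print(lrs)
print(updated)
```

With identical gradients, the head moves 100 times further than the early layers in a single step. In PyTorch the same effect is typically obtained by passing the optimizer a list of parameter groups, each with its own `lr`.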

An accuracy of close to 99% is reached.

4. Analyze Results

Test time augmentation (TTA), which makes a prediction on 4 different augmented versions of each image and combines the results, is applied to the validation set. As expected, accuracy is close to 99%.
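As a rough sketch of what test time augmentation does, the snippet below averages class probabilities over 4 augmented views of one image. Everything in it is an assumption made for illustration: the image is a tiny nested list, the augmentations are simple flips and a transpose, and `toy_predict` is a hypothetical stand-in for a trained model.

```python
def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror the image top to bottom."""
    return img[::-1]

def rot180(img):
    """Rotate the image by 180 degrees."""
    return hflip(vflip(img))

def transpose(img):
    """Swap rows and columns."""
    return [list(col) for col in zip(*img)]

def toy_predict(img):
    """Hypothetical 2-class model: scores the brightness of the left column."""
    left = sum(row[0] for row in img)
    total = sum(sum(row) for row in img)
    p = left / total
    return [p, 1.0 - p]

def tta_predict(predict_fn, image, augment_fns):
    """Average class probabilities over several augmented views of one image."""
    probs = [predict_fn(aug(image)) for aug in augment_fns]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]

image = [[1, 3], [1, 3]]
avg_probs = tta_predict(toy_predict, image, [hflip, vflip, rot180, transpose])
print(avg_probs)
```

Averaging over several views makes the final prediction less sensitive to the exact framing or orientation of any single image.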

The confusion matrix shows how many images were wrongly predicted for each class.
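For readers unfamiliar with it, a confusion matrix can be built in a few lines. The sketch below uses made-up labels for 3 classes purely for illustration (the challenge has 8); the helper names `confusion_matrix` and `per_class_errors` are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts images of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def per_class_errors(matrix):
    """Number of misclassified images for each true class (off-diagonal sum)."""
    return [sum(row) - row[i] for i, row in enumerate(matrix)]

# Made-up labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred, n_classes=3)
errors = per_class_errors(cm)
accuracy = sum(cm[i][i] for i in range(3)) / len(y_true)
for row in cm:
    print(row)
print("errors per class:", errors)
```

Correct predictions sit on the diagonal, so any large off-diagonal entry points to a pair of classes the model confuses.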

A few correct images at random:

A few incorrect images at random:

We can see here that the bottle is oddly oriented or poorly lit, and the model is unable to distinguish it well.


The algorithm easily obtains around 99% accuracy. However, there are images where it fails, and it could probably be improved with more training images in which hands are involved.

Thanks a lot to Jeremy Howard and Rachel Thomas for their course, where I learned a lot. It’s a great way to learn deep learning models and build intuition, and I recommend it.

Also, the code can be found on my GitHub.

[1] Leslie Smith, 2017, Cyclical Learning Rates for Training Neural Networks

Source: Deep Learning on Medium