Original article was published on Deep Learning on Medium
Building a Generalizable Classifier on Tiny ImageNet using Data Augmentation and Transfer Learning
CS 182: Generalizable Classifiers
Andrew Huang, Srujay Korlakunta, Aman Sidhant, Roth Yin
Problem Statement and Background
Deep learning has allowed computer systems to mimic human performance on a variety of tasks. Models like LeNet (LeCun et al., 1998) and AlexNet (Krizhevsky et al., 2012), among others, have achieved remarkable results in vision tasks like digit and character recognition. Such results have led to a large variety of deep-learning-based systems being used in consumer and enterprise products to help humans, whether it is catching up on daily news with an Alexa voice recognition system or completing emails with Gmail’s autocomplete feature.
The success of deep learning often hinges on the availability of large, accurately labeled training datasets. Acquiring such data is often a luxury we don’t have when tackling computer vision problems: it may be expensive, infeasible, or impossible to build a dataset that covers all possible scenarios or inputs. Medicine is one example: when approaching disease detection, there are effectively infinite forms a disease can take.
Another issue that has become more prevalent with the ubiquity of deep learning is the possibility of perturbations and adversarial examples fooling classifiers into producing incorrect results. This might not have significant consequences when recognizing digits, but it poses serious risks for vision systems in self-driving cars, which have been known to ignore stop signs bearing certain markings (Ackerman, 2017).
Such problems can be solved, or at least mitigated, with generalizable classifiers that are robust to perturbations and adversarial examples. In this report, we outline methods to train a generalizable classifier on Tiny ImageNet, a subset of the ImageNet dataset with 200 classes, each containing 500 training images at 64×64 RGB resolution. Our primary metric of success is validation accuracy on the Tiny ImageNet data.
Image data augmentation played a large role in our approach to this problem. Data augmentation is a well-known technique for increasing the size and diversity of a training set, and is used quite often in fields like healthcare. See below for a taxonomy of data augmentation techniques, adapted from Shorten and Khoshgoftaar (2019).
Our application of data augmentation focused on geometric and color space transformations, and we attempted GAN-based augmentation. We provide some sample images of our augmentation approach. Our main transformations in the geometric space are rotations, flipping, and occlusion. We also do some scaling and cropping, but to a lesser degree, as large crops can destroy an image’s class identity. Perhaps a better approach would be to include large crops while considering bounding boxes.
In the color space, we mostly modify brightness, hue, and saturation.
One important strategy that we considered in the image transformation step of augmentation was whether to pursue offline or online (during training) augmentation. Because our dataset is sufficiently small (500 training images for each of 200 classes, about 200MB on disk), it is feasible to transform these images offline and store many copies on disk. However, we make the further optimization of featurizing images directly after augmenting them, then saving the features to disk. This strategy vastly speeds up training.
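A minimal sketch of this offline augment-then-featurize cache. The helper names and the toy mean-pooling featurizer are hypothetical stand-ins for our real augmentation pipeline and pretrained backbone:

```python
import os
import tempfile
import numpy as np

def augment(batch, rng):
    # Placeholder augmentation: random horizontal flips only.
    flips = rng.random(len(batch)) < 0.5
    out = batch.copy()
    out[flips] = out[flips][:, :, ::-1, :]
    return out

def featurize(batch):
    # Hypothetical stand-in for the pretrained backbone: global average
    # pooling over the spatial dimensions, (N, H, W, C) -> (N, C).
    return batch.mean(axis=(1, 2))

rng = np.random.default_rng(0)
images = rng.random((8, 64, 64, 3))  # toy stand-in for Tiny ImageNet images

# Offline pass: augment once, featurize immediately, and cache the much
# smaller feature array instead of the augmented pixels.
features = featurize(augment(images, rng))
path = os.path.join(tempfile.gettempdir(), "train_features.npy")
np.save(path, features)

cached = np.load(path)
```

Because training then reads precomputed features rather than re-running the backbone on augmented pixels, each epoch only touches the small cached arrays.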
import imgaug.augmenters as iaa

spatial_aug = iaa.Sequential([
    iaa.SomeOf(1, [  # apply exactly one of the heavier geometric transforms
        iaa.Resize(256, interpolation="linear"),
        iaa.PerspectiveTransform(scale=(0.01, 0.10), keep_size=False),
        iaa.CropAndPad(percent=(-0.1, 0.15), keep_size=False),
        iaa.Cutout(fill_mode="gaussian", nb_iterations=2, size=(0.2, 0.3), squared=False),
    ]),
    iaa.Fliplr(p=2/3),  # horizontal flip
    iaa.Flipud(p=1/5),  # vertical flip
    iaa.MultiplyAndAddToBrightness(mul=(0.9, 1.1), add=(-5, 5)),
])
Above is the code for spatial augmentation: a sequential pipeline of random geometric transformations. The pipeline is non-deterministic per image, and is re-run on every image in the training and validation sets before each epoch.
color_aug = iaa.Sequential([
    iaa.CropAndPad(percent=(-0.1, 0.1), keep_size=False),
    iaa.MultiplyAndAddToBrightness(mul=(0.7, 1.1), add=(-30, 10)),
    iaa.LinearContrast((0.75, 1.5), per_channel=0.5),
    iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05 * 255), per_channel=0.5),
    iaa.AddToHueAndSaturation((-50, 50), per_channel=True),
])
This second code block shows the color space transformations applied to each image. Again, these are non-deterministic, and include adjustments to brightness and contrast as well as additive Gaussian noise.
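The per-epoch resampling both pipelines rely on can be sketched with a pure-NumPy stand-in. The rotation/flip stub below is illustrative, not our actual imgaug pipeline:

```python
import numpy as np

rng = np.random.default_rng(42)
images = rng.random((4, 64, 64, 3))  # toy batch standing in for the dataset

def spatial_aug_stub(batch, rng):
    # Stand-in for the imgaug pipeline above: every call draws fresh
    # random parameters, so the same images come out differently
    # augmented before each epoch.
    k = int(rng.integers(0, 4))           # random 90-degree rotation count
    out = np.rot90(batch, k=k, axes=(1, 2))
    if rng.random() < 2 / 3:              # horizontal flip with p = 2/3
        out = out[:, :, ::-1, :]
    return out

# Re-augment the whole set before each of three epochs.
epoch_batches = [spatial_aug_stub(images, rng) for _ in range(3)]
```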
We went through several papers and ran several experiments on different augmentation sequences. We read Shijie et al. (2017) and found that layering multiple transformations proved most effective when augmenting the dataset.
Our data augmentation approach was complemented by transfer learning, which lends itself well to situations where well-labeled target training data is scarce. Collecting well-labeled training data is a foundational step in building a successful model, but it is difficult and time-consuming. Most state-of-the-art deep learning models are highly optimized for one particular task in a niche subdomain and don’t generalize well to other tasks. Transfer learning lets a classifier perform well on one dataset by leveraging information learned from another, related dataset or from a previous task. In one sense, both datasets can be viewed as subdomains of a larger domain, sharing common characteristics that a classifier can effectively use to learn.
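As a sketch of the idea, assuming a frozen random projection as a stand-in for a pretrained backbone, one can keep the feature extractor fixed and fit only a new head on the target-task data (here via closed-form least squares on toy data):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" backbone: a frozen feature map. A fixed random projection
# stands in for convolutional features learned on the source dataset.
W_backbone = rng.standard_normal((64, 16))

def features(x):
    return np.tanh(x @ W_backbone)  # frozen; never updated

# Target-task data (toy stand-in for the smaller target dataset).
X = rng.standard_normal((200, 64))
y = rng.integers(0, 2, size=200).astype(float)

# Transfer learning step: reuse the backbone as-is and train only a
# new head on the target data.
F = features(X)
w_head, *_ = np.linalg.lstsq(F, y, rcond=None)
preds = F @ w_head
```

In practice, one would fine-tune some or all backbone layers with gradient descent; freezing everything but the head is the simplest variant.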
For our task of building a generalizable classifier on Tiny ImageNet, we started from weights pretrained on ImageNet and fine-tuned them to handle perturbations of the Tiny ImageNet data. Our classifier was built on top of an EfficientNet (Tan and Le, 2019) implementation using AdvProp (Xie et al., 2020).
AdvProp, short for Adversarial Propagation, uses adversarial examples themselves as additional training data for the classifier. This makes the classifier more robust, since it learns from a wider range of possible observations and therefore generalizes better to future test sets. It also helps prevent overfitting, since the model sees a broader input distribution during training before encountering a validation set that may contain adversarial examples.
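A minimal illustration of generating an adversarial training example, using an FGSM-style step on a toy logistic model rather than the full AdvProp training scheme (model, data, and step size are all made up):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, x, y):
    p = sigmoid(x @ w)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy logistic model and one input, standing in for the full network.
w = rng.standard_normal(10)
x = rng.standard_normal(10)
y = 1.0

# FGSM-style step: perturb the input in the sign of the loss gradient
# with respect to x. For this model that gradient is (p - y) * w.
grad_x = (sigmoid(x @ w) - y) * w
x_adv = x + 0.1 * np.sign(grad_x)

# x_adv has higher loss than x; AdvProp-style training feeds both x and
# x_adv back in as training examples (the real method additionally uses
# separate batch-norm statistics for the adversarial branch).
```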
As a more deep-learning-based augmentation algorithm, generative adversarial networks (GANs) can generate new training data when given only a limited amount of data. We first encountered the idea in medical image processing, where it is widely applied, and many researchers have shown it to be effective for data augmentation (Shorten and Khoshgoftaar, 2019). Notably, combined only with flipping, GAN-based augmentation achieved one of the best results in Shijie et al. (2017), outperforming many more sophisticated combinations of methods.
Among the various GAN models, the standard GAN relies on a KL/JS-type divergence as its measure of distance between distributions, which is often disadvantageous due to gradient instability. Wasserstein GAN (WGAN) addresses this by using the Wasserstein distance instead, and has been shown to work well empirically (Arjovsky et al., 2017).
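A small numerical illustration of why this matters, using 1D samples. The empirical 1-Wasserstein distance between equal-size 1D samples is just the mean absolute difference of sorted samples; the histogram-based KL estimate here is for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two distributions with essentially disjoint supports, as can happen
# early in GAN training: real data near 0, generated data near 5.
real = rng.normal(0.0, 0.01, size=1000)
fake = rng.normal(5.0, 0.01, size=1000)

# Empirical 1-Wasserstein distance in 1D: mean absolute difference of
# sorted samples. It stays finite and varies smoothly with the offset,
# so it can supply useful gradients to the generator.
w_dist = np.mean(np.abs(np.sort(real) - np.sort(fake)))

# A naive histogram KL estimate blows up where the supports do not
# overlap: p puts mass where q has none.
bins = np.linspace(-1.0, 6.0, 50)
p, _ = np.histogram(real, bins=bins, density=True)
q, _ = np.histogram(fake, bins=bins, density=True)
with np.errstate(divide="ignore", invalid="ignore"):
    kl = np.where(p > 0, p * np.log(p / q), 0.0).sum()
```

Here `w_dist` comes out near 5 (the offset between the two modes) while `kl` is infinite, which mirrors the gradient-instability argument for preferring the Wasserstein distance.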
Specifically, we trained a standard WGAN on each class of training data and then sampled from the trained generators to obtain more training data. A batch size of 8 was chosen empirically. The quality of the generated images seems to stagnate after approximately 80 iterations. The generator and discriminator losses do not move consistently in one direction, and the resulting generated images are not very clear. Potential reasons for this are discussed further in the Lessons Learned section.
K-fold validation was not necessary for the MLP classifier because validation accuracy varied little across folds. This result is not too surprising: the dataset’s classes are large and of similar classification difficulty, so subsamples of the dataset should be fairly homogeneous. Below is a figure of k-fold validation accuracy curves, trained for 2 epochs on unaugmented data, 4 epochs on spatially augmented data, 2 epochs on color-augmented data, and 2 epochs on unaugmented data. In further models, we used a random 80:20 train-test split.
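The random 80:20 split can be sketched as follows (the dataset size is a toy stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                      # number of examples (toy stand-in)
indices = rng.permutation(n)  # shuffle once, then slice

split = int(0.8 * n)
train_idx, test_idx = indices[:split], indices[split:]
```

Shuffling before slicing ensures the two partitions are disjoint random samples rather than contiguous (and possibly class-correlated) runs of the dataset.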
Training curve of the final model, trained for 2 epochs each on the unaugmented, spatial_augment_1, spatial_augment_2, color_augment_1, spatial_augment_3, and unaugmented datasets.
The GAN results did not meet our expectations, for a few likely reasons. First, with only 500 sample images per class, the GAN may not have had enough data to reach optimal results. Second, we largely based our implementation on research using MNIST with linear layers, which may have limited the capacity of the GAN. Because our dataset comes from ImageNet, we may be able to achieve better results with convolutional layers.
Regarding general lessons, we have tried to reinvent the wheel a few times, but found it much more effective to acquire and build off of existing code. However, it is also important to understand the details of such code, as it can be buggy in unexpected ways.
One possible way to improve our generalizable classifier is AutoAugment (Cubuk et al., 2019). We have already seen how data augmentation can help our classifier perform more effectively against adversarial examples. However, a big limitation of data augmentation is that most methods must be designed manually, and each dataset requires a different augmentation strategy to be effective. For example, horizontally flipping images during training is an effective augmentation method on CIFAR-10, but much less so on MNIST (Cubuk et al., 2019). These problems call for an automated process of augmenting datasets, which AutoAugment provides: by learning which augmentation policies work best for a given dataset, we can optimize the augmentation process. Further, AutoAugment has been shown to generalize well to transfer learning, with policies learned on one dataset carrying over to different, related datasets. Initial results have been promising, with AutoAugment achieving a 1.5% error rate on CIFAR-10 and state-of-the-art performance on CIFAR-100 and ImageNet without any additional data (Cubuk et al., 2019).
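A toy sketch of AutoAugment's policy structure. The operations and (probability, magnitude) triples below are made up for illustration; the real policies are found by AutoAugment's search procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative operations; the real AutoAugment search space includes
# ops like ShearX, Rotate, and Color with learned magnitudes.
def rotate90(img, mag):
    return np.rot90(img, k=int(mag), axes=(0, 1))

def flip_lr(img, mag):  # mag unused for flips
    return img[:, ::-1, :]

# A policy is a list of sub-policies; each sub-policy is a sequence of
# (operation, probability, magnitude) triples.
POLICY = [
    [(rotate90, 0.8, 1), (flip_lr, 0.5, 0)],
    [(flip_lr, 0.9, 0), (rotate90, 0.4, 2)],
]

def apply_policy(img, policy, rng):
    sub = policy[rng.integers(len(policy))]  # sample one sub-policy
    for op, prob, mag in sub:
        if rng.random() < prob:
            img = op(img, mag)
    return img

img = rng.random((64, 64, 3))
out = apply_policy(img, POLICY, rng)
```

AutoAugment's contribution is the search over this space of (operation, probability, magnitude) triples; once a policy is learned, applying it looks like the loop above.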
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, November 1998.
A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. Neural Information Processing Systems, 2012.
E. Ackerman. Slight Street Sign Modifications Can Completely Fool Machine Learning Algorithms. IEEE Spectrum, 2017.
M. Tan and Q. V. Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ICML 2019.
C. Xie, M. Tan, B. Gong, J. Wang, A. Yuille, and Q. V. Le. Adversarial Examples Improve Image Recognition. CVPR 2020.
E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le. AutoAugment: Learning Augmentation Policies from Data. CVPR 2019.
J. Shijie, W. Ping, J. Peiyi, and H. Siping. Research on Data Augmentation for Image Classification Based on Convolution Neural Networks. Chinese Automation Congress (CAC), Jinan, 2017, pp. 4165–4170, doi: 10.1109/CAC.2017.8243510.
C. Shorten and T. M. Khoshgoftaar. A Survey on Image Data Augmentation for Deep Learning. Journal of Big Data 6, 60 (2019).
M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv:1701.07875 [stat.ML], 2017.