Original article was published on Deep Learning on Medium
How an image’s nature affects how models learn
A PyTorch: Zero to GANs project blogpost
Image classification is an interesting and widely used application of deep learning. There are many image datasets, and even more techniques for implementing image classification can be found on the Internet. In this project my goal wasn’t to build the world’s greatest classifier; instead, I tried to find out how the nature of an image can influence the efficiency of a classifier.
In this project I used PyTorch, which is a Python-based scientific computing package.
The dataset I used in this project is the Intel Image Classification dataset, which I found on Kaggle. It consists of around 25,000 images, which fall into 6 classes: buildings, forest, glacier, mountain, sea and street.
After I loaded the images, I ran into some trouble with them. I ran some diagnostics and found that there were some problematic images which didn’t share the same dimensions as the others. 48 images in the training set, 7 in the validation set and another 14 in the test set had sizes other than 150 × 150 pixels, so I had to resize them.
Like I said, I wanted to train some models on somewhat different images, so I created a few slightly different datasets from the original one.
First, I determined the mean and the standard deviation (std) of the images so I could normalize them. This means we subtract the mean and divide by the std across all three (RGB) channels. As a result, the mean of the data becomes 0 and the std becomes 1, so the data distribution resembles a Gaussian curve. This can help the model converge more quickly.
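The computation above can be sketched in a few lines of PyTorch. The random tensor below stands in for the real stacked training images, so the statistics are not those of the Intel dataset.

```python
import torch

# Stand-in for a stacked tensor of training images: (N, 3, H, W)
imgs = torch.rand(100, 3, 150, 150)

# Channel-wise statistics over all images and all pixels
mean = imgs.mean(dim=(0, 2, 3))
std = imgs.std(dim=(0, 2, 3))

# Normalize: subtract the mean and divide by the std, per channel
normalized = (imgs - mean.view(1, 3, 1, 1)) / std.view(1, 3, 1, 1)

print(normalized.mean(dim=(0, 2, 3)))  # ~0 per channel
print(normalized.std(dim=(0, 2, 3)))   # ~1 per channel
```

In a real pipeline the same effect is usually achieved with `torchvision.transforms.Normalize(mean, std)` once the statistics are known.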
Normalization can be used by itself, but we can also apply other transformations to the datasets, which can have a positive impact on the training process. This is called data augmentation.
We can apply randomly chosen transformations to the images while loading them from the dataset. Since these transformations are applied randomly, the model sees slightly different inputs at each epoch, which helps it generalize better.
In this case I normalized all the images, then added some randomness to them. Some images may have been flipped horizontally, some may have had areas erased, and others may have been shifted a bit in a random direction (see RandomCrop).
Those 25,000 images came in separate folders: a training set with around 14,000 images, a test set with 3,000 and a prediction set with 7,000.
Since there wasn’t a dedicated validation set, I used the random_split function to set aside 20% of the training data for this purpose; the remainder became the training set. The prediction set with its 7,000 images didn’t contain any labels (the dataset was part of a Kaggle competition), so I couldn’t use it.
For the modded dataset I had to use a little trick. Since I modified the whole training dataset before separating the validation set, I had to use the normalized validation set. Because I didn’t want any overlap between the modified training set and the normalized validation set (which we would use when training the model on the modded dataset), I used random_split with a fixed seed, so the split came out the same both times; this way I got two datasets exactly as I wanted.
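A reproducible split can be obtained by passing a seeded generator to `random_split`. The toy list below stands in for the ~14,000-image training folder, and the seed value is arbitrary.

```python
import torch
from torch.utils.data import random_split

dataset = list(range(14000))           # stand-in for the training images
val_size = int(0.2 * len(dataset))
train_size = len(dataset) - val_size

# Fixing the generator seed makes the split deterministic, so the modified
# training set and the normalized validation set can never overlap.
g = torch.Generator().manual_seed(42)
train_a, val_a = random_split(dataset, [train_size, val_size], generator=g)

g = torch.Generator().manual_seed(42)
train_b, val_b = random_split(dataset, [train_size, val_size], generator=g)

print(val_a.indices == val_b.indices)  # True: the same images land in each split
```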
After I had that many colorful datasets, I got curious how colors affect a model’s learning, so I created a grayscale dataset from the original one. I know there must be a dozen built-in or third-party functions for this task that do the job faster and maybe more accurately, but this procedure was one of the few image operations I remembered from school, so I wrote my own.
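A hand-rolled conversion along these lines typically uses the classic luminosity weights (0.299 R + 0.587 G + 0.114 B); whether the author used exactly these weights is an assumption.

```python
import torch

def to_grayscale(img: torch.Tensor) -> torch.Tensor:
    """Convert a (3, H, W) RGB tensor to a (1, H, W) grayscale tensor
    using the luminosity weighting of the three channels."""
    weights = torch.tensor([0.299, 0.587, 0.114]).view(3, 1, 1)
    return (img * weights).sum(dim=0, keepdim=True)

x = torch.rand(3, 150, 150)
print(to_grayscale(x).shape)  # torch.Size([1, 150, 150])
```

The weights sum to 1, so a pure white image stays pure white after conversion.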
Before I could start the training, I had to check the distribution of the images: it wouldn’t help the learning phase if the majority of the images were buildings alone.
Luckily that wasn’t the case; the images were well distributed and ready for training, so let’s move on to the next step.
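Such a check can be done by counting the integer labels that ImageFolder-style datasets expose. The label list below is a toy stand-in, not the real training set.

```python
from collections import Counter

classes = ["buildings", "forest", "glacier", "mountain", "sea", "street"]
labels = [0, 1, 2, 2, 3, 4, 5, 1, 0, 3]   # stand-in for dataset targets

# Count how many images fall into each class
counts = Counter(classes[label] for label in labels)
print(counts)
```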
GPUs can carry out certain kinds of tasks much faster, and matrix operations are among them. Since most of the calculations in this project were matrix-related, we should use GPUs instead of CPUs whenever we have the opportunity. Since Kaggle provides 30 hours of GPU usage per week, I moved the data to the GPU. That moving mission was handled by the cell of code below.
If we don’t have a GPU, or don’t always have one, we can still use this code. It searches for a CUDA GPU, and if it finds one, it sets the default device to the GPU; if there’s no CUDA GPU, the CPU remains the default device and nothing else happens.
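A helper in the spirit of the code described above, sketched here from the standard pattern rather than copied from the article:

```python
import torch

def get_default_device() -> torch.device:
    """Pick the GPU when a CUDA device is available, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

def to_device(data, device):
    """Move a tensor (or a list/tuple of tensors) to the chosen device."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

device = get_default_device()
batch = to_device(torch.rand(4, 3, 150, 150), device)
print(batch.device)
```

On a machine without a GPU this code runs unchanged; `device` simply resolves to the CPU.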
The next task was to define the models we would use. Since my aim wasn’t building the most accurate predictive system, these are just some simple neural networks.
The first model (Model 0) is made of fully connected layers. The input layer receives the input pixels; notice that the number of input pixels varies with the size of the input images. Since we had both grayscale and colorful images, every model has a modified version that can take grayscale batches; the models on the sketches are made for RGB images.
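A fully-connected classifier of this shape might look as follows. The hidden-layer width is an illustrative guess, not the author’s architecture; the grayscale variant only differs in the number of input channels.

```python
import torch
import torch.nn as nn

class FCModel(nn.Module):
    """Simple fully-connected classifier; in_channels is 3 for RGB
    inputs and 1 for the grayscale variant."""
    def __init__(self, in_channels: int = 3, img_size: int = 150, num_classes: int = 6):
        super().__init__()
        self.network = nn.Sequential(
            nn.Flatten(),                                          # (N, C*H*W)
            nn.Linear(in_channels * img_size * img_size, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),                           # class scores
        )

    def forward(self, x):
        return self.network(x)

rgb_model = FCModel(in_channels=3)
gray_model = FCModel(in_channels=1)

out = rgb_model(torch.rand(8, 3, 150, 150))
print(out.shape)  # torch.Size([8, 6])
```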