Original article was published on Deep Learning on Medium
The data looks fairly balanced, barring a few classes. Now let's move on to data transformation and augmentation.
1. Resize
Since the image sizes of the Pokemon images vary, we will resize all images to 400 x 400.
2. Horizontal Flip
Horizontal flipping is usually a good idea while training, as it acts as an augmenter for our training data.
3. Random Rotation
Pokemon in images are not always in an upright posture: some are jumping (Machop), flying (Farfetchd), or sleeping (Snorlax). As a result, images can come to our evaluator in different orientations. Therefore, it is a good idea to feed randomly rotated images to our training model.
4. Normalize
torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
Normalize does the following for each channel:
image = (image - mean) / std
The parameters mean and std are both passed as 0.5 in our case. This will normalize the image to the range [-1, 1]. For example, the minimum value 0 will be converted to (0 - 0.5)/0.5 = -1, and the maximum value 1 will be converted to (1 - 0.5)/0.5 = 1.
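As a quick sanity check, the per-channel formula can be verified with a few values. This is a plain-Python sketch of what Normalize computes, not torchvision itself:

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    # Same per-channel formula that torchvision's Normalize applies
    return (value - mean) / std

# ToTensor scales pixels to [0, 1]; Normalize then maps them into [-1, 1]
print(normalize_pixel(0.0))  # -1.0 (minimum)
print(normalize_pixel(0.5))  # 0.0 (midpoint)
print(normalize_pixel(1.0))  # 1.0 (maximum)
```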
5. Convert to Tensor
This is required, as PyTorch handles data in the form of tensors.
We can compose all 5 transformations listed above and pass the result as a parameter to ImageFolder, which loads the data.
train_tfms = transforms.Compose([
    transforms.Resize((400, 400)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
Once we have applied the transformations, let's load one image to see them in place.
Now that the transforms are ready, let's split the data into training and validation sets in an 85/15 ratio.
train_size = int(len(dataset)*0.85)
val_size = int(len(dataset)*0.15)
train_ds, val_ds = random_split(dataset, [train_size, val_size])
Out: (5797, 1023)
Now, using DataLoader, we will load the data from the dataset in batches of 32 images.
trainloader = DataLoader(train_ds, batch_size = 32, shuffle=True, num_workers = 2)
testloader = DataLoader(val_ds, batch_size = 32, shuffle=False, num_workers = 2)
Our training function looks something like this:
train_model(model, criterion, optimizer, scheduler, n_epochs = 5)
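The body of train_model is not shown in the article; the sketch below is an assumption of what such a loop typically looks like with these components, tracking training accuracy per epoch and passing it to the scheduler:

```python
import torch

def train_model(model, criterion, optimizer, scheduler, trainloader,
                n_epochs=5, device="cpu"):
    model = model.to(device)
    for epoch in range(n_epochs):
        model.train()
        running_loss, correct, total = 0.0, 0, 0
        for inputs, labels in trainloader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * inputs.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
        epoch_acc = 100.0 * correct / total
        # ReduceLROnPlateau expects the tracked metric (training accuracy here)
        scheduler.step(epoch_acc)
        print(f"Epoch {epoch + 1}/{n_epochs}: "
              f"loss {running_loss / total:.4f}, acc {epoch_acc:.2f}%")
    return model
```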
We will use a pre-trained ResNet-34 model, replacing its final fully connected layer so that it outputs 150 classes.
model_ft = models.resnet34(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 150)
2. Loss Function
We will use the cross-entropy function as our loss function, as it works quite well for multi-class classification.
criterion = nn.CrossEntropyLoss()
We will use Stochastic Gradient Descent with Momentum. Momentum loosely explained is a moving average of our gradients, which is used to update weights, and ultimately helps in faster convergence.
We pass 0.9 as the momentum (hyperparameter) value, which often works well with SGD.
optimizer = optim.SGD(model_ft.parameters(), lr=0.01, momentum=0.9)
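To make the "moving average of gradients" idea concrete, here is a plain-Python sketch of the update rule PyTorch's SGD applies when momentum is set (the gradient values are illustrative):

```python
def sgd_momentum_step(w, grad, buf, lr=0.01, momentum=0.9):
    # PyTorch's rule (dampening=0): buf = momentum*buf + grad; w = w - lr*buf
    buf = momentum * buf + grad
    return w - lr * buf, buf

w, buf = 1.0, 0.0
for grad in (0.5, 0.5, 0.5):  # under a constant gradient, the buffer accumulates
    w, buf = sgd_momentum_step(w, grad, buf)
    print(round(buf, 3))  # buffer grows: 0.5, 0.95, 1.355
```

The growing buffer is why momentum speeds up convergence: steps in a consistent gradient direction get larger, while oscillating gradients partially cancel out.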
The scheduler helps in adjusting the learning rate. We track the training accuracy: if it has not improved by at least 0.9 within 3 epochs (the patience), the learning rate (lr) is reduced by a factor of 0.1 (the default).
lrscheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', patience=3, threshold=0.9)
The number of epochs defaults to 5.
Using this model and these tuned hyperparameters, we achieve 95% accuracy in just 5 epochs and under 10 minutes, which is really cool, isn't it?
Testing and Evaluating the trained model
Now comes the main part, where we evaluate our model on never-seen-before Pokemon images randomly picked from Google.
Our model performs outstandingly well on these 5 never-seen images, with an accuracy of 100%.
(Note: In test case no. 3, where there are two Pokemon (Persian/Meowth) in the same image, the model picks up one of them, as it is trained to predict only one Pokemon per image.)
I hope you had a great time reading the article.
Full Implementation Notebook: https://jovian.ml/gitrohitjain/pokemon-classification