Training a Model to Classify CIFAR-100 with ResNet

Original article was published on Deep Learning on Medium

The second copy of the data, with data augmentation, not only gives us more data but also adds variability through the Resize and Crop transforms that we apply to it.

Analyze the data and make the data Loader

Now that we have the dataset, we can analyze some of its aspects. We know that the dataset contains 100k images for training in total (the original CIFAR-100 training images plus their augmented copies) and 10k for testing, so let's look at some samples:

With this we can verify that the dataset was downloaded correctly and that the images have their corresponding labels.

We can also see the effect of data augmentation in the second part of the dataset that we will use, where we apply a resize and crop. Later we will see how this improves model training, but basically it adds more variation.

Training and validation datasets and dataloaders

Here we first split the dataset: the validation set will be 10% of the total and the rest remains as the training set. For this we use the random_split function which, as its name says, randomly splits the data we give it into datasets of the given sizes.
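The split can be sketched as follows. To keep the example self-contained we use a small random TensorDataset instead of the real images, and the seed is an assumption added only to make the split reproducible.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset: 1,000 random "images" with labels in 0..99.
dataset = TensorDataset(torch.rand(1000, 3, 32, 32),
                        torch.randint(0, 100, (1000,)))

val_size = int(0.1 * len(dataset))    # 10% for validation
train_size = len(dataset) - val_size  # the rest for training

torch.manual_seed(42)                 # assumed seed, for reproducibility only
train_ds, val_ds = random_split(dataset, [train_size, val_size])
print(len(train_ds), len(val_ds))     # 900 100
```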

With these datasets we can create the dataloaders that we will use for training the model, with a batch size of 32. Afterwards, we can show some samples of a batch:
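A minimal sketch of the dataloaders, again with random stand-in data. The batch size of 32 comes from the text; shuffling only the training loader and doubling the validation batch size are assumptions of this example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for the training and validation datasets.
train_ds = TensorDataset(torch.rand(900, 3, 32, 32), torch.randint(0, 100, (900,)))
val_ds = TensorDataset(torch.rand(100, 3, 32, 32), torch.randint(0, 100, (100,)))

batch_size = 32
train_dl = DataLoader(train_ds, batch_size, shuffle=True)  # reshuffled every epoch
val_dl = DataLoader(val_ds, batch_size * 2)                # no gradients needed, so larger batches fit

images, labels = next(iter(train_dl))
print(images.shape, labels.shape)  # torch.Size([32, 3, 32, 32]) torch.Size([32])
```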

To see a sample of one batch we define a function using make_grid. This helps verify that the batches are correctly randomized, but it is not necessary in order to continue.


In this part, before defining a model, we define a base class that computes and displays the training and validation loss and the validation accuracy. For that we also define the accuracy function:
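One possible shape for this base is sketched below. The class and method names are assumptions for illustration, and cross-entropy is assumed as the loss since this is a classification problem.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def accuracy(outputs, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):  # hypothetical name
    def training_step(self, batch):
        images, labels = batch
        return F.cross_entropy(self(images), labels)

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        return {"val_loss": F.cross_entropy(out, labels).detach(),
                "val_acc": accuracy(out, labels)}

    def validation_epoch_end(self, outputs):
        # Average the per-batch metrics over the whole validation set.
        loss = torch.stack([x["val_loss"] for x in outputs]).mean()
        acc = torch.stack([x["val_acc"] for x in outputs]).mean()
        return {"val_loss": loss.item(), "val_acc": acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result["train_loss"], result["val_loss"], result["val_acc"]))

# Quick check of the accuracy function: 3 of 4 predictions are correct.
outputs = torch.tensor([[0.1, 0.9, 0.0], [0.8, 0.1, 0.1],
                        [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]])
labels = torch.tensor([1, 0, 1, 0])
print(accuracy(outputs, labels))  # tensor(0.7500)
```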

Define a model

Using the previous base we define a class for the model. As we said at the beginning, we use the ResNet model, specifically ResNet34, from the torchvision library. The model's output layer has 1000 outputs by default, so for our case we have to change it to 100 and then instantiate the model.

We recommend using a GPU to train this model; with only the CPU, training will take a long time. We use the following functions to move everything to the GPU:

First we check whether a GPU is available, and then we define the functions that move the data.

As already said, using a GPU is highly recommended, especially when working with images: moving everything to the GPU makes training and evaluation faster.
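A common way to write these helpers is sketched below. The function and class names are assumptions for illustration, and the code falls back to the CPU when no GPU is found.

```python
import torch

def get_default_device():
    """Pick the GPU if one is available, otherwise the CPU."""
    return torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

def to_device(data, device):
    """Move a tensor, or a list/tuple of tensors, to the chosen device."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader:
    """Wrap a dataloader so every batch is moved to the device on the fly."""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.dl)

device = get_default_device()
fake_loader = [[torch.rand(4, 3, 32, 32), torch.tensor([1, 2, 3, 4])]]  # stand-in for a DataLoader
for images, labels in DeviceDataLoader(fake_loader, device):
    print(images.device.type, labels.device.type)
```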


Now that the dataloaders are wrapped so that everything is moved automatically, we proceed to define the fitting and evaluation functions.

The fitting function takes the hyperparameters we will use. Here we use weight decay, a regularization technique that prevents the weights from becoming too large by adding an additional term to the loss function. We also use gradient clipping, which limits the gradients to a small range and so prevents undesirable changes in the parameters. With these and the other common hyperparameters, each epoch computes the loss of the model on the training data, clips the gradients (if a value is given), updates the model with the optimizer, and repeats.
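The loop described above can be sketched as follows. This is a simplified version under our own assumptions: the loss is cross-entropy, clipping uses clip_grad_value_ (clipping by norm would be another option), and the tiny linear model and random batches at the end exist only to show that the code runs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    losses, accs = [], []
    for images, labels in val_loader:
        out = model(images)
        losses.append(F.cross_entropy(out, labels))
        accs.append((out.argmax(dim=1) == labels).float().mean())
    return {"val_loss": torch.stack(losses).mean().item(),
            "val_acc": torch.stack(accs).mean().item()}

def fit(epochs, lr, model, train_loader, val_loader,
        weight_decay=0.0, grad_clip=None, opt_func=torch.optim.SGD):
    history = []
    # Weight decay is handled by the optimizer.
    optimizer = opt_func(model.parameters(), lr, weight_decay=weight_decay)
    for epoch in range(epochs):
        model.train()
        train_losses = []
        for images, labels in train_loader:
            loss = F.cross_entropy(model(images), labels)
            train_losses.append(loss.detach())
            loss.backward()
            if grad_clip:  # clip only when a value is given
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
        result = evaluate(model, val_loader)
        result["train_loss"] = torch.stack(train_losses).mean().item()
        history.append(result)
    return history

# Tiny stand-in model and data, just to exercise the loop.
torch.manual_seed(0)
tiny_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 100))
batches = [(torch.rand(16, 3, 8, 8), torch.randint(0, 100, (16,))) for _ in range(3)]
history = fit(2, 1e-3, tiny_model, batches, batches,
              weight_decay=5e-4, grad_clip=0.1, opt_func=torch.optim.Adam)
print(len(history), sorted(history[0]))  # 2 ['train_loss', 'val_acc', 'val_loss']
```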

With these functions in place we can start training, instantiating the model on the GPU. Then we set the values we will use:

As the optimizer we chose Adam, which worked better than SGD in this case, and for weight decay and gradient clipping we use 5e-4 and 0.1 respectively.

Using these values, we train in several stages with different numbers of epochs and learning rates.
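To make these two values concrete, here is a small sketch on a toy linear layer. It passes the weight decay of 5e-4 to Adam and shows that, after clipping with a value of 0.1, no gradient component exceeds that bound. The toy layer, the exaggerated input, and the use of clip_grad_value_ are assumptions of this example.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)  # toy stand-in for the real model
optimizer = torch.optim.Adam(layer.parameters(), lr=1e-3, weight_decay=5e-4)

# A deliberately large input so the raw gradients are large too.
loss = (layer(torch.ones(1, 4) * 100.0) ** 2).sum()
loss.backward()

nn.utils.clip_grad_value_(layer.parameters(), clip_value=0.1)
max_grad = max(p.grad.abs().max().item() for p in layer.parameters())
print(max_grad <= 0.1 + 1e-6)  # True: every gradient now lies in [-0.1, 0.1]

optimizer.step()  # the update uses the clipped gradients
```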

Last version of the notebook

In this version we train in five stages with epoch counts [30, 20, 40, 25, 15] (130 epochs in total) and learning rates [1e-3, 1e-4, 5e-5, 1e-4, 1e-5], reaching a validation accuracy of 55.7% but a test accuracy of 50.1%. We got better results with fewer epochs, which is what we did in the previous version of the notebook.

Previous version

In the previous version the epoch counts were [8, 10, 5, 15, 25] (63 epochs in total), with learning rates [1e-3, 1e-4, 5e-5, 1e-4, 1e-5] and no gradient clipping. This gave better results: a validation accuracy of 58.9% and a test accuracy of 50.7%. The improvement may be due to the smaller number of epochs, which means less overfitting and therefore better generalization.

Make predictions

After training with different hyperparameters to find the best accuracy, we make some predictions to see samples of the result:

For that we define this function, which uses the model to make the prediction and then takes the class with the maximum output value as the final prediction.

For the predictions we will use version 5 (the best one) and look at some examples:
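A prediction function of this kind might look like the sketch below. The function name, the three-class label list, and the stub model (rigged so that the last class always wins) are all assumptions made to keep the example small and runnable.

```python
import torch
import torch.nn as nn

def predict_image(img, model, classes, device=torch.device("cpu")):
    """Return the class name with the highest model output for one image."""
    xb = img.unsqueeze(0).to(device)  # add a batch dimension
    model.eval()
    with torch.no_grad():
        yb = model(xb)
    _, pred = torch.max(yb, dim=1)    # index of the maximum output
    return classes[pred[0].item()]

classes = ["apple", "bear", "castle"]  # hypothetical subset of labels
stub = nn.Sequential(nn.Flatten(), nn.Linear(3 * 2 * 2, 3))
with torch.no_grad():
    stub[1].weight.zero_()
    stub[1].bias.copy_(torch.tensor([0.0, 0.0, 1.0]))  # class 2 always scores highest

print(predict_image(torch.rand(3, 2, 2), stub, classes))  # castle
```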

We predict by inserting the image and the model used as shown above.

As you can see, these three samples (items 0, 1002 and 3252 of the testing dataset) are good predictions.

Obviously this is a very small part of the dataset, but these samples show that the model makes good predictions on different images.

After looking at some samples, you can save the trained model, then evaluate it on the testing dataset and save the resulting metrics:
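Saving and restoring the weights can be sketched with torch.save and load_state_dict. The file name is hypothetical, and a toy linear layer stands in for the trained ResNet.

```python
import os
import tempfile
import torch
import torch.nn as nn

trained = nn.Linear(4, 2)  # stand-in for the trained model
path = os.path.join(tempfile.mkdtemp(), "cifar100-resnet34.pth")  # hypothetical file name

# Save only the learned parameters, not the whole Python object.
torch.save(trained.state_dict(), path)

# Later: rebuild the same architecture and load the parameters back.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path))

same = all(torch.equal(a, b) for a, b in
           zip(trained.state_dict().values(), restored.state_dict().values()))
print(same)  # True
```

The restored model can then be evaluated on the test dataloader in the same way as on the validation data.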

Test accuracy of version five

To save these metrics we use the Jovian library, which can save our notebook and its metrics.

As you have seen, it is easy to define and train a model. Still, it is important to choose the hyperparameters carefully for the best results. Weight decay and gradient clipping can help improve the model, but choosing the right values for the model you want to train is the hard part. We hope this article helps you learn, and feel free to provide feedback.

If you want to learn more about Deep Learning, PyTorch and related topics, we especially recommend YouTube videos and courses.

If you want to see the notebook used for this article and its different versions, go to this link. Thanks for reading this far.