Natural Scenes Classification Using ResNet in PyTorch

Original article was published on Deep Learning on Medium


This article was translated with the help of Google Translate; the original source, in Spanish, is here.

This article summarizes the process and results of implementing a deep learning model using a pre-trained ResNet, reaching an accuracy of 90% with a training time of under 10 minutes. It applies concepts learned in the course “Deep Learning with PyTorch: Zero to GANs”, taught free online by Aakash and his team.

The complete code for this implementation can be found in this notebook.

Exploring and Preparing the Data

Data description

This is image data of natural scenes from around the world: around 25k images of size 150×150, distributed across 6 categories.

  • buildings -> 0
  • forest -> 1
  • glacier -> 2
  • mountain -> 3
  • sea -> 4
  • street -> 5

Exploring the data

The data is divided into three directories: training, test, and prediction. The training and test directories contain a subdirectory for each category, with the images inside those subdirectories. The prediction directory contains only images, that is, unlabeled images.

Path to directories
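The original shows the directory paths as a screenshot. A minimal sketch of what they might look like; the directory names (`seg_train`, etc.) and `DATA_DIR` location are assumptions based on the Kaggle dataset's usual layout:

```python
import os

# Assumed layout; adjust DATA_DIR to wherever the dataset was extracted.
DATA_DIR = './data'
TRAIN_DIR = os.path.join(DATA_DIR, 'seg_train')
TEST_DIR = os.path.join(DATA_DIR, 'seg_test')
PRED_DIR = os.path.join(DATA_DIR, 'seg_pred')
```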

Looking at the number of images for each category:

Number of images per category
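The per-category counts shown above can be computed with a small helper like the one below (the function name is mine, not from the article):

```python
import os

def count_images_per_category(root_dir):
    """Count the files inside each category subdirectory of root_dir."""
    counts = {}
    for category in sorted(os.listdir(root_dir)):
        category_path = os.path.join(root_dir, category)
        if os.path.isdir(category_path):
            counts[category] = len(os.listdir(category_path))
    return counts
```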

Defining Datasets


Number of images per dataset:

  • Training: 14034
  • Test: 3000

Viewing the dimensions and values of an image from the training dataset:

Size and values of an image

Reviewing the images of both datasets, it was found that 48 images in the training dataset and 7 images in the test dataset have a dimension other than 150×150; these are discarded for training.
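One way to find those off-size images is to open each file and check its dimensions; this helper is my own sketch, written against the `(path, label)` pairs that `ImageFolder.samples` exposes:

```python
from PIL import Image

def indices_with_size(samples, expected_size=(150, 150)):
    """Return indices of (path, label) pairs whose image matches expected_size.

    `samples` has the same shape as ImageFolder.samples.
    """
    keep = []
    for idx, (path, _) in enumerate(samples):
        with Image.open(path) as img:
            if img.size == expected_size:  # PIL size is (width, height)
                keep.append(idx)
    return keep
```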

Classes in the dataset:


An example of one of the images:

Example image

Defining DataLoaders

Subsets of the datasets are created, because only images that are exactly 150×150 will be used. The test dataset is split in two: 50% for validation and 50% for testing.

Dataset subsets
Subsets for validation and testing
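The subset and split steps above might look like the following sketch, using `Subset` for the size-filtered indices and `random_split` for the 50/50 validation/test split (function name and seed are mine):

```python
import torch
from torch.utils.data import Subset, random_split

def make_subsets(train_ds, test_ds, train_keep, test_keep, seed=42):
    """Keep only the valid indices, then split the test subset 50/50."""
    train_subset = Subset(train_ds, train_keep)
    test_subset = Subset(test_ds, test_keep)
    n_val = len(test_subset) // 2
    val_ds, final_test_ds = random_split(
        test_subset, [n_val, len(test_subset) - n_val],
        generator=torch.Generator().manual_seed(seed))
    return train_subset, val_ds, final_test_ds
```

Fixing the generator seed makes the validation/test split reproducible across runs.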

The sizes of each dataset would look like this:

  • Training: 13986
  • Validation: 1496
  • Test: 1497

Dataloaders are created from subsets of the datasets:
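A plausible version of the dataloader setup is below; the batch size and worker count are assumptions, not the article's values:

```python
from torch.utils.data import DataLoader

def make_dataloaders(train_ds, val_ds, batch_size=64, num_workers=0):
    """Shuffle only the training loader; use bigger validation batches."""
    train_dl = DataLoader(train_ds, batch_size, shuffle=True,
                          num_workers=num_workers, pin_memory=True)
    val_dl = DataLoader(val_ds, batch_size * 2,
                        num_workers=num_workers, pin_memory=True)
    return train_dl, val_dl
```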


Example of some images inside the dataloader:

Training dataloader images

The Model

Defining the model

Defining the generic model for image classification:

ImageClassificationBase Model
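The base class in the “Zero to GANs” course that this article follows typically looks like the sketch below; this is a reconstruction of that public pattern, not a copy of the screenshot, and the article additionally tracks an F1 score per epoch, which is omitted here for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def accuracy(outputs, labels):
    """Fraction of predictions that match the labels."""
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # generate predictions
        loss = F.cross_entropy(out, labels)  # calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        acc = accuracy(out, labels)
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        epoch_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        epoch_acc = torch.stack([x['val_acc'] for x in outputs]).mean()
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}
```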

Extending the ImageClassificationBase model and using the pre-trained resnet18 network:

NaturalSceneResnet Model

Training the model

The training function is defined:

Training function
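The training function in the course this article follows uses a one-cycle learning-rate schedule with optional gradient clipping and weight decay; the sketch below reconstructs that pattern and assumes the model provides `training_step`, `validation_step`, and `validation_epoch_end` as in the base class above:

```python
import torch

@torch.no_grad()
def evaluate(model, val_loader):
    """Run the model's validation_step over the loader and aggregate."""
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0, grad_clip=None,
                  opt_func=torch.optim.Adam):
    """One-cycle training loop in the style of the course."""
    history = []
    optimizer = opt_func(model.parameters(), max_lr,
                         weight_decay=weight_decay)
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs,
        steps_per_epoch=len(train_loader))
    for epoch in range(epochs):
        model.train()
        train_losses = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()
            if grad_clip:
                torch.nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
            sched.step()  # one-cycle schedule advances every batch
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        history.append(result)
    return history
```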

The model is instantiated:

Model Instance

The newly instantiated model is evaluated:

Evaluate untrained model

Hyperparameters for training are defined:
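The article does not list its final numbers in the text, so the values below are purely illustrative placeholders for the kind of hyperparameters being set:

```python
import torch

# Hypothetical values for illustration; the article tuned its own
# hyperparameters over several runs.
epochs = 8
max_lr = 0.01
grad_clip = 0.1
weight_decay = 1e-4
opt_func = torch.optim.Adam
```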


The training is executed with the frozen model and the defined hyperparameters:

Frozen Model Training
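Mechanically, the frozen phase just disables gradients for everything except the final layer, trains for a few epochs, then re-enables all gradients for the fine-tuning phase. A generic helper (my own, not from the article) shows the mechanism:

```python
import torch.nn as nn

def set_requires_grad(module, flag):
    """Freeze (flag=False) or unfreeze (flag=True) a module's parameters."""
    for param in module.parameters():
        param.requires_grad = flag
```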

The model is unfrozen and trained once more with a smaller learning rate.

Unfrozen Model Training

This is how the metrics varied across the epochs:

Loss vs No. of epochs
Accuracy vs No. of epochs
F1 score vs No. of epochs
Learning rate vs No. of batches

This training process was repeated several times with different hyperparameter values to find the most suitable ones. The model was also varied to achieve better accuracy and performance.

Training history


The model was tested with the set of test images that were not used in training; the accuracy achieved was over 90%.

Evaluate trained model

The confusion matrix shows that most false positives and false negatives occur between the street and buildings categories and between the glacier and mountain categories. This happens for obvious reasons: a street contains buildings, and a glacier looks like a mountain of ice.

Confusion matrix
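A confusion matrix like the one shown can be accumulated directly in PyTorch; this small helper is my own sketch:

```python
import torch

def confusion_matrix(preds, labels, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for t, p in zip(labels, preds):
        matrix[t, p] += 1
    return matrix
```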


The prediction set contains unlabeled images, so some images are selected at random and the model has to predict which category each one belongs to.

Prediction of 10 random images
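Predicting a single image amounts to adding a batch dimension, running a forward pass, and taking the arg-max class; the function below is a plausible version (name and signature are mine):

```python
import torch

def predict_image(img, model, classes):
    """Return the predicted class name for a single image tensor."""
    model.eval()
    with torch.no_grad():
        out = model(img.unsqueeze(0))  # add a batch dimension
        _, pred = torch.max(out, dim=1)
    return classes[pred.item()]
```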


Using deep learning for image classification gives good results, and making use of pre-trained neural networks streamlines training, since the network already comes with learned patterns.

It is necessary to explore and understand very well the data that will be used for training, because otherwise it can lead to errors; in this case, some images were not 150×150 in size and caused an error during training.

You always have to visualize the different metrics to follow the progress of training and thus know whether the choice of hyperparameters helped or not.

Future work

  • Other ResNets (ResNet34, ResNet50, etc.)
  • Neural Network from scratch
  • Data Transformation (Data Augmentation & Normalisation)
  • Keep learning