Original article was published on Deep Learning on Medium
Natural Scenes Classification Using ResNet in PyTorch
This article was translated with the help of Google Translate, the original source in Spanish is here.
This article summarizes the process and results of implementing a deep learning model using a pre-trained ResNet, reaching an accuracy of 90% with a training time of under 10 minutes. It applies concepts learned in the course “Deep Learning with PyTorch: Zero to GANs”, taught for free online by Aakash and his team at Jovian.ml.
The complete code for this implementation can be found in this notebook.
Exploring and Preparing the Data
This is a dataset of images of natural scenes from around the world. It contains around 25k images of size 150×150, distributed across 6 categories:
- buildings -> 0
- forest -> 1
- glacier -> 2
- mountain -> 3
- sea -> 4
- street -> 5
Exploring the data
The data is divided into 3 directories: training, test, and prediction. The training and test directories contain a subdirectory for each category, with the images inside them. The prediction directory contains only images, that is, untagged images.
Looking at the number of images for each category:
Number of images per dataset:
- Training: 14034
- Tests: 3000
Viewing the dimension and data of a training dataset image:
Reviewing the images of both datasets, it was found that 48 images in the training dataset and 7 images in the test dataset have dimensions other than 150×150; these are discarded for training.
Classes in the dataset:
An example of one of the images:
Subsets of the datasets are created, because only images that are exactly 150×150 will be used. The test dataset is split in two: 50% for validation and 50% for testing.
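The filtering and splitting steps could be sketched as below. The function names are illustrative (the notebook's exact code is not shown in the article); the sketch assumes the datasets yield `(tensor, label)` pairs, as ImageFolder with a ToTensor transform does.

```python
import torch
from torch.utils.data import Subset, random_split


def filter_by_size(dataset, size=(3, 150, 150)):
    """Keep only the samples whose image tensor is exactly `size`.

    This implements the step described above: images that are not
    150x150 are dropped before training.
    """
    keep = [i for i in range(len(dataset)) if dataset[i][0].shape == torch.Size(size)]
    return Subset(dataset, keep)


def split_test(test_subset, seed=42):
    """Split the filtered test set 50/50 into validation and test sets.

    A fixed seed keeps the split reproducible; the seed value is an
    assumption, not taken from the article.
    """
    n_val = len(test_subset) // 2
    n_test = len(test_subset) - n_val
    return random_split(test_subset, [n_val, n_test],
                        generator=torch.Generator().manual_seed(seed))
```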
The sizes of each dataset would look like this:
- Training: 13986
- Validation: 1496
- Tests: 1497
Dataloaders are created from subsets of the datasets:
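A minimal sketch of the dataloader creation follows; the batch size of 128 is an assumption, since the article does not state the value used.

```python
from torch.utils.data import DataLoader


def make_loaders(train_ds, val_ds, test_ds, batch_size=128):
    """Wrap the dataset subsets in DataLoaders.

    Only the training loader is shuffled; validation and test loaders
    use a larger batch since no gradients are kept during evaluation.
    """
    train_dl = DataLoader(train_ds, batch_size, shuffle=True)
    val_dl = DataLoader(val_ds, batch_size * 2)
    test_dl = DataLoader(test_ds, batch_size * 2)
    return train_dl, val_dl, test_dl
```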
Example of some images inside the dataloader:
Defining the model
Defining the generic ImageClassificationBase model for image classification, and the model that extends it using the pre-trained ResNet:
Training the model
The training function is defined:
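A sketch of a training function in the style of the course material follows. It assumes a model exposing the `training_step`/`validation_step`/`validation_epoch_end` interface of the ImageClassificationBase class mentioned above; the notebook's actual function may differ (for example, by adding a learning-rate schedule or gradient clipping).

```python
import torch


@torch.no_grad()
def evaluate(model, val_loader):
    """Run one validation pass and aggregate loss/accuracy."""
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)


def fit(epochs, lr, model, train_loader, val_loader,
        weight_decay=0, opt_func=torch.optim.Adam):
    """Basic training loop: optimize on the train set, validate each epoch."""
    history = []
    optimizer = opt_func(model.parameters(), lr, weight_decay=weight_decay)
    for epoch in range(epochs):
        model.train()
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        history.append(evaluate(model, val_loader))
    return history
```

The returned history (one loss/accuracy dict per epoch) is what gets plotted later to track training progress.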
The model is instantiated:
The newly instantiated model is evaluated:
Hyperparameters for training are defined:
The training is executed with the frozen model and the defined hyperparameters:
The model is unfrozen and trained once more with a smaller learning rate:
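The two-phase schedule described above can be sketched as a small helper. The epoch counts and learning rates here are illustrative defaults, not the article's exact values; it assumes the model exposes `freeze()`/`unfreeze()` methods and a `fit` function like the one described earlier.

```python
def two_phase_training(model, fit, train_dl, val_dl,
                       epochs=(5, 5), lrs=(1e-2, 1e-4)):
    """Phase 1: train only the new head with the backbone frozen.
    Phase 2: unfreeze everything and continue with a smaller LR,
    so the pre-trained weights are only gently adjusted."""
    model.freeze()
    history = fit(epochs[0], lrs[0], model, train_dl, val_dl)
    model.unfreeze()
    history += fit(epochs[1], lrs[1], model, train_dl, val_dl)
    return history
```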
This is how it varied throughout the epochs:
This training process was repeated several times with different hyperparameter values to find the most suitable ones, and the model was also varied to improve accuracy and performance.
The model was tested with the set of test images that were not used in training; the accuracy achieved was over 90%.
The confusion matrix shows that most false positives and false negatives occur between the street and buildings categories, and between the glacier and mountain categories. This happens for obvious reasons: a street contains buildings, and a glacier looks like a mountain of ice.
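A confusion matrix like the one described could be accumulated as below; this is a sketch (the notebook may well use sklearn or seaborn for computing and plotting it instead), assuming the model returns per-class scores for a batch of images.

```python
import torch


@torch.no_grad()
def confusion_matrix(model, loader, num_classes=6):
    """Count predictions per (true class, predicted class) pair.

    Rows are true classes, columns are predicted classes, so off-diagonal
    cells reveal which category pairs get confused (e.g. street/buildings).
    """
    model.eval()
    cm = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        for t, p in zip(labels, preds):
            cm[t, p] += 1
    return cm
```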
The prediction set contains untagged images, so some images are selected at random and the model has to predict which category each one belongs to.
The use of deep learning for image classification gives good results, and making use of pre-trained neural networks streamlines training, since the network already has some learned patterns.
It is necessary to explore and understand the data that will be used for training, because otherwise errors can occur; in this case, some images were not 150×150 in size and caused an error during training.
You always have to visualize the different metrics to see the progress of training, and thus know whether the choice of hyperparameters helped or not.
Next steps
- Other ResNets (ResNet34, ResNet50, etc.)
- Neural Network from scratch
- Data Transformation (Data Augmentation & Normalisation)
- Keep learning