Source: Deep Learning on Medium
In this article we will explore how to build a CNN using Keras and use it to classify images.
We will classify each image in the dataset as either a cat or a dog.
I have stored the images in a directory structure as shown below
The high-level steps to build a CNN that classifies images are:
- Create convolutional layers by applying kernels (feature detectors) to produce feature maps
- Apply max pooling for translational invariance
- Flatten the inputs
- Create a fully connected neural network
- Train the model
- Predict the output
First, we initialize the neural network that will hold the CNN.
from keras.models import Sequential
classifier = Sequential()
We apply the convolution operation to the input image using multiple feature detectors, also called kernels. A feature detector can, for example, sharpen the image or blur the image.
Our input image is 64 by 64 pixels with 3 channels, because it is a color image.
We want 32 feature maps, produced by 3 by 3 kernels that move with a stride of 1 from left to right and a stride of 1 from top to bottom.
The activation function is relu (Rectified Linear Unit), which introduces non-linearity into the neural network.
from keras.layers import Conv2D
classifier.add(Conv2D(filters=32, kernel_size=(3,3),strides=(1, 1), input_shape=(64,64,3), activation='relu'))
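To get a feel for what this layer produces, its output size and number of weights can be worked out by hand. The sketch below assumes the Keras default of no padding (`padding='valid'`); the helper names `conv_output_size` and `conv_param_count` are made up for this illustration and are not part of Keras.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Output length along one dimension for a convolution without padding."""
    return (input_size - kernel_size) // stride + 1


def conv_param_count(kernel_size, in_channels, filters):
    """One kernel per (input channel, filter) pair, plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


# 64x64 RGB input, 32 filters of size 3x3, stride 1 (values taken from the layer above)
side = conv_output_size(64, 3, stride=1)
params = conv_param_count(3, in_channels=3, filters=32)
print(side, params)  # 62 896
```

Under these assumptions each of the 32 feature maps is 62 by 62, slightly smaller than the input because the kernel cannot be centered on the border pixels.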
Once we have 32 feature maps we apply max pooling for translational invariance. Translational invariance means that when we shift the input by a small amount, the output does not change. Max pooling also reduces the number of cells in each feature map.
Pooling helps the network detect features such as colors and edges regardless of their exact position.
For max pooling we use a pool_size of 2 by 2 for all 32 feature maps.
from keras.layers import MaxPooling2D
classifier.add(MaxPooling2D(pool_size=(2, 2)))
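A small pure-Python sketch can make the idea of translational invariance concrete. It is not how Keras implements pooling; `max_pool_2x2` is a hypothetical helper that takes the maximum of each non-overlapping 2 by 2 window, and the two tiny feature maps are invented for the example.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list whose sides are even."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            window = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled


original = [
    [9, 0, 0, 0],
    [0, 0, 0, 7],
    [0, 0, 0, 0],
    [0, 5, 0, 0],
]
# Same activations, each moved by one cell but still inside its 2x2 window
shifted = [
    [0, 9, 0, 0],
    [0, 0, 7, 0],
    [0, 0, 0, 0],
    [5, 0, 0, 0],
]

pooled_original = max_pool_2x2(original)
pooled_shifted = max_pool_2x2(shifted)
print(pooled_original)                    # [[9, 7], [5, 0]]
print(pooled_original == pooled_shifted)  # True
```

The 4 by 4 map shrinks to 2 by 2, and the small shifts leave the pooled output unchanged. A shift that moves an activation into a neighboring window would change the result, so the invariance only holds for small movements.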
We can add one more convolutional layer.
This time we will have 64 feature maps with a kernel of (3,3). The default stride is (1,1). We then apply max pooling to this convolutional layer as well.
classifier.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
The next step is to flatten all the inputs. The flattened data will be the input to the fully connected neural network.
from keras.layers import Flatten
classifier.add(Flatten())
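It can help to trace how large the flattened vector is. The sketch below assumes the stack described above (two 3 by 3 convolutions without padding, each followed by 2 by 2 max pooling); `conv_out` and `pool_out` are illustrative helpers, not Keras functions.

```python
def conv_out(size, kernel=3, stride=1):
    """Side length after a convolution without padding."""
    return (size - kernel) // stride + 1


def pool_out(size, pool=2):
    """Side length after non-overlapping max pooling (odd leftovers are dropped)."""
    return size // pool


size = 64
size = conv_out(size)  # first convolution:  64 -> 62
size = pool_out(size)  # first max pooling:  62 -> 31
size = conv_out(size)  # second convolution: 31 -> 29
size = pool_out(size)  # second max pooling: 29 -> 14

# The second convolution has 64 feature maps, so flattening gives 14 * 14 * 64 values
flattened_length = size * size * 64
print(size, flattened_length)  # 14 12544
```

Under these assumptions the fully connected network receives 12544 inputs per image.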
We now build the fully connected neural network with a hidden layer of 128 units and one output unit. We use a dropout rate of 20% to prevent overfitting.
This is a binary classification problem, so we use the sigmoid activation function in the output layer.
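from keras.layers import Dense, Dropout
classifier.add(Dense(units=128, activation='relu'))  # relu is assumed here; the original call is not shown
classifier.add(Dropout(0.2))
classifier.add(Dense(units=1, activation='sigmoid'))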
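To illustrate what a 20% dropout rate does during training, here is a minimal pure-Python sketch of inverted dropout, where the surviving values are scaled up so the expected sum stays the same. The `dropout` function and the all-ones activations are made up for the example; Keras applies its own implementation inside the Dropout layer and disables it at prediction time.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)     # fixed seed so the sketch is repeatable
activations = [1.0] * 128  # pretend output of the 128-unit layer
dropped = dropout(activations, rate=0.2, rng=rng)

zeroed = sum(1 for v in dropped if v == 0.0)
print(zeroed, "of", len(dropped), "units were zeroed in this pass")
```

Because a different random subset of units is silenced on every pass, the network cannot rely too heavily on any single unit, which is why dropout tends to reduce overfitting.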
We now compile the neural network with the adadelta optimizer. Adadelta adapts its learning rate during training, which can speed up convergence.
The loss function is binary_crossentropy, as this is a binary classification problem.
classifier.compile( optimizer='adadelta', loss='binary_crossentropy', metrics=['accuracy'])
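The following sketch shows, in plain Python, how a sigmoid output and the binary cross-entropy loss fit together for a single image. The input value 2.0 is arbitrary and the function names are illustrative; Keras computes the same quantities over whole batches.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample; eps keeps log() away from zero."""
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1.0 - y_pred))


p = sigmoid(2.0)                         # model output for one image, about 0.88
loss_if_dog = binary_crossentropy(1, p)  # true label 1: confident and right, small loss
loss_if_cat = binary_crossentropy(0, p)  # true label 0: confident and wrong, large loss
print(round(p, 3), round(loss_if_dog, 3), round(loss_if_cat, 3))
```

The loss grows quickly when the model is confident and wrong, which is what pushes the output towards 0 for one class and 1 for the other during training.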
Before fitting the CNN to the images, we apply image augmentation through a number of random transformations.
We zoom the image, shear the image and horizontally flip the image. This helps prevent overfitting and helps the model generalize better.
Our original images consist of RGB coefficients in the range 0–255. These values are too high for our model to process with a typical learning rate. To handle that, we bring the pixel values into the range 0 to 1 by scaling with a factor of 1/255.
from keras.preprocessing.image import ImageDataGenerator
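# applying transformations to the training images
# (the shear_range and zoom_range values of 0.2 are assumed; the original arguments are not shown)
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)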
test_datagen = ImageDataGenerator(rescale=1./255)
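To make two of these transformations concrete, here is a toy version of rescaling and horizontal flipping on a 2 by 2 RGB image stored as nested lists. The helpers and the tiny image are invented for this sketch; ImageDataGenerator performs these operations (plus the random shear and zoom) on real image arrays.

```python
def rescale(image, factor=1.0 / 255):
    """Multiply every channel value by `factor`, mapping 0-255 into 0-1."""
    return [[[channel * factor for channel in pixel] for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right by reversing each row of pixels."""
    return [list(reversed(row)) for row in image]


tiny = [
    [[255, 0, 0], [0, 255, 0]],      # red,  green
    [[0, 0, 255], [255, 255, 255]],  # blue, white
]

scaled = rescale(tiny)
flipped = horizontal_flip(scaled)
print(flipped[0])  # top row after the flip: green first, then red
```

Only the training images are augmented. The test generator rescales the images and nothing else, so the evaluation is done on unmodified pictures.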
We create the training set and the test set. The target size should match the input dimensions we specified for the model, which are (64, 64).
As our data is stored in directories, we use the flow_from_directory method. flow_from_directory reads the images from the subdirectories of the specified path and generates batches of augmented, normalized data.
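# train_dir and test_dir are placeholders for the image folders; the original paths are not shown
training_set = train_datagen.flow_from_directory(train_dir, target_size=(64, 64), batch_size=32, class_mode='binary')
test_set = test_datagen.flow_from_directory(test_dir, target_size=(64, 64), batch_size=32, class_mode='binary')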
Finally, we fit the CNN model that we created above to the data using fit_generator.
We use fit_generator when
- the dataset is too large to fit into memory.
- we need to perform data augmentation to avoid overfitting. This increases the ability of our model to generalize.
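The idea behind a generator can be sketched in a few lines of plain Python: only one batch exists in memory at a time, and the generator loops over the data indefinitely so that training can run for several epochs. The file names below are hypothetical and `batch_generator` is not a Keras function.

```python
def batch_generator(samples, batch_size):
    """Yield batches forever, starting over when the data is exhausted."""
    while True:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]


filenames = ["img_%d.jpg" % i for i in range(10)]  # hypothetical file names
gen = batch_generator(filenames, batch_size=4)

# Three steps are enough to see all 10 samples once
first_epoch = [next(gen) for _ in range(3)]
print([len(batch) for batch in first_epoch])  # [4, 4, 2]

# The next step wraps around to the beginning of the data
print(next(gen)[0])  # img_0.jpg
```

Since such a generator never ends on its own, the training loop has to be told how many batches make up one epoch.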
To set the values of these parameters we can use the formulas below, although this is not a hard and fast rule.
steps_per_epoch = Total Training Samples / Training Batch Size
validation_steps = Total Validation Samples / Validation Batch Size
We have 8000 images in the training data and our training batch size is 32, so steps_per_epoch is set to 8000/32 = 250.
We have 2000 images in the test set and our batch size is 32, so validation_steps = 2000/32 = 62.5, which rounds up to 63. I did not get good accuracy on validation with that value, so I tried different combinations and settled on 150.
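The arithmetic above can be checked with a short snippet. Rounding up with `math.ceil` is one reasonable choice so that the last, partially filled batch is still counted; `steps_for` is just an illustrative helper name.

```python
import math


def steps_for(total_samples, batch_size):
    """Number of batches needed to see every sample once, counting a partial last batch."""
    return math.ceil(total_samples / batch_size)


steps_per_epoch = steps_for(8000, 32)   # 8000 / 32 = 250 exactly
validation_steps = steps_for(2000, 32)  # 2000 / 32 = 62.5, rounded up
print(steps_per_epoch, validation_steps)  # 250 63
```

These values are only a starting point. As noted above, a different number of validation steps may work better in practice.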
from IPython.display import display
We get 99% accuracy on the training data and 77% accuracy on the test data.
Finally, we take an image and make a prediction. I have added test images of a dog and a cat to a new folder under dataset called single_prediction.
How can we tell whether 0 stands for cat or for dog?
We use the class_indices attribute of training_set to see what 0 and 1 stand for.
If we get an output of 0 then the image is of a cat, and if we get an output of 1 then the image is of a dog.
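The mapping follows from how flow_from_directory numbers the class subfolders, which is in alphanumeric order of their names. The sketch below imitates that rule in plain Python; the folder names "cats" and "dogs" are an assumption, since the actual directory listing is not shown here, and the real mapping should always be read from class_indices.

```python
def class_indices_from_folders(folder_names):
    """Assign 0, 1, 2, ... to class folder names in sorted order."""
    return {name: index for index, name in enumerate(sorted(folder_names))}


# Assumed subfolder names; the order in which they are listed does not matter
mapping = class_indices_from_folders(["dogs", "cats"])
print(mapping)  # {'cats': 0, 'dogs': 1}
```

With these folder names, cats come first alphabetically and get index 0, which matches the interpretation above.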
How do we make a prediction for an image?
We need to shape the test image so that the predict method works properly.
The test image is loaded at 64 by 64 pixels. We then convert it to an array with 3 color channels to match the input shape that we specified for the first convolutional layer. For this we use the image module from Keras. After applying the img_to_array() method, test_image has a shape of (64, 64, 3).
The predict method also expects a batch dimension, which is the first dimension of the input.
The batch dimension specifies how many images we are sending to the predict method. In our example we are sending only one image, but we still need to specify that.
The final shape of test_image is (1, 64, 64, 3).
import numpy as np
from keras.preprocessing import image
test_image = image.load_img("D:\\ML-data\\dataset\\single_prediction\\cat_or_dog_1.jpg",target_size=(64, 64) )
# Converting the loaded image to an array of shape (64, 64, 3)
test_image = image.img_to_array(test_image)
# Adding the batch dimension that the predict method expects
test_image = np.expand_dims(test_image, axis=0)
# Predicting the test image
result = classifier.predict(test_image)
The result we get is 1, which means the first image, a dog, was classified correctly.
We can further fine-tune the model by adding more convolutional layers or by increasing the depth of the fully connected network.
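For a batch of one image, predict returns a nested array holding a single sigmoid value between 0 and 1. A small helper can turn that value into a readable label. This is only a sketch: the 0.5 threshold is a common convention rather than something fixed by the model, and the `class_indices` dictionary is written out by hand here under the assumption that the class folders are named cats and dogs.

```python
def label_from_prediction(prediction, class_indices, threshold=0.5):
    """Map a (1, 1) sigmoid output to the class name it most likely represents."""
    probability = prediction[0][0]
    index = 1 if probability >= threshold else 0
    index_to_label = {i: name for name, i in class_indices.items()}
    return index_to_label[index]


class_indices = {"cats": 0, "dogs": 1}  # assumed; normally read from training_set.class_indices

print(label_from_prediction([[1.0]], class_indices))  # dogs
print(label_from_prediction([[0.2]], class_indices))  # cats
```

Inverting the dictionary rather than hard-coding the labels keeps the helper correct even if the class folders are named differently.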