Source: Deep Learning on Medium
For computer vision and object detection problems, Convolutional Neural Networks provide exceptional classification accuracy.
In some cases, CNNs have proven to be more accurate at image classification than humans, while requiring less pre-processing than classical machine learning approaches.
CNNs have also proven very useful in other domains, such as recommendation systems and natural language processing.
What is a Convolutional Neural Network?
A Convolutional Neural Network, often abbreviated to CNN or ConvNet, is a type of artificial neural network used to solve supervised machine learning problems.
Specifically, supervised machine learning is often divided into two subfields. The first is regression, which involves models with a continuous output. The second is classification, in which the model output is discrete and categorically defined: for example, cat or dog.
In this article we will unpack what a CNN is, look at what it does and what real-world applications it has, and finally walk through a practical example of how to implement a world-class CNN using TensorFlow 2, which ships with Keras as its default high-level API.
What on earth is a convolution?
To understand what a Convolutional Neural Network is, we first need to understand a convolution. A convolution is essentially a small sliding window, also called a filter or kernel, that moves across the input. The filter is usually square, though it doesn't have to be.
This sliding filter moves across the input image to look for activations based on the target features. The image below shows a simplified version of what a convolutional layer does when you provide an image.
The sliding window moves along the x-axis until the end and then drops down on the y-axis to cover the entire image
The sliding window acts as a filter on the image, looking for pixel arrangements or features it considers relevant. What counts as relevant is determined by the filter's weights, which are learned during training.
A single convolutional layer does this over and over with different filters until it has a stack of feature maps, or outputs. Below are the original input image and the outputs of some of the filters.
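To make the sliding window concrete, here is a minimal NumPy sketch of the convolution operation as CNNs use it (technically a cross-correlation, since the kernel is not flipped); the input and filter values are purely illustrative:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and sum elementwise products ('valid' padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_filter = np.array([[1.0, -1.0],
                        [1.0, -1.0]])  # responds to vertical intensity changes

print(conv2d(image, edge_filter).shape)  # (3, 3)
```

Note how a 2 x 2 filter over a 4 x 4 image yields a 3 x 3 output: each output value corresponds to one position of the sliding window.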
Pooling to prevent overfitting
Another key component of convolutional neural network architecture is a pooling layer. This layer typically sits between two sequential convolutional layers.
A pooling layer is responsible for dimensionality reduction, which ultimately helps prevent overfitting. By reducing the number of computations and parameters in the network, it allows the network to scale better while also providing regularization.
Regularization allows the network to generalize better which ultimately improves the performance of the network over unseen data.
The most common pooling layer types are Max Pooling and Average Pooling. In the practical CNN example later in this article, we will look at how the Max Pooling layer is used. Max pooling is by far the most common, as it tends to produce better results in practice.
The max pooling calculation takes the maximum value within each pooling window as it slides over the W x H x D input; the stride parameter controls how far the window moves each step, and therefore the factor by which the input is downsampled.
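As a concrete sketch, here is 2 x 2 max pooling with a stride of 2 in NumPy: each non-overlapping 2 x 2 window collapses to its maximum, halving the width and height:

```python
import numpy as np

def max_pool2d(x, pool=2):
    """Non-overlapping max pooling: take the max of each pool x pool window."""
    h, w = x.shape
    trimmed = x[:h - h % pool, :w - w % pool]  # drop edge rows/cols that don't fit
    return trimmed.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [0, 1, 3, 4]], dtype=float)

print(max_pool2d(x))
# [[6. 4.]
#  [7. 9.]]
```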
Dropout for smaller networks
While pooling helps to avoid overfitting by reducing the dimensionality of the parameters, dropout extends regularization further.
Dropout works really well with fully connected layers (which we will discuss next). During training, dropout randomly drops units in the hidden layers, which adds noise and prevents the network from relying too heavily on any single unit.
It is surprisingly effective on smaller neural networks and is not limited to CNNs. For larger networks, Batch Normalization appears to be more popular, and personally I have achieved far better results with it.
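Conceptually, (inverted) dropout is just a random mask applied during training: each unit is zeroed with probability `rate`, and the survivors are scaled up so the expected activation is unchanged. A minimal NumPy sketch of the idea (Keras's `Dropout` layer handles this for you):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate):
    """Inverted dropout: zero each unit with probability `rate`, scale the rest."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

activations = np.ones(8)
print(dropout(activations, 0.5))  # roughly half the entries become 0, the rest become 2
```

Scaling by 1 / (1 - rate) means nothing special has to happen at inference time: the layer is simply a pass-through.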
Fully Connected Layer
A central part of a Convolutional Neural Network is its fully connected hidden layers. As in most neural networks, this means every activated output neuron is connected to every input of the next layer.
A fully connected layer in a CNN differs slightly from a general FC layer: because Dense layers expect a flat vector as input, the stacked feature maps from the convolutional layers must be flattened before they are passed on. Bear this in mind when designing the architecture of your CNN.
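To see why flattening matters, consider the shapes. If the convolutional and pooling layers leave a stack of 64 feature maps of 12 x 12 each (illustrative sizes), a Dense layer needs them as one flat 9216-element vector:

```python
import numpy as np

# An illustrative stack of 64 feature maps, each 12 x 12
feature_maps = np.zeros((12, 12, 64))

# Flattening turns the stack into a single vector a Dense layer can consume
flat = feature_maps.reshape(-1)
print(flat.shape)  # (9216,)
```

In Keras this reshaping is done by a `Flatten` layer, which we will use in the model below.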
What can Convolutional Neural Networks Do?
As with many machine learning problems, the solution is often to break the overall problem into smaller subproblems. When identifying images or objects, a great approach is to look for very similar pixel arrangements or patterns (features).
But image recognition and object classification are not the only uses for CNNs. They have proven useful in many general classification problems.
By using smaller regions or filters, Convolutional Neural Networks scale far better than regular neural networks, which makes them a great starting point for many classification problems.
With TensorFlow and Keras, it has never been easier to design a very accurate ConvNet for either binary or multi-class classification problems.
Building a convolutional neural network using Python, Tensorflow 2, and Keras
Now that we know what Convolutional Neural Networks are and what they can do, it's time to start building our own.
For this tutorial, we will use the recently released TensorFlow 2 API, which integrates Keras natively into the TensorFlow library.
At the time of writing, the Tensorflow 2.0 library is still only in alpha release. This means you will need to install it by running the following command:
pip install tensorflow==2.0.0-alpha0
For this tutorial, we will be using the famous MNIST dataset. The data is a well-known set of written hand digits. With only a few lines of code, we can achieve an accuracy of 99.25%.
Let’s get started at designing our first Convolutional Neural Network. First, make sure you import the necessary dependencies following the installation of Tensorflow 2.
import tensorflow as tf
Next, we download our dataset and split it into training and test sets. The training data is what we will use to train our CNN. The test set is used to measure our accuracy.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
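The raw MNIST arrays come back as 60,000 training (and 10,000 test) images of 28 x 28 uint8 pixels, with no channel dimension. Before feeding them to a Conv2D layer we add that dimension, scale the pixel values to [0, 1], and define the constants (num_classes, input_shape) that the later snippets rely on. Here is a minimal sketch, using a zero-filled stand-in for x_train so it runs on its own; in the real pipeline you would apply the same reshaping to both x_train and x_test loaded above:

```python
import numpy as np

num_classes = 10            # digits 0-9
input_shape = (28, 28, 1)   # 28 x 28 grayscale, single channel

# Stand-in for the x_train array returned by mnist.load_data()
x_train = np.zeros((60000, 28, 28), dtype='uint8')

# Add the channel dimension and scale pixel values to [0, 1]
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
print(x_train.shape)  # (60000, 28, 28, 1)
```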
In order to provide our CNN with the correct classification data, we convert our class label vectors into binary class matrices (one-hot encoding).
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
Now we finally get to the fun part. We create our Convolutional Neural Network model using the Keras API.
# We use the Sequential model in Keras, which is used 99% of the time
model = tf.keras.Sequential()
# We add our first convolutional layer with 32 filters of size 3 x 3
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
# We add a second convolutional layer
model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
# We add our max pooling layer
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
# Our first dropout layer
model.add(tf.keras.layers.Dropout(0.25))
# We flatten the features before the fully connected layers
model.add(tf.keras.layers.Flatten())
# A fully connected layer
model.add(tf.keras.layers.Dense(128, activation='relu'))
# Another dropout layer with a higher dropout rate
model.add(tf.keras.layers.Dropout(0.5))
# An output layer that uses softmax activation for the 10 classes
model.add(tf.keras.layers.Dense(10, activation='softmax'))
Next, we compile our model and add a loss function along with an optimization function. Detailing these two hyperparameters is outside the scope of this article, but it's something you should look into.
model.compile(loss=tf.keras.losses.categorical_crossentropy, optimizer=tf.keras.optimizers.Adadelta(), metrics=['accuracy'])
Next, we train our model based on the training set and test set
model.fit(x_train, y_train, batch_size=128, epochs=12, verbose=1, validation_data=(x_test, y_test))
This is the output I see from training. As you can see, the training accuracy and validation accuracy are close to each other, which suggests we are neither overfitting nor underfitting.
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 5s 88us/sample - loss: 0.2183 - acc: 0.9439 - val_loss: 0.1160 - val_acc: 0.9655
Epoch 2/12
60000/60000 [==============================] - 5s 83us/sample - loss: 0.0565 - acc: 0.9827 - val_loss: 0.0597 - val_acc: 0.9822
Epoch 3/12
60000/60000 [==============================] - 5s 84us/sample - loss: 0.0391 - acc: 0.9880 - val_loss: 0.0419 - val_acc: 0.9865
Epoch 4/12
60000/60000 [==============================] - 5s 81us/sample - loss: 0.0303 - acc: 0.9905 - val_loss: 0.0411 - val_acc: 0.9872
Epoch 5/12
60000/60000 [==============================] - 5s 78us/sample - loss: 0.0241 - acc: 0.9923 - val_loss: 0.0335 - val_acc: 0.9892
Epoch 6/12
60000/60000 [==============================] - 5s 78us/sample - loss: 0.0206 - acc: 0.9936 - val_loss: 0.0448 - val_acc: 0.9870
Epoch 7/12
60000/60000 [==============================] - 5s 78us/sample - loss: 0.0163 - acc: 0.9946 - val_loss: 0.0411 - val_acc: 0.9886
Epoch 8/12
60000/60000 [==============================] - 5s 78us/sample - loss: 0.0134 - acc: 0.9957 - val_loss: 0.0473 - val_acc: 0.9881
Epoch 9/12
60000/60000 [==============================] - 5s 79us/sample - loss: 0.0115 - acc: 0.9961 - val_loss: 0.0396 - val_acc: 0.9891
Epoch 10/12
60000/60000 [==============================] - 5s 79us/sample - loss: 0.0103 - acc: 0.9965 - val_loss: 0.0409 - val_acc: 0.9888
Epoch 11/12
60000/60000 [==============================] - 5s 78us/sample - loss: 0.0082 - acc: 0.9974 - val_loss: 0.0407 - val_acc: 0.9888
Epoch 12/12
60000/60000 [==============================] - 5s 79us/sample - loss: 0.0080 - acc: 0.9973 - val_loss: 0.0405 - val_acc: 0.9902
As a final step, we evaluate the model against the test set. We do this to measure the model's accuracy on completely unseen data.
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Here is the output I see when I run the above command
Test loss: 0.03787359103212468
Test accuracy: 0.9923
Now you have an example of an exceptional image classification solution using a Convolutional Neural Network.
See if you can tune the hyperparameters to get it even more accurate.
Try to run the same neural network on a different data set.
In this article, we discovered the components that make up a Convolutional Neural Network, detailed the inner workings of the various layers and regularization techniques, and discussed when CNNs are a great choice.
Finally, we looked at how to use TensorFlow and Keras to build an exceptional digit recognizer.
Please let me know if this was helpful by leaving me a comment. Also, reach out if you have questions or run into issues.
Originally published at https://www.machineislearning.com.