Source: Deep Learning on Medium
A Simple Image Classifier on Satellite Imagery using Tensorflow
Much of the latest technological innovation is based on computer vision — the science behind how computers can make sense of digital images or videos. Indeed, fairly recent breakthroughs in the field of Convolutional Neural Networks (CNNs) — a powerful technology specializing in computer vision tasks — are the main drivers behind the explosion of new and improved tools used in self-driving cars, facial recognition applications, and medical diagnostic imaging. And use cases abound across industries.
Maritime transport is one example where computer vision is used to optimize various operations. Vessel image recognition and recording systems support automated monitoring of ships and other maritime surveillance actions. Examples include detecting illegal activities in the oceans such as piracy, smuggling, pollution, and illegal fishing.
In this post, I demonstrate how to quickly build a very simple binary Image Classifier trained on Planet satellite imagery collected over the San Francisco Bay and San Pedro Bay areas of California (Open California dataset), labelled either ‘ship’ (1000 images) or ‘no-ship’ (3000 images) [Source: Kaggle]. Once your deep learning environment is set up, this should not take you longer than 10 minutes to complete.
Preparing the dataset
Organising the data
First, let’s write a Python script that will read the image directory path and arrange the data into directories and subdirectories of train and test data for both positive and negative categories of images. Organizing the data this way will make the use of Tensorflow’s ImageDataGenerator class easy as it requires you to point at a directory so that its subdirectories will automatically generate labels for you.
There are two main components of the dataset generator script. The first is the split of images into positive (ship) and negative (no ship) directories.
The second is a further split into training and test sets into a structure of directories and subdirectories compatible with requirements of the ImageDataGenerator.
To use the script straight away, you can clone this repo. The script takes in the image directory path and outputs two directories, ‘training’ and ‘testing’, each with two subdirectories, ‘POSITIVES’ and ‘NEGATIVES’. If you then point the ImageDataGenerator to either of the directories, it will load and label the images accordingly. Make sure the data is contained in one folder and that its class/label is apparent from the image name in the following format: 0__xyz.jpg vs 1__xyz.jpg, where 0 denotes a negative example and 1 a positive one.
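The splitting logic could look something like the sketch below (the repo linked above has the authoritative version; the function name `organize` and the 20% test fraction here are my assumptions):

```python
import os
import random
import shutil

def organize(image_dir, output_dir, test_fraction=0.2, seed=42):
    """Split a flat folder of 0__*.jpg / 1__*.jpg images into
    training/ and testing/ directories, each with POSITIVES/
    and NEGATIVES/ subdirectories."""
    random.seed(seed)
    for split in ("training", "testing"):
        for label in ("POSITIVES", "NEGATIVES"):
            os.makedirs(os.path.join(output_dir, split, label), exist_ok=True)

    images = [f for f in os.listdir(image_dir) if f.endswith(".jpg")]
    random.shuffle(images)
    n_test = int(len(images) * test_fraction)

    for i, name in enumerate(images):
        split = "testing" if i < n_test else "training"
        # The leading character of the filename encodes the class.
        label = "POSITIVES" if name.startswith("1") else "NEGATIVES"
        shutil.copy(os.path.join(image_dir, name),
                    os.path.join(output_dir, split, label, name))
```

Copying (rather than moving) keeps the original flat folder intact, so you can re-split with a different test fraction later.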
Once our data is in the right shape, we can instantiate an image generator, pass `rescale` to normalize the data (along with any data augmentation techniques you wish to apply), and call the `flow_from_directory` method to get it to load images from that directory and its subdirectories. Make sure that the generator is pointed to the directory that contains the subdirectories, which in turn contain images. A great feature of this API is that because images are resized/augmented as they are loaded, there is no need to pre-process thousands of images on our file system; this allows us to experiment with different sizes and augmentation techniques without impacting source data.
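A minimal sketch of that setup, wrapped in a helper function (the function name and the 150 x 150 / batch-size-32 values here are illustrative choices, not fixed requirements):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def make_train_generator(train_dir, image_size=(150, 150), batch_size=32):
    # rescale normalizes pixel values from [0, 255] to [0, 1]
    # as each batch of images is loaded.
    datagen = ImageDataGenerator(rescale=1.0 / 255)
    # Point at the directory that *contains* the class subdirectories
    # (POSITIVES/ and NEGATIVES/); labels are inferred from their names.
    return datagen.flow_from_directory(
        train_dir,
        target_size=image_size,   # images are resized on the fly
        batch_size=batch_size,
        class_mode="binary",      # two classes -> single sigmoid output
    )
```

Calling `make_train_generator("training")` then yields batches of `(images, labels)` arrays ready to be passed to `model.fit`.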
A caveat on data augmentation
While conventional wisdom stipulates that data augmentation techniques are useful to improve a network’s ability to generalize to unseen data, this is not always the case. Data augmentation is a powerful tool for dealing with overfitting as it artificially increases the number of training examples and therefore provides a network with more information to learn what makes an image that of, say, a cat.
But in some cases the network’s performance actually worsens, particularly when the images used for training, testing, and the actual application are all collected in a very specific way (e.g. the same camera collects images for both training and production purposes, so image quality and angle, for example, are always the same). In this case there is no need to worry about generalizing over these properties — overfitting to the characteristics of the image collection method would actually be more beneficial. More generally, if the training data is relatively consistent, data augmentation techniques can hurt performance more than they improve it. Here is a more in-depth piece on the topic.
Remember that the ImageDataGenerator provides a straightforward and quick way to experiment with data augmentation — use it.
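Experimenting is as simple as adding keyword arguments to the generator. The specific ranges below are illustrative defaults, not tuned values:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescaling plus a set of common augmentations; each transform is
# applied randomly, per image, as batches are generated.
augmented_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=40,       # random rotations up to 40 degrees
    width_shift_range=0.2,   # random horizontal shifts
    height_shift_range=0.2,  # random vertical shifts
    shear_range=0.2,         # shear intensity
    zoom_range=0.2,          # random zoom
    horizontal_flip=True,    # randomly mirror images left-right
    fill_mode="nearest",     # fill pixels created by a transform
)
```

Dropping back to augmentation-free loading is then a one-line change (`ImageDataGenerator(rescale=1.0 / 255)`), which makes the comparison at the end of this post cheap to run.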
Training and evaluating the network
Next, we prepare for training. You can use Google Colab for this exercise. Google Colab is a free cloud service by Google that is based on Jupyter Notebooks and supports free GPU. This makes it possible for absolutely anybody to develop deep learning applications using popular libraries such as TensorFlow, Keras, and PyTorch. Here’s a great brief introduction.
Now that we’ve pre-processed and split our dataset, we are ready to implement our neural network. Our architecture consists of 3 convolution layers with 2 x 2 max-pooling, followed by a flattening layer and two fully connected layers. The final fully connected layer is a sigmoid layer since we have only two classes (ship, no ship).
Our pattern is:
CONV — POOL — CONV — POOL — CONV — POOL — FC — FC — SIGMOID
We start with 3 convolution/pooling layers, which essentially break down the images into features. The result of this process then feeds into two fully connected layers, which are responsible for the final classification decision. A flattening layer is placed between the last pooling layer and first fully connected layer to adjust for dimensionality. The first fully connected layer takes inputs from the feature analysis and applies weights to predict the correct class; the second one provides final probabilities for each class.
The way convolutions tease features out of images is by passing ‘filters’ over images in order to change the underlying image in such a way that certain characteristics of the image are highlighted (example: horizontal lines, vertical lines, etc.). The goal is for these convolutions to filter the images down to the features that will determine the output. The pooling layers are then used to compress an image by going over it 4 pixels at a time (in our case of 2 x 2 MaxPooling) and retaining just the largest value. What this does is preserve the features that were emphasized by the convolution while quartering the size of the image.
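A tiny worked example of 2 x 2 max-pooling on a toy 4 x 4 “image” makes the quartering concrete (plain NumPy, no Keras needed):

```python
import numpy as np

# A 4 x 4 "image"; 2 x 2 max-pooling keeps the largest value in each
# non-overlapping 2 x 2 window, shrinking the image to 2 x 2.
image = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 2],
    [7, 2, 9, 3],
    [1, 8, 4, 4],
])

# Reshape into 2 x 2 blocks, then take the max within each block.
pooled = image.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 5]
#  [8 9]]
```

Sixteen pixels become four, but the strongest activation in each region survives.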
In the input layer we define the shape of our data (150 x 150 x 3). The flattening layer takes the square 150 x 150 feature maps and turns them into a one-dimensional array so that they can be fed into a fully connected layer. We use ‘relu’ as our activation function (this means negative values will be thrown out). The output layer is in the shape of the number of categories we are trying to predict. Since we are building a binary classifier, it consists of 1 output unit.
We use Adam as the optimizer, binary_crossentropy to calculate the loss, and accuracy as our metric. The network is trained for 25 epochs.
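The architecture and training setup described above can be sketched as follows. The filter counts (16/32/64) and the 512-unit dense layer are my assumptions, as are the generator variable names in the commented-out `fit` call; the post does not list the exact sizes:

```python
from tensorflow.keras import layers, models

# CONV-POOL x3 -> FLATTEN -> FC -> FC (sigmoid), matching the pattern
# CONV - POOL - CONV - POOL - CONV - POOL - FC - FC - SIGMOID.
model = models.Sequential([
    layers.Input(shape=(150, 150, 3)),
    layers.Conv2D(16, (3, 3), activation="relu"),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(2, 2),
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # single unit: P(ship)
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Training would then look like:
# model.fit(train_generator, epochs=25, validation_data=test_generator)
```

With a sigmoid output, predictions above 0.5 are read as ‘ship’ and below as ‘no-ship’.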
We first use the ImageDataGenerator to rescale pixel values as well as to augment images in a number of ways: rotation, width shifting, height shifting, shear intensity, zoom, horizontal flip, and defining fill mode. This gives us a training accuracy of 0.9304 and a testing accuracy of 0.9408. We then decide to exclude all augmentation techniques to see how this will impact performance. We end up with a training accuracy of 1.0000 and a testing accuracy of 0.99. Not bad given the simplicity of our network. This is an example of an image classification task where data augmentation does not help a network’s performance — on the contrary.