My Tryst with Deep Learning — German Traffic data set with Keras

Source: Deep Learning on Medium

The Deep Learning course offered by New York Data Science Academy is a great way to get started on your journey with deep learning, and it also encourages you to do a full-fledged deep learning project. I decided to do an image recognition challenge using the German Traffic Sign data set. I had never worked on image recognition before, and hence this project was a great learning experience for me personally.

Problem Statement and Goal of the Project

The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011.

Traffic sign detection is a highly relevant computer vision problem and is the basis for a lot of applications in industries such as automotive. Traffic signs show a wide range of variations between classes in terms of color, shape, and the presence of pictograms or text.

In this challenge, we will develop a deep learning algorithm that trains on German traffic sign images and then classifies unlabeled traffic signs. The deep learning model will be built using Keras (a high-level API for TensorFlow). We will also look at various ways to preprocess images using OpenCV, and use a cloud GPU service provider.

We will be working with Keras to build our algorithm. Keras was chosen as it is easy to learn and use, and it integrates seamlessly with TensorFlow. After TensorFlow, Keras seems to be the framework most widely used by the deep learning community.

The entire code for the project can be found on my GitHub account.

Algorithmic Process

As with any machine learning model building process, we will follow the same golden steps defined below:

  1. Understand the data
  2. Preprocess the data
  3. Build the architecture of the model
  4. Test the model
  5. Iterate the same process until you achieve the optimal results
  6. Deploy the model (Not considered for this exercise)

Data Understanding

The image dataset consists of 43 classes (unique traffic sign types).

The training set has 34799 images, the test set has 12630 images and the validation set has 4410 images.

import numpy as np

# Understand the data
n_classes = len(np.unique(y_train))
print("Training Set:", len(X_train))
print("Test Set:", len(y_test))
print("Validation Set:", len(X_valid))
print("Image Dimensions:", np.shape(X_train[1]))
print("Number of classes:", n_classes)
Sample Images
Class distribution

A couple of inferences from the data that we will tackle during the preprocessing stage:

a) There is a class bias issue, as some classes seem to be underrepresented

b) Image contrast seems to be low for a lot of images
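The class bias can be made concrete by counting the images per class. Here is a minimal sketch, assuming the labels are integer class ids as in `y_train`. The tiny label array is made up for illustration, and `class_counts` is a hypothetical helper name, not something from the project code.

```python
import numpy as np

def class_counts(labels, n_classes):
    """Return the number of samples for each class id 0 .. n_classes - 1."""
    return np.bincount(labels, minlength=n_classes)

# Toy labels standing in for y_train (made-up data, not the real dataset)
toy_labels = np.array([0, 0, 0, 1, 2, 2])
counts = class_counts(toy_labels, n_classes=4)

print("Images per class:", counts)
print("Largest class size:", counts.max())
print("Smallest non-empty class size:", counts[counts > 0].min())
```

Running the same helper on the real `y_train` with `n_classes=43` would give the numbers behind the class distribution plot.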

Establishing a score without any preprocessing

It’s always good practice to understand where your model stands without any preprocessing, as that establishes a baseline score which you can then improve upon with each iteration. The evaluation metric for our model is the “accuracy” score. I had resource constraints and was running the tests on my Mac (8GB RAM), and hence used a simple “dense” or “fully connected” neural network architecture for the baseline scores and other testing.
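For reference, accuracy is simply the fraction of images whose predicted class matches the true label. A minimal sketch with made-up probabilities (the helper name `accuracy` and the toy numbers are assumptions for illustration; Keras reports this metric for you when it is requested at compile time):

```python
import numpy as np

def accuracy(probabilities, true_labels):
    """Fraction of samples whose highest-probability class equals the label."""
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == true_labels))

# Four toy predictions over three classes (made-up numbers)
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
labels = np.array([0, 1, 2, 1])

score = accuracy(probs, labels)
print("Accuracy:", score)
```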

Dense Network Architecture

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(32*32*3,)))
model.add(Dense(128, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(n_classes, activation='softmax'))

The model = Sequential() statement creates an empty network to which layers are added one by one. The input shape is 32*32*3 because each 32x32 image has 3 color channels and is flattened into a single vector. No explicit input layer is needed here, as the input_shape argument of the first layer implicitly defines the input layer. The first layer has 393344 parameters ((32*32*3*128) + 128): one weight per input per neuron, plus one bias per neuron. The number of parameters for the other layers can be calculated in the same fashion.
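The parameter arithmetic can be checked with a few lines of plain Python. This is a sketch of the calculation only, assuming the layer sizes from the snippet above and 43 output classes; `dense_params` is a hypothetical helper name.

```python
def dense_params(n_inputs, n_units):
    """Weights (one per input per unit) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input vector, four hidden layers of 128 neurons, 43 output classes
layer_sizes = [32 * 32 * 3, 128, 128, 128, 128, 43]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print("Parameters per layer:", per_layer)  # [393344, 16512, 16512, 16512, 5547]
print("Total parameters:", total_params)   # 448427
```

Almost all of the parameters sit in the first layer, because every one of the 3072 input values is connected to every one of the 128 neurons.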

The activation function is “relu”. During hyperparameter optimization we can check whether tanh, sigmoid or other activation functions are better suited for the task. For now we stick to “relu”.

There are 4 hidden layers of 128 neurons each with relu activation. A dropout layer with a rate of 50% follows every hidden layer except the last one (the dropout layers are not shown in the snippet above).

The output layer has the softmax activation since we are dealing with multi class classification and there are 43 classes.
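To see what the softmax output layer does, here is a small NumPy sketch. It is an illustration of the function itself, not of Keras internals; the random scores stand in for the raw outputs of the last layer for two images.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    # Subtracting the row maximum does not change the result but avoids overflow
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
scores = rng.normal(size=(2, 43))   # made-up scores: 2 images, 43 classes
probs = softmax(scores)
predicted_class = np.argmax(probs, axis=1)

print("Probabilities sum to:", probs.sum(axis=1))
print("Predicted classes:", predicted_class)
```

The predicted class is the one with the highest probability, which is also the one with the highest raw score.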

The model was able to achieve an accuracy score of 84% without any preprocessing.

Data Preprocessing

Now that we have a score at hand, let’s understand if preprocessing the images would lead to a better accuracy score and help our model.

Data augmentation is used to increase the size of the training set. Augmenting the data basically means creating more images from the available images with slight alterations. We generally need an amount of data proportional to the number of parameters in the neural network.

I found OpenCV to be excellent for image preprocessing. Here’s the link to the general tutorials on using OpenCV with Python. Some of the techniques used in the process are rotation, translation, bilateral filtering, grayscaling and local histogram equalization.

Slight Rotation of Images: I rotated the images by 10 degrees. It would not make much sense to rotate them much more than that, as that might lead to wrong representations of the traffic signs. Let’s view a few images after the slight rotation (it is barely noticeable in some of them).

M_rot = cv2.getRotationMatrix2D((cols/2,rows/2),10,1)
Images after 10 degree rotation
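To make the rotation step less of a black box, here is a NumPy sketch of the 2x3 affine matrix that, as far as I understand the OpenCV documentation, `cv2.getRotationMatrix2D(center, angle, scale)` returns. The function name `rotation_matrix_2d` is hypothetical, and the 32x32 image size is taken from the dataset. In an OpenCV pipeline such a matrix would typically be applied with `cv2.warpAffine`.

```python
import numpy as np

def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 affine matrix rotating by angle_deg around center (OpenCV convention)."""
    cx, cy = center
    alpha = scale * np.cos(np.deg2rad(angle_deg))
    beta = scale * np.sin(np.deg2rad(angle_deg))
    return np.array([
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ])

rows, cols = 32, 32
M_rot = rotation_matrix_2d((cols / 2, rows / 2), 10)

# The rotation center must map onto itself
center_h = np.array([cols / 2, rows / 2, 1.0])
mapped_center = M_rot @ center_h
print(M_rot)
print("Center maps to:", mapped_center)
```

The third column is what keeps the image centered: without it the rotation would happen around the top-left corner.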

Image Translation: This is a technique by which you shift the location of the image. In layman’s terms, if the image is at position (x1,y1), after translation it is moved to position (x2,y2). As you can see from the images below, the content is shifted slightly downwards.

Images after translation
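The article does not show the translation code, so the following is only a NumPy sketch of the idea: shift the pixels by (tx, ty) and fill the uncovered border with zeros. With OpenCV the same effect is usually obtained by passing the matrix [[1, 0, tx], [0, 1, ty]] to `cv2.warpAffine`. The helper name `translate` and the shift values are assumptions.

```python
import numpy as np

def translate(image, tx, ty):
    """Shift image tx pixels right and ty pixels down, zero-filling the gap."""
    rows, cols = image.shape[:2]
    # Pad on the side the image moves away from, then crop back to the original size
    pad = [(max(ty, 0), max(-ty, 0)), (max(tx, 0), max(-tx, 0))]
    pad += [(0, 0)] * (image.ndim - 2)  # leave color channels untouched
    padded = np.pad(image, pad, mode="constant")
    y0, x0 = max(-ty, 0), max(-tx, 0)
    return padded[y0:y0 + rows, x0:x0 + cols]

img = np.arange(16).reshape(4, 4)
moved_down = translate(img, tx=0, ty=1)
print(moved_down)
```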

Bilateral Filtering: Bilateral filtering is a noise-reducing, edge-preserving smoothing of images.
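OpenCV provides this filter as `cv2.bilateralFilter`; the call and parameters used in the project are not shown, so the following is a deliberately naive NumPy sketch of the idea for a grayscale image. Each output pixel is a weighted average of its neighbors, where the weight drops both with spatial distance and with intensity difference. The function name and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def bilateral_filter_gray(image, radius=2, sigma_space=2.0, sigma_range=25.0):
    """Naive bilateral filter for a 2-D grayscale image (returns floats)."""
    img = image.astype(np.float64)
    rows, cols = img.shape
    padded = np.pad(img, radius, mode="edge")
    # Spatial weights depend only on the offset from the center pixel
    offsets = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    spatial = np.exp(-(dx ** 2 + dy ** 2) / (2 * sigma_space ** 2))
    out = np.empty_like(img)
    for r in range(rows):
        for c in range(cols):
            window = padded[r:r + 2 * radius + 1, c:c + 2 * radius + 1]
            # Range weights shrink for neighbors with a very different intensity,
            # which is what keeps edges sharp
            rng_w = np.exp(-((window - img[r, c]) ** 2) / (2 * sigma_range ** 2))
            weights = spatial * rng_w
            out[r, c] = np.sum(weights * window) / np.sum(weights)
    return out

# A sharp vertical edge: the filter should leave it essentially untouched
step = np.zeros((8, 8))
step[:, 4:] = 200.0
smoothed = bilateral_filter_gray(step)
print(np.abs(smoothed - step).max())
```

A plain Gaussian blur would smear the edge in this example, while the bilateral filter keeps it, because pixels on the other side of the edge get a near-zero weight.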

Gray Scaling: Gray scaling reduces each pixel from three color values to a single intensity value, which reduces the amount of information the model has to process and hence the complexity.

import cv2

def gray_scale(image):
    # Convert a 3-channel RGB image to a single-channel grayscale image
    return cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
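For intuition, the conversion is a weighted sum of the three color channels. The sketch below uses the weights that OpenCV documents for RGB to gray (0.299, 0.587, 0.114); exact rounding may differ slightly from `cv2.cvtColor`, and `gray_scale_np` is a hypothetical name. It also shows how to add the channel axis back, since a Keras convolutional layer expects a shape like (32, 32, 1).

```python
import numpy as np

def gray_scale_np(image):
    """Weighted sum of the R, G, B channels, rounded back to uint8."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = image.astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# A made-up pure red 32x32 image
rgb = np.zeros((32, 32, 3), dtype=np.uint8)
rgb[..., 0] = 255

gray = gray_scale_np(rgb)
gray_with_channel = gray[..., np.newaxis]

print(gray.shape, gray_with_channel.shape)
print("Gray value of pure red:", gray[0, 0])
```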

Local Histogram Equalization: This is done to increase the contrast of the images as we had identified during “Data Understanding” that the images might need an increase in contrast.

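import skimage.morphology as morp
from skimage.filters import rank

def local_histo_equalize(image):
    # Equalize the histogram within a disk-shaped neighborhood of radius 30 pixels
    kernel = morp.disk(30)
    # The neighborhood is passed positionally because newer scikit-image
    # releases renamed the selem keyword to footprint
    img_local = rank.equalize(image, kernel)
    return img_local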
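To see what equalization does to the pixel values, here is a NumPy sketch of the simpler global variant, which stretches the cumulative histogram of the whole image over the range 0 to 255. This is not the local (neighborhood-based) method used above, only an illustration of the underlying idea; the function name and the tiny test image are made up.

```python
import numpy as np

def global_histo_equalize(gray):
    """Global histogram equalization of a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]      # cumulative count at the darkest present value
    denom = cdf[-1] - cdf_min
    if denom == 0:                 # constant image: nothing to stretch
        return gray.copy()
    # Lookup table mapping every old intensity to its new intensity
    lut = np.clip(np.rint((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[gray]

# A low-contrast image whose values span only 100..103
low_contrast = np.array([[100, 101], [102, 103]], dtype=np.uint8)
equalized = global_histo_equalize(low_contrast)
print(equalized)
```

The local version applies the same idea separately around every pixel, which helps when only part of the image is too dark or too bright.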

Here are the images after all the preprocessing.

Images after preprocessing

Fixing Class Bias with Data Augmentation: Since we are already increasing the training set with data augmentation, it makes sense to address the class bias issue at the same time. Hence during augmentation, every class was brought up to 4000 images. In the original dataset Class 2 had the maximum number of training images with 2010 records. The number 4000 (max class records * ~2) is an arbitrary number I took to make all classes have the same number of records. We can definitely play around with this distribution further.

Here’s the code snippet that brings all the classes up to the same number of records.

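max_records = 4000
# Start from the original training data and append augmented images to it
X_train_final, y_train_final = X_train, y_train
X_aug_1, Y_aug_1 = [], []  # all augmented images and labels across classes

for i in range(n_classes):
    class_idx = np.where(y_train == i)[0]
    class_records = class_idx.size
    if class_records < max_records:
        # Number of extra images needed to bring class i up to max_records
        ovr_sample = max_records - class_records
        samples = X_train[class_idx]
        X_aug = []
        Y_aug = [i] * ovr_sample

        for x in range(ovr_sample):
            # Cycle through the images of the class and augment each one.
            # data_augment is the augmentation helper (not shown here); it is
            # assumed to return an image with the same shape as its input
            img = samples[x % class_records]
            trans_img = data_augment(img)
            X_aug.append(trans_img)

        X_train_final = np.concatenate((X_train_final, X_aug), axis=0)
        y_train_final = np.concatenate((y_train_final, Y_aug))

        Y_aug_1 = Y_aug_1 + Y_aug
        X_aug_1 = X_aug_1 + X_aug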
Class distribution after fixing class bias

Model Score after Data augmentation and after fixing class Bias:

The same dense neural network architecture as the one used above was able to better its accuracy score to 88.2% after data preprocessing, which suggests that preprocessing the images (augmenting the data) was worth the effort.

Convolutional Neural Networks

The next step in the model building journey is to use a more sophisticated architecture to boost our model performance. Research in the field of computer vision has established that convolutional neural networks (CNNs) perform far better at image recognition challenges and hence should be the first choice. Our goal for this project was to systematically build a deep learning model and understand how each step affects the model performance, which is why a CNN was not used in the first place. It’s also beyond the scope of this article to explain how CNNs work. Here’s an intuitive article on the same.

Convolutional Neural Network Architecture

Here’s the Convolutional neural network architecture for the model

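from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model_conv = Sequential()
## If you preprocessed with gray scaling and local histogram equalization then input_shape = (32,32,1) else (32,32,3)
## padding='same' is an assumption added here: with the default 'valid' padding the feature map
## shrinks to 2x2 before the fourth 3x3 convolution, which then cannot be applied
model_conv.add(Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same', input_shape=(32, 32, 1)))
model_conv.add(MaxPooling2D(pool_size=(2, 2)))
model_conv.add(Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
model_conv.add(MaxPooling2D(pool_size=(2, 2)))
model_conv.add(Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
model_conv.add(MaxPooling2D(pool_size=(2, 2)))
model_conv.add(Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
model_conv.add(MaxPooling2D(pool_size=(2, 2)))
## Flatten the feature maps into a one-dimensional vector before the dense layers
model_conv.add(Flatten())
model_conv.add(Dense(128, activation='relu'))
model_conv.add(Dense(n_classes, activation='softmax'))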

There are 4 convolutional layers, each followed by a max pooling layer. The kernel size for the convolutional layers is (3,3); the kernel refers to the filter that slides over the image. Commonly used kernel sizes are (3,3) and (5,5).
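The kernel size directly determines how many parameters a convolutional layer has: each filter holds one weight per kernel position per input channel, plus one bias. A small sketch of that arithmetic for the four layers, assuming the filter counts from the snippet above and a single-channel (grayscale) input; `conv_params` is a hypothetical helper name.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel area x input channels) plus one bias each."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

# (input channels, number of filters) for the four convolutional layers
conv_layers = [(1, 32), (32, 128), (128, 128), (128, 128)]

conv_counts = [conv_params(3, 3, c_in, f) for c_in, f in conv_layers]
print("Parameters per conv layer:", conv_counts)  # [320, 36992, 147584, 147584]

# The same layer with a (5,5) kernel needs noticeably more weights
print("First layer with 5x5 kernel:", conv_params(5, 5, 1, 32))  # 832
```

Note how small these numbers are compared with the first dense layer of the baseline network, even though the convolutional layers see the whole image.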

One thing to note here is that the input shape is (32,32,1). In the dense network we had (32,32,3) as we had not done grayscaling. Since we performed grayscaling on our images, the number of channels becomes one.

A max pooling layer with a pool size of (2,2) is added after each convolutional layer, along with Batch Normalization (not shown in the snippet above). Max pooling layers reduce the dimensionality of the feature maps, which shortens the training time and also helps reduce overfitting.
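To see how quickly pooling shrinks the feature maps, the spatial size can be traced by hand. The sketch below assumes stride-1 3x3 convolutions and 2x2 pooling, and compares 'same' padding (the convolution keeps the size) with 'valid' padding (the convolution trims a border). The helper names are hypothetical, and the padding mode used in the original project is not stated in the article.

```python
def conv_out(size, kernel=3, padding="same"):
    """Spatial size after a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after max pooling (partial windows are dropped)."""
    return size // pool

def trace_sizes(size, n_blocks, padding):
    """Sizes after each conv + pool block, starting with the input size."""
    sizes = [size]
    for _ in range(n_blocks):
        size = conv_out(size, padding=padding)
        if size <= 0:
            raise ValueError("feature map is smaller than the kernel")
        size = pool_out(size)
        sizes.append(size)
    return sizes

same_sizes = trace_sizes(32, 4, "same")
flattened = same_sizes[-1] ** 2 * 128   # 128 filters in the last conv layer
print("Sizes with 'same' padding:", same_sizes)  # [32, 16, 8, 4, 2]
print("Flattened vector length:", flattened)     # 512

try:
    trace_sizes(32, 4, "valid")
    valid_fails = False
except ValueError:
    valid_fails = True
print("Four blocks fail with 'valid' padding:", valid_fails)  # True
```

With 'valid' padding the map is only 2x2 before the fourth convolution, so a 3x3 kernel no longer fits; with 'same' padding four blocks work and the dense layer receives a vector of length 512.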

Fully connected layers then follow before the output layer (the snippet above shows one, with 128 neurons). Note that the output of the last pooling layer needs to be flattened first, as a dense layer expects a one-dimensional vector as input.

Since this is a multiclass classification, the softmax activation is used in the output layer.

CNN Architecture

I ran the model on my computer for 100 epochs and it took 4 days to complete (I was curious to know how long it would run). The model score was boosted to 97.2%, which kind of explains the hype around CNNs.

Now it made more sense to either buy some GPUs for faster processing or go to a cloud service provider to experiment with different architectures. I found FloydHub to be excellent in that regard. Using FloydHub is extremely easy: we just need to upload the dataset and either import the Python code through GitHub or upload it manually.

The entire code now runs in approximately 15 minutes and I can definitely test with different architectures going forward.

Way Forward

Building a deep learning model from scratch, and following the process to build one, was a great learning experience. I am constantly learning new things every day in this journey and trying new improvements. The next few steps to implement would be:

  1. Identify the best architecture along with the best hyperparameters, and also try AlexNet or VGGNet.
  2. Use transfer learning