Implementing end-to-end learning for self-driving cars

By Subham Tewari

Self-driving cars mark the biggest change in the automotive industry in the last decade. All the major car companies are developing their own self-driving cars. Autonomous tech is projected to become a $7 trillion industry and to save many human lives in the years to come.

This article will give you an insight into how to develop a self-driving car using a convolutional neural network (CNN).

The Approach

Most discussions of self-driving cars revolve around LIDAR, RADAR, 360-degree cameras, and costly GPUs, and the problem is usually decomposed into parts such as lane detection, path planning, and control. In an end-to-end model, by contrast, we build a single network that takes only the front-camera images of the car and predicts the steering angle. With minimal training data and minimal computation, the car learns to drive on roads with or without lane markings.

This yields a light, computationally inexpensive model that provides an end-to-end solution to the self-driving car problem.

Why do we need self-driving cars?

* Reduce driver costs.
* Reduce driver stress.
* More efficient parking.
* Saves time and reduces traffic.
* Reduces accidents.
* Supports car pooling.

Methodology

The car is equipped with three cameras mounted behind the windshield. Video is captured simultaneously with the steering angle data. To make the system independent of the car's geometry, we represent the steering command as 1/r, where r is the turning radius in meters. We use 1/r instead of r to avoid a singularity when driving straight: a straight path has an infinite turning radius, whereas 1/r simply passes smoothly through zero. The training data consists of single image frames sampled from the video, each paired with the corresponding steering command.
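As a rough illustration of this representation, here is a minimal Python sketch that converts a recorded steering wheel angle into 1/r using a simple kinematic bicycle model. The wheelbase and steering ratio below are assumed placeholder values, not figures from the actual car.

```python
import numpy as np

STEERING_RATIO = 15.0  # assumed steering-wheel-to-road-wheel ratio
WHEELBASE_M = 2.7      # assumed wheelbase in meters

def steering_to_inverse_radius(steering_wheel_deg):
    """Convert a steering wheel angle (degrees) to 1/r using a simple
    kinematic bicycle model: r = wheelbase / tan(road_wheel_angle).
    Driving straight gives 1/r = 0, so there is no singularity."""
    road_wheel_rad = np.radians(steering_wheel_deg / STEERING_RATIO)
    return np.tan(road_wheel_rad) / WHEELBASE_M
```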

CNN model for an end-to-end self-driving car

The above image depicts the CNN model: video frames are fed into the CNN, and the model outputs the desired steering angle. Using the back-propagation algorithm, we then minimize the error between the desired steering angle and the steering angle computed by the network.
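Concretely, we can treat this as minimizing the mean squared error between the recorded and predicted steering commands. A minimal TensorFlow sketch of one such gradient step (assuming a Keras model like the one sketched later) could look like this:

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
loss_fn = tf.keras.losses.MeanSquaredError()

@tf.function
def train_step(model, images, steering_targets):
    # Forward pass, compute the MSE loss, back-propagate, update weights.
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)
        loss = loss_fn(steering_targets, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```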

Testing the CNN model

After training, the model can predict steering angles from the video images of a single front-facing camera.
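A minimal inference sketch, assuming the Keras model sketched further below and frames read with OpenCV; the 200×66 input size is an assumption carried over from the NVIDIA paper:

```python
import cv2
import numpy as np

def predict_steering(model, frame):
    """Predict the inverse turning radius for a single camera frame."""
    # Resize to the network's input size; the color-channel order must
    # match whatever was used during training.
    img = cv2.resize(frame, (200, 66)).astype(np.float32)
    # Normalization happens inside the network's first layer,
    # so we feed raw pixel values with a batch dimension added.
    return model.predict(img[np.newaxis], verbose=0)[0, 0]
```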

Dataset– Around 45,000 images of the driving car, about 2.2 GB in total. The dataset was recorded by Sully Chen in 2017 around Rancho Palos Verdes and San Pedro, California. It can be found through this link.

Network Architecture– The network consists of 9 layers: a normalization layer, 5 convolutional layers, and 3 fully connected layers.

Network architecture
Code for the network architecture

The first layer of the network performs image normalization. It is hard-coded and is not adjusted during learning. Performing normalization inside the network also allows it to be accelerated with GPU processing.

The convolutional layers perform feature extraction; their configuration was chosen empirically through experiments with different layer configurations. The first three convolutional layers use strided convolutions with a 2×2 stride and a 5×5 kernel, while the last two use non-strided convolutions with a 3×3 kernel.

The five convolutional layers are followed by three fully connected layers, which lead to a single output: the inverse of the turning radius.
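Here is a minimal Keras sketch of this architecture. The 66×200 RGB input size and the 24/36/48/64/64 filter counts follow the original NVIDIA paper and should be treated as assumptions here:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(66, 200, 3)):
    model = models.Sequential([
        # Hard-coded normalization layer: scale pixels to [-1, 1].
        layers.Rescaling(scale=1.0 / 127.5, offset=-1.0,
                         input_shape=input_shape),
        # Three strided 5x5 convolutions with a 2x2 stride.
        layers.Conv2D(24, 5, strides=2, activation="relu"),
        layers.Conv2D(36, 5, strides=2, activation="relu"),
        layers.Conv2D(48, 5, strides=2, activation="relu"),
        # Two non-strided 3x3 convolutions.
        layers.Conv2D(64, 3, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Flatten(),
        # Three fully connected layers.
        layers.Dense(100, activation="relu"),
        layers.Dense(50, activation="relu"),
        layers.Dense(10, activation="relu"),
        # Single output: the inverse turning radius.
        layers.Dense(1),
    ])
    return model
```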

Training Details

To train the convolutional neural network, we have to select the image frames that will serve as its input. We sampled the video at 10 FPS; a higher sampling rate would include many nearly identical images without adding useful information.
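A short sketch of this sampling step with OpenCV, assuming the source frame rate can be read from the video's metadata:

```python
import cv2

def sample_frames(video_path, target_fps=10):
    """Yield frames from a video at roughly target_fps."""
    cap = cv2.VideoCapture(video_path)
    source_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if missing
    step = max(1, round(source_fps / target_fps))
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield frame
        index += 1
    cap.release()
```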

We also augmented the image frames with random shifts and rotations so that the car learns to recover from unexpected situations. The magnitude of each perturbation is drawn from a normal distribution with zero mean and a standard deviation twice the standard deviation measured with human drivers.
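A sketch of such augmentation with OpenCV; the standard deviations and the per-pixel steering correction below are hypothetical placeholders, since the paper derives them from statistics measured with human drivers:

```python
import cv2
import numpy as np

rng = np.random.default_rng()

def augment(image, steering, shift_std=10.0, rot_std=2.0, correction=0.004):
    """Randomly shift (pixels) and rotate (degrees) a frame, drawing the
    perturbation magnitudes from zero-mean normal distributions, and
    adjust the steering label so the network learns to recover.
    The linear per-pixel correction is a simplifying assumption."""
    h, w = image.shape[:2]
    shift = rng.normal(0.0, shift_std)
    angle = rng.normal(0.0, rot_std)
    # Rotation about the image center plus a horizontal translation.
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    m[0, 2] += shift
    augmented = cv2.warpAffine(image, m, (w, h))
    adjusted_steering = steering + correction * shift
    return augmented, adjusted_steering
```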

Visualisation

We consider two examples: an unpaved road and a forest road. For the unpaved road, the feature-map activations clearly trace the outline of the road. For the forest road, the model finds nothing useful, and the feature maps mostly contain noise.

We can see that the CNN is able to detect the outline of the road, even though we never explicitly trained it to do so.

CNN feature-map activations for the unpaved road (bottom left: first-layer feature map; bottom right: second-layer feature map)
CNN feature-map activations for the forest road (bottom left: first-layer feature map, mostly noise; bottom right: second-layer feature map, mostly noise)
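These feature maps can be reproduced by probing the trained network with a Keras sub-model. A minimal sketch, assuming the build_model sketch above:

```python
import numpy as np
import tensorflow as tf

def first_conv_feature_maps(model, image):
    """Return the activations of the first convolutional layer
    for a single input image of shape (H, W, channels)."""
    first_conv = next(l for l in model.layers
                      if isinstance(l, tf.keras.layers.Conv2D))
    probe = tf.keras.Model(inputs=model.inputs, outputs=first_conv.output)
    return probe(image[np.newaxis].astype(np.float32)).numpy()[0]
```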

Conclusions

To conclude: with less than 100 hours of driving data, the CNN learned to detect the outline of the road in diverse conditions, including unpaved roads and both sunny and rainy weather. It did so without the data ever being explicitly labeled with road outlines.

Future work should improve the robustness of the model, find ways to verify that robustness, and improve the visualisation of the network's internal processing steps.

The implementation is available on my GitHub page. The research paper can be found here.