Source: Deep Learning on Medium
Behavioural cloning is literally cloning the behaviour of a driver. The idea is to train a Convolutional Neural Network (CNN) to mimic the driver based on training data recorded while the driver drives. NVIDIA released a paper in which they trained a CNN to map raw pixels from a single front-facing camera directly to steering commands. Surprisingly, the results were very powerful: the car learned to drive in traffic on local roads with or without lane markings and on highways, with a minimal amount of training data. Here, we’ll use the simulator provided by Udacity. The simulated car is equipped with 3 front-facing cameras that record images, along with the steering angle corresponding to the centre camera. We’ll train the same model as in the paper.
Collect The Data
The simulator has 2 tracks — the first track is quite easy, with few, gentle curves, while the other is difficult, with many winding curves and steep hills.
We’ll use training data from both tracks:
- We’ll drive on both tracks keeping the car at the centre of the lane, for 2 laps each.
- We’ll drive 1 lap on each track where we deliberately drift to the sides and then steer back to the centre of the lane. This gives us training data that teaches the model recovery corrections.
The captured data contains the paths to the left, centre and right images, along with steering angle, throttle, brake and speed values.
We are only interested in the three image paths and the steering angle.
Note: we’ll use all of the left, centre and right images. We’ll adjust the steering angle for the left image by adding a small correction, and similarly adjust the steering angle for the right image by subtracting the same correction. In effect, this triples the training data.
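As a sketch, the log can be parsed like this. The column order matches the Udacity simulator’s driving_log.csv, and the 0.2 correction factor is a hypothetical value to tune empirically:

```python
import csv

# Hypothetical correction factor; tune empirically (0.2 is a common choice).
CORRECTION = 0.2

def load_samples(log_path="driving_log.csv"):
    """Parse the simulator log into (image_path, steering_angle) pairs,
    using all three cameras to triple the training data."""
    samples = []
    with open(log_path) as f:
        for row in csv.reader(f):
            centre, left, right = row[0], row[1], row[2]
            angle = float(row[3])
            samples.append((centre, angle))
            samples.append((left, angle + CORRECTION))   # steer back towards centre
            samples.append((right, angle - CORRECTION))  # steer back towards centre
    return samples
```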
The histogram above shows the imbalance in the training data: there is more data for left turns than for right turns. We’ll compensate for this by randomly flipping training images and negating the steering angle.
Also, most of the steering angles are concentrated around 0–0.25, and we don’t have much data for larger steering angles. We’ll compensate for this by randomly shifting the images horizontally and vertically by some pixels and adjusting the steering angles accordingly.
We’ll use the following augmentations:
- Randomly flip some of the images and negate the steering angle.
- Randomly shift the images horizontally and vertically by some pixels and adjust the steering angle using a small per-pixel correction factor.
- There are shadows of trees, poles etc. on the road, so we’ll add random shadows to the training images.
- We’ll randomly adjust the brightness of the images.
After applying these augmentations, below are the outputs for some of the training images.
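A minimal sketch of the flip and shift augmentations in plain NumPy, assuming the 0.004 per-pixel angle adjustment (a hypothetical value to tune) and a NumPy random generator passed in explicitly for reproducibility:

```python
import numpy as np

# Hypothetical per-pixel steering adjustment; tune empirically.
ANGLE_PER_PIXEL = 0.004

def random_flip(image, angle, rng):
    """Mirror the image left-right half the time, negating the angle."""
    if rng.random() < 0.5:
        return np.fliplr(image), -angle
    return image, angle

def random_shift(image, angle, rng, max_dx=50, max_dy=10):
    """Translate the image and correct the angle proportionally to the
    horizontal shift; exposed pixels are filled with zeros."""
    dx = int(rng.integers(-max_dx, max_dx + 1))
    dy = int(rng.integers(-max_dy, max_dy + 1))
    shifted = np.zeros_like(image)
    h, w = image.shape[:2]
    xs0, xd0 = max(0, -dx), max(0, dx)   # source/destination x offsets
    ys0, yd0 = max(0, -dy), max(0, dy)   # source/destination y offsets
    shifted[yd0:h - ys0, xd0:w - xs0] = image[ys0:h - yd0, xs0:w - xd0]
    return shifted, angle + dx * ANGLE_PER_PIXEL
```

The shadow and brightness augmentations follow the same pattern: perturb the image, and leave the steering angle untouched since neither changes the road geometry.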
The paper expects an input image size of 66×200×3, while the images from training are 160×320×3. The paper also expects input images to be converted from RGB to YUV colour space.
Also, neither the mountains at the top of the image nor the car bonnet at the bottom are useful for training.
So we’ll crop the top 40 pixel rows and the bottom 20 pixel rows from the input images. As part of pre-processing, we’ll also resize the cropped image to 66×200×3 and convert it to YUV colour space.
Here’s the PilotNet model described in the paper:
The model has the following layers:
- A normalisation layer (hard-coded): divide by 127.5 and subtract 1.
- 3 convolution layers with 24, 36 and 48 filters, 5×5 kernels and a stride of 2.
- 2 convolution layers with 64 filters, 3×3 kernels and a stride of 1.
- A flatten layer
- 3 fully connected layers with output sizes of 100, 50 and 10,
- and a final output layer which outputs the steering angle.
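The layer list above can be sketched in Keras like this. The ReLU activations are an assumption following common practice, since the layer list does not name an activation:

```python
from tensorflow.keras import layers, models

def build_pilotnet():
    """PilotNet-style model: normalisation, 5 conv layers, 3 dense layers,
    and a single-unit output for the steering angle."""
    return models.Sequential([
        layers.Input(shape=(66, 200, 3)),
        # Hard-coded normalisation: scale pixel values to [-1, 1].
        layers.Lambda(lambda x: x / 127.5 - 1.0),
        layers.Conv2D(24, 5, strides=2, activation="relu"),
        layers.Conv2D(36, 5, strides=2, activation="relu"),
        layers.Conv2D(48, 5, strides=2, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Flatten(),
        layers.Dense(100, activation="relu"),
        layers.Dense(50, activation="relu"),
        layers.Dense(10, activation="relu"),
        layers.Dense(1),  # steering angle (regression, no activation)
    ])
```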
We’ll use Mean Squared Error (MSE) as the loss and the Adam optimizer with a starting learning rate of 1e-4 for training.
I also used an EarlyStopping callback on the validation loss with a patience of 5 epochs. I tried to train for 40 epochs, but training stopped at 36.
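A sketch of this training setup; the tiny Sequential model here is only a placeholder standing in for PilotNet, and restore_best_weights is an optional extra not mentioned above:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

# Placeholder model; in the real setup this is the PilotNet network.
model = models.Sequential([
    layers.Input(shape=(66, 200, 3)),
    layers.Flatten(),
    layers.Dense(1),
])

# MSE loss, Adam with a starting learning rate of 1e-4.
model.compile(loss="mse", optimizer=Adam(learning_rate=1e-4))

# Stop once validation loss has not improved for 5 consecutive epochs.
early_stop = EarlyStopping(monitor="val_loss", patience=5,
                           restore_best_weights=True)

# model.fit(train_gen, validation_data=val_gen, epochs=40,
#           callbacks=[early_stop])
```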
Here are videos of the trained model’s predictions.
Code is available in GitHub repository here.