Udacity Behavioral Cloning Project

For those unacquainted, the objective of the Behavioral Cloning Project is to teach a car to drive by itself in a simulator by cloning the actions that you, as a human, execute in order to steer the car. At a very high level, we will use Udacity’s sample data consisting of images and the corresponding steering angles provided in driving_log.csv to train a model to steer the car around the track.

To meet the project requirements successfully, it is possible to use only the sample data provided by Udacity to teach the car to drive itself. On top of the base data, we will be using data augmentation techniques to generate more data on the fly to help the network generalize well and not overfit the training data. I will go step by step through my solution and offer suggestions along the way on how to avoid pitfalls that I ran into myself. I have structured the project code as a class to make it more readable and because object-oriented design is a good way to write clean code.

Note: The solution that I have implemented is able to drive around the first track per the project requirements without going off the track (which is all you need to complete the project), but it is not able to complete the second track unless training data is collected from that track. You may have seen implementations on the web that are able to drive on both tracks having trained only on augmented data from the first track, but it looks like Udacity changed the second track to be quite different. For starters, the second track has much sharper angles (think ±25.0) as opposed to a maximum of ±6.0 on the first track. Unless you supplement the sample dataset with data from the second track, I think that no amount of data augmentation will help you generalize from the first to the second track.

Python/Keras Versions:

  1. Python 3.6.x
  2. Keras 2.x

Library Imports — Neural Network

We’ll start off by building what I would consider the easiest piece of the architecture, namely the model itself. The Keras library is used here because it makes it super easy to arrange the necessary building blocks without spending a lot of time writing boilerplate code (I’m looking at you, TensorFlow). The file in my GitHub project is called nn.py and you can find it here if you want to see that whole file at once.

Modified Nvidia Network Architecture

For the actual network architecture, the Nvidia model is used because of its simplicity and demonstrated ability to perform well on self-driving car tasks. The architecture is slightly modified to add batch normalization layers instead of Dropout before ReLU activations are applied. To quote a really great paper on Batch Normalization:

Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout.

Furthermore, whereas the Nvidia architecture uses images with a shape of
(66, 200, 3), I have changed the input_shape to be (70, 160, 3) for no real reason except that the crop looked sufficient to me when visualizing the data in a Jupyter notebook. Here’s what the architecture looks like from the Nvidia paper:

Nvidia Architecture (source)

The network architecture implementation is fairly straightforward. First we have a Lambda layer that normalizes our input image. What follows are 5 Conv2d layers, each followed by a BatchNormalization and a ReLU Activation layer.

After the 5 convolutional layers, these multidimensional layers are flattened into what can be thought of as a column vector and fed into 3 plain fully connected layers with the final layer outputting the predicted steering angle.
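The description above can be sketched in Keras roughly as follows. This is a hedged reconstruction, not the author's exact nn.py: the filter counts, kernel sizes, strides and fully connected layer sizes (100/50/10) are assumptions taken from the Nvidia paper, while the BatchNorm-before-ReLU arrangement and the (70, 160, 3) input shape follow the text.

```python
# A sketch of the modified Nvidia model described above. Layer sizes are
# assumptions based on the Nvidia paper; only the BatchNorm/ReLU ordering
# and input_shape come from the article.
from keras.models import Sequential
from keras.layers import (Lambda, Conv2D, BatchNormalization,
                          Activation, Flatten, Dense)

def network(input_shape=(70, 160, 3)):
    model = Sequential()
    # Lambda layer normalizing pixel values to roughly [-0.5, 0.5]
    model.add(Lambda(lambda x: x / 255.0 - 0.5, input_shape=input_shape))
    # Five convolutional blocks: Conv2D -> BatchNormalization -> ReLU
    for filters, kernel, stride in [(24, 5, 2), (36, 5, 2), (48, 5, 2),
                                    (64, 3, 1), (64, 3, 1)]:
        model.add(Conv2D(filters, (kernel, kernel), strides=(stride, stride)))
        model.add(BatchNormalization())
        model.add(Activation('relu'))
    # Flatten into a column vector, then fully connected layers down to
    # a single predicted steering angle
    model.add(Flatten())
    for units in (100, 50, 10):
        model.add(Dense(units))
    model.add(Dense(1))
    return model
```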


We need several libraries to build out the pipeline that will train our model. Here csv is included to read from, well, CSV files; cv2 for image manipulation; argparse for parsing command line arguments; utils, a small set of utilities abstracting some of cv2’s functionality; shuffle and train_test_split from sklearn for shuffling data and splitting it into training and validation sets, respectively; numpy for awesome numerical tools; and finally model, where our Keras model is defined.

The Pipeline

To organize the logic around data importing, image augmentation and training, I built a class called Pipeline that organizes all the needed functions in a pretty package. Let’s take a look at how the instance variables are defined. I have broken up individual methods into separate Gist files for easy parsing of the source code. The methods are organized a bit differently in the article than on Github, but that is only to help you understand the flow of the pipeline. In any case, if you are comfortable reading the source, you can find it here (but I would still encourage you to read the article).

self.data: stores lines read in from driving_log.csv.
self.model: is the model that we want to use. In this case, we will be importing the network function from above that will be plugged into our architecture.
self.epochs: defines the number of epochs we want our model to train for.
self.training_samples and self.validation_samples: will store the result of running train_test_split on the data stored in self.data.
self.correction_factor: defines the steering angle adjustment for left and right camera images used during data augmentation.
self.base_path: holds the root directory where the images and driving log are stored.
self.batch_size: is, well, our batch size.
self.image_path and self.driving_log_path: are just helper variables holding the paths to the images folder and driving log, respectively.
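Put together, the constructor might look something like the sketch below. The parameter names and default values here are my assumptions, not the author's exact code; only the instance variable names come from the article.

```python
# A minimal sketch of the Pipeline's instance variables described above.
# Constructor signature and defaults are assumptions for illustration.
class Pipeline:
    def __init__(self, model=None, base_path='', epochs=2):
        self.data = []                # rows read from driving_log.csv
        self.model = model            # the Keras model plugged into the pipeline
        self.epochs = epochs          # number of training epochs
        self.training_samples = []    # filled by train_test_split later
        self.validation_samples = []
        self.correction_factor = 0.2  # steering offset for left/right images
        self.base_path = base_path    # root dir of images + driving log
        self.batch_size = 128
        # helper paths derived from base_path
        self.image_path = self.base_path + '/IMG/'
        self.driving_log_path = self.base_path + '/driving_log.csv'
```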

Import Data Into the Pipeline

All import_data really does is open the driving log CSV file and read each row into self.data. Each row in the file contains absolute paths to the center, left and right camera images as well as the steering angle (along with other data that is of no interest to us).
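As a standalone sketch (the real version is a method that appends to self.data), this could look like:

```python
# A hedged sketch of import_data: read every row of the driving log into
# a list. The function form is for illustration; in the article this is
# a Pipeline method that fills self.data.
import csv

def import_data(driving_log_path):
    data = []
    with open(driving_log_path) as csvfile:
        for row in csv.reader(csvfile):
            data.append(row)
    return data
```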

Splitting Data Into Training and Validation Sets

Given the rows of data from driving_log.csv stored in self.data, the data is then split into training and validation sets and assigned to the instance variables self.training_samples and self.validation_samples. I’ve chosen to split the data 80% training and 20% validation, but split_ratio can be overridden if you prefer to partition your data in some other way.
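A minimal sketch of that split using sklearn's train_test_split (the split_ratio keyword mirrors the article's description; the function form rather than a method is my simplification):

```python
# 80/20 split of driving_log.csv rows into training and validation sets,
# as described above. split_ratio is the fraction held out for validation.
from sklearn.model_selection import train_test_split

def split_data(data, split_ratio=0.2):
    training_samples, validation_samples = train_test_split(
        data, test_size=split_ratio)
    return training_samples, validation_samples
```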

Data Generator

Let’s go through the code. What we see here is a method that is actually a generator. What that essentially means in our case is that instead of loading all of our image samples into memory, we generate a batch of 128 samples and return it using yield. The next time that data_generator is called, it will not start from scratch, but will resume where yield left off, keeping the generator’s variables intact.


samples is the data from which to create the generator. In our case, this data will be coming from self.training_samples and self.validation_samples.
batch_size is the number of samples desired in a single batch.

Code Walkthrough

First, the length of the samples list is calculated, which will be used as the upper limit in our loop later on. Because we’re creating a generator, we need an infinite loop so that we can call our generator as many times as we wish. Next, we create two for loops that take care of iterating through batches of samples. The first loop iterates through samples using batch_size as the step size and the subsequent loop iterates through each sample in a given batch. Two lists — images and steering_angles — are created to store the result of our data augmentation process. For each batch sample, self.process_batch will be called to generate augmented images and steering angles, which will be covered below. These augmented images and steering angles are then added to the images and steering_angles lists respectively, converted to numpy arrays and assigned to the X_train and y_train variables. X_train and y_train are then yielded to the caller; when data_generator is called subsequently, execution resumes right after the yield.
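The walkthrough above can be sketched as follows. For illustration, process_batch is passed in as a parameter here (in the article it is a Pipeline method), and shuffling between epochs is omitted:

```python
# A sketch of the generator logic described above: loop forever, slice the
# samples into batches of batch_size, augment each sample, and yield the
# batch as numpy arrays.
import numpy as np

def data_generator(samples, batch_size=128, process_batch=None):
    num_samples = len(samples)  # upper limit for the batch loop
    while True:  # infinite loop so Keras can draw batches indefinitely
        for offset in range(0, num_samples, batch_size):
            batch_samples = samples[offset:offset + batch_size]
            images, steering_angles = [], []
            for batch_sample in batch_samples:
                # each sample can yield several augmented image/angle pairs
                augmented_images, augmented_angles = process_batch(batch_sample)
                images.extend(augmented_images)
                steering_angles.extend(augmented_angles)
            X_train = np.array(images)
            y_train = np.array(steering_angles)
            yield X_train, y_train  # execution resumes here on the next call
```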

To reiterate, we use a generator to avoid loading all of the augmented images into memory; instead, our function lazily creates a new batch of augmented data only when it is called. While your computer can probably handle 20K images in memory, it becomes unviable to store larger datasets and those computer resources are probably best used for computation and not storage.

Training and Validation Generators

Two simple methods are defined that use the self.data_generator implementation detailed above to create a generator for the training and validation datasets.

Data Augmentation Utilities

Simple utilities to convert images to RGB color space, flip images horizontally and crop and resize images. I created the utilities simply to add a small layer of abstraction and DRYness since these functions were used in multiple places.

Data Augmentation

The following method is really the crux of the whole pipeline, and this is where image augmentation occurs. The augmentation pipeline is very simple: we use the provided center, left and right images, as well as a horizontally flipped center image, to create our augmented dataset. It’s a very simple pipeline but gets the job done.


batch_sample is a list containing the paths to the center, left and right images, as well as the steering angle. The rest of the items in the list, like Throttle, Brake, and Speed, are not necessary for data augmentation and are set automatically by the driver code.

Code Walkthrough

Since the steering angle is constant for all three camera images, it is immediately pulled out from the 3rd index in the list and converted to a float using np.float32. Next, two lists, images and steering_angles, are defined to hold the results of the data augmentation.

For each image path, that is, the first three elements in batch_sample, we split the path on a forward slash, which results in a list of the path’s components. For example, calling split on a string like a/path/to/a/file will result in the list ['a', 'path', 'to', 'a', 'file'].

What we’re really after is the last element in the list, hence the [-1] notation, which gives us the filename of the image that we’re interested in. We don’t care about the rest of the path because it could be different depending on where the data was created. Since we’re using data provided by Udacity, the absolute path to images in their dataset will differ because our images will be loaded from a different directory on the AWS instance.
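The split-and-take-last trick from the two paragraphs above, in two lines:

```python
# Extract just the filename from a path, regardless of where the
# dataset was originally created.
path = 'a/path/to/a/file'
parts = path.split('/')   # ['a', 'path', 'to', 'a', 'file']
filename = parts[-1]      # 'file'
```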

Here’s what the raw images look like (in BGR colorspace).

In the for loop, OpenCV is used to read in an image: cv2.imread takes the path stored in image_name and reads the image into the image variable. Since OpenCV reads images in the BGR color space and because drive.py will feed RGB images to our model, we convert image, a numpy array representation of the image, into RGB using utils.bgr2rgb so that our model is trained on RGB images and not some other format.

rgb_image is then cropped to remove the sky and the hood and resized to (70, 160). Each image is appended to the images list.

Cropped RGB Image
Resized RGB image

If the image is from the left or right camera, we apply a correction factor of 0.2 to the steering angle to balance our dataset with non-zero steering angles. For example, if we have a steering angle of 0.0 for one of the center images, the left and right images also have a steering angle of 0.0, which creates a dataset that is heavily biased towards 0.0. To offset this problem, we apply the aforementioned correction factor to create two new steering angles, 0.2 and -0.2, for the left and right images alongside the center image with a steering angle of 0.0. Lastly, each center image is flipped horizontally and added to the images list, and the negation of the steering angle is added to steering_angles.
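The per-row augmentation logic can be sketched as below. This is a simplified stand-in for process_batch: images are passed in directly as numpy arrays, so the file-reading, color-conversion and crop/resize steps covered above are omitted, and the function name is mine.

```python
# Simplified sketch of the augmentation described above: one driving_log row
# produces four (image, angle) pairs - center, left (+correction),
# right (-correction), and a horizontally flipped center with negated angle.
import numpy as np

def augment_row(center, left, right, steering_angle, correction=0.2):
    images, steering_angles = [], []
    # center image keeps the recorded angle
    images.append(center)
    steering_angles.append(steering_angle)
    # left camera: steer a bit more to the right
    images.append(left)
    steering_angles.append(steering_angle + correction)
    # right camera: steer a bit more to the left
    images.append(right)
    steering_angles.append(steering_angle - correction)
    # flipped center image with the opposite steering angle
    images.append(np.fliplr(center))
    steering_angles.append(-steering_angle)
    return images, steering_angles
```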

Running the Pipeline

A run method is defined to split our dataset and run Keras’ model.fit_generator that will actually start the training phase. Here I want to note something very crucial that will save you hours of headaches. From the Keras docs, steps_per_epoch is defined as follows:

Total number of steps (batches of samples) to yield from `generator` before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of samples of your dataset divided by the batch size.

This is important to understand because the most serious issue that I ran into was that training a single epoch was taking over an hour on an AWS g3.4xlarge instance with 8GB of GPU memory. Something was seriously amiss because other students were reporting training times of no more than 3–5 minutes per epoch. steps_per_epoch is telling Keras how many batches to create for each epoch.

The training set consists of 80% of the total driving_log.csv rows, which is approximately 6428 rows. So instead of training on 50 batches of 128 images each like I was supposed to, I was training the model on 6428 batches of 128 images. That’s 822,784 images per epoch!


steps_per_epoch=len(self.training_samples) // self.batch_size

No wonder training was taking so long! Once that realization hit, I began getting realistic training times of around 1 minute per epoch after I changed the code to reflect the documentation. In the code below, I am actually running more batches per epoch to help the model learn better, but training time is only around 3–4 minutes with roughly (6428 * 10) / 128 ≈ 500 batches of 128 images, about 64,000 images per epoch, instead of 822,784.
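The arithmetic behind the bug and the fix, spelled out:

```python
# steps_per_epoch should be batches per epoch, not rows per epoch.
training_rows = 6428   # ~80% of driving_log.csv
batch_size = 128

# The bug: passing the row count as steps_per_epoch means Keras draws
# 6428 batches of 128 images each epoch.
wrong_steps = training_rows
print(wrong_steps * batch_size)  # 822784 images per epoch

# The fix: divide rows by batch size, per the Keras docs.
correct_steps = training_rows // batch_size
print(correct_steps)             # 50 batches per epoch
```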

Final Steps

Lastly, just for convenience’s sake, I define a main function that adds command line argument parsing to allow for the specification of the base directory where images and the driving log are stored. The pipeline is instantiated, data is imported into the pipeline and the training is then commenced with pipeline.run().

The code can be run like so:
$ python3 model.py --data-base-path=./data

Closing Notes

When running your model in the simulator, make sure that you modify drive.py to process images in the same way they were processed in training. For my project, that meant modifying image_array to be image_array = crop_and_resize(image_array).

This was by far one of the most interesting projects so far, and the delight that I experienced when the car made its first lap around the track after hours of frustration cannot be conveyed through words. I hope that this analysis of the project can help you avoid some of the pitfalls that I experienced. The key takeaway that I got from this project (apart from training a car to drive itself!) is that small syntax errors are usually the root of all sorts of evil. Take the time to walk through your code and ensure that silly mistakes like improper indentation or misused method arguments are not causing mayhem in your implementation.

I would also like to thank my manager @NathanSeither for graciously allowing me to take this course!

Source Code

The full project is available here.


Source: Deep Learning on Medium