Self-driving remote-control car with Apache MXNet

Source: Deep Learning on Medium

Autonomous driving is one of the most high-profile applications of deep learning. Recently AWS announced DeepRacer, a fully autonomous 1/18th scale race car driven by reinforcement learning. In this post I’ll show newcomers to machine learning how to assemble their very own self-driving remote-control car and use Apache MXNet to teach it to race on a track. I’ll share a shopping list for components, the complete machine learning code, and tips to help others avoid the blunders that I made.

We start with DonkeyCar, an open source library for developing a no-frills self-driving RC car. Most of the parts can be bought from Amazon if you live in the USA or Canada.

DonkeyCar is a mature project built on Keras and TensorFlow. I’ve ported the code to Apache MXNet using the Gluon API so that we can peek under the hood to study the machine learning code and see how the model can be modified. For further reading, check out the Donkey Car documentation here.

Part 1–1: Creating the complete Donkey Car

This is the complete list of parts required to build the car. I’ve added some additional comments based on my experience buying some of these items, as the availability of parts varies by location.

A shopping list

After you have procured all of the components, it’s time to assemble the hardware.

The wiring layout: the servo motors are attached to the driver through the hole in the center, and the driver is attached to the Raspberry Pi board.
The battery is fastened to the bottom of the chassis plate. The output cable is fed through the middle hole to the Pi board.
The adapter plates for the alternative cars: the holes on the top may need widening (I used a fork).

Part 1–2: The software

Here is where my version differs from DonkeyCar’s source code: DonkeyCar uses Keras with a TensorFlow backend, while I use Apache MXNet with the Gluon API.

You will need to install MXNet on the Pi. I have created an image that contains all the machine learning components you need, if you’d like to save time. Here’s how to install the image:

  1. Download and write the .img file here to the SD card. Instructions on how to write an .img file to an SD card are here.
  2. Save the wpa_supplicant.conf file to the boot partition of the SD card. Full instructions are here.
  3. Place the SD card into the Raspberry Pi and boot it.
  4. SSH into the Pi, and run ‘pip install mxnet-donkeycar’.
  5. Enter the Pi’s configurations with ‘raspi-config’ and expand the partition size for the SD card.

Installing the image is convenient, but at a later date you might want to install an updated version of MXNet. Here’s how to install MXNet and my version of DonkeyCar.

  1. Download the DonkeyCar .img files here, and do steps 1–3 from the guide above.
  2. Install Docker on your machine. Instructions are listed here.
  3. Git clone the latest MXNet repo.
     i.e. git clone
  4. In the directory made (‘incubator-mxnet’), run : ci/ armv
    This will build a .whl file in the local build folder named mxnet-x.x.x-py2.py3-none-any.whl, with the x’s being placeholders for the version number. Copy this file to your Raspberry Pi, for example with the scp command.
  5. Once the file is on the Pi, run:
    virtualenv -p "$(which python3)" mxnet_py3
    source mxnet_py3/bin/activate
    pip install mxnet-x.x.x-py2.py3-none-any.whl
  6. Follow steps 4–5 of the earlier guide.

Once MXNet and DonkeyCar are installed on the Raspberry Pi, you need to install them on your development machine as well. Install MXNet by following the official installation instructions here, then follow one of these methods here to install the DonkeyCar software. The one exception is the instruction for cloning the original donkeycar repository: replace it with the following command, which clones my repository with the MXNet additions:

git clone donkey --branch cat-dev

Once you have the software installed, you can now drive the car, or rather, tell it to drive itself!

Part 1-3: Creating the racetrack

The average person probably doesn’t have a remote control car racetrack in their household. Here are a few tips on how to become one of the lucky people with their own racetrack. You will need:

  • Black Masking Tape – 60 yards, 2 inch width: Narrower tape makes the track harder to see using the camera. I ended up widening the track sides to 3 inches so that it’s much more visible. Keep in mind that the camera may have difficulty seeing reflective tape under certain lights.
  • Chalk: To sketch the outline of the tracks.
  • Measuring Tape: For marking distances and drawing circles.

First, choose a clean, open floor and test how well the car handles when you drive on it. You need roughly 12 x 15 feet of space. Keep in mind that floor friction affects the minimum turn radius, so be sure to test how your car performs on the floor before laying out the track. Try to fit in as much variation as possible, to make your self-driving model robust to a wide variety of situations.
Lastly, you may have to experiment with the width of the track. I found that making the track at least 2.5 times the width of the car (~20 inches) worked well, especially for handling around corners.
Here are a few photos of the racetrack I created in my garage:

The yellow strips are just a matter of personal taste. They make it look very racy.

Part 2: The Apache MXNet code

If you’re reading the DonkeyCar documents alongside this blog, I recommend reading just past ‘Train an autopilot’, which is enough to know how to create and train the self-driving models for the car. In the following sections we go into the MXNet code, show some of the improvements made, and explain how the whole thing is done in Gluon. If you would rather run an MXNet backend without knowing how it works, feel free to jump to Part 5.
Following the usual conventions in machine learning, we split the MXNet code into three modules:

  • Model: Holds the neural network model and its parameters in the GluonCategorical class. This class provides the output for the automated pilot. Besides overriding the hybrid_forward() function of the Gluon HybridBlock, this class implements run(), save(), and load(), which are necessary for integrating with DonkeyCar’s platform. This tutorial provides an excellent guide to writing custom Gluon blocks.
  • Trainer: The GluonTrainer class feeds the model with training data and trains it according to the loss function, which is a property of the model.
  • DataLoader: Composed of the GluonDataSet class and a function that separates the data into training and test datasets, wrapped in a DataLoader object. The DataLoader generates mini-batches and applies augmentations to the data.
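
To make the DataLoader role concrete, here is a minimal sketch of the two jobs it describes: splitting records into training and test sets, and grouping them into mini-batches. The names split_records and mini_batches, the 20% test fraction, and the seed are my own assumptions for illustration, not the repository's code. In Gluon the batching itself is handled by mxnet.gluon.data.DataLoader; plain Python is used here so the sketch runs without MXNet.

```python
import random


def split_records(records, test_fraction=0.2, seed=0):
    """Shuffle the records and return (train, test) lists.

    test_fraction and seed are hypothetical defaults chosen for illustration.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def mini_batches(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


if __name__ == "__main__":
    train, test = split_records(range(10), test_fraction=0.2, seed=1)
    print(len(train), "training records,", len(test), "test records")
    for batch in mini_batches(train, 3):
        print(batch)
```

Keeping the test records separate from the start is what makes the train and test accuracies reported later comparable.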

The two top level functions responsible for training and driving are train() and drive() functions in

Part 3: Collecting the dataset and training

Now you have a car built, a race track, and code to learn from data. It’s time to start training our model by gathering a data set. The dataset is collected by recording images and the corresponding steering angle and throttle amount for each image while driving the car manually. 
Details of the preparation required before driving are mentioned here. When you’re ready to create and train an MXNet model, run the train command with an additional ‘--gluon’ parameter, which selects the Gluon backend.

Part 3–1: Overfitting

Once a data set has been created and is being used to train, you may want to start paying attention to the progress of the accuracy and loss of each epoch:

Epoch 0, Loss: 1.31158543,
Train_acc: angle=0.6020 throttle=0.7282,
Test_acc: angle=0.5989 throttle=0.7296

Epoch 8, Loss: 1.02547860,
Train_acc: angle=0.6448 throttle=0.8462,
Test_acc: angle=0.6204 throttle=0.8471

Epoch 16, Loss: 0.88336200,
Train_acc: angle=0.6937 throttle=0.8404,
Test_acc: angle=0.6164 throttle=0.8414

Epoch 24, Loss: 0.77504528,
Train_acc: angle=0.7390 throttle=0.8075,
Test_acc: angle=0.6063 throttle=0.8136

As we see here, the model’s accuracy on the training data is increasing over time while the accuracy on test data is staying relatively constant, or even decreasing. As the goal is to have the model perform well in the wild, we need to make changes to improve accuracy on unseen data. So what are some fixes here?
Early stopping based on test accuracy:

The fix here is rather simple: compare the current loss to the previous value, count each increase, and stop training once the count reaches a threshold. This logic is used in
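
The early-stopping rule described above can be sketched in a few lines. This is an illustration under my own assumptions rather than the repository's code: the class name EarlyStopping and the default threshold of 3 are hypothetical. It counts every increase in loss, not only consecutive ones, which is how the text describes the check.

```python
class EarlyStopping:
    """Signal a stop once the loss has increased `patience` times in total."""

    def __init__(self, patience=3):
        self.patience = patience      # hypothetical threshold
        self.previous_loss = None
        self.increases = 0

    def should_stop(self, loss):
        # Compare the current loss with the previous value and count increases.
        if self.previous_loss is not None and loss > self.previous_loss:
            self.increases += 1
        self.previous_loss = loss
        return self.increases >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=3)
    for epoch, loss in enumerate([1.0, 0.9, 0.95, 0.8, 0.85, 0.9, 0.7]):
        if stopper.should_stop(loss):
            print("stopping at epoch", epoch)
            break
```

Feeding it the loss measured on the test set, rather than the training loss, is what ties the stopping decision to performance on unseen data.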
Adding weight decay to the Trainer:

Weight decay is a regularization technique to reduce overfitting. The Gluon trainer object, which is used in, supports weight decay as an optimization parameter. We use a weight decay of 0.001.
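
To show what the weight decay parameter does, here is a minimal sketch of a single plain SGD update with weight decay, written in pure Python. The function name sgd_step and the learning rate are assumptions for illustration, and the optimizer actually used in the project may differ. In Gluon the same setting is passed as the 'wd' entry of the optimizer parameters given to gluon.Trainer.

```python
def sgd_step(weights, grads, learning_rate=0.01, weight_decay=0.001):
    """Return updated weights after one SGD step with weight decay.

    The decay term adds weight_decay * w to each gradient, which pulls
    every weight slightly toward zero on each update.
    """
    return [w - learning_rate * (g + weight_decay * w)
            for w, g in zip(weights, grads)]


if __name__ == "__main__":
    # With a zero gradient, only the decay term changes the weights.
    print(sgd_step([1.0, -2.0], [0.0, 0.0], learning_rate=0.1))
```

Large weights are penalized more than small ones, which discourages the network from memorizing individual training images.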

The great thing with these solutions is that if you’re looking to optimize the network to your specific data set, you only have to play around with these variables and run them to see how well they test out, with modifications just taking up a single line.

Part 3–2: Image modification

The model may be very accurate, but it still needs to know what to do when it makes a mistake or doesn’t get clean input. We use a deep learning technique called data augmentation to add some extra training examples. Data augmentation is implemented in the augment_img() function of

This function can be applied as a transform to each sample as it is retrieved from the Dataset. In short, it flips and crops the image at random (the cropping is inspired by the blog here). The cropped image is intended to look like the view from a car that has drifted away from the center of the track, and the angle is artificially adjusted to steer in the opposite direction of the crop.

Original Images
Modified Images (Angle shifted 50% left)

By expanding one half of the image to the full size, we then move the given angle to the opposite side to mimic an actual driver moving away from the side of the lines. We likely won’t see any improvements in the Trainer function, but this should help with the real-time application of the model.
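
The flip and crop steps can be sketched as follows. This is not the augment_img() function from the repository: the function names, the nearest-neighbour stretch, the 0.5 angle shift, and the sign convention (negative angles steer left, positive steer right, in the range -1 to 1) are all assumptions for illustration. Images are plain nested lists of pixel values so the sketch runs without any image library.

```python
import random


def flip(image, angle):
    """Mirror the image left-to-right and negate the steering angle."""
    return [row[::-1] for row in image], -angle


def crop_half(image, angle, side, shift=0.5):
    """Keep one half of the image, stretch it to full width, steer away.

    `shift` is a hypothetical amount by which the angle is moved.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    width = len(image[0])
    half = width // 2
    start = 0 if side == "left" else width - half
    # Nearest-neighbour stretch: each kept column is repeated to fill the width.
    stretched = [[row[start + (c * half) // width] for c in range(width)]
                 for row in image]
    # Cropping to the left half moves the angle right, and vice versa.
    new_angle = angle + shift if side == "left" else angle - shift
    return stretched, max(-1.0, min(1.0, new_angle))


def augment(image, angle, rng=random):
    """Randomly flip, then randomly crop to one side or leave uncropped."""
    if rng.random() < 0.5:
        image, angle = flip(image, angle)
    choice = rng.choice(["none", "left", "right"])
    if choice != "none":
        image, angle = crop_half(image, angle, choice)
    return image, angle


if __name__ == "__main__":
    sample = [[1, 2, 3, 4], [5, 6, 7, 8]]
    print(crop_half(sample, 0.0, "right"))
```

Clamping the adjusted angle keeps the label inside the range the model is trained to predict.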

After studying my data, I realized that the top part of my images was not particularly important for the network to train on.

Original Image
Cropped Image

The aim of cropping the top of the image is to eliminate distracting, unnecessary information. Cropping also has a nice side effect: it reduces the amount of data to process. In the next part I’ll discuss what it takes to run the model in real time so that it can respond to a constant image feed.
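
As a small illustration of the top crop and its effect on data size, here is a sketch that works on any image stored as a sequence of rows. The function names and the row counts in the example are hypothetical; the source does not say how many rows were removed.

```python
def crop_top(image, rows):
    """Drop the first `rows` rows of an image stored as a sequence of rows."""
    if not 0 <= rows < len(image):
        raise ValueError("rows must be at least 0 and less than the image height")
    return image[rows:]


def fraction_removed(height, rows):
    """Share of the pixel data discarded by cropping `rows` of `height` rows."""
    return rows / height


if __name__ == "__main__":
    image = [[0] * 4 for _ in range(8)]   # hypothetical 8 x 4 image
    print(len(crop_top(image, 2)), "rows kept,",
          fraction_removed(8, 2), "of the data removed")
```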


Epoch 0, Loss: 1.42877960
Train_acc: angle=0.5783 throttle=0.4127
Test_acc: angle=0.6129 throttle=0.4134
Epoch 8, Loss: 0.99295235
Train_acc: angle=0.6681 throttle=0.6115
Test_acc: angle=0.6636 throttle=0.6086
Epoch 16, Loss: 0.96183932
Train_acc: angle=0.6698 throttle=0.5959
Test_acc: angle=0.6678 throttle=0.5932

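As we can see here, the angle accuracy on the test data has gone up, from roughly 0.61 to 0.67, although throttle accuracy is lower in this run. Even though the model stopped training at 16 epochs, it steers more accurately on unseen data and is likely better suited to handle real-time driving.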

Part 4: Optimizing processing speeds

As the car must be able to react as fast as possible, the time it takes to receive an output once given an image can be quite crucial to its success. This section explains the optimizations for maximizing neural network inference speed.
Currently, DonkeyCar limits the camera frequency to 20 Hz. The car cycles through all its parts: the camera, the controller (or the neural network if provided), then the servo motors for throttle and angle. Once complete, the main loop sleeps for the remainder of the 50 ms (1 s / 20 Hz = 50 ms) allotted and then repeats. This allows the camera to produce a new photo every iteration that is as recent as possible.
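
The fixed-rate loop can be sketched like this. It is a simplified illustration, not DonkeyCar's actual vehicle loop: the function names are my own, and the parts are modeled as plain callables. At 20 Hz each iteration has a budget of 1/20 s = 50 ms, and the loop sleeps for whatever is left after the parts have run.

```python
import time


def remaining_sleep(loop_start, now, rate_hz=20):
    """Seconds left in the current iteration's time budget (never negative)."""
    period = 1.0 / rate_hz            # 0.05 s at 20 Hz
    return max(0.0, period - (now - loop_start))


def run_loop(parts, iterations, rate_hz=20,
             clock=time.monotonic, sleep=time.sleep):
    """Run every part once per iteration, then sleep out the rest of the period."""
    for _ in range(iterations):
        start = clock()
        for part in parts:            # e.g. camera, pilot, servo update
            part()
        sleep(remaining_sleep(start, clock(), rate_hz))


if __name__ == "__main__":
    run_loop([lambda: None], iterations=3)
    print("time left after 8 ms of work:", remaining_sleep(0.0, 0.008), "s")
```

If the parts take longer than the period, the sleep time drops to zero and the loop simply runs late, which is why inference time matters.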

I measured the command delay between the manual controller and the servo motors to be between 5 ms and 10 ms. For the autopilot to mimic a human driver, my goal is to keep the inference time well below 10 ms.

One optimization that helps inference time on the Raspberry Pi is to perform all data formatting in numpy instead of NDArray. I noticed that this change shaved approximately 2 ms off the inference time.

With almost no optimization of the neural network, I measured approximately 8 ms of inference time, which is within our intended range. As I’ll suggest again in the next section, it might be worth investing in more complex neural network architectures if the inference times stay within reasonable limits. One technique for accelerating neural networks on the Raspberry Pi is to use the TVM Stack or the Amazon SageMaker Neo service.
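
Inference times like the ones quoted above can be measured with a small timing helper. This is a sketch under my own assumptions: the function name mean_latency_ms and the repeat and warm-up counts are hypothetical, and predict stands for whatever callable runs the model. Because MXNet executes operations asynchronously, the callable should convert its result with asnumpy() so that the measurement includes the actual computation.

```python
import time


def mean_latency_ms(predict, sample, repeats=100, warmup=5):
    """Average wall-clock time of predict(sample) in milliseconds."""
    for _ in range(warmup):
        predict(sample)               # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        predict(sample)
    return (time.perf_counter() - start) * 1000.0 / repeats


if __name__ == "__main__":
    # A stand-in workload; replace it with the model's prediction call.
    latency = mean_latency_ms(lambda x: sum(x), list(range(1000)))
    print("mean latency: %.3f ms" % latency)
```

Measuring on the Pi itself matters, since timings from a development machine say little about the car's processor.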

Part 5: Additional notes and conclusion

Before I wrap things up, I’d like to add a few final thoughts. 
Manual control vs. autopilot:

During the data collection phase, each image is labeled with the most recent throttle and steering angle. However, the delay for reading out an image may be larger than the delay between the manual controller and the servo motors. If so, it may be more appropriate to shift the labels by one image during training. I noticed some improvement in autopilot performance after making this change in
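
Shifting the labels by one image can be sketched as a re-pairing of two equally long lists. The function name shift_labels is hypothetical, and the text does not say in which direction the labels were shifted, so the sketch takes a signed offset and leaves the choice to experiment.

```python
def shift_labels(images, labels, offset=1):
    """Pair images[i] with labels[i + offset], dropping unmatched ends.

    A positive offset pairs each image with a later label and a negative
    offset with an earlier one. The direction is an assumption to verify
    against the behaviour of the car.
    """
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    if offset >= 0:
        return list(zip(images[:len(images) - offset], labels[offset:]))
    return list(zip(images[-offset:], labels[:offset]))


if __name__ == "__main__":
    frames = ["img0", "img1", "img2"]
    commands = [0.1, 0.2, 0.3]
    print(shift_labels(frames, commands, offset=1))
    print(shift_labels(frames, commands, offset=-1))
```

Each shift costs one sample at the end of a recording, which is negligible for datasets of thousands of images.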

Test accuracy:
You may have noticed that all of the models have rather underwhelming accuracy. Because we need quick inference, we have to limit the complexity of the model to fit the computing power of the Raspberry Pi.

The ASUS Tinker Board: the same size and capabilities, but with a bigger punch.

There are a few options for upgrading. The most notable is replacing the Raspberry Pi with the ASUS Tinker Board, a direct hardware upgrade that includes all the same functionality as the Pi at a 50% higher price. Another option is optimizing the network for the Raspberry Pi platform using the TVM Stack or the Amazon SageMaker Neo service.
Throttle output through images:
Since the only inputs given to the models are images, how accurately can throttle possibly be predicted? Judging how fast a car should be going is hard without the context of a visual time lapse. How fast is the car currently going? Is it currently turning? You may have noticed that the models have a difficult time predicting throttle if you have trained them on multiple driving speeds: the model only has a single photo from which to infer the speed it should be driving at, while most human drivers treat current speed as a vital factor in choosing throttle. The DonkeyCar documents suggest adding an accelerometer as an additional input.
Last words:
This blog wraps up my experience with machine learning and self-driving cars. When I started, my knowledge was based on what I had learned from a few YouTube videos. I’ve learned a lot, and I’ve also encountered a bunch of inconveniences, such as optimizing the model’s learning, designing a learnable track, and getting all the parts in. The project took about 10 weeks to complete, with most of the time spent testing the performance of multiple models and picking parameters to make a robust model. Hopefully my experience and what I’ve written here will make it much smoother for someone else to pick up this great and enjoyable application of machine learning.