Reinforcement Learning From Scratch, Part 1: The Simulator


As expected, the car isn’t doing great; it drives around blindly before crashing into a wall. That’s okay; everything has to start somewhere. Let’s make our car drive better than this!

Before implementing any model, however, a reward function alone is not enough; the car needs some way of sensing why it is doing badly. To that end, the car is equipped with a very rudimentary lidar: it casts rays at a fixed set of angles, and each ray returns the distance to the closest wall in that direction.

Vehicle lidar
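As a rough illustration, a ray-marching lidar could look like the sketch below. The names here (`lidar_scan`, the `walls` predicate) and the ray count are hypothetical, not the repository’s actual code:

```python
import math

def lidar_scan(car_x, car_y, car_heading, walls,
               num_rays=8, max_range=200.0, step=1.0):
    """Cast rays at fixed angles around the car's heading and return
    the distance to the nearest wall along each ray.

    `walls` is assumed to be a predicate: walls(x, y) -> True if the
    point lies inside a wall."""
    distances = []
    for i in range(num_rays):
        # Spread the rays evenly across a 180-degree field of view.
        angle = car_heading + (i / (num_rays - 1) - 0.5) * math.pi
        dist = max_range
        d = 0.0
        # March along the ray in small steps until a wall is hit
        # or the ray runs out of range.
        while d < max_range:
            px = car_x + d * math.cos(angle)
            py = car_y + d * math.sin(angle)
            if walls(px, py):
                dist = d
                break
            d += step
        distances.append(dist)
    return distances
```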

This set of distances is what gets fed into the network, which then outputs a set of actions. The actions are passed to the physics simulator, which returns a reward; that reward is then used to update the network’s weights through backpropagation.
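To make that loop concrete, here is a minimal REINFORCE-style sketch in PyTorch. It assumes a `policy` network mapping lidar distances to action logits and a hypothetical `env` wrapping the physics simulator; the repository’s actual training code may differ:

```python
import torch

def train_episode(policy, env, optimizer):
    """One episode: lidar in, actions out, reward back into the weights."""
    obs = env.reset()                      # lidar distances at episode start
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()             # e.g. steer left / straight / right
        log_probs.append(dist.log_prob(action))
        obs, reward, done = env.step(action.item())
        rewards.append(reward)
    # REINFORCE update: scale the log-probabilities of the chosen actions
    # by the episode return, so high-reward behavior becomes more likely.
    ret = sum(rewards)
    loss = -ret * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()                        # back-propagate through the policy
    optimizer.step()
    return ret
```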

When the score becomes consistently high, the temperature will be increased for subsequent training sessions, to promote a model that actually learns to navigate the course rather than simply memorizing one track layout and failing to transfer that knowledge to different tracks.
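Assuming the temperature is a knob controlling how varied the generated tracks are, the curriculum bump could be as simple as the following sketch (the window, threshold, and increment values are made up):

```python
def next_temperature(recent_scores, temperature,
                     window=10, threshold=0.9, bump=0.1):
    """Raise the track-generation temperature once the agent has scored
    consistently well over the last `window` episodes."""
    if len(recent_scores) >= window and min(recent_scores[-window:]) >= threshold:
        return temperature + bump  # harder, more varied tracks next session
    return temperature
```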

With the testbed set up, it is time to actually crunch some numbers and train some models! Coming up in part 2.

If you would like to review any of this information in more detail, please visit the repository here. Ignore the messiness; at the time of writing, this whole thing was put together, hackathon-style, over the course of 8 hours.

I’m super excited about this project, so feel free to comment with any suggestions!