Deep Q Learning — A Reinforcement Learning Algorithm



Do you know what differentiates us humans from other species like monkeys and apes, the animals we share the most genes with? For that matter, what separates us from the rest of all life on the planet?

Apart from social differences, what differentiates us isn’t that we have larger brains, or that we’re the strongest. On some narrow tasks, certain animals can even match or outperform us! So what differentiates us? I think it boils down to our ability to learn, adapt, and improve our environment.

Sure, quite a few animals do use tools, but how many of them ended up inventing computers and the internet? How many have built cars?

So what happens when we start giving this ability of learning and adapting to machines? I’ll tell you what happens! Absolute magic.

Reinforcement learning happens.


I’ve been going deep into this field and have built a couple of projects with reinforcement learning (RL) algorithms. In this post I want to explain one of those algorithms and how I used it to code a self-driving car in a virtual environment!

Let’s dive right into it!

What exactly is Reinforcement Learning?

In its most basic form, RL is how we train AI agents to perform desired tasks inside an environment.

So let’s take a maze as an example. We want to train this agent to complete the desired task (finishing the maze) inside an environment (the maze). Simple, right? Let’s go deeper!

When we first start this agent, it begins at the entrance of the maze. It then takes a step in some direction and keeps taking steps until it solves the maze. We call these steps actions; it’s just the terminology we use. With each new action (or step), the agent enters a new state.

But how do agents actually learn? How do they know what the right task to perform is? Let’s think about how we train a dog to do a trick. Every time it does the trick, we give it a dog treat, a reward. The dog understands this, and since it wants to get as many treats as possible, it keeps doing the trick. If the dog does something we don’t like, we scold it, giving it a negative reward.

RL works in a similar way: when an agent performs the desired task, we give it a reward, and when it performs a task we don’t want it to, it gets a negative reward. In order to maximize its total reward, the agent learns to repeat the actions that give it a positive reward and to avoid the actions that give it a negative reward.

Let’s summarize everything.

The aim of reinforcement learning is to train an agent to perform a desired task in an environment. Each action the agent takes moves it into a different state and gives it a certain reward. The goal of the RL agent is to maximize its total reward, so it figures out the optimal way to complete the task by favoring actions that give it positive reward and avoiding states that give it negative reward.
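To make that loop concrete, here is a minimal, self-contained Python sketch of an agent interacting with an environment. The tiny one-dimensional “maze” and the random policy are hypothetical stand-ins, not the real thing, but the state, action, reward cycle is exactly the one described above.

```python
import random

# A tiny stand-in environment: a 1-D "maze" where the agent starts at
# position 0 and the goal is at position 4. Actions: 0 = left, 1 = right.
class TinyMaze:
    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position                     # the starting state

    def step(self, action):
        self.position += 1 if action == 1 else -1
        self.position = max(0, min(4, self.position))
        done = self.position == 4                # reached the goal?
        reward = 1.0 if done else -0.1           # small penalty for every extra step
        return self.position, reward, done       # new state, reward, done flag

env = TinyMaze()
state = env.reset()
done = False
total_reward = 0.0
while not done:
    action = random.choice([0, 1])               # a (bad) random policy, just for illustration
    state, reward, done = env.step(action)       # take the action, observe the result
    total_reward += reward
print("episode finished with total reward:", total_reward)
```

A real RL agent would replace the random choice with a policy that it improves from the rewards it collects, which is exactly what Q-learning does next.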

And those are the basics of reinforcement learning!

What’s Q-Learning?

Think about the path you would take to finish the maze if you were the agent.

The whole idea behind q-learning is to build a map of the best path to get the maximum amount of reward. You can think of it as a treasure map that shows the agent which direction to go.

How does the algorithm do this? It takes the state and the action as input and calculates the expected future reward if the agent takes that action in that state. This is called the q-value.

The update function for q-learning, otherwise known as the Bellman equation, is written out below.
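In standard notation it looks like this, where α is the learning rate, γ is the discount factor, r is the reward the agent just received, s and a are the current state and action, and s′ is the next state:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \Big[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\Big]
```

The part inside the brackets is the gap between what the agent just experienced (the reward plus the best it expects to do from the next state) and what it had predicted, so each update nudges the q-value a little closer to reality.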

This is how the q-values are updated so that the agent’s treasure map, the q-table, models the environment as closely as possible and lets the agent make the best possible decisions to get maximum reward.
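For a small environment with discrete states and actions, the whole q-table and its Bellman update fit in a few lines of Python. This is a minimal sketch; the sizes and hyperparameters are assumed for illustration.

```python
import numpy as np

n_states, n_actions = 16, 4        # assumed sizes for a small grid-like maze
alpha, gamma = 0.1, 0.9            # learning rate and discount factor

q_table = np.zeros((n_states, n_actions))   # the agent's "treasure map"

def update_q(state, action, reward, next_state):
    # Bellman update: move Q(s, a) toward reward + gamma * best q-value of the next state
    best_next = np.max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state, action] += alpha * (td_target - q_table[state, action])
```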

The difference between the algorithm I used to code the self-driving car (Deep Q-Learning) and regular Q-Learning is that instead of using a q-table to map out the environment and store all the q-values, Deep Q-Learning uses the power of deep learning, specifically a neural network, to predict the q-values of the different states. Instead of updating a q-table directly with the Bellman equation, the network computes a loss between its predicted q-values and the Bellman targets and then backpropagates that loss to adjust its weights.
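As a rough sketch, such a q-network can be as simple as a small fully connected network that takes the state in and outputs one q-value per possible action. This example assumes PyTorch, and the layer sizes and names are illustrative rather than the exact ones from my project.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    """A small q-network: state in, one q-value per possible action out."""

    def __init__(self, input_size, nb_action):
        super().__init__()
        self.fc1 = nn.Linear(input_size, 30)   # hidden layer size is illustrative
        self.fc2 = nn.Linear(30, nb_action)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        q_values = self.fc2(x)                  # raw q-values, one per action
        return q_values
```

The action whose q-value comes out highest is the one the network currently believes is best in that state.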

What I Built

I implemented Deep Q-Learning to code a self-driving car in a virtual environment. Over time it learned to avoid the lines that the user drew, because touching them gave it a negative reward, and it learned the best way to navigate from the bottom right to the top left of the map using these “roads”.

The car has three sensors, positioned 20 degrees to the left, straight ahead, and 20 degrees to the right, that help it detect when it is getting too close to the edges of the road.
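As a rough illustration of how those readings feed the network, the state can simply be a small vector built from the three sensor signals. The function and signal names here are hypothetical, only meant to show the idea.

```python
import torch

def build_state(signal_left, signal_straight, signal_right):
    # Each signal is a reading from one sensor (roughly, how much drawn
    # "wall" it sees); together they form the input vector to the q-network.
    return torch.tensor([signal_left, signal_straight, signal_right],
                        dtype=torch.float32)
```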

The learning step is super important, because this is where the neural network calculates the loss and updates its weights using stochastic gradient descent, according to how much each weight contributed to the overall loss.

Basically, this is where the network figures out where it went wrong and corrects itself so that it’s right the next time.
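Here is a minimal sketch of what that learning step can look like, assuming PyTorch, the q-network from the earlier sketch, a mean squared error loss against the Bellman target, and an optimizer created with torch.optim (my actual implementation choices may differ).

```python
import torch
import torch.nn.functional as F

gamma = 0.9   # discount factor (illustrative value)

def learn(model, optimizer, batch_state, batch_action, batch_reward, batch_next_state):
    # q-values the network currently predicts for the actions that were actually taken
    # (batch_action is a LongTensor of action indices)
    outputs = model(batch_state).gather(1, batch_action.unsqueeze(1)).squeeze(1)
    # Bellman target: reward received now + discounted best q-value from the next state
    next_outputs = model(batch_next_state).detach().max(1)[0]
    targets = batch_reward + gamma * next_outputs
    # The loss measures how far the predictions are from the Bellman targets
    loss = F.mse_loss(outputs, targets)
    optimizer.zero_grad()
    loss.backward()     # backpropagation: how much did each weight contribute to the loss?
    optimizer.step()    # gradient-descent-style weight update
```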

The last piece is where the agent actually decides what action to take in the environment. It takes in the state, which is the input to the neural network, and uses self.model (the neural network) to get the output q-values.
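A minimal sketch of that decision step, assuming an epsilon-greedy strategy (pick the best-looking action most of the time, and a random one occasionally so the agent keeps exploring); the actual exploration strategy in my code may differ.

```python
import random
import torch

def select_action(model, state, epsilon=0.1, nb_action=3):
    # With a small probability, explore by picking a random action
    if random.random() < epsilon:
        return random.randrange(nb_action)
    # Otherwise exploit: ask the network (self.model in the class version)
    # for the q-values of this state and take the action with the highest one
    with torch.no_grad():
        q_values = model(state.unsqueeze(0))
    return int(q_values.argmax(dim=1).item())
```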

The applications for reinforcement learning are practically endless, and I could not be more excited to keep learning and building in this field!

Before you go:

1. Clap for this post.

2. Share with your network!

3. Connect with me on LinkedIn!

4. Check out my website: www.anishphadnis.com
