In today’s article, I am going to show you various approaches to solving the classic game of Snake with Artificial Intelligence. After the end of this post, you will be able to implement solutions that successfully play the game of Snake.
Table of Contents
- Game of Snake
- How to create an AI that plays Snake?
- AI Research Playground
- Domain Specific Solvers
- What’s next?
Game of Snake
Those of you who were raised in the 90’s and before are probably familiar with this classic game and maybe even played it on their first mobile phones.
Anyway, I am still going to briefly introduce you to this game. Snake’s rules are very simple and intuitive.
- We are controlling the snake on a rectangular board.
- We can turn left, right or continue going forward.
- For each fruit we eat, we are scoring a point and our snake gets longer.
- We die when we hit a wall (some versions allow going through them) or our snake’s body.
Sound’s easy, right? For us humans it does, but how do we create a program that maximizes the score while taking into consideration above rules?
How to create an AI that plays Snake?
There is no single good answer to this question, so I’ve decided to show you different approaches to solving the game of snake. I have divided the solvers into: Domain Specific (Part 1) and General Purpose (Part 2).
Without going into further details, domain specific solvers aim to maximize the output in a very specific task (weak AI). In order to do so, they can contain:
- inputs specific to the environment (in our context – locations of the objects on the grid)
- heuristics i.e hardcoded knowledge, shortcuts (in our context – human intuition that pointing towards the fruit is usually a good idea)
In the opposition to the domain specific algorithms, general purpose ones don’t aim for solving narrow tasks. Their goal is to be universal and solve a broader spectrum of problems and maybe someday – reaching or even exceeding human intelligence (Artificial General Intelligence). In order to generalize, they cannot take into consideration any task-specific inputs and heuristics.
AI Research Playground
Before we dive into the algorithms, let’s get more familiar with our Snake implementation. For the purpose of this project, I have developed an AI research playground (Contributions are more than welcome🙂) which is available here. I recommend starting with the instructions provided in the README file.
Each solver (mode) presents a live gameplay preview with stats at the top of the screen, presented in the Score Mode Name (min/avg/max) convention. Whatsmore, you can check ./scores folder for .csv score files and .png plots.
Before starting with the algorithms, I recommend playing Snake yourself.
python slitherin.py --human
Besides entertainment values (it’s fun, isn’t it?), we can get to know the game better and develop strategies to solve the game, usually without even explicitly thinking about it.
An average human player usually starts with a basic policy of pointing the snake towards the fruit in order to maximize the score. Although it’s not an optimal strategy (you’ll later see which one is), it’s a good start.
Domain Specific Solvers
Shortest Path BFS – Breadth First Search
python slitherin.py --shortest_path_bfs
BFS (Breadth First Search) is a graph traversal algorithm and it isn’t the shortest path algorithm per se (if you are more curious you can check and implement Dijkstra or A*), but in our case, it can get a job done.
The algorithm starts at the root node (head of the snake) and explores all neighbor nodes (positions on the grid) at the present depth, before going deeper. It terminates when it finds the fruit.
Shortest Path BFS performs optimally until snake’s length interrupts its shortest path to the fruit.
Shortest Path DFS – Depth First Search
python slitherin.py --shortest_path_dfs
If you take a look at the DFS (Depth First Search) implementation and compare it with BFS you’ll notice that it’s exactly the same except for one difference – data structure that holds neighbor nodes to explore. BFS uses a queue which is a FIFO list type (First In First Out) and DFS uses a stack which is LIFO (Last in First Out). This subtle difference drastically changes the behavior of the algorithm. While BFS starts with exploring present-depth nodes, DFS starts with the highest-depth first. It makes DFS a good way of checking if a graph contains a cycle or not (FYI: Snake’s graph is cyclic). Without termination when detecting a cycle our DFS algorithm would go forever still finding a deeper neighbor.
We’ve already explored two shortest path solvers and while they yield decent results, they are nowhere close to solving the game. Their major drawback is that they don’t take into consideration the snake’s body which gets longer after every eaten fruit.
After a couple of games played, I have found that as it doesn’t matter how fast are we going to eat the fruit, we may choose to take the longest path to reach it instead of the shortest one, just to make sure that we do it safely.
python slitherin.py --longest_path
How to create a longest path?
Let’s consider the following game state on a 4×4 grid:
In orderd to find the longest path from B(←) to D:
- We start with creating the shortest path (BFS) from B to D which is BFGCD (keep in mind that our snake cannot go backwards).
- Then for each pair of consecutive nodes in the path, we try to extend the distance between them by inserting (marked with bold font) <=2 other unoccupied nodes while still maintaining the unbroken chain. When we cannot extend a given pair, we move to the next one
3. We terminate when we have tried to extend all consecutive pairs. Our longest path from B (←)to D is BAIEMNOPLKJFGCD
Longest Path’s results are comparable and definitely not significantly better than the ones from the Shortest Path algorithms.
Why is that?
We didn’t see any significant improvement, because while following the longest path to the fruit, our snake may leave its body on the path and accidentally crash into it after eating the fruit.
How to overcome this? Let’s move on to the next algorithm.
python slitherin.py --hamilton
In order to prevent our snake from hitting its body, we need to make sure that it can ‘escape’ after eating a fruit.
In order to do so, we have to create a cycle. A cycle that contains all nodes in the graph and visits each of them only once is called a Hamiltonian cycle, thus the name of this solver.
How can we create a Hamiltonian cycle in our game?
We need to create the longest path from the snake’s head to its tail (imaginary if the snake has a length of 1).
Let’s go back to our example when our snake’s head is at B and fruit is at D. In fact, we don’t care about the fruit’s position at all. Assuming that we are going to create a path that covers the whole board we are going to eat it anyway. If our head is at B and we are going ←, our (imaginary) tail must be at C.
Let’s create the longest path from B to C then.
As you can see above, we were able to create a Hamiltonian cycle from our grid state and if we make our snake follow this path we can be sure of getting a maximum score of 16 (4*4).
Hamilton solver gets a perfect score of width*height (100 in above example) thus solving a game in the majority of the cases. Creating a Hamiltonian cycle is not always possible as it depends on the initial snake position and its direction.
Although Hamilton solver proved to be successful in solving the game of snake, we don’t have to stop here. All the above solvers were in fact, the sets of rules for the snake to follow, explicitly designed by the human creator. What if we instead of predefining these rules ask our snake to learn them on its own?
Let’s proceed with the essence of AI – Machine Learning!
DNN – Deep Neural Net
python slitherin.py --deep_neural_net
Let’s start with dividing our simple machine learning flow into training and testing.
Training phase’s main objective is to gather training observations and ultimately train a neural network.
Let’s define our training in terms of a Markov chain. For each step:
- State – Return if terminal else observe the snake’s environment
- Action – Pick the random action
- Reward – For each action, get a reward
- State’ – After every action, get a new state
After each step, we are collecting an observation, which contains initial state, picked action and received reward.
After gathering desired number of observations we are going to fit our model with them.
State – In order for our snake to successfully learn. It should accurately perceive its environment. I have picked following features for each step to be extracted from the environment.
- ACTION – Index of an action for a forward pass. Available actions: [left, forward, right]
- LEFT_NEIGHBOR – Bool value whether the left neighbor is empty or not
- TOP_NEIGHBOR – Bool value whether the top neighbor is empty or not
- RIGHT_NEIGHBOR – Bool value whether the right neighbor is empty or not
- ANGLE_TO_FRUIT – Normalized (0, 1) inner angle to the fruit
Action – We let our snake explore and we pick a random action.
Reward – Each state gives us a reward value. Apropriate rewarding is very important and shapes the whole learning process. I’ve decided to give:
- 0.7 – for eating a fruit
- 0.1 – for going closer to the fruit
- -0.2 – for going away from the fruit
- -1.0 – for dying
Keep in mind that it’s important that a reward for a good move and a reward for a bad move shouldn’t even out. Otherwise, we risk making our snake running in circles.
After our training is done we can test the performance of our snake. For each testing step we have to make three forward passes for each possible action [left, forward, right] and then pick the action with the highest predicted value. After that, we can execute our selected action on the snake’s environment. We repeat this process until game’s termination.
I recommend interlacing training and testing phases in order to observe and monitor learning performance.
Last but not least, we need to design our neural network.
Neural Network Architecture
My neural network has a structure of:
- input layer with 5 neurons for [action_vector, left_neighbor, top_neighbor, right_neighbor, angle_to_fruit]
- hidden layer with 125 neurons with ReLu 6 activation
- fully connected layer with Adam optimizer and MSE loss function
- output layer with 1 neuron (prediction value for a given action)
Feel free to modify and tweak it as you want, as it’s a very basic design. It may also be beneficiary to modify it in such way to remove the action_vector from the input and in a result receive 3 output values instead of 1. Doing so we can reduce the number of forward passes for each prediction (from 3 to 1).
DNN’s performance is sub-optimal in the early stages. Our snake doesn’t go into cycles and points towards the fruit. However, with the current model structure, where it knows only about it’s closest surroundings, it doesn’t indicate any bigger picture perspective of the whole environment.
Let’s try feeding our snake with more information!
DNN – Monte Carlo
python slitherin.py --deep_neural_net_monte_carlo
Monte-Carlo tree search is as a heuristic search algorithm that is based on applying the most promising moves.
But how can we determine which one of the available moves is the most promising?
In order to do so, we can for each step:
- Execute a series of background gameplays (in this case DNN-driven)
- Group them by initial move
- Count an average final score for the initial move
- Pick the initial move with the highest average final score
Don’t worry if it’s not entirely clear right now, I’ll cover further details of Monte Carlo tree search in Part 2 of this tutorial.
DNN Monte Carlo Very is very slow and inefficient in the beginning, but relatively favorable in the late stages. DNN-driven simulations allow the snake to choose relatively wise long-term moves.
Even though domain specific solvers appeared to be successful, we cannot stop our Snake solving journey here. Stay tuned for Part 2 of this tutorial where I will cover general purpose solvers.
Questions? Comments? Feel free to leave your feedback in the comments section or contact me directly at https://gsurma.github.io.
Source: Deep Learning on Medium