Playing Snake with AI

Original article can be found here (source): Artificial Intelligence on Medium

The Game

The arcade classic, Snake, is a single player, grid-based, game that challenges a player to grow a snake as long as possible. The player guides the snake around the grid in search of apples, and each time the snake comes in contact with an apple, its body grows one unit in length. However, if the snake collides with its own body or the boundary of the grid, the game ends. As the snake grows, it becomes more difficult to maneuver around its own body and eat apples. The objective is to grow the snake as long as possible (i.e. eat as many apples as possible) before colliding with itself or the wall.

The Project

As people, we can develop different strategies for playing the game well. For example, we could head right for the apple and try to stay far away from the rest of the snake’s body or we could zig-zag across the screen to maximize the amount of free space on the screen where we can move.

If we can intuitively come up with successful strategies for Snake, we should be able to encode these strategies into algorithms in order to build AIs to play the game. The goal of this project is to implement and analyze various algorithms, from the extremely naive to the more complex, in order to maximize a score in the game.

The Implementation

First, I’ll give credit where credit is due. Thanks to techwithtim for providing the base code to play the game, so I could just start getting my hands dirty with the AI. I did have to change a few of his design choices to make the game less complex.

Our game is a 20×20 grid where each snake body unit and apple take up one space. The snake can only move NSWE one block at a time. When the snake makes contact with a wall or itself, the game resets. Unlike other implementations, the snake cannot wrap from one side of the screen to the other. When the snake dies, it reappears randomly on the grid with length 1. When the snake eats the apple, another apple will appear randomly on a space in the grid where the snake is not.

In a perfect game, the snake would eat 400 apples and be 400 units long (20×20 grid). However, due to the random reappearance of the apple, it’s unlikely that the snake would be able to move in a pattern that could produce a perfect game. We can now redefine our goal more precisely. Find an algorithm that maximizes the average score on a scale from [0, 400].

The Algorithms

This section is focuses on deterministic algorithms to maximize the score. Deterministic algorithms will always produce the same output for a given input. In other words, we encode the decision making logic into the algorithm (e.g. don’t turn into your body, take the shortest path to the apple). When given a snake position (list of cells occupied by the snake) and apple position and a deterministic algorithm, the snake will always take the same path to the apple.

For each algorithm, I’ll provide some intuition, issues, gameplay, and data collected from different runs.

Deterministic Algorithms: Encoding the decision making logic

1. Random: A stupid approach

A completely randomized algorithm is the simplest, and least successful, method for an AI to play the game. Given a snake position (list of cells occupied by the snake), the AI randomly moves a valid cell. This idea doesn’t make use of all the information provided (the location of the apple), and the gameplay looks, well random.

The majority of the time, the games ends with a score of 1. Occasionally it will end with a score of 2.

Completely randomized algorithm.

The data from the trials isn’t surprising.

Number of Runs: 64
Min: 1.000
Q1: 1.000
Median: 1.000
Q3: 1.000
Max: 3.000
Mean: 1.078
The 95.0% confidence interval of the mean score is between 0.997 and 1.159.

2. Shortest Path: A bit better

After using no extra information in the randomized approach, it would make sense as a next step to use all the information available. In this next algorithm, we’ll use the position of the apple to guide the snake in the right direction.

At each step, the AI will calculate the manhattan distance between the snake and the fruit for all NSWE squares (i.e. for all possible moves). The manhattan distance is simply the number of NSWE moves it would take to move from A to B. The AI chooses the direction with the lowest distance from the apple and moves that way. It recalculates the distance for all 1-move adjacent squares and moves again. This process continues until it either finds the fruit or dies. This algorithm is guaranteed to find a shortest path from the snake head to the apple because it always moves in the direction that is closest to the apple. You should be able to notice this in the GIF below.

The scores for this algorithm are noticeably greater.

Number of Runs: 53
Min: 3.000
Q1: 9.000
Median: 13.000
Q3: 15.000
Max: 31.000
Mean: 12.962
The 95.0% confidence interval of the mean score is between 11.375 and 14.549.

Although the algorithm is better, it still makes silly mistakes. For example, If the shortest path from the head to the apple includes the snake’s body, it will still take that path, and the game will end. See an example below.

We’ll modify our shortest path algorithm so that the snake will never choose a direction where it will collide with its own body.

2.1 Shortest Path: No self collisions

This is the same algorithm as above, but the snake will never turn and collide with its own body. If the snake traps itself in a box, it will have to collide with its body and the game will end. You’ll see that visually the algorithm produces some interesting maneuvers to get the snake out of sticky situations. I wasn’t able to capture most of the maneuvers in the GIF, so I recommend that you clone the repo and test this one out locally.

As predicted, when the snake avoids self collisions, we get a better score.

Number of Runs: 50
Min: 8.000
Q1: 26.250
Median: 34.000
Q3: 39.000
Max: 57.000
Mean: 33.720
The 95.0% confidence interval of the mean score is between 30.813 and 36.627.

There is a peculiarity in this algorithm that is the result of the order we choose to search the squares’ distances from the apple. First take a look at the GIF below. We have what appears to be a runaway snake. When the snake reaches the same row as the apple, it turns left and continues into wall instead of finding a path to the apple.

Here it is again slowed down frame by frame. Keep in mind two things. First, the snake will not move in a direction where its body already is. Second, the snake looks at blocks in the order of RIGHT, LEFT, UP, DOWN when deciding which way to move. Take a look at the first frame. The only two directions the snake will consider are LEFT and DOWN. Because LEFT and DOWN are the same distance from the fruit, the algorithm has to make a decision about which direction to choose. This decision can introduce bias in the algorithm. In this version, the algorithm chooses the first direction it sees, in this case LEFT.

After the snake chooses to move left, a similar situation arises. The snake looks at blocks RIGHT, LEFT, UP, DOWN and knows it can only go LEFT, UP, or DOWN. Once again because all three directions have the same manhattan distance from the apple, the algorithm has to make an arbitrary choice. In this version, it chooses the first thing it sees, so the snake moves left. This situation repeats until the snake collides with the wall

We can say that the run away snake (RAS) phenomenon happens when there is only one adjacent block the snake can move to that is closer to the apple, but that block is occupied by the snakes own body. The snake then has to make a decision to move between 2+ apple, equidistant blocks. The different ways to choose between those blocks leads to different flavors of the algorithm. We’ll see if any of these different flavors can improve the average score.

2.2 Shortest Path: No self collision, random RAS choice

Instead of always choosing the first seen block from a list of apple equidistant, we can choose a random block. This will prevent, with a high probability, the snake from choosing the same direction over and over until it runs off the screen (as shown above). Instead, the random movement may move the snake to a block where the shortest distance path to the apple isn’t blocked by its body.

It’s clear the random movement may help improve the game score in this specific case, but it is unclear at the moment if the random movement will improve the average score for the algorithm. This depends on how often the RAS situation occurs within the game. The math to determine that is above my pay grade, so we’ll cheat and use statistics. But first, here are the baseline stats for this version of Shortest Path: No self collision.

Number of Runs: 50
Min: 4.000
Q1: 25.000
Median: 35.000
Q3: 44.500
Max: 67.000
Mean: 35.060
The 95.0% confidence interval of the mean score is between 31.078 and 39.042.

From the baseline stats, it appears that the new flavor of Shortest Path performs marginally better. We’ll calculate a T-test for the means of the two independent algorithms.

Ttest_indResult(statistic=0.5461821125601, pvalue=0.5861820444893)

Because the p-value is high, we cannot reject the null hypothesis of identical scores. We cannot conclude there is a difference between the mean scores of the two algorithms. Statistically speaking, we can’t say our randomized flavor is any better than just choosing the first block we see.

3. Hamiltonian Cycles (more like brute force)

If algorithm runtime isn’t an issue, we can build a bot that is guaranteed to score a perfect game. We’ll use a concept called a hamiltonian cycle. In a hamiltonian cycle, the snake will visit all spaces in the grid only once before returning to any single space. This guarantees the snake will eat at least one apple in each cycle and the snake will only run into its tail when the board is full.

In the algorithm, we’ll hard code a known hamiltonian cycle. The snake will zig zag across the screen then use the top row to return to the upper, left space to repeat the zig zag.

You can see that in the beginning it takes a while for the snake to find the apple. As the snake length grows longer, the number of available spaces for the apple to spawn decreases. Because of that, the snake finds the apple faster and the snake’s growth rate accelerates.

There’s no need for stats for this version. By repeating hamiltonian cycles, the snake is guaranteed to get a perfect score.

Conclusion

To wrap things up, we looked at different deterministic approaches to make bots to play the snake game. The random algorithm is useless. Using a clever shortest path algorithm is quick and gives some pretty good scores when compared to the random one or even when compared to a human. The best algorithm uses hamiltonian cycles and is guaranteed to win (but is slowww).

Other approaches involve using non-deterministic algorithms to play the game. Instead of encoding the decision logic, the bot learns on its own. Keep an eye out for some of these in another article.