Game Playing AI

Original article was published on Artificial Intelligence on Medium

AI researchers spent a lot of time trying to teach AI how to beat humans at games, and this isn’t just because games are fun. Games provide constrained scenarios for testing new AI algorithms and approaches. In a game, it’s easy to know whether the AI is winning or losing because there’s usually a score or some objective measure of winning. This is great because AI learns from examples, trying things out, and slowly improving. Games basically provide their own training data, which is a big relief, because AI systems need lots of training data to get really good.

An AI can even play games itself to generate training data and involve better game strategies, or an AI can be programmed to look at previous games, even games played by expert humans for strategies that lead to victory. Comparing AIs against expert human gamers can also help us figure out how an AI is improving over time. This comparison also gives us a sense of the difficulty of problems an AI can solve.

This is the reason why some of my first AI demos were games.

Chess & Checkers

Getting a machine to play chess is actually really complicated, so when AI researchers first tried to tackle the problem in the late 1940s and early 1950s, they focused on simpler chess situations, like games with only a few pieces remaining on the board or full games played on a small 6×6 board without any bishops.

At the same time, researchers worked on an AI that could play checkers, because checkers looked easier, although it was almost as complicated. The first program to play a legitimate game of checkers was built by IBM in 1956 and in a classic Cold War move, two programs that could play a full game of chess were developed in parallel by the US and Russia in 1957, but these programs didn’t get good for another 40 years. Checkers was first, with a program called Chinook, which started dominating masters in 1995, and chess followed when a computer called Deep Blue beat the chess master Garry Kasparov in 1997.

Since then, strategy games have been mastered one by one, with the most recent victories over humans at Go in 2017, Dota 2 in 2018, and Starcraft 2 in 2019.

So, let’s start with a really simple goal, like teaching a Bot here how to play Tic-Tac-Toe. One of the ways we can think about playing Tic-Tac-Toe is as a tree with all the possible moves of any given state of what the gameboard looks like.

For example, if this is the current game state, it’s The Bot’s turn and he’s using Xs, there are three places he can go. We can draw a tree representing all possible outcomes for each of these options, and all the options his opponent, me, or anyone else, can take. Because computers think with numbers, each outcome can be assigned a reward. A number like a 1 for a win and -1 for a loss or a tie. Basically, The Bot will need to search through the tree of possibilities to find his win.

To decide which choice to make, The Bot will assume that in each tree, both he and his opponent will make the best possible choices.

Minimax Algorithm

In other words, his policy or his framework for making decisions will alternate between choosing the branch that will maximize the outcome of winning on his turn and minimize the outcome of his opponent winning on their turn. This is called the minimax algorithm. Then, each game state can be assigned a value based on how likely it leads to the Bot winning and he can decide which move to make based on his policy.

Looking at this tree, the Bot will always pick option 1.0 and win the game. Of course, this is a pretty small tree because we were looking at a game already in progress. To draw the whole Tic-Tac-Toe tree from beginning to end, we’d need to represent about 250,000 boards. Now, that seems like a lot, but it would take like, half a second for a powerful modern computer to compute these many options. By laying out all the possibilities and taking the paths that led to a win, the Bot can solve Tic-Tac-Toe. This means that the Bot will always achieve the best possible outcome, either a win or tie, no matter how his opponent plays.

But we can’t solve all games this way. Checkers, for example, has about 10^20 board states, or 10 followed by 20 zeros. That’s more board states than there are grains of sand on Earth. Chess has 10^50 board states, and Go has 10^250 board states. To put those huge numbers into perspective, there are only 10^80 atoms in the entire known universe. Computer scientists have theorized that it would be impossible for conventional computers to calculate these many states due to the laws of physics.

Like, for example, if you combined all of the planets and stars and everything in the whole universe into a single supercomputer, it still wouldn’t be powerful enough to solve the game of Go, but some people have hope that quantum computers may be able to get there someday. So, if figuring out all the board states could be mathematically impossible, how did computers beat the number one ranked human masters in Chess and Go?

Monte Carlo Tree Search

Many modern systems, including Google’s AlphaGo computer that beat a human master in Go in 2017 use an algorithm called Monte Carlo Tree Search. Monte Carlo is a famous casino, so whenever you see the term Monte Carlo, it’s a good bet that the algorithm will be using randomness and chance to solve a problem.

Combining Monte Carlo randomness and regular tree search like Minimax, modern game AIs decide which part of the huge tree to search by guessing at odds. Basically, they want higher odds at the part of the tree they search will lead to a win, but these aren’t just random guesses like we would make in many casino games. AI systems simulate millions of what-if scenarios and use math to estimate the likelihood of winning if they choose one path or another.

In each what-if scenario, the AI considers making one particular move and then simulates playing a large number of, but not all, possible games where the next moves are chosen at random. By averaging these possible outcomes, the AI estimates how good that particular move is. It’s so much faster to estimate a handful of choices than exhaustively calculate each branch of the game tree, and some computers can even do this estimation in real-time.

One example fo this is Google’s Deepmind, which defeated human professional players at Starcraft 2 in 2019, where time is very critical. Of course, Starcraft 2, Go, and Tic-Tac-Toe, aren’t all the types of games that humans play. Other games require other strategies and have other computational challenges.

IBM’s Watson’s question answering system was able to beat human Jeopardy champions in two televised matches in 2011. Watson listened for keywords in the clue and tried to use a knowledge graph to figure out responses. Watson wasn’t perfect and struggled a bit with context. For example, it famously guessed ‘What is Toronto?’ on something in the category US Cities, but Watson was still able to do better than human contestants overall.

Evolutionary neural networks

Evolutionary neural networks use the environment as an input, like reinforcement learning, but this approach introduces multiple agents who try multiple neural network structures and then build on successful ones for the next generation, sorta like animals. The ones that are better at surviving get to pass on their genes. For example, the AI can learn how to play a Super Mario World level by telling what buttons it can push and that getting further to the right in the level is good. This AI will start by basically mashing buttons at random, but as some button mashes get it further to the right, it remembers and learns from those successful attempts.

Where AI cannot beat humans?

Computers definitely seem to struggle with parts of a language, like humor, irony, metaphor, and wordplay. Computers also aren’t great at understanding and predicting real people who don’t always act optimally, so social games could be more difficult, too, but AI systems are finding some success in bluffing games like online poker, so it’s important to not underestimate them.

Computers might also struggle with creativity or surprise because there’s not a really clear way to assign values of states. It’s really difficult to assign a number to how creative something is compared to saying ‘go as far right as you can’ in the Mario level or ‘achieve a winning state’ in a game of chess. So considering all of that, maybe games like charades will be pretty stacked for human victory, or what about Pictionary? Maybe hide and seek?