Original article was published by Marcos Carlomagno on Deep Learning on Medium
After a few headaches, I was finally able to create a standalone game environment that we can interact with through three functions.
The first function starts a new fight; it will be useful for restarting the fight on every iteration of our reinforcement learning algorithm.
The getState function returns the current state of the game (the position and life of each fighter) for a given index, where
index === 0 represents Player 1 (Subzero) and
index === 1 represents Player 2 (Kano).
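As a sketch of the shape this state might take (the field names and values here are my assumptions, not the actual API):

```javascript
// Hypothetical environment state: each fighter exposes a position and
// remaining life. Names and numbers are illustrative only.
const fighters = [
  { name: 'Subzero', position: { x: 100, y: 0 }, life: 100 }, // Player 1
  { name: 'Kano',    position: { x: 400, y: 0 }, life: 100 }, // Player 2
];

// getState(index) returns the current state of one fighter:
// index === 0 → Player 1 (Subzero), index === 1 → Player 2 (Kano).
function getState(index) {
  const f = fighters[index];
  return { position: { ...f.position }, life: f.life };
}

console.log(getState(0)); // current state of Subzero
```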
This function is useful for updating the current state of the environment during agent training; for instance, if the opponent gets close to you, you should probably evade or attack.
This is probably the most important one. To let the agent interact with the environment and generate actions that produce a reward or punishment, we need an action callback. In this case, the parameter is an "action" represented by an ASCII keyboard code; these codes are mapped as follows.
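As an illustration of such a mapping, here is one along these lines; the specific key codes and action names are my guesses, not necessarily the article's:

```javascript
// Illustrative key-code → action mapping. The codes below are standard
// browser keyCode values, but the actual mapping in the game may differ.
const KEY_TO_ACTION = {
  37: 'MOVE_LEFT',   // ← arrow
  39: 'MOVE_RIGHT',  // → arrow
  38: 'JUMP',        // ↑ arrow
  40: 'CROUCH',      // ↓ arrow
  65: 'PUNCH',       // 'A'
  83: 'KICK',        // 'S'
};

console.log(KEY_TO_ACTION[37]); // 'MOVE_LEFT'
```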
The action will then be executed using the following function.
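A minimal sketch of what executing an action could look like: in the browser the game listens for keydown events, so the callback can simulate one. Since we have no DOM here, this sketch uses a plain handler registry in place of the real event system; all names are hypothetical.

```javascript
// Stand-in for the game's keyboard listener registration.
const handlers = [];
function onKeyDown(fn) { handlers.push(fn); }

// Execute an action by notifying every listener, as a synthetic
// keydown dispatch would in the browser.
function action(keyCode) {
  handlers.forEach((fn) => fn({ keyCode }));
}

// Example: the game registers a handler that records the last key pressed.
let lastKey = null;
onKeyDown((e) => { lastKey = e.keyCode; });
action(37);
console.log(lastKey); // 37
```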
Once the integration was done and all the UI functionality discarded, we can start the training process.
If I were to train the agent by playing against it myself, I would spend hours playing the game blindly (without a UI) and programmatically; but if I train it against a player who doesn't move, the agent will simply learn to get close and hit.
So the solution is … more machine learning: two neural networks that fight each other and share a memory of the game. Many reinforcement learning systems, such as AlphaZero, use this self-play approach to train the model.
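A minimal sketch of that self-play idea: two agents act in turn and push their transitions into a single shared replay memory. Everything here is illustrative; the real agents would be neural networks, and these names are my own.

```javascript
// Two self-play agents sharing one replay memory of (state, action,
// reward, nextState) transitions.
const sharedMemory = [];

function makeAgent(id) {
  return {
    id,
    // Placeholder policy: pick a random action from a tiny action set.
    act(state) {
      const actions = ['MOVE_LEFT', 'MOVE_RIGHT', 'PUNCH'];
      return actions[Math.floor(Math.random() * actions.length)];
    },
    remember(state, action, reward, nextState) {
      sharedMemory.push({ agent: this.id, state, action, reward, nextState });
    },
  };
}

const agentA = makeAgent(0);
const agentB = makeAgent(1);

// One simulated step for each agent against a stub environment.
for (const agent of [agentA, agentB]) {
  const state = { distance: 10 };
  const action = agent.act(state);
  const reward = 0; // the environment would compute this from life deltas
  agent.remember(state, action, reward, { distance: 9 });
}

console.log(sharedMemory.length); // 2
```

Sharing one memory means each network also learns from situations its opponent encountered, which speeds up early training.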
Finally, our reinforcement learning diagram looks like this:
Testing the game
The first test I ran to verify that everything works is a simple fight simulation: I bring the agents closer together, and once they are close enough I make them fight until one wins. It's dummy behavior, but it works fine.
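That dummy behavior can be sketched as a simple conditional policy: approach until within range, then attack. The positions, range threshold, and action names below are made up for illustration.

```javascript
// Scripted "dummy" behavior: walk toward the opponent, attack when close.
const ATTACK_RANGE = 50; // illustrative threshold

function dummyPolicy(self, opponent) {
  const distance = Math.abs(self.x - opponent.x);
  if (distance > ATTACK_RANGE) {
    // Too far away: step toward the opponent.
    return self.x < opponent.x ? 'MOVE_RIGHT' : 'MOVE_LEFT';
  }
  // Within range: attack.
  return 'PUNCH';
}

console.log(dummyPolicy({ x: 0 }, { x: 300 }));   // 'MOVE_RIGHT'
console.log(dummyPolicy({ x: 280 }, { x: 300 })); // 'PUNCH'
```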
Once we create two agents that interact with the environment, the next thing we must do is make their actions be evaluated according to the state of the environment and their history of (action, reward) pairs, rather than an arbitrary conditional.
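One natural way to score each action, assuming the state exposes each fighter's life, is to reward damage dealt and punish damage taken. This is a common reward-shaping choice, not necessarily the one the article uses, and the field names are hypothetical.

```javascript
// Sketch of a reward signal derived from the life deltas between two
// consecutive states: positive for damage dealt, negative for damage taken.
function computeReward(prev, next) {
  const damageDealt = prev.opponentLife - next.opponentLife;
  const damageTaken = prev.selfLife - next.selfLife;
  return damageDealt - damageTaken;
}

// Landing a hit that removes 10 of the opponent's life yields +10.
console.log(computeReward(
  { selfLife: 100, opponentLife: 100 },
  { selfLife: 100, opponentLife: 90 },
)); // 10
```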
Creating an environment for agents to interact with is an arduous task that requires a lot of time and effort, so it's always recommended (when possible) to use environments that already exist, or variants of them.
In the next article I will develop the deep Q-learning algorithm, so that our agent can learn to fight without predefined rules.
Then I will analyze the results to see whether the learning really works and produces emergent behaviors that maximize the agent's chance of winning a fight.