Original article can be found here (source): Deep Learning on Medium
In the last two articles, you learned to use ML-Agents and trained two agents. The first was able to jump over walls, and the second learned to destroy a pyramid to get the golden brick. It’s time to do something harder.
When I was thinking about creating a custom environment, I remembered the famous scene in Indiana Jones, where Indy needs to get the golden statue and avoid a lot of traps to survive.
I wondered: could my agent be as good as him? Spoiler alert: it is, as you can see in this replay video!
That’s why, over the last two weeks, I developed an open-source reinforcement learning environment in Unity called the Mayan Adventure: a dangerous, modular environment full of traps. To train the agent, I used a learning strategy called curriculum learning.
So today, you’ll learn about curriculum learning, and you’ll train our agent to reach the golden statue and beat the game.
So let’s get started!
Introducing the Mayan Adventure
As I said, the Mayan Adventure is an open-source reinforcement learning environment for ML-Agents.
In this environment, you train your agent (Indie) to beat every dangerous trap to get the golden statue.
Early in the learning process, our agent learns to avoid falling off the platform and to reach the golden statue. As it gets better, we add difficulty: thanks to two pressure buttons, the agent can transform itself into rock or wood.
It needs to transform itself into wood to cross the wooden bridge; otherwise, the bridge will collapse.
Then it needs to turn into rock to cross the fire level.
The reward system is:
In terms of observations, our agent does not use computer vision but ray perception sensors. Think of them as lasers that report which kind of object they hit.
We used two of them. Both detect the safe platform, the to-rock button, the to-wood button, fire, the wood bridge, the goal, and the gem. The second one also detects the void, which lets the agent learn to avoid falling out of the environment.
Moreover, we added 3 vector observations: a boolean indicating whether the agent is rock, and the agent’s x and z velocity.
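To make these observations more concrete, here is a minimal Python sketch of how a single ray and the 3 extra values could be turned into numbers. It is only an illustration: the tag strings, the function names, and the exact layout (a one-hot tag, a "nothing detected" flag, then a normalized hit distance) are assumptions, not the project's actual code.

```python
from typing import List, Optional, Sequence

# Hypothetical tag labels for the objects listed above.
FIRST_SENSOR_TAGS = ["safe_platform", "to_rock_button", "to_wood_button",
                     "fire", "wood_bridge", "goal", "gem"]
# The second sensor also detects the void.
SECOND_SENSOR_TAGS = FIRST_SENSOR_TAGS + ["void"]


def encode_ray(tags: Sequence[str], hit_tag: Optional[str],
               hit_fraction: float = 1.0) -> List[float]:
    """Encode one ray as: one-hot tag, 'nothing detected' flag, hit distance."""
    encoding = [0.0] * (len(tags) + 2)
    if hit_tag in tags:
        encoding[list(tags).index(hit_tag)] = 1.0
        encoding[-1] = hit_fraction  # distance as a fraction of the ray length
    else:
        encoding[-2] = 1.0  # the ray hit nothing this sensor can detect
        encoding[-1] = 1.0  # treat it as a full-length ray
    return encoding


def vector_observation(is_rock: bool, velocity_x: float,
                       velocity_z: float) -> List[float]:
    """The 3 extra observations: rock flag, then x and z velocity."""
    return [1.0 if is_rock else 0.0, velocity_x, velocity_z]


fire_ray = encode_ray(FIRST_SENSOR_TAGS, "fire", hit_fraction=0.25)
extra = vector_observation(is_rock=True, velocity_x=0.3, velocity_z=-0.1)
```

In this sketch, a ray pointing at the void looks like "nothing detected" to the first sensor, while the second sensor reports it explicitly. That is why only the second one helps the agent learn not to fall.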
The action space is discrete with:
Under the hood: how was it done?
The Mayan Adventure started with this prototyped version.
But it was not really appealing. To create the final environment, we used several free packages:
- 3D Game Kit: a fantastic environment created by Unity. I used its rock platforms, buttons, vegetation, and pillars.
- Unity Particle Pack: I used it for the fire system.
- Creator Kit: Puzzle: I used it only for the fireworks particle effect shown when you win.
- The other elements (the wood bridge and its animation, rock heads, golden statue, fedora, etc.) were made in Blender.
I designed the project to be as modular as possible, so you will be able to create new levels and new obstacles. Moreover, this is still a work in progress: corrections and improvements will be made, and new levels will be added.
To help me write the environment code, I followed the very good course made by Immersive Limit on Unity Learn.
The question you may ask now is: how are we going to train this agent?
Let’s train Indie
To train Indie, we’re going to use PPO with a Curriculum Learning strategy. If you don’t know what PPO is, you can check my article.
What is Curriculum Learning?
Curriculum learning is a reinforcement learning technique to better train an agent when it needs to learn a complicated task.
Suppose you’re 6 years old, and I tell you that we’re going to start learning multivariable calculus. You’ll be overwhelmed and unable to do it, because it is too hard to begin with: you’ll fail.
A better strategy would be to learn simple mathematics first and then add complexity as you get better with the basics, so that, in the end, you can complete this advanced course.
So you start with arithmetic; when you’re good at it, you continue with algebra, then complex algebra, then calculus, and finally multivariable calculus. Thanks to this strategy, you’ll be able to succeed in multivariable calculus.
This is the same strategy we’re going to use to train our agent. Instead of giving our agent the whole environment at once, we train it by adding a level of difficulty as it gets better.
To do that in ML-Agents, we need to specify our curriculum: a YAML config file that specifies when to change environment parameters (in our case, increase the level) based on some metric (here, the average cumulative reward). Think of this curriculum as a syllabus that our agent needs to follow.
The curriculum goes like this:
- In the first level, the agent needs to learn to get the Golden Statue and avoid falling off the platform.
- Then, in the second level, the agent needs to interact with the physics buttons and turn itself into wood to cross the big wood bridge.
- In the third level, the agent needs to transform itself into rock in order to cross the fire.
- Finally, in the last level, the agent needs to learn to transform itself into wood and to cross a slimmer bridge without falling from it.
Let’s get this golden statue!
If you don’t want to train the agent yourself, the trained models are in the Unity project, in the folder “The Mayan Adventure Saved Models”.
Now that we understand how the Mayan Adventure environment and curriculum learning work, let’s train our agent to beat the game.
The code is divided into 2 main scripts:
- MayanAdventureArea.cs: controls the level generation and the agent and goal spawn.
- MayanAdventureAgent.cs: controls the agent’s movement, handles events (what happens when you’re rock and you’re on the wood bridge, etc.), and implements the reward system.
First, we need to define our curriculum.
To do that, go to the config/curricula folder inside your ML-Agents folder, create a new folder called mayanAdventure, and inside it create your MayanAdventureLearning.yaml file.
- The key indicator we will use to measure the progress of our agent is the reward.
- Then, in the thresholds section, we define the average reward needed to go to the next level. For instance, to go to level 1, our agent needs to have an average cumulative reward of 0.1.
- The min_lesson_length specifies the minimum number of episodes the agent must complete before changing level. It reduces the risk that our agent gets lucky during one episode and changes level too fast.
- Finally, we define the parameter to change, which here is the level number.
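To see how these settings work together, here is a small Python sketch of the lesson-switching logic. It is a simplified illustration, not ML-Agents' actual implementation: the Curriculum class and its method are hypothetical, the average is taken over the last min_lesson_length episodes only, and every value except the 0.1 threshold for level 1 is made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Curriculum:
    thresholds: List[float]   # average reward needed to leave each lesson
    min_lesson_length: int    # minimum episodes before a lesson change
    levels: List[int]         # value of the "level" parameter per lesson
    lesson: int = 0
    episode_rewards: List[float] = field(default_factory=list)

    def record_episode(self, cumulative_reward: float) -> int:
        """Store one episode reward and return the level to use next."""
        self.episode_rewards.append(cumulative_reward)
        enough_episodes = len(self.episode_rewards) >= self.min_lesson_length
        has_next_lesson = self.lesson < len(self.thresholds)
        if enough_episodes and has_next_lesson:
            recent = self.episode_rewards[-self.min_lesson_length:]
            average = sum(recent) / len(recent)
            if average >= self.thresholds[self.lesson]:
                self.lesson += 1
                self.episode_rewards.clear()  # start counting again
        return self.levels[self.lesson]


# Only the first threshold (0.1) comes from the article; the rest is made up.
curriculum = Curriculum(thresholds=[0.1, 0.3, 0.5],
                        min_lesson_length=3,
                        levels=[0, 1, 2, 3])
first = curriculum.record_episode(1.0)   # one lucky episode: still level 0
second = curriculum.record_episode(1.0)  # still below the minimum length
third = curriculum.record_episode(1.0)   # 3 episodes, average >= 0.1: level 1
```

Notice that a single high reward is not enough to change level: the agent has to complete min_lesson_length episodes first, which is exactly the "lucky episode" protection described above.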
Now that we’ve defined our curriculum, we can configure our agent. Our agent is an instance of a prefab. Consequently, to modify all instances at once, we’ll modify the prefab directly.
In the MayanAdventureArea prefab, check that Training is true. We created this bool to differentiate the training phase from the testing phase, where we added some events such as activating a winning panel, playing the wood bridge destruction animation, and displaying the fireworks when you win.
Then, in the prefab, go to the Agent. First, in Behavior Parameters, remove the model if there is one.
After that, you need to define the number of stacked observation vectors. This depends on whether you use a recurrent memory (which is defined in the trainer config): if not, you should stack 4 to 6 frames; if yes, only 3.
It’s important that both the Vector Observations and the Ray Perception observations have the same stack number.
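If stacking is new to you, this short Python sketch shows the idea: the agent receives the last few observations concatenated together, which gives it a short-term memory. The ObservationStacker class is hypothetical, and filling the first frames with zeros is an assumption of this sketch.

```python
from collections import deque
from typing import Deque, List


class ObservationStacker:
    """Keep the last `num_stacks` observations and return them concatenated."""

    def __init__(self, obs_size: int, num_stacks: int) -> None:
        self.obs_size = obs_size
        # Before any real observation arrives, the stack holds zero frames.
        self.frames: Deque[List[float]] = deque(
            [[0.0] * obs_size for _ in range(num_stacks)],
            maxlen=num_stacks,
        )

    def push(self, observation: List[float]) -> List[float]:
        if len(observation) != self.obs_size:
            raise ValueError("unexpected observation size")
        self.frames.append(list(observation))  # the oldest frame is dropped
        # Oldest frame first, newest frame last.
        return [value for frame in self.frames for value in frame]


# 3 vector observations (rock flag, x velocity, z velocity), stacked 4 times.
stacker = ObservationStacker(obs_size=3, num_stacks=4)
stacked = stacker.push([1.0, 0.5, -0.2])
```

With 3 values stacked 4 times, the network sees 12 values per step. This is also why a recurrent memory lets you use a smaller stack: the memory already carries information from previous steps.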
Now we can define the hyperparameters. This is the configuration I wrote that gave me the best results.
Finally, don’t forget to deactivate the Main Camera, which is there for replay purposes only.
We’re now ready to train. Open your terminal, navigate to your ml-agents-master folder, and type this:
mlagents-learn ./config/trainer_config.yaml --curriculum=config/curricula/mayanAdventure/MayanAdventureLearning.yaml --run-id TheMayanAdventure_beta_train --train
Here, we defined:
- Where the trainer_config is: ./config/trainer_config.yaml
- Our curriculum: --curriculum=config/curricula/mayanAdventure/MayanAdventureLearning.yaml
- The id of this training: --run-id TheMayanAdventure_beta_train
- And don’t forget the --train flag.
It will ask you to run the Unity scene.
Press the ▶️ button at the top of the Editor.
You can monitor your training by launching Tensorboard using this command:
tensorboard --logdir=summaries
Before obtaining good results, I ran about 20 trainings to find a good set of hyperparameters.
I provide the two best trained models: the first with a recurrent memory, the other without. The training took me about 2h10 with 30 parallel environments, on CPU only, on a MacBook Pro with an Intel Core i5.
We see that the two agents have roughly the same results, with the training without memory being slightly better. This is the one I used for the video recording.
Now that you’ve trained the agent, you need to move the saved model files contained in ml-agents-master/models to the Mayan Adventure Saved Models folder of the Unity project.
Then, you need to deactivate all the instances of the MayanAdventureArea except MayanAdventureArea (1).
In fact, just as classical deep reinforcement learning launches multiple instances of a game (for instance, 128 parallel environments), we do the same here by copying and pasting the agents, in order to collect more varied states. But we need only one for the replay.
And don’t forget to activate the Main Camera.
Now, you need to go back to the MayanAdventureArea prefab and deselect Training.
Finally, in the Agent’s Behavior Parameters, drag the model file to the Model placeholder.
Then, press the ▶️ button at the top of the Editor and voila!
If you want to record your results, you just need to go to Window>General>Recorder>RecorderWindow and click on Start Recording with these parameters:
The Next Steps
The Mayan Adventure is a work in progress: corrections and improvements will be made, and new levels will be added. Here are some of the next steps we’re going to take.
Ray casts may not be sufficient: we need to give the agent vision
What we discovered during training is that our agent is good, but some challenges will definitely require vision.
The “problem” with vision is that it greatly increases the size of the observations. It means that the next version will be trained on GPU only, instead of CPU.
The idea could be to use an orthographic top-down view of the environment as input, like in Unity’s GridWorld example.
New Levels on the row and timed events
Because the Mayan Environment is an RL research environment, we want to add more complex levels to train our agents to learn long-term strategies.
Consequently, in addition to working on the vision version, we’re currently adding more complex levels, such as the rolling ball trap.
We’re also adding timed events, such as turning the fire level on and off every 3 seconds.
Adding randomness in the generation of the level
Currently, our generator always outputs the same order for the levels. We want to improve that by adding some randomness in the level generation process.
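As a rough idea of what this could look like, here is a minimal Python sketch that shuffles the order of the levels. It is only one possible approach, not the project's generator: the function name and the level names are hypothetical, and the optional seed is there so a given order can be reproduced.

```python
import random
from typing import List, Optional


def randomized_level_order(levels: List[str],
                           seed: Optional[int] = None) -> List[str]:
    """Return a shuffled copy of the levels, leaving the input untouched."""
    rng = random.Random(seed)  # a local generator, independent of global state
    order = list(levels)
    rng.shuffle(order)
    return order


# Hypothetical level names, for illustration only.
levels = ["wood_bridge", "fire", "slim_bridge"]
order = randomized_level_order(levels, seed=0)
```

Using a seed makes it possible to replay exactly the same sequence of levels, which is handy when comparing two training runs.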