Reinforcement Learning on Ball Balancer Game from Unity

Original article was published by Aniruddha Choudhury on Artificial Intelligence on Medium


Designing and deploying cutting-edge AI solutions for manufacturing environments is a complicated process. Let’s see how we can build an RL environment for the 3D Ball Balancer.

Reinforcement Learning:

RL is a more complex and challenging approach to put into practice, but at its core it is about learning through interaction and feedback: the agent learns to solve a task by trial and error, acting in an environment and receiving rewards for its actions.

Reinforcement Learning: Key Terms
  1. Agent — the learner and the decision maker.
  2. Environment — where the agent learns and decides what actions to perform.
  3. Action — a move the agent can make. The set of all possible actions is called the action space.
  4. State — the state of the agent in the environment.
  5. Reward — for each action the agent selects, the environment returns a reward, usually a scalar value.
  6. Policy — the decision-making function (control strategy) of the agent, which represents a mapping from situations to actions.
  7. Value function — a mapping from states to real numbers, where the value of a state is the expected long-term (cumulative) reward obtained by starting from that state and following a particular policy.
  8. Function approximator — a method for inducing a function from training examples. Standard approximators include decision trees, neural networks, and nearest-neighbor methods.
  9. Markov decision process (MDP) — a probabilistic model of a sequential decision problem, where states can be perceived exactly, and the current state and action selected determine a probability distribution over future states. Essentially, the outcome of applying an action to a state depends only on the current action and state (and not on preceding actions or states).
  10. Dynamic programming (DP) — a class of solution methods for solving sequential decision problems with a compositional cost structure. Richard Bellman was one of the principal founders of this approach.
  11. Monte Carlo methods — a class of methods for learning value functions that estimate the value of a state by running many trials starting at that state and then averaging the total rewards received on those trials.
  12. Temporal Difference (TD) algorithms — a class of learning methods based on the idea of comparing temporally successive predictions. Possibly the single most fundamental idea in all of reinforcement learning.
  13. Model — the agent’s view of the environment, which maps state-action pairs to probability distributions over states. Note that not every reinforcement learning agent uses a model of its environment.
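The following minimal Python sketch makes the value function, dynamic programming, Monte Carlo, and TD entries above more concrete. It uses a hypothetical five-state random walk and is only an illustration, not part of ML-Agents or the 3D Ball example. States 0 and 4 are terminal, stepping into state 4 pays a reward of 1, and the policy moves left or right with equal probability. Under that policy, the true values of states 1, 2, and 3 are 0.25, 0.5, and 0.75. DP computes these values exactly from the known model. Monte Carlo and TD(0) estimate them from sampled episodes. All function names are invented for this example.

```python
import random

# Hypothetical 5-state random walk used only for illustration (not from ML-Agents).
N_STATES = 5
TERMINALS = {0, 4}
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Model: next state and reward depend only on the current state and action (Markov)."""
    next_state = state + action
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward, next_state in TERMINALS


def run_episode(rng, start=2):
    """Follow the uniform-random policy from `start` until a terminal state is reached."""
    state, trajectory, done = start, [], False
    while not done:
        action = rng.choice(ACTIONS)  # policy: each action with probability 0.5
        next_state, reward, done = step(state, action)
        trajectory.append((state, reward, next_state))
        state = next_state
    return trajectory


def dp_policy_evaluation(gamma=1.0, tol=1e-10):
    """Dynamic programming: repeat Bellman expectation backups using the known model."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s in TERMINALS:
                continue
            new_v = 0.0
            for a in ACTIONS:
                next_state, reward, done = step(s, a)
                new_v += 0.5 * (reward + (0.0 if done else gamma * values[next_state]))
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < tol:
            return values


def monte_carlo_values(episodes=5000, gamma=1.0, seed=0):
    """Every-visit Monte Carlo: average the full return observed after each visit to a state."""
    rng = random.Random(seed)
    totals, counts = [0.0] * N_STATES, [0] * N_STATES
    for _ in range(episodes):
        g = 0.0
        for state, reward, _ in reversed(run_episode(rng)):
            g = reward + gamma * g  # return from this state onward
            totals[state] += g
            counts[state] += 1
    return [totals[s] / counts[s] if counts[s] else 0.0 for s in range(N_STATES)]


def td0_values(episodes=5000, alpha=0.01, gamma=1.0, seed=0):
    """TD(0): move V(s) toward the one-step target r + gamma * V(s') after every step."""
    rng = random.Random(seed)
    values = [0.0] * N_STATES
    for _ in range(episodes):
        for state, reward, next_state in run_episode(rng):
            bootstrap = 0.0 if next_state in TERMINALS else gamma * values[next_state]
            values[state] += alpha * (reward + bootstrap - values[state])
    return values


if __name__ == "__main__":
    print("DP:", [round(v, 3) for v in dp_policy_evaluation()])
    print("MC:", [round(v, 3) for v in monte_carlo_values()])
    print("TD:", [round(v, 3) for v in td0_values()])
```

The three estimates should agree closely. The sampled Monte Carlo and TD values fluctuate slightly around the DP values. The example shows how the methods differ:

- DP needs the model (`step`).
- Monte Carlo waits for each episode's full return.
- TD updates after every single step by bootstrapping from the next state's current estimate.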


1> Download Unity

2> Open Visual Studio

Clone the ml-agents repository, checking out the release_3 branch:

git clone --branch release_3

3> Environment Setup

Mac OS X Setup

  1. Create a folder where the virtual environments will reside: $ mkdir ~/python-envs
  2. To create a new environment named sample-env, execute: $ python3 -m venv ~/python-envs/sample-env
  3. To activate the environment, execute: $ source ~/python-envs/sample-env/bin/activate
  4. Upgrade to the latest pip version: $ pip3 install --upgrade pip
  5. Upgrade to the latest setuptools version: $ pip3 install --upgrade setuptools
  6. To deactivate the environment, execute: $ deactivate (you can reactivate the environment using the same activate command listed above)

Ubuntu Setup

  1. Install the python3-venv package using $ sudo apt-get install python3-venv
  2. Follow the steps in the Mac OS X setup above.

Windows Setup

  1. Create a folder where the virtual environments will reside: md python-envs
  2. To create a new environment named sample-env, execute: python -m venv python-envs\sample-env
  3. To activate the environment, execute: python-envs\sample-env\Scripts\activate
  4. Upgrade to the latest pip version: pip install --upgrade pip
  5. To deactivate the environment, execute: deactivate (you can reactivate the environment using the same activate command listed above)
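You can also script the same setup with Python's standard-library venv module. The following is a minimal sketch. It assumes the folder and environment names used above (python-envs, sample-env). The helper names create_env and env_bin_dir are hypothetical. The sketch also shows why the activation path differs between platforms. Windows puts the environment's executables in Scripts, while macOS and Linux use bin.

```python
import os
import venv
from pathlib import Path


def env_bin_dir(env_dir):
    """Return the folder holding python and the activate scripts on this OS."""
    return Path(env_dir) / ("Scripts" if os.name == "nt" else "bin")


def create_env(base_dir, name="sample-env", with_pip=True):
    """Create base_dir/name as a virtual environment and return its Python executable."""
    env_dir = Path(base_dir).expanduser() / name
    env_dir.parent.mkdir(parents=True, exist_ok=True)  # same as `mkdir python-envs`
    # venv.create also accepts upgrade_deps=True on Python 3.9+ to upgrade pip.
    venv.create(env_dir, with_pip=with_pip)
    exe = "python.exe" if os.name == "nt" else "python"
    return env_bin_dir(env_dir) / exe
```

For example, `create_env("~/python-envs")` builds ~/python-envs/sample-env and returns the path of its interpreter. You still activate the environment in your shell as shown in the steps above, because activation changes the shell session itself.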

Once the virtual environment is created and activated, install the ML-Agents Python package by running:

pip install mlagents

4> Install the ML-Agents package in Unity

Open the Package Manager (Window > Package Manager).

Then, in the Advanced section, search for ML Agents and select version 1.0.2 from the dropdown.

5> Open Unity

Once Unity is open, create a new 3D project and copy the ML-Agents folder from the cloned GitHub repository (ml-agents\Project\Assets\ML-Agents) into the project's Assets folder.

You can simply drag the folder into Assets.

6> Open the 3D Ball Agents in Unity

Open the 3DBall scene from the ML-Agents examples in the Project window.

Unity then loads the 3D Balance Ball environment.

An agent is an autonomous actor that observes and interacts with an environment. In the context of Unity, an environment is a scene containing one or more Agent objects, and, of course, the other entities that an agent interacts with.

Note: In Unity, the base object of everything in a scene is the GameObject. The GameObject is essentially a container for everything else, including behaviors, graphics, physics, etc. To see the components that make up a GameObject, select the GameObject in the Scene window, and open the Inspector window. The Inspector shows every component on a GameObject.

The first thing you may notice after opening the 3D Balance Ball scene is that it contains not one, but several agent cubes. Each agent cube in the scene is an independent agent, but they all share the same Behavior. 3D Balance Ball does this to speed up training since all twelve agents contribute to training in parallel.
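Why do copies of one agent speed up training? Every agent that shares a Behavior feeds its experience into the same learner. The following minimal Python sketch illustrates the idea. The class and function names are invented for this example and are not ML-Agents APIs. The observations are random placeholders. Twelve agents act with one shared policy, so each simulation step adds twelve transitions to a single shared buffer instead of one.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SharedPolicy:
    """One set of parameters used by every agent with the same Behavior (illustrative)."""
    weights: list = field(default_factory=lambda: [0.1, -0.1])

    def act(self, observation):
        # Toy linear policy: two continuous outputs computed from the observation.
        return [w * sum(observation) for w in self.weights]


def collect_experience(num_agents, num_steps, obs_size=8, seed=0):
    """Step all agents in lockstep and pool their transitions in one buffer."""
    rng = random.Random(seed)
    policy = SharedPolicy()
    buffer = []
    for step in range(num_steps):
        for agent_id in range(num_agents):
            obs = [rng.uniform(-1.0, 1.0) for _ in range(obs_size)]  # placeholder observation
            action = policy.act(obs)
            buffer.append((agent_id, step, obs, action))
    return buffer
```

With twelve agents, 100 simulation steps yield 1,200 transitions for the same policy update. A single agent would yield only 100.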

7> Training an ML-Agents Reinforcement Learning Model in the Virtual Environment


The Agent is the actor that observes and takes actions in the environment. In the 3D Balance Ball environment, the Agent components are placed on the twelve “Agent” GameObjects. The base Agent object has a few properties that affect its behavior:

  • Behavior Parameters — Every Agent must have a Behavior. The Behavior determines how an Agent makes decisions.
  • Max Step — Defines how many simulation steps can occur before the Agent’s episode ends. In 3D Balance Ball, an Agent restarts after 5000 steps.
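The Max Step rule can be sketched as a plain episode loop. An episode ends either when the task fails (for 3D Ball, when the ball falls) or when the step counter reaches max_step. This is a hypothetical Python illustration, not the Unity Agent implementation. Here env_step stands in for one simulation step and returns a (reward, done) pair, and the reward values in the usage lines are arbitrary.

```python
from typing import Callable, Tuple


def run_episode(env_step: Callable[[int], Tuple[float, bool]], max_step: int = 5000):
    """Run one episode; stop on failure or once max_step steps have elapsed.

    Returns (steps_taken, total_reward, reached_max_step).
    """
    total_reward = 0.0
    for t in range(1, max_step + 1):
        reward, done = env_step(t)
        total_reward += reward
        if done:  # e.g. the ball fell off the platform
            return t, total_reward, False
    return max_step, total_reward, True  # episode cut off by Max Step


if __name__ == "__main__":
    # A hypothetical agent that never drops the ball is cut off at 5000 steps.
    print(run_episode(lambda t: (0.1, False)))
    # A hypothetical agent that drops the ball at step 42 ends early.
    print(run_episode(lambda t: (0.1, False) if t < 42 else (-1.0, True)))
```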

Behavior Parameters : Vector Observation Space

Before making a decision, an agent collects its observation about its state in the world. The vector observation is a vector of floating point numbers which contain relevant information for the agent to make decisions.

The Behavior Parameters of the 3D Balance Ball example uses a Space Size of 8. This means that the feature vector containing the Agent’s observations contains eight elements: the x and z components of the agent cube’s rotation and the x, y, and z components of the ball’s relative position and velocity.
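You can picture the eight-element observation as a simple concatenation: 2 rotation components, 3 relative-position components, and 3 velocity components, for a Space Size of 8. The following is a hypothetical Python sketch. The function name, argument layout, and element ordering are assumptions. The actual example collects these values in C# inside Unity.

```python
from typing import List, Sequence


def build_observation(cube_rotation: Sequence[float],
                      ball_position: Sequence[float],
                      cube_position: Sequence[float],
                      ball_velocity: Sequence[float]) -> List[float]:
    """Flatten the 3D Ball state into an 8-float vector observation (illustrative layout)."""
    rot_x, _, rot_z = cube_rotation  # keep only the x and z rotation components
    # Ball position relative to the agent cube, component by component.
    relative = [b - c for b, c in zip(ball_position, cube_position)]
    observation = [rot_x, rot_z, *relative, *ball_velocity]
    if len(observation) != 8:
        raise ValueError("observation size must match the Space Size of 8")
    return observation
```

If the vector handed to the policy has a different length than the Space Size configured in Behavior Parameters, the trained model cannot consume it. The sketch therefore fails loudly on a size mismatch.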

Behavior Parameters : Vector Action Space

An Agent is given instructions in the form of a float array of actions. The ML-Agents Toolkit classifies actions into two types: continuous and discrete. The 3D Balance Ball example is programmed to use a continuous action space, which is a vector of numbers that can vary continuously. More specifically, it uses a Space Size of 2 to control how much x and z rotation the agent applies to itself to keep the ball balanced on its head.
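The following is a hypothetical Python sketch of how two continuous action values could be turned into platform rotations. Three details are illustrative assumptions, not values taken from the 3D Ball source: clamping the raw outputs to [-1, 1], the max_degrees scale, and the tilt_limit safety bound.

```python
from typing import Sequence, Tuple


def clamp(value: float, low: float = -1.0, high: float = 1.0) -> float:
    """Keep a value inside [low, high]."""
    return max(low, min(high, value))


def apply_actions(actions: Sequence[float],
                  tilt: Tuple[float, float],
                  max_degrees: float = 2.0,
                  tilt_limit: float = 25.0) -> Tuple[float, float]:
    """Map a 2-element continuous action to new (x, z) tilt angles in degrees.

    actions[0] rotates around x and actions[1] around z (the ordering is an assumption).
    """
    if len(actions) != 2:
        raise ValueError("3D Ball uses a continuous action Space Size of 2")
    tilt_x, tilt_z = tilt
    # Clamp raw policy outputs to [-1, 1], then scale them to a per-step rotation.
    tilt_x += max_degrees * clamp(actions[0])
    tilt_z += max_degrees * clamp(actions[1])
    # Hypothetical safety limit so the platform cannot flip over.
    tilt_x = clamp(tilt_x, -tilt_limit, tilt_limit)
    tilt_z = clamp(tilt_z, -tilt_limit, tilt_limit)
    return tilt_x, tilt_z
```

Clamping keeps the agent's behavior bounded even when an untrained network emits large outputs. For example, an action of 3.0 is treated the same as 1.0.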

Training a new model with Reinforcement Learning

While we provide pre-trained .nn files for the agents in this environment, any environment you make yourself will require training agents from scratch to generate a new model file. In this section we will demonstrate how to use the reinforcement learning algorithms that are part of the ML-Agents Python package to accomplish this. We have provided a convenient command mlagents-learn which accepts arguments used to configure both training and inference phases.

Training the environment

  1. Open a command or terminal window.
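  2. Navigate to the folder where you cloned the ml-agents repository. (With mlagents installed in your active virtual environment, the mlagents-learn command itself runs from any directory, but the config path in the next step is relative to the repository root.)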
  3. Run mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun.
  • config/ppo/3DBall.yaml is the path to a default training configuration file that we provide. The config/ppo folder includes training configuration files for all our example environments, including 3DBall.
  • run-id is a unique name for this training session.

To run training again with the same run-id, add the --force flag, which overwrites the previous run's results and restarts training from scratch:

mlagents-learn config/ppo/3DBall.yaml --force --run-id=first3DBallRun
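If you launch training from a script, you can assemble the command line above programmatically. This is a minimal sketch. It assumes mlagents-learn is on the PATH of the active virtual environment and that the script runs from the ml-agents repository root. It uses only the arguments shown in this article: the config path, --run-id, and --force. The helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_train_command(config: str = "config/ppo/3DBall.yaml",
                        run_id: str = "first3DBallRun",
                        force: bool = False) -> List[str]:
    """Build the mlagents-learn argument list used in this walkthrough."""
    cmd = ["mlagents-learn", config, f"--run-id={run_id}"]
    if force:
        cmd.append("--force")  # overwrite results from an earlier run with the same run-id
    return cmd


def start_training(**kwargs) -> int:
    """Run mlagents-learn and wait; press Play in the Unity Editor when prompted."""
    if shutil.which("mlagents-learn") is None:
        raise RuntimeError("mlagents-learn not found; activate the venv and pip install mlagents")
    return subprocess.run(build_train_command(**kwargs), check=False).returncode
```

Passing the arguments as a list rather than a single string avoids shell-quoting problems with paths. The order of --run-id and --force does not matter on the command line.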

When the message “Start training by pressing the Play button in the Unity Editor” is displayed on the screen, you can press the Play button in Unity to start training in the Editor.

If mlagents-learn runs correctly and starts training, it periodically prints training statistics, including the Mean Reward, to the terminal.

Note how the Mean Reward value printed to the screen increases as training progresses. This is a positive sign that training is succeeding.

8> Observing Training Progress

Once you start training using mlagents-learn in the way described in the previous section, the ml-agents directory will contain a results directory. In order to observe the training process in more detail, you can use TensorBoard. From the command line run:

tensorboard --logdir=results
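Before launching TensorBoard, it can help to check which runs have actually written logs. The following is a minimal Python sketch. It assumes the default results folder and TensorBoard's standard events.out.tfevents.* file naming. The exact subfolder layout under results/<run-id> may vary between ML-Agents versions, so the search is recursive. The helper name is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def find_event_files(logdir: str = "results") -> Dict[str, List[Path]]:
    """Group TensorBoard event files by their top-level run folder (the run-id)."""
    root = Path(logdir)
    runs: Dict[str, List[Path]] = {}
    for event_file in sorted(root.rglob("events.out.tfevents.*")):
        parts = event_file.relative_to(root).parts
        run_id = parts[0] if len(parts) > 1 else "."  # "." = file directly in logdir
        runs.setdefault(run_id, []).append(event_file)
    return runs
```

If `find_event_files()` returns an empty dict, training has not written any summaries yet. In that case TensorBoard will have nothing to show.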

For Windows:

Instructions for Unity ML-Agents Setup:

  1. Install Python
  2. git clone --branch release_3
  3. Create a new 3D Unity project
  4. Go to the Package Manager and install ml-agents
  5. Copy ml-agents\Project\Assets\ML-Agents to your Assets folder
  6. Set up a virtual environment
  7. Activate your virtual environment: .\[VirtualEnvName]\Scripts\Activate.ps1 (PowerShell), activate.bat, or just activate
  8. Type pip install mlagents
  9. Type mlagents-learn .\config\ppo\3DBall.yaml --run-id=first3DBallRun
  10. Go back to Unity and hit Play; training should begin

Next, check out Train RL Model of 3D Ball Balancer.

That’s it for today. The source code can be found on GitHub. I am happy to hear any questions or feedback. Connect with me on LinkedIn.