Monthly Newsletter 09/10/2020

Original article was published by Ahsen Khan on Artificial Intelligence on Medium

In this newsletter, I will be talking about my adventures creating my first AI algorithm from scratch in Unity3D.

I had always wanted to create an AI algorithm, but had no idea how one worked. Then I saw a video in which someone created an AI robot controlled by an array of numbers, and I suddenly had an idea for how I could create my own machine learning algorithm with the knowledge I had. At the time, I had no idea how neural networks worked, so I decided to start on this project.

I wanted to create an unsupervised learning algorithm, so that the algorithm could teach itself rather than me looking at the output and telling it what it did wrong every iteration. So I decided to create an agent that would be trained to make its way around a track as fast as possible. I chose this because I knew I could place checkpoints along the track to automatically set the agent’s reward/punishment, and I also wanted a program that could be placed in a different environment and perform similarly well.

For days, whenever I had an idea for the algorithm, I wrote down the small snippets of code I would need on paper. Soon I had 3–5 post-it notes on my desk, so I decided to start working on the project.

First, I modelled the track in Fusion 360, and exported it as a .obj for Unity.

After that, I created the code for the AI, which works in a few steps. First, the main code looks for any saved information from previous training. If none is found, the array (named “current”) is defaulted to 0, 0, 0, 0, 0, 0, 0. The first digit in the array is reserved for the agent’s score, and the rest control how it behaves. Then one of those numbers is selected at random; for one agent (alpha), that digit is increased by a set increment value, and for the other agent (beta), it is decreased by the same increment. The two agents then make their way around the track and are rewarded for each checkpoint they pass through, and punished if they pass through a checkpoint backwards (I used a function to work out which side of the checkpoint the agent was on).
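The mutation step can be sketched roughly like this. This is Python rather than the project’s actual Unity C#; the names `mutate_pair` and `INCREMENT`, the increment value itself, and the choice to mutate only the behaviour digits (indices 1–6, leaving the score at index 0 alone) are my assumptions, not code from the project:

```python
import random

INCREMENT = 0.1  # assumed step size; the article doesn't give the actual value

def mutate_pair(current):
    """Derive the alpha and beta controller arrays from the saved array.

    Index 0 is reserved for the score, so only the behaviour digits
    (indices 1-6) are considered for mutation here (an assumption).
    """
    idx = random.randrange(1, len(current))  # pick one behaviour digit
    alpha, beta = list(current), list(current)
    alpha[idx] += INCREMENT  # alpha gets that digit increased
    beta[idx] -= INCREMENT   # beta gets the same digit decreased
    return alpha, beta

# starting state when no saved training data is found
current = [0, 0, 0, 0, 0, 0, 0]
alpha, beta = mutate_pair(current)
```

Because alpha and beta differ from the saved array in opposite directions on a single digit, one 20-second run is enough to tell which direction of change helps.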

After 20 seconds, the agent with the lower score is deleted and the “current” array is set to the controller array of the agent with the higher score.
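The selection step can be sketched the same way (again in Python; `select_survivor` is my own name, and I’m assuming ties go to alpha, which the article doesn’t specify):

```python
def select_survivor(alpha, beta):
    """Return the controller array of the higher-scoring agent.

    Index 0 of each array holds the score accumulated from checkpoints
    during the 20-second run; the loser's array is simply discarded
    (in Unity, the losing agent's in-game object would be destroyed).
    """
    return alpha if alpha[0] >= beta[0] else beta

# example: alpha passed 5 checkpoints forwards, beta only 3
current = select_survivor([5, 0.1, 0, 0, 0, 0, 0], [3, -0.1, 0, 0, 0, 0, 0])
```

Repeating mutate-evaluate-select like this is a simple hill-climbing loop: each generation keeps whichever single-digit change scored better on the track.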

Alpha agent on the right, Beta agent on the left