Original article was published by Agustinus Nalwan on Artificial Intelligence on Medium
How AI Techniques Made Me a Better Parent for Our Toddler
I used the knowledge and wisdom I gained from my work in Artificial Intelligence to understand and teach my two-year-old son more effectively and regain my sanity.
The knowledge I have gained from building Artificial Intelligence (AI) tech is directly applicable to raising my toddler. Not only does it help me teach him more effectively, but it also helps me understand why he does naughty things, which I had previously thought he did merely for the sake of torturing us. As I explore such ideas in this blog post, I will also give a brief overview of relevant AI projects to help illustrate my analogy.
My AI Related Work
I have been working in AI for more than five years, and at the risk of sounding creepy, my love of AI is still as strong as it was that first day when I discovered that my image-classification model, Cyclops, was able to automatically categorise photos of cars taken at 27 different angles (the front, side, rear, dashboard, wheels and so on) with 97.2% accuracy.
AI tech provides solutions to problems you could not have solved before by giving you a completely new approach. Rather than dictating every single instruction with if-else statements five levels deep, you simply allow the machine to learn from sets of inputs and expected outputs. Although a fully-trained AI model still consists of hundreds or even thousands of if-else statements, they are automatically created during training, so we do not need to pull our hair out designing and maintaining them. We simply retrain the model with new datasets when there are changes in requirements. Your data is the algorithm, which, simply put, is quite elegant.
New Parenting Experience
Two years ago, our son Dexie was born. For a few challenging months, our world was turned upside down as his three-hourly feeding regime ensured many sleepless nights.
As Dexie got older, the challenge slowly shifted from direct life support such as feeding and burping to how to teach him to do things on his own: putting on clothes, riding a bicycle, not eating his socks and not falling off the dining table.
Always with a mind of his own, he does not always listen. In fact, 80% of the time, he does exactly the opposite of what we tell him. Initially, this was quite frustrating. Then, a few months ago, I discovered the many similarities between building AI tech and training Dexie.
This discovery has helped me understand why he does certain things and allows me to teach him more effectively.
Solving Nature by Learning from Tech
For thousands of years, humans have been copying nature to solve problems — airplane wings were inspired by birds and the sonar in submarines by bats.
Flood fill is a path-finding algorithm inspired by the movement of water, which always flows from higher to lower ground. Interestingly, this nature-inspires-human-technology phenomenon can work the other way around when we use technology to solve natural problems — such as raising a toddler.
The Importance of Negative Samples
Dexie is no different from other toddlers his age. He runs around climbing up on tables, kitchen bench-tops and cabinets. He puts flowers, toys and spiders directly in his mouth. Trying to be protective, my natural reaction was always to stop him from doing these dangerous things. Of course, this only amplified his desire to do them. One day, as I was shooing Dexie off the dining table, my dad smiled and said, ‘Let him fall.’ His three simple words made me realise how similar this situation was to training one of my AI models, and they became the catalyst for all the experiments covered in this blog post.
Late last year, my team at Carsales.com deployed an AI tech called Mystique to detect the presence of cars’ registration, or rego, numbers and blur them. Mystique protects sellers’ privacy from rego plate cloners who look for a car like theirs, print out the rego photo and stick it on their car to commit crimes.
At the core of Mystique is an object-detection AI model that looks for what is called a bounding box around where the rego plate should be in a photo. However, during pre-release tests, we discovered a high false-positive rate. The AI mistakenly blurred objects that were not rego plates, such as odometer counters and manufacturer badges. Further investigation revealed the root cause to be that our training set did not have images of cars without rego plates, in other words, negative samples. Negative samples were important in this case because they allowed our AI model to learn how to distinguish objects that looked like rego plates but were not.
Using some tricks to overcome an object detection architecture limitation, we re-trained our model with a mix of positive and negative samples that significantly reduced the false-positive rate.
How does this relate to a toddler’s learning process? Humans learn from the positive and negative experiences we encounter in life from the moment we are born. The outcome of each of these experiences provides a feedback loop that helps us improve our decision-making in the future. In practice, this means we will avoid an action like one that yielded a negative outcome in the past, and vice versa for a positive one.
Suppose we plot all these positive and negative actions in green and red on a 2D space (Chart 1 below). You will notice a decision-boundary line separating them with positive actions on the left and negative on the right. It is the positioning of this line that we are constantly improving with our feedback loops. With more positive and negative outcomes, we get better at discerning future positive and negative actions.
Now, suppose I only allowed Dexie to learn from positive experiences (Chart 2). As he has never experienced any negative outcomes, he only knows how to position his decision-boundary line to make sure that all positive samples are on the left. Should he be presented with negative actions, he will misclassify four of them as positive. Then, we add some negative samples (Chart 3), allowing Dexie to adjust his line, which results in only three misclassifications of negative samples. Finally, by adding more negative samples (Chart 4), he attains zero misclassifications.
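The effect of adding negative samples can be sketched in a few lines of Python. This is a toy illustration, not how Mystique or any real model is trained. Actions sit on a single made-up "riskiness" axis, and the learner is assumed to treat everything as safe until a negative experience proves otherwise. The names fit_boundary and count_misclassified_negatives, and all the numbers, are invented for the example and chosen so that the counts echo the four, three and zero misclassifications described above.

```python
def fit_boundary(seen_negatives, upper_limit=10.0):
    """Return the decision boundary; actions left of it are judged safe.

    Assumption of this toy learner: it treats everything as safe until a
    negative experience proves otherwise, so the line sits at the least
    risky negative action seen so far (or at upper_limit if none was seen).
    """
    if not seen_negatives:
        return upper_limit
    return min(seen_negatives)


def count_misclassified_negatives(boundary, true_negatives):
    """Count truly negative actions that fall on the 'safe' side of the line."""
    return sum(1 for risk in true_negatives if risk < boundary)


# Made-up riskiness scores for illustration only.
positive_actions = [1.0, 2.0, 3.0, 4.0]   # e.g. sitting on a chair
negative_actions = [5.0, 6.0, 7.0, 8.0]   # e.g. standing on the table edge

experience_stages = {
    "positives only": [],
    "one negative seen": [8.0],
    "more negatives seen": [5.0, 8.0],
}

for stage, seen in experience_stages.items():
    boundary = fit_boundary(seen)
    # Every positive action stays on the safe side at every stage.
    assert all(risk < boundary for risk in positive_actions)
    errors = count_misclassified_negatives(boundary, negative_actions)
    print(f"{stage}: boundary={boundary}, misclassified negatives={errors}")
```

Each extra negative sample pulls the line closer to where it belongs, while the positive actions stay correctly classified throughout.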
And indeed, this translates directly to teaching Dexie. He cannot experience only actions with positive outcomes. He needs the negative samples too, for example, falling from the table.
I have started letting him experience negative outcomes to a reasonable extent. He still climbs on the dining table. However, after his first fall, I now notice he is more careful not to stand too close to the edge. Falling off a second time made him even more careful to stay even further from the edge. Dexie is perfecting his decision-boundary line.
The Importance of Exploration
I am sure many parents out there can relate to this: no matter what you tell your toddler to do, he will do the exact opposite. Like many of you, I was scratching my head trying to find the reason why. As it turns out, I discovered a plausible answer from another process of AI training — the epsilon parameter.
The most popular AI technique is supervised learning, which is quite mature and well understood. Mystique object detection is an example of this, where we trained our AI model with labelled training sets. Recently, a technique called reinforcement learning (RL) has been gaining popularity, especially in games and robotics, due to its ability to learn by trial and error. It can perform exploration by randomly picking an action and measuring its success from the reward or punishment provided. The resulting feedback loop from exploration leads to exploitation, the ability to choose actions similar to those it has seen in the past that yielded bigger rewards.
What is interesting about RL is that we can control how much exploration or exploitation an AI uses to pick an action via an epsilon parameter. Exploration allows an AI to discover a better action outside what it has learned from its past training sets and adapt to changes in trends more swiftly. Epsilon = 0.3 means that the AI selects actions randomly with 30% probability, regardless of what rewards those actions may deliver.
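A common way to implement this trade-off is the epsilon-greedy rule. The sketch below is a minimal illustration under stated assumptions: the function name choose_action, the two toddler actions and their reward estimates are all hypothetical, and a real RL system would also update those estimates after every action.

```python
import random


def choose_action(estimated_rewards, epsilon, rng=random):
    """Epsilon-greedy selection over a dict of action -> estimated reward."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random, ignoring the rewards.
        return rng.choice(list(estimated_rewards))
    # Exploit: pick the action with the highest estimated reward so far.
    return max(estimated_rewards, key=estimated_rewards.get)


# Hypothetical reward estimates a toddler might have learned so far.
estimated_rewards = {"listen_to_dad": 1.0, "pour_water_on_floor": -1.0}

rng = random.Random(0)  # seeded so the run is repeatable
trials = 10_000
naughty = sum(
    choose_action(estimated_rewards, epsilon=0.3, rng=rng) == "pour_water_on_floor"
    for _ in range(trials)
)
print(f"naughty choices: {naughty / trials:.1%}")
```

Because the random pick can also land on the well-behaved action, an epsilon of 0.3 with two actions produces the naughty choice roughly 15% of the time, not 30%. With epsilon set to 0 the function always exploits; with epsilon set to 1 it ignores the rewards entirely.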
What we are seeing with rebellious toddlers (and teenagers) is RL in real life, with epsilon value correlating to the degree of naughtiness from the parents’ point of view. The higher the epsilon, the naughtier and more rebellious they seem.
The exploration factor is also critical in nature. Without exploration, children would never try anything different from what they were taught. Future generations would fail to discover anything new, leading to an intelligence decline in society: a reverse evolution. In fact, at a higher level, exploration plays a critical role in evolution, which constantly conducts experiments by creating different species with various skills and physical attributes. Survival of the fittest acts as the system’s reward function, improving its exploitation by encouraging the breeding of species similar to those that tend to be more successful.
I feel a little better now when Dexie pours water all over the floor after I have screamed from a distance telling him not to. Dexie’s brain’s RL module has just decided to do some exploration. At the same time, I do need to act swiftly to give him feedback with a negative reward: punishment. He may dump water on the floor a few more times, but I must be patient and consistently provide the negative feedback. After all, an AI model requires a lot more than just one sample for a noticeable change in its output.
A few weeks ago, we started to teach Dexie how to ride a balance bike, a good starting point for learning to ride a real bike. As you can see in the picture, it is a bike without pedals. We were trying to instruct Dexie to propel himself using his own feet as if he were walking. We tried to tell him (with some difficulty, given the limited linguistic ability of a two-year-old) that once he could do that, he would be able to move faster by repeatedly propelling himself forward and quickly lifting both feet off the ground.
Once again, an AI technique came to the rescue. As previously discussed, the reward function is a critical component of the success of RL. It provides a feedback loop to the series of actions picked by the AI model to assess whether they yield positive or negative outcomes. For an RL AI model that learns to play Super Mario games, the reward function is designed to give a more positive reward the quicker Mario completes the course. Negative feedback is given when Mario falls into a pit and dies. As you can see, this reward function is designed at a higher level than merely giving a reward for each discrete action Mario performs (walking, jumping, etc.). This allows the AI to explore by itself which series of actions yield bigger rewards without the need for micromanagement.
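A high-level reward of this kind might be sketched as follows. This is not the reward function of any particular Super Mario agent: the name episode_reward, the 300-second time limit and the reward values are assumptions picked purely for illustration.

```python
def episode_reward(finished, died, seconds_elapsed, time_limit=300.0):
    """Score a whole attempt at the course, not the individual button presses."""
    if died:
        # Falling into a pit ends the attempt with negative feedback.
        return -1.0
    if not finished:
        # Ran out of time without dying: no reward, no punishment.
        return 0.0
    # Finishing earns 1.0 plus a bonus that grows the faster Mario finishes.
    time_left = max(0.0, time_limit - seconds_elapsed)
    return 1.0 + time_left / time_limit


fast_run = episode_reward(finished=True, died=False, seconds_elapsed=60.0)
slow_run = episode_reward(finished=True, died=False, seconds_elapsed=240.0)
fatal_run = episode_reward(finished=False, died=True, seconds_elapsed=30.0)

# A quicker finish beats a slower one, and both beat dying.
assert fast_run > slow_run > 0.0 > fatal_run
```

Nothing in the function mentions walking or jumping. Which sequence of moves earns the bigger reward is left for the agent to discover on its own.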
Applying this knowledge to Dexie’s bike training, I stopped giving him constant instructions. I had the reward function I needed — getting Dexie to move faster. The only problem was that, unlike an AI that will always accept whatever reward function you give it, a toddler will not. An incentive is required. So, since the goal was for Dexie to pick up his feet and ride faster, we got him to play a chasing game with us, as his desire to catch us perfectly aligned with the reward function.
To improve his skills further, we now get Dexie to play with neighbourhood kids who are already balance bike experts. This allows Dexie to mimic their actions, which helps him move faster. It is similar to imitation learning, a technique that can be used to train a reinforcement learning model.
The Mighty Power of the Human Brain
I have gained so much wisdom from these experiments that it has definitely made me a better parent and helped to preserve my sanity. Though there are many similarities between raising a toddler and training an AI, I now realise that there are also significant differences, including just how much AI technology needs to improve to be on par with humans.
The human brain is a generic thinking machine that is able to learn many different disciplines, like language, visual recognition and mathematics. We make decisions by using information we have gleaned from these knowledge domains. Even if we have never seen the dog breed in the photo below, we know that it is a dog (even though it looks like a mop) because our knowledge about human lifestyles tells us that people walk their dogs and not their mops. An AI object detection model would not be able to make this distinction if it had never seen this type of dog in its training set, since its only source of knowledge is visual.
The ability to tap into our knowledge graphs enables us to make better decisions, even when facing considerable uncertainties. What makes our brains even more amazing is that we constantly use these knowledge graphs to make sense of, digest and verify new information while extracting high-level abstractions that further enhance our knowledge graphs. And when dealing with ambiguity, we can even ask for more details with ‘Why?’
Answering this question creates additional feedback loops that make our knowledge graph more robust. It becomes easier to gain even more knowledge with a higher degree of reliability. This process is called reasoning, and it is currently missing in AI tech. No one knows how to build an AI with the ability to reason yet, so do not worry about all that ‘AI is taking over humanity’ hype. It is not going to happen for at least another 50 years.
That’s all folks. I hope this story resonates with all of you parents out there who are technology enthusiasts, like me.