Original article can be found here (source): Artificial Intelligence on Medium
Teaching Machines to Learn
Let me paint a picture for you.
It’s a Monday morning, and you’re sitting in a classroom listening to your sixty-something-year-old teacher scream at you about how ‘when they were your age’, they had to:
- Walk fifteen kilometers in the snow to get to school, while travelling uphill (both ways, of course 💪)
- Go to the library and read thousands of pages to finish an assignment
- Watch paint dry because they didn’t have any other method of entertainment
… The list goes on. As annoying as it is to continuously listen to your teacher complain about how ‘kids these days have it so easy’, there is one valid point to their argument.
🔑 Our lives have been made fundamentally more efficient because we’ve created technology that enables us to do more, in an easier way 🤖
I mean, come on, the computer that guided us to the moon in 1969 is literally less powerful than our modern-day calculators 🤯
This growth wasn’t linear. As technology gets more and more advanced, we progress faster and faster. Right now, we see machines automating jobs and industries left and right. But, what would happen if we gave machines the ability to learn? The ability to copy human skills and habits?*
*Okay okay okay, so that sounds a bit sketchy, but right now, we’re nowhere near creating an artificial general intelligence → Read about that here 😉
👋 Say ‘Hello’ to Machine Learning
Machine learning is a subset of AI that has huge implications for our future. So many industries are being impacted by it right now, with many more to come. In a nutshell:
We’re teaching computers to learn from patterns, inference, and trends in order to give certain outputs, without needing to be given 100% direct instruction
Eventually, we want computers to be able to learn without the help of humans.
Machine learning isn’t one concrete system though. Multiple forms of machine learning exist but are linked by the notion that they are trying to find connections between inputs and outputs.
Supervised Learning is when computers learn from a labelled data set. Throughout the process, a ‘supervisor’ will point out errors and mistakes within the computer’s decisions.
The computer makes random guesses, which are marked as either correct or incorrect. The machine learns from the correct answers and uses this to improve its pattern recognition and output-identifying skills.
Let’s go a bit deeper though.
The goal of machine learning is to get computers to learn, in a sense, like humans. In order to get to this point, computers need to behave somewhat like the human brain, and so, they’re programmed to act like ‘artificial neurons’.
In order to start, a mapping function is created: y = f(x). The artificial neuron is given inputs (x), which are then multiplied by different weights. These weights all correspond to the strength of the signal. However, unlike with the brain, these weights can vary.
Wait… aren’t neurons supposed to have one set weight???
Although weights can range, they all have a specific threshold, which is represented by a special kind of weight called a bias. This can be adjusted in order to change how likely the ‘neuron’ is to fire. All the inputs are multiplied by their individual weights and summed, along with the bias, to give an output (y).
These outputs usually correlate to a ‘true or false’ statement (note: this is a generalization, and it’s a bit different when working with multi-variable datasets), typically depending on whether the output is above or below a certain number.
Since the dataset is labelled, the computer gets feedback on whether or not its random guess was correct. It learns from trial and error.
It’s kind of like learning how to do a cartwheel. You’re shown the proper way to do it, and from there it’s just trial and error. You see what works, and what doesn’t. Eventually, you’re able to get your gymnastics on 🤸♀️.
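The weighted-sum-plus-bias idea above can be sketched in a few lines of Python. This is a minimal, hypothetical single neuron; the inputs, weights, and bias are made up purely for illustration:

```python
# A minimal artificial neuron: y = f(x) as a weighted sum plus a bias,
# passed through a threshold ("does the neuron fire?").
def neuron(inputs, weights, bias):
    # Multiply each input by its weight and add everything up
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Fire (1) if the signal clears the threshold, else stay quiet (0)
    return 1 if total > 0 else 0

# Example with hand-picked numbers: 0.5*0.4 + 0.8*(-0.2) + 0.1 = 0.14 > 0
print(neuron([0.5, 0.8], [0.4, -0.2], bias=0.1))  # → 1
```

Adjusting the weights and the bias is exactly what ‘learning’ means here: the numbers get nudged until the neuron fires for the right inputs.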
Now, that’s a general overview, but, we can go even deeper. Supervised learning branches off into two main types of algorithms: Regression and Classification
Like the name suggests, you’re trying to classify objects with this type of algorithm. The most common application of this is with spam detection, but it can be seen within a variety of scenarios.
Awww, such cute animals. Now, it’s easy for us to tell that one is a cat and one is a dog, but to a computer, those words mean nothing. If we input one of them into a classification algorithm, the computer should be able to spit out whether it thinks it is a cat or dog.
It’s trained with a data set of labelled images, half of them being cats, and the other half being dogs. The properties of both images would be labelled, weighted by their respective weights, and inputted into the function.
At this point, in order to be classified, they would have to be above or below a certain value. But the computer doesn’t know what this is yet. It makes random guesses and learns from trial and error.
Values above the line could correlate to dogs while values below the line could correlate to cats. At first the guesses are random, like in the photo above, but eventually, the weights adjust to create a more accurate classification system.
They usually aren’t 100% accurate, but then again, most things aren’t 😉. Machine learning algorithms are trying to maximize correct answers and minimize incorrect answers, since there’s bound to be some overlap.
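That trial-and-error weight adjustment can be sketched with a toy, perceptron-style training loop. The single-number ‘features’ and labels below are invented for illustration (real image classifiers use many features, but the nudging idea is the same):

```python
# Toy classifier: learn a weight and bias so that outputs above 0 mean "dog"
# and outputs at or below 0 mean "cat". Features are made-up single numbers.
data = [(0.9, 1), (0.8, 1), (0.2, 0), (0.1, 0)]  # (feature, label: 1=dog, 0=cat)

w, b = 0.0, 0.0          # start with a blank guess
for _ in range(20):      # several passes over the labelled data
    for x, label in data:
        guess = 1 if w * x + b > 0 else 0
        error = label - guess          # feedback from the labels
        w += 0.1 * error * x           # nudge the weight toward fewer mistakes
        b += 0.1 * error               # nudge the bias too

print([1 if w * x + b > 0 else 0 for x, _ in data])  # → [1, 1, 0, 0]
```

Early guesses are wrong, the errors push the weights around, and after a few passes the classifications settle down, just like the description above.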
To put it simply, with regression algorithms, we’re trying to find connections between independent and dependent variables.
Let’s say you want to figure out how high a person can jump depending on their height, and you enlist the help of your regression algorithm. First, you give it the labelled set of data.
Once the data is given, a regression line is drawn so the computer can make estimates based on this line 🤯.
Regression algorithms are used all the time. You’ll see it with financial forecasting, trend analysis, and even things like drug remodelling.
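A regression line like the jump-height example can be fitted with the classic least-squares formulas. The height and jump numbers below are invented purely for illustration:

```python
# Fit a straight line y = m*x + c through labelled (height, jump) points
# using least squares, then make an estimate from the line.
heights = [150, 160, 170, 180, 190]   # cm (made-up data)
jumps   = [30, 34, 38, 42, 46]        # cm (made-up data)

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(jumps) / n
# Slope: covariance of x and y divided by the variance of x
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, jumps)) \
    / sum((x - mean_x) ** 2 for x in heights)
c = mean_y - m * mean_x               # intercept so the line passes the means

print(m * 175 + c)                    # estimated jump for a 175 cm person
</antml_i_do_not_exist>```

Once the line is drawn, any new height can be plugged in to read off an estimate, which is all the ‘forecasting’ use cases really are.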
Supervised learning gets to chill with and learn from labelled data sets. But what if we took away the labelled outputs? How would the computer learn?
Doing that is basically like giving someone a list of numbers and saying: “Now, I have no idea what any of this means butttttttt, I’m sure you can figure something out”. This is what unsupervised learning has to go through 😅.
The most common form of unsupervised learning is through clustering algorithms. Here, the computer is grouping items by traits that it believes are similar.
Imagine you have a basket of fruit, and all the fruits look like some variation of: 🍓 or 🍒 or 🍇
Now, it’s easy for us to say that those fruits are strawberries, cherries, and grapes, but remember, that doesn’t mean anything to a computer. Clustering algorithms mean the computer is trying to find similarities between all these groups (which have also been modified by different weights and biases).
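Grouping by similarity can be sketched with a minimal k-means loop. Here the fruits are reduced to made-up 1-D ‘size’ numbers (real clustering uses many traits at once, but the assign-then-average cycle is the same):

```python
import random

# Minimal k-means sketch: group 1-D "fruit size" numbers (made up) into
# k clusters by repeatedly (1) assigning points to the nearest centre and
# (2) moving each centre to the mean of its points.
def kmeans(points, k, steps=10):
    centres = random.sample(points, k)         # random starting centres
    for _ in range(steps):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return sorted(centres)

# Three loose size groups — the algorithm is never told which is which
sizes = [1.0, 1.2, 1.1, 2.9, 3.1, 3.0, 6.0, 6.2, 5.8]
print(kmeans(sizes, k=3))
```

Notice there are no labels anywhere: the computer never learns the words ‘grape’ or ‘strawberry’, it just discovers that the sizes clump into three groups.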
Unsupervised learning also has a variety of applications, but these are more limited to dealing with data and strengthening computer and Artificial Intelligence systems (i.e. training autoencoders).
This is where it gets juicy. Reinforcement learning is the process of learning by ‘doing’ and through trial and error. Now that sounds a lot like supervised learning, but there’s one 🔑 difference.
Imagine you’re teaching a computer how to play snakes and ladders. If you used supervised learning, after every move you would tell the computer whether it made the right decision or not. With reinforcement learning, you can only tell the computer it made the right decision at the end of the game. If it did, great! You can reward it. If it didn’t, no reward 😔
This works well in teaching computers how to play games, but, there are bigger implications. With reinforcement learning, we can teach computers how to do tasks that we ourselves don’t even understand. You can see this already being integrated into robotics.
Meet Cassie. Cassie can walk like a human because she was trained using reinforcement algorithms. You see, there’s no 100% correct way to walk. It’s not like walking won’t work if we’re not walking at an exact, consistent speed with certain angles between our strides. There are different ways to walk.
This makes it perfect for reinforcement learning. It also means that we can’t train a computer through supervised learning because we won’t know if it’s doing the right thing until we get the final output, a walk.
Let’s make this a bit more visual. Imagine you want to devise a way to wake yourself up in the morning. You could set an alarm, you could find a specific lighting lamp that wakes you up, or you could devise a complex pulley system that ends with water being thrown at you. All of these systems achieve the same end goal, and technically, none of them are wrong. Inefficient maybe, but wrong? No. Here’s actually where the problem comes in.
When the computer does something correct, we reward it by sending a small positive signal, which is the computer equivalent of being given chocolate for doing something well 😊.
But not everything the computer does is as efficient as it could be, and it’s hard to determine which actions got us to the reward, and which ones were suboptimal. This dilemma is called the credit assignment problem.
The agent will continue to train and get used to its environment. Every time it achieves its reward, we can look back at the finished task and try to figure out which actions helped lead to the positive signal.
At this point, we’re assigning values to those different actions and assigning a policy for which action works the best. This is what allows reinforcement learning to actually work. Let’s put this into action 🙌
This is Mr. Nay Nay. He is trying to get this bread. There are many paths he could take, but, he can only go up, down, left, or right.
Mr. Nay Nay decides to take the most complicated path possible. Nice one 🤦♀️. Now, he gets the bread, but, it took much longer and used much more effort than he would have liked. Mr. Nay Nay decides to continue practising finding bread.
After a bunch of training, Mr. Nay Nay has gone through multiple paths. He assigns numerical values to the squares, with higher values assigned to the squares that helped get him to the bread. This creates a policy.
Now Mr. Nay Nay can follow the higher values. There are still multiple paths that he could take, but, they are much more efficient than what he was doing before. He can now efficiently obtain that grain 🤩.
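Mr. Nay Nay’s value table can be sketched with a tiny value-iteration loop on a grid. The grid size, bread position, and discount number below are all invented for illustration:

```python
# Tiny grid world: compute a value for each square, then a policy is simply
# "step onto the highest-valued neighbouring square".
ROWS, COLS = 3, 4
BREAD = (0, 3)                      # the reward square (made up)
GAMMA = 0.9                         # discount: squares near the bread score higher

values = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
values[BREAD] = 1.0                 # reaching the bread earns the reward

def neighbours(r, c):
    # Up, down, left, right moves that stay on the grid
    steps = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(nr, nc) for nr, nc in steps if 0 <= nr < ROWS and 0 <= nc < COLS]

for _ in range(20):                 # sweep until the values settle
    for square in values:
        if square == BREAD:
            continue
        # A square is worth the discounted value of its best neighbour
        values[square] = GAMMA * max(values[n] for n in neighbours(*square))

# The far corner is 5 steps from the bread, so its value is 0.9 ** 5
print(round(values[(2, 0)], 3))
```

Following the rising values from any square now traces an efficient path to the bread, which is exactly the policy described above.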
Now it’s not always that simple. AI agents (the things that are being trained) can go through a couple of paths and choose to engage in exploitation. This is where they decide to stick to the most efficient path from what they’ve already tried, because they know it works.
What Mr. Nay Nay did there to build all those values is go through exploration, where, although there is a risk of not always receiving the reward, there is also the chance of finding a more efficient path.
The key in most reinforcement learning problems is finding the balance between both exploration and exploitation.
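One standard way to strike that balance is an ‘epsilon-greedy’ rule: most of the time the agent exploits the best-known action, but a small fraction of the time it explores a random one. The action names and value estimates below are made up for illustration:

```python
import random

# Epsilon-greedy: exploit the best-known action most of the time,
# but explore a random one with probability epsilon.
def choose_action(action_values, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(list(action_values))        # explore
    return max(action_values, key=action_values.get)     # exploit

# Made-up value estimates for Mr. Nay Nay's four moves
values = {"up": 0.2, "down": 0.7, "left": 0.1, "right": 0.4}
print(choose_action(values))  # usually "down", occasionally a random move
```

Tuning epsilon is the balance itself: epsilon = 0 is pure exploitation, epsilon = 1 is pure exploration.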
This subset of ML also has huge impacts. It provides us with things like directions (thank God, I can’t read a map to save my life), and impacts a variety of industries, such as healthcare, energy optimization, robotics, etc…