Demystified — Machine Learning

Original article was published by Yash Gupta on Artificial Intelligence on Medium

Suppose you want to buy yourself a chocolate, say a milk chocolate that costs $2. You are a teenager, and together with 50 of your classmates you want to understand whether a person (a teenager, in this case) will decide to buy it or not, based on some metrics.

Chocolates (reference picture)

You note down things such as age, height, weight, BMI, pocket money, the price of the chocolate, number of friends, your best friend’s pocket money, number of siblings, medical history, recent chocolate purchases, exam marks, Instagram followers, and how often you get pocket money in a week. You record these for yourself and for everyone else you are studying (your classmates, in this case).

Now you must be wondering: how would my Instagram followers affect my purchase decision? Well, we won’t know whether they do unless we study the data and analyze it for a pattern. Data can sometimes tell us things we would never have expected.
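
One simple way to look for such a pattern is to split the class into two groups on a metric and compare how often each group buys. The sketch below is only an illustration: the survey rows, the `CUTOFF` value, and the `purchase_rate` helper are all made up for this example and do not come from real data.

```python
# Hypothetical survey rows: (instagram_followers, bought_chocolate).
# These numbers are invented purely for illustration.
survey = [
    (120, True),
    (950, False),
    (300, True),
    (80, True),
    (1500, False),
    (700, True),
]


def purchase_rate(rows):
    """Share of rows in which the person bought the chocolate."""
    if not rows:
        return 0.0
    return sum(1 for _, bought in rows if bought) / len(rows)


# Arbitrary cut-off between "few" and "many" followers (an assumption).
CUTOFF = 500
few_followers = [row for row in survey if row[0] < CUTOFF]
many_followers = [row for row in survey if row[0] >= CUTOFF]

rate_few = purchase_rate(few_followers)
rate_many = purchase_rate(many_followers)

print(f"fewer than {CUTOFF} followers: {rate_few:.2f}")
print(f"{CUTOFF} or more followers: {rate_many:.2f}")
```

A large gap between the two rates hints that the metric may matter. With only a handful of rows, as here, such a gap can easily be a coincidence, so it is a reason to look closer rather than a conclusion.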

Once your metrics are in place, you study them to find out which ones have an impact on the purchase decision, and you keep only those (to avoid things that don’t really matter). You also try to remove unknown values and outliers or anomalies. In simple terms, an outlier here would be an old person’s data ending up among entries that are supposed to describe you and your friends, who are presumably all teenagers.
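
The clean-up step can be sketched in a few lines. This is a minimal example under stated assumptions: the records are invented, `None` stands for an unknown value, and "teenager" is taken to mean ages 13 to 19. The helper names `is_complete` and `is_teenager` are hypothetical.

```python
# Hypothetical records; None marks an unknown value.
records = [
    {"name": "A", "age": 15, "pocket_money": 10.0},
    {"name": "B", "age": 16, "pocket_money": None},  # unknown value
    {"name": "C", "age": 67, "pocket_money": 40.0},  # outlier: not a teenager
    {"name": "D", "age": 14, "pocket_money": 6.5},
]


def is_complete(record):
    """True when no field of the record is unknown (None)."""
    return all(value is not None for value in record.values())


def is_teenager(record):
    """True when the age falls in the assumed teenage range of 13 to 19."""
    return 13 <= record["age"] <= 19


# Keep only the rows that are complete and belong to the group being studied.
cleaned = [r for r in records if is_complete(r) and is_teenager(r)]

print([r["name"] for r in cleaned])
```

Dropping rows is the bluntest option. With a small class you might instead go back and ask the person for the missing value.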

Given a list of metrics and how their combinations affect the purchase decision, you might look at your best friend and predict whether he/she will make the purchase, based on his/her metrics and how they fit the data you have. We then try to explain the same thing to the machine: we teach it that when the metrics appear in these combinations, this is the likely outcome. The data we use to teach it is called the Training Data.

The machine then tries to keep these patterns in its memory so that it can make predictions in the future. Those predictions are made on data the machine has not seen before, which is called the Test Data.
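
To make the idea of training and test data concrete, here is a small sketch. It uses a nearest-neighbour rule, chosen here only because it is easy to read and not because the article prescribes it: the machine "remembers" the training examples and, for a new person, copies the outcome of the most similar remembered example. The two metrics, all the numbers, and the `predict` helper are assumptions for illustration.

```python
import math

# Hypothetical data: (pocket_money_usd, chocolates_bought_last_month) -> bought?
# Training data: examples the machine learns from.
training_data = [
    ((10.0, 4), True),
    ((8.0, 3), True),
    ((2.0, 0), False),
    ((3.0, 1), False),
]

# Test data: examples the machine has not seen, used to check its predictions.
test_data = [
    ((9.0, 5), True),
    ((1.5, 0), False),
]


def predict(features, examples):
    """Return the outcome of the training example closest to `features`."""
    nearest = min(examples, key=lambda ex: math.dist(features, ex[0]))
    return nearest[1]


# Compare each prediction with the known outcome in the test data.
correct = sum(predict(x, training_data) == label for x, label in test_data)
accuracy = correct / len(test_data)

print(f"accuracy on test data: {accuracy:.2f}")
```

Keeping the test data separate matters: if the machine were checked only on examples it had already memorised, a good score would say nothing about how it handles new classmates.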