Source: Deep Learning on Medium
“Let’s adjust”
We clearly don’t know everything. As kids we learnt how to balance and walk, how to eat and, for that matter, how much to eat, and so on.
A first-time mother also doesn’t know how much she should feed her baby. The solution she adopts is to feed the baby a little and wait; if the baby cries, she feeds more. This continues until the mother learns how much food her baby actually needs. Even once she has learnt the quantity by experience, the baby is sometimes less hungry and sometimes more hungry depending on its mood, and the mother understands this. She also understands the “normal”.
Mathematics deals with logic in a similar way. Consider a very simple case: if it takes 2 minutes to grind 20g of coffee beans in a machine, how long should it take for 200g?
10 times as long, i.e. 20 minutes. Yay! You know linearity.
That’s a linear model and it follows a linear equation.
It requires finding the magic factor: 2 minutes for 20g, i.e. 0.1 minutes per gram.
The information collected is called data.
In machine learning, data may or may not have a label. The label is the quantitative or qualitative information we are interested in knowing, and it is usually already known for some of the information collected.
20g coffee took 2 minutes to grind.
40g coffee took 4 minutes to grind.
60g coffee took 6 minutes to grind.
The ‘number of minutes’ is the label. Instead of time, we might have the coffee type as a qualitative label.
The multiplying factor gives us the magic formula:
(coffee in grams) * (magic number) = time taken to grind
A forward pass is the calculation of the label (the quantitative or qualitative information we are interested in) using our linear magic formula.
The weight is the multiplying factor in the magic formula. It is usually explained well by the perceptron model.
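To make the forward pass concrete, here is a minimal sketch using the three coffee observations from above. The names `data`, `forward` and `w` are made up for illustration, and the weight of 0.1 is simply 2 minutes divided by 20g.

```python
# Grind-time observations from the article: (grams of coffee, minutes to grind)
data = [(20, 2), (40, 4), (60, 6)]


def forward(grams, w):
    """Forward pass: predict the label (minutes) as grams times the weight w."""
    return grams * w


w = 0.1  # the "magic number": 2 minutes / 20 g = 0.1 minutes per gram

# Predict the grind time for every observation, and for the 200 g question.
predictions = [forward(grams, w) for grams, _ in data]
print(predictions)
print(forward(200, w))
```

With this weight the predictions line up with the observed 2, 4 and 6 minutes (up to floating-point rounding), and 200g comes out at 20 minutes.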
Sometimes the baby is less hungry or more hungry depending on his or her mood. Coffee might take more time to grind if the machine gets heated.
The deviations from “normal”, that is, the differences between what the magic formula predicts and what actually happens, are called errors or losses.
Our mission is to minimize the difference from “normal”.
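One way to put a number on “difference from normal” is sketched below. The article does not name a specific loss, so the mean squared error used here is an assumption, and `mean_squared_loss` is a made-up helper name.

```python
# Grind-time observations from the article: (grams of coffee, minutes to grind)
data = [(20, 2), (40, 4), (60, 6)]


def mean_squared_loss(w):
    """Average squared gap between predicted minutes (grams * w) and observed minutes.

    Mean squared error is an assumed choice of loss, not one named in the article.
    """
    squared_errors = [(grams * w - minutes) ** 2 for grams, minutes in data]
    return sum(squared_errors) / len(squared_errors)


print(mean_squared_loss(0.1))  # the weight that matches the data: loss is (almost) zero
print(mean_squared_loss(0.2))  # a weight that is too large: loss is much bigger
```

The further the weight drifts from the one that fits the data, the larger the loss gets, which is exactly what we want to minimize.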
The explanation above might be really naive, so I suggest reading the references for more technical depth.
Nevertheless, reading the attached notebook should give more intuition even if the guide is skipped.
So what’s going on in the notebook?
- The machine starts with a random guess for w, and then checks the loss for each of the values we give to w.
- It appends the loss for every weight and plots loss against weight.
- Why? So that we know our magic number, or weight: the weight that gives us the minimum loss. This is well explained by pictures.
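The steps in this list can be sketched roughly as follows. This is not the original notebook: it uses a fixed grid of candidate weights instead of a random starting guess, the range 0.00 to 0.30 is an assumption, the loss is an assumed mean squared error, and the plot is left out so the snippet runs without extra libraries.

```python
# Grind-time observations from the article: (grams of coffee, minutes to grind)
data = [(20, 2), (40, 4), (60, 6)]


def mean_squared_loss(w):
    """Average squared gap between predicted and observed minutes (assumed loss)."""
    return sum((grams * w - minutes) ** 2 for grams, minutes in data) / len(data)


# Candidate weights 0.00, 0.01, ..., 0.30 (an assumed search range).
weights = [i / 100 for i in range(31)]

# Append the loss for every weight; these two lists are what you would plot.
losses = []
for w in weights:
    losses.append(mean_squared_loss(w))

# The magic number is the weight with the minimum loss.
best_loss, best_w = min(zip(losses, weights))
print(best_w, best_loss)
```

Plotting `weights` against `losses` would give a bowl-shaped curve, with its lowest point at the weight of 0.1 minutes per gram.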
About the Author
I am Venali Sonone, a data scientist by profession and also a management graduate.
This series is inspired by failures.
If you want to talk about the short 5 years or the 50 years, the latter indeed requires something challenging enough to keep the spark in your eyes.