Source: Deep Learning on Medium
Reviewing Machine Learning Notes — Part 1
At the end of the day, → there are patterns in data and human behavior → what ML does is just find those patterns and predict from them → since they capture how we act in this world → this can be automated (to some degree).
The 1-million-dollar Netflix challenge → this is one of the use cases → predict movies. (whether I am going to like this movie or not)
Face recognition and communication are just two more examples. (image retrieval → Google Lens → is one of the applications → hence computer vision + machine learning will create a new economy worth billions of dollars).
In a nutshell, we are going to learn a function → that maps a relationship!
Not to mention → that there are multiple methods of optimization → but also multiple ways of learning and multiple metrics. (classification/clustering → discrete outputs, while regression/dimensionality reduction → continuous).
At the end of the day, → we need to take a look at → how the knowledge is represented → whether it is represented in a format that is able to generalize.
ML is really just a melting pot of different areas of study LOL
Accuracy → and precision and recall and F scores. (these are tricky but easy questions). (maybe even norms of vectors as well → normalized versions of vectors).
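A quick plain-Python sketch of these metrics (the toy labels are made up; the positive class is 1):

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
# here tp=2, fp=1, fn=1 → precision = recall = f1 = 2/3
```

The point of the "trick question" is that accuracy alone hides the trade-off that precision and recall expose separately.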
And singular value decompositions.
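A short NumPy sketch (assuming NumPy is available; the matrix `A` is an arbitrary example): decompose with `np.linalg.svd`, reconstruct, and keep a rank-1 approximation from the largest singular value:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted descending
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct A exactly from all singular values
A_rec = U @ np.diag(s) @ Vt

# Rank-1 approximation: keep only the largest singular value
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])
```

Truncating the smaller singular values like this is the same idea behind SVD-based dimensionality reduction.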
The model’s parameters can be biased → it thinks in one way → if this, do that → this can be understood as a simple if statement → super biased → but it can also have high variance → which means → it WANTS to fit everything.
In this context, → bias means something a bit different → but in general, → it means we already want something (a built-in assumption).
At some point, the training error → becomes zero → while → we have to face the fact that → the test error is huge! (we can use linear regression → which has a closed-form solution).
One way to optimize is gradient descent → here what we do is → find the direction in which the loss function gets smaller → move there → but if we have a closed-form solution → we can just use that.
But computing the pseudoinverse can be hard → and sometimes the inverse might not even exist. (use gradient descent).
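A small NumPy sketch contrasting the two routes on synthetic toy data (the learning rate and iteration count are arbitrary choices, and NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.uniform(-1, 1, 50)]  # bias column + one feature
w_true = np.array([2.0, -3.0])
y = X @ w_true + 0.01 * rng.normal(size=50)

# Closed-form solution via the pseudoinverse: w = pinv(X) @ y
w_closed = np.linalg.pinv(X) @ y

# Gradient descent on the mean squared error
w_gd = np.zeros(2)
lr = 0.1
for _ in range(2000):
    grad = (2 / len(y)) * X.T @ (X @ w_gd - y)
    w_gd -= lr * grad
```

Both arrive at essentially the same weights here; gradient descent just trades the matrix inversion for many cheap update steps.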
When we have only one feature → we can square it or cube it to get more features → this is called feature mapping.
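A tiny sketch of this kind of polynomial feature mapping (degree 3 is an arbitrary choice):

```python
def poly_features(x, degree=3):
    """Map a single feature x to [x, x^2, ..., x^degree]."""
    return [x ** d for d in range(1, degree + 1)]

feats = poly_features(2.0)  # [2.0, 4.0, 8.0]
```

A linear model trained on these mapped features can then fit curves in the original feature.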
How to reduce bias → use a more complex classifier → get more features.
RANSAC can be used for fitting despite outliers → you do not always have to use it though!
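A minimal RANSAC sketch for a 2-D line fit (the points, threshold, and iteration count are made-up toy values):

```python
import random

def ransac_line(points, iters=200, thresh=0.5, seed=0):
    """Fit y = a*x + b by RANSAC: repeatedly sample 2 points, and keep
    the line with the most inliers (points within `thresh` of it)."""
    rng = random.Random(seed)
    best_model, best_inliers = None, -1
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair, skip
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = sum(1 for x, y in points if abs(y - (a * x + b)) < thresh)
        if inliers > best_inliers:
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

# points exactly on y = 2x + 1, plus two gross outliers
pts = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (7, -30)]
(a, b), n_in = ransac_line(pts)
```

The two outliers never win the inlier vote, so the recovered line ignores them, which is exactly what least squares alone would fail to do.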
Now Data and Features!
Classification of data → is another important aspect of the technology.
Naive Bayes → here we are going to look at → which pixels are on → and make the prediction. (naive since we assume those pixels are independent)
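A toy Bernoulli Naive Bayes sketch on 4-"pixel" binary images (the data and smoothing constant are invented for illustration):

```python
import math

def train_nb(images, labels, alpha=1.0):
    """Per class: prior + smoothed P(pixel on), assuming pixels are
    independent given the class (the 'naive' assumption)."""
    model = {}
    n_pix = len(images[0])
    for c in sorted(set(labels)):
        imgs = [im for im, l in zip(images, labels) if l == c]
        log_prior = math.log(len(imgs) / len(images))
        p_on = [(sum(im[j] for im in imgs) + alpha) / (len(imgs) + 2 * alpha)
                for j in range(n_pix)]  # Laplace smoothing
        model[c] = (log_prior, p_on)
    return model

def predict_nb(model, image):
    """Pick the class with the highest log posterior."""
    best_c, best_lp = None, -math.inf
    for c, (log_prior, p_on) in model.items():
        lp = log_prior + sum(math.log(p if bit else 1 - p)
                             for bit, p in zip(image, p_on))
        if lp > best_lp:
            best_c, best_lp = c, lp
    return best_c

# toy "images": class 0 lights the left pixels, class 1 the right
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
y = [0, 0, 1, 1]
model = train_nb(X, y)
```

Working in log probabilities avoids underflow, and the smoothing keeps unseen pixel states from zeroing out a class.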
A bit more complicated decision models → decision trees → there are branches → of classification power → at each level let's make a classifier → but which features are important?
Also, decision trees → do not care about scale differences.
They can get really complex and the splits hard to decide. (they can overfit as well).
Entropy is the way to go → it calculates the impurity.
Good information gain is the name of the game.
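A plain-Python sketch of entropy and information gain for a candidate split (the labels are a toy example):

```python
import math

def entropy(labels):
    """Impurity of a label set: -sum(p * log2(p)) over classes."""
    n = len(labels)
    counts = {l: labels.count(l) for l in set(labels)}
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, left, right):
    """Entropy drop from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = [0, 0, 1, 1]
# a pure split separates the classes completely → gain = 1 bit
gain = information_gain(parent, [0, 0], [1, 1])
```

A tree builder would try every candidate split and pick the one with the largest gain, exactly the "name of the game" above.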
Multiple classifiers → that learn different representations → gain more accuracy → make sure that they are independent!
And bagging is how to do that.
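A minimal bagging sketch (the base learner here, a 1-nearest-neighbour rule on 1-D points, and the toy data are my own choices for illustration):

```python
import random
from collections import Counter

def nn_predict(train, x):
    """1-nearest-neighbour base learner on (value, label) pairs."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def bagging_predict(data, x, n_models=15, seed=0):
    """Bagging: train each base learner on a bootstrap resample of the
    data, then take a majority vote over their predictions."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        boot = [rng.choice(data) for _ in data]  # sample with replacement
        votes.append(nn_predict(boot, x))
    return Counter(votes).most_common(1)[0][0]

# 1-D toy data: class 0 clusters near 0, class 1 near 10
data = [(0.1, 0), (0.5, 0), (1.0, 0), (9.0, 1), (9.5, 1), (10.2, 1)]
```

Each bootstrap resample gives a slightly different (roughly independent) model, and the vote averages their errors away.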
Or we can have multiple weak classifiers → connected one by one → in a chain. (this is good as well).
Each of them is a classifier → and as each one predicts a face we can be more certain.
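A sketch of such a chain as a rejection cascade (the window "features" and thresholds below are entirely hypothetical, just to show the control flow):

```python
def cascade_predict(stages, window):
    """Cascade: each stage is a cheap binary test; a window must pass
    every stage to be accepted (any rejection is final and early)."""
    for stage in stages:
        if not stage(window):
            return False  # rejected, later stages never run
    return True

# hypothetical weak tests on a dict of simple window statistics
stages = [
    lambda w: w["mean_intensity"] > 0.2,     # not an almost-black window
    lambda w: w["eye_band_contrast"] > 0.5,  # dark eye band vs cheeks
    lambda w: w["symmetry"] > 0.7,           # faces are roughly symmetric
]

face = {"mean_intensity": 0.6, "eye_band_contrast": 0.8, "symmetry": 0.9}
sky = {"mean_intensity": 0.9, "eye_band_contrast": 0.1, "symmetry": 0.95}
```

Because most non-face windows are rejected by the first cheap stages, the expensive later stages run only on promising candidates, which is what makes the chain fast.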