Reviewing Machine Learning Notes — Part 1

Source: Deep Learning on Medium

At the end of the day, → there are patterns in data and human behavior → what ML does is just find those patterns and predict from them → since they reflect how we act in this world → this can be automated (to some degree).

The 1 million dollar Netflix challenge → this is one of the use cases → predicting movies. (am I going to like this movie or not?)

Face recognition and communication are just two more examples. (image retrieval → Google Lens → is one of the applications → hence computer vision + machine learning will create a new economy worth billions of dollars).

In a nutshell, we are going to learn a function → that maps a relationship!

Not to mention → that there are multiple methods of optimization → but also multiple ways of learning and multiple metrics. (classification/clustering → discrete outputs — while regression/dimensionality reduction → continuous).

At the end of the day, → we need to take a look at → how the knowledge is represented → whether it is represented in a format that is able to generalize.

ML is just a melting pot of different areas of study LOOOL

Accuracy → and precision and recall and F scores. (these are tricky but easy questions). (maybe even norms of vectors as well → normalized versions of vectors).
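
A rough sketch of what those metrics look like on made-up predictions (the label arrays here are invented purely for illustration):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives

accuracy  = np.mean(y_pred == y_true)
precision = tp / (tp + fp)   # of everything we called positive, how much was right
recall    = tp / (tp + fn)   # of all the true positives, how many did we catch
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```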

And singular value decompositions.
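
A minimal NumPy sketch of SVD and the low-rank reconstruction it gives you (the matrix is just random data for illustration):

```python
import numpy as np

A = np.random.randn(5, 3)                      # any data matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The matrix can be rebuilt from its singular vectors/values.
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))               # True

# Keeping only the largest singular values gives a low-rank approximation.
k = 2
A_rank_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```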

The model’s parameters can be biased → thinking in one way → if this then that → this can be understood as a simple if statement → super biased → but a model can also have high variance → which means → it wants to fit everything.

In this context, → bias means something a bit different → but in general, → it means we are already assuming something.

At some point, the training error → becomes zero → while → we are going to have to face → a huge test error! (we can use linear regression → which has a closed-form solution).

One way to optimize this is to use gradient descent → here what we are going to do is → find the direction in which the loss function gets smaller → move there → but if we have a closed-form solution → we can just use that.

But computing the pseudoinverse can be hard → and sometimes the inverse might not even exist. (use gradient descent).
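
A small sketch contrasting the two on toy data (the data is made up here): the pseudoinverse gives the closed-form answer directly, while gradient descent iterates toward it.

```python
import numpy as np

# Toy 1-D regression data, invented for illustration.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.uniform(-1, 1, (100, 1))])  # bias column + feature
w_true = np.array([0.5, 2.0])
y = X @ w_true + 0.1 * rng.standard_normal(100)

# Closed-form solution via the pseudoinverse: w = pinv(X) @ y
w_closed = np.linalg.pinv(X) @ y

# Gradient descent on the squared loss, for when the closed form is too costly.
w = np.zeros(2)
lr = 0.1
for _ in range(1000):
    grad = X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
    w -= lr * grad

print(w_closed, w)   # both should land near w_true
```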

When we have only one feature → we can square it or cube it to get more features → this is called feature mapping.
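
A tiny sketch of that mapping (the values are arbitrary): one feature becomes several by taking its powers, and a linear model on the mapped features is a polynomial model on the original one.

```python
import numpy as np

x = np.array([0.5, 1.0, 2.0, 3.0])        # a single original feature

# Map it to powers of itself: [x, x^2, x^3]
X_mapped = np.column_stack([x, x**2, x**3])
print(X_mapped.shape)                      # (4, 3) -> three features per sample
```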

How to reduce bias → use a more complex classifier → get more features.

RANSAC can be used for detecting outliers → you do not always have to use it though!
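
A sketch of fitting a line while ignoring outliers, using scikit-learn's RANSACRegressor (assuming scikit-learn is available; the data is fabricated):

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

# Invented toy data: a line plus a handful of gross outliers.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (100, 1))
y = 3 * X.ravel() + 1 + rng.normal(0, 0.5, 100)
y[:5] += 50                                    # corrupt a few points

ransac = RANSACRegressor().fit(X, y)
print(ransac.inlier_mask_.sum(), "points kept as inliers")
```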

Now Data and Features!

Classification of data → is another important aspect of the technology.

Naive Bayes → here we are going to look at → which pixels are on → and make the prediction. (naive since we are going to assume that those pixels are independent)
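
A minimal sketch with made-up binary pixel data, using scikit-learn's BernoulliNB, which treats each on/off pixel as an independent feature (the "images" and classes here are invented):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Fabricated "images": each row is a flattened grid of on/off pixels.
X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1]])
y = np.array([0, 0, 1, 1])                     # two made-up classes

clf = BernoulliNB().fit(X, y)
print(clf.predict([[1, 1, 1, 0]]))             # which class do these pixels look like?
```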

A bit more complicated: decision models → here → there are branches → of classification power → at each level let's make a classifier → but which features are important?

Also, decision trees → do not care about differences in feature scale.

They can really get complex and hard to decide with. (they can overfit as well).

Entropy is the way to go → calculate the impurity.

Good information gain is the name of the game.
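
A sketch of entropy and the information gain of a candidate split (the label arrays are invented for illustration):

```python
import numpy as np

def entropy(labels):
    """Impurity of a set of labels: -sum(p * log2(p))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Made-up parent node and a candidate split into two children.
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left   = np.array([0, 0, 0, 1])
right  = np.array([0, 1, 1, 1])

gain = entropy(parent) - (len(left) / len(parent)) * entropy(left) \
                       - (len(right) / len(parent)) * entropy(right)
print(gain)   # how much impurity this split removes
```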

Multiple Classifiers → that will learn different representations → gain more accuracy → make sure that they are independent!

And bagging is one way to do it.
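
A short sketch of bagging with scikit-learn (assuming scikit-learn is available; the dataset is synthetic): many trees are trained on bootstrap resamples and their votes are combined.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, just to show the API.
X, y = make_classification(n_samples=200, random_state=0)

# Bagging: train many trees on bootstrap resamples, then vote.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=0)
bag.fit(X, y)
print(bag.score(X, y))
```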

Or we can have multiple weak classifiers → connected one after another → in a chain. (this is good as well).

Each of them is a classifier → and as each one predicts a face, we can be more certain.
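
A toy sketch of that chained idea (the stage checks here are hypothetical): each weak classifier can reject early, and only samples that survive every stage are accepted, so confidence grows along the chain.

```python
def cascade_predict(x, stages):
    """Pass a sample through a chain of weak classifiers.

    Each stage returns True ("still looks like a face") or False.
    The sample is rejected as soon as any stage says no; only
    samples that pass every stage are finally accepted.
    """
    for stage in stages:
        if not stage(x):
            return False       # rejected early, cheap
    return True                # every stage agreed -> much more certain

# Hypothetical stages, just to show the shape of the idea.
stages = [
    lambda x: x["has_dark_eye_region"],
    lambda x: x["has_nose_bridge"],
    lambda x: x["has_mouth_region"],
]
print(cascade_predict({"has_dark_eye_region": True,
                       "has_nose_bridge": True,
                       "has_mouth_region": False}, stages))  # False
```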