Andrew Ng’s Machine Learning course at Coursera has become the starting point for a lot of people trying to get into machine learning and artificial intelligence. Below is a cheat sheet I made while doing the course.

Pro Tip: Print this out and put it on your desk or place of work for quick reference

**Machine Learning Algorithms**

Supervised learning: Logistic Regression, Linear Regression, Neural Nets, Support Vector Machines

Unsupervised Learning: K-Means Clustering, Principal Component Analysis, Anomaly Detection

**Linear vs Logistic Regression**

Linear Regression: In linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values.

Logistic Regression: The outcome (dependent variable) has only a limited number of possible values.
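A minimal sketch of the difference (scikit-learn, made-up toy data): a continuous target goes to linear regression, a categorical/binary target goes to logistic regression.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4]]            # one feature

# Linear regression: predict a continuous value (e.g. a price)
reg = LinearRegression().fit(X, [1.1, 1.9, 3.2, 3.9])
print(reg.predict([[5]]))           # some real number, roughly 5

# Logistic regression: predict a class label (e.g. spam / not spam)
clf = LogisticRegression().fit(X, [0, 0, 1, 1])
print(clf.predict([[5]]))           # 0 or 1
```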

**When to use SVM vs Logistic Regression**

n = number of features

m = number of training examples

· if n is large (relative to m) = use logistic regression or an SVM without a kernel (linear kernel)

e.g. n = 10,000 , m = 10–1000

· if n is small, m is intermediate = use SVM with Gaussian kernel

e.g. n = 1–1000, m = 10–10,000

· if n is small, m is large = use logistic regression or SVM without a kernel (linear kernel)

e.g. n = 1–1000, m = 50,000 +

Neural nets can work too but will be slower to train
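The rules above, sketched in scikit-learn. The function name and the exact thresholds are illustrative assumptions, not from the course.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC

def pick_classifier(n_features, m_examples):
    if n_features >= m_examples:          # n large relative to m
        return LogisticRegression()       # or LinearSVC() -- linear kernel
    if m_examples <= 10_000:              # n small, m intermediate
        return SVC(kernel="rbf")          # Gaussian (RBF) kernel
    return LinearSVC()                    # n small, m large: linear again

print(pick_classifier(10_000, 500))       # LogisticRegression()
print(pick_classifier(100, 5_000))        # SVC(kernel='rbf')
```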

**Neural Nets**

Number of input units = number of features

Number of output units = number of classes (for multi-class classification)
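A minimal sketch with scikit-learn's MLPClassifier (random toy data): the input layer size comes from the number of features, the output layer size from the number of classes.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(100, 4)                 # 4 features -> 4 input units
y = np.random.randint(0, 3, size=100)      # 3 classes  -> 3 output units

net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500).fit(X, y)
print(net.n_features_in_, net.n_outputs_)  # 4 3
```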

**Principal Component Analysis**

Summarizes the original features as a smaller set of new features (principal components). One of the main goals of multivariate analyses like PCA is to reduce the number of dimensions (variables/coordinates) while keeping most of the variance in the data.
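A minimal PCA sketch (scikit-learn, iris data): compress 4 correlated features down to 2 components while keeping most of the variance.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                         # 150 samples x 4 features
pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)                 # 150 samples x 2 features

print(X_reduced.shape)                       # (150, 2)
print(pca.explained_variance_ratio_.sum())   # ~0.98 of the variance kept
```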

**Cost/Loss function**

Measures how well (or how badly) the model fits the data; most learning algorithms train by minimizing it.
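A sketch of the squared-error cost used for linear regression in the course, J(θ) = 1/(2m) Σ (hθ(x) − y)², written with NumPy; X, y and theta below are toy values.

```python
import numpy as np

def cost(theta, X, y):
    m = len(y)
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)

X = np.array([[1, 1], [1, 2], [1, 3]])   # first column is the bias/intercept term
y = np.array([1.0, 2.0, 3.0])
print(cost(np.array([0.0, 1.0]), X, y))  # 0.0 -- a perfect fit has zero cost
```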

**Regularization**

Used to combat overfitting. Adds a penalty term to the objective function, which pushes the coefficients of many features toward zero and keeps the model simpler.
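A hedged sketch: the same squared-error cost as above with an L2 (ridge) penalty added, matching the regularized cost from the course; lambda_ controls how strongly coefficients are pushed toward zero.

```python
import numpy as np

def regularized_cost(theta, X, y, lambda_=1.0):
    m = len(y)
    errors = X @ theta - y
    penalty = lambda_ / (2 * m) * np.sum(theta[1:] ** 2)   # bias term not penalized
    return (errors @ errors) / (2 * m) + penalty

X = np.array([[1, 1], [1, 2], [1, 3]])
y = np.array([1.0, 2.0, 3.0])
print(regularized_cost(np.array([0.0, 1.0]), X, y))  # small penalty even on a perfect fit
```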

**Overfit/Underfit**

Overfitting: the model fits the training data too closely (even the noise) and generalizes poorly to new data. Underfitting: the model is too simple to capture the underlying pattern, so it fits even the training data poorly.
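An illustrative sketch: a degree-1 polynomial underfits a quadratic curve, a degree-15 polynomial overfits the noise. The degrees, noise level and data are assumptions for illustration only.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=0.05, size=20)

for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    print(degree, model.score(X, y))    # training R^2 rises with degree -- a high
                                        # training fit alone can signal overfitting
```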

**Back propagation**

Back propagation is just a special name given to finding the gradient of the cost function in a neural network.
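A tiny illustrative sketch (NumPy, made-up data, one hidden layer): backpropagation is the chain rule applied layer by layer to get that gradient.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0.5, 1.0]])              # 1 example, 2 features
y = np.array([[1.0]])
W1 = np.random.randn(2, 3) * 0.1        # input -> hidden weights
W2 = np.random.randn(3, 1) * 0.1        # hidden -> output weights

# forward pass
a1 = sigmoid(X @ W1)
a2 = sigmoid(a1 @ W2)

# backward pass: propagate the error from the output layer back to W1
delta2 = a2 - y                               # output-layer error
grad_W2 = a1.T @ delta2
delta1 = (delta2 @ W2.T) * a1 * (1 - a1)      # chain rule through the sigmoid
grad_W1 = X.T @ delta1
print(grad_W1.shape, grad_W2.shape)           # (2, 3) (3, 1)
```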

**Cross-validation**

Splitting the data into training and validation/test sets to estimate how well the model generalizes, e.g. k-fold cross-validation.
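A minimal k-fold cross-validation sketch with scikit-learn (5 folds assumed, iris data).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())                    # average accuracy across the 5 folds
```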

**Errors (1&2 w/ example)**

Type 1 error (false positive): concluding a man is pregnant

Type 2 error (false negative): concluding a pregnant woman is not pregnant (usually the more serious error in this example)
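A sketch mapping the definitions above onto a confusion matrix (labels are toy values): type 1 error = false positive, type 2 error = false negative.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0]             # 1 = "pregnant"
y_pred = [0, 1, 1, 0, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("type 1 (false positives):", fp)  # 1
print("type 2 (false negatives):", fn)  # 1
```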

**Dimensionality reduction**

Reduces the number of features (with some loss of information)
