Cheat sheet for Andrew Ng’s Machine Learning course on Coursera

Andrew Ng’s Machine Learning course at Coursera has become the starting point for a lot of people trying to get into machine learning and artificial intelligence. Below is a cheat sheet I made while doing the course.

Pro tip: print this out and keep it on your desk or at your place of work for quick reference.

Machine Learning Algorithms

Supervised Learning: Logistic Regression, Linear Regression, Neural Nets, Support Vector Machines

Unsupervised Learning: K-Means Clustering, Principal Component Analysis, Anomaly Detection

Linear vs Logistic Regression

Linear Regression: In linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values.

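Logistic Regression: Despite its name, logistic regression is used for classification. The outcome (dependent variable) takes only a limited number of discrete values, e.g. 0 or 1. The model outputs a probability between 0 and 1, which is then thresholded to pick a class.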
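In the course, both models start from the same linear combination θᵀx. Logistic regression passes it through the sigmoid function so the output can be read as a probability, then thresholds it (commonly at 0.5) to get a class. A minimal sketch, with made-up parameter values and hypothetical function names:

```python
import math

def linear_hypothesis(theta, x):
    """Linear regression: h(x) = theta^T x, a continuous value."""
    return sum(t * xi for t, xi in zip(theta, x))

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^-z); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_hypothesis(theta, x):
    """Logistic regression: h(x) = g(theta^T x), read as P(y = 1 | x)."""
    return sigmoid(linear_hypothesis(theta, x))

def predict_class(theta, x, threshold=0.5):
    """Turn the probability into a discrete label (0 or 1)."""
    return 1 if logistic_hypothesis(theta, x) >= threshold else 0

# x[0] = 1 is the bias (intercept) feature, as in the course.
theta = [-1.0, 2.0]
x = [1.0, 1.5]
print(linear_hypothesis(theta, x))    # 2.0 (continuous output)
print(logistic_hypothesis(theta, x))  # about 0.88 (a probability)
print(predict_class(theta, x))        # 1 (a class label)
```

The same θᵀx appears in both; only the sigmoid and the threshold turn a regression model into a classifier.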

When to use SVM vs Logistic Regression

n = number of features

m = number of training examples

· If n is large (relative to m): use logistic regression or an SVM without a kernel (linear kernel)

e.g. n = 10,000, m = 10–1,000

· If n is small and m is intermediate: use an SVM with a Gaussian kernel

e.g. n = 1–1,000, m = 10–10,000

· If n is small and m is large: create or add more features, then use logistic regression or an SVM without a kernel (linear kernel)

e.g. n = 1–1,000, m = 50,000+

A well-designed neural network is likely to work well in all of these settings, but may be slower to train.
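The rule of thumb above can be written as a small function. This is only a sketch: the name `suggest_model` is hypothetical, and the numeric cut-offs (treating n ≥ m as "large n", and 10,000 examples as the boundary between "intermediate" and "large" m) are assumptions read off the example ranges, not fixed rules.

```python
def suggest_model(n, m):
    """Rule of thumb for SVM vs logistic regression.
    n = number of features, m = number of training examples.
    The numeric cut-offs are illustrative assumptions."""
    if n >= m:
        # Many features, few examples: a linear decision boundary is usually
        # enough, and a complex kernel would risk overfitting.
        return "logistic regression or linear-kernel SVM"
    if m <= 10_000:
        # Few features, moderate number of examples: a Gaussian kernel can learn
        # a non-linear boundary and training is still affordable.
        return "SVM with Gaussian kernel"
    # Few features, many examples: kernel SVMs become slow, so add features
    # and use a linear model instead.
    return "add features, then logistic regression or linear-kernel SVM"

print(suggest_model(10_000, 500))
print(suggest_model(100, 5_000))
print(suggest_model(100, 60_000))
```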

Neural Nets

Number of input units = number of features

Number of output units = number of classes
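For a classifier, the input layer has one unit per feature and the output layer has one unit per class; each weight matrix between layers l and l+1 then has size s(l+1) × (s(l) + 1), the extra column holding the bias weights. A minimal forward-propagation sketch in plain Python; the layer sizes, the input values, and the initialization range `epsilon=0.12` are illustrative assumptions, and the function names are hypothetical:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_weights(layer_sizes, epsilon=0.12, seed=0):
    """One weight matrix per pair of adjacent layers, size s_out x (s_in + 1).
    Small random values break the symmetry between hidden units."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-epsilon, epsilon) for _ in range(s_in + 1)] for _ in range(s_out)]
        for s_in, s_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(weights, x):
    """Forward propagation: returns one activation per output unit."""
    a = list(x)
    for theta in weights:
        a_with_bias = [1.0] + a  # prepend the bias unit
        a = [sigmoid(sum(w * v for w, v in zip(row, a_with_bias))) for row in theta]
    return a

n_features, n_hidden, n_classes = 4, 5, 3  # illustrative sizes
weights = init_weights([n_features, n_hidden, n_classes])
output = forward(weights, [0.5, -1.2, 3.3, 0.0])
print(len(output))  # 3: one output unit per class
predicted_class = max(range(n_classes), key=lambda k: output[k])
```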

Principal Component Analysis

Summarizes the features by replacing them with a smaller set of new features (the principal components). One of the main goals of multivariate analyses such as PCA is to reduce the number of dimensions (variables, i.e. coordinates) while keeping as much of the variance of the data as possible.
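The course computes principal components with an SVD of the covariance matrix Σ = (1/m)XᵀX of mean-normalized data. To stay dependency-free, the sketch below swaps in power iteration, which finds only the first principal component (the eigenvector of Σ with the largest eigenvalue). It assumes the all-ones starting vector is not orthogonal to that component; the function names and the toy data are hypothetical.

```python
import math

def mean_normalize(X):
    """Subtract each feature's mean (PCA assumes zero-mean features)."""
    m = len(X)
    means = [sum(col) / m for col in zip(*X)]
    return [[v - mu for v, mu in zip(row, means)] for row in X], means

def covariance(X):
    """Sigma = (1/m) * X^T X for mean-normalized X (an n x n matrix)."""
    m, n = len(X), len(X[0])
    return [[sum(row[i] * row[j] for row in X) / m for j in range(n)] for i in range(n)]

def top_component(sigma, iterations=100):
    """Power iteration: repeatedly multiply by Sigma and renormalize to find the
    direction of greatest variance (the first principal component)."""
    n = len(sigma)
    u = [1.0] * n
    for _ in range(iterations):
        v = [sum(sigma[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    return u

def project(X, u):
    """z = u^T x: each example summarized by a single new feature."""
    return [sum(a * b for a, b in zip(row, u)) for row in X]

X = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # points on the line y = 2x
Xc, _ = mean_normalize(X)
u = top_component(covariance(Xc))
z = project(Xc, u)
print(u)  # approximately [0.447, 0.894], the direction (1, 2) / sqrt(5)
```

Here every point lies on one line, so a single component keeps all of the variance; on real data the projection z discards whatever variance lies off the chosen direction.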

Cost/Loss function

Measures how well the model's predictions fit the training data (lower is better). Most algorithms in the course are trained by minimizing a cost function.
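The two cost functions used most in the course are the squared-error cost for linear regression, J = 1/(2m) Σ(h(x) − y)², and the cross-entropy cost for logistic regression, J = −1/m Σ[y log h(x) + (1 − y) log(1 − h(x))]. A minimal sketch with hypothetical function names, taking precomputed predictions:

```python
import math

def squared_error_cost(predictions, targets):
    """Linear regression cost: J = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(targets)
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / (2 * m)

def logistic_cost(probabilities, labels):
    """Logistic regression (cross-entropy) cost for labels y in {0, 1}:
    J = -1/m * sum(y*log(h) + (1 - y)*log(1 - h)).
    Each probability h must lie strictly between 0 and 1."""
    m = len(labels)
    return -sum(
        y * math.log(h) + (1 - y) * math.log(1 - h)
        for h, y in zip(probabilities, labels)
    ) / m

print(squared_error_cost([2.0, 3.0], [2.0, 5.0]))  # 1.0
print(logistic_cost([0.9, 0.2], [1, 0]))           # about 0.164
```

A confident wrong prediction (e.g. h = 0.1 when y = 1) is penalized far more heavily by the cross-entropy cost than a confident right one.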

Regularization

Used to address overfitting. It adds a penalty on the size of the parameters to the cost function, which shrinks the coefficients of many features towards zero and so gives a simpler model that is less prone to overfitting. The regularized cost is then minimized as usual, e.g. with gradient descent.
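Regularization and gradient descent are separate pieces: regularization changes the cost to J(θ) = 1/(2m)[Σ(h(x) − y)² + λ Σ_{j≥1} θ_j²], and gradient descent is one way to minimize it. A minimal sketch for regularized linear regression; the function names, learning rate `alpha` and toy data are illustrative assumptions. As in the course, the intercept θ₀ is not penalized.

```python
def predict(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

def regularized_cost(theta, X, y, lam):
    """J = 1/(2m) * [sum((h - y)^2) + lam * sum(theta_j^2 for j >= 1)]."""
    m = len(y)
    fit = sum((predict(theta, x) - target) ** 2 for x, target in zip(X, y))
    penalty = lam * sum(t ** 2 for t in theta[1:])  # theta[0] is not penalized
    return (fit + penalty) / (2 * m)

def gradient_descent(X, y, lam, alpha=0.1, iterations=1000):
    """Batch gradient descent on the regularized cost."""
    m, n = len(y), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [predict(theta, x) - target for x, target in zip(X, y)]
        grad = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        for j in range(1, n):
            grad[j] += lam * theta[j] / m  # derivative of the penalty term
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# x[0] = 1 is the bias feature; the data follow y = 1 + 2x exactly.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(gradient_descent(X, y, lam=0.0))   # close to [1.0, 2.0]
print(gradient_descent(X, y, lam=10.0))  # slope shrunk towards 0 (about 0.67)
```

A larger λ pulls the slope further towards zero at the price of a worse fit to the training data, which is the bias/variance trade-off regularization is meant to control.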


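Overfitting (high variance) means the model fits the training points almost perfectly, noise included, but generalizes poorly to new data. Underfitting (high bias) means the model is too simple to capture the underlying trend and fits even the training points poorly.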

Back propagation

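Backpropagation is the name given to computing the gradient of a neural network's cost function with respect to all of its weights: the error at the output layer is propagated backwards, layer by layer, using the chain rule.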
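To see the chain rule at work, here is a sketch for the smallest possible network: one input, one sigmoid hidden unit, one sigmoid output, cross-entropy cost on a single example. The backpropagated gradient is compared against a numerical estimate, the gradient-checking step the course recommends. The parameter values and function names are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(params, x, y):
    """Cross-entropy cost of a 1-input, 1-hidden-unit, 1-output network."""
    w1, b1, w2, b2 = params
    a1 = sigmoid(w1 * x + b1)
    h = sigmoid(w2 * a1 + b2)
    return -(y * math.log(h) + (1 - y) * math.log(1 - h))

def backprop(params, x, y):
    """Gradient [dJ/dw1, dJ/db1, dJ/dw2, dJ/db2] via the chain rule."""
    w1, b1, w2, b2 = params
    # Forward pass, keeping the activations.
    a1 = sigmoid(w1 * x + b1)
    h = sigmoid(w2 * a1 + b2)
    # Output error: for a sigmoid output with cross-entropy cost, dJ/dz2 = h - y.
    delta2 = h - y
    # Propagate back through w2 and the hidden sigmoid (g'(z) = a * (1 - a)).
    delta1 = delta2 * w2 * a1 * (1 - a1)
    return [delta1 * x, delta1, delta2 * a1, delta2]

def numerical_gradient(params, x, y, eps=1e-5):
    """Gradient checking: central differences (J(p + eps) - J(p - eps)) / (2 eps)."""
    grad = []
    for i in range(len(params)):
        plus = list(params)
        plus[i] += eps
        minus = list(params)
        minus[i] -= eps
        grad.append((cost(plus, x, y) - cost(minus, x, y)) / (2 * eps))
    return grad

params = [0.5, -0.3, 1.2, 0.1]  # illustrative values for w1, b1, w2, b2
analytic = backprop(params, x=2.0, y=1.0)
numeric = numerical_gradient(params, x=2.0, y=1.0)
print(max(abs(a - n) for a, n in zip(analytic, numeric)))  # should be very small
```

Numerical gradients are far too slow to train with, which is why they are only used once to check a backpropagation implementation.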


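Splitting data: hold out part of the data (a test set, and usually a cross-validation set) so the model is evaluated on examples it was not trained on. k-fold cross-validation splits the data into k parts and uses each part once for validation while training on the rest.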
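A sketch of how k-fold cross-validation partitions the example indices. The function name is hypothetical and it assumes the data were shuffled beforehand; libraries such as scikit-learn provide the same idea as `KFold` in `sklearn.model_selection`.

```python
def k_fold_indices(m, k):
    """Split example indices 0..m-1 into k folds. Each fold is used once as the
    validation set while the other k-1 folds form the training set.
    Assumes the examples are already shuffled."""
    fold_sizes = [m // k + (1 if i < m % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, m))
        splits.append((train, val))
        start += size
    return splits

for train, val in k_fold_indices(m=10, k=3):
    print(len(train), len(val))
# 6 4
# 7 3
# 7 3
```

Averaging the validation error over the k folds gives a less noisy estimate than a single held-out split, at the cost of training k models.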

Type 1 and Type 2 Errors (with examples)

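Type 1 Error (false positive): telling a man that he is pregnant

Type 2 Error (false negative): telling a pregnant woman that she is not pregnant (arguably the more serious mistake in this example)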
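A Type 1 error is a false positive and a Type 2 error is a false negative. Counting them on a labelled set leads directly to precision and recall, which the course uses to evaluate classifiers on skewed classes: false positives lower precision, false negatives lower recall. A minimal sketch with hypothetical function names and toy labels (1 = "pregnant"):

```python
def error_counts(predicted, actual):
    """Return (true positives, false positives, false negatives) for 0/1 labels.
    False positives are Type 1 errors, false negatives are Type 2 errors."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return tp, fp, fn

def precision_recall(predicted, actual):
    tp, fp, fn = error_counts(predicted, actual)
    precision = tp / (tp + fp) if tp + fp else 0.0  # lowered by Type 1 errors
    recall = tp / (tp + fn) if tp + fn else 0.0     # lowered by Type 2 errors
    return precision, recall

actual    = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
print(error_counts(predicted, actual))      # (2, 1, 1)
print(precision_recall(predicted, actual))  # both 2/3
```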

Dimensionality reduction

Reduces the number of features (e.g. for data compression or visualization), at the cost of losing some information.
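Because reducing dimensions throws information away, the course suggests choosing the number of components k as the smallest value that still retains most of the variance (e.g. 99%), using the eigenvalues of the covariance matrix (the diagonal of S from the SVD). A minimal sketch; the function name and the example eigenvalues are made up:

```python
def choose_k(eigenvalues, retain=0.99):
    """Smallest k whose k largest covariance eigenvalues keep at least `retain`
    of the total variance."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    kept = 0.0
    for k, v in enumerate(values, start=1):
        kept += v
        if kept / total >= retain:
            return k
    return len(values)

print(choose_k([6.0, 2.5, 1.0, 0.45, 0.05]))              # 4 (99.5% retained)
print(choose_k([6.0, 2.5, 1.0, 0.45, 0.05], retain=0.9))  # 3 (95% retained)
```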

Source: Deep Learning on Medium