Andrew Ng’s Machine Learning course at Coursera has become the starting point for a lot of people trying to get into machine learning and artificial intelligence. Below is a cheat sheet I made while doing the course.
Pro Tip: Print this out and put it on your desk or place of work for quick reference
Machine Learning Algorithms
Supervised learning: Logistic Regression, Linear Regression, Neural Nets, Support Vector Machines
Unsupervised Learning: K-Means Clustering, Principal Component Analysis, Anomaly Detection
Linear vs Logistic Regression
Linear Regression: In linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values.
Logistic Regression: The outcome (dependent variable) has only a limited number of possible values.
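The distinction can be seen in a few lines of NumPy; the data and the logistic coefficients below are made up purely for illustration:

```python
import numpy as np

# Linear regression: the outcome is continuous, fit here by the normal equation.
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4]], dtype=float)  # column of 1s = intercept term
y = np.array([2.0, 4.0, 6.0, 8.0])                           # can be any real value
theta = np.linalg.solve(X.T @ X, X.T @ y)                    # [intercept, slope]

# Logistic regression: pass the linear score through a sigmoid so the output
# is a probability in (0, 1), then threshold it into a small set of classes.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

coeffs = np.array([-5.0, 2.0])                     # hypothetical fitted coefficients
labels = (sigmoid(X @ coeffs) >= 0.5).astype(int)  # only 0 or 1
```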
When to use SVM vs Logistic Regression
n = number of features
m = number of training examples
· if n is large (relative to m) = use logistic regression or SVM without a kernel (linear kernel)
e.g. n = 10,000 , m = 10–1000
· if n is small, m is intermediate = use SVM with Gaussian kernel
e.g. n = 1–1000, m = 10–10,000
· if n is small, m is large = use logistic regression or SVM without a kernel (linear kernel)
e.g. n = 1–1000, m = 50,000 +
Neural nets can work too but will be slower to train
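The rules of thumb above can be sketched as a small helper function; the exact thresholds are illustrative, not prescriptions from the course:

```python
def pick_model(n, m):
    """Rule of thumb for SVM vs logistic regression.

    n = number of features, m = number of training examples.
    The thresholds below are illustrative only.
    """
    if n >= m:               # n large relative to m
        return "logistic regression or linear-kernel SVM"
    if m <= 10_000:          # n small, m intermediate
        return "SVM with Gaussian kernel"
    return "logistic regression or linear-kernel SVM"  # n small, m large
```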
Number of input units = number of features
Number of output units = number of classes
Principal Component Analysis
Summarizes correlated features by combining them into new features (principal components). One of the main goals of multivariate analyses like PCA is to reduce the number of dimensions (variables) while keeping as much of the variance of the data as possible.
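A minimal NumPy sketch of PCA on synthetic data, where the third feature is redundant by construction so two components capture essentially all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + X[:, 1]            # third feature is a combination of the others

Xc = X - X.mean(axis=0)                # 1. centre the data
cov = Xc.T @ Xc / (len(Xc) - 1)        # 2. covariance matrix
vals, vecs = np.linalg.eigh(cov)       # 3. eigendecomposition (ascending order)
order = np.argsort(vals)[::-1]         # 4. sort components by variance, descending
vals, vecs = vals[order], vecs[:, order]

k = 2
Z = Xc @ vecs[:, :k]                   # project 3-D data down to 2-D
explained = vals[:k].sum() / vals.sum()  # fraction of variance kept
```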
Cost Function
Describes how well the model fits the data; most learning algorithms work by minimizing it.
Regularization
Used to combat overfitting: adds a penalty term to the objective function (which is then minimized, e.g. by gradient descent). The penalty pushes the coefficients of many variables toward zero, reducing model complexity.
Overfitting means the model fits the training points too closely (noise included) and generalizes poorly; underfitting means the model is too simple to capture the underlying pattern.
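As an illustration, L2 (ridge) regularization adds a squared-coefficient penalty to the least-squares objective; this NumPy sketch uses the closed-form solution on synthetic data where only the first feature matters:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
y = X[:, 0] + 0.1 * rng.normal(size=20)   # only the first feature is informative

def ridge(X, y, lam):
    # Minimise ||Xw - y||^2 + lam * ||w||^2 via its closed form.
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_plain = ridge(X, y, lam=0.0)   # ordinary least squares
w_reg = ridge(X, y, lam=10.0)    # penalty shrinks coefficients toward zero
```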
Backpropagation is just a special name for computing the gradient of the cost function in a neural network.
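A hand-rolled sketch of that gradient computation on a tiny network, checked against a finite-difference estimate (the weights and data are arbitrary):

```python
import numpy as np

# Tiny network: 2 inputs -> 2 hidden units (sigmoid) -> 1 linear output,
# with a squared-error cost.
rng = np.random.default_rng(2)
W1 = rng.normal(size=(2, 2))
W2 = rng.normal(size=(1, 2))
x = np.array([0.5, -1.0])
y = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W1, W2):
    return 0.5 * (W2 @ sigmoid(W1 @ x) - y)[0] ** 2

# Backpropagation: apply the chain rule layer by layer, output to input.
a1 = sigmoid(W1 @ x)
delta2 = W2 @ a1 - y                       # d cost / d output
grad_W2 = np.outer(delta2, a1)             # gradient for the output-layer weights
delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # propagate back through the sigmoid
grad_W1 = np.outer(delta1, x)              # gradient for the hidden-layer weights

# Sanity check one entry against a finite-difference estimate.
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
numeric = (cost(W1p, W2) - cost(W1, W2)) / eps
```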
Splitting data into training and test sets, e.g. k-fold cross-validation, where each fold takes a turn as the held-out test set.
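A minimal k-fold splitter in NumPy (the shuffling seed and fold count are arbitrary):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)   # shuffle once
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]                                # fold i is held out
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, test

splits = list(kfold_indices(n=10, k=5))
```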
Errors (Types 1 & 2, with examples)
Type 1 Error (false positive): a man is predicted to be pregnant
Type 2 Error (false negative): a pregnant woman is predicted to be not pregnant (usually the more serious error)
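In confusion-matrix terms, a Type 1 error is a false positive and a Type 2 error is a false negative; a toy example with made-up labels:

```python
# 1 = pregnant, 0 = not pregnant; both lists are invented for illustration.
y_true = [0, 0, 1, 1, 1]   # actual condition
y_pred = [1, 0, 1, 0, 0]   # model's prediction

false_pos = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # Type 1
false_neg = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # Type 2
```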
Dimensionality reduction (e.g. via PCA) reduces the number of features, with some loss of information.
Source: Deep Learning on Medium