Understanding PCA and T-SNE intuitively

In this article, I hope to present an intuitive way of understanding dimensionality reduction techniques such as PCA and T-SNE without delving deep into the mathematics behind them.

Dimensionality Reduction

Often in real-world cases, we encounter datasets with a very high number of dimensions (read: thousands). If you are not aware of what dimensions are, they are the attributes that describe the data. For example, if we take a person, we could represent their height, weight, skin color, age, etc. as their dimensions. The words dimensions and features are used interchangeably. Dimensionality reduction is important because:

Humans can visualize only up to 3 dimensions (at least until there is an alien invasion)

Often, training an ML model on all the features would be computationally expensive

Lost in Higher Dimensions Source: link

Principal Component Analysis:

As mentioned earlier, in the real world we deal with data having many dimensions, so it makes sense to reduce them. PCA is one of the most widely used dimensionality reduction techniques. To understand PCA, let's first understand a few terms.

Variance: It measures how spread out our data is along any given dimension. Mathematically, it is the average squared deviation from the mean.

Covariance: It measures the linear relationship between two features: the value is positive if y increases as x increases, negative if y decreases as x increases, and near zero if there is no linear relationship between them.
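As a quick sanity check, here is a minimal NumPy sketch (the data values are made up purely for illustration) that computes the variance of one feature and the covariance between a pair of features exactly as defined above:

```python
import numpy as np

# Toy data: 5 samples of two made-up features, x and y
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])   # y increases with x -> positive covariance

# Variance: average squared deviation from the mean
var_x = np.mean((x - x.mean()) ** 2)

# Covariance: average product of the deviations of x and y from their means
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

print(var_x)    # 8.0
print(cov_xy)   # 8.0 (positive, since y increases with x)

# NumPy's covariance matrix (bias=True uses the same 1/N convention as above)
print(np.cov(x, y, bias=True))
```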

The covariance matrix is a symmetric matrix whose diagonal elements are the variances of the individual features and whose off-diagonal elements are the covariances between pairs of features. In layman's terms, we want to reduce the dimensions such that the most information is preserved. Geometrically, this means we should preserve the directions with high variance (i.e. the most spread in the data). PCA does exactly this! It creates new features such that the maximum variance is preserved. PCA uses eigenvectors to accomplish this: we find the eigenvectors of the covariance matrix and keep the top n of them (read: new features/dimensions), where n depends on the amount of information we choose to preserve.

After PCA, we get a set of features that are orthogonal to each other, which means they are uncorrelated. This is done by creating new features that are linear combinations of the original features in the dataset. As a result, the off-diagonal elements of the covariance matrix of the transformed data are zero, since the covariance between any two principal components is zero.
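One quick way to see this is to compare the covariance matrix before and after the transformation. Below is a small sketch using scikit-learn's PCA on randomly generated, correlated data (the dataset itself is made up for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two correlated features: the second is roughly 0.5 * the first plus noise
x = rng.normal(size=200)
X = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=200)])

print(np.round(np.cov(X, rowvar=False), 3))   # off-diagonal entries are non-zero

Z = PCA(n_components=2).fit_transform(X)      # new features: linear combinations of the old
print(np.round(np.cov(Z, rowvar=False), 3))   # off-diagonal entries are ~0: uncorrelated
```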

Transformed features along green line after PCA Source: link

Steps to be followed for PCA:

  1. Normalize the dataset X and compute its covariance matrix.
  2. Find the eigenvectors and eigenvalues of the covariance matrix.
  3. To reduce to k dimensions, sort the eigenvalues and select the eigenvectors corresponding to the top k of them.
  4. Project the n-dimensional data onto the new k dimensions (a code sketch of these steps follows the list).
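Putting the four steps together, here is a minimal NumPy sketch. The function name and the toy data are my own additions for illustration, not part of the original article:

```python
import numpy as np

def pca(X, k):
    """Reduce X (n_samples x n_features) to k dimensions via eigendecomposition."""
    # 1. Normalize the dataset (center to zero mean) and compute its covariance matrix
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)

    # 2. Eigenvectors and eigenvalues of the (symmetric) covariance matrix
    eig_vals, eig_vecs = np.linalg.eigh(cov)

    # 3. Sort by eigenvalue (descending) and keep the eigenvectors of the top k
    order = np.argsort(eig_vals)[::-1]
    top_k = eig_vecs[:, order[:k]]

    # 4. Project the n-dimensional data onto the new k dimensions
    return X_centered @ top_k

# Example: reduce made-up 5-dimensional data to 2 dimensions
X = np.random.default_rng(0).normal(size=(100, 5))
print(pca(X, 2).shape)   # (100, 2)
```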

Note: We can choose the number of dimensions k based on the percentage of variance we want to preserve. For example, if we want 90% of the information to be preserved, we choose the smallest k such that: (sum of the top k eigenvalues) / (sum of all n eigenvalues) >= 0.9
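As a rough sketch of that rule (the eigenvalues below are made up and assumed to be sorted in descending order), the smallest k preserving 90% of the variance can be picked like this:

```python
import numpy as np

eig_vals = np.array([5.0, 2.6, 1.5, 0.6, 0.3])    # made-up eigenvalues, sorted descending

explained = np.cumsum(eig_vals) / eig_vals.sum()  # cumulative fraction of variance preserved
k = int(np.argmax(explained >= 0.9)) + 1          # smallest k preserving at least 90%

print(np.round(explained, 2))   # [0.5  0.76 0.91 0.97 1.  ]
print(k)                        # 3
```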