Original article was published on Artificial Intelligence on Medium
This post is part of a series about feature engineering techniques for machine learning with Python.
Welcome to another article in our series on feature engineering! In this post, we discuss the transformations you can apply to the variables in a dataset. Specifically, we cover mathematical transformations that help satisfy an assumption made by linear models: that the variables follow a normal distribution.
Why These Transformations?
Some machine learning models, like linear and logistic regression, assume that the variables follow a normal distribution. In practice, however, variables in real datasets are more likely to follow a skewed distribution.
By applying transformations that map these skewed distributions to something closer to a normal distribution, we can improve the performance of our models.
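As an illustration of this idea (not code from the original post), here is a minimal sketch showing how a log transform can reduce skewness. The data is synthetic lognormal noise, an assumption made here since the article's dataset is not shown:

```python
import numpy as np
from scipy.stats import skew

# Synthetic right-skewed variable (lognormal), standing in for a real feature
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# A log transform maps lognormal data onto a (roughly) normal distribution
transformed = np.log(raw)

# The transformed variable is far less skewed than the raw one
print(f"skew before: {skew(raw):.2f}, after: {skew(transformed):.2f}")
```

The same pattern applies to other transformations from this series (square root, reciprocal, Box-Cox, Yeo-Johnson): transform the variable, then measure skewness or inspect a Q-Q plot to check how close to normal it has become.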
In a Q-Q (quantile-quantile) plot, a variable's sample quantiles are plotted against the theoretical quantiles of a normal distribution; if the variable is normally distributed, the points fall along a 45-degree line.
Here’s a code snippet in Python to generate a Q-Q plot:
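The original snippet did not survive, so this is a minimal sketch using `scipy.stats.probplot`; the lognormal sample is an assumption standing in for the article's variable:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
from scipy import stats

# Synthetic skewed variable (assumption: the article's data is not shown)
rng = np.random.default_rng(42)
data = rng.lognormal(mean=0.0, sigma=0.5, size=1_000)

# Q-Q plot: sample quantiles against theoretical normal quantiles;
# probplot also draws the reference 45-degree fit line
stats.probplot(data, dist="norm", plot=plt)
plt.title("Q-Q plot against a normal distribution")
plt.savefig("qq_plot.png")
```

Because the data above is skewed, its points will curve away from the reference line; rerunning the plot on the log-transformed variable should bring them close to it.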