Original article was published on Deep Learning on Medium

## Statistics for Data Science and Machine Learning

# Central Limit Theorem

## Inferential Statistics

The central limit theorem states that the sampling distribution of the sample mean (the distribution you get by taking the means of many random samples) approaches a normal distribution as the sample size grows, whatever the distribution of the population, as long as the population has a finite variance.

This is one of the most important theorems in statistics: it lets us make inferences based on the normal distribution, the easiest one to analyze, regardless of how the population is distributed.

# Theorem

Let $X_1, \dots, X_n$ be a random sample from some population with mean $\mu$ and variance $\sigma^2$. Then, for large $n$, the sample mean $\bar{X}$ is approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$.

In other words, whatever the distribution of the data, if we draw many random samples with **replacement** and calculate the mean of each sample, the distribution of the resulting means will be approximately normal.
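A minimal sketch of this claim, using only the Python standard library. The exponential population, the sample size `n = 30`, and the number of samples are assumptions chosen for illustration; they do not come from the article. The simulation checks that the sample means have a mean close to μ and a variance close to σ²/n.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # size of each sample (assumed value)
num_samples = 5000  # how many sample means to collect (assumed value)

# Exponential population with rate 1: mean mu = 1, variance sigma^2 = 1.
mu, sigma2 = 1.0, 1.0

# Draw num_samples independent samples of size n and keep the mean of each.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(f"mean of sample means:     {mean_of_means:.4f} (theory: {mu:.4f})")
print(f"variance of sample means: {var_of_means:.4f} (theory: {sigma2 / n:.4f})")
```

The population here is strongly right-skewed, yet the two printed values should land close to the theoretical μ and σ²/n.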

A random sample with replacement means that each value goes back into the pool after it is drawn, so the same value can appear more than once within a sample and in any number of samples.
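As a small illustration, the standard library offers both kinds of sampling: `random.choices` draws with replacement and `random.sample` draws without. The toy data set below is made up for this sketch.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

data = [3, 7, 7, 12, 15]  # toy data set, made up for illustration

# With replacement: every draw is independent, so a value can repeat
# and the sample may even be larger than the data set itself.
with_replacement = random.choices(data, k=8)

# Without replacement: each element is used at most once,
# so the sample can never be larger than len(data).
without_replacement = random.sample(data, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```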

There is no fixed rule for how large the original data set and the random samples need to be, but statisticians often quote the rule of thumb of a sample size larger than 30. For heavily skewed data, the required size can be 40 or more.

The right number of elements per sample depends on the case, so in practice we need to try several values.
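One way to try several sample sizes is to track how the skewness of the sample means shrinks as `n` grows. The sketch below is an illustration only: the left-skewed Beta(5, 1) population, the candidate sizes, and the helper names `skewness` and `sample_means` are all assumptions, not part of the original article.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

def skewness(values):
    """Moment-based skewness: mean((x - mean)^3) / std^3."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / s ** 3

def sample_means(n, num_samples=3000):
    """Means of num_samples samples of size n from a left-skewed Beta(5, 1)."""
    return [
        statistics.fmean([random.betavariate(5, 1) for _ in range(n)])
        for _ in range(num_samples)
    ]

# Skewness near 0 is what a normal distribution would show.
skew_by_n = {n: skewness(sample_means(n)) for n in (5, 15, 30, 60)}
for n, skew in skew_by_n.items():
    print(f"n={n:>2}: skewness of sample means = {skew:+.3f}")
```

The skewness stays negative, because the population is left-skewed, but it should move toward zero as `n` increases.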

The Central Limit Theorem applies to both continuous and dichotomous (binary) data. Let's look at some examples.
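For dichotomous data, the sample mean is simply the proportion of ones in the sample. The sketch below assumes a population where a "1" occurs with probability `p = 0.2`; that value, the sample size, and the number of samples are illustrative choices. For such data the population variance is p(1 − p), so the theorem predicts a variance of p(1 − p)/n for the proportions.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

p = 0.2             # probability of a "1" in the population (assumed)
n = 50              # size of each sample (assumed)
num_samples = 5000  # number of samples (assumed)

# Each sample mean is the proportion of ones among n binary draws.
proportions = [
    statistics.fmean([1 if random.random() < p else 0 for _ in range(n)])
    for _ in range(num_samples)
]

mean_prop = statistics.fmean(proportions)
var_prop = statistics.pvariance(proportions)

print(f"mean of proportions:     {mean_prop:.4f} (theory: {p:.4f})")
print(f"variance of proportions: {var_prop:.5f} (theory: {p * (1 - p) / n:.5f})")
```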

# Examples

## Left Skewed Data

The first example uses left-skewed data. The original data comes from a population of 1000 individuals, and its distribution has a long tail to the left.

Let's apply the **Central Limit Theorem** to get an approximately normal distribution. We will try different sample sizes, starting with **n** = 10.
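The author's data is not available here, so the following is only a sketch of this step under stated assumptions: a hypothetical left-skewed population of 1000 values generated from a Beta(5, 1) distribution, and 1000 samples of size 10 drawn with replacement. It prints a crude text histogram of the sample means.

```python
import random
import statistics

random.seed(4)  # fixed seed so the run is repeatable

# Hypothetical left-skewed population of 1000 values (not the author's data).
population = [100 * random.betavariate(5, 1) for _ in range(1000)]

n = 10              # sample size used in the text
num_samples = 1000  # number of samples, an assumed value

# Mean of each sample, drawn with replacement from the population.
means = [
    statistics.fmean(random.choices(population, k=n))
    for _ in range(num_samples)
]

# Crude text histogram: 10 equal-width bins between the smallest and largest mean.
lo, hi = min(means), max(means)
bins = 10
width = (hi - lo) / bins
counts = [0] * bins
for m in means:
    index = min(int((m - lo) / width), bins - 1)  # keep the maximum in the last bin
    counts[index] += 1

for i, count in enumerate(counts):
    print(f"{lo + i * width:6.1f} | {'#' * (count // 5)}")
```

With these settings the bars should form a single hump with thin tails, although a slight lean to one side can remain at such a small `n`.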

As we can see, the result looks similar to a normal distribution. Let's check it with the **Shapiro test** (a test of whether a sample comes from a normal distribution; it will be explained in a later post):

Statistics=0.9887740015983582, p=6.264754119911231e-07

Sample does not look Normal (reject H0)
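A check like the one above could be run roughly as follows, assuming NumPy and SciPy are installed. The population is again a hypothetical Beta(5, 1) stand-in for the author's data, and the number of samples and the significance level `alpha = 0.05` are assumed choices, so the printed numbers will differ from the ones in this post. `scipy.stats.shapiro` returns the test statistic and the p-value; the null hypothesis H0 is that the data come from a normal distribution.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(5)  # seeded generator so the run is repeatable

# Hypothetical left-skewed population of 1000 values (not the author's data).
population = 100 * rng.beta(5, 1, size=1000)

n = 10              # sample size
num_samples = 1000  # number of samples, an assumed value

# Each row is one sample drawn with replacement; take the mean of each row.
means = rng.choice(population, size=(num_samples, n), replace=True).mean(axis=1)

alpha = 0.05  # significance level, an assumed choice
stat, p = shapiro(means)
print(f"Statistics={stat}, p={p}")
if p > alpha:
    print("Sample looks Normal (fail to reject H0)")
else:
    print("Sample does not look Normal (reject H0)")
```

A small p-value means the test found evidence against normality. With a thousand or more sample means, the test is sensitive enough to flag even mild leftover skewness.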

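The results are quite promising, but the sample means do not pass the Shapiro test. Let's increase **n** to 15.

After increasing the sample size, the distribution moves closer to normal: the test statistic rises from 0.9888 to 0.9943 and the p-value grows by three orders of magnitude. With p ≈ 0.0008, however, the Shapiro test still rejects normality, even at the 1% significance level: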

Statistics=0.9943469762802124, p=0.000821537512820214

Sample does not look Normal (reject H0)

After some trial and error, you can find a sample size at which the sample means are approximately normal and use them as data for your models.

**Summary**

This post introduces **THE THEOREM** that lets data scientists work with the distribution that has the most convenient properties.

Normal distributions give data scientists many tools that are easy to apply and that many classic machine learning models rely on.