# How to Verify the Distribution of Data using Q-Q Plots?

The original article was published by Satyam Kumar on Artificial Intelligence on Medium.

Suppose we are given a random distribution and need to verify whether or not it is a normal (Gaussian) distribution. For clarity, we will call this unknown distribution X and the known normal distribution Y.

## Generate unknown distribution X:

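```python
import numpy as np

# 1000 samples from a normal distribution with mean 50 and standard deviation 25
X = np.random.normal(loc=50, scale=25, size=1000)
```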

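This generates 1,000 values from a normal distribution with mean 50 and standard deviation 25. X is drawn from a normal distribution here only so that we can check whether the method recognizes it; in practice, X would be your own data.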

## Find the 100 percentile values of X:

```python
# Integer percentiles 1%, 2%, ..., 100% of X
X_100 = []
for i in range(1, 101):
    X_100.append(np.percentile(X, i))
```

Compute the value of X at each integer percentile (1%, 2%, 3%, ..., 99%, 100%) and store the results in X_100.
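The loop above calls `np.percentile` once per percentile. As a sketch (assuming NumPy, plus a seeded `np.random.default_rng` generator for reproducibility, which the article does not use), the same 100 values can be computed in a single vectorized call, because `np.percentile` also accepts an array of percentiles:

```python
import numpy as np

# Seeded generator so the run is reproducible (an assumption; the article uses np.random directly).
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=25, size=1000)

# Loop version, as in the article: one np.percentile call per integer percentile 1..100.
X_100_loop = [np.percentile(X, i) for i in range(1, 101)]

# Vectorized version: pass all 100 percentiles at once and get one value per entry.
percentiles = np.arange(1, 101)
X_100 = np.percentile(X, percentiles)

print(len(X_100))                        # one value per percentile
print(np.allclose(X_100, X_100_loop))    # both approaches agree
print(X_100[-1] == X.max())              # the 100th percentile is the sample maximum
```

The last check is worth keeping in mind: the 100th percentile is simply the largest sample, so it tends to be the noisiest point in the plot.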

## Generate the known normal distribution Y and its percentile values:

```python
# 1000 samples from the standard normal distribution (mean 0, standard deviation 1)
Y = np.random.normal(loc=0, scale=1, size=1000)
```

This generates 1,000 values from a normal distribution with mean 0 and standard deviation 1 (the standard normal distribution). Y serves as the reference that X is compared against to verify whether X is normally distributed.

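```python
# Integer percentiles 1%, 2%, ..., 100% of Y.
# range(1, 101) matches X_100, so both lists hold exactly 100 values.
Y_100 = []
for i in range(1, 101):
    Y_100.append(np.percentile(Y, i))
```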

Compute the value of Y at each integer percentile (1%, 2%, 3%, ..., 99%, 100%) and store the results in Y_100.
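Sampling Y adds its own random noise to the comparison. A common alternative, which is not the approach used in this article, is to use the theoretical quantiles of the standard normal distribution directly. The sketch below assumes Python's standard-library `statistics.NormalDist`. Because the 100% quantile of a normal distribution is infinite, it uses midpoint probabilities (i - 0.5)/100 instead of i/100; this choice of plotting positions is an assumption.

```python
from statistics import NormalDist

# Standard normal N(0, 1) from the standard library (Python 3.8+).
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Midpoint plotting positions (i - 0.5) / 100 for i = 1..100. They avoid
# p = 1.0, whose normal quantile is infinite (inv_cdf rejects it).
probs = [(i - 0.5) / 100 for i in range(1, 101)]
Y_theoretical = [std_normal.inv_cdf(p) for p in probs]

print(len(Y_theoretical))
print(round(Y_theoretical[0], 3), round(Y_theoretical[-1], 3))
```

To pair these with X, the percentiles of X would have to be taken at the same positions, for example `np.percentile(X, np.arange(0.5, 100, 1))`.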

## Plotting:

Draw a scatter plot of the 100 percentile values of the unknown distribution X against the corresponding percentile values of the normal distribution Y.
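Here is a minimal plotting sketch, assuming matplotlib (the article does not name its plotting library). It puts Y's percentiles on the x-axis and X's on the y-axis, which is a common Q-Q convention. The article's exact axis orientation and styling are not recoverable, and the output file name `qq_plot.png` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)                # seeded for reproducibility (an assumption)
X = rng.normal(loc=50, scale=25, size=1000)   # the "unknown" distribution
Y = rng.normal(loc=0, scale=1, size=1000)     # the known standard normal

percentiles = np.arange(1, 101)               # 1%, 2%, ..., 100%
X_100 = np.percentile(X, percentiles)
Y_100 = np.percentile(Y, percentiles)

fig, ax = plt.subplots(figsize=(5, 5))
# Each point pairs the i-th percentile of Y (x-axis) with the i-th percentile of X (y-axis).
ax.scatter(Y_100, X_100, s=12)
ax.set_xlabel("Percentiles of Y (normal distribution)")
ax.set_ylabel("Percentiles of X (unknown distribution)")
ax.set_title("Q-Q plot: X vs Y")
fig.savefig("qq_plot.png")                    # hypothetical output file name
```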

Here, X is the unknown distribution being compared to Y, the normal distribution.

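In a Q-Q plot, if the points lie approximately on a straight line, the two random variables have the same distribution up to location and scale (one is a linear transformation of the other). Otherwise, their distributions differ. Some scatter, especially at the extreme percentiles, is expected from sampling noise.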
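One way to see what "a straight line" means here is to fit a line through the Q-Q points. The sketch below assumes NumPy; the seed and the 1..99 percentile range are arbitrary choices. It uses `np.polyfit` for the line and `np.corrcoef` as a rough straightness score. For X with mean 50 and standard deviation 25 against a standard normal Y, the slope should come out near 25 (the ratio of standard deviations) and the intercept near 50 (the difference in means). A straight line therefore signals the same distribution shape, not identical parameters.

```python
import numpy as np

# Seeded generator for reproducibility (an assumption; the article uses np.random directly).
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=25, size=1000)  # mean 50, standard deviation 25
Y = rng.normal(loc=0, scale=1, size=1000)    # standard normal

# Use percentiles 1..99: the 100th percentile is just the sample maximum, the noisiest point.
percentiles = np.arange(1, 100)
X_q = np.percentile(X, percentiles)
Y_q = np.percentile(Y, percentiles)

# Least-squares line through the Q-Q points: X_q is approximately slope * Y_q + intercept.
slope, intercept = np.polyfit(Y_q, X_q, deg=1)

# Pearson correlation of the Q-Q points; values close to 1 mean "close to a straight line".
r = np.corrcoef(Y_q, X_q)[0, 1]

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.4f}")
```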

In the Q-Q plot above, the points fall along a straight line, which indicates that X is normally distributed.

## What if the two distributions are not the same?

If X is not normally distributed but follows some other distribution, the points of a Q-Q plot of X against a normal distribution will not lie on a straight line.

Here, X follows a log-normal distribution and is compared to a normal distribution, so the points in the Q-Q plot do not lie on a straight line.
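To put a number on the difference, the sketch below compares how straight the Q-Q points are for a normal X and for a log-normal X. It assumes NumPy, and the log-normal parameters `mean=0, sigma=1` are hypothetical because the article does not state them. The helper `qq_correlation` is a hypothetical name introduced for this example. Its correlation score is only a rough summary, not a formal normality test.

```python
import numpy as np

rng = np.random.default_rng(0)                # seeded for reproducibility (an assumption)
percentiles = np.arange(1, 100)               # 1..99, skipping the noisy sample maximum

Y = rng.normal(loc=0, scale=1, size=1000)                # known standard normal
X_normal = rng.normal(loc=50, scale=25, size=1000)       # normal X, as earlier in the article
X_lognormal = rng.lognormal(mean=0, sigma=1, size=1000)  # hypothetical log-normal X

Y_q = np.percentile(Y, percentiles)


def qq_correlation(sample):
    """Hypothetical helper: correlation between the sample's percentiles and Y's.

    A value of 1.0 means the Q-Q points lie exactly on a straight line.
    """
    return np.corrcoef(Y_q, np.percentile(sample, percentiles))[0, 1]


r_normal = qq_correlation(X_normal)
r_lognormal = qq_correlation(X_lognormal)
print(f"normal X:     r = {r_normal:.4f}")
print(f"log-normal X: r = {r_lognormal:.4f}")
```

The normal X should score very close to 1. The log-normal X should score noticeably lower, because its upper percentiles grow much faster than the normal's and bend the Q-Q points into a curve.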

## Some more observations:

Here are four Q-Q plots for four different combinations of X and Y distributions.
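The four specific conditions in the original figure are not reproduced here. As a hypothetical sketch that assumes NumPy and matplotlib, the same percentile recipe can be applied to several pairs of distributions in a 2×2 grid. The pairs below are illustrative choices, not the article's, and the output file name `qq_grid.png` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)                # seeded for reproducibility (an assumption)
n = 1000
percentiles = np.arange(1, 101)               # 1%, 2%, ..., 100%, as in the article

# Hypothetical (X, Y) pairs; these are NOT necessarily the conditions in the original figure.
pairs = {
    "normal vs normal": (rng.normal(50, 25, n), rng.normal(0, 1, n)),
    "uniform vs normal": (rng.uniform(0, 1, n), rng.normal(0, 1, n)),
    "log-normal vs normal": (rng.lognormal(0, 1, n), rng.normal(0, 1, n)),
    "uniform vs uniform": (rng.uniform(0, 10, n), rng.uniform(-1, 1, n)),
}

fig, axes = plt.subplots(2, 2, figsize=(8, 8))
for ax, (title, (x, y)) in zip(axes.flat, pairs.items()):
    # Same percentile recipe as in the article: Y percentiles on x, X percentiles on y.
    ax.scatter(np.percentile(y, percentiles), np.percentile(x, percentiles), s=8)
    ax.set_title(title)
    ax.set_xlabel("Y percentiles")
    ax.set_ylabel("X percentiles")
fig.tight_layout()
fig.savefig("qq_grid.png")                    # hypothetical output file name
```

Pairs from the same family (up to location and scale) should trace a straight line, while mismatched families should bend into an S-shape or a curve.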