Where is statistics used in data science?

Source: Deep Learning on Medium

If you are new to data science, learning programming for the first time, and you have just found out that the mathematics from your 12th standard and bachelor’s degree is used again in data science, you probably feel like whaaat!!! (It felt like that for me.) I had only a vague overview of data science. I used to wonder where I would use the math, and I didn’t know exactly. If you feel the same way, this article is for you.

Today I will write about where exactly statistics is used in the data science workflow. In my view, statistics makes up 30–35% or more of the work, because in data science or analytics, understanding the problem is the main thing. If you want to build a DS model, you need to know the basics of which part of math you are using. For example, if you are relating two quantities, correlation (statistics) is useful, and when assessing whether a customer will click on an ad, we use probability.
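
To make the correlation idea concrete, here is a minimal sketch in plain Python. The ad spend and site visit numbers are made up for illustration, and `pearson_correlation` is just a helper name I chose, not something from a particular library.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: weekly ad spend and site visits for five weeks.
ad_spend = [10, 20, 30, 40, 50]
site_visits = [120, 190, 310, 390, 520]

r = pearson_correlation(ad_spend, site_visits)
print(f"correlation = {r:.3f}")
```

A value close to 1 means the two quantities rise together, close to -1 means one falls as the other rises, and close to 0 means there is no clear linear relation. Correlation alone does not tell you that one causes the other.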

So a DS model requires math in every step of the workflow.

I will discuss the situations where statistics is used in a DS problem.

Situation 1 :

AD 1 has a higher click rate than AD 2 (say, when both are advertising a similar thing).

Statistics can be used to determine whether the difference in click rate between AD 1 and AD 2 is real rather than due to chance, and how it affects the customers visiting the site, the traffic, future prospects, etc.

Problems that arise from experimental results, such as using different metrics or running into Simpson’s paradox, can be averted by using statistics.
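
Simpson’s paradox is easier to see with numbers. The sketch below uses made-up click counts, split by device (the desktop/mobile split is my assumption for the example): AD 1 wins inside each segment, yet AD 2 looks better once the segments are added together.

```python
# Hypothetical (clicks, impressions) per ad and device segment.
data = {
    "AD 1": {"desktop": (81, 870), "mobile": (192, 2630)},
    "AD 2": {"desktop": (234, 2700), "mobile": (55, 800)},
}

def rate(clicks, impressions):
    """Click rate for one segment."""
    return clicks / impressions

def overall_rate(segments):
    """Click rate after pooling all segments of one ad."""
    clicks = sum(c for c, _ in segments.values())
    impressions = sum(n for _, n in segments.values())
    return clicks / impressions

for ad, segments in data.items():
    per_segment = ", ".join(
        f"{name}: {rate(*counts):.2%}" for name, counts in segments.items()
    )
    print(f"{ad} -> {per_segment}, overall: {overall_rate(segments):.2%}")
```

The reversal happens because the two ads were shown to very different mixes of desktop and mobile users, so comparing only the pooled rates would be misleading.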

In the above problem we use experimental testing, hypothesis testing, confidence intervals.
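
As a rough sketch of what hypothesis testing and a confidence interval look like for two ads, here is a two-proportion z-test in plain Python. The click counts are hypothetical, the test relies on the normal approximation (reasonable only when both ads have plenty of clicks and non-clicks), and 1.96 is the usual multiplier for a 95% interval.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value, 95% CI for the difference in click rate)."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Under the null hypothesis both ads share one click rate, so pool the data.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_a - p_b) / se_pooled
    p_value = 2 * (1 - normal_cdf(abs(z)))
    # The confidence interval uses the unpooled standard error.
    se_diff = math.sqrt(p_a * (1 - p_a) / views_a + p_b * (1 - p_b) / views_b)
    margin = 1.96 * se_diff
    return z, p_value, (p_a - p_b - margin, p_a - p_b + margin)

# Hypothetical counts: AD 1 got 120 clicks in 2000 views, AD 2 got 90 in 2000.
z, p_value, ci = two_proportion_z_test(120, 2000, 90, 2000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print(f"95% CI for the difference: {ci[0]:.4f} to {ci[1]:.4f}")
```

A small p-value (commonly below 0.05) suggests the gap between the ads is unlikely to be pure chance, and the interval gives a plausible range for how big the gap really is.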

Situation 2 :

When a company’s revenue is low for both the previous and the present year.

Statistics helps you find the reason behind the low revenue. It helps you understand sales behavior and trends and predict future results, so that you can try to increase the revenue.

In the above case, there should be enough data to draw insights from (the bigger the better).

Here we use regression, classification, etc.
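
For a feel of what regression does here, below is a small least-squares sketch that fits a straight line to quarterly revenue. The revenue figures are invented, and a straight line is only one assumption about the trend. Real revenue usually needs more variables than time alone.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical revenue (in thousands) for eight consecutive quarters.
quarters = [1, 2, 3, 4, 5, 6, 7, 8]
revenue = [520, 510, 495, 470, 465, 440, 430, 410]

slope, intercept = fit_line(quarters, revenue)
forecast = intercept + slope * 9  # extrapolate one quarter ahead

print(f"revenue changes by about {slope:.1f} per quarter")
print(f"forecast for quarter 9: {forecast:.1f}")
```

The slope summarises the trend (negative here, so revenue is falling), and the fitted line gives a first, rough prediction for the next quarter.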

Situation 3 :

Turning raw, humongous data into cooked, simple data.

Consider a company that has 1 million customers buying different types of products (say 100k). Statistics can be used to determine or label each person’s taste, group the customers, work out steps that make a customer more likely to buy a product, and recommend products. We also look at the sentiment of the customers to make good recommendations.

Here we use clustering, dimensionality reduction (such as PCA), etc.
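
Here is a tiny sketch of clustering with k-means in plain Python, grouping customers by two made-up features (orders per month and average basket value). The starting centroids are picked by hand to keep the example deterministic. In practice you would normally scale the features and use a library implementation. Dimensionality reduction is not shown here.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (orders per month, average basket value).
customers = [(1, 20), (2, 25), (1, 30), (9, 200), (10, 220), (8, 210)]

labels, centroids = kmeans(customers, centroids=[customers[0], customers[3]])
print("labels:", labels)
print("centroids:", centroids)
```

Each label says which group a customer landed in, so occasional small buyers and frequent big spenders end up in separate groups that can be treated differently.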

Situation 4 :

You are planning to launch a website to sell an item and want to know who your target customers are. Or you have already opened the website, but customers who have purchased once aren’t visiting your site again, and you want to know how to bring customers back or entice them to buy products from your site.

Here we use statistics to estimate the problems, the risks, the budget to be allocated, etc. for your company.

Statistical concepts like regression and classification are used for estimating the results; clustering, dimensionality reduction, causal effect analysis and latent variable analysis are used for customer labeling, engagement, retention, etc.; and collaborative filtering can be used for recommendations.
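
To show the idea behind collaborative filtering, here is a minimal user-based sketch in plain Python. The customers, products and ratings are all invented, and scoring unseen products by similarity-weighted ratings is just one simple choice among many.

```python
import math

# Hypothetical ratings (1-5); a missing product means the customer has not rated it.
ratings = {
    "asha": {"laptop": 5, "mouse": 4, "keyboard": 4},
    "ben": {"laptop": 5, "mouse": 5, "monitor": 4},
    "chloe": {"novel": 5, "cookbook": 4, "mouse": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity between two rating dicts (unrated products count as 0)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=2):
    """Rank products the user has not rated by similarity-weighted ratings."""
    own = ratings[user]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(own, other_ratings)
        for item, rating in other_ratings.items():
            if item not in own:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("asha", ratings))
```

Customers with similar tastes count for more, so the product liked by the most similar customer ends up at the top of the list.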

Situation 5 :

Predicting results from previous data, one of the most important and crucial steps in any business model.

Here we use statistical techniques like predictive modelling to get insights from the data.
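
As one small example of predicting from previous data, the sketch below uses simple exponential smoothing and checks it against the last few months that were held back. The sales numbers are made up, and `alpha=0.5` is an arbitrary smoothing choice, not a recommended value.

```python
def exponential_smoothing_forecast(history, alpha=0.5):
    """One-step-ahead forecast: a running average that favours recent values."""
    if not history:
        raise ValueError("history must not be empty")
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical monthly sales.
sales = [100, 104, 101, 108, 110, 115, 113, 120]

# Walk-forward check: predict each of the last three months
# using only the months that came before it.
errors = []
for t in range(len(sales) - 3, len(sales)):
    prediction = exponential_smoothing_forecast(sales[:t])
    errors.append(abs(sales[t] - prediction))
mae = sum(errors) / len(errors)

next_month = exponential_smoothing_forecast(sales)
print(f"mean absolute error on held-out months: {mae:.2f}")
print(f"forecast for next month: {next_month:.2f}")
```

Checking the model on data it has not seen is the important habit here: it gives an honest idea of how far off the next prediction might be.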

As it is said “Data science is the bridge between data and company”.

We use the power of statistics to build insights, predict, estimate and visualize the data.

The above are some situations where statistics is used. Knowledge of programming and math, the ability to understand data and business models, and general knowledge together make up a good data scientist.

I hope this clarifies some of your doubts.


If you want to collaborate on any open-source projects or on publishing in journals, or if you are interested in the new website we are starting for data science enthusiasts, please mail me at francisvikramsagar@gmail.com.