Hypothesis testing in Data Science

Original article was published by Sahithya Rajasekar on Artificial Intelligence on Medium


Photo from Datacamp

Data Science is one of today’s booming technologies, and it is growing rapidly. Today we are going to look at one part of it: hypothesis testing.
Feeling hypothetical? Relax!

The concepts are very simple. It’s not like hunting for pirate treasure.
So why wait? Let’s dive into the topic!
A hypothesis is generally the starting point of a complex or disputed investigation: a claim that still has to be tested, not a proven fact.
Now you may be staring at me and asking why you would use this strategy on your data!
Chill! I’ll tell you why!
Let me describe a scenario!
Imagine a defendant standing before a judge, with evidence that looks strong enough to prove that this person committed the crime.
The judge’s first question will be:
“Do you accept the accusation made against you?”
If the defendant rejects the accusation, the evidence will have to speak. So there are two possible answers here.
One is yes and the other is no: acceptance and rejection.
If he rejects it, the judge will examine the evidence further. If he accepts it, punishment follows.
What’s the deciding factor here?
Yes, evidence!

In data science hypothesis testing, data is our evidence!
We have certain formulas (test statistics) for analyzing the data given by the client.
This strategy mainly helps us separate real patterns from random noise in the datasets, so that unsupported claims can be set aside. Hypothesis testing weighs two mutually exclusive statements against each other.
The two statements are called the null hypothesis and the alternative hypothesis.
The null hypothesis is written H0 and the alternative hypothesis is written H1.
H0 states that there is no difference (no effect) in the data being compared.
H1 states that there is a difference.
If the evidence is too weak to reject H0, we keep H0 and do not adopt H1.
If the evidence is strong enough to reject H0, we side with H1. The two statements are opposites of each other.
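To make H0 and H1 a bit more concrete, here is a minimal sketch of one possible test, a permutation test for a difference in means, written with only the Python standard library. The function name `permutation_test`, the two small samples, and the 0.05 threshold `alpha` are all assumptions chosen for illustration; they are not taken from any real client data.

```python
import random
from statistics import mean


def permutation_test(sample_a, sample_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    H0: both samples come from the same distribution (no difference).
    H1: the two groups differ in their means.
    Returns an estimated p-value.
    """
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)

    extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are interchangeable, so shuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups (made up for this example).
group_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
group_b = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2]

alpha = 0.05  # assumed significance level
p_value = permutation_test(group_a, group_b)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f} -> {decision}")
```

A small p-value means a difference this large would rarely show up if H0 were true, so the evidence speaks against H0. A large p-value only means the evidence is not strong enough; it does not prove H0.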
Two types of errors play a major role in hypothesis testing.
One is the Type I error, also called a false positive. Here the analyst or investigator rejects a null hypothesis that is actually true.
The Type II error is a false negative. Here the investigator fails to reject a null hypothesis that is actually false, and so misses a true alternative hypothesis.
A Type I error can only happen when the null hypothesis is true.
A Type II error can only happen when the alternative hypothesis is true.
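Under the standard convention, a Type I error is a false positive (rejecting a true H0) and a Type II error is a false negative (failing to reject a false H0). The four possible outcomes can be sketched as a tiny lookup. The function name `classify_outcome` and its two flags are hypothetical, made up for this illustration.

```python
def classify_outcome(h0_is_true: bool, rejected_h0: bool) -> str:
    """Label a test decision given the (normally unknown) truth about H0."""
    if h0_is_true and rejected_h0:
        return "Type I error (false positive)"
    if not h0_is_true and not rejected_h0:
        return "Type II error (false negative)"
    return "correct decision"


# Walk through all four combinations of truth and decision.
for h0_is_true in (True, False):
    for rejected_h0 in (True, False):
        label = classify_outcome(h0_is_true, rejected_h0)
        print(f"H0 true: {h0_is_true!s:5}  rejected H0: {rejected_h0!s:5}  ->  {label}")
```

In real work we never know whether H0 is true, so we cannot label a single decision like this. What we can do is choose how often we are willing to risk each kind of error.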

Clearly confusing?
No worries! As a beginner I was confused a lot too! It’s common, so let me give a clear example!
Take the courtroom scenario I already mentioned.
The judge is the analyst here.
The criminal evidence is submitted as the dataset. It may be reliable or not. The defendant’s guilt is not 100% proven yet. Now the judge has to analyze it.

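Let me give four statements.
A court presumes innocence until the evidence says otherwise, so innocence plays the role of the null hypothesis.
H0: the person is not a thief, and he did not rob the jewelry shop.
H1: the person is a thief, and he robbed the jewelry shop.
A Type I error would be convicting him when he is actually innocent (rejecting a true H0).
A Type II error would be letting him go when he actually robbed the shop (failing to reject a false H0).
Now map these statements back to the courtroom scenario.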

These are the basic statistics for beginners. This topic has many more parameters and concepts to explore. Learn from the beginning!

So the major advantage of this strategy is that the analysis rests only on what the data actually supports. Using Python or R packages, analysts and scientists can run these tests on their data sets.
By using this strategy, analysts and scientists can reduce the risk of wrong conclusions while researching the data. After cleansing and testing, the data set is ready for further processing.
This strategy can also be applied with data analysis tools and software. But coding generally gives you more control over the analysis.
So let’s learn to code and become masters.

In the next topic we will look at further steps and procedures using code!

Spread positive attitude! Not only Attitude!