Original article can be found here (source): Deep Learning on Medium
P Value, T test, Chi Square test, ANOVA, When to use Which Strategy?
In my previous blog I gave an overview of hypothesis testing: what it actually is and the errors related to it.
In this blog we will discuss the different techniques for hypothesis testing, mainly from a theoretical point of view, and when to use which one.
What is P-value?
The job of the p-value is to help us decide whether we should reject our null hypothesis or fail to reject it. The lower the p-value, the more surprising the evidence would be if the null hypothesis were true, and the more ridiculous our null hypothesis looks. And when the null hypothesis looks ridiculous enough, we simply reject it in favor of our alternate hypothesis.
If we find that the p-value is lower than the predetermined significance level (often called alpha or the threshold value), then we reject the null hypothesis. The alpha should always be set before the experiment to avoid bias.
For example, a commonly chosen alpha is 0.05. If we assume the data follow a normal distribution, this means we do not reject the null hypothesis as long as the result lies within the central 95 percent of the distribution expected under it, and we accept a 5 percent chance of rejecting a null hypothesis that is actually true. So if our p-value is less than 0.05, we reject the null hypothesis.
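To make the decision rule concrete, here is a minimal sketch using a made-up coin example that is not part of the original discussion: the null hypothesis is that the coin is fair, and we observe 9 heads in 10 flips. The function name `binomial_p_value` and the data are assumptions for illustration only; the p-value is the one-sided probability of a result at least this extreme if the null hypothesis were true.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_heads: float = 0.5) -> float:
    """One-sided p-value: probability of at least `heads` heads in `flips`
    flips, assuming the null hypothesis (P(heads) = p_heads) is true."""
    return sum(
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(heads, flips + 1)
    )


alpha = 0.05  # significance level, fixed before looking at the data
p_value = binomial_p_value(heads=9, flips=10)  # hypothetical observation
reject_null = p_value < alpha

print(f"p-value = {p_value:.4f}, reject null: {reject_null}")
# prints "p-value = 0.0107, reject null: True"
```

Only 11 of the 1024 equally likely outcomes have 9 or more heads, so the result is surprising enough under a fair coin to reject the null hypothesis at alpha = 0.05.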
But wait guys!! The p-value only comes into play after we perform a statistical test, so knowing when to use which technique is important. So now I will list when to perform which statistical technique for hypothesis testing.
Chi-Square test is used when we perform hypothesis testing on two categorical variables from a single population. With it we find out whether there is a significant association between the two categorical variables.
The hypothesis being tested for chi-square is
Null : Variable A and Variable B are independent.
Alternate : Variable A and Variable B are not independent.
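The chi-square test of independence can be sketched by hand for a small table. The counts below are invented for illustration, and the closed-form p-value used here is valid only for 1 degree of freedom (a 2x2 table); in practice a library routine such as `scipy.stats.chi2_contingency` would usually be used instead, and by default it applies a continuity correction to 2x2 tables, so its numbers would differ slightly from this uncorrected sketch.

```python
from math import erfc, sqrt

# Hypothetical 2x2 contingency table:
# rows = categories of Variable A, columns = categories of Variable B.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count in each cell if A and B were independent.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

dof = (len(observed) - 1) * (len(observed[0]) - 1)  # 1 for a 2x2 table

# For exactly 1 degree of freedom, the chi-square tail probability
# equals erfc(sqrt(chi2 / 2)). This shortcut does not hold for larger tables.
p_value = erfc(sqrt(chi2 / 2))

alpha = 0.05
print(f"chi2 = {chi2:.3f}, dof = {dof}, reject independence: {p_value < alpha}")
```

With these counts the statistic is about 16.67, far larger than what independence would typically produce, so the null hypothesis that A and B are independent is rejected.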
T-test is an inferential statistic used to determine whether there is a significant difference between the means of two groups or samples, which may be related in certain features. It is performed on continuous variables.
There are three different versions of the t-test:
- One sample t-test, which tells whether the mean of a sample differs from a known or hypothesized population mean.
- Two sample t-test, also known as the independent t-test, which compares the means of two independent groups and determines whether there is statistical evidence that the associated population means are significantly different.
- Paired t-test, which compares the means of two related sets of measurements, for example the same group measured at two different times.
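The three versions differ only in how the t statistic is built, which the sketch below tries to show with the standard library alone. All data and function names are hypothetical, the two sample version assumes equal variances (the pooled form), and the critical value 2.776 is the usual table value for a two-sided test at alpha = 0.05 with 4 degrees of freedom. For real work, `scipy.stats.ttest_1samp`, `ttest_ind` and `ttest_rel` return the p-value directly.

```python
from math import sqrt
from statistics import mean, stdev, variance


def one_sample_t(sample, population_mean):
    """t statistic for a sample mean against a hypothesized population mean."""
    return (mean(sample) - population_mean) / (stdev(sample) / sqrt(len(sample)))


def independent_t(a, b):
    """Two sample t statistic, assuming both groups share the same variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def paired_t(before, after):
    """Paired t statistic: a one sample test on the per-subject differences."""
    diffs = [y - x for x, y in zip(before, after)]
    return one_sample_t(diffs, 0.0)


# Hypothetical measurements, invented for illustration.
before = [72, 75, 78, 70, 74]   # group 1, first measurement
after = [75, 78, 80, 73, 78]    # same subjects, second measurement
other_group = [68, 71, 69, 73, 70]  # an unrelated second group

t_one = one_sample_t(before, population_mean=70)
t_ind = independent_t(before, other_group)
t_paired = paired_t(before, after)

# Two-sided critical value for alpha = 0.05 and 4 degrees of freedom (t table).
critical_df4 = 2.776
print(f"one sample t = {t_one:.3f}")
print(f"independent t = {t_ind:.3f}")
print(f"paired t = {t_paired:.3f}, reject null: {abs(t_paired) > critical_df4}")
```

Note that the paired test reduces to a one sample test on the differences, which is why it can detect a consistent shift even when the two sets of measurements overlap heavily.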
ANOVA, also called analysis of variance, is used to compare multiple (three or more) samples with a single test. It is used when the categorical feature has more than two categories.
The hypothesis being tested in ANOVA is
Null: All group means are equal.
Alternate: At least one group mean is significantly different from the others.
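As a rough sketch of what a one-way ANOVA computes, the F statistic is the ratio of the variation between group means to the variation within groups. The helper `one_way_anova_f` and the three groups below are assumptions made up for this example, and the critical value 3.89 is the usual F table value for alpha = 0.05 with (2, 12) degrees of freedom. A library call such as `scipy.stats.f_oneway` would give the p-value directly.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three categories of one categorical feature.
group_a = [4, 5, 6, 5, 5]
group_b = [6, 7, 8, 7, 7]
group_c = [9, 8, 10, 9, 9]

f_stat, df_between, df_within = one_way_anova_f(group_a, group_b, group_c)

critical_f = 3.89  # F table value for alpha = 0.05, df = (2, 12)
print(f"F = {f_stat:.2f}, df = ({df_between}, {df_within}), "
      f"reject null: {f_stat > critical_f}")
# prints "F = 40.00, df = (2, 12), reject null: True"
```

A significant F only says that at least one group mean differs; it does not say which one.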
In my next blog I will cover the mathematical interpretation of these tests and how to code them in Python for better understanding. Till then, Happy Learning!!