How to interpret p-value with COVID-19 data

Original article can be found here (source): Artificial Intelligence on Medium

[How to calculate chi-square test statistic from scratch]

  1. Draw the table (aka contingency table) from given data

According to table 1, out of a total of 138 patients, 36 went to the ICU and 102 did not. Out of the 55 patients who had anorexia, 24 went to the ICU and 31 did not.

2. Fill in the table
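As a rough sketch of the filled-in table, the counts quoted above can be laid out with NumPy. The variable names and the row/column order (ICU and non-ICU as rows, anorexia and no anorexia as columns) are assumptions made for illustration; only the counts come from the text.

```python
import numpy as np

# Counts quoted in the text (table 1).
icu_total, non_icu_total = 36, 102
icu_anorexia, non_icu_anorexia = 24, 31

# Rows: ICU, non-ICU. Columns: anorexia, no anorexia.
# The "no anorexia" cells follow from the row totals.
observed = np.array([
    [icu_anorexia, icu_total - icu_anorexia],
    [non_icu_anorexia, non_icu_total - non_icu_anorexia],
])

row_totals = observed.sum(axis=1)  # patients per ICU status
col_totals = observed.sum(axis=0)  # patients per anorexia status
grand_total = observed.sum()       # all patients

print(observed)
print("row totals:", row_totals)
print("column totals:", col_totals)
print("grand total:", grand_total)
```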

3. Compute the expected value for each cell. (This is the key!)

Take the cell [ICU, Anorexia]. If ICU admission and anorexia were unrelated, then out of the 36 ICU patients we would expect 36 * (55/138) ≈ 14.35 to have anorexia, because 55 out of all 138 patients have it.

Why do we calculate this?

Because if the actual observation, 24, is very different from 36 * (55/138), then something is going on between anorexia and ICU admission. Conversely, if 36 * (55/138) is not too different from 24, then ICU/non-ICU status doesn’t have much to do with anorexia.

If you calculate the expected value of “Dry Cough” in table 1, you won’t see much difference from the observation. (Try it!)
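The expected-value rule (row total × column total / grand total) can be applied to every cell at once. This is a minimal sketch using the anorexia counts from the text; the array layout and names are assumptions.

```python
import numpy as np

# Rows: ICU, non-ICU. Columns: anorexia, no anorexia.
observed = np.array([[24, 12], [31, 71]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
grand_total = observed.sum()

# Expected count per cell if the two variables were independent.
expected = row_totals * col_totals / grand_total

print(expected.round(2))
# The [ICU, Anorexia] cell is 36 * (55 / 138).
print(expected[0, 0], 36 * (55 / 138))
```

Note that the expected counts keep the same row and column totals as the observed table; only the way the patients are spread across the cells changes.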

4. Normalize it & sum them up. (The intuition)

Apply the test statistic formula to each cell and add up the results. Chi-square test statistic: 6.49 + 4.30 + 2.29 + 1.52 = 14.6
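Written out, and assuming the article means the standard Pearson chi-square statistic, the formula sums one term per cell, where O_i is the observed count and E_i the expected count of cell i. The first term below uses the [ICU, Anorexia] cell from the text.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i},
\qquad
\frac{(24 - 14.35)^2}{14.35} \approx 6.49
```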

Doesn’t this formula remind you of something?

Linear regression. Both work with a sum of squared differences between the observed and the expected values. Here each squared deviation is also divided by its expected value, which puts the cells on a common scale, much like standardization. The result measures how far the data deviate from what is expected.

A chi-square distribution is the distribution of a sum of squares of independent standard normal random variables, which is the form the chi-square test statistic takes.

Let’s say your sample is reasonably large. Under the null hypothesis, the expected count E is a fixed number computed from the table totals, while the observed count O varies from sample to sample. By the central limit theorem, a count built from many independent patients is approximately normal, so (O − E) is approximately normal with mean zero, and dividing by √E puts it roughly on a standard normal scale. A common rule of thumb is that this approximation is reasonable when every expected count is at least 5.

This normality assumption is often left out when p-values are taught, but it is what makes the chi-squared p-value paradigm possible.
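The link between squared standard normals and the chi-square distribution can be checked with a small simulation. This is only an illustrative sketch: the seed, the number of draws, and the choice of one degree of freedom are assumptions, not part of the original analysis.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)  # fixed seed for repeatability
df = 1                          # number of squared normals being summed
n_draws = 100_000

# Draw standard normal values, square them, and sum across each row.
z = rng.standard_normal(size=(n_draws, df))
sum_of_squares = (z ** 2).sum(axis=1)

# A chi-square distribution with df degrees of freedom has
# mean df and variance 2 * df.
print("simulated mean:", sum_of_squares.mean())
print("simulated variance:", sum_of_squares.var())

# About 5% of the simulated values should exceed the 0.05 cut-off.
cutoff = chi2.ppf(0.95, df)
fraction_above = (sum_of_squares > cutoff).mean()
print("fraction above cut-off:", fraction_above)
```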

Chi-square test statistic: 6.49 + 4.30 + 2.29 + 1.52 = 14.6

This is a single number that tells you how much difference exists between your observed counts and the expected counts.
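The four terms and their sum can be reproduced from the observed table alone. This is a from-scratch sketch with assumed variable names; it uses the plain formula with no continuity correction.

```python
import numpy as np

# Rows: ICU, non-ICU. Columns: anorexia, no anorexia.
observed = np.array([[24, 12], [31, 71]])

# Expected counts under independence: row total * column total / grand total.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# One (O - E)^2 / E term per cell, then the sum over all cells.
contributions = (observed - expected) ** 2 / expected
chi_square = contributions.sum()

print(contributions.round(2))
print(round(chi_square, 1))
```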

5. Get “Degrees of Freedom”.

For a table with r rows and c columns, the formula for calculating degrees of freedom for the chi-square test is

Degrees of Freedom = (# of rows − 1) × (# of columns − 1)

In our data, we have 2 rows and 2 columns in the contingency table so the df is 1.

Why are the degrees of freedom of the chi-square test defined this way?

First of all, degrees of freedom are the number of values that remain free to vary once the constraints on the data are fixed.

For example, in an equation `a + b + c = 10`, you can change a and b to any number but you cannot change c, once a & b are set, because the sum should remain as 10. So, in this case, df is 2.

The df for a chi-square test is the number of cells in the contingency table that can vary once the row and column totals are fixed. In our example, the contingency table was 2 by 2. If you set the cell [ICU & Anorexia] to 24, the rest of the cells are determined by the totals. In general, the last row and the last column are always determined by the others, so only (rows − 1) × (columns − 1) cells are free.
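To make the "one free cell" idea concrete, here is a small sketch. The helper names `degrees_of_freedom` and `fill_2x2` are hypothetical, introduced only for illustration; the totals come from the text.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a chi-square test on an r x c table."""
    return (n_rows - 1) * (n_cols - 1)


def fill_2x2(top_left, row_totals, col_totals):
    """Fill a 2 x 2 table from one cell plus the fixed row and column totals."""
    top_right = row_totals[0] - top_left
    bottom_left = col_totals[0] - top_left
    bottom_right = row_totals[1] - bottom_left
    return [[top_left, top_right], [bottom_left, bottom_right]]


print(degrees_of_freedom(2, 2))
# Rows: ICU (36), non-ICU (102). Columns: anorexia (55), no anorexia (83).
print(fill_2x2(24, (36, 102), (55, 83)))
```

Choosing the single value 24 is enough to recover the whole table, which is what df = 1 expresses.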

Why do we need to take into account degrees of freedom?

Because the degrees of freedom affect the shape of the distribution.

If you look at the chi-square table, different degrees of freedom give different critical values of the test statistic for the same alpha (significance level).

In a chi-square test, the degrees of freedom grow with the number of rows and columns in the table, not with the sample size. The larger the degrees of freedom, the more closely the distribution resembles a normal distribution.

This also means that as the df gets larger, the distribution shifts to the right and spreads out: a chi-square distribution with k degrees of freedom has mean k and variance 2k. As a result, the test statistic needed to reach the 0.05 cut-off lies further out. For example, the cut-off at alpha = 0.05 is about 3.84 for df = 1 and about 5.99 for df = 2.
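To see how the cut-off moves with the degrees of freedom, the critical values can be read from `scipy.stats.chi2` instead of a printed table. The particular df values below are arbitrary choices for illustration.

```python
from scipy.stats import chi2

alpha = 0.05

# Critical value: the test statistic that leaves alpha in the right tail.
critical_values = {df: chi2.ppf(1 - alpha, df) for df in (1, 2, 5, 10)}

for df, critical_value in critical_values.items():
    print(
        f"df={df:>2}  critical value={critical_value:.3f}  "
        f"mean={chi2.mean(df):.1f}  variance={chi2.var(df):.1f}"
    )
```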

6. Look up the chi-square table, or do it with python or a calculator.

If you look at the chi-square table, the bigger the test statistic is, the smaller the p-value is.

Or the calculator:

Or with python, it’s only a few lines.

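    from scipy.stats import chi2_contingency

    # Rows: ICU, non-ICU. Columns: anorexia, no anorexia.
    table = [[24, 12], [31, 71]]
    alpha = 0.05

    # correction=False turns off Yates' continuity correction, which scipy
    # applies to 2 x 2 tables by default; without it the statistic is 14.6.
    test_statistic, p_value, dof, expected = chi2_contingency(table, correction=False)

    if p_value <= alpha:
        print('Variables are not independent (reject H0)')
    else:
        print('Variables are independent (fail to reject H0)')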

The more extreme the test statistic is, the less likely it is that such a result would happen by chance alone if the null hypothesis were true.
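The table lookup itself can be mimicked with the survival function of the chi-square distribution, which returns the area to the right of a given test statistic. This sketch plugs in the rounded statistic 14.6 and df = 1 from the text; the extra statistics in the loop are arbitrary values for comparison.

```python
from scipy.stats import chi2

test_statistic = 14.6  # rounded value from the text
dof = 1

# sf(x, df) is the right-tail probability P(chi-square >= x).
p_value = chi2.sf(test_statistic, dof)
print("p-value:", p_value)

# Larger test statistics give smaller p-values.
for statistic in (1.0, 3.84, 6.63, 14.6):
    print(statistic, chi2.sf(statistic, dof))
```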

3. How to Interpret P-value

P-value addresses ONLY one question:

How likely is your data, given that your null hypothesis is true?

66.7% of ICU patients have anorexia, compared with 30.4% of non-ICU patients. The p-value is less than 0.001. A p-value of 0.001 means that if anorexia and ICU admission were truly independent and you sampled 1000 different groups, you would expect to see a result this extreme (or more extreme) only about 1 time.
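The "sample many groups" reading can be imitated with a simulation. The sketch below is an assumption-laden illustration rather than the chi-square test itself: it keeps the row and column totals fixed and draws the [ICU, Anorexia] count from a hypergeometric distribution, which is closer in spirit to an exact test. The seed, the number of draws, and the helper name `chi_square_from_top_left` are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Totals from the text: 55 with anorexia, 83 without, 36 ICU patients.
n_anorexia, n_no_anorexia, n_icu = 55, 83, 36
n_total = n_anorexia + n_no_anorexia


def chi_square_from_top_left(a):
    """Chi-square statistic of the 2 x 2 table implied by the [ICU, Anorexia] count."""
    a = np.asarray(a, dtype=float)
    observed = np.stack(
        [a, n_icu - a, n_anorexia - a, n_no_anorexia - (n_icu - a)], axis=-1
    )
    row = np.array([n_icu, n_icu, n_total - n_icu, n_total - n_icu])
    col = np.array([n_anorexia, n_no_anorexia, n_anorexia, n_no_anorexia])
    expected = row * col / n_total
    return ((observed - expected) ** 2 / expected).sum(axis=-1)


observed_stat = chi_square_from_top_left(24)

# Under independence, pick 36 ICU patients at random from the 138 and
# count how many of them have anorexia.
simulated_a = rng.hypergeometric(n_anorexia, n_no_anorexia, n_icu, size=200_000)
simulated_stats = chi_square_from_top_left(simulated_a)

# Share of simulated groups at least as extreme as the real data.
fraction = np.mean(simulated_stats >= observed_stat - 1e-9)

print("ICU anorexia rate:", round(24 / 36, 3))
print("non-ICU anorexia rate:", round(31 / 102, 3))
print("observed statistic:", float(observed_stat))
print("fraction as extreme:", fraction)
```

The simulated fraction will not match the chi-square p-value exactly, since it comes from a different (exact, fixed-margin) model, but it should land in the same "well under 1 in 1000" range.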

4. What P-value is NOT about.

  • The p-value is often misunderstood as being the probability that the null hypothesis is true. But technically it’s not. P-value can’t tell you whether or not the null hypothesis is true because it is already assuming the null hypothesis is true.
  • Non-significant p-values do not necessarily rule out the difference between ICU and non-ICU patients. It only means we don’t have enough evidence from the data at hand to say there is a difference.
  • The p-value doesn’t say your conclusion is correct either. It only tells you how rare results like yours would be if random chance alone were at work, with no other factor involved. Notice that the p-value is about testing (and therefore possibly rejecting) the null hypothesis. It is only a piece of the puzzle.
  • The p-value is not the probability of a type 1 error. The significance level (alpha) is: it is the probability of rejecting the null hypothesis when it is actually true.