One of the most misunderstood topics in statistical hypothesis testing is the P-Value. In this article, I will explain what the P-Value is and how to interpret it. Although there are many articles about the P-Value, I felt that most of them don’t explain it with proper examples. Before moving directly to the P-Value, let me set the stage by explaining some terms that I will use in this article.
What is meant by Hypothesis Testing?
Suppose you live in a city named Kolkata in India. You are the head of the anti-hoax department. Now someone named Robin posted a video on Twitter and claimed, “In 2022, if you toss a coin, most of the time heads will appear, because the stars are aligned in such a way that tails will never show up. Please watch the video for proof.” Within 10 minutes there were a thousand retweets, and people (mostly her followers) started posting the same claim. You became furious and called Robin. You are pretty sure that her claim is absurd. Now you want to test whether her claim is true or your gut feeling is true, using hypothesis testing. In statistical jargon, that means setting up a null hypothesis and an alternate hypothesis.
What are the Null and Alternate Hypotheses?
So in this example, you want to show that the probability of getting heads is the same as the probability of getting tails. So your Null Hypothesis will be “The probabilities of getting heads and tails are equal”. Can you quickly guess what the alternate hypothesis will be? Yes, you guessed it right: the probabilities of getting heads and tails are not equal. To simplify it further and align it with our example, the alternate hypothesis will be ‘In 2022, if you toss a coin, most of the time heads will appear.’ The Null Hypothesis is denoted as H0, and the alternate hypothesis as H1.
If you quickly go through the definition of the Null Hypothesis, you will find something like this:
“The null hypothesis is a characteristic arithmetic theory suggesting that no statistical relationship and significance exists in a set of given, single, observed variables between two sets of observed data and measured phenomena.”
Then you must be thinking: okay, understood, but how does the example relate to this definition? It’s pretty straightforward. Let me lay out what Robin did in more detail:
- Her claim: The probability of getting heads in 2022 is greater than the probability of getting tails, or P(H in 2022) > 0.5. This is the alternate hypothesis.
- Her experiment (the video she posted on Twitter as proof): She tossed a coin three times and, boom, got three heads. The number of heads observed is the test statistic.
So in this case our null hypothesis will be:
Null Hypothesis: There is no statistically significant relationship between P(H) being > 0.5 and her experiment. In other words, the probabilities of getting heads and tails are equal, or P(H) = 0.5.
To come to a conclusion we will need two new terms: 1. Significance Level and 2. P-Value. So let’s dive deep into them.
What is the P-Value?
In our example, the ‘P-value’ is the probability of getting three heads out of three tosses if our null hypothesis is true.
We write the P-value in short form as P-Value = P(Experiment results | H0 is true), that is, the probability of getting a result of three heads out of three coin tosses if our null hypothesis is true. In statistical mumbo jumbo:
In null hypothesis significance testing, the p-value is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.
So in this example, to find the p-value we have to determine the probability of getting three heads out of three coin tosses if we assume that the coin is unbiased. Let’s get the basics right. Let me ask you several questions:
- What is the probability of heads with an unbiased coin? Well, you guessed it right, it’s 0.5 or 1/2.
- Now, tell me, what is the probability of getting two heads in two tosses of an unbiased coin? It’s simple, right? The probability will be 1/2 * 1/2, i.e. 1/4. (Among the 4 distinct outcomes HH, HT, TH and TT, we’re interested in HH. So it’s 1/4.)
- Last question: what is the probability of getting three heads in three tosses of an unbiased coin? It’s 1/8 or 0.125. In other words, if you repeat the three-toss experiment 1000 times, you will get three heads about 125 times on average.
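The 125-out-of-1000 figure is an average, not a guarantee. Here is a minimal simulation sketch of that thought experiment. The function name `simulate_all_heads` and the seed value are illustrative choices of mine, and modelling a toss as `rng.random() < 0.5` assumes a perfectly fair coin.

```python
import random

def simulate_all_heads(num_tosses, num_experiments, seed=2022):
    """Repeat a num_tosses-toss experiment and count how often every toss is heads."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    all_heads = 0
    for _ in range(num_experiments):
        # rng.random() < 0.5 models one fair-coin toss landing heads
        if all(rng.random() < 0.5 for _ in range(num_tosses)):
            all_heads += 1
    return all_heads

exact = 0.5 ** 3  # 1/2 * 1/2 * 1/2
count = simulate_all_heads(num_tosses=3, num_experiments=1000)
print(f"exact probability of three heads: {exact}")
print(f"three heads in {count} of 1000 simulated experiments")
```

The simulated count should land near 125 but will rarely hit it exactly. That spread is what chance looks like in practice.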
Now, can you guess what the P-Value for our example is? Take your time.
The P-Value of our example is 0.125. In other words, the probability of getting three heads out of three tosses even if the coin is unbiased is 0.125. It’s kind of like how even a broken clock is right twice a day. Hence getting three heads consecutively might be purely down to blind luck.
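Three heads out of three is already the most extreme result possible, so "at least as extreme" covers just that one outcome. For other results you would add up a tail of outcomes. The sketch below assumes a one-sided test (only "more heads" counts as extreme, matching Robin's claim) and a binomial model of the tosses. The helper name `p_value_at_least` is hypothetical.

```python
from math import comb

def p_value_at_least(heads, tosses, p_heads=0.5):
    """One-sided p-value: P(X >= heads) for X ~ Binomial(tosses, p_heads)."""
    return sum(
        comb(tosses, k) * p_heads**k * (1 - p_heads)**(tosses - k)
        for k in range(heads, tosses + 1)
    )

print(p_value_at_least(3, 3))  # 0.125: three heads in three tosses
print(p_value_at_least(2, 3))  # 0.5: two or more heads in three tosses
```

With `heads == tosses` the sum has a single term, which is why the simple product 1/2 * 1/2 * 1/2 was enough here.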
What’s the significance Level then?
Now, to decide whether her claim is right or not, we need to set a bar, right? That bar is called the significance level. In general, 5% is the rule of thumb. If the p-value is below 5%, we reject the null hypothesis.
Here, the null hypothesis is “The coin is not biased towards heads”. Our P-value is 0.125, or 12.5%, which is above 5%, so we fail to reject it. The alternative hypothesis (H1) is “The coin is biased towards heads”, and we do not accept it, because three tosses do not give enough evidence for it. To define the significance level more formally:
The significance level (or α level) is a threshold that determines whether a study result can be considered statistically significant after performing the planned statistical tests. It is most often set to 5% (or 0.05), although other levels may be used depending on the study. It is the probability of rejecting the null hypothesis when it is true (the probability to commit a type I error). For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.
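The decision rule can be sketched as a small function. The name `decide` and its return strings are illustrative. The strict `<` comparison follows the "below 5%" rule of thumb used in this article.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a given p-value and significance level."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"

p_three_heads = 0.5 ** 3  # P-value from Robin's three tosses
print(decide(p_three_heads))             # fail to reject H0
print(decide(p_three_heads, alpha=0.2))  # reject H0 under a much looser bar
```

The second call shows why the significance level should be fixed before looking at the data: a looser bar flips the decision for the very same experiment.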
So by now, Robin will be pretty disgusted to learn that her claim is rejected. Now let’s discuss an alternative that could have happened.
What if Robin got 5 Heads out of 5 tosses?
In this case, the corresponding P-value would have been (1/2)*(1/2)*(1/2)*(1/2)*(1/2) [you can do the math as an exercise]. In other words, the probability of getting 5 heads out of 5 tosses if the coin is unbiased is 1/32, i.e. 0.03125. If you repeat the five-toss experiment 1000 times, you will get 5 heads out of 5 only about 31 times, which is no joke.
If something’s probability is as low as 3% and you conduct one experiment and get exactly that result, it suggests there is something wrong with your initial assumption that the coin is unbiased. Since 0.03125 is below the 5% significance level, in that case we would reject the null hypothesis and conclude that, as Robin claimed, the probability of getting heads is higher. Or, in statistical jargon, we would accept the alternate hypothesis.
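To see where the verdict flips between three heads and five heads, here is a small sketch that lists the P-value for an unbroken run of heads of each length. It assumes a fair coin under H0 and the 5% significance level. `ALPHA` and `all_heads_p_value` are names I made up for illustration.

```python
ALPHA = 0.05  # the 5% rule of thumb

def all_heads_p_value(tosses):
    """P-value for seeing heads on every one of `tosses` fair-coin tosses."""
    return 0.5 ** tosses

for n in range(1, 7):
    p = all_heads_p_value(n)
    verdict = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"{n} heads in {n} tosses: p = {p:.5f} -> {verdict}")

# Shortest all-heads streak whose P-value falls below ALPHA
first_significant = next(n for n in range(1, 100) if all_heads_p_value(n) < ALPHA)
print(first_significant)  # 5
```

Four heads in a row gives 0.0625, still above the bar. Five is the shortest streak that crosses it, which is why Robin's three-toss video was not enough.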