Introduction

When doing hypothesis testing, an often-repeated rule is ‘never accept the null hypothesis’. The reason is that we aren’t making probability statements about the true underlying quantities; we are making statements about the observed data, given a hypothesis.

We reject the null hypothesis if data like ours would be unlikely under the null hypothesis. In a sense we are trying to disprove the null hypothesis, and the strongest thing we can say in its favour is that we fail to reject it.

That is because observing data that is not unlikely given that a hypothesis is true does not make that hypothesis true. That is a bit of a mouthful, but basically what we are saying is that if we make some claim about the world and then we see some data that does not disprove this claim, we cannot conclude that the claim is true. We can only claim that we haven’t disproven it.

A short example

Say we have data that comes from a Normal distribution with known standard deviation, \sigma = 1. We set up a null hypothesis H_0: \mu = 0 against the alternative H_1: \mu > 0. We collect n = 25 data points and observe a sample mean \bar x = 0.25.

This gives a Z-stat of \frac{0.25-0}{1/\sqrt{25}} = 1.25, which has a one-sided p-value of 0.11. At the conventional 5% significance level, this means that we’d fail to reject the null hypothesis.

What about a null hypothesis of \mu = 0.1 instead? If we redo the calculation:

This gives a Z-stat of \frac{0.25-0.1}{1/\sqrt{25}} = 0.75, which has a p-value of 0.23. Again, we’d fail to reject the null hypothesis.
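If you would like to check the two calculations yourself, here is a minimal sketch in Python using only the standard library. The function name `one_sided_z_test` is just an illustrative choice, and it assumes the setup above: known \sigma and a one-sided alternative \mu > \mu_0.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(sample_mean, mu_0, sigma, n):
    """Return (z, p) for H_0: mu = mu_0 against H_1: mu > mu_0, sigma known."""
    # Standardise the sample mean under the null hypothesis.
    z = (sample_mean - mu_0) / (sigma / sqrt(n))
    # One-sided p-value: probability of a Z-stat at least this large under H_0.
    p = 1.0 - NormalDist().cdf(z)
    return z, p


# The two null hypotheses from the example: mu = 0 and mu = 0.1.
for mu_0 in (0.0, 0.1):
    z, p = one_sided_z_test(sample_mean=0.25, mu_0=mu_0, sigma=1.0, n=25)
    print(f"mu_0 = {mu_0}: z = {z:.2f}, p = {p:.2f}")
```

Changing `mu_0` lets you try any other null hypothesis against the same data.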

That’s fine. We can’t say that either is wrong. But if we’d accepted both null hypotheses, we’d be stuck believing that the true mean was both 0 and 0.1, which is a tough belief to defend. In fact, there are infinitely many null hypotheses that we would fail to reject.
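To see just how many, we can work out where the cut-off sits. The sketch below assumes a 5% significance level and the same one-sided test as in the example; the function name `smallest_unrejected_null` is made up for illustration. We fail to reject whenever the Z-stat is below the critical value, which rearranges to \mu_0 > \bar x - z_{0.95} \cdot \sigma / \sqrt{n}.

```python
from math import sqrt
from statistics import NormalDist


def smallest_unrejected_null(sample_mean, sigma, n, alpha=0.05):
    """Lower bound on mu_0 for which the one-sided Z-test fails to reject.

    alpha is an assumed significance level (5% by default).
    """
    # Critical value of the standard Normal for a one-sided test at level alpha.
    critical_z = NormalDist().inv_cdf(1.0 - alpha)
    # Solve (sample_mean - mu_0) / (sigma / sqrt(n)) = critical_z for mu_0.
    return sample_mean - critical_z * sigma / sqrt(n)


bound = smallest_unrejected_null(sample_mean=0.25, sigma=1.0, n=25)
print(f"We fail to reject H_0: mu = mu_0 for every mu_0 above {bound:.3f}")
```

Every value of \mu_0 above that bound is a null hypothesis that this data would not reject, which is why failing to reject one of them cannot single it out as true.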

This is usually why the null hypothesis is chosen based on something that we care about a priori. Otherwise we might spend a lot of time not disproving some pretty irrelevant things.
