The raven paradox
Econometricians face difficult problems. One of the hardest problems one can encounter is a paradox. The raven paradox is such a problem and was first proposed by the logician Carl Gustav Hempel. The paradox has its basis in statistics. In this paradox, inductive reasoning contradicts intuition. This article tries to explain the paradox and its implications for statistics.
In statistics we often want to test whether a given statement is true based on a data sample. To perform a test we start by defining the null hypothesis. The null hypothesis of the raven paradox is the following statement.
$H_0$: all ravens are black.
This null hypothesis implies that if something is a raven, then it must be black, i.e. ($\text{raven} \Rightarrow \text{black}$). Now, the thing we are after is the likelihood of this statement being true. To find this likelihood we first need to know something about the contraposition of the null hypothesis.
To make a sample, we cannot just take a sample of ravens and check whether they are all black. In statistics the contraposition also has an impact on the likelihood, because it also constitutes evidence that a given statement is true. This means that for given events or states $A$ and $B$ we have
$(A \Rightarrow B) \iff (\neg B \Rightarrow \neg A)$.
First note that we can consider $A$ and $B$ to be sets where $A$ is contained in $B$. Drawing a Venn diagram then makes it immediately apparent that the left and right statements say the same thing: everything inside $A$ is inside $B$, so everything outside $B$ is also outside $A$.
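The equivalence between an implication and its contraposition can also be checked by brute force over all truth values. The following is a minimal sketch; the helper name `implies` and the variable names are introduced here for illustration only.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: 'p implies q' is false only when p is true and q is false."""
    return (not p) or q

# Compare "A implies B" with its contraposition "not B implies not A"
# for every combination of truth values.
rows = []
for a, b in product([True, False], repeat=2):
    direct = implies(a, b)
    contraposition = implies(not b, not a)
    rows.append((a, b, direct, contraposition))
    print(f"A={a!s:5} B={b!s:5}  A=>B: {direct!s:5}  notB=>notA: {contraposition!s:5}")

# The two columns agree in all four rows.
equivalent = all(direct == contra for _, _, direct, contra in rows)
print("Equivalent in every case:", equivalent)
```

The only row in which both statements are false is the one where A is true and B is false, which for the raven paradox corresponds to a raven that is not black.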
The contraposition of the null hypothesis is: if something is not black, then it is not a raven, i.e. ($\neg\text{black} \Rightarrow \neg\text{raven}$). The concept of the contraposition is often used in mathematical proofs and makes sense for the likelihood at face value as well: if something is not black and is a raven, then the null hypothesis does not hold. Let us now use this knowledge to construct a sample and explain the paradox.
For this problem all observations in a given ‘relevant’ sample fall into one of four categories. Let x be a given observation. Then either,
- x is a raven and x is black,
- x is a raven and x is not black,
- x is not a raven and x is black,
- x is not a raven and x is not black.
Note that the null hypothesis is instantly disproven if an element of a given sample falls into the second category. If an element of the sample falls into the third category, this is neither evidence supporting the null hypothesis nor evidence contradicting it. For the first and last categories we turn to the hypothesis itself.
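The four categories above can be sketched as a small classifier. This is only an illustration: the function name `categorize` and the sample of observations are hypothetical and not taken from any real data.

```python
from collections import Counter

def categorize(is_raven: bool, is_black: bool) -> int:
    """Return the category number (1 to 4) in the order listed above."""
    if is_raven:
        return 1 if is_black else 2
    return 3 if is_black else 4

# Hypothetical observations as (is_raven, is_black) pairs, invented for illustration.
sample = [(True, True), (False, False), (False, True), (True, True), (False, False)]

counts = Counter(categorize(r, b) for r, b in sample)

# A single observation in category 2 (a raven that is not black)
# is enough to disprove the null hypothesis.
refuted = counts[2] > 0
print(dict(counts), "refuted:", refuted)
```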
We can slightly alter the problem to better suit a maximum likelihood problem. A null hypothesis that is easier to work with is: the fraction of ravens that are black is equal to 1. In this way we can use the fraction as the parameter in our maximum likelihood problem.
$H_0: p = 1$,
where $p$ is the fraction of ravens that are black.
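To make the maximum likelihood view concrete, here is a minimal sketch. It assumes, beyond what the text states, that each sampled raven is independently black with probability p (a binomial model), where p denotes the fraction of ravens that are black. The function names and sample sizes are hypothetical.

```python
def likelihood(p: float, n_black: int, n_total: int) -> float:
    """Binomial likelihood of p, up to a constant factor that does not depend on p."""
    return p ** n_black * (1.0 - p) ** (n_total - n_black)

def mle(n_black: int, n_total: int) -> float:
    """Maximum likelihood estimate of p: the observed fraction of black ravens."""
    return n_black / n_total

# Hypothetical samples of 1000 ravens.
print(mle(1000, 1000))               # estimate when every sampled raven is black
print(likelihood(1.0, 1000, 1000))   # likelihood of p = 1 for an all-black sample
print(likelihood(1.0, 999, 1000))    # likelihood of p = 1 after one non-black raven
```

Under this model an all-black sample is most likely when p equals 1, while a single raven that is not black drives the likelihood of p = 1 to zero.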
Now let us assume we have a sample of a million ravens. If it turns out that all of these ravens are black, then that is evidence that supports the null hypothesis, and logically this would slightly increase the likelihood that the fraction of ravens that are black equals 1. This makes intuitive sense as well.
However, when we turn to the contraposition, we find something strange. Let us take a sample of a million objects that are not ravens, and suppose that most of these objects are not black (keep in mind we have already stated that objects that are black and not a raven have no influence on the likelihood). Then, due to the contraposition of the null hypothesis, this is also evidence that supports the claim that the fraction of ravens that are black is equal to 1. This statement is statistically correct, but makes absolutely no intuitive sense. If, for example, I observe a white table, how would this be evidence that supports the claim that all ravens are black? So the paradox is: how can something that makes absolutely no intuitive sense have statistical impact? Mathematician Irving John Good asked himself the same question and tried to resolve the paradox.
How to resolve the paradox
In 1967, Irving John Good wrote an article (Good, p. 322) in which he challenged the logic used in the paradox. Good found a case in which an observation that seems to support the null hypothesis does not support it. In his example there are two worlds: one world with one hundred ravens that are all black and a million other birds, and one world with one thousand black ravens, one white raven and a million other birds. Now we take a single bird at random out of a random world, and it turns out to be a black raven. This supports the claim that we are in the second world, where not all ravens are black, since black ravens make up about 10 times as large a share of all birds in that world. So observing a black raven in this case supports the claim that not all ravens are black rather than the claim that all ravens are black, given that we do not know which world we are in but assume we are in one of these two.
In reality, we do not know anything about the world we are in, so we could just as well be in one of these hypothetical worlds. This implies that we cannot be certain that observing a black raven increases the likelihood that all ravens are black, which undermines the reasoning behind the paradox.
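The two-world example can be worked through with Bayes' rule. The sketch below uses the bird counts given above and assumes that both worlds are equally likely before the bird is drawn; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# World 1: 100 black ravens, no other ravens, 1,000,000 other birds.
# World 2: 1000 black ravens, 1 white raven, 1,000,000 other birds.
p_black_raven_w1 = Fraction(100, 100 + 1_000_000)
p_black_raven_w2 = Fraction(1000, 1000 + 1 + 1_000_000)

# Assumption: both worlds are equally likely before the bird is drawn.
prior_w1 = prior_w2 = Fraction(1, 2)

# Bayes' rule: probability of being in world 2 given that a black raven was drawn.
evidence = prior_w1 * p_black_raven_w1 + prior_w2 * p_black_raven_w2
posterior_w2 = prior_w2 * p_black_raven_w2 / evidence

# How much more likely a black raven is in world 2 than in world 1.
likelihood_ratio = p_black_raven_w2 / p_black_raven_w1

print(float(likelihood_ratio))
print(float(posterior_w2))
```

The likelihood ratio comes out just under 10, and the probability of being in the second world rises from one half to roughly 0.91. Under these assumptions, drawing a black raven favours the world in which not all ravens are black.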
The raven paradox shows us that we have to be very careful when working with statistics. We always need to make sure that things make both intuitive sense and inductive sense. If something does not add up, always check whether there is a way in which something might have gone wrong.
Good, Irving John. “The White Shoe Is a Red Herring.” The British Journal for the Philosophy of Science, vol. 17, no. 4, 1967, p. 322. http://joelvelasco.net/teaching/5330/good67-white_shoe.pdf.