The word statistics reminds most econometricians of the joyous hours spent on R. Apparently, statistics also lends itself to more practical applications, such as criminal justice. Or is it perhaps better not to apply statistics in the courtroom?
In this article we explore the use of statistics in the criminal justice system: how it is used, and whether its use is justified. It turns out that the use of statistics has been widely debated over the years, as misuse has been rather common.
How does it go wrong?
A frequently used example of mathematical abuse in the courtroom comes from The Hague. Many Dutch people recall the trial of the nurse Lucia de Berk, yet the minutiae of her trial remain hazy to most. Lucia was sentenced to prison for the supposed murder of a number of children and elderly patients in her care.
In this particular trial, the use of statistics was contested. At the time, professor of law (not statistics!) Henk Elffers was responsible for the ‘mathematical proof’ that the suspicious deaths were indeed murders. In 2004, Elffers wrote an article in STAtOR, a journal on statistics and operations research, titled ‘Your honor, it was no coincidence, the rest is up to you’. In it, he explains the flawed calculation he put forward as proof. Elffers noted that in both hospitals where she worked, Lucia was among the nurses who experienced the most deaths. He went on to calculate the probability that this was a coincidence. The crux that later bothered professors of statistics was the multiplication of p-values.
Suppose she was among the 1% of nurses who experience the most deaths in both hospitals. Elffers’ reasoning was that the probability of this occurring in both hospitals would then be 0.01 × 0.01 = 0.0001. This would be good practice if the two observations were independent events. However, since we are dealing with the same nurse, an underlying cause such as inattentiveness or poor training might explain both scores. In that case, scoring twice among the 1% of nurses who experience the most deaths merely confirms the underlying cause. This might place her among the 1% worst nurses, but it does not mean she is a murderer. Elffers did not consider such underlying causes and treated the two events as independent. He therefore thought it was justified to say that scoring twice among the 1% who experience the most deaths means she is among the 0.01% who experience the most deaths. So, indeed, he thought it was justified to multiply the p-values he found for both hospitals.
Elffers arrived at a probability of 0.00000000292 (about 1 in 342 million) that Lucia was present at all these deaths by chance. Professor of philosophy Ton Derksen pointed out this fallacy. His explanations caught the attention of the authorities, which in the end led to her acquittal. This shows that a proper treatment of the independence assumption in probability theory could have saved a woman six long years in prison.
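The effect of a shared underlying cause can be illustrated with a small simulation. The sketch below is not Elffers' calculation and uses no data from the trial: the number of nurses, the normally distributed scores, and the `shared_weight` parameter are all assumptions chosen for illustration. It simply compares how often a simulated nurse ends up in the top 1% at both hospitals when the two scores are independent and when they share a nurse-level factor.

```python
import random


def share_in_top_percent_twice(n_nurses=20_000, shared_weight=0.0, seed=0):
    """Return the fraction of simulated nurses who land in the top 1%
    of a hypothetical 'deaths during shifts' score at BOTH hospitals."""
    rng = random.Random(seed)
    scores_a, scores_b = [], []
    for _ in range(n_nurses):
        # Hypothetical nurse-level factor (e.g. training) that follows
        # the nurse to both hospitals. Purely illustrative.
        shared = rng.gauss(0, 1)
        scores_a.append(shared_weight * shared + (1 - shared_weight) * rng.gauss(0, 1))
        scores_b.append(shared_weight * shared + (1 - shared_weight) * rng.gauss(0, 1))

    k = n_nurses // 100  # number of nurses in the top 1%
    ranked_a = sorted(range(n_nurses), key=scores_a.__getitem__, reverse=True)
    ranked_b = sorted(range(n_nurses), key=scores_b.__getitem__, reverse=True)
    in_both = set(ranked_a[:k]) & set(ranked_b[:k])
    return len(in_both) / n_nurses


product_rule = 0.01 * 0.01  # only valid if the two events are independent
independent = share_in_top_percent_twice(shared_weight=0.0)
dependent = share_in_top_percent_twice(shared_weight=0.8)

print(f"product rule:            {product_rule:.4%}")
print(f"simulated, independent:  {independent:.4%}")
print(f"simulated, shared cause: {dependent:.4%}")
```

With independent scores the simulated share stays close to what the product rule predicts, while a strong shared factor makes "top 1% twice" far more common. Multiplying the two 1% figures then badly understates how often such a double score occurs without any crime.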
Yet this fallacy could have been avoided, right? It turns out Henk Elffers only had an undergraduate degree in statistics. So, had the judges consulted professors of statistics instead, a better result would have been likely. The trial would probably also have been sounder had the judges appointed a committee of experts, rather than just one ‘expert’.
In the USA this problem would be less likely to occur, not only because the courts would find legitimate experts, but also because the status of statistics in criminal justice has been subject to some compelling criticism from Laurence Tribe. In the late 1960s, Tribe helped set right the wrongful conviction of Janet Collins and her partner, which rested on a misuse of probability theory. In the aftermath, he wrote denunciations of the use of mathematics in the courtroom. Tribe had excelled in both mathematics and law at Harvard University. This, together with his role in the Janet Collins case and his eloquent writing, lowered the status of statistics in the courtroom for decades to come.
The Janet Collins case
In the USA of 1964 there were hardly any interracial couples. The word ‘negro’ was still commonly used and many social constructs existed to keep the societal divide in place. Hence interracial couples were an unusual occurrence. Exactly how rare they were became a topic of debate in court.
In a robbery, two witnesses saw the offenders. Supposedly, the offenders were a Caucasian woman and an African-American man who fled in a yellow car. In their search for the offenders, the police reasoned that there were not many interracial couples. To their delight, they found one such couple in the neighbourhood who also happened to have a yellow car. However, there was no proof that this was anything more than a coincidence.
In the trial that followed, the prosecutor called a probability theorist as an expert witness. The prosecutor gave the expert a probability for each characteristic and asked him to calculate the probability that a couple would match them all.

The probabilities given by the prosecutor.
In the courtroom, the probability theorist, Mr. Martinez, made a small remark about the independence of the probabilities but multiplied them anyway. This resulted in a probability of 1 in 12,000,000 that a random couple would match all the given characteristics. Neither the prosecutor nor the judge asked Mr. Martinez to go into further depth about the meaning of this figure. On the basis of this small probability, it was concluded that there were so few couples with all these characteristics that this couple had to be guilty. The couple from the neighbourhood was convicted of robbing an elderly woman.
On appeal, the case went up to the Supreme Court of California. Here, Laurence Tribe, then a law clerk, set out all the problems that had occurred in the previous trial:
- The probabilities that were taken were fully arbitrary
- The assumption that the thieves were a married couple was unfounded
- Product rule could not be applied as probabilities were not independent
- Example: P[Mustache & Beard] ≠ P[Mustache] × P[Beard]
Tribe's analysis demolished the original verdict, and the conviction was overturned.
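The product-rule objection can be made concrete with a few lines of arithmetic. The sketch below rests on two assumptions. First, the individual probabilities are the ones commonly reported from the published court opinion, not values taken from this article; they do multiply to the 1 in 12,000,000 quoted here. Second, the conditional probability of a beard given a mustache is a made-up number, used only to show what happens once the two traits are allowed to depend on each other.

```python
from fractions import Fraction
from math import prod

# Assumption: figures as commonly reported from the court opinion,
# not taken from this article's own table.
prosecutor_probabilities = {
    "partly yellow automobile": Fraction(1, 10),
    "man with mustache": Fraction(1, 4),
    "girl with ponytail": Fraction(1, 10),
    "girl with blond hair": Fraction(1, 3),
    "Black man with beard": Fraction(1, 10),
    "interracial couple in car": Fraction(1, 1000),
}

# The product rule as applied in court: valid only under independence.
naive_product = prod(prosecutor_probabilities.values())
print(f"naive product: 1 in {naive_product.denominator:,}")

# Mustache and beard are unlikely to be independent.
p_mustache = prosecutor_probabilities["man with mustache"]
p_beard = prosecutor_probabilities["Black man with beard"]

# Hypothetical value for P[Beard | Mustache]; an illustration only.
p_beard_given_mustache = Fraction(1, 2)

joint_if_independent = p_mustache * p_beard               # P[M] * P[B]
joint_if_dependent = p_mustache * p_beard_given_mustache  # P[M] * P[B | M]

print(f"P[Mustache & Beard], product rule:       {joint_if_independent}")
print(f"P[Mustache & Beard], with dependence:    {joint_if_dependent}")
print(f"product rule understates by a factor of: {joint_if_dependent / joint_if_independent}")
```

Under this hypothetical dependence, the product rule understates the joint probability of just these two traits by a factor of five, and similar errors can compound across the other characteristics.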

Laurence Tribe, professor of Law at Harvard Law School.
Tribe’s denunciations of the use of statistics were set out in a series of papers. The most important one is Trial by Mathematics: Precision and Ritual in the Legal Process, published in the Harvard Law Review in 1971. In it, Tribe distinguishes three types of trials in which mathematics could be used: to prove occurrence, identity or intention. For each of the three, Tribe describes how mathematics may lead to wrongful convictions. One important argument is that a jury can be overwhelmed by the apparent authority of numbers. Furthermore, Tribe argues that statistics might be able to give the probability that something occurred, but that a trial must establish the actual truth, not the probability of truth. A discussion of his exact arguments is beyond the scope of this article, but the paper makes for an interesting read.
The current status in the Netherlands
Although Tribe was very influential at the time, his impact has decreased over the years. In the Netherlands, statistics has established itself in the courtroom, although sometimes out of plain sight. In the same issue of STAtOR in which Henk Elffers described his mathematical proof, statistician Marjan Sjerps makes this point. She argues that statistics is already accepted in the courtroom. Current applications are manifold: DNA analysis, sampling in drug cases, automatic speeding detection for cars, automatic speaker recognition and the process of choosing from a line-up of suspects. Would it then be justified to throw all statistics out of the courtroom because it sometimes goes wrong? Certainly this would let many offenders go free…

Studies show that the correct identification rate in police lineups is around 80%
The opinion of the editor
Perhaps a logical next step would be improved procedures for statistics in the courtroom. Surely, this would have helped Lucia de Berk. Had a committee of statisticians been appointed in her case, the outcome would likely have been better. Moreover, to ensure that a wrongful verdict such as Lucia’s does not happen again, judges could disregard the statistical evidence if there is no agreement amongst the statisticians. Putting such measures into practice means that we can reap the benefits of statistics while reducing the number of wrongful verdicts. This can be a happy marriage after all.
This article was written by Tim van Schaick