Why the most exciting findings in science are probably wrong, and what the Law of Large Numbers has to say about it.
Think about the last time someone told you about a diet that worked wonders for them. Or a friend who swore by a particular study routine because it got them through their resits. These stories feel convincing because they are real. Someone actually experienced them. But here is the thing: a sample of one tells you almost nothing. And yet we update our beliefs based on these stories all the time.
This is not just a problem with personal anecdotes. It shows up in published research, in business decisions, and in the conclusions we draw from data every single day. The culprit is almost always the same: too few observations.
The Law of Large Numbers
You have probably seen the Law of Large Numbers in a probability course. It says that as you collect more independent observations, your sample average gets closer and closer to the true population mean. Flip a fair coin enough times and the share of heads will land very close to 50%. Simple enough.
But what the law does not say is equally important. It makes no promises about small samples. With few observations, your estimate can land almost anywhere. Nothing pulls it toward the truth; additional observations simply dilute whatever lucky or unlucky streak you started with, and the number needed for that is usually far more than people expect.
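To get a feel for how wide "almost anywhere" is, here is a minimal simulation sketch. It assumes a fair coin and repeats the same experiment many times at each sample size; the function names, the number of repeats, and the seed are illustrative choices, not part of any standard recipe.

```python
import random

def sample_proportion(n_flips, rng):
    """Share of heads in n_flips tosses of a fair coin (assumed p = 0.5)."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

def spread_of_estimates(n_flips, n_repeats=1000, seed=0):
    """Lowest and highest share of heads seen across repeated experiments."""
    rng = random.Random(seed)
    estimates = [sample_proportion(n_flips, rng) for _ in range(n_repeats)]
    return min(estimates), max(estimates)

for n in (10, 100, 1000):
    low, high = spread_of_estimates(n)
    print(f"n={n:>4}: share of heads ranged from {low:.2f} to {high:.2f}")
```

The range at ten flips should come out far wider than the range at a thousand, even though the coin is exactly the same in every run.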
This creates a real problem. In practice, collecting data is expensive and time-consuming. Researchers, companies, and students all work with whatever they can get. And with small samples, random variation does a lot of the heavy lifting, meaning the result you observe often has more to do with chance than with any real underlying pattern.
The surgeon problem
Imagine you are choosing between two surgeons for a procedure. Surgeon A has an 80% success rate. Surgeon B has an 82% success rate. Sounds close, right?
Now add the denominators. Surgeon A has performed the operation ten times. Surgeon B has performed it a thousand times. Suddenly the choice is obvious. With ten operations, every single outcome moves the success rate by ten percentage points, so a couple of lucky or unlucky results can swing it dramatically. The 80% could just as easily have been 60% or 100%. There simply is not enough data to know. Surgeon B's 82% is a reliable estimate. Surgeon A's 80% is barely a number at all.
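One way to put a number on that difference is a confidence interval for each success rate. The sketch below uses the Wilson score interval, which is my choice for illustration because it behaves sensibly with small samples; it assumes every operation is independent and has the same chance of success, which real surgical records will not satisfy exactly. The counts of 8 out of 10 and 820 out of 1,000 follow from the percentages in the example.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    Assumes n independent trials with a constant success probability.
    """
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

for name, successes, n in (("Surgeon A", 8, 10), ("Surgeon B", 820, 1000)):
    low, high = wilson_interval(successes, n)
    print(f"{name}: {successes}/{n} successes, plausible range {low:.1%} to {high:.1%}")
```

Under these assumptions, Surgeon A's interval runs from just under 50% to roughly 94%, while Surgeon B's spans only a few percentage points on either side of 82%.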
This is the core issue with small samples. They produce percentages and averages that look just as precise as those from large samples, but carry far more uncertainty. And we almost never show the denominator.
Why small samples keep producing surprising results
Here is something that feels counterintuitive at first. You might expect small studies to simply be less reliable, sometimes too high, sometimes too low, averaging out over time. But that is not quite what happens in practice.
Academic journals tend to publish results that are statistically significant. A study that finds no effect is much harder to get published than one that finds a big effect. This means the studies that end up in print are not a random selection of all the research that was run. They are the ones where the random variation happened to land in an interesting direction.
With a small sample, the variation in your estimates is high. So to hit statistical significance, you need a large estimated effect. The studies that clear the bar are therefore the ones with inflated, lucky results. When other researchers then try to replicate those findings with more data, the estimates shrink, because the extra data averages out the luck.
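The mechanism is easy to reproduce in a simulation. The sketch below is a deliberately simplified model, not a description of any real literature: it assumes a true effect of 0.2 standard deviations, a known standard deviation of 1, a simple z-test at the 5% level, and a journal that prints only significant results. All names and parameter values are hypothetical.

```python
import math
import random
import statistics

TRUE_EFFECT = 0.2  # assumed true effect, in standard-deviation units
Z_CRITICAL = 1.96  # two-sided 5% significance threshold

def run_study(n, rng):
    """Simulate one study: return (estimated effect, is it significant?)."""
    sample = [rng.gauss(TRUE_EFFECT, 1.0) for _ in range(n)]
    estimate = statistics.fmean(sample)
    standard_error = 1.0 / math.sqrt(n)  # standard deviation assumed known
    return estimate, abs(estimate / standard_error) > Z_CRITICAL

def published_results(n, n_studies=2000, seed=1):
    """Average estimate among significant studies, and the share published."""
    rng = random.Random(seed)
    studies = [run_study(n, rng) for _ in range(n_studies)]
    published = [est for est, significant in studies if significant]
    return statistics.fmean(published), len(published) / n_studies

for n in (20, 500):
    mean_estimate, share = published_results(n)
    print(f"n={n:>3}: {share:.0%} of studies published, "
          f"average published effect {mean_estimate:.2f} (true effect {TRUE_EFFECT})")
```

With 20 participants, an estimate has to exceed about 0.44 to be significant at all, so every positive published result is at least double the true effect. With 500 participants almost every study clears the bar, and the published average sits close to 0.2.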
This is why so many famous findings have failed to hold up when tested again. Not because the original researchers were dishonest, but because the combination of small samples and selective publication is a recipe for results that cannot be trusted.
The sample is never just small, it is also skewed
There is a second, quieter issue with small samples: they are rarely representative. When researchers need data quickly and cheaply, they tend to use whoever is most accessible. For most academics, that means students. Often econometrics or economics students at their own university.
There is nothing wrong with studying students. But conclusions drawn from that group do not automatically transfer to everyone else. The way a 21-year-old economics student responds to a financial decision task may look very different from how a 45-year-old with a mortgage and two kids responds to the same task. A small, convenient sample does not just give you imprecise estimates; it gives you estimates for a very specific group that may not resemble the people you actually want to say something about.
In econometrics, the distinction between getting the right answer for your sample versus getting the right answer for the broader population is called internal versus external validity. Small samples make both harder, but they especially hurt external validity, because convenience and representativeness rarely go together.
What this means in practice
None of this means that research based on smaller samples is useless. It means it should be treated as a starting point rather than a conclusion. A single study with 50 participants that finds a large effect is an interesting lead. It becomes meaningful when several independent studies find the same thing with different samples in different settings.
It also means paying attention to the denominator whenever you encounter a statistic. A conversion rate, a success percentage, a reported effect: all of these carry very different weight depending on how many observations they are based on. Two numbers that look identical on the surface can represent wildly different levels of certainty.
Ultimately, the Law of Large Numbers is a guarantee that truth reveals itself, eventually, with enough data. The uncomfortable implication is that a lot of what gets treated as established fact was measured long before enough data existed to be sure. Recognising that is not cynicism. It is just good statistics.