A Practical Guide to Statistical Thinking
You don't need a statistics degree to think clearly about data. You just need a few mental tools that help you ask the right questions when someone throws a number at you. This guide covers the most useful ones.
Sample Size: How Many Is Enough?
A study of 12 people is not the same as a study of 12,000. Small samples are noisy — they're more likely to produce extreme results by chance alone. When you see a claim backed by data, one of the first questions to ask is: how many observations are we talking about?
On Spurious, some of our correlations are calculated from as few as 5-6 data points. Those high r-values look impressive, but with so few points, even random data can produce a strong correlation. The more data points, the more reliable the measurement — which is why our correlations with 15-20 years of data are statistically more robust (and still equally meaningless as causal claims).
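You can see this for yourself with a short simulation. The sketch below (a minimal illustration, not our actual pipeline) correlates pairs of purely random series and counts how often a "strong" correlation appears at n = 6 versus n = 20:

```python
import random
import statistics

def pearson_r(xs, ys):
    # Pearson correlation coefficient for two equal-length lists.
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    dx = sum((x - mx) ** 2 for x in xs) ** 0.5
    dy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (dx * dy)

def strong_corr_rate(n, trials=20_000, threshold=0.8, seed=1):
    # Fraction of purely random (x, y) samples of size n with |r| above threshold.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [rng.random() for _ in range(n)]
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits / trials

print(f"n=6:  {strong_corr_rate(6):.2%} of random pairings show |r| > 0.8")
print(f"n=20: {strong_corr_rate(20):.2%} of random pairings show |r| > 0.8")
```

With six points, random noise crosses the |r| > 0.8 line a few percent of the time; with twenty points it almost never does. That gap is the entire difference between a noisy coincidence and a robust measurement.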
P-Hacking: Torturing the Data Until It Confesses
P-hacking is the practice of running many different statistical tests on a dataset until you find a “significant” result. If you test 20 hypotheses at the standard p < 0.05 threshold, you'd expect about one to come back “significant” purely by chance, even when every single hypothesis is false.
This is exactly what happens when you compare hundreds of datasets to each other. With 300+ datasets, Spurious computes tens of thousands of pairings — and inevitably, many will show strong correlations by sheer coincidence. This is not a flaw in our method; it's the entire point. We're demonstrating what p-hacking looks like when you do it on purpose.
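The arithmetic behind the "20 tests" claim is worth seeing directly. Under a true null hypothesis, p-values are uniformly distributed, so each test has a 5% false-positive chance; this sketch computes the chance of at least one false positive across 20 tests and checks it with a Monte Carlo run:

```python
import random

ALPHA = 0.05
N_TESTS = 20

# Under a true null hypothesis, p-values are uniform on [0, 1],
# so each test independently has a 5% chance of a false positive.
p_at_least_one = 1 - (1 - ALPHA) ** N_TESTS
print(f"P(at least one 'significant' result) = {p_at_least_one:.1%}")  # 64.2%

# Monte Carlo check: draw 20 uniform p-values, many times over.
rng = random.Random(0)
trials = 100_000
hits = sum(
    any(rng.random() < ALPHA for _ in range(N_TESTS))
    for _ in range(trials)
)
print(f"Simulated: {hits / trials:.1%}")
```

Twenty tests already give you a coin flip's worth of guaranteed "discoveries." Scale that up to tens of thousands of pairings and strong-looking coincidences become a certainty, not a surprise.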
Base Rates: The Most Neglected Number
Imagine a medical test that is 99% accurate. You test positive. What are the odds you actually have the disease? Most people say 99%. The real answer depends on how common the disease is. Suppose only 1 in 10,000 people have it. In a million people, about 100 are sick, and roughly 99 of them test positive. But of the 999,900 healthy people, about 1% (roughly 10,000) also test positive. So a positive result means well under a 1% chance of actually having the disease: the false positives vastly outnumber the true ones.
Base rate neglect is everywhere: in medical screening, criminal justice, airport security, and data science. The lesson is simple — always ask “how common is this in the first place?” before interpreting a result.
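The worked numbers above are just Bayes' theorem. This sketch makes the calculation explicit; note that reading "99% accurate" as 99% sensitivity and 99% specificity is an assumption, since the informal claim doesn't separate the two:

```python
def posterior_positive(prevalence, sensitivity, specificity):
    # P(disease | positive test) via Bayes' theorem.
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Assumption: "99% accurate" means 99% sensitivity AND 99% specificity.
p = posterior_positive(prevalence=1 / 10_000, sensitivity=0.99, specificity=0.99)
print(f"P(disease | positive) = {p:.2%}")  # about 0.98%
```

Even a genuinely excellent test can't outrun a tiny base rate: the posterior lands below 1%, not anywhere near 99%.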
Regression to the Mean
If something is extreme the first time you measure it, it will probably be less extreme the second time — not because anything changed, but because extreme values are rare by definition. This is regression to the mean.
A student who scores 98% on one test will likely score lower on the next, not because they got worse, but because 98% is unusually high. A sports team that wins 15 games in a row will likely lose soon. This isn't prediction — it's probability. Understanding this prevents you from seeing patterns in what is actually just noise.
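A simulation makes the effect vivid. In this sketch (hypothetical numbers), each person's test score is a stable ability plus test-day luck; we pick the top 1% on test 1 and watch their average fall on test 2, even though nobody's ability changed:

```python
import random

rng = random.Random(42)
n = 100_000

# Observed score = stable ability + test-day luck (both hypothetical scales).
ability = [rng.gauss(70, 8) for _ in range(n)]
test1 = [a + rng.gauss(0, 8) for a in ability]
test2 = [a + rng.gauss(0, 8) for a in ability]

# Select the people who scored in roughly the top 1% on test 1...
cutoff = sorted(test1)[int(n * 0.99)]
top = [i for i in range(n) if test1[i] >= cutoff]

# ...and compare their averages on the two tests.
mean1 = sum(test1[i] for i in top) / len(top)
mean2 = sum(test2[i] for i in top) / len(top)
print(f"Top scorers, test 1 mean: {mean1:.1f}")
print(f"Same people, test 2 mean: {mean2:.1f}")
```

The top scorers were disproportionately people whose luck broke their way on test 1; on test 2 the luck resets, so the group's average slides back toward its true ability.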
Survivorship Bias
During World War II, the US military studied bullet holes on returning bombers to decide where to add armor. They initially planned to armor the areas with the most holes. Statistician Abraham Wald pointed out the flaw: the planes that were hit in other areas didn't make it back. They were looking at survivors, not the full picture.
Survivorship bias affects everything from business advice (“successful founders all dropped out of college”) to medical research (studying only patients who survived treatment). Whenever you see a pattern, ask: what data am I not seeing?
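Wald's insight can be simulated too. In this sketch, hits land uniformly across four plane sections, but hits to the engine or cockpit are far more likely to down the plane (the lethality numbers are hypothetical). Counting holes only on the planes that return reproduces the military's misleading picture:

```python
import random

rng = random.Random(7)
SECTIONS = ["fuselage", "wings", "engine", "cockpit"]
# Hypothetical chance that a single hit to each section downs the plane.
LETHALITY = {"fuselage": 0.05, "wings": 0.05, "engine": 0.6, "cockpit": 0.6}

survivor_holes = {s: 0 for s in SECTIONS}
for _ in range(50_000):
    hits = [rng.choice(SECTIONS) for _ in range(rng.randint(1, 5))]
    # The plane returns only if no hit was fatal.
    if all(rng.random() > LETHALITY[h] for h in hits):
        for h in hits:
            survivor_holes[h] += 1

# Hits were uniform, but the survivors tell a different story:
for s in SECTIONS:
    print(f"{s:8s}: {survivor_holes[s]:6d} holes observed on returning planes")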
Simpson's Paradox
A trend that appears in several groups of data can reverse when the groups are combined. This is Simpson's Paradox, and it's more common than you'd think. A famous example: UC Berkeley was sued for gender bias in admissions because overall acceptance rates were lower for women. But when broken down by department, women were actually accepted at equal or higher rates — they were just applying to more competitive departments.
The lesson: aggregated data can tell a completely different story than disaggregated data. Always ask whether the groups being combined are truly comparable.
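The reversal is easy to reproduce with made-up numbers. This sketch uses hypothetical admissions figures (not the actual Berkeley data) in which women are admitted at a higher rate in every department, yet a lower rate overall, because they apply disproportionately to the harder department:

```python
# Hypothetical (applicants, admitted) counts per department and gender.
data = {
    "Dept A (easy)": {"men": (800, 500), "women": (100, 70)},
    "Dept B (hard)": {"men": (200, 30),  "women": (900, 180)},
}

totals = {"men": [0, 0], "women": [0, 0]}
for dept, groups in data.items():
    for g, (apps, adm) in groups.items():
        totals[g][0] += apps
        totals[g][1] += adm
        print(f"{dept}, {g:5s}: {adm / apps:.0%} admitted")

for g, (apps, adm) in totals.items():
    print(f"Overall, {g:5s}: {adm / apps:.0%} admitted")
```

Within each department women do better (70% vs 62.5% in Dept A, 20% vs 15% in Dept B), yet the combined totals show women admitted at 25% against men's 53%. The aggregate number isn't wrong, but it answers a different question than the per-department numbers do.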
Your Statistical Toolkit
Next time you encounter a data claim — in the news, at work, or on social media — run through these questions:
- How large is the sample? (Bigger is more reliable.)
- How many comparisons were tested? (More tests = more false positives.)
- Could a confounding variable explain this? (The hidden third factor.)
- Am I only seeing the survivors? (What data is missing?)
- Does the trend hold when you break it into subgroups? (Simpson's Paradox.)
- Is this an extreme value likely to regress? (Regression to the mean.)
- What's the base rate? (How common is this in general?)
You won't always have answers to all seven questions. But just asking them puts you ahead of most people who consume data uncritically. And that's the real reason Spurious exists — not to mock statistics, but to celebrate the skill of thinking clearly about them.