Learn

Statistical thinking without the textbook

Welcome to your crash course in statistical literacy. We'll explore why correlation doesn't imply causation, how to spot spurious relationships, and why this matters in a world drowning in data.

What is Correlation?

The Basics

Correlation measures the statistical relationship between two variables. When two things are correlated, they tend to move together—either in the same direction (positive correlation) or opposite directions (negative correlation).

The Pearson Correlation Coefficient

We use Pearson's r, which ranges from -1 to +1. A value of +1 means perfect positive correlation, -1 means perfect negative correlation, and 0 means no linear relationship, although the variables can still be related in a non-linear way.
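To make the scale concrete, here is a minimal sketch of Pearson's r in plain Python. The function name `pearson_r` and the toy numbers are our own illustration, not part of any particular library. It divides the co-variation of the two variables by the product of their individual spreads.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: how strongly xs and ys follow a straight-line relationship."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviations from the mean of x
    dy = [y - mean_y for y in ys]          # deviations from the mean of y
    co_variation = sum(a * b for a, b in zip(dx, dy))
    spread_x = sum(a * a for a in dx)
    spread_y = sum(b * b for b in dy)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when one variable never changes")
    return co_variation / math.sqrt(spread_x * spread_y)

# Made-up toy data, chosen so the answers are easy to check by hand.
steps = [1, 2, 3, 4, 5]
r_up = pearson_r(steps, [2, 4, 6, 8, 10])                  # rises in lockstep
r_down = pearson_r(steps, [10, 8, 6, 4, 2])                # falls in lockstep
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])    # y = x squared

print(r_up, r_down, r_curve)
```

The third case is worth a second look: y depends entirely on x, yet r comes out as zero, because the relationship is a curve rather than a line.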

What is Causation?

Cause and Effect

Causation means that one event brings about another: A causes B. This is fundamentally different from correlation. A causal link usually shows up as a correlation, but a correlation by itself does not imply causation.

Proving Causation

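To establish causation, scientists look for temporal precedence (the cause comes before the effect), the elimination of alternative explanations, and ideally a mechanistic understanding of how the cause produces the effect. Randomized controlled trials are the strongest tool for this, because random assignment rules out most alternative explanations at once.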
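A small simulation can show why random assignment is so valued. The sketch below is an invented scenario, not real data: a supplement with no effect at all, and a hidden trait (`fitness`, a hypothetical name) that improves health and also makes people more likely to take the supplement on their own.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

def mean(values):
    return sum(values) / len(values)

def outcome_gap(people):
    """Mean outcome of those who took the supplement minus those who did not."""
    treated = [outcome for took, outcome in people if took]
    control = [outcome for took, outcome in people if not took]
    return mean(treated) - mean(control)

observational = []  # people decide for themselves
randomized = []     # a coin flip decides

for _ in range(5000):
    fitness = rng.gauss(0, 1)                 # hidden trait, never recorded
    outcome = 2 * fitness + rng.gauss(0, 1)   # health depends on fitness only
    chose_it = fitness + rng.gauss(0, 1) > 0  # fitter people opt in more often
    coin_flip = rng.random() < 0.5            # assignment ignores fitness
    observational.append((chose_it, outcome))
    randomized.append((coin_flip, outcome))

observed_gap = outcome_gap(observational)
trial_gap = outcome_gap(randomized)
print(f"self-selected gap: {observed_gap:.2f}, randomized gap: {trial_gap:.2f}")
```

When people choose for themselves, the supplement appears to work, because the takers were fitter to begin with. When a coin flip decides, the two groups have similar fitness on average and the apparent benefit shrinks toward zero.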

Confounding Variables

The Hidden Third Factor

A confounding variable is a third factor, often unmeasured, that influences both of your observed variables and creates the illusion of a direct relationship between them. It is one of the most common sources of spurious correlations.

Classic Example

Ice cream sales and drowning deaths are correlated—not because ice cream causes drowning, but because both increase during summer months. Temperature is the confounding variable.
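The ice cream example can be reproduced with made-up numbers. In this sketch, every figure is invented for illustration: temperature drives both series, and neither series affects the other. We then subtract out the part of each series that temperature explains (a simple straight-line fit) and correlate what is left.

```python
import random

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing its best-fit straight line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic days: both quantities depend on temperature plus independent noise.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [20 * t + rng.gauss(0, 50) for t in temperature]   # sales index
drownings = [0.3 * t + rng.gauss(0, 2) for t in temperature]   # risk index

raw_r = pearson_r(ice_cream, drownings)
adjusted_r = pearson_r(residuals(ice_cream, temperature),
                       residuals(drownings, temperature))
print(f"raw r = {raw_r:.2f}, r after removing temperature = {adjusted_r:.2f}")
```

The raw correlation is strong, but once temperature is accounted for it falls close to zero. This only works here because we know and measured the confounder; in real data the third factor is often missing from the dataset.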

Common Fallacies

Post Hoc Ergo Propter Hoc

'After this, therefore because of this.' Just because B happened after A doesn't mean A caused B. This is one of the most common reasoning errors.

Cum Hoc Ergo Propter Hoc

'With this, therefore because of this.' Assuming that because two things occur together, one must cause the other. This is the core fallacy we're exploring.

Cherry-Picking Data

Cherry-picking means selecting only the data that supports your theory while ignoring contradictory evidence. It is easy to do: compare enough variables and some pairs will correlate strongly by chance alone.
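To see how easily chance produces an impressive-looking correlation, the sketch below generates 40 series of pure random noise, ten points each, and searches every pair for the strongest match. The sizes are arbitrary choices for illustration.

```python
import itertools
import random

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)  # fixed seed so the run is repeatable

# 40 unrelated "variables", each just 10 random numbers.
series = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(40)]

# Check every possible pair: 40 * 39 / 2 = 780 comparisons.
pairs = list(itertools.combinations(range(len(series)), 2))
strongest = max(abs(pearson_r(series[i], series[j])) for i, j in pairs)

print(f"checked {len(pairs)} pairs, strongest |r| = {strongest:.2f}")
```

None of these series has anything to do with any other, yet the best pair out of 780 will look like a real finding. Reporting only that pair, and staying quiet about the other 779, is cherry-picking.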

Why This Matters

In a Data-Driven World

Understanding the difference between correlation and causation is crucial for making informed decisions in health, finance, policy, and daily life. Misinterpreting correlations can lead to costly mistakes.

Media Literacy

Headlines often confuse correlation with causation. 'Study shows X linked to Y' doesn't mean X causes Y. Developing skepticism about causal claims makes you a more informed citizen.

“Correlation is not causation, but it sure is a hint.”

— Edward Tufte, statistician and data visualization expert

Quick Checklist

Before accepting a causal claim based on a correlation, ask:

  • Is there a plausible mechanism for causation?
  • Could a third variable explain both?
  • Has a controlled experiment been done?
  • Does the relationship hold across different contexts?
  • Are the authors trying to sell me something?
