Statistical thinking without the textbook
Welcome to your crash course in statistical literacy. We'll explore why correlation doesn't imply causation, how to spot spurious relationships, and why this matters in a world drowning in data.
Correlation measures the statistical relationship between two variables. When two things are correlated, they tend to move together—either in the same direction (positive correlation) or opposite directions (negative correlation).
We use Pearson's r, which ranges from -1 to +1. A value of +1 means perfect positive correlation, -1 means perfect negative correlation, and 0 means no linear relationship.
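As a rough illustration of how Pearson's r behaves at its extremes, here is a minimal sketch in plain Python. The function name `pearson_r` and the small data sets are made up for this example and are not taken from the page.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how the two variables vary together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: how much each variable varies on its own.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

# Illustrative data only.
xs = [-2, -1, 0, 1, 2]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])   # points on a rising line
r_down = pearson_r(xs, [-3 * x for x in xs])    # points on a falling line
r_curve = pearson_r(xs, [x * x for x in xs])    # y depends on x, but not linearly

print(r_up, r_down, r_curve)
```

For these inputs the three values come out as 1, -1 and 0. The last case is worth a second look: y is completely determined by x, yet r is 0, because r only captures linear relationships.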
Causation means that one event brings about another: A causes B. This is fundamentally different from correlation. A causal link usually shows up as some kind of statistical association (though not always a linear one that Pearson's r would pick up), but correlation on its own does not imply causation.
To establish causation, scientists rely on randomized controlled trials where they are feasible, along with temporal precedence (cause before effect), elimination of alternative explanations, and ideally, a mechanistic understanding of how the cause produces the effect.
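To see why randomization carries so much weight, here is a small simulation sketch. Everything in it is invented for illustration: a treatment with no effect at all, a hidden trait that influences both who chooses the treatment and how they fare, and the helper name `mean_difference`.

```python
import random

def mean_difference(treated, outcomes):
    """Mean outcome of the treated group minus that of the untreated group."""
    on = [y for t, y in zip(treated, outcomes) if t]
    off = [y for t, y in zip(treated, outcomes) if not t]
    return sum(on) / len(on) - sum(off) / len(off)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden trait (say, general health-consciousness).
hidden = [rng.gauss(0, 1) for _ in range(n)]

# The outcome depends only on the hidden trait plus noise.
# The treatment itself has NO effect in this simulation.
outcomes = [h + rng.gauss(0, 1) for h in hidden]

# Observational setting: people with more of the trait opt in more often.
chose = [h + rng.gauss(0, 1) > 0 for h in hidden]

# Randomized setting: a coin flip decides, independent of the trait.
assigned = [rng.random() < 0.5 for _ in range(n)]

diff_observational = mean_difference(chose, outcomes)
diff_randomized = mean_difference(assigned, outcomes)

print(f"self-selected groups: difference = {diff_observational:.2f}")
print(f"randomized groups:    difference = {diff_randomized:.2f}")
```

In the self-selected comparison the useless treatment appears to help, because the groups differ in the hidden trait. Under random assignment the groups are alike on average, so the apparent benefit shrinks toward zero.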
A confounding variable is a hidden factor that influences both of your observed variables, creating the illusion of a direct relationship. It's one of the most common sources of spurious correlations.
Ice cream sales and drowning deaths are correlated—not because ice cream causes drowning, but because both increase during summer months. Temperature is the confounding variable.
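The ice cream example can be sketched with synthetic data. The numbers below are simulated, not real sales or drowning figures, and the coefficients are arbitrary assumptions chosen only so that temperature drives both series. The helper `partial_r` uses the standard first-order partial correlation formula to ask what is left of the relationship once temperature is accounted for.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def partial_r(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic daily data: temperature drives both series,
# and neither series has any direct effect on the other.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [10 * t + rng.gauss(0, 50) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1.5) for t in temperature]

raw = pearson_r(ice_cream, drownings)
controlled = partial_r(ice_cream, drownings, temperature)

print(f"raw r = {raw:.2f}")
print(f"r after controlling for temperature = {controlled:.2f}")
```

The raw correlation is strong even though the simulation contains no link between ice cream and drowning. Once temperature is controlled for, the correlation falls to roughly zero, which is the signature of a confounder.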
Post hoc ergo propter hoc: 'After this, therefore because of this.' Just because B happened after A doesn't mean A caused B. This is one of the most common reasoning errors.
Cum hoc ergo propter hoc: 'With this, therefore because of this.' Assuming that because two things occur together, one must cause the other. This is the core fallacy we're exploring.
Cherry-picking means selecting only the data that supports your theory while ignoring contradictory evidence. With enough variables to compare, you can find correlations between almost anything, because some pairs will line up by chance alone.
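The claim that enough data will turn up a correlation between almost anything can be checked with a short simulation. This is only a sketch under made-up assumptions: 200 series of pure random noise, 10 points each, compared against one random target.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

rng = random.Random(7)  # fixed seed so the run is repeatable
n_points, n_candidates = 10, 200

# One target series and many candidate series, all independent noise.
target = [rng.gauss(0, 1) for _ in range(n_points)]
candidates = [
    [rng.gauss(0, 1) for _ in range(n_points)]
    for _ in range(n_candidates)
]

correlations = [pearson_r(target, c) for c in candidates]

# The "headline" result a cherry-picker would report...
strongest = max(correlations, key=abs)
# ...versus what a typical candidate looks like (median absolute r).
typical = sorted(abs(r) for r in correlations)[n_candidates // 2]

print(f"strongest r among {n_candidates} noise series: {strongest:.2f}")
print(f"typical |r|: {typical:.2f}")
```

None of these series has anything to do with the target, yet the best of 200 looks impressive. Reporting only that one result, and staying quiet about the other 199, is how a spurious correlation gets manufactured.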
Understanding the difference between correlation and causation is crucial for making informed decisions in health, finance, policy, and daily life. Misinterpreting correlations can lead to costly mistakes.
Headlines often confuse correlation with causation. 'Study shows X linked to Y' doesn't mean X causes Y. Developing skepticism about causal claims makes you a more informed citizen.
“Correlation is not causation, but it sure is a hint.”
— Edward Tufte, statistician and data visualization expert
Before believing any correlation claim, ask:
Knowledge is power. Statistical literacy is a superpower.