How We Calculate Correlations

Every correlation on Spurious is computed from real, publicly available data. No numbers are invented, adjusted, or cherry-picked. The absurdity comes entirely from pairing datasets that have no business being in the same sentence — and discovering that, statistically, they move in lockstep.

Step 1: Collect the Data

We gather time-series datasets from government agencies (CDC, USDA, Bureau of Labor Statistics), scientific databases (NASA, NOAA), cultural trackers (Google Trends, box office records), and other public sources. Each dataset is a list of values measured at regular intervals — usually one per year — over a span of at least five years.

Our current library includes over 300 datasets across 13 categories: Food, Health, Government, Economic, Science, Education, Celebrity, Death, Internet, Consumer, Cultural, Paranormal, and Crime. That gives us tens of thousands of possible pairings.
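The “tens of thousands” figure follows from simple counting: with n datasets there are n(n - 1)/2 unordered pairs. A one-line check in Python, assuming exactly 300 datasets (the library has “over 300,” so this is a lower bound):

```python
import math

# The library holds "over 300" datasets; 300 is used here as a lower bound.
n_datasets = 300

# Unordered pairs (A, B) with A != B: n * (n - 1) / 2.
n_pairs = math.comb(n_datasets, 2)
print(n_pairs)  # 44850
```

Some of these pairs are later dropped for insufficient overlap (Step 2), so the number actually tested is somewhat smaller.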

Step 2: Align the Time Series

Not every dataset covers the same years. Before we can compare two datasets, we find their overlapping date range and keep only the data points that line up. If two datasets share fewer than five overlapping years, we skip that pair — there simply isn't enough data to compute a meaningful correlation.
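A minimal Python sketch of the alignment rule, assuming each dataset is stored as a mapping from year to value. That representation, the function name `align`, the constant `MIN_OVERLAP_YEARS`, and the sample series are all illustrative, not the site's actual code or data.

```python
from typing import Dict, List, Optional, Tuple

# Pairs that share fewer overlapping years than this are skipped.
MIN_OVERLAP_YEARS = 5


def align(
    a: Dict[int, float], b: Dict[int, float]
) -> Optional[Tuple[List[float], List[float]]]:
    """Return both series restricted to their shared years, in year order,
    or None if they overlap by fewer than MIN_OVERLAP_YEARS years."""
    shared_years = sorted(a.keys() & b.keys())  # set intersection of the year keys
    if len(shared_years) < MIN_OVERLAP_YEARS:
        return None
    return [a[y] for y in shared_years], [b[y] for y in shared_years]


# Made-up series keyed by year, for illustration only.
series_a = {year: float(year - 2000) for year in range(2000, 2010)}        # 2000-2009
series_b = {year: float(2 * (year - 2003)) for year in range(2003, 2015)}  # 2003-2014
series_c = {year: 1.0 for year in range(2008, 2020)}                        # 2008-2019

xs, ys = align(series_a, series_b)   # overlap is 2003-2009: 7 points
print(len(xs), xs[0], ys[0])         # 7 3.0 0.0
print(align(series_a, series_c))     # None (only 2008 and 2009 overlap)
```

Returning None rather than raising keeps the pairing loop simple: a caller can skip any pair whose alignment comes back empty.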

Step 3: Calculate Pearson's r

We use the Pearson correlation coefficient, conventionally written as “r.” It measures the strength and direction of the linear relationship between two variables and produces a number between -1 and +1:

r = +1.0: Perfect positive correlation; the two variables rise and fall together in exact linear proportion.
r = 0.0: No linear relationship at all.
r = -1.0: Perfect negative correlation; when one rises, the other falls in exact linear proportion, and vice versa.
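For readers who want the symbols, the standard definition for n aligned pairs (x_i, y_i) with sample means x̄ and ȳ is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Written as a covariance divided by the product of two standard deviations, each of the three pieces carries a 1/(n - 1) factor; those factors cancel, leaving the expression above. By the Cauchy-Schwarz inequality, the numerator's absolute value can never exceed the denominator, which is why r is confined to the range from -1 to +1.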

The formula itself is straightforward: Pearson's r is the covariance of the two variables divided by the product of their standard deviations. In plain English, it answers the question: “When one variable is above its average, how consistently is the other variable above (or below) its own average?”
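That sentence translates almost word for word into Python. This is a sketch: the function name `pearson_r` is ours, and the page does not describe the site's actual implementation. The three calls at the end reproduce the reference values listed above.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations all divide by (n - 1),
    # so that factor cancels in the final ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when either series is constant")
    return cov / (sd_x * sd_y)


x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 6))                 # 1.0: rises in exact proportion
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 6))                 # -1.0: falls in exact proportion
print(round(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]), 6))  # 0.0: here y = x**2
```

The last case is worth a second look: y is completely determined by x, yet r is 0, because r measures only linear association.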

Step 4: Rank and Curate

Out of the tens of thousands of possible pairings, we keep only the ones with a strong correlation: |r| ≥ 0.80. That still leaves over 16,000 strongly correlated pairs. From those, we rank by a combination of:

Statistical strength — higher |r| scores better.
Category diversity — pairings from different categories are more surprising.
Entertainment value — the weirder the combination, the higher the score.

The top 2,200 make it onto the site. Each one is real, statistically valid, and completely meaningless as a causal claim.
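The filter-then-rank step can be sketched as follows. The page names the three criteria but not how they are combined, so the weights, the `Pair` record, and the numeric 0-1 “entertainment” rating (standing in for an editorial judgment) are purely hypothetical; only the 0.80 threshold and the 2,200 cutoff come from the text above.

```python
from dataclasses import dataclass
from typing import List

R_THRESHOLD = 0.80  # keep only pairs with |r| >= 0.80
TOP_N = 2200        # number of pairs published

# Hypothetical weights: the real combination is not described.
W_STRENGTH = 1.0
W_DIVERSITY = 0.5
W_ENTERTAINMENT = 1.0


@dataclass
class Pair:
    name_a: str
    name_b: str
    category_a: str
    category_b: str
    r: float
    entertainment: float  # hypothetical 0-1 editorial rating


def score(p: Pair) -> float:
    strength = abs(p.r)  # strong negative correlations count as much as positive ones
    diversity = 1.0 if p.category_a != p.category_b else 0.0
    return W_STRENGTH * strength + W_DIVERSITY * diversity + W_ENTERTAINMENT * p.entertainment


def curate(pairs: List[Pair], top_n: int = TOP_N) -> List[Pair]:
    strong = [p for p in pairs if abs(p.r) >= R_THRESHOLD]  # Step 4 filter
    strong.sort(key=score, reverse=True)                      # best-scoring first
    return strong[:top_n]


pairs = [
    Pair("A", "B", "Food", "Paranormal", 0.91, 0.9),
    Pair("C", "D", "Food", "Food", 0.95, 0.2),
    Pair("E", "F", "Crime", "Internet", 0.62, 1.0),  # below threshold: dropped
]
for p in curate(pairs, top_n=2):
    print(p.name_a, p.name_b)
```

Here the cross-category Food/Paranormal pair outranks the same-category pair despite its slightly lower |r|, which is the trade-off the criteria above describe.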

Why This Matters

Pearson's r is one of the most commonly cited statistics in science, business, and media. A high r-value can be genuinely informative — or it can be a total coincidence. The entire point of Spurious is to train your brain to distinguish between the two.
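To make “total coincidence” concrete, here is a small Monte Carlo sketch (an illustration, not part of the site's pipeline). It generates pairs of completely independent random series with five points, the minimum overlap Step 2 allows, and counts how often |r| clears the 0.80 cutoff from Step 4. For independent normally distributed data, standard theory puts that probability near 10% at five points, falling toward zero as the series get longer. The function names and parameters are our own.

```python
import math
import random
from typing import List


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson's r for two equal-length, non-constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chance_of_strong_r(n_points: int = 5, trials: int = 10_000,
                       threshold: float = 0.80, seed: int = 0) -> float:
    """Fraction of pairs of independent Gaussian series with |r| >= threshold."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        if abs(pearson_r(xs, ys)) >= threshold:
            hits += 1
    return hits / trials


print(chance_of_strong_r(n_points=5))   # roughly 0.10
print(chance_of_strong_r(n_points=30))  # essentially 0
```

With tens of thousands of pairings and short overlaps, even a modest per-pair chance like this produces many strong-looking correlations between series that have nothing to do with each other.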

When you see a chart with two lines moving together, your brain instinctively wants to believe one caused the other. That instinct is useful in daily life but dangerous in data analysis. By showing you correlations that are obviously absurd, we hope to make that skeptical voice in your head a little louder the next time someone says “studies show.”