Shallow Dives

Simpson's Paradox: When Statistics Lie by Telling the Truth

The Paradox That Saved a University's Reputation

In 1973, the University of California, Berkeley faced a potential crisis. Graduate admissions data for that fall showed that men were accepted at a markedly higher rate than women: 44% versus 35%. A discrimination lawsuit seemed a real possibility. But when statisticians examined the data department by department, they found something that defied common sense: few departments showed any significant bias against women, and several admitted women at higher rates than men. The gap that was so obvious in the combined numbers shrank or reversed once the numbers were split apart. Berkeley had stumbled into Simpson's Paradox.

What Simpson's Paradox Reveals About Hidden Variables

Simpson's Paradox occurs when a trend observed in separate groups disappears or reverses when those groups are aggregated. It's named after statistician Edward Simpson, who described it formally in 1951, though the phenomenon was noticed earlier by Karl Pearson in 1899.

The key insight: aggregate statistics can mask what's really happening because they ignore confounding variables—hidden factors that influence both the grouping and the outcome. In Berkeley's case, the confounding variable was department selectivity. Women disproportionately applied to competitive departments (like English) with low acceptance rates for everyone, while men applied more heavily to less competitive departments (like engineering) with higher acceptance rates overall.
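
To see the mechanism in numbers, here is a minimal sketch with two departments. The counts are invented for illustration and are not the real Berkeley figures; the function names are likewise hypothetical. It shows that each group's overall rate is a weighted average of the department rates, where the weights are the shares of that group's applications going to each department.

```python
# Hypothetical counts, NOT the real Berkeley data.
# Each entry is (applicants, admitted).
applications = {
    "engineering": {"men": (800, 480), "women": (200, 130)},
    "english": {"men": (200, 40), "women": (800, 200)},
}


def admission_rate(applicants, admitted):
    """Fraction of applicants who were admitted."""
    return admitted / applicants


def overall_rate(data, group):
    """Pooled admission rate for one group across all departments."""
    total_applicants = sum(dept[group][0] for dept in data.values())
    total_admitted = sum(dept[group][1] for dept in data.values())
    return total_admitted / total_applicants


def weighted_rate(data, group):
    """Same pooled rate, rebuilt as sum of (application share * department rate)."""
    total_applicants = sum(dept[group][0] for dept in data.values())
    result = 0.0
    for dept in data.values():
        applicants, admitted = dept[group]
        share = applicants / total_applicants  # where this group chose to apply
        result += share * admission_rate(applicants, admitted)
    return result


for name, dept in applications.items():
    for group in ("men", "women"):
        print(f"{name:12s} {group:6s} {admission_rate(*dept[group]):.0%}")

for group in ("men", "women"):
    print(f"overall      {group:6s} {overall_rate(applications, group):.0%}")
```

With these made-up counts, women are admitted at a higher rate in both departments (65% vs. 60% and 25% vs. 20%), yet the pooled rate is 52% for men and 33% for women. The department rates are nearly the same for both groups; what differs is the weights, because most of the women's applications go to the department that rejects most applicants.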

Think of it like restaurant ratings. Restaurant A might have better lunch reviews and better dinner reviews than Restaurant B. But if most of Restaurant A's reviews come from dinner (when critics are harder to please) and most of Restaurant B's come from lunch, Restaurant B could still have the higher overall rating. Same reviews, opposite conclusion, depending on how you slice the data.

The Medical Trial That Shocked Researchers

One of the most striking real-world examples involved a study of kidney stone treatments in the 1980s. Researchers compared two treatments across patients with small and large kidney stones. Treatment A was more effective for small stones (93% vs. 87%) and also more effective for large stones (73% vs. 69%). Logic says Treatment A must be superior overall, right?

Wrong. Treatment B had a better overall success rate: 83% versus 78%.

The paradox emerged because Treatment A was used more often on large stones (which are harder to treat successfully), while Treatment B was used predominantly on small stones (which have better outcomes regardless of treatment). The aggregated data told a different story than the disaggregated data, and doctors had to decide which version revealed the truth. Because stone size influenced both the choice of treatment and the chance of success, the size-by-size comparison is the fairer one here.
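
The reversal can be checked directly from the patient counts. The sketch below uses the success counts as they are commonly reported from Charig et al. (1986); the counts themselves are not quoted in the text above, so treat them as an assumption to verify against the paper. The variable and function names are illustrative.

```python
# (successes, patients) by treatment and stone size, as commonly
# reported from Charig et al. (1986). Assumed here, not quoted above.
outcomes = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def pooled(treatment):
    """Total (successes, patients) for a treatment, ignoring stone size."""
    by_size = outcomes[treatment].values()
    successes = sum(s for s, _ in by_size)
    patients = sum(n for _, n in by_size)
    return successes, patients


for treatment, by_size in outcomes.items():
    for size, (successes, patients) in by_size.items():
        rate = successes / patients
        print(f"Treatment {treatment}, {size} stones: {successes}/{patients} = {rate:.0%}")
    successes, patients = pooled(treatment)
    print(f"Treatment {treatment}, all stones:   {successes}/{patients} = {successes / patients:.0%}")
```

The per-size rates round to 93%, 73%, 87%, and 69%, and the pooled rates to 78% for Treatment A and 83% for Treatment B, matching the figures above. Note how lopsided the groups are: 263 of Treatment A's 350 patients had large stones, against only 80 of Treatment B's 350.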

What This Means for Your World

Simpson's Paradox teaches two essential lessons about reasoning from data, plus one line worth repeating:

Context is everything. Aggregate statistics—whether about crime rates, economic trends, or batting averages—can be accurate and misleading simultaneously. The full picture requires understanding the subgroups and what's driving the overall number.

Beware of confounding variables. When you see a surprising statistic, ask: what hidden factor might be influencing both the groups being compared and the outcome being measured? In salary data, is it discrimination or different career choices? In health outcomes, is it treatment quality or patient selection?

The dinner party version: Tell your friends that the same dataset can support opposite conclusions, and that both sets of numbers can be arithmetically correct. It's not about lying with statistics; it's about statistics giving different answers depending on how you ask the question.

A Question to Carry Forward

The next time you encounter a claim backed by data—whether in the news, at work, or in an argument—ask yourself: could this be Simpson's Paradox in action? What happens when we break this aggregate number into its constituent parts? Sometimes the truth hides not in the big picture, but in the details we've averaged away.
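
Breaking an aggregate into its parts can be done mechanically once you have counts per subgroup. The following is a rough sketch, not a standard library routine: `shows_reversal` is a hypothetical helper, it assumes both groups are split into the same subgroups with at least one observation each, and the review counts are invented for illustration.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int]  # (successes, trials); trials assumed > 0


def rate(counts: Counts) -> float:
    successes, trials = counts
    return successes / trials


def aggregate(strata: Dict[str, Counts]) -> Counts:
    """Pool all subgroups into one (successes, trials) pair."""
    return (
        sum(s for s, _ in strata.values()),
        sum(n for _, n in strata.values()),
    )


def shows_reversal(a: Dict[str, Counts], b: Dict[str, Counts]) -> bool:
    """True if one group wins in every shared subgroup but loses when pooled."""
    shared = a.keys() & b.keys()
    if not shared:
        return False
    a_wins_everywhere = all(rate(a[k]) > rate(b[k]) for k in shared)
    b_wins_everywhere = all(rate(b[k]) > rate(a[k]) for k in shared)
    pooled_a, pooled_b = rate(aggregate(a)), rate(aggregate(b))
    return (a_wins_everywhere and pooled_a < pooled_b) or (
        b_wins_everywhere and pooled_b < pooled_a
    )


# Hypothetical (positive reviews, total reviews) for two restaurants.
restaurant_a = {"lunch": (9, 10), "dinner": (60, 90)}
restaurant_b = {"lunch": (76, 90), "dinner": (6, 10)}

print(shows_reversal(restaurant_a, restaurant_b))

# Same comparison with evenly sized subgroups: no reversal is possible here.
balanced_a = {"lunch": (9, 10), "dinner": (7, 10)}
balanced_b = {"lunch": (8, 10), "dinner": (6, 10)}

print(shows_reversal(balanced_a, balanced_b))
```

The first call reports a reversal: Restaurant A leads at lunch and at dinner but trails 69% to 82% overall, because nearly all of its reviews come from the harder meal. The second call does not, since both restaurants have the same mix of lunch and dinner reviews. A check like this only flags the pattern; deciding which view to trust still depends on understanding why the subgroup sizes differ.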

References

  • Bickel, P. J., Hammel, E. A., & O'Connell, J. W. (1975). "Sex Bias in Graduate Admissions: Data from Berkeley." Science, 187(4175), 398-404.
  • Charig, C. R., Webb, D. R., Payne, S. R., & Wickham, J. E. (1986). "Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy." British Medical Journal, 292(6524), 879-882.
  • Simpson, E. H. (1951). "The Interpretation of Interaction in Contingency Tables." Journal of the Royal Statistical Society, Series B, 13(2), 238-241.

Further Reading