Correlation Is Not Causation
The most frequently stated statistical fact and the least frequently applied one. Three structural reasons why correlation does not imply causation — confounding, reverse causation, and chance — with the Bradford Hill criteria as a framework for when it might.
The most obvious joke that keeps fooling everyone
Between 2000 and 2009, per capita cheese consumption in the United States correlated with deaths caused by people becoming tangled in their bedsheets. The correlation coefficient was 0.947, which means the two variables moved almost perfectly in lockstep. If you plotted the lines, you would struggle to get a sheet of paper between them.
This is from Tyler Vigen’s Spurious Correlations database, which contains hundreds of examples assembled by a program that simply trawls publicly available datasets and pairs anything that moves together. Nicolas Cage film releases correlate with swimming pool drownings. The divorce rate in Maine correlates with per capita consumption of margarine. US spending on science, space, and technology correlates with suicides by hanging, strangulation, and suffocation.
You are reading these and thinking: obviously not. Nobody believes cheese causes bedsheet deaths. The correlations are absurdly funny precisely because the causal connection is invisible. There is no plausible mechanism. The variables cannot have anything to do with each other.
But here is the uncomfortable part. The same logical structure that produces the cheese-and-bedsheets correlation produces the ones that made it into newspaper headlines and government policy. The variables are less obviously unrelated. The mechanism sounds plausible. The sample is real. The correlation is strong. And the same mistake gets made.
That is what this unit is about.
The concept
Correlation is a statistical relationship between two variables. When one goes up, the other tends to go up (positive correlation) or down (negative correlation). The correlation coefficient, usually written as r, runs from -1 (perfect inverse relationship) to +1 (perfect direct relationship). Zero means no linear relationship at all. The cheese-bedsheet correlation of 0.947 is extraordinarily high. Most real-world correlations worth paying attention to sit somewhere between 0.2 and 0.6.
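The coefficient itself is nothing mysterious. Here is a minimal sketch of Pearson's r in plain Python (the function name and test values are illustrative, not from any of the studies discussed):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A perfect direct relationship gives r = +1; a perfect inverse one, r = -1.
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))   # 1.0
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 6))   # -1.0
```

Note what the formula contains: means, spreads, and co-movement. Nothing in it knows anything about mechanism, direction, or intervention.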
Causation is something different. A causes B means that if you intervene on A and change it, B will change as a result. Not just that they happen to move together, but that one produces the other.
Correlation is a description of a pattern in data. Causation is a claim about what happens in the world. They are not the same thing, and observing the first gives you only weak evidence about the second. There are three structural reasons why.
Confounding. A third variable causes both A and B, producing a correlation between them that has nothing to do with any direct relationship. Ice cream sales correlate strongly with drowning rates. Does eating ice cream make you drown? No. The confounder is summer. Hot weather drives both ice cream consumption and the number of people swimming, and drownings follow swimmers. If you hold temperature constant, the ice cream correlation disappears. The confounder was doing all the work.
Reverse causation. A correlates with B, but you have the direction backwards. You find that people who are admitted to hospital are more likely to die than people who stay home. Does hospital admission cause death? Clearly not in the direction that reading would imply. Sick people go to hospital; their sickness, not the hospital, is what threatens their life. The causal arrow runs from illness to both hospitalisation and death. Confusing the direction of causation is not limited to obvious examples. In economics, in psychology, in nutrition science, direction is regularly assumed and rarely tested.
Chance. With enough variables and enough time, some correlations will appear by pure random coincidence. This is what Vigen’s program exploits. It did not go looking for cheese and bedsheets; it tested thousands of pairs until it found ones that happened to align. If you test enough combinations, improbable patterns will emerge by luck. The more variables in a dataset and the longer the time series, the more opportunities for chance correlations to appear.
These three mechanisms exhaust the structural explanations. If A and B are correlated, either A causes B, B causes A, a third variable causes both, or it is a coincidence. Those are the options. Asserting causation from correlation alone means you have eliminated the other three possibilities, and that requires evidence the correlation itself cannot supply.
The Bradford Hill criteria are one framework for gathering that evidence. In 1965, the epidemiologist Austin Bradford Hill described nine aspects of an association worth examining before concluding causation: strength, consistency, specificity, temporality, biological gradient, plausibility, coherence, experimental evidence, and analogy. He was careful to note that none of them can supply indisputable proof and no single one is a prerequisite, but together they make a causal claim more or less persuasive. Temporality is the most fundamental: the cause must precede the effect. A correlation observed only after an intervention cannot establish that the intervention caused anything.
The counterfactual is the gold standard. To establish that A causes B, you need to know what would have happened to B if A had not occurred. This is why randomised controlled trials are the most powerful causal tool available: by randomly assigning some people to a treatment and others to a placebo, you create two groups that differ only in whether they received the treatment. The counterfactual is approximated. Observational data, however detailed and carefully collected, cannot recreate this approximation without assumptions.
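The power of randomisation can be sketched in a few lines. In the hypothetical simulation below, an unobserved "health" variable drives both who chooses a treatment and how well they do, while the treatment itself does nothing. The observational comparison finds a large benefit; the randomised comparison correctly finds none. (All numbers are invented.)

```python
import random
import statistics

random.seed(2)

N = 20000
health = [random.gauss(0, 1) for _ in range(N)]

# The treatment has no effect: outcomes depend only on health plus noise.
outcome = [h + random.gauss(0, 1) for h in health]

# Observational study: healthier people are more likely to choose treatment.
chose = [h + random.gauss(0, 1) > 0 for h in health]
obs_diff = (statistics.mean([o for o, c in zip(outcome, chose) if c])
            - statistics.mean([o for o, c in zip(outcome, chose) if not c]))

# Randomised trial: a coin flip assigns treatment, independent of health.
assigned = [random.random() < 0.5 for _ in range(N)]
rct_diff = (statistics.mean([o for o, a in zip(outcome, assigned) if a])
            - statistics.mean([o for o, a in zip(outcome, assigned) if not a]))

print(f"observational 'effect': {obs_diff:+.2f}")  # large and entirely spurious
print(f"randomised estimate:   {rct_diff:+.2f}")   # near zero, as it should be
```

The coin flip is what makes the treated and untreated groups comparable: it severs the link between the confounder and the exposure that self-selection creates.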
The examples below share one thing: high statistical correlation, zero causal relationship. Looking at them should make you feel something specific: a mild discomfort at how convincing the line graphs look, even when you know the variables cannot be connected. That feeling is important. Carry it into the next health headline you read.
Why it matters
In October 2012, the New England Journal of Medicine published a paper by the cardiologist Franz Messerli correlating per capita chocolate consumption with the number of Nobel Prize laureates per ten million people across twenty-three countries. The correlation coefficient was 0.791. Messerli wrote that increasing a country’s chocolate consumption by 0.4 kg per person per year should produce one additional Nobel laureate. The paper was tongue-in-cheek, published in the journal’s occasional notes section. Few readers got the joke. The finding ricocheted around the internet and the press as evidence that chocolate improves cognitive function. The mechanism sounds plausible; cocoa contains flavonoids, which have been studied for various effects on cognition and cardiovascular health. The causal step was simply assumed.
The actual explanation is a confound. Countries with higher chocolate consumption are, on average, wealthier and more likely to fund the kind of long-term basic research that produces Nobel laureates. It is wealth and institutional investment in science that causes both, not chocolate. The ecological fallacy (using country-level data to make claims about individuals) compounds the error, but the correlation-causation confusion is the first mistake.
This example is relatively harmless. The medical case is not.
For decades, observational studies showed consistently that women who took hormone replacement therapy after menopause had thirty to fifty percent lower rates of coronary heart disease than women who did not. The effect was large and consistent across more than forty studies, including the Nurses’ Health Study, which followed over 120,000 women and found significant cardiovascular benefit. On the strength of this correlation, HRT was prescribed to millions of women for cardiovascular protection.
Then the Women’s Health Initiative trial, a randomised controlled study that started in 1991, reported its results in 2002. It found no cardiovascular protection and, for the combined oestrogen-progestogen preparation, an increased risk of coronary heart disease, stroke, and blood clots. The observational studies had been wrong. The most likely explanation is a confound: women who chose to take HRT were, on average, healthier, wealthier, and more health-conscious than those who did not. Their lower rates of heart disease reflected their baseline health, not the effect of the therapy. The correlation was real. The causal inference was wrong. And the prescription practice based on that causal inference exposed millions of women to risk without the protection they had been promised.
This is the category of error that matters. Not cheese and bedsheets, but policies, treatments, and public health recommendations built on correlations that were reported as causes.
How to spot it
In November 1995, the Committee on Safety of Medicines in the United Kingdom issued an emergency warning that third-generation oral contraceptive pills were associated with twice the risk of venous thromboembolism (blood clots) compared to second-generation pills. The warning was based on three observational studies that had not yet been published. The press release received extensive media coverage. Approximately one million women stopped taking the pill immediately. The consequences were predictable: unwanted pregnancies increased, and the number of abortions in England and Wales rose by approximately thirteen thousand in the year following the warning.
Subsequent analysis suggested the studies had been confounded. Women prescribed third-generation pills were more likely to have risk factors that contra-indicated the second-generation options. The apparent doubling of risk may have reflected who was prescribed which pill rather than a property of the pills themselves. A later independent review found the risk increase was smaller than reported and that the benefits of third-generation pills for some women outweighed the risks.
The real-world consequence of reporting a correlation as a causation was thirteen thousand additional abortions in a single year.
The tell in every case is the same: look for the phrase “associated with” in the headline or abstract. In scientific writing, “associated with” means correlated; it does not, by itself, mean caused. When you see it, ask three questions:
First, what is the confounder? Who or what else might cause both variables to move together? In the HRT case: health consciousness and wealth. In the pill case: pre-existing clotting risk factors.
Second, what is the direction? Is it possible the apparent effect runs the other way? Could the outcome be driving the exposure rather than the exposure driving the outcome?
Third, how many variables were tested? The more variables examined before landing on this correlation, the higher the probability it is a chance finding, regardless of the p-value. (Unit 3.10 on p-hacking goes deeper on this.)
The three-question test will not resolve every case, but it will stop you from swallowing the causal inference before it has been earned.
Your challenge
A large study of schoolchildren finds that children who eat breakfast perform better on academic tests than children who skip it. The effect is statistically significant and consistent across multiple years of data. The study authors write that “breakfast consumption was significantly associated with improved academic performance.”
Before you conclude that skipping breakfast causes worse academic outcomes, sketch out at least three distinct causal structures that could produce this finding. For each one, describe what would need to be true about the data, and whether the observational study as described could distinguish between your explanations.
No answer is given here. The skill being practised is the act of generating competing explanations, not selecting among them. A mind that automatically reaches for one explanation on seeing a correlation is the mind that prescribed HRT to millions of women.
References
Vigen, T. (2015). Spurious Correlations. Hachette Books. Spurious Correlations database: spuriouscorrelations.com
Messerli, F.H. (2012). Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine, 367(16), 1562–1564. DOI: 10.1056/NEJMon1211064
Grodstein, F. et al. (1997). Postmenopausal hormone therapy and mortality. New England Journal of Medicine, 336(25), 1769–1775.
Writing Group for the Women’s Health Initiative Investigators (2002). Risks and benefits of estrogen plus progestin in healthy postmenopausal women. JAMA, 288(3), 321–333. DOI: 10.1001/jama.288.3.321
Furedi, A. (1999). The public health implications of the 1995 “pill scare.” Human Reproduction Update, 5(6), 621–626.
Hill, A.B. (1965). The environment and disease: association or causation? Proceedings of the Royal Society of Medicine, 58(5), 295–300.