Base Rate Neglect
The most consequential cognitive error in everyday probabilistic reasoning. When a specific piece of evidence arrives, the mind discards the background frequency of the event in the population. The cab problem shows you exactly how this happens, and Bayes' theorem shows you exactly how to stop it.
Opening Hook
A city has two cab companies. Eighty-five percent of the cabs on the road are green. The remaining fifteen percent are blue.
One night, a cab is involved in a hit-and-run accident. A witness was present. She identified the cab as blue. The witness was later tested under visibility conditions similar to those on the night of the accident: she correctly identified the colour of a cab 80 percent of the time, and made an error 20 percent of the time.
Question: what is the probability that the cab involved in the accident was actually blue?
Most people say: around 80 percent. The witness was tested, she is 80 percent reliable, she said blue, so blue is probably right. Perhaps 75 percent, perhaps 85 percent, but somewhere in that territory.
The correct answer is 41 percent.
That is not a misprint. Given everything above, a blue cab is less likely than a green one. Run the arithmetic and the most probable conclusion from this evidence is that the cab was green. Not blue.
The witness who said blue is probably wrong, not because she is a liar or unusually bad at her job, but because there are so many more green cabs on the road than blue ones that even a moderately reliable witness will misidentify green cabs as blue more often than she correctly identifies blue cabs. The distribution of cabs in the city is doing most of the work. And most people, when reading this problem, do not think about the distribution of cabs in the city at all.
That is base rate neglect.
The cab problem was constructed by Daniel Kahneman and Amos Tversky in the 1970s as part of their systematic investigation of how people reason about probability. They found, across population after population, that when people are given specific case information alongside statistical background information, they weight the specific information far too heavily and the statistical background almost not at all. This is not a defect in certain kinds of thinkers. It is a near-universal tendency. Professionals fall into it. Experts fall into it. You fall into it.
Here is the arithmetic for the cab case, worked the way that makes it visible.
Imagine not one accident but 1,000 accidents involving cabs from this city, where the same 85/15 split holds. In 850 of those accidents, the cab was green. In 150, it was blue.
Now apply the witness to each group. For the 150 genuinely blue cabs: the witness correctly identifies 80 percent as blue, which is 120 correct identifications, and misidentifies 20 percent as green, which is 30 errors. For the 850 genuinely green cabs: the witness correctly identifies 80 percent as green, which is 680 correct identifications, and misidentifies 20 percent as blue, which is 170 errors.
So across 1,000 accidents, how many times does the witness say blue? She says blue 120 times correctly and 170 times incorrectly, for a total of 290 “blue” identifications. Of those 290, only 120 are accurate. That is 120 out of 290, or 41 percent.
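The same tally can be written as a few lines of code. This is a minimal sketch of the counting above, nothing more; the variable names are mine, and Python is used here purely as a calculator.

```python
# Natural-frequency tally for the cab problem: 1,000 accidents,
# an 85/15 green/blue split, and an 80 percent reliable witness.
blue_cabs = 150     # 15% of 1,000 accidents involve blue cabs
green_cabs = 850    # 85% involve green cabs
accuracy = 0.80     # witness calls either colour correctly 80% of the time

correct_blue_calls = blue_cabs * accuracy        # 120 blue cabs called blue
false_blue_calls = green_cabs * (1 - accuracy)   # 170 green cabs called blue

total_blue_calls = correct_blue_calls + false_blue_calls   # 290 "blue" calls
print(f"P(cab was blue | witness says blue) = "
      f"{correct_blue_calls / total_blue_calls:.0%}")      # 41%
```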
The witness has given you real information. Before her testimony, you would have estimated only a 15 percent chance of a blue cab. After her testimony, that probability has risen to 41 percent. That is a meaningful update. But the witness has not told you the cab was blue. She has made blue more likely than it was before, since 41 percent is higher than 15 percent. She has not made it more likely than green, since 59 percent is still higher than 41 percent.
The instinct to hear “the witness said blue” and conclude “the cab was blue” is almost irresistible. It feels like the reasonable thing to do. That instinct is what we are here to dismantle.
The Concept
The base rate is the background frequency of something in the relevant population before any specific information about a particular case is considered. It is what you would know if you knew nothing about the individual incident except that it was drawn from this population. In the cab problem, the base rate is the 85/15 split in the city’s cab fleet. In the disease test problem from Unit 1.7, the base rate was the prevalence of the disease in the population: 1 in 10,000.
Base rate neglect is the tendency to ignore this background frequency when a specific, vivid piece of evidence arrives. The witness statement feels like the real evidence. The distribution of cabs in the city feels like an abstract statistic. One grabs the attention; the other slides past it. This is not laziness or ignorance. It is a deep feature of how the human mind processes information. Specific case evidence triggers a different kind of reasoning from statistical background information, and the specific tends to win.
Kahneman’s work describes two systems of thought: a fast, associative system that matches patterns and reaches intuitive conclusions, and a slow, deliberate system that can compute probabilities but requires effort. Base rate neglect is what happens when you let the fast system answer a question that requires the slow one. The witness says blue. The fast system says: reliable witness, confident identification, must be blue. The slow system says: wait, how many green cabs are there?
The problem reappears constantly in medical reasoning. You met it in Unit 1.7 in the form of a disease test: if the disease is rare enough, even a very accurate test will produce more false positives than true ones when applied to a general population. The underlying structure is identical to the cab problem. The base rate of the disease (the prior probability that any given person has it) is the distribution of the cab fleet. The test result is the witness testimony. Ignoring the base rate of the disease when interpreting a positive test is the same cognitive error as ignoring the green/blue split when interpreting the witness’s identification.
This is not a coincidence. All these cases have the same mathematical structure, and they all call for the same mathematical tool.
Bayes’ theorem is the formal statement of how to update a probability correctly in light of new evidence. You met it in full in Unit 1.7. In the language of that unit: your posterior probability should equal your prior probability updated by the likelihood of the evidence. Base rate neglect is, precisely, the failure to use the prior. The mind performs the likelihood calculation — the witness is 80 percent reliable, so the likelihood of her saying blue given a blue cab is high — and stops there, never combining it with the prior probability of a blue cab in the first place. Bayes requires both. The prior is the base rate. Neglect the prior and you cannot do Bayes. You are left with a number that looks like an answer but has only half the information it needs.
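For readers who prefer the update in probability form rather than whole-number counts, here is a minimal sketch. The function name bayes_posterior is my own shorthand; the body is nothing but the theorem as stated above.

```python
def bayes_posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """P(H | E) by Bayes' theorem: the prior times the likelihood,
    normalised by the total probability of the evidence."""
    p_evidence = (p_evidence_if_true * prior
                  + p_evidence_if_false * (1 - prior))
    return p_evidence_if_true * prior / p_evidence

# The cab problem: the prior is the base rate of blue cabs,
# the evidence is the witness saying "blue".
posterior = bayes_posterior(prior=0.15,
                            p_evidence_if_true=0.80,   # says blue | cab is blue
                            p_evidence_if_false=0.20)  # says blue | cab is green
print(f"{posterior:.0%}")  # 41%
```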
Two practical correctives follow from this analysis.
The first is to identify the reference class before evaluating any case. Before considering what the witness said, or what the test result showed, ask: what is the background frequency of this outcome in the relevant population? This forces the slow system to engage before the fast one has already reached a conclusion. In the cab case: what fraction of cabs in this city are blue? In the disease case: how prevalent is this condition in the population being screened? In a legal case: how common is this type of crime in this context? The reference class is the population from which the case was drawn, and you need its frequency before the specific evidence means anything.
The second is to use natural frequencies rather than probabilities. Working in whole numbers, as we did above with the 1,000-accident example, makes the base rate visible in a way that conditional probabilities do not. “The witness is 80 percent reliable and the cab is 15 percent likely to be blue” requires mental arithmetic that most people cannot perform accurately under time pressure. “Out of 1,000 accidents, 150 involve blue cabs, and the witness correctly identifies 120 of them while also misidentifying 170 green cabs as blue” gives you a concrete picture that arrives at the same arithmetic through counting. Gerd Gigerenzer’s research at the Max Planck Institute has shown repeatedly that training people to use natural frequencies dramatically improves their accuracy on these problems compared to training them to use conditional probabilities. The format of the information changes how well the brain handles it.
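The frequency format can also be mechanised. The sketch below is a generic version of the 1,000-accident walkthrough: give it any base rate, hit rate, and false alarm rate, and it prints the whole-number breakdown. The function name and the default population of 1,000 are arbitrary choices of mine, not anything from Gigerenzer’s work.

```python
def natural_frequencies(base_rate, hit_rate, false_alarm_rate, population=1000):
    """Print the natural-frequency breakdown of a signal applied to a population."""
    targets = base_rate * population
    non_targets = population - targets
    true_positives = targets * hit_rate
    false_positives = non_targets * false_alarm_rate
    print(f"Out of {population} cases, {targets:.0f} are genuine targets.")
    print(f"The signal fires on {true_positives:.0f} targets "
          f"and {false_positives:.0f} non-targets.")
    print(f"When the signal fires, it is right "
          f"{true_positives / (true_positives + false_positives):.0%} of the time.")

# The cab problem in one call:
natural_frequencies(base_rate=0.15, hit_rate=0.80, false_alarm_rate=0.20)
```

Run on the cab numbers, it reproduces the 41 percent; run on any screening problem, it makes the base rate impossible to ignore.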
Why It Matters
The places where base rate neglect does the most damage are wherever high-stakes decisions are made about individuals on the basis of statistical signals: security, medicine, law.
Terrorism profiling is the clearest example. In any Western country, the proportion of the travelling population who are active terrorists is vanishingly small, probably under 1 in a million on any given day at a major airport. Any profiling algorithm, however clever, must produce vastly more false positives than true ones when the base rate is that low. Suppose a profiling system is 99 percent accurate: it correctly identifies 99 percent of terrorists and incorrectly flags 1 percent of innocent travellers. At a rate of 1 terrorist per million passengers, that system would generate approximately 10,000 false positive alerts for every 1 accurate detection. The people investigating those alerts are not making statistical errors if they treat each flagged person as probably innocent. That is what the arithmetic requires. The political argument for such programmes almost always proceeds as if the only relevant number is the accuracy of the test. The base rate, which changes the conclusion completely, is never mentioned. That silence is the tell.
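The ten-thousand-to-one figure is easy to verify. A rough sketch using the illustrative numbers from the paragraph above, one terrorist per million passengers and a 99 percent accurate system:

```python
passengers = 1_000_000
terrorists = 1                 # base rate: 1 per million passengers
detection_rate = 0.99          # flags 99% of actual terrorists
false_alarm_rate = 0.01       # flags 1% of innocent travellers

true_alerts = terrorists * detection_rate                     # ~1
false_alerts = (passengers - terrorists) * false_alarm_rate   # ~10,000

print(f"{false_alerts / true_alerts:,.0f} false alerts per true detection")
# 10,101 false alerts per true detection
```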
Legal reasoning is where base rate neglect has the most documented costs. You will meet the full mechanics in Unit 3.9 on the prosecutor’s fallacy. The short version is this: when forensic evidence is offered to a jury — a DNA match, a fibre comparison, a blood type — the jurors are being invited to reason from the probability of the evidence if the defendant is innocent to the probability that the defendant is innocent given the evidence. These are not the same thing, and the difference is the base rate of guilt. How likely was this defendant to be guilty before the forensic evidence was introduced? That question has a real answer, and it is not 50 percent. In cold case investigations that rely on database DNA searches, the base rate of guilt for any single individual in a large database is very low, and a match probability of 1 in a billion still does not make conviction a safe conclusion without independent corroborating evidence. Courts that admit probability figures without requiring them to be set against a prior probability of guilt are performing half a Bayesian calculation, and it is the less important half.
Mass screening programmes for medical conditions compound both problems above. When screening is extended to a general population for a rare condition, the false positive burden falls on the people screened, in the form of recalls, further tests, anxiety, and sometimes unnecessary procedures. The base rate of the condition in the screened population determines whether the programme produces more good than harm. A programme with 95 percent sensitivity and 95 percent specificity, applied to a condition with a prevalence of 1 in 10,000, will produce roughly 500 false positives for every person screened who actually has the disease, which works out to approximately 526 false positives for every true positive found. Whether that is an acceptable trade-off depends entirely on what happens to those several hundred people, and on whether anyone in the programme’s design did the Bayesian calculation before rolling it out.
How to Spot It
In 2002 the United States Transportation Security Administration began developing the Computer Assisted Passenger Prescreening System II (CAPPS II), a programme designed to automatically identify high-risk passengers for additional screening. The programme was cancelled in 2004 after sustained criticism, including from the Government Accountability Office, partly on civil liberties grounds but also on effectiveness grounds.
The core problem, articulated by statisticians and civil liberties researchers at the time, was the base rate problem. The number of genuine terrorists attempting to board commercial aircraft in the United States in any given year is extremely small relative to the number of passengers screened, which was approximately 700 million annual boardings at that time. Any system with even a 1-in-1,000 false positive rate would generate hundreds of thousands of false alerts per year. The system’s designers and political advocates focused on sensitivity — the ability to detect a true terrorist — and gave almost no public attention to the base rate consequence of applying that sensitivity to a population where the event is extremely rare. The word “accuracy” appeared frequently in programme descriptions. The phrase “positive predictive value” did not.
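Positive predictive value, the probability that a flagged case is a genuine one, is exactly the number the base rate controls. A short sketch (the ppv helper is my own shorthand) of how it collapses as the target gets rarer, while the system’s accuracy never changes:

```python
def ppv(base_rate, hit_rate, false_alarm_rate):
    """Positive predictive value: P(target | flagged)."""
    true_pos = hit_rate * base_rate
    false_pos = false_alarm_rate * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# The same 99%-accurate system, applied to rarer and rarer targets:
for base_rate in (0.1, 0.01, 0.001, 0.000001):
    print(f"base rate 1 in {1 / base_rate:>9,.0f}: "
          f"a flag is correct {ppv(base_rate, 0.99, 0.01):.2%} of the time")
```

The accuracy figures stay fixed at 99 percent throughout; the meaning of a positive flag does not.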
The tell is always the same: a number that describes how well the signal detects the target, presented without any reference to the frequency of the target in the population. When you see “95 percent accurate,” ask: 95 percent accurate applied to a population where the condition is how common? The accuracy figure alone is not enough information to evaluate the claim. If the background frequency is not given, it is almost always because the person presenting the number either does not know it or knows that including it would weaken the case they are making.
Your Challenge
A new national screening programme begins testing the general population for a rare disease that affects 1 person in every 10,000. The test being used has a sensitivity of 95 percent (if you have the disease, the test correctly identifies you as positive 95 percent of the time) and a specificity of 95 percent (if you do not have the disease, the test correctly gives you a negative result 95 percent of the time).
You take the test as part of the programme. The result comes back positive.
Using the natural frequency method — imagine a population of 100,000 people and work through the numbers as whole counts — calculate the probability that you actually have the disease.
There is no answer on this page.
References
Kahneman, D. and Tversky, A., “On the psychology of prediction,” Psychological Review, 80(4), 237–251 (1973). The original formulation and empirical testing of representativeness and base rate neglect across a range of probability estimation tasks.
Tversky, A. and Kahneman, D., “Judgment under uncertainty: heuristics and biases,” Science, 185(4157), 1124–1131 (1974). The foundational paper cataloguing systematic errors in probabilistic reasoning, including the neglect of prior probabilities. URL: https://www.science.org/doi/10.1126/science.185.4157.1124
The cab problem specifically: Tversky, A. and Kahneman, D., “Evidential impact of base rates,” in Kahneman, D., Slovic, P., and Tversky, A. (eds), Judgment Under Uncertainty: Heuristics and Biases, Cambridge University Press (1982), pp. 153–160. The two-cab-company scenario is one of the most widely cited demonstrations of base rate neglect in the experimental literature.
Kahneman, D., Thinking, Fast and Slow, Farrar, Straus and Giroux (2011), Chapters 14–16. Extended treatment of base rate neglect, the two-system model of cognition, and the cab problem, written for a general audience.
Gigerenzer, G. and Hoffrage, U., “How to improve Bayesian reasoning without instruction: frequency formats,” Psychological Review, 102(4), 684–704 (1995). The demonstration that natural frequency formats dramatically improve accuracy on Bayesian reasoning tasks compared to probability formats.
Gigerenzer, G., Reckoning with Risk: Learning to Live with Uncertainty, Penguin (2002). Chapter 4 covers base rate neglect in medical and legal contexts; the case for natural frequencies as the corrective is made in full.
CAPPS II programme: United States Government Accountability Office, “Aviation Security: Computer-Assisted Passenger Prescreening System Faces Significant Implementation Challenges,” GAO-04-385, February 2004. URL: https://www.gao.gov/assets/gao-04-385.pdf. The GAO report documenting implementation difficulties including false positive concerns.
Schneier, B., “The Security of CAPPS II,” Crypto-Gram Newsletter, April 15, 2004. URL: https://www.schneier.com/crypto-gram/archives/2004/0415.html. Accessible analysis of the base rate problem in automated passenger screening, including false positive rate estimation.
Eddy, D.M., “Probabilistic reasoning in clinical medicine: problems and opportunities,” in Kahneman, D., Slovic, P., and Tversky, A. (eds), Judgment Under Uncertainty: Heuristics and Biases, Cambridge University Press (1982), pp. 249–267. Documents the failure of clinicians to correctly account for disease prevalence when interpreting diagnostic test results; a companion to the Kahneman and Tversky work applied to medical practice.