The Normal Distribution

Why the bell curve appears everywhere, what it actually implies, and what happens when the people running your pension fund assume it holds when it does not. The 68-95-99.7 rule, z-scores, and the normality assumption that helped cause the 2008 financial crisis.

Time: 15 minutes
Requires: Unit 2.1

Opening Hook

Every year, thousands of seventeen-year-olds sit the same exam. Across a large, diverse population, most of them score somewhere in the middle. A smaller group scores very high; a smaller group scores very low. Plot the results on a chart and you get a shape that bulges in the centre and tapers off symmetrically at both ends. It is the same shape you would get if you plotted the heights of adult men, the birth weights of newborns, the measurement errors in a well-calibrated laboratory, or the daily returns of a large investment portfolio, on most days.

That shape is called the normal distribution, or the bell curve. It turns up so often, in so many different places, that it can start to seem like a law of nature.

Now consider this. On 19 October 1987, the US stock market fell by 22.6 percent in a single day. Under the normal distribution assumptions that most financial models of the time were using, such a move should have been so improbable as to be essentially impossible. The odds were later calculated at roughly one in 10 to the power of 50. For context, there are estimated to be around 10 to the power of 80 atoms in the observable universe. The models implied that this event should not have occurred even once in the entire lifespan of the cosmos. It happened on a Monday afternoon.

The bell curve is real, useful, and genuinely ubiquitous. It is also genuinely dangerous when people forget it is an approximation.

The Concept

The normal distribution is defined by two numbers: its mean and its standard deviation. The mean (which you covered in Unit 2.1) tells you where the centre of the distribution sits. The standard deviation (also from Unit 2.1) tells you how spread out the data is around that centre. Give someone the mean and standard deviation of a normal distribution, and they can reconstruct the entire shape.

The shape itself has a specific character. It is symmetric: the left half is a mirror image of the right half. It peaks at the mean. And it tapers off toward both extremes in a particular curved way, producing the characteristic bell silhouette. Values close to the mean are common. Values far from the mean are rare. Values very far from the mean are very rare.
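For readers who want the formula behind that claim (the lesson does not require it, but it makes the "two numbers" point concrete), the entire curve is one expression with the mean μ and standard deviation σ plugged in:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

Nothing else about the shape is free to vary: choose μ and σ, and every height of the bell is determined.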

How rare? This is where the 68-95-99.7 rule comes in, and it is worth memorising: it tells you what fraction of data in a normal distribution falls within one, two, and three standard deviations of the mean.

Roughly 68 percent of values fall within one standard deviation of the mean. If adult male heights in a population are normally distributed with a mean of 178 cm and a standard deviation of 7 cm, then about 68 percent of men are between 171 cm and 185 cm tall.

Roughly 95 percent of values fall within two standard deviations. In that same example, 95 percent of men are between 164 cm and 192 cm.

Roughly 99.7 percent of values fall within three standard deviations. Only about three men in a thousand fall outside that range.

This is what makes extreme values so striking when they do appear. Being four standard deviations from the mean, in a normal distribution, happens to about one person in 15,000. Being six standard deviations out should happen to roughly two people in a billion. These are not common occurrences, which is exactly why the “six sigma” quality standard in manufacturing carries weight: if your production process really is producing defects only at the six-sigma level, you are claiming fewer than 3.4 defects per million items. (The gap between two per billion and 3.4 per million is deliberate: the industry convention allows for the process mean drifting by up to 1.5 standard deviations over time, so “six sigma” is judged at the 4.5 standard deviation tail.)
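All of these figures can be checked directly from the normal cumulative distribution function. A minimal sketch in Python, using scipy (an assumed dependency, not part of the lesson materials):

```python
# Check the 68-95-99.7 rule and the tail odds quoted above.
from scipy.stats import norm

# Fraction of a normal distribution within k standard deviations of the mean.
for k in [1, 2, 3]:
    inside = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} sd: {inside:.4f}")   # 0.6827, 0.9545, 0.9973

# Two-sided odds of landing more than k standard deviations from the mean.
for k in [4, 6]:
    outside = 2 * norm.sf(k)                # sf(k) = 1 - cdf(k), the upper tail
    print(f"beyond {k} sd: about 1 in {1 / outside:,.0f}")
# beyond 4 sd: about 1 in 15,787 -- the "one person in 15,000"
# beyond 6 sd: about 1 in 507 million -- roughly two people in a billion

# The manufacturing figure of 3.4 defects per million is the one-sided tail
# at 6 - 1.5 = 4.5 standard deviations, per the drift convention noted above:
print(norm.sf(4.5))                         # ~3.4e-6
```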

Z-scores are how you work with this rule in practice. A z-score (also called a standard score) is the number of standard deviations a particular value sits above or below the mean. The formula is straightforward: subtract the mean from your value, then divide by the standard deviation. A z-score of 2 means the value is two standard deviations above the mean. A z-score of -1 means it is one standard deviation below. Once you have a z-score, you can look up what fraction of a normal distribution lies beyond that point, which is how statisticians calculate the probability of extreme values.

The z-score allows comparison across different scales. A student’s exam score, a patient’s blood pressure measurement, and a financial return can all be converted to z-scores and compared directly, even though the underlying units are completely different.
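In code, the conversion and the tail lookup are one line each. A sketch in which the height figures come from the example earlier in this lesson and the exam numbers are invented for illustration:

```python
from scipy.stats import norm

def z_score(value, mean, sd):
    """How many standard deviations `value` sits above (+) or below (-) the mean."""
    return (value - mean) / sd

exam_z = z_score(82, mean=70, sd=8)      # hypothetical exam, scored out of 100
height_z = z_score(185, mean=178, sd=7)  # the height example from this lesson

print(exam_z, height_z)   # 1.5 and 1.0: the exam result is the more unusual one

# Fraction of a normal distribution lying beyond a given z-score:
print(norm.sf(exam_z))    # ~0.067: about 6.7% of students score higher
```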

Why does the normal distribution appear so often? The reason is captured in one of the most important results in statistics: the Central Limit Theorem. The theorem says, roughly, that if you take a large enough sample of independent measurements from almost any distribution (technically, any with finite variance) and add them together, the total will be approximately normally distributed, even if the individual measurements are not.

Think about the height example. A person’s height is the combined result of hundreds of genetic and environmental factors, each contributing a small amount. Add up enough small, independent influences and you tend to get a bell-shaped outcome. The same logic applies to exam scores (many different abilities and preparations contributing), measurement errors (many small random sources of imprecision), and many other quantities.
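The theorem is easy to watch in action. A minimal simulation sketch (the sample sizes are arbitrary choices): start from a distribution that looks nothing like a bell, add up many independent draws, and check whether the 68-95-99.7 rule emerges.

```python
import numpy as np

rng = np.random.default_rng(0)

n_influences = 200    # independent contributions per outcome
n_outcomes = 50_000   # outcomes to simulate

# Each outcome is the sum of many independent, heavily skewed draws.
# An individual exponential draw looks nothing like a bell curve.
sums = rng.exponential(scale=1.0, size=(n_outcomes, n_influences)).sum(axis=1)

# The empirical rule should now hold approximately for the sums.
z = (sums - sums.mean()) / sums.std()
for k in [1, 2, 3]:
    print(f"within {k} sd: {np.mean(np.abs(z) <= k):.3f}")
# Expect values close to 0.683, 0.954 and 0.997.
```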

The key word is “independent.” When the individual contributions are not independent, when one large factor dominates, or when extreme events become correlated, the normal distribution breaks down. This matters enormously.

The normal distribution as an approximation. Real data is almost never perfectly normal. Real distributions have skew (a longer tail on one side), kurtosis (tails that are fatter or thinner than the normal predicts), or outright departures from the bell shape. The normal distribution is a model, and all models are simplifications. The test of a model is not whether it is exactly right, but whether using it leads you to better decisions than you would make without it. For a great many practical purposes, the normal distribution is the right tool. The question to ask, every time, is: for this particular data, in this particular context, is the normality assumption reasonable?
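There are formal normality tests, but a quick first check is to compute the sample's skewness and excess kurtosis, both of which are exactly zero for a normal distribution. A minimal sketch, using simulated stand-in data where real data would normally be loaded:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.75, size=10_000)  # skewed stand-in data

print(f"skewness:        {skew(data):.2f}")      # > 0: a longer right tail
print(f"excess kurtosis: {kurtosis(data):.2f}")  # > 0: fatter tails than normal

# Values near zero are consistent with normality; large values are a warning
# that z-score reasoning will understate how often extreme values occur.
```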

Why It Matters

The normal distribution underpins an enormous amount of the formal machinery of statistics and finance. Confidence intervals, standard hypothesis tests, quality control systems, and risk models all typically assume, somewhere in their foundations, that the data they are working with is approximately normal. When that assumption is right, the machinery works well. When it is wrong, the results can be catastrophically misleading.

The 2008 financial crisis is the sharpest modern example of what happens when the normality assumption is applied to data that does not warrant it.

The financial products at the centre of the crisis were collateralised debt obligations (CDOs): instruments that pooled large numbers of mortgages and sliced the resulting cash flows into tranches with different risk profiles. Pricing them required a model for how correlated the underlying mortgage defaults were. If mortgages defaulted independently of each other, a senior tranche could be rated AAA with confidence. The question was: how often do many mortgages fail at the same time?

In 2000, a quantitative analyst named David X. Li published a paper proposing the use of a Gaussian copula function to model this correlation. The approach was mathematically elegant and practically convenient. It was also built, at its core, on an assumption that default correlations were stable and that the joint distribution of defaults behaved in a broadly normal way.

The assumption was wrong. As the US housing market declined, mortgage defaults did not behave independently. They were correlated, and as conditions worsened, the correlations increased dramatically. A distribution that looked manageable under normal assumptions had a tail that was vastly fatter than the models predicted. Senior tranches that had been rated as near-risk-free turned out to be exposed to exactly the scenario the models said almost could not happen.
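The failure mode can be reproduced in miniature. What follows is a sketch of the mechanism only, not Li's actual pricing model: a one-factor setup in which each mortgage defaults when a latent normal variable crosses a threshold, with a single shared factor controlling how correlated the defaults are. The pool size, default probability, and correlation values are all invented for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

n_mortgages = 500
p_default = 0.05                  # marginal default probability (invented)
threshold = norm.ppf(p_default)   # a mortgage defaults if its latent z < this

def pool_losses(rho, n_trials=20_000):
    """Fraction of the pool defaulting per trial, under latent correlation rho."""
    market = rng.standard_normal((n_trials, 1))         # shared factor
    own = rng.standard_normal((n_trials, n_mortgages))  # idiosyncratic factors
    z = np.sqrt(rho) * market + np.sqrt(1 - rho) * own
    return (z < threshold).mean(axis=1)

for rho in [0.05, 0.60]:
    losses = pool_losses(rho)
    print(f"rho={rho}: mean loss {losses.mean():.3f}, "
          f"P(loss > 20%) = {np.mean(losses > 0.20):.4f}")
# The mean loss barely moves, but with high correlation the tail explodes:
# losing a fifth of the pool goes from essentially never to a live possibility.
```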

Felix Salmon’s 2009 Wired article about Li’s formula, titled “Recipe for Disaster: The Formula That Killed Wall Street,” put the point plainly: the model worked fine in calm conditions and gave false reassurance about what would happen in a storm, which is the only time the reassurance actually mattered.

The story is not simply that the models were wrong. It is that the normality assumption was known to be an approximation, applied in a context where the consequences of the approximation being wrong were systemically catastrophic, and was treated in practice as if it were exact.

How to Spot It

The tell for misapplied normality is the combination of two things: a model that quantifies extreme risk using standard deviation, and a real-world process that is driven by concentration, correlation, or contagion.

A documented case that illustrates the pattern precisely is Long-Term Capital Management (LTCM), the hedge fund that collapsed in 1998. LTCM was run by some of the most mathematically sophisticated investors in the world, including two Nobel Prize-winning economists. Their models assumed that financial returns were approximately normally distributed and that positions in different markets were sufficiently independent that diversification would protect the fund.

In August and September 1998, following the Russian government’s default on its debt, markets across the world moved in the same direction at the same time. The correlations that LTCM’s models assumed were stable became, in a crisis, effectively equal to one. Every position the fund held went wrong simultaneously. In less than four months, LTCM lost $4.6 billion. A consortium of 14 banks, coordinated by the Federal Reserve, provided a $3.65 billion rescue to prevent a broader collapse.

The LTCM models were not incompetent. They described the world accurately most of the time. The problem was the tail. When the thing that should almost never happen did happen, the models had nothing useful to say about how bad it would be.
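The arithmetic behind that failure is worth seeing once. For an equally weighted portfolio of n positions with the same volatility and a common pairwise correlation rho, the portfolio's standard deviation follows directly from the variance of an average. A sketch with invented numbers:

```python
import numpy as np

def portfolio_sd(n, sigma, rho):
    """SD of the average of n positions with equal vol and pairwise correlation rho."""
    return np.sqrt(sigma**2 * (1 / n + rho * (n - 1) / n))

sigma = 0.02   # hypothetical daily volatility per position
n = 100        # hypothetical number of positions

for rho in [0.0, 0.1, 0.5, 1.0]:
    print(f"rho={rho}: portfolio sd = {portfolio_sd(n, sigma, rho):.4f}")
# rho=0.0: 0.0020 -- diversification cuts risk tenfold
# rho=1.0: 0.0200 -- diversification does nothing at all
```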

The tell, in this case and in general, is the phrase “assuming normal market conditions” or any equivalent construction. When a risk model tells you the probability of a loss larger than X under normal market conditions, what it is actually telling you is the probability under the model’s distributional assumption. Whether that assumption holds in the scenarios where you most need it to is a separate question, and often the more important one.

If you see a financial product, quality claim, or risk assessment that uses standard deviation or sigma as its primary measure of danger, ask one question: what does this model assume about the distribution, and is there any reason to think extreme events are more correlated or more frequent than that assumption implies?

Your Challenge

A financial technology company runs a fraud detection system. The system flags transactions that fall more than three standard deviations from the mean transaction value for a given account, based on the assumption that account activity is normally distributed. The company tells investors: “Our system catches 99.7% of fraud while minimising false positives. Only 0.3% of legitimate transactions are flagged.”

A data scientist at the company pulls the actual distribution of account transaction values across the platform. She finds that the distribution has a very long right tail: most transactions are small but a small number are very large, and the largest are much larger than the normal distribution would predict.

What is the problem with the company’s claim? What would you expect to happen to the false positive and false negative rates in practice, compared to the company’s stated figures? What type of distribution might better describe account activity, and why does the choice of model matter for the people whose accounts are flagged?
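If you want to explore the challenge numerically before answering, the sketch below simulates a long-right-tailed account and measures what the three-standard-deviation rule actually flags. It is a starting point, not the answer; all parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical legitimate transaction values for one account, in dollars.
legit = rng.lognormal(mean=3.0, sigma=1.0, size=100_000)

mean, sd = legit.mean(), legit.std()
flagged = np.abs(legit - mean) > 3 * sd

print(f"legitimate transactions flagged: {flagged.mean():.4%}")
# Compare this with the promised 0.3%, note which side of the distribution
# the flags fall on, and consider what that implies for fraud that hides
# among ordinary-sized transactions.
```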

References

Li, David X. “On Default Correlation: A Copula Function Approach.” Journal of Fixed Income 9, no. 4 (2000): 43–54.

Salmon, Felix. “Recipe for Disaster: The Formula That Killed Wall Street.” Wired, 23 February 2009.