When Distributions Are Not Normal

Real data is rarely bell-shaped. Skew, fat tails, bimodal distributions, log-normal distributions, and power laws each produce systematic errors when treated as if they were normal. This unit shows what those errors look like and why they cost people money, elections, and lives.

Time: 15 minutes
Requires: Unit 2.2

Opening Hook

In August 2007, with the first tremors of what would become the worst financial crisis since the Great Depression, Goldman Sachs CFO David Viniar told the Financial Times that his firm’s quantitative trading funds were experiencing moves of “25 standard deviations, several days in a row.”

It is worth pausing on what that number implies. Under a normal distribution, a 25-standard-deviation event should occur roughly once in a number of years so large it exceeds the number of particles in the observable universe. The fact that Goldman was seeing such events on consecutive days was not evidence of extraordinary bad luck. It was evidence that their model was wrong. The returns were not normally distributed. They never had been. The models assumed they were, the positions were built on that assumption, and when the assumption broke, it broke in a hurry.
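The scale of that claim can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes daily returns are exactly normal and uses a round 250 trading days per year (a figure chosen here, not taken from Viniar's remarks) to turn the tail probability into an expected waiting time.

```python
import math

def normal_upper_tail(k: float) -> float:
    """P(Z > k) for a standard normal variable Z, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2))

TRADING_DAYS_PER_YEAR = 250  # assumption: a round figure for trading days in a year

# Chance of a move at least 25 standard deviations in one direction on any single day.
p_25_sigma = normal_upper_tail(25)

# Average number of years you would wait to see one such day.
expected_wait_years = 1 / p_25_sigma / TRADING_DAYS_PER_YEAR

print(f"P(25-sigma day) = {p_25_sigma:.3e}")
print(f"Expected wait   = {expected_wait_years:.3e} years")
```

Under these assumptions the expected wait comes out on the order of 10^135 years for a single such day, before even asking about several in a row.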

Viniar’s comment was not an admission of a mistake. He seemed to be presenting it as evidence of how unusual the market conditions were. But that is exactly backwards. If your model tells you an event has a one-in-a-universe probability and it happens several days running, the correct conclusion is that you need a different model, not that the universe has been unlucky.

The Concept

In Unit 2.2 you met the normal distribution: the bell curve, symmetrical around its mean, where most of the action happens in the middle and extreme events trail off smoothly toward zero. It is a beautiful shape. It is also, in the real world, the exception rather than the rule.

Skew is what happens when a distribution is pulled toward one side. A positively skewed distribution has a long tail stretching to the right: most values cluster at the low end, but a few very high values drag the mean upward. A negatively skewed distribution is the mirror image, with a long tail to the left. The tell is the gap between mean and median. In a symmetrical distribution, they are equal. In a positively skewed one, the mean is higher than the median, because the extreme high values inflate it. Income is the classic example of positive skew. Most people earn a modest income. A small number of people earn enormous sums. The mean income in most countries is substantially higher than the median, which means it describes nobody’s lived experience accurately. This is not an accident of which statistic gets published; it is a consequence of the shape of the underlying distribution. Any analysis that applies normal-distribution tools to income data is using the wrong arithmetic from the start.
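A tiny worked example makes the mean-median gap concrete. The ten incomes below are invented for illustration (in thousands, nine modest earners and one very high earner); they are not drawn from any real dataset.

```python
import statistics

# Hypothetical annual incomes (in thousands) for ten people: nine modest, one very large.
incomes = [18, 21, 22, 24, 25, 27, 29, 31, 35, 400]

mean_income = statistics.mean(incomes)      # pulled upward by the single high earner
median_income = statistics.median(incomes)  # the middle of the ordered list

# How many people earn less than the "average"?
below_mean = sum(1 for x in incomes if x < mean_income)

print(f"mean   = {mean_income}")
print(f"median = {median_income}")
print(f"{below_mean} of {len(incomes)} people earn less than the mean")
```

In this made-up sample the mean is more than double the median, and nine of the ten people sit below it.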

Fat tails (sometimes called heavy tails) describe a distribution where extreme events are far more common than a normal curve would predict. The normal distribution’s tails thin out very quickly. By the time you are three standard deviations from the mean, only about 1 observation in 370 lies that far out. By four standard deviations, at roughly 1 in 16,000, you are in statistical outer space. Fat-tailed distributions do not thin out like this. The tails stay thick. Extreme events remain genuinely possible, and they cluster more than normal theory suggests. Financial returns are the canonical example. Daily stock market moves of 5 percent or more happen far more frequently than a normal distribution predicts. Benoit Mandelbrot, the mathematician who developed fractal geometry, documented this property of financial markets in the 1960s and spent decades pointing out that financial institutions were routinely underpricing risk because their models had thin tails. He was largely ignored until the thin-tailed models collapsed spectacularly.
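To see how much difference tail thickness makes, the sketch below compares the normal distribution with one standard fat-tailed alternative, a Student-t distribution with 3 degrees of freedom. The choice of that distribution is an assumption made purely for illustration; it is not a fitted model of market returns. Both are measured in units of their own standard deviation, so "k sigma" means the same thing for each.

```python
import math

def normal_two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal variable."""
    return math.erfc(k / math.sqrt(2))

def t3_two_sided_tail(k: float) -> float:
    """P(|T| > k standard deviations) for a Student-t variable with 3 degrees of freedom.

    A t(3) variable has standard deviation sqrt(3), so k standard deviations
    corresponds to t = k * sqrt(3). Substituting that into the closed-form
    t(3) CDF leaves the expression below in terms of k alone.
    """
    upper = 0.5 - (k / (1 + k * k) + math.atan(k)) / math.pi
    return 2 * upper

for k in (3, 4, 5):
    n = normal_two_sided_tail(k)
    t = t3_two_sided_tail(k)
    print(f"{k} sigma: normal {n:.2e}, t(3) {t:.2e}, ratio {t / n:,.0f}x")
```

The ratio between the two grows rapidly with distance from the mean: a single-digit multiple at 3 sigma becomes a factor in the thousands at 5 sigma. A thin-tailed model is therefore most wrong exactly where the losses are largest.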

Bimodal distributions are what you get when your data is actually drawn from two distinct populations that have been lumped together. A bimodal distribution has two separate peaks rather than one. If you measured the height of every person in a room containing both adult men and women, you would get something close to bimodal: a cluster around the typical male height and another around the typical female height. If you treated this as a single normal distribution, you would get a mean somewhere in the middle that describes neither group well, and a variance that makes the spread seem larger than it is within either group. Bimodal distributions appear in product reviews (where reviewers tend to be either enthusiasts or disappointed customers, rarely indifferent middle-grounders), in voter opinion data on polarised political questions, and in patient populations where two different disease subtypes get classified as one condition. The danger is always the same: a single average that belongs to neither peak.
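The product-review case is easy to sketch. The star counts below are hypothetical, chosen only to produce two peaks; the point is how few reviewers sit anywhere near the average.

```python
import statistics

# Hypothetical star ratings for one product: number of reviews per star value.
rating_counts = {1: 35, 2: 5, 3: 4, 4: 6, 5: 50}

# Expand the counts into one rating per reviewer.
ratings = [stars for stars, count in rating_counts.items() for _ in range(count)]

mean_rating = statistics.mean(ratings)

# Share of reviewers whose rating (3 stars) is the one closest to the mean.
near_mean_share = rating_counts[3] / len(ratings)

print(f"mean rating = {mean_rating:.2f}")
print(f"share of reviewers who gave 3 stars = {near_mean_share:.0%}")
```

Here the mean rating is 3.31 stars, yet only 4 percent of these invented reviewers gave the product 3 stars. The average describes an opinion almost nobody holds.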

The log-normal distribution is what you get when the logarithm of a variable follows a normal distribution. That sounds technical but the intuition is straightforward. Many real-world processes are multiplicative rather than additive. Bacteria don’t grow by adding a fixed number of cells each generation; they multiply. Prices don’t move by fixed amounts; they move by percentages. Returns compound. When a process grows multiplicatively, the resulting distribution is skewed to the right, with a long upper tail, and taking the logarithm of the values produces something that looks approximately normal. Income, wealth, stock prices, the size of cities, the size of files on the internet, and the duration of wars are all commonly modelled as approximately log-normal, at least across the bulk of their range. The practical implication is that the correct way to calculate the “typical” value is often the geometric mean (the antilogarithm of the average of the logarithms) rather than the arithmetic mean. Using an arithmetic mean on log-normally distributed data systematically inflates the apparent typical value.
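The gap between the two means can be shown on a synthetic log-normal sample. Everything numeric below is an assumption for illustration: the centre of 30,000, the log-scale spread of 0.8, and the sample size. The sample is built deterministically from evenly spaced quantiles rather than random draws, so the result is repeatable.

```python
import math
import statistics

# Assumed parameters of the normal distribution of the log-values (illustrative only).
MU, SIGMA = math.log(30_000), 0.8
N = 10_001

# Build a log-normal sample by exponentiating evenly spaced normal quantiles.
log_dist = statistics.NormalDist(MU, SIGMA)
values = [math.exp(log_dist.inv_cdf((i + 0.5) / N)) for i in range(N)]

arithmetic_mean = statistics.fmean(values)

# Geometric mean: the antilogarithm of the average of the logarithms.
geometric_mean = math.exp(statistics.fmean([math.log(v) for v in values]))

median = statistics.median(values)

print(f"arithmetic mean = {arithmetic_mean:,.0f}")
print(f"geometric mean  = {geometric_mean:,.0f}")
print(f"median          = {median:,.0f}")
```

The geometric mean and the median both land at about 30,000, the centre of the distribution, while the arithmetic mean comes out substantially higher because of the long upper tail.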

Power law distributions, known in specific contexts as Pareto distributions or Zipf’s law, describe situations where a small number of observations account for a disproportionately large share of the total. Wealth is power-law distributed at the top end: the richest one percent own far more than their population share would suggest under any bell-curve model. City sizes follow a power law: there are a handful of enormous cities, many medium-sized ones, and vastly more small towns. Word frequencies in language follow a power law: a tiny number of words account for the majority of text. Website traffic follows a power law. Book sales follow a power law. The name “power law” comes from the mathematical form of the relationship: the frequency of an observation is proportional to a power of its size or, in Zipf’s formulation, of its rank. In a power-law distribution, the mean can be misleading or, when the tail is heavy enough, undefined in the mathematical sense. Even where an average can be computed, it is dominated by the few largest observations and says little about the rest. The concept of a “typical” observation loses its value, because the distribution is dominated by extremes. This is the Pareto principle, or the 80/20 rule, in its underlying mathematical form: when the distribution is power-law shaped, the top twenty percent can account for eighty percent of the total, and the relationship holds roughly at every level of magnification.
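The "every level of magnification" property can be sketched for an idealised Pareto distribution, where the share of the total held by the top fraction p is p raised to the power (1 - 1/alpha). The function name and the choice of alpha below are illustrative assumptions: alpha is set to log 5 / log 4, roughly 1.16, because that is the value at which the top 20 percent hold exactly 80 percent.

```python
import math

def top_share(top_fraction: float, alpha: float) -> float:
    """Share of the total held by the largest `top_fraction` of observations
    under an idealised Pareto distribution with tail exponent alpha.

    Requires alpha > 1; at or below 1 the mean is infinite and shares of
    the total are not defined.
    """
    if alpha <= 1:
        raise ValueError("mean is infinite for alpha <= 1")
    return top_fraction ** (1 - 1 / alpha)

# Assumption: the exponent that reproduces an exact 80/20 split (about 1.16).
ALPHA_80_20 = math.log(5) / math.log(4)

# Top 20%, then the top 20% of those, then the top 20% of those again.
for fraction in (0.2, 0.04, 0.008):
    share = top_share(fraction, ALPHA_80_20)
    print(f"top {fraction:.1%} hold {share:.1%} of the total")
```

With this exponent the top 20 percent hold 80 percent, the top 4 percent hold 64 percent, and the top 0.8 percent hold about 51 percent: the same 80/20 split repeated inside each successive top slice.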

Why It Matters

The errors that follow from wrongly assuming normality are not abstract. They are expensive, and sometimes catastrophic.

In finance, the consequences have been documented repeatedly. Long-Term Capital Management, the hedge fund staffed by Nobel Prize-winning economists and among the most sophisticated financial operations ever built, collapsed in 1998 after a cascade of correlated market moves that their models treated as near-impossible. The Federal Reserve Bank of New York had to orchestrate a rescue of roughly $3.6 billion, funded by a consortium of private banks, to prevent wider systemic damage. The fund’s risk models had assumed that the correlations between different asset markets were stable, and that extreme moves were extraordinarily rare. When Russia defaulted on its debt in August 1998, correlations shifted and the rare events arrived together. The models had not been calibrated to fat-tailed, correlated distributions.

A decade later, the Gaussian copula model used to price collateralised debt obligations across Wall Street embedded the same error at industrial scale. David Li’s elegant formula, published in 2000, allowed banks to price complex bundles of mortgages by assuming that default correlations followed a Gaussian (normal) structure. The CDO market grew from $69 billion in 2000 to over $500 billion by 2005, much of it priced using this model. When the housing market turned, defaults became highly correlated in exactly the way the model said they should not. Roughly 80 percent of CDO tranches had been rated AAA. Most turned out to be worthless. The model had thin tails; the reality had fat ones.

In income statistics, the problem runs in a different direction. Governments reporting mean household income produce a figure that is systematically higher than the income of anyone in the middle of the distribution, because the long upper tail of high earners pulls the mean upward. In the United States and the United Kingdom, the mean household income is consistently and substantially higher than the median. Politicians using mean income to describe typical living standards are not necessarily lying, but they are choosing a statistic that the distribution’s shape makes unrepresentative. The median would be the honest measure for a skewed distribution. The fact that means get reported more often is not coincidental.

In health research, trials of drugs and treatments routinely report mean outcomes. If the distribution of outcomes is skewed, or bimodal because different patient subtypes respond very differently, the mean conceals as much as it reveals. A drug that produces a large benefit in a subset of patients and no benefit or mild harm in everyone else can show a positive mean effect while helping fewer than half of those who take it.
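The arithmetic behind that last sentence is simple enough to check directly. The trial below is entirely hypothetical: 100 patients, of whom 30 belong to a subtype that responds strongly and 70 experience mild harm, with outcome measured as the change in a symptom score (positive means improvement).

```python
import statistics

# Hypothetical trial of 100 patients; outcome = change in symptom score (positive = better).
responders = [12.0] * 30       # a subtype that benefits strongly
non_responders = [-1.0] * 70   # everyone else: mild harm
outcomes = responders + non_responders

mean_effect = statistics.fmean(outcomes)
median_effect = statistics.median(outcomes)
share_helped = sum(1 for x in outcomes if x > 0) / len(outcomes)

print(f"mean effect   = {mean_effect:+.1f}")
print(f"median effect = {median_effect:+.1f}")
print(f"share helped  = {share_helped:.0%}")
```

The mean effect is +2.9, which reads as a successful treatment, while the median patient got slightly worse and only 30 percent were helped at all.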

How to Spot It

The tell for non-normality that gets exploited most consistently is the gap between mean and median. When a reported average is a mean and the underlying data is plausibly skewed, the median will almost always tell a different story.

The most thoroughly documented case is income reporting in official statistics. The UK’s Office for National Statistics publishes both mean and median household income figures. In 2022/23, mean household disposable income in the UK was reported as approximately £38,100 per year. The median was approximately £32,300. The gap of nearly £6,000 represents the statistical weight of the high-earning tail. If you were told the average income was £38,100 and assumed this described a typical household, you would systematically overestimate how prosperous most people were. The mean is not wrong as a number. It is simply not the right tool for a skewed distribution. Whenever you see the word “average” attached to income, house prices, executive pay, or any other quantity you know to be skewed upward, the first question to ask is: is this a mean or a median? If it is a mean, ask for the median. If the median is not available, apply mental downward pressure to the figure you have been given.

The same logic applies to any domain where you know the distribution should have a long upper tail: company valuations, city populations, earthquake magnitudes, social media follower counts. In each case, the mean is inflated by the extreme values. The median tells you about the typical case; the mean tells you about the total.

Your Challenge

A property website reports that the “average asking price” for a house in a particular city is £425,000. A local newspaper leads with this figure in a story about housing affordability. A councillor cites it in a planning debate to argue that housing in the city is too expensive for ordinary workers.

Sketch, in your head, what the distribution of actual asking prices in a typical city probably looks like. Is it likely to be symmetrical? Where would you expect the distribution to have its peak? Where would you expect its tail to run?

Given your answer, what is the relationship between the “average” figure and the price that a typical buyer would actually encounter? What additional statistic would you ask for before accepting the £425,000 figure as meaningful? And if you were told that the median asking price was £280,000, what would that tell you about the shape of the market that the mean concealed?

There is no answer on this page. That is the point.

References

Goldman Sachs CFO David Viniar on 25-standard-deviation events: Financial Times, August 13, 2007. Analysis of the statistical meaning of the claim: Dowd, K., Cotter, J., Humphrey, C. and Woods, M., “How Unlucky is 25-Sigma?” (2008), University College Dublin Working Paper. URL: https://ideas.repec.org/p/ucd/wpaper/200838.html. arXiv version: https://arxiv.org/abs/1103.5672

Long-Term Capital Management collapse and risk model failure: Federal Reserve Bank of New York, “Near Failure of Long-Term Capital Management” in Federal Reserve History. URL: https://www.federalreservehistory.org/essays/ltcm-near-failure. Lowenstein, R., When Genius Failed: The Rise and Fall of Long-Term Capital Management (Random House, 2000).

Gaussian copula, David Li, and CDO pricing: Wikipedia, “David X. Li.” URL: https://en.wikipedia.org/wiki/David_X._Li. Salmon, F., “Recipe for Disaster: The Formula That Killed Wall Street,” Wired, February 2009. URL: https://www.scss.tcd.ie/John.Haslett/st2352/Folder%20papers%20of%20interest/Felix.%20The%20formula%20that%20killed%20Wall%20St.pdf

Mandelbrot on fat tails in financial markets: Mandelbrot, B. and Hudson, R.L., The Misbehavior of Markets (Basic Books, 2004). Mandelbrot, B., “The Variation of Certain Speculative Prices,” Journal of Business, 36(4), 394-419 (1963).

Income distribution as log-normal with power-law tail: Drăgulescu, A. and Yakovenko, V.M., “Exponential and power-law probability distributions of wealth and income in the United Kingdom and the United States,” Physica A, 299(1-2), 213-221 (2001). URL: https://physics.umd.edu/~yakovenk/papers/PhysicaA-299-213-2001.pdf

UK household income mean versus median: Office for National Statistics, “Household income inequality, UK: financial year ending 2023.” Published January 2024. Available at: https://www.ons.gov.uk/peoplepopulationandcommunity/personalandhouseholdfinances/incomeandwealth/bulletins/householdincomeinequalityfinancial

Pareto distribution and power laws: Newman, M.E.J., “Power laws, Pareto distributions and Zipf’s law,” Contemporary Physics, 46(5), 323-351 (2005). URL: https://arxiv.org/abs/cond-mat/0412004