The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century

Author: David Salsburg

Publisher: New York: Owl Books, 2002

Summary: The Lady Tasting Tea is one of the more painless introductions to statistics I have seen...it makes the case that statistics changed the nature of science in the 20th Century. It is just as true that it has changed business.

Imagine that you have just been tested for an incurable, fatal disease, and the test results came back positive: you have the disease. But maybe you don't, you tell yourself. Blood tests aren't perfect, especially for newly discovered diseases like AIDS. The scientists who developed the test say that it correctly identifies the disease in 90 percent of the people who actually have it, and that it is 99 percent accurate in telling people who don't have it that they are uninfected. What's the likelihood that you have the disease? Somewhere between 90 and 99 percent? It might be time to draw up a will.

Before you jump to conclusions, though, you need to know what percentage of the overall population is infected. If ten percent is, then the chance that you are infected is nine out of ten. But if only one tenth of one percent of the population is, then your chance of actually having the disease is about one in twelve. However, it would be premature to break out the champagne quite yet. When we said "the population," did that mean the population of the whole country, or the "population" of people being tested? If it is truly random who decides to get tested, this is a moot point. But in reality, the people getting tested are probably sicker than the average population, or at least more at risk of becoming infected. To know how likely it is that you are infected, you have to know not just the accuracy of the test (in both directions), but also the average condition of the other people being tested. The test results you got back were telling, but they didn't prove anything.
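The arithmetic behind those two figures can be checked with Bayes' rule. The sketch below is only an illustration: the function name and parameter names are made up here, and the 90 percent and 99 percent figures are the ones quoted above for the hypothetical test.

```python
def posterior_given_positive(prevalence, sensitivity=0.90, specificity=0.99):
    """Chance of being infected given a positive result, by Bayes' rule.

    sensitivity: share of infected people the test flags (90 percent above).
    specificity: share of uninfected people the test clears (99 percent above).
    """
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# The two infection rates discussed in the text: ten percent and 0.1 percent.
for prevalence in (0.10, 0.001):
    p = posterior_given_positive(prevalence)
    print(f"infection rate {prevalence:.1%}: chance infected = {p:.3f}, "
          f"about 1 in {1 / p:.1f}")
```

With ten percent of the population infected the result is roughly 0.91, and with one tenth of one percent it is roughly 0.083, which is where "nine out of ten" and "one in twelve" come from.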

Looking at this example, we used words like test, accuracy, sample, population, chance, percent, probably, random, average, and prove—words that have specific meaning in the world of statistics. One cannot intelligently discuss medicine, physics, economics or baseball without an occasional regression into statistics. Statistics is an inescapable part of modern life. What is surprising is that statistics is a very young science, with roots in the early 19th century.

Newtonian mechanics gave astronomers the tools they needed to model the movements of the planets. When astronomers' observations of planetary locations didn't match their predictions, they assumed the differences were errors in measurement. Presumably, as telescopes got better, the errors would get smaller. After all, the universe operated according to the deterministic laws of Newton and Kepler, as the discovery of Neptune in 1846 demonstrated. Unfortunately, better telescopes produced observations at greater variance from what the math had predicted. Either the universe has an intrinsic randomness to it, or observations are inherently inexact. Or both. Either way, as the 19th Century rolled along, it became clear that science would have to be based on a more statistical understanding of things.

David Salsburg's book The Lady Tasting Tea takes its name from an anecdote involving Ronald Aylmer Fisher, one of the greats of modern statistics. Fisher, the story goes, was at a summer tea party when one of the guests insisted that she could of course taste the difference between tea poured into milk and tea that milk had been poured into. To many of the guests, this was an absurd assertion. To Fisher it was an interesting question of experimental design. How many cups of tea, in what order, would she have to taste, with what accuracy, before one could conclude that her statement was true? Could one ever know?
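To make the design question concrete, here is a small sketch of one possible setup. It assumes eight cups, four of each kind, presented in random order, which is the design Fisher later described in The Design of Experiments; the review itself does not say how many cups were used, and the function name is invented for this example.

```python
from math import comb


def chance_of_k_correct(k, cups_per_kind=4):
    """Chance that pure guessing picks exactly k of the milk-first cups.

    The taster is told there are cups_per_kind cups of each kind and must
    pick out the milk-first ones, so the count of correct picks follows a
    hypergeometric distribution.
    """
    ways_to_choose = comb(2 * cups_per_kind, cups_per_kind)
    ways_with_k_right = comb(cups_per_kind, k) * comb(cups_per_kind, cups_per_kind - k)
    return ways_with_k_right / ways_to_choose


all_right = chance_of_k_correct(4)
at_least_three = chance_of_k_correct(3) + chance_of_k_correct(4)
print(f"all four right by luck: {all_right:.4f}")
print(f"three or more right by luck: {at_least_three:.4f}")
```

Under that assumed design, a perfect score happens by luck only once in 70 tries, while three or more correct happens about a quarter of the time. So a perfect score would be persuasive, and one mistake would leave the question open.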

Throughout the 1920s, while working at the Rothamsted Agricultural Experimental Station, Fisher produced a stream of papers for the Journal of Agricultural Science with snappy titles like "Studies in Crop Variation VI." One of Fisher's lasting contributions, obvious in hindsight, was that the best statistical inferences are drawn from carefully designed experiments with randomized sampling.

Sometimes this is impractical, such as when studying people who smoke. At other times, testing is done on intentionally-less-than-random samples. The Nielsen ratings of television viewing are based on the habits of a group of families (a "judgment sample") chosen to mirror the socioeconomic and geographical diversity of the TV-watching US population. The "man on the street" polls that appear on the TV news are based on "opportunity samples," or whoever happens to be on the street that day. When polls go disastrously wrong, it is because the sample populations were not representative of the real population. In 1936, the Literary Digest confidently predicted that Alf Landon would trounce Franklin Roosevelt in the presidential election. The magazine had mailed straw ballots to names drawn from its own subscriber list, telephone directories, and automobile registrations, and found they would be voting for Landon. Embarrassingly, there were more people without phones who voted for FDR than people with phones who voted for Landon.

This type of mistake crops up in medical research all the time. For the last ten years, it has been standard practice to prescribe hormone replacement therapy for post-menopausal women. Doctors had observed that women who took estrogen had fewer blood clots and heart attacks. Then, in 1998, more careful research showed that the hormones actually increase the risk of heart attack. The researchers realized that the women taking estrogen had been in better health to start with, less likely to smoke, and more likely to see their doctors regularly. Once again, correlation was confused with causality, and the culprit was a non-random sample.

We all understand the idea of bell-shaped distribution curves, with most of the measurements clustering around the middle, and fewer and fewer as you get farther away. When the Augustinian monk Gregor Mendel fudged the numbers in his famous genetic experiments, he couldn't have suspected that subsequent generations would expose him. But the data he recorded didn't show the normal degree of randomness that real data always does. To paraphrase Johnnie Cochran, "If the data don't fit, you must acquit."

Distributions have outliers, which people often forget. Warren Buffett has an unrivaled record as an investment manager, and year after year his Berkshire Hathaway Inc. outperforms the stock market. It's unseemly to say so, but there is a chance he's just been lucky. Let's assume there are 100,000 investment managers in the US, and it's purely random whether any of them will outperform the market. Fifty percent do and fifty percent don't. After ten years, 98 of them would have an unbroken record of success. After fifteen years, only three. But those three weren't any smarter than the other investment managers. They just represent the tail end of a very large distribution curve. Maybe Buffett is just lucky.
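Those headcounts follow from halving the field once a year. A minimal sketch, using the 100,000 managers and the fifty-fifty odds assumed in the paragraph above (the function name is made up for this example):

```python
def expected_unbroken_records(managers, years, p_beat_market=0.5):
    """Expected number of managers who beat the market every single year,
    if each year is an independent coin flip for every manager."""
    return managers * p_beat_market ** years


for years in (10, 15):
    survivors = expected_unbroken_records(100_000, years)
    print(f"after {years} years: about {round(survivors)} unbroken records")
```

Ten straight wins has a chance of 1 in 1,024 and fifteen has a chance of 1 in 32,768, so a field of 100,000 is expected to leave about 98 and about 3 managers with perfect records, with no skill involved.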

The statistician Chester Bliss once studied the efficacy of different insecticides. Bliss found that no matter how much insecticide the bugs were exposed to, a few would always manage to survive, and no matter how little they were exposed to, a few would die. It was almost impossible to say how much insecticide would kill every insect, but relatively easy to identify the lethal dose that would kill half the insects, or the LD-50. In a way, this is like measuring the half-life of a radioactive substance. Interestingly, it takes considerably more than five times as much data to determine the LD-10 as it does to determine the LD-50. Just like accelerating a particle ever closer to the speed of light, the amount of effort required increases exponentially. Salsburg tells a story about his own experience running a study to find out what level of a compound would be lethal to one percent of the mice exposed to it. He determined that he would need several hundred million mice to get an acceptably certain LD-01, and recommended the experiment be reconsidered.

Everybody knows that asbestos is a bad thing. When the crystalline particles of this mineral lodge in the lungs, they can eventually lead to mesothelioma, asbestosis, or lung cancer. A combination of lawsuits and regulations has reduced the amount of asbestos we are exposed to. All of the companies that once produced it have been bankrupted. Regulations now mandate that when buildings like schools and offices are renovated, all asbestos must be removed or encapsulated by specially trained, protected, and insured firms. The legal standard is that no amount of asbestos can be shown to be safe, and so no amount can be tolerated. People have a very hard time making sense of the tail of a distribution curve. Law, formal logic, and statistics are not always in concert about the meaning of proof and causality.

Cigarette smoking is at least as bad as asbestos. People have referred to cigarettes as coffin nails for the last eighty years. But it has been hard to prove that cigarette smoking is harmful. The statistical tools used by epidemiologists have often been ones that Fisher designed to test the significance of controlled experiments. Studies of smoking look at opportunity samples, or people who have already started smoking. No one is proposing to create a double-blind study that asks half the participants to start smoking two packs a day. But the problem bedevils the courts: there is a preponderance of evidence that smoking is bad, yet each one of the studies is in some way flawed. To statisticians like Jerome Cornfield, the odds of all those studies being wrong, even if they are flawed, are so low as to constitute proof. To people like Kip Viscusi, a Harvard Law School professor and the founding editor of the Journal of Risk and Uncertainty, it is wrong to use the tools of statistics to assert what they can't. Law and statistics continue to be uneasy partners.

The Lady Tasting Tea makes the case that statistics changed the nature of science in the 20th Century. It is just as true that it has changed business. More than any other person, W. Edwards Deming brought statistical thinking to the attention of corporate executives. First in Japan and later in the US, he showed how quality control was a way to improve both costs and quality. Each step in a production process has its own variability, and by addressing the steps with the most variability first and getting to the others in turn, overall variability can be reduced. Although many people associate Deming with Total Quality Management, he in fact abhorred it. It is appropriate to reduce variability, he said, but it is naïve to try to eliminate all defects. Statistics says the world doesn't work that way.

The Lady Tasting Tea is one of the more painless introductions to statistics I have seen. Rather than taking the readers through the mathematical undergrowth, Salsburg organizes his book around the individuals who created the field. Anecdotes abound. No formulas are to be found. By using a historical narrative, Salsburg illustrates how the field developed: how rivalries and world wars shaped people's careers and how the mundane needs of farmers and pharmacologists led to the creation of a new science. As a result, fields such as law, economics, and even textual analysis have been changed forever. Of course, I can't prove that, but it's probably true.

by David S. McIntosh




© David S. McIntosh.     Center for Business Innovation, Cambridge MA.