How Not to Be Wrong

Excerpts from Jordan Ellenberg, How Not To Be Wrong: The Power of Mathematical Thinking, 2015, Penguin Books.


“Mathematics is not just a sequence of computations to be carried out by rote until your patience or stamina runs out—although it might seem that way from what you’ve been taught in courses called mathematics. Those integrals are to mathematics as weight training and calisthenics are to soccer. If you want to play soccer—I mean, really play, at a competitive level—you’ve got to do a lot of boring, repetitive, apparently pointless drills. Do professional players ever use those drills? Well, you won’t see anybody on the field curling a weight or zigzagging between traffic cones. But you do see players using the strength, speed, insight, and flexibility they built up by doing those drills, week after tedious week. Learning those drills is part of learning soccer.

“If you want to play soccer for a living, or even make the varsity team, you’re going to be spending lots of boring weekends on the practice field. There’s no other way. But now here’s the good news. If the drills are too much for you to take, you can still play for fun, with friends. You can enjoy the thrill of making a slick pass between defenders or scoring from distance just as much as a pro athlete does. You’ll be healthier and happier than you would be if you sat home watching the professionals on TV.

“Mathematics is pretty much the same. You may not be aiming for a mathematically oriented career. That’s fine—most people aren’t. But you can still do math. You probably already are doing math, even if you don’t call it that. Math is woven into the way we reason. And math makes you better at things. Knowing mathematics is like wearing a pair of X-ray specs that reveal hidden structures underneath the messy and chaotic surface of the world. Math is a science of not being wrong about things, its techniques and habits hammered out by centuries of hard work and argument. With the tools of mathematics in hand, you can understand the world in a deeper, sounder, and more meaningful way. All you need is a coach, or even just a book, to teach you the rules and some basic tactics. I will be your coach. I will show you how.”

Pag. 5.


Wald’s insight was simply to ask: where are the missing holes? The ones that would have been all over the engine casing, if the damage had been spread equally all over the plane? Wald was pretty sure he knew. The missing bullet holes were on the missing planes. The reason planes were coming back with fewer hits to the engine is that planes that got hit in the engine weren’t coming back. Whereas the large number of planes returning to base with a thoroughly Swiss-cheesed fuselage is pretty strong evidence that hits to the fuselage can (and therefore should) be tolerated. If you go to the recovery room at the hospital, you’ll see a lot more people with bullet holes in their legs than people with bullet holes in their chests. But that’s not because people don’t get shot in the chest; it’s because the people who get shot in the chest don’t recover.

Pag. 10.


Here’s an old mathematician’s trick that makes the picture perfectly clear: set some variables to zero. In this case, the variable to tweak is the probability that a plane that takes a hit to the engine manages to stay in the air. Setting that probability to zero means a single shot to the engine is guaranteed to bring the plane down. What would the data look like then? You’d have planes coming back with bullet holes all over the wings, the fuselage, the nose—but none at all on the engine. The military analyst has two options for explaining this: either the German bullets just happen to hit every part of the plane but one, or the engine is a point of total vulnerability. Both stories explain the data, but the latter makes a lot more sense. The armor goes where the bullet holes aren’t.

Pag. 10.
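
A minimal sketch of the "set the variable to zero" thought experiment, under a deliberately crude model that is not from the book: every plane takes exactly one hit, the hit is equally likely to land on any of four sections, and only an engine hit can bring the plane down. The function name, the section list, and all parameters are hypothetical.

```python
import random

def simulate_returning_hits(n_planes=10_000, engine_survival=0.0, seed=0):
    """Count bullet holes, by section, on the planes that make it back.

    Assumed model (illustrative only): one hit per plane, uniformly
    distributed over the sections; a plane hit in the engine survives
    with probability `engine_survival`, any other hit is always survived.
    """
    rng = random.Random(seed)
    sections = ["engine", "fuselage", "wings", "nose"]
    holes_seen = {section: 0 for section in sections}
    for _ in range(n_planes):
        hit = rng.choice(sections)
        if hit == "engine" and rng.random() >= engine_survival:
            continue  # plane is lost, so its bullet hole is never observed
        holes_seen[hit] += 1
    return holes_seen

# Engine survival set to zero: a single engine hit always downs the plane.
holes_zero = simulate_returning_hits(engine_survival=0.0)
# Engine survival set to one: every plane returns, so no holes go missing.
holes_one = simulate_returning_hits(engine_survival=1.0)
print(holes_zero)
print(holes_one)
```

With `engine_survival=0.0` the returning planes show no engine holes at all, even though roughly a quarter of all hits in this model land on the engine. The data on survivors alone would suggest the engine is never hit.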


A mathematician is always asking, “What assumptions are you making? And are they justified?” This can be annoying. But it can also be very productive. In this case, the officers were making an assumption unwittingly: that the planes that came back were a random sample of all the planes. If that were true, you could draw conclusions about the distribution of bullet holes on all the planes by examining the distribution of bullet holes on only the surviving planes.

Pag. 11.


To a mathematician, the structure underlying the bullet hole problem is a phenomenon called survivorship bias. It arises again and again, in all kinds of contexts. And once you’re familiar with it, as Wald was, you’re primed to notice it wherever it’s hiding.

Pag. 12.


We tend to teach mathematics as a long list of rules. You learn them in order and you have to obey them, because if you don’t obey them you get a C-. This is not mathematics. Mathematics is the study of things that come out a certain way because there is no other way they could possibly be.

Pag. 15.


Math is like an atomic-powered prosthesis that you attach to your common sense, vastly multiplying its reach and strength.

Pag. 16.


As a mathematical discipline travels far from its empirical source, or still more, if it is a second and third generation only indirectly inspired by ideas coming from “reality” it is beset with very grave dangers. It becomes more and more purely aestheticizing, more and more purely l’art pour l’art. This need not be bad, if the field is surrounded by correlated subjects, which still have closer empirical connections, or if the discipline is under the influence of men with an exceptionally well-developed taste. But there is a grave danger that the subject will develop along the line of least resistance, that the stream, so far from its source, will separate into a multitude of insignificant branches, and that the discipline will become a disorganized mass of details and complexities. In other words, at a great distance from its empirical source, or after much “abstract” inbreeding, a mathematical subject is in danger of degeneration.*

Pag. 17.


Pure mathematics can be a kind of convent, a quiet place safely cut off from the pernicious influences of the world’s messiness and inconsistency. I grew up inside those walls. Other math kids I knew were tempted by applications to physics, or genomics, or the black art of hedge fund management, but I wanted no such rumspringa.* As a graduate student, I dedicated myself to number theory, what Gauss called “the queen of mathematics,” the purest of the pure subjects, the sealed garden at the center of the convent, where we contemplated the same questions about numbers and equations that troubled the Greeks and have gotten hardly less vexing in the twenty-five hundred years since.

Pag. 19.


There’s a whole field of mathematics that specializes in contemplating numbers of this kind, called nonstandard analysis. The theory, developed by Abraham Robinson in the mid-twentieth century, finally made sense of the “evanescent increments” that Berkeley found so ridiculous. The price you have to pay (or, from another point of view, the reward you get to reap) is a profusion of novel kinds of numbers; not only infinitely small ones, but infinitely large ones, a huge spray of them in all shapes and sizes.*

Pag. 42.


Linear regression is a marvelous tool, versatile, scalable, and as easy to execute as clicking a button on your spreadsheet. You can use it for data sets involving two variables, like the ones I’ve drawn here, but it works just as well for three variables, or a thousand. Whenever you want to understand which variables drive which other variables, and in which direction, it’s the first thing you reach for. And it works on any data set at all. That’s a weakness as well as a strength. You can do linear regression without thinking about whether the phenomenon you’re modeling is actually close to linear. But you shouldn’t. I said linear regression was like a screwdriver, and that’s true; but in another sense, it’s more like a table saw. If you use it without paying careful attention to what you’re doing, the results can be gruesome.

Pag. 48.
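
One way to see the "table saw" warning, as a sketch rather than anything taken from the book: fit an ordinary least-squares line to data that follow a perfect parabola. The helper `fit_line` and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: y depends on x exactly, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # slope 0.0, intercept 4.0
```

The regression runs without complaint and reports a slope of zero, which read naively says that x does not drive y at all, even though y is completely determined by x.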


An important rule of mathematical hygiene: when you’re field-testing a mathematical method, try computing the same thing several different ways. If you get several different answers, something’s wrong with your method.

Pag. 57.


Steven Pinker makes a similar point in his recent best seller The Better Angels of Our Nature, which argues that the world has steadily grown less violent throughout human history. The twentieth century gets a bad rap because of the vast numbers of people caught in the gears of great-power politics. But the Nazis, the Soviets, the Communist Party of China, and the colonial overlords were actually not particularly effective slaughterers on a proportional basis, Pinker argues—there are just so many more people to kill nowadays! These days we don’t spare much grief for antique bloodlettings like the Thirty Years’ War. But that war took place in a smaller world, and by Pinker’s estimate killed one out of every hundred people on Earth. To do that now would mean wiping out 70 million people, more than the number who died in both world wars together. So it’s better to study rates: deaths as a proportion of total population. For instance, instead of counting raw numbers of brain cancer deaths by state, we can compute the proportion of each state’s population that dies of brain cancer each year.

Pag. 57.


It feels like something is making it happen. Indeed, de Moivre himself might have felt this way. By many accounts, he viewed the regularities in the behavior of repeated coin flips (or any other experiment subject to chance) as the work of God’s hand itself, which turned the short-term irregularities of coins, dice, and human life into predictable long-term behavior, governed by immutable laws and decipherable formulae. It’s dangerous to feel this way. Because if you think somebody’s transcendental hand—God, Lady Luck, Lakshmi, doesn’t matter—is pushing the coins to come up half heads, you start to believe in the so-called law of averages: five heads in a row and the next one’s almost sure to land tails. Have three sons, and a daughter is surely up next. After all, didn’t de Moivre tell us that extreme outcomes, like four straight sons, are highly unlikely? He did, and they are. But if you’ve already had three sons, a fourth son is not so unlikely at all. In fact, you’re just as likely to have a son as a first-time parent.

Pag. 66.
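
The distinction between "four straight sons is unlikely" and "a fourth son after three is not" can be checked by brute-force enumeration. This is a sketch under the simplifying assumption, in line with the excerpt's claim, that each birth is an independent 50/50 event; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely birth orders for four children (S = son, D = daughter).
families = list(product("SD", repeat=4))

# Unconditional chance of four sons in a row.
four_sons = [f for f in families if f == ("S", "S", "S", "S")]
p_four_sons = Fraction(len(four_sons), len(families))

# Chance of a fourth son, restricted to families that already have three sons.
three_sons_first = [f for f in families if f[:3] == ("S", "S", "S")]
fourth_is_son = [f for f in three_sons_first if f[3] == "S"]
p_fourth_son_given_three = Fraction(len(fourth_is_son), len(three_sons_first))

print(p_four_sons)               # 1/16
print(p_fourth_son_given_three)  # 1/2
```

The rare event is the whole run of four; once three sons are already in hand, the fourth birth is the same coin flip a first-time parent faces.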


Here’s a rule of thumb that makes sense to me: if the magnitude of a disaster is so great that it feels right to talk about “survivors,” then it makes sense to measure the death toll as a proportion of total population. When you talk about a survivor of the Rwandan genocide, you could be talking about any Tutsi living in Rwanda; so it makes sense to say that the genocide wiped out 75% of the Tutsi population. And you might be justified to say that a catastrophe that killed 75% of the population of Switzerland was the “Swiss equivalent” of what befell the Tutsi.

Pag. 68.


Most mathematicians would say that, in the end, the disasters and atrocities of history form what we call a partially ordered set.

Pag. 69.


Don’t talk about percentages of numbers when the numbers might be negative.

Pag. 70.


The favorite data set of the rabbinical scholar is the Torah, which is, after all, a sequentially arranged string of characters drawn from a finite alphabet, which we attempt faithfully to transmit without error from synagogue to synagogue. Despite being written on parchment, it’s the original digital signal.

Pag. 79.


Like I said, the Torah is a long document—by one count, it has 304,805 letters in all. So it’s not clear what to make, if anything, from patterns like the one Weissmandl found—there are lots of ways to slice and dice the Torah, and inevitably some of them are going to spell out words.

Pag. 81.


The universe is big, and if you’re sufficiently attuned to amazingly improbable occurrences, you’ll find them. Improbable things happen a lot. It’s massively improbable to get hit by a lightning bolt, or to win the lottery; but these things happen to people all the time, because there are a lot of people in the world, and a lot of them buy lottery tickets, or go golfing in a thunderstorm, or both. Most coincidences lose their snap when viewed from the appropriate distance.

Pag. 87.


Wiggle room is what the Baltimore stockbroker has when he gives himself plenty of chances to win; wiggle room is what the mutual fund company has when it decides which of its secretly incubating funds are winners and which are trash. Wiggle room is what McKay and Bar-Natan used to work up a list of rabbinical names that jibed well with War and Peace. When you’re trying to draw reliable inferences from improbable events, wiggle room is the enemy.

Pag. 90.


Neuroscientists divvy up their fMRI scans into tens of thousands of small pieces, called voxels, each corresponding to a small region of the brain. When you scan a brain, even a cold dead fish brain, there’s a certain amount of random noise coming through on each voxel. It’s pretty unlikely that the noise will happen to spike exactly at the moment that you show the fish a snapshot of a person in emotional extremity. But the nervous system is a big place, with tens of thousands of voxels to choose from. The odds that one of those voxels provides data matching up well with the photos is pretty good. That’s exactly what Bennett and his collaborators found; in fact, they located two groups of voxels that did an excellent job empathizing with human emotion, one in the salmon’s medial brain cavity and the other in the upper spinal column.

Pag. 92.
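
A rough sketch of why "pretty unlikely" per voxel becomes "pretty good" odds across the whole scan. The per-voxel chance of 1 in 1,000 and the count of 10,000 voxels are assumed figures chosen for illustration, and the voxels are treated as independent, which real fMRI data would not satisfy exactly.

```python
def chance_of_any_false_alarm(n_voxels, p_single):
    """Probability that at least one of n independent voxels spikes by chance."""
    return 1 - (1 - p_single) ** n_voxels

# Assumed numbers, not taken from the study described above.
one_voxel = chance_of_any_false_alarm(1, 0.001)
whole_scan = chance_of_any_false_alarm(10_000, 0.001)

print(f"one voxel:  {one_voxel:.4f}")
print(f"whole scan: {whole_scan:.4f}")
```

Under these assumptions a single voxel matches the photos by luck about once in a thousand tries, while at least one match somewhere in the scan is close to certain.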


Long story. The point is, reverse engineering is hard. The problem of inference, which is what the Bible coders were wrestling with, is hard because it’s exactly this kind of problem. When we are scientists, or Torah scholars, or toddlers gaping at the clouds, we are presented with observations and asked to build theories—what went into the box to produce the world that we see? Inference is a hard thing, maybe the hardest thing. From the shape of the clouds and the way they move we struggle to go backward, to solve for x, the system that made them.

Pag. 99.


Improbability, as described here, is a relative notion, not an absolute one; when we say an outcome is improbable, we are always saying, explicitly or not, that it is improbable under some set of hypotheses we’ve made about the underlying mechanisms of the world.

Pag. 102.


The standard framework, called the null hypothesis significance test, was developed in its most commonly used form by R. A. Fisher, the founder of the modern practice of statistics,* in the early twentieth century. It goes like this. First, you have to run an experiment. You might start with a hundred subjects, then randomly select half to receive your proposed wonder drug while the other half gets a placebo. Your hope, obviously, is that the patients on the drug will be less likely to die than the ones getting the sugar pill. From here, the protocol might seem simple: if you observe fewer deaths among the drug patients than the placebo patients, declare victory and file a marketing application with the FDA. But that’s wrong. It’s not enough that the data be consistent with your theory; they have to be inconsistent with the negation of your theory, the dreaded null hypothesis.

Pag. 102.


Under the null hypothesis, there’s only one chance in two hundred of getting results this good. That’s much more compelling. If I claim I can make the sun come up with my mind, and it does, you shouldn’t be impressed by my powers; but if I claim I can make the sun not come up, and it doesn’t, then I’ve demonstrated an outcome very unlikely under the null hypothesis, and you’d best take notice.

Pag. 104.


Run an experiment. Suppose the null hypothesis is true, and let p be the probability (under that hypothesis) of getting results as extreme as those observed. The number p is called the p-value. If it is very small, rejoice; you get to say your results are statistically significant. If it is large, concede that the null hypothesis has not been ruled out.

Pag. 104.
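
The recipe can be made concrete with a coin-flipping experiment, which is an example chosen here and not one from the excerpt. The null hypothesis is that the coin is fair, and "as extreme as those observed" is read one-sidedly as "at least this many heads". The function name is hypothetical.

```python
from math import comb

def one_sided_p_value(heads, flips, p_null=0.5):
    """P(at least `heads` heads in `flips` flips) if the null hypothesis holds."""
    return sum(
        comb(flips, k) * p_null ** k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )

p_strong = one_sided_p_value(9, 10)  # 11/1024, about 0.011
p_weak = one_sided_p_value(6, 10)    # 386/1024, about 0.377

print(f"9 heads in 10: p = {p_strong:.3f}")
print(f"6 heads in 10: p = {p_weak:.3f}")
```

Nine heads in ten flips would be rare for a fair coin, so that result would conventionally be called statistically significant; six heads in ten is unremarkable, and the null hypothesis has not been ruled out.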


In other words: if natural selection were false, think how unlikely it would be to encounter a biological world so thoroughly consistent with its predictions! The contribution of R. A. Fisher was to make significance testing into a formal endeavor, a system by which the significance, or not, of an experimental result was a matter of objective fact. In the Fisherian form, the null hypothesis significance test has been a standard method for assessing the results of scientific research for nearly a century. A standard textbook calls the method “the backbone of psychological research.” It’s the standard by which we separate experiments into successes and failures. Every time you encounter the results of a medical, psychological, or economic research study, you’re very likely reading about something that was vetted by a significance test.

Pag. 106.


A statistical study that’s not refined enough to detect a phenomenon of the expected size is called underpowered—the equivalent of looking at the planets with binoculars. Moons or no moons, you get the same result, so you might as well not have bothered. You don’t send binoculars to do a telescope’s job.

Pag. 115.


a player is much more likely to try a long shot after a three-point basket than after a three-point miss. In other words, the hot hand might “cancel itself out”—players, believing themselves to be hot, get overconfident and take shots they shouldn’t. The nature of the analogous phenomenon in stock investment is left as an exercise for the reader.

Pag. 120.


You can think of the null hypothesis significance test as a sort of fuzzy version of the reductio: Suppose the null hypothesis H is true. It follows from H that a certain outcome O is very improbable (say, less than Fisher’s 0.05 threshold). But O was actually observed. Therefore, H is very improbable.

Pag. 123.


Zhang’s success, along with related work of other contemporary big shots like Ben Green and Terry Tao, points to a prospect even more exciting than any individual result about primes: that we might, in the end, be on our way to developing a richer theory of randomness. Say, a way of specifying precisely what we mean when we say that numbers act as if randomly scattered with no governing structure, despite arising from completely deterministic processes. How wonderfully paradoxical: what helps us break down the final mysteries about prime numbers may be new mathematical ideas that structure the concept of structurelessness.

Pag. 134.


In medicine, most interventions we try won’t work and most associations we test for are going to be absent. Think about tests of genetic association with diseases: there are lots of genes on the genome, and most of them don’t give you cancer or depression or make you fat or have any recognizable direct effect at all. Ioannidis asks us to consider the case of genetic influence on schizophrenia. Such an influence is almost certain, given what we know about the heritability of the disorder. But where is it on the genome? Researchers might cast their net wide—it’s the Big Data era, after all—looking at a hundred thousand genes (more precisely: genetic polymorphisms) to see which ones are associated with schizophrenia. Ioannidis suggests that around ten of these actually have some clinically relevant effect. And the other 99,990? They’ve got nothing to do with schizophrenia. But one in twenty of them, or just about five thousand, are going to pass the p-value test of statistical significance. In other words, among the “OMG I found the schizophrenia gene” results that might get published, there are five hundred times as many bogus ones as real ones.

Pag. 136.
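
The arithmetic behind the "five hundred times" figure can be laid out step by step. The gene counts and the one-in-twenty threshold come from the passage; the assumption that every one of the ten real effects is detected is added here to keep the comparison simple.

```python
total_genes = 100_000   # candidate polymorphisms tested
real_effects = 10       # genes with a genuine effect, per the passage
one_in = 20             # a null gene passes the significance test 1 time in 20

null_genes = total_genes - real_effects          # 99,990
expected_false_positives = null_genes / one_in   # 4999.5, "just about five thousand"

# Assumption for illustration: perfect power, so all real effects are found.
expected_true_positives = real_effects

bogus_per_real = expected_false_positives / expected_true_positives
share_real = expected_true_positives / (
    expected_true_positives + expected_false_positives
)

print(null_genes, expected_false_positives)
print(f"bogus results per real one: {bogus_per_real:.0f}")
print(f"share of significant results that are real: {share_real:.2%}")
```

Even with this generous assumption about detecting the real effects, only about one significant result in five hundred points at a gene that matters.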


If you decide what color jelly beans to eat based just on the papers that get published, you’re making the same mistake the army made when they counted the bullet holes on the planes that came back from Germany. As Abraham Wald pointed out, if you want an honest view of what’s going on, you also have to consider the planes that didn’t come back. This is the so-called file drawer problem—a scientific field has a drastically distorted view of the evidence for a hypothesis when public dissemination is cut off by a statistical significance threshold. But we’ve already given the problem another name. It’s the Baltimore stockbroker. The lucky scientist excitedly preparing a press release about dermatological correlates of Green Dye #16 is just like the naive investor mailing off his life savings to the crooked broker. The investor, like the scientist, gets to see the one rendition of the experiment that went well by chance, but is blind to the much larger group of experiments that failed.

Pag. 141.


When the scientific community file-drawers its failed experiments, it plays both parts at once. They’re running the con on themselves. And all this is assuming that the scientists in question are playing fair. But that doesn’t always happen. Remember the wiggle-room problem that ensnared the Bible coders? Scientists, subject to the intense pressure to publish lest they perish, are not immune to the same wiggly temptations. If you run your analysis and get a p-value of .06, you’re supposed to conclude that your results are statistically insignificant. But it takes a lot of mental strength to stuff years of work in the file drawer. After all, don’t the numbers for that one experimental subject look a little screwy? Probably an outlier, maybe try deleting that line of the spreadsheet. Did we control for age? Did we control for the weather outside? Did we control for age and the weather outside? Give yourself license to tweak and shade the statistical tests you carry out on your results, and you can often get that .06 down to a .04. Uri Simonsohn, a professor at Penn who’s a leader in the study of replicability, calls these practices “p-hacking.” Hacking the p isn’t usually as crude as I’ve made it out to be, and it’s seldom malicious. The p-hackers truly believe in their hypotheses, just as the Bible coders do, and when you’re a believer, it’s easy to come up with reasons that the analysis that gives a publishable p-value is the one you should have done in the first place.

Pag. 141.


When they don’t think anyone’s listening, scientists call this practice “torturing the data until it confesses.” And the reliability of the results is about what you’d expect from confessions extracted by force. Assessing the scale of the p-hacking problem is not so easy—you can’t examine the papers that are hidden in the file drawer or were simply never written, just as you can’t examine the downed planes in Germany to see where they were hit. But you can, like Abraham Wald, make some inferences about data you can’t measure directly.

Pag. 142.


That slope is the shape of p-hacking. It tells you that a lot of experimental results that belong over on the unpublishable side of the p = .05 boundary have been cajoled, prodded, tweaked, or just plain tortured until, at last, they end up just on the happy side of the line. That’s good for the scientists who need publications, but it’s bad for science.

Pag. 143.


Scientists will twist themselves into elaborate verbal knots trying to justify reporting a result that doesn’t make it to statistical significance: they say the result is “almost statistically significant,” or “leaning toward significance,” or “well-nigh significant,” or “at the brink of significance,” or even, tantalizingly, that it “hovers on the brink of significance.”* It’s easy to make fun of the anguished researchers who resort to such phrases, but we should be hating on the game, not the players—it’s not their fault that publication is conditioned on an all-or-nothing threshold. To live or die by the .05 is to make a basic category error, treating a continuous variable (how much evidence do we have that the drug works, the gene predicts IQ, fertile women like Republicans?) as if it were a binary one (true or false? yes or no?). Scientists should be allowed to report statistically insignificant data.

Pag. 143.


But a conventional boundary, obeyed long enough, can be easily mistaken for an actual thing in the world. Imagine if we talked about the state of the economy this way! Economists have a formal definition of a “recession,” which depends on arbitrary thresholds just as “statistical significance” does. One doesn’t say, “I don’t care about the unemployment rate, or housing starts, or the aggregate burden of student loans, or the federal deficit; if it’s not a recession, we’re not going to talk about it.” One would be nuts to say so. The critics—and there are more of them, and they are louder, each year—say that a great deal of scientific practice is nuts in just this way.

Pag. 145.


The development of the confidence interval is generally ascribed to Jerzy Neyman, another giant of early statistics. Neyman was a Pole who, like Abraham Wald, started as a pure mathematician in Eastern Europe before taking up the then-new practice of mathematical statistics and moving to the West. In the late 1920s, Neyman began collaborating with Egon Pearson, who had inherited from his father Karl both an academic position in London and a bitter academic feud with R. A. Fisher. Fisher was a difficult type, always ready for a fight, about whom his own daughter said, “He grew up without developing a sensitivity to the ordinary humanity of his fellows.” In Neyman and Pearson he found opponents sharp enough to battle him for decades. Their scientific differences are perhaps most starkly displayed in Neyman and Pearson’s approach to the problem of inference.* How to determine the truth from the evidence? Their startling response is to unask the question. For Neyman and Pearson, the purpose of statistics isn’t to tell us what to believe, but to tell us what to do. Statistics is about making decisions, not answering questions. A significance test is no more or less than a rule, which tells the people in charge whether to approve a drug, undertake a proposed economic reform, or tart up a website.

Pag. 147.


Fisher certainly understood that clearing the significance bar wasn’t the same thing as finding the truth. He envisions a richer, more iterated approach, writing in 1926: “A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.” Not “succeeds once in giving,” but “rarely fails to give.” A statistically significant finding gives you a clue, suggesting a promising place to focus your research energy. The significance test is the detective, not the judge. You know how when you read an article about a breakthrough finding that this thing causes that thing, or that thing prevents the other thing, and at the end there’s always a banal sort of quote from a senior scientist not involved in the study intoning some very minor variant of “The finding is quite interesting, and suggests that more research in this direction is needed”? And how you don’t really even read that part because you think of it as an obligatory warning without content? Here’s the thing—the reason scientists always say that is because it’s important and it’s true!

Pag. 149.


But the culture is changing. Reformers with loud voices like Ioannidis and Simonsohn, who speak both to the scientific community and to the broader public, have generated a new sense of urgency about the danger of descent into large-scale haruspicy. In 2013, the Association for Psychological Science announced that they would start publishing a new genre of article, called Registered Replication Reports. These reports, aimed at reproducing the effects reported in widely cited studies, are treated differently from usual papers in a crucial way: the proposed experiment is accepted for publication before the study is carried out. If the outcomes support the initial finding, great news, but if not, they’re published anyway, so the whole community can know the full state of the evidence. Another consortium, the Many Labs project, revisits high-profile findings in psychology and attempts to replicate them in large multinational samples. In November 2013, psychologists were cheered when the first suite of Many Labs results came back, finding that 10 of the 13 studies addressed were successfully replicated.

Pag. 150.


Question 1: What’s the chance that a person gets put on Facebook’s list, given that they’re not a terrorist? Question 2: What’s the chance that a person’s not a terrorist, given that they’re on Facebook’s list?

Pag. 158.


The p-value is the answer to the question “The chance that the observed experimental result would occur, given that the null hypothesis is correct.” But what we want to know is the other conditional probability: “The chance that the null hypothesis is correct, given that we observed a certain experimental result.” The danger arises precisely when we confuse the second quantity for the first.

Pag. 158.


Where does the apparent paradox of the terrorist red list come from? Why does the mechanism of the p-value, which seems so reasonable, work so very badly in this setting? Here’s the key. The p-value takes into account what proportion of people Facebook flags (about 1 in 2000) but it totally ignores the proportion of people who are terrorists. When you’re trying to decide whether your neighbor is a secret terrorist, you have critical prior information, which is that most people aren’t terrorists! You ignore that fact at your peril. Just as R.A. Fisher said, you have to evaluate each hypothesis in the “light of the evidence” of what you already know about it.

Pag. 159.
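
A sketch of how the two conditional probabilities come apart once the base rate is included. Only the flag rate of about 1 in 2000 appears in the excerpts above; the population size, the number of terrorists, and the split of the flagged list are hypothetical figures chosen to be consistent with that rate.

```python
from fractions import Fraction

# Hypothetical counts, chosen so that about 1 innocent person in 2000 is flagged.
population = 200_000_000
terrorists = 10_000
flagged_terrorists = 10
flagged_innocent = 99_990

innocent = population - terrorists

# Question 1: chance of being on the list, given that you are not a terrorist.
p_flag_given_innocent = Fraction(flagged_innocent, innocent)

# Question 2: chance of not being a terrorist, given that you are on the list.
p_innocent_given_flag = Fraction(
    flagged_innocent, flagged_innocent + flagged_terrorists
)

print(f"P(flagged | innocent) = {float(p_flag_given_innocent):.6f}")
print(f"P(innocent | flagged) = {float(p_innocent_given_flag):.4f}")
```

With these counts the first probability is tiny, about 0.0005, while the second is 0.9999. The difference comes entirely from the prior information that almost nobody in the population is a terrorist.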


What we’ve done is to compute how our degrees of belief in the various theories ought to change once we see five reds in a row—what are known as the posterior probabilities. Just as the prior describes your beliefs before you see the evidence, the posterior describes your beliefs afterward. What we’re doing here is called Bayesian inference, because the passage from prior to posterior rests on an old formula in probability called Bayes’s Theorem. That theorem is a short algebraic expression and I could write it down for you right here and now.

Pag. 167.


Relying purely on null hypothesis significance testing is a deeply non-Bayesian thing to do—strictly speaking, it asks us to treat the cancer drug and the plastic Stonehenge with exactly the same respect. Is that a blow to Fisher’s view of statistics? On the contrary. When Fisher says that “no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas,” he is saying exactly that scientific inference can’t, or at least shouldn’t, be carried out purely mechanically; our preexisting ideas and beliefs must always be allowed to play a part.

Pag. 168.


Bayesian statisticians often don’t think about the null hypothesis at all; rather than asking “Does this new drug have any effect?” they might be more interested in a best guess for a predictive model governing the drug’s effects in various doses on various populations. And when they do talk about hypotheses, they’re relatively at ease with talking about the probability that a hypothesis—say, that the new drug works better than the existing one—is true. Fisher was not. In his view, the language of probability was appropriately used only in a context where some actual chance process is taking place. At this point, we’ve arrived at the shore of a great sea of philosophical difficulty, into which we’ll dip one or two toes, max. First of all: when we call Bayes’s Theorem a theorem it suggests we are discussing incontrovertible truths, certified by mathematical proof. That’s both true and not. It comes down to the difficult question of what we mean when we say “probability.” When we say that there’s a 5% chance that RED is true, we might mean that there actually is some vast global population of roulette wheels, of which exactly one in twenty is biased to fall red 3/5 of the time, and that any given roulette wheel we encounter is randomly picked from the roulette wheel multitude. If that’s what we mean, then Bayes’s Theorem is a plain fact, akin to the Law of Large Numbers we saw in the last chapter; it says that, in the long run, under the conditions we set up in the example, 12% of the roulette wheels that come up RRRRR are going to be of the red-favoring kind. But this isn’t actually what we’re talking about. When we say that there’s a 5% chance that RED is true, we are making a statement not about the global distribution of biased roulette wheels (how could we know?) but rather about our own mental state. Five percent is the degree to which we believe that a roulette wheel we encounter is weighted toward the red. 
This is the point at which Fisher totally got off the bus, by the way. He wrote an unsparing pan of John Maynard Keynes’s Treatise on Probability, in which probability “measures the ‘degree of rational belief’ to which a proposition is entitled in the light of given evidence.” Fisher’s opinion of this viewpoint is well summarized by his closing lines: “If the views of the last section of Mr. Keynes’s book were accepted as authoritative by mathematical students in this country, they would be turned away, some in disgust, and most in ignorance, from one of the most promising branches of applied mathematics.”

Pag. 169.
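
The 12% figure in the roulette example can be checked with a few lines of arithmetic. This is a minimal sketch, not the book's own calculation: the excerpt gives only the 5% prior for RED and the 3/5 bias, so the FAIR and BLACK theories, their priors, and BLACK's 2/5 chance of red are assumptions chosen for illustration.

```python
# Priors over three theories about the wheel. Only RED's 5% is stated in the
# excerpt; the FAIR and BLACK entries are assumptions.
priors = {"RED": 0.05, "FAIR": 0.90, "BLACK": 0.05}

# Chance of a single red spin under each theory (RED's 3/5 is from the text,
# the other two values are assumptions).
p_red = {"RED": 0.6, "FAIR": 0.5, "BLACK": 0.4}


def posterior(priors, p_red, n_reds):
    """Posterior probability of each theory after n_reds reds in a row."""
    joint = {h: priors[h] * p_red[h] ** n_reds for h in priors}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}


post = posterior(priors, p_red, 5)
print(round(post["RED"], 3))
```

Under these assumed priors the posterior for RED after RRRRR comes out at roughly 0.12, in line with the figure quoted in the excerpt.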


The Bayesian outlook is already enough to explain why RBRRB looks random while RRRRR doesn’t, even though both are equally improbable. When we see RRRRR, it strengthens a theory—the theory that the wheel is rigged to land red—to which we’ve already assigned some prior probability. But what about RBRRB? You could imagine someone walking around with an unusually open-minded stance concerning roulette wheels, which assigns some modest probability to the theory that the roulette wheel was fitted with a hidden Rube Goldberg apparatus designed to produce the outcome red, black, red, red, black. Why not? And such a person, observing RBRRB, would find this theory very much bolstered. But this is not how real people react to the spins of a roulette wheel coming up red, black, red, red, black. We don’t allow ourselves to consider every cockamamie theory we can logically devise. Our priors are not flat, but spiky. We assign a lot of mental weight to a few theories, while others, like the RBRRB theory, get assigned a probability almost indistinguishable from zero. How do we choose our favored theories? We tend to like simpler theories better than more complicated ones, theories that rest on analogies to things we already know about better than theories that posit totally novel phenomena. That may seem like an unfair prejudice, but without some prejudices we would run the risk of walking around in a constant state of astoundedness. Richard Feynman famously captured this state of mind: You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!

Pag. 170.


If you’ve ever used America’s most popular sort-of-illegal psychotropic substance, you know what it feels like to have too-flat priors. Every single stimulus that greets you, no matter how ordinary, seems intensely meaningful. Each experience grabs hold of your attention and demands that you take notice. It’s a very interesting mental state to be in. But it’s not conducive to making good inferences. The Bayesian point of view explains why Feynman wasn’t actually amazed; it’s because he assigns a very low prior probability to the hypothesis that a cosmic force intended him to see the license plate ARW 357 that night. It explains why five reds in a row feels “less random” than RBRRB to us; it’s because the former activates a theory, RED, to which we assign some non-negligible prior probability, and the latter doesn’t. And a number ending in 0 feels less random than a number ending in 7, because the former supports the theory that the number we’re seeing is not a precise count, but an estimate.

Pag. 171.


If you do happen to find yourself partially believing a crazy theory, don’t worry—probably the evidence you encounter will be inconsistent with it, driving down your degree of belief in the craziness until your beliefs come into line with everyone else’s. Unless, that is, the crazy theory is designed to survive this winnowing process. That’s how conspiracy theories work. Suppose you learn from a trusted friend that the Boston Marathon bombing was an inside job carried out by the federal government in order to, I don’t know, garner support for NSA wiretapping. Call that theory T. At first, because you trust your friend, maybe you assign that theory a reasonably high probability, say 0.1. But then you encounter other information: police located the suspected perpetrators, the surviving suspect confessed, etc. Each of these pieces of information is pretty unlikely, given T, and each one knocks down your degree of belief in T until you hardly credit it at all. That’s why your friend isn’t going to give you theory T; he’s going to add to it theory U, which is that the government and the news media are in on the conspiracy together, with the newspapers and cable networks feeding false information to support the story that the attack was carried out by Islamic radicals. The combined theory, T + U, should start out with a smaller prior probability; it is by definition harder to believe than T, because it asks you to swallow both T and another theory at the same time. But as the evidence flows in, which would tend to kill T alone, the combined theory T + U remains untouched. Dzhokhar Tsarnaev convicted? Well, sure, that’s exactly what you’d expect from a federal court—the Justice Department is totally in on it! The theory U acts as a kind of Bayesian coating to T, keeping new evidence from getting to it and dissolving it.
This is a property most successful crackpot theories have in common; they’re encased in just enough protective stuff that they’re equally consistent with many possible observations, making them hard to dislodge. They’re like the multi-drug-resistant E. coli of the information ecosystem. In a weird way you have to admire

Pag. 172.
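
The "Bayesian coating" argument can be sketched numerically. Only the 0.1 prior for T comes from the excerpt. The prior for T + U and all three likelihoods are made-up values, and the sketch simplifies by comparing each theory with its own negation and treating the news items as independent.

```python
def update(prior, p_if_true, p_if_false):
    """One application of Bayes's Theorem for a theory versus its negation."""
    numerator = prior * p_if_true
    return numerator / (numerator + (1 - prior) * p_if_false)


# Assumed chance of seeing each news item (arrest, confession, conviction).
P_NEWS_IF_NO_PLOT = 0.9   # assumption: the news is expected if there is no plot
P_NEWS_IF_T = 0.05        # assumption: under T alone the news is surprising
P_NEWS_IF_T_AND_U = 0.9   # assumption: U "explains" the news as planted

belief_T = 0.1            # prior given in the excerpt
belief_TU = 0.02          # assumption: a smaller prior for the combined theory

for _ in range(3):        # three pieces of news arrive
    belief_T = update(belief_T, P_NEWS_IF_T, P_NEWS_IF_NO_PLOT)
    belief_TU = update(belief_TU, P_NEWS_IF_T_AND_U, P_NEWS_IF_NO_PLOT)

print(belief_T, belief_TU)
```

With these numbers belief in T collapses after three updates, while belief in T + U stays at its prior, because the news is just as likely under T + U as under the ordinary account.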


Digression: when I tell people the story of Edmond Halley and the price of annuities, I often get interrupted: “But it’s obvious that you should charge younger people more!” It is not obvious. Rather, it is obvious if you already know it, as modern people do. But the fact that people who administered annuities failed to make this observation, again and again, is proof that it’s not actually obvious. Mathematics is filled with ideas that seem obvious now—that negative quantities can be added and subtracted, that you can usefully represent points in a plane by pairs of numbers, that probabilities of uncertain events can be mathematically described and manipulated—but are in fact not obvious at all. If they were, they would not have arrived so late in the history of human thought.

Pag. 186.


ADDITIVITY: The expected value of the sum of two things is the sum of the expected value of the first thing with the expected value of the second thing.

Pag. 199.
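
Additivity holds even when the two "things" depend on each other. The sketch below checks this exactly for two dice, taking the first die and the larger of the two as the quantities being added. The dice and both quantities are illustrative choices that do not appear in the excerpt.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
outcomes = list(product(faces, faces))   # 36 equally likely rolls of two dice
prob = Fraction(1, len(outcomes))


def expected(f):
    """Exact expected value of f(first_die, second_die)."""
    return sum(prob * f(a, b) for a, b in outcomes)


e_first = expected(lambda a, b: a)             # value of the first die
e_max = expected(lambda a, b: max(a, b))       # larger of the two dice
e_sum = expected(lambda a, b: a + max(a, b))   # the sum of the two quantities

print(e_first, e_max, e_sum, e_sum == e_first + e_max)
```

The first die and the maximum are clearly not independent, yet the expected value of their sum still equals the sum of their expected values.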


The first step is to rephrase Buffon’s problem in terms of expected value. We can ask: What is the expected number of cracks the needle crosses? The number Buffon aimed to compute was the probability p that the thrown-down needle crosses a crack. Thus there is a probability of 1 − p that the needle doesn’t cross any cracks. But if the needle crosses a crack, it crosses exactly one. So the expected number of crossings is obtained the same way we always compute expected value: by summing up each possible number of crossings, multiplied by the probability of observing that number. In this case the only possibilities are 0 (observed with probability 1 − p) and 1 (observed with probability p) so we add up (1 − p) × 0 = 0 and p × 1 = p and get p. So the expected number of crossings is simply p, the same number Buffon computed.

Pag. 204.
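
A simulation makes the identity "expected number of crossings = p" concrete: each throw contributes either 0 or 1 crossings, so the average number of crossings is just the fraction of throws that cross. This is a rough Monte Carlo sketch. The excerpt does not fix the needle length, so it assumes a needle exactly as long as the slats are wide, for which the classical answer is 2/π. The function name and parameters are hypothetical.

```python
import math
import random


def crossing_fraction(n_throws, needle_len=1.0, slat_width=1.0, seed=0):
    """Average number of cracks crossed per throw (each throw scores 0 or 1).

    Assumes needle_len <= slat_width, so a needle cannot cross two cracks.
    """
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_throws):
        # Distance from the needle's centre to the nearest crack.
        centre = rng.uniform(0.0, slat_width / 2)
        # Acute angle between the needle and the direction of the cracks.
        angle = rng.uniform(0.0, math.pi / 2)
        if centre <= (needle_len / 2) * math.sin(angle):
            crossings += 1
    return crossings / n_throws


estimate = crossing_fraction(200_000)
print(estimate, 2 / math.pi)
```

The estimate should land close to 2/π, though the exact digits depend on the seed and the number of throws.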


I’ve found that in moments of emotional extremity there is nothing like a math problem to quiet the complaints the rest of the psyche serves up. Math, like meditation, puts you in direct contact with the universe, which is bigger than you, was here before you, and will be here after you. It might drive me crazy not to do it.

Pag. 210.


So who did get scammed? The obvious answer is “the other players.” It was their cash, after all, that ended up rolling into the cartels’ pockets.

Pag. 216.


Why do we allow this kind of thing to persist? The answer is simple—eliminating waste has a cost, just as getting to the airport early has a cost. Enforcement and vigilance are worthy goals, but eliminating all the waste, just like eliminating even the slightest chance of missing a plane, carries a cost that outweighs the benefit. As blogger (and former mathlete) Nicholas Beaudrot observed, that $31 million represents .004% of the benefits disbursed annually by the SSA. In other words, the agency is already extremely good at knowing who’s alive and who’s no more. Getting even better at that distinction, in order to eliminate those last few mistakes, might be expensive. If we’re going to count utils, we shouldn’t be asking, “Why are we wasting the taxpayer’s money?,” but “What’s the right amount of the taxpayer’s money to be wasting?” To paraphrase Stigler: if your government isn’t wasteful, you’re spending too much time fighting government waste.

Pag. 222.


This is a mathematical way of formalizing a principle you already know: the richer you are, the more risks you can afford to take. Bets like the one above are like risky stock investments with a positive expected dollar payoff; if you make a lot of these investments, you might sometimes lose a bunch of cash at once, but in the long run you’ll come out ahead. The rich person, who has enough reserves to absorb those occasional losses, invests and gets richer; the nonrich people stay right where they are.

Pag. 238.
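
One way to see why wealth changes the answer is to score a bet by expected utility rather than expected dollars. The bet itself is not quoted in these notes, so the sketch assumes a coin flip that either loses $100,000 or wins $200,000, and it assumes logarithmic utility of total wealth. Both are illustrative choices, and `accepts_bet` is a hypothetical helper.

```python
import math


def accepts_bet(wealth, loss=100_000, gain=200_000, p_gain=0.5):
    """True if a bettor with log utility prefers the bet to standing pat.

    The stakes and the log-utility curve are assumptions for illustration.
    """
    if wealth <= loss:
        return False  # a loss would wipe the bettor out; log utility breaks down
    with_bet = (p_gain * math.log(wealth + gain)
                + (1 - p_gain) * math.log(wealth - loss))
    return with_bet > math.log(wealth)


print(accepts_bet(10_000_000))  # a rich bettor
print(accepts_bet(150_000))     # a bettor for whom the loss is most of their savings
```

The expected dollar payoff is the same positive amount for both bettors, but under these assumptions only the richer one comes out ahead in expected utility.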


Except you’re not—because the world economy, in these interconnected times, is a big rickety tree house held together with rusty nails and string. An epic collapse of one part of the structure runs a serious risk of pulling down the whole shebang. The Federal Reserve has a strong disposition not to let that happen. As the old saying goes, if you’re down a million bucks, it’s your problem; but if you’re down five billion bucks, it’s the government’s problem. This financial strategy is cynical, but it often works—it worked for Long-Term Capital Management in the 1990s, as chronicled in Roger Lowenstein’s superb book When Genius Failed, and it worked for the firms that survived, and even profited from, the financial collapse of 2008. Absent fundamental changes that seem nowhere in sight, it will work again.

Pag. 239.


It’s because of variance that retirement funds diversify their holdings. If you have all your money in oil and gas stocks, one big shock to the energy sector can torch your whole portfolio. But if you’re half in gas and half in tech, a big move in one batch of stocks needn’t be accompanied by any action in the others; it’s a lower-variance portfolio. You want to have your eggs in different baskets, lots of different baskets; this is exactly what you do when you stash your savings in a giant index fund, which distributes its investments across the entire economy. The more mathematically minded financial self-help books, like Burton Malkiel’s A Random Walk down Wall Street, are fond of this strategy; it’s dull, but it works. If retirement planning is exciting . . .

Pag. 240.
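
The effect of splitting a portfolio can be computed directly from the standard two-asset variance formula. In the sketch below the variances (0.04 each) and the correlations are invented numbers, and `portfolio_variance` is a hypothetical helper.

```python
def portfolio_variance(weights, variances, correlation):
    """Variance of the return on a two-asset portfolio."""
    w1, w2 = weights
    v1, v2 = variances
    covariance = correlation * (v1 ** 0.5) * (v2 ** 0.5)
    return w1 * w1 * v1 + w2 * w2 * v2 + 2 * w1 * w2 * covariance


# Assumed numbers: both sectors have return variance 0.04.
all_in_one = portfolio_variance((1.0, 0.0), (0.04, 0.04), 0.0)
half_and_half = portfolio_variance((0.5, 0.5), (0.04, 0.04), 0.0)
moving_together = portfolio_variance((0.5, 0.5), (0.04, 0.04), 1.0)

print(all_in_one, half_and_half, moving_together)
```

When the two sectors move independently, the half-and-half portfolio has half the variance of the concentrated one. When they move in lockstep, splitting the money buys no reduction at all, which is why the baskets need to be genuinely different.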


Keep trying possible combinations and you’ll quickly see that Harvey’s choices have a very special property: either he wins the jackpot, or he wins exactly three deuces.

Pag. 244.
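
The "special property" can be checked by brute force. The rules of the lottery are not quoted in these notes, so the sketch assumes three numbers are drawn from 1 to 7, a ticket lists three numbers, matching all three is the jackpot, and matching exactly two is a deuce. The seven tickets are one standard labelling of the lines of the Fano plane, also an assumption here.

```python
from itertools import combinations

# Seven tickets: one common labelling of the seven lines of the Fano plane.
TICKETS = [
    {1, 2, 4}, {1, 3, 5}, {1, 6, 7}, {2, 5, 7},
    {3, 4, 7}, {2, 3, 6}, {4, 5, 6},
]


def outcome(draw):
    """Return (number of jackpots, number of deuces) for one draw."""
    matches = [len(ticket & draw) for ticket in TICKETS]
    return matches.count(3), matches.count(2)


# Try every one of the 35 possible draws and collect the distinct outcomes.
results = {outcome(set(draw)) for draw in combinations(range(1, 8), 3)}
print(results)
```

Only two outcomes ever occur: one jackpot and no deuces, or no jackpot and exactly three deuces. That happens because every pair of numbers appears on exactly one ticket.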


What we’re up against here is the dreaded phenomenon known by computer-science types as “the combinatorial explosion.”

Pag. 245.


The people who first figured out what was going on here were the people who needed to understand both how things are and how things look, and the difference between the two: namely, painters. The moment, early in the Italian Renaissance, at which painters understood perspective was the moment visual representation changed forever, the moment when European paintings stopped looking like your kid’s drawings on the refrigerator door (if your kid mostly drew Jesus dead on the cross) and started looking like the things they were paintings of.* How exactly Florentine artists like Filippo Brunelleschi came to develop the modern theory of perspective has occasioned a hundred quarrels among art historians, into which we won’t enter here. What we know for sure is that the breakthrough joined aesthetic concerns with new ideas from mathematics and optics. A central point was the understanding that the images we see are produced by rays of light that bounce off objects and subsequently strike our eye.

Pag. 245.


The difference is just this: there are more lines through our eye than there are points on the ground, because there are horizontal lines, which don’t intersect the ground at all. These correspond to the vanishing points on our canvas, the places where train tracks meet. You might think of this line as a point on the ground that is “infinitely far away” in the direction of the tracks. And indeed, mathematicians usually call them points at infinity.

Pag. 248.


That’s part of the glory of math; we develop a body of ideas, and once they’re correct, they’re correct, even when applied far, far outside the context in which they were first conceived.

Pag. 252.


Understand this: I warmly endorse, in fact highly recommend, a bristly skepticism in the face of all claims that such-and-such an entity can be explained, or tamed, or fully understood, by mathematical means. And yet the history of mathematics is a history of aggressive territorial expansion, as mathematical techniques get broader and richer, and mathematicians find ways to address questions previously thought of as outside their domain. “A mathematical theory of probability” sounds unexceptional now, but once it would have seemed a massive overreach; math was about the certain and the true, not the random and the maybe-so! All that changed when Pascal, Bernoulli, and others found mathematical laws that governed the workings of chance.* A mathematical theory of infinity? Before the work of Georg Cantor in the nineteenth century, the study of the infinite was as much theology as science; now, we understand Cantor’s theory of multiple infinities, each one infinitely larger than the last, well enough to teach it to first-year math majors. (To be fair, it does kind of blow their minds.)

Pag. 254.


Shannon, in the paper that launched the theory of information, identified the basic tradeoff that engineers still grapple with today: the more resistant to noise you want your signal to be, the slower your bits are transmitted. The presence of noise places a cap on the length of a message your channel can reliably convey in a given amount of time; this limit was what Shannon called the capacity of the channel. Just as a pipe can only handle so much water, a channel can only handle so much information.

Pag. 255.


You’ll notice that both strings are code words in the Hamming code. In fact, the seven nonzero code words in the Hamming code match up exactly to the seven lines in the Fano plane. The Hamming code and the Fano plane (and, for that matter, the optimal ticket combo for the Transylvanian lottery) are exactly the same mathematical object in two different outfits! This is the secret geometry of the Hamming code. A code word is a set of three points in the Fano plane that form a line. Flipping a bit in the string is the same thing as adding or deleting a point, so as long as the original code word wasn’t 0000000, the bollixed transmission you get corresponds to a set with either two or four points.* If you receive a two-point set, you know how to figure out the missing point; it’s just the third point on the unique line that joins the two points you received. What if you receive a four-point set of the form “line plus one extra point?” Then you can infer that the correct message consists of those three points in your set that form a line.

Pag. 257.
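
The geometric decoding rule translates almost word for word into set operations. The sketch below assumes one particular labelling of the Fano plane's points as 1 to 7, and it handles only the two cases spelled out in the excerpt: a two-point set, and a four-point set of the form "line plus one extra point". The function name `correct` is hypothetical.

```python
# One common labelling of the seven lines of the Fano plane (an assumption).
LINES = [frozenset(s) for s in (
    {1, 2, 4}, {1, 3, 5}, {1, 6, 7}, {2, 5, 7},
    {3, 4, 7}, {2, 3, 6}, {4, 5, 6},
)]


def correct(received):
    """Recover the intended line from a set that is off by at most one point.

    Returns None for any input not covered by the two cases in the excerpt.
    """
    received = frozenset(received)
    if received in LINES:
        return received                      # nothing was flipped
    if len(received) == 2:
        # A point was deleted: take the unique line through the two points.
        return next(line for line in LINES if received <= line)
    if len(received) == 4:
        # A point was added: keep the three points that form a line, if any.
        contained = [line for line in LINES if line <= received]
        return contained[0] if contained else None
    return None


print(sorted(correct({1, 2})))        # two-point set
print(sorted(correct({1, 2, 4, 7})))  # line plus one extra point
```

Both calls recover the line through points 1, 2 and 4. The two-point case works because any two points lie on exactly one line.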


One of Hamming’s great conceptual contributions was to insist that this wasn’t merely a metaphor, or didn’t have to be. He introduced a new notion of distance, now called the Hamming distance, which was adapted to the new mathematics of information just as the distance Euclid and Pythagoras understood was adapted to the geometry of the plane. Hamming’s definition was simple: the distance between two blocks is the number of bits you need to alter in order to change one block into the other.

Pag. 260.


Hamming’s eight code words are a good code because no block of seven bits is within Hamming distance 1 of two different code words.

Pag. 260.
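
Hamming's distance and the "no block is close to two code words" property are both easy to test exhaustively on seven-bit blocks. The eight code words themselves are not listed in these notes, so the sketch builds a stand-in code: the all-zero word plus one word per line of the Fano plane, under an assumed labelling of the points. It may differ from the bit patterns printed in the book.

```python
from itertools import product


def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Stand-in code (an assumption): 0000000 plus the indicator string of each
# Fano-plane line, with bit i set when point i lies on the line.
LINES = [{1, 2, 4}, {1, 3, 5}, {1, 6, 7}, {2, 5, 7},
         {3, 4, 7}, {2, 3, 6}, {4, 5, 6}]
CODE_WORDS = ["0000000"] + [
    "".join("1" if i in line else "0" for i in range(1, 8)) for line in LINES
]


def nearby_code_words(block):
    """Code words within Hamming distance 1 of the given block."""
    return [w for w in CODE_WORDS if hamming_distance(block, w) <= 1]


all_blocks = ["".join(bits) for bits in product("01", repeat=7)]
max_nearby = max(len(nearby_code_words(b)) for b in all_blocks)
print(max_nearby)
```

For this stand-in code the maximum is 1: no seven-bit block sits within distance 1 of two different code words, so a single flipped bit can always be traced back to a unique code word.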


Hamming’s notion of “distance” follows Fano’s philosophy—a quantity that quacks like distance has the right to behave like distance. But why stop there? The set of points at distance less than or equal to 1 from a given central point has a name in Euclidean geometry; it is called a circle, or, if we are in higher dimensions, a sphere. So we’re compelled to call the set of strings at Hamming distance at most 1 from a code word a “Hamming sphere,” with the code word at the center. For a code to be an error-correcting code, no string—no point, if we’re to take this geometric analogy seriously—can be within distance 1 of two different code words; in other words, we ask that no two of the Hamming spheres centered at the code words share any points. So the problem of constructing error-correcting codes has the same structure as a classical geometric problem, that of sphere packing: how do we fit a bunch of equal-sized spheres as tightly as possible into a small space, in such a way that no two spheres overlap? More succinctly, how many oranges can you stuff into a box?

Pag. 262.


Here’s Kepler’s explanation. The pomegranate wants to fit as many seeds as possible inside its skin; in other words, it is carrying out a sphere-packing problem. If we believe nature does as good a job as can be done, then these spheres ought to be arranged in the densest possible fashion.

Pag. 263.


When you encounter an intricate construction like Hamming’s, you’re naturally inclined to think an error-correcting code is a very special thing, designed and engineered and tweaked and retweaked until every pair of code words has been gingerly nudged apart without any other pair being forced together. Shannon’s genius was to see that this vision was totally wrong. Error-correcting codes are the opposite of special. What Shannon proved—and once he understood what to prove, it was really not so hard—was that almost all sets of code words exhibited the error-correcting property; in other words, a completely random code, with no design at all, was extremely likely to be an error-correcting code. This was a startling development, to say the least.

Pag. 267.
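
A toy experiment gives some feel for why random codes tend to work, though it is far from Shannon's actual theorem or proof. The sketch draws a handful of code words completely at random and checks how often every pair ends up at Hamming distance 3 or more, which is enough to correct a single flipped bit. The block length, number of code words, and trial count are arbitrary choices.

```python
import random
from itertools import combinations


def min_distance(words):
    """Smallest Hamming distance between any two code words (given as ints)."""
    return min(bin(a ^ b).count("1") for a, b in combinations(words, 2))


def fraction_correcting(trials=500, n_words=8, n_bits=32, seed=0):
    """Share of random codes whose code words are pairwise at distance >= 3."""
    rng = random.Random(seed)
    good = 0
    for _ in range(trials):
        words = [rng.getrandbits(n_bits) for _ in range(n_words)]
        if min_distance(words) >= 3:
            good += 1
    return good / trials


share = fraction_correcting()
print(share)
```

With blocks this long and so few code words, almost every random code passes the test. The hard part of Shannon's result is that this stays true at transmission rates close to the channel's capacity, and this sketch does not attempt to show that.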


Courage is one of the things that Shannon had supremely. You have only to think of his major theorem. He wants to create a method of coding, but he doesn’t know what to do so he makes a random code. Then he is stuck. And then he asks the impossible question, “What would the average random code do?” He then proves that the average code is arbitrarily good, and that therefore there must be at least one good code. Who but a man of infinite courage could have dared to think those thoughts? That is the characteristic of great scientists; they have courage. They will go forward under incredible circumstances; they think and continue to think.

Pag. 267.


In a bigger departure from classical theory, Daniel Kahneman and Amos Tversky suggested that people in general tend to follow a different path from the one the utility curve demands, not just when Daniel Ellsberg sticks an urn in front of them, but in the general course of life. Their “prospect theory,” for which Kahneman later won the Nobel Prize, is now seen as the founding document of behavioral economics, which aims to model with the greatest possible fidelity the way people do act, not the way that, according to an abstract notion of rationality, they should. In the Kahneman-Tversky theory, people tend to place more weight on low-probability events than a person obedient to the von Neumann-Morgenstern axioms would; so the allure of the jackpot exceeds what a strict expected utility calculation would license.

Pag. 272.


Economics isn’t like physics and utility isn’t like energy. It is not conserved, and an interaction between two beings can leave both with more utility than they started with. This is the sunny free-marketeer’s view of the lottery. It’s not a regressive tax, it’s a game, where people pay the state a small fee for a few minutes of entertainment the state can provide very cheaply, and the proceeds keep the libraries open and the streetlights on.

Pag. 273.


Here’s Pascal again, delivering a typically morose take on the excitement of gambling: This man spends his life without weariness in playing every day for a small stake. Give him each morning the money he can win each day, on condition he does not play; you make him miserable. It will perhaps be said that he seeks the amusement of play and not the winnings. Make him then play for nothing; he will not become excited over it, and will feel bored. It is then not the amusement alone that he seeks; a languid and passionless amusement will weary him. He must get excited over it, and deceive himself by the fancy that he will be happy to win what he would not have as a gift on condition of not playing. — Pascal sees the pleasures of gambling as contemptible. And enjoyed to excess, they can of course be harmful. The reasoning that endorses lotteries also suggests that methamphetamine dealers and their clients enjoy a similar win-win relationship. Say what you want about meth, you can’t deny it is broadly and sincerely enjoyed.

Pag. 273.


Typical entrepreneurs (like typical lottery customers) overrate their chance of success. Even businesses that survive typically make their proprietors less money than they’d have drawn in salary from an existing company. And yet society benefits from a world in which people, against their wiser judgment, launch businesses. We want restaurants, we want barbers, we want smartphone games. Is entrepreneurship “a tax on the stupid”? You’d be called crazy if you said so. Part of that is because we esteem a business owner more highly than we do a gambler; it’s hard to separate our moral feelings about an activity from the judgments we make about its rationality. But part of it—the biggest part—is that the utility of running a business, like the utility of buying a lottery ticket, is not measured only in expected dollars. The very act of realizing a dream, or even trying to realize it, is part of its own reward.

Pag. 274.


With time, the top performers started to look and behave just like the members of the common mass. Secrist’s book arrived as a bucket of cold water to the face of an already uncomfortable business elite. Many reviewers saw in Secrist’s graphs and tables a numerical disproof of the mythology that sustained entrepreneurship. Robert Riegel of the University of Buffalo wrote, “The results confront the business man and the economist with an insistent and to some degree tragic problem. While there are exceptions to the general rule, the conception of an early struggle, crowned with success for the able and efficient, followed by a long period of harvesting the rewards, is thoroughly dissipated.”

Pag. 277.


Complete freedom to enter trade and the continuance of competition mean the perpetuation of mediocrity. New firms are recruited from the relatively “unfit”—at least from the inexperienced. If some succeed, they must meet the competitive practices of the class, the market, to which they belong. Superior judgment, merchandising sense, and honesty, however, are always at the mercy of the unscrupulous, the unwise, the misinformed and the injudicious. The results are that retail trade is over-crowded, shops are small and inefficient, volume of business inadequate, expenses relatively high, and profits small. So long as the field of activity is freely entered, and it is; and so long as competition is “free,” and within the limits suggested above, it is; neither superiority nor inferiority will tend to persist. Rather, mediocrity tends to become the rule. The average level of the intelligence of those conducting business holds sway, and the practices common to such trade mentality become the rule.

Pag. 277.


Whatever friends Galton had among the religious establishment were surely lost three years later, when he published a short article titled “Statistical Inquiries into the Efficacy of Prayer.” (Executive summary: prayer not so efficacious.)

Pag. 280.


To be fair, Darwin might have been biased, being Galton’s first cousin. What’s more, Darwin truly believed that mathematical methods offered scientists an enriched view of the world, even though his own work was far less quantitative than Galton’s. He wrote in his memoirs, reflecting on his early education, I attempted mathematics, and even went during the summer of 1828 with a private tutor (a very dull man) to Barmouth, but I got on very slowly. The work was repugnant to me, chiefly from my not being able to see any meaning in the early steps in algebra. This impatience was very foolish, and in after years I have deeply regretted that I did not proceed far enough at least to understand something of the great leading principles of mathematics, for men thus endowed seem to have an extra sense. In Galton, Darwin may have felt he was finally seeing the outset of the extrasensory biology he was mathematically unequipped to launch on his own.

Pag. 281.


Excellence doesn’t persist; time passes, and mediocrity asserts itself.* But there’s one big difference between Galton and Secrist. Galton was, in his heart, a mathematician, and Secrist was not. And so Galton understood why regression was taking place, while Secrist remained in the dark. Height, Galton understood, was determined by some combination of inborn characteristics and external forces; the latter might include environment, childhood health, or just plain chance. I am six foot one, and in part that’s because my father is six foot one and I share some of his height-promoting genetic material. But it’s also because I ate reasonably nutritious food as a child and didn’t undergo any unusual stresses that would have stunted my growth. And my height was no doubt bumped up and down by who knows how many other experiences I underwent, in utero and ex. Tall people are tall because their heredity predisposes them to be tall, or because external forces encourage them to be tall, or both. And the taller a person is, the likelier it is that both factors are pointing in the upward direction. In other words, people drawn from the tallest segment of the population are almost certain to be taller than their genetic predisposition would suggest. They were born with good genes, but they also got a boost from environment and chance. Their children will share their genes, but there’s no reason the external factors will once again conspire to boost their height over and above what heredity accounts for. And so, on average, they’ll be taller than the average person, but not quite so exceedingly tall as their beanpole parents. That’s what causes regression to the mean: not a mysterious mediocrity-loving force, but the simple workings of heredity intermingled with chance.

Pag. 282.
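
Regression to the mean falls out of a very small simulation. The model below is deliberately crude: each height is an average plus an inherited component plus an independent dose of chance, and the child is assumed to share the parent's inherited component exactly. The 69-inch average, the spreads, and the 73-inch cutoff for "tall" are illustrative numbers.

```python
import random
from statistics import mean

POPULATION_MEAN = 69.0  # inches; an assumed figure


def simulate(n=50_000, seed=0):
    """Return (parent, child) height pairs under the toy model."""
    rng = random.Random(seed)
    families = []
    for _ in range(n):
        genes = rng.gauss(0.0, 2.0)  # inherited part, shared by parent and child
        parent = POPULATION_MEAN + genes + rng.gauss(0.0, 2.0)  # plus own luck
        child = POPULATION_MEAN + genes + rng.gauss(0.0, 2.0)   # plus own luck
        families.append((parent, child))
    return families


families = simulate()
tall = [(p, c) for p, c in families if p >= 73.0]  # the tallest parents
tall_parent_mean = mean(p for p, _ in tall)
tall_child_mean = mean(c for _, c in tall)
print(tall_parent_mean, tall_child_mean)
```

The children of the tallest parents come out taller than average but noticeably shorter than their parents, even though no force pulls anyone toward mediocrity. The tall parents were selected partly for good luck, and that luck is not passed on.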


Hotelling was totally devoted to research and the generation of knowledge, and in Secrist he may have seen something of a kindred soul. “The labor of compilation and of direct collection of data,” he wrote sympathetically, “must have been gigantic.” Then the hammer drops. The triumph of mediocrity observed by Secrist, Hotelling points out, is more or less automatic whenever we study a variable that’s affected by both stable factors and the influence of chance. Secrist’s hundreds of tables and graphs “prove nothing more than that the ratios in question have a tendency to wander about.” The result of Secrist’s exhaustive investigation is “mathematically obvious from general considerations, and does not need the vast accumulation of data adduced to prove it.”

Pag. 288.


Biologists are eager to think regression stems from biology, management theorists like Secrist want it to come from competition, literary critics ascribe it to creative exhaustion—but it is none of these. It is mathematics. And still, despite the entreaties of Hotelling, Weldon, and Galton himself, the message hasn’t totally sunk in. It’s not just the Wall Street Journal sports page that gets this wrong; it happens to scientists, too.

Pag. 290.


Galton had shown that regression to the mean was in effect whenever the phenomenon being studied was influenced by the play of chance forces. But how strong were those forces, by comparison with the effect of heredity? In order to hear what the data was telling him, Galton had to put it in a form more graphically revealing than a column of numbers. He later recalled, “I began with a sheet of paper, ruled crossways, with a scale across the top to refer to the statures of the sons, and another down the side for the statures of their fathers, and there also I had put a pencil mark at the spot appropriate to the stature of each son and to that of his father.” This method of visualizing the data is the spiritual descendant of René Descartes’s analytic geometry, which asks us to think about points in the plane as pairs of numbers, an x-coordinate and a y-coordinate, joining algebra and geometry in a tight clasp they’ve been locked in ever since. Each father-son pair has an associated pair of numbers: namely, the height of the father followed by the height of the son. My father is six foot one and so am I—seventy-three inches each—so if we’d been in Galton’s data set we would have been recorded as (73,73). And Galton would have recorded our existence by making a mark on his sheet of paper with x-coordinate 73 and y-coordinate 73. Each parent and child in Galton’s voluminous records required another mark on the paper, until in the end his sheet bore a vast spray of dots, representing the whole range of variation in stature. Galton had invented the type of graph we now call a scatterplot.*

Pag. 291.


Scatterplots are spectacularly good at revealing the relationship between two variables; look in just about any contemporary scientific journal and you’ll see a raft of them. The late nineteenth century was a kind of golden age of data visualization. In 1869 Charles Minard made his famous chart showing the dwindling of Napoleon’s army on its path into Russia and its subsequent retreat, often called the greatest data graphic ever made; this, in turn, was a descendant of Florence Nightingale’s coxcomb graph showing in stark visual terms that most of the British soldiers lost in the Crimean War had been killed by infections, not Russians.

Pag. 292.


Galton found an amazing regularity: his isopleths were all ellipses, one contained within the next, each one with the same center. It was like the contour map of a perfectly elliptical mountain, with its peak at the pair of heights most frequently observed in Galton’s sample: average height for both parents and children. The mountain is none other than the three-dimensional version of the gendarme’s hat that de Moivre had studied; in modern language we call it the bivariate normal distribution.

Pag. 295.


When the son’s height is completely unrelated to those of the parents, as in the second scatterplot above, Galton’s ellipses are all circles, and the scatterplot looks roughly round. When the son’s height is completely determined by heredity, with no chance element involved, as in the first scatterplot, the data lies along a straight line, which one might think of as an ellipse that has gotten as elliptical as it possibly can. In between, we have ellipses of various levels of skinniness. That skinniness, which the classical geometers called the eccentricity of the ellipse, is a measure of the extent to which the height of the father determines that of the son. High eccentricity means that heredity is powerful and regression to the mean is weak; low eccentricity means the opposite, that regression to the mean holds sway. Galton called his measure correlation, the term we still use today. If Galton’s ellipse is almost round, the correlation is near 0; when the ellipse is skinny, lined up along the northeast-southwest axis, the correlation comes close to 1. By means of the eccentricity—a geometric quantity at least as old as the work of Apollonius of Perga in the third century BCE—Galton had found a way to measure the association between two variables, and in so doing had solved a problem at the cutting edge of nineteenth-century biology: the quantification of heredity.

Pag. 296.


In math there are many, many complicated objects, but only a few simple ones. So if you have a problem whose solution admits a simple mathematical description, there are only a few possibilities for the solution. The simplest mathematical entities are thus ubiquitous, forced into multiple duty as solutions to all kinds of scientific problems. The simplest curves are lines. And it’s clear that lines are everywhere in nature, from the edges of crystals to the paths of moving bodies in the absence of force. The next simplest curves are those cut out by quadratic equations,* in which no more than two variables are ever multiplied together. So squaring a variable, or multiplying two different variables, is allowed, but cubing a variable, or multiplying one variable by the square of another, is strictly forbidden. Curves in this class, including ellipses, are still called conic sections out of deference to history; but more forward-looking algebraic geometers call them quadrics.*

Pag. 298.


Galton understood quickly that the idea of correlation wasn’t limited to the study of heredity; it applied to any pair of qualities that might bear some relation to one another. As it happened, Galton was in possession of a massive database of anatomical measurements, of the sort that were enjoying a vogue in the late nineteenth century, thanks to the work of Alphonse Bertillon.

Pag. 300.


To study this phenomenon, Galton made another scatterplot, this one of height versus “cubit,” the distance from the elbow to the tip of the middle finger. To his astonishment, he saw the same elliptical pattern that had emerged from the heights of fathers and sons. Once again, he had graphically demonstrated that the two variables, height and cubit, were correlated, even though one didn’t strictly determine the other. If two measurements are highly correlated (like the length of the left foot and the length of the right) there’s little point in taking the time to record both numbers. The best measurements to take are the ones that are uncorrelated with each of the others. And the relevant correlations could be computed from the vast array of anthropometric data Galton had already gathered. As it happens, Galton’s invention of correlation didn’t lead to the institution of a vastly improved Bertillon system. That was largely thanks to Galton himself, who championed a competing system, dactyloscopy—what we now call fingerprinting. Like Bertillon’s system, fingerprinting reduced a suspect to a list of numbers or symbols that could be marked on a card, sorted, and filed. But fingerprinting enjoyed certain obvious advantages, most notably that a criminal’s fingerprints were often available for measurement in circumstances where the criminal himself was not.

Pag. 301.


Galton’s great insight was that the same thing applies even if finger length and cubit length aren’t identical, but only correlated. Correlations between the measurements make the Bertillon code less informative. Once again, Galton’s keen wisdom provided him a kind of intellectual prescience. What he’d captured was, in embryonic form, a way of thinking that would become fully formalized only a half-century later, by Claude Shannon in his theory of information. As we saw in chapter 13, Shannon’s formal measure of information was able to provide bounds on how quickly bits could flow through a noisy channel; in much the same way, Shannon’s theory provides a way of capturing the extent to which correlation between variables reduces the informativeness of a card. In modern terms we would say that the more strongly correlated the measurements, the less information, in Shannon’s precise sense, a Bertillon card conveys. Nowadays, though Bertillonage is gone, the idea that the best way to keep track of identity is by a sequence of numbers has achieved total dominance; we live in a world of digital information. And the insight that correlation reduces the effective amount of information has emerged as a central organizing principle.

Pag. 304.


Darwin showed that one could meaningfully talk about progress without any need to invoke purpose.

Pag. 312.


Galton showed that one could meaningfully talk about association without any need to invoke underlying cause.

Pag. 312.


Two variables are positively correlated when the corresponding vectors are separated by an acute angle—that is, an angle smaller than 90 degrees—and negatively correlated when the angle between the vectors is larger than 90 degrees, or obtuse. It makes sense: vectors at an acute angle to one another are, in some loose sense, “pointed in the same direction,” while vectors that form an obtuse angle seem to be working at cross purposes. When the angle is a right angle, neither acute nor obtuse, the two variables have a correlation of zero; they are, at least as far as correlation goes, unrelated to each other. In geometry, we call a pair of vectors that form a right angle perpendicular, or orthogonal. And by extension, it’s common practice among mathematicians and other trig aficionados to use the word “orthogonal” to refer to something unrelated to the issue at hand—“You might expect that mathematical skills are associated with magnificent popularity, but in my experience, the two are orthogonal.”
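
A minimal sketch of the geometric picture, assuming "the corresponding vectors" means each variable's list of observations with its mean subtracted. On that reading the correlation is the cosine of the angle between the two vectors. The sample numbers are invented for illustration.

```python
import math

def center(values):
    """Subtract the mean so the vector records deviations from average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def correlation(xs, ys):
    """Cosine of the angle between the mean-centered vectors."""
    u, v = center(xs), center(ys)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angle_degrees(xs, ys):
    """Angle between the centered vectors, in degrees."""
    r = max(-1.0, min(1.0, correlation(xs, ys)))  # guard against rounding
    return math.degrees(math.acos(r))

x = [1, 2, 3, 4, 5]
print(angle_degrees(x, [2, 4, 5, 4, 5]))    # acute: positive correlation
print(angle_degrees(x, [5, 4, 5, 4, 2]))    # obtuse: negative correlation
print(angle_degrees(x, [2, -1, -2, -1, 2])) # right angle: correlation zero
```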

Pag. 316.


Correlation is not transitive.
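
A small invented example of what non-transitivity can look like, not taken from the book: variable b is built as the sum of a and c, so it is positively correlated with each, yet a and c are uncorrelated with each other.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    dot = sum(p * q for p, q in zip(dx, dy))
    return dot / math.sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))

# Hypothetical data, chosen so that a and c are exactly uncorrelated.
a = [1, -1, 1, -1]
c = [1, 1, -1, -1]
b = [ai + ci for ai, ci in zip(a, c)]  # b shares something with both

print(pearson(a, b))  # positive
print(pearson(b, c))  # positive
print(pearson(a, c))  # zero
```

In the language of angles: a sits at 45 degrees to b, and b at 45 degrees to c, but a and c are at a right angle.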

Pag. 319.


Galton’s notion of correlation is limited in a very important way: it detects linear relations between variables, where an increase in one variable tends to coincide with a proportionally large increase (or decrease) in the other. But just as not all curves are lines, not all relationships are linear relationships.
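
To make the limitation concrete, here is a sketch with invented numbers: y is completely determined by x (it is x squared), yet the correlation comes out as zero because the relationship is not linear.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    dot = sum(p * q for p, q in zip(dx, dy))
    return dot / math.sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]           # y depends entirely on x, but not linearly

print(pearson(xs, ys))             # correlation is zero
print(pearson(xs, [2 * x + 1 for x in xs]))  # a linear relation gives 1
```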

Pag. 322.


The mere assertion of correlation is very different from an explanation. Doll and Hill’s study doesn’t show that smoking causes cancer; as they write, “The association would occur if carcinoma of the lung caused people to smoke or if both attributes were end-effects of a common cause.” That lung cancer causes smoking, as they point out, is not very reasonable; a tumor can’t go back in time and give someone a pack-a-day habit. But the problem of the common cause is more troubling.

Pag. 327.


It’s easy to criticize the policy makers in these scenarios for letting their decision making get ahead of the science. But it’s not that simple. It’s not always wrong to be wrong.

Pag. 332.


Remember: the expected value doesn’t represent what we literally expect to happen, but rather what we might expect to happen on average were the same decision to be repeated again and again.

Pag. 333.


But one thing’s for certain: refraining from making recommendations at all, on the grounds that they might be wrong, is a losing strategy. It’s a lot like George Stigler’s advice about missing planes. If you never give advice until you’re sure it’s right, you’re not giving enough advice.

Pag. 334.


But niceness and handsomeness have a common effect; they put these men in the group that you notice. Be honest—the mean uglies are the ones you never even consider. So inside the Great Square is a Smaller Triangle of Acceptable Men: And now the source of the phenomenon is clear. The handsomest men in your triangle run the gamut of personalities, from kindest to cruelest. On average, they’re about as nice as the average person in the whole population, which, let’s face it, is not that nice. And by the same token, the nicest men are only averagely handsome. The ugly guys you like, though—they make up a tiny corner of the triangle, and they are pretty darn nice—they have to be, or they wouldn’t be visible to you at all.
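
A simulation sketch of the triangle, under assumptions that are mine and not the book's: niceness and handsomeness are independent scores drawn uniformly between 0 and 1, and a man is "noticed" when the two scores sum to more than 1. The cutoff, the sample size, and the seed are arbitrary.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    dot = sum(p * q for p, q in zip(dx, dy))
    return dot / math.sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))

def simulate(n=20000, seed=0, threshold=1.0):
    """Return (correlation in everyone, correlation among the noticed)."""
    rng = random.Random(seed)
    # Independent traits: the Great Square.
    men = [(rng.random(), rng.random()) for _ in range(n)]
    # Hypothetical selection rule: the Smaller Triangle.
    noticed = [(nice, handsome) for nice, handsome in men
               if nice + handsome > threshold]
    r_all = pearson([m[0] for m in men], [m[1] for m in men])
    r_noticed = pearson([m[0] for m in noticed], [m[1] for m in noticed])
    return r_all, r_noticed

r_all, r_noticed = simulate()
print(f"whole population: {r_all:+.3f}")
print(f"noticed only:     {r_noticed:+.3f}")
```

In the full population the two traits are close to uncorrelated by construction; among the noticed men a clearly negative correlation appears, produced by the selection alone.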

Pag. 338.


“The most plausible reading of this data is that the public wants a free lunch,” economist Bryan Caplan wrote. “They hope to spend less on government without touching any of its main functions.” Nobel Prize−winning economist Paul Krugman: “People want spending cut, but are opposed to cuts in anything except foreign aid. . . . The conclusion is inescapable: Republicans have a mandate to repeal the laws of arithmetic.”

Pag. 340.


That’s the familiar self-contradicting position we see in polls: We want to cut! But we also want each program to keep all its funding! How did we get to this impasse? Not because the voters are stupid or delusional. Each voter has a perfectly rational, coherent political stance. But in the aggregate, their position is nonsensical.

Pag. 341.


I think the right answer is that there are no answers. Public opinion doesn’t exist. More precisely, it exists sometimes, concerning matters about which there’s a clear majority view. Safe to say it’s the public’s opinion that terrorism is bad and The Big Bang Theory is a great show. But cutting the deficit is a different story. The majority preferences don’t meld into a definitive stance. If there’s no such thing as the public opinion, what’s an elected official to do? The simplest answer: when there’s no coherent message from the people, do whatever you want. As we’ve seen, simple logic demands that you’ll sometimes be acting contrary to the will of the majority.

Pag. 344.


The immense ingenuity of the human species in devising ways to punish people rivals our abilities in art, philosophy, and science. Punishment is a renewable resource; there is no danger we’ll run out.

Pag. 351.


In other words: the slime mold likes the small, unlit pile of oats about as much as it likes the big, brightly lit one. But if you introduce a really small unlit pile of oats, the small dark pile looks better by comparison; so much so that the slime mold decides to choose it over the big bright pile almost all the time. This phenomenon is called the “asymmetric domination effect,” and slime molds are not the only creatures subject to it. Biologists have found jays, honeybees, and hummingbirds acting in the same seemingly irrational way.

Pag. 357.


So if you’re a single guy looking for love, and you’re deciding which friend to bring out on the town with you, choose the one who’s pretty much exactly like you—only slightly less desirable.

Pag. 358.


Maybe individual people seem irrational because they aren’t really individuals! Each one of us is a little nation-state, doing our best to settle disputes and broker compromises between the squabbling voices that drive us. The results don’t always make sense. But they somehow allow us, like the slime molds, to shamble along without making too many terrible mistakes. Democracy is a mess—but it kind of works.

Pag. 358.


A majority of voters—everybody in the pie slices marked K and W—prefers Wright to Montroll. And another majority, the M and K slices, prefers Kiss to Wright. If most people like Kiss better than Wright, and most people like Wright better than Montroll, doesn’t that mean Kiss should win again? There’s just one problem: people like Montroll better than Kiss by a resounding 2845 to 371. There’s a bizarre vote triangle: Kiss beats Wright, Wright beats Montroll, Montroll beats Kiss. Every candidate would lose a one-on-one race to one of the other two candidates. So how can anyone at all rightfully take office? Vexing circles like this are called Condorcet paradoxes, after the French Enlightenment philosopher who first discovered them in the late eighteenth century. Marie-Jean-Antoine-Nicolas de Caritat, Marquis de Condorcet, was a leading liberal thinker in the run-up to the French Revolution, eventually becoming president of the Legislative Assembly. He was an unlikely politician—shy and prone to exhaustion, with a speaking style so quiet and hurried that his proposals often went unheard in the raucous revolutionary chamber. On the other hand, he became quickly exasperated with people whose intellectual standards didn’t match his own. This combination of timidity and temper led his mentor Jacques Turgot to nickname him “le mouton enragé,” or “the rabid sheep.”

Pag. 363.


Condorcet’s so-called “jury theorem” shows that a sufficiently large jury is very likely to arrive at the right outcome, as long as the jurors have some individual bias toward correctness, no matter how small.* If the majority of people believe something, Condorcet said, that must be taken as strong evidence that it is correct. We are mathematically justified in trusting a sufficiently large majority—even when it contradicts our own preexisting beliefs. “I must act not by what I think reasonable,” Condorcet wrote, “but by what all who, like me, have abstracted from their own opinion must regard as conforming to reason and truth.” The role of the jury is much like the role of the audience on Who Wants to Be a Millionaire? When we have the chance to query a collective, Condorcet thought, even a collective of unknown and unqualified peers, we ought to value their majority opinion above our own.
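
The jury theorem can be checked numerically. This is a sketch under the usual textbook assumptions, which the excerpt does not spell out: jurors vote independently, each is correct with the same probability p, and the jury size is odd so there are no ties. The function name is hypothetical.

```python
from math import comb

def majority_correct(n: int, p: float) -> float:
    """Probability that a majority of n independent jurors is correct,
    when each juror is correct with probability p. Requires odd n."""
    if n % 2 == 0:
        raise ValueError("use an odd jury size to avoid ties")
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(needed, n + 1))

# A slight individual bias toward correctness (p = 0.51, an arbitrary choice).
for n in (1, 11, 101, 1001):
    print(n, round(majority_correct(n, 0.51), 3))
```

With p only slightly above one half, the probability that the majority is right climbs steadily as the jury grows. With p below one half, the same arithmetic works against the jury.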

Pag. 364.


In 1770, the twenty-seven-year-old Condorcet and his mathematical mentor, Jean le Rond d’Alembert, a coeditor of the Encyclopédie, made an extended visit to Voltaire’s house at Ferney on the Swiss border. The mathophile Voltaire, then in his seventies and in faltering health, quickly adopted Condorcet as a favorite, seeing in the young up-and-comer his best hope of passing rationalistic Enlightenment principles to the next generation of French thinkers.

Pag. 365.


Condorcet thought he could. He wrote down an axiom—that is, a statement he took to be so self-evident as to require no justification. Here it is: If the majority of voters prefer candidate A to candidate B, then candidate B cannot be the people’s choice. Condorcet wrote admiringly of Borda’s work, but considered the Borda count unsatisfactory for the same reason that the slime mold is considered irrational by the classical economist; in Borda’s system, as with majority voting, the addition of a third alternative can flip the election from candidate A to candidate B. That violates Condorcet’s axiom: if A would win a two-person race against B, then B can’t be the winner of a three-person race that includes A. Condorcet intended to build a mathematical theory of voting from his axiom, just as Euclid had built an entire theory of geometry on his five axioms about the behavior of points, lines, and circles:

Pag. 367.


That’s the disgusting situation that faced Condorcet when he discovered his paradox. In the pie chart above, Condorcet’s axiom says Montroll cannot be elected, because he loses the head-to-head matchup to Wright. The same goes for Wright, who loses to Kiss, and for Kiss, who loses to Montroll. There is no such thing as the people’s choice. It just doesn’t exist. Condorcet’s paradox presented a grave challenge to his logically grounded worldview. If there is an objectively correct ranking of candidates, it can hardly be the case that Kiss is better than Wright, who is better than Montroll, who is better than Kiss. Condorcet was forced to concede that in the presence of such examples, his axiom had to be weakened: the majority could sometimes be wrong. But the problem remained of piercing the fog of contradiction to divine the people’s actual will—for Condorcet never really doubted there was such a thing.
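
A sketch of how such a cycle can be detected from ranked ballots. The ballot counts below are invented for illustration and are not the figures from the pie chart; they are chosen only so that the same triangle appears. The code assumes every ballot ranks all three candidates and that no head-to-head matchup is tied.

```python
from itertools import combinations

# HYPOTHETICAL ballots: ranking (best first) -> number of voters.
HYPOTHETICAL_BALLOTS = {
    ("Kiss", "Wright", "Montroll"): 40,
    ("Wright", "Montroll", "Kiss"): 35,
    ("Montroll", "Kiss", "Wright"): 25,
}

def prefers(ballots, a, b):
    """Number of voters who rank candidate a above candidate b."""
    return sum(count for ranking, count in ballots.items()
               if ranking.index(a) < ranking.index(b))

def condorcet_winner(ballots):
    """Candidate who wins every head-to-head matchup, or None if no one does."""
    candidates = sorted({c for ranking in ballots for c in ranking})
    for c in candidates:
        if all(prefers(ballots, c, other) > prefers(ballots, other, c)
               for other in candidates if other != c):
            return c
    return None

candidates = sorted({c for ranking in HYPOTHETICAL_BALLOTS for c in ranking})
for a, b in combinations(candidates, 2):
    print(f"{a} {prefers(HYPOTHETICAL_BALLOTS, a, b)} : "
          f"{prefers(HYPOTHETICAL_BALLOTS, b, a)} {b}")
print("Condorcet winner:", condorcet_winner(HYPOTHETICAL_BALLOTS))
```

With these made-up counts, Kiss beats Wright, Wright beats Montroll, and Montroll beats Kiss, so the search for a candidate who satisfies Condorcet's axiom comes back empty.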

Pag. 368.


Sometimes, a mathematical development is “in the air”—for reasons only poorly understood, the community is ready for a certain advance to come, and it comes from several sources at once. Just as Bolyai was constructing his non-Euclidean geometry in Austria-Hungary, Nikolai Lobachevskii* was doing the same in Russia. And the great Carl Friedrich Gauss, an old friend of the senior Bolyai, had formulated many of the same ideas in work that had not yet seen print. (When informed of Bolyai’s paper, Gauss responded, somewhat ungraciously, “To praise it would amount to praising myself.”)

Pag. 371.


If that strange condition, where no two lines are ever parallel, sounds familiar, it’s because we’ve been here before. It’s just the same phenomenon we saw in the projective plane, which Brunelleschi and his fellow painters used to develop the theory of perspective.* There, too, every pair of lines met. And this is no coincidence—one can prove that Riemann’s geometry of Points and Lines on a sphere is the same as that of the projective plane.

Pag. 374.


This is a story told in mathematics again and again: we develop a method that works for one problem, and if it is a good method, one that really contains a new idea, we typically find that the same proof works in many different contexts, which may be as different from the original as a sphere is from a plane, or more so. At the moment, the young Italian mathematician Olivia Caramello is making a splash with claims that theories governing many different fields of mathematics are closely related beneath the skin—if you like technical terms, they are “classified by the same Grothendieck topos”—and, that, as a result, theorems proved in one field of mathematics can be carried over for free into theorems in another area, which on the surface appear totally different. It’s too early to say whether Caramello has truly “created a strange new universe,” as Bolyai did—but her work is very much in keeping with the long tradition in mathematics of which Bolyai was a part.

Pag. 374.


Hardy would certainly have recognized Condorcet’s anguish as perplexity of the most unnecessary kind. He would have advised Condorcet not to ask who the best candidate really was, or even who the public really intended to install in office, but rather which candidate we should define to be the public choice. And this formalist take on democracy is more or less general in the free world today.

Pag. 375.


Not everybody shares Scalia’s view of the law (note that his opinion in Davis was in the minority). As we saw in Atkins v. Virginia, the words of the Constitution, like “cruel and unusual,” leave a remarkable amount of space for interpretation. If even the great Euclid left some ambiguity in his axioms, how can we expect any different from the framers? Legal realists, like judge and University of Chicago professor Richard Posner, argue that Supreme Court jurisprudence is never the exercise in formal rule following that Scalia says it is: Most of the cases the Supreme Court agrees to decide are toss-ups, in the sense that they cannot be decided by conventional legal reasoning, with its heavy reliance on constitutional and statutory language and previous decisions. If they could be decided by those essentially semantic methods, they would be resolved uncontroversially at the level of a state supreme court or federal court of appeals and never get reviewed by the Supreme Court.

Pag. 379.


Even Justice Scalia has occasionally conceded that when the literal words of the law seem to require an absurd judgment, the literal words have to be set aside in favor of a reasonable guess as to what Congress must have meant. In just the same way, no scientist really wants to be bound strictly by the rules of significance, no matter what they say their principles are. When you run two experiments, one testing a clinical treatment that seems theoretically promising and the other testing whether dead salmon respond emotionally to romantic photos, and both experiments succeed with p-values of .03, you don’t really want to treat the two hypotheses the same. You want to approach absurd conclusions with an extra coat of skepticism, rules be damned.

Pag. 380.


But a specter was haunting Hilbert’s program—the specter of contradiction. Here’s the nightmare scenario. The community of mathematicians, working together in concert, rebuilds the entire apparatus of number theory, geometry, and calculus, starting from the bedrock axioms and laying on new theorems, brick by brick, each layer glued to the last by the rules of deduction. And then, one day, a mathematician in Amsterdam proves that a certain mathematical assertion is the case, while another mathematician in Kyoto proves that it is not. Now what? Starting from assertions one cannot possibly doubt, one has arrived at a contradiction. Reductio ad absurdum. Do you conclude that the axioms were wrong? Or that there’s something wrong with the structure of logical deduction itself? And what do you do with the decades of work based on those axioms?*

Pag. 382.


We might call such a set ouroboric, after the mythical snake so hungry it chows down on its own tail and consumes itself. So the set of infinite sets is ouroboric, but {1,2,pig} is not, because none of its elements is the set {1,2,pig} itself; all its elements are either numbers or farm animals, but not sets. Now here comes the punch line. Let NO be the set of all non-ouroboric sets. NO seems like a weird thing to think about, but if Frege’s definition allows it into the world of sets, so must we. Is NO ouroboric or not? That is, is NO an element of NO? By definition, if NO is ouroboric, then NO cannot be in NO, which consists only of non-ouroboric sets. But to say NO is not an element of NO is precisely to say NO is non-ouroboric; it does not contain itself. But wait a minute—if NO is non-ouroboric, then it is an element of NO, which is the set of all non-ouroboric sets. Now NO is an element of NO after all, which is to say that NO is ouroboric. If NO is ouroboric, it isn’t, and if it isn’t, it is. This, more or less, was the content of a letter the young Bertrand Russell wrote to Frege in June of 1902. Russell had met Peano in Paris at the International Congress—whether he attended Hilbert’s talk isn’t known, but he was certainly on board with the program of reducing all of mathematics to a pristine sequence of deductions from basic axioms.*

Pag. 384.


Hilbert sought a finitary proof of consistency, one that did not make reference to any infinite sets, one that a rational mind couldn’t help but wholly believe. But Hilbert was to be disappointed. In 1931, Kurt Gödel proved in his famous second incompleteness theorem that there could be no finitary proof of the consistency of arithmetic. He had killed Hilbert’s program with a single stroke. So should you be worried that all of mathematics might collapse tomorrow afternoon? For what it’s worth, I’m not. I do believe in infinite sets, and I find the proofs of consistency that use infinite sets to be convincing enough to let me sleep at night.

Pag. 386.


But we are Hilbert’s children; when we have beers with the philosophers on the weekend, and the philosophers hassle us about the status of the objects we study,* we retreat into our formalist redoubt, protesting: sure, we use our geometric intuition to figure out what’s going on, but the way we finally know that what we say is true is that there’s a formal proof behind the picture. In the famous formulation of Philip Davis and Reuben Hersh, “The typical working mathematician is a Platonist on weekdays and a formalist on Sundays.” Hilbert didn’t want to destroy Platonism; he wanted to make the world safe for Platonism, by placing subjects like geometry on a formal foundation so unshakable that we could feel as morally sturdy the whole week as we do on Sunday.

Pag. 387.


One of the most painful parts of teaching mathematics is seeing students damaged by the cult of the genius. The genius cult tells students it’s not worth doing mathematics unless you’re the best at mathematics, because those special few are the only ones whose contributions matter. We don’t treat any other subject that way! I’ve never heard a student say, “I like Hamlet, but I don’t really belong in AP English—that kid who sits in the front row knows all the plays, and he started reading Shakespeare when he was nine!” Athletes don’t quit their sport just because one of their teammates outshines them. And yet I see promising young mathematicians quit every year, even though they love mathematics, because someone in their range of vision was “ahead” of them. We lose a lot of math majors this way. Thus, we lose a lot of future mathematicians; but that’s not the whole of the problem. I think we need more math majors who don’t become mathematicians. More math major doctors, more math major high school teachers, more math major CEOs, more math major senators. But we won’t get there until we dump the stereotype that math is only worthwhile for kid geniuses.

Pag. 388.


The cult of the genius also tends to undervalue hard work. When I was starting out, I thought “hardworking” was a kind of veiled insult—something to say about a student when you can’t honestly say they’re smart. But the ability to work hard—to keep one’s whole attention and energy focused on a problem, systematically turning it over and over and pushing at everything that looks like a crack, despite the lack of outward signs of progress—is not a skill everybody has. Psychologists nowadays call it “grit,” and it’s impossible to do math without it. It’s easy to lose sight of the importance of work, because mathematical inspiration, when it finally does come, can feel effortless and instant.

Pag. 389.


Mathematics, mostly, is a communal enterprise, each advance the product of a huge network of minds working toward a common purpose, even if we accord special honor to the person who places the last stone in the arch. Mark Twain is good on this: “It takes a thousand men to invent a telegraph, or a steam engine, or a phonograph, or a telephone or any other important thing—and the last man gets the credit and we forget the others.”

Pag. 391.


Terry Tao writes: The popular image of the lone (and possibly slightly mad) genius—who ignores the literature and other conventional wisdom and manages by some inexplicable inspiration (enhanced, perhaps, with a liberal dash of suffering) to come up with a breathtakingly original solution to a problem that confounded all the experts—is a charming and romantic image, but also a wildly inaccurate one, at least in the world of modern mathematics. We do have spectacular, deep and remarkable results and insights in this subject, of course, but they are the hard-won and cumulative achievement of years, decades, or even centuries of steady work and progress of many good and great mathematicians; the advance from one stage of understanding to the next can be highly non-trivial, and sometimes rather unexpected, but still builds upon the foundation of earlier work rather than starting totally anew. . . . Actually, I find the reality of mathematical research today—in which progress is obtained naturally and cumulatively as a consequence of hard work, directed by intuition, literature, and a bit of luck—to be far more satisfying than the romantic image that I had as a student of mathematics being advanced primarily by the mystic inspirations of some rare breed of “geniuses.”

Pag. 391.


A year later, when the faculty at Göttingen balked at offering a position to the great algebraist Emmy Noether, arguing that students could not possibly be asked to learn mathematics from a woman, Hilbert responded: “I do not see how the sex of the candidate is an argument against her admission. We are a university, not a bathhouse.”

Pag. 393.


But the problem wasn’t the voters; it was the math. Condorcet, we now understand, was doomed to failure from the start. Kenneth Arrow, in his 1951 PhD thesis, proved that even a much weaker set of axioms than Condorcet’s, a set of requirements that seem as hard to doubt as Peano’s rules of arithmetic, leads to paradoxes.* It was a work of great elegance, which helped earn Arrow a Nobel Prize in economics in 1972, but it surely would have disappointed Condorcet, just as Gödel’s Theorem had disappointed Hilbert.

Pag. 395.


May it not be expected that the human race will be meliorated by new discoveries in the sciences and the arts, and, as an unavoidable consequence, in the means of individual and general prosperity; by farther progress in the principles of conduct, and in moral practice; and lastly, by the real improvement of our faculties, moral, intellectual and physical, which may be the result either of the improvement of the instruments which increase the power and direct the exercise of those faculties, or of the improvement of our natural organization itself? Nowadays, the Sketch is best known indirectly; it inspired Thomas Malthus, who considered Condorcet’s predictions hopelessly sunny, to write his much more famous, and much bleaker, account of humanity’s future.

Pag. 395.


And yet—when Roosevelt says, “The closet philosopher, the refined and cultured individual who from his library tells how men ought to be governed under ideal conditions, is of no use in actual governmental work,” I think of Condorcet, who spent his time in the library doing just that, and who contributed more to the French state than most of his time’s more practical men. And when Roosevelt sneers at the cold and timid souls who sit on the sidelines and second-guess the warriors, I come back to Abraham Wald, who as far as I know went his whole life without lifting a weapon in anger, but who nonetheless played a serious part in the American war effort, precisely by counseling the doers of deeds how to do them better. He was unsweaty, undusty, and unbloody, but he was right. He was a critic who counted.

Pag. 399.


For this is action, this not being sure! It is a sentence I often repeat to myself like a mantra. Theodore Roosevelt would surely have denied that “not being sure” was a kind of action. He would have called it cowardly fence sitting. The Housemartins—the greatest Marxist pop band ever to pick up guitars—took Roosevelt’s side in their 1986 song “Sitting on a Fence,” a withering portrait of a wishy-washy political moderate: Sitting on a fence is a man who swings from poll to poll Sitting on a fence is a man who sees both sides of both sides. . . . But the real problem with this man Is he says he can’t when he can . . . But Roosevelt and the Housemartins are wrong, and Ashbery is right. For him, not being sure is the move of a strong person, not a weakling: it is, elsewhere in the poem, “a kind of fence-sitting / Raised to the level of an esthetic ideal.” And math is part of it.

Pag. 400.


The paladin of principled uncertainty in our time is Nate Silver, the online-poker-player-turned-baseball-statistics-maven-turned-political-analyst whose New York Times columns about the 2012 presidential election drew more public attention to the methods of probability theory than they have ever before enjoyed. I think of Silver as a kind of Kurt Cobain of probability. Both were devoted to cultural practices that had previously been confined to a small, inward-looking cadre of true believers (for Silver, quantitative forecasting of sports and politics, for Cobain, punk rock). And both proved that if you carried their practice out in public, with an approachable style but without compromising the source material, you could make it massively popular.

Pag. 401.


Traditional political types greeted this response with the same disrespect I got from my tuberculosis boss. They wanted an answer. They didn’t understand that Silver was giving them one. Josh Jordan, in the National Review, wrote: “On September 30, leading into the debates, Silver gave Obama an 85 percent chance and predicted an Electoral College count of 320−218. Today, the margins have narrowed—but Silver still gives Obama a 67 percent chance and an Electoral College lead of 288−250, which has led many to wonder if he has observed the same movement to Romney over the past three weeks as everyone else has.” Had Silver noticed the movement to Romney? Clearly, yes. He gave Romney a 15% chance of winning at the end of September, and a 33% chance on October 22—more than twice as much. But Jordan didn’t notice that Silver had noticed, because Silver was still estimating—correctly—that Obama had a better chance of winning than Romney did. For traditional political reporters like Jordan, that meant his answer hadn’t changed.

Pag. 402.


The twistiness this incites in the mind is healthy; follow it! When you reason correctly, as Silver does, you find that you always think you’re right, but you don’t think you’re always right. As the philosopher W. V. O. Quine put it, “To believe something is to believe that it is true; therefore a reasonable person believes each of his beliefs to be true; yet experience has taught him to expect that some of his beliefs, he knows not which, will turn out to be false. A reasonable person believes, in short, that each of his beliefs is true and that some of them are false.”

Pag. 406.


Silver was being uncertain, rigorously uncertain, in public, and the public ate it up. I wouldn’t have thought it was possible. This is action, this not being sure!

Pag. 406.


And I tend to side with Seife, who argues that elections this close should be decided by coin flip. Some will balk at the idea of choosing our leaders by chance. But that’s actually the coin flip’s most important benefit! Close elections are already determined by chance. Bad weather in the big city, a busted voting machine in an outlying town, a poorly designed ballot leading elderly Jews to vote for Pat Buchanan—any of these chance events can make the difference when the electorate is stuck at 50−50. Choosing by coin flip helps keep us from pretending that the people have spoken for the winning candidate in a closely divided race. Sometimes the people speak and they say, “I dunno.”

Pag. 408.


This sounds weird, but as a logical deduction it’s irrefutable; drop one tiny contradiction anywhere into a formal system and the whole thing goes to hell. Philosophers of a mathematical bent call this brittleness in formal logic ex falso quodlibet, or, among friends, “the principle of explosion.” (Remember what I said about how much math people love violent terminology?) Ex falso quodlibet is how Captain James T. Kirk used to disable dictatorial AIs—feed them a paradox and their reasoning modules frazzle and halt. That (they plaintively remark, just before the power light goes out) does not compute. Bertrand Russell did to Gottlob Frege’s set theory what Kirk did to uppity robots. His one sneaky paradox brought the whole edifice down. But Kirk’s trick doesn’t work on human beings.

Pag. 409.
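
Note on the excerpt above: the "principle of explosion" can be checked mechanically. What follows is a minimal Lean 4 sketch, not taken from the book; the theorem names `explosion` and `explosion_stepwise` are illustrative assumptions. It shows that once a proposition P and its negation are both assumed, any proposition Q whatsoever follows.

```lean
-- Ex falso quodlibet: from P and ¬P, conclude an arbitrary Q.
-- `absurd` is the core-library lemma `a → ¬a → b`.
theorem explosion (P Q : Prop) (hp : P) (hnp : ¬P) : Q :=
  absurd hp hnp

-- The same result along the classical textbook route:
-- weaken P to "P or Q", then rule out the P branch using ¬P.
theorem explosion_stepwise (P Q : Prop) (hp : P) (hnp : ¬P) : Q := by
  have hpq : P ∨ Q := Or.inl hp
  cases hpq with
  | inl h => exact absurd h hnp
  | inr h => exact h
```

Because Q is completely arbitrary here, a single contradiction is enough to make every statement in the system provable, which is the brittleness the passage describes.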


As F. Scott Fitzgerald said, “The test of a first-rate intelligence is the ability to hold two opposed ideas in the mind at the same time, and still retain the ability to function.”

Pag. 410.


Disproving by night is a kind of hedge against that gigantic waste. But there’s a deeper reason. If something is true and you try to disprove it, you will fail. We are trained to think of failure as bad, but it’s not all bad. You can learn from failure. You try to disprove the statement one way, and you hit a wall. You try another way, and you hit another wall. Each night you try, each night you fail, each night a new wall, and if you are lucky, those walls start to come together into a structure, and that structure is the structure of the proof of the theorem. For if you have really understood what’s keeping you from disproving the theorem, you very likely understand, in a way inaccessible to you before, why the theorem is true. This is what happened to Bolyai, who bucked his father’s well-meaning advice and tried, like so many before him, to prove that the parallel postulate followed from Euclid’s other axioms. Like all the others, he failed. But unlike the others, he was able to understand the shape of his failure.

Pag. 410.


Proving by day and disproving by night is not just for mathematics. I find it’s a good habit to put pressure on all your beliefs, social, political, scientific, and philosophical. Believe whatever you believe by day; but at night, argue against the propositions you hold most dear. Don’t cheat! To the greatest extent possible you have to think as though you believe what you don’t believe. And if you can’t talk yourself out of your existing beliefs, you’ll know a lot more about why you believe what you believe. You’ll have come a little closer to a proof.

Pag. 411.


This salutary mental exercise is not at all what F. Scott Fitzgerald was talking about, by the way. His endorsement of holding contradictory beliefs comes from “The Crack-Up,” his 1936 essay about his own irreparable brokenness. The opposing ideas he has in mind there are “the sense of futility of effort and the sense of the necessity to struggle.” Samuel Beckett later put it more succinctly: “I can’t go on, I’ll go on.” Fitzgerald’s characterization of a “first-rate intelligence” is meant to deny his own intelligence that designation; as he saw it, the pressure of the contradiction had made him effectively cease to exist, like Frege’s set theory or a computer downed by Kirkian paradox.

Pag. 411.


David Foster Wallace was interested in paradox too. In his characteristically mathematical style, he put a somewhat tamed version of Russell’s paradox at the center of his first novel, The Broom of the System. It isn’t too strong to say his writing was driven by his struggle with contradictions. He was in love with the technical and analytic, but he saw that the simple dicta of religion and self-help offered better weapons against drugs, despair, and killing solipsism. He knew it was supposed to be the writer’s job to get inside other people’s heads, but his chief subject was the predicament of being stuck fast inside one’s own. Determined to record and neutralize the influence of his own preoccupations and prejudices, he knew this determination was itself among those preoccupations and subject to those prejudices. This is Phil 101 stuff, to be sure, but as any math student knows, the old problems you meet freshman year are some of the deepest you ever see. Wallace wrestled with the paradoxes just the way mathematicians do. You believe two things that seem in opposition. And so you go to work—step by step, clearing the brush, separating what you know from what you believe, holding the opposing hypotheses side by side in your mind and viewing each in the adversarial light of the other until the truth, or the nearest you can get to it, comes clear. As for Beckett, he had a richer and more sympathetic view of contradiction, which is so ever-present in his work that it takes on every possible emotional color somewhere or other in the corpus. “I can’t go on, I’ll go on” is bleak; but Beckett also draws on the Pythagoreans’ proof of the irrationality of the square root of 2, turning it into a joke between drunks: “But betray me,” said Neary, “and you go the way of Hippasos.” “The Akousmatic, I presume,” said Wylie. “His retribution slips my mind.” “Drowned in a puddle,” said Neary, “for having divulged the incommensurability of side and diagonal.” “So perish all babblers,” said Wylie. It’s not clear how much higher math Beckett knew, but in his late prose piece Worstward Ho, he sums up the value of failure in mathematical creation more succinctly than any professor ever has: Ever tried. Ever failed. No matter. Try again. Fail again. Fail better.

Pag. 412.


The mathematicians we’ve encountered in this book are not just puncturers of unjustified certainties, not just critics who count. They found things and they built things. Galton uncovered the idea of regression to the mean; Condorcet built a new paradigm for social decision making; Bolyai created an entirely novel geometry, “a strange new universe”; Shannon and Hamming made a geometry of their own, a space where digital signals lived instead of circles and triangles; Wald got the armor on the right part of the plane.

Pag. 413.