Excerpts from Nate Silver, The Signal and the Noise, 2012, Penguin Books.
exposure to so many new ideas was producing mass confusion. The amount of information was increasing much more rapidly than our understanding of what to do with it, or our ability to differentiate the useful information from the mistruths.13 Paradoxically, the result of having so much more shared knowledge was increasing isolation along national and religious lines. The instinctual shortcut that we take when we have “too much information” is to engage with it selectively, picking out the parts we like and ignoring the remainder, making allies with those who have made the same choices and enemies of the rest.
The idea of controlling one’s fate seemed to have become part of the human consciousness by Shakespeare’s time—but not yet the competencies to achieve that end. Instead, those who tested fate usually wound up dead.18
The idea of man as master of his fate was gaining currency. The words predict and forecast are largely used interchangeably today, but in Shakespeare’s time, they meant different things. A prediction was what the soothsayer told you; a forecast was something more like Cassius’s idea. The term forecast came from English’s Germanic roots,20 unlike predict, which is from Latin.21 Forecasting reflected the new Protestant worldliness rather than the otherworldliness of the Holy Roman Empire. Making a forecast typically implied planning under conditions of uncertainty. It suggested having prudence, wisdom, and industriousness, more like the way we now use the word foresight.
The 1970s were the high point for “vast amounts of theory applied to extremely small amounts of data,” as Paul Krugman put it to me. We had begun to use computers to produce models of the world, but it took us some time to recognize how crude and assumption laden they were, and that the precision that computers were capable of was no substitute for predictive accuracy.
This exponential growth in information is sometimes seen as a cure-all, as computers were in the 1970s. Chris Anderson, the editor of Wired magazine, wrote in 2008 that the sheer volume of data would obviate the need for theory, and even the scientific method.37 This is an emphatically pro-science and pro-technology book, and I think of it as a very optimistic one. But it argues that these views are badly mistaken. The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning.
But in speaking with well more than one hundred experts in more than a dozen fields over the course of four years, reading hundreds of journal articles and books, and traveling everywhere from Las Vegas to Copenhagen in pursuit of my investigation, I came to realize that prediction in the era of Big Data was not going very well. I had been lucky on a few levels: first, in having achieved success despite having made many of the mistakes that I will describe, and second, in having chosen my battles well.
If there is one thing that defines Americans—one thing that makes us exceptional—it is our belief in Cassius’s idea that we are in control of our own fates.
“This need of finding patterns, humans have this more than other animals,” I was told by Tomaso Poggio, an MIT neuroscientist who studies how our brains process information. “Recognizing objects in difficult situations means generalizing. A newborn baby can recognize the basic pattern of a face. It has been learned by evolution, not by the individual.” The problem, Poggio says, is that these evolutionary instincts sometimes lead us to see patterns when there are none there. “People have been doing that all the time,” Poggio said. “Finding patterns in random noise.”
The information overload after the birth of the printing press produced greater sectarianism. Now those different religious ideas could be testified to with more information, more conviction, more “proof”—and less tolerance for dissenting opinion. The same phenomenon seems to be occurring today.
A recent study in Nature found that the more informed that strong political partisans were about global warming, the less they agreed with one another.44
Meanwhile, if the quantity of information is increasing by 2.5 quintillion bytes per day, the amount of useful information almost certainly isn’t. Most of it is just noise, and the noise is increasing faster than the signal. There are so many hypotheses to test, so many data sets to mine—but a relatively constant amount of objective truth.
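The claim that noise grows faster than signal can be made concrete with a little arithmetic. The sketch below assumes a hypothetical research setting. A fixed number of effects are real (`n_true`). Each test has a 5 percent false-positive rate (`alpha`) and an 80 percent chance of catching a real effect (`power`). These values are illustrative assumptions, not figures from the book. Holding the truth fixed at ten real effects, the share of "significant" findings that are real falls from 64 percent with 100 hypotheses to under 2 percent with 10,000.

```python
def share_of_true_findings(n_hypotheses, n_true, alpha=0.05, power=0.8):
    """Expected fraction of 'significant' results that reflect a real effect.

    Assumes n_true of the n_hypotheses tested are real effects and the rest
    are pure noise; alpha and power are illustrative assumptions.
    """
    true_positives = power * n_true
    false_positives = alpha * (n_hypotheses - n_true)
    return true_positives / (true_positives + false_positives)

# The amount of truth stays fixed (10 real effects) while the search widens.
for n in (100, 1_000, 10_000):
    print(n, round(share_of_true_findings(n, n_true=10), 3))
```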
we can never make perfectly objective predictions. They will always be tainted by our subjective point of view. But this book is emphatically against the nihilistic viewpoint that there is no objective truth. It asserts, rather, that a belief in the objective truth—and a commitment to pursuing it—is the first prerequisite of making better predictions. The forecaster’s next commitment is to realize that she perceives it imperfectly. Prediction is important because it connects subjective and objective reality. Karl Popper, the philosopher of science, recognized this view.45 For Popper, a hypothesis was not scientific unless it was falsifiable—meaning that it could be tested in the real world by means of a prediction.
This attitude is embodied by something called Bayes’s theorem, which I introduce in chapter 8. Bayes’s theorem is nominally a mathematical formula. But it is really much more than that. It implies that we must think differently about our ideas—and how to test them. We must become more comfortable with probability and uncertainty. We must think more carefully about the assumptions and beliefs that we bring to a problem.
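The formula itself is short. The sketch below writes it for a yes-or-no hypothesis using three inputs: a prior probability that the hypothesis is true, the probability of seeing the evidence if it is true, and the probability of seeing it if it is false. The numbers in the usage lines are hypothetical. They show a 4 percent prior rising to roughly 29 percent after one piece of evidence and roughly 81 percent after a second.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after observing one piece of evidence."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator

# Hypothetical inputs: a 4% prior, and evidence that is ten times likelier
# if the hypothesis is true (50%) than if it is false (5%).
posterior = bayes_update(0.04, 0.50, 0.05)
# Yesterday's posterior becomes today's prior when new evidence arrives.
posterior_2 = bayes_update(posterior, 0.50, 0.05)
print(round(posterior, 3), round(posterior_2, 3))
```

Because one update's posterior becomes the next update's prior, the same function can process a stream of evidence one observation at a time.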
I am convinced, however, that the best way to view the financial crisis is as a failure of judgment—a catastrophic failure of prediction. These predictive failures were widespread, occurring at virtually every stage during, before, and after the crisis and involving everyone from the mortgage brokers to the White House.
“The housing crash was not a black swan. The housing crash was the elephant in the room.”
Human beings have an extraordinary capacity to ignore risks that threaten their livelihood, as though this will make them go away. So perhaps Deven Sharma’s claim isn’t so implausible—perhaps the ratings agencies really had missed the housing bubble, even if others hadn’t.
But suppose instead that there is some common factor that ties the fate of these homeowners together. For instance: there is a massive housing bubble that has caused home prices to rise by 80 percent without any tangible improvement in the fundamentals. Now you’ve got trouble: if one borrower defaults, the rest might succumb to the same problems. The risk of losing your bet has increased by orders of magnitude. The latter scenario was what came into being in the United States beginning in 2007 (we’ll conduct a short autopsy on the housing bubble later in this chapter). But it was the former assumption of largely uncorrelated risks that the ratings agencies had bet on.
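The gap between the two scenarios can be quantified with a toy pool of mortgages. The numbers are hypothetical: five loans, each with a 5 percent chance of default. If defaults are independent, all five fail only when five separate 5 percent events coincide, about 1 chance in 3.2 million. If a common factor ties the loans together perfectly, the whole pool fails 1 time in 20. That makes the risk roughly 160,000 times larger.

```python
from math import comb

def prob_at_least_k_independent(n, k, p):
    """P(at least k of n loans default) if each defaults independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_loans, p_default = 5, 0.05   # hypothetical pool

# Uncorrelated risks: all five must fail on their own.
independent = prob_at_least_k_independent(n_loans, n_loans, p_default)
# Perfectly correlated risks (one common factor): the loans fail together
# or not at all, so the whole pool fails with the same 5% probability.
correlated = p_default

print(f"all five default, independent: 1 in {1 / independent:,.0f}")
print(f"all five default, correlated:  1 in {1 / correlated:,.0f}")
print(f"risk multiplied by about {correlated / independent:,.0f}")
```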
In a broader sense, the ratings agencies’ problem was being unable to appreciate, or uninterested in appreciating, the distinction between risk and uncertainty. Risk, as first articulated in 1921 by the economist Frank H. Knight,45 is something that you can put a price on. Say that you’ll win a poker hand unless your opponent draws to an inside straight: the chances of that happening are exactly one chance in eleven.46 This is risk. It is not pleasant when you take a “bad beat” in poker, but at least you know the odds of it and can account for it ahead of time. In the long run, you’ll make a profit from your opponents making desperate draws with insufficient odds. Uncertainty, on the other hand, is risk that is hard to measure. You might have some vague awareness of the demons lurking out there. You might even be acutely concerned about them. But you have no real idea how many of them there are or when they might strike. Your back-of-the-envelope estimate might be off by a factor of 100 or by a factor of 1,000; there is no good way to know. This is uncertainty. Risk greases the wheels of a free-market economy; uncertainty grinds them to a halt.
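The "one chance in eleven" is countable risk. This is a minimal sketch under assumed conditions: heads-up Texas hold'em, one card to come, and both hands known, as in an all-in showdown. Subtracting two hands and four board cards from the 52-card deck leaves 44 unseen cards, and four of them complete an inside straight. The pot and bet sizes in the expected-value check are hypothetical.

```python
from fractions import Fraction

# Assumed setup: heads-up Texas hold'em, turn already dealt, both hands known.
unseen_cards = 52 - 2 - 2 - 4   # deck minus two hands minus four board cards
outs = 4                        # the four cards of the one rank that fills the gap
p_hit = Fraction(outs, unseen_cards)

def ev_of_call(p_win, pot, call):
    """Expected value of paying `call` for a chance to win `pot` (hypothetical amounts)."""
    return p_win * pot - (1 - p_win) * call

print(p_hit)                                 # 1/11
print(ev_of_call(p_hit, pot=100, call=20))   # -100/11: the draw lacks the odds
```

At 1 in 11, a call breaks even only when the pot is at least ten times the price of calling. Below that, the drawing player loses money over the long run, and that loss is the opponent's profit.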
Akerlof wrote a famous paper on this subject called “The Market for Lemons”78—it won him a Nobel Prize. In the paper, he demonstrated that in a market plagued by asymmetries of information, the quality of goods will decrease and the market will come to be dominated by crooked sellers and gullible or desperate buyers.
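The unraveling can be sketched with a common textbook parameterization. This parameterization is an assumption and is not the model in Akerlof's paper. Car quality is spread evenly between 0 and 1. A seller values a car at its quality q, and a buyer, who cannot observe q, values it at 1.5q. At any price, only owners of cars worth less than that price will sell, so buyers keep lowering what they will pay.

```python
def lemons_market(start_price=1.0, rounds=20, buyer_premium=1.5):
    """Iterate the price buyers will pay when they can't tell good cars from bad.

    Assumptions (hypothetical textbook model): quality q ~ Uniform(0, 1),
    sellers part with a car only if price >= q, buyers value a car at
    buyer_premium * q but see only the average quality on offer.
    """
    price = start_price
    history = [price]
    for _ in range(rounds):
        offered_max = min(price, 1.0)          # only cars with q <= price are for sale
        avg_quality = offered_max / 2          # mean of Uniform(0, offered_max)
        price = buyer_premium * avg_quality    # buyers pay what the average car is worth to them
        history.append(price)
    return history

prices = lemons_market()
print([round(p, 3) for p in prices[:5]], "...", round(prices[-1], 4))
```

Every car is worth 50 percent more to a buyer than to its owner, so each sale would help both sides. Even so, the price falls by a quarter each round and the market drifts toward zero.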
Summers thinks of the American economy as consisting of a series of feedback loops. One simple feedback is between supply and demand. Imagine that you are running a lemonade stand.83 You lower the price of lemonade and sales go up; raise it and they go down. If you’re making lots of profit because it’s 100 degrees outside and you’re the only lemonade stand on the block, the annoying kid across the street opens his own lemonade stand and undercuts your price. Supply and demand is an example of a negative feedback: as prices go up, sales go down.
Usually, in Summers’s view, negative feedbacks predominate in the American economy, behaving as a sort of thermostat that prevents it from going into recession or becoming overheated. Summers thinks one of the most important feedbacks is between what he calls fear and greed. Some investors have little appetite for risk and some have plenty, but their preferences balance out: if the price of a stock goes down because a company’s financial position deteriorates, the fearful investor sells his shares to a greedy one who is hoping to bottom-feed. Greed and fear are volatile quantities, however, and the balance can get out of whack. When there is an excess of greed in the system, there is a bubble. When there is an excess of fear, there is a panic.
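Negative and positive feedback can be contrasted with a toy market in which the price moves toward whatever level clears it. The demand and supply lines below are hypothetical. With ordinary downward-sloping demand, a price pushed too high falls back toward equilibrium, which is the thermostat behavior. If buyers instead want more as prices rise, as in a greed-driven bubble, the same adjustment rule pushes the price further away each round.

```python
def simulate_price(p0, demand_slope, steps=50, speed=0.1):
    """Toy price-adjustment loop (all parameters hypothetical).

    Demand = 100 + demand_slope * p and supply = 10 + 4 * p.
    Each step, the price rises when demand exceeds supply and falls otherwise.
    """
    p = p0
    for _ in range(steps):
        excess_demand = (100 + demand_slope * p) - (10 + 4 * p)
        p += speed * excess_demand
    return p

stable = simulate_price(p0=30, demand_slope=-2)   # negative feedback: settles near 15
bubble = simulate_price(p0=30, demand_slope=+6)   # positive feedback: runs away
print(round(stable, 4), f"{bubble:,.0f}")
```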
There were at least four major failures of prediction that accompanied the financial crisis. First, the housing bubble can be thought of as a poor prediction. Homeowners and investors thought that rising prices implied that home values would continue to rise, when in fact history suggested this made them prone to decline. Second, there was a failure on the part of the ratings agencies, as well as on the part of banks like Lehman Brothers, to understand how risky mortgage-backed securities were. Contrary to the assertions they made before Congress, the problem was not that the ratings agencies failed to see the housing bubble. Instead, their forecasting models were full of faulty assumptions and false confidence about the risk that a collapse in housing prices might present. Third, there was a widespread failure to anticipate how a housing crisis could trigger a global financial crisis. The global crisis had resulted from the high degree of leverage in the market, with $50 in side bets staked on every $1 that an American was willing to invest in a new home. Finally, in the immediate aftermath of the financial crisis, there was a failure to predict the scope of the economic problems that it might create. Economists and policy makers did not heed Reinhart and Rogoff’s finding that financial crises typically produce very deep and long-lasting recessions.
The problem, of course, is that of those 20,000 car trips, none occurred when you were anywhere near this drunk. Your sample size for drunk driving is not 20,000 trips but zero, and you have no way to use your past experience to forecast your accident risk. This is an example of an out-of-sample problem.
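A sample size of zero is useless, and the reason becomes clear once you compute how much a clean record actually tells you. The sketch below uses the exact binomial bound for zero events in n independent trials. For large n, this bound is close to the statistician's "rule of three," an upper 95 percent limit of about 3/n. Treating car trips as independent trials is an assumption. Twenty thousand accident-free sober trips put a tight cap on sober risk, but zero drunk trips leave that risk completely unconstrained.

```python
def upper_bound_zero_events(n_trials, confidence=0.95):
    """Largest per-trial probability still consistent (at `confidence`) with
    observing zero events in n_trials independent trials.

    Solves (1 - p) ** n = 1 - confidence for p; with no trials, nothing can be
    ruled out, so the bound is 1.0.
    """
    if n_trials == 0:
        return 1.0
    return 1 - (1 - confidence) ** (1 / n_trials)

sober = upper_bound_zero_events(20_000)   # about 3 / 20,000
drunk = upper_bound_zero_events(0)        # the relevant sample: no information
print(f"sober: {sober:.6f}  drunk: {drunk}")
```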
We forget—or we willfully ignore—that our models are simplifications of the world. We figure that if we make a mistake, it will be at the margin. In complex systems, however, mistakes are not measured in degrees but in whole orders of magnitude.
Was the failure to predict the collapse of the Soviet Union an anomaly, or does “expert” political analysis rarely live up to its billing? Philip Tetlock’s studies, which spanned more than fifteen years, were eventually published in the 2005 book Expert Political Judgment. Tetlock’s conclusion was damning. The experts in his survey—regardless of their occupation, experience, or subfield—had done barely any better than random chance, and they had done worse than even rudimentary statistical methods at predicting future political events.
While the experts’ performance was poor in the aggregate, however, Tetlock found that some had done better than others. On the losing side were those experts whose predictions were cited most frequently in the media. The more interviews that an expert had done with the press, Tetlock found, the worse his predictions tended to be. Another subgroup of experts had done relatively well, however. Tetlock, with his training as a psychologist, had been interested in the experts’ cognitive styles—how they thought about the world. So he administered some questions lifted from personality tests to all the experts. On the basis of their responses to these questions, Tetlock was able to classify his experts along a spectrum between what he called hedgehogs and foxes. The reference to hedgehogs and foxes comes from the title of an Isaiah Berlin essay on the Russian novelist Leo Tolstoy—The Hedgehog and the Fox. Berlin had in turn borrowed his title from a passage attributed to the Greek poet Archilochus: “The fox knows many little things, but the hedgehog knows one big thing.”
“What are the incentives for a public intellectual?” Tetlock asked me. “There are some academics who are quite content to be relatively anonymous. But there are other people who aspire to be public intellectuals, to be pretty bold and to attach nonnegligible probabilities to fairly dramatic change. That’s much more likely to bring you attention.” Big, bold, hedgehog-like predictions, in other words, are more likely to get you on television.
In fact, a little knowledge may be a dangerous thing in the hands of a hedgehog with a Ph.D. One of Tetlock’s more remarkable findings is that, while foxes tend to get better at forecasting with experience, the opposite is true of hedgehogs: their performance tends to worsen as they pick up additional credentials. Tetlock believes the more facts hedgehogs have at their command, the more opportunities they have to permute and manipulate them in ways that confirm their biases. The situation is analogous to what might happen if you put a hypochondriac in a dark room with an Internet connection. The more time that you give him, the more information he has at his disposal, the more ridiculous the self-diagnosis he’ll come up with; before long he’ll be mistaking a common cold for the bubonic plague.
When asked in general terms about how well Republicans were likely to do, there was almost no difference between the panelists. They differed profoundly, however, when asked about specific cases—these brought the partisan differences to the surface.23 Too much information can be a bad thing in the hands of a hedgehog.
Hedgehogs who have lots of information construct stories—stories that are neater and tidier than the real world, with protagonists and villains, winners and losers, climaxes and dénouements—and, usually, a happy ending for the home team.
“When the facts change, I change my mind,” the economist John Maynard Keynes famously said. “What do you do, sir?”
Quite a lot of evidence suggests that aggregate or group forecasts are more accurate than individual ones, often somewhere between 15 and 20 percent more accurate depending on the discipline. That doesn’t necessarily mean the group forecasts are good. (We’ll explore this subject in more depth later in the book.) But it does mean that you can benefit from applying multiple perspectives toward a problem.
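Part of the benefit is arithmetic: when forecasts are averaged, individual errors in opposite directions partly cancel. Under squared error, the averaged forecast can never do worse than the forecasters' average error, because squared loss is convex. How much better it does depends on how independent the forecasters' mistakes are. The four forecasts and the outcome below are hypothetical. The 15 to 20 percent figure in the text comes from empirical studies, not from this toy example.

```python
from statistics import mean

def squared_error(forecast, actual):
    return (forecast - actual) ** 2

forecasts = [52.0, 47.0, 58.0, 44.0]   # hypothetical individual forecasts
actual = 50.0                          # hypothetical outcome

individual = mean(squared_error(f, actual) for f in forecasts)
consensus = squared_error(mean(forecasts), actual)
print(f"average individual error: {individual}, error of the average: {consensus}")
```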
So why bother with the candidate interviews at all? Mostly, Wasserman is looking for red flags—like the time when the Democratic congressman Eric Massa (who would later abruptly resign from Congress after accusations that he sexually harassed a male staffer) kept asking Wasserman how old he was. The psychologist Paul Meehl called these “broken leg” cases—situations where there is something so glaring that it would be foolish not to account for it.42 Catching a few of these each year helps Wasserman to call a few extra races right. He is able to weigh the information from his interviews without overweighing it, which might actually make his forecasts worse. Whether information comes in a quantitative or qualitative flavor is not as important as how you use it.
It Isn’t Easy to Be Objective
In this book, I use the terms objective and subjective carefully. The word objective is sometimes taken to be synonymous with quantitative, but it isn’t. Instead it means seeing beyond our personal biases and prejudices and toward the truth of a problem.43 Pure objectivity is desirable but unattainable in this world. When we make a forecast, we have a choice from among many different methods. Some of these might rely solely on quantitative variables like polls, while approaches like Wasserman’s may consider qualitative factors as well. All of them, however, introduce decisions and assumptions that have to be made by the forecaster. Wherever there is human judgment there is the potential for bias. The way to become more objective is to recognize the influence that our assumptions play in our forecasts and to question ourselves about them. In politics, between our ideological predispositions and our propensity to weave tidy narratives from noisy data, this can be especially difficult. So you will need to adopt some different habits from the pundits you see on TV. You will need to learn how to express—and quantify—the uncertainty in your predictions. You will need to update your forecast as facts and circumstances change. You will need to recognize that there is wisdom in seeing the world from a different viewpoint. The more you are willing to do these things, the more capable you will be of evaluating a wide variety of information without abusing it. In short, you will need to learn how to think like a fox.
The second chore—separating skill from luck—requires more work. Baseball is designed in such a way that luck tends to predominate in the near term: even the best teams lose about one-third of their ball games, and even the best hitters fail to get on base three out of every five times. Sometimes luck will obscure a player’s real skill level even over the course of a whole year.
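A binomial back-of-the-envelope calculation shows why even a full season can be too short to reveal true skill. It assumes each plate appearance is an independent trial, which is a simplification. Take a hitter whose true on-base percentage is .400, meaning he fails to reach base three times in five. His observed OBP has a standard deviation of sqrt(p(1 - p)/n). Over a hypothetical 600 plate appearances, that comes to 20 points of OBP, so a genuine .400 hitter can post anywhere from about .360 to .440 through luck alone. The .600 team in the second line is likewise hypothetical.

```python
from math import sqrt

def obp_luck_sd(true_obp, plate_appearances):
    """Standard deviation of observed OBP if each PA is an independent trial."""
    return sqrt(true_obp * (1 - true_obp) / plate_appearances)

def wins_luck_sd(true_win_pct, games=162):
    """Standard deviation of season wins for a team of fixed true quality."""
    return sqrt(games * true_win_pct * (1 - true_win_pct))

sd = obp_luck_sd(0.400, 600)
print(f"OBP spread from luck: +/-{sd:.3f}; ~95% range {0.400 - 2*sd:.3f}-{0.400 + 2*sd:.3f}")
print(f"Wins spread for a true .600 team: +/-{wins_luck_sd(0.600):.1f}")
```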
The goal, as in formulating any prediction, is to weed out the root cause: striking batters out prevents them from getting on base, preventing them from getting on base prevents them from scoring runs, and preventing them from scoring runs prevents them from winning games.
Baseball offers perhaps the world’s richest data set: pretty much everything that has happened on a major-league playing field in the past 140 years has been dutifully and accurately recorded, and hundreds of players play in the big leagues every year. Meanwhile, although baseball is a team sport, it proceeds in a highly orderly way: pitchers take their turn in the rotation, hitters take their turn in the batting order, and they are largely responsible for their own statistics.* There are relatively few problems involving complexity and nonlinearity. The causality is easy to sort out.
By looking at statistics for thousands of players, James had discovered that the typical player9 continues to improve until he is in his late twenties, at which point his skills usually begin to atrophy, especially once he reaches his midthirties.10 This gave James one of his most important inventions: the aging curve. Olympic gymnasts peak in their teens; poets in their twenties; chess players in their thirties;11 applied economists in their forties;12 and the average age of a Fortune 500 CEO is fifty-five.13 A baseball player, James found, peaks at age twenty-seven.
Real aging curves are noisy—very noisy (figure 3-2). On average, they form a smooth-looking pattern. But the average, like the family with 1.7 children, is just a statistical abstraction. Perhaps, Gary Huckabay reasoned, there was some signal in the noise that James’s curve did not address.
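Synthetic data reproduces the contrast between noisy individual careers and a smooth average curve. Everything below is hypothetical: an invented true curve that peaks at 27, a random talent level for each player, and random year-to-year noise. Individual players' best seasons land at many different ages, while the average across many players recovers the peak.

```python
import random

def true_curve(age):
    """Hypothetical aging curve peaking at 27."""
    return 100 - 0.5 * (age - 27) ** 2

random.seed(7)
ages = range(21, 36)
players = []
for _ in range(1000):
    talent = random.gauss(0, 10)                # some players are simply better
    players.append({a: true_curve(a) + talent + random.gauss(0, 8) for a in ages})

# Each player's own best season is a noisy guess at his peak.
individual_peaks = [max(p, key=p.get) for p in players]
share_peaking_at_27 = individual_peaks.count(27) / len(players)

# Averaging across players cancels most of the noise.
average_curve = {a: sum(p[a] for p in players) / len(players) for a in ages}
average_peak = max(average_curve, key=average_curve.get)
print(f"players whose best year is 27: {share_peaking_at_27:.0%}; "
      f"average curve peaks at {average_peak}")
```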
By 2009 or so, however, the other systems were catching up and sometimes beating PECOTA. As I had borrowed from James and Huckabay, other researchers had borrowed some of PECOTA’s innovations while adding new wrinkles of their own. Some of these systems are very good. When you rank the best forecasts each year in terms of how well they predict the performance of major league players, the more advanced ones will now usually come within a percentage point or two of one another.27
The fuel of any ranking system is information—and being able to look at both scouting and statistical information means that you have more fuel. The only way that a purely stat-based prospect list should be able to beat a hybrid list is if the biases introduced by the process are so strong that they overwhelm the benefit. Scouts, after all, use a hybrid approach: they have access to more information than statistics alone.
“I love to evaluate,” he told me. “I’ve always enjoyed statistical proofs even way back in the day when we did it with calculators and adding machines.” Sanders relayed an anecdote: “One of the scouts said, ‘Well, let’s face it, guys, what’s the first thing we do when we go to the ballpark? We go to the press room, we get the stats. We get the stats! What’s wrong with that? That’s what you do.’ ” Statistics, indeed, have been a part of the fabric of baseball since the very beginning.
The scouts’ traditional alternative to statistics is the Five Tools: hitting for power, hitting for average, speed, arm strength, and defensive range.
Sanders’s focus is less on physical tools and more on usable, game-ready skills. The extent to which one can be translated to the other depends on what he calls a player’s mental toolbox. The mental tools are often slower to develop than the physical ones; Sanders’s wife is a special-needs educator and pointed him toward research suggesting that most of us are still in a state of mental adolescence until about the age of twenty-four.40 Before that age, Sanders will cut a player some slack if he sees signs that their mental tools are developing. After that, he needs to see performance. Interestingly, twenty-four is right about the age when a player is usually in Double-A and his performance starts to become more predictable from his statistics.
“What defines a good scout? Finding out information that other people can’t,” he told me. “Getting to know the kid. Getting to know the family. There’s just some things that you have to find out in person.”
This is the essence of Beane’s philosophy: collect as much information as possible, but then be as rigorous and disciplined as possible when analyzing it. The litmus test for whether you are a competent forecaster is if more information makes your predictions better.
This new technology will not kill scouting any more than Moneyball did, but it may change its emphasis toward the things that are even harder to quantify and where the information is more exclusive, like a player’s mental tools. Smart scouts like Sanders are already ahead of the curve.
Our first instinct is to place information into categories—usually a relatively small number of categories since they’ll be easier to keep track of. (Think of how the Census Bureau classifies people from hundreds of ethnic groups into just six racial categories or how thousands of artists are placed into a taxonomy of a few musical genres.)
In the most competitive industries, like sports, the best forecasters must constantly innovate. It’s easy to adopt a goal of “exploit market inefficiencies.” But that doesn’t really give you a plan for how to find them and then determine whether they represent fresh dawns or false leads. It’s hard to have an idea that nobody else has thought of. It’s even harder to have a good idea—and when you do, it will soon be duplicated. That is why this book shies away from promoting quick-fix solutions that imply you can just go about your business in a slightly different way and outpredict the competition. Good innovators typically think very big and they think very small. New ideas are sometimes found in the most granular details of a problem where few others bother to look. And they are sometimes found when you are doing your most abstract and philosophical thinking, considering why the world is the way that it is and whether there might be an alternative to the dominant paradigm.
Weather forecasting is one of the success stories in this book, a case of man and machine joining forces to understand and sometimes anticipate the complexities of nature. That we can sometimes predict nature’s course, however, does not mean we can alter it. Nor does a forecast do much good if there is no one willing to listen to it. The story of Katrina is one of human ingenuity and human error.
The idea takes on various forms, but no one took it further than Pierre-Simon Laplace, a French astronomer and mathematician. In 1814, Laplace made the following postulate, which later came to be known as Laplace’s Demon: We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.13 Given perfect knowledge of present conditions (“all positions of all items of which nature is composed”), and perfect knowledge of the laws that govern the universe (“all forces that set nature in motion”), we ought to be able to make perfect predictions (“the future just like the past would be present”). The movement of every particle in the universe should be as predictable as that of the balls on a billiard table.
Laplace’s Demon has been controversial for all its two-hundred-year existence. At loggerheads with the determinists are the probabilists, who believe that the conditions of the universe are knowable only with some degree of uncertainty.* Probabilism was, at first, mostly an epistemological paradigm: it avowed that there were limits on man’s ability to come to grips with the universe. More recently, with the discovery of quantum mechanics, scientists and philosophers have asked whether the universe itself behaves probabilistically. The particles Laplace sought to identify begin to behave like waves when you look closely enough—they seem to occupy no fixed position. How can you predict where something is going to go when you don’t know where it is in the first place? You can’t. This is the basis for the theoretical physicist Werner Heisenberg’s famous uncertainty principle.14 Physicists interpret the uncertainty principle in different ways, but it suggests that Laplace’s postulate cannot literally be true. Perfect predictions are impossible if the universe itself is random.
The first computer weather forecast was made in 1950 by the mathematician John von Neumann, who used a machine that could make about 5,000 calculations per second.
Although you need, roughly speaking, to get ahold of sixteen times more processing power in order to double the resolution of your weather forecast, processing power has been improving exponentially—doubling about once every two years.
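The arithmetic in this sentence can be worked through explicitly. One common rationale for the factor of sixteen is offered here as an assumption, since the text does not state it. Doubling resolution doubles the grid points along each of three spatial dimensions (2 × 2 × 2 = 8), and numerical stability then requires halving the time step (× 2). If processing power doubles every two years, a sixteenfold increase takes four doublings, or about eight years.

```python
from math import log2

def years_to_double_resolution(cost_factor=16, years_per_doubling=2):
    """Years of exponential hardware growth needed to pay for a cost_factor increase."""
    doublings_needed = log2(cost_factor)
    return doublings_needed * years_per_doubling

# Assumed breakdown of the factor of 16: three spatial dimensions plus time.
cost_factor = 2 ** 3 * 2
print(cost_factor, years_to_double_resolution(cost_factor))  # 16 8.0
```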
Chaos Theory
You may have heard the expression: the flap of a butterfly’s wings in Brazil can set off a tornado in Texas. It comes from the title of a paper19 delivered in 1972 by MIT’s Edward Lorenz, who began his career as a meteorologist. Chaos theory applies to systems in which two properties both hold: the systems are dynamic, meaning that the behavior of the system at one point in time influences its behavior in the future; and they are nonlinear, meaning they abide by exponential rather than additive relationships.
Lorenz and his team were working to develop a weather forecasting program on an early computer known as a Royal McBee LGP-30.21 They thought they were getting somewhere until the computer started spitting out erratic results. They began with what they thought was exactly the same data and ran what they thought was exactly the same code—but the program would forecast clear skies over Kansas in one run, and a thunderstorm in the next. After spending weeks double-checking their hardware and trying to debug their program, Lorenz and his team eventually discovered that their data wasn’t exactly the same: one of their technicians had truncated it in the third decimal place. Instead of having the barometric pressure in one corner of their grid read 29.5168, for example, it might instead read 29.517. Surely this couldn’t make that much difference? Lorenz realized that it could. The most basic tenet of chaos theory is that a small change in initial conditions—a butterfly flapping its wings in Brazil—can produce a large and unexpected divergence in outcomes—a tornado in Texas.
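Lorenz's discovery can be reproduced in a few lines. The sketch below does not use the program from the anecdote. Instead it integrates the three-variable convection model Lorenz published in 1963, with his classic parameters σ = 10, ρ = 28, β = 8/3. The two starting points have z coordinates that echo the rounding, 29.5168 versus 29.517. Treating z as a stand-in for the pressure reading is purely illustrative.

```python
import math

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step."""
    def shifted(base, k, h):
        return tuple(s + h * d for s, d in zip(base, k))
    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def separations(a, b, dt=0.01, steps=4000):
    """Distance between two trajectories after each step."""
    out = []
    for _ in range(steps):
        a, b = rk4_step(a, dt), rk4_step(b, dt)
        out.append(math.dist(a, b))
    return out

run_1 = (1.0, 1.0, 29.5168)
run_2 = (1.0, 1.0, 29.517)   # the same data, rounded to three decimal places
gaps = separations(run_1, run_2)
print(f"start: {math.dist(run_1, run_2):.4f}, after 1 time unit: {gaps[99]:.4f}, "
      f"largest: {max(gaps):.1f}")
```

The starting gap is two ten-thousandths. The runs track each other for a while, and then the gap grows to the size of the whole attractor, so the two runs describe entirely different "weather."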
This is the process by which modern weather forecasts are made. These small changes, introduced intentionally in order to represent the inherent uncertainty in the quality of the observational data, turn the deterministic forecast into a probabilistic one. For instance, if your local weatherman tells you that there’s a 40 percent chance of rain tomorrow, one way to interpret that is that in 40 percent of his simulations, a storm developed, and in the other 60 percent—using just slightly different initial parameters—it did not. It is still not quite that simple, however. The programs that meteorologists use to forecast the weather are quite good, but they are not perfect. Instead, the forecasts you actually see reflect a combination of computer and human judgment. Humans can make the computer forecasts better or they can make them worse.
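The ensemble idea can be sketched with a toy chaotic system standing in for a weather model. The sketch uses the logistic map x → 4x(1 - x). This model is an assumption, chosen only because it is chaotic and short. A "storm" here simply means the final value exceeds 0.5. Every ensemble member starts from the same observation plus a tiny random error, and the forecast probability is the fraction of members that end in a storm.

```python
import random

def toy_model(x, steps):
    """Hypothetical stand-in for a weather model: the chaotic logistic map."""
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
    return x

def ensemble_probability(observed, lead_time, members=2000, obs_error=1e-6, seed=0):
    """Share of perturbed runs that end in a 'storm' (final value > 0.5)."""
    rng = random.Random(seed)
    storms = 0
    for _ in range(members):
        start = observed + rng.uniform(-obs_error, obs_error)   # imperfect observation
        if toy_model(start, lead_time) > 0.5:
            storms += 1
    return storms / members

short_range = ensemble_probability(0.3, lead_time=1)    # members still agree
long_range = ensemble_probability(0.3, lead_time=60)    # members have scattered
print(short_range, long_range)
```

At a short lead time the members agree and the forecast is nearly certain. At a long lead time the tiny observation errors have grown until the ensemble is about as informative as a coin flip, a toy version of how chaos eventually takes over.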
The unique resource that these forecasters were contributing was their eyesight. It is a valuable tool for forecasters in any discipline—a visual inspection of a graphic showing the interaction between two variables is often a quicker and more reliable way to detect outliers in your data than a statistical test. It’s also one of those areas where computers lag well behind the human brain. Distort a series of letters just slightly—as with the CAPTCHA technology that is often used in spam or password protection—and very “smart” computers get very confused. They are too literal-minded, unable to recognize the pattern once it’s subjected to even the slightest degree of manipulation. Humans, by contrast, out of pure evolutionary necessity, have very powerful visual cortexes. They rapidly parse through any distortions in the data in order to identify abstract qualities like pattern and organization—qualities that happen to be very important in different types of weather systems.
The best forecasters, Hoke explained, need to think visually and abstractly while at the same time being able to sort through the abundance of information the computer provides them with. Moreover, they must understand the dynamic and nonlinear nature of the system they are trying to study. It is not an easy task, requiring vigorous use of both the left and right brain. Many of his forecasters would make for good engineers or good software designers, fields where they could make much higher incomes, but they choose to become meteorologists instead. The NWS keeps two different sets of books: one that shows how well the computers are doing by themselves and another that accounts for how much value the humans are contributing. According to the agency’s statistics, humans improve the accuracy of precipitation forecasts by about 25 percent over the computer guidance alone,31 and temperature forecasts by about 10 percent.32 Moreover, according to Hoke, these ratios have been relatively constant over time: as much progress as the computers have made, his forecasters continue to add value on top of it. Vision accounts for a lot.
After a little more than a week, Loft told me, chaos theory completely takes over, and the dynamic memory of the atmosphere erases itself. Although the following analogy is somewhat imprecise, it may help to think of the atmosphere as akin to a NASCAR oval, with various weather systems represented by individual cars that are running along the track. For the first couple of dozen laps around the track, knowing the starting order of the cars should allow us to make a pretty good prediction of the order in which they might pass by. Our predictions won’t be perfect—there’ll be crashes, pit stops, and engine failures that we’ve failed to account for—but they will be a lot better than random. Soon, however, the faster cars will start to lap the slower ones, and before long the field will be completely jumbled up.
Calibration is difficult to achieve in many fields. It requires you to think probabilistically, something that most of us (including most “expert” forecasters) are not very good at. It really tends to punish overconfidence—a trait that most forecasters have in spades. It also requires a lot of data to evaluate fully—cases where forecasters have issued hundreds of predictions.* Meteorologists meet this standard. They’ll forecast the temperatures, and the probability of rain and other precipitation, in hundreds of cities every day. Over the course of a year, they’ll make tens of thousands of forecasts. This sort of high-frequency forecasting is extremely helpful not just when we want to evaluate a forecast but also to the forecasters themselves—they’ll get lots of feedback on whether they’re doing something wrong and can change course accordingly.
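A calibration check can be sketched in a few lines: group forecasts by the probability that was stated, then compare each group's stated probability with how often the event actually happened. The data below are hypothetical, and the sketch assumes forecasts are issued as a small set of discrete probabilities so they can be bucketed exactly.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Map each stated probability to the observed frequency of the event
    among all forecasts that used that probability."""
    buckets = defaultdict(list)
    for stated, happened in zip(forecasts, outcomes):
        buckets[stated].append(1 if happened else 0)
    return {stated: sum(hits) / len(hits) for stated, hits in sorted(buckets.items())}

# Hypothetical record: ten "20 percent" forecasts and ten "90 percent" forecasts.
forecasts = [0.2] * 10 + [0.9] * 10
outcomes = [True] * 2 + [False] * 8 + [True] * 6 + [False] * 4
table = calibration_table(forecasts, outcomes)
print(table)
```

Here the 20 percent forecasts come true 20 percent of the time, while the 90 percent forecasts come true only 60 percent of the time, the kind of overconfidence the paragraph describes. With just ten forecasts per bucket, though, that gap could easily be luck, which is why calibration needs hundreds of predictions to evaluate.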
This logic is a little circular. TV weathermen say they aren’t bothering to make accurate forecasts because they figure the public won’t believe them anyway. But the public shouldn’t believe them, because the forecasts aren’t accurate.
One lesson from Katrina, however, is that accuracy is the best policy for a forecaster. It is forecasting’s original sin to put politics, personal glory, or economic benefit before the truth of the forecast. Sometimes it is done with good intentions, but it always makes the forecast worse.
The seismological community has a reputation for being very conservative. It was very slow to accept the theory of plate tectonics, for instance23—the now broadly accepted notion that the shifting of the earth’s continental plates is the primary cause for earthquakes—not adopting it into its canon until the 1960s even though it was proposed in 1912. Had Hough’s skepticism crossed the line into cynicism? The official position of the USGS is even more emphatic: earthquakes cannot be predicted.
A prediction is a definitive and specific statement about when and where an earthquake will strike: a major earthquake will hit Kyoto, Japan, on June 28. A forecast, by contrast, is a probabilistic statement, usually over a longer time scale: there is a 60 percent chance of an earthquake in Southern California over the next thirty years. The USGS’s official position is that earthquakes cannot be predicted. They can, however, be forecasted.
It turns out that these earthquakes display a stunning regularity when you graph them in a slightly different way. In figure 5-3b, I’ve changed the vertical axis—which shows the frequency of earthquakes of different magnitudes—into a logarithmic scale.* Now the earthquakes form what is almost exactly a straight line on the graph. This pattern is characteristic of what is known as a power-law distribution, and it is the relationship that Richter and Gutenberg uncovered.
Something that obeys this distribution has a highly useful property: you can forecast the number of large-scale events from the number of small-scale ones, or vice versa. In the case of earthquakes, it turns out that for every increase of one point in magnitude, an earthquake becomes about ten times less frequent. So, for example, magnitude 6 earthquakes occur ten times more frequently than magnitude 7’s, and one hundred times more often than magnitude 8’s. What’s more, the Gutenberg–Richter law generally holds across regions of the globe as well as over the whole planet. Suppose, for instance, that we wanted to make an earthquake forecast for Tehran, Iran. Fortunately, there hasn’t been a catastrophic earthquake there since its seismicity began to be measured. But there have been a number of medium-size ones; between 1960 and 2009, there were about fifteen earthquakes that measured between 5.0 and 5.9 on the magnitude scale in the area surrounding the city.31 That works out to about one for every three years. According to the power law that Gutenberg and Richter uncovered, that means that an earthquake measuring between 6.0 and 6.9 should occur about once every thirty years in Tehran.
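The Tehran arithmetic can be written as a small rule of thumb. This sketch assumes the rounded version of the Gutenberg–Richter relationship stated above (exactly ten times rarer per unit of magnitude) and uses the text's figures of about fifteen magnitude 5.0 to 5.9 quakes over the fifty years from 1960 to 2009; the function name is illustrative.

```python
def expected_interval_years(count_observed, years_observed, magnitude_step):
    """Average years between quakes that are `magnitude_step` units larger
    than the observed band, assuming each extra unit of magnitude makes
    quakes ten times rarer (the rounded Gutenberg-Richter rule)."""
    annual_rate = count_observed / years_observed
    return 1 / (annual_rate * 10 ** (-magnitude_step))

m5_interval = expected_interval_years(15, 50, 0)  # magnitude 5.0-5.9 band
m6_interval = expected_interval_years(15, 50, 1)  # magnitude 6.0-6.9 band
print(round(m5_interval, 1), round(m6_interval, 1))
```

This gives roughly one quake every 3.3 years in the observed band and one every 33 years a magnitude higher, matching the "about once every thirty years" in the text.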
What seismologists are really interested in—what Susan Hough calls the “Holy Grail” of seismology—are time-dependent forecasts, those in which the probability of an earthquake is not assumed to be constant across time. Even seismologists who are skeptical of the possibility of making time-dependent earthquake forecasts acknowledge that there are some patterns in the earthquake distribution. The most obvious is the presence of aftershocks. Large earthquakes are almost always followed by dozens or even thousands of aftershocks (the 2011 earthquake in Japan produced at least 1,200 of them). These aftershocks follow a somewhat predictable pattern.35 Aftershocks are more likely to occur immediately after an earthquake than days later, and more likely to occur days later than weeks after the fact.
What this means is that if San Francisco is forecasted to have a major earthquake every thirty-five years, it does not imply that these will be spaced out evenly (as in 1900, 1935, 1970). It’s safer to assume there is a 1 in 35 chance of an earthquake occurring every year, and that this rate does not change much over time regardless of how long it has been since the last one.
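Treating each year as an independent draw with the same 1 in 35 chance implies the following. This is a minimal sketch under exactly that constant-rate assumption; the function name is hypothetical.

```python
def prob_at_least_one(annual_prob, years):
    """Chance of at least one quake within `years`, treating every year as an
    independent draw with the same probability (no memory of the last quake)."""
    return 1 - (1 - annual_prob) ** years

p_next_year = prob_at_least_one(1 / 35, 1)
p_35_years = prob_at_least_one(1 / 35, 35)
print(round(p_next_year, 3), round(p_35_years, 3))
```

Under this model the chance of a major quake next year is about 2.9 percent whether the last one was a year ago or fifty years ago, and even over a full thirty-five years it is only about 64 percent, not a certainty.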
“I’m a failed predictor,” Bowman told me in 2010. “I did a bold and stupid thing—I made a testable prediction. That’s what we’re supposed to do, but it can bite you when you’re wrong.” Bowman’s idea had been to identify the root causes of earthquakes—stress accumulating along a fault line—and formulate predictions from there. In fact, he wanted to understand how stress was changing and evolving throughout the entire system; his approach was motivated by chaos theory. Chaos theory is a demon that can be tamed—weather forecasters did so, at least in part. But weather forecasters have a much better theoretical understanding of the earth’s atmosphere than seismologists do of the earth’s crust. They know, more or less, how weather works, right down to the molecular level. Seismologists don’t have that advantage. “It’s easy for climate systems,” Bowman reflected. “If they want to see what’s happening in the atmosphere, they just have to look up. We’re looking at rock. Most events occur at a depth of fifteen kilometers underground. We don’t have a hope of drilling down there, realistically—sci-fi movies aside. That’s the fundamental problem. There’s no way to directly measure the stress.”
What happens in systems with noisy data and underdeveloped theory—like earthquake prediction and parts of economics and political science—is a two-step process. First, people start to mistake the noise for a signal. Second, this noise pollutes journals, blogs, and news accounts with false alarms, undermining good science and setting back our ability to understand how the system really works.
In statistics, the name given to the act of mistaking noise for a signal is overfitting.
Suppose that you’re some sort of petty criminal and I’m your boss. I deputize you to figure out a good method for picking combination locks of the sort you might find in a middle school—maybe we want to steal everybody’s lunch money. I want an approach that will give us a high probability of picking a lock anywhere and anytime. I give you three locks to practice on—a red one, a black one, and a blue one. After experimenting with the locks for a few days, you come back and tell me that you’ve discovered a foolproof solution. If the lock is red, you say, the combination is 27-12-31. If it’s black, use the numbers 44-14-19. And if it’s blue, it’s 10-3-32. I’d tell you that you’ve completely failed in your mission. You’ve clearly figured out how to open these three particular locks. But you haven’t done anything to advance our theory of lock-picking—to give us some hope of picking them when we don’t know the combination in advance. I’d have been interested in knowing, say, whether there was a good type of paper clip for picking these locks, or some sort of mechanical flaw we can exploit. Or failing that, if there’s some trick to detect the combination: maybe certain types of numbers are used more often than others? You’ve given me an overly specific solution to a general problem. This is overfitting, and it leads to worse predictions.
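The lock example has a direct numerical analogue. In this hypothetical sketch the true signal is y = x and the training points carry a ±0.5 wobble of noise. A straight least-squares line is compared with a polynomial that passes through every training point exactly (Lagrange interpolation), the equivalent of memorizing the three combinations. Both are then scored on new points the models never saw.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line: a deliberately simple model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_exact(xs, ys):
    """Lagrange polynomial through every training point: zero training error,
    because it memorizes the noise along with the signal."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: signal y = x, plus alternating noise of +/-0.5.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [x + (0.5 if x % 2 == 0 else -0.5) for x in train_x]
test_x = [0.5, 4.5, 6]
test_y = list(test_x)  # new cases follow the true signal

line, exact = fit_line(train_x, train_y), fit_exact(train_x, train_y)
for name, model in [("line", line), ("exact", exact)]:
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

The exact fit looks perfect on the data it was built from and falls apart on the new points (especially x = 6, just outside the training range), while the humble line does nearly as well on new data as on old.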
Michael Babyak, who has written extensively on this problem,60 puts the dilemma this way: “In science, we seek to balance curiosity with skepticism.” This is a case of our curiosity getting the better of us.
Even if we had a thousand years of reliable seismological records, however, it might be that we would not get all that far. It may be that there are intrinsic limits on the predictability of earthquakes. Earthquakes may be an inherently complex process. The theory of complexity that the late physicist Per Bak and others developed is different from chaos theory, although the two are often lumped together. Instead, the theory suggests that very simple things can behave in strange and mysterious ways when they interact with one another. Bak’s favorite example was that of a sandpile on a beach. If you drop another grain of sand onto the pile (what could be simpler than a grain of sand?), it can actually do one of three things. Depending on the shape and size of the pile, it might stay more or less where it lands, or it might cascade gently down the small hill toward the bottom of the pile. Or it might do something else: if the pile is too steep, it could destabilize the entire system and trigger a sand avalanche. Complex systems seem to have this property, with long periods of apparent stasis marked by sudden and catastrophic failures. These processes may not literally be random, but they are so irreducibly complex (right down to the last grain of sand) that it just won’t be possible to predict them beyond a certain level.
Some economic forecasters wouldn’t want you to know that. Like forecasters in most other disciplines, they see uncertainty as the enemy—something that threatens their reputation. They don’t estimate it accurately, making assumptions that lower the amount of uncertainty in their forecast models but that don’t improve their predictions in the real world. This tends to leave us less prepared when a deluge hits.
Isn’t economics supposed to be the field that studies the rationality of human behavior? Sure, you might expect someone in another field—an anthropologist, say—to show bias when he makes a forecast. But not an economist. Actually, however, that may be part of the problem. Economists understand a lot about rationality—which means they also understand a lot about how our incentives work. If they’re making biased forecasts, perhaps this is a sign that they don’t have much incentive to make good ones.
As Hatzius sees it, economic forecasters face three fundamental challenges. First, it is very hard to determine cause and effect from economic statistics alone. Second, the economy is always changing, so explanations of economic behavior that hold in one business cycle may not apply to future ones. And third, as bad as their forecasts have been, the data that economists have to work with isn’t much good either.
How does an indicator that supposedly had just a 1-in-4,700,000 chance of failing flop so badly? For the same reason that, even though the odds of winning the Powerball lottery are only 1 chance in 195 million,30 somebody wins it every few weeks. The odds are hugely against any one person winning the lottery—but millions of tickets are bought, so somebody is going to get lucky. Likewise, of the millions of statistical indicators in the world, a few will have happened to correlate especially well with stock prices or GDP or the unemployment rate. If not the winner of the Super Bowl, it might be chicken production in Uganda. But the relationship is merely coincidental.
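The lottery logic can be made concrete. This sketch assumes the indicators are independent and uses a hypothetical count of ten million of them; each one has only a 1 in 4,700,000 chance of fitting the data this well by luck.

```python
import math

def chance_some_indicator_fits(p_single, n_indicators):
    """Probability that at least one of n independent indicators clears a bar
    that any single one clears by luck with probability p_single.
    log1p/expm1 keep the arithmetic accurate when p_single is tiny."""
    return -math.expm1(n_indicators * math.log1p(-p_single))

one = chance_some_indicator_fits(1 / 4_700_000, 1)
many = chance_some_indicator_fits(1 / 4_700_000, 10_000_000)
print(one, many)
```

A single indicator almost never gets that lucky, but across ten million of them the chance that at least one does is close to 90 percent.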
At its logical extreme, this is a bit like the observer effect (often mistaken for a related concept, the Heisenberg uncertainty principle): once we begin to measure something, its behavior starts to change. Most statistical models are built on the notion that there are independent variables and dependent variables, inputs and outputs, and they can be kept pretty much separate from one another.39 When it comes to the economy, they are all lumped together in one hot mess.
A forecaster should almost never ignore data, especially when she is studying rare events like recessions or presidential elections, about which there isn’t very much data to begin with. Ignoring data is often a tip-off that the forecaster is overconfident, or is overfitting her model—that she is interested in showing off rather than trying to be accurate.
The other rationale you’ll sometimes hear for throwing out data is that there has been some sort of fundamental shift in the problem you are trying to solve.
And this is exactly how the financial crisis played out. Not only was Hatzius’s forecast correct, but it was also right for the right reasons, explaining the causes of the collapse and anticipating the effects. Hatzius refers to this chain of cause and effect as a “story.” It is a story about the economy—and although it might be a data-driven story, it is one grounded in the real world.
Indeed, this exact property has been identified in the Blue Chip forecasts:66 one study terms the phenomenon “rational bias.”67 The less reputation you have, the less you have to lose by taking a big risk when you make a prediction. Even if you know that the forecast is dodgy, it might be rational for you to go after the big score. Conversely, if you have already established a good reputation, you might be reluctant to step too far out of line even when you think the data demands it.
“The way we think about it is if you take something like initial claims on unemployment insurance, that’s a very good predictor for unemployment rates, which is a good predictor for economic activity,” I was told by Google’s chief economist, Hal Varian, at Google’s headquarters in Mountain View, California. “We can predict unemployment initial claims earlier because if you’re in a company and a rumor goes around that there are going to be layoffs, then people start searching ‘where’s the unemployment office,’ ‘how am I going to apply for unemployment,’ and so on. It’s a slightly leading indicator.”
“In an MBA school you present this image of a manager as a great decision maker—the scientific decision maker. He’s got his spreadsheet and he’s got his statistical tests and he’s going to weigh the various options. But in fact real management is mostly about managing coalitions, maintaining support for a project so it doesn’t evaporate. If they put together a coalition to do a project, and then at the last minute the forecasts fluctuate, you can’t dump the project at the last minute, right?”
“Even academics aren’t very interested in collecting a track record of forecasts—they’re not very interested in making clear enough forecasts to score,” he says later. “What’s in it for them? The more fundamental problem is that we have a demand for experts in our society but we don’t actually have that much of a demand for accurate forecasts.”
We will revisit the idea of prediction markets in chapter 11; they are not a panacea, particularly if we make the mistake of assuming that they can never go wrong. But as Hanson says, they can yield some improvement by at least getting everyone’s incentives in order.
If you can’t make a good prediction, it is very often harmful to pretend that you can. I suspect that epidemiologists, and others in the medical community, understand this because of their adherence to the Hippocratic oath. Primum non nocere: First, do no harm.
As the statistician George E. P. Box wrote, “All models are wrong, but some models are useful.”90 What he meant by that is that all models are simplifications of the universe, as they must necessarily be. As another mathematician said, “The best model of a cat is a cat.”91 Everything else is leaving out some sort of detail. How pertinent that detail might be will depend on exactly what problem we’re trying to solve and on how precise an answer we require. Nor are statistical models the only tools we use that require us to make approximations about the universe. Language, for instance, is a type of model, an approximation that we use to communicate with one another. All languages contain words that have no direct cognate in other languages, even though all of them are trying to explain the same universe. Technical subfields have their own specialized language. To you or me, the color on the front cover of this book is yellow. To a graphic designer, that term is too approximate—instead, it’s Pantone 107.
Voulgaris’s big secret is that he doesn’t have a big secret. Instead, he has a thousand little secrets, quanta of information that he puts together one vector at a time. He has a program to simulate the outcome of each game, for instance. But he relies on it only if it suggests he has a very clear edge or it is supplemented by other information. He watches almost every NBA game—some live, some on tape—and develops his own opinions about which teams are playing up to their talent and which aren’t. He runs what is essentially his own scouting service, hiring assistants to chart every player’s defensive positioning on every play, giving him an advantage that even many NBA teams don’t have.
So Voulgaris is not just looking for patterns. Finding patterns is easy in any kind of data-rich environment; that’s what mediocre gamblers do. The key is in determining whether the patterns represent noise or signal. But although there isn’t any one particular key to why Voulgaris might or might not bet on a given game, there is a particular type of thought process that helps govern his decisions. It is called Bayesian reasoning.
The argument made by Bayes and Price is not that the world is intrinsically probabilistic or uncertain. Bayes was a believer in divine perfection; he was also an advocate of Isaac Newton’s work, which had seemed to suggest that nature follows regular and predictable laws. It is, rather, a statement—expressed both mathematically and philosophically—about how we learn about the universe: that we learn about it through approximation, getting closer and closer to the truth as we gather more evidence. This contrasted25 with the more skeptical viewpoint of the Scottish philosopher David Hume, who argued that since we could not be certain that the sun would rise again, a prediction that it would was inherently no more rational than one that it wouldn’t.26 The Bayesian viewpoint, instead, regards rationality as a probabilistic matter. In essence, Bayes and Price are telling Hume, don’t blame nature because you are too daft to understand it: if you step out of your skeptical shell and make some predictions about its behavior, perhaps you will get a little closer to the truth.
Laplace came to view probability as a waypoint between ignorance and knowledge. It seemed obvious to him that a more thorough understanding of probability was essential to scientific progress.32 The intimate connection between probability, prediction, and scientific progress was thus well understood by Bayes and Laplace in the eighteenth century—the period when human societies were beginning to take the explosion of information that had become available with the invention of the printing press several centuries earlier, and finally translate it into sustained scientific, technological, and economic progress.
First, you need to estimate the probability of the underwear’s appearing conditional on the hypothesis being true—that is, you are being cheated upon. Let’s assume for the sake of this problem that you are a woman and your partner is a man, and the underwear in question is a pair of panties. If he’s cheating on you, it’s certainly easy enough to imagine how the panties got there. Then again, even (and perhaps especially) if he is cheating on you, you might expect him to be more careful. Let’s say that the probability of the panties’ appearing, conditional on his cheating on you, is 50 percent.
Second, you need to estimate the probability of the underwear’s appearing conditional on the hypothesis being false. If he isn’t cheating, are there some innocent explanations for how they got there? Sure, although not all of them are pleasant (they could be his panties). It could be that his luggage got mixed up. It could be that a platonic female friend of his, whom you trust, stayed over one night. The panties could be a gift to you that he forgot to wrap up. None of these theories is inherently untenable, although some verge on dog-ate-my-homework excuses. Collectively you put their probability at 5 percent.
Third and most important, you need what Bayesians call a prior probability (or simply a prior). What is the probability you would have assigned to him cheating on you before you found the underwear? Of course, it might be hard to be entirely objective about this now that the panties have made themselves known. (Ideally, you establish your priors before you start to examine the evidence.) But sometimes, it is possible to estimate a number like this empirically. Studies have found, for instance, that about 4 percent of married partners cheat on their spouses in any given year,33 so we’ll set that as our prior. If we’ve estimated these values, Bayes’s theorem can then be applied to establish a posterior probability.
This is the number that we’re interested in: how likely is it that we’re being cheated on, given that we’ve found the underwear? The calculation (and the simple algebraic expression that yields it) is in figure 8-3. As it turns out, this probability is still fairly low: 29 percent.
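The calculation is short enough to write out. This sketch plugs the three estimates from the example (a 4 percent prior, a 50 percent chance of the evidence if the hypothesis is true, 5 percent if it is false) into Bayes’s theorem; the function name is illustrative.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes's theorem: probability of the hypothesis given the evidence."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

p = posterior(0.04, 0.50, 0.05)
print(f"{p:.0%}")
```

The numerator is 0.04 × 0.50 = 0.02 and the denominator adds 0.96 × 0.05 = 0.048, so the posterior is 0.02 / 0.068, or about 29 percent.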
When our priors are strong, they can be surprisingly resilient in the face of new evidence.
Usually, however, we focus on the newest or most immediately available information, and the bigger picture gets lost. Smart gamblers like Bob Voulgaris have learned to take advantage of this flaw in our thinking.
But the number of meaningful relationships in the data—those that speak to causality rather than correlation and testify to how the world really works—is orders of magnitude smaller. Nor is it likely to be increasing at nearly so fast a rate as the information itself; there isn’t any more truth in the world than there was before the Internet or the printing press. Most of the data is just noise, as most of the universe is filled with empty space.
Essentially, the frequentist approach toward statistics seeks to wash its hands of the reason that predictions most often go wrong: human error. It views uncertainty as something intrinsic to the experiment rather than something intrinsic to our ability to understand the real world. The frequentist method also implies that, as you collect more data, your error will eventually approach zero: this will be both necessary and sufficient to solve any problems. Many of the more problematic areas of prediction in this book come from fields in which useful data is sparse, and it is indeed usually valuable to collect more of it. However, it is hardly a golden road to statistical perfection if you are not using it in a sensible way. As Ioannidis noted, the era of Big Data only seems to be worsening the problems of false positive findings in the research literature.
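A minimal sketch of why more data is not a cure-all, under textbook assumptions: the observations are independent with standard deviation `sigma`, and every one of them shares a fixed systematic `bias` standing in for human error. Both values are hypothetical. The sampling error of an average shrinks as the sample grows; the bias does not.

```python
import math

def rms_error(sigma, n, bias):
    """Root-mean-square error of a sample mean of n independent observations
    with standard deviation sigma that all share a systematic bias.
    The sampling part shrinks like sigma / sqrt(n); the bias never shrinks."""
    return math.sqrt(bias ** 2 + sigma ** 2 / n)

for n in (10, 1_000, 100_000):
    print(n, round(rms_error(1.0, n, 0.0), 4), round(rms_error(1.0, n, 0.2), 4))
```

Without bias the error heads toward zero as n grows; with a bias of 0.2 it stalls at 0.2 no matter how much data is collected.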
Everyone can see the statistical patterns, and they are soon reflected in the betting line. The question is whether they represent signal or noise. Voulgaris forms hypotheses from his basketball knowledge so that he might tell the difference more quickly and more accurately. Voulgaris’s approach to betting basketball is one of the purer distillations of the scientific method that you’re likely to find (figure 8-7). He observes the world and asks questions: why are the Cleveland Cavaliers so frequently going over on the total? He then gathers information on the problem, and formulates a hypothesis: the Cavaliers are going over because Ricky Davis is in a contract year and is trying to play at a fast pace to improve his statistics. The difference between what Voulgaris does and what a physicist or biologist might do is that he demarcates his predictions by placing bets on them, whereas a scientist would hope to validate her prediction by conducting an experiment.
If you hold there is a 100 percent probability that God exists, or a 0 percent probability, then under Bayes’s theorem, no amount of evidence could persuade you otherwise.
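The same formula shows why certainty is immune to evidence. In this sketch the evidence is hypothetical and applied five times, each time assumed independent and nine times likelier if the hypothesis is true (0.9 versus 0.1).

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes's theorem: probability of the hypothesis given the evidence."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

def after_evidence(prior, rounds, p_true=0.9, p_false=0.1):
    """Apply the same (hypothetical) piece of evidence `rounds` times."""
    belief = prior
    for _ in range(rounds):
        belief = posterior(belief, p_true, p_false)
    return belief

for prior in (0.0, 0.01, 0.5, 1.0):
    print(prior, round(after_evidence(prior, rounds=5), 4))
```

A 1 percent prior climbs above 99 percent, but a prior of exactly 0 stays at 0 and a prior of exactly 1 stays at 1, even if the evidence is reversed to point the other way.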
I’m not here to tell you whether there are things you should believe with absolute and unequivocal certainty or not.* But perhaps we should be more honest about declaiming these. Absolutely nothing useful is realized when one person who holds that there is a 0 percent probability of something argues against another person who holds that the probability is 100 percent. Many wars—like the sectarian wars in Europe in the early days of the printing press—probably result from something like this premise. This does not imply that all prior beliefs are equally correct or equally valid. But I’m of the view that we can never achieve perfect objectivity, rationality, or accuracy in our beliefs. Instead, we can strive to be less subjective, less irrational, and less wrong. Making predictions based on our beliefs is the best (and perhaps even the only) way to test ourselves. If objectivity is the concern for a greater truth beyond our personal circumstances, and prediction is the best way to examine how closely aligned our personal perceptions are with that greater truth, the most objective among us are those who make the most accurate predictions.
Poe recognized just how impressive it might be for a machine to play chess at all. The first mechanical computer, what Charles Babbage called the difference engine, had barely been conceived of at the time that Poe wrote his exposé. Babbage’s proposed computer, which was never fully built during his lifetime, might at best hope to approximate some elementary functions like logarithms, besides carrying out addition, subtraction, multiplication, and division. Poe thought of Babbage’s work as impressive enough—but still, all it did was take predictable inputs, turn a few gears, and spit out predictable outputs. There was no intelligence there—it was purely mechanistic. A computer that could play chess, on the other hand, verged on being miraculous because of the judgment required to play the game well.
The father of the modern chess computer was MIT’s Claude Shannon, a mathematician regarded as the founder of information theory, who in 1950 published a paper called “Programming a Computer for Playing Chess.”8 Shannon identified some of the algorithms and techniques that form the backbone of chess programs today. He also recognized why chess is such an interesting problem for testing the powers of information-processing machines.
Both computer programs and human chess masters therefore rely on making simplifications to forecast the outcome of the game. We can think of these simplifications as “models,” but heuristics is the preferred term in the study of computer programming and human decision making. It comes from the same Greek root word from which we derive eureka.10 A heuristic approach to problem solving consists of employing rules of thumb when a deterministic solution to a problem is beyond our practical capacities. Heuristics are very useful things, but they necessarily produce biases and blind spots.11 For instance, the heuristic “When you encounter a dangerous animal, run away!”
A chess game, like everything else, has three parts: the beginning, the middle and the end. What’s a little different about chess is that each of these phases tests different intellectual and emotional skills, making the game a mental triathlon of speed, strength, and
But chess computers had long been rather poor at the opening phase of the game. Although the number of possibilities was the greatest, the objectives were also the least clear. When there are so many branches on the tree, calculating 3 moves per second or 200 million is about equally fruitless unless you are harnessing that power in a directed way. Both computer and human players need to break a chess game down into intermediate goals: for instance, capturing an opponent’s pawn or putting their king into check. In the middle of the match, once the pieces are engaged in combat and threaten one another, there are many such strategic objectives available. It is a matter of devising tactics to accomplish them, and forecasting which might have the most beneficial effects on the remainder of the game. The goals of the opening moves, however, are more abstract.
Computers struggle with abstract and open-ended problems, whereas humans understand heuristics like “control the center of the board” and “keep your pawns organized” and can devise any number of creative ways to achieve them.
Kasparov’s goal, therefore, in his first game of his six-game match against Deep Blue in 1997, was to take the program out of database-land and make it fly blind again. The opening move he played was fairly common; he moved his knight to the square of the board that players know as f3. Deep Blue responded on its second move by advancing its bishop to threaten Kasparov’s knight—undoubtedly because its databases showed that such a move had historically reduced white’s winning percentage* from 56 percent to 51 percent. Those databases relied on the assumption, however, that Kasparov would respond as almost all other players had when faced with the position,22 by moving his knight back out of the way. Instead, he ignored the threat, figuring that Deep Blue was bluffing,23 and chose instead to move one of his pawns to pave the way for his bishop to control the center of the board. Kasparov’s move, while sound strategically, also accomplished another objective. He had made just three moves and Deep Blue had made just two, and yet the position they had now achieved (illustrated in figure 9-2) had literally occurred just once before in master-level competition24 out of the hundreds of thousands of games in Deep Blue’s database. Even when very common chess moves are played, there are so many possible branches on the tree that databases are useless after perhaps ten or fifteen moves. In any long game of chess, it is quite likely that you and your opponent will eventually reach some positions that literally no two players in the history of humanity have encountered before. But Kasparov had taken the database out after just three moves. As we have learned throughout this book, purely statistical approaches toward forecasting are ineffective at best when there is not a sufficient sample of data to work with. Deep Blue would need to “think” for itself.
Great players like Kasparov do not delude themselves into thinking they can calculate all these possibilities. This is what separates elite players from amateurs. In his famous study of chess players, the Dutch psychologist Adriaan de Groot found that amateur players, when presented with a chess problem, often frustrated themselves by looking for the perfect move, rendering themselves incapable of making any move at all.26 Chess masters, by contrast, are looking for a good move—and certainly if at all possible the best move in a given position—but they are more concerned with forecasting how the move might favorably dispose their position than with trying to enumerate every possibility. It is “pure fantasy,” the American grandmaster Reuben Fine wrote,27 to assume that human chess players have calculated every position to completion twenty or thirty moves in advance.
Chess players learn through memory and experience where to concentrate their thinking. Sometimes this involves probing many branches of the tree but just a couple of moves down the line; at other times, they focus on just one branch but carry out the calculation to a much greater depth. This type of trade-off between breadth and depth is common anytime that we face a complicated problem. The Defense Department and the CIA, for instance, must decide whether to follow up on a broader array of signals in predicting and preventing potential terrorist attacks, or instead to focus on those consistent with what they view as the most likely threats. Elite chess players tend to be good at metacognition—thinking about the way they think—and correcting themselves if they don’t seem to be striking the right balance.
Computer chess machines, to some extent, get to have it both ways. They use heuristics to prune their search trees, focusing more of their processing power on more promising branches rather than calculating every one to the same degree of depth. But because they are so much faster at calculation, they don’t have to compromise as much, evaluating all the possibilities a little bit and the most important-seeming ones in greater detail. But computer chess programs can’t always see the bigger picture and think strategically. They are very good at calculating the tactics to achieve some near-term objective but not very strong at determining which of these objectives are most important in the grander scheme of the game. Kasparov tried to exploit the blind spots in Deep Blue’s heuristics by baiting it into mindlessly pursuing plans that did not improve its strategic position. Computer chess programs often prefer short-term objectives that can be broken down and quantized and that don’t require them to evaluate the chessboard as a holistic organism. A classic example of the computer’s biases is its willingness to accept sacrifices; it is often very agreeable when a strong player offers to trade a better piece for a weaker one. The heuristic “Accept a trade when your opponent gives up the more powerful piece” is usually a good one—but not necessarily when you are up against a player like Kasparov and he is willing to take the seemingly weaker side of the deal; he knows the tactical loss is outweighed by strategic gain. Kasparov offered Deep Blue such a trade thirty moves into his first game, sacrificing a rook for a bishop,* and to his delight Deep Blue accepted.
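The pruning idea can be sketched with alpha-beta search, one standard technique for skipping branches that cannot change the result. This is an illustrative stand-in, not Deep Blue's actual search: alpha-beta is exact, whereas the selective heuristics described above may also cut lines that merely look unpromising. The tree and its leaf scores are made up.

```python
import math
from typing import List, Union

# A game tree as nested lists: leaves are static evaluation scores from the
# maximizing side's point of view; inner lists are positions whose children
# are the positions reachable in one move.
Tree = Union[float, List["Tree"]]

def minimax(node: Tree, maximizing: bool, counter: List[int]) -> float:
    """Plain minimax: evaluates every leaf. counter[0] tallies leaf evaluations."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node: Tree, maximizing: bool, alpha: float, beta: float,
              counter: List[int]) -> float:
    """Minimax with alpha-beta pruning: stops exploring a position once it
    is clear the opponent (or we) would never steer the game into it."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, counter))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing side will avoid this line; prune the rest
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, counter))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing side already has something better elsewhere
    return best

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
full, pruned = [0], [0]
print(minimax(tree, True, full), full[0])                              # 3 9
print(alphabeta(tree, True, -math.inf, math.inf, pruned), pruned[0])   # 3 7
```

Both searches agree on the value, but alpha-beta skips two leaves here. Trying the most promising moves first lets it cut even more, which is one place where move-ordering heuristics pay off.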
Literally all positions in which there are six or fewer pieces on the board have been solved to completion. Work on seven-piece positions is mostly complete—some of the solutions are intricate enough to require as many as 517 moves—but computers have memorized exactly which are the winning, losing, and drawing ones. Thus, something analogous to a black hole has emerged by this stage of the game: a point beyond which the gravity of the game tree becomes inescapable, when the computer will draw all positions that should be drawn and win all of them that should be won. The abstract goals of this autumnal phase of a chess game are replaced by a set of concrete ones: get your queenside pawn to here, and you will win; induce black to move his rook there, and you will draw.
But this had been something different: a tactical error in a relatively simple position—exactly the sort of mistake that computers don’t make.
“Positions that are good for computers are complex positions with lots of pieces on the board so there’s lots of legal moves available,” Campbell told me. “We want the positions where tactics are more important than strategy. So you can do some minor things to encourage that.” In this sense, Deep Blue was more “human” than any chess computer before or since. Although game theory does not come into play in chess to the same degree it does in games of incomplete information like poker, the opening sequences are one potential exception. Making a slightly inferior move to throw your opponent off-balance can undermine months of his preparation time—or months of yours if he knows the right response to it. But most computers try to play “perfect” chess rather than varying their game to match up well against their opponent. Deep Blue instead did what most human players do and leaned into positions where Campbell thought it might have a comparative advantage.
“There is a well-understood algorithm to solve chess,” Campbell told me. “I could probably write the program in half a day that could solve the game if you just let it run long enough.” In practice, however, “it takes the lifetime of the universe to do that,” he lamented.
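The brute-force procedure Campbell alludes to is, in essence, exhaustive game-tree search: a position is a forced win if some move leads to a position that is lost for the opponent. The sketch below applies that recursion to a hypothetical toy game (a pile of stones; players alternately remove 1, 2, or 3; whoever takes the last stone wins) that is small enough to finish. The memoization cache plays roughly the role that endgame tablebases play in chess; for chess itself, the same recursion would have to visit an astronomical number of positions.

```python
from functools import lru_cache

# Hypothetical toy game standing in for chess: each turn a player removes
# 1, 2, or 3 stones, and whoever takes the last stone wins.
MOVES = (1, 2, 3)

@lru_cache(maxsize=None)
def side_to_move_wins(stones: int) -> bool:
    """Exhaustive search: True if the player to move can force a win."""
    # A position is won if some legal move leaves the opponent in a lost
    # position. With zero stones there are no moves, so any() is False:
    # the opponent has just taken the last stone.
    return any(not side_to_move_wins(stones - m) for m in MOVES if m <= stones)

print(side_to_move_wins(10))                                 # True
print([n for n in range(13) if not side_to_move_wins(n)])    # [0, 4, 8, 12]
```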
My general advice, in the broader context of forecasting, is to lean heavily toward the “bug” interpretation when your model produces an unexpected or hard-to-explain result. It is too easy to mistake noise for a signal. Bugs can undermine the hard work of even the strongest forecasters.
People marveled at Fischer’s skill because he was so young, but perhaps it was for exactly that reason that he found the moves: he had the full breadth of his imagination at his disposal. The blind spots in our thinking are usually of our own making and they can grow worse as we age. Computers have their blind spots as well, but they can avoid these failures of the imagination by at least considering all possible moves.
But this does not mean that computers produce perfect forecasts, or even necessarily good ones. The acronym GIGO (“garbage in, garbage out”) sums up this problem. If you give a computer bad data, or devise a foolish set of instructions for it to analyze, it won’t spin straw into gold.
“There are these experiments running all the time,” said Hal Varian, the chief economist at Google, when I met him there. “You should think of it as more of an organism, a living thing. I have said that we should be concerned about what happens when it comes alive, like Skynet.* But we made a deal with the governor of California”—at the time, Arnold Schwarzenegger—“to come and aid us.”
“So Google is doing on a rough order of ten thousand experiments a year.” Some of these experiments are highly visible, occasionally involving the rollout of a whole new product line. But most are barely noticeable: moving the placement of a logo by a few pixels, or slightly permuting the background color on an advertisement, and then seeing what effect that has on click-throughs or monetization. Many of the experiments are applied to as few as 0.5 percent of Google’s users, depending on how promising the idea seems to be. When you search for a term on Google, you probably don’t think of yourself as participating in an experiment. But from Google’s standpoint, things are a little different. The search results that Google returns, and the order in which they appear on the page, represent its prediction about which results you will find most useful. How is a subjective-seeming quality like “usefulness” measured and predicted? If you search for a term like best new mexican restaurant, does that mean you are planning a trip to Albuquerque? That you are looking for a Mexican restaurant that opened recently? That you want a Mexican restaurant that serves Nuevo Latino cuisine? You probably should have formed a better search query, but since you didn’t, Google could convene a panel of 1,000 people who made the same request, show them a wide variety of Web pages, and have them rate the utility of each one on a scale of 0 to 10. Then Google would display the pages to you in order of the highest to lowest average rating.
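One common way to read an experiment like this is a two-proportion z-test on click-through rates. The sketch below is an assumption about method, not a description of Google's internal tooling, and the traffic numbers are hypothetical: a control arm of 1,000,000 views and a treatment arm of 5,000 views, roughly 0.5 percent of the total.

```python
import math

def two_proportion_z_test(clicks_a: int, views_a: int, clicks_b: int, views_b: int):
    """Return (z, two-sided p-value) for H0: both arms share one click-through rate."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Under H0 the best estimate of the common rate pools both arms.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: control keeps the old logo position, treatment moves it.
z, p = two_proportion_z_test(clicks_a=30_000, views_a=1_000_000,
                             clicks_b=175, views_b=5_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With only 5,000 views in the treatment arm, even a half-point lift (3.0 to 3.5 percent) is only borderline significant under these made-up numbers (p of roughly 0.04), which is one reason a small rollout works better as a cheap screen than as a final verdict.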
In fact, bluffing and aggressive play are not just a luxury in poker but a necessity; otherwise your play is just too predictable. Poker games have become extremely aggressive since I stopped playing regularly five years ago, and game theory13 as well as computer simulations14 strongly suggest this is the optimal approach.
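The game-theoretic case for bluffing can be seen in a standard one-street toy model (an illustrative assumption, not necessarily the models behind the cited studies): a bettor puts `bet` chips into a pot of `pot` chips, and the opponent holds a hand that beats only a bluff. Each side's optimal frequency is the one that leaves the other side indifferent.

```python
from fractions import Fraction

def optimal_bluff_share(pot: int, bet: int) -> Fraction:
    """Fraction of the bettor's bets that should be bluffs."""
    # The caller risks `bet` to win `pot + bet`. He is indifferent between
    # calling and folding when
    #   bluff_share * (pot + bet) == (1 - bluff_share) * bet,
    # which solves to bet / (pot + 2 * bet).
    return Fraction(bet, pot + 2 * bet)

def optimal_call_frequency(pot: int, bet: int) -> Fraction:
    """How often the caller should call so that a pure bluff just breaks even."""
    # A bluff wins `pot` when the caller folds and loses `bet` when called:
    #   (1 - call) * pot == call * bet  ->  call = pot / (pot + bet).
    return Fraction(pot, pot + bet)

for pot, bet in [(100, 50), (100, 100), (100, 200)]:
    print(pot, bet, optimal_bluff_share(pot, bet), optimal_call_frequency(pot, bet))
```

With a pot-sized bet, a third of bets should be bluffs and the opponent should call half the time. A player who never bluffs lets the opponent fold every bluff-catcher against his bets at no cost, which is the predictability the text warns about.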
What you see is a graph that consists of effort on one axis and accuracy on the other. You could label the axes differently (for instance, experience on one and skill on the other), but the same general idea holds. By effort or experience I mean the amount of money, time, or critical thinking that you are willing to devote toward a predictive problem. By accuracy or skill I mean how reliable the predictions will prove to be in the real world. The name for the curve comes from the well-known business maxim called the Pareto principle or 80-20 rule (as in: 80 percent of your profits come from 20 percent of your customers16). As I apply it here, it posits that getting a few basic things right can go a long way. In poker, for instance, simply learning to fold your worst hands, bet your best ones, and make some effort to consider what your opponent holds will substantially mitigate your losses. If you are willing to do this, then perhaps 80 percent of the time you will be making the same decision as one of the best poker players, such as Dwan, even if you have spent only 20 percent as much time studying the game. This relationship also holds in many other disciplines in which prediction is vital. The first 20 percent often begins with having the right data, the right technology, and the right incentives. You need to have some information (ideally more of it rather than less), and you need to make sure that it is quality-controlled.
Then you might progress to a few intermediate steps, developing some rules of thumb (heuristics) that are grounded in experience and common sense and some systematic process to make a forecast rather than doing so on an ad hoc basis. These things aren’t exactly easy—many people get them wrong. But they aren’t hard either, and by doing them you may be able to make predictions 80 percent as reliable as those of the world’s foremost expert. Sometimes, however, it is not so much how good your predictions are in an absolute sense that matters but how good they are relative to the competition.
In cases like these, it can require a lot of extra effort to beat the competition. You will find that you soon encounter diminishing returns. The extra experience that you gain, the further wrinkles that you add to your strategy, and the additional variables that you put into your forecasting model—these will only make a marginal difference. Meanwhile, the helpful rules of thumb that you developed—now you will need to learn the exceptions to them. However, when a field is highly competitive, it is only through this painstaking effort around the margin that you can make any money. There is a “water level” established by the competition and your profit will be like the tip of an iceberg: a small sliver of competitive advantage floating just above the surface, but concealing a vast bulwark of effort that went in to support it. I’ve tried to avoid these sorts of areas. Instead, I’ve been fortunate enough to take advantage of fields where the water level was set pretty low, and getting the basics right counted for a lot.
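The diminishing returns along the Pareto curve can be illustrated with a hypothetical power function, accuracy = effort ** k, with k chosen so that 20 percent of the effort yields 80 percent of the accuracy. The functional form is an assumption made only to show the shape; the text does not specify one.

```python
import math

# Hypothetical curve: accuracy = effort ** K, where both are fractions of an
# expert's level. K is chosen so that 0.2 ** K == 0.8 (the 80-20 rule).
K = math.log(0.8) / math.log(0.2)

def accuracy(effort: float) -> float:
    """Fraction of expert-level accuracy reached for a fraction of expert-level effort."""
    return effort ** K

for effort in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"effort {effort:.0%} -> accuracy {accuracy(effort):.1%}")
```

Under this toy curve, the last 20 percent of effort (from 80 to 100 percent) buys only about 3 points of accuracy, while the step from 20 to 40 percent buys about 8. Where the competition's "water level" sits determines whether those last few points are worth paying for.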
If you have strong analytical skills that might be applicable in a number of disciplines, it is very much worth considering the strength of the competition. It is often possible to make a profit by being pretty good at prediction in fields where the competition succumbs to poor incentives, bad habits, or blind adherence to tradition—or because you have better data or technology than they do. It is much harder to be very good in fields where everyone else is getting the basics right—and you may be fooling yourself if you think you have much of an edge. In general, society does need to make the extra effort at prediction, even though it may entail a lot of hard work with little immediate reward—or we need to be more aware that the approximations we make come with trade-offs. But if you’re approaching prediction as more of a business proposition, you’re usually better off finding someplace where you can be the big fish in a small pond.
In the United States, we live in a very results-oriented society. If someone is rich or famous or beautiful, we tend to think they deserve to be those things. Often, in fact, these factors are self-reinforcing: making money begets more opportunities to make money; being famous provides someone with more ways to leverage their celebrity; standards of beauty may change with the look of a Hollywood starlet. This is not intended as a political statement, an argument for (or against) greater redistribution of wealth or anything like that. As an empirical matter, however, success is determined by some combination of hard work, natural talent, and a person’s opportunities and environment—in other words, some combination of noise and signal.
In fact, free-market capitalism and Bayes’s theorem come out of something of the same intellectual tradition. Adam Smith and Thomas Bayes were contemporaries, and both were educated in Scotland and were heavily influenced by the philosopher David Hume. Smith’s “invisible hand” might be thought of as a Bayesian process, in which prices are gradually updated in response to changes in supply and demand, eventually reaching some equilibrium. Or, Bayesian reasoning might be thought of as an “invisible hand” wherein we gradually update and improve our beliefs as we debate our ideas, sometimes placing bets on them when we can’t agree. Both are consensus-seeking processes that take advantage of the wisdom of crowds.
Nevertheless, there is strong empirical and theoretical evidence that there is a benefit in aggregating different forecasts. Across a number of disciplines, from macroeconomic forecasting to political polling, simply taking an average of everyone’s forecast rather than relying on just one has been found to reduce forecast error,14 often by about 15 or 20 percent. But before you start averaging everything together, you should understand three things. First, while the aggregate forecast will essentially always be better than the typical individual’s forecast, that doesn’t necessarily mean it will be good. For instance, aggregate macroeconomic forecasts are much too crude to predict recessions more than a few months in advance. They are somewhat better than individual economists’ forecasts, however. Second, the most robust evidence indicates that this wisdom-of-crowds principle holds when forecasts are made independently before being averaged together. In a true betting market (including the stock market), people can and do react to one another’s behavior. Under these conditions, where the crowd begins to behave more dynamically, group behavior becomes more complex. Third, although the aggregate forecast is better than the typical individual’s forecast, it does not necessarily hold that it is better than the best individual’s forecast. Perhaps there is some polling firm, for instance, whose surveys are so accurate that it is better to use their polls and their polls alone rather than dilute them with numbers from their less-accurate peers.
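Under squared error, the first and third points can be made exact: the consensus error equals the average individual error minus the spread (diversity) of the forecasts, an identity sometimes called the diversity prediction theorem. The sketch below uses five hypothetical GDP-growth forecasts and a hypothetical actual value of 2.0. It does not capture the second point, since the identity holds whether or not the forecasts were made independently.

```python
from statistics import fmean, pvariance
from typing import Dict, List

def squared_error(forecast: float, actual: float) -> float:
    """Squared difference between a forecast and the realized value."""
    return (forecast - actual) ** 2

def crowd_report(forecasts: List[float], actual: float) -> Dict[str, float]:
    """Compare the averaged (consensus) forecast with the individual ones."""
    consensus = fmean(forecasts)
    individual_errors = [squared_error(f, actual) for f in forecasts]
    return {
        "consensus": consensus,
        "crowd_error": squared_error(consensus, actual),
        "average_individual_error": fmean(individual_errors),
        "best_individual_error": min(individual_errors),
        # Spread of the forecasts around their own mean; always >= 0, so the
        # crowd can never do worse than the average individual.
        "diversity": pvariance(forecasts, mu=consensus),
    }

# Hypothetical forecasts of GDP growth (percent) from five economists.
report = crowd_report([1.0, 2.5, 3.0, 1.8, 2.9], actual=2.0)
for name, value in report.items():
    print(f"{name:>24}: {value:.4f}")
```

The consensus of 2.24 misses by less than the average economist, as the identity guarantees, but the economist who said 1.8 did better still: the aggregate beats the typical forecaster, not necessarily the best one.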
When this property has been studied over the long run, however, the aggregate forecast has often beaten even the very best individual forecast. A study of the Blue Chip Economic Indicators survey, for instance, found that the aggregate forecast was better over a multiyear period than the forecasts issued by any one of the seventy economists that made up the panel.15 Another study by Wolfers, looking at predictions of NFL football games, found that the consensus forecasts produced by betting markets were better than about 99.5 percent of those from individual handicappers.16 And this is certainly true of political polling; models that treat any one poll as the Holy Grail are more prone to embarrassing failures.17
After looking at enough of this type of data, Fama refined his hypothesis to cover three distinct cases,31 each one making a progressively bolder claim about the predictability of markets. First, there is the weak form of efficient-market hypothesis. It claims that stock-market prices cannot be predicted by analyzing past statistical patterns alone. In other words, the chartist’s techniques are bound to fail. The semistrong form of efficient-market hypothesis takes things a step further. It argues that fundamental analysis (actually looking at publicly available information on a company’s financial statements, its business model, macroeconomic conditions, and so forth) is also bound to fail and will also not produce returns that consistently beat the market. Finally, there is the strong form of efficient-market hypothesis, which claims that even private information, such as insider secrets, will quickly be incorporated into market prices and will not produce above-average returns. This version of efficient-market hypothesis is meant more as the logical extreme of the theory and is not believed literally by most proponents of efficient markets (including Fama).32 There is fairly unambiguous evidence, instead, that insiders make above-average returns.
One disturbing example is that members of Congress, who often gain access to inside information about a company while they are lobbied and who also have some ability to influence the fate of companies through legislation, return a profit on their investments that beats market averages by 5 to 10 percent per year,33 a remarkable rate that would make even Bernie Madoff blush.
These statistics represent a potential complication for efficient-market hypothesis: when it’s not your own money on the line but someone else’s, your incentives may change. Under some circumstances, in fact, it may be quite rational for traders to take positions that lose money for their firms and their investors if it allows them to stay with the herd and reduces their chance of getting fired.70 There is significant theoretical and empirical evidence71 for herding behavior among mutual funds and other institutional investors.72 “The answer as to why bubbles form,” Blodget told me, “is that it’s in everybody’s interest to keep markets going up.”
One conceit of economics is that markets as a whole can perform fairly rationally, even if many of the participants within them are irrational. But irrational behavior in the markets may result precisely because individuals are responding rationally according to their incentives. So long as most traders are judged on the basis of short-term performance, bubbles involving large deviations of stock prices from their long-term values are possible—and perhaps even inevitable.
The heuristic of “follow the crowd, especially when you don’t know any better” usually works pretty well. And yet, there are those times when we become too trusting of our neighbors: as in the 1980s “Just Say No” commercials, we do something because Everyone Else Is Doing It Too. Instead of our mistakes canceling one another out, which is the idea behind the wisdom of crowds,74 they begin to reinforce one another and spiral out of control. The blind lead the blind and everyone falls off a cliff. This phenomenon occurs fairly rarely, but it can be quite disastrous when it does.
But, if you think a market is efficient—efficient enough that you can’t really beat it for a profit—then it would be irrational for you to place any trades. In fact, efficient-market hypothesis is intrinsically somewhat self-defeating. If all investors believed the theory—that they can’t make any money from trading since the stock market is unbeatable—there would be no one left to make trades and therefore no market at all.
Some theorists have proposed that we should think of the stock market as constituting two processes in one.98 There is the signal track, the stock market of the 1950s that we read about in textbooks. This is the market that prevails in the long run, with investors making relatively few trades, and prices well tied down to fundamentals. It helps investors to plan for their retirement and helps companies capitalize themselves. Then there is the fast track, the noise track, which is full of momentum trading, positive feedbacks, skewed incentives, and herding behavior. Usually it is just a rock-paper-scissors game that does no real good to the broader economy, but perhaps also no real harm. It’s just a bunch of sweaty traders passing money around. However, these tracks happen to run along the same road, as though some city decided to hold a Formula 1 race but by some bureaucratic oversight forgot to close one lane to commuter traffic. Sometimes, like during the financial crisis, there is a big accident, and regular investors get run over. This sort of duality, what the physicist Didier Sornette calls “the fight between order and disorder,”99 is common in complex systems, which are those governed by the interaction of many separate individual parts.
Complex systems like these can at once seem very predictable and very unpredictable. Earthquakes are very well described by a few simple laws (we have a very good idea of the long-run frequency of a magnitude 6.5 earthquake in Los Angeles). And yet they are essentially unpredictable from day to day. Another characteristic of these systems is that they periodically undergo violent and highly nonlinear* phase changes from orderly to chaotic and back again. For Sornette and others who take highly mathematical views of the market, the presence of periodic bubbles seems more or less inevitable, an intrinsic property of the system.
I am partial toward this perspective. My view of trading markets (and of free-market capitalism more generally) is the same as Winston Churchill’s attitude toward democracy.100 I think it’s the worst economic system ever invented, except for all the other ones. Markets do a good job most of the time, but I don’t think we’ll ever be rid of bubbles.
Can we know when we’re in that 10 percent phase? Then we might hope to profit from bubbles. Or, less selfishly, we could create softer landings that lessened the need for abhorrent taxpayer bailouts. Bubble detection does not seem so hopeless. I don’t think we’re ever going to bat 100 percent, or even 50 percent, but I think we can get somewhere.
We should examine the evidence and articulate what might be thought of as healthy skepticism toward climate predictions. As you will see, this kind of skepticism does not resemble the type that is common in blogs or in political arguments over global warming.
It would be nice if we could just plug data into a statistical model, crunch the numbers, and take for granted that it was a good representation of the real world. Under some conditions, especially in data-rich fields like baseball, that assumption is fairly close to being correct. In many other cases, a failure to think carefully about causality will lead us up blind alleys.
First, it claims that atmospheric concentrations of greenhouse gases like CO2 are increasing, and as a result of human activity. This is a matter of simple observation. Many industrial processes, particularly the use of fossil fuels, produce CO2 as a by-product.18 Because CO2 remains in the atmosphere for a long time, its concentrations have been rising: from about 315 parts per million (ppm) when CO2 levels were first directly monitored at the Mauna Loa Observatory in Hawaii in 1959 to about 390 ppm as of 2011.19 The second claim, “these increases will enhance the greenhouse effect, resulting on average in additional warming of the Earth’s surface,” is essentially just a restatement of the IPCC’s first conclusion that the greenhouse effect exists, phrased in the form of a prediction.
The third claim—that water vapor will also increase along with gases like CO2, thereby enhancing the greenhouse effect—is modestly bolder. Water vapor, not CO2, is the largest contributor to the greenhouse effect.21 If there were an increase in CO2 alone, there would still be some warming, but not as much as has been observed to date or as much as scientists predict going forward.
One type of skepticism flows from self-interest. In 2011 alone, the fossil fuel industry spent about $300 million on lobbying activities (roughly double what they’d spent just five years earlier).32, * Some climate scientists I later spoke with for this chapter used conspiratorial language to describe their activities. But there is no reason to allege a conspiracy when an explanation based on rational self-interest will suffice: these companies have a financial incentive to preserve their position in the status quo, and they are within their First Amendment rights to defend it. What they say should not be mistaken for an attempt to make accurate predictions, however. A second type of skepticism falls into the category of contrarianism. In any contentious debate, some people will find it advantageous to align themselves with the crowd, while a smaller number will come to see themselves as persecuted outsiders. This may especially hold in a field like climate science, where the data is noisy and the predictions are hard to experience in a visceral way. And it may be especially common in the United States, which is admirably independent-minded. “If you look at climate, if you look at ozone, if you look at cigarette smoking, there is always a community of people who are skeptical of the science-driven results,” Rood told me. Most importantly, there is scientific skepticism. “You’ll find that some in the scientific community have valid concerns about one aspect of the science or the other,” Rood said. “At some level, if you really want to move forward, we need to respect some of their points of view.”
First, Armstrong and Green contend that agreement among forecasters is not related to accuracy—and may reflect bias as much as anything else. “You don’t vote,” Armstrong told me. “That’s not the way science progresses.”
Next, they say the complexity of the global warming problem makes forecasting a fool’s errand. “There’s been no case in history where we’ve had a complex thing with lots of variables and lots of uncertainty, where people have been able to make econometric models or any complex models work,” Armstrong told me. “The more complex you make the model the worse the forecast gets.” Finally, Armstrong and Green write that the forecasts do not adequately account for the uncertainty intrinsic to the global warming problem. In other words, they are potentially overconfident.
This book advises you to be wary of forecasters who say that the science is not very important to their jobs, or scientists who say that forecasting is not very important to their jobs! These activities are essentially and intimately related. A forecaster who says he doesn’t care about the science is like the cook who says he doesn’t care about food. What distinguishes science, and what makes a forecast scientific, is that it is concerned with the objective world. What makes forecasts fail is when our concern only extends as far as the method, maxim, or model.
In science, progress is possible. In fact, if one believes in Bayes’s theorem, scientific progress is inevitable as predictions are made and as beliefs are tested and refined.* The march toward scientific progress is not always straightforward, and some well-regarded (even “consensus”) theories are later proved wrong—but either way science tends to move toward the truth. In politics, by contrast, we seem to be growing ever further away from consensus. The amount of polarization between the two parties in the United States House, which had narrowed from the New Deal through the 1970s, had grown by 2011 to be the worst that it had been in at least a century.111 Republicans have moved especially far away from the center,112 although Democrats have to some extent too.
To Wohlstetter, a signal is a piece of evidence that tells us something useful about our enemy’s intentions;15 this book thinks of a signal as an indication of the underlying truth behind a statistical or predictive problem.* Wohlstetter’s definition of noise is subtly different too. Whereas I tend to use noise to mean random patterns that might easily be mistaken for signals, Wohlstetter uses it to mean the sound produced by competing signals.16 In the field of intelligence analysis, the absence of signals can signify something important (the absence of radio transmissions from Japan’s carrier fleet signaled their move toward Hawaii) and the presence of too many signals can make it exceptionally challenging to discern meaning. They may drown one another out in an ear-splitting cacophony.
In cases like these, what matters is not our signal detection capabilities: provided that we have met some basic threshold of competence, we will have perceived plenty of signals before something on the scale of Pearl Harbor or September 11. The relevant signals will be somewhere in a file cabinet or a computer database. But so will a whole host of irrelevant ones. We need signal analysis capabilities to isolate the pertinent signals from the echo chamber.
Schelling writes of our propensity to mistake the unfamiliar for the improbable: “There is a tendency in our planning to confuse the unfamiliar with the improbable. The contingency we have not considered seriously looks strange; what looks strange is thought improbable; what is improbable need not be considered seriously.”
But at least this flawed type of thinking would have involved some thinking. If we had gone through the thought process, perhaps we could have recognized how loose our assumptions were. Schelling suggests that our problems instead run deeper. When a possibility is unfamiliar to us, we do not even think about it. Instead we develop a sort of mind-blindness to it. In medicine this is called anosognosia:20 part of the physiology of the condition prevents a patient from recognizing that they have the condition. Some Alzheimer’s patients present in this way.
Terror attacks behave in something of the same way. The Lockerbie and Oklahoma City bombings were the equivalent of magnitude 7 earthquakes. While destructive enough on their own, they also implied the potential for something much worse: something like the September 11 attacks, which might be thought of as a magnitude 8. Those attacks were not an outlier but instead part of the broader mathematical pattern.
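The reasoning behind treating a larger attack as part of the same pattern is the logic of a power law. In seismology it takes the Gutenberg-Richter form sketched below: each step up in magnitude is a fixed factor rarer, so the frequency of magnitude 7 events lets you extrapolate to magnitude 8. The constants are assumptions (b = 1 is typical for earthquakes; the a value is made up), and the corresponding exponent for terror attacks would have to be estimated from its own data.

```python
# Gutenberg-Richter law: log10(N) = a - b * M, where N is the expected number
# of events per year with magnitude >= M. For earthquakes b is typically close
# to 1; the a value here is hypothetical and sets only the overall rate.
A_VALUE = 5.0
B_VALUE = 1.0

def annual_rate_at_least(magnitude: float, a: float = A_VALUE, b: float = B_VALUE) -> float:
    """Expected number of events per year at or above the given magnitude."""
    return 10 ** (a - b * magnitude)

for magnitude in (6.0, 7.0, 8.0):
    rate = annual_rate_at_least(magnitude)
    print(f"M >= {magnitude}: {rate:.4f} per year, about one every {1 / rate:,.0f} years")

ratio = annual_rate_at_least(7.0) / annual_rate_at_least(8.0)
```

With b = 1, magnitude 8 events are about ten times rarer than magnitude 7 events: rare, but a predictable extension of the same line rather than something outside it.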
In 1982, the social scientists James Q. Wilson and George L. Kelling introduced what they called the “broken windows” theory of crime deterrence.71 The idea was that by focusing on smaller-scale types of crime, like vandalism and misdemeanor drug offenses,72 police could contribute to an overall climate of lawfulness and thereby prevent more serious crimes. The empirical evidence for the merit of this theory is quite mixed.73, 74 However, the theory was very warmly embraced by police departments from Los Angeles to New York because it lowered the degree of difficulty for police departments and provided for much more attainable goals. It’s much easier to bust a sixteen-year-old kid for smoking a joint than to solve an auto theft or prevent a murder. Everybody likes to live in a cleaner, safer neighborhood. But it’s unclear whether the broken-windows theory is more than window dressing.
The first approximation—the unqualified statement that no investor can beat the stock market—seems to be extremely powerful. By the time we get to the last one, which is full of expressions of uncertainty, we have nothing that would fit on a bumper sticker. But it is also a more complete description of the objective world.
Information becomes knowledge only when it’s placed in context. Without it, we have no way to differentiate the signal from the noise, and our search for the truth might be swamped by false positives. What isn’t acceptable under Bayes’s theorem is to pretend that you don’t have any prior beliefs. You should work to reduce your biases, but to say you have none is a sign that you have many.
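A minimal sketch of Bayes's theorem with hypothetical numbers shows why both the prior and the false-positive rate matter: a test that catches 90 percent of true cases but also flags 5 percent of non-cases, applied where only 1 percent of cases are real.

```python
def posterior(prior: float, true_positive_rate: float, false_positive_rate: float) -> float:
    """Probability the hypothesis is true given a positive result, by Bayes's theorem."""
    # Total probability of a positive result: true positives plus false positives.
    p_positive = prior * true_positive_rate + (1 - prior) * false_positive_rate
    return prior * true_positive_rate / p_positive

informed = posterior(prior=0.01, true_positive_rate=0.90, false_positive_rate=0.05)
# A second positive result updates from the new prior
# (assumes the two results are conditionally independent).
after_second = posterior(prior=informed, true_positive_rate=0.90, false_positive_rate=0.05)
# "Having no prior" in practice usually means an implicit 50-50 prior.
naive = posterior(prior=0.50, true_positive_rate=0.90, false_positive_rate=0.05)
print(f"{informed:.3f} {after_second:.3f} {naive:.3f}")
```

With an honest 1 percent prior, one positive result raises the probability only to about 15 percent, because false positives outnumber true ones; a second result raises it to about 77 percent. Claiming to have no prior amounts to silently assuming 50-50, which here inflates the first posterior to about 95 percent.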
This is perhaps the easiest Bayesian principle to apply: make a lot of forecasts.