Excerpts from James Gleick, The Information: A History, A Theory, A Flood, 2011, Vintage.
Signals were distorted as they passed across the circuits; the greater the distance, the worse the distortion. Campbell’s solution was partly mathematics and partly electrical engineering.
As a first-year research assistant at MIT, he worked on a hundred-ton proto-computer, Vannevar Bush’s Differential Analyzer, which could solve equations with great rotating gears, shafts, and wheels. At twenty-two he wrote a dissertation that applied a nineteenth-century idea, George Boole’s algebra of logic, to the design of electrical circuits. (Logic and electricity—a peculiar combination.) Later he worked with the mathematician and logician Hermann Weyl, who taught him what a theory was: “Theories permit consciousness to ‘jump over its own shadow,’ to leave behind the given, to represent the transcendent, yet, as is self-evident, only in symbols.”♦ In 1943 the English mathematician and code breaker Alan Turing visited Bell Labs on a cryptographic mission and met Shannon sometimes over lunch, where they traded speculation on the future of artificial thinking machines. (“Shannon wants to feed not just data to a Brain, but cultural things!”♦ Turing exclaimed. “He wants to play music to it!”) Shannon also crossed paths with Norbert Wiener, who had taught him at MIT and by 1948 was proposing a new discipline to be called “cybernetics,” the study of communication and control.
Logic and circuits crossbred to make a new, hybrid thing; so did codes and genes. In his solitary way, seeking a framework to connect his many threads, Shannon began assembling a theory for information.
A few engineers, especially in the telephone labs, began speaking of information. They used the word in a way suggesting something technical: quantity of information, or measure of information. Shannon adopted this usage. For the purposes of science, information had to mean something special. Three centuries earlier, the new discipline of physics could not proceed until Isaac Newton appropriated words that were ancient and vague—force, mass, motion, and even time—and gave them new meanings. Newton made these terms into quantities, suitable for use in mathematical formulas.
“Man the food-gatherer reappears incongruously as information-gatherer,”♦ remarked Marshall McLuhan in 1967.♦ He wrote this an instant too soon, in the first dawn of computation and cyberspace. We can see now that information is what our world runs on: the blood and the fuel, the vital principle. It pervades the sciences from top to bottom, transforming every branch of knowledge. Information theory began as a bridge from mathematics to electrical engineering and from there to computing. What English speakers call “computer science” Europeans have known as informatique, informatica, and Informatik. Now even biology has become an information science, a subject of messages, instructions, and code. Genes encapsulate information and enable procedures for reading it in and writing it out. Life spreads by networking. The body itself is an information processor. Memory resides not just in brains but in every cell. No wonder genetics bloomed along with information theory. DNA is the quintessential information molecule, the most advanced message processor at the cellular level—an alphabet and a code, 6 billion bits to form a human being. “What lies at the heart of every living thing is not a fire, not warm breath, not a ‘spark of life,’ ”♦ declares the evolutionary theorist Richard Dawkins. “It is information, words, instructions.… If you want to understand life, don’t think about vibrant, throbbing gels and oozes, think about information technology.”
The cells of an organism are nodes in a richly interwoven communications network, transmitting and receiving, coding and decoding. Evolution itself embodies an ongoing exchange of information between organism and environment. “The information circle becomes the unit of life,”♦ says Werner Loewenstein after thirty years spent studying intercellular communication. He reminds us that information means something deeper now: “It connotes a cosmic principle of organization and order, and it provides an exact measure of that.” The gene has its cultural analog, too: the meme. In cultural evolution, a meme is a replicator and propagator—an idea, a fashion, a chain letter, or a conspiracy theory. On a bad day, a meme is a virus. Economics is recognizing itself as an information science, now that money itself is completing a developmental arc from matter to bits, stored in computer memory and magnetic strips, world finance coursing through the global nervous system. Even when money seemed to be material treasure, heavy in pockets and ships’ holds and bank vaults, it always was information. Coins and notes, shekels and cowries were all just short-lived technologies for tokenizing information about who owns what. And atoms? Matter has its own coinage, and the hardest science of all, physics, seemed to have reached maturity. But physics, too, finds itself sideswiped by a new intellectual model. In the years after World War II, the heyday of the physicists, the great news of science appeared to be the splitting of the atom and the control of nuclear energy. Theorists focused their prestige and resources on the search for fundamental particles and the laws governing their interaction, the construction of giant accelerators and the discovery of quarks and gluons. From this exalted enterprise, the business of communications research could not have appeared further removed. At Bell Labs, Claude Shannon was not thinking about physics. Particle physicists did not need bits. 
And then, all at once, they did.
John Archibald Wheeler, the last surviving collaborator of both Einstein and Bohr, put this manifesto in oracular monosyllables: “It from Bit.” Information gives rise to “every it—every particle, every field of force, even the spacetime continuum itself.”♦ This is another way of fathoming the paradox of the observer: that the outcome of an experiment is affected, or even determined, when it is observed. Not only is the observer observing, she is asking questions and making statements that must ultimately be expressed in discrete bits. “What we call reality,” Wheeler wrote coyly, “arises in the last analysis from the posing of yes-no questions.” He added: “All things physical are information-theoretic in origin, and this is a participatory universe.” The whole universe is thus seen as a computer—a cosmic information-processing machine.
Hardly any information technology goes obsolete. Each new one throws its predecessors into relief. Thus Thomas Hobbes, in the seventeenth century, resisted his era’s new-media hype: “The invention of printing, though ingenious, compared with the invention of letters is no great matter.”♦ Up to a point, he was right. Every new medium transforms the nature of human thought. In the long run, history is the story of information becoming aware of itself.
Here was a messaging system that outpaced the best couriers, the fastest horses on good roads with way stations and relays.
A German historian, Richard Hennig, traced and measured the route in 1908 and confirmed the feasibility of this chain of bonfires.♦ The meaning of the message had, of course, to be prearranged, effectively condensed into a single bit. A binary choice, something or nothing: the fire signal meant something, which, just this once, meant “Troy has fallen.” To transmit this one bit required immense planning, labor, watchfulness, and firewood. Many years later, lanterns in Old North Church likewise sent Paul Revere a single precious bit, which he carried onward, one binary choice: by land or by sea.
This was a prescient thought, and entirely theoretical, a product of new seventeenth-century knowledge of astronomy and geography. It was the first crack in the hitherto solid assumption of simultaneity. Anyway, as Browne noted, experts differed. Two more centuries would pass before anyone could actually travel fast enough, or communicate fast enough, to experience local time differences. For now, in fact, no one in the world could communicate as much, as fast, as far as unlettered Africans with their drums.
At first they conceived of a system built on two elements: the clicks (now called dots) and the spaces in between. Then, as they fiddled with the prototype keypad, they came up with a third sign: the line or dash, “when the circuit was closed a longer time than was necessary to make a dot.”♦ (The code became known as the dot-and-dash alphabet, but the unmentioned space remained just as important; Morse code was not a binary language.♦) That humans could learn this new language was, at first, wondrous.
In the name of speed, Morse and Vail had realized that they could save strokes by reserving the shorter sequences of dots and dashes for the most common letters. But which letters would be used most often? Little was known about the alphabet’s statistics. In search of data on the letters’ relative frequencies, Vail was inspired to visit the local newspaper office in Morristown, New Jersey, and look over the type cases.♦ He found a stock of twelve thousand E’s, nine thousand T’s, and only two hundred Z’s. He and Morse rearranged the alphabet accordingly. They had originally used dash-dash-dot to represent T, the second most common letter; now they promoted T to a single dash, thus saving telegraph operators uncountable billions of key taps in the world to come. Long afterward, information theorists calculated that they had come within 15 percent of an optimal arrangement for telegraphing English text.♦
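The saving from promoting T can be made concrete with a small sketch. It uses the type-case counts quoted above as stand-ins for letter frequency and the two code lengths the passage gives for T (three signs for dash-dash-dot, one for a single dash). The lengths used for E and Z, and the function names, are illustrative assumptions and are not taken from Vail's actual code table.

```python
def total_strokes(counts, lengths):
    """Total signs keyed if each letter occurs counts[letter] times."""
    return sum(counts[letter] * lengths[letter] for letter in counts)


def assign_by_frequency(counts, available_lengths):
    """Give the shortest available code lengths to the most frequent letters."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    return dict(zip(ranked, sorted(available_lengths)))


# Counts from the Morristown type cases, as quoted in the text.
counts = {"E": 12000, "T": 9000, "Z": 200}

# T's lengths come from the text; E = 1 and Z = 4 are assumed for illustration.
before = {"E": 1, "T": 3, "Z": 4}
after = {"E": 1, "T": 1, "Z": 4}

print(total_strokes(counts, before))  # 39800
print(total_strokes(counts, after))   # 21800
print(assign_by_frequency(counts, [4, 1, 3]))
```

On this toy three-letter sample the single change to T removes 18,000 signs, which is the kind of effect Vail was after when he counted type.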
They failed to decipher the code of the drums because, in effect, there was no code. Morse had bootstrapped his system from a middle symbolic layer, the written alphabet, intermediate between speech and his final code. His dots and dashes had no direct connection to sound; they represented letters, which formed written words, which represented the spoken words in turn. The drummers could not build on an intermediate code—they could not abstract through a layer of symbols—because the African languages, like all but a few dozen of the six thousand languages spoken in the modern world, lacked an alphabet. The drums metamorphosed speech.
He finally published his discoveries about drums in 1949, in a slim volume titled The Talking Drums of Africa. In solving the enigma of the drums, Carrington found the key in a central fact about the relevant African languages. They are tonal languages, in which meaning is determined as much by rising or falling pitch contours as by distinctions between consonants or vowels. This feature is missing from most Indo-European languages, including English, which uses tone only in limited, syntactical ways: for example, to distinguish questions (“you are happy?”) from declarations (“you are happy.”). But for other languages, including, most famously, Mandarin and Cantonese, tone has primary significance in distinguishing words. So it does in most African languages.
linguists have found it surprisingly difficult to agree on an exact inventory of phonemes for English or any other language (most estimates for English are in the vicinity of forty-five). The problem is that a stream of speech is a continuum; a linguist may abstractly, and arbitrarily, break it into discrete units, but the meaningfulness of these units varies from speaker to speaker and depends on the context. Most speakers’ instincts about phonemes are biased, too, by their knowledge of the written alphabet, which codifies language in its own sometimes arbitrary ways. In any case, tonal languages, with their extra variable, contain many more phonemes than were first apparent to inexperienced linguists.
The extra drumbeats, far from being extraneous, provide context. Every ambiguous word begins in a cloud of possible alternative interpretations; then the unwanted possibilities evaporate. This takes place below the level of consciousness. Listeners are hearing only staccato drum tones, low and high, but in effect they “hear” the missing consonants and vowels, too. For that matter, they hear whole phrases, not individual words. “Among peoples who know nothing of writing or grammar, a word per se, cut out of its sound group, seems almost to cease to be an intelligible articulation,”♦ Captain Rattray reported.
resemblance to Homeric formulas—not merely Zeus, but Zeus the cloud-gatherer; not just the sea, but the wine-dark sea—is no accident. In an oral culture, inspiration has to serve clarity and memory first. The Muses are the daughters of Mnemosyne.
Neither Kele nor English yet had words to say, allocate extra bits for disambiguation and error correction. Yet this is what the drum language did. Redundancy—inefficient by definition—serves as the antidote to confusion. It provides second chances. Every natural language has redundancy built in; this is why people can understand text riddled with errors and why they can understand conversation in a noisy room. The natural redundancy of English motivates the famous New York City subway poster of the 1970s (and the poem by James Merrill), if u cn rd ths u cn gt a gd jb w hi pa! (“This counterspell may save your soul,”♦ Merrill adds.)
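A rough way to see this redundancy at work is to delete letters and check how much text disappears while the result stays readable. The sketch below is only a crude stand-in for the poster's abbreviations: it keeps each word's first character and drops every later vowel. The function names and the sample sentence are assumptions made for illustration.

```python
VOWELS = set("aeiouAEIOU")


def drop_interior_vowels(text):
    """Keep each word's first character; drop every later vowel."""
    shortened = []
    for word in text.split(" "):
        if not word:
            shortened.append(word)
            continue
        tail = "".join(ch for ch in word[1:] if ch not in VOWELS)
        shortened.append(word[0] + tail)
    return " ".join(shortened)


def fraction_removed(original, shortened):
    """Share of characters that were deleted."""
    return 1 - len(shortened) / len(original)


sample = "if you can read this"
short = drop_interior_vowels(sample)
print(short)  # if y cn rd ths
print(round(fraction_removed(sample, short), 2))  # 0.3
```

Nearly a third of the characters vanish and the sentence can still be guessed, which is the sense in which those characters were redundant.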
After publishing his book, John Carrington came across a mathematical way to understand this point. A paper by a Bell Labs telephone engineer, Ralph Hartley, even had a relevant-looking formula: H = n log s, where H is the amount of information, n is the number of symbols in the message, and s is the number of symbols available in the language.♦ Hartley’s younger colleague Claude Shannon later pursued this lead, and one of his touchstone projects became a precise measurement of the redundancy in English. Symbols could be words, phonemes, or dots and dashes. The degree of choice within a symbol set varied—a thousand words or forty-five phonemes or twenty-six letters or three types of interruption in an electrical circuit. The formula quantified a simple enough phenomenon (simple, anyway, once it was noticed): the fewer symbols available, the more of them must be transmitted to get across a given amount of information. For the African drummers, messages need to be about eight times as long as their spoken equivalents.
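Hartley's formula can be tried out numerically. The sketch below takes the logarithm to base 2, so that H comes out in bits; that choice of base, the function names, and the ten-letter example are assumptions for illustration. It does not attempt to reproduce the eight-fold figure for the drums, which the text reports as an observation.

```python
import math


def hartley_information(n, s):
    """Hartley's H = n log s, using log base 2 so that H is in bits."""
    return n * math.log2(s)


def symbols_needed(h_bits, s):
    """Smallest whole number of symbols from an s-symbol set that carries h_bits."""
    # The small tolerance guards against floating-point noise on exact ratios.
    return math.ceil(h_bits / math.log2(s) - 1e-9)


# A ten-letter message drawn from a 26-letter alphabet.
h = hartley_information(10, 26)
print(round(h, 2))            # 47.0
print(symbols_needed(h, 2))   # 48 two-way choices
print(symbols_needed(h, 3))   # 30 three-way choices
```

The same quantity of information needs 48 symbols from a two-symbol set but only 30 from a three-symbol set: the fewer symbols available, the longer the message.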
“TRY TO IMAGINE,” proposed Walter J. Ong, Jesuit priest, philosopher, and cultural historian, “a culture where no one has ever ‘looked up’ anything.”♦ To subtract the technologies of information internalized over two millennia requires a leap of imagination backward into a forgotten past. The hardest technology to erase from our minds is the first of all: writing. This arises at the very dawn of history, as it must, because the history begins with the writing. The pastness of the past depends on it.♦
Writing, as a technology, requires premeditation and special art. Language is not a technology, no matter how well developed and efficacious. It is not best seen as something separate from the mind; it is what the mind does. “Language in fact bears the same relationship to the concept of mind that legislation bears to the concept of parliament,” says Jonathan Miller: “it is a competence forever bodying itself in a series of concrete performances.”♦ Much the same might be said of writing—it is concrete performance—but when the word is instantiated in paper or stone, it takes on a separate existence as artifice. It is a product of tools, and it is a tool. And like many technologies that followed, it thereby inspired immediate detractors.
With words we begin to leave traces behind us like breadcrumbs: memories in symbols for others to follow. Ants deploy their pheromones, trails of chemical information; Theseus unwound Ariadne’s thread. Now people leave paper trails. Writing comes into being to retain information across time and across space. Before writing, communication is evanescent and local; sounds carry a few yards and fade to oblivion. The evanescence of the spoken word went without saying. So fleeting was speech that the rare phenomenon of the echo, a sound heard once and then again, seemed a sort of magic. “This miraculous rebounding of the voice, the Greeks have a pretty name for, and call it Echo,”♦ wrote Pliny. “The spoken symbol,” as Samuel Butler observed, “perishes instantly without material trace, and if it lives at all does so only in the minds of those who heard it.” Butler was able to formulate this truth just as it was being falsified for the first time, at the end of the nineteenth century, by the arrival of the electric technologies for capturing speech. It was precisely because it was no longer completely true that it could be clearly seen. Butler completed the distinction: “The written symbol extends infinitely, as regards time and space, the range within which one mind can communicate with another; it gives the writer’s mind a life limited by the duration of ink, paper, and readers, as against that of his flesh and blood body.”♦
But the new channel does more than extend the previous channel. It enables reuse and “re-collection”—new modes. It permits whole new architectures of information. Among them are history, law, business, mathematics, and logic. Apart from their content, these categories represent new techniques. The power lies not just in the knowledge, preserved and passed forward, valuable as it is, but in the methodology: encoded visual indications, the act of transference, substituting signs for things. And then, later, signs for signs.
There is a progression from pictographic, writing the picture; to ideographic, writing the idea; and then logographic, writing the word.
In all the languages of earth there is only one word for alphabet (alfabet, alfabeto). The alphabet was invented only once. All known alphabets, used today or found buried on tablets and stone, descend from the same original ancestor, which arose near the eastern littoral of the Mediterranean Sea, sometime not much before 1500 BCE, in a region that became a politically unstable crossroads of culture, covering Palestine, Phoenicia, and Assyria. To the east lay the great civilization of Mesopotamia, with its cuneiform script already a millennium old; down the shoreline to the southwest lay Egypt, where hieroglyphics developed simultaneously and independently.
The paleographer has a unique bootstrap problem. It is only writing that makes its own history possible. The foremost twentieth-century authority on the alphabet, David Diringer, quoted an earlier scholar: “There never was a man who could sit down and say: ‘Now I am going to be the first man to write.’ ”♦
The alphabet spread by contagion. The new technology was both the virus and the vector of transmission. It could not be monopolized, and it could not be suppressed. Even children could learn these few, lightweight, semantically empty letters. Divergent routes led to alphabets of the Arab world and of northern Africa; to Hebrew and Phoenician; across central Asia, to Brahmi and related Indian script; and to Greece. The new civilization arising there brought the alphabet to a high degree of perfection. Among others, the Latin and Cyrillic alphabets followed along.
Greece had not needed the alphabet to create literature—a fact that scholars realized only grudgingly, beginning in the 1930s. That was when Milman Parry, a structural linguist who studied the living tradition of oral epic poetry in Bosnia and Herzegovina, proposed that the Iliad and the Odyssey not only could have been but must have been composed and sung without benefit of writing. The meter, the formulaic redundancy, in effect the very poetry of the great works served first and foremost to aid memory. Its incantatory power made of the verse a time capsule, able to transmit a virtual encyclopedia of culture across generations. His argument was first controversial and then overwhelmingly persuasive—but only because the poems were written down, sometime in the sixth or seventh century BCE. This act—the transcribing of the Homeric epics—echoes through the ages. “It was something like a thunder-clap in human history, which the bias of familiarity has converted into the rustle of papers on a desk,”♦ said Eric Havelock, a British classical scholar who followed Parry. “It constituted an intrusion into culture, with results that proved irreversible. It laid the basis for the destruction of the oral way of life and the oral modes of thought.” The transcription of Homer converted this great poetry into a new medium and made of it something unplanned: from a momentary string of words created every time anew by the rhapsode and fading again even as it echoed in the listener’s ear, to a fixed but portable line on a papyrus sheet. Whether this alien, dry mode would suit the creation of poetry and song remained to be seen.
Aristotle himself, son of the physician to the king of Macedonia and an avid, organized thinker, was attempting to systematize knowledge. The persistence of writing made it possible to impose structure on what was known about the world and, then, on what was known about knowing. As soon as one could set words down, examine them, look at them anew the next day, and consider their meaning, one became a philosopher, and the philosopher began with a clean slate and a vast project of definition to undertake. Knowledge could begin to pull itself up by the bootstraps. For Aristotle the most basic notions were worth recording and were necessary to record: A beginning is that which itself does not follow necessarily from anything else, but some second thing naturally exists or occurs after it. Conversely, an end is that which does itself naturally follow from something else, either necessarily or in general, but there is nothing else after it. A middle is that which itself comes after something else, and some other thing comes after it.♦ These are statements not about experience but about the uses of language to structure experience. In the same way, the Greeks created categories (this word originally meaning “accusations” or “predictions”) as a means of classifying animal species, insects, and fishes. In turn, they could then classify ideas. This was a radical, alien mode of thought. Plato had warned that it would repel most people: The multitude cannot accept the idea of beauty in itself rather than many beautiful things, nor anything conceived in its essence instead of the many specific things. Thus the multitude cannot be philosophic.♦ For “the multitude” we may understand “the preliterate.” They “lose themselves and wander amid the multiplicities of multifarious things,”♦ declared Plato, looking back on the oral culture that still surrounded him. They “have no vivid pattern in their souls.”
And what vivid pattern was that? Havelock focused on the process of converting, mentally, from a “prose of narrative” to a “prose of ideas”; organizing experience in terms of categories rather than events; embracing the discipline of abstraction. He had a word in mind for this process, and the word was thinking. This was the discovery, not just of the self, but of the thinking self—in effect, the true beginning of consciousness. In our world of ingrained literacy, thinking and writing seem scarcely related activities. We can imagine the latter depending on the former, but surely not the other way around: everyone thinks, whether or not they write. But Havelock was right. The written word—the persistent word—was a prerequisite for conscious thought as we understand it. It was the trigger for a wholesale, irreversible change in the human psyche—psyche being the word favored by Socrates/Plato as they struggled to understand. Plato, as Havelock puts it, is trying for the first time in history to identify this group of general mental qualities, and seeking for a term which will label them satisfactorily under a single type.… He it was who hailed the portent and correctly identified it. In so doing, he so to speak confirmed and clinched the guesses of a previous generation which had been feeling its way towards the idea that you could “think,” and that thinking was a very special kind of psychic activity, very uncomfortable, but also very exciting, and one which required a very novel use of Greek.
Logic might be imagined to exist independent of writing—syllogisms can be spoken as well as written—but it did not. Speech is too fleeting to allow for analysis. Logic descended from the written word, in Greece as well as India and China, where it developed independently.♦ Logic turns the act of abstraction into a tool for determining what is true and what is false: truth can be discovered in words alone, apart from concrete experience. Logic takes its form in chains: sequences whose members connect one to another. Conclusions follow from premises. These require a degree of constancy. They have no power unless people can examine and evaluate them.
“We know that formal logic is the invention of Greek culture after it had interiorized the technology of alphabetic writing,” Walter Ong says—it is true of India and China as well—“and so made a permanent part of its noetic resources the kind of thinking that alphabetic writing made possible.”♦ For evidence Ong turns to fieldwork of the Russian psychologist Aleksandr Romanovich Luria among illiterate peoples in remote Uzbekistan and Kyrgyzstan in Central Asia in the 1930s.♦ Luria found striking differences between illiterate and even slightly literate subjects, not in what they knew, but in how they thought. Logic implicates symbolism directly: things are members of classes; they possess qualities, which are abstracted and generalized. Oral people lacked the categories that become second nature even to illiterate individuals in literate cultures: for example, for geometrical shapes. Shown drawings of circles and squares, they named them as “plate, sieve, bucket, watch, or moon” and “mirror, door, house, apricot drying board.” They could not, or would not, accept logical syllogisms. A typical question: In the Far North, where there is snow, all bears are white. Novaya Zembla is in the Far North and there is always snow there. What color are the bears? Typical response: “I don’t know. I’ve seen a black bear. I’ve never seen any others.… Each locality has its own animals.” By contrast, a man who has just learned to read and write responds, “To go by your words, they should all be white.” To go by your words—in that phrase, a level is crossed. The information has been detached from any person, detached from the speaker’s experience. Now it lives in the words, little life-support modules. Spoken words also transport information, but not with the self-consciousness that writing brings. Literate people take for granted their own awareness of words, along with the array of word-related machinery: classification, reference, definition. 
Before literacy, there is nothing obvious about such techniques. “Try to explain to me what a tree is,” Luria says, and a peasant replies, “Why should I? Everyone knows what a tree is, they don’t need me telling them.”
“Basically the peasant was right,”♦ Ong comments. “There is no way to refute the world of primary orality. All you can do is walk away from it into literacy.” It is a twisting journey from things to words, from words to categories, from categories to metaphor and logic. Unnatural as it seemed to define tree, it was even trickier to define word, and helpful ancillary words like define were not at first available, the need never having existed. “In the infancy of logic, a form of thought has to be invented before the content can be filled up,”♦ said Benjamin Jowett, Aristotle’s nineteenth-century translator. Spoken languages needed further evolution.
Gongsun Long was a member of the Mingjia, the School of Names, and his delving into these paradoxes formed part of what Chinese historians call the “language crisis,” a running debate over the nature of language. Names are not the things they name. Classes are not coextensive with subclasses. Thus innocent-seeming inferences get derailed: “a man dislikes white horses” does not imply “a man dislikes horses.”
Because the paradoxes seem to be in language, or about language, one way to banish them was to purify the medium: eliminate ambiguous words and woolly syntax, employ symbols that were rigorous and pure. To turn, that is, to mathematics. By the beginning of the twentieth century, it seemed that only a system of purpose-built symbols could make logic work properly—free of error and paradoxes. This dream was to prove illusory; the paradoxes would creep back in, but no one could hope to understand until the paths of logic and mathematics converged. Mathematics, too, followed from the invention of writing. Greece is often thought of as the springhead for the river that becomes modern mathematics, with all its many tributaries down the centuries. But the Greeks themselves alluded to another tradition—to them, ancient—which they called Chaldean, and which we understand to be Babylonian. That tradition vanished into the sands, not to surface until the end of the nineteenth century, when tablets of clay were dug up from the mounds of lost cities.
Hammurabi himself was probably the first literate king, writing his own cuneiform rather than depending on scribes, and his empire building manifested the connection between writing and social control. “This process of conquest and influence is made possible by letters and tablets and stelae in an abundance that had never been known before,”♦ Jaynes declares. “Writing was a new method of civil direction, indeed the model that begins our own memo-communicating government.”
These symbols were hardly words—or they were words of a peculiar, slender, rigid sort. They seemed to arrange themselves into visible patterns in the clay, repetitious, almost artistic, not like any prose or poetry archeologists had encountered. They were like maps of a mysterious city. This was the key to deciphering them, finally: the ordered chaos that seems to guarantee the presence of meaning. It seemed like a task for mathematicians, anyway, and finally it was. They recognized geometric progressions, tables of powers, and even instructions for computing square roots and cube roots. Familiar as they were with the rise of mathematics a millennium later in ancient Greece, these scholars were astounded at the breadth and depth of mathematical knowledge that existed before in Mesopotamia. “It was assumed that the Babylonians had had some sort of number mysticism or numerology,” wrote Asger Aaboe in 1963, “but we now know how far short of the truth this assumption was.”♦ The Babylonians computed linear equations, quadratic equations, and Pythagorean numbers long before Pythagoras. In contrast to the Greek mathematics that followed, Babylonian mathematics did not emphasize geometry, except for practical problems; the Babylonians calculated areas and perimeters but did not prove theorems. Yet they could (in effect) reduce elaborate second-degree polynomials. Their mathematics seemed to value computational power above all.
In 1972, Donald Knuth, an early computer scientist at Stanford, looked at the remains of an Old Babylonian tablet the size of a paperback book, half lying in the British Museum in London, one-fourth in the Staatliche Museen in Berlin, and the rest missing, and saw what he could only describe, anachronistically, as an algorithm:
“This is the procedure” was a standard closing, like a benediction, and for Knuth redolent with meaning. In the Louvre he found a “procedure” that reminded him of a stack program on a Burroughs B5500. “We can commend the Babylonians for developing a nice way to explain an algorithm by example as the algorithm itself was being defined,” said Knuth. By then he himself was engrossed in the project of defining and explaining the algorithm; he was amazed by what he found on the ancient tablets. The scribes wrote instructions for placing numbers in certain locations—for making “copies” of a number, and for keeping a number “in your head.” This idea, of abstract quantities occupying abstract places, would not come back to life till much later.
Where is a symbol? What is a symbol? Even to ask such questions required a self-consciousness that did not come naturally. Once asked, the questions continued to loom. Look at these signs, philosophers implored. What are they? “Fundamentally letters are shapes indicating voices,”♦ explained John of Salisbury in medieval England. “Hence they represent things which they bring to mind through the windows of the eyes.” John served as secretary and scribe to the Archbishop of Canterbury in the twelfth century. He served the cause of Aristotle as an advocate and salesman. His Metalogicon not only set forth the principles of Aristotelian logic but urged his contemporaries to convert, as though to a new religion. (He did not mince words: “Let him who is not come to logic be plagued with continuous and everlasting filth.”) Putting pen to parchment in this time of barest literacy, he tried to examine the act of writing and the effect of words: “Frequently they speak voicelessly the utterances of the absent.”
Unfortunately the written word stands still. It is stable and immobile. Plato’s qualms were mostly set aside in the succeeding millennia, as the culture of literacy developed its many gifts: history and the law; the sciences and philosophy; the reflective explication of art and literature itself. None of that could have emerged from pure orality. Great poetry could and did, but it was expensive and rare. To make the epics of Homer, to let them be heard, to sustain them across the years and the miles required a considerable share of the available cultural energy.
Jonathan Miller rephrases McLuhan’s argument in quasi-technical terms of information: “The larger the number of senses involved, the better the chance of transmitting a reliable copy of the sender’s mental state.”♦♦ In the stream of words past the ear or eye, we sense not just the items one by one but their rhythms and tones, which is to say their music. We, the listener or the reader, do not hear, or read, one word at a time; we get messages in groupings small and large. Human memory being what it is, larger patterns can be grasped in writing than in sound. The eye can glance back. McLuhan considered this damaging, or at least diminishing. “Acoustic space is organic and integral,” he said, “perceived through the simultaneous interplay of all the senses; whereas ‘rational’ or pictorial space is uniform, sequential and continuous and creates a closed world with none of the rich resonance of the tribal echoland.”♦ For McLuhan, the tribal echoland is Eden. By their dependence on the spoken word for information, people were drawn together into a tribal mesh … the spoken word is more emotionally laden than the written.… Audile-tactile tribal man partook of the collective unconscious, lived in a magical integral world patterned by myth and ritual, its values divine.♦ Up to a point, maybe.
Was McLuhan right, or was Hobbes? If we are ambivalent, the ambivalence began with Plato. He witnessed writing’s rising dominion; he asserted its force and feared its lifelessness. The writer-philosopher embodied a paradox.
In fact, few had any concept of “spelling”—the idea that each word, when written, should take a particular predetermined form of letters. The word cony (rabbit) appeared variously as conny, conye, conie, connie, coni, cuny, cunny, and cunnie in a single 1591 pamphlet.♦ Others spelled it differently. And for that matter Cawdrey himself, on the title page of his book for “teaching the true writing,” wrote wordes in one sentence and words in the next. Language did not function as a storehouse of words, from which users could summon the correct items, preformed. On the contrary, words were fugitive, on the fly, expected to vanish again thereafter. When spoken, they were not available to be compared with, or measured against, other instantiations of themselves. Every time people dipped quill in ink to form a word on paper they made a fresh choice of whatever letters seemed to suit the task. But this was changing. The availability—the solidity—of the printed book inspired a sense that the written word should be a certain way, that one form was right and others wrong. First this sense was unconscious; then it began to rise toward general awareness. Printers themselves made it their business.
“Labour to speake so as is commonly received, and so as the most ignorant may well understand them.” And above all do not affect to speak like a foreigner: Some far journied gentlemen, at their returne home, like as they love to go in forraine apparrell, so they will pouder their talke with over-sea language. He that commeth lately out of France, will talk French English, and never blush at the matter. Cawdrey had no idea of listing all the words—whatever that would mean.
The book Cawdrey made was the first English dictionary. The word dictionary was not in it.
That Cawdrey should arrange his words in alphabetical order, to make his Table Alphabeticall, was not self-evident. He knew he could not count on even his educated readers to be versed in alphabetical order, so he tried to produce a small how-to manual. He struggled with this: whether to describe the ordering in logical, schematic terms or in terms of a step-by-step procedure, an algorithm. “Gentle reader,” he wrote—again adapting freely from Coote— thou must learne the Alphabet, to wit, the order of the Letters as they stand, perfectly without booke, and where every Letter standeth: as b neere the beginning, n about the middest, and t toward the end. Nowe if the word, which thou art desirous to finde, begin with a then looke in the beginning of this Table, but if with v looke towards the end. Againe, if thy word beginne with ca looke in the beginning of the letter c but if with cu then looke toward the end of that letter. And so of all the rest. &c.
It was not easy to explain. Friar Johannes Balbus of Genoa tried in his 1286 Catholicon. Balbus thought he was inventing alphabetical order for the first time, and his instructions were painstaking: “For example I intend to discuss amo and bibo. I will discuss amo before bibo because a is the first letter of amo and b is the first letter of bibo and a is before b in the alphabet. Similarly …”♦ He rehearsed a long list of examples and concluded: “I beg of you, therefore, good reader, do not scorn this great labor of mine and this order as something worthless.” In the ancient world, alphabetical lists scarcely appeared until around 250 BCE, in papyrus texts from Alexandria. The great library there seems to have used at least some alphabetization in organizing its books. The need for such an artificial ordering scheme arises only with large collections of data, not otherwise ordered. And the possibility of alphabetical order arises only in languages possessing an alphabet: a discrete small symbol set with its own conventional sequence (“abecedarie, the order of the Letters, or hee that useth them”). Even then the system is unnatural.
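Balbus's rule (compare first letters, and if they match move on to the next) can be sketched in a few lines of Python. This is only an illustration: the function name `comes_before`, the lowercase 26-letter `ALPHABET`, and the sample word list are assumptions made for the example, not anything found in the Catholicon or in Cawdrey.

```python
# Illustrative sketch of alphabetical ordering; names here are hypothetical.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # the conventional sequence of letters

def comes_before(first: str, second: str) -> bool:
    """Return True if `first` precedes `second` in alphabetical order."""
    for a, b in zip(first, second):
        if a != b:
            # The first differing letter decides, as with amo and bibo.
            return ALPHABET.index(a) < ALPHABET.index(b)
    # All shared positions match: the shorter word comes first.
    return len(first) < len(second)

words = ["bibo", "amo", "cunny", "cawdrey"]
# Sorting by each letter's position in ALPHABET applies the same rule.
ordered = sorted(words, key=lambda w: [ALPHABET.index(c) for c in w])
```

Here `comes_before("amo", "bibo")` holds because a precedes b, and words beginning with ca land ahead of words beginning with cu, which is the case Cawdrey spells out for his reader.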
“Definition,” John Locke finally writes in 1690, “being nothing but making another understand by Words, what Idea the Term defin’d stands for.”♦ And Locke still takes an operational view. Definition is communication: making another understand; sending a message. Cawdrey borrows definitions from his sources, combines them, and adapts them. In many cases he simply maps one word onto another:
orifice, mouth
baud, whore
helmet, head peece
For a small class of words he uses a special designation, the letter k: “standeth for a kind of.” He does not consider it his job to say what kind. Thus:
crocodile, k beast
alablaster, k stone
citron, k fruit
But linking pairs of words, either as synonyms or as members of a class, can carry a lexicographer only so far.
When Galileo pointed his first telescope skyward and discovered sunspots in 1611, he immediately anticipated controversy—traditionally the sun was an epitome of purity—and he sensed that science could not proceed without first solving a problem of language: So long as men were in fact obliged to call the sun “most pure and most lucid,” no shadows or impurities whatever had been perceived in it; but now that it shows itself to us as partly impure and spotty; why should we not call it “spotted and not pure”? For names and attributes must be accommodated to the essence of things, and not the essence to the names, since things come first and names afterwards.♦
Where Cawdrey had been isolated, Simpson was connected.
By contrast, the dictionary itself has acquired the status of a monument, definitive and towering. It exerts an influence on the language it tries to observe. It wears its authoritative role reluctantly. The lexicographers may recall Ambrose Bierce’s sardonic century-old definition: “dictionary, a malevolent literary device for cramping the growth of a language and making it hard and inelastic.”♦ Nowadays they stress that they do not presume (or deign) to disapprove any particular usage or spelling. But they cannot disavow a strong ambition: the goal of completeness. They want every word, all the lingo: idioms and euphemisms, sacred or profane, dead or alive, the King’s English or the street’s. It is an ideal only: the constraints of space and time are ever present and, at the margins, the question of what qualifies as a word can become impossible to answer. Still, to the extent possible, the OED is meant to be a perfect record, perfect mirror of the language.
The dictionary ratifies the persistence of the word. It declares that the meanings of words come from other words. It implies that all words, taken together, form an interlocking structure: interlocking, because all words are defined in terms of other words. This could never have been an issue in an oral culture, where language was barely visible. Only when printing—and the dictionary—put the language into separate relief, as an object to be scrutinized, could anyone develop a sense of word meaning as interdependent and even circular. Words had to be considered as words, representing other words, apart from things. In the twentieth century, when the technologies of logic advanced to high levels, the potential for circularity became a problem. “In giving explanations I already have to use language full blown,”♦ complained Ludwig Wittgenstein. He echoed Newton’s frustration three centuries earlier, but with an extra twist, because where Newton wanted words for nature’s laws, Wittgenstein wanted words for words: “When I talk about language (words, sentences, etc.) I must speak the language of every day. Is this language somehow too coarse and material for what we want to say?” Yes. And the language was always in flux.
For Cawdrey the dictionary was a snapshot; he could not see past his moment in time. Samuel Johnson was more explicitly aware of the dictionary’s historical dimension. He justified his ambitious program in part as a means of bringing a wild thing under control—the wild thing being the language, “which, while it was employed in the cultivation of every species of literature, has itself been hitherto neglected; suffered to spread, under the direction of chance, into wild exuberance; resigned to the tyranny of time and fashion; and exposed to the corruptions of ignorance, and caprices of innovation.”♦
The lexis is a measure of shared experience, which comes from interconnectedness. The number of users of the language forms only the first part of the equation: jumping in four centuries from 5 million English speakers to a billion. The driving factor is the number of connections between and among those speakers. A mathematician might say that messaging grows not geometrically, but combinatorially, which is much, much faster. “I think of it as a saucepan under which the temperature has been turned up,” Gilliver said. “Any word, because of the interconnectedness of the English-speaking world, can spring from the backwater. And they are still backwaters, but they have this instant connection to ordinary, everyday discourse.” Like the printing press, the telegraph, and the telephone before it, the Internet is transforming the language simply by transmitting information differently. What makes cyberspace different from all previous information technologies is its intermixing of scales from the largest to the smallest without prejudice, broadcasting to the millions, narrowcasting to groups, instant messaging one to one. This comes as quite an unexpected consequence of the invention of computing machinery. At first, that had seemed to be about numbers.
Babbage did not quite belong in his time, which called itself the Steam Age or the Machine Age. He did revel in the uses of steam and machinery and considered himself a thoroughly modern man, but he also pursued an assortment of hobbies and obsessions—cipher cracking, lock picking, lighthouses, tree rings, the post—whose logic became clearer a century later. Examining the economics of the mail, he pursued a counterintuitive insight, that the significant cost comes not from the physical transport of paper packets but from their “verification”—the calculation of distances and the collection of correct fees—and thus he invented the modern idea of standardized postal rates. He loved boating, by which he meant not “the manual labor of rowing but the more intellectual art of sailing.”♦ He was a train buff. He devised a railroad recording device that used inking pens to trace curves on sheets of paper a thousand feet long: a combination seismograph and speedometer, inscribing the history of a train’s velocity and all the bumps and shakes along the way.
He might have been described as a professional mathematician, yet here he was touring the country’s workshops and manufactories, trying to discover the state of the art in machine tools. He noted, “Those who enjoy leisure can scarcely find a more interesting and instructive pursuit than the examination of the workshops of their own country, which contain within them a rich mine of knowledge, too generally neglected by the wealthier classes.”♦ He himself neglected no vein of knowledge. He did become expert on the manufacture of Nottingham lace; also the use of gunpowder in quarrying limestone; precision glass cutting with diamonds; and all known uses of machinery to produce power, save time, and communicate signals. He analyzed hydraulic presses, air pumps, gas meters, and screw cutters. By the end of his tour he knew as much as anyone in England about the making of pins.
He spent his long life improving it, first in one and then in another incarnation, but all, mainly, in his mind. It never came to fruition anywhere else. It thus occupies an extreme and peculiar place in the annals of invention: a failure, and also one of humanity’s grandest intellectual achievements.
Anyway, the machine was not meant to be a sort of oracle, to be consulted by individuals who would travel from far and wide for mathematical answers. The engine’s chief mission was to print out numbers en masse. For portability, the facts of arithmetic could be expressed in tables and bound in books. To Babbage the world seemed made of such facts. They were the “constants of Nature and Art.” He collected them everywhere. He compiled a Table of Constants of the Class Mammalia: wherever he went he timed the breaths and heartbeats of pigs and cows.♦ He invented a statistical methodology with tables of life expectancy for the somewhat shady business of life insurance. He drew up a table of the weight in Troy grains per square yard of various fabrics: cambric, calico, nankeen, muslins, silk gauze, and “caterpillar veils.” Another table revealed the relative frequencies of all the double-letter combinations in English, French, Italian, German, and Latin. He researched, computed, and published a Table of the Relative Frequency of the Causes of Breaking of Plate Glass Windows, distinguishing 464 different causes, no less than fourteen of which involved “drunken men, women, or boys.” But the tables closest to his heart were the purest: tables of numbers and only numbers, marching neatly across and down the pages in stately rows and columns, patterns for abstract appreciation.
Tables of numbers had been part of the book business even before the beginning of the print era. Working in Baghdad in the ninth century, Abu Abdullah Mohammad Ibn Musa al-Khwarizmi, whose name survives in the word algorithm, devised tables of trigonometric functions that spread west across Europe and east to China, made by hand and copied by hand, for hundreds of years.
A seventeenth-century invention had catalyzed the whole enterprise. This invention was itself a species of number, given the name logarithm. It was number as tool. Henry Briggs explained: Logarithmes are Numbers invented for the more easie working of questions in Arithmetike and Geometrie. The name is derived of Logos, which signifies Reason, and Arithmos, signifying Numbers. By them all troublesome Multiplications and Divisions in Arithmetike are avoided, and performed onely by Addition in stead of Multiplication, and by Subtraction in stead of Division.♦ In 1614 Briggs was a professor of geometry—the first professor of geometry—at Gresham College, London, later to be the birthplace of the Royal Society.
This new book proposed a method that would do away with most of the expense and the errors. It was like an electric flashlight sent to a lightless world. The author was a wealthy Scotsman, John Napier (or Napper, Nepair, Naper, or Neper), the eighth laird of Merchiston Castle, a theologian and well-known astrologer who also made a hobby of mathematics. Briggs was agog. “Naper, lord of Markinston, hath set my head and hands a work,”♦ he wrote. “I hope to see him this summer, if it please God, for I never saw book, which pleased me better, and made me more wonder.” He made his pilgrimage to Scotland and their first meeting, as he reported later, began with a quarter hour of silence: “spent, each beholding other almost with admiration before one word was spoke.”♦
Looking up and adding are easier than multiplying. But Napier did not express his idea this way, in terms of exponents. He grasped the thing viscerally: he was thinking in terms of a relationship between differences and ratios. A series of numbers with a fixed difference is an arithmetic progression: 0, 1, 2, 3, 4, 5 … When the numbers are separated by a fixed ratio, the progression is geometric: 1, 2, 4, 8, 16, 32 … Set these progressions side by side,
0 1 2 3 4 5 … (base 2 logarithms)
1 2 4 8 16 32 … (natural numbers)
and the result is a crude table of logarithms—crude, because the whole-number exponents are the easy ones.
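The two progressions set side by side already work as a small calculating aid. The following Python sketch, with hypothetical names (`log_of`, `antilog_of`, `multiply_by_table`) and a table limited to whole-number powers of 2, shows multiplication done by looking up and adding. It only handles numbers that appear in the table, which is the sense in which such a table is crude.

```python
# Illustrative sketch: a base 2 logarithm table restricted to whole exponents.
exponents = list(range(11))            # arithmetic progression: 0, 1, 2, ...
powers = [2 ** n for n in exponents]   # geometric progression: 1, 2, 4, ...

log_of = dict(zip(powers, exponents))      # number -> its logarithm
antilog_of = dict(zip(exponents, powers))  # logarithm -> number

def multiply_by_table(x: int, y: int) -> int:
    """Multiply two table entries using only lookups and one addition."""
    return antilog_of[log_of[x] + log_of[y]]

# 8 and 4 have logarithms 3 and 2; 3 + 2 = 5, and 5 sits beside 32.
product = multiply_by_table(8, 4)
```

A practical table fills in the gaps between the whole-number exponents, and that is where the labor of computing it lies.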
Fundamentally, there was only one calculus. Newton and Leibniz knew how similar their work was—enough that each accused the other of plagiarism. But they had devised incompatible systems of notation—different languages—and in practice these surface differences mattered more than the underlying sameness. Symbols and operators were what a mathematician had to work with, after all. Babbage, unlike most students, made himself fluent in both—“the dots of Newton, the d’s of Leibnitz”♦—and felt he had seen the light. “It is always difficult to think and reason in a new language.”♦
The “calculus of finite differences” had been explored by mathematicians (especially the French) for a hundred years. Its power was to reduce high-level calculations to simple addition, ready to be routinized. For Babbage the method was so crucial that he named his machine from its first conception the Difference Engine.
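A minimal Python sketch can show how the method of differences reduces tabulating a polynomial to repeated addition. The function name `tabulate_by_differences` and the choice of x² + x + 41 as the example polynomial are assumptions made for illustration. The code models only the arithmetic, not the mechanism of Babbage's machine.

```python
# Illustrative sketch of the method of finite differences (addition only).
def tabulate_by_differences(initial, count):
    """Tabulate a polynomial from its starting value and differences.

    `initial` is [f(0), first difference, second difference, ...], where
    the last entry is the constant highest-order difference.
    """
    registers = list(initial)
    values = []
    for _ in range(count):
        values.append(registers[0])
        # Each register absorbs the difference one order above it.
        for i in range(len(registers) - 1):
            registers[i] += registers[i + 1]
    return values

# For f(x) = x*x + x + 41: f(0) = 41, first difference 2, second difference 2.
table = tabulate_by_differences([41, 2, 2], 5)
```

Once the starting values are set, every further entry costs one addition per register and no multiplication at all, which is what made the procedure ready to be routinized.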
At this point in the history of computing machinery, a new theme appears: the obsession with time. It occurred to Babbage that his machine had to compute faster than the human mind and as fast as possible. He had an idea for parallel processing: number wheels arrayed along an axis could add a row of the digits all at once. “If this could be accomplished,” he noted, “it would render additions and subtractions with numbers having ten, twenty, fifty, or any number of figures, as rapid as those operations are with single figures.”
For the first time, but not the last, a device was invested with memory. “It is in effect a memorandum taken by the machine,” wrote his publicizer, Dionysius Lardner. Babbage himself was self-conscious about anthropomorphizing but could not resist. “The mechanical means I employed to make these carriages,” he suggested, “bears some slight analogy to the operation of the faculty of memory.”
Machinery, like mathematics, needed rigor and definition for progress. “The forms of ordinary language were far too diffuse,” he wrote. “The signs, if they have been properly chosen, and if they should be generally adopted, will form as it were an universal language.” Language was never a side issue for Babbage.
Inspiring him, as well, was the loom on display in the Strand, invented by Joseph-Marie Jacquard, controlled by instructions encoded and stored as holes punched in cards. What caught Babbage’s fancy was not the weaving, but rather the encoding, from one medium to another, of patterns. The patterns would appear in damask, eventually, but first were “sent to a peculiar artist.” This specialist, as he said, punches holes in a set of pasteboard cards in such a manner that when those cards are placed in a Jacquard loom, it will then weave upon its produce the exact pattern designed by the artist.♦ The notion of abstracting information away from its physical substrate required careful emphasis. Babbage explained, for example, that the weaver might choose different threads and different colors—“but in all these cases the form of the pattern will be precisely the same.” As Babbage conceived his machine now, it raised this very process of abstraction to higher and higher degrees.
He made clear, though, that information—representations of number and process—would course through the machinery. It would pass to and from certain special physical locations, which Babbage named a store, for storage, and a mill, for action.
The mathematician and logician Augustus De Morgan, a friend of Babbage and of Lady Byron, became Ada’s teacher by post. He sent her exercises. She sent him questions and musings and doubts (“I could wish I went on quicker”; “I am sorry to say I am sadly obstinate about the Term at which Convergence begins”; “I have enclosed my Demonstration of my view of the case”; “functional Equations are complete Will-o-the-wisps to me”; “However I try to keep my metaphysical head in order”).
[She] was fearless about drilling down to first principles. Where she felt difficulties, real difficulties lay.
A formal solution to a game—the very idea of such a thing was original. The desire to create a language of symbols, in which the solution could be encoded—this way of thinking was Babbage’s, as she well knew.
There he made his first (and last) public presentation of the Analytical Engine. “The discovery of the Analytical Engine is so much in advance of my own country, and I fear even of the age,”♦ he said. He met the Sardinian king, Charles Albert, and, more significantly, an ambitious young mathematician named Luigi Menabrea. Later Menabrea was to become a general, a diplomat, and the prime minister of Italy; now he prepared a scientific report, “Notions sur la machine analytique,”♦ to introduce Babbage’s plan to a broader community of European philosophers.
The engine did not just calculate; it performed operations, she said, defining an operation as “any process which alters the mutual relation of two or more things,” and declaring: “This is the most general definition, and would include all subjects in the universe.”♦ The science of operations, as she conceived it, is a science of itself, and has its own abstract truth and value; just as logic has its own peculiar truth and value, independently of the subjects to which we may apply its reasonings and processes.… One main reason why the separate nature of the science of operations has been little felt, and in general little dwelt on, is the shifting meaning of many of the symbols used. Symbols and meaning: she was emphatically not speaking of mathematics alone. The engine “might act upon other things besides number.” Babbage had inscribed numerals on those thousands of dials, but their working could represent symbols more abstractly. The engine might process any meaningful relationships.
… We may say most aptly, that the Analytical Engine weaves algebraical patterns just as the Jacquard-loom weaves flowers and leaves.♦ For this flight of fancy she took full responsibility. “Whether the inventor of this engine had any such views in his mind while working out the invention, or whether he may subsequently ever have regarded it under this phase, we do not know; but it is one that forcibly occurred to ourselves.”
She devised a process, a set of rules, a sequence of operations. In another century this would be called an algorithm, later a computer program, but for now the concept demanded painstaking explanation. The trickiest point was that her algorithm was recursive. It ran in a loop. The result of one iteration became food for the next. Babbage had alluded to this approach as “the Engine eating its own tail.”
A core idea was the entity she and Babbage called the variable. Variables were, in hardware terms, the machine’s columns of number dials. But there were “Variable cards,” too. In software terms they were a sort of receptacle or envelope, capable of representing, or storing, a number of many decimal digits. (“What is there in a name?” Babbage wrote. “It is merely an empty basket until you put something in it.”) Variables were the machine’s units of information. This was quite distinct from the algebraic variable. As A.A.L. explained, “The origin of this appellation is, that the values on the columns are destined to change, that is to vary, in every conceivable manner.” Numbers traveled, in effect, from variable cards to variables, from variables to the mill (for operations), from the mill to the store.
She was programming the machine. She programmed it in her mind, because the machine did not exist. The complexities she encountered for the first time became familiar to programmers of the next century: How multifarious and how mutually complicated are the considerations which the working of such an engine involve. There are frequently several distinct sets of effects going on simultaneously; all in a manner independent of each other, and yet to a greater or less degree exercising a mutual influence. To adjust each to every other, and indeed even to perceive and trace them out with perfect correctness and success, entails difficulties whose nature partakes to a certain extent of those involved in every question where conditions are very numerous and inter-complicated.♦
Babbage’s interests, straying so far from mathematics, seeming so miscellaneous, did possess a common thread that neither he nor his contemporaries could perceive. His obsessions belonged to no category—that is, no category yet existing. His true subject was information: messaging, encoding, processing. He took up two quirky and apparently unphilosophical challenges, which he himself noted had a deep connection one to the other: picking locks and deciphering codes. Deciphering, he said, was “one of the most fascinating of arts, and I fear I have wasted upon it more time than it deserves.”♦ To rationalize the process, he set out to perform a “complete analysis” of the English language. He created sets of special dictionaries: lists of the words of one letter, two letters, three letters, and so on; and lists of words alphabetized by their initial letter, second letter, third letter, and so on. With these at hand he designed methodologies for solving anagram puzzles and word squares.
In tree rings he saw nature encoding messages about the past. A profound lesson: that a tree records a whole complex of information in its solid substance. “Every shower that falls, every change of temperature that occurs, and every wind that blows, leaves on the vegetable world the traces of its passage; slight, indeed, and imperceptible, perhaps, to us, but not the less permanently recorded in the depths of those woody fabrics.”♦
As he looked to the future, he saw a special role for one truth above all: “the maxim, that knowledge is power.” He understood that literally. Knowledge “is itself the generator of physical force,” he declared. Science gave the world steam, and soon, he suspected, would turn to the less tangible power of electricity: “Already it has nearly chained the ethereal fluid.” And he looked further: It is the science of calculation—which becomes continually more necessary at each step of our progress, and which must ultimately govern the whole of the applications of science to the arts of life.
Maybe nerves were not just like wires; maybe they were wires, carrying messages from the nether regions to the sensorium. Alfred Smee, in his 1849 Elements of Electro-Biology, likened the brain to a battery and the nerves to “bio-telegraphs.”♦
In human hands, electricity could hardly accomplish anything, at first. It could not make a light brighter than a spark. It was silent. But it could be sent along wires to great distances—this was discovered early—and it seemed to turn wires into faint magnets. Those wires could be long: no one had found any limit to the range of the electric current. It took no time at all to see what this meant for the ancient dream of long-distance communication. It meant sympathetic needles.
A whole realm of engineering had to be invented. Apart from the engineering was a separate problem: the problem of the message itself. This was more a logic puzzle than a technical one. It was a problem of crossing levels, from kinetics to meaning. What form would the message take? How would the telegraph convert this fluid into words? By virtue of magnetism, the influence propagated across a distance could perform work upon physical objects, such as needles, or iron filings, or even small levers. People had different ideas: the electromagnet might sound an alarum-bell; might govern the motion of wheel-work; might turn a handle, which might carry a pencil (but nineteenth-century engineering was not up to robotic handwriting). Or the current might discharge a cannon. Imagine discharging a cannon by sending a signal from miles away! Would-be inventors naturally looked to previous communications technologies, but the precedents were mostly the wrong sort.
Then came needles. The physicist André-Marie Ampère, a developer of the galvanometer, proposed using that as a signaling device: it was a needle deflected by electromagnetism—a compass pointing to a momentary artificial north. He, too, thought in terms of one needle for every letter. In Russia, Baron Pavel Schilling demonstrated a system with five needles and later reduced that to one: he assigned combinations of right and left signals to the letters and numerals. At Göttingen in 1833 the mathematician Carl Friedrich Gauss, working with a physicist, Wilhelm Weber, organized a similar scheme with one needle.
Morse told a friend who was rooming with him in Paris: “The mails in our country are too slow; this French telegraph is better, and would do even better in our clear atmosphere than here, where half the time fogs obscure the skies. But this will not be fast enough—the lightning would serve us better.”♦ As he described his epiphany, it was an insight not about lightning but about signs: “It would not be difficult to construct a system of signs by which intelligence could be instantaneously transmitted.”♦
[Figures: Telegraphic writing by Morse’s first instrument; Alfred Vail’s telegraph “key”]
Morse had a great insight from which all the rest flowed. Knowing nothing about pith balls, bubbles, or litmus paper, he saw that a sign could be made from something simpler, more fundamental, and less tangible—the most minimal event, the closing and opening of a circuit. Never mind needles. The electric current flowed and was interrupted, and the interruptions could be organized to create meaning.
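The idea that meaning can be built from nothing more than a circuit closing and opening can be sketched in Python. Everything specific here is an assumption for illustration: the four-letter `CODE` table is a small subset, the timing (dot one unit, dash three, one unit of silence inside a letter, three between letters) follows the later International Morse convention rather than Morse's first instrument, and the function name `keying` is hypothetical.

```python
# Illustrative sketch: letters as timed closings ("1") and openings ("0").
CODE = {"e": ".", "t": "-", "s": "...", "o": "---"}  # partial table, assumed

def keying(text: str) -> str:
    """Return the on/off pattern of the circuit, one character per time unit."""
    letters = []
    for ch in text:
        # A dot closes the circuit for one unit, a dash for three.
        marks = ["1" if mark == "." else "111" for mark in CODE[ch]]
        # One unit of open circuit separates marks within a letter.
        letters.append("0".join(marks))
    # Three units of open circuit separate letters.
    return "000".join(letters)

signal = keying("sos")
```

The resulting string contains only two symbols, yet the lengths of the runs and the gaps between them are enough to recover the letters at the far end.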
So at one end they had a lever, for closing and opening the circuit, and at the other end the current controlled an electromagnet. One of them, probably Vail, thought of putting the two together. The magnet could operate the lever. This combination (invented more or less simultaneously by Joseph Henry at Princeton and Edward Davy in England) was named the “relay,” from the word for a fresh horse that replaced an exhausted one. It removed the greatest obstacle standing in the way of long-distance electrical telegraphy: the weakening of currents as they passed through lengths of wire. A weakened current could still operate a relay, enabling a new circuit, powered by a new battery.
Despite the expense—at first, typically, fifty cents for ten words—the newspapers became the telegraph services’ most enthusiastic patrons. Within a few years, 120 provincial newspapers were getting reports from Parliament nightly.
The relationship between the telegraph and the newspaper was symbiotic. Positive feedback loops amplified the effect. Because the telegraph was an information technology, it served as an agent of its own ascendency.
“Distance and time have been so changed in our imaginations,” said Josiah Latimer Clark, an English telegraph engineer, “that the globe has been practically reduced in magnitude, and there can be no doubt that our conception of its dimensions is entirely different to that held by our forefathers.”♦ Formerly all time was local: when the sun was highest, that was noon. Only a visionary (or an astronomer) would know that people in a different place lived by a different clock. Now time could be either local or standard, and the distinction baffled most people. The railroads required standard time, and the telegraph made it feasible.
Far from annihilating time, synchrony extended its dominion. The very idea of synchrony, and the awareness that the idea was new, made heads spin. The New York Herald declared: Professor Morse’s telegraph is not only an era in the transmission of intelligence, but it has originated in the mind an entirely new class of ideas, a new species of consciousness. Never before was any one conscious that he knew with certainty what events were at that moment passing in a distant city—40, 100, or 500 miles off.♦
Imagine, continued this exhilarated writer, that it is now 11 o’clock. The telegraph relays what a legislator is now saying in Washington. It requires no small intellectual effort to realize that this is a fact that now is, and not one that has been. This is a fact that now is. History (and history making) changed, too. The telegraph caused the preservation of quantities of minutiae concerning everyday life. For a while, until it became impractical, the telegraph companies tried to maintain a record of every message. This was information storage without precedent.
A message had seemed to be a physical object. That was always an illusion; now people needed consciously to divorce their conception of the message from the paper on which it was written. Scientists, Harper’s explained, will say that the electric current “carries a message,” but one must not imagine that anything—any thing—is transported. There is only “the action and reaction of an imponderable force, and the making of intelligible signals by its means at a distance.” No wonder people were misled. “Such language the world must, perhaps for a long time to come, continue to employ.”
The wires resembled nothing in architecture and not much in nature. Writers seeking similes thought of spiders and their webs. They thought of labyrinths and mazes. And one more word seemed appropriate: the earth was being covered, people said, with an iron net-work. “A net-work of nerves of iron wire, strung with lightning, will ramify from the brain, New York, to the distant limbs and members,”♦ said the New York Tribune. “The whole net-work of wires,” wrote Harper’s, “all quivering from end to end with signals of human intelligence.”♦ Wynter offered a prediction. “The time is not distant,”♦ he wrote, “when everybody will be able to talk with everybody without going out of the house.”
Movement from one symbolic level to another could be called encoding.
Two motivations went hand in glove: secrecy and brevity. Short messages saved money—that was simple. So powerful was that impulse that English prose style soon seemed to be feeling the effects. Telegraphic and telegraphese described the new way of writing. Flowers of rhetoric cost too much, and some regretted it.
Those who used the telegraph codes slowly discovered an unanticipated side effect of their efficiency and brevity. They were perilously vulnerable to the smallest errors. Because they lacked the natural redundancy of English prose—even the foreshortened prose of telegraphese—these cleverly encoded messages could be disrupted by a mistake in a single character.
This little book was titled Mercury: or the Secret and Swift Messenger. Shewing, How a Man may with Privacy and Speed communicate his Thoughts to a Friend at any Distance. The author eventually revealed himself as John Wilkins, a vicar and mathematician, later to become master of Trinity College, Cambridge, and a founder of the Royal Society. “He was a very ingenious man and had a very mechanical head,”♦ one contemporary said. “One of much and deep thinking,… lusty, strong grown, well set, broad shouldered.”
For Wilkins the issues of cryptography stood near the fundamental problem of communication. He considered writing and secret writing as essentially the same. Leaving secrecy aside, he expressed the problem this way: “How a Man may with the greatest Swiftness and Speed, discover his Intentions to one that is far distant from him.”♦ By swiftness and speed he meant, in 1641, something philosophical; the birth of Isaac Newton was a year away. “There is nothing (we say) so swift as Thought,” he noted.
For example, a set of five symbols—a, b, c, d, e—used in pairs could replace an alphabet of twenty-five letters: “According to which,” wrote Wilkins, “these words, I am betrayed, may be thus described: Bd aacb abaedddbaaecaead.” So even a small symbol set could be arranged to express any message at all. However, with a small symbol set, a given message requires a longer string of characters—“more Labour and Time,” he wrote. Wilkins did not explain that 25 = 5², nor that three symbols taken in threes (aaa, aab, aac,…) produce twenty-seven possibilities because 3³ = 27. But he clearly understood the underlying mathematics.
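The pair code Wilkins describes can be sketched in a few lines of Python. The 24-letter alphabet below (no j, no u) is an assumption inferred from the sample cryptogram; the excerpt does not give Wilkins's table, and the function names are illustrative only.

```python
# Sketch of a two-symbols-per-letter code over the five symbols a-e.
# ASSUMPTION: a 24-letter alphabet without "j" and "u", inferred from the
# sample "I am betrayed"; the excerpt does not spell out the table itself.
SYMBOLS = "abcde"
ALPHABET = "abcdefghiklmnopqrstvwxyz"

def encode(message: str) -> str:
    """Replace each letter with a pair of symbols; ignore everything else."""
    out = []
    for ch in message.lower():
        if ch in ALPHABET:
            first, second = divmod(ALPHABET.index(ch), len(SYMBOLS))
            out.append(SYMBOLS[first] + SYMBOLS[second])
    return "".join(out)

def decode(code: str) -> str:
    """Invert encode() by reading the symbols two at a time."""
    pairs = (code[i:i + 2] for i in range(0, len(code), 2))
    return "".join(
        ALPHABET[SYMBOLS.index(p[0]) * len(SYMBOLS) + SYMBOLS.index(p[1])]
        for p in pairs
    )

print(encode("I am betrayed"))  # bdaacbabaedddbaaecaead
```

Ignoring spaces and capitals, the output matches the cryptogram quoted above. It is also twice as long as the plaintext, which is the “more Labour and Time” that a smaller symbol set costs.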
Two symbols. In groups of five. “Yield thirty two Differences.” That word, differences, must have struck Wilkins’s readers (few though they were) as an odd choice. But it was deliberate and pregnant with meaning. Wilkins was reaching for a conception of information in its purest, most general form. Writing was only a special case: “For in the general we must note, That whatever is capable of a competent Difference, perceptible to any Sense, may be a sufficient Means whereby to express the Cogitations.”♦
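Wilkins's counts can be checked by brute-force enumeration. This is a minimal sketch; the helper name `differences` is illustrative and borrows his word for a distinguishable group.

```python
from itertools import product

def differences(symbols: str, group_size: int) -> list[str]:
    """List every distinct group of `group_size` symbols drawn from `symbols`."""
    return ["".join(group) for group in product(symbols, repeat=group_size)]

print(len(differences("ab", 5)))     # 32: two symbols in groups of five
print(len(differences("abc", 3)))    # 27: three symbols in threes
print(len(differences("abcde", 2)))  # 25: five symbols in pairs
```

In each case the number of groups is the alphabet size raised to the power of the group length.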
The most advanced cryptanalyst in Victorian England was Charles Babbage. The process of substituting symbols, crossing levels of meaning, lay near the heart of so many issues. And he enjoyed the challenge. “One of the most singular characteristics of the art of deciphering,” he asserted, “is the strong conviction possessed by every person, even moderately acquainted with it, that he is able to construct a cipher which nobody else can decipher. I have also observed that the cleverer the person, the more intimate is his conviction.”
Boole thought of his system as a mathematics without numbers. “It is simply a fact,”♦ he wrote, “that the ultimate laws of logic—those alone on which it is possible to construct a science of logic—are mathematical in their form and expression, although not belonging to the mathematics of quantity.” The only numbers allowed, he proposed, were zero and one. It was all or nothing: “The respective interpretation of the symbols 0 and 1 in the system of logic are Nothing and Universe.”♦ Until now logic had belonged to philosophy. Boole was claiming possession on behalf of mathematics. In doing so, he devised a new form of encoding. Its code book paired two types of symbolism, each abstracted far from the world of things. On one side was a set of characters drawn from the formalism of mathematics: p’s and q’s, +’s and –’s, braces and brackets.
Language, after all, is an instrument. It was seen distinctly now as an instrument with two separate functions: expression and thought. Thinking came first, or so people assumed. To Boole, logic was thought—polished and purified. He chose The Laws of Thought as the title for his 1854 masterwork.
As the century turned, Bertrand Russell paid George Boole an extraordinary compliment: “Pure mathematics was discovered by Boole, in a work which he called the Laws of Thought.”♦ It has been quoted often. What makes the compliment extraordinary is the seldom quoted disparagement that follows on its heels: He was also mistaken in supposing that he was dealing with the laws of thought: the question how people actually think was quite irrelevant to him, and if his book had really contained the laws of thought, it was curious that no one should ever have thought in such a way before.
These devices changed the topology—ripped the social fabric and reconnected it, added gateways and junctions where there had only been blank distance. Already at the turn of the twentieth century there was worry about unanticipated effects on social behavior. The superintendent of the line in Wisconsin fretted about young men and women “constantly sparking over the wire” between Eau Claire and Chippewa Falls. “This free use of the line for flirtation purposes has grown to an alarming extent,” he wrote, “and if it is to go on somebody must pay for it.”
Bush, like Babbage, hated the numbing, wasteful labor of mere calculation. “A mathematician is not a man who can readily manipulate figures; often he cannot,” Bush wrote. “He is primarily an individual who is skilled in the use of symbolic logic on a high plane, and especially he is a man of intuitive judgment.”♦
Relay circuitry was designed for each particular case. No one had thought to study the idea systematically, but Shannon was looking for a topic for his master’s thesis, and he saw a possibility. In his last year of college he had taken a course in symbolic logic, and, when he tried to make an orderly list of the possible arrangements of switching circuits, he had a sudden feeling of déjà vu. In a deeply abstract way, these problems lined up. The peculiar artificial notation of symbolic logic, Boole’s “algebra,” could be used to describe circuits.
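One way to see how the two subjects line up is to treat a closed switch as True and an open one as False. Switches in series then behave like AND, and switches in parallel like OR. This convention is an assumption chosen for readability (other conventions exist, including the reverse assignment), and the two example circuits are hypothetical.

```python
from itertools import product

def series(a: bool, b: bool) -> bool:
    """Current passes through two switches in series only if both are closed."""
    return a and b

def parallel(a: bool, b: bool) -> bool:
    """Current passes through two parallel switches if either is closed."""
    return a or b

def four_switch_circuit(x: bool, y: bool, z: bool) -> bool:
    # (x in series with y) in parallel with (x in series with z)
    return parallel(series(x, y), series(x, z))

def three_switch_circuit(x: bool, y: bool, z: bool) -> bool:
    # x in series with (y in parallel with z)
    return series(x, parallel(y, z))

# Check every arrangement of the three switches.
equivalent = all(
    four_switch_circuit(*setting) == three_switch_circuit(*setting)
    for setting in product([False, True], repeat=3)
)
print(equivalent)  # True
```

The check is the distributive law of Boolean algebra, xy + xz = x(y + z), read as a statement about wiring. The algebra finds the cheaper circuit without anyone building either one.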
By melding logic and mathematics in a system of axioms, signs, formulas, and proofs, philosophers seemed within reach of a kind of perfection—a rigorous, formal certainty. This was the goal of Bertrand Russell and Alfred North Whitehead, the giants of English rationalism, who published their great work in three volumes from 1910 to 1913. Their title, Principia Mathematica, grandly echoed Isaac Newton; their ambition was nothing less than the perfection of all mathematics. This was finally possible, they claimed, through the instrument of symbolic logic, with its obsidian signs and implacable rules. Their mission was to prove every mathematical fact. The process of proof, when carried out properly, should be mechanical. In contrast to words, symbolism (they declared) enables “perfectly precise expression.” This elusive quarry had been pursued by Boole, and before him, Babbage, and long before either of them, Leibniz, all believing that the perfection of reasoning could come with the perfect encoding of thought.
A cleaner formulation of Epimenides’ paradox—cleaner because one need not worry about Cretans and their attributes—is the liar’s paradox: This statement is false. The statement cannot be true, because then it is false. It cannot be false, because then it becomes true. It is neither true nor false, or it is both at once. But the discovery of this twisting, backfiring, mind-bending circularity does not bring life or language crashing to a halt—one grasps the idea and moves on—because life and language lack the perfection, the absolutes, that give them force.
One was Berry’s paradox, first suggested to Russell by G. G. Berry, a librarian at the Bodleian. It has to do with counting the syllables needed to specify each integer. Generally, of course, the larger the number the more syllables are required. In English, the smallest integer requiring two syllables is seven. The smallest requiring three syllables is eleven. The number 121 seems to require six syllables (“one hundred twenty-one”), but actually four will do the job, with some cleverness: “eleven squared.” Still, even with cleverness, there are only a finite number of possible syllables and therefore a finite number of names, and, as Russell put it, “Hence the names of some integers must consist of at least nineteen syllables, and among these there must be a least. Hence the least integer not nameable in fewer than nineteen syllables must denote a definite integer.”♦♦ Now comes the paradox. This phrase, the least integer not nameable in fewer than nineteen syllables, contains only eighteen syllables. So the least integer not nameable in fewer than nineteen syllables has just been named in fewer than nineteen syllables. Another paradox of Russell’s is the Barber paradox. The barber is the man (let us say) who shaves all the men, and only those, who do not shave themselves. Does the barber shave himself? If he does he does not, and if he does not he does.
Recursion was the oxygen feeding the flame. In the same way, the liar paradox relies on statements about statements. “This statement is false” is meta-language: language about language. Russell’s paradoxical set relies on a meta-set: a set of sets. So the problem was a crossing of levels, or, as Russell termed it, a mixing of types. His solution: declare it illegal, taboo, out of bounds. No mixing different levels of abstraction. No self-reference; no self-containment.
Enter Kurt Gödel.
He found that lurking within PM—and within any consistent system of logic—there must be monsters of a kind hitherto unconceived: statements that can never be proved, and yet can never be disproved. There must be truths, that is, that cannot be proved—and Gödel could prove it. He accomplished this with iron rigor disguised as sleight of hand. He employed the formal rules of PM and, as he employed them, also approached them metamathematically—viewed them, that is, from the outside. As he explained, all the symbols of PM—numbers, operations of arithmetic, logical connectors, and punctuation—constituted a limited alphabet. Every statement or formula of PM was written in this alphabet. Likewise every proof comprised a finite sequence of formulas—just a longer passage written in the same alphabet. This is where metamathematics came in. Metamathematically, Gödel pointed out, one sign is as good as another; the choice of a particular alphabet is arbitrary. One could use the traditional assortment of numerals and glyphs (from arithmetic: +, −, =, ×; from logic: ¬, ∨, ⊃, ∃), or one could use letters, or one could use dots and dashes. It was a matter of encoding, slipping from one symbol set to another. Gödel proposed to use numbers for all his signs. Numbers were his alphabet. And because numbers can be combined using arithmetic, any sequence of numbers amounts to one (possibly very large) number. So every statement, every formula of PM can be expressed as a single number, and so can every proof. Gödel outlined a rigorous scheme for doing the encoding—an algorithm, mechanical, just rules to follow, no intelligence necessary. It works forward and backward: given any formula, following the rules generates one number, and given any number, following the rules produces the corresponding formula.
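The forward-and-backward encoding can be illustrated with a toy scheme based on prime powers, which is the textbook way of presenting Gödel numbering. The symbol list and code numbers below are hypothetical and are not Gödel's own assignments.

```python
# Toy Gödel numbering: the k-th symbol of a formula becomes the exponent
# of the k-th prime. ASSUMPTION: this symbol table is invented for the sketch.
SYMBOLS = ["0", "s", "=", "+", "(", ")", "x", "¬", "∨", "⊃", "∃"]
CODE = {symbol: i + 1 for i, symbol in enumerate(SYMBOLS)}
SYMBOL_FOR = {number: symbol for symbol, number in CODE.items()}

def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short formulas)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def godel_number(formula: str) -> int:
    """Formula to number: multiply prime powers, one prime per position."""
    number = 1
    for p, symbol in zip(primes(), formula):
        number *= p ** CODE[symbol]
    return number

def formula_from(number: int) -> str:
    """Number to formula: factor out each prime in turn and read its exponent."""
    symbols = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        symbols.append(SYMBOL_FOR[exponent])
    return "".join(symbols)

print(godel_number("0=0"))  # 270, that is 2**1 * 3**3 * 5**1
print(formula_from(270))    # 0=0
```

Because prime factorization is unique, no two formulas share a number, and both directions are purely mechanical.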
Gödel’s conclusion sprang not from a weakness in PM but from a strength. That strength is the fact that numbers are so flexible or “chameleonic” that their patterns can mimic patterns of reasoning.… PM’s expressive power is what gives rise to its incompleteness.
The telegraph demanded literacy; the telephone embraced orality. A message sent by telegraph had first to be written, encoded, and tapped out by a trained intermediary. To employ the telephone, one just talked. A child could use it. For that very reason it seemed like a toy. In fact, it seemed like a familiar toy, made from tin cylinders and string. The telephone left no permanent record. The Telephone had no future as a newspaper name. Business people thought it unserious. Where the telegraph dealt in facts and numbers, the telephone appealed to emotions.
In 1905, his finest year, Einstein published a paper on Brownian motion, the random, jittery motion of tiny particles suspended in a fluid. Antony van Leeuwenhoek had discovered it with his early microscope, and the phenomenon was named after Robert Brown, the Scottish botanist who studied it carefully in 1827: first pollen in water, then soot and powdered rock. Brown convinced himself that these particles were not alive—they were not animalcules—yet they would not sit still. In a mathematical tour de force, Einstein explained this as a consequence of the heat energy of molecules, whose existence he thereby proved. Microscopically visible particles, like pollen, are bombarded by molecular collisions and are light enough to be jolted randomly this way and that. The fluctuations of the particles, individually unpredictable, collectively express the laws of statistical mechanics.
It seemed intuitively clear that the amount of information should be proportional to the number of symbols: twice as many symbols, twice as much information. But a dot or dash—a symbol in a set with just two members—carries less information than a letter of the alphabet and much less information than a word chosen from a thousand-word dictionary. The more possible symbols, the more information each selection carries. How much more? The equation, as Hartley wrote it, was this: H = n log s where H is the amount of information, n is the number of symbols transmitted, and s is the size of the alphabet.
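Hartley's formula is simple enough to evaluate directly. The sketch below assumes base-2 logarithms, so the result comes out in the unit later called bits; Hartley's formula as quoted leaves the base open, and the function name is illustrative.

```python
import math

def hartley_information(n_symbols: int, alphabet_size: int, base: float = 2) -> float:
    """Hartley's H = n log s. The base only fixes the unit (base 2 gives bits)."""
    return n_symbols * math.log(alphabet_size, base)

# One symbol drawn from alphabets of different sizes.
for size in (2, 26, 1000):
    print(size, round(hartley_information(1, size), 2))

# log(s**2) = 2 log(s): squaring the alphabet doubles H per symbol.
print(math.isclose(hartley_information(1, 26 ** 2), 2 * hartley_information(1, 26)))
```

A dot or dash carries 1 unit, a letter from a 26-letter alphabet about 4.7, and a word from a thousand-word dictionary just under 10.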
In a dot-dash system, s is just 2. A single Chinese character carries so much more weight than a Morse dot or dash; it is so much more valuable. In a system with a symbol for every word in a thousand-word dictionary, s would be 1,000. The amount of information is not proportional to the alphabet size, however. That relationship is logarithmic: to double the amount of information, it is necessary to square the alphabet size.
In an evaluation forty years later the geneticist James F. Crow wrote: “It seems to have been written in complete isolation from the population genetics community.…[Shannon] discovered principles that were rediscovered later.… My regret is that [it] did not become widely known in 1940. It would have changed the history of the subject substantially, I think.”
AT THE HEIGHT OF THE WAR, in early 1943, two like-minded thinkers, Claude Shannon and Alan Turing, met daily at teatime in the Bell Labs cafeteria and said nothing to each other about their work, because it was secret.♦ Both men had become cryptanalysts.
It was clear later that these men, on their respective sides of the Atlantic, had done more than anyone else to turn cryptography from an art into a science, but for now the code makers and code breakers were not talking to each other.
With that subject off the table, Turing showed Shannon a paper he had written seven years earlier, called “On Computable Numbers,” about the powers and limitations of an idealized computing machine. They talked about another topic that turned out to be close to their hearts, the possibility of machines learning to think. Shannon proposed feeding “cultural things,” such as music, to an electronic brain, and they outdid each other in brashness,
Turing was a fellow and a recent graduate at King’s College, Cambridge, when he presented his computable-numbers paper to his professor in 1936. The full title finished with a flourish in fancy German: it was “On Computable Numbers, with an Application to the Entscheidungsproblem.” The “decision problem” was a challenge that had been posed by David Hilbert at the 1928 International Congress of Mathematicians. As perhaps the most influential mathematician of his time, Hilbert, like Russell and Whitehead, believed fervently in the mission of rooting all mathematics in a solid logical foundation—“In der Mathematik gibt es kein Ignorabimus,” he declared. (“In mathematics there is no we will not know.”) Of course mathematics had many unsolved problems, some quite famous, such as Fermat’s Last Theorem and the Goldbach conjecture—statements that seemed true but had not been proved. Had not yet been proved, most people thought. There was an assumption, even a faith, that any mathematical truth would be provable, someday.
The Entscheidungsproblem was to find a rigorous step-by-step procedure by which, given a formal language of deductive reasoning, one could perform a proof automatically. This was Leibniz’s dream revived once again: the expression of all valid reasoning in mechanical rules.
Hilbert had distinguished among three questions: Is mathematics complete? Is mathematics consistent? Is mathematics decidable? Gödel showed that mathematics could not be both complete and consistent but had not definitively answered the third question, at least not for all mathematics. Even though a particular closed system of formal logic must contain statements that could neither be proved nor disproved from within the system, it might conceivably be decided, as it were, by an outside referee—by external logic or rules.
Alan Turing, just twenty-two years old, unfamiliar with much of the relevant literature, so alone in his work habits that his professor worried about his becoming “a confirmed solitary,”♦ posed an entirely different question (it seemed): Are all numbers computable? This was an unexpected question to begin with, because hardly anyone had considered the idea of an uncomputable number. Most numbers that people work with, or think about, are computable by definition. The rational numbers are computable because they can be expressed as the quotient of two integers, a/b. The algebraic numbers are computable because they are solutions of polynomial equations. Famous numbers like π and e are computable; people compute them all the time. Nonetheless Turing made the seemingly mild statement that numbers might exist that are somehow nameable, definable, and not computable.
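Here “computable” means that a finite rule can produce the digits one after another, for as long as anyone cares to wait. For a rational number that rule is schoolbook long division, sketched below. The function name is illustrative, and the restriction to 0 <= a < b is a simplifying assumption.

```python
from itertools import islice

def decimal_digits(a: int, b: int):
    """Yield the digits after the decimal point of a/b, forever (0 <= a < b)."""
    remainder = a
    while True:
        remainder *= 10
        yield remainder // b  # next digit
        remainder %= b        # carry the rest forward

print(list(islice(decimal_digits(1, 7), 12)))  # [1, 4, 2, 8, 5, 7, 1, 4, 2, 8, 5, 7]
```

The generator never finishes, but any particular digit arrives after a finite number of steps, which is all that computability asks.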
States required more explaining. Turing used the word “configurations” and pointed out that these resembled “states of mind.” The machine has a few of these—some finite number. In any given state, the machine takes one or more actions depending on the current symbol. For example, in state a, the machine might move one square to the right if the current symbol is 1, or move one square to the left if the current symbol is 0, or print 1 if the current symbol is blank. In state b, the machine might erase the current symbol. In state c, if the symbol is 0 or 1, the machine might move to the right, and otherwise stop. After each action, the machine finishes in a new state, which might be the same or different. The various states used for a given calculation were stored in a table—how this was to be managed physically did not matter. The state table was, in effect, the machine’s set of instructions.
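A state table of this kind is easy to simulate. The sketch below uses a later, simplified convention in which every table entry writes a symbol, moves the head, and names the next state. That convention, the `INVERT` table, and the step limit are assumptions made for illustration, not Turing's own notation.

```python
BLANK = " "

def run(table: dict, tape: str, state: str = "a", max_steps: int = 1000) -> str:
    """Run a state table on a tape and return the final tape contents."""
    cells = dict(enumerate(tape))  # square index -> symbol
    head = 0
    for _ in range(max_steps):
        symbol = cells.get(head, BLANK)
        if (state, symbol) not in table:
            break  # no entry for this situation: the machine stops
        write, move, state = table[(state, symbol)]
        cells[head] = write
        head += {"R": 1, "L": -1, "N": 0}[move]
    return "".join(cells[i] for i in sorted(cells)).strip()

# Hypothetical one-state machine: flip every bit, moving right, stop at a blank.
INVERT = {
    ("a", "0"): ("1", "R", "a"),
    ("a", "1"): ("0", "R", "a"),
}

print(run(INVERT, "10110"))  # 01001
```

The table is ordinary data (a dictionary), which is the point of the last sentence above: the instructions are just entries to be looked up.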
He demonstrated that this short list covers everything a person does in computing a number. No other knowledge or intuition is necessary. Anything computable can be computed by this machine. Then came the final flourish. Turing’s machines, stripped down to a finite table of states and a finite set of input, could themselves be represented as numbers. Every possible state table, combined with its initial tape, represents a different machine. Each machine itself, then, can be described by a particular number—a certain state table combined with its initial tape. Turing was encoding his machines just as Gödel had encoded the language of symbolic logic. This obliterated the distinction between data and instructions: in the end they were all numbers. For every computable number, there must be a corresponding machine number.
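The claim that a machine is “just a number” can be made concrete with any reversible serialization. The sketch below writes a state table and tape as text and reads the bytes as one large integer. This is an arbitrary encoding chosen for brevity, not Turing's own scheme, and every name in it is hypothetical.

```python
import json

def machine_number(table: dict, tape: str) -> int:
    """Serialize a state table plus its initial tape, then read the bytes as one integer."""
    rows = sorted(list(key) + list(value) for key, value in table.items())
    text = json.dumps({"table": rows, "tape": tape}, separators=(",", ":"))
    return int.from_bytes(text.encode("utf-8"), "big")

def machine_from(number: int):
    """Recover the state table and tape from a machine number."""
    length = (number.bit_length() + 7) // 8
    data = json.loads(number.to_bytes(length, "big").decode("utf-8"))
    table = {(state, symbol): (write, move, nxt)
             for state, symbol, write, move, nxt in data["table"]}
    return table, data["tape"]

# Hypothetical table: (state, symbol read) -> (symbol written, move, next state)
INVERT = {
    ("a", "0"): ("1", "R", "a"),
    ("a", "1"): ("0", "R", "a"),
}

n = machine_number(INVERT, "10110")
print(n.bit_length(), "bits")
print(machine_from(n) == (INVERT, "10110"))  # True
```

Once the table is an integer, nothing distinguishes it from any other input a machine might read, which is how data and instructions collapse into one kind of thing.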
Turing produced (still in his mind’s eye) a version of the machine that could simulate every other possible machine—every digital computer. He called this machine U, for “universal,” and mathematicians fondly use the name U to this day. It takes machine numbers as input. That is, it reads the descriptions of other machines from its tape—their algorithms and their own input. No matter how complex a digital computer may grow, its description can still be encoded on tape to be read by U. If a problem can be solved by any digital computer—encoded in symbols and solved algorithmically—the universal machine can solve it as well.
Now the microscope is turned onto itself. The Turing machine sets about examining every number to see whether it corresponds to a computable algorithm. Some will prove computable. Some might prove uncomputable. And there is a third possibility, the one that most interested Turing. Some algorithms might defy the inspector, causing the machine to march along, performing its inscrutable business, never coming to a halt, never obviously repeating itself, and leaving the logical observer forever in the dark about whether it would halt. By now Turing’s argument, as published in 1936, has become a knotty masterpiece of recursive definitions, symbols invented to represent other symbols, numbers standing in for numbers, for state tables, for algorithms, for machines. In print it looked like this: By combining the machines D and U we could construct a machine M to compute the sequence β′. The machine D may require a tape. We may suppose that it uses the E-squares beyond all symbols on F-squares, and that when it has reached its verdict all the rough work done by D is erased.… We can show further that there can be no machine E which, when applied with the S.D of an arbitrary machine M, will determine whether M ever prints a given symbol (0 say). Few could follow it. It seems paradoxical—it is paradoxical—but Turing proved that some numbers are uncomputable. (In fact, most are.) Also, because every number corresponds to an encoded proposition of mathematics and logic, Turing had resolved Hilbert’s question about whether every proposition is decidable. He had proved that the Entscheidungsproblem has an answer, and the answer is no. An uncomputable number is, in effect, an undecidable proposition.
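The self-referential twist can be shown in a toy model. It is far simpler than Turing's construction and proves nothing by itself: programs here return the label "halts" or "loops" instead of actually running forever, and only two fixed, hypothetical inspectors are tried.

```python
def make_contrary(inspector):
    """Build a program that does the opposite of whatever `inspector` predicts for it."""
    def contrary():
        # Return a label rather than really looping, so the sketch can run.
        return "loops" if inspector(contrary) == "halts" else "halts"
    return contrary

def optimist(program):
    return "halts"  # always predicts halting

def pessimist(program):
    return "loops"  # always predicts looping

for inspector in (optimist, pessimist):
    program = make_contrary(inspector)
    print(f"{inspector.__name__} predicts {inspector(program)}; program {program()}")
# optimist predicts halts; program loops
# pessimist predicts loops; program halts
```

Each inspector is wrong about the program built from it. The actual proof shows that the same trap can be laid for every possible inspector, not just these two.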
So Turing’s computer—a fanciful, abstract, wholly imaginary machine—led him to a proof parallel to Gödel’s. Turing went further than Gödel by defining the general concept of a formal system. Any mechanical procedure for generating formulas is essentially a Turing machine. Any formal system, therefore, must have undecidable propositions. Mathematics is not decidable.
Incompleteness follows from uncomputability. Once again, the paradoxes come to life when numbers gain the power to encode the machine’s own behavior. That is the necessary recursive twist. The entity being reckoned is fatally entwined with the entity doing the reckoning. As Douglas Hofstadter put it much later, “The thing hinges on getting this halting inspector to try to predict its own behavior when looking at itself trying to predict its own behavior when looking at itself trying to predict its own behavior when …”♦ A conundrum that at least smelled similar had lately appeared in physics, too: Werner Heisenberg’s new uncertainty principle.
When Turing learned about that, he expressed it in terms of self-reference: “It used to be supposed in Science that if everything was known about the Universe at any particular moment then we can predict what it will be through all the future.… More modern science however has come to the conclusion that when we are dealing with atoms and electrons we are quite unable to know the exact state of them; our instruments being made of atoms and electrons themselves.”
Alan Turing and Claude Shannon had codes in common. Turing encoded instructions as numbers. He encoded decimal numbers as zeroes and ones. Shannon made codes for genes and chromosomes and relays and switches. Both men applied their ingenuity to mapping one set of objects onto another: logical operators and electric circuits; algebraic functions and machine instructions. The play of symbols and the idea of mapping, in the sense of finding a rigorous correspondence between two sets, had a prominent place in their mental arsenals. This kind of coding was not meant to obscure but to illuminate: to discover that apples and oranges were after all equivalent, or if not equivalent then fungible. The war brought both men to cryptography in its most riddling forms.
Rather, Turing cared about the data that changed the probability: a probability factor, something like the weight of evidence. He invented a unit he named a “ban.” He found it convenient to use a logarithmic scale, so that bans would be added rather than multiplied. With a base of ten, a ban was the weight of evidence needed to make a fact ten times as likely. For more fine-grained measurement there were “decibans” and “centibans.” Shannon had a notion along similar lines.
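The arithmetic behind the ban is a base-10 logarithm of the factor by which evidence multiplies the odds. The sketch below assumes that reading of “ten times as likely”; the function names and the example factors are illustrative.

```python
import math

def bans(factor: float) -> float:
    """Weight of evidence, in bans, for evidence that multiplies the odds by `factor`."""
    return math.log10(factor)

def decibans(factor: float) -> float:
    """The same weight in tenths of a ban."""
    return 10 * bans(factor)

print(bans(10))                # 1.0
print(round(decibans(2), 2))   # 3.01

# Independent pieces of evidence multiply as factors but add as bans.
print(math.isclose(bans(2 * 5), bans(2) + bans(5)))  # True
```

Adding is easier than multiplying when the tallying is done by hand, which is the convenience the logarithmic scale buys.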
his “analysis of some of the fundamental properties of general systems for the transmission of intelligence.”
“From the point of view of the cryptanalyst,” Shannon noted, “a secrecy system is almost identical with a noisy communication system.”♦ (He completed his report, “A Mathematical Theory of Cryptography,” in 1945; it was immediately classified.) The data stream is meant to look stochastic, or random, but of course it is not: if it were truly random the signal would be lost. The cipher must transform a patterned thing, ordinary language, into something apparently without pattern. But pattern is surprisingly persistent. To analyze and categorize the transformations of ciphering, Shannon had to understand the patterns of language in a way that scholars—linguists, for example—had never done before. Linguists had, however, begun to focus their discipline on structure in language—system to be found amid the vague billowing shapes and sounds. The linguist Edward Sapir wrote of “symbolic atoms” formed by a language’s underlying phonetic patterns. “The mere sounds of speech,” he wrote in 1921, “are not the essential fact of language, which lies rather in the classification, in the formal patterning.… Language, as a structure, is on its inner face the mold of thought.”♦ Mold of thought was exquisite. Shannon, however, needed to view language in terms more tangible and countable.
Pattern, as he saw it, equals redundancy. In ordinary language, redundancy serves as an aid to understanding. In cryptanalysis, that same redundancy is the Achilles’ heel. Where is this redundancy? As a simple example in English, wherever the letter q appears, the u that follows is redundant. (Or almost—it would be entirely redundant were it not for rare borrowed items like qin and Qatar.) After q, a u is expected. There is no surprise. It contributes no information. After the letter t, an h has a certain amount of redundancy, because it is the likeliest letter to appear. Every language has a certain statistical structure, Shannon argued, and with it a certain redundancy.
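The statistical structure in question can be glimpsed by counting which letter follows which. The sample sentence below is invented for illustration and is far too small to say anything about English as a whole; the function name is hypothetical.

```python
from collections import Counter, defaultdict

def following_letters(text: str) -> dict:
    """For each letter, count which letters follow it inside words."""
    table = defaultdict(Counter)
    for word in text.lower().split():
        letters = [c for c in word if c.isalpha()]
        for first, second in zip(letters, letters[1:]):
            table[first][second] += 1
    return table

# Toy sample, chosen only to make the pattern visible.
SAMPLE = "the quick queen quietly questioned the quality of that thin thread"
table = following_letters(SAMPLE)

print(table["q"])                 # Counter({'u': 5})
print(table["t"].most_common(1))  # [('h', 5)]
```

After q the count has only one entry, so the next letter tells the reader nothing new. After t the letter h is merely the favorite, so it is partly redundant rather than wholly so.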
What all secrecy systems had in common was the use of a key: a code word, or phrase, or an entire book, or something even more complex, but in any case a source of characters known to both the sender and receiver—knowledge shared apart from the message itself. In the German Enigma system, the key was internalized in hardware and changed daily; Bletchley Park had to rediscover it anew each time, its experts sussing out the patterns of language freshly transformed. Shannon, meanwhile, removed himself to the most distant, most general, most theoretical vantage point. A secrecy system comprised a finite (though possibly very large) number of possible messages, a finite number of possible cryptograms, and in between, transforming one to the other, a finite number of keys, each with an associated probability.
The message was seen as a choice: one alternative selected from a set. At Old North Church the night of Paul Revere’s ride, the number of possible messages was two.
He offered this provocation in order to make his purpose utterly clear. Shannon needed, if he were to create a theory, to hijack the word information. “ ‘Information’ here,” he wrote, “although related to the everyday meaning of the word, should not be confused with it.” Like Nyquist and Hartley before him, he wished to leave aside “the psychological factors” and focus only on “the physical.” But if information was divorced from semantic content, what was left? A few things could be said, and at first blush they all sounded paradoxical. Information is uncertainty, surprise, difficulty, and entropy: “Information is closely associated with uncertainty.” Uncertainty, in turn, can be measured by counting the number of possible messages. If only one message is possible, there is no uncertainty and thus no information. Some messages may be likelier than others, and information implies surprise. Surprise is a way of talking about probabilities. If the letter following t (in English) is h, not so much information is conveyed, because the probability of h was relatively high. “What is significant is the difficulty in transmitting the message from one point to another.” Perhaps this seemed backward, or tautological, like defining mass in terms of the force needed to move an object. But then, mass can be defined that way. Information is entropy. This was the strangest and most powerful notion of all. Entropy—already a difficult and poorly understood concept—is a measure of disorder in thermodynamics, the science of heat and energy.
The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.♦ “Point” was a carefully chosen word: the origin and destination of a message could be separated in space or in time; information storage, as in a phonograph record, counts as a communication. Meanwhile, the message is not created; it is selected. It is a choice. It might be a card dealt from a deck, or three decimal digits chosen from the thousand possibilities, or a combination of words from a fixed code book. He could hardly overlook meaning altogether, so he dressed it with a scientist’s definition and then showed it the door: Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. Nonetheless, as Weaver took pains to explain, this was not a narrow view of communication. On the contrary, it was all-encompassing: “not only written and oral speech, but also music, the pictorial arts, the theatre, the ballet, and in fact all human behavior.” Nonhuman as well: why should machines not have messages to send? Shannon’s model for communication fit a simple diagram—essentially the same diagram, by no coincidence, as in his secret cryptography paper. A communication system must contain the following elements: The information source is the person or machine generating the message, which may be simply a sequence of characters, as in a telegraph or teletype, or may be expressed mathematically as functions—f(x, y, t)—of time and other variables. In a complex example like color television, the components are three functions in a three-dimensional continuum, Shannon noted. The transmitter “operates on the message in some way”—that is, encodes the message—to produce a suitable signal.
A telephone converts sound pressure into analog electric current. A telegraph encodes characters in dots, dashes, and spaces. More complex messages may be sampled, compressed, quantized, and interleaved. The channel: “merely the medium used to transmit the signal.” The receiver inverts the operation of the transmitter. It decodes the message, or reconstructs it from the signal. The destination “is the person (or thing)” at the other end. In the case of ordinary speech, these elements are the speaker’s brain, the speaker’s vocal cords, the air, the listener’s ear, and the listener’s brain. As prominent as the other elements in Shannon’s diagram—because for an engineer it is inescapable—is a box labeled “Noise Source.” This covers everything that corrupts the signal, predictably or unpredictably: unwanted additions, plain errors, random disturbances, static, “atmospherics,” interference, and distortion. An unruly family under any circumstances, and Shannon had two different types of systems to deal with, continuous and discrete. In a discrete system, message and signal take the form of individual detached symbols, such as characters or digits or dots and dashes.
Shannon sidestepped this problem by treating the signal as a string of discrete symbols. Now, instead of boosting the power, a sender can overcome noise by using extra symbols for error correction—just as an African drummer makes himself understood across long distances, not by banging the drums harder, but by expanding the verbosity of his discourse. Shannon considered the discrete case to be more fundamental in a mathematical sense as well. And he was considering another point: that treating messages as discrete had application not just for traditional communication but for a new and rather esoteric subfield, the theory of computing machines.
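The idea of overcoming noise with extra symbols rather than extra power can be illustrated with a small sketch. The threefold repetition code below is chosen only for simplicity and is not a scheme taken from Shannon's paper; the function names `encode`, `noisy_channel`, and `decode` are hypothetical.

```python
import random

def encode(bits, n=3):
    """Repeat every bit n times (a simple repetition code)."""
    return [b for b in bits for _ in range(n)]

def noisy_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

def decode(bits, n=3):
    """Recover each original bit by majority vote over its n copies."""
    return [1 if sum(bits[i:i + n]) * 2 > n else 0
            for i in range(0, len(bits), n)]

# Compare error counts with and without the extra symbols.
rng = random.Random(0)
message = [rng.randint(0, 1) for _ in range(1000)]
raw = noisy_channel(message, 0.05, rng)
protected = decode(noisy_channel(encode(message), 0.05, rng))
print("errors without coding:", sum(a != b for a, b in zip(message, raw)))
print("errors with repetition:", sum(a != b for a, b in zip(message, protected)))
```

The sender here transmits three times as many symbols, which is the price paid for the redundancy: a single flipped copy within any group of three is outvoted by the other two.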
A stochastic process is neither deterministic (the next event can be calculated with certainty) nor random (the next event is totally free). It is governed by a set of probabilities. Each event has a probability that depends on the state of the system and perhaps also on its previous history. If for event we substitute symbol, then a natural written language like English or Chinese is a stochastic process. So is digitized speech; so is a television signal. Looking more deeply, Shannon examined statistical structure in terms of how much of a message influences the probability of the next symbol. The answer could be none: each symbol has its own probability but does not depend on what came before. This is the first-order case. In the second-order case, the probability of each symbol depends on the symbol immediately before, but not on any others. Then each two-character combination, or digram, has its own probability: th greater than xp, in English. In the third-order case, one looks at trigrams, and so forth. Beyond that, in ordinary text, it makes sense to look at the level of words rather than individual characters, and many types of statistical facts come into play. Immediately after the word yellow, some words have a higher probability than usual and others virtually zero. After the word an, words beginning with consonants become exceedingly rare. If the letter u ends a word, the word is probably you. If two consecutive letters are the same, they are probably ll, ee, ss, or oo. And structure can extend over long distances: in a message containing the word cow, even after many other characters intervene, the word cow is relatively likely to occur again. As is the word horse. A message, as Shannon saw, can behave like a dynamical system whose future course is conditioned by its past history.
These sequences increasingly “look” like English. Less subjectively, it turns out that touch typists can handle them with increasing speed—another indication of the ways people unconsciously internalize a language’s statistical structure.
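A digram-based approximation of the kind described above, in which each symbol depends only on the one immediately before it, can be sketched in a few lines. This is an illustrative reconstruction, not Shannon's own procedure; the tiny sample text and the names `digram_model` and `generate` are assumptions.

```python
import random
from collections import Counter, defaultdict

def digram_model(text):
    """Count, for each character, how often each character follows it."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        follows[prev][nxt] += 1
    return follows

def generate(follows, start, length, rng):
    """Emit characters one at a time, each drawn according to the one before."""
    out = [start]
    for _ in range(length - 1):
        counts = follows.get(out[-1])
        if not counts:  # dead end: this character was never seen with a successor
            break
        symbols = list(counts)
        weights = [counts[s] for s in symbols]
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out)

# A hypothetical sample; a real experiment would use a much larger corpus.
sample = "the theory of the thing is that the then and there are the themes"
model = digram_model(sample)
print(generate(model, "t", 40, random.Random(1)))
```

Replacing the single previous character with the previous two would give the trigram case; the structure of the sketch stays the same.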
Shannon wanted to define the measure of information (represented as H) as the measure of uncertainty: “of how much ‘choice’ is involved in the selection of the event or of how uncertain we are of the outcome.”♦ The probabilities might be the same or different, but generally more choices meant more uncertainty—more information. Choices might be broken down into successive choices, with their own probabilities, and the probabilities had to be additive; for example, the probability of a particular digram should be a weighted sum of the probabilities of the individual symbols. When those probabilities were equal, the amount of information conveyed by each symbol was simply the logarithm of the number of possible symbols—Nyquist and Hartley’s formula:

$$H = n \log s$$

For the more realistic case, Shannon reached an elegant solution to the problem of how to measure information as a function of probabilities—an equation that summed the probabilities with a logarithmic weighting (base 2 was most convenient). It is the average logarithm of the improbability of the message; in effect, a measure of unexpectedness:

$$H = -\sum_i p_i \log_2 p_i$$

where $p_i$ is the probability of each message.
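The entropy formula is short enough to compute directly. The sketch below is a minimal illustration; the function name `entropy` and the example distributions are assumptions, not taken from the source.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: the sum of -p * log2(p) over all outcomes.

    Outcomes with probability zero are skipped, since they contribute nothing.
    """
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))    # a fair coin
print(entropy([0.9, 0.1]))    # a biased coin: less uncertainty per toss
print(entropy([1 / 8] * 8))   # eight equally likely symbols
```

When all outcomes are equally likely, the result reduces to the logarithm of the number of outcomes, which is the equal-probability case described in the passage.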
Quantifying predictability and redundancy in this way is a backward way of measuring information content. If a letter can be guessed from what comes before, it is redundant; to the extent that it is redundant, it provides no new information. If English is 75 percent redundant, then a thousand-letter message in English carries only 25 percent as much information as one thousand letters chosen at random. Paradoxical though it sounded, random messages carry more information. The implication was that natural-language text could be encoded more efficiently for transmission or storage.
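The arithmetic behind the 75 percent example can be sketched as follows. The 26-letter alphabet (ignoring spaces and punctuation) and the function names are assumptions made for illustration; the passage gives only the percentages.

```python
import math

def random_text_bits(n_letters, alphabet_size=26):
    """Information in n letters drawn uniformly at random from the alphabet."""
    return n_letters * math.log2(alphabet_size)

def redundant_text_bits(n_letters, redundancy, alphabet_size=26):
    """Information remaining when a fraction `redundancy` of the text is predictable."""
    return (1 - redundancy) * random_text_bits(n_letters, alphabet_size)

random_bits = random_text_bits(1000)
english_bits = redundant_text_bits(1000, 0.75)
print(f"random letters: {random_bits:.0f} bits")
print(f"75% redundant text: {english_bits:.0f} bits")
print(f"ratio: {english_bits / random_bits:.2f}")
```

The gap between the two figures is the room available for compression: an ideal code would need only the smaller number of bits to carry the same message.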
He labeled this axis “bits storage capacity.”♦ He began listing some items that might be said to “store” information. A digit wheel, of the kind used in a desktop adding machine—ten decimal digits—represents just over 3 bits. At just under $10^3$ bits, he wrote “punched card (all config. allowed).” At $10^4$ he put “page single spaced typing (32 possible symbols).” Near $10^5$ he wrote something offbeat: “genetic constitution of man.” There was no real precedent for this in current scientific thinking. James D. Watson was a twenty-one-year-old student of zoology in Indiana; the discovery of the structure of DNA lay several years in the future. This was the first time anyone suggested the genome was an information store measurable in bits. Shannon’s guess was conservative, by at least four orders of magnitude. He thought a “phono record (128 levels)” held more information: about 300,000 bits. To the 10 million level he assigned a thick professional journal (Proceedings of the Institute of Radio Engineers) and to 1 billion the Encyclopaedia Britannica. He estimated one hour of broadcast television at $10^{11}$ bits and one hour of “technicolor movie” at more than a trillion. Finally, just under his pencil mark for $10^{14}$, 100 trillion bits, he put the largest information stockpile he could think of: the Library of Congress.
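Several of these estimates follow from one rule: a device with a given number of distinguishable states stores the base-2 logarithm of that number in bits. The sketch below checks a few of the entries; the function name is hypothetical, and the page calculation assumes the typing figure is ten to the fourth power bits.

```python
import math

def bits_per_symbol(n_states):
    """Bits stored by one symbol that can take n_states equally likely values."""
    return math.log2(n_states)

print(bits_per_symbol(10))    # one decimal digit wheel: just over 3 bits
print(bits_per_symbol(32))    # one typed character from 32 possible symbols
print(bits_per_symbol(128))   # one sample of a 128-level phonograph record

# Characters on a typed page, if the page holds about 10**4 bits (assumed reading).
print(10**4 / bits_per_symbol(32))
```

At five bits per character, a page of ten thousand bits corresponds to roughly two thousand characters, which is a plausible count for single-spaced typing.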
At the end of his life Gödel wrote, “It was only by Turing’s work that it became completely clear, that my proof is applicable to every formal system containing arithmetic.”
MOST MATHEMATICAL THEORIES take shape slowly; Shannon’s information theory sprang forth like Athena, fully formed. Yet the little book of Shannon and Weaver drew scant public attention when it appeared in 1949.
The strangest review was barely a review at all: five paragraphs in Physics Today, September 1950, signed by Norbert Wiener, Massachusetts Institute of Technology. Wiener began with a faintly patronizing anecdote: Some fifteen years ago, a very bright young student came to the authorities at MIT with an idea for a theory of electric switching dependent on the algebra of logic. The student was Claude E. Shannon. In the present book (Wiener continued), Shannon, along with Warren Weaver, “has summed up his views on communication engineering.” The fundamental idea developed by Shannon, said Wiener, “is that of the amount of information as negative entropy.” He added that he himself—“the author of the present review”—had developed the same idea at about the same time. Wiener declared the book to be work “whose origins were independent of my own work, but which has been bound from the beginning to my investigations by cross influences spreading in both directions.” He mentioned “those of us who have tried to pursue this analogy into the study of Maxwell’s demon” and added that much work remained to be done. Then he suggested that the treatment of language was incomplete without greater emphasis on the human nervous system: “nervous reception and the transmission of language into the brain. I say these things not as a hostile criticism.” Finally, Wiener concluded with a paragraph devoted to another new book: “my own Cybernetics.” Both books, he said, represent opening salvos in a field that promises to grow rapidly. In my book, I have taken the privilege of an author to be more speculative, and to cover a wider range than Drs. Shannon and Weaver have chosen to do.… There is not only room, but a definite need for different books. He saluted his colleagues for their well-worked and independent approach—to cybernetics.
Shannon’s colleague John Pierce wrote later: “Wiener’s head was full of his own work.… Competent people have told me that Wiener, under the misapprehension that he already knew what Shannon had done, never actually found out.”♦
Where Shannon’s fire-control work drilled down to the signal amid the noise, Wiener stayed with the noise: swarming fluctuations in the radar receiver, unpredictable deviations in flight paths. The noise behaved statistically, he understood, like Brownian motion, the “extremely lively and wholly haphazard movement” that van Leeuwenhoek had observed through his microscope in the seventeenth century. Wiener had undertaken a thoroughgoing mathematical treatment of Brownian motion in the 1920s. The very discontinuity appealed to him—not just the particle trajectories but the mathematical functions, too, seemed to misbehave. This was, as he wrote, discrete chaos, a term that would not be well understood for several generations.
Both the antiaircraft gun, with its operator, and the target airplane, with its pilot, were hybrids of machine and human. One had to predict the behavior of the other.
Wiener was as worldly as Shannon was reticent. He was well traveled and polyglot, ambitious and socially aware; he took science personally and passionately. His expression of the second law of thermodynamics, for example, was a cry of the heart: We are swimming upstream against a great torrent of disorganization, which tends to reduce everything to the heat death of equilibrium and sameness.… This heat death in physics has a counterpart in the ethics of Kierkegaard, who pointed out that we live in a chaotic moral universe. In this, our main obligation is to establish arbitrary enclaves of order and system.… Like the Red Queen, we cannot stay where we are without running as fast as we can.
He was concerned for his place in intellectual history, and he aimed high. Cybernetics, he wrote in his memoirs, amounted to “a new interpretation of man, of man’s knowledge of the universe, and of society.”♦ Where Shannon saw himself as a mathematician and an engineer, Wiener considered himself foremost a philosopher, and from his fire-control work he drew philosophical lessons about purpose and behavior. If one defines behavior cleverly—“any change of an entity with respect to its surroundings”♦—then the word can apply to machines as well as animals. Behavior directed toward a goal is purposeful, and the purpose can sometimes be imputed to the machine rather than a human operator: for example, in the case of a target-seeking mechanism. “The term servomechanisms has been coined precisely to designate machines with an intrinsic purposeful behavior.” The key was control, or self-regulation.
To analyze it properly he borrowed an obscure term from electrical engineering: “feed-back,” the return of energy from a circuit’s output back to its input. When feedback is positive, as when the sound from loudspeakers is re-amplified through a microphone, it grows wildly out of control. But when feedback is negative—as in the original mechanical governor of steam engines, first analyzed by James Clerk Maxwell—it can guide a system toward equilibrium; it serves as an agent of stability. Feedback can be mechanical: the faster Maxwell’s governor spins, the wider its arms extend, and the wider its arms extend, the slower it must spin. Or it can be electrical. Either way, the key to the process is information. What governs the antiaircraft gun, for example, is information about the plane’s coordinates and about the previous position of the gun itself. Wiener’s friend Bigelow emphasized this: “that it was not some particular physical thing such as energy or length or voltage, but only information (conveyed by any means).”♦ Negative feedback must be ubiquitous, Wiener felt. He could see it at work in the coordination of eye and hand, guiding the nervous system of a person performing an action as ordinary as picking up a pencil. He focused especially on neurological disorders, maladies that disrupted physical coordination or language. He saw them quite specifically as cases of information feedback gone awry: varieties of ataxia, for example, where sense messages are either interrupted in the spinal cord or misinterpreted in the cerebellum. His analysis was detailed and mathematical, with equations—almost unheard of in neurology. Meanwhile, feedback control systems were creeping into factory assembly lines, because a mechanical system, too, can modify its own behavior. Feedback is the governor, the steersman.
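The contrast between negative and positive feedback can be illustrated numerically. This is a minimal discrete-time sketch, not a model of the steam-engine governor or of Wiener's fire-control equations; the function name `simulate` and the `gain` parameter are hypothetical.

```python
def simulate(setpoint, gain, steps, start=0.0):
    """Feed the measured error back into the system at every step.

    With 0 < gain < 1 the correction opposes the error (negative feedback)
    and the value settles toward the setpoint. With gain < 0 the correction
    reinforces the error (positive feedback) and the value runs away.
    """
    value = start
    history = [value]
    for _ in range(steps):
        error = setpoint - value   # information about the current state
        value += gain * error      # correction driven only by that information
        history.append(value)
    return history

print(simulate(100, 0.5, 5))             # negative feedback: converges toward 100
print(simulate(100, -0.5, 5, start=90))  # positive feedback: drifts away from 100
```

The only quantity passed around the loop is the error, a piece of information about the state of the system, which is the point the passage attributes to Bigelow.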
Dr. Wiener sees no reason why they can’t learn from experience, like monstrous and precocious children racing through grammar school. One such mechanical brain, ripe with stored experience, might run a whole industry, replacing not only mechanics and clerks but many of the executives too.… As men construct better calculating machines, explains Wiener, and as they explore their own brains, the two seem more & more alike. Man, he thinks, is recreating himself, monstrously magnified, in his own image. Much of the success of his book, abstruse and ungainly as it was, lay in Wiener’s always returning his focus to the human, not the machine. He was not as interested in shedding light on the rise of computing—to which, in any case, his connections were peripheral—as in how computing might shed light on humanity. He cared profoundly, it turned out, about understanding mental disorders; about mechanical prostheses; and about the social dislocations that might follow the rise of smart machinery. He worried that it would devalue the human brain as factory machinery had devalued the human hand.
“Information is information, not matter or energy. No materialism which does not admit this can survive at the present day.” Now came a time of excitement. “We are again in one of those prodigious periods of scientific progress—in its own way like the pre-Socratic period,” declared the gnomic, white-bearded neurophysiologist Warren McCulloch to a meeting of British philosophers. He told them that listening to Wiener and von Neumann put him in mind of the debates of the ancients. A new physics of communication had been born, he said, and metaphysics would never be the same: “For the first time in the history of science we know how we know and hence are able to state it clearly.”♦ He offered them heresy: that the knower was a computing machine, the brain composed of relays, perhaps ten billion of them, each receiving signals from other relays and sending them onward. The signals are quantized: they either happen or do not happen. So once again the stuff of the world, he said, turns out to be the atoms of Democritus—“indivisibles—leasts—which go batting about in the void.” It is a world for Heraclitus, always “on the move.” I do not mean merely that every relay is itself being momentarily destroyed and re-created like a flame, but I mean that its business is with information which pours into it over many channels, passes through it, eddies within it and emerges again to the world.
That these ideas were spilling across disciplinary borders was due in large part to McCulloch, a dynamo of eclecticism and cross-fertilization. Soon after the war he began organizing a series of conferences at the Beekman Hotel on Park Avenue in New York City, with money from the Josiah Macy Jr. Foundation, endowed in the nineteenth century by heirs of Nantucket whalers. A host of sciences were coming of age all at once—so-called social sciences, like anthropology and psychology, looking for new mathematical footing; medical offshoots with hybrid names, like neurophysiology; not-quite-sciences like psychoanalysis—and McCulloch invited experts in all these fields, as well as mathematics and electrical engineering. He instituted a Noah’s Ark rule, inviting two of each species so that speakers would always have someone present who could see through their jargon.♦ Among the core group were the already famous anthropologist Margaret Mead and her then-husband Gregory Bateson, the psychologists Lawrence K. Frank and Heinrich Klüver, and that formidable, sometimes rivalrous pair of mathematicians, Wiener and von Neumann.
For Wiener, entropy was a measure of disorder; for Shannon, of uncertainty. Fundamentally, as they were realizing, these were the same.
The more inherent order exists in a sample of English text—order in the form of statistical patterns, known consciously or unconsciously to speakers of the language—the more predictability there is, and in Shannon’s terms, the less information is conveyed by each subsequent letter. When the subject guesses the next letter with confidence, it is redundant, and the arrival of the letter contributes no new information. Information is surprise.
Heinz von Foerster, a young physicist from Vienna and an early acolyte of Wittgenstein, wondered how the degree of redundancy in a language might change as the language evolved, and especially in the transition from oral to written culture. Von Foerster, like Margaret Mead and others, felt uncomfortable with the notion of information without meaning. “I wanted to call the whole of what they called information theory signal theory,” he said later, “because information was not yet there. There were ‘beep beeps’ but that was all, no information. The moment one transforms that set of signals into other signals our brain can make an understanding of, then information is born—it’s not in the beeps.”♦
The discussion changed his mind about the centrality of information. He added an epigrammatic note to his transcript of the eighth conference: “Information can be considered as order wrenched from disorder.”♦
Sometimes, however, a particularly awkward combination of previous memory and a new maze would put the machine in an endless loop. He showed them: “When it arrives at A, it remembers that the old solution said to go to B, and so it goes around the circle, A, B, C, D, A, B, C, D. It has established a vicious circle, or a singing condition.”♦ “A neurosis!” said Ralph Gerard. Shannon added “an antineurotic circuit”: a counter, set to break out of the loop when the machine repeated the same sequence six times. Leonard Savage saw that this was a bit of a cheat.
One critic, Dennis Gabor, a Hungarian electrical engineer who later won the Nobel Prize for inventing holography, complained, “In reality it is the maze which remembers, not the mouse.”♦ This was true up to a point. After all, there was no mouse. The electrical relays could have been placed anywhere, and they held the memory. They became, in effect, a mental model of a maze—a theory of a maze.
They talked not just about understanding brains but “designing” them. A psychiatrist, W. Ross Ashby, announced that he was working on the idea that “a brain consisting of randomly connected impressional synapses would assume the required degree of orderliness as a result of experience”♦—in other words, that the mind is a self-organizing dynamical system.
The digital computer comprises three parts: a “store of information,” corresponding to the human computer’s memory or paper; an “executive unit,” which carries out individual operations; and a “control,” which manages a list of instructions, making sure they are carried out in the right order. These instructions are encoded as numbers. They are sometimes called a “programme,” Turing explains, and constructing such a list may be called “programming.” The idea is an old one, Turing says, and he cites Charles Babbage, whom he identifies as Lucasian Professor of Mathematics at Cambridge from 1828 to 1839—once so famous, now almost forgotten. Turing explains that Babbage “had all the essential ideas” and “planned such a machine, called the Analytical Engine, but it was never completed.” It would have used wheels and cards—nothing to do with electricity. The existence (or nonexistence, but at least near existence) of Babbage’s engine allows Turing to rebut a superstition he senses forming in the zeitgeist of 1950. People seem to feel that the magic of digital computers is essentially electrical; meanwhile, the nervous system is also electrical. But Turing is at pains to think of computation in a universal way, which means in an abstract way. He knows it is not about electricity at all: Since Babbage’s machine was not electrical, and since all digital computers are in a sense equivalent, we see that this use of electricity cannot be of theoretical importance.… The feature of using electricity is thus seen to be only a very superficial similarity.
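The three-part design Turing describes can be sketched as a toy machine: a store holding numbers, a control that steps through the instruction list, and an executive unit that carries out each operation. The opcodes and layout below are invented for illustration and do not correspond to any machine Turing or Babbage specified.

```python
def run(store, max_steps=1000):
    """Execute a program held, as plain numbers, in the same store as its data."""
    pc = 0    # control: position of the next instruction in the store
    acc = 0   # working value used by the executive unit
    for _ in range(max_steps):
        op, arg = store[pc], store[pc + 1]   # fetch one numeric instruction
        pc += 2
        if op == 0:        # HALT: stop and report the working value
            return acc
        elif op == 1:      # LOAD: copy store[arg] into the working value
            acc = store[arg]
        elif op == 2:      # ADD: add store[arg] to the working value
            acc += store[arg]
        elif op == 3:      # STORE: write the working value to store[arg]
            store[arg] = acc
        else:
            raise ValueError(f"unknown opcode {op}")
    raise RuntimeError("step limit reached")

# Instructions occupy cells 0-7; data occupy cells 8-10.
program = [1, 8,     # LOAD  cell 8
           2, 9,     # ADD   cell 9
           3, 10,    # STORE cell 10
           0, 0,     # HALT
           2, 3, 0]  # data: 2, 3, and an empty result cell
print(run(program))  # prints 5
```

Nothing in the sketch depends on electricity: the same list of numbers could be carried out with wheels and cards, which is the abstraction Turing is at pains to stress.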
Turing’s famous computer was a machine made of logic: imaginary tape, arbitrary symbols. It had all the time in the world and unbounded memory, and it could do anything expressible in steps and operations. It could even judge the validity of a proof in the system of Principia Mathematica. “In the case that the formula is neither provable nor disprovable such a machine certainly does not behave in a very satisfactory manner, for it continues to work indefinitely without producing any result at all, but this cannot be regarded as very different from the reaction of the mathematicians.”
Looking for rigor, verifiability, and perhaps even mathematicization, students of the mind veered in radically different directions by the turn of the twentieth century. Sigmund Freud’s path was only one. In the United States, William James constructed a discipline of psychology almost single-handed—professor of the first university courses, author of the first comprehensive textbook—and when he was done, he threw up his hands. His own Principles of Psychology, he wrote, was “a loathsome, distended, tumefied, bloated, dropsical mass, testifying to but two facts: 1st, that there is no such thing as a science of psychology, and 2nd, that WJ is an incapable.”♦
A behaviorist running a rat through a maze would discuss the association between stimulus and response but would refuse to speculate in any way about the mind of the rat; now engineers were building mental models of rats out of a few electrical relays. They were not just prying open the black box; they were making their own. Signals were being transmitted, encoded, stored, and retrieved. Internal models of the external world were created and updated. Psychologists took note. From information theory and cybernetics, they received a set of useful metaphors and even a productive conceptual framework.
“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.” A psychologist could hardly fail to consider the case where the source of the message is the outside world and the receiver is the mind.
An influential counterpart of Broadbent’s in the United States was George Miller, who helped found the Center for Cognitive Studies at Harvard in 1960. He was already famous for a paper published in 1956 under the slightly whimsical title “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information.”♦ Seven seemed to be the number of items that most people could hold in working memory at any one time: seven digits (the typical American telephone number of the time), seven words, or seven objects displayed by an experimental psychologist. The number also kept popping up, Miller claimed, in other sorts of experiments. Laboratory subjects were fed sips of water with different amounts of salt, to see how many different levels of saltiness they could discriminate. They were asked to detect differences between tones of varying pitch or loudness. They were shown random patterns of dots, flashed on a screen, and asked how many (below seven, they almost always knew; above seven, they almost always estimated). In one way and another, the number seven kept recurring as a threshold. “This number assumes a variety of disguises,” he wrote, “being sometimes a little larger and sometimes a little smaller than usual, but never changing so much as to be unrecognizable.”
So the general rule is simple: every time the number of alternatives is increased by a factor of two, one bit of information is added.
people perform acts of what information theorists call “recoding,” grouping information into larger and larger chunks—for example, organizing telegraph dots and dashes into letters, letters into words, and words into phrases. By now Miller’s argument had become something in the nature of a manifesto. Recoding, he declared, “seems to me to be the very lifeblood of the thought processes.” The concepts and measures provided by the theory of information provide a quantitative way of getting at some of these questions. The theory provides us with a yardstick for calibrating our stimulus materials and for measuring the performance of our subjects.… Informational concepts have already proved valuable in the study of discrimination and of language; they promise a great deal in the study of learning and memory; and it has even been proposed that they can be useful in the study of concept formation. A lot of questions that seemed fruitless twenty or thirty years ago may now be worth another look. This was the beginning of the movement called the cognitive revolution in psychology, and it laid the foundation for the discipline called cognitive science, combining psychology, computer science, and philosophy.
As for cybernetics, the word began to fade. The Macy cyberneticians held their last meeting in 1953, at the Nassau Inn in Princeton; Wiener had fallen out with several of the group, who were barely speaking to him. Given the task of summing up, McCulloch sounded wistful. “Our consensus has never been unanimous,” he said. “Even had it been so, I see no reason why God should have agreed with us.”♦
What Marshall McLuhan later called the “medium” was for Shannon the channel, and the channel was subject to rigorous mathematical treatment. The applications were immediate and the results fertile: broadcast channels and wiretap channels, noisy and noiseless channels, Gaussian channels, channels with input constraints and cost constraints, channels with feedback and channels with memory, multiuser channels and multiaccess channels. (When McLuhan announced that the medium was the message, he was being arch. The medium is both opposite to, and entwined with, the message.)
Later, he wrote thousands of words on scientific aspects of juggling♦—with theorems and corollaries—and included from memory a quotation from E. E. Cummings: “Some son-of-a-bitch will invent a machine to measure Spring with.” In the 1950s Shannon was also trying to design a machine that would repair itself.
IT WOULD BE AN EXAGGERATION TO SAY that no one knew what entropy meant. Still, it was one of those words. The rumor at Bell Labs was that Shannon had gotten it from John von Neumann, who advised him he would win every argument because no one would understand it.♦ Untrue, but plausible. The word began by meaning the opposite of itself.
The ability of a thermodynamic system to produce work depends not on the heat itself, but on the contrast between hot and cold. A hot stone plunged into cold water can generate work—for example, by creating steam that drives a turbine—but the total heat in the system (stone plus water) remains constant. Eventually, the stone and the water reach the same temperature. No matter how much energy a closed system contains, when everything is the same temperature, no work can be done.
It is the unavailability of this energy—its uselessness for work—that Clausius wanted to measure. He came up with the word entropy, formed from Greek to mean “transformation content.” His English counterparts immediately saw the point but decided Clausius had it backward in focusing on the negative. James Clerk Maxwell suggested in his Theory of Heat that it would be “more convenient” to make entropy mean the opposite: “the part which can be converted into mechanical work.” Thus: When the pressure and temperature of the system have become uniform the entropy is exhausted. Within a few years, though, Maxwell turned about-face and decided to follow Clausius.♦ He rewrote his book and added an abashed footnote: In former editions of this book the meaning of the term Entropy, as introduced by Clausius, was erroneously stated to be that part of the energy which cannot be converted into work. The book then proceeded to use the term as equivalent to the available energy; thus introducing great confusion into the language of thermodynamics. In this edition I have endeavoured to use the word Entropy according to its original definition by Clausius.
The problem was not just in choosing between positive and negative. It was subtler than that. Maxwell had first considered entropy as a subtype of energy: the energy available for work. On reconsideration, he recognized that thermodynamics needed an entirely different measure. Entropy was not a kind of energy or an amount of energy; it was, as Clausius had said, the unavailability of energy. Abstract though this was, it turned out to be a quantity as measurable as temperature, volume, or pressure. It became a totemic concept. With entropy, the “laws” of thermodynamics could be neatly expressed: First law: The energy of the universe is constant. Second law: The entropy of the universe always increases.
There are many other formulations of these laws, from the mathematical to the whimsical, e.g., “1. You can’t win; 2. You can’t break even either.”♦ But this is the cosmic, fateful one. The universe is running down. It is a degenerative one-way street. The final state of maximum entropy is our destiny. William Thomson, Lord Kelvin, imprinted the second law on the popular imagination by reveling in its bleakness: “Although mechanical energy is indestructible,” he declared in 1862, “there is a universal tendency to its dissipation, which produces gradual augmentation and diffusion of heat, cessation of motion, and exhaustion of potential energy through the material universe. The result of this would be a state of universal rest and death.”♦ Thus entropy dictated the universe’s fate in H. G. Wells’s novel The Time Machine: the life ebbing away, the dying sun, the “abominable desolation that hung over the world.”
Thomson liked the word dissipation for this. Energy is not lost, but it dissipates. Dissipated energy is present but useless. It was Maxwell, though, who began to focus on the confusion itself—the disorder—as entropy’s essential quality. Disorder seemed strangely unphysical. It implied that a piece of the equation must be something like knowledge, or intelligence, or judgment. “The idea of dissipation of energy depends on the extent of our knowledge,” Maxwell said. “Available energy is energy which we can direct into any desired channel. Dissipated energy is energy which we cannot lay hold of and direct at pleasure, such as the energy of the confused agitation of molecules which we call heat.”
What we can do, or know, became part of the definition. It seemed impossible to talk about order and disorder without involving an agent or an observer—without talking about the mind: Confusion, like the correlative term order, is not a property of material things in themselves, but only in relation to the mind which perceives them.
Order is subjective—in the eye of the beholder. Order and confusion are not the sorts of things a mathematician would try to define or measure. Or are they? If disorder corresponded to entropy, maybe it was ready for scientific treatment after all.
“Time flows on, never comes back.” Such processes run in one direction only. Probability is the reason. What is remarkable—physicists took a long time to accept it—is that every irreversible process must be explained the same way. Time itself depends on chance, or “the accidents of life,” as Richard Feynman liked to say: “Well, you see that all there is to it is that the irreversibility is caused by the general accidents of life.”♦ For the box of gas to come unmixed is not physically impossible; it is just improbable in the extreme. So the second law is merely probabilistic. Statistically, everything tends toward maximum entropy. Yet probability is enough: enough for the second law to stand as a pillar of science. As Maxwell put it: Moral. The 2nd law of Thermodynamics has the same degree of truth as the statement that if you throw a tumblerful of water into the sea, you cannot get the same tumblerful of water out again.♦ The improbability of heat passing from a colder to a warmer body (without help from elsewhere) is identical to the improbability of order arranging itself from disorder (without help from elsewhere). Both, fundamentally, are due only to statistics. Counting all the possible ways a system can be arranged, the disorderly ones far outnumber the orderly ones. There are many arrangements, or “states,” in which molecules are all jumbled, and few in which they are neatly sorted. The orderly states have low probability and low entropy. For impressive degrees of orderliness, the probabilities may be very low. Alan Turing once whimsically proposed a number N, defined as “the odds against a piece of chalk leaping across the room and writing a line of Shakespeare on the board.”♦
Eventually physicists began speaking of microstates and macrostates. A macrostate might be: all the gas in the top half of the box. The corresponding microstates would be all the possible arrangements of all particles—positions and velocities. Entropy thus became a physical equivalent of probability: the entropy of a given macrostate is the logarithm of the number of its possible microstates. The second law, then, is the tendency of the universe to flow from less likely (orderly) to more likely (disorderly) macrostates.
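The counting behind this definition can be tried on a toy model. The sketch below assumes a box of N distinguishable particles, each recorded only as being in the top or bottom half of the box (velocities and exact positions are ignored, which is a drastic simplification). A macrostate is then "k particles in the top half," and its microstates number "N choose k." The function names are illustrative and do not come from the text.

```python
from math import comb, log

def microstate_count(n_particles: int, n_top: int) -> int:
    """Number of ways to choose which n_top particles sit in the top half."""
    return comb(n_particles, n_top)

def macrostate_entropy(n_particles: int, n_top: int) -> float:
    """Entropy of the macrostate as the natural log of its microstate count.

    Physical constants are left out (assumption: units where k = 1).
    """
    return log(microstate_count(n_particles, n_top))

if __name__ == "__main__":
    n = 100
    # All gas in the top half, mostly in the top half, evenly mixed.
    for n_top in (100, 90, 50):
        print(n_top, microstate_count(n, n_top), macrostate_entropy(n, n_top))
```

In this model the fully sorted macrostate has exactly one microstate and therefore zero entropy, while the evenly mixed macrostate has the most microstates, which is the sense in which disorderly arrangements outnumber orderly ones.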
The demon sees what we cannot—because we are so gross and slow—namely, that the second law is statistical, not mechanical. At the level of molecules, it is violated all the time, here and there, purely by chance. The demon replaces chance with purpose. It uses information to reduce entropy. Maxwell never imagined how popular his demon would become, nor how long-lived. Henry Adams, who wanted to work some version of entropy into his theory of history, wrote to his brother Brooks in 1903, “Clerk Maxwell’s demon who runs the second law of Thermo-dynamics ought to be made President.”♦ The demon presided over a gateway—at first, a magical gateway—from the world of physics to the world of information.
Implacable as the laws of nature now seemed, the demon defied these laws. It was a burglar, picking the lock one molecule at a time. It had “infinitely subtile senses,” wrote Henri Poincaré, and “could turn back the course of the universe.”♦ Was this not just what humans dreamed of doing? Through their ever better microscopes, scientists of the early twentieth century examined the active, sorting processes of biological membranes. They discovered that living cells act as pumps, filters, and factories. Purposeful processes seemed to operate at tiny scales. Who or what was in control? Life itself seemed an organizing force. “Now we must not introduce demonology into science,” wrote the British biologist James Johnstone in 1914.
Szilárd showed that even this perpetual motion machine would have to fail. What was the catch? Simply put: information is not free. Maxwell, Thomson, and the rest had implicitly talked as though knowledge was there for the taking—knowledge of the velocities and trajectories of molecules coming and going before the demon’s eyes. They did not consider the cost of this information. They could not; for them, in a simpler time, it was as if the information belonged to a parallel universe, an astral plane, not linked to the universe of matter and energy, particles and forces, whose behavior they were learning to calculate. But information is physical. Maxwell’s demon makes the link. The demon performs a conversion between information and energy, one particle at a time. Szilárd—who did not yet use the word information—found that, if he accounted exactly for each measurement and memory, then the conversion could be computed exactly. So he computed it. He calculated that each unit of information brings a corresponding increase in entropy—specifically, by k log 2 units. Every time the demon makes a choice between one particle and another, it costs one bit of information. The payback comes at the end of the cycle, when it has to clear its memory (Szilárd did not specify this last detail in words, but in mathematics). Accounting for this properly is the only way to eliminate the paradox of perpetual motion, to bring the universe back into harmony, to “restore concordance with the Second Law.” Szilárd had thus closed a loop leading to Shannon’s conception of entropy as information.
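For a sense of scale, the k log 2 figure can be evaluated numerically. This is a minimal sketch, assuming k is Boltzmann's constant in SI units and the logarithm is the natural logarithm; the function name is hypothetical.

```python
from math import log

# Boltzmann's constant in joules per kelvin (exact by SI definition since 2019).
BOLTZMANN_K = 1.380649e-23

def entropy_cost(bits: float) -> float:
    """Entropy increase, in J/K, associated with the given number of bits.

    Assumes each bit corresponds to k * ln(2), as in the passage above.
    """
    return bits * BOLTZMANN_K * log(2)

if __name__ == "__main__":
    print(entropy_cost(1))  # roughly 9.57e-24 J/K for one binary choice
```

The number is minuscule on human scales, which helps explain why the cost of the demon's information went unnoticed for so long.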
For his part, Shannon did not read German and did not follow Zeitschrift für Physik. “I think actually Szilárd was thinking of this,” he said much later, “and he talked to von Neumann about it, and von Neumann may have talked to Wiener about it. But none of these people actually talked to me about it.”♦ Shannon reinvented the mathematics of entropy nonetheless. To the physicist, entropy is a measure of uncertainty about the state of a physical system: one state among all the possible states it can be in. These microstates may not be equally likely, so the physicist writes S = −Σ p_i log p_i. To the information theorist, entropy is a measure of uncertainty about a message: one message among all the possible messages that a communications source can produce. The possible messages may not be equally likely, so Shannon wrote H = −Σ p_i log p_i. It is not just a coincidence of formalism: nature providing similar answers to similar problems. It is all one problem. To reduce entropy in a box of gas, to perform useful work, one pays a price in information. Likewise, a particular message reduces the entropy in the ensemble of possible messages—in terms of dynamical systems, a phase space.
That was how Shannon saw it. Wiener’s version was slightly different. It was fitting—for a word that began by meaning the opposite of itself—that these colleagues and rivals placed opposite signs on their formulations of entropy. Where Shannon identified information with entropy, Wiener said it was negative entropy. Wiener was saying that information meant order, but an orderly thing does not necessarily embody much information. Shannon himself pointed out their difference and minimized it, calling it a sort of “mathematical pun.” They get the same numerical answers, he noted: I consider how much information is produced when a choice is made from a set—the larger the set the more information. You consider the larger uncertainty in the case of a larger set to mean less knowledge of the situation and hence less information.♦ Put another way, H is a measure of surprise. Put yet another way, H is the average number of yes-no questions needed to guess the unknown message.
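Shannon's formula and the yes-no-questions reading of H can be checked on small examples. The sketch below assumes base-2 logarithms, so that H comes out in bits, and computes the sum in the equivalent form Σ p log2(1/p); the function name and the sample distributions are illustrative.

```python
from math import log2

def shannon_entropy(probabilities) -> float:
    """Entropy in bits of a discrete distribution: sum of p * log2(1/p).

    This equals -sum(p * log2(p)); zero-probability outcomes contribute nothing.
    """
    probabilities = list(probabilities)
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * log2(1.0 / p) for p in probabilities if p > 0)

if __name__ == "__main__":
    # Eight equally likely messages: three yes-no questions pin one down.
    print(shannon_entropy([1 / 8] * 8))
    # Unequal likelihoods: fewer questions are needed on average.
    print(shannon_entropy([0.5, 0.25, 0.125, 0.125]))
    # A certain message carries no surprise.
    print(shannon_entropy([1.0]))
```

With eight equally likely messages the result is 3 bits, matching three halvings of the set. The skewed four-message source comes to 1.75 bits, since asking about the likeliest message first usually ends the guessing early.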
Shannon had it right—at least, his approach proved fertile for mathematicians and physicists a generation later—but the confusion lingered for some years. Order and disorder still needed some sorting.
In 1943 Erwin Schrödinger, the chain-smoking, bow-tied pioneer of quantum physics, asked to deliver the Statutory Public Lectures at Trinity College, Dublin, decided the time had come to answer one of the greatest of unanswerable questions: What is life? The equation bearing his name was the essential formulation of quantum mechanics. In looking beyond his field, as middle-aged Nobel laureates so often do, Schrödinger traded rigor for speculation and began by apologizing “that some of us should venture to embark on a synthesis of facts and theories, albeit with second-hand and incomplete knowledge of some of them—and at the risk of making fools of ourselves.”♦ Nonetheless, the little book he made from these lectures became influential. Without discovering or even stating anything new, it laid a foundation for a nascent science, as yet unnamed, combining genetics and biochemistry.
Schrödinger began with what he called the enigma of biological stability. In notable contrast to a box of gas, with its vagaries of probability and fluctuation, and in seeming disregard of Schrödinger’s own wave mechanics, where uncertainty is the rule, the structures of a living creature exhibit remarkable permanence. They persist, both in the life of the organism and across generations, through heredity. This struck Schrödinger as requiring explanation. “When is a piece of matter said to be alive?”♦ he asked.
Norbert Wiener pursued this thought in Cybernetics: enzymes, he wrote, may be “metastable” Maxwell’s demons—meaning not quite stable, or precariously stable. “The stable state of an enzyme is to be deconditioned,” he noted, “and the stable state of a living organism is to be dead.”♦ Schrödinger felt that evading the second law for a while, or seeming to, is exactly why a living creature “appears so enigmatic.” The organism’s ability to feign perpetual motion leads so many people to believe in a special, supernatural life force. He mocked this idea—vis viva or entelechy—and he also mocked the popular notion that organisms “feed upon energy.” Energy and matter were just two sides of a coin, and anyway one calorie is as good as another. No, he said: the organism feeds upon negative entropy.
“To put it less paradoxically,” he added paradoxically, “the essential thing in metabolism is that the organism succeeds in freeing itself from all the entropy it cannot help producing while alive.”♦ In other words, the organism sucks orderliness from its surroundings. Herbivores and carnivores dine on a smorgasbord of structure; they feed on organic compounds, matter in a well-ordered state, and return it “in a very much degraded form—not entirely degraded, however, for plants can make use of it.” Plants meanwhile draw not just energy but negative entropy from sunlight. In terms of energy, the accounting can be more or less rigorously performed. In terms of order, calculations are not so simple. The mathematical reckoning of order and chaos remains more ticklish, the relevant definitions being subject to feedback loops of their own.
And we must understand this pattern as a four-dimensional object: the structure of the organism through the whole of its ontogenetic development, every stage from embryo to adult.
In seeking a clue to the gene’s molecular structure, it seemed natural to look to the most organized forms of matter, crystals. Solids in crystalline form have a relative permanence; they can begin with a tiny germ and build up larger and larger structures; and quantum mechanics was beginning to give deep insight into the forces involved in their bonding. But Schrödinger felt something was missing. Crystals are too orderly—built up in “the comparatively dull way of repeating the same structure in three directions again and again.” Elaborate though they seem, crystalline solids contain just a few types of atoms. Life must depend on a higher level of complexity, structure without predictable repetition, he argued. He invented a term: aperiodic crystals. This was his hypothesis: We believe a gene—or perhaps the whole chromosome fiber—to be an aperiodic solid.♦ He could hardly emphasize enough the glory of this difference, between periodic and aperiodic: The difference in structure is of the same kind as that between an ordinary wallpaper in which the same pattern is repeated again and again in regular periodicity and a masterpiece of embroidery, say a Raphael tapestry, which shows no dull repetition, but an elaborate, coherent, meaningful design.♦
SCIENTISTS LOVE THEIR FUNDAMENTAL PARTICLES. If traits are handed down from one generation to the next, these traits must take some primal form or have some carrier. Hence the putative particle of protoplasm. “The biologist must be allowed as much scientific use of the imagination as the physicist,” The Popular Science Monthly explained in 1875. “If the one must have his atoms and molecules, the other must have his physiological units, his plastic molecules, his ‘plasticules.’ ”♦
To banish the fallacious thinking, he proposed a new terminology, beginning with gene: “nothing but a very applicable little word, easily combined with others.”♦ It hardly mattered that neither he nor anyone else knew what a gene actually was; “it may be useful as an expression for the ‘unit-factors,’ ‘elements,’ or ‘allelomorphs.’… As to the nature of the ‘genes’ it is as yet of no value to propose a hypothesis.” Gregor Mendel’s years of research with green and yellow peas showed that such a thing must exist. Colors and other traits vary depending on many factors, such as temperature and soil content, but something is preserved whole; it does not blend or diffuse; it must be quantized.♦ Mendel had discovered the gene, though he did not name it. For him it was more an algebraic convenience than a physical entity. When Schrödinger contemplated the gene, he faced a problem. How could such a “tiny speck of material” contain the entire complex code-script that determines the elaborate development of the organism? To resolve the difficulty Schrödinger summoned an example not from wave mechanics or theoretical physics but from telegraphy: Morse code. He noted that two signs, dot and dash, could be combined in well-ordered groups to generate all human language. Genes, too, he suggested, must employ a code: “The miniature code should precisely correspond with a highly complicated and specified plan of development and should somehow contain the means to put it into action.”
Quastler, an early radiologist from Vienna, then at the University of Illinois, was applying information theory to both biology and psychology; he estimated that an amino acid has the information content of a written word and a protein molecule the information content of a paragraph. His colleague Sidney Dancoff suggested to him in 1950 that a chromosomal thread is “a linear coded tape of information”♦: The entire thread constitutes a “message.” This message can be broken down into sub-units which may be called “paragraphs,” “words,” etc. The smallest message unit is perhaps some flip-flop which can make a yes-no decision. In 1952 Quastler organized a symposium on information theory in biology, with no purpose but to deploy these new ideas—entropy, noise, messaging, differentiating—in areas from cell structure and enzyme catalysis to large-scale “biosystems.”
The whole set of instructions—situated “somewhere in the chromosomes”—is the genome. This is a “catalogue,” he said, containing, if not all, then at least “a substantial fraction of all information about an adult organism.”
A consensus had emerged that whatever genes were, however they functioned, they would probably be proteins: giant organic molecules made of long chains of amino acids.
What they discovered became an icon: the double helix, heralded on magazine covers, emulated in sculpture. DNA is formed of two long sequences of bases, like ciphers coded in a four-letter alphabet, each sequence complementary to the other, coiled together. Unzipped, each strand may serve as a template for replication. (Was it Schrödinger’s “aperiodic crystal”? In terms of physical structure, X-ray diffraction showed DNA to be entirely regular. The aperiodicity lies at the abstract level of language—the sequence of “letters.”)
The macromolecules of organic life embody information in an intricate structure. A single hemoglobin molecule comprises four chains of polypeptides, two with 141 amino acids and two with 146, in strict linear sequence, bonded and folded together. Atoms of hydrogen, oxygen, carbon, and iron could mingle randomly for the lifetime of the universe and be no more likely to form hemoglobin than the proverbial chimpanzees to type the works of Shakespeare. Their genesis requires energy; they are built up from simpler, less patterned parts, and the law of entropy applies. For earthly life, the energy comes as photons from the sun. The information comes via evolution. The DNA molecule was special: the information it bears is its only function. Having recognized this, microbiologists turned to the problem of deciphering the code. Crick, who had been inspired to leave physics for biology when he read Schrödinger’s What Is Life?, sent Schrödinger a copy of the paper but did not receive a reply.
Gamow, at the other extreme, was bypassing the biochemical details to put forward an idea of shocking simplicity: that any living organism is determined by “a long number written in a four-digital system.”♦ He called this “the number of the beast” (from Revelation). If two beasts have the same number, they are identical twins.
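Gamow's idea can be illustrated by reading a sequence of bases as the digits of a base-4 number. The sketch below is only an illustration: the assignment of digits to the letters A, C, G, and T is an arbitrary assumption, and the function name is hypothetical.

```python
# Arbitrary digit assignment (an assumption; any one-to-one mapping would do).
DIGIT_FOR_BASE = {"A": 0, "C": 1, "G": 2, "T": 3}

def beast_number(sequence: str) -> int:
    """Interpret a string of bases as an integer written in base 4."""
    number = 0
    for base in sequence.upper():
        number = number * 4 + DIGIT_FOR_BASE[base]
    return number

if __name__ == "__main__":
    print(beast_number("ACGT"))
    # Identical sequences always yield identical numbers.
    print(beast_number("GATTACA") == beast_number("gattaca"))
```

One caveat of this particular encoding: leading bases mapped to digit 0 vanish from the number, so sequences would have to be compared at a fixed, known length for equal numbers to imply equal sequences.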
The genetic code performed a function with uncanny similarities to the metamathematical code invented by Gödel for his philosophical purposes. Gödel’s code substitutes plain numbers for mathematical expressions and operations; the genetic code uses triplets of nucleotides to represent amino acids. Douglas Hofstadter was the first to make this connection explicitly, in the 1980s: “between the complex machinery in a living cell that enables a DNA molecule to replicate itself and the clever machinery in a mathematical system that enables a formula to say things about itself.”♦ In both cases he saw a twisty feedback loop. “Nobody had ever in the least suspected that one set of chemicals could code for another set,” Hofstadter wrote. Indeed, the very idea is somewhat baffling: If there is a code, then who invented it? What kinds of messages are written in it? Who writes them? Who reads them?
The Tie Club recognized that the problem was not just information storage but information transfer. DNA serves two different functions. First, it preserves information. It does this by copying itself, from generation to generation, spanning eons—a Library of Alexandria that keeps its data safe by copying itself billions of times. Notwithstanding the beautiful double helix, this information store is essentially one-dimensional: a string of elements arrayed in a line. In human DNA, the nucleotide units number more than a billion, and this detailed gigabit message must be conserved perfectly, or almost perfectly. Second, however, DNA also sends that information outward for use in the making of the organism. The data stored in a one-dimensional strand has to flower forth in three dimensions. This information transfer occurs via messages passing from the nucleic acids to proteins. So DNA not only replicates itself; separately, it dictates the manufacture of something entirely different. These proteins, with their own enormous complexity, serve as the material of a body, the mortar and bricks, and also as the control system, the plumbing and wiring and the chemical signals that control growth. The replication of DNA is a copying of information. The manufacture of proteins is a transfer of information: the sending of a message. Biologists could see this clearly now, because the message was now well defined and abstracted from any particular substrate.
Triplet codons remained central, and a solution seemed tantalizingly close but out of reach. A problem was how nature punctuated the seemingly unbroken DNA and RNA strands. No one could see a biological equivalent for the pauses that separate letters in Morse code, or the spaces that separate words. Perhaps every fourth base was a comma. Or maybe (Crick suggested) commas would be unnecessary if some triplets made “sense” and others made “nonsense.”♦ Then again, maybe a sort of tape reader just needed to start at a certain point and count off the nucleotides three by three. Among the mathematicians drawn to this problem were a group at the new Jet Propulsion Laboratory in Pasadena, California, meant to be working on aerospace research. To them it looked like a classic problem in Shannon coding theory: “the sequence of nucleotides as an infinite message, written without punctuation, from which any finite portion must be decodable into a sequence of amino acids by suitable insertion of commas.”♦ They constructed a dictionary of codes. They considered the problem of misprints. Biochemistry did matter. All the world’s cryptanalysts, lacking petri dishes and laboratory kitchens, would not have been able to guess from among the universe of possible answers. When the genetic code was solved, in the early 1960s, it turned out to be full of redundancy.
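Crick's suggestion, that commas would be unnecessary if only some triplets made "sense," describes what coding theorists call a comma-free code: no sense word may appear straddling the boundary between two adjacent sense words. The sketch below tests that property for a candidate set of triplets. It is a minimal illustration under that definition, with a hypothetical function name and made-up sample sets, and it does not reproduce the JPL group's own constructions.

```python
from itertools import product

def is_comma_free(codewords) -> bool:
    """Return True if no codeword appears across the join of two codewords.

    For every ordered pair (x, y), each window of x + y that overlaps both
    words must be a "nonsense" word, that is, not in the code.
    """
    code = set(codewords)
    if not code:
        return True
    lengths = {len(word) for word in code}
    if len(lengths) != 1:
        raise ValueError("all codewords must have the same length")
    n = lengths.pop()
    for x, y in product(code, repeat=2):
        joined = x + y
        for shift in range(1, n):
            if joined[shift:shift + n] in code:
                return False
    return True

if __name__ == "__main__":
    print(is_comma_free({"ACG", "TTA"}))  # misaligned readings are all nonsense
    print(is_comma_free({"AAA"}))         # AAAAAA reads as AAA at any offset
    print(is_comma_free({"ACG", "CGA"}))  # ACGACG contains CGA out of frame
```

A reader fed a message in such a code can recover the frame without punctuation, because only one alignment ever yields sense words. As the passage notes, the real genetic code turned out not to work this way.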
Crick crystallized its fundamental principles in a statement that he called (and is called to this day) the Central Dogma. It is a hypothesis about the direction of evolution and the origin of life; it is provable in terms of Shannon entropy in the possible chemical alphabets: Once “information” has passed into protein it cannot get out again. In more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence.♦ The genetic message is independent and impenetrable: no information from events outside can change it.
By this time, the technical jargon of biologists included the words alphabet, library, editing, proofreading, transcription, translation, nonsense, synonym, and redundancy. Genetics and DNA had drawn the attention not just of cryptographers but of classical linguists. Certain proteins, capable of flipping from one relatively stable state to another, were found to act as relays, accepting ciphered commands and passing them to their neighbors—switching stations in three-dimensional communications networks. Brenner, looking forward, thought the focus would turn to computer science as well. He envisioned a science—though it did not yet have a name—of chaos and complexity. “I think in the next twenty-five years we are going to have to teach biologists another language still,” he said. “I don’t know what it’s called yet; nobody knows. But what one is aiming at, I think, is the fundamental problem of the theory of elaborate systems.”
As molecular biology perfected its knowledge of the details of DNA and grew more skillful in manipulating these molecular prodigies, it was natural to see them as the answer to the great question of life: how do organisms reproduce themselves? We use DNA, just as we use lungs to breathe and eyes to see. We use it. “This attitude is an error of great profundity,”♦ Dawkins wrote. “It is the truth turned crashingly on its head.” DNA came first—by billions of years—and DNA comes first, he argued, when life is viewed from the proper perspective. From that perspective, genes are the focus, the sine qua non, the star of the show. In his first book—published in 1976, meant for a broad audience, provocatively titled The Selfish Gene—he set off decades of debate by declaring: “We are survival machines—robot vehicles blindly programmed to preserve the selfish molecules known as genes.”♦ He said this was a truth he had known for years. Genes, not organisms, are the true units of natural selection.
Samuel Butler had said a century earlier—and did not claim to be the first—that a hen is only an egg’s way of making another egg. Butler was quite serious, in his way: Every creature must be allowed to “run” its own development in its own way; the egg’s way may seem a very roundabout manner of doing things; but it is its way, and it is one of which man, upon the whole, has no great reason to complain. Why the fowl should be considered more alive than the egg, and why it should be said that the hen lays the egg, and not that the egg lays the hen, these are questions which lie beyond the power of philosophic explanation, but are perhaps most answerable by considering the conceit of man, and his habit, persisted in during many ages, of ignoring all that does not remind him of himself.♦ He added, “But, perhaps, after all, the real reason is, that the egg does not cackle when it has laid the hen.” Some time later, Butler’s template, X is just a Y’s way of making another Y, began reappearing in many forms. “A scholar,” said Daniel Dennett in 1995, “is just a library’s way of making another library.”♦ Dennett, too, was not entirely joking.
A part of Dawkins’s purpose was to explain altruism: behavior in individuals that goes against their own best interests. Nature is full of examples of animals risking their own lives in behalf of their progeny, their cousins, or just fellow members of their genetic club. Furthermore, they share food; they cooperate in building hives and dams; they doggedly protect their eggs. To explain such behavior—to explain any adaptation, for that matter—one asks the forensic detective’s question, cui bono? Who benefits when a bird spots a predator and cries out, warning the flock but also calling attention to itself? It is tempting to think in terms of the good of the group—the family, tribe, or species—but most theorists agree that evolution does not work that way. Natural selection can seldom operate at the level of groups. It turns out, however, that many explanations fall neatly into place if one thinks of the individual as trying to propagate its particular assortment of genes down through the future.
It took some time, but the gene-centered, information-based perspective led to a new kind of detective work in tracing the history of life. Where paleontologists look back through the fossil record for skeletal precursors of wings and tails, molecular biologists and biophysicists look for telltale relics of DNA in hemoglobin, oncogenes, and all the rest of the library of proteins and enzymes. “There is a molecular archeology in the making,”♦ says Werner Loewenstein. The history of life is written in terms of negative entropy. “What actually evolves is information in all its forms or transforms. If there were something like a guidebook for living creatures, I think, the first line would read like a biblical commandment, Make thy information larger.” No one gene makes an organism. Insects and plants and animals are collectives, communal vehicles, cooperative assemblies of a multitude of genes, each playing its part in the organism’s development. It is a complex ensemble in which each gene interacts with thousands of others in a hierarchy of effects extending through both space and time. The body is a colony of genes. Of course, it acts and moves and procreates as a unit, and furthermore, in the case of at least one species, it feels itself, with impressive certainty, to be a unit. The gene-centered perspective has helped biologists appreciate that the genes composing the human genome are only a fraction of the genes carried around in any one person, because humans (like other species) host an entire ecosystem of microbes—bacteria, especially, from our skin to our digestive systems.
He pointed out that genes are about differences, after all. So he began with a simple counterpoint: might there not be a gene for dyslexia? All we would need in order to establish the existence of a gene for reading is to discover a gene for not reading, say a gene which induced a brain lesion causing specific dyslexia. Such a dyslexic person might be normal and intelligent in all respects except that he could not read. No geneticist would be particularly surprised if this type of dyslexia turned out to breed true in some Mendelian fashion. Obviously, in this event the gene would only exhibit its effect in an environment which included normal education. In a prehistoric environment it might have had no detectable effect, or it might have had some different effect and have been known to cave-dwelling geneticists as, say, a gene for inability to read animal footprints.… It follows from the ordinary conventions of genetic terminology that the wild-type gene at the same locus, the gene that the rest of the population has in double dose, would properly be called a gene “for reading.” If you object to that, you must also object to our speaking of a gene for tallness in Mendel’s peas.… In both cases the character of interest is a difference, and in both cases the difference only shows itself in some specified environment. The reason why something so simple as a one gene difference can have such a complex effect … is basically as follows. However complex a given state of the world may be, the difference between that state of the world and some alternative state of the world may be caused by something extremely simple.
The quavers and crotchets inked on paper are not the music. Music is not a series of pressure waves sounding through the air; nor grooves etched in vinyl or pits burned in CDs; nor even the neuronal symphonies stirred up in the brain of the listener. The music is the information. Likewise, the base pairs of DNA are not genes. They encode genes. Genes themselves are made of bits.
“NOW THROUGH THE VERY UNIVERSALITY of its structures, starting with the code, the biosphere looks like the product of a unique event,”♦ Jacques Monod wrote in 1970. “The universe was not pregnant with life, nor the biosphere with man. Our number came up in the Monte Carlo game. Is it any wonder if, like a person who has just made a million at the casino, we feel a little strange and a little unreal?” Monod, the Parisian biologist who shared the Nobel Prize for working out the role of messenger RNA in the transfer of genetic information, was not alone in thinking of the biosphere as more than a notional place: an entity, composed of all the earth’s life-forms, simple and complex, teeming with information, replicating and evolving, coding from one level of abstraction to the next. This view of life was more abstract—more mathematical—than anything Darwin had imagined, but he would have recognized its basic principles. Natural selection directs the whole show. Now biologists, having absorbed the methods and vocabulary of communications science, went further to make their own contributions to the understanding of information itself. Monod proposed an analogy: Just as the biosphere stands above the world of nonliving matter, so an “abstract kingdom” rises above the biosphere. The denizens of this kingdom? Ideas.
Richard Dawkins made his own connection between the evolution of genes and the evolution of ideas. His essential actor was the replicator, and it scarcely mattered whether replicators were made of nucleic acid. His rule is “All life evolves by the differential survival of replicating entities.” Wherever there is life, there must be replicators. Perhaps on other worlds replicators could arise in a silicon-based chemistry—or in no chemistry at all. What would it mean for a replicator to exist without chemistry? “I think that a new kind of replicator has recently emerged on this planet,”♦ he proclaimed at the end of his first book, in 1976. “It is staring us in the face. It is still in its infancy, still drifting clumsily about in its primeval soup, but already it is achieving evolutionary change at a rate that leaves the old gene panting far behind.” That “soup” is human culture; the vector of transmission is language; and the spawning ground is the brain. For this bodiless replicator itself, Dawkins proposed a name. He called it the meme, and it became his most memorable invention, far more influential than his selfish genes or his later proselytizing against religiosity.
Memes emerge in brains and travel outward, establishing beachheads on paper and celluloid and silicon and anywhere else information can go. They are not to be thought of as elementary particles but as organisms. The number three is not a meme; nor is the color blue, nor any simple thought, any more than a single nucleotide can be a gene. Memes are complex units, distinct and memorable—units with staying power. Also, an object is not a meme.
When Dawkins first floated the meme meme, Nicholas Humphrey, an evolutionary psychologist, said immediately that these entities should be considered “living structures, not just metaphorically but technically”: When you plant a fertile meme in my mind you literally parasitize my brain, turning it into a vehicle for the meme’s propagation in just the way that a virus may parasitize the genetic mechanism of a host cell. And this isn’t just a way of talking—the meme for, say, “belief in life after death” is actually realized physically, millions of times over, as a structure in the nervous systems of individual men the world over.♦
Memes could travel wordlessly even before language was born. Plain mimicry is enough to replicate knowledge—how to chip an arrowhead or start a fire. Among animals, chimpanzees and gorillas are known to acquire behaviors by imitation. Some species of songbirds learn their songs, or at least song variants, after hearing them from neighboring birds.
Language serves as culture’s first catalyst. It supersedes mere imitation, spreading knowledge by abstraction and encoding.
Still, most of the elements of culture change and blur too easily to qualify as stable replicators. They are rarely as neatly fixed as a sequence of DNA. Dawkins himself emphasized that he had never imagined founding anything like a new science of memetics. A peer-reviewed Journal of Memetics came to life in 1997—published online, naturally—and then faded away after eight years partly spent in self-conscious debate over status, mission, and terminology. Even compared with genes, memes are hard to mathematize or even to define rigorously. So the gene-meme analogy causes uneasiness and the genetics-memetics analogy even more. Genes at least have a grounding in physical substance. Memes are abstract, intangible, and unmeasurable. Genes replicate with near-perfect fidelity, and evolution depends on that: some variation is essential, but mutations need to be rare. Memes are seldom copied exactly; their boundaries are always fuzzy, and they mutate with a wild flexibility that would be fatal in biology. The term meme could be applied to a suspicious cornucopia of entities, from small to large. For Dennett, the first four notes of Beethoven’s Fifth Symphony were “clearly” a meme, along with Homer’s Odyssey (or at least the idea of the Odyssey), the wheel, anti-Semitism, and writing.♦ “Memes have not yet found their Watson and Crick,” said Dawkins; “they even lack their Mendel.”♦
“Probability, like time, is a concept invented by humans, and humans have to bear the responsibility for the obscurities that attend it.”♦
Ignorance is subjective. It is a quality of the observer. Presumably randomness—if it exists at all—should be a quality of the thing itself.
Researchers have established that human intuition is useless both in predicting randomness and in recognizing it. Humans drift toward pattern willy-nilly. The New York Public Library bought A Million Random Digits and shelved it under Psychology. In 2010 it was still available from Amazon for eighty-one dollars.
Shannon also considered redundancy within a message: the pattern, the regularity, the order that makes a message compressible. The more regularity in a message, the more predictable it is. The more predictable, the more redundant. The more redundant a message is, the less information it contains.
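The link between regularity and information can be made concrete with a small calculation. The sketch below is only an illustration under a strong simplifying assumption: it looks at single-symbol frequencies and ignores the order of symbols, which is just one of the kinds of pattern Shannon considered. The function names and the sample messages are invented for this example.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(message):
    """Shannon entropy of the symbol frequencies in `message`, in bits per symbol."""
    counts = Counter(message)
    total = len(message)
    # Each symbol with probability p contributes p * log2(1/p).
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def redundancy(message, alphabet_size):
    """One minus the ratio of actual entropy to the maximum for this alphabet."""
    max_entropy = math.log2(alphabet_size)
    return 1.0 - entropy_bits_per_symbol(message) / max_entropy


varied = "abcdefgh" * 125      # eight symbols, all equally frequent
skewed = "aaaaaaab" * 125      # two symbols, one far more frequent
constant = "a" * 1000          # one symbol only, fully predictable

for name, msg in [("varied", varied), ("skewed", skewed), ("constant", constant)]:
    print(name, round(entropy_bits_per_symbol(msg), 4))
```

The fully predictable message scores zero bits per symbol, and the skewed one scores well below the one bit per symbol that a two-letter alphabet could carry, which is the sense in which a more redundant message contains less information.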
Chaitin proposed a clear answer: a number is not random if it is computable—if a definable computer program will generate it. Thus computability is a measure of randomness.
Algorithms generate patterns. So we can gauge computability by looking at the size of the algorithm. Given a number—represented as a string of any length—we ask, what is the length of the shortest program that will generate it? Using the language of a Turing machine, that question can have a definite answer, measured in bits. Chaitin’s algorithmic definition of randomness also provides an algorithmic definition of information: the size of the algorithm measures how much information a given string contains.
This is what science always seeks: a simple theory that accounts for a large set of facts and allows for prediction of events still to come. It is the famous Occam’s razor. “We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances,” said Newton, “for nature is pleased with simplicity.”♦ Newton quantified mass and force, but simplicity had to wait.
His Foundations of the Theory of Probability, published in Russian in 1933 and in English in 1950, remains the modern classic. But his interests ranged widely, to physics and linguistics as well as other fast-growing branches of mathematics. Once he made a foray into genetics but drew back after a dangerous run-in with Stalin’s favorite pseudoscientist, Trofim Lysenko. During World War II Kolmogorov applied his efforts to statistical theory in artillery fire and devised a scheme of stochastic distribution of barrage balloons to protect Moscow from Nazi bombers. Apart from his war work, he studied turbulence and random processes. He was a Hero of Socialist Labor and seven times received the Order of Lenin. He first saw Claude Shannon’s Mathematical Theory of Communication rendered into Russian in 1953, purged of its most interesting features by a translator working in Stalin’s heavy shadow. The title became Statistical Theory of Electrical Signal Transmission. The word information, информация, was everywhere replaced with данные, data. The word entropy was placed in quotation marks to warn the reader against inferring a connection with entropy in physics.
“At each given moment there is only a fine layer between the ‘trivial’ and the impossible,”♦ Kolmogorov mused in his diary. “Mathematical discoveries are made in this layer.” In the new, quantitative view of information he saw a way to attack a problem that had eluded probability theory, the problem of randomness. How much information is contained in a given “finite object”? An object could be a number (a series of digits) or a message or a set of data. He described three approaches: the combinatorial, the probabilistic, and the algorithmic. The first and second were Shannon’s, with refinements. They focused on the probability of one object among an ensemble of objects—one particular message, say, chosen from a set of possible messages. How would this work, Kolmogorov wondered, when the object was not just a symbol in an alphabet or a lantern in a church window but something big and complicated—a genetic organism, or a work of art? How would one measure the amount of information in Tolstoy’s War and Peace? “Is it possible to include this novel in a reasonable way in the set of ‘all possible novels’ and further to postulate the existence of a certain probability distribution in this set?”♦ he asked. Or could one measure the amount of genetic information in, say, the cuckoo bird by considering a probability distribution in the set of all possible species? His third approach to measuring information—the algorithmic—avoided the difficulties of starting with ensembles of possible objects. It focused on the object itself.♦♦ Kolmogorov introduced a new word for the thing he was trying to measure: complexity. As he defined this term, the complexity of a number, or message, or set of data is the inverse of simplicity and order and, once again, it corresponds to information. The simpler an object is, the less information it conveys. The more complexity, the more information. 
And, just as Gregory Chaitin did, Kolmogorov put this idea on a solid mathematical footing by calculating complexity in terms of algorithms. The complexity of an object is the size of the smallest computer program needed to generate it. An object that can be produced by a short algorithm has little complexity. On the other hand, an object needing an algorithm every bit as long as the object itself has maximal complexity.
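The true size of the smallest program cannot be computed in general, but a general-purpose compressor gives a rough upper bound that shows the two extremes. The sketch below is an approximation under stated assumptions, not Kolmogorov's or Chaitin's definition: it uses Python's `zlib` as a stand-in for "a short description," and bytes from `os.urandom` as a stand-in for a patternless object. The helper name `compressed_size` is invented here.

```python
import os
import zlib


def compressed_size(data):
    """Length in bytes of `data` after zlib compression at the highest level.

    This is only an upper bound on descriptive size: a cleverer program
    might describe the same data far more briefly.
    """
    return len(zlib.compress(data, 9))


zeros = b"0" * 1_000_000          # highly ordered: a short rule generates it
noise = os.urandom(1_000_000)     # operating-system randomness, assumed patternless

print("zeros:", compressed_size(zeros), "bytes")
print("noise:", compressed_size(noise), "bytes")
```

The ordered object shrinks to a tiny fraction of its length, while the random one does not shrink at all. A failure to compress proves nothing by itself, since the compressor may simply have missed a pattern.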
Surely there must be a number about which there is nothing special to say. Wherever it is, there stands a paradox: the number we may describe, interestingly, as “the smallest uninteresting number.” This is none other than Berry’s paradox reborn, the one described by Bertrand Russell in Principia Mathematica. Berry and Russell had devilishly asked, What is the least integer not nameable in fewer than nineteen syllables? Whatever this number is, it can be named in eighteen syllables: the least integer not nameable in fewer than nineteen syllables.
Asking whether a number is interesting is the inverse of asking whether it is random. If the number n can be computed by an algorithm that is relatively short, then n is interesting. If not, it is random.
But if the most concise algorithm for n is “PRINT [n]”—an algorithm incorporating the entire number, with no shorthand—then we may say that there is nothing interesting about n. In Kolmogorov’s terms, this number is random—maximally complex. It will have to be patternless, because any pattern would provide a way to devise a shorthand algorithm. “If there is a small, concise computer program that calculates the number, that means it has some quality or characteristic that enables you to pick it out and to compress it into a smaller algorithmic description,” Chaitin says. “So that’s unusual; that’s an interesting number.”
Instead of “the smallest uninteresting number,” one inevitably encounters a statement in the form of “the smallest number that we can prove cannot be named in fewer than n syllables.” (We are not really talking about syllables any more, of course, but Turing-machine states.)♦ It is another recursive, self-looping twist. This was Chaitin’s version of Gödel’s incompleteness. Complexity, defined in terms of program size, is generally uncomputable. Given an arbitrary string of a million digits, a mathematician knows that it is almost certainly random, complex, and patternless—but cannot be absolutely sure.
Most numbers are random. Yet very few of them can be proved random. A chaotic stream of information may yet hide a simple algorithm. Working backward from the chaos to the algorithm may be impossible. Kolmogorov-Chaitin (KC) complexity is to mathematics what entropy is to thermodynamics: the antidote to perfection. Just as we can have no perpetual-motion machines, there can be no complete formal axiomatic systems. Some mathematical facts are true for no reason. They are accidental, lacking a cause or deeper meaning. Joseph Ford, a physicist studying the behavior of unpredictable dynamical systems in the 1980s, said that Chaitin had “charmingly captured the essence of the matter”♦ by showing the path from Gödel’s incompleteness to chaos. This was the “deeper meaning of chaos,” Ford declared: Chaotic orbits exist but they are Gödel’s children, so complex, so overladen with information that humans can never comprehend them. But chaos is ubiquitous in nature; therefore the universe is filled with countless mysteries that man can never understand.
The first of these, now called Shannon-Fano coding, came from his colleague Robert M. Fano. It began with the simple idea of assigning short codes to frequent symbols, as in Morse code. They knew their method was not optimal, however: it could not be relied on to produce the shortest possible messages. Within three years it was surpassed by work of a graduate student of Fano’s at MIT, David Huffman. In the decades since, versions of the Huffman coding algorithm have squeezed many, many bytes. Ray Solomonoff, a child of Russian immigrants who studied at the University of Chicago, encountered Shannon’s work in the early 1950s and began thinking about what he called the Information Packing Problem: how much information could one “pack” into a given number of bits, or conversely, given some information, how could one pack it into the fewest possible bits.♦ He had majored in physics, studied mathematical biology and probability and logic on the side, and gotten to know Marvin Minsky and John McCarthy, pioneers in what would soon be called artificial intelligence. Then he read Noam Chomsky’s offbeat and original paper “Three Models for the Description of Language,”♦ applying the new information-theoretic ideas to the formalization of structure in language. All this was bouncing around in Solomonoff’s mind; he was not sure where it led, but he found himself focusing on the problem of induction. How do people create theories to account for their experience of the world? They have to make generalizations, find patterns in data that are always influenced by randomness and noise. Could one enable a machine to do that? In other words, could a computer be made to learn from experience?
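The idea behind Huffman's method, mentioned earlier in this passage, can be sketched in a few lines: repeatedly merge the two least frequent groups of symbols, so that rare symbols end up with long codes and frequent ones with short codes. This is a minimal textbook-style illustration, not a reconstruction of Huffman's own presentation; the function name and the sample word are chosen for this example.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each symbol in `text` to a prefix-free bit string."""
    counts = Counter(text)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # lightest group
        w2, _, right = heapq.heappop(heap)   # second lightest group
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "abracadabra"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print(code)
print(len(encoded), "bits, versus", 3 * len(text), "bits at a fixed 3 bits per symbol")
```

The most frequent letter receives the shortest code, as with Morse code, but here the assignment is built from the measured frequencies of the message itself.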
Solomonoff, Kolmogorov, and Chaitin tackled three different problems and came up with the same answer. Solomonoff was interested in inductive inference: given a sequence of observations, how can one make the best predictions about what will come next? Kolmogorov was looking for a mathematical definition of randomness: what does it mean to say that one sequence is more random than another, when they have the same probability of emerging from a series of coin flips? And Chaitin was trying to find a deep path into Gödel incompleteness by way of Turing and Shannon—as he said later, “putting Shannon’s information theory and Turing’s computability theory into a cocktail shaker and shaking vigorously.”♦ They all arrived at minimal program size. And they all ended up talking about complexity.
Only a wholly random sequence remains incompressible: nothing but one surprise after another. Random sequences are “normal”—a term of art meaning that on average, in the long run, each digit appears exactly as often as the others, one time in ten; and each pair of digits, from 00 to 99, appears one time in a hundred; and each triplet likewise, and so on. No string of any particular length is more likely to appear than any other string of that length. Normality is one of those simple-seeming ideas that, when mathematicians look closely, turn out to be covered with thorns. Even though a truly random sequence must be normal, the reverse is not necessarily the case.
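One way to see that normality does not imply randomness is to look at a sequence that is known to be normal yet is produced by a trivially short rule. The example used below is brought in from outside this passage: Champernowne's number, formed by writing the positive integers one after another (0.123456789101112...), which was proved normal in base ten. The sketch counts digit frequencies in a finite prefix; the function names are invented for this illustration.

```python
from collections import Counter


def champernowne_digits(limit):
    """Digits obtained by writing 1, 2, 3, ..., limit one after another."""
    return "".join(str(n) for n in range(1, limit + 1))


def digit_frequencies(digits):
    """Fraction of the string taken up by each decimal digit."""
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in "0123456789"}


digits = champernowne_digits(99_999)
for digit, freq in digit_frequencies(digits).items():
    print(digit, round(freq, 4))
```

In a finite prefix the frequencies are only roughly equal (zero lags, because integers are written without leading zeros), and they even out as the sequence grows. Yet the whole sequence comes from a program a few lines long, so in the algorithmic sense it is about as far from random as a sequence can be.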
According to this measure, a million zeroes and a million coin tosses lie at opposite ends of the spectrum. The empty string is as simple as can be; the random string is maximally complex. The zeroes convey no information; coin tosses produce the most information possible. Yet these extremes have something in common. They are dull. They have no value. If either one were a message from another galaxy, we would attribute no intelligence to the sender. If they were music, they would be equally worthless.
Chaitin and a colleague, Charles H. Bennett, sometimes discussed these matters at IBM’s research center in Yorktown Heights, New York. Over a period of years, Bennett developed a new measure of value, which he called “logical depth.” Bennett’s idea of depth is connected to complexity but orthogonal to it. It is meant to capture the usefulness of a message, whatever usefulness might mean in any particular domain. “From the earliest days of information theory it has been appreciated that information per se is not a good measure of message value,”♦ he wrote, finally publishing his scheme in 1988. A typical sequence of coin tosses has high information content but little value; an ephemeris, giving the positions of the moon and planets every day for a hundred years, has no more information than the equations of motion and initial conditions from which it was calculated, but saves its owner the effort of recalculating these positions.
the value of a message lies in “what might be called its buried redundancy—parts predictable only with difficulty, things the receiver could in principle have figured out without being told, but only at considerable cost in money, time, or computation.” When we value an object’s complexity, or its information content, we are sensing a lengthy hidden computation.
What is the physical cost of logical work? “Computers,” he wrote provocatively, “may be thought of as engines for transforming free energy into waste heat and mathematical work.”♦ Entropy surfaced again. A tape full of zeroes, or a tape encoding the works of Shakespeare, or a tape rehearsing the digits of π, has “fuel value.” A random tape has none.
“Information Is Physical” was the title of one famous paper, meant to remind the community that computation requires physical objects and obeys the laws of physics. Lest anyone forget, he titled a later essay—his last, it turned out—“Information Is Inevitably Physical.” Whether a bit is a mark on a stone tablet or a hole in a punched card or a particle with spin up or down, he insisted that it could not exist without some embodiment. Landauer tried in 1961 to prove von Neumann’s formula for the cost of information processing and discovered that he could not. On the contrary, it seemed that most logical operations have no entropy cost at all. When a bit flips from zero to one, or vice-versa, the information is preserved. The process is reversible. Entropy is unchanged; no heat needs to be dissipated. Only an irreversible operation, he argued, increases entropy.
This imperfect distinguishability is what gives quantum physics its dreamlike character: the inability to observe systems without disturbing them; the inability to clone quantum objects or broadcast them to many listeners. The qubit has this dreamlike character, too. It is not just either-or. Its 0 and 1 values are represented by quantum states that can be reliably distinguished—for example, horizontal and vertical polarizations—but coexisting with these are the whole continuum of intermediate states, such as diagonal polarizations, that lean toward 0 or 1 with different probabilities. So a physicist says that a qubit is a superposition of states; a combination of probability amplitudes. It is a determinate thing with a cloud of indeterminacy living inside. But the qubit is not a muddle; a superposition is not a hodgepodge but a combining of probabilistic elements according to clear and elegant mathematical rules.
This happens to be the key to cracking the most widespread cryptographic algorithms in use today, particularly RSA encryption.♦ The world’s Internet commerce depends on it. In effect, the very large number is a public key used to encrypt a message; if eavesdroppers can figure out its prime factors (also large), they can decipher the message. But whereas multiplying a pair of large prime numbers is easy, the inverse is exceedingly difficult. The procedure is an informational one-way street.
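The asymmetry can be felt even at toy scale. The sketch below multiplies two primes in a single step and then recovers them by trial division, counting the divisions along the way. It is an illustration only: the two primes are tiny compared with real keys, and trial division is the most naive method available, far cruder than the algorithms a serious attacker would use. The function name is invented for this example.

```python
def smallest_factor(n):
    """Find the smallest factor of n greater than 1 by trial division.

    Returns (factor, number_of_divisions_tried).
    """
    tries = 0
    d = 2
    while d * d <= n:
        tries += 1
        if n % d == 0:
            return d, tries
        d += 1
    return n, tries  # no divisor found: n itself is prime


p, q = 104_729, 1_299_709    # two known primes, minuscule by cryptographic standards
n = p * q                    # the easy direction: one multiplication

factor, tries = smallest_factor(n)   # the hard direction
print(n, "=", factor, "x", n // factor, "after", tries, "trial divisions")
```

Multiplying took one operation; undoing it took on the order of a hundred thousand. With this method the work grows roughly in proportion to the smaller prime, so each extra digit in the primes multiplies the effort about tenfold while barely affecting the cost of multiplication.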
“Quantum computers were basically a revolution,”♦ Dorit Aharonov of Hebrew University told an audience in 2009. “The revolution was launched into the air by Shor’s algorithm. But the reason for the revolution—other than the amazing practical implications—is that they redefine what is an easy and what is a hard problem.”
This is the challenge that remains, and not just for scientists: the establishment of meaning.
Suppose within every book there is another book, and within every letter on every page another volume constantly unfolding; but these volumes take no space on the desk. Suppose knowledge could be reduced to a quintessence, held within a picture, a sign, held within a place which is no place. —Hilary Mantel (2009)♦ “THE UNIVERSE (which others call the Library)…”♦ Thus Jorge Luis Borges began his 1941 story “The Library of Babel,” about the mythical library that contains all books, in all languages, books of apology and prophecy, the gospel and the commentary upon that gospel and the commentary upon the commentary upon the gospel, the minutely detailed history of the future, the interpolations of all books in all other books, the faithful catalogue of the library and the innumerable false catalogues. This library (which others call the universe) enshrines all the information. Yet no knowledge can be discovered there, precisely because all knowledge is there, shelved side by side with all falsehood. In the mirrored galleries, on the countless shelves, can be found everything and nothing. There can be no more perfect case of information glut.
As the free, amateur, collaborative online encyclopedia called Wikipedia began to overtake all the world’s printed encyclopedias in volume and comprehensiveness, the editors realized that too many names had multiple identities. They worked out a disambiguation policy, which led to the creation of disambiguation pages—a hundred thousand and more. For example, a user foraging in Wikipedia’s labyrinthine galleries for “Babel” finds “Babel (disambiguation),” which leads in turn to the Hebrew name for ancient Babylon, to the Tower of Babel, to an Iraqi newspaper, a book by Patti Smith, a Soviet journalist, an Australian language teachers’ journal, a film, a record label, an island in Australia, two different mountains in Canada, and “a neutrally aligned planet in the fictional Star Trek universe.” And more. The paths of disambiguation fork again and again. For example, “Tower of Babel (disambiguation)” lists, besides the story in the Old Testament, songs, games, books, a Brueghel painting, an Escher woodcut, and “the tarot card.” We have made many towers of Babel. Long before Wikipedia, Borges also wrote about the encyclopedia “fallaciously called The Anglo-American Cyclopedia (New York, 1917),” a warren of fiction mingling with fact, another hall of mirrors and misprints, a compendium of pure and impure information that projects its own world. That world is called Tlön. “It is conjectured that this brave new world is the work of a secret society of astronomers, biologists, engineers, metaphysicians, poets, chemists, algebraists, moralists, painters, geometers.…”♦ writes Borges. “This plan is so vast that each writer’s contribution is infinitesimal. At first it was believed that Tlön was a mere chaos, an irresponsible license of the imagination; now it is known that it is a cosmos.” With good reason, the Argentine master has been taken up as a prophet (“our heresiarch uncle,”♦ William Gibson says) by another generation of writers in the age of information.
To dramatize his perfect determinism, Laplace asked us to imagine a being—an “intelligence”—capable of perfect knowledge: It would embrace in the same formula the movements of the greatest bodies of the universe and those of the lightest atom; for it, nothing would be uncertain and the future, as the past, would be present to its eyes.♦ Nothing else Laplace wrote ever became as famous as this thought experiment. It rendered useless not only God’s will but Man’s. To scientists this extreme Newtonianism seemed cause for optimism. To Babbage, all nature suddenly resembled a vast calculating engine, a grand version of his own deterministic machine: “In turning our views from these simple consequences of the juxtaposition of a few wheels, it is impossible not to perceive the parallel reasoning, as applied to the mighty and far more complex phenomena of nature.”
By painting or drawing, an artist—with skill, training, and long labor—reconstructs what the eye might see. By contrast, a daguerreotype is in some sense the thing itself—the information, stored, in an instant. It was unimaginable, but there it was. The possibilities made the mind reel. Once storage began, where would it stop? An American essayist immediately connected photography to Babbage’s atmospheric library of sounds: Babbage said that every word was registered somewhere in the air, so perhaps every image, too, left its permanent mark—somewhere. In fact, there is a great album of Babel. But what too, if the great business of the sun be to act registrar likewise, and to give out impressions of our looks, and pictures of our actions; and so … for all we know to the contrary, other worlds may be peopled and conducted with the images of persons and transactions thrown off from this and from each other; the whole universal nature being nothing more than phonetic and photogenic structures.♦ The universe, which others called a library or an album, then came to resemble a computer. Alan Turing may have noticed this first: observing that the computer, like the universe, is best seen as a collection of states, and the state of the machine at any instant leads to the state at the next instant, and thus all the future of the machine should be predictable from its initial state and its input signals. The universe is computing its own destiny.
Turing noticed that Laplace’s dream of perfection might be possible in a machine but not in the universe, because of a phenomenon which, a generation later, would be discovered by chaos theorists and named the butterfly effect. Turing described it this way in 1950: The system of the “universe as a whole” is such that quite small errors in initial conditions can have an overwhelming effect at a later time. The displacement of a single electron by a billionth of a centimetre at one moment might make the difference between a man being killed by an avalanche a year later, or escaping.♦
“On Wikipedia, there is a giant conspiracy attempting to have articles agree with reality.” This is about right. A conspiracy is all the Wikipedians can hope for, and often it is enough. Lewis Carroll, near the end of the nineteenth century, described in fiction the ultimate map, representing the world on a unitary scale, a mile to a mile: “It has never been spread out, yet. The farmers objected: they said it would cover the whole country, and shut out the sunlight.”
Wikipedia evolves dendritically, sending off new shoots in many directions. (In this it resembles the universe.) So deletionism and inclusionism spawn mergism and incrementalism. They lead to factionalism, and the factions fission into Associations of Deletionist Wikipedians and Inclusionist Wikipedians side by side with the Association of Wikipedians Who Dislike Making Broad Judgments About the Worthiness of a General Category of Article, and Who Are in Favor of the Deletion of Some Particularly Bad Articles, but That Doesn’t Mean They Are Deletionists. Wales worried particularly about Biographies of Living Persons.
He suggested calling it Deletopedia. “It would have much to tell us over time.” On the principle that nothing online ever perishes, Deletionpedia was created shortly thereafter, and it has grown by degrees. The Port Macquarie Presbyterian Church lives on there, though it is not, strictly speaking, part of the encyclopedia. Which some call the universe.
A useful term of art emerged from computer science: namespace, a realm within which all names are distinct and unique. The world has long had namespaces based on geography and other namespaces based on economic niche.
It is no coincidence that the spectacular naming triumphs of cyberspace verge on nonsense: Yahoo!, Google, Twitter. The Internet is not just a churner of namespaces; it is also a namespace of its own. Navigation around the globe’s computer networks relies on the special system of domain names, like COCA-COLA.COM. These names are actually addresses, in the modern sense of that word: “a register, location, or a device where information is stored.” The text encodes numbers; the numbers point to places in cyberspace, branching down networks, subnetworks, and devices. Although they are code, these brief text fragments also carry the great weight of meaning in the most vast of namespaces. They blend together features of trademarks, vanity license plates, postal codes, radio-station call letters, and graffiti. Like the telegraph code names, anyone could register a domain name, for a small fee, beginning in 1993. It was first come, first served. The demand exceeds the supply.
A more familiar metaphor is the cloud. All that information—all that information capacity—looms over us, not quite visible, not quite tangible, but awfully real; amorphous, spectral; hovering nearby, yet not situated in any one place. Heaven must once have felt this way to the faithful. People talk about shifting their lives to the cloud—their informational lives, at least.
It is finally natural—even inevitable—to ask how much information is in the universe. It is the consequence of Charles Babbage and Edgar Allan Poe saying, “No thought can perish.” Seth Lloyd does the math. He is a moon-faced, bespectacled quantum engineer at MIT, a theorist and designer of quantum computers. The universe, by existing, registers information, he says. By evolving in time, it processes information. How much? To figure that out, Lloyd takes into account how fast this “computer” works and how long it has been working. Considering the fundamental limit on speed, 2E/πħ operations per second (“where E is the system’s average energy above the ground state and ħ = 1.0545 × 10⁻³⁴ joule-sec is Planck’s reduced constant”), and on memory space, limited by entropy to S/(k_B ln 2) (“where S is the system’s thermodynamic entropy and k_B = 1.38 × 10⁻²³ joules/K is Boltzmann’s constant”), along with the speed of light and the age of the universe since the Big Bang, Lloyd calculates that the universe can have performed something on the order of 10¹²⁰ “ops” in its entire history.♦ Considering “every degree of freedom of every particle in the universe,” it could now hold something like 10⁹⁰ bits. And counting.
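To get a feel for the two limits, one can plug in numbers. The sketch below assumes the speed limit takes the form 2E/πħ operations per second and the memory limit the form S/(k_B ln 2) bits, using the constants quoted in the passage. The inputs are illustrative choices made for this example, not figures from Lloyd's calculation for the universe: one kilogram of matter converted entirely to energy, and one joule per kelvin of entropy. The value used for the speed of light is the standard one.

```python
import math

HBAR = 1.0545e-34        # Planck's reduced constant, joule-seconds (as quoted)
K_B = 1.38e-23           # Boltzmann's constant, joules per kelvin (as quoted)
SPEED_OF_LIGHT = 2.998e8 # metres per second (standard value, not from the text)


def max_ops_per_second(energy_joules):
    """Speed limit: 2E / (pi * hbar) elementary operations per second."""
    return 2.0 * energy_joules / (math.pi * HBAR)


def max_bits(entropy_joules_per_kelvin):
    """Memory limit: S / (k_B * ln 2) bits."""
    return entropy_joules_per_kelvin / (K_B * math.log(2))


# Illustrative input: the rest energy of one kilogram, E = m * c^2.
energy_1kg = 1.0 * SPEED_OF_LIGHT ** 2

print("ops per second for 1 kg: %.3e" % max_ops_per_second(energy_1kg))
print("bits for 1 J/K of entropy: %.3e" % max_bits(1.0))
```

Even a single kilogram allows on the order of 10⁵⁰ operations per second by this bound, which gives some sense of how totals like 10¹²⁰ arise once the energy and age of the whole universe are taken into account. This sketch does not attempt that full calculation.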
She gave credit to Marshall McLuhan, whose Gutenberg Galaxy had appeared in 1962, for forcing them to refocus their gaze. In the age of scribes, the culture had only primitive reckonings of chronology: muddled timelines counted the generations from Adam, or Noah, or Romulus and Remus. “Attitudes toward historical change,” she wrote, “will be found only occasionally in writings ostensibly devoted to ‘history’ and often have to be read into such writings. They must also be read into sagas and epics, sacred scriptures, funerary inscriptions, glyphs and ciphers, vast stone monuments, documents locked in chests in muniment rooms, and marginal notations on manuscript.”♦ The sense of when we are—the ability to see the past spread out before one; the internalization of mental time charts; the appreciation of anachronism—came with the shift to print. As a duplicating machine, the printing press not only made texts cheaper and more accessible; its real power was to make them stable. “Scribal culture,” Eisenstein wrote, was “constantly enfeebled by erosion, corruption, and loss.”♦ Print was trustworthy, reliable, and permanent.♦ When Tycho Brahe spent his countless hours poring over planetary and star tables, he could count on others checking the same tables, now and in the future. When Kepler computed his own far more accurate catalogue, he was leveraging the tables of logarithms published by Napier. Meanwhile, print shops were not only spreading Martin Luther’s theses but, more important, the Bible itself. The revolution of Protestantism hinged more on Bible reading than on any point of doctrine—print overcoming script; the codex supplanting the scroll; and the vernacular replacing the ancient languages.
“Overloading of circuits” was a fairly new metaphor to express a sensation—too much information—that felt new. It had always felt new. One hungers for books; rereads a cherished few; begs or borrows more; waits at the library door, and perhaps, in the blink of an eye, finds oneself in a state of surfeit: too much to read. In 1621 the Oxford scholar Robert Burton (who amassed one of the world’s largest private libraries, 1,700 books, but never a thesaurus) gave voice to the feeling: I hear new news every day, and those ordinary rumours of war, plagues, fires, inundations, thefts, murders, massacres, meteors, comets, spectrums, prodigies, apparitions, of towns taken, cities besieged in France, Germany, Turkey, Persia, Poland, &c. daily musters and preparations, and such like, which these tempestuous times afford, battles fought, so many men slain, monomachies, shipwrecks, piracies, and sea-fights, peace, leagues, stratagems, and fresh alarms. A vast confusion of vows, wishes, actions, edicts, petitions, lawsuits, pleas, laws, proclamations, complaints, grievances are daily brought to our ears. New books every day, pamphlets, currantoes, stories, whole catalogues of volumes of all sorts, new paradoxes, opinions, schisms, heresies, controversies in philosophy, religion, &c. Now come tidings of weddings, maskings, mummeries, entertainments, jubilees, embassies, tilts and tournaments, trophies, triumphs, revels, sports, plays: then again, as in a new shifted scene, treasons, cheating tricks, robberies, enormous villanies in all kinds, funerals, burials, deaths of Princes, new discoveries, expeditions; now comical then tragical matters. To-day we hear of new Lords and officers created, to-morrow of some great men deposed, and then again of fresh honours conferred; one is let loose, another imprisoned; one purchaseth, another breaketh: he thrives, his neighbour turns bankrupt; now plenty, then again dearth and famine; one runs, another rides, wrangles, laughs, weeps &c. 
Thus I daily hear, and such like.♦
He thought information glut was new then. He was not complaining; just amazed.
Deluge became a common metaphor for people describing information surfeit. There is a sensation of drowning: information as a rising, churning flood. Or it calls to mind bombardment, data impinging in a series of blows, from all sides, too fast. Fear of the cacophony of voices can have a religious motivation, a worry about secular noise overwhelming the truth. T. S. Eliot expressed that in 1934: “Knowledge of speech, but not of silence; / Knowledge of words, and ignorance of the Word. / All our knowledge brings us nearer to our ignorance, / All our ignorance brings us nearer to death, / But nearness to death no nearer to GOD.”
David Foster Wallace had a more ominous name for this modern condition: Total Noise. “The tsunami of available fact, context, and perspective”♦—that, he wrote in 2007, constitutes Total Noise. He talked about the sensation of drowning and also of a loss of autonomy, of personal responsibility for being informed.
Another way to speak of the anxiety is in terms of the gap between information and knowledge. A barrage of data so often fails to tell us what we need to know. Knowledge, in turn, does not guarantee enlightenment or wisdom. (Eliot said that, too: “Where is the wisdom we have lost in knowledge? / Where is the knowledge we have lost in information?”) It is an ancient observation, but one that seemed to bear restating when information became plentiful—particularly in a world where all bits are created equal and information is divorced from meaning. The humanist and philosopher of technology Lewis Mumford, for example, restated it in 1970: “Unfortunately, ‘information retrieving,’ however swift, is no substitute for discovering by direct personal inspection knowledge whose very existence one had possibly never been aware of, and following it at one’s own pace through the further ramification of relevant literature.”♦ He begged for a return to “moral self-discipline.” There is a whiff of nostalgia in this sort of warning, along with an undeniable truth: that in the pursuit of knowledge, slower can be better. Exploring the crowded stacks of musty libraries has its own rewards. Reading—even browsing—an old book can yield sustenance denied by a database search. Patience is a virtue, gluttony a sin. Even in 1970, however, Mumford was not thinking about databases or any of the electronic technologies that loomed. He complained about “the multiplication of microfilms.” He also complained about too many books. Without “self-imposed restraints,” he warned, “the overproduction of books will bring about a state of intellectual enervation and depletion hardly to be distinguished from massive ignorance.” Restraints were not imposed. Titles continue to multiply. Books about information glut join the cornucopia;
One worker in the area was Siegfried Streufert, who reported in a series of papers in the 1960s that the relation between information load and information handling typically looked like an “inverted U”: more information was helpful at first, then not so helpful, and then actually harmful.
Strategies emerge for coping. There are many, but in essence they all boil down to two: filter and search. The harassed consumer of information turns to filters to separate the metal from the dross; filters include blogs and aggregators—the choice raises issues of trust and taste. The need for filters intrudes on any thought experiment about the wonders of abundant information.
Filters would be needed—editors and critics. “They flourish because of the short supply and limited capacity of minds, whatever the transmission media between minds.” When information is cheap, attention becomes expensive. For the same reason, mechanisms of search—engines, in cyberspace—find needles in haystacks. By now we’ve learned that it is not enough for information to exist.
Even Wikipedia is a combination of the two: powerful search, mainly driven by Google, and a vast, collaborative filter, striving to gather the true facts and screen out the false ones. Searching and filtering are all that stand between this world and the Library of Babel.
When Robert Burton held forth on all his “new news every day,” his “new paradoxes, opinions, schisms, heresies, controversies in philosophy, religion, &c,” it was by way of justifying his life’s great project, The Anatomy of Melancholy, a rambling compendium of all previous knowledge. Four centuries earlier, the Dominican monk Vincent of Beauvais tried to set down his own version of everything that was known, creating one of the first medieval encyclopedias, Speculum Maius, “The Great Mirror”—his manuscripts organized into eighty books, 9,885 chapters. His justification: “The multitude of books, the shortness of time and the slipperiness of memory do not allow all things which are written to be equally retained in the mind.”
“The perception of an overabundance of books fueled the production of many more books.”
Once again, as in the first days of the telegraph, we speak of the annihilation of space and time. For McLuhan this was prerequisite to the creation of global consciousness—global knowing. “Today,” he wrote, “we have extended our central nervous systems in a global embrace, abolishing both space and time as far as our planet is concerned. Rapidly, we approach the final phase of the extensions of man—the technological simulation of consciousness, when the creative process of knowing will be collectively and corporately extended to the whole of human society.”♦ Walt Whitman had said it better a century before: What whispers are these O lands, running ahead of you, passing under the seas? Are all nations communing? is there going to be but one heart to the globe?♦
His friend the Jesuit philosopher Pierre Teilhard de Chardin did even more to promote the noosphere, which he called a “new skin” on the earth: Does it not seem as though a great body is in the process of being born—with its limbs, its nervous system, its centers of perception, its memory—the very body of that great something to come which was to fulfill the aspirations that had been aroused in the reflective being by the freshly acquired consciousness of its interdependence with and responsibility for a whole in evolution?♦ That was a mouthful even in French, and less mystical spirits considered it bunkum (“nonsense, tricked out with a variety of tedious metaphysical conceits,”♦ judged Peter Medawar), but many people were testing the same idea, not least among them the writers of science fiction.♦ Internet pioneers a half century later liked it, too. H. G. Wells was known for his science fiction, but it was as a purposeful social critic that he published a little book in 1938, late in his life, with the title World Brain. There was nothing fanciful about what he wanted to promote: an improved educational system throughout the whole “body” of humanity.
For that matter, he said, “It might have the form of a network.” It is not the amount of knowledge that makes a brain. It is not even the distribution of knowledge. It is the interconnectedness. When Wells used the word network—a word he liked very much—it retained its original, physical meaning for him, as it would for anyone in his time. He visualized threads or wires interlacing: “A network of marvellously gnarled and twisted stems bearing little leaves and blossoms”; “an intricate network of wires and cables.”♦ For us that sense is almost lost; a network is an abstract object, and its domain is information.
Epistemologists cared about knowledge, not beeps and signals. No one would have bothered to make a philosophy of dots and dashes or puffs of smoke or electrical impulses. It takes a human—or, let’s say, a “cognitive agent”—to take a signal and turn it into information. “Beauty is in the eye of the beholder, and information is in the head of the receiver,”♦ says Fred Dretske. At any rate that is a common view, in epistemology—that “we invest stimuli with meaning, and apart from such investment, they are informationally barren.”
That is not the world I see. It was once thought that a perfect language should have an exact one-to-one correspondence between words and their meanings. There should be no ambiguity, no vagueness, no confusion. Our earthly Babel is a falling off from the lost speech of Eden: a catastrophe and a punishment. “I imagine,” writes the novelist Dexter Palmer, “that the entries of the dictionary that lies on the desk in God’s study must have one-to-one correspondences between the words and their definitions, so that when God sends directives to his angels, they are completely free from ambiguity. Each sentence that He speaks or writes must be perfect, and therefore a miracle.”♦ We know better now. With or without God, there is no perfect language. Leibniz thought that if natural language could not be perfect, at least the calculus could: a language of symbols rigorously assigned. “All human thoughts might be entirely resolvable into a small number of thoughts considered as primitive.”♦ These could then be combined and dissected mechanically, as it were. “Once this had been done, whoever uses such characters would either never make an error, or, at least, would have the possibility of immediately recognizing his mistakes, by using the simplest of tests.” Gödel ended that dream. On the contrary, the idea of perfection is contrary to the nature of language. Information theory has helped us understand that—or, if you are a pessimist, forced us to understand it. “We are forced to see,” Palmer continues, that words are not themselves ideas, but merely strings of ink marks; we see that sounds are nothing more than waves. 
In a modern age without an Author looking down on us from heaven, language is not a thing of definite certainty, but infinite possibility; without the comforting illusion of meaningful order we have no choice but to stare into the face of meaningless disorder; without the feeling that meaning can be certain, we find ourselves overwhelmed by all the things that words might mean. Infinite possibility is good, not bad. Meaningless disorder is to be challenged, not feared. Language maps a boundless world of objects and sensations and combinations onto a finite space. The world changes, always mixing the static with the ephemeral, and we know that language changes,
Margaret Atwood, a master of a longer form, said she had been “sucked into the Twittersphere like Alice down the rabbit hole.” Is it signaling, like telegraphs? Is it Zen poetry? Is it jokes scribbled on the washroom wall? Is it John Hearts Mary carved on a tree? Let’s just say it’s communication, and communication is something human beings like to do.♦
MIT established a Center for Collective Intelligence, devoted to finding group wisdom and “harnessing” it. It remains difficult to know when and how much to trust the wisdom of crowds—the title of a 2004 book by James Surowiecki, to be distinguished from the madness of crowds as chronicled in 1841 by Charles Mackay, who declared that people “go mad in herds, while they recover their senses slowly, and one by one.”♦ Crowds turn all too quickly into mobs, with their time-honored manifestations: manias, bubbles, lynch mobs, flash mobs, crusades, mass hysteria, herd mentality, goose-stepping, conformity, groupthink—all potentially magnified by network effects and studied under the rubric of information cascades. Collective judgment has appealing possibilities; collective self-deception and collective evil have already left a cataclysmic record. But knowledge in the network is different from group decision making based on copying and parroting. It seems to develop by accretion; it can give full weight to quirks and exceptions; the challenge is to recognize it and gain access to it.
Then came Google. Brin and Page moved their fledgling company from their Stanford dorm rooms into offices in 1998. Their idea was that cyberspace possessed a form of self-knowledge, inherent in the links from one page to another, and that a search engine could exploit this knowledge. As other scientists had done before, they visualized the Internet as a graph, with nodes and links: by early 1998, 150 million nodes joined by almost 2 billion links. They considered each link as an expression of value—a recommendation. And they recognized that all links are not equal. They invented a recursive way of reckoning value: the rank of a page depends on the value of its incoming links; the value of a link depends on the rank of its containing page. Not only did they invent it, they published it. Letting the Internet know how Google worked did not hurt Google’s ability to leverage the Internet’s knowledge.
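The recursive reckoning described here, in which a page's rank depends on its incoming links and a link's value depends on the rank of the page that contains it, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Google's implementation: the function name, the toy graph, the damping factor of 0.85, and the handling of pages with no outgoing links are all choices made for this example and are not taken from the passage.

```python
def page_rank(links, damping=0.85, iterations=100):
    """Estimate ranks by repeated passes over a link graph.

    links maps each page to the list of pages it links to.
    The damping factor is an assumption of this sketch.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page ranked equally.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link's value is its page's rank, split among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages recommend "c", and "c" recommends "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = page_rank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In the toy graph, "c" ranks highest because three pages link to it, and "a" comes second on the strength of a single link, because that link comes from the highly ranked "c". That is the sense in which all links are not equal.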
The science of networks had many origins and evolved along many paths, from pure mathematics to sociology, but it crystallized in the summer of 1998, with the publication of a letter to Nature from Duncan Watts and Steven Strogatz. The letter had three things that combined to make it a sensation: a vivid catchphrase, a nice result, and a surprising assortment of applications. It helped that one of the applications was All the World’s People. The catchphrase was small world.
The network has a structure, and that structure stands upon a paradox. Everything is close, and everything is far, at the same time. This is why cyberspace can feel not just crowded but lonely. You can drop a stone into a well and never hear a splash. No deus ex machina waits in the wings; no man behind the curtain. We have no Maxwell’s demon to help us filter and search. “We want the Demon, you see,” wrote Stanislaw Lem, “to extract from the dance of atoms only information that is genuine, like mathematical theorems, fashion magazines, blueprints, historical chronicles, or a recipe for ion crumpets, or how to clean and iron a suit of asbestos, and poetry too, and scientific advice, and almanacs, and calendars, and secret documents, and everything that ever appeared in any newspaper in the Universe, and telephone books of the future.”♦ As ever, it is the choice that informs us (in the original sense of that word).