Predictive Text
Communication, possibility and control through the lens of Information Theory
THE FOREHEAD WAS ROUND AND SMOOTH, AND THERE WAS A CURIOUS BUMP AT THE BACK OF HIS HEAD, COVERED BY CURLING HAI_
With high certainty, we could say that the final missing letter is “r,” which would spell hair—curling hair. With some degree of certainty (though less this time) we could guess the first letter of the next word. It looks like “hair” may have been the terminal word of its sentence, so the probability of the next letter after “r” being some particular letter, say “o” or “a,” is markedly less than the probability of “r” following “i” to spell “hair.” The next sentence could go anywhere. Who knows what its author is capable of—how erratic their style is, what arc the passage will take on, etc. With all of these questions open, but with the helpful assumption that character choice is limited to the English alphabet, the probability that the next letter is any particular letter, say “o” or “a,” is somewhere around one in 26. Familiarity with the English language dictates that “x” is a pretty unlikely candidate, so maybe that figure is closer to one in 25. Supposing we had a few more sentences given, our sense of the author’s style and direction more solid, we could go on adjusting our approximations of the probability of any particular letter appearing next.
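To make the guessing game concrete, here is a minimal sketch of how one might estimate the probability of the next letter from text already seen. The toy corpus and the three-letter context are placeholder assumptions of ours, standing in for the “few more sentences” we wish we had:

```python
from collections import Counter

def next_letter_probabilities(corpus, context):
    """Estimate P(next letter | context) by counting what follows each
    occurrence of `context` in the corpus seen so far."""
    followers = Counter()
    start = corpus.find(context)
    while start != -1:
        nxt = start + len(context)
        if nxt < len(corpus):
            followers[corpus[nxt]] += 1
        start = corpus.find(context, start + 1)
    total = sum(followers.values())
    return {ch: n / total for ch, n in followers.items()} if total else {}

# A toy corpus standing in for the extra sentences
corpus = "curling hair and flowing hair and thinning hair"
print(next_letter_probabilities(corpus, "hai"))  # {'r': 1.0} in this tiny sample
```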
If you’ve ever written or read anything, spoken or been spoken to, this mode of receiving a sentence—that is, as a question of probabilities—probably felt unnatural. And not just letters in language; we could inspect the color value of each consecutive pixel in an image, for instance. In the broadest sense, we could encounter any digital communication from one person to another as a construction of individual symbols appearing probabilistically. This logic, arbitrary as it seems, is the central tenet of information theory, a field of mathematics at the core of all digital communication technologies. It is precisely this logic that governs the translation of communications (text, images, etc.) into the 1s and 0s passing immaterially from machine to machine. And so it is the logic beneath the vast communication webs that link us, of the digital economies that entangle us. Its limits are, perhaps, the limits of social and economic life in the digital realm. The structures of control in communication—and on the internet in particular—relate to this logic, opposing and constraining the uncertainty that it describes.
+++
Information theory (IT) emerged with the publication of Claude Shannon’s 1948 article, A Mathematical Theory of Communication. The article came out of Shannon’s wartime cryptography research at Bell Labs, AT&T’s former research laboratories. The probabilistic approach to communication—IT’s big innovation—led to a somewhat counterintuitive definition of “information.” It’s a technical definition that has to do with coding schemas—the rules by which data is translated into binary—but the key takeaway for our purposes is an inverse relation of information with likelihood. According to this definition, the observation of an unlikely event has high information, while the observation of a likely event has low information. Back to our “curling hair” example: there would be very little information contained in the event that the next letter is “r,” but if we observe that the next letter is actually “q,” our observation would be very “informative,” as it were.
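In symbols, Shannon’s measure says that an event with probability p carries -log2(p) bits of information: the smaller the probability, the larger the information. A minimal sketch, with the probabilities for “r” and “q” chosen purely for illustration:

```python
import math

def self_information(p):
    """Shannon's measure: an event with probability p carries -log2(p) bits of information."""
    return -math.log2(p)

# Illustrative (made-up) probabilities for the letter after "curling hai"
print(self_information(0.95))    # ~0.07 bits: "r" is all but expected, so little is learned
print(self_information(0.0001))  # ~13.3 bits: "q" would be a very "informative" surprise
```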
This idea of information may make more sense from a machine’s perspective, if we can imagine it. The machine doesn’t care much about coherence or meaning per se. Its approach to the world, to the signals it receives, is far too rigid and methodical to accommodate a sense of “meaning” (at least not until AI came along, but that’s a whole other conversation). The machine is interested, first and foremost, in how predictable something is. A likely event is a more predictable event, and the more predictable the event, the less the machine learns anything new about the world from that event, roughly speaking. If something happens that one expects to happen, in a sense one hasn’t gained new information by observing that thing happening.
In the introduction to Communication, Shannon distills his motivating challenge, what he calls the “fundamental problem of communication,” as “that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning.” Frequently, but not necessarily. The inverse relationship of likelihood to information, as Shannon defined it, sets up a paradox, a dichotomy of meaning and information. If a more likely choice of letter, given the preceding sequence of letters, is a choice which tends to make more sense semantically (a real word usually makes more sense than a nonexistent one), Shannon tells us that the more informative letter choice is often one that makes less sense—or, at the very least, the most informative letter choice need not make sense.
As the thought experiment from the point of view of the machine suggested, information theory affirms no particular commitment to language as a medium for coherent thoughts or to communication as a medium for meaning. At first glance, a probabilistic view of communication seems to imagine its building blocks as fundamentally random. An information theoretic study of language might take a text bank (a bunch of books, for instance) as its data set and analyze the frequency of each letter (or the frequency of a letter when preceded by a particular letter, or the frequency of a letter when followed by a particular letter, etc.). A hypothetical speaker would thus choose letters randomly in accordance with the probabilities estimated by those frequencies, as though the speaker’s choice of each letter were determined by a (weighted) dice roll. By a sort of circular logic, information theory, as applied to text and speech, seems to substitute the culmination of the cognitive processes behind language in place of their origin.
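A minimal sketch of that hypothetical speaker, assuming a toy text bank and a single-letter frequency model (both our own illustrative choices, not anything prescribed by Shannon’s paper):

```python
import random
from collections import Counter

def letter_frequencies(text_bank):
    """Estimate each letter's probability from its frequency in a text bank."""
    letters = [ch for ch in text_bank.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def babble(freqs, length=40):
    """'Speak' by rolling a weighted die for each letter, independently of context."""
    letters = list(freqs)
    weights = list(freqs.values())
    return "".join(random.choices(letters, weights=weights, k=length))

# A stand-in text bank; a real study would use a large corpus of books
text_bank = "the forehead was round and smooth and there was a curious bump at the back"
print(babble(letter_frequencies(text_bank)))
```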
But maybe this is unfair to Shannon and his theory. Perfect and complete knowledge of a system, of all causal chains producing observable effects in that system, would make a probabilistic approach unnecessary. Why consider the probabilities of each of several outcomes if we can predict the actual outcome with 100% certainty? To approach human speech/text in terms of probabilities, then, is not so much to make an unfounded assumption about the cognitive processes behind language, but to humbly acknowledge the limits of knowledge: to preface the entire project of information theory in its application to language with an admission that much is unknown about how the human brain works. Speech’s determinants and their paths of cause and effect are a mystery, the theory seems to say, so the best we can do is suppose that sentences, words, and letters are chosen at random.
+++
Information theory borrows a crucial bit of terminology from thermodynamics: entropy. IT’s entropy and thermodynamics’ entropy don’t have much in common besides a vague resemblance in their mathematical definitions. Entropy in the IT context is the average (“expected,” to be mathematically rigorous) information content taken over the entire set of possible symbols. In short, it is a measure of the “unpredictability” or “uncertainty” of a sequence of symbols (letters) constructed out of a given symbol set (alphabet). To illustrate this, suppose hypothetically that in all English text, the letter “a” occurs 99% of the time. In this scenario, the English alphabet would have very little entropy; that is, each letter is highly predictable, since it is almost certainly “a” every time. An entropic, highly unpredictable alphabet, on the other hand, is one whose letters all occur with equal frequency and in no discernible pattern, so that no one letter is more likely to occur in a passage of text than any other.
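The measure behind this is the weighted average of -log2(p) over all letters, in bits per letter. A minimal sketch comparing the hypothetical 99%-“a” alphabet with a perfectly uniform one:

```python
import math

def entropy(probs):
    """Average information content: the expected value of -log2(p), in bits per letter."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical skewed alphabet: "a" 99% of the time, the other 25 letters sharing the rest
skewed = [0.99] + [0.01 / 25] * 25
uniform = [1 / 26] * 26

print(entropy(skewed))   # ~0.13 bits per letter: highly predictable, low entropy
print(entropy(uniform))  # ~4.70 bits per letter: every letter equally likely, maximal entropy
```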
When encoding an alphabet in binary, information theory provides a limit on the efficiency of any coding scheme on that alphabet, where ‘efficiency’ refers to the average (expected) length of binary code needed to encode a letter of the given alphabet. On average, no encoding can use fewer binary digits per letter than the entropy of the alphabet: the more entropic the alphabet, the lengthier and less efficient the encoded message must be. This ‘entropic limit’ sets up an opposition between efficiency in the communication of a message and the unpredictability of that message—the latter as a constraint on the former. This opposition lurks beneath all modern communication technologies, all systems by which data is sent and received. It is a fundamental bound on the efficiency of digital communications in terms of message encoding, and thus suggests a limit on the functioning of digital economies—it is everywhere and all the time.
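One way to see the limit at work is with a Huffman code, a classic scheme that assigns shorter binary codewords to likelier letters. The four-letter alphabet below is our own toy illustration; its probabilities are chosen so that the average code length lands exactly on the entropy, the floor beneath which no scheme can go:

```python
import heapq
import math

def huffman_lengths(probs):
    """Build a Huffman code and return each symbol's codeword length."""
    # Heap items: (probability, tiebreaker, {symbol: codeword length so far})
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every codeword beneath them
        merged = {s: length + 1 for s, length in {**a, **b}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_lengths(probs)
avg_length = sum(probs[s] * lengths[s] for s in probs)          # 1.75 bits per letter
entropy_bound = -sum(p * math.log2(p) for p in probs.values())  # 1.75 bits: the entropic floor
print(lengths, avg_length, entropy_bound)
```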
Digital economies (and all economies, for that matter), in pursuit of efficiency and profitability, live and evolve in tension with uncertainty. All the time, new technologies emerge that, while opening up new connections and modes of (inter)action on the internet, tend to constrain the terms of those connections by preforming and predetermining the messages sent through them. Common experience with the trademark structures of the digital economy and internet communications is instructive here. Algorithms—dictating recommended-for-you media content or search results on the Web, for instance—shape a user’s tastes, interests, and ideas, influencing online behavior in turn. Online media platforms give us a finite set of preformed options for engagement with media. These examples of controls on online behavior are designs of a tech oligarchy, to be sure, but constraints on possibility and unpredictability also persist in free-floating, dispersed forms, which are not necessarily machinations of a Silicon Valley regime. In particular, the same tension between proliferation and restriction of possibility plays out on the less visible level of protocol, described by media and communications scholar Alexander Galloway as the decentralized control structures underlying the internet. Like their etymological predecessors in diplomatic negotiations, protocols are not imposed from above but emerge as norms from horizontal interaction between people. Galloway likens protocol in technical form to the interstate highway system: “many different combinations of roads are available to a person driving from point A to point B. However, en route one is compelled to stop at red lights, stay between the white lines, follow a reasonably direct path, and so on.” Thus, the countless permutations of connection, an unbounded set of possibilities, are reined in under protocol’s control. Connections multiply in number while their substance—the digital communications they transmit—is controlled and, to some extent, predetermined.
The sense that the internet, once billed as a revolutionary democratic alternative to top-down media structures, fell short of that promise points to a preference for the predictable and to the structures of control—imposed from above and materializing from within—of an ostensibly horizontal, distributed, non-hierarchical space. To be clear, it’s beyond the scope of this article to prove that the limit on efficiency, as information theory articulates it, is the ultimate source of this control. It may not literally be the entropic limit which directly produces these real constraints. But we can at least pay closer attention to the ways these constraints appear concomitantly with information theory’s entropic limit on efficiency. We can ask how this limit exists in relation—as synecdoche, as root cause, or somehow else—to the digital economy’s natural antagonism toward unpredictability and preference for certainty.
+++
Shannon chose to approximate language and communication as a game of probability, perhaps out of a humble lack of knowledge of the cognitive processes that give thought a symbolic form, but this approach leaves much to be desired. If science seeks incessantly to master the puzzles of cause and effect, to know all determinants of all phenomena, it must view information theory as a job only partially completed—or rather a job whose point of departure was prematurely advanced. These puzzles have evidently not been solved; by way of constraints on unpredictability, the determinants of communication have not been discovered so much as constructed.
JUSTIN SCHEER B’23 hopes the message got through.