A Letter to a Friend About Information
Meaning, names, speech acts, and formal causation.
Dear [Friend],
What gives information meaning?
Many pressing questions of our day concern information. How to obtain it, protect it, fund it, and build machines that process it; and what consequences this will have for society, the economy, and our sense of ourselves. So we should probably slow down now and again to check the foundations of the house: What exactly are we talking about? What is information?
One highly influential way of looking at it says information is basically patterns upon a substrate which are separable from that substrate. Scratches on rocks, waves on water, sequences of charge on magnetic tape, that kind of thing. This is implicitly how Claude Shannon looked at it. His theory gave us tools to analyze how such patterns are copied from one substrate to another more or less “noisily”—thus emphasizing their separability from any particular substrate.
Shannon was quite clear that his definition of information was not equivalent to meaning. Instead, “Shannon information” captures seemingly random, unexpected patterns. Counterintuitively, when patterns are extremely random, they carry more information. The way to understand it is that their unexpectedness makes them capable of carrying more potential meaning. So, for example, the ten letter sequence “tbontbtitq” seems random, while the ten letter word “strawberry” seems ordered. The former seems to mean nothing and the latter something. But the latter carries less information than the former: “Strawberry” probably just means “strawberry”. “Tbontbtitq”, on the other hand, might mean nothing, or it might be a coded message with a heck of a lot more going on in it than “strawberry”. (In fact, it is a compression of “to be or not to be, that is the question.”) This potential to be a complex coded message is what makes a sequence “information rich”.
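The point can be made concrete with a small sketch. Under a crude model of English letter frequencies (the numbers below are approximate unigram statistics, and the model itself is an illustrative assumption, not Shannon’s actual apparatus), “tbontbtitq” turns out to be more surprising, bit for bit, than “strawberry”—largely because of that improbable “q”:

```python
import math

# Approximate relative frequencies of these letters in English text.
# A deliberately crude unigram model; real analyses use richer context models.
FREQ = {
    "e": 0.127, "t": 0.091, "a": 0.082, "o": 0.075, "i": 0.070,
    "n": 0.067, "s": 0.063, "r": 0.060, "w": 0.024, "y": 0.020,
    "b": 0.015, "q": 0.001,
}

def surprisal_bits(text: str) -> float:
    """Total Shannon surprisal of a string: the sum of -log2 p(letter)."""
    return sum(-math.log2(FREQ[ch]) for ch in text)

# The "random" sequence carries more information than the ordinary word:
print(round(surprisal_bits("strawberry"), 1))  # roughly 43 bits
print(round(surprisal_bits("tbontbtitq"), 1))  # roughly 47 bits
```

The exact numbers depend entirely on the assumed frequencies; what matters is the direction of the inequality—the unexpected string has more “freedom of choice” packed into it.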
Here is how his collaborator Warren Weaver put it in a 1949 introduction to Shannon’s ideas:
“The word information, in this theory, is used in a special sense that must not be confused with its ordinary usage. In particular, information must not be confused with meaning. In fact, two messages, one of which is heavily loaded with meaning and the other of which is pure nonsense, can be exactly equivalent, from the present viewpoint, as regards information. It is this, undoubtedly, that Shannon means when he says that ‘the semantic aspects of communication are irrelevant to the engineering aspects.’ But this does not mean that the engineering aspects are necessarily irrelevant to the semantic aspects. To be sure, this word information in communication theory relates not so much to what you say, as to what you could say. That is, [Shannon’s definition of] information is a measure of one’s freedom of choice when one selects a message.”
So Shannon information—the conception of information upon which the computing revolution was built, what our hard drives store, what LLMs process—is not meaning, but rather the potential to carry meaning.
Yet that still doesn’t tell us what meaning is. What does it mean for a bit of Shannon information to “have meaning”? On this, Weaver remarks further: “One has the vague feeling that information and meaning may prove to be something like a pair of canonically conjugate variables in quantum theory, they being subject to some joint restriction that condemns a person to the sacrifice of the one as he insists on having much of the other.” Vague indeed. Shannon seems to have appropriated the word “information”, rather confusingly, to tell us precisely what meaning is not, while shedding no light on what it is.
“Behaviorist” thinkers might wish to interject here (and plenty have) that meaning has something to do with effectiveness. If a message has a predictable effect on a rat in a maze, it must mean something. For example, if you tell the rat “go left” and it actually goes left, then the message has a meaning, namely, “go left”. But I disagree with this line of thought, because carrots and sticks can have much more predictable effects on rats in Skinner boxes without meaning anything at all. In fact, the concept of effectiveness seems every bit as much a mutually exclusive obverse of meaning as Shannon information is. Meaning, in order to be meaning, must maintain the possibility of having no effect at all. Thus, if we tell the rat to go left, then the statement has meaning—but, well, the rat might not do it. But if we poke it enough, it will definitely go left, yet without any meaning having been conveyed.
If meaning must “do” something in order to “mean” anything, what do we need the concept of meaning for at all? Why not just speak of things that do things? It does not help to say that meaning is a special “informational” kind of effectiveness—information that does something—for if this were so, then random passwords would be meaningful. They are not.
I believe this is a better way of looking at it: information has meaning insofar as it contains assertions about the world. “The sky is blue”. “I, Annie, am inviting you, Billy, to my birthday party”. “Pigs can fly”. Such statements may or may not require further context to be interpreted as the sender intends. And, obviously, they may or may not be true. But they have some degree of meaning regardless, because they carry the potential to be interpreted by recipients as asserting some true or false statement about the world. By contrast, “cf65w5Fg” is neither potentially true nor potentially false in any sense except the threadbare one that “This message’s content is: ‘cf65w5Fg’”. It isn’t a secret code containing an assertion or a reference. It simply has no meaning.
Names are not reducible to information.
Yet, not every bit of information can be categorized neatly as meaningful or meaningless. Some strings of Shannon information seem to encode neither nonsense, nor any assertion about the world. This suggests a strange gray area.
Consider a name. A name is obviously encodable as information. And it may seem that names implicitly assert a kind of meaningful social fact: my name is “Matt”, yours is “Peter” or “Judy” or whatever. That’s what our parents named us, or what people call us. But there remains an important difference between names and other meaningful information: while most meaningful information can in principle be translated or redescribed without losing anything, names cannot.
If I tell you “the city has 8 million inhabitants,” you can render that in German, Mandarin, or binary code without altering the meaning. The description points to a state of affairs that exists independently of however we encode it. But the name of the city—New York, let’s say—despite being encodable as information, is neither nonsense, nor interchangeable with any other label, nor reducible to a true or false assertion about the world.
To be sure, the string of letters “New York” carries some information concerning etymology and the history of the city. But there is more to it. The thing to which “New York” refers is not fully defined. It points to a living, impossible-to-fully-describe object—a city—which interpenetrates with countless other living objects, like people. These people use the name. And because the name cannot be losslessly translated, the way an ordinary fact about the world can, it is not an arbitrary label. “Nueva York” is not exactly the same name as “New York.” Everyone who has anything to do with New York must either simply call it New York, or else alter their relationship with it—for example, by calling it “Nueva York” or “the city”.
Names are thus not “separable” from all substrate, the way Shannon information is. They are “entangled” with their users and their referents. They are not merely neutral reference tools for their users, or interchangeable labels for the things to which they refer; rather, they partly constitute the things they name and the people who use them, linking both into a relational ensemble.
Names include what Dworkin called “interpretive concepts”.
Ronald Dworkin thought that certain words play a special role in our moral and legal culture. For example, the word “justice” refers to something real, something objective; yet, it could never be replaced by some other word referring to the same objective thing without altering or destroying that thing. In other words, “justice” is not merely a label for that to which it refers, but a name.
Dworkin didn’t put it this way. He called these special words “interpretive concepts”. But he believed they lay at the core of our political life. As a society, we understand ourselves through them: we seek the meaning of words like justice, beauty, and kindness with our whole being, yet never fully find it. And any time one invokes an interpretive concept, one favors some interpretations over others. It is, in short, a moral act, and a political act—no mere conveyance of information.
This is so when any name is invoked, whether the name of a concept with normative power, or the name of a god, or the name of a person. Using names, then—and invoking “interpretive concepts”—is a “speech act”: an action consummated through speech, like agreeing, rebuking, or lying.
Can non-humans perform speech acts? This is the most important question about AI.
Until the answer is yes, it is not reasonable (and in fact deeply misleading) to say that AI is “thinking”, or “agreeing”, or “lying”, or doing anything besides processing information. Some suppose that speech acts like naming, thinking, acting, agreeing, or lying are complex forms of information processing—that is, functions that become possible when information processing reaches a certain level of intensity, or acquires a certain topology. But our argument hints at the opposite: that speech acts are something totally other than information processing, and that the more Shannon information is being processed, the less meaning exists, the less names are being used, and the fewer speech acts can be said to occur.
Information and efficient versus formal causation.
For a moment, let us leave aside the special case of names and speech acts, and return to the notion of ordinary meaning-laden information.
Meaning-laden information might be characterized as description. For a description to exist, there must be two locations which contain substrate matter in two states. The matter in location A is in a state that somehow reflects, or purports to reflect, some fact or facts about the state of matter in location B. Thus, location A carries certain meaningful information about location B. (Again, for now, we are leaving aside the question of whether the information about location B is true. Even false information is meaningful. What makes it meaningful is that it could be true.)
What do we mean by “two locations”? Possibly that location A is “over here”, and location B is “over there”, with the two locations separated by some expanse of space which is neither A nor B. For example, Earth is one place, Mars is another, and between them lies an expanse which is neither. But this is not the only possibility. It is also conceivable that A might be within location B, yet not identical to it—the way that “New York” is not the same place as “Earth”, even though the former is within the latter.
Many important assumptions about the nature of information and meaning depend on the difference between these two versions of “not-being-in-the-same-place.” Namely: when information could correspond to truth—when it is meaningful—what creates that possibility?
Why, for example, might a pencil sketch accurately depict the New York skyline? If we assume “two separate locations” between the sketchpad and the skyline, then the possible truth (i.e., the meaning) of the sketch must be attributable to what Aristotle would have called an efficient cause. Someone aware of the contours of the New York skyline must have actually sketched it upon the paper. In this case, the description must postdate the described thing.
But there are other possible explanations for the sketch, which reveal that the sketchpad and the skyline do not necessarily constitute two separate locations. Perhaps the sketch was done by an architect imagining a new building added to the skyline, and the building was built only after the architect’s sketch. Another example: suppose we have a swatch of painted fabric which matches the skin tone of the Mona Lisa. It might have been made by someone who matched the shade, in which case it would have to postdate the Mona Lisa. But it might also be a piece of canvas cut from the actual Mona Lisa, in which case it would predate the Mona Lisa, since it existed before the painting was finished. These cases show how true, meaningful information can exist without being a postdated “description”—and without ever traversing any empty expanse from one location to another in the manner of Shannon information.
When changes occur in a whole body, information can arise in the parts of that body without being “conveyed” from any separate location. This corresponds to what Aristotle called “formal” causation. When some Whole undergoes a process of change, truthful descriptions of Part B can arise in Part A before Part B “changes to match its description”. As Leonardo works on his masterpiece, a part of the canvas comes to contain information about the Mona Lisa’s skin tone before the Mona Lisa even exists as a woman with a smile. An architect’s sketchpad contains information about the New York skyline before the skyline contains that information. In these cases, the description and the described things are part of a larger whole, and wherever that is the case, information can reach back from the future.
Best,
Matt

