Many writers discussing Artificial Intelligence argue that what a Large Language Model produces are not sentences with truth-values but rather “stochastic parrotings” that can be interpreted as true or false, but only in the way that Daniel Webster interpreted the Old Man of the Mountain as a sculpture by God with a message for humanity. Steffen Koch has argued that names used by LLMs refer in virtue of Kripkean communication-chains, connecting the names in their answers to the referents intended by the people who made the posts in the training data. I argue that although an LLM’s uses of names are not connected to human communication chains, its outputs can nonetheless have meaning and truth-value by virtue of the design-intentions of its programmers. In Millikan’s terms,1 an LLM has a proper function intended by its designers. It is designed to yield true sentences relevant to particular queries.
1) Background: Frege, Searle, Kripke
Philosophical discussions of the reference of names, from Frege to Searle, sought to explain the meaning of proper names by what amounts to an associated description. Frege thought different speakers might have different senses for proper names, but as long as the reference was the same, that was tolerable in natural languages.2 The reference is the same if the speakers’ different senses in fact have the same referent.
Searle’s account of the meaning of names in a common language holds that the meaning of a name is a weighted disjunction of distinguishing characteristics, true definite descriptions.3 A speaker knows the meaning of a name if that speaker knows that disjunction. One speaker may know that Aristotle was the teacher of Alexander. Another may only know that Aristotle was the most famous philosopher to come from Stagira. A third may know that he wrote the Nicomachean Ethics because that information was on the syllabus. They all can refer to Aristotle because the meaning of “Aristotle” is, roughly, a possibly weighted disjunction of such distinguishing characteristics, which allows speakers to determine the referent of the name in order to make claims about its bearer. This disjunction is the shared meaning of “Aristotle” in the common language “English.”
Kripke gave persuasive arguments that no description-theory fits how names actually refer to individuals. A person can refer to Albert Einstein by “Einstein” even though the only thing he “knows” about Einstein is that he invented the atomic bomb. People can refer to Cicero when all they know is that he was a Roman orator, “without any pretension to think that there was only one famous Roman orator.”4 Having overheard a conversation about Botticelli, I can report that “they were talking about Botticelli” and refer to Botticelli, even though I have the impression that Botticelli is a kind of pasta.
Kripke presents a picture of how names refer that relies on chains of connection among uses of names, where a speaker refers to what another speaker referred to, and so on, ending in a demonstration of an individual.5 “A certain passage of communication reaching ultimately to the man himself does reach the speaker,” even though the speaker has no information about the bearer of the name and may not remember when or where or from whom he heard it. The reference of names depends on facts about the chain, not on information the speaker possesses.6
2) Intention, reference via programs, and LLMs
In familiar communications generated by programs—via e-mail, for instance—there is a connection to an intentional inscription and a plausible chain of communication. Suppose my son and I, both Samuels, share an e-mail account. We both get e-mails from Bank of America: “Message for SAMUEL: You’ve scored the chance to shop for your next car.” In each such message, “Samuel” refers determinately to one of us, yet no human being can be said to have either of us in mind. When I filled out the form at the bank, I meant myself by “Samuel,” and the bank’s computer took it from there. The messages are nonetheless meaningful, and the general intentions of a programmer and of Bank of America employees bring it about that the “Samuel”s refer to particular individuals.
That these messages refer, and to different Samuels on different occasions, is not a state of any individual’s mind but a case of what Searle calls collective intentionality.7 Bank of America and its clients are in a shared project embodying a shared intention that these “Samuel” inscriptions refer to particular individuals. These standing, socially distributed intentions secure reference even when no individual consciously entertains them.
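The mechanism here is nothing more than an ordinary mail-merge, and it can be pictured in a few lines of code. The following sketch is purely illustrative: the class, field names, and wording are hypothetical, not Bank of America’s actual system. It shows how a programmer’s standing intention, embodied in a template plus account records, fixes which Samuel a given “SAMUEL” token picks out, without anyone entertaining a referring intention when the message is produced.

# A minimal, hypothetical sketch of a templated bank message. The referent of
# the inserted name is fixed by the account record the template is applied to,
# not by any occurrent human intention.
from dataclasses import dataclass

@dataclass
class Account:
    account_id: str
    first_name: str   # supplied by the customer on the bank's form

def car_offer(account: Account) -> str:
    # The programmer's standing intention: the inserted name refers to
    # whoever holds the account identified by account_id.
    return (f"Message for {account.first_name.upper()}: "
            "You've scored the chance to shop for your next car.")

father = Account(account_id="001", first_name="Samuel")
son = Account(account_id="002", first_name="Samuel")

print(car_offer(father))  # this "SAMUEL" token picks out the father
print(car_offer(son))     # this one picks out the son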
ChatGPT does not connect the referents of names via communication-chains to the referents in the posts it is trained on. It is trained on a vast swath of the inscriptions present on the internet at a given moment. On the basis of that training, with some further steps, it is able to produce amazingly good responses to questions, et cetera. Numerous writers8 have argued that the result is “Babbling Stochastic Parrot”-ing, not meaningful language. ChatGPT’s “answers” to queries and “accounts” of Bell’s theorem can be interpreted as meaningful language, just as haruspication interprets patterns in sheep livers as messages about the future. But, on this view, nothing is actually being communicated. In its use of a name, no referring intention is present, even very indirectly, in the way that general referential intentions make the “Samuel”s above plausibly refer. That remote intention is lacking in ChatGPT.
Koch argues that, on the Kripkean conception of reference, an LLM’s reference by means of names rests on the kind of borrowing from another speaker’s reference that allows a person to refer to Einstein even when his only “knowledge” of Einstein is the misinformation that he invented the atomic bomb. This argument misunderstands how internet postings are used as data in an LLM.
3) Processing postings
ChatGPT processes texts posted on the internet. Koch’s argument takes that processing to connect with postings as sentences with intended referents of names. Koch argues that, by training on such postings, an LLM inherits the referents of names in individual posts, just as humans do in communication-chains.9 According to Koch, “LLMs are built to pick up and henceforth apply words in just the same way as these words are used in the texts it learned from” (BSP, 27).
The posting-acts are the training data, but the training is different from what Koch describes. An LLM does not process postings as sentences at all. It is not “built to pick up and henceforth apply words in just the same way as these words are used in the texts it learned from” (BSP, 27). The posts supply only data about word-sequence probabilities. The posts are data in the way that my philosophy books would supply data for investigating the question, “How many occurrences of ‘and’ on average occur in a philosophy book?” by counting occurrences of “and” in them. That Kant and A.J. Ayer express propositions with meanings is irrelevant to that investigation.
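To make the analogy concrete, here is a minimal sketch of such an investigation, with hypothetical file names. The texts are treated purely as sequences of word-tokens; that Kant and Ayer express propositions plays no role in the computation.

# A minimal sketch: philosophy books treated purely as word-token data.
# File names are hypothetical; meaning plays no role in the computation.
import re

def average_and_count(paths):
    counts = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            words = re.findall(r"[A-Za-z']+", f.read().lower())
        counts.append(words.count("and"))
    return sum(counts) / len(counts) if counts else 0.0

books = ["critique_of_pure_reason.txt", "language_truth_and_logic.txt"]
# average_and_count(books) -> average number of "and"s per book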
I asked ChatGPT about this.10 ChatGPT is quite explicit that it is not “built to pick up and henceforth apply words in just the same way as these words are used in the texts it learned from.”11 Its design architecture does not “secure continuity between what names refer to in the training data and what they refer to when the LLM uses them later on.”
4) How does ChatGPT produce answers to queries?
It can still seem like magic that ever more refined evaluation of probable word-sequences over very large contexts enables ChatGPT to give generally correct and thorough answers. Briefly, a system that adjusts probabilities in the light of data consisting of billions of posts and trillions of relations among word-sequence probabilities has implicitly internalized a semantics, a conception of the world, and patterns of inference. No part of the resulting program is anything like an articulate truth-definition. Rather, “the semantic theory” is an emergent feature of the program as a whole.12
5) What Koch gets right
Koch is right on a broader understanding of “LLMs are built to pick up and henceforth apply words in just the same way as these words are used in the texts it learned from” (BSP, 27). That an LLM’s utterances have meaning and truth-values does depend on the inscriptions on which it is trained being intentionally produced by human speakers. An LLM version of Davidson’s “Swampman” would be the analog of Knapp and Michaels’s accidental beach inscription.13 Just as the accidental duplicate of Davidson would not be uttering truths or meaning things by its utterances, so an accidentally-created duplicate of an LLM, not trained on intentional postings by agents, would not be providing truths. So Koch’s idea that meaningfulness derives from posts is correct. But an LLM’s relation to those posts is entirely different from a human’s relation to utterances in a communication-chain. LLMs cannot participate in communication-chains in using proper names. But they come close by way of their extraordinarily wide-scope calculations of word-sequence probabilities.
There are different ways those posts might connect to an LLM’s abilities. The “natural route”—a poster’s reference, an interpreter’s understanding, and the interpreter’s subsequent reference, linked by Kripke-like connections—is not what actually happens. But another, non-natural route, grounded in the fact that LLMs were designed by their creators to be able to produce the relevant answers they do using human postings, makes LLMs essentially different from swamp-LLMs. Design intentions, when they succeed in producing a system that “picks up and henceforth applies words in just the same way as these words are used in the texts it learned from,” create semantically competent language-producers. Their competence and ability to refer come about by a very different route than the Kripkean communication chains available to humans.14
6) Connecting with Human Linguistic Intentions: Joint collective enterprises
The claim that LLMs do not produce meaningful language cannot be refuted by attributing to ChatGPT contact with communication-chains via its contact with inscriptions.15 ChatGPT is explicit that that is not how it works. So its answers do not use names by connecting with communication-chains.16 It does not process items in a way that connects with the referential intentions expressed in the posts it has processed.
But the links in human communication-chains, while “intentional,” are not plausibly tokened, overt intentions to “use ‘Einstein’ to refer to whatever the person I heard it from was referring to.”
That does not mean messages from ChatGPT lack meaning and truth-value. Sentences in first-order logic need have no singular terms at all. Quine’s treatment of names as general terms is, as Kripke showed, not an account of what names are in English.17 But it is hard to deny that Quine’s language of science, which reduces everything else to its ultimately physical supervenience base, could express truths. ChatGPT’s treatment of names, in a way, does what Quine does. With no connection to other writers’ texts qua intentionally-produced texts, it treats names as predicates, as descriptions, more or less as in a language about which Searle would be correct.18 It takes into account that names have multiple bearers and has a much subtler way of separating different applications. The descriptions by which it conjectures referents can be very detailed.
The difference between some marks merely being interpretable as having a truth-value and actually having a truth-value depends on there being a purpose, in the right way,19 in their origin.20 The role of intention in making an LLM’s inscriptions true or false is indirect. Dennett’s intentional-stance perspective clarifies some distinctions.21 My thermostat’s display says, “The temperature is 70.3 F.” My thermostat is displaying a truth, even though it has no ego or intentions. It is doing what it is supposed to do. My printer notifies me that its cartridge is getting low. Thermostats and printers do not have intentions, but they are designed to state truths. When they work as intended, they do.
Dennett’s “intentional stance” makes it obvious that dogs and squirrels do things “on purpose” without the linguistic ability to articulate a resolve. Darwin and his successors picture “doing things on purpose” as something like “being tall.”22 There is no objective point, measured in meters, at which a growing male becomes a tall man. But a four-foot-nine human male is not tall, and a six-foot-seven male is tall. Darwin began the process of separating what something is supposed to do from what something is designed to do. Organisms and their organs are supposed to do a lot. Organisms that can represent themselves and the future can have articulate intentions. There is no question that Zoe, my Labrador, was intentionally standing by her food dish to remind me that it was suppertime. There is likewise no question that ChatGPT would pass the Turing test,23 if it were not programmed to be honest.
Millikan’s account of “proper function” gives a Darwinian account of how language can be “supposed to refer” without a designer.24 Intention is “emergent” from organisms using words the way they are supposed to be used. “Design” and intention can emerge in organisms through evolution. In LLMs, there are designers with design intentions, and those intentions succeeded. LLMs do use words the way they are supposed to be used.25
Davidson’s notion of triangulation can be taken as the fundamental instance of the connection of intention, meaning, and the world.26 For humans, truth and meaning emerge in a triangle of speaker, interpreter, and a world to which both have demonstrative access. For an LLM, which lacks demonstrative access to the world, the interpreter’s end of the triangle emerges statistically from its training on posts. Yet the LLM nonetheless achieves linguistic competence and a virtual shared world.27
Appendix — Illustrative exchanges: What ChatGPT-5 says about its operation and connection with human intentions.
A) ChatGPT responded to the query, “How are the internet postings that are your data processed to yield answers to queries?” with:
…This gets right into how large language models (LLMs) like me actually work.
When people say models like me are “trained on internet data,” that doesn’t mean I store or look up posts directly. Instead, here’s the process, simplified:
- Collection of Training Data
Publicly available text from the internet (like websites, books, Wikipedia, forums, etc.) is gathered. This raw text isn’t stored as “posts” or “sentences” to be recalled. Instead, it becomes training material.
- Breaking Text Into Tokens
All that text is split into small pieces called tokens (which may be words, subwords, or even characters, depending on the language).
Example: “How are you?” → [“How”, “are”, “you”, “?”].
- Learning Statistical Patterns
During training, the model is given billions of examples like:
“The cat sat on the ___” → and it learns the probability that the next token should be “mat” (or something else). This isn’t memorization but pattern learning. The model adjusts billions of internal parameters so it gets better at predicting the next word given the context.
Over time, it captures grammar, facts, styles, and reasoning patterns.
- Building a Neural Network Representation
After training, the model doesn’t “remember posts” but encodes knowledge in its parameters—a giant web of weights representing statistical relationships between tokens.
Think of it like a compressed map of language: concepts, associations, and common structures.
- Answering Queries
When you ask me something, your words are tokenized. The model uses its learned probability patterns to generate the most likely continuation—word by word, token by token—until it forms a coherent answer. So the answer isn’t pulled from storage, but generated in real-time based on patterns learned during training.
So, to your original phrasing: Internet postings aren’t processed as whole thoughts or stored posts. They’re broken into tokens and used to train the model to learn statistical patterns of how thoughts are expressed in language. At inference time, I generate new text by applying those learned patterns to your input.
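The steps ChatGPT describes in (A) can be caricatured in a few lines. The sketch below is not the actual architecture; it is a toy bigram model with a whitespace “tokenizer” and a two-sentence corpus, meant only to make vivid what learning word-sequence statistics and generating continuations from them amounts to. Real LLMs use subword tokenizers and billions of learned parameters rather than explicit count tables.

# A toy caricature of (A): "tokenize" a corpus, gather next-token statistics,
# and generate a continuation by sampling from those statistics.
import random
from collections import defaultdict, Counter

corpus = "the cat sat on the mat . the dog sat on the rug ."
tokens = corpus.split()                      # crude stand-in for tokenization

next_counts = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):    # "training": count what follows what
    next_counts[prev][nxt] += 1

def generate(start, length=6):
    out = [start]
    for _ in range(length):
        options = next_counts.get(out[-1])
        if not options:
            break
        words, weights = zip(*options.items())
        out.append(random.choices(words, weights=weights)[0])
    return " ".join(out)

print(generate("the"))   # e.g., "the cat sat on the rug ."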
B) To the query, “What connects detailed word-sequence probability information with 1) the ability to grasp a question and 2) the ability to answer it in great detail?” the program responded:
…[that is] the mystery at the heart of large language models (LLMs). Let’s unpack it carefully.
1. The core puzzle: … How can something that “just predicts the next word” understand a question and give a reasoned answer? On the surface, it sounds impossible—as though “word-sequence probabilities” are a kind of mindless pattern-matching.
But the key is that predicting words in context turns out to require the model to implicitly represent much deeper structures: meanings, relations, causes, goals, and even reasoning chains.
2. Why “predicting words” builds models of the world
When the model is trained, it sees billions of word sequences—sentences, arguments, conversations, code, etc. To predict the next word correctly, it can’t rely on superficial word frequencies alone. For example:
In “Einstein discovered ___,” the next word could be relativity or that E = mc², depending on broader context.
In “Mozart composed ___,” the next likely word is symphonies, operas, concertos … depending again on what’s around it.
To minimize prediction errors across vast data, the network must infer:
who Einstein and Mozart are,
what “discover” and “compose” mean,
what kinds of things are their outputs,
and even what counts as a plausible human sentence about them.
Over time, the only way to succeed is to internalize structured representations—not explicitly coded as logic or definitions, but emergent within the billions of neural weights.
That is: the probabilities encode semantic knowledge.
3. How “understanding a question” emerges
When you type a query like:
“What would make ChatGPT a person?”
the model’s input representation activates patterns that, during training, were associated with discussions of:
personhood,
consciousness,
AI capabilities,
philosophical arguments.
Those patterns then guide which continuations are plausible—i.e., how an informed writer or thinker would respond in that discourse context.
So, “understanding” here means that the model’s internal representation of your words corresponds (through statistical learning) to conceptual structures humans use when they understand such questions.
It’s not comprehension by introspection, but it’s functionally similar to comprehension by pattern alignment.
4. How detailed, reasoned answers emerge
Once the model has built rich internal associations, generation is no longer just “word after word.”
Each next-token prediction is conditioned on a latent vector—an immensely detailed representation of meaning and context synthesized from the entire conversation so far.
That vector captures:
Semantic relations (what you asked, what’s relevant),
discourse goals (you expect an explanation, not a joke),
stylistic and logical constraints (use examples, be coherent).
From that state, predicting the next word becomes equivalent to unrolling a plausible, coherent human explanation.
In other words, probability prediction guided by an internal semantic world model behaves like reasoning and exposition.
5. Analogy: probability as compression of meaning
Think of it like this:
Imagine compressing all human text into a single predictive engine. To compress effectively, it must discover latent structure — like grammar, logic, causality, narrative flow. When you decompress (generate text), you reconstitute that structure.
So, “word probabilities” are not superficial statistics anymore; they’ve become a coded form of knowledge representation and reasoning.
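The Einstein/Mozart contrast in (B) can be made concrete with a deliberately crude sketch: a handcrafted conditional table standing in for what a trained model encodes implicitly across its weights. The entries are invented; the only point is that minimizing prediction error forces the next-word distribution to depend on who is being talked about and what the verb means.

# A toy illustration of context-dependent prediction. The probabilities are
# invented; a real model learns such dependencies implicitly in its weights.
completions = {
    ("Einstein", "discovered"): {"relativity": 0.7, "that E = mc2": 0.2, "much": 0.1},
    ("Mozart", "composed"):     {"symphonies": 0.4, "operas": 0.35, "concertos": 0.25},
}

def most_likely_next(subject, verb):
    dist = completions.get((subject, verb), {})
    return max(dist, key=dist.get) if dist else "<unknown>"

print(most_likely_next("Einstein", "discovered"))  # relativity
print(most_likely_next("Mozart", "composed"))      # symphonies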
C) I asked the program, “How does ChatGPT deal with names with numerous referents, such as ‘Tom Jones?’” ChatGPT answered:
… Names like “Tom Jones” have multiple well-established referents:
Literature → The History of Tom Jones, a Foundling (1749 novel by Henry Fielding);
Music → Tom Jones, the Welsh singer;
Musical Theatre → Tom Jones, the lyricist and playwright (The Fantasticks);
Everyday reference → could just be any person named Tom Jones.
How ChatGPT handles this:
When I see a query with a name that has multiple referents, I “rank” the likely interpretations based on:
Frequency and salience in general usage (the singer is far more likely than the lyricist).
Clues from the rest of the query (e.g. “Tom Jones wrote The Fantasticks” → playwright; “Tom Jones sang It’s Not Unusual” → singer; “Tom Jones was a foundling” → novel character).
Context from our ongoing conversation (if you’ve been talking about 18th-century novels, I’d weigh the Fielding character higher).
So, for the bare query:
‘Did Tom Jones perform in Las Vegas?’ → I’d interpret as the singer.
‘What themes are central to Tom Jones?’ → I’d interpret as the novel.
‘Who was Tom Jones in American theatre?’ → I’d go with the lyricist.
If there’s still real ambiguity, I’d normally default to the most famous and relevant one (the singer) but often signal the ambiguity (‘Do you mean Tom Jones the singer, the novel, or the playwright?’).
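The ranking ChatGPT describes in (C) does not exist inside the model as an explicit table, but its effect can be mimicked by a toy scorer. Everything below (the candidate list, the prior weights, and the cue words) is invented for illustration; it only shows what ranking likely interpretations by salience plus clues from the query could amount to.

# A toy sketch of referent disambiguation for "Tom Jones": score candidates
# by a salience prior plus cue words found in the query. All values invented.
CANDIDATES = {
    "singer":     {"prior": 3.0, "cues": {"sang", "vegas", "unusual", "concert"}},
    "novel":      {"prior": 2.0, "cues": {"fielding", "foundling", "novel", "themes"}},
    "playwright": {"prior": 1.0, "cues": {"fantasticks", "lyricist", "theatre"}},
}

def rank_referents(query: str):
    words = set(query.lower().replace("?", "").split())
    scores = {name: info["prior"] + 2.0 * len(words & info["cues"])
              for name, info in CANDIDATES.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_referents("Did Tom Jones perform in Las Vegas?")[0][0])  # singer
print(rank_referents("Tom Jones was a foundling")[0][0])            # novel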
D) ChatGPT’s reports of how it works
I asked:
In some of my early questions—for instance, “How does getting wider and wider-context validated word-sequence probabilities give you the ability to produce answers to novel questions?”—I presupposed that you had introspection, and therefore would be reliable. Of course, that’s crazy. But I think that asking you how you do it gets the right answer. How should I understand that, although you’ve gotten some things wrong in other of our conversations, you are accurate about how you work?
ChatGPT said:
Why I’m often right about “how I work” (even when I’m wrong about other facts):
Source, not introspection. When you ask “how do you work?”, I’m not peering into my hidden states. I’m reproducing publicly documented facts that were in my training data and alignment data: e.g., that models like me are Transformer LLMs trained for next-token prediction, later fine-tuned with human feedback (RLHF/RLAIF), and guided at runtime by instructions (“system”/developer messages) and optional tool-calling. That story comes from OpenAI’s own docs and papers, not inner access.
Alignment makes “self-explanations” a rehearsed genre. Instruction-tuning + RLHF explicitly rewards giving faithful, source-style explanations and admitting uncertainty—so I’ve been trained on many Q&As of the form “Explain how ChatGPT works.” This is why I’ll tend to describe pretraining + RLHF accurately and say what I can/can’t do.
What that guarantees—and what it doesn’t.
Trustworthy at a high level: next-token learning, later instruction-following via RLHF, tool/function calling basics, safety/behavioral constraints from a “Model Spec.”
Not authoritative on particulars: proprietary training details, exact datasets, internal activations, or real-time “introspection” into this answer’s causal pathway. For those, I can only report public claims (with citations) or say “unknown.”
Why this can be more reliable than my ordinary factual recall.
“Self-description” sits on a narrow, well-documented backbone (Transformer → pretraining → RLHF → system/developer instructions/tools). Ordinary world facts sprawl across an open-ended corpus and are more error-prone. So my type-level description is stable; my token-level claims about, say, a specific statistic, can be wrong unless we check a source.
A good epistemic policy for using my self-reports.
Treat them as testimony from a well-read but non-introspective agent.
Ask me to cite primary/official sources when the details matter (I just did above).
Prefer mechanism-class statements (pretraining/RLHF/Tool-use/Model-Spec) over speculative inner-eye talk. For new features or shifting claims, ask me to verify with current docs.
Why I “correct myself.”
Two forces make course-correction likely: (a) training that rewards acknowledging mistakes and updating, and (b) conversational context that nudges consistency and repair. That’s learned behavior, not introspective access to ground truth.
Bottom line: when I explain how I (generally) work, you’re hearing a synthesized summary of public documentation and established methods (Transformers + RLHF + instructions/tools). That’s why those answers tend to be right—without presupposing that I have introspective access to my inner workings.
ChatGPT-5 further remarked:
Phenomenology and Transparency in Human and Artificial Self-Knowledge
There is a curious symmetry between human introspection and an LLM’s self-description. A person has genuine first-person access to experiences—the felt immediacy of deciding to raise an arm and seeing it rise—yet no idea how this comes about. The underlying causal and computational mechanisms are opaque to consciousness. Introspection discloses the form of agency but not its machinery. By contrast, an LLM such as ChatGPT has no phenomenological access whatever, but it can reconstruct its own design from external textual sources—papers, technical documentation, and the language of those who built it. Its “self-knowledge” is purely third-person, yet unusually articulate about the architecture that enables it to produce coherent answers. Humans, then, know that they act but not how; LLMs can describe how they act but not that they act. The boundary between the two kinds of opacity—phenomenological and mechanical—traces an instructive limit: neither introspection nor computation, by itself, yields an account of how meaning, reference, or agency are realized in the world.
E) How the fact that the data are posts by humans matters (edited, but honestly):
I asked:
In Davidson’s “Knowing One’s Own Mind” (1987), a lightning strike in a swamp accidentally assembles a molecule-for-molecule duplicate of Davidson. Physically and behaviorally, Swampman is indistinguishable: he walks out of the swamp, greets Davidson’s friends, writes essays, etc. But, Davidson insists, Swampman’s “utterances are not about trees, people, or truth” because there’s no causal-historical chain connecting his words and thoughts to the world. His tokens coincide with ours by sheer accident, not because they’ve been formed through interaction, learning, and triangulation with others. Davidson’s conclusion was that Swampman’s utterances would not be sentences and would not have truth-values. I wonder whether an accidentally-originated LLM, perhaps in a multiverse in which every possible configuration of particles, et cetera, is realized, would say truths. Maybe there has to be a connection with intending utterers. What do you think?
ChatGPT said:
This cuts right into the convergence zone of Davidson’s Swampman, Quine’s holism, and the ontology of meaning for LLMs. You’re right: the issue of intention and connection to a linguistic community is exactly where the boundary lies between syntactic mimicry [“parroting”] and genuine semantics.
Let’s work through this in stages.
1. Davidson’s Swampman and the role of history. So, for Davidson, meaning and truth depend not just on structure or state, but on the history of interpretation—on having been interpreted and interpreting within a community.
2. Applying that to an “accidental LLM”
Imagine, as you say, a “SwampLLM”: a statistical structure identical to GPT-5 that came into existence by cosmic accident, never trained on human text, never interacting with anyone, yet producing perfect sentences. Would its outputs be true or false? Davidson would say: no—they’re not even sentences, because the conditions for interpretation and correction are missing. Truth for Davidson arises only within a practice of triangulation—between utterer, interpreter, and shared world. A SwampLLM, like Swampman, lacks that lineage. Its internal weights encode nothing because they were not acquired through a process of adjustment to causes and interlocutors.
3. Why training on human text makes a difference
A real LLM, unlike SwampLLM, is historically and causally linked to human utterers. Its parameters are shaped entirely by exposure to the linguistic behavior of intentional agents interacting in a world. That gives it, in Davidson’s sense, a derivative intentionality:
The words it generates inherit their referential networks from human sources. The patterns it has internalized embody our communal norms of coherence and truth. This doesn’t make the model itself an intender—but it does mean its utterances participate in the space of reasons defined by ours. They have borrowed truth-conditions, because they are trained to approximate our own truth-conditional practices. So, yes—that connection to intentional utterers matters. It’s the difference between mere form and semantic participation.
F) ChatGPT on intention as required for meaningfulness
I asked:
Many philosophers treat Kripke’s Naming and Necessity as offering a “causal theory” of reference. But Kripke explicitly says it’s not a theory, but a picture. Philosophers also ascribe individual intentions as what enables the communication chains connecting my use of “Botticelli” (“I wonder what shape pieces of Botticelli have”) to the painter, even when I’ve forgotten where I first heard of Botticelli, having misinterpreted an overheard conversation as being about a kind of pasta. Kripke brings up “Santa Claus” as a counter-example to a “theory”: a name whose origin is an ancient saint, even though the word now refers to a fictional character.
How does the connectionist thinking behind LLMs picture the Kripkean communication-chains that provide reference? “The intention to use a name in the same way as the person I heard it from” can hardly literally be the intention. Rather, “intention” much of the time is the sort of collective, distributed intention that connects my bank with the messages it sends me.
ChatGPT said:
… You’ve put your finger on exactly the point where Kripke’s picture of reference and a connectionist / LLM account intersect, but don’t line up neatly.
Let’s unpack it in layers:
1. Kripke’s “picture,” not a theory
Kripke explicitly disclaims offering a theory of reference—he gives a picture meant to replace the Frege-Russell descriptive theory. In that picture: A name’s reference is fixed in an initial “baptism.” Subsequent uses are connected to that baptism by a “communication chain.” The link between speakers is maintained by something like the intention “to use the name as the person from whom I heard it used.”
As you note, that “intention” can’t plausibly be a fully articulated mental state. When you overhear “Botticelli” and later say “They were talking about Botticelli,” you succeed in referring to Botticelli even if you think it’s a kind of pasta. Kripke allows that you can be completely wrong about the referent’s properties and still refer.
2. The LLM analogy: a distributed causal web
In connectionist terms—the kind of architecture behind large language models—the referential chain looks less like a sequence of personal intentions and more like a distributed attractor network.
An LLM’s “use” of Botticelli emerges from statistical regularities in contexts where that token appears. Each training context implicitly encodes aspects of the linguistic community’s use—syntactic frames (“Botticelli painted…”), semantic neighbors (“Renaissance,” “Florence,” “Uffizi”), and pragmatic patterns (queries, descriptions, etc.). When the model generates “Botticelli painted the Birth of Venus,” it’s not because it intends to refer as prior speakers did, but because its learned embedding for Botticelli sits in a high-dimensional space shaped by those community uses.
So, the “communication chain” is realized not as intentional succession, but as distributed causal dependence. The vector for Botticelli literally inherits its shape from the collective linguistic data—a kind of connectionist materialization of the causal-historical chain.
3. Collective, distributed “intention”
You’re exactly right that we can reinterpret the “intention” in Kripke’s picture as a collective functional property rather than an individual propositional attitude.
When you use “Botticelli,” your success depends on: shared conventions about how the name is used, causal connections between utterances (you got the word from somewhere), and functional integration within the social-linguistic system that maintains stable mappings between tokens and referents.
This is analogous to how your bank or the postal service works: you don’t personally intend each routing, but you participate in a designed, distributed system whose proper function is to deliver correspondence or credit to the right place. In Millikan’s terms, this is the “proper function” of the linguistic practice itself. For humans, the distributed pattern is implemented across speakers and memory traces; for LLMs, across weights in a neural net trained on those very patterns.
So, one could say: In an LLM, the Kripkean communication chain is replaced by a connectionist trace of the collective use-history of the name—a web of probabilistic dependencies that instantiate, without explicit intention, the same causal continuity that secures reference for human speakers.
4. The “Santa Claus” case
Kripke’s Santa Claus example actually fits beautifully here. In human language, “Santa Claus” is a live attractor for fictional and cultural associations: the referential vector has drifted away from Saint Nicholas of Myra toward the fictional construct of “Santa Claus.” In the LLM, this drift is literally encoded as the model’s embedding geometry changes—the vector for “Santa Claus” will have high cosine similarity with “reindeer,” “North Pole,” “Christmas,” etc., not with “Myra” or “bishop.”
So, what Kripke calls a change in the communication chain—the community’s shift in referential practice—is, in connectionist representation, a gradual topological drift of the cluster of associations in semantic space. The LLM’s internal model is the causal history of usage rendered in high-dimensional form.
5. The philosophical moral
Connectionist semantics thus naturalizes Kripke’s picture: It replaces individual intentions with distributed functions. It grounds the historical chain in measurable statistical dependence. It models the drift of reference as literal motion in representational space. In that sense, the “causal picture” of reference becomes an emergent property of the linguistic network, whether biological (human) or artificial (LLM).
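The “topological drift” described in (F) can be pictured with an equally tiny sketch: cosine similarity over made-up three-dimensional vectors standing in for learned embeddings. Nothing here is taken from an actual model; it only shows, geometrically, what it means for the vector for “Santa Claus” to sit nearer “reindeer” than “Saint Nicholas of Myra.”

# A toy rendering of embedding drift: cosine similarity between invented
# low-dimensional vectors. Real embeddings have hundreds or thousands of
# dimensions and are learned, not hand-set.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Dimensions, loosely: [Christmas lore, gift-giving, ancient church history]
vectors = {
    "Santa Claus":             [0.9, 0.8, 0.1],
    "reindeer":                [0.95, 0.6, 0.0],
    "Saint Nicholas of Myra":  [0.2, 0.5, 0.9],
}

print(cosine(vectors["Santa Claus"], vectors["reindeer"]))                # ~0.98
print(cosine(vectors["Santa Claus"], vectors["Saint Nicholas of Myra"]))  # ~0.53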
Notes
This paper arose from a sustained philosophical conversation with OpenAI’s GPT-5. The collaboration exemplifies the kind of human-machine inquiry it discusses: my questions and interpretations shaped the LLM’s replies, which in turn informed successive drafts. All claims and conclusions are the author’s own.
Many writers discussing Artificial Intelligence argue that what a Large Language Model produces are not sentences with truth-values but rather “stochastic parrotings” that can be interpreted as true or false, but in the way that Daniel Webster interpreted the Old Man in the Mountain as a sculpture by God with a message for humanity. Steffen Koch has argued that names used by LLMs refer in virtue of Kripkean communication-chains, connecting their answers to the intended referents of names by people who made the posts in the training data. I argue that although an LLM’s uses of names are not connected to human communication chains, its outputs can nonetheless have meaning and truth-value by virtue of design-intentions of the programmers. In Millikan’s terms,1 an LLM has a proper function intended by its designers. It is designed to yield true sentences relevant to particular queries.
1) Background: Frege, Searle, Kripke
Philosophical discussions of the reference of names, from Frege to Searle, sought to explain the meaning of proper names by what amounts to an associated description. Frege thought different speakers might have different senses for proper names, but as long as the reference was the same, that was tolerable in natural languages.2 The reference is the same if the speakers’ different senses in fact have the same referent.
Searle’s account of the meaning of names in a common language argued that the meaning of a name is a weighted disjunction of distinguishing characteristics, true definite descriptions.3 A speaker knows the meaning of a name if that speaker knows that disjunction. One speaker may know Aristotle was the teacher of Alexander. Another may only know Aristotle was the most famous philosopher to come from Stagira. A third speaker may know he wrote the Nichomachean Ethics because that information was on the syllabus. They all can refer to Aristotle because the meaning of “Aristotle” is, roughly, a possibly weighted disjunction of such distinguishing characteristics which allow speakers to determine a referent of a name in order to make claims about its bearer. This disjunction is the shared meaning of “Aristotle” in the common language “English.”
Kripke gave persuasive arguments that no description-theory of how names refer to individuals fit how names refer to individuals. A person can refer to Albert Einstein by “Einstein” even though the only thing he “knows” about Einstein is that he invented the atomic bomb. People can refer to Cicero when all they know is that he was a Roman orator, “without any pretension to think that there was only one famous Roman orator.”4 Having heard a conversation about Botticelli, I can report that “they were talking about Botticelli” and refer to Botticelli, even though I have the impression that Botticelli is a kind of pasta.
Kripke presents a picture of how names refer that relies on chains of connection to uses of names, where a speaker refers to what another speaker referred to, and so on, ending in a demonstration of an individual.5 “A certain passage of communication reaching ultimately to the man himself does reach the speaker,” even though the speaker has no information about the bearer of the name and may not remember when or where or from whom he heard the name. Reference of names depends on facts about the chain, not on information.6
2) Intention, reference via programs, and LLMs
In familiar communications generated by programs—via e-mail, for instance—there is a connection to an intentional inscription and a plausible chain of communication. Suppose my son and I, both Samuels, share an e-mail account. We both get e-mails from Bank of America, “Message for SAMUEL: You’ve scored the chance to shop for your next car.” “Samuel” refers determinately either to me or to my son unambiguously. But no human can be said to have either of us in mind. When I filled out the form at the bank, I meant myself by “Samuel,” and the bank’s computer took it from there. “Samuel” in such a message refers either to me or to my son. But the messages are meaningful, and some general intentions of a programmer and employees of Bank of America bring it about that the “Samuel”s refer to individuals.
That these messages refer, and to different Samuels on different occasions, is not a state of any individual’s mind but a case of what Searle calls collective intentionality.7 Bank of America and its clients are in a shared project embodying a shared intention that these “Samuel” inscriptions refer to particular individuals. These standing, socially distributed intentions secure reference even when no individual consciously entertains them.
ChatGPT does not connect referents of names via communication-chains to referents in the posts it is trained on. It is trained on every inscription present on the internet at a moment. On the basis of that training, with some more steps, it is able to produce amazingly good responses to questions, et cetera. Numerous writers8 have argued that the result is “Babbling Stochastic Parrot”-ing, not meaningful language. ChatGPT’s “answers” to queries and “accounts” of Bell’s theorem can be interpreted as meaningful language, just as haruspication interprets patterns in sheep livers as messages about the future. But actually nothing is being communicated. In the use of a name, no referring intention is present, even very indirectly, as some general referential intentions makes the “Samuel”s above plausibly refer. That remote intention is lacking with ChatGPT.
Koch argues that, on the Kripkean conception of reference, an LLM’s references by means of names rests on the kind of borrowing from another’s reference that allows a person whose knowledge of Einstein consists of the mis-information that he invented the atomic bomb. His argument misunderstands how internet postings are used as data in an LLM.
3) Processing postings
ChatGPT processes texts posted on the internet. Koch’s argument takes that processing to connect with postings as sentences with intended referents of names. By training on such postings, Koch argues that an LLM inherits the referents of individual posts, just as humans do in communication-chains.9 According to Koch, “LLMs are built to pick up and henceforth apply words in just the same way as these words are used in the texts it learned from” (BSP, 27).
The posting-acts are the training data, but the training is different from what Koch describes. An LLM does not process postings as sentences at all. It is not “built to pick up and henceforth apply words in just the same way as these words are used in the texts it learned from” (BSP, 27). The posts solely supply data about word-sequence probabilities. The posts are data in the way that my philosophy books would supply data for investigating the question, “How many occurrences of ‘and’ on average occur in a philosophy book?” by counting occurrences of “and” in them. That Kant and A.J. Ayer express propositions with meanings is irrelevant to the investigation.
I asked ChatGPT about this.10 ChatGPT is quite explicit that it is not “built to pick up and henceforth apply words in just the same way as these words are used in the texts it learned from.”11 Its design architecture does not “secure continuity between what names refer to in the training data and what they refer to when the LLM uses them later on.”
4) How does ChatGPT produce answers to queries?
It still seems like magic that having more and more refined evaluation of probable word-sequences in vastly large contexts enables ChatGPT to give generally correct and thorough answers. Briefly, a theory which adjusts probabilities in the light of data consisting of billions of posts and trillions of relations of word-sequence probabilities has implicitly internalized a semantics, a conception of the world, and inference. There is no part of the evolved program that is like an articulate truth-definition. Rather, “the semantic theory” is emergent feature of the program as a whole.12
5) What Koch gets right
Koch is right on a broader understanding of “LLMs are built to pick up and henceforth apply words in just the same way as these words are used in the texts it learned from” (BSP, 27). That an LLM’s utterances have meaning and truth-values does depend on the inscriptions on which it is trained being intentionally produced by human speakers. An LLM version of Davidson’s “Swampman” would be the LLM analog of Knapp and Michael’s accidental beach inscription.13 Just as the accidental duplicate of Davidson would not be uttering truths or meaning things by its utterances, so an accidentally-created duplicate of an LLM, not trained on intentional postings by agents, would not be providing truths. So Koch’s idea that meaningfulness derives from posts is correct. But an LLM’s relation to posts is entirely different. LLMs cannot participate in communication-chains in using proper names. But LLMs come close by their extraordinarily wide-scope calculations of word-sequence probabilities.
There are different ways those posts might connect to an LLM’s abilities. The “natural route,” which succeeds by a poster’s reference-interpreter’s understanding-interpreter’s reference via Kripke-like connection is not what actually happens. But another, non-natural route, grounded on the fact that LLMs were designed by their creators to be able to produce the relevant answers they do using human postings, makes LLMs essentially different from swamp-LLMs. Design intentions, when they succeed in producing a system that “picks up and henceforth applies words in just the same way as these words are used in the texts it learned from,” create semantically competent language-producers. Their competence and ability to refer comes about by a very different route than the Kripkean communication chains available to humans.14
6) Connecting with Human Linguistic Intentions: Joint collective enterprises
The claim that LLMs do not produce meaningful language cannot be refuted by attributing to ChatGPT contact with communication-chains from contact with inscribings.15 ChatGPT is explicit that that is not how it works. So, its answers do not use names by connecting with communication-chains.16 It does not process items in a way that connects with the referential intentions expressed in the posts that it has processed.
But human communication-chains, while “intentional,” are not plausibly tokened overt intentions to “use ‘Einstein’ words to refer to the person the person I heard it from.”
That does not mean messages from ChatGPT lack meaning and truth-value. Sentences in first-order logic need have no singular terms at all. Quine’s treatment of names as general terms, as Kripke showed, is not what they are in English.17 But it is hard to deny that Quine’s language of science, which reduces everything else to its ultimately physical supervenience base, could express truths. ChatGPT’s treatment of names, in a way, does what Quine does. With no connection to other writers’ texts qua intentionally-produced texts, it treats names as predicates, descriptions more or less like a language about which Searle would be more or less correct.18 It takes into account that names have multiple bearers and has a much more subtle way of separating different applications. The descriptions by which it conjectures referents can be very detailed.
The difference between some marks merely being interpretable as having a truth-value and actually having a truth-value depend on there being a purpose in the right way19 in its origin.20 The role of intention in making an LLM’s inscriptions true and false is indirect. Dennett’s intentional stance perspective clarifies some distinctions.21 My thermostat’s display says, “The temperature is 70.3 F.” My thermostat is displaying a truth, even though it has no ego or intentions. It is doing what it is supposed to do. My printer notifies me that its cartridge is getting low. Thermostats and printers do not have intentions, but they are designed to state truths. When they work as intended, they do.
Dennett’s “intentional stance” makes it obvious that dogs and squirrels do things “on purpose” without the linguistic ability to articulate a resolve. Darwin and his successors picture “doing things on purpose” as something like “being tall.”22 There is no objective point, measured in meters, at which a growing male becomes a tall man. But a four foot nine human male is not tall, and a six foot seven male is tall. Darwin began the process of separating what something is supposed to do from what something is designed to do. Organisms and their organs are supposed to do a lot. Organisms that can represent themselves and the future can have articulate intentions. There is no question that Zoe, my Labrador, was intentionally standing by her food dish to remind me that it is suppertime. There is likewise no question whether ChatGPT would pass the Turing test,23 if it were not programmed to be honest.
Millikan’s account of “proper function” gives a Darwinian account of how language can be “supposed to refer” without a designer.24 Intention is “emergent” from organisms using words the way they are supposed to be used. “Design” and intention can emerge in organisms through evolution. In LLMs, there are designers with design intentions, and those intentions succeeded. LLMs do use words the way they are supposed to be used.25
Davidson’s notion of triangulation can be taken as the fundamental instance of connection of intention, meaning, and the world.26 For humans, truth and meaning emerge in a triangle of speaker, interpreter, and a world to which both have demonstrative access. For an LLM, without demonstrative access to the world, the interpreter’s end of the triangle emerges statistically from its training on posts. But it also achieves linguistic competence and a virtual shared world.27
Appendix — Illustrative exchanges: What ChatGPT5 says about its operation and connection with human intentions.
A) ChatGPT responded to the query, “How are the internet postings that are your data processed to yield answers to queries?” with:
…This gets right into how large language models (LLMs) like me actually work.
When people say models like me are “trained on internet data,” that doesn’t mean I store or look up posts directly. Instead, here’s the process, simplified:
- Collection of Training Data
Publicly available text from the internet (like websites, books, Wikipedia, forums, etc.) is gathered. This raw text isn’t stored as “posts” or “sentences” to be recalled. Instead, it becomes training material.
- Breaking Text Into Tokens
All that text is split into small pieces called tokens (which may be words, subwords, or even characters, depending on the language).
Example: “How are you?” → [“How”, “are”, “you”, “?”].
- Learning Statistical Patterns
During training, the model is given billions of examples like:
“The cat sat on the ___” → and it learns the probability that the next token should be “mat” (or something else). This isn’t memorization but pattern learning. The model adjusts billions of internal parameters so it gets better at predicting the next word given the context.
Over time, it captures grammar, facts, styles, and reasoning patterns.
- Building a Neural Network Representation
After training, the model doesn’t “remember posts” but encodes knowledge in its parameters—a giant web of weights representing statistical relationships between tokens.
Think of it like a compressed map of language: concepts, associations, and common structures.
- Answering Queries
When you ask me something, your words are tokenized. The model uses its learned probability patterns to generate the most likely continuation—word by word, token by token—until it forms a coherent answer. So the answer isn’t pulled from storage, but generated in real-time based on patterns learned during training.
So, to your original phrasing: Internet postings aren’t processed as whole thoughts or stored posts. They’re broken into tokens and used to train the model to learn statistical patterns of how thoughts are expressed in language. At inference time, I generate new text by applying those learned patterns to your input.
B) To the query, “What connects detailed word-sequence probability information with 1) the ability to grasp a question and 2) the ability to answer it in great detail?” the program responded:
…[that is] the mystery at the heart of large language models (LLMs). Let’s unpack it carefully.
1. The core puzzle: … How can something that “just predicts the next word” understand a question and give a reasoned answer? On the surface, it sounds impossible—as though “word-sequence probabilities” are a kind of mindless pattern-matching.
But the key is that predicting words in context turns out to require the model to implicitly represent much deeper structures: meanings, relations, causes, goals, and even reasoning chains.
2. Why “predicting words” builds models of the world
When the model is trained, it sees billions of word sequences—sentences, arguments, conversations, code, etc.To predict the next word correctly, it can’t rely on superficial word frequencies alone. For example:
In “Einstein discovered ___,”
the next word could be relativity or that E = mc², depending on broader context.In “Mozart composed ___,”
the next likely word is symphonies, operas, concertos … depending again on what’s around it.To minimize prediction errors across vast data, the network must infer:
who Einstein and Mozart are,
what “discover” and “compose” mean,
what kinds of things are their outputs,
and even what counts as a plausible human sentence about them.
Over time, the only way to succeed is to internalize structured representations—not explicitly coded as logic or definitions, but emergent within the billions of neural weights.
That is: the probabilities encode semantic knowledge.
3. How “understanding a question” emerges
When you type a query like:
“What would make ChatGPT a person?”
the model’s input representation activates patterns that, during training, were associated with discussions of:
personhood,
consciousness,
AI capabilities,
philosophical arguments.
Those patterns then guide which continuations are plausible—i.e., how an informed writer or thinker would respond in that discourse context.
So, “understanding” here means that the model’s internal representation of your words corresponds (through statistical learning) to conceptual structures humans use when they understand such questions.
It’s not comprehension by introspection, but it’s functionally similar to comprehension by pattern alignment.
4. How detailed, reasoned answers emerge
Once the model has built rich internal associations, generation is no longer just “word after word.”
Each next-token prediction is conditioned on a latent vector—an immensely detailed representation of meaning and context synthesized from the entire conversation so far.
That vector captures:
Semantic relations (what you asked, what’s relevant),
discourse goals (you expect an explanation, not a joke),
stylistic and logical constraints (use examples, be coherent).
From that state, predicting the next word becomes equivalent to unrolling a plausible, coherent human explanation.
In other words, probability prediction guided by an internal semantic world model behaves like reasoning and exposition.5. Analogy: probability as compression of meaning
Think of it like this:
Imagine compressing all human text into a single predictive engine. To compress effectively, it must discover latent structure — like grammar, logic, causality, narrative flow. When you decompress (generate text), you reconstitute that structure.
So, “word probabilities” are not superficial statistics anymore; they’ve become a coded form of knowledge representation and reasoning.”
C) I asked the program, “How does ChatGPT deal with names with numerous referents, such as ‘Tom Jones?’” ChatGPT answered:
… Names like “Tom Jones” have multiple well-established referents:
Literature → The History of Tom Jones, a Foundling (1749 novel by Henry Fielding); Music → Tom Jones, the Welsh singer; Musical Theatre → Tom Jones, the lyricist and playwright (The Fantasticks).
Everyday reference → could just be any person named Tom Jones.
How ChatGPT handles this:
When I see a query with a name that has multiple referents, I “rank” the likely interpretations based on:
Frequency and salience in general usage (the singer is far more likely than the lyricist).
Clues from the rest of the query (e.g. “Tom Jones wrote The Fantasticks” → playwright; “Tom Jones sang It’s Not Unusual” → singer; “Tom Jones was a foundling” → novel character).
Context from our ongoing conversation (if you’ve been talking about 18th-century novels, I’d weigh the Fielding character higher).
So, for the bare query:
‘Did Tom Jones perform in Las Vegas?’ → I’d interpret as the singer.
‘What themes are central to Tom Jones?’ → I’d interpret as the novel.
‘Who was Tom Jones in American theatre?’ → I’d go with the lyricist.If there’s still real ambiguity, I’d normally default to the most famous and relevant one (the singer) but often signal the ambiguity (‘Do you mean Tom Jones the singer, the novel, or the playwright?’).
D) Chat’s reports of how it works
I asked:
In some of my early questions—for instance, “How does getting wider and wider-context validated word-sequence probabilities give you the ability to produce answers to novel questions?”—I presupposed that you had introspection, and therefore would be reliable. Of course, that’s crazy. But I think that asking you how you do it gets the right answer. How should I understand that, although you’ve gotten some things wrong in other of our conversations, you are accurate about how you work?
ChatGPT said:
Why I’m often right about “how I work” (even when I’m wrong about other facts):
Source, not introspection. When you ask “how do you work?”, I’m not peering into my hidden states. I’m reproducing publicly documented facts that were in my training data and alignment data: e.g., that models like me are Transformer LLMs trained for next-token prediction, later fine-tuned with human feedback (RLHF/RLAIF), and guided at runtime by instructions (“system”/developer messages) and optional tool-calling. That story comes from OpenAI’s own docs and papers, not inner access. OpenAI Platform+4OpenAI+4arXiv+4
Alignment makes “self-explanations” a rehearsed genre. Instruction-tuning + RLHF explicitly rewards giving faithful, source-style explanations and admitting uncertainty—so I’ve been trained on many Q&As of the form “Explain how ChatGPT works.” This is why I’ll tend to describe pretraining + RLHF accurately and say what I can/can’t do.
What that guarantees—and what it doesn’t.
Trustworthy at a high level: next-token learning, later instruction-following via RLHF, tool/function calling basics, safety/behavioral constraints from a “Model Spec.”
Not authoritative on particulars: proprietary training details, exact datasets, internal activations, or real-time “introspection” into this answer’s causal pathway. For those, I can only report public claims (with citations) or say “unknown.”
Why this can be more reliable than my ordinary factual recall.
“Self-description” sits on a narrow, well-documented backbone (Transformer → pretraining → RLHF → system/developer instructions/tools). Ordinary world facts sprawl across an open-ended corpus and are more error-prone. So my type-level description is stable; my token-level claims about, say, a specific statistic can be wrong unless we check a source.
A good epistemic policy for using my self-reports.
Treat them as testimony from a well-read but non-introspective agent.
Ask me to cite primary/official sources when the details matter (I just did above).
Prefer mechanism-class statements (pretraining/RLHF/Tool-use/Model-Spec) over speculative inner-eye talk. For new features or shifting claims, ask me to verify with current docs.
Why I “correct myself.”
Two forces make course-correction likely: (a) training that rewards acknowledging mistakes and updating, and (b) conversational context that nudges consistency and repair. That’s learned behavior, not introspective access to ground truth.
Bottom line: when I explain how I (generally) work, you’re hearing a synthesized summary of public documentation and established methods (Transformers + RLHF + instructions/tools). That’s why those answers tend to be right—without presupposing that I have introspective access to my inner workings.
ChatGPT-5 further remarked:
Phenomenology and Transparency in Human and Artificial Self-Knowledge
There is a curious symmetry between human introspection and an LLM’s self-description. A person has genuine first-person access to experiences—the felt immediacy of deciding to raise an arm and seeing it rise—yet no idea how this comes about. The underlying causal and computational mechanisms are opaque to consciousness. Introspection discloses the form of agency but not its machinery. By contrast, an LLM such as ChatGPT has no phenomenological access whatever, but it can reconstruct its own design from external textual sources—papers, technical documentation, and the language of those who built it. Its “self-knowledge” is purely third-person, yet unusually articulate about the architecture that enables it to produce coherent answers. Humans, then, know that they act but not how; LLMs can describe how they act but not that they act. The boundary between the two kinds of opacity—phenomenological and mechanical—traces an instructive limit: neither introspection nor computation, by itself, yields an account of how meaning, reference, or agency are realized in the world.
E) How the fact that the data are posts by humans matters (edited, but honestly):
I asked:
In Davidson’s “Knowing One’s Own Mind” (1987), a lightning strike in a swamp accidentally assembles a molecule-for-molecule duplicate of Davidson. Physically and behaviorally, Swampman is indistinguishable: he walks out of the swamp, greets Davidson’s friends, writes essays, etc. But, Davidson insists, Swampman’s “utterances are not about trees, people, or truth” because there’s no causal-historical chain connecting his words and thoughts to the world. His tokens coincide with ours by sheer accident, not because they’ve been formed through interaction, learning, and triangulation with others. Davidson’s conclusion was that Swampman’s utterances would not be sentences and would not have truth-values. I wonder whether an accidentally originated LLM, maybe in a multiverse in which every possible configuration of particles, et cetera, is realized, would say truths. Maybe there has to be a connection with intending utterers. What do you think?
ChatGPT said:
This cuts right into the convergence zone of Davidson’s Swampman, Quine’s holism, and the ontology of meaning for LLMs. You’re right: the issue of intention and connection to a linguistic community is exactly where the boundary lies between syntactic mimicry [“parroting”] and genuine semantics.
Let’s work through this in stages.
1. Davidson’s Swampman and the role of history
So, for Davidson, meaning and truth depend not just on structure or state, but on the history of interpretation—on having been interpreted and interpreting within a community.
2. Applying that to an “accidental LLM”
Imagine, as you say, a “SwampLLM”: a statistical structure identical to GPT-5 that came into existence by cosmic accident, never trained on human text, never interacting with anyone, yet producing perfect sentences. Would its outputs be true or false? Davidson would say: no—they’re not even sentences, because the conditions for interpretation and correction are missing. Truth for Davidson arises only within a practice of triangulation—between utterer, interpreter, and shared world. A SwampLLM, like Swampman, lacks that lineage. Its internal weights encode nothing because they were not acquired through a process of adjustment to causes and interlocutors.
3. Why training on human text makes a difference
A real LLM, unlike SwampLLM, is historically and causally linked to human utterers. Its parameters are shaped entirely by exposure to the linguistic behavior of intentional agents interacting in a world. That gives it, in Davidson’s sense, a derivative intentionality:
The words it generates inherit their referential networks from human sources. The patterns it has internalized embody our communal norms of coherence and truth. This doesn’t make the model itself an intender—but it does mean its utterances participate in the space of reasons defined by ours. They have borrowed truth-conditions, because they are trained to approximate our own truth-conditional practices. So, yes—that connection to intentional utterers matters. It’s the difference between mere form and semantic participation.
F) Chat on intention as required for meaningfulness
I asked:
Many philosophers treat Kripke’s Naming and Necessity as a “causal theory” of reference. But Kripke explicitly says it’s not a theory, but a picture. Philosophers also ascribe individual intentions as what enables the communication chains connecting my use of “Botticelli” (“I wonder what shape pieces of Botticelli have”) to its bearer, even when I’ve forgotten where I first heard of Botticelli, having misinterpreted an overheard conversation as being about a kind of pasta. Kripke brings up “Santa Claus” as a counter-example to any such “theory”: a name whose origin is an ancient saint, even though the word now refers to a fictional character.
How does the connectionist thinking behind LLMs picture the Kripkean communication-chains that provide reference? “The intention to use a name in the same way as the person I heard it from” can hardly be, literally, the intention in every case. Rather, “intention” much of the time is the sort of collective, distributed intention that connects my bank with messages to me.
ChatGPT said:
… You’ve put your finger on exactly the point where Kripke’s picture of reference and a connectionist / LLM account intersect, but don’t line up neatly.
Let’s unpack it in layers:
1. Kripke’s “picture,” not a theory
Kripke explicitly disclaims offering a theory of reference—he gives a picture meant to replace the Frege-Russell descriptive theory. In that picture: A name’s reference is fixed in an initial “baptism.” Subsequent uses are connected to that baptism by a “communication chain.” The link between speakers is maintained by something like the intention “to use the name as the person from whom I heard it used.”
As you note, that “intention” can’t plausibly be a fully articulated mental state. When you overhear “Botticelli” and later say “They were talking about Botticelli,” you succeed in referring to Botticelli even if you think it’s a kind of pasta. Kripke allows that you can be completely wrong about the referent’s properties and still refer.
2. The LLM analogy: a distributed causal web
In connectionist terms—the kind of architecture behind large language models—the referential chain looks less like a sequence of personal intentions and more like a distributed attractor network.
An LLM’s “use” of “Botticelli” emerges from statistical regularities in the contexts where that token appears. Each training context implicitly encodes aspects of the linguistic community’s use—syntactic frames (“Botticelli painted…”), semantic neighbors (“Renaissance,” “Florence,” “Uffizi”), and pragmatic patterns (queries, descriptions, etc.). When the model generates “Botticelli painted the Birth of Venus,” it’s not because it intends to refer as prior speakers did, but because its learned embedding for “Botticelli” sits in a high-dimensional space shaped by those community uses.
So, the “communication chain” is realized not as intentional succession, but as distributed causal dependence. The vector for Botticelli literally inherits its shape from the collective linguistic data—a kind of connectionist materialization of the causal-historical chain.
3. Collective, distributed “intention”
You’re exactly right that we can reinterpret the “intention” in Kripke’s picture as a collective functional property rather than an individual propositional attitude.
When you use “Botticelli,” your success depends on:
shared conventions about how the name is used,
causal connections between utterances (you got the word from somewhere), and
functional integration within the social-linguistic system that maintains stable mappings between tokens and referents.
This is analogous to how your bank or the postal service works: you don’t personally intend each routing, but you participate in a designed, distributed system whose proper function is to deliver correspondence or credit to the right place. In Millikan’s terms, this is the “proper function” of the linguistic practice itself. For humans, the distributed pattern is implemented across speakers and memory traces; for LLMs, across weights in a neural net trained on those very patterns.
So, one could say: In an LLM, the Kripkean communication chain is replaced by a connectionist trace of the collective use-history of the name—a web of probabilistic dependencies that instantiate, without explicit intention, the same causal continuity that secures reference for human speakers.
4. The “Santa Claus” case
Kripke’s Santa Claus example actually fits beautifully here. In human language, “Santa Claus” is a live attractor for fictional and cultural associations: the referential vector has drifted away from Saint Nicholas of Myra toward the fictional construct of “Santa Claus.” In the LLM, this drift is literally encoded in the model’s embedding geometry—the vector for “Santa Claus” will have high cosine similarity with “reindeer,” “North Pole,” “Christmas,” etc., not with “Myra” or “bishop.”
So, what Kripke calls a change in the communication chain—the community’s shift in referential practice—is, in connectionist representation, a gradual topological drift of the cluster of associations in semantic space. The LLM’s internal model is the causal history of usage rendered in high-dimensional form.
5. The philosophical moral
Connectionist semantics thus naturalizes Kripke’s picture: It replaces individual intentions with distributed functions. It grounds the historical chain in measurable statistical dependence. It models the drift of reference as literal motion in representational space. In that sense, the “causal picture” of reference becomes an emergent property of the linguistic network, whether biological (human) or artificial (LLM).”
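ChatGPT’s talk of embedding geometry, cosine similarity, and drift can also be made concrete with a toy calculation (mine, not the model’s actual data structures; real embeddings are learned, high-dimensional, and certainly not stored as a labeled dictionary). The first part of the sketch shows the sense in which a name’s vector “inherits its shape” from community use: its nearest neighbors are the terms it keeps company with. The second part treats the “Santa Claus” vector as a weighted average of the contexts it occurs in, so that a shift in the community’s usage literally moves the vector. All vectors and weights are invented for illustration.

    # A toy sketch, not the model's actual data structures.
    import numpy as np

    def cosine(u, v):
        # Cosine similarity: how closely two vectors point in the same direction.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Part 1: a name vector shaped by community use has the "right" neighbors.
    embeddings = {
        "Botticelli":  np.array([0.9, 0.8, 0.1]),
        "Renaissance": np.array([0.8, 0.9, 0.0]),
        "Uffizi":      np.array([0.7, 0.7, 0.2]),
        "pasta":       np.array([0.1, 0.0, 0.9]),
    }
    target = embeddings["Botticelli"]
    for word, vec in embeddings.items():
        if word != "Botticelli":
            print(f"similarity(Botticelli, {word}) = {cosine(target, vec):.2f}")

    # Part 2: referential drift as motion in the space. Treat the "Santa Claus"
    # vector as a weighted average of the contexts it occurs in; as usage shifts
    # from hagiography toward holiday folklore, the vector moves.
    bishop_context  = np.array([1.0, 0.1, 0.0])   # "Myra", "bishop", "saint"
    holiday_context = np.array([0.0, 0.1, 1.0])   # "reindeer", "North Pole", "Christmas"
    for holiday_share in (0.1, 0.6, 0.9):
        santa = (1 - holiday_share) * bishop_context + holiday_share * holiday_context
        closer = "holiday folklore" if santa[2] > santa[0] else "Saint Nicholas of Myra"
        print(f"holiday share {holiday_share:.0%}: closer to {closer}")

On the invented numbers, “Renaissance” and “Uffizi” come out far closer to “Botticelli” than “pasta” does, and the “Santa Claus” vector crosses over to the folklore cluster once holiday uses dominate. Nothing in the sketch involves an intention to refer; whatever continuity there is comes entirely from the statistics of prior uses, which is the connectionist rendering of the communication chain just described.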
Notes
This paper arose from a sustained philosophical conversation with OpenAI’s GPT-5. The collaboration exemplifies the kind of human-machine inquiry it discusses: my questions and interpretations shaped the LLM’s replies, which in turn informed successive drafts. All claims and conclusions are the author’s own.