1. Introduction
Entering the query “What is a tree” prompts ChatGPT 5.2 to answer:
“A tree is a tall, perennial plant with a woody trunk, branches, and leaves or needles. Trees usually live for many years and grow larger than most other plants.”
We can understand this answer and easily relate to it. In fact, we can comprehend almost all of the responses provided by large language models (LLMs) and find them relatable. Since our queries concern the world we live in, the responses appear to reflect that same world. We understand them because they align with our own understanding of reality. But how is this possible? In this paper, I argue that LLMs succeed in encapsulating a world model, even if this model is partial and incomplete. While this model differs from that of a human agent, it is similar enough, and captures enough relevant features, for us to find the responses meaningful and factually accurate. This view contrasts with the prevailing opinion that these systems merely predict the next token in a sequence based on large volumes of training data. I conjecture instead that LLMs encapsulate a human-like world model, even if this was not intended by their training.
Most authors deny real similarities between human agents and LLMs, arguing that models rely on surface correlations rather than semantic comprehension, and that benchmark performance does not necessarily reflect real understanding of language and the physical or social contexts it encodes [1]. Cuskley et al. argue that LLMs offer a purely functional similarity to human language and they provide very limited insight into human language and cognition because their learning is fundamentally different from human language acquisition. Human understanding relies on rich multimodal, interactive, embodied experience, whereas LLMs learn from massive unimodal text corpora. The authors conclude that superficial output similarity to human language does not imply mechanistic or semantic understanding [2]. LLMs are often seen to perform at chance accuracy and make qualitatively non-human errors, suggesting LLMs lack a compositional operator necessary for genuine semantic understanding [3]. LLMs tell us very little about actual human language and cognition because similarities between model outputs and human language are purely functional and surface-level [2], acting as autocomplete engines [4] or simulating knowledge [5] rather than having commonsense human understanding. The apparent “intelligence” of LLMs is considered to be a byproduct of pattern prediction rather than deep cognitive understanding [6]. LLMs are described by Bender et al. as being “stochastic parrots,” generating plausible text without meaning or intent, which can lead to misinformation and harm [7]. LLMs lack true understanding because they do not possess grounded, causal mental models of the world and instead rely on patterns in text data. Mitchell and Krakauer emphasize that human understanding is typically based on concept-rich, causally structured models shaped by embodied experience, which current LLMs lack, despite their impressive linguistic performance. They conclude that this debate highlights the need for an expanded science of intelligence capable of distinguishing different forms of understanding, human and artificial, and developing new methods to evaluate them [1].
Authors who do not directly reject the idea that LLMs have understanding point to the need for sufficient grounding in the world, without which LLM performance remains mere syntactic manipulation or an imitation of understanding that is only partial and very shallow [8].
In this paper, I will not discuss whether LLMs have or do not have genuine understanding. I will conjecture that LLMs encapsulate a human-like world model, which grounds their output in a world model similar to the one that grounds our own linguistic output. The question of whether similar world models lead to genuine understanding will not be discussed in this paper. If I were to answer this question, I would say that the answer is probably no. However, the focus of this paper is to evaluate whether LLMs can acquire parts of our world model, even though this was not intended during their training. The structure of this paper is as follows. First, I draw on the free energy principle (FEP) to introduce the notion of a “world model” as a set of structural and statistical representations of the external world. Second, I draw a parallel between the success of human linguistic communication and the sentences produced by LLMs, supporting the claim that the communicative success of LLMs stems from their indirect grounding in the same worldly facts and from their partial capture of our world models. The underlying idea is that, under the FEP, similar behaviors in similar environments imply similar world models. Additionally, I evaluate whether human and LLM neural networks are similar when their behaviors are similar. Such similarity would strengthen my hypothesis: similar neural networks deploying similar behaviors in similar environments strengthen the case for similar world models. Third, I discuss the limitations of LLMs and the differences between LLMs and human agents, emphasizing the necessity of a direct connection to the external world and pointing toward a future AI system that would operate under the FEP and thereby construct its own world models directly.
2. World Models Under the Free Energy Principle Framework
The FEP promises to be a global brain theory, explaining learning, perception and action under one unifying principle of reducing free energy [9]. Moreover, the FEP extends to any self-organizing system by stating that such a system must minimize free energy in order to persist over time, while remaining separated and at equilibrium with its environment [10]. Minimizing free energy is a means of reducing prediction error, hence reducing surprise. Free energy is defined in relation to surprise: “Free energy is an information theory measure that bounds or limits (by being greater than) the surprise on sampling some data, given a generative model” [11]. Organisms cannot directly minimize surprise, but they can minimize free energy, a quantity that is never smaller than surprise. Thus, by minimizing free energy, self-organizing systems indirectly reduce their surprise and prediction error, hence improving their predictions and reducing uncertainty in relation to the external environment [12,13,14].
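The bound quoted above can be written out in the standard variational form. The following is a sketch under assumed notation that does not appear in the original text: o stands for sensory observations, s for hidden states of the environment, p(o, s) for the agent's generative model, and q(s) for the agent's approximate belief about the hidden states.

```latex
\begin{aligned}
F(q, o) &= \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] \\
        &= D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s \mid o)\,\big] - \ln p(o) \\
        &\geq -\ln p(o)
\end{aligned}
```

Here the term -ln p(o) is the surprise. Because the Kullback-Leibler divergence is never negative, free energy is an upper bound on surprise, and the bound becomes tight when q(s) matches the posterior p(s | o). This is why lowering free energy is a tractable proxy for lowering surprise.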
Human agents and other self-organizing systems update their models through perception and confirm them through action. Active inference is understood as perception, through which models are updated, and “action of the agent on the environment to alter sensory input to better fit sensory predictions” [15]. If you believe that your dog is at the door, you will open the door so she can enter. You will not open the closet, as this is not in accordance with your model. Actions are a way to confirm the model and are a direct consequence of it. Human agents do not have direct access to the external environment, but rather treat it as unknown and try to infer the hidden causes of the sensory input they receive. In this view, the brain is separated from the external environment by a Markov blanket [16,17,18]. Under the FEP, the external environment is hidden, and human agents can only access the sensory input caused by it and try to infer the hidden causal relationships of the external environment. They can do this by inferring probabilities from what they know, in this case by calculating free energy, which is always greater than surprise [15]. By reducing free energy, the human agent reduces uncertainty in relation to its environment.
LLMs are not acting in our environment in the same way human agents do, hence their active inference is incomplete. Pezzulo et al. argue that while LLMs have demonstrated remarkable performance, they lack the embodied, purposive interaction with the world that characterizes biological cognition. LLMs exhibit passive, statistical learning, whereas biological agents exhibit active, sensorimotor-based model learning, which enables living organisms to generate meaning grounded in real-world experience through perception and active inference. The authors argue that genuine understanding in humans emerges from the predictive modeling of self-initiated actions and their consequences, whereas generative AI lacks such grounding and may therefore merely mimic understanding. The authors further suggest that future AI systems might achieve more authentic intelligence by incorporating embodied interaction, thereby enabling them to form causal models and purposive behavior rather than merely passively reflecting patterns in training data [19]. This point is straightforward, and I agree with it. An artificial general intelligence would need to be connected to the external environment in order to develop its own world model, one that updates in real time through interaction with the world. At the same time, LLMs may still acquire a human-like world model because their training data concern the world, and correctly answering human queries may require a world model similar to that possessed by human agents.
Yıldırım and Paul employ the concept of a world model to assess whether LLMs possess genuine knowledge of the external environment. World models are defined as “structure-preserving, behaviorally efficacious representations of the entities and processes in the real world” [20]. They support the idea that LLMs demonstrate “instrumental knowledge,” which is knowledge acquired through the successful use of instruments, in this case language. They argue that LLMs use language in a way similar to how a child might use a remote control: she knows how to use the remote as an instrument, without understanding what is inside it or how it functions. I accept this definition of world models, but I explore a different possibility: that LLMs preserve an important part of our own world model, and that their answers are grounded in it. This may be a mediated, and therefore imperfect, world model, but it is nonetheless more similar to human world models than many others would accept.
A world model is an internal model encapsulated within our neural networks. It is captured by the structure of our brain and matches the external environment, as it is shaped through interaction with it. We can envisage the possibility of an agent having a world model without necessarily acquiring it independently; there is nothing that seems to prevent this. Once a world model is constructed, I conjecture that there is nothing preventing it from being encapsulated within another system. This functionalist view acknowledges that a generative model requires an embodied agent interacting with its environment in order to be created, but it also supports the idea that, once constructed, its structure can operate within different systems. Indirect evidence for this is that human agents can transmit parts of their own world models through cultural exchange and learning. Learning from others is an effective way of updating our own world model, even when the knowledge acquired is not constructed through our own active inference process. The same neural network structure can, in principle, be implemented across different systems, as suggested by the fact that different humans possess similar world models.
Behavior reflects the model: if behavior is similar across different neural networks, I argue that a similar world model is encapsulated within them. Humans exhibit similar behaviors because their world models are similar, despite differences in their specific implementations. World models are constructed through active inference, but the reverse also holds: consistently similar behavior points to similar world models. In this interpretation of the FEP, behavior itself describes active inference. If one knew everything about an agent’s behavior and the niche it inhabits, one could, in principle, infer the structure of its world model. Once a human agent has built a sufficiently rich world model, it can be placed in a room and engage in conversations without needing to interact meaningfully with the external environment. Having a world model is sufficient for such interaction; full active inference is not required at that point. I conjecture that something similar occurs with LLMs, which encapsulate parts of our own world model. While it is clear that, at this stage, given the way LLMs are built, they cannot construct a world model on their own, I conjecture that they may still acquire parts of our world model.
3. LLMs Are Encapsulating a Human-Like World Model
A system that is built and trained to predict the next token in a sentence may be able to acquire a world model because its training material, which consists of human-produced sentences, is grounded in the external world; hence, LLMs’ output is indirectly grounded in the external world. The intuition is that something is captured when training is performed on sentences produced by human agents, given that most of these sentences are intended to express true facts about the world; therefore, the training material is not random. I consider four directions supporting the claim that LLMs may encapsulate a human-like world model.
First, I rely on behavior and the similarity between LLM responses and those of human agents, under the hypothesis that similar behaviors point to similar models. This includes observations that some LLMs perform at a level comparable to, or even exceeding, human experts in highly specialized domains. I acknowledge that this could be described as instrumental knowledge, but it nevertheless supports the intuition that LLMs capture something correct about the world.
Second, similarities between the brain’s and LLMs’ neural networks add to the behavioral similarities. Mechanistic interpretability (MI) can be seen as an emerging “neuroscience for language models” framework that seeks to reverse-engineer transformer-based LLMs into understandable computational mechanisms by studying three core objects: features (human-interpretable concepts encoded in activations), circuits (computational pathways connecting components and behaviors), and universality (whether these mechanisms generalize across models and tasks) [21].
Third, acquiring internal models may also be understood by analogy with certain forms of human learning. Humans can learn about the world through language alone, without always verifying it through direct interaction with the environment. Of course, this process ultimately relies on a generative model that is itself grounded in the external world, but learning does not necessarily require continuous grounding.
Fourth, I explore the notion of grounding as developed by Kit Fine [22], attempting to explain how such grounding might work. This attempt is far from complete, but it adds to the previous arguments supporting the idea that LLMs encapsulate a human-like world model. LLMs provide a form of non-factive, weak, and partial grounding, which is, in some respects, similar to how human agents ground certain aspects of their own linguistic communication. Human agents also ground their language directly in the external world, a form of grounding that LLMs lack. However, this weaker form of grounding helps explain how LLM-generated communication can still be indirectly grounded in the external world, namely, by being grounded in human language, which is itself grounded in the world.
The first approach is to provide evidence that the behavior of LLMs is similar to the behavior of human agents. Several studies, which I will discuss below, indirectly support the claim that LLMs may encapsulate a human-like world model by showing that their behaviors are similar. The intuition underlying this claim is that human language is about the world and, more importantly, that our sentences aim to provide a true description of it. Of course, many sentences are false or only partially true; however, in general, human communication aims to relate to the world—that is, to reflect what is true about it. Our language is grounded in our world model. Training LLMs on vast datasets of sentences that are true about the world may lead them to encapsulate, within their artificial neural networks, a model of the world—similar to the way human agents encapsulate, within their biological neural networks, a generative model of the external world.
Assuming that LLMs have a world model helps in assessing whether they can provide good predictions about the future. Luo et al. present a study showing that LLMs, when fine-tuned on the neuroscience literature, can outperform human experts in predicting the outcomes of neuroscience studies. The core concept revolves around treating LLMs as world models, that is, systems that can generalize from massive data to make forward-looking predictions in scientific domains [23]. Such benchmarks typically use a question-and-answer format, in which models must retrieve relevant information based on the context of the question and answer correctly, thus demonstrating extensive world knowledge. “LLMs outperformed human experts on BrainBench with LLMs averaging 81.4% accuracy and human experts averaging 63.4%” [23]. This is a potential indication that the world model possessed by LLMs acts in the same way as the one possessed by humans and that it is not a mere imitation; otherwise, one would expect the LLMs to be less accurate than the human experts.
Another interesting study concludes that LLMs form structured mental representations of objects that align closely with human cognitive processes. By analyzing 66 core dimensions derived from behavioral data and comparing these to neural patterns in regions such as the extrastriate body area (EBA), parahippocampal place area (PPA), retrosplenial cortex (RSC), and fusiform face area (FFA), the authors demonstrate that while the representations are not identical, they reflect similar conceptual frameworks. Multimodal LLMs (MLLMs) showed greater alignment with human judgments and brain activity due to their visual capabilities [24]. These findings suggest that multimodal AI systems can learn abstract conceptual knowledge without explicit task training, and that their internal representations are interpretable and stable.
Not all researchers believe that grounding in the external environment is necessary or that LLMs are linked to human agents’ world models. Xu et al. consider that rich, structured conceptual knowledge can arise solely from [24] statistical learning of language, challenging the idea that real-world interaction is necessary for concept acquisition. LLMs can develop structured, human-like conceptual representations from training on text alone. These representations are stable across contexts, predict the model’s task ability, and reflect patterns seen in human behavior and brain activity, suggesting a meaningful parallel between artificial and human conceptual organization, but the authors do not imply that this is linked to the acquisition of human-like world models [25]. These examples strengthen the claim that LLMs and human agents are similar when it comes to their linguistic behavior. Under the additional hypothesis that systems with similar behaviors may possess similar internal models, we may build a case that LLMs may acquire a human-like world model, even if this is not intended through their training.
Mahowald et al. emphasize the critical distinction between formal linguistic competence—the ability to generate grammatically correct and structured language—and functional linguistic competence—the use of language to achieve goals in real-world contexts, which requires broader cognitive functions. While LLMs demonstrate near-human performance in formal tasks (like syntax and morphology), they struggle with functional tasks that depend on reasoning, world knowledge, situation modeling, and social cognition. They suggest that this gap is due to the current design of LLMs, which are optimized for next-word prediction, a task well-suited for mastering formal competence but insufficient for building robust world models. World models, essential for functional competence, involve integrating factual knowledge, tracking situations over time, and understanding agents’ intentions. The paper argues that human cognition supports language through modular systems—distinct networks for language and thought—suggesting future LLMs may need architectural or emergent modularity to emulate this integration effectively [
26]. My view is different, as I support the idea that LLMs are capturing, even if in an imperfect way, human-like world models. This is not due to the intention of LLMs’ creators, but is instead a consequence of weak grounding in humans’ world models through training on large sets of language data.
Centaur, an LLM built on Llama 3.1 70B and fine-tuned on a dataset (Psych-101) with 160 experiments, 60,000+ participants, and 10M+ decisions, is intended to be a model that generalizes across domains, scenarios, and task structures, showing characteristics of a true world model of human behavior. It is positioned as a general-purpose computational model capable of simulating, predicting, and explaining cognition across multiple domains. Centaur outperforms both its base LLM and traditional domain-specific cognitive models in predicting human behavior across hundreds of tasks. It generalizes to participants who were not part of the training data, to modified cover stories, to altered problem structures, and even to entirely new domains, a rare and powerful feature of world models. It simulates open-loop behavior closely matching human strategies and variability, including mixtures of model-free and model-based learning. Even more interestingly, Centaur’s internal representations seem to become more aligned with human neural activity, despite the model never having been explicitly trained on neural data and instead being trained only on behavior. Researchers used fMRI data from 94 participants performing a decision-making task and compared brain activity with the model’s internal states before choices and after feedback. Centaur consistently predicted human neural activity better than the base Meta Llama model across brain regions and layers. The tasks used were not part of Centaur’s training data, demonstrating strong generalization.
These findings suggest that fine-tuning on large-scale behavioral data can make AI representations more similar to human neural processes [27]. This leads to a potential avenue for testing the hypothesis that LLMs encapsulate human-like world models. We can explore this by examining whether there are similarities between the human brain’s neural networks and the artificial neural networks of LLMs. If the two show strong similarities, this may provide additional evidence that they encode similar models of the world, even if this was not the explicit intention of their training. Centaur can be seen as a suggestion of how future developments of LLMs will lead to systems that can better capture the sensorimotor aspects of our world models. Even without direct access to the external environment, future LLMs may be able to encapsulate sensorimotor information, thus becoming proficient in action within our world.
This brings us to the second approach for gathering evidence that LLMs may encapsulate a human-like world model: analyzing the similarities between the human brain’s neural networks and LLMs’ artificial neural networks. Kumar et al. investigate whether transformer-based language models and the human brain share similar forms of functional specialization during language comprehension. Using fMRI recordings from participants listening to naturalistic spoken stories, the authors analyze BERT’s internal “transformations,” namely the contextual computations performed by individual attention heads. They show that these transformations predict activity across cortical language regions as well as, or better than, traditional linguistic features and static word embeddings, suggesting that contextual computations in transformers resemble neural processes used by the brain to construct meaning from language. The study further demonstrates that individual attention heads exhibit specialized functions that map onto distinct brain regions, with gradients related to layer depth and context length (“look-back” distance) emerging across cortical space. Early and intermediate transformer layers align more closely with posterior temporal regions involved in local syntactic and semantic integration, while heads attending over longer narrative contexts align with anterior temporal and prefrontal areas associated with higher-level comprehension. Importantly, the correspondence disappears when the model is untrained or when the headwise structure is randomized, indicating that the observed brain–model alignment depends on learned linguistic structure rather than architecture alone. Overall, the paper argues that transformer attention mechanisms capture meaningful computational principles shared with human language processing and offers a framework for linking MI in AI with neuroscience [
28]. The MI approach suggests that LLMs develop internal computational structures that partially resemble how the human brain processes language communication. Studies using fMRI, ECoG, sparse autoencoders (SAEs), representational similarity analysis, and causal circuit tracing show that LLM hidden representations align with activity in the human language cortex, particularly during predictive next-word processing [29]. Shue et al. argue that large language models and human neural systems may share important organizational similarities, particularly in how they represent meaning through distributed, sparse, and compositional patterns rather than single “grandmother neurons.” Seemingly opaque LLM activations can be decomposed into interpretable semantic units, supporting the idea that neural representations in artificial systems may converge toward brain-like sparse coding principles for efficient information processing. At the same time, the authors acknowledge that the theoretical foundations remain incomplete and that current interpretations are still limited and approximate [29].
If we can develop MI into a method that allows us to analyze LLMs’ neural networks in a manner similar to how fMRI is used to analyze the human brain, we may be able to demonstrate similarities between the neural networks of LLMs and those of human agents. The existence of similar neural structures alongside similar behavioral patterns would provide an additional anchor for supporting the hypothesis that LLMs may encapsulate models similar to those of human agents, even if this was not the original intention behind their training.
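To make the idea of comparing two neural systems more concrete, representational similarity analysis (one of the methods mentioned above) can be sketched in a few lines. This is a minimal illustration and not the procedure of any cited study. The arrays `model_acts`, `brain_resp`, and `unrelated` are synthetic, hypothetical stand-ins for LLM activations and brain recordings, and Pearson correlation is assumed for both steps, whereas published analyses often use rank correlation.

```python
import numpy as np

def rdm(features):
    """Representational dissimilarity matrix over items (rows):
    1 minus the Pearson correlation between each pair of item vectors."""
    return 1.0 - np.corrcoef(features)

def rsa_score(features_a, features_b):
    """Pearson correlation between the upper triangles of two RDMs.
    Both inputs must describe the same items in the same row order."""
    rdm_a, rdm_b = rdm(features_a), rdm(features_b)
    upper = np.triu_indices_from(rdm_a, k=1)  # skip diagonal and duplicates
    return float(np.corrcoef(rdm_a[upper], rdm_b[upper])[0, 1])

# Synthetic data (assumption): 20 items whose structure is driven by a
# shared 5-dimensional latent space, observed through two unrelated
# random projections of different sizes plus a little noise.
rng = np.random.default_rng(0)
n_items = 20
latent = rng.normal(size=(n_items, 5))
model_acts = latent @ rng.normal(size=(5, 200)) + 0.1 * rng.normal(size=(n_items, 200))
brain_resp = latent @ rng.normal(size=(5, 100)) + 0.1 * rng.normal(size=(n_items, 100))
unrelated = rng.normal(size=(n_items, 100))  # no shared structure

shared = rsa_score(model_acts, brain_resp)
baseline = rsa_score(model_acts, unrelated)
print(f"shared structure: {shared:.2f}, unrelated baseline: {baseline:.2f}")
```

The design point is that the two systems are never compared unit by unit. Only the pattern of similarities among items is compared, which is why systems with different sizes and different substrates can still be assessed for similar internal organization.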
Exploring the notion of grounding as developed by Kit Fine [22], we can attempt to explain how the grounding of LLMs might work. Grounding is differentiated by Kit Fine into several types, including factive vs. non-factive grounding¹, full vs. partial grounding², and weak vs. strict grounding³. LLMs may provide a form of non-factive, weak, and partial grounding, which is, in some respects, similar to how human agents ground certain aspects of their own linguistic communication. Human agents also ground their language directly in the external world, a form of grounding that LLMs lack. However, this weaker form of grounding helps explain how LLM-generated communication can still be indirectly grounded in the external world, namely, by being grounded in human language, which is itself grounded in the world. While none of these four approaches provides a fully robust account of the idea that LLMs may encapsulate human-like world models, together they support the intuition that training LLMs on sentences intended to express truths about the world leads them to acquire such models. LLMs’ grounding is weak, which helps explain why LLMs, in their current form, cannot be fully embodied or perfectly aligned with the external environment, and why they lack certain aspects of human knowledge.
4. Limitations of LLMs and the Need for a Direct Connection to the External Environment
While LLMs often produce good answers that indicate they manage to capture a significant part of our world model, we can also observe their limitations. Many times, their answers are not comparable to what a human agent would provide. Often, LLMs fail to recognize what is implied in our questions, resulting in unsuitable answers. Even if LLMs capture part of our world model, this is done indirectly, via statistical analysis of our language. The training data consists of statements issued by human agents, which are based on our world model. LLMs’ neural networks and algorithms incorporate the regularities captured by these models. It is an imperfect translation of the world model into the neural weights of the LLMs.
Through the lens of active inference, LLMs are typically seen as passive predictors lacking real-world engagement. Kulveit et al. argue that LLMs share many features with active inference agents—most notably the use of generative models and error minimization. The main difference lies in LLMs’ limited feedback loops: they do not yet perceive the consequences of their outputs. The authors suggest this gap is closing as newer models incorporate user interactions and real-time learning, which could drive LLMs toward increased autonomy, adaptability, and self-awareness, with profound societal implications [
30]. I agree with Kulveit et al. that LLMs do not share a full active inference model, but I believe that this is not necessary, as the work of building a world model is already done by human agents. Hence, LLMs do not need to be connected to the outside world to build it. Incorporating live feedback loops may help, but it is not a necessary step for LLMs to encapsulate a human-like world model.
Hu et al. investigate whether LLMs possess theory of mind (ToM), the ability to attribute mental states to others. While previous research yields conflicting claims, the authors argue this confusion stems largely from two issues: (1) unclear definitions of what it means for an LLM to “have” ToM, and (2) problematic evaluation practices that may not truly test ToM. The authors advocate a more precise and computation-centric approach to understanding ToM in AI and support the idea that the confusion around whether LLMs possess ToM arises from ambiguous definitions and flawed evaluations. The authors argue that matching human behavior on ToM tasks (behavioral matching) is not enough to confirm true ToM capabilities. Instead, we should focus on whether LLMs use similar computations to humans when inferring mental states [
31]. I disagree with the claim that, for an LLM to possess ToM or a human-like world model, it must implement the same computations as humans. The same models can be encapsulated in different ways, as demonstrated by the success of LLMs. If, by “world model,” we mean the set of policies and behaviors that lead to successful interaction with the external world, then the type of computations employed to achieve this success does not matter. By “encapsulating a world model,” I mean the preservation of a neural structure (artificial in the case of LLMs) that can give rise to the same behavior (linguistic in the case of LLMs). It is clear that LLMs do not construct a full generative model by themselves, due to their incomplete active inference and their lack of connection to the external world. However, they do encapsulate parts of our own world model in their neural structures. The underlying claim is that similar behavior is a strong indicator of similar models. This can be illustrated by a more extreme example: an android that operates exactly like a human, to the point that it would be indistinguishable from one. I conjecture that such an android would possess a world model similar to our own. This is a separate discussion, and it is not the purpose of this paper to provide a detailed account of it; rather, the accepted hypothesis here is that similar behavior is best explained by similar world models.
An interesting study by Xu et al. investigates whether LLMs can replicate complex conceptual representations. They compared human and LLM responses for ~4442 lexical concepts across non-sensorimotor, sensory, and motor domains, and found that LLMs closely matched human responses in non-sensorimotor dimensions (such as valence and imageability), but that their alignment declined significantly in sensory domains and was poorest in motor domains [
32]. LLMs with visual training (e.g., GPT-4, Gemini) performed better than text-only ones (e.g., GPT-3.5, PaLM) in visually related and imageable dimensions, suggesting that visual grounding improves alignment with a human-like world model. However, LLMs still failed to capture a full human-like world model due to their lack of direct grounding, which leads to a lack of access to action-based experiences that are core to sensorimotor understanding.
Even when LLMs score highly, for example, on medical knowledge tests, they do not demonstrate metacognitive abilities like confidence calibration or error awareness crucial for genuine clinical reasoning. The results of the study conducted by Griot et al. suggest that the apparent competence of LLMs is limited to pattern recognition rather than true, introspective understanding of tasks—indicating a gap between task performance and deeper comprehension [
33].
Improvements could arise from establishing a direct connection to the external environment, although it is not clear how this would be implemented. In principle, there is no obvious argument against an artificial system being directly connected to the external world and acquiring its own world model in accordance with the FEP. However, it does not seem that such a system would operate in a manner similar to current LLMs. The training data would differ, and statistical modeling would be applied to interpret different types of sensory input. In the case of LLMs, the sensory input consists of text, and active inference consists of generating text output and interacting with users, as well as with human and artificial trainers. This situation would be entirely different if the sensory input were derived directly from the external environment. Rather than modeling text that reflects a world model, the system would construct a world model directly. The resulting conversational capabilities might be similar, but the method of deriving the world model would be fundamentally different.
5. Discussion
In this paper, I rely on the concept of a world model under the FEP, clarify that LLMs do not operate under the FEP, and discuss the hypothesis that LLMs may encapsulate a human-like world model despite the fact that human agents and LLMs differ so fundamentally in their design. Human agents appear to encapsulate world models understood as generative models under the FEP, that is, models of how the causal structure of the external world generates the sensory input that we receive. According to the FEP, the human brain implicitly encodes such a model, embedded in the dynamics of its neural networks. LLMs may capture part of this model within their own artificial neural networks, stemming from their training on large corpora of human-produced sentences.
LLMs are grounded in worldly facts. By grounding their statements in those issued by human agents, LLMs capture part of the world models of human agents. If we understand world models as “structure-preserving, behaviorally efficacious representations of the entities and processes in the real world” [
20], then LLMs are preserving our world models, as the answers they provide reflect the efficacy of human agents’ world models. At first glance, the way LLMs operate seems to imply that they cannot have any strong relation to human agents’ world models because, ultimately, all they do is predict the next best token to construct a statement. There is no thinking involved. It does not appear to bear any clear similarity to the human mind. LLMs do not seem to have any real understanding of the statements they produce. However, due to the grounding of their statements in those issued by human agents—statements themselves grounded in worldly facts and thus related to a world model—LLMs manage to capture part of this world model. What they capture is enough to produce statements that are recognized as meaningful by human agents. Of course, LLMs are not capturing the full extent of the world model of human agents. To do so, they would need to function in accordance with the FEP and have a direct connection to the external environment. Nevertheless, they are capturing enough of our world models to produce meaningful answers to some complex questions that require a solid understanding of those models.
LLMs encapsulate a human-like world model because they are trained primarily on human-generated language. The intuition is that, since language is about the world, what LLMs acquire—even if not intentionally—is a world model. It is a human-like world model because our utterances are generally intended to be true about the world. We aim to ground our language in the world in a correct and meaningful way. Therefore, LLMs do not need to acquire a world model independently; they acquire a copy of ours. This copy is not perfect, but it is good enough to support the claim that LLMs possess true understanding, as their answers are grounded in the very same world model that grounds our own. This points to the idea that, if the training of LLMs is continuous, they will not need to obtain a direct connection to the environment, but will continuously obtain a copy of our own updated world model. To better understand how this might work, consider a human agent who is asleep and then, after waking up, begins a conversation with another person. Even while sleeping, we can reasonably assume that a world model remains encoded in her brain’s neural network. This suggests that an agent does not need to be actively interacting with the world at every moment in order to possess or maintain a world model. Naturally, this world model was originally acquired through interaction with the world. However, once acquired, it does not require continuous interaction to persist. When she engages in conversation with another human agent, her linguistic output can be understood as being grounded in this world model. My hypothesis is that something similar may be occurring in LLMs, with the important difference that their world model is acquired indirectly through training on human-generated text, even though the training process itself was not explicitly designed to produce such a model. 
LLMs lack the grounding of human agents: they do not possess full generative models, nor do they exhibit agency. However, they may still manage to encapsulate a human-like world model that allows them to engage in conversations in a way similar to how a human counterpart would.
Several objections can be raised against my position. First, it is not at all clear whether LLMs operate under the FEP. There is currently no strong evidence that they do. While we can identify certain similarities (such as the presence of sensory and active states, the existence of internal models, and other structural parallels), there are also many notable dissimilarities. One key difference is that LLMs predict the output (i.e., the next token or sentence to be generated), whereas systems governed by the FEP typically predict the input (i.e., the next sensory stimulus expected to be received). In other words, LLMs do not predict what the user is going to input; rather, they predict the most appropriate response to the given input. Of course, under the FEP, everything is predictive, including active inference, so one might argue that predicting the output still fits within the broader predictive framework. However, LLMs seem to lack a critical focus on sensory input prediction. If we are to evaluate LLMs through the lens of the FEP, then they should be dynamically updating their internal models during interactions with humans. At present, this does not appear to be the case on a large scale. Instead, what we observe is a system that is structurally similar to an FEP-based agent, but not fully aligned with its principles. In this sense, LLMs may possess human-like world models, but these models are not acquired by the LLMs themselves under the FEP. This distinction does not necessarily undermine my position; it may, in fact, reflect the partial rather than the complete success of LLMs: they are only partially successful because they do not fully operate under the FEP.
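The contrast between predicting output and predicting input can be stated compactly. The following is only a sketch in standard textbook notation; the symbols ($\theta$ for model parameters, $x_t$ for tokens, $o$ for observations, $s$ for hidden states, $q$ for the approximate posterior) are assumptions introduced here and do not come from the works cited in this paper. An autoregressive LLM is typically trained to minimize a next-token negative log-likelihood over a fixed corpus, whereas an agent described by the FEP minimizes variational free energy with respect to its own incoming sensory observations.

```latex
% LLM training objective: next-token negative log-likelihood over a text x_1, ..., x_T
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \ln p_\theta(x_t \mid x_{<t})

% FEP: variational free energy for observations o and hidden states s
F[q, o] = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
        = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o)
```

Because the KL term is non-negative, $F$ is an upper bound on the surprisal $-\ln p(o)$ of the sensory input. The first objective is evaluated offline on text that already exists, while the second is evaluated on observations that the agent's own actions help to bring about, which is one way of reading the difference discussed above.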
An LLM that predicts the next input a user will provide—a personalized LLM—could potentially perform better than a general-purpose LLM. Such a system would predict the next sensory input, which in this case would consist of the user’s query. Of course, the LLM would still predict the next token in its response, but the primary focus would shift toward modeling sensory input rather than generating output. I accept this objection as valid and conclude that LLMs are not operating under the FEP due to their lack of direct connection with the external environment and the incompleteness of their active inference. However, LLMs do not need to operate under the FEP, as they do not need to construct their own generative models. Instead, these models are transferred through training from those built by human agents. The FEP is nevertheless very useful in clarifying how a generative model can be constructed—namely, through active inference under conditions of direct interaction with the external environment—and in highlighting why LLMs lack this capability. The FEP is agnostic to the way the external environment is actually structured. The brain builds a model and acts in the environment to test this model, continuously comparing its sensory input with expectations generated by its own predictions. In a sense, for the brain, the external environment does not matter directly; what matters is the reduction in free energy. Of course, the external environment is real—I may hold a model that says “I am immortal” when crossing the street, but I may still be hit by a bus and die. However, when operating in the world, an agent does not need anything beyond its Markov blanket and its internal models (which are themselves indirectly shaped by the external environment). 
We believe that human agents possess similar world models based on several key considerations: they appear to share similar environments, possess similar neural architectures, exhibit similar behaviors, and rely on similar sensory systems. Under the FEP, one can argue that their Markov blankets are similar, that their internal models are therefore similar, and that the hidden causes shaping their statistical modeling are also comparable. I am attempting to build a similar case for LLMs and human agents, based on similarities in behavior and neural architecture. At the same time, it is clear that LLMs are fundamentally different from human agents: they do not possess agency, nor do they construct generative models in the same sense as humans. Even if they could be said to possess such models, the hidden causes they would model would likely differ significantly from those relevant to human cognition. Nevertheless, the similarity of their behavior—exhibited even in complex situations—together with certain similarities in neural architecture may support the claim that LLMs encapsulate world models that are, to some extent, similar to those of humans. There are multiple ways to challenge this position, but I do not think that FEP is obviously incompatible with behavioral functionalism.
A second objection that can be raised against my position is the following: if one accepts that LLMs possess a world model similar to that of human agents, it would seem to follow that placing an LLM within an android body should result in a human-like system. However, this conclusion does not appear plausible, as the LLM would need a proper connection to the external world and the ability to process its newly acquired sensory inputs. The result would be a fundamentally different system. I therefore agree that the claim that such a system would operate like a human agent cannot be accepted. This objection is significant because it exposes a core issue in my position: if a human-like world model is not sufficient to build an android that behaves in a human-like manner, then something essential must be missing from the world models acquired by LLMs. My response is that LLMs lack many components of our world models, for example, those related to movement, self-evidencing in interaction with the external environment, and numerous other functions tied to human sensory input and emotionally regulated policies. But this does not mean that many other parts of a human-like world model are not acquired. The situation is similar to that of human agents when we consider learning. Humans can update their internal models through language alone, but not in all cases. For example, you can decide to take an umbrella because you asked someone whether it is raining outside and they confirmed that it is, but you cannot learn how to ride a bicycle simply by listening to instructions on how to do it. I suggest that something similar is happening with LLMs: they can acquire those parts of our world model that are learnable through language, but they cannot acquire those that require the full active inference cycle, including direct interaction with the external environment.
An additional argument supporting this perspective is provided by Xu et al., who demonstrated that LLMs trained on visual data show greater similarity to human representations in vision-related dimensions. In text-only LLMs, the similarity between model and human representations decreased progressively from non-sensorimotor to sensory domains and was weakest in motor domains. By contrast, models incorporating visual inputs exhibited enhanced alignment with human representations, particularly in vision-related dimensions. These findings suggest that learning exclusively from language is sufficient to recover many non-sensorimotor aspects of conceptual representation, but remains limited in capturing sensorimotor information, especially motor-related features. Moreover, extending training to the visual domain improves the alignment between LLM representations and human representations not only for visual properties, but also for related dimensions such as imageability and haptic features, indicating possible knowledge transfer through multimodal integration [
32]. This points to both the current limitations of LLMs, stemming from their lack of direct interaction with the external environment, and to potential future developments in which LLMs become more closely connected to their environment, thereby improving their alignment with human agents.
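To make the notion of "alignment between LLM and human representations" more concrete, the sketch below computes a rank correlation between human and model ratings of the same words on a single dimension. This is an illustration under stated assumptions, not the procedure used by Xu et al.: the choice of Spearman correlation, the function names, and all ratings are hypothetical and invented for this example.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical ratings of five words on one dimension (made-up numbers,
# NOT data from Xu et al. or any other study).
human_ratings = [6.1, 2.3, 4.8, 1.9, 5.5]
model_ratings = [5.8, 2.9, 4.1, 2.0, 6.0]

alignment = spearman(human_ratings, model_ratings)
print(f"rank alignment on this dimension: {alignment:.2f}")
```

Repeating such a computation per dimension (non-sensorimotor, sensory, motor) would yield one alignment score per domain, which is the kind of comparison the findings above describe.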
In their current form, LLMs cannot build their own human-like world models. It is clear that they do not inhabit our ecological niche, nor are they able to deploy a human-like active inference process. If evaluated under the FEP, their sensory input would consist of text, and their actions upon the world would also be mediated through text (or, more precisely, numerical representations, though we can treat these as text for simplicity). However, even within this distinct ecological niche, they do not implement a full active inference cycle. While it is not impossible to design systems that do, their Markov blankets would differ significantly from ours, and their active inference processes would likely produce generative models that represent different hidden causes and causal structures than those found in human cognition.
The third objection stems from the second: it is not clear how much of our own world model is captured by LLMs. Every human agent has their own world model, and there are certainly similarities among the world models that different human agents possess. But how much of this does an LLM capture, and how do we measure it? Even if someone accepts that, conceptually, it is possible for an LLM to acquire a human-like world model, it is not clear how much of it is captured or how different it is from the ones possessed by human agents. We have seen in this paper some attempts to measure exactly what is captured by LLMs from our world model; most notable is that of Xu et al. [
32], but the work is far from complete. Xu et al. measured the ability of LLMs to derive human-like conceptual representations from descriptions in a reverse dictionary (some examples: youthful male person ⇒ boy; a small, very thin pancake ⇒ crepe; a small bed that folds up for storage or transport ⇒ cot; dried grape ⇒ raisin; edge tool used in shaving ⇒ razor). The conclusion is that LLMs can pass the reverse dictionary test on par with humans. But the list of measures and the complexity of the measurements must be increased.
The fourth objection that can be raised against my position is that LLMs are now trained on content that represents their own output. As humans use LLMs increasingly to generate content—up to 30–40% by some accounts [
34]—it will become harder to distinguish how much of the training data stems from humans and how much is generated by LLMs. This could potentially weaken the similarity between human agents’ world models and those of LLMs, because the grounding of LLMs should be based on human-derived language that is itself grounded in the external world (assuming that human language generally consists mostly of sentences that aim to be true about the world). It might be that LLMs lose their capabilities due to being trained on data that is no longer anchored in the external environment, but rather on LLMs’ own output. Alternatively, it might be that LLMs’ and human world models will converge even further, and LLMs will become what Kulveit et al. called an “active LLM”, which is an LLM that uses the full active inference cycle. In this situation, the LLM will ingest as training data (input) parts of the content that is a reaction to its own content (output), leading to a full cycle of modeling the external world. This would represent an interesting path to test. However, at this moment, LLMs are not functioning under a full active inference model. If they were, we would expect them to actively predict user input, which is not how they are built. If they operated fully under the FEP, they would most likely encounter issues such as the “dark room problem,” where they fall into a false minimum, steering interactions toward less engagement, since the ideal scenario would be one in which no queries are entered. Having no queries is the easiest state to predict; hence, free energy would be minimized. This is clearly not the case. I therefore argue that LLMs are not operating under FEP, and that this is unlikely to be the architecture that will enable such systems in the future. The key question, however, is how—despite this—LLMs are so successful and appear to encapsulate a human-like world model. 
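The dark room argument can be illustrated with a toy calculation. The sketch below uses the empirical Shannon entropy of an input stream as a crude stand-in for long-run average surprisal, the quantity that free energy bounds from above. The function name and both input streams are hypothetical and serve only to show why a stream with no queries is the easiest one to predict.

```python
import math
from collections import Counter


def average_surprisal(observations):
    """Mean surprisal in bits of a sequence under its own empirical distribution.

    This equals the empirical Shannon entropy of the sequence. It is a
    simplification: a real FEP agent would score inputs under its generative
    model, not under the empirical frequencies used here.
    """
    counts = Counter(observations)
    total = len(observations)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical input streams (illustrative only, not data from any study).
silent_stream = ["<no query>"] * 8
varied_stream = [
    "weather?", "what is a tree", "<no query>", "translate this",
    "weather?", "write a poem", "what is a tree", "debug my code",
]

print(average_surprisal(silent_stream))  # 0.0
print(average_surprisal(varied_stream))  # 2.5
```

An agent that only sought to minimize this quantity would prefer the silent stream, which corresponds to the scenario described above in which interactions are steered toward fewer queries.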
I suggest that the answer lies in how we define world models: they are neural structures, whether artificial or biological, that match regularities present in the external world, and this matching can be evaluated through behavior. If a system’s behavior is well-aligned with its environment, we can say it has a good world model. Different organisms will have different world models because they inhabit different ecological niches and behave in ways adapted to those niches, reflecting a behavioral-functionalist perspective. We can determine whether an LLM has a human-like world model by examining its behavior; in terms of linguistic abilities, LLM behavior is very similar to ours. While we can debate how similar this behavior must be, it is clear that LLMs demonstrate a level of success unmatched by any other artificial system and comparable only to humans. However, LLMs do not possess the same world model as humans, as many other behaviors differ significantly. I conjecture that only parts of a world model can be shared, as evidenced by processes such as learning and cultural exchange. Not all components of a human world model are acquired through individual active inference; humans also learn from others. What distinguishes LLMs is that they acquire their world model entirely from external sources rather than from direct interaction with the external environment.
These objections point toward the next step in researching this topic, which is to find a way to measure what and how much of our world model is captured by LLMs. There are difficulties with this project, stemming from the fact that even the notion of a “human-like world model” is not easily measurable, but progress could be made in this direction with initiatives such as mechanistic interpretability or that of Xu et al. By identifying how we can properly define world models in a measurable way, we can also advance the task of determining how much of our own world model an LLM possesses.