1. Introduction
What does it mean to know something? And do all cognitive systems know in the same way? This paper explores these questions by examining a central debate in the philosophy of mind and cognitive science: whether content—in the sense of internal, semantic representation—is necessary for cognition. I take up this issue by engaging with radical enactivism [
1], a position that claims most cognition can be explained without content, and that intelligent behavior does not require internal representations of the world. Here, by “content” I follow standard philosophical usage: contentful states are about something, capable of being true or false, and therefore semantically evaluable. This clarification is important, as misunderstandings sometimes arise when “content” is taken merely to mean information-processing in a broad sense.
To test this claim, I compare human cognition with the behavior of Rodney Brooks’ behavior-based robots [
2], which are capable of highly sophisticated interaction with their environment, yet operate entirely without representational content. This comparison helps illuminate the strengths of radical enactivism, but also its limits. I will argue that while some beings can interact with the world without content, we human beings do not. Our cognitive processes are structured in ways that presuppose internal representations; we store input as something, retrieve it as something, and change our behavior accordingly. This argument does not deny that humans also display embodied, sensorimotor forms of engagement with the world; rather, it highlights that in addition to such embodied forms, human cognition essentially depends on representational content.
This will be shown through four empirical cases—spreading activation, object recognition, agnosia, and vision reconstruction—which together provide evidence that human cognition is fundamentally contentful. Because of this, humans face what Hutto and Myin call the hard problem of content: the problem of explaining where content comes from if it cannot be picked up from the environment. I will suggest a solution to this problem by arguing that content is generated internally through the brain’s own categorizing and processing structures. This position acknowledges the force of the radical enactivist critique but insists that in the case of humans, cognitive explanations that omit representational content are insufficient.
In the final part of this work, I will draw a broader conclusion. By contrasting human cognition with that of Brooks’ robots, I will argue that we are not faced with a binary question of which form of cognition is “correct” or “superior.” Instead, these two cases point toward a more inclusive view: there is more than one kind of knowledge. One kind is contentful, representational, and reflective. The other is contentless, embodied, and sensorimotor. Neither is reducible to the other, and neither is inherently better or worse. By framing the debate this way, I aim to clarify a common misunderstanding: defending the indispensability of content for human cognition is not equivalent to denying the explanatory value of contentless, embodied models in nonhuman or artificial cases.
In this way, the goal of this work is not only to challenge the scope of radical enactivism but also to open up a pluralistic epistemology—an account of knowledge that recognizes the diversity of cognitive systems and ways of knowing.
At the same time, it is important to acknowledge that Brooks’ robots and human cognition represent only two ends of a broader spectrum. Biosemiotic approaches suggest that there exist intermediate cases, where organisms and systems gradually expand their sphere of meaning without relying exclusively on either pure embodiment or full-blown semantic representation. Classical work by von Uexküll [
3] on the Umwelt, Anderson et al. [
4] on semiotic paradigms, and Emmeche [
5] on emergence, already indicates how living systems scaffold their own forms of significance. More recent contributions, such as Nöth [
6] on semiotic machines and Sarosiek [
7,
8] on biosemiosis, homeostasis, and adaptive artificial intelligence, develop this line of thought further. These perspectives support the idea that cognition may be graded, ranging from embodied reactivity to semiotically enriched forms of interaction, and eventually to the reflective, representational knowledge characteristic of humans.
2. The Hard Problem of Content
The hard problem of content refers to a fundamental issue in the philosophy of mind and cognitive science: explaining how cognitive systems come to possess contentful mental states—that is, states that are about something and carry truth values. For example, when one perceives an apple and judges it to be rotten, this perception is not merely a sensory input but a representation with semantic content: “The apple is rotten.” The difficulty lies in accounting for the origin of this content in a naturalistic, non-magical way [
1].
Hutto and Myin [
1] define this challenge precisely: if cognition is contentful, then one must explain where this content comes from, especially given that the external world does not provide it. They argue that explanations of content must obey the Muggle constraint, which holds that one’s explanation must involve only “entities, states and processes that are wholly nonmagical in character” [
1] (p. 66). This constraint forces theorists to ask: how do we get from a brown apple to the meaningful thought “This apple is rotten”? If semantic properties like ‘rotten’ are not part of the physical world, and if they cannot be picked up from the environment, how do they emerge in cognition?
This is the essence of the hard problem of content: if cognition is based on content, then we must account for how representational and truth-evaluable content arises in the first place. According to Hutto and Myin, this explanatory burden has never been adequately met. They emphasize that content is not “the raw material of mental consumption” and cannot be “picked up” or “acquired” from a contentless world [
1] (p. 73). This framing is crucial, since some critics misinterpret the “hard problem” as a merely empirical puzzle about perception. In fact, it is a deeper metaphysical and explanatory challenge about how semantic properties enter into a naturalistic ontology.
3. Radical Enactivism
Radical enactivism circumvents the hard problem of content altogether: it does not attempt to solve the problem, because it rejects the assumption that most cognition involves content to begin with. Hutto and Myin [
1] argue that cognitive systems, both biological and artificial, are often capable of interacting with the environment in intelligent and adaptive ways without relying on internal representations or semantic content. Instead of explaining how content arises, radical enactivists deny that it plays a significant role in basic cognition.
Their point is illustrated through biological and artificial examples. Consider female crickets, which locate male mates by responding to specific acoustic signals. These signals trigger adaptive behavior, such as orienting and movement, but without requiring internal representations of those signals. Crickets do not model their environment; they are “guided by continuous, temporally extended interactions with their environment” [
1] (p. 42). Their cognition is embodied, dynamic, and non-contentful. Here it is important to note that radical enactivism is not claiming that no cognition anywhere is representational, but rather that many forms of adaptive, intelligent behavior (especially in simpler organisms) can be adequately explained without appeal to representational content.
4. Brooks’ Behavior-Based Robots as Proof for Radical Enactivism
I claim that one of the clearest real-world supports for radical enactivism comes from Rodney Brooks’ behavior-based robots, which demonstrate how intelligent, adaptive interaction with the environment can be accomplished without internal representations or content. This architecture rests on a bottom-up design, where robots operate through layered sensorimotor behaviors, not symbolic reasoning. Brooks [
2] emphasizes that “the world is its own best model,” meaning there is no need to construct internal representations when the environment itself provides sufficient structure for action.
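The flavor of this bottom-up design can be conveyed with a minimal sketch, written in Python purely for illustration; the layer names, sensor fields, and numeric values are hypothetical and are not drawn from Brooks’ actual controllers. Each layer maps current sensor readings directly to a motor command, and higher layers simply override lower ones, so no layer builds or consults an internal model of the world.

```python
# Minimal sketch of a subsumption-style, behavior-based controller.
# Illustrative only: the behaviors, sensors, and values are hypothetical.
from dataclasses import dataclass

@dataclass
class Sensors:
    obstacle_ahead: bool    # e.g., a proximity sensor firing
    target_detected: bool   # e.g., a simple feature detector firing

def wander(s: Sensors):
    # Lowest layer: default exploratory motion, always active.
    return ("drive", 0.3)

def avoid(s: Sensors):
    # Middle layer: turn away whenever something is directly ahead.
    return ("turn", 0.8) if s.obstacle_ahead else None

def approach(s: Sensors):
    # Highest layer: move toward a detected feature in the environment.
    return ("drive", 0.6) if s.target_detected else None

# Higher layers subsume (override) lower layers whenever they produce output;
# the current sensor state does the work an internal world model would do.
LAYERS = [approach, avoid, wander]

def control_step(s: Sensors):
    for layer in LAYERS:
        command = layer(s)
        if command is not None:
            return command

print(control_step(Sensors(obstacle_ahead=True, target_detected=False)))  # ('turn', 0.8)
```

Nothing in this arbitration scheme stores the environment as anything; action selection is driven entirely by the momentary coupling between sensors and behaviors.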
The robot Herbert, for example, was designed to roam office spaces and collect empty soda cans. Its sensors identify obstacles and locate cans, triggering grasping behaviors via a simple arm mechanism. Importantly, Herbert does not represent the soda can as such; it does not recognize it as a “can” in any semantic or contentful way. Instead, it reacts to features in its environment based on built-in behavioral responses [
2]. This behavior is intelligent and goal-directed, but achieved entirely without the kind of semantic content that representational theories of mind assume. This aligns with Hutto and Myin’s [
1] assertion that “we can explain the behavior of beings without assuming that there is content.”
A more advanced example is Packbot, a robot used in military surveillance and reconnaissance. Packbot navigates complex terrains, flips itself upright when overturned, climbs stairs, and avoids hazards, all without prior knowledge of the specific environment. Again, there is no representational map of the space it navigates; instead, its behavior is driven by its continuous, embodied coupling with environmental features [
9]. Packbot’s performance supports the radical enactivist view that world-directed, action-guiding cognition does not require content [
1].
Even more striking are behavior-based robots that interact socially or emotionally. Kismet, a humanoid robot with expressive facial features, was designed to engage in human-like emotional interaction. It can mimic expressions such as happiness, surprise, fear, and interest using eye movements, eyebrow positioning, and mouth articulation [
10]. Kismet even adjusts its responses during interactions with humans, giving the impression of understanding or emotional responsiveness. However, these behaviors arise not from semantic representations of emotional states but from dynamically calibrated sensorimotor feedback loops. Kismet’s responses are grounded in the real-time modulation of sensorimotor patterns, not in internal symbolic interpretation. This directly echoes Hutto and Myin’s claim that sophisticated behavior “can make do without content” [
1].
Finally, the robot Baxter demonstrates how embodied interaction can support safe, collaborative activity with human beings. Baxter possesses two arms and an animated face, allowing it to perform industrial tasks like object manipulation and to communicate its intentions via gaze direction. It responds to proximity by stopping movement when a person enters its operational range—without understanding the concept of a human being or danger. Baxter’s behavior is structured through sensor-based safety mechanisms, not abstract representations [
11]. Users can train Baxter by physically guiding its arms, showing that its “learning” consists of the physical restructuring of behavior, not the acquisition of representational content. This behavior once again confirms the radical enactivist position that ongoing embodied interaction with the world can yield highly flexible and safe performance without invoking contentful cognition.
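What such “learning by guiding” might amount to computationally can be illustrated with a record-and-replay sketch; this is a deliberate simplification of my own, not Baxter’s actual software, and the sensor and motion functions are stand-ins. The point is that nothing semantic is acquired: the robot stores a trajectory of joint angles and reproduces it, pausing whenever a proximity check fires.

```python
# Sketch of kinesthetic teaching as record-and-replay of joint angles.
# A simplification for illustration, not Baxter's actual control software.
import random

def record_demonstration(read_joint_angles, num_samples=5):
    """Sample joint angles while a person physically guides the arm."""
    return [read_joint_angles() for _ in range(num_samples)]

def replay(trajectory, move_to, person_nearby):
    """Reproduce the guided motion, halting while someone is too close."""
    for joint_angles in trajectory:
        while person_nearby():   # sensor-based safety check, no concept of 'person'
            pass                 # wait until the workspace is clear
        move_to(joint_angles)    # command the arm to the recorded posture

# Simulated usage: random values stand in for real encoder readings.
demo = record_demonstration(lambda: [round(random.uniform(-1.0, 1.0), 2) for _ in range(7)])
replay(demo, move_to=print, person_nearby=lambda: False)
```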
All four robots—Herbert, Packbot, Kismet, and Baxter—act in ways that are functionally intelligent, adaptively responsive, and environmentally embedded, yet none of them require content-involving mental representations. They are designed around principles that echo those of biological organisms like crickets, which, according to Hutto and Myin [
1] (p. 42), engage in “a continuous interactive process of engagement with the environment” without content. The robots, like the crickets, exploit specialized body–environment dynamics rather than abstract mental representations. Their architecture and success further confirm that cognition, in its basic and embodied forms, can be explained without assuming the existence of content—a cornerstone of radical enactivism.
Therefore, Brooks’ behavior-based robots do more than perform engineering feats. They offer philosophical insight into the nature of cognition. By showing that intelligent behavior arises from non-representational mechanisms, they support the radical enactivist claim that content is not the basis of most cognitive activity. There are beings that can interact with the world in a quite sophisticated manner without content, and for them radical enactivism seems to be right. However, there are also beings that might be capable of such contentless interaction yet simply do not operate that way: we human beings. In what follows, I will argue that we human beings do need content to interact with the world.
5. Faced with the Hard Problem of Content
In what follows, I will argue that we do need content in order to interact with the world successfully and flexibly. To support this claim, I will present four cases—drawn from empirical research—that suggest our cognitive processes necessarily involve internal representations. These cases are spreading activation, object recognition, agnosia, and vision reconstruction. Each shows that our brains not only process inputs but do so by representing them as something. This implies the presence of semantic or pre-semantic content, which is exactly what radical enactivism denies.
The first case is the phenomenon of spreading activation, initially theorized by Collins and Loftus [
12] and later supported by various empirical studies. The idea is that our brains store information based on semantic relatedness—for example, the word “fox” is stored closer to “cat” than to “book” because they share the category ‘animal’. The crucial point is that in order for words or perceptual inputs to be stored based on their meaning, the brain must represent them as what they are—i.e., as a fox, cat, or book. If inputs were not internally categorized or encoded with content, no semantic proximity could be established. Even more convincingly, spreading activation has been shown to occur in non-verbal and visuospatial domains, and in infants who do not yet use language [
13,
14]. These findings show that content is not reducible to language, and that cognitive content precedes linguistic capability, thereby challenging radical enactivism’s claim that only language introduces content [
1] (p. 82).
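To make explicit why this picture presupposes that inputs are stored as something, consider a toy spreading-activation network; the nodes and link weights below are invented for illustration and follow only the general logic of the Collins and Loftus model, not their original formulation. Activation injected at one node propagates along weighted links, so “fox” primes “cat” more strongly than “book” only because the network already encodes what the stored items are related to.

```python
# Toy spreading-activation network. The nodes and link weights are invented
# for illustration; they encode semantic relatedness in the loosest sense.
SEMANTIC_LINKS = {
    "fox":    {"cat": 0.8, "animal": 0.9, "book": 0.1},
    "cat":    {"fox": 0.8, "animal": 0.9},
    "animal": {"fox": 0.9, "cat": 0.9},
    "book":   {"fox": 0.1},
}

def spread_activation(source, steps=2, decay=0.5):
    """Inject activation at `source` and let it spread along weighted links."""
    activation = {source: 1.0}
    for _ in range(steps):
        updated = dict(activation)
        for node, level in activation.items():
            for neighbour, weight in SEMANTIC_LINKS.get(node, {}).items():
                updated[neighbour] = updated.get(neighbour, 0.0) + level * weight * decay
        activation = updated
    return activation

act = spread_activation("fox")
print(act["cat"] > act["book"])   # True: "cat" is primed more strongly than "book"
```

The priming asymmetry exists only because the links encode relations over what the stored items stand for; a store of uninterpreted signals would provide no basis for it.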
The second case is object recognition. Humans can recognize objects across changes in appearance, orientation, and context [
14] (p. 539 ff.). For instance, if I see a rabbit eating dandelions on the lawn in the morning and later see the same rabbit indoors on a table standing on its hind legs, I recognize it as the same rabbit. What this example shows is that we are capable of linking two perceptual events as concerning one and the same object, even when various perceptual conditions (location, configuration, posture) change. In order to do so, we must store and retrieve the earlier input in a way that preserves the identity of the object over time. This requires that the input be represented as something—as that same rabbit—for the continuity in recognition and behavioral adaptation to be possible.
The third example is the neurological condition of agnosia, where individuals are unable to recognize certain stimuli despite intact sensory systems. For example, someone with visual agnosia may see an apple clearly but be unable to recognize it as an apple [
15] (p. 500 ff.). In such cases, visual input is not missing, but the association with meaning—i.e., the representational content—is impaired. This condition highlights that representing input as something is a normal function of healthy cognition, and its disruption shows what happens when content is lost. The brain’s default, it seems, is to interpret and categorize inputs meaningfully. That agnosia can be category-specific (e.g., affecting only fruit recognition but not animals) further suggests that the brain stores and processes information based on content-sensitive neural structures. If cognition were truly non-contentful, as radical enactivism proposes, such localized deficits would make little sense.
The fourth case is vision reconstruction through brain imaging technologies. In studies like those conducted by researchers at UC Berkeley [
16], neural activity recorded while subjects viewed videos was used to reconstruct approximations of what they had seen. Specific brain areas responded selectively to features such as edges, motion, color, and texture, showing that the brain processes and stores visual input by encoding these features in distinct, content-specific ways. The mere fact that researchers could decode what someone saw from their brain activity confirms that the brain does not merely respond to stimuli—it categorizes and stores inputs in structured, meaningful formats. Without contentful representations, this kind of reconstruction would be impossible.
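The logic of such decoding studies can be conveyed with a deliberately simplified sketch using synthetic data and ordinary linear regression; this is my own illustration, not the reconstruction pipeline used in the Berkeley study. If activity patterns carry structured, feature-specific information, a decoder fitted on some stimulus-response pairs can predict the features of unseen stimuli from activity alone, whereas unstructured activity would yield only chance-level reconstruction.

```python
# Toy decoding sketch with synthetic data (my own illustration, not the
# Berkeley reconstruction method): predict stimulus features from simulated
# "voxel" responses with ordinary linear regression.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_features, n_voxels = 200, 4, 50     # features: e.g., edge/motion/color/texture strength

features = rng.random((n_trials, n_features))            # stimulus feature values per trial
encoding = rng.normal(size=(n_features, n_voxels))       # how each feature drives each voxel
voxels = features @ encoding + 0.1 * rng.normal(size=(n_trials, n_voxels))

# Fit a linear decoder on the first 150 trials, test on the remaining 50.
W, *_ = np.linalg.lstsq(voxels[:150], features[:150], rcond=None)
predicted = voxels[150:] @ W

corr = np.corrcoef(predicted.ravel(), features[150:].ravel())[0, 1]
print(f"feature reconstruction correlation: {corr:.2f}")  # high only if activity is feature-specific
```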
Together, these four cases form a cumulative argument against radical enactivism’s claim that cognition is generally non-contentful. While simple forms of interaction may not require content—as seen in robots like Herbert or Baxter—human cognition clearly does. It is not only that we can act with content, but that our cognitive systems are structured to do so. The brain’s storage, retrieval, and categorization of inputs all point to the presence of internal representational content, much of which is non-linguistic and below the level of conscious thought.
Moreover, the four empirical cases demonstrate that human cognition is fundamentally different from the cognition exhibited by Brooks’ behavior-based robots. While these robots can engage in surprisingly sophisticated behaviors through non-representational, content-free mechanisms, we human beings do not merely react to the world. We represent it. We interpret sensory input as something, categorize experiences, and recall past events based on how they were structured and understood. In short, we think and remember with content.
Because of this, we human beings face the hard problem of content. Unlike robots such as Herbert, Packbot, Kismet, or Baxter, we do not simply respond to stimuli based on built-in routines. We continuously generate internal representations that allow us to recognize, reflect, and flexibly respond to ever-changing environments. Yet, the world does not present itself in a pre-structured, content-laden form. As Hutto and Myin [
1] (p. 73) argue, content is not “picked up” from the world or added in later. Thus, the following problem arises: if content is not out there, where does it come from?
I suggest that content comes from within us—specifically, from the processing capacities of the human brain. The brain categorizes inputs and stores them according to structure, meaning, and context. Here lies the decisive difference between us and behavior-based robots: robots can only pursue the goals they are programmed for. Herbert collects soda cans because that is what it is designed to do. Packbot performs reconnaissance because its routines are pre-specified. Even Baxter, which can learn new hand movements, cannot suddenly decide to switch goals or reinterpret tasks. In contrast, human beings adapt. If our goals shift, we reinterpret our environment, modify our behavior, and change strategies—without needing to be reprogrammed externally.
This is the key point: robots must be reprogrammed to pursue new goals, but humans reprogram themselves. I claim that this self-modification depends on content. Only by representing our environment and experiences as something can we compare them, evaluate them, and change course. This capacity for internal redirection—based on the contentful structure of our cognition—is precisely what sets us apart. It is also why we, and not robots, must address the hard problem of content.
6. Why We Know Differently
The fundamental difference between contentless robotic cognition and contentful human cognition leads to an important epistemological conclusion: there is more than one form of knowledge. While we often privilege human knowledge—reflective, semantic, abstract—as the ideal, the success of Brooks’ behavior-based robots shows that another kind of knowledge exists. This knowledge is non-representational, embodied, and action-guiding. Rather than understanding the world as something, this kind of cognition engages with the world directly through reliable sensorimotor loops.
Herbert’s ability to navigate office spaces, locate cans, and grasp them is a form of knowing-how, not based on beliefs or concepts but on structured interaction. Packbot’s capacity to operate in dangerous terrain, and Baxter’s ability to learn physical movements through direct guidance, both exemplify cognition that is procedural, adaptive, and situated—but not representational. These machines know what to do in specific contexts without needing to know what it is they are doing in a contentful sense.
Human knowledge, by contrast, is content-laden. We do not just act—we understand. We recognize, reflect, and reason. Our ability to change behavior in response to shifting goals depends on our capacity to internally represent both our environment and our goals. When our context changes, we reinterpret the meaning of previous actions, adapt our intentions, and project possible futures. This kind of flexible cognition is only possible when content is involved—when inputs are represented as meaningful. It is this content that allows us to adjust, rethink, and self-direct, without external reprogramming.
Nevertheless, I suggest that neither form of knowledge is better or worse than the other. The contentless cognition of behavior-based robots is efficient, robust, and sufficient for many tasks. It thrives in domains where goals are stable, environments are constrained, and reaction suffices. Human cognition is flexible, expansive, and semantically rich, but it is also slower, prone to error, and burdened by the complexity of interpretation. Both forms are legitimate. They reflect different strategies of navigating and making sense of the world.
At the same time, these two cases—robots on the one hand and humans on the other—should not be seen as exhausting the possibilities. Uexküll [
3]; Anderson et al. [
4]; Emmeche [
5]; Nöth [
6]; and Sarosiek [
7,
8] suggest that cognition can be understood as existing on a continuum, where organisms and systems gradually expand their semiotic capacities. In this view, there are intermediate forms of knowing which are neither purely embodied and contentless, nor fully representational, but based on semiotic scaffolding and homeostatic mechanisms that allow systems to create and extend meaning in context.
Rather than using human cognition as the yardstick for all intelligence, we should acknowledge that multiple epistemic frameworks exist. Knowledge is not a single unified capacity but a spectrum of ways of engaging with the world—some contentful, some not. What matters is not whether cognition contains content, but how a system is able to interact with its environment, solve problems, and pursue goals.
From this perspective, we are led to a pluralistic account of knowledge. Behavior-based robots and humans know differently—but both know. Their modes of knowing are shaped by their structure, capabilities, and environments. And by recognizing these differences without ranking them, we open the door to a more nuanced understanding of cognition—one that respects both the mechanistic intelligence of robots and the representational richness of human thought.