Our discussion so far exemplifies the fact that it is possible to characterize two key notions of 4-Es theories—active perception and control—using different approaches, some of which use model-based and inferential processes, and some of which dispense with them—the latter being considered more “deflationist” than traditional cognitive theory. Yet, the problem of assessing the relative merits of these and other alternative proposals remains open.
Comparing different approaches is difficult given that they are often formulated at different levels of detail, e.g., at the theoretical level or as computationally implemented models. To mitigate this problem, we focused on examples for which detailed computational models have been proposed in the literature (see the above discussion). However, the mere existence of implemented computational or formal models does not solve all of the problems. Another difficulty in comparing different approaches is the use of different terminologies or formal frameworks. Indeed, it is possible that formal solutions commonly considered to be alternatives are in fact mathematically equivalent—as in the case of the equivalence between control and inference problems [61
]. A similar problem seems to exist when comparing computational and dynamical systems perspectives on cognitive phenomena—two approaches that are often considered to be mutually exclusive, especially by proponents of dynamical systems perspectives who support anti-representationalism [55
]. As noted by Botvinick ([105
], p. 81): “The message is that one must choose: One may either use differential equations to explain phenomena, or one may appeal to representation.” However, this problem might be more apparent than real, at least in some cases. Botvinick [105
] continues as follows:
“This strikes me as a false dilemma. As an illustration of how representation and dynamics can peacefully coexist, one may consider recent computational accounts of perceptual decision-making. Here, we find models that can be understood as implementing statistical procedures, computing the likelihood ratio of opposing hypotheses (read: representations), or with equal immediacy as systems of differential equations”.
and refers to two specific examples of models that have these characteristics [106
]. Ahissar and Kleinfeld [34
] (p. 53) provide another interesting illustration of the duality between homeostatic (or control-theoretic) and computational perspectives:
“The operation of neuronal closed loops at various levels can be considered from either homeostatic or computational points of view. All closed loops have set-points at which the values of their state variables are stable. Thus, feedback loops provide a mechanism for maintaining neuronal variables within a particular range of values. This can be termed a homeostatic function. On the other hand, since the feedback loops compute changes in the state variables to counteract changes in the external world, the change in state variables constitutes a representation of change in the outside world. As an example, we consider Wiener’s description of the sensorimotor control of a stick with one finger. The state variables are the angle of the stick and the position (angle and pivot location) of the finger. When the stick leaves a set-point as a result of a change in local air pressure, the sensorimotor system will converge to a new set-point in which the position of the finger is different. The end result, from the homeostatic point of view, is that equilibrium is re-established. From the computational point of view, the new set-point is an internal representation of the new conditions, e.g., the new local air pressure, in the external world. (We note that the representation of perturbation by state variables may be dimensionally under- or over-determined and possibly not unique.) This internal representation is ‘computed’ by the closed-loop mechanism”.
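The duality described in this quote can be made concrete with a minimal sketch. The following Python snippet is our own illustration (not a model taken from [34] or from Wiener): an integral-like feedback loop that, read homeostatically, re-establishes equilibrium after a disturbance and, read computationally, converges to a state variable that encodes the disturbance. All names and parameter values are illustrative assumptions.

```python
# Minimal sketch (our illustration) of the dual reading of a feedback loop:
# homeostatic (equilibrium is re-established) and computational (the
# converged state variable encodes the external disturbance).

def simulate(disturbance: float, gain: float = 0.2, steps: int = 200) -> float:
    """Integral-like feedback: the state variable moves to cancel the error
    between the sensed disturbance and the current set-point."""
    state = 0.0  # e.g., finger position counteracting a change in air pressure
    for _ in range(steps):
        error = disturbance - state  # sensed deviation from equilibrium
        state += gain * error        # feedback update; drives the error to zero
    return state

# Homeostatic reading: the error is driven to ~0 for any disturbance.
# Computational reading: the converged state "represents" the disturbance.
for d in (0.5, 1.0, 2.0):
    print(f"disturbance={d:.1f} -> converged state={simulate(d):.3f}")
```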
These illustrations of the difficulties of comparing different approaches—and of the possible errors one can incur by naively mapping different formal languages to different theories—are meant to suggest caution in the analysis, not that all theories are equal. Rather, we suggest that different families of approaches (e.g., with or without internal models) to the problems we have focused on—active perception and control—have some elements in common but differ in other respects. In the rest of this section, we will discuss some of the theoretical implications of using, or not using, model-based and inferential approaches to problems of active perception and control, with regard to the notion of internal representation and the way we conceptualize brain architecture.
4.1. Model-Based Approaches to Active Perception and Control: Conceptual Implications
As we have seen, it is possible to address related problems (e.g., active perception and control) and even appeal to similar constructs (e.g., sensorimotor contingencies) using a range of different architectural solutions. For example, one can cast active perception within a family of solutions rooted in dynamical systems theory (e.g., [13
]) or, alternatively, within a family of solutions rooted in model-based and inferential computations (e.g., [35
]). Both approaches implement perception as an interactive process, in which action dynamics (e.g., the routines for grasping an apple) probe whether the “presuppositions for action” (e.g., the presence of an apple) hold or not—hence, sensory and motor processes form a closed loop rather than successive stages, in agreement with the tenets of pragmatism [9].
However, the appeal to similar pragmatist principles hides the theoretical differences between the two approaches. Enactive theories of cognition, including SMC theory [13
] tend to assume that perceptual processing depends on an implicit mastery of the rules of how sensations change depending on actions; hence, appealing to the notion of internal representation is unnecessary or even misleading, as it would divert attention from the most important (interactive) components that make SMCs useful. In other words, enacting an apple-related action-perception loop is sufficient for perception and successful grasping: no internal apple representation is needed. This would render redundant notions such as “beliefs” or “hidden states”, which model-based systems associate with perceptual hypotheses such as the presence of an apple or a cup, and the notion of “inference”, which often refers to maximizing the likelihood of (or minimizing surprise about) perceptual hypotheses. More specifically, one can argue that these notions would not be particularly problematic if used as technical constructs—as constituents of an adaptive agent architecture—but would become problematic if one assigned them a theoretical dignity, e.g., if one equated “belief” or “hidden state” with internal representation (it is, however, worth recalling that theories of ecological perception [12
] would not accept the notion of “hidden states”—even if intended in a minimalistic sense—because they are not required under the assumption that perception is “direct” and sensory stimuli are self-sufficient for it, making the mediation of internal or hidden states unnecessary).
This point leads us to the question of whether a model-based approach like PP invites (or implies) a representational interpretation—an issue that is currently debated in philosophy, with contrasting proposals that highlight the relations between PP and various (e.g., internalist, externalist, or non-representationalist) epistemological perspectives [62
]. This diversity of opinions recapitulates, within the field of PP theories, some long-standing debates about the nature and/or the existence of representations. Our contribution to this debate is to review various existing examples of Active Inference agents and to discuss the senses in which they may lend themselves to representationalist or anti-representationalist interpretations—with the obvious caveat that these interpretations may diverge depending on the specific definition of representation.
Some theories of representation emphasize a form of correspondence or (exploitable) structural similarity between a vehicle and what it represents [108
]. In this vein, a test for representation would be to assess the (structural) similarity between an agent’s internal generative model and/or hidden states (i.e., the vehicles) and the “true” environmental dynamics—or “generative process” in Active Inference parlance—which is unknown to the agent. When representation is conceived in this way, it seems natural to assign a representational status to (hidden) states within the agent’s Markov blanket, and to note that the similarity between generative model and generative process is what guarantees a “good regulator” [67
]. In keeping with this, most implemented Active Inference agents have internal generative models that are very similar to the external generative process, and sometimes almost identical; see, e.g., [82
]. However, this is often done for practical purposes, and it is not necessary to assume an overly strong (or naive) notion of similarity according to which internal models are necessarily copies of (or mirror) the external generative process.
In fact, in Active Inference systems generative models and generative processes can diverge in various ways, and for various reasons. The most obvious reason is that internal models are subject to imperfect learning procedures, whose ultimate objective is affording accurate control and goal achievement or, in other words, mediating adaptive action: permitting the agent to respond to external stimuli with appropriate adaptive actions that keep its internal variables within acceptable ranges (and minimize its free energy). Intuitively, given that biological agents have limited resources and their ultimate goal is interacting successfully within their ecological niche, the “content” of their models will be biased by utilitarian considerations, with resources assigned to coding relevant aspects only, as evident (for example) in the fact that different animals perceive broader or narrower colour spectra [113
]. Several learning procedures also have the objective of compressing information, that is, of integrating into the models only the minimal amount of information necessary to solve a specific task [114
]. One can reframe all these ideas within a more formal model comparison procedure, which is part and parcel of free energy minimization, and consider that when a simpler model (e.g., one that includes fewer variables) affords good control, it may be privileged over a more complex model that may putatively represent the environment more faithfully [81
]. This might imply, for example, that an agent’s model may fail to differentially encode two external states or contexts if they afford the same policy; and in the long run, even a less discriminative model, such as one that assumes that “all cows are black (at night)”, may be privileged. If one additionally considers that Active Inference requires the active suppression of the expected sensory consequences of actions to trigger movement, and that it often affords a sort of optimism bias [116
], it becomes evident that neither does the agent’s generative model have to be identical to the generative process, nor do the agent’s current beliefs (or inferred states) have to be aligned with external states. This is because, in Active Inference, control demands have priority over the rest of inference. To what extent the above examples are compatible with a representational view that highlights some form of correspondence or structural similarity between a vehicle and what it represents remains a matter for debate [108].
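The model-comparison point made above can be stated more formally using the standard decomposition of variational free energy into complexity and accuracy terms (for a generic approximate posterior $q(s)$, prior $p(s)$, and likelihood $p(o \mid s)$):

$$F \;=\; \underbrace{D_{\mathrm{KL}}\big[q(s)\,\|\,p(s)\big]}_{\text{complexity}} \;-\; \underbrace{\mathbb{E}_{q(s)}\big[\ln p(o \mid s)\big]}_{\text{accuracy}}$$

Because minimizing $F$ penalizes complexity, a simpler model that predicts sensory data (and hence affords control) equally well is automatically privileged over a more faithful but more complex one.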
The topic becomes even more controversial if one considers a slightly different way to construct the generative models for Active Inference, which appeals more directly to the notions of SMCs [13
] and motor-sensory-motor contingencies [31
]. For example, an agent’s generative model can be composed of a simple dynamical system [117
] (e.g., a pendulum) that guides the active sampling of information, in analogy to rodent whisking behaviour. In this example, the pendulum may jointly support the control of a simple whisker-like sensor and the prediction of a sensory event following its protraction (with a certain amplitude)—or a sensorimotor contingency between whisker protraction and the receipt of sensory stimuli. Such a mechanism would be sufficient to solve tasks such as the tactile discrimination or localization of some objects (e.g., walls versus open arenas) or distance discrimination tasks [119
]. A peculiarity of this model is that the pendulum would not be considered a model of the external object (e.g., a wall), but a model of the way an agent interacts with the environment or samples its inputs. Given its similarity with SMC theory, one can consider that the generative model in this example mediates successful interactive behaviour and dynamical coupling with the external environment, rather than establishing a correspondence with—or representing—it. Alternatively, one might argue that the generative model deserves a representational status, because some of its internal variables (e.g., the angle of the pendulum) are related to external variables (e.g., animal-object distance); or because the pendulum-supported active sampling can be part of a wider inferential scheme (hypothesis testing [35
]), in which repeated cycles of whisking behaviour support the accumulation of evidence in favour of specific hypotheses or beliefs (e.g., whether or not the animal is facing a wall), which constitute representations. To the extent that representation is defined in relation to a consistent mapping between variables inside and outside the agent’s Markov blanket, considering the inferential system as a whole (not just the pendulum) would meet the definition.
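To fix ideas, the following toy Python sketch is our own illustration of this kind of architecture (not the published model [117]): a pendulum-like oscillator drives a whisker-style sensor, contact predictions are compared against observations produced by the generative process, and prediction errors update a belief about object distance. All names and numerical values are illustrative assumptions.

```python
import math

# Toy sketch (our illustration): an oscillator-driven whisker whose contact
# predictions are tested against the world; mismatches update a belief about
# object distance, so the "pendulum" models the sampling routine, not the wall.

true_distance = 0.6   # generative process: actual wall distance (hidden from agent)
belief = 0.9          # agent's initial belief about the distance (whisk-angle units)
learning_rate = 0.5

for t in range(40):
    protraction = abs(math.sin(0.5 * t))             # pendulum-like whisking rhythm
    predicted_contact = protraction >= belief        # generative model's prediction
    observed_contact = protraction >= true_distance  # what the world returns
    if predicted_contact != observed_contact:
        # Prediction error: nudge the believed distance toward the protraction
        # amplitude at which the surprising (non-)contact occurred.
        belief += learning_rate * (protraction - belief)

print(f"believed distance ~ {belief:.2f} (true: {true_distance})")
```

Read one way, the loop merely attunes a sampling routine; read another way, the converged belief variable maps consistently onto the animal-object distance, which is precisely the ambiguity discussed above.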
Having said this, it is important to recognize that—as our discussion exemplifies—the mapping between internal generative models (as well as hidden states and beliefs) and external generative processes can sometimes be complex, even in very simple computational models using PP (and plausibly much more so in biological agents). There are also cases in which some aspects of agent-environment interactions do not need to be modelled, because they are directly embedded in the way the body works; the field of morphological computation [121
] provides several examples of this form of off-loading. In this perspective, one can even re-read the “good regulator” theorem [67
] and consider that a good controller needs to be (not necessarily to include) a model of a system—hence, bodily and morphological processes can be part and parcel of the model. In sum, there are multiple ways to implement Active Inference agents and their generative models. This fact should not be surprising, as the same framework has been used to model autonomous systems at various levels of complexity, from cells that self-organize and show the emergence of life from a primordial soup [101
] or of morphogenetic processes [102
], to more sophisticated agent models that engage in cognitive [122
] or social tasks [92
]—while also appealing within these cognitive models to different inferred variables, including spatial representations [80
], action possibilities within an affordance landscape [16
], and internal (belief) states of other agents [96
]. It is then possible that a reason for dissatisfaction with the above definition of representation is that it does not account for this diversity, or for the possibility that we have different epistemological attitudes towards these diverse systems.
One can also sidestep entirely the question of whether or not an agent’s generative model or internal states are similar to the external generative process, and ask in which cases they can be productively assumed to play a representational function—here, in the well-known (yet not universally accepted) sense of mediating interaction and cognition off-line, or “in the absence of” the entity or object that they putatively represent [8
]. Using this (quite conservative) criterion of off-line usage and decouplability (or detachment), different model-based or PP systems (and associated notions such as “belief”, “hidden state” or “generative model”) lend themselves to representational or non-representational interpretations, depending on how they are used within the system. If one considers again the aforementioned case of the emergence of life from a primordial soup [101
], the agent’s model is a medium for self-organization and synchrony between two coupled dynamical systems, and despite the presence of internal (hidden) states and inferential processes, the architecture does not invite a representational interpretation; see [98
] for discussions and [102
] for a related example. There are other cases in which beliefs and hidden states label components of internal models, which are transiently updated during the action-perception loop for accurate control or learning, but are not used or accessed outside it. The “beliefs” that are maintained within the model-based architecture might correspond, for example, to specific parametrizations of the system (e.g., joint angles of fingers) that need to be optimized during grasping (e.g., to produce the necessary hand preshape). In this example, it would seem too strong to assign such beliefs a truly representational status—at least if one assumes that a tenet of representational content is that it can be accessed and manipulated off-line [17].
However, other examples of model-based systems lend themselves to a representational interpretation, which would be precluded in some (e.g., enactivist) theoretical perspectives. Consider the same apple-grasping architecture described above, in which “beliefs” about hand preshape are systematically monitored and used outside the current online action-perception loop. These beliefs could be used in parallel for grasping the apple and for updating an internal virtual (physical) simulator, which might encode the position of objects, permitting an agent to remember them or to plan/imagine grasping actions when the objects are temporarily out of view [124
]. Another popular example in the “motor cognition” framework is the fact that some sub-processes involved in model-based motor control, namely the forward modelling loop that accompanies overt actions, may at times be detached and reused off-line, in a neural simulation of action that supports action planning and the imagination of movement [126
]. The representational aspect of this process would not consist in the inference of current (latent/hidden) states, but in the process of anticipating action consequences—for example, in the anticipated softness of grasping a sponge or the anticipated sweetness of eating an apple. This view is compatible with the idea that representation is eminently anticipatory and consists in a set of predictions including action consequences and dispositions [19
]. According to this analysis, the difference between non-representational and representational processes would not depend on the mere presence of constructs such as belief, hidden state, or prediction, or even on their “content” (e.g., whether they encode states that “mirror” the external environment), but on how they are used: for on-line control only, or also for “detached” and off-line processes. A similar distinction can be found in some theories of “action-oriented” or “embodied” representation [17
] as well as in conceptualizations of the differences between states that can, or cannot, be accessed consciously [129].
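A minimal sketch can illustrate this dual use of one and the same forward model; the toy one-dimensional reach below is our own illustrative assumption, not a published architecture. On-line, the model's predictions accompany overt actions (and their comparison with outcomes would drive control); off-line, the same model is iterated alone to simulate, i.e., imagine, an action's consequences.

```python
# Toy sketch (our illustration) of a forward model reused in two modes:
# on-line (predictions accompany overt actions) and off-line (the model is
# iterated without acting, simulating a reach).

def forward_model(distance: float, reach: float) -> float:
    """Predict the next hand-object distance, given a reach of a given size."""
    return distance - reach  # toy kinematics

def act(distance: float, reach: float) -> float:
    """Overt execution; here identical to the model (i.e., a perfect model)."""
    return distance - reach

# On-line mode: predict, act, and compare (prediction errors would drive control).
state = 1.0
prediction = forward_model(state, 0.3)
state = act(state, 0.3)
assert abs(prediction - state) < 1e-9  # no surprise: the model matches the world

# Off-line mode: iterate the model alone to *imagine* a full reach,
# without any overt movement.
imagined, trajectory = 1.0, []
while imagined > 0:
    imagined = forward_model(imagined, 0.3)
    trajectory.append(round(imagined, 2))
print("imagined trajectory:", trajectory)
```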
Another process of model-based systems that is usually associated with internal representation (and meta-cognitive abilities) concerns confidence estimation and the monitoring of internal variables such as beliefs and hidden states [79
]. There are aspects of confidence estimation that are automatically available in probabilistic model-based systems—for example, a measure of the precision (or inverse variance) of current beliefs—but some architectures also monitor other variables, such as the quality of current evidence or the volatility of the environment [70
]. These additional parameters (or meta-parameters) have multiple uses in learning and control, such as adapting learning rates or the stochasticity of policy selection [112
], but may also have psychological counterparts, such as a subjective “feeling” of confidence [79].
What would these (putatively representational) confidence ratings add to processes of active perception and control? It is worth remembering that in SMC or closed-loop perception theories, enacting the right sensorimotor program is sufficient to attune with (and, thus, perceive) the object. However, there is a potentially problematic aspect of this process: how can an organism know when it has found an object (say, an apple) and decide to disengage from it to search for another object (say, a knife to cut the apple into pieces)? One possible answer is that the agent does not really need to “know” anything, but only to launch the appropriate (knife-search) routine at the right time. This is certainly possible, but it is not trivial except in cases where, for instance, action chains are routinized and external cues are sufficient to trigger the “next” action in a sequence. In their theory of closed-loop perception, Ahissar and Assa [31
] recognized this “disengagement” problem and proposed a possible solution based on an additional component: a “confidence estimator”, which measures when the closed-loop process converges so that the agent can change task. In turn, they propose that the confidence estimator may be based on an internal model of the task—thus essentially describing a solution that resembles model-based control, but one that requires two components: one using internal models (for confidence ratings) and one not using internal models (for control). Despite this difference, the confidence-based system would keep track of the precision (or inverse uncertainty) of the belief “there is an apple”, thus playing the same role as the more standard methods of confidence estimation considered above.
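The following toy Python sketch illustrates, under our own assumptions (a binary apple/no-apple hypothesis, noisy sensory samples, an arbitrary disengagement threshold), how such a confidence estimator could work: each cycle of the loop updates the belief via Bayes' rule, and the agent disengages once the belief becomes sufficiently confident, a simple proxy for its precision.

```python
import random

# Toy sketch (our illustration) of the "disengagement" idea: a closed loop
# accumulates evidence for the hypothesis "there is an apple", while a
# confidence estimator monitors the belief and signals when to switch task.

random.seed(1)
p_apple = 0.5                # prior belief that an apple is present
CONFIDENCE_THRESHOLD = 0.97  # arbitrary level at which the agent disengages

for step in range(1, 100):
    # One action-perception cycle: a noisy sample that supports the apple
    # hypothesis 80% of the time (an apple is in fact present).
    supports_apple = random.random() < 0.8
    lik_apple = 0.8 if supports_apple else 0.2
    lik_no_apple = 0.2 if supports_apple else 0.8
    evidence = lik_apple * p_apple + lik_no_apple * (1 - p_apple)
    p_apple = lik_apple * p_apple / evidence  # Bayes' rule over two hypotheses
    confidence = max(p_apple, 1 - p_apple)    # how committed the belief is
    if confidence > CONFIDENCE_THRESHOLD:
        print(f"step {step}: confident (p={p_apple:.3f}); disengage, search for knife")
        break
```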
In sum, we have argued that inferential systems that use internal models and hidden states do not automatically invite a representational interpretation; this is most notably the case when the internal model only mediates agent-environment coupling and there is no separate access to the internal states or dynamics for other (e.g., off-line) operations. In cases like this, it is sufficient to appeal to an “actionable” (generative) model that supports successful action-perception loops, affording agent-environment coupling. Under certain circumstances, however, a representational interpretation seems more appealing: particularly when hidden states are used for off-line processing (e.g., remembering, imagining) or accessed in other ways (e.g., for confidence judgements), which are usually considered representational processes; and when the system shows similar dynamics in the two regimes, on-line (coupled to action-perception loops) and off-line (decoupled from them).
The possibility for generative models to operate in a dual mode—when coupled with external dynamics and when decoupled from them—presents a challenge for the interactive and anti-representational arguments of enactivists. This is because, if cognition and meaning were constitutively interactive phenomena, they would be lost in the detached mode, when coupling is broken. The alternative hypothesis would be that generative models acquire their “meaning” through situated interaction, but retain it even when operating in a detached mode, to support forms of “simulated interaction” such as action planning or action understanding [126].
A useful biological illustration of a dual mode of operation of brain mechanisms is the phenomenon of “internally generated sequences” in the hippocampus, and beyond [132
]. In short, dynamical patterns of neuronal activation that code for behavioural trajectories (i.e., sequences of place cells) are observed in the rodent hippocampus both when animals are actively engaged in overt spatial navigation and when they are disengaged from the sensorimotor loop, e.g., when they sleep or groom after consuming food—the latter depending on an internally generated, spontaneous mode of neuronal processing that generally does not require external sensory inputs. Internally generated sequences that closely mimic (albeit within different dynamical modes) the neuronal activations observed during overt navigation have been proposed to be neuronal instantiations of internal models, which play multiple roles including memory consolidation and planning—thus illustrating a possible way the brain might reuse internal dynamics/models in a “dual mode”, across overt and covert cognitive processes [132
]. An intriguing neurobiological possibility is that the internal models that produce internally generated sequences are formed by exploiting pre-existing internal neuronal dynamics that are initially “meaningless”, but acquire their “meaning” (e.g., code for a specific behavioural trajectory of the animal) through situated interaction, when the internal (spontaneous) and external dynamics become coupled [138
]. From a theoretical perspective, this mechanism might be reiterated hierarchically—thus forming internal models whose different hierarchical levels capture interactive patterns at different timescales [139].