Article

Still No Free Lunch: Failure of Stability in Regulated Systems of Interacting Cognitive Modules

The New York State Psychiatric Institute, Box 47, 1051 Riverside Dr., New York, NY 10032, USA
Stats 2025, 8(4), 117; https://doi.org/10.3390/stats8040117
Submission received: 25 August 2025 / Revised: 12 November 2025 / Accepted: 2 December 2025 / Published: 15 December 2025

Abstract

The asymptotic limit theorems of information and control theories, instantiated as the Rate Distortion Control Theory of bounded rationality, enable examination of stability across models of cognition based on a variety of fundamental, underlying probability distributions likely to characterize different forms of embodied ‘intelligent’ systems. Embodied cognition is inherently unstable, requiring the pairing of cognition with regulation at and across the various and varied scales and levels of organization. Like contemporary Large Language Model ‘hallucination,’ de facto ‘psychopathology’—the failure of regulation in systems of cognitive modules—is not a bug but an inherent feature of embodied cognition. What particularly emerges from this analysis, then, is the ubiquity of failure-under-stress even for ‘intelligent’ embodied cognition, where cognitive and regulatory modules are closely paired. There is still No Free Lunch, much in the classic sense of Wolpert and Macready. With some further effort, the probability models developed here can be transformed into robust statistical tools for the analysis of observational and experimental data regarding regulated and other cognitive phenomena.

1. Introduction

Recent studies by Wang et al. [1], Dulberg et al. [2], and others argue that AI must evolve into a distributed ecosystem in which multiple AI entities interact, specialize, and collectively enhance intelligence. A central assertion is that a modular architecture can learn more efficiently and effectively to manage multiple homeostatic objectives, in nonstationary environments, and as the number of objectives increases.
Much of the origin of this approach lies in the early work of Wolpert and Macready [3,4]—and, more recently, Shalev-Shwartz and Ben-David [5]—demonstrating that an optimizer tuned to be ‘best’ on one particular kind of problem must inevitably be ‘worst’ at some complementary problem set. Complex problems that divide ‘naturally’ into modular subcomponents can then be—it is hoped—efficiently addressed by an interacting system of cognitive modules, each individually tuned to be ‘best’ on the different problem components most representative of the embedding challenge environment. Success in such a ‘game’ requires matching cognitive system subcomponents to problem subcomponents in real time. This is not easy. As an anonymous commentator put the matter [6],
Apparently a proved theorem cannot be entirely circumvented by a framework consistent in the same formal foundation system, so [Mixture of Experts] just redistributes the challenge across the framework’s underlying components.
In particular, the maintenance of dynamic homeostasis under the Clausewitzian ‘selection pressures’ of fog, friction, and adversarial intent is far from trivial, and, among humans, even after some considerable evolutionary pruning, high-level cognitive process remains liable to debilitating ‘culture-bound syndromes’ often associated with environmental stress and related, inherently toxic, exposures and subsequent developmental trajectories (e.g., [7], Ch. 3).
We are, then, very broadly concerned with the maintenance of cognitive stability in a modular system under the burdens of what, in control theory, is called ‘topological information’ [8] imposed by disruptive circumstances, internally or externally generated.
Maturana and Varela [9] hold that the living state is cognitive at every scale and level of organization. Atlan and Cohen [10] characterize cognition as the ability to choose an action—or a proper subset—from the full set of those available, in response to internal and/or environmental signals. Such a choice reduces uncertainty in a formal manner and implies the existence of an information source dual to the cognitive process studied. The argument is direct and places cognition under the constraints of the asymptotic limit theorems of information theory [11,12,13].
More recently, the Data Rate Theorem [8] establishes conditions under which an inherently unstable dynamical process can be stabilized by the imposition of control information at a rate exceeding the rate at which the process generates its own ‘topological information’. The model depicts a vehicle driven at high speed along a twisting, potholed roadway. The driver must impose steering, braking, and gear-shift information at a rate greater than the road imposes its own topology, given the chosen speed.
Most real-world cognitive phenomena, from the living state to such embodied machine cognition as High Frequency Stock Trading [14,15,16], must act in concert with–be actively paired with–a regulator to successfully meet the demands of real-world, real-time function:
  • Blood pressure must respond to changing physiological needs while remaining within tolerable limits.
  • The immune system must engage in routine cellular maintenance and pathogen control without causing autoimmune disease by attacking healthy tissue.
  • The stream of consciousness in higher animals must be kept within ‘riverbanks’ useful to the animal and its social groupings, particularly among organized hominids, the most deadly predators on Earth for the best part of half a million years.
  • Human institutions and organizations, via ‘rule-of-law’ or ‘doctrine’ constraints, are notorious venues of cognition-regulation pairing.
Here, using the asymptotic limit theorems of information and control theories, we will explore, in a sense, the ‘metabolic’, ‘free energy’ and other such costs of this necessary pairing as the number of basic, interacting submodules rises, uncovering a truly remarkable set of scaling properties across the ‘fundamental underlying probability distributions’ likely to power and constrain a variety of important cognitive phenomena.
The probability models arising ‘naturally’ from this effort can serve as the basis of new statistical tools for the analysis of data related to the stability of cognitive phenomena across a broad range of systems. However, like the regression models derived from the Central Limit Theorem and related results, such tools do not, in themselves, ‘do the science’. This work has, consequently, focused on a range of models and modeling strategies. Not everything follows a y = mx + b pattern. Sometimes things go as y = mx² + b or worse, hence our insistence on the analysis of dynamics across different underlying probability distributions.
Some current attempts at comprehensive cognitive models, e.g., ‘integrated information theory’, ‘the free energy principle’, ‘quantum consciousness’, have made claims of existential universality akin to a general relativity of biology and mind. This work, far more modestly, aims to derive tools that can aid the analysis of observational and experimental data, the only sources of new knowledge rather than new speculation.
That said, we do not at all denigrate the model-based speculations that might arise from our approach, provided they are ultimately constrained by effective feedback from data and experiment.
The development, unfortunately, is far from straightforward, and requires the introduction of some considerable methodological boilerplate, extending relatively simple ‘first order’ approaches abducted from statistical physics and the Onsager model of nonequilibrium thermodynamics, generalizing, in a sense, the results of Khaluf et al. [17], Kringlebach et al. [18], and other recent work.
An often-overlooked but essential matter is that information sources are not, like physical processes are assumed to be, microreversible, i.e., in English ‘the’ has a much higher probability than the sequence ‘eht’, implying a directed homotopy ultimately leading to groupoid symmetry-breaking cognitive phase transitions. Much–but not all–of this will be safely hidden in the relatively simple first-order formalism developed below.
Another common misunderstanding is the interpretation of Shannon uncertainty, as expressed mathematically, as an actual entropy. Following Feynman [19] and Bennett [20], information should be treated as a form of free energy that can be used to construct partition functions leading to higher-order free-energy constructs. This is, as Feynman remarks, a subtle matter.
More generally, as Khinchin [12] indicates, for nonergodic stationary systems, each message stream converges to its own source uncertainty that cannot be expressed in ‘entropy’ format. A standard approach from statistical physics, however, still allows construction of a partition function in such cases, hence an iterated free energy, and, via a Legendre Transform, a compound entropy measure. The gradient of that compound entropy can be taken as a thermodynamic force, thereby defining the rate-of-change of the underlying variates, thereby permitting standard stochastic extensions of the models. Again, however, this provides only an entry-level, first-order attack on a barbed-wire thicket of cognitive phenomena and their inevitable pathologies.
Although modern medicine can often perform seeming miracles against an increasingly wide spectrum of infectious and chronic diseases, the culture-bound syndromes of ‘madness’ remain largely intractable.
The narrowly trained designers and builders of cutting-edge artificial intelligence systems and entities are entering a Brave New World.

2. An Introduction to Method

2.1. Resources

Real-world, real-time systems are necessarily embedded in and of an environment that includes themselves. At least three resource streams are necessary for correct function:
  • The rate at which elements of the system can communicate with each other, characterized by some channel capacity C .
  • The rate at which ‘sensory information’ is available from the embedding circumstance, according to the channel capacity H .
  • The rate at which ‘material/materiel’ resources can be provided, a rate M . For an organism, this might be measured by the rate at which metabolic free energy is provided.
These rates cross-correlate and interact, generating a (minimally) 3 × 3 matrix Z. An n × n matrix will have n scalar measures r_k invariant under similarity transformations, determined by the standard expression:
$$p(\gamma) = \det[Z - \gamma I] = \sum_{k=0}^{n} (-1)^k r_k \gamma^{n-k}$$
I is the identity matrix, ‘det’ the determinant, and γ a real-valued parameter. The first invariant, r_1, is the matrix trace, and the last, r_n, the determinant. We build a scalar resource rate measure from the r_k.
An elementary index might be Z = C × H × M , but matters are not likely quite that simple.
We do, however, impose a first-order model, assuming that a single scalar index Z ( r 1 , , r n ) can be constructed that sufficiently represents information and material resource rates. This argument is analogous to principal component analysis based on a correlation matrix. Wallace [21] outlines difficulties inherent to more than a single such index.
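As a minimal computational sketch of this construction, the following computes the invariants r_k of a hypothetical 3 × 3 resource matrix Z and one possible scalar index built from them (all values illustrative):

```python
import numpy as np

# Hypothetical cross-correlation matrix Z of the three resource rates
# C (communication), H (sensory), M (material); entries illustrative.
Z = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.5],
              [0.2, 0.5, 1.0]])

# np.poly(Z) returns the coefficients of det[gamma*I - Z], so for n = 3
# coeffs = [1, -r_1, r_2, -r_3], with r_1 the trace and r_3 the determinant.
coeffs = np.poly(Z)
r1, r2, r3 = -coeffs[1], coeffs[2], -coeffs[3]

print(r1, np.trace(Z))        # r_1 recovers the trace
print(r3, np.linalg.det(Z))   # r_3 recovers the determinant

# One first-order scalar resource index assembled from the invariants:
Z_index = r1 * r3
print(Z_index)
```

Any monotone function of the r_k could serve here; the product of trace and determinant above is purely an assumption for illustration, in the spirit of a first principal component.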

2.2. Rate Distortion Control Theory

An ‘embodied’ formalism emerges via a ‘Rate Distortion Control Theory’ based on the Rate Distortion Theorem (RDT) [11,12]. The RDT addresses transmission under conditions of noise over an information channel. There will be, for that channel and a particular scalar measure of average distortion D between what is sent and what is received, a minimum necessary channel capacity R ( D ) .
The theorem asks, in effect, what is the ‘best’ channel for transmission of a message with the least possible average distortion. R ( D ) can be defined for nonergodic information sources via a limit argument based on the ergodic decomposition of a nonergodic source into a ‘sum’ of ergodic sources. The RDT can be reconfigured in a control theory context if we envision a system’s topological information from the Data Rate Theorem as another form of noise, adding to the average distortion D. See Figure 1.
The punctuation implied by the Data Rate Theorem (DRT) [8] happens if there is a critical maximum average distortion beyond which control fails.
Embodied cognition, characterized by continuous interaction with a real world having its own intents and capabilities—as indexed by the distortion between what is wanted and what is observed—is inherently a phenomenon of ‘bounded rationality’ in the sense of Ortega and Braun [22]. Cognition without such embodied interaction, however, as Dreyfus [14], Fjelland [15], and many others have noted, risks a narrow boundedness without rationality.

2.3. The ‘Tuning Theorem’

Parallel to both Rate Distortion Control Theory and the No Free Lunch Theorem is the ‘tuning theorem’, inverse of the Shannon Coding Theorem (SCT). The SCT holds that an ‘infinite’ message stream encoded so that its probabilities are ‘typical’ with respect to the channel along which it is transmitted can be sent with arbitrarily small error so long as the transmission rate is ≤C, the ‘capacity’ of that channel.
A relatively simple argument ([7], Section 12.2) shows that, conversely, if a message must be sent at a maximum possible rate, there is a channel which is effectively ‘transmitted by the message’ at an effective dual channel capacity C * , where the channel probabilities are now made ‘typical’ with respect to that particular message. An inherently unstable system’s optimal ‘control message’ can thus be transmitted at the maximum rate by a ‘tuned channel’ optimized for that message. Different optimal control messages must be transmitted by different optimal channels; hence, the No Free Lunch Theorem: there is no single optimal channel for all possible control messages.

3. A Single Cognition-Regulation Pairing

Here, we are interested in the dynamics of interaction between two entities—a ‘system-of-interest’ and its paired regulating apparatus—engaged separately in ‘real-world’ evaluations according to Figure 1, where each entity is concerned with its own channel capacity R i ( D i ) —increasing R to lower the distortion between what it intends and what it observes.
In rough consonance with the approaches of Ortega and Braun [22] and Kringlebach et al. [18], we consider a ‘simple’ probability distribution underlying overall system dynamics, say ρ ( x ) , and construct an iterated probability for each component of the interaction individually, having the following form:
$$P_j = \frac{\rho(R_j/g_j(Z_j))}{\rho(R_1/g_1(Z_1)) + \rho(R_2/g_2(Z_2))}$$
where g ( Z ) represents a temperature analog that must be further characterized, and each R j , again following Feynman’s ([19], p. 146) adaptation of Bennett’s [20] analysis, is itself to be interpreted as an initial free energy measure. Bennett argued that information itself has a kind of ‘fuel value’ or free energy content. Following Feynman, we take this and run with it. In what follows, ‘channel capacities’ R i will be used to build higher-order, iterated, free energy measures following from the statistical mechanical partition function.
That is, this formulation leads directly to a partition function definition of an iterated free energy F [23,24,25]:
$$\rho(F/\langle g \rangle) = \rho(R_1/g_1(Z_1)) + \rho(R_2/g_2(Z_2))$$
where < g > is some appropriate ‘average temperature’ that we take here as follows:
$$\langle g \rangle \equiv \sqrt{g_1(Z_1)\, g_2(Z_2)}$$
We next impose a first-order model, g ( Z ) = g × Z , where the constant g is an affordance index that determines how well the resource rate Z can actually be utilized, and, for convenience, take ρ ( x ) as the Boltzmann Distribution, solving for F as follows:
$$F = -\sqrt{g_1 Z_1\, g_2 Z_2}\,\ln\left(e^{-\frac{R_1}{g_1 Z_1}} + e^{-\frac{R_2}{g_2 Z_2}}\right)$$
Dynamics are then defined in second order at the nonequilibrium steady state according to an Onsager nonequilibrium thermodynamics model [26]:
$$S \equiv -F + Z_1\,\partial F/\partial Z_1 + Z_2\,\partial F/\partial Z_2$$
$$\partial Z_j/\partial t \propto \partial S/\partial Z_j = 0,\quad j = 1, 2$$
where S, as the Legendre Transform of F [24], is an entropy-analog, and we are able to omit the ‘diffusion coefficient’ in the second line of Equation (6).
Solving the two relations ∂S/∂Z_j = 0 for the R_j leads to three solution sets for the nonequilibrium steady state (nss)—i.e., homeostatic/allostatic—conditions. The physically nontrivial one is the equipartition
$$\frac{R_1}{g_1 Z_1} = \frac{R_2}{g_2 Z_2} \equiv X$$
$$Z_2 = \frac{g_1 R_2}{g_2 R_1}\, Z_1$$
$$Z \equiv (Z_1 + Z_2) = \frac{1}{X}\left(\frac{R_1}{g_1} + \frac{R_2}{g_2}\right)$$
1. Note that pathological values of R_1, g_1 and Z_1 can force Z_2—the regulatory system’s resource supply demand—to grow so large that it becomes unsustainable. That is, homeostasis/allostasis of a cognitive/control dyad can be driven to failure by sufficient ‘environmental’ stress.
2. The ‘total energy rate’ Z can be lowered by raising the ‘affordance’ measures g j .
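A small numerical sketch of these equipartition relations (Equation (7)); all parameter values are illustrative rather than drawn from the text:

```python
# Numerical sketch of the cognition/regulation dyad at equipartition.
def dyad_demand(R1, R2, g1, g2, Z1):
    """Return the equipartition value X, the regulator demand Z2,
    and the total resource rate Z for a cognition/regulation dyad."""
    X = R1 / (g1 * Z1)                    # R1/(g1 Z1) = R2/(g2 Z2) = X
    Z2 = (g1 * R2) / (g2 * R1) * Z1       # second relation of Eq. (7)
    Z = (1.0 / X) * (R1 / g1 + R2 / g2)   # total demand
    return X, Z2, Z

# Baseline: a symmetric dyad.
X0, Z2_0, Z_0 = dyad_demand(R1=1.0, R2=1.0, g1=1.0, g2=1.0, Z1=1.0)

# 'Stress' raising the regulator's required channel capacity R2 drives
# its resource demand Z2, and the total Z, upward without bound:
for R2 in (1.0, 2.0, 10.0):
    _, Z2, Z = dyad_demand(1.0, R2, 1.0, 1.0, 1.0)
    print(R2, Z2, Z)

# Raising the affordance g2 lowers the total demand, as in point 2:
_, _, Z_hi_g = dyad_demand(1.0, 1.0, 1.0, 2.0, 1.0)
print(Z_hi_g)
```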
This relatively simple example provides a useful starting point for significant generalizations, at the expense of considerable, often nontrivial, conceptual and formal development.

4. Extending the Model I

As Stanley et al. [27] describe, empirical evidence has been mounting that supports the intriguing possibility that a number of systems arising in disciplines as diverse as physics, biology, ecology, and economics may have certain quantitative features that are intriguingly similar. These properties are conventionally grouped under the headings of ‘scale invariance’ and ‘universality’ in the study of phase transitions.
Here, using the equipartition implied by Equation (7), we impose a quasi-scale-invariant model and explore the resulting patterns of phase transition that affect and afflict homeostasis and allostasis under ‘stress’. The ‘quasi’ arises because one cannot simply abduct the dynamics of physical processes to the dynamics of cognition, particularly biocognition. At the very least, information sources ‘dual’ to cognitive phenomena [13] are not micro-reversible, displaying, as mentioned above, directed homotopy.
An inverse argument generalizes the nss equipartition derived above across a full set of probability distributions likely to be characterized by biological, institutional, and machine cognition.
Fix the R_j/(g_j(Z_j)) ≡ X, and assume there are n ≥ 2 contending/cooperating agents, taking n as an integer. (In analogy with physical theory, we might actually need to take n → V(n) for some monotonic increasing function of n.)
We are, again, envisioning at least an individual enmeshed in a social environment. Then, on this basis, calculate the most general possible form of the corresponding average R.
The relations of interest for possible underlying probability distributions are then taken according to the first-order quasi-scale invariant expressions:
$$\rho\left(\frac{F}{X\, R(R_1, \ldots, R_n)}\right) = n\,\rho(X)$$
$$F = \rho^{-1}\left(n\,\rho(X)\right)\, X\, R(R_1, \ldots, R_n)$$
where, again, n is an integer, and the appropriate functional form of R = R(R_1, …, R_n) must now be found.
See Khaluf et al. [17] for a discussion of ‘ordinary’ scale invariance, where ρ ( λ x ) = Q ( λ ) ρ ( x ) for some monotonic increasing function Q ( λ ) .
In general, it may be necessary to replace n with some appropriate monotonic increasing function V ( n ) , but for algebraic simplicity, we will continue with the simple, first-order model.
Equation (6) is reexpressed in terms of the R j , imposing the nonequilibrium steady state condition:
$$\partial R_j/\partial t \propto \partial S/\partial R_j \equiv 0$$
for all possible R j , so that the R j / ( g ( Z j ) ) are indeed fixed at the same value across all possible probability distributions.
What functional forms of R make this true?
The argument is not quite trivial but is straightforward. Requiring ∂S/∂R_j = 0 in Equation (6) generalizes to a full class of ‘averaging’ functions R(R_1, …, R_n), given proper dimensional extension of S.
For n dimensions, a direct solution by induction is as follows:
$$R = R(R_1, R_2, \ldots, R_n) = f\left(\frac{R_2}{R_1}, \ldots, \frac{R_n}{R_1}\right) R_1$$
which applies to both arithmetic and geometric averages. This is a simplification from a more elaborate treatment that would add considerable mathematical overhead to an already burdened development.
We will reconsider this problem, and that of Equation (4), from a different and perhaps deeper perspective below.
Here, we carry the argument further, treating Equation (8) as an explicit function of X at a nonequilibrium steady state, and extending the Onsager analysis:
$$\rho\left(\frac{F(X)}{X\, R(X)}\right) \equiv \rho(Q) = n\,\rho(X)$$
$$F(X) = \rho^{-1}\left(n\,\rho(X)\right)\, X\, R(X) = Q(n, X)\, X\, R(X)$$
$$S \equiv -F(X) + X\,\frac{d}{dX}F(X)$$
$$dX/dt \propto \partial S/\partial X = 0$$
$$R(n, X) = \frac{C_1 X + C_2}{X\, Q(n, X)}$$
where in the third line we again invoke the Legendre Transform to define an ‘entropy’ in terms of a ‘free energy’.
Again, it may be necessary to replace the integer n in the first line of Equation (11) by an appropriate monotonic increasing function V(n). In physical systems, typically V(n) ∝ n^α, α > 0.
Higher order ‘Onsager-like’ [26] expressions for the entropy-analog may well be needed, for example,
$$S \equiv -F + \sum_j \epsilon_j X^j\, d^j F/dX^j$$
for appropriate ϵ j . Such expressions may be system-specific, again leading to the second relation in Equation (11), but producing more involved expressions for F ( X ) .
Below, we extend the argument to cases where dX/dt ∝ dS/dX = f(X(t)) → 0 as t increases, introducing ‘frictional’ delays.
Here, the essential point is that calculating  Q ( n , X )  allows finding critical values  X C ( n )  under different probability distributions and ‘Onsager-type’–or other–models. Again, scaling with n may not be the simple linear model used here.
Thus, a direct, comprehensive approach to system stability is to ‘simply’ demand that Q(n, X) ≥ 0 be real-valued, given that ρ(x) is a probability distribution on 0 ≤ x < ∞.
For, in order, Boltzmann, k-order Lomax, and Cauchy Distributions—instantiating the non-zero requirement—and sequentially, the Rayleigh, k-order Erlang, and k-order chi-square distributions,
$$X_C = \ln(n)$$
$$X_C = n^{1/(k+1)} - 1$$
$$X_C = \sqrt{n - 1}$$
$$X_C = \sqrt{-W\left(m, -e^{-1} n^{-2}\right)}$$
$$X_C = -(k-1)\, W\left(m, -e^{-1}\, n^{-1/(k-1)}\right)$$
$$X_C = -(k-2)\, W\left(m, -e^{-1}\, n^{-2/(k-2)}\right)$$
where all scale parameters have been set equal to one and, in general, one might expect n V ( n ) for some appropriate increasing function. Again, detailed development would add more mathematical overhead to an already overburdened argument.
Note the following:
  • The second condition is most easily proved by direct substitution.
  • W(m, x) is the Lambert W-function of order m that satisfies the following relation:
$$W(m, x)\exp[W(m, x)] = x$$
W is real-valued only for m = 0, −1 and x ≥ −exp[−1].
  • Equation (13) is strictly equivalent to the condition that there is no solution to ρ(Q) = n ρ(X) if n ρ(X) > max_{x ≥ 0} ρ(x). If the distribution is nonmonotonic and a solution exists, there will be more than one solution form. The first three relations of Equation (13) are associated with monotonic distributions having mode zero.
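A brief numerical sketch of the critical values X_C(n), assuming the Lambert-W forms given above with all scale parameters set to one (scipy.special.lambertw, branch argument m):

```python
import numpy as np
from scipy.special import lambertw

def XC_boltzmann(n):
    return np.log(n)

def XC_lomax(n, k):
    return n**(1.0 / (k + 1)) - 1.0

def XC_rayleigh(n, m=-1):
    # m = -1 selects the upper branch of the Lambert W-function
    return np.sqrt(-lambertw(-np.exp(-1.0) / n**2, m).real)

def XC_erlang(n, k, m=-1):
    return -(k - 1) * lambertw(-np.exp(-1.0) * n**(-1.0 / (k - 1)), m).real

# X_C grows far more slowly than n itself under every distribution:
for n in (2, 4, 8, 16):
    print(n, XC_boltzmann(n), XC_lomax(n, 2), XC_rayleigh(n), XC_erlang(n, 2))
```

Doubling n under the Boltzmann form adds only ln(2) to the critical cost, the sublinear scaling discussed below.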
We are concerned with the rising branches representing cooperation between system subcomponents, displaying, in Figure 2, patterns of limiting behavior with increasing n for, respectively, the Rayleigh, second-order Erlang, and fourth-order Chi Square Distributions. These are nonmonotonic. The regimes above the top branch of the ‘C’ are stable for cooperating systems.
The essential point for this analysis is that, like X_C = ln(n) for the Boltzmann Distribution, the ‘cost’ in X for cooperating modes does not increase linearly with rising n. For the Boltzmann Distribution model, doubling n, i.e., pairing each cognitive submodule with a regulator, increases the rate of ‘resource’ demand as ln(n) → ln(2n) = ln(n) + ln(2).
For cooperating cognition/regulation dyads, the ‘cost’ in units of X does not increase in proportion to the shift n 2 n . From the perspective of evolution under selection pressure, this seems a small price to pay for a grand increase in stability of function.
Again, here, the m = −1 branches can be taken as indexing cooperative coalitions, while the m = 0 branches represent contending system dynamics under rising n, which is another matter.
After some algebra, results similar to Figure 2 are found for the Levy and Weibull distributions, although the Levy is difficult to parse.
We can now extend the ‘energy’ argument of Equation (7), recalling that, for all j, R_j/g_j(Z_j) ≡ X, the equipartition value. Applying the arguments above, we set X → X_C(n) from Equation (13) for the appropriate underlying probability distribution.
Then
$$g_j(Z_j) = \frac{R_j}{X_C(n)}$$
$$Z_j = g_j^{-1}\left(\frac{R_j}{X_C(n)}\right)$$
$$Z_C = \sum_j Z_j = \sum_j g_j^{-1}\left(\frac{R_j}{X_C(n)}\right)$$
For convenience, we assume a set of n identical systems, with fixed forms for the g j ( Z j ) = g ( Z j ) and fixed values of the R j = R under some particular probability distribution. Then
$$Z_C = \sum_j Z_j = n\, g^{-1}\left(\frac{R}{X_C(n)}\right)$$
Recall that, if g is monotonic increasing, so is g 1 .
For example, assuming a Boltzmann Distribution,
$$Z_C = n\, g^{-1}\left(\frac{R}{\ln(n)}\right)$$
which does not increase linearly with n by virtue of the division by ln ( n ) in g 1 .
Adopting the simple linear model g(Z) ≡ g × Z,
$$\frac{R}{g Z_C} = \frac{X_C(n)}{n}$$
$$Z_C = \frac{n}{X_C(n)}\,\frac{R}{g}$$
which makes intuitive sense.
Further, the first relation of Equation (17) can be solved for n (or V ( n ) ), generating necessary conditions for Z C .
For example, under the Boltzmann Distribution,
$$n = \frac{W\left(m, -\frac{R}{g Z_C}\right)}{-\frac{R}{g Z_C}}$$
for m = 0, −1 in the Lambert W-function, imposing the necessary condition:
$$\frac{R}{g Z_C} \le \exp[-1] \;\Rightarrow\; Z_C \ge \frac{R}{g}\exp[1]$$
From Equations (15)–(17), this is a general result for the linear model: That is, the same expression—with different right-hand terms—is ‘easily’ found for the other probability distributions leading to Equation (13).
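A short numerical sketch of the Boltzmann-model cost curve and its Lambert-W inversion for n (R = g = 1 is purely illustrative):

```python
import numpy as np
from scipy.special import lambertw

R, g = 1.0, 1.0  # illustrative values

def Z_C(n):
    # Z_C = n R / (g ln(n)), the Boltzmann case of the linear model
    return n * R / (g * np.log(n))

def n_of_ZC(ZC, m=0):
    # Inversion for n via the Lambert W-function, branch m
    z = -R / (g * ZC)
    return (lambertw(z, m) / z).real

# The two real W-branches recover the two n-values sharing a given Z_C,
# since n / ln(n) is nonmonotonic:
for n in (2.0, 8.0, 32.0):
    ZC = Z_C(n)
    print(n, ZC, n_of_ZC(ZC, 0), n_of_ZC(ZC, -1))

# Necessary condition for any real solution: Z_C >= (R/g) * e
print("minimum sustainable Z_C:", (R / g) * np.e)
```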

5. Extending the Model II

5.1. The Basic Idea

The burdens of fog, friction, and adversarial intent characterize evolutionary selection pressures at every scale and level of organization of cognitive processes, whether biological, social, institutional, machine, or their various composites.
Here, in addressing these matters, we again assume the equipartition implied by Equation (7) as a basic condition, and impose a kind of ‘Born-Oppenheimer Approximation’ [28] in which large-scale changes to the overall system are assumed to occur slowly enough that equipartition is ‘almost always’ sufficiently satisfied.
Under that basic assumption, friction is ‘easily’ introduced by extending the next-to-last relation of Equation (11), so that
$$S \equiv -F(X) + X\,\frac{d}{dX}F(X)$$
$$dX/dt \propto dS/dX = X\, d^2F/dX^2 = f(X)$$
$$F(X) = \int\!\!\int \frac{f(X)}{X}\, dX\, dX + C_1 X + C_2$$
Here, we take the following:
$$f(X) = \beta - \alpha X(t)$$
$$X(t) = \frac{\beta}{\alpha}\left(1 - \exp[-\alpha t]\right) \to \beta/\alpha$$
so that α becomes the (inverse) index of Clausewitzian friction, i.e., the delay in reaching the maximum possible value of X.
Then
$$F(X) = \beta\left(X\ln(X) - X\right) - \frac{\alpha X^2}{2} + C_1 X + C_2$$
where C 1 and C 2 are important local boundary conditions that emerge ‘naturally’ from solving the differential equations. Solutions are indeed sensitive to these boundary conditions.
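Since solutions are sensitive to the boundary constants, it is worth checking the integration symbolically; a minimal sympy verification that this F(X) satisfies dS/dX = X d²F/dX² = f(X) = β − αX:

```python
import sympy as sp

# Symbolic check of the exponential-friction free energy F(X).
X, alpha, beta, C1, C2 = sp.symbols('X alpha beta C1 C2', positive=True)

F = beta * (X * sp.log(X) - X) - alpha * X**2 / 2 + C1 * X + C2
f = beta - alpha * X

# Defining relation: X d^2F/dX^2 = f(X)
assert sp.simplify(X * sp.diff(F, X, 2) - f) == 0

# The Legendre-transform entropy S = -F + X dF/dX then satisfies
# dS/dX = X F'' = f(X), as in the Onsager development above.
S = -F + X * sp.diff(F, X)
assert sp.simplify(sp.diff(S, X) - f) == 0
print("verified")
```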
From Equation (11), depending on the underlying probability distribution ρ ( X ) , recall that
$$R(n, X) = \frac{F(X)}{X\, Q(n, X)},\qquad \rho(Q(n, X)) \equiv n\,\rho(X)$$
where Q is characteristic of that distribution. Again, in general, n V ( n ) .
Assuming monotonic Boltzmann and nonmonotonic Rayleigh and second-order Erlang Distributions, i.e., ρ(x) = exp[−x], x exp[−x²/2], x exp[−x], the Q(n, X) are, respectively,
$$Q = X - \ln(n)$$
$$Q = \sqrt{-W\left(m, -X^2 n^2 e^{-X^2}\right)}$$
$$Q = -W\left(m, -n X e^{-X}\right)$$
where W(m, x) is the Lambert W-function of order m = 0, −1.
Recall that F is defined by d S / d X = X d 2 F / d X 2 = d X / d t = f ( X ) so that F is given as the last relation of Equation (20).
Fixing α = 1 in the expression for F based on the exponential relation f(X) = β − αX, with C_1 = 1, C_2 = 2, then R(n, β) is as in Figure 3a–c. Note the two solutions for the Rayleigh and Erlang Distributions, depending on the order of the Lambert W-function as indicated.
Recall that R is an average channel capacity for control across the full system: This is not a trivial matter. Here, relatively narrow troughs are bracketed by high—essentially unattainable—demands for bandwidth under friction measured by α .

5.2. Fog and Friction

In addition to the burden of Clausewitzian ‘friction’, inherent to the relation dX/dt = f(X(t)) → 0, the effect of ‘fog-of-war’ can be explored by expanding the expression for dX/dt in Equation (20) as a stochastic differential equation:
$$dX_t = f(X_t)\, dt + \sigma X_t\, dB_t$$
where d B t is taken as Brownian white noise, and the last term represents ‘ordinary’ volatility, here in the context of two models, the ‘exponential’ and the ‘Arrhenius’, given a Boltzmann Distribution and n cognitive modules with the equipartition assumption on X.
The exponential model follows Equations (21) and (22). For the Arrhenius model,
$$X(t) = \beta\exp[-\alpha/t]$$
$$dX/dt = f(X) = \frac{X}{\alpha}\ln\left(\frac{X}{\beta}\right)^2$$
$$F(X) = \frac{X^2\ln\left(\frac{X}{\beta}\right)^2}{2\alpha} - \frac{3 X^2\ln\left(\frac{X}{\beta}\right)}{2\alpha} + \frac{7 X^2}{4\alpha} + C_1 X + C_2$$
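A minimal sympy check that the Arrhenius-model F(X) above satisfies the defining relation X d²F/dX² = f(X):

```python
import sympy as sp

# Symbolic check of the Arrhenius-model free energy F(X).
X, alpha, beta, C1, C2 = sp.symbols('X alpha beta C1 C2', positive=True)

L = sp.log(X / beta)
F = (X**2 * L**2) / (2 * alpha) - (3 * X**2 * L) / (2 * alpha) \
    + (7 * X**2) / (4 * alpha) + C1 * X + C2
f = (X / alpha) * L**2

# X d^2F/dX^2 must reproduce f(X) = (X/alpha) ln(X/beta)^2
assert sp.simplify(X * sp.diff(F, X, 2) - f) == 0
print("Arrhenius F(X) verified")
```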
Assuming the Boltzmann Distribution–so that Q = X − ln(n) in Equation (23), where n is the number of linked modules–we calculate the nonequilibrium steady state conditions ⟨dR_t⟩ = 0 for both examples using the Ito Chain Rule [29]. The resulting expressions are relatively simple, but too lengthy for the author’s LaTeX compiler.
For both forms of F, exponential and Arrhenius, α = 1 , β = 3 , C 1 = 1 , C 2 = 3 . Figure 4 shows the resulting NSS solution sets, expressed as X ( n , σ ) . As n and σ increase, the solution sets are destabilized.
It is interesting to repeat this calculation for the second-order Erlang Distribution, which represents a process of two sequential, identical Exponential Distributions (e.g., a two-fold staged decision), using the convolution integral method described in the Mathematical Appendix A. Then
$$\rho(x) = x\exp[-x]$$
$$Q(n, X) = -W\left(m, -n X e^{-X}\right)$$
where, again, W ( m , x ) is the Lambert W-function of order m.
Taking all other parameter values exactly as in Figure 4, Figure 5 again represents the nonequilibrium steady state solution sets X ( n , σ ) for the Arrhenius and exponential models defining d X / d t = f ( X ) . The solution sets become counterintuitively convoluted with increasing ‘noise’ σ .
These examples show how Clausewitzian ‘friction’ defined by α in the expressions for d X / d t = f ( X ) becomes idiosyncratically synergistic with ‘fog’ uncertainties characterized by σ . That is, what appear to be ‘slight’ differences in frictional dynamics—different forms of delay in response to perturbation—can have an amplified impact on the dynamics of uncertainty in response to perturbation. Similar results will afflict all probability models across any modular conformations and forms of the friction relation f ( X ) .
These results are not encouraging, as any embodied cognitive system, dynamically tuned modular or otherwise, will always encounter rising rates of ‘noise’ in chaotic, unsecured, or adversarial environments, given the necessarily delayed response to such perturbations.
The statistical tools derived from this general approach will likely be as flexible for the study of cognitive phenomena as have been the ‘ordinary’ regression equations used in the analysis of more usual observational and experimental data, in particular, providing benchmarks against which to compare observed patterns.
Cognition, cognitive dynamics—and their inherent failure modes—are hard to understand and remediate, and their effective remediation requires new methods and approaches.
What seems to emerge from the synergism between uncertainty and frictional delay in response to that uncertainty in cognitive dynamics is the dire need for analogs to the long-evolved innate and adaptive immune systems, i.e., a basic, rapid ‘implanted’ response, followed by a more detailed ‘learned’ response.

6. Groupoid Symmetry-Breaking Phase Transitions

There is yet more to be teased out from behind the veils of Clausewitzian Fog-and-Friction. So far, we have treated ‘underlying probability distributions’ as the central, fixed star in a kind of planetary dynamic, leading to the methodological harvests of Equation (13) and what follows from the network relation $\rho(Q) = n\rho(X)$ (recalling that, in general, $n \rightarrow V(n) \propto n^{\alpha}$).
What happens when ρ ( X ) is itself perturbed by fog and friction?
Most simply, this can be found by calculating the nonequilibrium steady state:
$$\langle d\rho_t \rangle = 0$$
based on the generalized ‘frictional’ stochastic differential equation
$$dX_t = f(X_t)\,dt + \sigma h(X_t)\,dB_t$$
using the Ito Chain Rule.
The relation defining the NSS is ‘easily’ found as follows:
$$\sigma^2 = \frac{-2 f(X)\,\frac{d}{dX}\rho(X)}{h(X)^2\,\frac{d^2}{dX^2}\rho(X)} = \frac{-2X\,\frac{d^2}{dX^2}F(X)\,\frac{d}{dX}\rho(X)}{h(X)^2\,\frac{d^2}{dX^2}\rho(X)}$$
where the second form follows from the second expression in Equation (20), i.e., $dX/dt \propto dS/dX = X\, d^2F/dX^2 = f(X)$.
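Equation (29) can be sanity-checked numerically. A minimal sketch, assuming the signed first form $\sigma^2 = -2f(X)\rho'(X)/(h(X)^2\rho''(X))$, with $\rho(X) = \exp[-X]$, $h(X) = X$, and the exponential friction model $f(X) = \beta - \alpha X$, derivatives taken by central differences:

```python
import math

alpha, beta = 1.0, 3.0

def rho(x):
    return math.exp(-x)                        # Boltzmann density

def d1(g, x, h=1e-5):
    return (g(x + h) - g(x - h)) / (2 * h)     # central first derivative

def d2(g, x, h=1e-5):
    return (g(x + h) - 2 * g(x) + g(x - h)) / h**2

def f_exp(x):
    return beta - alpha * x                    # exponential friction model

def sigma2(X):
    """Equation (29), first form, signs restored: -2 f rho' / (h^2 rho'')."""
    return -2.0 * f_exp(X) * d1(rho, X) / (X**2 * d2(rho, X))

# For rho = exp(-X) and h(X) = X this reduces to 2(beta - alpha*X)/X^2
assert abs(sigma2(2.0) - 2.0 * (beta - alpha * 2.0) / 4.0) < 1e-4
```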
Equation (29) generates solution set equivalence classes { σ , X } that can be seen as arising from the functions X ( σ ) or σ ( X ) .
1. For reasons that will become more salient, if not intrusive, we are principally concerned with these equivalence classes rather than with the functions themselves.
2. The term
$$\left|\frac{d\rho/dX}{d^2\rho/dX^2}\right|$$
may be interpreted as the scale of change or ‘characteristic length’ relating how sharply the density changes versus its curvature at a point. It is analogous to, but different from, the classic ‘hazard rate’ at which signals are detected. For the Exponential Distribution $m\exp[-mX]$, the hazard rate is $m$ and the characteristic length $1/m$.
3. For convenience, we adopt ‘simple volatility’, i.e., h ( X ) = X , with f ( X ) as the exponential and Arrhenius models from above for, respectively, the Boltzmann and Second Order Erlang Distributions, and plot σ ( X ) to generate Figure 6.
Again, the second-order Erlang represents the convolution of two identical exponential distributions, akin to replacing ‘mission command’ initiative with a two-stage ‘detailed command’ protocol in military or business enterprise [30].
The distributions are, respectively, monotonic and nonmonotonic. The results, in Figure 6, are a Boltzmann-exponential, b Boltzmann–Arrhenius, c Erlang-exponential, and d Erlang–Arrhenius. Again,
$$\rho(X) = \exp[-X],\; X\exp[-X] \qquad f(X) = \beta - \alpha X,\; \frac{X}{\alpha}\ln(X/\beta)^2$$
Here, α = 1 , β = 3 in both expressions for f ( X ) .
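The character of these solution sets can be examined without plotting. A minimal sketch, assuming the signed NSS relation $\sigma^2 = -2f\rho'/(h^2\rho'')$ with the exponential friction model: the Boltzmann case admits real $\sigma$ on a single interval, while the second-order Erlang case is already disconnected:

```python
import math

alpha, beta = 1.0, 3.0

def sigma2(X, which):
    """NSS relation with f(X) = beta - alpha*X, h(X) = X, signs restored."""
    f = beta - alpha * X
    if which == "boltzmann":     # rho = exp(-X): rho'/rho'' = -1
        return 2.0 * f / X**2
    if which == "erlang2":       # rho = X exp(-X): rho' ~ (1-X), rho'' ~ (X-2)
        return -2.0 * f * (1.0 - X) / (X**2 * (X - 2.0))
    raise ValueError(which)

# Boltzmann-exponential: one admissible interval, 0 < X < beta/alpha = 3
assert sigma2(2.0, "boltzmann") > 0 and sigma2(4.0, "boltzmann") < 0
# Erlang-exponential: the admissible set is disconnected, (0,1) U (2,3)
assert sigma2(0.5, "erlang2") > 0 and sigma2(2.5, "erlang2") > 0
assert sigma2(1.5, "erlang2") < 0 and sigma2(3.5, "erlang2") < 0
```

Real $\sigma$ exists only where $\sigma^2 \geq 0$; the disconnected Erlang admissible set is one concrete face of the convoluted solution geometry described above.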
These distinct solution sets, representing markedly different fundamental interactions between internal probabilities, ‘friction’ from α and β , and ‘fog’ from σ , can be grouped together as equivalence classes even as the internal parameters defining ρ ( X ) and f ( X ) are varied.
As the Mathematical Appendix A explores in some detail, equivalence classes lead to an important generalization of elementary ideas of group symmetry, i.e., groupoids, for which products are not defined between all possible element pairings [31].
The simplest groupoid is the disjoint union of groups. Imagine two bags containing the same number of identical ball bearings, each of which can be characterized by the three-dimensional rotation group. Remove a ball bearing from one bag, make a deep indentation in it with a hard-metal punch, and return it to the original bag. The disjoint union of three-dimensional rotation groups in one of the bags has now been broken. One bearing is now symmetric only via rotation about an axis defined by the indentation.
The physical act of striking a ball bearing with a hard-metal punch has triggered a groupoid symmetry-breaking phase transition, analogous to, but distinct from, group symmetry-breaking phase transitions such as melting a snow crystal, evaporating a liquid, or burning a diamond.
Each panel of Figure 6 defines a particular set of equivalence classes that, via the somewhat mind-bending arguments of the Appendix A, defines a groupoid. Changes within fog/friction/probability patterns under the perturbations of external circumstances—like the hard-metal punch on the ball bearing—trigger punctuated groupoid symmetry-breaking phase transitions in the kind of systems we have studied. That is, sufficiently challenged cognitive structures will experience sudden shifts among the panels of a vastly enlarged version of Figure 6.
A more extensive discussion of groupoid symmetry-breaking phase transitions by the Perplexity AI system is found in the Appendix A.
Cognition, even with regulation, is a castle made of sand.

7. Iterated System Temperature

Subsystems of living organisms consume metabolic free energy (MFE) at markedly different rates. For example, among humans, Heart/Kidneys, Brain, and Liver typically require, respectively, 440, 240, and 200 kcal/kg/day of MFE, while resting Skeletal Muscles and Adipose Tissue require only 13 and 4.5 kcal/kg/day ([32], Table 4).
These marked differences, however, are encapsulated within a ‘normal body temperature’ of ≈98.6 °F.
This is not as simple as it seems, nor as treated in the first iteration of Equations (4) and (10) above.
Here, we have, via an ‘equipartition’ argument, condensed closely interacting nonequilibrium steady state subcomponents of a larger enterprise in terms of the quantity X from Equations (20)–(29). Taking the perspective of the previous section, more can be done, in particular, leading to the definition of an ‘iterated temperature’ construct that seems akin to the normal body temperature.
The essential assumption is, again, that, under the stochastic burden of Equation (28), the basic underlying probability distribution is preserved in a nonequilibrium steady state < d ρ t > = 0 , leading to the second form of Equation (29). Then the ‘free energy’ construct F ( X ) is expressed as follows:
$$F(X) = -\int\!\!\!\int \frac{\sigma^2\, h(X)^2\, \frac{d^2}{dX^2}\rho(X)}{2X\, \frac{d}{dX}\rho(X)}\; dX\, dX + C_1 X + C_2$$
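For the Boltzmann case $\rho(X) = \exp[-X]$, $h(X) = X$, the integrand of Equation (31) collapses to $\sigma^2 X/2$ and the double integral to $\sigma^2 X^3/12$, which crude numerical double integration reproduces (a sketch, taking $C_1 = C_2 = 0$):

```python
import math

sigma = 1.0

def integrand(X):
    """-sigma^2 h(X)^2 rho''(X) / (2 X rho'(X)) for rho = exp(-X), h(X) = X."""
    rho1 = -math.exp(-X)                     # d rho / dX
    rho2 = math.exp(-X)                      # d^2 rho / dX^2
    return -sigma**2 * X**2 * rho2 / (2.0 * X * rho1)

def F(Xmax, steps=4000):
    """Double integration of Equation (31) with C1 = C2 = 0 (midpoint rule)."""
    h = Xmax / steps
    Fp = val = 0.0
    for i in range(steps):
        x = h * (i + 0.5)                    # midpoints avoid X = 0 exactly
        Fp += integrand(x) * h               # running dF/dX
        val += Fp * h                        # running F
    return val

X = 2.0
assert abs(F(X) - sigma**2 * X**3 / 12.0) < 1e-3
```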
Recall now, from above, the embodied Rate Distortion Control Theory partition function argument defining ‘free energy’ F in terms of a temperature analog g:
$$\rho(F/g) = \int_0^{\infty} \rho(R/g)\, dR = g$$
where $g$ has become an iterated function of $X$, the equipartition energy measure. From here on, we will write this as $G^*$ to differentiate it from the ‘simple’ temperature analog of previous studies.
That is, here, we are interested in the solution to the following relation:
$$\rho(F/G^*) = G^*(X)$$
given the NSS condition  < d ρ t > = 0 , i.e., the quasi-preservation of the underlying basic probability distribution of the overall system.
We are, in a sense, reprising the calculations of Equations (4) and (10) from a different perspective.
More specifically, we contrast a simple one-stage ‘mission command’ system based on a Boltzmann Distribution with a two-stage ‘detailed command’ process that passes decisions across two sequential Boltzmann stages, giving a process described overall by a second-order Erlang Distribution. Under the impact of ‘noise’ defined by Equation (28), each ‘cultural’ form attempts homeostasis, the preservation of its inherent nature, i.e., < d ρ t > = 0 .
Thus, we, respectively, set ρ ( x ) = exp [ x ] , x exp [ x ] and assume ordinary volatility h ( X ) = X . Then, again via the Ito Chain Rule calculation [29],
$$F_B(\sigma,X) = \frac{1}{12}\sigma^2X^3 + C_1X + C_2, \qquad G_B^* = \frac{-F_B}{W(m, -F_B)}$$
$$F_E(\sigma,X) = \frac{\sigma^2}{4}\left(\frac{X^3}{3} - X^2 - 2\ln(X-1)(X-1) + 2X\right) + C_1X + C_2, \qquad G_E^* = \frac{-F_E}{2\,W\!\left(m,\, \pm\frac{\sqrt{F_E}}{2}\right)}$$
where $W(m,x)$ is the Lambert W-function of orders $m = 0, -1$. The $G^*(X,\sigma)$ expressions are then the iterated temperatures of the entire system under the reduced equipartition defining $X$ across subsystems.
Here, we adopt the negative-$\sqrt{F_E}$ form of the W-function argument in $G_E^*$, as the positive form generates only negative temperatures that, in physical systems, are associated with grossly unstable ‘excited states’.
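Assuming the signed form $G_B^* = -F_B/W(m, -F_B)$, under which the Boltzmann instantiation of the defining relation reads $\exp[-F/g] = g$, both the relation and the instability threshold can be checked directly. A minimal stdlib sketch (the Newton-iteration `lambert_w` stands in for, e.g., `scipy.special.lambertw`):

```python
import math

def lambert_w(x, branch=0, tol=1e-12):
    """Newton iteration for the real Lambert W branches W_0 and W_{-1}."""
    if x < -math.exp(-1):
        raise ValueError("no real W branch for x < -1/e")
    w = math.log1p(x) if branch == 0 else -2.0
    for _ in range(200):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (1.0 + w))
        w -= step
        if abs(step) < tol:
            break
    return w

def G_B(F, branch=0):
    """Iterated temperature, Boltzmann culture: G* = -F / W(m, -F)."""
    return -F / lambert_w(-F, branch)

F = 0.25                                    # below the critical value exp(-1)
g = G_B(F)
assert abs(math.exp(-F / g) - g) < 1e-9     # defining relation rho(F/g) = g
try:                                        # past F = exp(-1): no real temperature
    G_B(0.5)
    assert False, "expected instability"
except ValueError:
    pass
```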
Figure 7a,b show the two forms of $G^*(X,\sigma)$, setting $C_1 = 1$, $C_2 = 3$, with $m = 0, -1$ as indicated.
Sufficient ‘noise’ $\sigma$ drives both systems into zones of instability where the relation $\langle d\rho_t\rangle = 0$ fails. Via the branch-point properties of the W-function, this occurs for, respectively, $F_B(X,\sigma) > \exp[-1]$ and $\sqrt{F_E}/2 > \exp[-1]$ or $F_E < 0$. The contrast between the stable zones is noteworthy: the Detailed Command model is at best only marginally stable compared with our version of Mission Command.
Again, each mode of the underlying ‘culture’, when stressed, attempts to maintain homeostasis as the nonequilibrium steady state < d ρ t > = 0 .
It is ‘not difficult’ to extend these results across orders of iteration of the underlying Boltzmann Distribution via a k-stage ‘cancer initiation’ model based on the Gamma Distribution:
$$\rho(m,k,X) = \frac{m^k X^{k-1} e^{-mX}}{\Gamma(k)}$$
where Γ ( x ) is the Gamma Function that generalizes the factorial function for integers.
Then the NSS condition $\langle d\rho_t\rangle = 0$, under the Ito Chain Rule, gives the following:
$$\frac{d^2}{dX^2}F(X) = \sigma^2\,\frac{m^2X^2 - 2mXk + 2mX + k^2 - 3k + 2}{2mX - 2k + 2}, \qquad G^* = \frac{-mF}{k\,W\!\left(n,\, -\frac{\Gamma(k)^{1/k}F^{1/k}}{k}\right)}$$
The first relation can easily be solved as F = σ 2 H ( m , k , X ) + C 1 X + C 2 in an expression too long to be written on this page.
We can conceptually simplify matters slightly by generating a three-dimensional ‘criticality surface’ $\{k, \sigma, X\}$ across the decision level index $k$ by (1) using the necessary condition for real-valued Lambert W-functions of orders $n = 0, -1$, and (2) setting $m = 1$, while varying the boundary conditions $C_1, C_2$. This is done via the second expression of Equation (36):
$$\frac{\Gamma(k)^{1/k}F^{1/k}}{k} = e^{-1} \;\Rightarrow\; F = \frac{k^k e^{-k}}{\Gamma(k)}$$
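The criticality relation above can be verified mechanically: at $F = k^k e^{-k}/\Gamma(k)$, the magnitude of the W-function argument equals the branch-point value $e^{-1}$ identically in $k$. A stdlib check:

```python
import math

def w_argument(k, F):
    """Magnitude of the Lambert W argument in Equation (36), with m = 1."""
    return math.gamma(k) ** (1.0 / k) * F ** (1.0 / k) / k

def critical_F(k):
    """Equation (37): the free energy at which that argument hits exp(-1)."""
    return k**k * math.exp(-k) / math.gamma(k)

for k in (2.0, 3.0, 4.0, 7.5):
    assert abs(w_argument(k, critical_F(k)) - math.exp(-1)) < 1e-12
```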
The result is shown in Figure 8a,b for two boundary condition pairs: { C 1 = 1 , C 2 = 3 } and { C 1 = 1 , C 2 = 2 } .
One possible inference from this development is that the approach we have taken permits the development of a very rich body of statistical tools for data analysis. Like all such matters, however, error estimates and basic validity entail a correspondingly rich range of possible devils-in-the-details.
In the Mathematical Appendix, we explore a ‘higher iteration’ in the detailed command structure, i.e., doubling the two-level dynamic based on the second-order Erlang Distribution. The figure shows an initial two-stage ‘tactical’ decision level, followed by a second two-stage ‘operational’ or ‘strategic’ command level. The results are not reassuring.
It is important to note that other selection pressures may operate. That is, Equation (31) assumes the most essential matter is the preservation of the underlying probability distribution through the nonequilibrium steady state < d ρ t > = 0 . What if environmental selection pressure enforces the preservation of some other quantity, for example, a ‘Weber-Fechner’ sensory signal detection mode Q ( X ) = ln ( 1 + X ) instead of ρ ( X ) ? Then < d Q t > = 0 , under the Ito Chain Rule, leads to the following:
$$F_Q(\sigma,X) = \frac{\sigma^2}{2}\left(\frac{X^2}{2} - \ln(1+X)(1+X) + 1 + X\right) + C_1X + C_2$$
$$G_B^* = \frac{-F_Q}{W(m, -F_Q)}, \qquad G_E^* = \frac{-F_Q}{2\,W\!\left(m,\, -\frac{\sqrt{F_Q}}{2}\right)}$$
Taking C 1 = 1 , C 2 = 3 again produces something much like Figure 7.
We can take this one step further by defining a ‘collective cognition rate’ based on the underlying probability distribution of the overall multi-modular system, given the Rate Distortion Control Theory model in Figure 1. The essential point is to define ‘system temperature’ as G * from Equations (34) or (38). Then, given the distribution form ρ , and a basic Data Rate Theorem stability limit R 0 , the cognition rate, in analogy with a chemical physics reaction rate approach (e.g., Laidler [33]) becomes the following:
$$L \propto \frac{\int_{R_0}^{\infty}\rho(R/G^*)\,dR}{\int_0^{\infty}\rho(R/G^*)\,dR}$$
For the particular Boltzmann and second-order Erlang Distributions studied here, Figure 7 and friends are transformed as follows:
$$G^* \rightarrow \exp[-R_0/G^*], \qquad G^* \rightarrow \exp[-R_0/G^*]\left(1 + \frac{R_0}{G^*}\right)$$
Details are left as an exercise. The general forms of L ( X , σ ) are, however, similar to Figure 7.
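Part of that exercise can be sketched numerically: for the Boltzmann and second-order Erlang forms, the normalized ratio of Equation (40) integrates to $\exp[-R_0/G^*]$ and $(1+R_0/G^*)\exp[-R_0/G^*]$, respectively. A stdlib trapezoid check:

```python
import math

def tail_ratio(rho, R0, G, cutoff=60.0, steps=60000):
    """Numeric Equation (40): tail mass over total mass of rho(R/G)."""
    h = cutoff / steps
    num = den = 0.0
    for i in range(steps + 1):
        r = i * h
        w = 0.5 if i in (0, steps) else 1.0
        v = w * rho(r / G)
        den += v
        if r >= R0:
            num += v
    return num / den

boltz = lambda x: math.exp(-x)          # Boltzmann
erlang = lambda x: x * math.exp(-x)     # second-order Erlang

R0, G = 1.0, 2.0
assert abs(tail_ratio(boltz, R0, G) - math.exp(-R0 / G)) < 1e-3
assert abs(tail_ratio(erlang, R0, G) - (1 + R0 / G) * math.exp(-R0 / G)) < 1e-3
```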
Note that the ‘Boltzmann’ part of Equation (40) produces the classic inverted-U ‘Yerkes-Dodson Effect’ pattern of cognition rate vs. ‘arousal’ X across a variety of plausible < d Q t > = 0 nonequilibrium steady state restrictions [30,34].

8. Discussion

There seem to be ‘universality classes’ for cognitive systems, determined by the nature of the underlying probability distributions, e.g., monotonic or nonmonotonic, and by the nature of their embedding environments. Other examples may become manifest across different cognitive phenomena, scales, and/or levels of organization.
In particular, the transcendental Lambert W-function is used to solve equations in which the unknown quantity occurs both in the base and in the exponent, or both inside and outside of a logarithm [35]. Similarly, problems from thermodynamic equilibrium theory with fixed heat capacities parallel the results shown in Figure 2.
Indeed, Wang and Moniz [36] make explicit use of the Lambert W-function in the study of classical physical system phase transitions, using arguments that are recognizably similar to those developed here for cognitive phase transitions. Likewise, Roberts and Vallur [37] use the W-function in their study of the zeros of the QCD partition function, another phase transition analysis.
The Perplexity AI system [38] contrasts physical and cognitive phase transition models in these terms:
[Table contrasting physical and cognitive phase transition models.]
Both physical and ‘cognitive’ theories are based on probability distribution arguments ‘in which the unknown quantity occurs both in the base and in the exponent, or both inside and outside of a logarithm’. Cognitive phenomena, of course, act over much richer and more dynamic probability landscapes. Cognitive system probability models will, consequently, require sophisticated and difficult conversions into statistical tools useful for the analysis of observational and experimental data.
Khinchin [39] explored in great detail the rigorous mathematical foundations of statistical mechanics, based on such asymptotic limit theorems of probability theory as the Central Limit and Ergodic Theorems. Here, we outline the need for the parallel, yet strikingly different, development of a rigorous ‘statistical mechanics of cognition’ based on the asymptotic limit theorems of information and control theories. Such developments must, however, inherently reflect the directed homotopy and groupoid symmetry-breaking mechanisms inherent to cognitive phenomena.
There are important matters buried in this formalism:
  • We study systems at various scales and levels of organization that are, by Figure 1, fully embodied in the sense of Dreyfus [14], so that their cognitive dynamics may be, or become, ‘intelligent’ in his sense.
  • X is basically a ‘rate’ variate, and the existence of a minimum stable $X_C$ for cooperating cognitive/regulation dyads is consistent with John Boyd’s OODA Loop analysis, in which a central aim is to force the rate of confrontation to exceed the operating limit of one’s opponent, breaking homeostatic or allostatic ‘stalemate’. Predator/prey cycles of post and riposte come to mind.
  • The cost of pairing cognitive with regulatory submodules, which is necessary for even basic stability in dynamic environments, scales sublinearly with the inherent doubling of modular numbers across a considerable range of basic underlying probability distributions likely to characterize embodied cognition. This appears to be a general matter with important evolutionary implications across biological, social, institutional, and machine domains.
  • Nonetheless, in spite of ‘best case’ assumptions and deductions, failure of nominally ‘effective’ cognition in dynamic environments is not a ‘bug’, but is an inevitable feature of the cognitive process of any and all configurations, at any and all scales and levels of organization, including systems supposedly stabilized by close coupling between cognitive and regulatory submodules. See the Appendix for a surprisingly broad literature review by the Perplexity AI system.
The last point has particularly significant implications. A recent study of nuclear warfare command, control, and communications (NC3) by Saltini et al. [40] makes the assertion
Because… [strategic warning, decision support and adaptive targeting]… are independent and tightly coupled, the potential role of AI is not limited to a single application; instead, it can simultaneously enhance multiple aspects of the NC3 architecture.
The introduction of AI entities—even when tightly coupled with draconian regulatory modules—cannot be assumed to significantly stabilize or enhance NC3 or any other critical system. Placing NC3 architectures under AI control merely shifts, obscures, and obfuscates failure modes.
This being said, as a reviewer has noted, the approach advocated here has a number of limitations:
  • The $g$ and $\rho$ families used here are simplifications. Real systems may deviate. Sensitivity analysis—varying $\sigma$ and the SDE specifications—helps but does not establish universality.
  • Validation has been minimal by design. Thresholds/phase transitions should be viewed as testable predictions, not established empirical facts.
  • Results may not hold in adversarial, highly non-stationary, or strongly path-dependent settings without additional assumptions or new proofs.
Again, this being said, on the basis of our—and much similar—work, the Precautionary Principle should constrain the use of AI in such safety-critical domains as NC3.
Further work will be needed to fully characterize the effects of Clausewitzian fog and friction across the full set of probability distributions likely to underlie a wide spectrum of observed cognitive phenomena, but what emerges from this analysis is the ubiquity of failure-under-stress even for ‘intelligent’ embodied cognition, even when cognitive and regulatory modules are closely paired.
There is still No Free Lunch, much in the classic sense of Wolpert and Macready [3,4].

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy. Text available from the author on request.

Acknowledgments

The author thanks D.N. Wallace for useful discussions and the reviewers for careful reading and comments valuable in revision.

Conflicts of Interest

The author declares there is no conflict of interest.

Appendix A

Appendix A.1. Groupoids

Following Weinstein [31], a groupoid $G$ is defined by a base set $A$ upon which some mapping—a morphism—can be defined. Note that not all possible pairs of states $(a_j, a_k)$ in the base set $A$ can be connected by such a morphism. Those that can be so connected define the groupoid elements: morphisms $g = (a_j, a_k)$ having the natural inverse $g^{-1} = (a_k, a_j)$. Given such a pairing, it is possible to define ‘natural’ end-point maps $\alpha(g) = a_j$, $\beta(g) = a_k$ from the set of morphisms $G$ into $A$, and an associative product $g_1g_2$ in the groupoid, defined provided $\beta(g_1) = \alpha(g_2)$, with $\alpha(g_1g_2) = \alpha(g_1)$ and $\beta(g_1g_2) = \beta(g_2)$. Where defined, the product is associative: $(g_1g_2)g_3 = g_1(g_2g_3)$.
In addition, there are natural left and right identity elements λ g , ρ g such that λ g g = g = g ρ g [31].
An orbit of the groupoid G over A is an equivalence class for the relation a j G a k if and only if there is a groupoid element g with α ( g ) = a j and β ( g ) = a k . Following Cannas da Silva and Weinstein [41], we note that a groupoid is called transitive if it has just one orbit. The transitive groupoids are the building blocks of groupoids in that there is a natural decomposition of the base space of a general groupoid into orbits. Over each orbit, there is a transitive groupoid, and the disjoint union of these transitive groupoids is the original groupoid. Conversely, the disjoint union of groupoids is itself a groupoid.
The isotropy group of $a \in X$ consists of those $g$ in $G$ with $\alpha(g) = a = \beta(g)$. These groups prove fundamental to classifying groupoids.
If $G$ is any groupoid over $A$, the map $(\alpha, \beta): G \rightarrow A \times A$ is a morphism from $G$ to the pair groupoid of $A$. The image of $(\alpha, \beta)$ is the orbit equivalence relation $\sim_G$, and the functional kernel is the union of the isotropy groups. If $f: X \rightarrow Y$ is a function, then the kernel of $f$, $\ker(f) = \{(x_1, x_2) \in X \times X : f(x_1) = f(x_2)\}$, defines an equivalence relation.
Groupoids may have additional structure. As Weinstein [31] explains, a groupoid $G$ is a topological groupoid over a base space $X$ if $G$ and $X$ are topological spaces and $\alpha$, $\beta$ and multiplication are continuous maps. A criticism sometimes applied to groupoid theory is that its classification up to isomorphism is nothing other than the classification of equivalence relations via the orbit equivalence relation and groups via the isotropy groups. The imposition of a compatible topological structure produces a nontrivial interaction between the two structures. For example, introducing a metric structure on manifolds of related information sources produces such interaction.
In essence, a groupoid is a category in which all morphisms have inverses, here defined in terms of a connection to a base point via a meaningful path of an information source dual to a cognitive process.
Again, from Weinstein [31], the morphism ( α , β ) suggests another way of looking at groupoids. A groupoid over A identifies not only which elements of A are equivalent to one another (isomorphic), but it also parameterizes the different ways (isomorphisms) in which two elements can be equivalent, i.e., in our context, all possible information sources dual to some cognitive process. Given the information theoretic characterization of cognition presented above, this produces a full modular cognitive network in a highly natural manner.
Brown [42] describes the fundamental structure in these terms:
A groupoid should be thought of as a group with many objects, or with many identities… A groupoid with one object is essentially just a group. So the notion of a groupoid is an extension of that of groups. It gives an additional convenience, flexibility, and range of applications…
EXAMPLE 1. A disjoint union [of groups] $G = \sqcup_{\lambda} G_{\lambda}$, $\lambda \in \Lambda$, is a groupoid: the product $ab$ is defined if and only if $a, b$ belong to the same $G_{\lambda}$, and $ab$ is then just the product in the group $G_{\lambda}$. There is an identity $1_{\lambda}$ for each $\lambda \in \Lambda$. The maps $\alpha$, $\beta$ coincide and map $G_{\lambda}$ to $\lambda$, $\lambda \in \Lambda$.
EXAMPLE 2. An equivalence relation $R$ on [a set] $X$ becomes a groupoid with $\alpha, \beta: R \rightarrow X$ the two projections and product $(x,y)(y,z) = (x,z)$ whenever $(x,y), (y,z) \in R$. There is an identity, namely $(x,x)$, for each $x \in X$.
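EXAMPLE 1 is easily made concrete in code: a minimal sketch of the disjoint union of two cyclic groups as a groupoid, with a partial product that is undefined across components (the component labels and group orders here are purely illustrative):

```python
# Disjoint union of two cyclic groups as a groupoid (EXAMPLE 1):
# elements are pairs (lam, a); the product is defined only within a component.
ORDERS = {"lam1": 3, "lam2": 4}   # component label -> cyclic group order

def compose(g, h):
    (lg, a), (lh, b) = g, h
    if lg != lh:
        return None                            # undefined across components
    return (lg, (a + b) % ORDERS[lg])

def inverse(g):
    lam, a = g
    return (lam, (-a) % ORDERS[lam])

x, y, z = ("lam1", 2), ("lam1", 2), ("lam2", 1)
assert compose(x, y) == ("lam1", 1)            # 2 + 2 = 1 (mod 3)
assert compose(x, z) is None                   # no product between components
assert compose(x, inverse(x)) == ("lam1", 0)   # component identity
g1, g2, g3 = ("lam2", 1), ("lam2", 2), ("lam2", 3)
assert compose(compose(g1, g2), g3) == compose(g1, compose(g2, g3))
```

Removing an element to a different component, as with the indented ball bearing of the main text, breaks this disjoint-union symmetry.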
Weinstein [31] makes an important point:
Almost every interesting equivalence relation on a space B arises in a natural way as the orbit equivalence relation of some groupoid G over B. Instead of dealing directly with the orbit space B/G as an object in the category $\mathcal{S}_{\mathrm{map}}$ of sets and mappings, one should consider instead the groupoid G itself as an object in the category $\mathcal{G}_{\mathrm{htp}}$ of groupoids and homotopy classes of morphisms.
The groupoid approach has become quite popular in the study of networks of coupled dynamical systems, which can be defined by differential equation models [43].

Appendix A.2. Groupoid Symmetry-Breaking Phase Transitions

Generated by the Perplexity AI system.
[The Wallace work]… connects phase-transition-like phenomena in cognitive systems to groupoid symmetry-breaking, a mathematical generalization of group theory that better captures local, context-dependent symmetries. The idea is that, just as physical systems undergo sharp shifts in macroscopic behavior (phase transitions) due to symmetry breaking, so too might cognitive or social systems when information flows, or constraints change.
Physical Theories and Groupoid Symmetry-Breaking
While standard phase transition theory (as in Ising models or Landau’s symmetry-breaking) mainly employs group symmetries, deeper mathematical frameworks—specifically groupoids—have been applied in various modern extensions:
  • Topological Phases of Matter: In condensed matter and statistical mechanics, generalized symmetries, including groupoids, have been used to classify distinct phases, especially where local order parameters or defects are prominent. This context is particularly relevant in systems with spatially varying or local symmetries whose transformations do not fit neatly into group actions.
  • Quantum Field Theory (QFT): Groupoids appear in describing moduli spaces of field configurations, local gauge freedoms, and anomalies. In systems with nontrivial topology or defects (domain walls, vortices), groupoid symmetries formalize the algebra of admissible local transformations, essential in phase transitions involving topological order.
  • String Theory and Higher Category Theory: While classic string theory phase transitions (such as confinement/deconfinement or duality transitions) have historically relied on group-based symmetries, modern approaches—especially those employing higher category theory–increasingly use groupoids (and even higher groupoids) to encode brane intersections, generalized gauge symmetries, and dualities. The language of moduli stacks (groupoids up to homotopy) is prominent in string compactification, where phase structure is analyzed via changes in the topology or geometry of moduli spaces under symmetry breaking.
String Theory: Phase Transitions and Groupoids
  • Moduli Stacks and Groupoids: String theory’s moduli spaces of vacua and their degenerations (leading to phase transitions, e.g., conifold transitions) are often described by stacks, mathematically formalized as groupoids-in-spaces. Symmetry-breaking is then realized as a shift in the groupoid of symmetries acting on these moduli, with phase transitions corresponding to changes in the equivalence classes of these groupoid actions.
  • Generalized Geometric Langlands Program: This mathematical framework, motivated by string dualities and topological field theories, also formalizes symmetry-breaking via groupoid and higher categorical structures, analyzing how categorical symmetries fragment or recombine at criticality.
This convergence of ideas is at the frontier of mathematical physics and complex systems theory—and is a notable area where abstraction and application in cognitive science, as per Wallace, connect deeply with string theory and advanced modern physics.
Literature
Symmetry Breaking and Gauge Theory-Groupoids and Groupoidification by Baez and others explore mathematical physics models in categorical/groupoid frameworks.
Moduli Stacks and Groupoids in String Theory in algebraic geometry and physics journals detail these structures for moduli space phase transitions.
Wallace’s work can be found in Consciousness, Cognition and Crosstalk: The evolutionary exaptation of nonergodic groupoid symmetry breaking, Springer, New York, 2022.

Appendix A.3. Doubling-Down on Bureaucracy

Moving from a single-level decision structure under some probability distribution ρ 1 ( x ) to a two-stage model based on that distribution is done via the convolution integral:
$$\rho_2(x) = \int_0^x \rho_1(u)\,\rho_1(x-u)\,du$$
characterizing a two-stage dispersal. One could, at the expense of mathematical difficulty, compound the first and second stages using different basic probability distributions. We restrict the argument to Boltzmann and second-order Erlang Distributions.
It is easy to show that $\exp[-x] \rightarrow x\exp[-x]$ under the transformation of Equation (A1).
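Both this step and the further doubling used below, $x\exp[-x] \rightarrow (x^3/6)\exp[-x]$, can be confirmed by direct numerical convolution (a trapezoid sketch):

```python
import math

def convolve(f, g, x, steps=2000):
    """Trapezoid evaluation of the convolution integral of Equation (A1)."""
    h = x / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * f(u) * g(x - u)
    return total * h

exp1 = lambda t: math.exp(-t)              # rho_1
erl2 = lambda t: t * math.exp(-t)          # rho_2 = x exp(-x)
erl4 = lambda t: t**3 * math.exp(-t) / 6   # rho_4 = (x^3/6) exp(-x)

x = 2.5
assert abs(convolve(exp1, exp1, x) - erl2(x)) < 1e-6
assert abs(convolve(erl2, erl2, x) - erl4(x)) < 1e-6
```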
Suppose the PLA’s (or the current USA’s) Cultural Revolution-era Stavka requires extra ‘political’ supervision via the compounding of two two-stage Erlang Distributions. Then Equation (A1) gives $x\exp[-x] \rightarrow \rho_4(x) = (x^3/6)\exp[-x]$. Taking ordinary volatility under Equation (31),
$$F = \frac{\sigma^2}{4}\left(\frac{X^3}{3} - 3X^2 - 6\ln(X-3)(X-3) + 6X - 18\right) + C_1X + C_2$$
$$\rho_4(F/G^*) = G^*, \qquad G^* = \frac{-F}{4\,W\!\left(m,\, -\frac{6^{1/4}F^{1/4}}{4}\right)}$$
$$L = \frac{e^{-R_0/G^*}\left(6G^{*3} + 6R_0G^{*2} + 3R_0^2G^{*} + R_0^3\right)}{6G^{*3}}$$
The critical relation, from the Lambert W-function, is as follows:
$$\frac{6^{1/4}F^{1/4}}{4} = e^{-1}$$
G * and the cognition rate L, in this model, have become exceedingly sensitive to the boundary conditions C 1 and C 2 , far more than perhaps expected. In particular, Figure 7b is pancaked even further. Indeed, for the boundary conditions of Figure 7, only the σ = 0 result is real-valued: any ‘noise’ at all is fatal to system cognition.
In contrast, setting C 1 = 1 , C 2 = 2 leads to a bizarre form of stochastic resonance in which a minimum noise level is required for signal detection. Grim details are left to the reader as an exercise.
To yet again beat the Devil, as it were, multiple levels of control seem decidedly counterproductive when confronted by fog, friction, and skilled adversarial intent.
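As a spot check on the algebra above, the closed form for the cognition rate $L$ of the compounded four-stage model, i.e., the tail mass of the fourth-order density $\rho_4$, can be compared against direct numerical integration of Equation (40); a stdlib sketch:

```python
import math

def L_closed(R0, G):
    """Closed-form cognition rate for the four-stage (Gamma, k = 4) model."""
    return math.exp(-R0 / G) * (6*G**3 + 6*R0*G**2 + 3*R0**2*G + R0**3) / (6*G**3)

def L_numeric(R0, G, cutoff=80.0, steps=80000):
    """Equation (40) integrated directly against rho_4(R/G)."""
    rho4 = lambda t: t**3 * math.exp(-t) / 6.0
    h = cutoff / steps
    num = den = 0.0
    for i in range(steps + 1):
        r = i * h
        w = 0.5 if i in (0, steps) else 1.0
        v = w * rho4(r / G)
        den += v
        if r >= R0:
            num += v
    return num / den

assert abs(L_closed(1.0, 2.0) - L_numeric(1.0, 2.0)) < 1e-3
```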

Appendix A.4. The Consensus on ‘Psychopathology’ and Regulation

The Perplexity AI (2025) system comments on these matters as follows.
…[A] number of researchers in cognitive science, neuroscience, and information theory have independently converged on conclusions… that instability and even failure are inherent to embodied cognitive systems, and that pairing with regulatory modules (e.g., meta-cognitive or executive functions) is essential for any measure of reliable function. This view asserts that dysfunctions—ranging from hallucination to broad psychopathology—arise not simply as defects to be eradicated, but as inevitable features of complex, embodied cognition interacting with a highly variable environment [1,2,3,4].
Cognitive Instability and Regulation
Several studies in neuroscience have framed cognitive control as a process of reducing uncertainty and stabilizing decision-making under constant environmental variability. The architecture of human cognition relies on layered control networks (especially in the frontoparietal and anterior cingulate cortices), designed to regulate or compensate for the unreliable and noisy nature of sensory and internal cognitive processes. Failures in these regulatory modules can manifest as behavioral or perceptual instability, which is increasingly viewed… as a structural property of complex cognitive architectures [1,4]
Embodied Cognition and Predictive Coding
Theories rooted in embodied cognition and predictive coding (notably in accounts of schizophrenia and hallucination) describe how the interplay of top-down (cognitive/regulatory) and bottom-up (sensory/environmental) processes is inherently precarious. In disorders like schizophrenia, symptoms arise from an imbalance between these streams: a failure in predictive regulation that is not merely pathological ‘noise,’ but an outcome of how predictive, embodied systems must operate in uncertain worlds. Such failures are not bugs to be entirely eliminated, but intrinsic risks of the embodied, probabilistic strategies that make cognition flexible and adaptive [2,4].
Feature or Bug? Theoretical Perspectives
Contemporary researchers have proposed that ‘noise’ in cognition is a central, adaptive feature rather than an aberration. For example, in a recent review, the variability of neural sampling and thought is described as critical for exploration, adaptation, and learning–not as a simple malfunction. This resonates deeply with [a]… thesis that cognitive ‘failures’ –including hallucinations, forgetting, and the emergence of psychopathology–are consequences of the same mechanisms that underlie cognitive power in dynamic, embodied contexts [4].
Specific Theoretical and Empirical Parallels
  • The information theory account of cognitive control relates instability directly to limits on information processing, echoing [recent] formal modeling [1,5].
  • Embodied cognition approaches highlight failures in emotion regulation and over-abstractness as routes to pathology, supporting the inevitability of instability without appropriate regulatory embedding [3,6].
  • Predictive coding models in psychopathology describe hallucination and delusion as failures of regulatory balancing, rather than simple technical errors [2].
  • Recent work insists that ‘noise’ or variability is central to cognitive function, not a dispensable imperfection [4].
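The Data Rate Theorem invoked in the first bullet admits a simple numerical illustration. For a linear system, the theorem states that stabilization requires a control channel whose capacity exceeds the system's 'topological information' rate, the sum of log2|λi| over the unstable open-loop eigenvalues [8]. A minimal sketch (the function names and the example system matrix are illustrative, not from this paper):

```python
import numpy as np

def topological_information(A):
    """Data Rate Theorem lower bound: sum of log2|lambda_i| over the
    unstable eigenvalues (|lambda_i| >= 1) of the open-loop matrix A."""
    eigs = np.linalg.eigvals(np.asarray(A, dtype=float))
    unstable = np.abs(eigs)[np.abs(eigs) >= 1.0]
    return float(np.sum(np.log2(unstable)))

def stabilizable(A, channel_capacity_bits):
    """Control fails, regardless of controller design, if the channel
    capacity does not exceed the topological information rate."""
    return channel_capacity_bits > topological_information(A)

# Discrete-time system with one unstable mode (eigenvalues 2 and 0.5):
A = [[2.0, 0.0], [0.0, 0.5]]
print(topological_information(A))  # 1.0 bit per time step
print(stabilizable(A, 0.5))        # False: channel too narrow
print(stabilizable(A, 1.5))        # True
```

The punctuated onset of instability in Figure 1 corresponds to the capacity crossing this bound: below it, no regulator, however sophisticated, can hold the system stable.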
In summary, while… [the formal approach of this study]… articulates these dynamics with rigorous application of asymptotic limit theorems from information theory and nonequilibrium thermodynamics, the broader research landscape affirms the essential insight: that instability, regulation, and the ubiquity of potential failure are not design flaws, but deep, necessary features of real-world embodied cognition [1,2,3,4].
1. https://pmc.ncbi.nlm.nih.gov/articles/PMC4151034 (accessed on 1 August 2025).
2. https://pmc.ncbi.nlm.nih.gov/articles/PMC5472128 (accessed on 1 August 2025).
5. https://pmc.ncbi.nlm.nih.gov/articles/PMC7513233 (accessed on 1 August 2025).
9. https://en.wikipedia.org/wiki/Modularity_of_mind (accessed on 1 August 2025).

References

  1. Wang, Y.; Li, Y.; Xu, C. AI Scaling: From up to down and out. arXiv 2025, arXiv:2502.01677v1.
  2. Dulberg, Z.; Dubey, R.; Berwian, I.; Cohen, J. Having multiple selves helps learning agents explore and adapt in complex changing worlds. Proc. Natl. Acad. Sci. USA 2023, 120, e2221180120.
  3. Wolpert, D.; Macready, W. No Free Lunch Theorems for Search; SFI-TR-02-010; Santa Fe Institute: Santa Fe, NM, USA, 1995.
  4. Wolpert, D.; Macready, W. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82.
  5. Shalev-Shwartz, S.; Ben-David, S. Understanding Machine Learning: From Theory to Algorithms; Cambridge University Press: New York, NY, USA, 2014.
  6. Anonymous, 2025. Available online: https://ai.stackexchange.com/questions/47924/do-mixture-of-experts-circumvent-the-limitations-imposed-by-the-no-free-lunch-th (accessed on 1 August 2025).
  7. Wallace, R. Computational Psychiatry: A Systems Biology Approach to the Epigenetics of Mental Disorders; Springer: New York, NY, USA, 2017.
  8. Nair, G.; Fagnani, F.; Zampieri, S.; Evans, R. Feedback control under data rate constraints: An overview. Proc. IEEE 2007, 95, 108–138.
  9. Maturana, H.; Varela, F. Autopoiesis and Cognition: The Realization of the Living; Reidel: Boston, MA, USA, 1980.
  10. Atlan, H.; Cohen, I. Immune information, self-organization, and meaning. Int. Immunol. 1998, 10, 711–717.
  11. Cover, T.; Thomas, J. Elements of Information Theory, 2nd ed.; Wiley: New York, NY, USA, 2006.
  12. Khinchin, A. The Mathematical Foundations of Information Theory; Dover: New York, NY, USA, 1957.
  13. Wallace, R. Consciousness, Cognition and Crosstalk: The Evolutionary Exaptation of Nonergodic Groupoid Symmetry-Breaking; Springer: New York, NY, USA, 2022.
  14. Dreyfus, H. What Computers Can’t Do: A Critique of Artificial Reason; Harper and Row: New York, NY, USA, 1972.
  15. Fjelland, R. Why general artificial intelligence will not be realized. Humanit. Soc. Sci. Commun. 2020, 7, 10.
  16. Wallace, R. High Frequency Trading, Artificial Intelligence, and the Instabilities of Embodied Cognition, 2024. Available online: https://hal.science/hal-04763730v3 (accessed on 1 August 2025).
  17. Khaluf, Y.; Ferrante, E.; Simoens, P.; Huepe, C. Scale invariance in natural and artificial collective systems: A review. J. R. Soc. Interface 2017, 14, 20170662.
  18. Kringelbach, M.; Perl, Y.S.; Deco, G. The thermodynamics of mind. Trends Cogn. Sci. 2024, 28, 568–581.
  19. Feynman, R. The Feynman Lectures on Computation; Addison-Wesley: Boston, MA, USA, 1996.
  20. Bennett, C.H. The thermodynamics of computation. Int. J. Theor. Phys. 1982, 21, 905–940.
  21. Wallace, R. How AI founders on adversarial landscapes of fog and friction. J. Def. Model. Simul. 2021, 19, 519–538.
  22. Ortega, P.; Braun, D. Thermodynamics as a theory of decision-making with information-processing costs. Proc. R. Soc. A 2013, 469, 20120683.
  23. Landau, L.; Lifshitz, E. Statistical Physics, 3rd ed.; Part 1; Elsevier: New York, NY, USA, 2007.
  24. Pettini, M. Geometry and Topology in Hamiltonian Dynamics and Statistical Mechanics; Springer: New York, NY, USA, 2007.
  25. Feynman, R. Statistical Mechanics: A Set of Lectures; Addison Wesley Longman: Boston, MA, USA, 1998.
  26. de Groot, S.; Mazur, P. Non-Equilibrium Thermodynamics; Dover: New York, NY, USA, 1984.
  27. Stanley, H.; Amaral, L.; Gopikrishnan, P.; Ivanov, P.; Keitt, T.; Plerou, V. Scale invariance and universality: Organizing principles in complex systems. Phys. A Stat. Mech. Appl. 2000, 281, 60–68.
  28. Wikipedia, 2025b. Available online: https://en.wikipedia.org/wiki/Born%E2%80%93Oppenheimer_approximation (accessed on 1 August 2025).
  29. Cyganowski, S.; Kloeden, P.; Ombach, J. From Elementary Probability to Stochastic Differential Equations with MAPLE; Springer: New York, NY, USA, 2002.
  30. Wallace, R. Detailed command vs. mission command: A cancer-stage model of institutional decision making. Stats 2025, 8, 27.
  31. Weinstein, A. Groupoids: Unifying internal and external symmetry. Not. Am. Math. Soc. 1996, 43, 744–752.
  32. Wang, Z.; Ying, Z.; Bosy-Westphal, A.; Zhang, J.; Schautz, B.; Later, W.; Heymsfield, S.; Muller, M. Specific metabolic rates of major organs and tissues across adulthood: Evaluation by mechanistic model of resting energy expenditure. Am. J. Clin. Nutr. 2010, 92, 1369–1377.
  33. Laidler, K. Chemical Kinetics, 3rd ed.; Harper and Row: New York, NY, USA, 1987.
  34. Wallace, R.; Fricchione, G. Stress-induced failure of embodied cognition: A general model. BioSystems 2024, 239, 105193.
  35. Wikipedia, 2025. Available online: https://en.wikipedia.org/wiki/Lambert_W_function (accessed on 1 August 2025).
  36. Wang, J.; Moniz, N. Analysis of thermodynamic problems with the Lambert W function. Am. J. Phys. 2017, 87, 752–757.
  37. Roberts, K.; Valluri, S. The Lambert W Function, Laguerre Polynomials, and the Zeros of the QCD Partition Function. arXiv 2014, arXiv:1307.1017.
  38. Perplexity AI. Comparing uses of the Lambert W function in physical phase transitions vs. cognition and consciousness models. 2025.
  39. Khinchin, A. Mathematical Foundations of Statistical Mechanics; Dover: New York, NY, USA, 1949.
  40. Saltini, A.; Mishra, S.; Reiner, P. Nuclear Command, Control, and Communications: A Primer on Strategic Warning, Decision Support and Adaptive Targeting Subsystems; Institute for Security and Technology, 2025. Available online: https://securityandtechnology.org/ (accessed on 1 August 2025).
  41. Cannas da Silva, A.; Weinstein, A. Geometric Models for Noncommutative Algebras; American Mathematical Society: Providence, RI, USA, 1999.
  42. Brown, R. From groups to groupoids: A brief survey. Bull. Lond. Math. Soc. 1987, 19, 113–134.
  43. Golubitsky, M.; Stewart, I. Nonlinear dynamics of networks: The groupoid formalism. Bull. Am. Math. Soc. 2006, 43, 305–364.
Figure 1. The Rate Distortion Control Theory model. An inherently unstable system’s ‘topological information’, according to the Data Rate Theorem [8], is added to the effects of ‘noise’ to determine the minimum channel capacity R(D) needed to assure transmission with average distortion D. The punctuation implied by the DRT emerges here if there is a critical maximum average distortion beyond which control fails.
Figure 2. Nonmonotonic distributions. For structures dominated by (a) Rayleigh, (b) second-order Erlang, and (c) fourth-order chi-square distributions, there are two solution forms, and the regions within the ‘C’ are unstable. The systems are partitioned into m = 1 ‘cooperating’ and m = 0 ‘contending’ modes for n interacting components. The cooperating modes are stable only above the ‘C’. Contending modes collapse with rising n. The minimum stability ‘cost’ in X for cooperating modes, defined by the tops of the ‘C’, however, does not increase linearly with rising n.
Figure 3. R(n, β), setting α = 1 and, in the expression for F(X), C1 = 1, C2 = 2. (a) Boltzmann, (b) Rayleigh, (c) second-order Erlang distributions. Narrow troughs are bounded by unattainable demands for bandwidth. The nonmonotonic Rayleigh and Erlang distributions have two solutions, corresponding to the m = 0, 1 branches of the Lambert W-function.
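The two-branch structure underlying Figures 2 and 3 can be explored with standard numerical tools. A minimal sketch using SciPy’s lambertw, in which the library’s branch indices k = 0 and k = -1 (the only branches real-valued on -1/e ≤ x < 0) play the role of the paper’s m = 0, 1 solution modes; the argument value is illustrative, not taken from the paper’s R(n, β) expression:

```python
import numpy as np
from scipy.special import lambertw

# Both real branches of W(x) coexist only on -1/e <= x < 0;
# outside that window one solution mode disappears.
x = -0.2
w0 = lambertw(x, k=0).real    # principal (upper) branch
wm1 = lambertw(x, k=-1).real  # lower branch

# Each branch satisfies the defining relation W(x) * exp(W(x)) = x.
for w in (w0, wm1):
    assert abs(w * np.exp(w) - x) < 1e-10

print(w0, wm1)  # two distinct real solutions
```

The collapse of the two branches at x = -1/e, where W0 = W-1 = -1, is the mechanism by which paired solution modes merge and vanish, mirroring the bounded stable regions of the ‘C’ curves in Figure 2.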
Figure 4. Effect of increasing stochastic ‘fog’ σ on equipartition value X for the Boltzmann distribution assuming (a) Arrhenius and (b) exponential models for dX/dt = f(X) under the nonequilibrium steady state condition <dR_t> = 0. Here, β = 3, α = 1, C1 = 1, C2 = 3 in F. As n and σ increase, the solution sets are destabilized.
Figure 5. Analog to Figure 4 for the second-order Erlang distribution representing the serial composition of two Boltzmann distributions. (a) Arrhenius. (b) Exponential. Apparently ‘slight’ differences between frictional dynamics have an amplified effect in synergism with uncertainty.
Figure 6. Assuming ‘simple volatility’ h(X) = X, with f(X) as the exponential and Arrhenius models, generates equivalence-class solution sets (a) Boltzmann–exponential, (b) Boltzmann–Arrhenius, (c) Erlang–exponential, and (d) Erlang–Arrhenius. For all, α = 1, β = 3.
Figure 7. Iterated temperature G*(X, σ) for the (a) Boltzmann and (b) second-order Erlang distributions with C1 = 1, C2 = 3 in F. Both systems become unstable, respectively, for F_B > exp[1] and for F_E/2 > exp[1] or F_E < 0. The stable zone of the Detailed Command example is consequently highly compressed and stunted.
Figure 8. Criticality surfaces from Equation (37) for two boundary condition sets: (a) {C1 = 1, C2 = 3}; (b) {C1 = 1, C2 = 2}.

Share and Cite

MDPI and ACS Style

Wallace, R. Still No Free Lunch: Failure of Stability in Regulated Systems of Interacting Cognitive Modules. Stats 2025, 8, 117. https://doi.org/10.3390/stats8040117

