On the Inherent Instability of Biocognition: Toward New Probability Models and Statistical Tools

A central conundrum enshrouds biocognition: almost all such phenomena are inherently unstable and must be constantly controlled by external regulatory machinery to ensure proper function, in much the same sense that blood pressure and the ‘stream of consciousness’ require persistent delicate regulation for the survival of higher organisms. Here, we derive the Data Rate Theorem of control theory, which characterizes such instability, from the Rate Distortion Theorem of information theory for adiabatically stationary nonergodic systems. We then outline a novel approach to building new statistical tools for data analysis based on those theorems, focusing on groupoid symmetry-breaking phase transitions characterized by Fisher Zero analogs.


Introduction
Although, as Maturana and Varela [1] put it, all organisms, with and without nervous systems, are cognitive, and living per se is a process of cognition, the assertion, as it were, only sings half the Mass. Many, perhaps most, cognitive phenomena, from blood pressure control through gene expression, immune function, consciousness, and driving a fast vehicle at night on a twisting, pot-holed roadway, are inherently unstable and must be heavily regulated to prevent catastrophe. For humans in particular, the 'stream of consciousness' must always be contained within social and cultural, as well as more common neuropsychological, 'riverbanks' to successfully confront and circumvent powerful selection pressures (e.g., [2]).
Cognition implies choice, choice reduces uncertainty, and the reduction in uncertainty implies the existence of an information source 'dual' to the cognitive process under study [2][3][4][5][6]. The argument is both direct and unambiguous. Here, we will assume such information sources are piecewise adiabatically stationary, so that on a trajectory 'piece' the system is sufficiently close to nonergodic stationarity for the appropriate asymptotic limit theorems to work sufficiently well. A useful analog is the Born-Oppenheimer approximation of molecular quantum mechanics, where nuclear motions are taken as slow enough for electron structures to essentially equilibrate.
Relatively recent work relates the asymptotic limit theorems of information theory to control theory via the Data Rate Theorem (e.g., [7]). Following the linear approximation used in that work, consider a reduced model of a control system as in Figure 1, with dynamics given by the linear relation

x_{t+1} = A x_t + B u_t + W_t (1)

where x_t is the n-dimensional state vector at time t, u_t the imposed control signal, and W_t added noise. The Data Rate Theorem then requires that the rate H of control information satisfy

H > log[det(A_m)] ≡ H_0 (2)

where det is the determinant of the subcomponent A_m, with m ≤ n, of the matrix A having eigenvalues ≥ 1. H_0 is defined as the rate at which the unstable system generates its own 'topological information'. A singular feature of this perspective, as a reviewer has pointed out, is that not only does it separate the car and driver from the road, but it differentiates the road from the car and driver, which may be of considerable importance if one's focus is highway maintenance.
If this inequality is violated, stability fails: blood pressure skyrockets, the immune system eats its host, conscious higher animals suffer hallucinations, and a speeding vehicle goes off the road.
Further, as a reviewer has noted, the channel can be rendered noise-free by importing more information through expanding the environment (the controller), so that the system/controller interaction occurs at the boundary that separates them. This is, however, seldom possible in real-world, real-time interactions.
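The threshold H_0 can be computed directly once the spectrum of the linearized system is known. The sketch below is a minimal, hypothetical illustration (the eigenvalues are invented, not taken from the cited work): it sums log λ over the unstable eigenvalues, i.e., those of magnitude ≥ 1.

```python
import math

def topological_information_rate(eigenvalues):
    """Rate H_0 at which an unstable linear system generates its own
    'topological information': the sum of log|lambda| over eigenvalues
    of magnitude >= 1 (the unstable subcomponent A_m of A)."""
    return sum(math.log(abs(lam)) for lam in eigenvalues if abs(lam) >= 1.0)

# A hypothetical spectrum: two unstable modes (2 and 3), one stable (0.5).
H0 = topological_information_rate([2.0, 0.5, 3.0])
# Only the unstable modes contribute: H0 = log(2) + log(3) = log(6).
# A control channel with capacity H > H0 can, in principle, stabilize
# the system; H < H0 guarantees failure.
```

A fully stable spectrum gives H_0 = 0: no control information is needed merely to avoid divergence.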
Here, we will approach these matters from a somewhat unusual direction that permits significant generalization and engineering application, in a large sense. We will derive the Data Rate Theorem from the Rate Distortion Theorem via some counterintuitive but characteristic formal legerdemain.

Remembering Brownian Motion
Einstein's treatment of Brownian motion for N particles proceeds via the simple diffusion equation

∂ρ(x, t)/∂t = µ ∂²ρ(x, t)/∂x² (3)

where x is distance and µ a diffusion coefficient, producing a solution in terms of the Normal distribution

ρ(x, t) = N exp[−x²/(4µt)] / √(4πµt) (4)

After some manipulation, this gives the classic diffusive signature

√(< x² >) ∝ √t (5)

A central observation of Bennett, as represented by Feynman [8], is that a 'simple ideal machine' permits the extraction of free energy from an information source. The Rate Distortion Theorem [9] expresses the minimum channel capacity R(D) needed for a transmitted signal along a noisy channel to be received with average distortion less than or equal to some scalar measure D. A 'worst case', in many respects, is the Gaussian channel for which, under the squared distortion measure,

R(D) = (1/2) log[σ²/D], D ≤ σ²; R(D) = 0, D > σ² (6)

where σ is a 'noise' parameter. If, following Feynman [8], we identify information as a form of free energy, it becomes possible, in a purely formal manner, to construct an 'entropy' S for the system in the standard way of ordinary thermodynamics via the Legendre transform

S(D) ≡ R(D) − D dR/dD (7)

and to then impose a first-order nonequilibrium thermodynamics Onsager approximation [10], in analogy with the simple diffusion equation, as

dD/dt ≈ −µ dS/dD (8)

where 'µ' is taken as a kind of diffusion coefficient. Using Equation (6), we can directly calculate D(t) as

dD/dt = µ/(2D), D(t) = √(µt) (9)

Thus, based on the definition of 'entropy' in Equation (7) as a Legendre transform and the use of the Onsager entropy gradient approximation of Equation (8), D(t) ∝ √t, as if the 'target' undergoes simple Brownian motion in the absence of further data.

Parenthetically, as will be shown below in Section 4, this surprising result suggests a direction for generalization to less simple systems, i.e., build an appropriate free energy, define an 'entropy' as its Legendre transform, and then impose the Onsager entropy gradient approximation to derive system dynamics.
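The √t result is easy to check numerically. The following sketch, a toy calculation under assumed parameter and step-size values, integrates the Gaussian-channel entropy-gradient dynamics dD/dt = µ/(2D) by the Euler method and compares the trajectory with √(µt):

```python
import math

mu = 1.0      # 'diffusion coefficient' in the Onsager approximation
dt = 1.0e-4   # Euler step
D = 1.0e-2    # small initial average distortion
t = 0.0

# Onsager entropy-gradient dynamics for the Gaussian channel:
# dD/dt = mu/(2D), whose exact solution grows like sqrt(mu * t).
while t < 1.0:
    D += (mu / (2.0 * D)) * dt
    t += dt

# After t = 1, D should sit close to sqrt(mu * 1) = 1: distortion
# diffuses away from the tracking trajectory like Brownian motion.
```

The square-root growth is the 'diffusion from the original tracking trajectory' invoked in the radar/lidar example below.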
Taking the next step in the full argument, for a simple deterministic system, we can impose 'control free energy' at a rate M(H), depending, in a monotonically increasing but possibly nonlinear manner, on a control information rate H. Then, for the existence of a nonequilibrium steady state, a kind of stability having, at least, a fixed value of distortion, dD/dt must decline to zero:

dD/dt = µ/(2D) − M(H) → 0, D_nss = µ/(2M(H)) ≤ D_max, H ≥ M^{−1}(µ/(2D_max)) (10)

where max represents the maximum acceptable value and, noting that, if M is monotonically increasing, so is the inverse function M^{−1}. By convexity, d²R/dD² ≥ 0 [9]. For the Gaussian channel, in the absence of further noise, at nonequilibrium steady state where dD/dt ≡ 0, then D ∝ 1/M(H). Again, an example might be continuous radar or lidar illumination of a moving target with transmitted energy rate M, defining a maximum D inversely proportional to M. If illumination fails, then D will increase as √t, representing classic diffusion from the original tracking trajectory.
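The relaxation to the nonequilibrium steady state can also be checked directly. The sketch below, again a hypothetical numerical experiment rather than anything from the cited literature, integrates dD/dt = µ/(2D) − M and confirms convergence to D = µ/(2M):

```python
mu, M = 1.0, 2.0   # diffusion coefficient and control free energy rate M(H)
dt = 1.0e-3
D = 1.0            # start well above the steady state

# Deterministic controlled dynamics: dD/dt = mu/(2D) - M
for _ in range(20000):   # integrate to t = 20
    D += (mu / (2.0 * D) - M) * dt

# Expected nonequilibrium steady state: D_nss = mu / (2 M) = 0.25,
# i.e., average distortion inversely proportional to control effort.
```

Doubling M halves the steady-state distortion, the D ∝ 1/M(H) behavior of the radar/lidar example.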
We have, in the last expression of Equation (10), replicated something of Equation (2), but there is far more structure concealed here.

A General Model
Recall again that the Rate Distortion Function R(D) is always convex in D, so that d²R/dD² ≥ 0 [9,11]. Further, following [11,12], and others, for a nonergodic process, the Rate Distortion Function can be calculated as an average across the RDFs of the ergodic components of that process, and can thus be expected to remain convex in D.

We reconsider Figure 1 and now examine the 'expected transmission' of a signal X_t → X_{t+1} (in the presence of an added 'noise' Ω), but received as a de facto signal X̂_{t+1}. That is, there will be some deviation between what is ordered and what is observed, measured under a scalar distortion metric as d(X_{t+1}, X̂_{t+1}) and averaged as

D = Σ Pr(X̂_{t+1}) d(X_{t+1}, X̂_{t+1}) (11)

where Pr is the probability of X̂_{t+1} and the 'sum' may represent a generalized integral. We have constructed an adiabatically, piecewise stationary information channel for the control system and can invoke a Rate Distortion Function R(D). Following Equation (10), we can write a general stochastic differential equation [13]

dD_t = [−µ dS/dD − M(H)] dt + Ω h(D_t) dW_t (12)

where µ is a 'diffusion coefficient' and Ωh(D) is the 'volatility' under Brownian noise dW_t. As above, an average <D> can be calculated from the relation <dD_t> = 0, so that, again, something like Equation (10) can be said to hold.

However, stability of stochastic systems is a far richer landscape than for deterministic systems. That is, we are now interested in stability under stochastic volatility, and there are many possible characterizations of it. More specifically, we want to calculate the nonequilibrium steady-state properties of a general function Q(D_t) given Equation (12), i.e., the relation <dQ_t> = 0. This can be done using the Ito Chain Rule [13], leading to

(dQ/dD)[−µ dS/dD − M(H)] + (Ω²/2) h(D)² d²Q/dD² = 0 (13)

Solving for M(H),

M(H) = −µ dS/dD + (Ω² h(D)²/2) (d²Q/dD²)/(dQ/dD) (14)

giving general results that are not confined to a particular algebraic form for R(D).
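Before invoking the Ito machinery, the stochastic dynamics can be explored by direct simulation. The sketch below is a minimal Euler-Maruyama integration under assumed illustrative parameters, specialized to the Gaussian-channel drift µ/(2D) with volatility h(D) = D; it is an illustration only, not a substitute for the analytic stability conditions.

```python
import random

random.seed(42)
mu, M, Omega = 1.0, 2.0, 0.1   # assumed illustrative parameters
dt = 1.0e-3
D = 0.25                        # start at the deterministic steady state

# Euler-Maruyama for dD = (mu/(2D) - M) dt + Omega * D dW
for _ in range(10000):
    dW = random.gauss(0.0, 1.0) * dt ** 0.5
    D += (mu / (2.0 * D) - M) * dt + Omega * D * dW
    D = max(D, 1.0e-6)          # guard against numerical excursions

# With M(H) well above the stability threshold, the trajectory stays
# confined near the deterministic value D = mu/(2M) = 0.25.
```

Raising Ω toward the threshold derived below produces trajectories whose variance no longer settles, previewing the 'stability in variance' criterion.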
As a 'worst case' example, we treat the Gaussian channel of Equation (6) in second order.
Assuming the simple volatility function h(D) = D and taking Q(D) = D² allows determination of a general condition for stability in variance for the Gaussian channel from Equation (14) as

M(H) = µ/(2D) + Ω²D/2 ≥ Ω√µ, H ≥ M^{−1}(Ω√µ) (15)

where the last inequality in the first expression can be found from a simple minimization argument. Calculations for other kinds of channels are similar.

Another possible approach views the control signal H itself as the fundamental distortion measure between intended and observed behaviors, so that Equation (11) becomes

<H> = Σ Pr(X̂_{t+1}) H(X_{t+1}, X̂_{t+1}) (16)

where H is the rate of control information needed to stabilize the inherently unstable system and the other factors are as above. We can now carry through the analysis as driven by Equation (12), but with D replaced by H and the rate of control free energy taken as M(H). This gives the necessary condition of Equation (14), but in terms of H; it provides a different picture of stability dynamics, based on the 'average control distortion' <H> rather than on H itself, regardless of the typically monotonically increasing nature of M(H).
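The minimization argument behind the variance condition can be verified numerically: for the Gaussian channel with h(D) = D and Q = D², the required control rate is M(H) = µ/(2D) + Ω²D/2, whose minimum over D is Ω√µ. A small grid search, under assumed values of µ and Ω, confirms this:

```python
import math

mu, Omega = 2.0, 0.5   # assumed illustrative parameters

def required_control_rate(D):
    """Control free energy rate needed to hold the Gaussian channel
    stable in variance at average distortion D (h(D)=D, Q=D^2)."""
    return mu / (2.0 * D) + 0.5 * Omega ** 2 * D

# Grid search for the minimum over D > 0.
grid = [0.001 * k for k in range(1, 20000)]
m_min = min(required_control_rate(D) for D in grid)

# Analytic minimum: Omega * sqrt(mu), attained at D = sqrt(mu)/Omega.
```

Below that threshold no choice of operating distortion keeps the variance finite: the noise term always wins.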

Extending the Perspective
The focus on distortion and channel capacity in our reinterpretation of Figure 1 permits a simplified one-dimensional analysis. We continue in one dimension but branch the argument.
Via the Legendre transform and Onsager approximations of Equations (7) and (8), and through the SDE of Equation (12), it has been possible to derive, in Equations (14) and (16), a fair version of the Data Rate Theorem result of Equation (2). A constraint on the approach is that, because information transmission is not microreversible (for example, in English, the term 'the' has a much higher probability than 'eht', and in Chess, one cannot uncheckmate a King), there can be no 'Onsager reciprocal relations' in multidimensional variants of Equations (8) and (12).
The treatment of multiple parallel processes-branching-in this formalism is of particular interest if somewhat subtle.
Here, we see control dynamics (the set of possible 'Data Rate Theorems') as highly context-dependent, supposing that different 'road conditions' will require different analogs to, or forms of, the DRT. That is, the transmission of control messages as represented in Equation (11) involves equivalence classes of possible control sequences X ≡ {X_t, X_{t+1}, X_{t+2}, . . .}. For example, driving a particular stretch of road slowly on a dry, sunny morning is a different 'game' than driving it at high speed during a midnight snowstorm.
We characterize all possible such 'games' in terms of equivalence classes G_j of the control path sequences X, each class associated with a particular 'game' represented by an information source having uncertainty H_{G_j} ≡ H_j.
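Schematically, and with an entirely hypothetical context classifier standing in for real 'road conditions', control paths might be partitioned into 'game' equivalence classes as follows:

```python
from collections import defaultdict

# Hypothetical control-path records: (driving context, path of control symbols).
paths = [
    ("dry_daylight", "aabab"),
    ("dry_daylight", "ababb"),
    ("night_snow",   "bbbba"),
    ("night_snow",   "babbb"),
]

# Partition paths into equivalence classes G_j by context; each class
# would then be assigned its own dual information source uncertainty H_j.
classes = defaultdict(list)
for context, path in paths:
    classes[context].append(path)
```

Each key of `classes` stands for one 'game' G_j; estimating the H_j from real path data is the statistical problem the iterated model below addresses.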
In this model, R(D), characterizing the particular 'vehicle', plays a different role. Here, we fix the maximum acceptable average distortion and impose the corresponding R as a temperature analog via an iterated model. To do this, we construct the pseudoprobability

P_j = exp[−H_j/g(R)] / Σ_k exp[−H_k/g(R)] (17)

where the 'temperature' g(R) must be calculated from first principles.
The denominator in Equation (17) can be taken as a statistical mechanical partition function, written A(g(R)) ≡ Σ_k exp[−H_k/g(R)], and used to derive a free energy F as

exp[−F/g(R)] = A(g(R)), F = −g(R) log[A(g(R))] (18)

Next, again, it is possible to formally define an 'entropy' in terms of a Legendre transform on F, Ŝ ≡ −F + R dF/dR, and to again impose a dynamic relation like that of the first-order Onsager treatment of nonequilibrium thermodynamics [10]. Then, in general (dropping the 'diffusion coefficient' µ),

dR/dt ≈ dŜ/dR = R d²F/dR² = f(R) (19)

which, given a plausible 'friction' f(R), can be solved for the temperature analog g(R), characteristically in terms of a 'RootOf' construction. Several points:

(1) In particular, note the first-order Onsager nonequilibrium thermodynamics approximation in the second expression. This is a very crude model that cannot be expected to have universal applicability. Higher-order and multivariate versions will be of greater, but still limited, use.
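The pseudoprobability of Equation (17) and the free energy built on its denominator, Equation (18), are straightforward to compute for a finite set of 'games'. The sketch below assumes a handful of hypothetical source uncertainties H_j and a fixed value of the temperature analog g(R):

```python
import math

H = [1.0, 2.0, 3.5]   # hypothetical dual source uncertainties H_j
g = 1.5               # assumed fixed value of the temperature analog g(R)

# Partition function: the denominator of Equation (17).
A = sum(math.exp(-h / g) for h in H)

# Pseudoprobabilities of Equation (17): lower-uncertainty 'games'
# receive higher weight, Boltzmann-fashion.
P = [math.exp(-h / g) / A for h in H]

# Iterated free energy built on the partition function.
F = -g * math.log(A)
```

In the full theory g(R) is not a free parameter but must itself be solved for from the Onsager dynamics, which is where the RootOf constructions enter.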
(2) f(R) represents the 'friction' inherent to any control system, e.g., dR/dt = f(R) = β − αR, so that R(t) → β/α, while the RootOf construction generalizes the Lambert W-function, as can be seen by carrying through the calculation after setting A(g(R)) = g(R).

(3) As 'RootOf' may have complex number solutions, the temperature analog g(R) now imposes 'Fisher Zeros' analogous to those characterizing phase transitions in physical systems (e.g., [14][15][16]). Phase transitions in the cognitive process range from the punctuated onset of conscious signal detection to Yerkes-Dodson 'arousal' dynamics. See Wallace [2] for worked-out examples.

(4) The set of equivalence classes G_j defines a groupoid in the classic manner (e.g., [17]), and the Fisher Zero construction represents groupoid symmetry-breaking for cognitive phenomena that is analogous to, but different from, the group symmetry-breaking associated with phase transitions in physical processes.
Groupoid symmetry-breaking phase transitions represent one extension of the basic DRT.
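The connection to the Lambert W-function in point (2) can be made concrete: setting A(g(R)) = g(R) leads to transcendental relations of the form w·e^w = x. The sketch below is a generic Newton-iteration root-finder for that relation, not the paper's full 'RootOf' construction, which admits complex branches:

```python
import math

def lambert_w(x, tol=1e-12):
    """Principal branch of the Lambert W function for x > 0,
    solving w * exp(w) = x by Newton iteration."""
    w = math.log(1.0 + x)   # reasonable starting guess for x > 0
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

# W(1) is the 'omega constant', about 0.5671.
w1 = lambert_w(1.0)
```

The Fisher Zeros of point (3) correspond to parameter values at which such roots move off the real axis, the signature of a phase transition.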
(5) One possible extension of these results is via a 'reaction rate' treatment abducted from chemical physics [18]. The 'reaction rate' L, taking the minimum possible source uncertainty across the 'game' played by the control system as H_0, is then

L = Σ_{H_j ≥ H_0} exp[−H_j/g(R)] / Σ_k exp[−H_k/g(R)] (20)

where the sums may be generalized integrals and the denominator can be recognized as the partition function A(g(R)) from Equation (18).
Wallace [2] uses similar models to derive a version of the Yerkes-Dodson 'inverted-U' model for cognition rate vs. arousal.
Again, Fisher Zero analogs in g(R) characterize cognition rate phase transitions.
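The 'reaction rate' of Equation (20) is simply a ratio of a partial to a full partition function. A numerical sketch, with hypothetical H_j values, shows L rising with the temperature analog g(R), the qualitative behavior expected of a chemical-physics reaction rate:

```python
import math

def cognition_rate(H_list, H0, g):
    """Reaction-rate analog L of Equation (20): the pseudoprobability
    mass of 'game' classes rich enough (H_j >= H0) to do the work."""
    Z = sum(math.exp(-h / g) for h in H_list)            # A(g(R))
    top = sum(math.exp(-h / g) for h in H_list if h >= H0)
    return top / Z

H_list = [1.0, 2.0, 3.0]   # hypothetical dual source uncertainties
L_cold = cognition_rate(H_list, H0=2.0, g=0.5)
L_hot = cognition_rate(H_list, H0=2.0, g=2.0)
# L increases with the temperature analog g(R).
```

An inverted-U in cognition rate vs. arousal then arises when g(R) itself first rises and then falls with arousal, as in the Yerkes-Dodson treatment of [2].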
(6) We have, by careful design, been able to restrict the development to one dimension via focus on the Rate Distortion Function R(D) alone. The extension of the theory to higher dimensions, for example, writing R(D, Z), where Z is an irreducible vector of resource rates, is not entirely straightforward [19] (Chapter 5), nor is the extension to more complicated versions of the Onsager models [20].

(7) The argument extends directly to stationary nonergodic systems, where time averages are not ensemble averages, assuming that sequences can be broken into small high-probability sets consonant with underlying forms of grammar and syntax, and a much larger set of low-probability sequences not so consonant [2,19,20]. Equation (17) is then applied path-by-path, as source uncertainties can still be defined for individual paths [21]. The 'game' equivalence classes still emerge directly, leading to groupoid symmetry-breaking phase transitions [2,22].
Finally, it is possible to explore the stochastic stability of the system of Equation (19) in much the same manner as in the previous section. Here, the driving stochastic relation, analogous to Equation (12), becomes

dR_t = f(R_t) dt + Ω h(R_t) dW_t (21)

where, again, Ωh(R) is a volatility term in the Brownian noise dW_t. The application of the Ito Chain Rule to a stochastic stability function Q(R) produces an analog to Equation (13), <dQ_t> = 0, as

(dQ/dR) f(R) + (Ω²/2) h(R)² d²Q/dR² = 0 (22)

Examining stability in second order and with 'simple' volatility, so that Q = R² and h(R) = R, and assigning an 'exponential' relation dR/dt = f(R) = β − αR, gives the variance as

Var(R) = <R²> − <R>² = β²Ω² / (α²(2α − Ω²)) (23)

which explodes as Ω² → 2α. A different viewpoint permits R in Equation (22) to be unconstrained, thus imposing selection pressure via the possible forms of f(R) and F, as determined by Q and h.
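The variance result can be cross-checked without Monte Carlo: for dR = (β − αR)dt + ΩR dW_t, the first two moments obey the closed ODEs m1' = β − α m1 and m2' = 2β m1 − (2α − Ω²) m2, whose steady state reproduces β²Ω²/(α²(2α − Ω²)). A minimal Euler integration under assumed parameters:

```python
beta, alpha, Omega = 1.0, 1.0, 0.5   # assumed parameters, Omega**2 < 2*alpha
dt = 1.0e-3
m1, m2 = 0.0, 0.0                     # moments <R> and <R^2>

# Ito moment equations for dR = (beta - alpha*R) dt + Omega*R dW
for _ in range(40000):                # integrate to t = 40
    dm1 = beta - alpha * m1
    dm2 = 2.0 * beta * m1 - (2.0 * alpha - Omega ** 2) * m2
    m1 += dm1 * dt
    m2 += dm2 * dt

variance = m2 - m1 ** 2
# Predicted steady-state variance:
# beta^2 * Omega^2 / (alpha^2 * (2*alpha - Omega^2))
```

As Ω² approaches 2α the m2 equation loses its restoring term and the variance diverges, the explosion noted above.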

Discussion
The 'simple' Rate Distortion Theorem arguments of Sections 2 and 3, driven by the essential convexity of the Rate Distortion Function for stationary systems, lead to very general deterministic and stochastic forms of the Data Rate Theorem. That theorem, usually derived as an extension of the Bode Integral Theorem (e.g., [7], and references therein), places control theory and information theory within the same milieu, one that encompasses many of the cognitive phenomena so uniquely characterizing the living state (e.g., [1]).
More generally, there cannot be cognition, held within the confines of information theory, without parallel regulation, held within the confines of control theory. Both theories are constrained by powerful, and apparently closely related, asymptotic limit theorems. Section 4 extends the underlying concept to multiple 'selection pressures' via equivalence classes of control path sequences. This leads to an iterated model in which a partition function, the denominator of Equation (17), serves as the basis for an iterated free energy, its associated Legendre-transform entropy, and an iterated Onsager approximation in the gradient of that entropy construct, driving system dynamics. A stochastic stability analysis can be conducted in a standard manner. Ultimately, the extended theory focuses on groupoid symmetry-breaking phase transitions characterized by Fisher Zero analogs.
In sum, we have outlined a novel approach to building new statistical tools for data analysis based on the asymptotic limit theorems of control and information theories. An interested reader should be able to take this material and run with it, recognizing that all such statistical tools, much like Onsager approximations in nonequilibrium thermodynamics, inevitably have limited ranges of reliability and applicability. Such recognition serves to evade many of the intellectual thickets surrounding contemporary Grand Unifying Theories, e.g., [23][24][25][26][27].
Author Contributions: R.W. conceived the approach and carried out the calculations. I.L. and S.G. provided essential context and support. All authors have read and agreed to the published version of the manuscript.