A Variational Synthesis of Evolutionary and Developmental Dynamics

This paper introduces a variational formulation of natural selection, paying special attention to the nature of ‘things’ and the way that different ‘kinds’ of ‘things’ are individuated from—and influence—each other. We use the Bayesian mechanics of particular partitions to understand how slow phylogenetic processes constrain—and are constrained by—fast, phenotypic processes. The main result is a formulation of adaptive fitness as a path integral of phenotypic fitness. Paths of least action, at the phenotypic and phylogenetic scales, can then be read as inference and learning processes, respectively. In this view, a phenotype actively infers the state of its econiche under a generative model, whose parameters are learned via natural (Bayesian model) selection. The ensuing variational synthesis features some unexpected aspects. Perhaps the most notable is that it is not possible to describe or model a population of conspecifics per se. Rather, it is necessary to consider populations of distinct natural kinds that influence each other. This paper is limited to a description of the mathematical apparatus and accompanying ideas. Subsequent work will use these methods for simulations and numerical analyses—and identify points of contact with related mathematical formulations of evolution.


Introduction
This paper is an attempt to show that some fundaments of theoretical evolution-and (neuro)biology-emerge when applying the free energy principle to dynamical systems with a separation of temporal scales. It offers a technical and generic treatment with minimal assumptions or commitments to specific biological processes. As such, it does not borrow from established constructs in evolutionary theory; rather, it tries to show how some of these constructs are emergent properties, when seen through the lens of the free energy principle. In subsequent work, we will use the ensuing variational synthesis to consider established-and current-evolutionary theories. Our aim in this paper is to introduce a formalism that may be useful for addressing specific questions-about evolutionary or developmental dynamics-using analytic or numerical recipes that have proven useful when applying the free energy principle in other fields.
A key phylogenetic process-underlying the development and diversification of species in evolutionary time-is known as natural selection, regarded by some as the central organizing principle of biology. While Darwin conceived of natural selection in terms of heredity, variation, and selection [1,2], he only detailed selection, as the mechanisms of heredity and variation would not be understood for some time [3,4]. The integration of Mendelian genetics with natural selection in the early twentieth century was followed by an integration with molecular genetics [5] in the mid-century to form Neo-Darwinism, or the modern synthesis. The modern synthesis, along with the selfish gene hypothesis-put forth in the 1970s [6]-provides a largely gene-centric view of Darwinian evolution that dominates the current perspective. This gene-centric view of evolutionary biology has remained largely disconnected from the phenotypic processes that impact organisms in developmental time [7,8]. Lewontin characterised this disconnect-between genetic and phenotypic understanding-as the major challenge facing the field [9]. While some progress has been made in the following fifty years, biologists continue to highlight the gaps remaining for modelling biology as a single integrated process over multiple scales [10][11][12][13]. By 'gene-centric' we refer not just to theories of sequence evolution [14], but also to the central role genes (or summary statistics of genes) play, either explicitly or implicitly, in accounts of phenotypic evolution. For instance, the Price equation [15] and the closely related replicator equation [16] of evolutionary game theory express the change in (the average of) some phenotypic trait over time in terms of its covariance with fitness. This implicitly relies upon a mapping between that trait and the genetic material passed from generation to generation but focuses upon the phenotypic effects of genes, as opposed to the alleles themselves. Similarly, adaptive dynamics approaches [17] typically focus upon ecological interactions at a phenotypic level. The modern focus upon phenotypic traits reflects the importance of the interaction between a phenotype and its environment in determining fitness. However, it is important to note that such perspectives do not conflict with the central role of genetic inheritance, and implicitly score the fitness of genotypes in terms of the phenotypes they imply.
An organism inherits a set of instructions for growth and development (i.e., an extended genotype) that is, in essence, a prediction about the niche environment (including temperature, humidity, chemical composition, available resources, statistical patterns, etc.). Interrogating the phrase 'survival of the fittest' leads to the understanding of the 'fittest' as those organisms that best 'fit' their niche environment [18]. For example, a bacterium from thermal hot springs will fail to thrive in a cool pond because its genotype does not accurately predict the niche environment. Therefore, 'fittest' must be relative to the niche. A series of slow phylogenetic processes has selected for an extended genotype that enhances the growth and proliferation of organisms in the environment where the corresponding species expects to find itself.
An organism can also 'fit' itself to the niche through adaptation (i.e., action, learning, and development) during its lifetime. For example, a bacterium that normally subsists on sulphur reduction-but can also survive through reducing oxygen-will outlast its sulphur-dependent competitors in an environment that is devoid of sulphur. Such an organism can adapt to its environment through learning and optimising for oxygen reduction, thereby increasing its fit to the niche and, implicitly, its capacity to reproduce in a high-oxygen environment. In this way, phenotypic processes can enhance the fit of organisms to their environment in developmental time, and, through reproduction, phenotypic processes can lead to the enhancement of fit in evolutionary time (i.e., across generations). As the (extended) genotype of organisms produces phenotypes, phylogenetic processes over evolutionary time also impact phenotypic (ontogenetic) processes in developmental time.
Here, we offer a synthesis of evolution and development through a mathematical framework that unifies slow, multi-generational (phylogenetic) processes with single-lifetime, phenotypic (developmental and behavioural) processes, using the same principles as they apply to each temporal scale. The ensuing variational account of evolution focuses on the coupling between phylogenetic processes at evolutionary timescales and ontogenetic processes over phenotypic lifetimes. In principle, this-relatively abstract-treatment is agnostic to specific mechanisms, and could be applied to biological as well as non-biological systems, provided their 'fitness' depends upon events during a lifetime, and this fitness influences dynamics over a generational scale. This foregrounds the circular causality that arises from the implicit separation of timescales [19].
In brief, we treat the slow phylogenetic process (natural selection) as furnishing top-down constraints (i.e., top-down causation) on fast phenotypic processes (action selection). In turn, the active exchange of the phenotype with its environment provides evidence that is assimilated by natural selection (i.e., bottom-up causation). This multi-scale ontological account is licensed by describing both phylogenetic and phenotypic processes as selecting (extended) genotypes and (extended) phenotypes [7,20] with the greatest fitness; where fitness is quantified with (free energy) functionals of probability density functions (a functional is a function of a function).
A narrative version of this (multiscale) account can be told from the point of view of the genotype (from the bottom up) or the phenotype (from the top down):
• From the perspective of the genotype, we can consider evolution as belief-updating over generations, where the belief in question corresponds to a probability density over extended genotypes (henceforth, genotype). This is analogous to treatments of evolution in terms of changes in allele frequencies from generation to generation [15]. This belief updating can be described by the probability of a genotype appearing in subsequent generations, in a way that depends lawfully on the marginal likelihood of extended phenotypes (henceforth, phenotype) in the current generation. The basic idea is that the genotype parameterises or encodes a generative model, which the phenotype uses to infer and act on its environment.
On this view, evolution can be regarded as testing hypotheses-in the form of generative models-that this kind of phenotype can persist in this environment. These hypotheses are tested by exposing the phenotype to the environment and are rejected if the phenotype 'strays from the path' of a persistent phenotype. In this way, the evolutionary process selects models or hypotheses about persistent phenotypes for which it has the greatest evidence. In short, natural selection just is Bayesian model selection [25,26,31,32].
• From the point of view of a phenotype, each conspecific is equipped with a generative model and initial/prior/inherited conditions that underwrite its epigenetic, developmental and ethological trajectories. The states of the phenotype trace out a path through state-space over its lifetime. These phenotypic states encode or parameterise beliefs about environmental states-and the way the phenotype acts. This leads to active inference and learning, in which the phenotype tries to make sense of its world and-through a process of belief updating-to realise the kind of creature it thinks it is. We use the term 'thinks' in a loose (anthropomorphic) sense here and do not mean to imply that all living entities have explicit existential thoughts. More precisely, what we mean is that these entities behave as if they implicitly hold a set of beliefs about the sort of entity they are (e.g., the meta-Bayesian stance as considered in [33]). In virtue of its genetic endowment, it thinks it is a persistent phenotype. If endowed with a good generative model of its environment [34], it will persist and supply evidence of its 'fit' to the environment (i.e., fitness); namely, evidence (i.e., marginal likelihood) that has been accumulated by the slow evolutionary process.
What follows is a formal version of this narrative that calls upon some standard results from statistical physics. The resulting synthesis is both dense and delicate, because it tries to account for the coupling between a phenotype and its econiche-and the coupling between phenotypic and phylogenetic processes-using the same principles. Specifically, we describe a variational synthesis that calls on the path integral formulation of stochastic dynamics, the apparatus of the renormalisation group, and the Poincaré recurrence theorem. The ensuing synthesis considers natural selection and action selection as emergent properties of two random dynamical processes unfolding at slow (phylogenetic) and fast (phenotypic) timescales. The key aspect of this synthesis is that both processes have an attracting set (a.k.a., pullback attractor) or steady-state solution [35]. These solutions correspond to an evolutionary stable state [36] and a nonequilibrium steady-state density [37] over phylogenetic and phenotypic states, respectively. By describing these steady states in terms of a phylogenetically encoded generative model-namely, a joint density over the paths of the phenotype and its environment-one can recover an ontological description of how the two processes inform, and are informed by, each other.
Much of the analysis presented in this paper follows that in [21][22][23], which also appeals to the notion of a renormalization group. These treatments are based upon the emergence of separable timescales and the interpretation of the dynamics at each scale in analogy with inference and learning processes. The key differences are as follows. The renormalization in [21] depends upon a reduction in the number of degrees of freedom with learning, whereas our formulation depends upon a partitioning operation as part of the renormalization. The difference in timescales between variables in [21] emerges from the structure of the neural network used, whereas here it is a direct consequence of the reduction operator implicit in our choice of renormalization. Finally, we extend our analysis to sentient phenotypes, whose dynamics can be interpreted explicitly in terms of Bayesian belief-updating.

A variational formulation
We assume that evolution can be described with two random dynamical systems, describing phylogenetic (evolutionary) and phenotypic (particular) processes, respectively.
The idea is to couple these systems using the apparatus of the renormalisation group [38][39][40].
This implies a mapping from fast phenotypic dynamics to slow phylogenetic dynamics in evolutionary time.
This mapping rests upon a dimension reduction and coarse-graining or grouping operator (RG, for renormalisation group) that maps the path of a phenotype, x[τ], to relevant variables at the evolutionary scale, η = R(x[τ]). On this view, bottom-up causation is simply the application of a reduction operator, R, to select variables that change very slowly. Top-down causation entails a specification of fast phenotypic trajectories in terms of slow genotypic variations, which are grouped into populations-via a grouping operator, G-according to the influences they exert on each other. The implicit separation into fast and slow variables can be read as an adiabatic approximation [41] or-in the sense of synergetics-as a separation into fast, dynamically stable and slow, dynamically unstable modes, respectively [42]. This separation can also be seen in terms of vectorial geometric formulations [43]. Please see [21], who deal carefully with the separation of time scales, by analogy with temporal dilation in physics. Intuitively, this analogy rests upon the idea that time can be rescaled, depending upon whether we take the perspective of things that move quickly or slowly.
The final move is to express the dynamics-at fast and slow levels-in terms of functionals that have the same form. These functionals are functions of probability densities that can be read as Bayesian beliefs. Expressing the dynamics in this way allows one to interpret phenotypic dynamics as active inference and learning, under a generative model that depends on the extended genotype. It also allows one to interpret the phylogenetic state as inferring states of the environment over evolutionary time. Crucially, the extended genotype accumulates evidence for its phenotype; thereby evincing a form of Bayesian model selection or structure learning [25,[44][45][46][47][48]. For an analogous thermodynamic treatment, please see [22], who refine and extend the free energy formulation of [49]. In the context of learning dynamics, a thermodynamic free energy was derived in [50]-using the maximum entropy principle [51,52]-and later applied to study phenomenological models of evolution [22]. Please see [50,53,54] for further discussion in terms of neural networks and information theory.

Particular partitions
There are lots of moving parts in this formulation because it tries to account for the behaviour of 'things' [55] and how this behaviour underwrites the emergence of 'kinds' (e.g., individuals and populations) at nested (i.e., developmental and evolutionary) timescales.

We will use x[τ] = [x(t)] to denote the history or path of a time-varying state. These paths are determined by a state-dependent flow, f(x; θ), with parameters θ that include the initial states, x(0) = x₀. These parameters denote a (natural) kind.
Everything that follows rests upon a particular partition of states. A particular partition is considered necessary to talk about 'things', such as a 'phenotype' or 'population'. In brief, a particular partition enables the (internal) states of some 'thing' to be separated from the (external) states of every 'thing' else by (sensory and active) blanket states [56][57][58][59][60]. In the absence of this partition, there would be no way of distinguishing a phenotype from its external milieu-or a population from the environment. In this setup, external states can only influence themselves and sensory states, while internal states can only influence themselves and active states: see Figure 1.
• External states respond to sensory and active states. These are the states of a phenotype's external milieu: e.g., econiche, body, or extracellular space, depending upon the scale of analysis.
• Sensory states respond to fluctuations in external and active states: e.g., chemoreception, proprioception, interoception, et cetera.
• Active states respond to sensory and internal states and mediate action on the environment; either directly or vicariously through sensory states: e.g., actin filaments, motor action, autonomic reflexes, et cetera.
• Internal states respond to sensory and active states: e.g., transcription, intracellular concentrations, synaptic activity, et cetera.
The evolution of these sparsely coupled states can be expressed as a Langevin or stochastic differential equation; namely, a high-dimensional, nonlinear, state-dependent flow plus independent random (Wiener) fluctuations, ω, with a variance of 2Γ:

ẋ(t) = f(x) + ω     (1)

The flow per se can be expressed using the Helmholtz-Hodge decomposition [61] as follows:

f(x) = (Q − Γ)∇ℑ(x)     (2)

Here, ℑ(x) = −ln p(x) is the surprisal of the steady-state density and Q is an antisymmetric (solenoidal) operator. Note that our appeal to an equation of this form means we have implicitly stipulated that there is a steady-state density or potential function, which remains constant (or at least changes very slowly) over the timescale we are interested in. Equation (2) decomposes the flow into a dissipative component, −Γ∇ℑ, that descends the gradients of surprisal, and a conservative (solenoidal) component, Q∇ℑ, that circulates on its isocontours. In principle, other partitioning schemes could have been employed. Given that many schemes of this sort also rely upon identifying sparsity in the coupling between dynamic variables [68,69], some of the key results of this paper would be unchanged by choosing an alternative grouping operator. The potential downsides of alternative partitions include the loss of the clear relationship between things and their environments-the 'action-perception cycle' [70]-and the role this relationship has in formulations of inference and learning. Methods like that in [69], which pay careful attention to the relationship between (active) inputs, (external) states, (sensory) outputs, and (internal) controllers that receive the outputs and generate the inputs, preserve approximately the same sort of partition as we have employed.
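To make this concrete, here is a minimal numerical sketch of the Langevin dynamics and Helmholtz-Hodge decomposition above (our own illustration, not one of the paper's simulations). All quantities-the precision Π of an assumed Gaussian steady-state density, the dissipative operator Γ and the solenoidal operator Q-are toy values; with a Gaussian density, the surprisal gradient is simply linear.

```python
import numpy as np

# Sketch (our illustration): simulate dx/dt = f(x) + omega, with the
# Helmholtz-Hodge flow f(x) = (Q - Gamma) grad I(x), where I(x) = -ln p(x)
# is surprisal. We assume a Gaussian steady state p(x) ~ N(0, Pi^{-1}),
# so grad I(x) = Pi @ x.

rng = np.random.default_rng(0)

Pi = np.array([[2.0, 0.5],
               [0.5, 1.0]])          # precision of the steady-state density
Gamma = 0.5 * np.eye(2)              # dissipative operator (diagonal here)
Q = np.array([[0.0, 1.0],
              [-1.0, 0.0]])          # solenoidal (antisymmetric) operator

def flow(x):
    # Helmholtz-Hodge decomposition of the flow at steady state
    return (Q - Gamma) @ (Pi @ x)

dt, n_steps = 0.005, 400_000
x = np.zeros(2)
samples = np.empty((n_steps, 2))
for t in range(n_steps):
    # Euler-Maruyama step: fluctuations have covariance 2 * Gamma
    x = x + flow(x) * dt + rng.normal(size=2) * np.sqrt(2 * np.diag(Gamma) * dt)
    samples[t] = x

# The empirical covariance approaches Pi^{-1} regardless of Q: solenoidal
# flow circulates on isocontours of surprisal without changing the
# steady-state density (it only breaks detailed balance).
emp_cov = np.cov(samples[n_steps // 2:].T)
print(np.round(emp_cov, 2))
print(np.round(np.linalg.inv(Pi), 2))
```

A useful feature of this sketch is that setting Q to zero recovers a detailed-balance (gradient) flow with the same stationary density, which is exactly the role the text assigns to the solenoidal term.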
The terms 'external' and 'internal' offer useful intuitions, but it is worth being cautious about overinterpreting these labels in spatial terms.For instance, it might seem that some 'external' variables like ambient temperature might directly influence 'internal' variables like the temperature within a cell.However, this would not be an appropriate way of thinking about this system's partition.Either we would have to assume that there is an intervening variable (e.g., the temperature within the cell membrane) or we would have to treat the internal temperature as a sensory variable, which itself influences internal variables like the rates of enzymatic reactions.There is now an emerging literature asking about the appropriate ways to think of particular partitions in biology, including what is internal to a neuronal network [71], or a spinal reflex arc [72].

Ensemble dynamics and paths of least action
To describe dynamics at the phenotypic or phylogenetic scale, we first need to rehearse some standard results from statistical physics that furnish a probabilistic description of trajectories or paths at any scale. This description calls on the self-information of states x(t), generalised states x̃ = (x, x′, x″, …), and paths x[τ], where 2Γ̃ is the covariance of the generalised random fluctuations:

ℑ(x) = −ln p(x)
L(x̃) = −ln p(x̃)
A(x[τ]) = ∫ L(x̃(t)) dt

The first measure, ℑ(x), is the self-information or surprisal of a state; namely, the implausibility of that state being occupied. When the state is an allele frequency that evolves according to Wright-Fisher dynamics, this is sometimes referred to as an 'adaptive landscape' [73]. The second, L(x̃), is the Lagrangian, which is the surprisal of a generalised state; namely, of the instantaneous path associated with the motion from an initial state. In generalised coordinates of motion, the state, velocity, acceleration, etc. are treated as separate (generalised) states that are coupled through the flow [74,75]. Finally, the surprisal of a path, A(x[τ]), is called action; namely, the path integral of the Lagrangian.
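The relationship between Lagrangian and action can be sketched numerically. In the following toy example (our illustration; the Onsager-Machlup form of the Lagrangian for Gaussian fluctuations and the flow f(x) = −kx are assumptions), we discretise the action of two paths and confirm that the path conforming to the flow has numerically minimal action.

```python
import numpy as np

# Sketch (our notation): for dx/dt = f(x) + omega with fluctuation
# variance 2*Gamma, the Lagrangian is, up to a constant, the
# Onsager-Machlup form L = (xdot - f(x))**2 / (4 * Gamma); the action
# A[x] is its path integral. The path of least action conforms to
# the flow, xdot = f(x).

k, Gamma, dt = 1.0, 0.5, 0.01
f = lambda x: -k * x                        # a simple (toy) flow

def lagrangian(x, xdot):
    return (xdot - f(x)) ** 2 / (4 * Gamma)

def action(path):
    x = path[:-1]
    xdot = np.diff(path) / dt               # finite-difference velocity
    return np.sum(lagrangian(x, xdot)) * dt

t = np.arange(0, 5, dt)
least_action_path = 2.0 * np.exp(-k * t)    # solves xdot = f(x), x(0) = 2
perturbed_path = least_action_path + 0.1 * np.sin(2 * np.pi * t)

A0, A1 = action(least_action_path), action(perturbed_path)
print(A0, A1)   # the conforming path has (numerically) minimal action
```

Because the Lagrangian is quadratic in the departure from the flow, any perturbation of the conforming path can only increase the action; this is the sense in which deviations from the most likely trajectory are penalised.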
Generalised states afford a convenient way of expressing the path of least action in terms of the Lagrangian:

ẋ̃ = Dx̃ − ∇L(x̃)

Here, D is a derivative (shift) operator that maps each generalised order to the next: D(x, x′, x″, …) = (x′, x″, …). Written as ẋ̃ − Dx̃ = −∇L(x̃), this resembles a Lagrange equation of the first kind, ensuring that the generalised motion of states is the state of generalised motion. Alternatively, it can be read as a gradient descent on the Lagrangian, in a moving frame of reference. When the Lagrangian is convex, solutions to this generalised gradient descent necessarily converge to the path of least action. Denoting the path of least action by x̃*:

∇L(x̃*) = 0 ⇔ ẋ̃* = Dx̃*

Convergence is guaranteed by the quadratic form (i.e., convexity) of the Lagrangian, which it inherits from Gaussian assumptions about the random fluctuations. This is sometimes described as convergence to the path of least action, in a frame of reference that moves with the state of generalised motion [76].
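A minimal sketch of this generalised gradient descent (our toy example; the quadratic Lagrangian, the precisions a and b, and the truncation at second order are all assumptions): starting off the path of least action, the gradient of the Lagrangian is approximately dissolved over time, leaving a trajectory on which the generalised motion conforms to the flow.

```python
import numpy as np

# Sketch (our toy example): generalised gradient descent
#   d/dt x_tilde = D @ x_tilde - grad_L(x_tilde)
# for generalised states x_tilde = (x, x', x'') of the flow xdot = -k*x,
# with the (convex) quadratic Lagrangian
#   L = a*(x' + k*x)**2 + b*(x'' + k*x')**2.
# At high precision, solutions settle near the path of least action,
# on which x' ~ -k*x and the Lagrangian is (almost) dissolved.

k = 1.0
a = b = 50.0                               # assumed precisions
D = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])               # shift operator: (x,x',x'')->(x',x'',0)

def lagrangian(xt):
    x, x1, x2 = xt
    return a * (x1 + k * x) ** 2 + b * (x2 + k * x1) ** 2

def grad_L(xt):
    x, x1, x2 = xt
    u, v = x1 + k * x, x2 + k * x1         # departures from the flow
    return np.array([2*a*k*u, 2*a*u + 2*b*k*v, 2*b*v])

xt = np.array([2.0, 0.0, 0.0])             # start off the path of least action
L0 = lagrangian(xt)
dt = 0.002
for _ in range(1500):                      # Euler integration to t = 3
    xt = xt + dt * (D @ xt - grad_L(xt))

print(xt, lagrangian(xt) / L0)             # x' ~ -k*x; Lagrangian ~ 0
```

Note that with a finite number of generalised orders, the descent only tracks the path of least action approximately (the residual shrinks as the precisions grow); this is the usual truncation employed in generalised filtering.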
We can also express the conditional independencies implied by a particular partition using the Lagrangian of generalised states. Because there are no flows that depend on both internal and external states, external and internal paths are independent when conditioned on blanket paths:

p(η̃, μ̃ | b̃) = p(η̃ | b̃) p(μ̃ | b̃)

Here, η̃, b̃ and μ̃ denote external, blanket (i.e., sensory and active) and internal paths, respectively. In other words, blanket paths furnish a Markov blanket over internal paths. We will use this result later to disambiguate the role of active and sensory dynamics in the sentient behaviour-i.e., active inference-of a phenotype. First, we have to establish a formalism for ensembles or populations of phenotypes. Here, we draw on the apparatus of the renormalisation group.
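The conditional independence afforded by blanket paths can be illustrated for a Gaussian density (a sketch under assumed numbers; a three-state partition stands in for external, blanket and internal states):

```python
import numpy as np

# Sketch (our illustration): for a Gaussian density, a zero block in the
# precision matrix between internal (mu) and external (eta) states means
# they are conditionally independent given the blanket (b) states -- the
# particular partition as a Markov blanket. State ordering: [eta, b, mu].

Pi = np.array([[2.0, 0.6, 0.0],    # eta couples to b, but not to mu
               [0.6, 3.0, 0.7],
               [0.0, 0.7, 2.5]])   # mu couples to b, but not to eta

Sigma = np.linalg.inv(Pi)

# Conditional covariance of (eta, mu) given b, via the Schur complement:
idx, blanket = [0, 2], [1]
S = Sigma[np.ix_(idx, idx)] - Sigma[np.ix_(idx, blanket)] @ \
    np.linalg.inv(Sigma[np.ix_(blanket, blanket)]) @ Sigma[np.ix_(blanket, idx)]

# The off-diagonal term vanishes: p(eta, mu | b) = p(eta | b) p(mu | b),
# even though eta and mu are marginally correlated (via the blanket).
print(np.round(S, 6))
print(np.round(Sigma[0, 2], 3))
```

The same sparsity argument is what the text exploits: the absence of direct internal-external coupling in the flow induces a zero block in the (path-space) precision, and hence conditional independence given the blanket.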

Different kinds of things
To deal with multiple 'things' (e.g., particles, phenotypes and populations), we first introduce a grouping operator G that partitions the states at the i-th scale of analysis into N particles, on the basis of the sparse coupling implied by a particular partition. In other words, we group states into an ensemble of particles, where each particle has its own internal and blanket states. With a slight abuse of set-builder notation:

G: {x₁, x₂, …} → {{s₁, a₁, μ₁}, …, {s_N, a_N, μ_N}}

where s_n, a_n and μ_n are the sensory, active and internal states of the n-th particle. The grouping operator means the external states of a given particle are the (blanket) states of the remaining particles that influence it. See [55] for a worked example and numerical analysis. This enables us to express the dynamics of each particle in terms of its sensory states-that depend upon the blanket states of other particles-and its autonomous (i.e., active and internal) states-that depend only upon the states of the particle in question. At this point, we pause to consider that the states in the particular ensemble have to be the states of some 'thing'; namely, the states of a particle at a lower scale. This means that states must be the states of particles (e.g., phenotypic states) that constitute the particular states at the next scale (e.g., phylogenetic states). This recursive truism can be expressed in terms of a grouping operator G-that creates particles-and a reduction operator R-that picks out certain particular states for the next scale. The composition of the two operators can be construed as mapping from the states of particles at one scale to the next or, equivalently, from particular states at one scale to the next.
In short, the composition creates particles of particles; namely, populations (see Figure 2):

x^(i+1) = R(G(x^(i)))

The reduction operator R typically selects relevant variables whose slow fluctuations contextualise dynamics at the scale below. Here, R simply recovers the states of a particle that are time-invariant, or vary slowly with time (i.e., the initial states and flow parameters). This instantiates a separation of timescales, such that the lifetime of a particle (e.g., a phenotype) unfolds during an instant from the perspective of the next scale (e.g., evolution). The separation of timescales could have been achieved without the grouping (partitioning) operator: we could simply have projected onto the eigenvectors of a dynamical system's Jacobian-effectively taking linear (or nonlinear) mixtures of our system-to arrive at fast and slow coordinates.
However, all we would be left with are fast and slow continuous variables, which have nothing of the character of the individuals, phenotypes, or populations in a system. In short, the grouping operator is key in identifying fast and slow 'things'-as opposed to just fast and slow coordinates of a dynamical system.

Figure 2 (caption): each variable is a (very large) vector state that is itself partitioned into multiple vector states. At scale i+1, each particle represents an ensemble (e.g., population m), the elements of which are partitioned into autonomous and sensory subsets (e.g., the n-th autonomous genotype from population m). At scale i, each particle represents an element of an ensemble (e.g., a phenotype), which is itself partitioned into sensory and autonomous subsets. The slow states of each element (e.g., phenotype) are recovered by the reduction operator R, to furnish the states at the ensemble level (e.g., genotype). A key feature of this construction is that it applies recursively over scales.
In short, the renormalisation group operator creates particles of particles, retaining only particular variables that change very slowly and then grouping them according to their sparse coupling.This means that particles increase in their size from one scale to the next-in virtue of the grouping of particles at the lower scale-and change more slowly-in virtue of the coarse graining afforded by temporal reduction.
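As a toy rendition of one renormalisation step (our illustration; the adjacency matrix, the designated 'slow' variables and the component-based grouping are assumptions, not the paper's operators): G groups sparsely coupled states into particles, and R retains only each particle's slow variables as the states of the next scale.

```python
import numpy as np

# Sketch (toy): one renormalisation step x^(i+1) = R(G(x^(i))).
# G groups states into particles using the sparsity of their coupling
# (here: connected components of an adjacency matrix); R then keeps only
# each particle's slow variables (flow parameters and initial states).

# adjacency: six states, sparsely coupled into two 'things'
A = np.array([[0, 1, 0, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 0, 1, 0]])

def grouping(A):
    # G: connected components of the coupling graph -> particles
    n, seen, particles = len(A), set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], []
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(int(u))
            stack.extend(np.flatnonzero(A[u]))
        particles.append(sorted(comp))
    return particles

# each state carries (hypothetical) slow parameters: theta and x0
states = [{"theta": 0.1 * s, "x0": float(s)} for s in range(6)]

def reduction(particles, states):
    # R: retain only the slow (time-invariant) variables of each particle
    return [[(states[s]["theta"], states[s]["x0"]) for s in p] for p in particles]

particles = grouping(A)
next_scale = reduction(particles, states)
print(particles)
print(next_scale)
```

The point of the sketch is the one made in the text: the composition yields fewer, larger 'things' whose states (parameters and initial conditions) change slowly relative to the scale below.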
In an evolutionary setting, the existence of steady-state solutions-implicit in the Langevin formalism above-means that phenotypic dynamics possess a pullback attractor.
This means their paths will return to the neighbourhood of previously occupied states.In other words, their 'lifecycle' will intersect with some Poincaré section in phenotypic state-space (possibly many times).We will take this intersection to be a mathematical image of persistence, which is underwritten by the flow parameters at any point in evolutionary time.
At the phylogenetic scale we have a partition into populations of phenotypes based upon which phenotypes influence each other.At this slow scale, states can be read as characterising the 'kind' of 'thing' that has particular states at the scale below.We will therefore refer to states at this level as (natural) kinds; noting that the 'kind of thing' in question does not change at the fast scale.We can now rehearse the particular partition at the phylogenetic scale, noting that for a population to exist, it must have a particular partition.
Here, a population corresponds to a set of particular kinds (i.e., sensory and autonomous kinds). These include external, sensory, active, and internal kinds.
• External kinds are particles or phenotypes outside the population, which change as a function of themselves and of sensory and active kinds: c.f., the target of niche construction, from a molecular through to a cultural level, depending upon the scale of analysis [77,78].
• Sensory kinds mediate the effects of external kinds on the internal members of the population in question: e.g., nutrients or prey.
• Active kinds mediate the effects of internal kinds on external kinds: e.g., agents who mediate niche construction, from a molecular through to a cultural level, depending upon the scale of analysis.
• Internal kinds influence themselves and respond to changes in sensory and active kinds.
This concludes our formal setup. Next, we consider the coupling between fast phenotypic and slow phylogenetic dynamics. As in other applications of the free energy principle, this coupling emerges as a property of any phylogenetic process that possesses an evolutionary steady state. In other words, the idea here is to identify the properties of a system that exists, as opposed to identifying the properties that underwrite existence. We will see that the emergent properties look very much like natural selection.

Natural selection: a variational formulation
To specialise particular partitions to natural selection, we will associate autonomous (active and internal) kinds with the (extended) genotypes that constitute a population of agents; noting that there is no requirement for agents to belong to the same equivalence class-they just interact, in virtue of the sparse coupling that defines their grouping into a population. For example, some agents could be animals and others could be plants.
At the phylogenetic scale, an agent is an autonomous kind from a particular population.
At the phenotypic scale, the agent has particular (phenotypic) states, whose dynamics or paths depend upon its (genotypic) kind. For ease of notation, we will deal with a single population, where the phenotypic state of the n-th agent will be denoted by x_n (i.e., dropping the m in Figure 2). With this formalism in place, we formulate the coupling between phenotypic and phylogenetic dynamics with the following lemma.

Lemma 1 (variational fitness): if, at nonequilibrium evolutionary steady state, the likelihood of an agent's genotype, η_n = R(x_n), is proportional to the likelihood of its phenotypic trajectory, given the genotypes of the remaining agents (where \n denotes exclusion of the n-th agent):

p(η_n | η_\n) ∝ p(x_n[τ] | η_n)     (11)

then the following holds. An agent's autonomous dynamics can be cast as a gradient descent on a Lagrangian, whose path integral (i.e., action) corresponds to negative fitness. This Lagrangian depends upon the flow parameters (and initial states) supplied by the genotype. The agent's genotype can then be cast as performing a stochastic gradient descent on negative fitness. This emphasises the relationship between gradients on fitness (selection) and the stochastic terms that are uncorrelated with selection (drift). In brief, the Lagrangian at the fast (c.f., phenotypic) scale plays the role of surprisal, while the corresponding action at the slow (c.f., phylogenetic) scale plays the role of negative adaptive fitness.

Formally, the generalised gradient descent at the phenotypic scale corresponds to Bayesian filtering or inference [76] that maximises the marginal likelihood of phenotypic paths. This is almost tautological, in that it says that deviations from the most likely developmental trajectory, given some genotype, are unlikely. An additional subtlety here is that the Lagrangian, which plays the role of a (negative) Lyapunov function, is a function of sensory states. The implication is that the gradients are not static but themselves change, based upon the way in which the environment interacts with a creature during its development. The stochastic gradient descent at the phylogenetic scale corresponds to Bayesian learning via stochastic gradient Langevin dynamics [79], equipped with solenoidal mixing [80]. In effect, the genotype comes to maximise the evidence for the generative model, p(x̃ | η_n), that it encodes. This evidence is also known as a marginal likelihood, because it marginalises over external dynamics; i.e., other agents.
Proof: the condition in (11) means that the probability of finding an agent of a particular kind is proportional to the likelihood of its phenotypic path; namely, the likelihood that a phenotype keeps to the 'trodden path' characteristic of the 'kind' of 'things' that persist. The existence of a nonequilibrium evolutionary steady-state solution to the density dynamics (at both scales) allows us to express the fast and slow dynamics of agents and their autonomous states in terms of Helmholtz-Hodge decompositions, following (1) and (2). The gradients of surprisal at the slow scale, with respect to any given agent's 'kind' or genotype, are then the gradients of action by (11).
The solenoidal component of these flows underwrites chaotic oscillations at the phenotypic scale [84-88] or Red Queen dynamics at the phylogenetic scale [83,89,90]. A system that has the property of detailed balance is one in which time-reversal makes no qualitative difference to the dynamics of that system. The implication is that systems in which the solenoidal flow is zero possess detailed balance, while those with a nonzero solenoidal flow do not: the presence of solenoidal flow means that time reversal also reverses the direction of this flow. Please see [31] for a relatively recent example of the Helmholtz-Hodge decomposition in Darwinian processes and [80] for a generic treatment of stochastic chaos in this setting.
Furthermore, there is no requirement for the grouping operator to return the same partition at each instant of its application. This follows because the grouping operator is determined by sparse coupling among particles at the scale below, which itself may change as certain particles become 'shielded' from others [91]; for example, during the self-assembly of particular partitions associated with cell division, multicellular organisation and development [57]. Mathematically, this permits wandering sets (i.e., partitions) at each scale, where fitness gradients remain well defined, because they inherit from the dynamics of the scale below.
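The role of the solenoidal term can be illustrated numerically. The following sketch is a hypothetical toy, not part of the formulation above: a two-dimensional Langevin flow whose drift is a Helmholtz-Hodge mixture of a gradient operator Γ and an antisymmetric solenoidal operator Q. Adding Q breaks detailed balance (the flow circulates) without changing the stationary NESS density, here a standard Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_surprisal(x):
    """Gradient of self-information -ln p(x) for p = N(0, I): just x."""
    return x

gamma = 0.5                      # amplitude of random fluctuations (Γ)
Q = np.array([[0.0, 1.0],
              [-1.0, 0.0]])      # antisymmetric solenoidal operator

def simulate(q_on, n_steps=200_000, dt=0.01):
    """Euler-Maruyama integration of dx = -(Γ + Q)∇ℑ(x) dt + √(2Γ) dW."""
    x = np.zeros(2)
    samples = np.empty((n_steps, 2))
    flow_op = gamma * np.eye(2) + (Q if q_on else 0.0)
    for t in range(n_steps):
        drift = -flow_op @ grad_surprisal(x)
        x = x + drift * dt + np.sqrt(2 * gamma * dt) * rng.standard_normal(2)
        samples[t] = x
    return samples

# With or without solenoidal mixing, the empirical stationary covariance
# recovers the NESS covariance (the identity); only the circulation differs.
cov_plain = np.cov(simulate(False).T)
cov_solen = np.cov(simulate(True).T)
print(np.round(cov_plain, 2))
print(np.round(cov_solen, 2))
```

Because Q is constant and antisymmetric while the surprisal is quadratic, the solenoidal flux is divergence-free with respect to the NESS density, so both runs share the same stationary solution; reversing time reverses the circulation induced by Q but not the density.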
Implicit in the renormalisation group construction is the notion that variational selection could operate at multiple scales. In other words, although framed in terms of natural selection and evolution, the variational formulation above does not commit to a separation of temporal scales apt for replication or reproduction. Any selective mechanism that fulfils the fitness lemma (Lemma 1) will, in principle, be subject to the same selective mechanics.
Common examples could include the optimisation of weights in neural networks and their structure learning [45,76,92]. In a biological setting, this could correspond to developmental stages that have well-defined (separation of) temporal scales. Finally, we take a closer look at phenotypic dynamics and explain why they can be construed as sentient behaviour.

The sentient phenotype
An ontological interpretation of phenotypic dynamics, in terms of sentient behaviour or active inference, obtains by expressing the Lagrangian as a variational free energy. For clarity, we will drop the sub- and superscripts (and condition on the extended genotype x) to focus on the generalised states of a given phenotype.
This variational free energy can be rearranged in several ways. First, it can be expressed as an energy constraint minus the entropy of the variational density, which licences the name free energy [93]. In this decomposition, minimising variational free energy corresponds to the maximum entropy principle, under the constraint that the expected Lagrangian is minimised [51,94]. The energy constraint is a functional of the marginal density over external and sensory states that plays the role of a generative model (i.e., parameterised by the extended genotype); namely, a joint density over causes (external dynamics) and their consequences (autonomous dynamics). Second, on a statistical reading, variational free energy can be decomposed into the (negative) log likelihood of particular paths (i.e., accuracy) and the KL divergence between posterior and prior densities over external paths (i.e., complexity). Finally, it can be written as the negative log evidence plus the KL divergence between the variational and conditional (i.e., posterior) density. In variational Bayesian inference [95], negative free energy is called an evidence lower bound or ELBO [96-98].
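For readers who prefer a concrete check, the three decompositions can be verified in a minimal one-dimensional Gaussian model. The prior, likelihood and variational density below are illustrative assumptions of our own devising, not quantities from the formulation above; the point is only that the three rearrangements yield the same number.

```python
import math

def kl_gauss(m, v, m0, v0):
    """KL divergence KL[N(m, v) || N(m0, v0)] between 1-D Gaussians."""
    return 0.5 * (math.log(v0 / v) + (v + (m - m0) ** 2) / v0 - 1.0)

# Toy generative model: prior p(η) = N(0, 1), likelihood p(s|η) = N(η, σ²).
sigma2 = 0.5          # likelihood variance (assumed)
s = 1.3               # an observed sensory sample (assumed)
m, v = 0.6, 0.4       # sufficient statistics of the variational density q(η)

# Decomposition 1: expected energy minus entropy of the variational density.
energy = (0.5 * math.log(2 * math.pi * sigma2) + ((s - m) ** 2 + v) / (2 * sigma2)
          + 0.5 * math.log(2 * math.pi) + (m ** 2 + v) / 2)
entropy = 0.5 * math.log(2 * math.pi * math.e * v)
F1 = energy - entropy

# Decomposition 2: complexity minus accuracy.
accuracy = -(0.5 * math.log(2 * math.pi * sigma2) + ((s - m) ** 2 + v) / (2 * sigma2))
complexity = kl_gauss(m, v, 0.0, 1.0)
F2 = complexity - accuracy

# Decomposition 3: negative log evidence plus divergence from the posterior.
log_evidence = -0.5 * math.log(2 * math.pi * (1 + sigma2)) - s ** 2 / (2 * (1 + sigma2))
post_m = s / (1 + sigma2)           # exact posterior mean
post_v = sigma2 / (1 + sigma2)      # exact posterior variance
F3 = -log_evidence + kl_gauss(m, v, post_m, post_v)

print(F1, F2, F3)   # all three decompositions agree
```

The third decomposition also exhibits the bound property: because the KL term is non-negative, the variational free energy can never fall below the negative log evidence, which is why its negative is called an evidence lower bound.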
Proof: the sparse coupling that underwrites a particular partition means autonomous paths (i.e., generalised states) depend only on sensory paths. This means there is a (deterministic and injective) map from the most likely autonomous paths (of sufficiently high-order generalised motion) to the conditional density over external paths, where both are conditioned on sensory paths. This injection means we can consider the conditional density over external paths as being parameterised by internal paths. We will call this a variational density (noting from (6) that internal paths are conditionally independent of external paths). If autonomous dynamics are conservative, their trajectory is a path of least action and we can replace the Lagrangian gradients in (12) with variational free energy gradients, giving (16).

Remarks:
The free energy lemma (Lemma 2) associates negative fitness with variational free energy, such that phenotypic behaviour will appear to pursue paths of least free energy or greatest fitness. Because variational free energy is an upper bound on negative log evidence (i.e., surprisal), the pursuit of maximum fitness can be read as self-evidencing [99]: namely, actively soliciting evidence for generative models endowed by evolution. In short, autonomous dynamics (appear to) actively infer external states under a generative model, whose parameters are (apparently) learned by minimising a path integral of variational free energy.
The functional form of variational free energy licences a teleological interpretation of autonomous dynamics: the internal paths can be read as the sufficient statistics or parameters of (approximate) Bayesian beliefs about external states, while active paths will (appear to) change the posterior over external states to 'fit' internal (Bayesian) beliefs. In other words, active dynamics will look as if they are trying to fulfil the predictions of internal representations. A complementary interpretation inherits from the decomposition of variational free energy into complexity and accuracy. Minimising complexity means that generalised internal states encode Bayesian beliefs about external states that are as close as possible to prior beliefs, while generalised active states will look as if they are changing sensory states to realise those beliefs. These interpretations, in terms of perception and action, furnish an elementary but fairly expressive formulation of active inference. For example, the free energy formulations above have been used to emulate many kinds of sentient behaviour, ranging from morphogenesis [100], through action observation [101], to birdsong [102].
Although not developed here, the renormalisation group construction means that we can apply the same arguments to autonomous kinds, i.e., agents, at the slow scale. In other words, on average, the extended genotype of internal kinds comes to encode Bayesian beliefs about external kinds, while active kinds will look as if they are trying to realise those beliefs via niche construction [77,103-105]. In virtue of the minimisation of variational free energy, we have an implicit maximum entropy principle, which brings us back to [21,22] via [49].

Discussion
One insight from the above analysis is that populations are not necessarily quotient sets of equivalence classes. In other words, there is no assumption that any given particle shares phenotypic or genotypic characteristics with any other particle. This is interesting on two counts. First, it suggests that treating a population as an equivalence class of conspecifics may not be sufficient, in the sense that the population includes all the (natural) kinds that interact to maintain their particular partition. The constituents of this population could range from macromolecules to marmosets. Furthermore, even if some agents share the same genotype, their phenotypes can specialise in distinct ways to minimise their joint variational free energies: see [100] for a worked example in the context of morphogenesis. This mandates a quintessentially co-evolutionary perspective that emphasises co-dependencies and co-creation [20,106-108]. However, the emergence of equivalence classes begs explanation. A potential answer is the generalised synchrony between particles, as they find their joint variational free energy minima and become mutually predictable (e.g., [58,102]); in an evolutionary setting, one can imagine this leading to convergent evolution or speciation.
The synthesis of biological evolution and development on offer here is an example of a generalised synthesis: applicable, under the free energy principle, to all kinds of things. This synthesis can be read as generative models autopoietically generating entities and then using the 'fit' of the model to the niche as evidence for updating the model, in a cyclical process summarised in Figure 3. Effectively, we are describing the evolutionary-developmental process with the following protocol (where the first two items correspond to the right arrow in Figure 3, and the last two to the left arrow):
• First, generate an ensemble of particles (i.e., extended phenotypes) by sampling their flow parameters and initial states (i.e., extended genotypes) from some initial density.
• For each particle, find the path of least action using a generalised Bayesian filter (i.e., active inference).
• After a suitable period of time, evaluate the path integral of variational free energy (i.e., action) to supply a fitness functional.
• Update the flow parameters and initial states, using a stochastic gradient descent on the action (i.e., Darwinian evolution).
If this protocol were repeated for a sufficiently long period of time, it would converge to an attracting set, assuming this pullback attractor exists [35]. In statistical mechanics, this would be a nonequilibrium steady-state; in theoretical biology, it would correspond to an evolutionary steady-state.
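The protocol above can be sketched in a few lines of code. The following is a deliberately minimal toy of our own devising, not an implementation of the full formalism: a scalar genotype g parameterises the prior of a quadratic free energy (a Laplace-style approximation), the fast loop performs inference as a gradient flow on free energy, and the slow loop performs stochastic gradient descent of the genotype on the accumulated action. All constants (learning rates, noise amplitudes, the environmental state) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
eta_true = 2.0        # hidden environmental state generating sensory samples

def lifetime(g, n_steps=50, kappa=0.2):
    """Fast scale: active inference as a gradient flow on free energy.
    The genotype g parameterises the prior of the generative model.
    Returns the action (path integral of free energy) and its gradient."""
    mu, action, dA_dg = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        s = eta_true + rng.standard_normal()            # sensory sample
        F = 0.5 * (s - mu) ** 2 + 0.5 * (mu - g) ** 2   # quadratic free energy
        mu -= kappa * ((mu - s) + (mu - g))             # inference: descend on F
        action += F                                     # accumulate the action
        dA_dg += (g - mu)                               # ∂F/∂g (treating mu as given)
    return action, dA_dg

# Slow scale: stochastic gradient descent of the genotype on the action
# (i.e., ascent on fitness), with random fluctuations standing in for
# genetic variation.
g = -1.0
for generation in range(300):
    action, grad = lifetime(g)        # action could be monitored as (negative) fitness
    g -= 0.01 * grad + 0.05 * rng.standard_normal()

print(round(g, 1))    # the genotype comes to predict its econiche
```

Over generations, g drifts toward the environmental state: the genotype comes to encode a prior that makes its phenotype's lifetime of inference as unsurprising as possible, which is the sense in which the slow scale 'learns' from the path integral of free energy supplied by the fast scale.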

Limitations
As with most applications of the free energy principle, the variational account alone does not supply a process theory. Rather, it starts from the assumption that a nonequilibrium (evolutionary) steady-state exists and then describes the dynamics that the system must exhibit.
This enables various process theories to be proposed as specific hypotheses about multiscale biological systems. For example, the genetic variation in the above formulation follows from the Helmholtz-Hodge decomposition or fundamental theorem of vector calculus. However, the ensuing stochastic gradient Langevin dynamics does not specify the particular processes that give rise to this kind of dynamics; e.g., [109]. There are many candidates one could consider: for example, simple rejection sampling or more involved genetic algorithms that provide a plausible account of bisexual reproduction [110,111]. A computationally expedient way of evaluating the requisite gradients, for example those needed to simulate artificial evolution, could call upon Bayesian model reduction [48,112]. Irrespective of the replication or reproduction process, it must, on the present analysis, conform to a stochastic gradient flow on 'fitness' with solenoidal mixing [83,89,90].
The issue of relating variational treatments to process theories is a problem also faced in other fields, notably the cognitive sciences, where the difficulty is in relating the optimisation problems our brains solve to the anatomical [113] and physiological [114] mechanisms that underwrite their solution. The link in the neurosciences turns out to depend upon the form of the Lagrangian, from which the dynamics of Equation 16 imply specific connectivity structures and electrophysiological responses. This means that different Lagrangians can be proposed as alternative hypotheses that can be assessed against the measurements they predict. By analogy, different Lagrangians in Equation 12 will have consequences for the trajectory an individual follows during their lifetime and the changes a population undergoes over several generations.
As an example, consider a bacterial culture exposed to a new antibiotic. In this setting, the autonomous states might include expression of mRNA and proteins essential for cell-wall maintenance and antibiotic resistance (e.g., β-lactamases). Sensory states might include intracellular antibiotic concentrations. We could hypothesise a Lagrangian that assigns high probability to an intact cell wall and low probability to significant levels of intracellular antibiotics. In the absence of the antibiotic, the path of least action will be pursued during the lifetime of a bacterium (i.e., between replications). However, the antibiotic causes deviation from the path of least action through disruption of cell-wall integrity and unexpectedly high intracellular antibiotic concentrations. The slow dynamics of Equation 12 might, through changes in allele frequencies, change the Lagrangian such that higher β-lactamase concentrations are anticipated, altering the path of least action to one in which intracellular antibiotics are degraded. This sort of narrative is a reframing of the simplest mechanism of antibiotic resistance but is formulated in such a way that specific choices of Lagrangian result in subtly different trajectories at both timescales.
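To make this narrative concrete, one could compare two hypothetical Lagrangians in silico. The quadratic penalty, the enzyme-degradation map and all constants below are invented for illustration, not drawn from any model of bacterial physiology; the point is only that a Lagrangian anticipating higher β-lactamase expression yields a measurably different path of least action, with a lower intracellular antibiotic concentration at steady state.

```python
# Toy comparison of two candidate Lagrangians for antibiotic resistance.
# State: e = β-lactamase expression level; sensory state: c = intracellular
# antibiotic concentration. The prior expectation e_prior encodes what the
# Lagrangian 'anticipates'; all parameter values are illustrative.

def least_action_path(e_prior, c_ext=2.0, lam=1.0, n_steps=5000, dt=0.01):
    """Gradient flow on L(e) = 0.5*c(e)**2 + 0.5*lam*(e - e_prior)**2,
    where c(e) = c_ext / (1 + e): the enzyme degrades the antibiotic."""
    e = 0.0
    for _ in range(n_steps):
        c = c_ext / (1.0 + e)
        # dL/de: chain rule through c(e) plus the prior (complexity) term
        dL_de = -c * c / (1.0 + e) + lam * (e - e_prior)
        e -= dt * dL_de                 # descend on the Lagrangian
    return e, c_ext / (1.0 + e)

# Naive Lagrangian anticipates no enzyme; resistant Lagrangian anticipates
# high expression, as might follow from slow changes in allele frequencies.
e_naive, c_naive = least_action_path(e_prior=0.0)
e_resist, c_resist = least_action_path(e_prior=2.0)
print(e_naive, c_naive)
print(e_resist, c_resist)
```

The two Lagrangians settle at different fixed points: the resistant Lagrangian sustains higher enzyme expression and therefore a lower intracellular antibiotic concentration, illustrating how alternative choices of Lagrangian yield distinct, testable trajectories.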
In other words, the primary offering of this variational formulation of natural selection, from an empirical science perspective, is that one can hypothesise alternative forms for the Lagrangian. Each choice of Lagrangian will have consequences not only for the dynamics over physiological and developmental timescales but will also allow for predictions about evolution over phylogenetic timescales. It is also worth noting that the account of natural selection set out here, in which genotypic evolution depends upon the action of phenotypic paths, applies only to systems that satisfy the variational fitness lemma (Lemma 1): namely, that the likelihood of an agent's genotype corresponds to the likelihood of its phenotypic trajectory.
While this is a plausible assumption, intuitively consistent with Darwinian evolution, we did not examine the conditions under which it holds. This means there is an opportunity to further the ideas set out in this paper by examining the sorts of stochastic systems in which the variational fitness lemma (Lemma 1) holds. It could be argued that Lemma 1 must hold at least in those systems where the genotype transforms into the phenotype while retaining an equivalence within stochastic limits. For example, gene expression is the most fundamental level at which the genotype gives rise to the phenotype, and this mapping from genotype to phenotype is the subject of the many process theories studied by developmental biology. On a teleological view, one might further argue that active inference is necessary to maintain a high degree of equivalence during the course of this transformation and to preserve a correspondence between genotype and phenotype.
One could suggest that Lemma 1, and the broader scope of the formalisms described here, may be applicable to systems in which a population of entities engages in intergenerational replication (modelled here using the renormalisation operations), and in which those entities engage in rapid adaptation at a faster timescale (e.g., development, learning, behaviour, modelled with active inference) during their lifetimes. These two levels could, for example, model how genome-based intergenerational evolution sets the initial conditions for organismal molecular and behavioural development. At the faster, intragenerational scale, the external states model the material basis of what the phenotype is a generative model of. At the slower, intergenerational scale, the external states are updated through time as a process of renormalisation (reduction and grouping) of the extended genotype-phenotype.

Conclusion
This work attempts to unify the slow, multigenerational phylogenetic process of natural selection with the single-lifetime, phenotypic process of development. In this perspective, a bidirectional flow of information occurs as evolution imposes top-down constraints on phenotypic processes, and action selection provides evidence that is selected for by the environment (i.e., bottom-up causation). In this account, learning and inference occur through updating probabilistic beliefs, via Bayesian model selection in evolutionary time and active inference in developmental time. The fitness of (extended) genotypes and (extended) phenotypes is selected for through the minimisation of the same free energy functional: Bayesian model evidence or marginal likelihood.

Additional Information
Please see Figure 1 for an influence diagram representing the coupling among the external, sensory, active and internal states of a phenotype. Sensory and active states constitute blanket states b = (s, a), while phenotypic (i.e., particular) states comprise internal and blanket states, π = (b, μ). The autonomous states of a phenotype, α = (a, μ), are not influenced by external states. The associated equation of motion expresses the flow as a mixture of a dissipative, gradient flow and a conservative, solenoidal flow [62-64]. The gradient flow Γ depends upon the amplitude of random fluctuations, while the solenoidal flow Q circulates on the isocontours of a potential function called self-information, ℑ(x) = −ln p(x), where p(x) is called the nonequilibrium steady-state density or NESS density [37,65-67].

Figure 1: schematic (i.e., influence diagram) illustrating the sparse coupling among states that constitute a particular partition at two scales

Figure 2: schematic showing the hierarchical relationship between particles at scales i and i+1. For clarity, only sensory and autonomous states are illustrated, in blue and pink, respectively.

On this Bayesian reading, phenotypic dynamics infer their external dynamics, under a probabilistic model of how external dynamics generate phenotypic dynamics. Intergenerational genetic changes can be seen as learning the parameters of a generative model, given the Bayesian model evidence supplied by the scale below (e.g., the extended phenotype). This reading rests upon the action (i.e., negative fitness) scoring the accumulated evidence p(π|x) for a phenotype's generative model, p(η, π|x).

Lemma 2 (variational free energy): if the autonomous dynamics of a particle or phenotype evince classical (Lagrangian) mechanics, then they can be expressed as minimising a variational free energy functional of Bayesian beliefs, about external states, encoded by their internal phenotypic states, under a generative model encoded by their (extended) genotype, such that the Lagrangian and variational free energy share the same minima, where their gradients vanish.

In an evolutionary setting, one can imagine this leading to convergent evolution or speciation (Luc Ciompi, personal communication; Ciompi, L. Synergie und Schizophrenie, und noch viel mehr. In Kriz, J., Tschacher, W. (Eds.), Synergetik als Ordner: Die strukturelle Wirkung der interdisziplinären Ideen Hermann Hakens. Pabst Science Publishers, Lengerich, Germany, 2017, pp. 15-20).

Figure 3: Phylogeny and Ontogeny as bottom-up and top-down causation.
Funding statement: KF is supported by funding for the Wellcome Centre for Human Neuroimaging (Ref: 205103/Z/16/Z) and a Canada-UK Artificial Intelligence Initiative (Ref: ES/T01279X/1). DAF is supported by a National Science Foundation postdoctoral fellowship (ID 2010290).