The Price Equation Program: Simple Invariances Unify Population Dynamics, Thermodynamics, Probability, Information and Inference

The fundamental equations of various disciplines often seem to share the same basic structure. Natural selection increases information in the same way that Bayesian updating increases information. Thermodynamics and the forms of common probability distributions express maximum increase in entropy, which appears mathematically as loss of information. Physical mechanics follows paths of change that maximize Fisher information. The information expressions typically have analogous interpretations as the Newtonian balance between force and acceleration, representing a partition between the direct causes of change and the opposing changes in the frame of reference. This web of vague analogies hints at a deeper common mathematical structure. I suggest that the Price equation expresses that underlying universal structure. The abstract Price equation describes dynamics as the change between two sets. One component of dynamics expresses the change in the frequency of things, holding constant the values associated with things. The other component of dynamics expresses the change in the values of things, holding constant the frequency of things. The separation of frequency from value generalizes Shannon’s separation of the frequency of symbols from the meaning of symbols in information theory. The Price equation’s generalized separation of frequency and value reveals a few simple invariances that define universal geometric aspects of change. For example, the conservation of total frequency, although a trivial invariance by itself, creates a powerful constraint on the geometry of change. That constraint plus a few others seem to explain the common structural forms of the equations in different disciplines. From that abstract perspective, interpretations such as selection, information, entropy, force, acceleration, and physical work arise from the same underlying geometry expressed by the Price equation.


Contents
Introduction
The abstract Price equation
Key results
History of earlier forms
Mathematical properties
D'Alembert's principle

Introduction
The Price equation is an abstract mathematical description for the change in populations. The most general form describes a way to map entities between two sets. That abstract set mapping partitions the forces that cause change between populations into two components, the direct and inertial forces.
The direct forces change frequencies. The inertial forces change the values associated with population members. Changed values can be thought of as an altered frame of reference driven by the inertial forces.
From the abstract perspective of the Price equation, one can see the same partition of direct and inertial forces in the fundamental equations of many different subjects. That abstract unity clarifies understanding of natural selection and its relations to such disparate topics as thermodynamics, information, the common forms of probability distributions, Bayesian inference, and physical mechanics.
In a special form of the Price equation, the changes caused by the direct and inertial forces cancel so that the total remains conserved. That conservation law defines a universal invariance and canonical separation of the direct and inertial forces. The canonical separation of forces clarifies the common mathematical structure of seemingly different topics.
This article sketches the overall argument for the common mathematical structure of different subjects. The argument is, at present, a broad framing of conjectures. The conjectures raise many interesting problems that require further work. Consult Frank (2012a, 2017) for mathematical details, open problems, and citations to additional literature.

The abstract Price equation
The Price equation describes the change in the average value of some property between two populations (Price, 1972a; Frank, 2012a). Consider a population as a set of things. Each thing has a property indexed by $i$. Those things with a common property index comprise a fraction, $q_i$, of the population and have average value, $z_i$, for whatever we choose to measure by $z$. Write $\mathbf{q}$ and $\mathbf{z}$ as the vectors over all $i$. The population average value is $\bar{z} = \mathbf{q}\cdot\mathbf{z} = \sum_i q_i z_i$, summed over $i$.
A second population has matching vectors $\mathbf{q}'$ and $\mathbf{z}'$. Those vectors for the second population are defined by the special set mapping of the abstract Price equation. In particular, $q'_i$ is the fraction of the second population derived from entities with index $i$ in the first population. The second population does not have its own indexing by $i$. Instead, the second population's indices derive from the mapping of the second population's members to the members of the first population.
Similarly, $z'_i$ is the average value in the second population of members derived from entities with index $i$ in the first population. Let $\Delta$ be the difference between the derived population and the original population, $\Delta\mathbf{q} = \mathbf{q}' - \mathbf{q}$ and $\Delta\mathbf{z} = \mathbf{z}' - \mathbf{z}$.
To calculate the change in average value, it is useful to begin by considering $q$ and $z$ as abstract variables associated with the first set, and $q'$ and $z'$ as corresponding variables from the second set.
The change in the product of $q$ and $z$ is $\Delta(qz) = q'z' - qz$. Note that $q' = q + \Delta q$ and $z' = z + \Delta z$. We can write the total change in the product as a discrete analog of the chain rule for differentiation of a product, yielding two partial change terms
$$\Delta(qz) = (q + \Delta q)(z + \Delta z) - qz = (\Delta q)z + (q + \Delta q)\Delta z = (\Delta q)z + q'\Delta z.$$
The first term, $(\Delta q)z$, is the partial difference of $q$ holding $z$ constant. The second term, $q'\Delta z$, is the partial difference of $z$ holding $q$ constant. In the second term, we use $q'$ as the constant value because, with discrete differences, one of the partial change terms must be evaluated in the context of the second set.
The same product rule can be applied to vectors, yielding the abstract form of the Price equation
$$\Delta\bar{z} = \Delta(\mathbf{q}\cdot\mathbf{z}) = \Delta\mathbf{q}\cdot\mathbf{z} + \mathbf{q}'\cdot\Delta\mathbf{z}. \tag{1}$$
The abstract Price equation simply partitions the total change in the average value into two partial change terms.
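The partition can be checked numerically. The following is a minimal sketch in plain Python, assuming hypothetical frequencies and values for two small sets; the names `q_new`, `z_new`, `frequency_term`, and `value_term` are illustrative and do not come from the article.

```python
# Frequencies and values for the first set (q, z) and the derived set (q_new, z_new).
# All numbers are hypothetical and chosen only for illustration.
q = [0.5, 0.3, 0.2]
q_new = [0.4, 0.35, 0.25]
z = [1.0, 2.0, 4.0]
z_new = [1.5, 2.0, 3.0]


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


dq = [new - old for old, new in zip(q, q_new)]
dz = [new - old for old, new in zip(z, z_new)]

total_change = dot(q_new, z_new) - dot(q, z)  # change in the average value
frequency_term = dot(dq, z)                   # partial change holding z constant
value_term = dot(q_new, dz)                   # partial change holding q' constant

print(total_change, frequency_term + value_term)
```

The two printed numbers should agree up to floating-point rounding, because the partition is an algebraic identity rather than an approximation.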
Note that $\mathbf{q}$ has a clearly defined meaning as frequency, whereas $\mathbf{z}$ may be chosen arbitrarily as any values assigned to members. The values, $\mathbf{z}$, define the frame of reference. Because frequency is clearly defined, whereas values are arbitrary, the frequency changes, $\Delta\mathbf{q}$, take on the primary role in analyzing the structural aspects of change that unify different subjects.
The primacy of frequency change naturally labels the first term, with $\Delta\mathbf{q}$, as the changes caused by the direct forces acting on populations. Because $\mathbf{q}$ and $\mathbf{q}'$ define a sequence of probability distributions, the primary aspect of change concerns the dynamics of probability distributions.
The arbitrary aspect of the values, $\mathbf{z}$, naturally labels the second term, with $\Delta\mathbf{z}$, as the changes caused by the forces that alter the frame of reference, the inertial forces.
Table 1 defines commonly used symbols. Tables 2 and 3 in Appendix B summarize mathematical forms and relations between disciplines.

Canonical form
The prior section emphasized the primary role for the dynamics of probability distributions, ∆q, which follows as a consequence of the forces acting on populations.
The canonical form of the Price equation focuses on the dynamics of probability distributions and the associated forces that cause change. To obtain the canonical form, define
$$a_i = \frac{\Delta q_i}{q_i} \tag{2}$$
as the relative change in the frequency of the $i$th type.
We can use any value for $\mathbf{z}$ in the Price equation. Choose $\mathbf{z} \equiv \mathbf{a}$. Then
$$\Delta\bar{a} = \Delta\mathbf{q}\cdot\mathbf{a} + \mathbf{q}'\cdot\Delta\mathbf{a} = 0, \tag{3}$$
in which the equality to zero expresses the conservation of total probability, because the total changes in probability must cancel to keep the sum of the probabilities constant at one. Thus, eqn 3 appears as a seemingly trivial result, a notational spin on $\sum_i \Delta q_i = 0$. However, many generalities and connections between seemingly different disciplines follow from the partition of conserved probability into the two terms of eqn 3.
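A small numerical sketch may help to show how the two terms cancel. It assumes a hypothetical third set, `q2`, which is used only to define the relative changes in the frame of the second set, $a'_i = \Delta q'_i / q'_i$; that construction and all names in the code are assumptions made for illustration.

```python
# Three hypothetical frequency vectors: the initial set, the derived set,
# and a further set used only to define a' in the frame of the derived set.
q0 = [0.5, 0.3, 0.2]
q1 = [0.4, 0.35, 0.25]
q2 = [0.3, 0.4, 0.3]


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def relative_change(old, new):
    """Elementwise relative change, a_i = (new_i - old_i) / old_i."""
    return [(n - o) / o for o, n in zip(old, new)]


a = relative_change(q0, q1)        # a  = dq / q
a_next = relative_change(q1, q2)   # a' = dq' / q' (assumed definition)

dq = [n - o for o, n in zip(q0, q1)]
da = [n - o for o, n in zip(a, a_next)]

direct_term = dot(dq, a)      # first term of the canonical form
inertial_term = dot(q1, da)   # second term of the canonical form

print(direct_term, inertial_term, direct_term + inertial_term)
```

The direct term is positive whenever frequencies change, and the inertial term offsets it exactly, because the average relative change is zero in each set's own frame of reference.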

Preliminary interpretation
The Price equation by itself does not calculate the particular $\Delta\mathbf{q}$ values of dynamics. Instead, the equation emphasizes the fundamental constraint on dynamics that arises from invariant total probability. The changes, $\Delta\mathbf{q}$, must satisfy the constraint in eqn 3, specifying certain properties that any possible dynamical path must have.
Put another way, all possible dynamical paths will share certain invariant properties. It is those invariant properties that reveal the ultimate unity between different applications and disciplines.
Note that $\mathbf{q}$ is fundamental, whereas $\mathbf{z}$ is an arbitrary assignment of value or meaning. The focus on $\mathbf{q}$ corresponds to the reason why information theory considers only probabilities, without consideration of meaning or values. In general, the unifying fundamental aspect among disciplines concerns the dynamics of probability distributions. We can then add values or meaning to that underlying fundamental basis.
In particular, we can first study universal aspects of the canonical invariant form based on $\mathbf{a}$. We can then derive broader results by simply making the coordinate transformation $\mathbf{a} \to \mathbf{z}$, yielding the most general expression of the abstract Price equation in eqn 1.
Constraints on $\bar{z}$ or $\Delta\bar{z}$ specify additional invariances, which determine further structure of the possible dynamical paths and equilibria. Each $z_i$ may be a vector of values, allowing multiple constraints associated with the $\mathbf{z}$ values.
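Alternatively, one can study the conditions required for $\Delta\bar{z}$ to change in particular ways. For example, what are the necessary and sufficient patterns of association between initial frequency, $\mathbf{q}$, relative frequency change, $\mathbf{a}$, and value, $\mathbf{z}$, to drive the change, $\Delta\bar{z}$, in a particular direction?

git • arxiv @ arXiv-3.0-0::60e8239-2018-12-14 (2018-12-17 02:27Z)

Table 1: Commonly used symbols.
Symbol | Definition | Equation
$\mathbf{q}$ | Vector of frequencies with $\sum_i q_i = 1$ |
$\mathbf{z}$ | Values with average $\bar{z} = \mathbf{q}\cdot\mathbf{z}$; use $\mathbf{z} \equiv \mathbf{a}, \mathbf{F}$, etc. for specific interpretations | 1
$\Delta\mathbf{q}$ | Discrete changes, $\Delta q_i = q'_i - q_i$, may be large | 1
$\dot{\mathbf{q}}$ | Small, differential changes, $\Delta\mathbf{q} \to \dot{\mathbf{q}} \equiv d\mathbf{q}$ | 5
$\mathbf{a}$ | Relative change of the $i$th type, $a_i = \Delta q_i/q_i$ | 2
$\mathbf{m}$ | Malthusian parameter, $\mathbf{m} = \log(\mathbf{q}'/\mathbf{q})$, log of relative fitness, $\mathbf{w}$ | 26
$\mathbf{w}$ | Relative fitness, $w_i = q'_i/q_i$ |
$\mathbf{r}$ | Unitary coordinates, $\mathbf{r} = \sqrt{\mathbf{q}}$, with $\|\mathbf{r}\| = 1$ as invariant total probability | 22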

Temporal dynamics
The frequency change terms, $\Delta q_i$, arise from the abstract set mapping assignment of members in the second set to members in the first set. In some cases, the abstract set mapping may differ from the traditional notion of dynamics as a temporal sequence, in which $q'_i$ is the frequency of type $i$ in the second set.
We may add various assumptions to achieve a temporal interpretation in which $i$ retains its meaning as a type through time. For example, following Price (1995), we may partition $\mathbf{q} \to \mathbf{q}'$ into two steps. In the initial step, $\mathbf{q} \to \mathbf{q}^*$, the mapping preserves type, such that $q^*_i$ describes the frequency of type $i$ in the second set.
In the subsequent step, $\mathbf{q}^* \to \mathbf{q}'$, the mapping accounts for the forces that change type. For a force that makes the change $i \to j$, we map type $j$ members in the second set to type $j$ members in the first set. Thus, $\Delta q_j = q'_j - q^*_j$ describes the net frequency change from the gains and losses caused by the forces of type reassignment.
For this two-step process that preserves type, the net change $\mathbf{q} \to \mathbf{q}'$ combines the type-changing forces with other forces that alter frequency. Thus, we may consider type-preserving maps as a special case of the general abstract set mapping. In this article, I focus on the properties of the general abstract set mapping.

Key results
Later sections use the abstract Price equation to show formal relations between natural selection and information theory, the dynamics of entropy and probability, basic aspects of physical dynamics, and other fundamental principles (Frank, 2017). Here, I list some key results without derivation or discussion. This listing gives a sense of where the argument will go, providing a target for further development in later sections.
Throughout this article, I use ratios of vectors to denote elementwise division, for example $\mathbf{q}'/\mathbf{q} = q'_1/q_1, q'_2/q_2, \ldots$. A constant added to or multiplied by a vector applies the operation to each element of the vector, for example, $a + b\mathbf{z}$, for constants $a$ and $b$, yields $a + bz_i$ for each $i$.
D'Alembert's principle of physical mechanics. We can write the canonical Price equation of eqn 3 as d'Alembert's partition (Frank, 2015, 2017) between the direct forces, $\mathbf{F} = \mathbf{a}$, and the inertial forces of acceleration, $\mathbf{I}$, as
$$\Delta\bar{a} = (\mathbf{F} + \mathbf{I})\cdot\Delta\mathbf{q} = 0. \tag{4}$$
This equation generalizes Newton's second law that force equals mass times acceleration, describing the balance between force and acceleration. Here, the direct forces, $\mathbf{F}$, balance the inertial forces of acceleration, $\mathbf{I}$, along the path of change, $\Delta\mathbf{q}$. The condition $\Delta\bar{a} = 0$ describes conservative systems.
For nonconservative systems, we can use $\mathbf{a} \to \mathbf{z}$, with $\Delta\bar{z}$ not necessarily conserved.
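Information theory. For small changes, $\Delta\mathbf{q} \to \dot{\mathbf{q}}$ and $\mathbf{F} = \mathbf{a} \to \log(\mathbf{q}'/\mathbf{q})$, the direct force term is
$$\dot{\mathbf{q}}\cdot\mathbf{F} = D(\mathbf{q}'\,\|\,\mathbf{q}) + D(\mathbf{q}\,\|\,\mathbf{q}') = \mathcal{F}, \tag{5}$$
in which $D$ is the Kullback-Leibler divergence, a fundamental measure of information, and $\mathcal{F}$ is a nondimensional expression of Fisher information (Cover & Thomas, 1991).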
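Extreme action. The term for direct force, or action, $\dot{\mathbf{q}}\cdot\mathbf{F}$, yields frequency change dynamics, $\dot{\mathbf{q}}$, determined by the extremum of the action, subject to constraint
$$\mathcal{L} = \dot{\mathbf{q}}\cdot\boldsymbol{\phi} - \frac{1}{2\kappa}\left(\mathcal{F} - C^2\right) - \xi\left(\dot{\mathbf{q}}\cdot\mathbf{1}\right), \tag{6}$$
in which $\boldsymbol{\phi} = \mathbf{F}$ is a given force vector. The first parenthetical term constrains the incremental distance between probability distributions to be $\mathcal{F} = \sum_i \dot{q}_i^2/q_i = C^2$, for a given constant, $C$. The second parenthetical term constrains the total probability to remain invariant.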
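Entropy and thermodynamics. The force vector, $\boldsymbol{\phi}$, can be described as a growth process, $q'_i = q_i e^{\phi_i}$, with $\phi_i = \log(q'_i/q_i)$. A constraint on the system's partial change in some quantity, $\dot{\mathbf{q}}\cdot\mathbf{z} = B$, constrains the new frequency vector, $\mathbf{q}'$. We may write the constraint as $\dot{\mathbf{q}}\cdot\log\mathbf{q}' = -\lambda(\dot{\mathbf{q}}\cdot\mathbf{z}) = -\lambda B$, thus
$$\dot{\mathbf{q}}\cdot\boldsymbol{\phi} = \dot{\mathbf{q}}\cdot\log\mathbf{q}' - \dot{\mathbf{q}}\cdot\log\mathbf{q} = -\dot{\mathbf{q}}\cdot\log\mathbf{q} - \lambda B.$$
The action term, $-\dot{\mathbf{q}}\cdot\log\mathbf{q}$, is the increase in entropy, $-\mathbf{q}\cdot\log\mathbf{q}$. Maximizing the action maximizes the production of entropy.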
Maximum entropy and statistical mechanics. In the prior example, the work done by the force of constraint is $\dot{\mathbf{q}}\cdot\mathbf{F}_c = -\lambda B$, with $\mathbf{F}_c = \log\mathbf{q}' = \log k - \lambda\mathbf{z}$. At maximum entropy, we obtain an equilibrium, $\log\mathbf{q}' = \log\mathbf{q}$. Thus, the maximum entropy equilibrium probability distribution is
$$\mathbf{q} = k e^{-\lambda\mathbf{z}}. \tag{7}$$
This Gibbs-Boltzmann-exponential distribution is the principal result of statistical mechanics. Here, we obtained that result through a Price equation abstraction that led to maximum entropy production, subject to a constraining invariance on a component of change in $\mathbf{z}$.
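The exponential form can be illustrated with a short numerical sketch. The code below uses the standard static formulation of maximum entropy, in which the mean of $\mathbf{z}$ is held fixed, which is an assumption made for illustration rather than the dynamic constraint on the partial change described above. The values of `z`, `lam`, and the perturbation `d` are hypothetical.

```python
import math

z = [0.0, 1.0, 2.0]   # hypothetical values
lam = 0.7             # hypothetical multiplier, lambda

# Exponential form q = k * exp(-lam * z), with k fixed by total probability.
weights = [math.exp(-lam * zi) for zi in z]
k = 1.0 / sum(weights)
q = [k * w for w in weights]


def entropy(p):
    """Shannon entropy, -sum p log p."""
    return -sum(pi * math.log(pi) for pi in p)


# A perturbation that leaves both total probability and the mean of z unchanged:
# the elements of d sum to zero, and the inner product of d with z is zero.
d = [1.0, -2.0, 1.0]
eps = 0.01
q_plus = [qi + eps * di for qi, di in zip(q, d)]
q_minus = [qi - eps * di for qi, di in zip(q, d)]

print(entropy(q), entropy(q_plus), entropy(q_minus))
```

Both perturbed distributions satisfy the same two constraints as `q` but have lower entropy, which is consistent with the exponential form being the constrained maximum.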
Constraint, invariance and sufficiency. The maximum entropy probability distribution expresses the forces of constraint, $\mathbf{F}_c$, acting on $\mathbf{z}$. Different constraints yield different distributions. For example, the constraint $\mathbf{q}\cdot(\mathbf{z} - \mu)^2 = \sigma^2$ yields a Gaussian distribution for given mean, $\mu$, and variance, $\sigma^2$. This constraint is sufficient to determine the form of the distribution. Similarly, for small changes, the total change of the direct forces does not require the exact form of the frequency changes, $\dot{\mathbf{q}}$. It is sufficient to know the Fisher information distance, $\sum_i \dot{q}_i^2/q_i = \mathcal{F}$, which determines the subsets of the possible change vectors, $\dot{\mathbf{q}}$, with the same invariant Fisher distance, $\mathcal{F}$. Many results from the abstract Price equation express invariance and sufficiency.
Inference: data as a force. Use $\theta \equiv i$ as an index for different parameter values. Then $q_\theta$ matches the Bayesian notion of a prior probability distribution for the values of $\theta$. The posterior distribution is
$$q'_\theta = q_\theta L_\theta,$$
in which the normalized likelihood, $L_\theta$, describes the force of the data that drives the change in probability. In Price notation, the normalized likelihood is equivalent to the force vector, $\mathbf{L} \equiv \mathbf{F}$, and also $\mathbf{L} - 1 \equiv \mathbf{a}$. With that definition for $\mathbf{a}$ in terms of the force of the data, the structure and general properties of Bayesian inference follow as a special case of the abstract Price equation.
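As a minimal sketch of this correspondence, the following code applies Bayes' rule to a hypothetical three-valued parameter and then reads off the relative change $\mathbf{a} = \mathbf{L} - 1$. The prior, the raw likelihood values, and all variable names are assumptions chosen for illustration.

```python
prior = [0.5, 0.3, 0.2]            # q_theta, hypothetical prior
raw_likelihood = [0.1, 0.4, 0.7]   # hypothetical likelihood of the data for each theta

# Normalize the likelihood by its prior average, so that sum(q * L) = 1.
evidence = sum(q * lik for q, lik in zip(prior, raw_likelihood))
L = [lik / evidence for lik in raw_likelihood]

posterior = [q * l for q, l in zip(prior, L)]   # q'_theta = q_theta * L_theta
a = [l - 1.0 for l in L]                        # relative change, a = L - 1
dq = [new - old for old, new in zip(prior, posterior)]

print(posterior)
print([q * ai for q, ai in zip(prior, a)], dq)
```

The last line prints the same frequency changes twice, once as $q_\theta a_\theta$ and once as the difference between posterior and prior.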
Invariance, scale and probability distributions. The maximum entropy probability distribution in eqn 7 is invariant to affine transformation, $\mathbf{z} \to a + b\mathbf{z}$, because $k$ and $\lambda$ adjust to $a$ and $b$. That affine invariance with respect to $\mathbf{z}$, which arises directly from the abstract Price equation, is sufficient by itself to determine the structure of commonly observed probability distributions, without need of invoking entropy maximization. The structure of common probability distributions is
$$\mathbf{q} = k e^{-\lambda e^{\beta\mathbf{w}}}.$$
The function $w(z)$ is a scale for $z$, such that a shift in that scale, $w \to \alpha + w$, only changes $z$ by a constant multiple, and therefore does not change the probability pattern. Simple forms of $w$ lead to the various commonly observed continuous probability distributions. For example, $w(z) = \log z$ yields the stretched exponential distribution.

History of earlier forms
Before analyzing the abstract Price equation and the unification of disciplines, it is useful to write down some of the earlier expressions and applications of the Price equation from biology (Frank, 1995, 1997, 2012a; Walsh & Lynch, 2018).

Fitness and average excess
This section extends the definition of relative changes in eqn 2. Let $w_i = q'_i/q_i$ be the relative growth, or relative fitness, of the $i$th type. Then we may define
$$a_i = w_i - 1 = \frac{\Delta q_i}{q_i},$$
which, in biology, is Fisher's average excess in fitness (Fisher, 1941). Note that $\Delta q_i = q_i a_i$ and that the average value of $w$ is $\bar{w} = 1$, thus $a_i = w_i - \bar{w}$.

Variance in fitness
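Considering $\mathbf{a}$ as a measure of fitness, the first term of eqn 3 becomes the partial change in average fitness caused by the direct forces, $\mathbf{F}$. In symbols,
$$\Delta_F\bar{a} = \Delta\mathbf{q}\cdot\mathbf{a} = \sum_i q_i a_i^2 = \sum_i q_i\left(w_i - \bar{w}\right)^2 = V_w,$$
in which $\Delta_F$ is the partial change caused by the direct forces, and $V_w$ is the variance in fitness.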
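The equality between the direct-force term and the variance in fitness can be confirmed with a few lines of arithmetic. This is a sketch with hypothetical frequencies; the variable names are illustrative only.

```python
q = [0.5, 0.3, 0.2]        # hypothetical initial frequencies
q_new = [0.4, 0.35, 0.25]  # hypothetical derived frequencies

w = [n / o for o, n in zip(q, q_new)]    # relative fitness, w_i = q'_i / q_i
a = [wi - 1.0 for wi in w]               # average excess, a_i = w_i - 1
dq = [n - o for o, n in zip(q, q_new)]

w_bar = sum(qi * wi for qi, wi in zip(q, w))                    # average fitness
direct_term = sum(d * ai for d, ai in zip(dq, a))               # dq . a
variance_w = sum(qi * (wi - w_bar) ** 2 for qi, wi in zip(q, w))

print(w_bar, direct_term, variance_w)
```

The average relative fitness is one by construction, so the average excess has mean zero and its mean square is the variance.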

Fundamental theorem
If we let $a_i = \alpha x_i + \epsilon_i$ be the regression of fitness, $a_i$, on some predictor, $x_i$, and define $g_i = \alpha x_i$, then
$$\Delta\mathbf{q}\cdot\mathbf{g} = \mathrm{Cov}(a, g) = V_g.$$
If one interprets $x_i$ as an inherited gene, and $\epsilon_i$ as an environmental effect that is not transmitted to the next generation, then the partial change in fitness by natural selection that is transmitted to the next generation is $\Delta_{NS}\bar{a} = V_g$. This result is analogous to Fisher's fundamental theorem of natural selection (Fisher, 1958; Price, 1972b; Ewens, 1989; Frank, 1997).
The analysis tracks three sets: the initial set before selection with $\bar{a}$, the second set after selection with $\bar{a}^\dagger$, and the third set after transmission with $\bar{a}'$. The set after transmission retains only those changes associated with $x_i$, interpreted as an inherited gene, such that $\Delta\bar{a} = \bar{a}' - \bar{a}$.

Covariance form and replicators
Using the definitions of relative fitness and average excess, the first term of the Price equation is
$$\Delta\mathbf{q}\cdot\mathbf{z} = \sum_i q_i a_i z_i = \sum_i q_i\left(w_i - \bar{w}\right)z_i = \mathrm{Cov}(w, z), \tag{13}$$
in which $\mathrm{Cov}(w, z)$ is the covariance between fitness and value. This covariance implies that natural selection tends to increase the average value of $z$ in proportion to the association between fitness and value. If the values do not change, $\Delta z_i = 0$, then the total change is $\Delta\bar{z} = \mathrm{Cov}(w, z)$.
In one common application, sometimes referred to as the replicator problem, we label each individual in a population by its own unique index, $i$, and let $z_i = p_i$ be 0 or 1 to specify if each individual is a type 0 or type 1 individual (Taylor & Jonker, 1978; Schuster & Sigmund, 1983). We can think of $p_i$ as the frequency of type 1 in individual $i$. Then $\bar{p}$ is the frequency of type 1 individuals in the population, and $\Delta\bar{p} = \mathrm{Cov}(w, p)$ is the frequency change of types in the population (Price, 1970). Here, we assume that individuals do not change their type during transmission, $\Delta p_i = 0$, so that the second Price equation term is zero. This assumption is usually interpreted in biology as the absence of mutation.

Levels of selection
We can write the second Price equation term as
$$\mathbf{q}'\cdot\Delta\mathbf{z} = \sum_i q_i w_i \Delta z_i = \mathrm{E}(w\Delta z), \tag{15}$$
in which $\mathrm{E}$ denotes the expectation operator for the average value. Combining this expression with eqn 13, we obtain an alternative form of the Price equation
$$\Delta\bar{z} = \mathrm{Cov}(w, z) + \mathrm{E}(w\Delta z). \tag{16}$$
This form is often used to analyze how selection acts at different levels, such as individual versus group selection (Price, 1972a; Hamilton, 1975). As an example, consider a variant of the replicator problem, which uses $\mathbf{z} \equiv \mathbf{p}$, yielding
$$\Delta\bar{p} = \mathrm{Cov}(w, p) + \mathrm{E}(w\Delta p), \tag{17}$$
in which $p_i$ now denotes the frequency of type 1 individuals within the $i$th group of individuals, $w_i$ is the fitness of the $i$th group relative to all other groups, and $\Delta p_i$ is the change in the frequency of type 1 individuals within the $i$th group. Thus, the two terms can be interpreted as the change caused by selection between groups and the change caused by selection between individuals within groups.
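The group-structured form in eqn 17 can be illustrated with a small worked example. The sketch below assumes three hypothetical groups with made-up fitnesses and within-group frequencies; none of the numbers or variable names come from the article.

```python
q = [0.5, 0.3, 0.2]             # hypothetical group frequencies
raw_fitness = [1.0, 1.5, 0.5]   # hypothetical group fitnesses, before normalization
p = [0.2, 0.5, 0.8]             # frequency of type 1 within each group, before
p_new = [0.25, 0.45, 0.8]       # frequency of type 1 within each group, after


def mean(values):
    """Average over groups, weighted by the initial group frequencies q."""
    return sum(qi * v for qi, v in zip(q, values))


w = [f / mean(raw_fitness) for f in raw_fitness]   # relative fitness, mean(w) = 1
q_new = [qi * wi for qi, wi in zip(q, w)]          # q'_i = q_i * w_i
dp = [new - old for old, new in zip(p, p_new)]

between_groups = mean([wi * pi for wi, pi in zip(w, p)]) - mean(w) * mean(p)  # Cov(w, p)
within_groups = mean([wi * di for wi, di in zip(w, dp)])                      # E(w * dp)
total_change = sum(qn * pn for qn, pn in zip(q_new, p_new)) - mean(p)

print(total_change, between_groups, within_groups)
```

The total change in the population frequency of type 1 equals the sum of the between-group covariance term and the within-group expectation term.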

Mathematical properties
This section illustrates mathematical properties of the Price equation. These mathematical properties set the foundation for unifying apparently different kinds of problems from different disciplines.

Geometry and work
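Write the standard Euclidean geometry vector length as the square root of the sum of squares
$$\|\mathbf{z}\| = \sqrt{\sum_i z_i^2}.$$
For any vector $\mathbf{z}$,
$$\Delta\mathbf{q}\cdot\mathbf{z} = \|\Delta\mathbf{q}\|\,\|\mathbf{z}\|\cos\omega,$$
in which $\omega$ is the angle between the vectors $\Delta\mathbf{q}$ and $\mathbf{z}$. If we interpret $\mathbf{z} \equiv \mathbf{F}$ as an abstract, nondimensional force, then
$$\Delta\mathbf{q}\cdot\mathbf{F} = \|\Delta\mathbf{q}\|\,\|\mathbf{F}\|\cos\omega$$
expresses an abstract notion of work as the distance moved, $\|\Delta\mathbf{q}\|$, multiplied by the component of force acting along the path, $\|\mathbf{F}\|\cos\omega$.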

Divergence between sets
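If we let $\mathbf{z} \equiv \mathbf{a}$ describe the relative growth of the various frequencies, $a_i = \Delta q_i/q_i$, then the divergence between sets can be expressed as
$$\Delta\mathbf{q}\cdot\mathbf{a} = \sum_i \frac{\Delta q_i^2}{q_i} = \left\|\frac{\Delta\mathbf{q}}{\sqrt{\mathbf{q}}}\right\|^2 = R^2,$$
in which $R$ is the radius of a sphere on which must lie all possible $\Delta\mathbf{q}/\sqrt{\mathbf{q}}$ changes with the same divergence between sets. If we choose to interpret $\mathbf{a}$ as an abstract notion of force, or fitness, acting on frequency changes, then $\Delta\mathbf{q}\cdot\mathbf{a}$ is the work, with magnitude $\|\Delta\mathbf{q}/\sqrt{\mathbf{q}}\|^2$, that separates the probability distribution $\mathbf{q}'$ from $\mathbf{q}$.

Small changes, paths and logarithms
If we think of the separation between sets as a sequence of small changes along a path, with each small change as $\Delta\mathbf{q} \to \dot{\mathbf{q}}$, then $\mathbf{a} \to \dot{\mathbf{q}}/\mathbf{q} = d\log\mathbf{q}$, in which the overdot and the symbol "d" equivalently describe the differential. Then the partial change by direct forces separates the probability distributions of the two sets by the path length
$$\dot{\mathbf{q}}\cdot d\log\mathbf{q} = \sum_i \frac{\dot{q}_i^2}{q_i} = \mathcal{F},$$
in which $\mathcal{F}$ is an abstract, nondimensional expression of the Fisher information distance metric.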

Unitary and canonical coordinates
Let $\mathbf{r} = \sqrt{\mathbf{q}}$. Then $\|\mathbf{r}\| = 1$, expressing the conservation of total probability as a vector of unit length, in which all possible probability combinations of $\mathbf{r}$ define the surface of a unit sphere. In Hamiltonian analyses of d'Alembert's principle for the canonical Price equation, $\mathbf{r}$ is a canonical coordinate system (Frank, 2015). The unitary coordinates, $\mathbf{r}$, also provide a direct description of Fisher information path length as a distance between two probability distributions
$$\mathcal{F} = 4\|\dot{\mathbf{r}}\|^2 = 4\sum_i \dot{r}_i^2. \tag{22}$$
The constraint on total probability makes square root coordinates the natural system in which to analyze Euclidean distances, which are the sums of squares. See Figure 1.
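The relation between square root coordinates and the Fisher distance can be checked for a small change. The sketch below assumes a hypothetical frequency vector and a small hypothetical displacement, and compares four times the squared Euclidean step in $\mathbf{r}$ with the sum of squared changes divided by frequency. The agreement is only approximate, because the relation holds in the limit of small changes.

```python
import math

q = [0.5, 0.3, 0.2]              # hypothetical frequencies
dq = [0.001, -0.0004, -0.0006]   # small hypothetical changes that sum to zero
q_new = [qi + di for qi, di in zip(q, dq)]

r = [math.sqrt(qi) for qi in q]          # unitary coordinates
r_new = [math.sqrt(qi) for qi in q_new]
dr = [n - o for o, n in zip(r, r_new)]

length_squared = sum(ri * ri for ri in r)                  # equals total probability
fisher_unitary = 4.0 * sum(d * d for d in dr)              # 4 * ||dr||^2
fisher_direct = sum(d * d / qi for d, qi in zip(dq, q))    # sum dq_i^2 / q_i

print(length_squared, fisher_unitary, fisher_direct)
```

The squared length of `r` is one, and the two expressions for the Fisher distance differ only by terms of higher order in the size of the change.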

Affine invariance
Affine transformation shifts and stretches (multiplies) values, $\mathbf{z} \to a + b\mathbf{z}$, for shift by $a$ and stretch by $b$. Here, addition or multiplication of a vector by a constant applies to each element of the vector.
In the abstract Price equation $\Delta\bar{z} = \Delta\mathbf{q}\cdot\mathbf{z} + \mathbf{q}'\cdot\Delta\mathbf{z}$, affine transformation, $\mathbf{z} \to a + b\mathbf{z}$, alters the terms as: $\Delta\bar{z} \to b\Delta\bar{z}$, because the shift constant cancels in the differences; $\Delta\mathbf{q}\cdot\mathbf{z} \to b\Delta\mathbf{q}\cdot\mathbf{z}$, because in $\sum_i(\Delta q_i)(a + bz_i)$, we have $\sum_i a\Delta q_i = 0$; and $\mathbf{q}'\cdot\Delta\mathbf{z} \to b\mathbf{q}'\cdot\Delta\mathbf{z}$, because the shift constant cancels in the differences. The stretch factor $b$ multiplies each term and therefore cancels, leaving the Price equation invariant to affine transformation of the $\mathbf{z}$ values. Much of the universal structure expressed by the Price equation follows from this affine invariance.
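A brief numerical sketch can confirm that the shift drops out and the stretch rescales every term by the same factor. The frequencies, values, and the constants `shift` and `stretch` below are hypothetical, and the helper `price_terms` is an illustrative name rather than anything defined in the article.

```python
q = [0.5, 0.3, 0.2]
q_new = [0.4, 0.35, 0.25]
z = [1.0, 2.0, 4.0]
z_new = [1.5, 2.0, 3.0]
shift, stretch = 3.0, 2.5   # hypothetical affine constants


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def price_terms(values, values_new):
    """Return (total change, frequency term, value term) of the Price equation."""
    dq = [n - o for o, n in zip(q, q_new)]
    dz = [n - o for o, n in zip(values, values_new)]
    total = dot(q_new, values_new) - dot(q, values)
    return total, dot(dq, values), dot(q_new, dz)


original = price_terms(z, z_new)
transformed = price_terms(
    [shift + stretch * zi for zi in z],
    [shift + stretch * zi for zi in z_new],
)

print(original)
print(transformed)
```

Each transformed term equals the stretch factor times the corresponding original term, so the partition keeps the same form after the transformation.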

Probability vs frequency
In this article, I use probability and frequency interchangeably. Many subtle issues distinguish the concepts and applications associated with those alternative words. However, in this attempt to identify common mathematical structure between various subjects, those distinctions are not essential.

D'Alembert's principle
The remaining sections repeat the list of topics in the Key results section. Prior publications discussed these topics (Frank, 2012a, 2017). Here, I present additional details, roughly sketching how the structure provided by the abstract Price equation unifies various subjects.
We can rewrite the canonical Price equation for the conservation of total probability in eqn 3 as
$$\Delta\bar{a} = \Delta\mathbf{q}\cdot\mathbf{a} + \mathbf{q}'\cdot\Delta\mathbf{a} = (\mathbf{F} + \mathbf{I})\cdot\Delta\mathbf{q} = 0.$$
Here, $\Delta\mathbf{q}$ satisfies the constraint on total probability and any other specified constraints. The direct forces are $\mathbf{F} = \mathbf{a} = \Delta\mathbf{q}/\mathbf{q}$. The inertial forces are
$$\mathbf{I} = \frac{\mathbf{q}'}{\Delta\mathbf{q}}\Delta\mathbf{a} = \frac{\Delta^2\mathbf{q}}{\Delta\mathbf{q}} - \mathbf{a},$$
in which $\Delta^2\mathbf{q} = \Delta(\mathbf{q}' - \mathbf{q})$ is the second difference of $\mathbf{q}$, which is roughly like an acceleration.
D'Alembert's principle is a generalization of Newton's second law, force equals mass times acceleration (Lanczos, 1986). In one dimension, Newton's law is $F = -I$, for force, $F$, and mass times acceleration, $-I$, so that $F + I = 0$. D'Alembert generalizes Newton's law to a statement about motion in multiple dimensions such that, in conservative systems, the total work for a displacement, $\Delta\mathbf{q}$, and total forces, $\mathbf{F} + \mathbf{I}$, is zero. Work is the distance moved multiplied by the force acting in the direction of the movement.
The canonical Price equation of eqn 3 is an abstract, nondimensional generalization of d'Alembert for probability distributions that conserve total probability. The movement of the probability distribution between two populations, or sets, can be partitioned into the balancing work components of the direct forces, $\Delta\mathbf{q}\cdot\mathbf{F}$, and the inertial forces, $\Delta\mathbf{q}\cdot\mathbf{I}$. We can often specify the direct forces in a simple and clear way. The balancing inertial forces may then be analyzed by d'Alembert's principle (Lanczos, 1986).
The movement of probability distributions in the canonical Price equation is always conservative, ∆ā = 0, so that d'Alembert's principle holds.
When we transform to the general Price equation by $\mathbf{a} \to \mathbf{z}$, then it may be that $\Delta\bar{z} \neq 0$ and the system is not conservative. In that case, we may consider constraints on $\Delta\bar{z}$ and how those constraints influence the possible paths of change for $\Delta\mathbf{q}$.
We can obtain a simple form of d'Alembert's principle for probability distributions when displacements are small, $\Delta\mathbf{q} \to \dot{\mathbf{q}} \equiv d\mathbf{q}$. Define the relative change operator as $d\log$, the differential of the logarithm. Then $\mathbf{F} = d\log\mathbf{q}$ and $\mathbf{I} = d\log(d\log\mathbf{q}) = d\log^2\mathbf{q}$, yielding
$$(\mathbf{F} + \mathbf{I})\cdot d\mathbf{q} = \left(d\log\mathbf{q} + d\log^2\mathbf{q}\right)\cdot d\mathbf{q} = 0, \tag{25}$$
with the direct force proportional to the relative change in frequencies, and the inertial force proportional to the relative nondimensional acceleration in frequencies.
From eqn 5, the work of the direct forces, $d\mathbf{q}\cdot\mathbf{F} = \dot{\mathbf{q}}\cdot\mathbf{F} = \mathcal{F}$, is the Fisher information path length that separates the probability distributions, $\mathbf{q}$ and $\mathbf{q}'$, associated with the two sets. The inertial forces cause a balancing loss, $\dot{\mathbf{q}}\cdot\mathbf{I} = -\mathcal{F}$, which describes the loss in Fisher information that arises from the recalculation of the relative forces in the new frame of reference, $\mathbf{q}'$. The balancing loss occurs because the average relative force, or fitness, is always zero in the current frame of reference, for example, $\mathbf{q}\cdot\mathbf{a} = \sum_i q_i(\dot{q}_i/q_i) = 0$. Any gain in relative fitness, $\dot{\mathbf{q}}\cdot\mathbf{F} = \mathcal{F}$, must be balanced by an equivalent loss in relative fitness, $\dot{\mathbf{q}}\cdot\mathbf{I} = -\mathcal{F}$.
Here, the notions of force, inertia, and work are nondimensional mathematical abstractions that arise from the common underlying structure between the Price equation and the equations of physical mechanics. Similarly, the Fisher information measure here is an abstraction of the standard usage of the Fisher metric.
By equating force with relative frequency change, we intentionally blur the distinction between external causes and internal effects. By describing change as the difference between two abstract sets rather than change through time or space, we intentionally blur the scale of change. By separating frequencies, $\mathbf{q}$, from property values, $\mathbf{z}$, we intentionally distinguish universal aspects of structural change between sets from the particular interpretations of property values in each application. The blurring of cause, effect and scale, and the separation of frequency from value, lead to abstract mathematical expressions that reveal the common underlying structure between seemingly different subjects.

Figure 1: Geometry of change by direct forces. See Table 1 for definition of symbols. Tables 2 and 3 summarize distance expressions and point to locations in the text with further details. (a) The abstract physical work of the direct forces as the distance moved between the initial set with frequencies $\mathbf{q}$, and the altered set with frequencies $\mathbf{q}'$. For discrete changes, the frequencies are normalized by the square root of the frequencies in the initial set. The distance can equivalently be described by the various expressions shown, in which $V_w$ is the variance in fitness from population biology, $J$ is the Jeffreys divergence from information theory, and $\mathcal{F}$ is the Fisher information metric, which arises in many disciplines. The symbol "→" denotes the limit for small changes. (b) When changes are small, the same geometry and distances can be described more elegantly in unitary square root coordinates, $\mathbf{r} = \sqrt{\mathbf{q}}$, shown with axes $(r_1, r_2)$ and $\|\mathbf{r}\| = 1$.

Information theory
When changes are small, the direct force term of the canonical Price equation expresses classic measures of information theory (eqn 5). In particular, $\dot{\mathbf{q}}\cdot\mathbf{a} = \dot{\mathbf{q}}\cdot\mathbf{F}$ is a symmetric expression of the Kullback-Leibler divergence, which measures the change in information associated with the separation between two probability distributions (Cover & Thomas, 1991).
For small changes, the Kullback-Leibler divergence is equivalent to a nondimensional expression of the Fisher information metric. The Fisher metric provides the foundation for much of classic statistical theory and for the subject of information geometry (Fisher, 1925; Amari & Nagaoka, 2000). The Fisher metric also arises as an equivalent description for dynamics in many classic problems in physics and other subjects (Frieden, 2004).
What does it mean that the Price equation matches classic measures of information, which also arise in other subjects? That remains an open question. I suggest that the Price equation reveals the common mathematical structure among those seemingly different subjects. That mathematical structure arises from the conserved quantities, invariances, or constraints that impose a common pattern on dynamics. By this interpretation, dynamics is just a description of the changes between a sequence of sets.
The key aspect of the Price equation seems to be the separation of frequencies from property values. That separation shadows Shannon's separation of the information in a message, expressed by frequencies of symbols in sets, from the meaning of a message, expressed by the properties associated with the message symbols. The Price equation takes that separation further by considering the abstract description of the separation between sets rather than the information in messages. Price (1995) was clearly influenced by the information theory separation between frequency and property in his discussion of a generalized notion of natural selection that might unify disparate subjects.
The equivalence of the Price equation and information measures arises directly from the assumption of small changes. For larger changes, the relation between the Price equation and information remains an open problem. We might, for example, describe larger changes as
$$q'_i = q_i e^{m_i},$$
in which $m_i$ is a nondimensional expression for the total force that separates frequencies. From that expression,
$$m_i = \log\frac{q'_i}{q_i} = \log w_i,$$
in which $w_i$ is a form of relative fitness, and $m_i$ is called the Malthusian parameter in biology. Then, similarly to eqn 5, we have
$$\Delta\mathbf{q}\cdot\mathbf{m} = D(\mathbf{q}'\,\|\,\mathbf{q}) + D(\mathbf{q}\,\|\,\mathbf{q}') = J,$$
which is known as the Jeffreys divergence. In this case, with $\Delta\mathbf{q}$ not necessarily small, we no longer have a direct equivalence to Fisher information. Information geometry, which analyzes continuous paths along contours of conserved total probability, describes the relations between Fisher information and this discrete divergence (Dabak & Johnson, 2002). The idea is that big changes, $\Delta\mathbf{q}$, become a series of small changes, $\dot{\mathbf{q}}$, along a continuous path that connects the endpoints, $\mathbf{q}$ to $\mathbf{q}'$. Each small step along the path can be described as a Fisher information path length, and the sum of those small lengths equals the Jeffreys divergence.
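The relations among the work term, the Jeffreys divergence, and the Fisher distance can be examined numerically. The sketch below assumes a hypothetical frequency vector and a small hypothetical change; the function `kl` and the other names are illustrative. The identity between the work term and the Jeffreys divergence is exact for any size of change, whereas the match to the Fisher distance is approximate and holds only when changes are small.

```python
import math

q = [0.5, 0.3, 0.2]              # hypothetical frequencies
dq = [0.001, -0.0004, -0.0006]   # small hypothetical changes that sum to zero
q_new = [qi + di for qi, di in zip(q, dq)]


def kl(p, r):
    """Kullback-Leibler divergence D(p || r) for strictly positive vectors."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))


m = [math.log(n / o) for o, n in zip(q, q_new)]   # Malthusian parameters
work = sum(d * mi for d, mi in zip(dq, m))        # dq . m
jeffreys = kl(q_new, q) + kl(q, q_new)            # symmetric divergence, J
fisher = sum(d * d / qi for d, qi in zip(dq, q))  # sum dq_i^2 / q_i

print(work, jeffreys, fisher)
```

Repeating the calculation with larger entries in `dq` should leave the first two numbers equal while the third drifts away from them.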
Earlier work in population genetics theory derived the total change caused by natural selection as $\sum_i \dot{q}_i^2/q_i$ (reviewed by Ewens, 1992; Wei et al., 2009; Raju & Krishnaprasad, 2019). That initial work did not emphasize the equivalence of the change by natural selection and Fisher information (Frank, 2009b). Here, the Fisher metric arises most simply as the continuous limiting form of the canonical Price equation description for the distance between two sets.

Extreme action
We can write eqn 6 as

$$\mathcal{L} = \dot{q}\cdot\phi - \frac{1}{2\kappa}\left(F - C^2\right) - \xi\left(\dot{q}\cdot\mathbf{1}\right). \tag{29}$$

By the principle of extreme action, the dynamics, $\dot{q}$, maximize or minimize (extremize) the action, $\dot{q}\cdot\phi$, subject to the constraints. In this case, maximizing the action simply describes the fact that the movement, $\dot{q}$, tends to be in the direction of the force vector, $\phi$, subject to any constraints on motion. The Lagrangian, $\mathcal{L}$, combines the action and the constraints into one expression. To illustrate the principle of extreme action with the Lagrangian above, we maximize the action subject to the constraints by solving $\partial\mathcal{L}/\partial\dot{q}_i = 0$, while also solving for $\kappa$ and $\xi$ by requiring that $F = C^2$ and $\dot{q}\cdot\mathbf{1} = 0$. The solution is $\dot{q}_i = \kappa q_i\left(\phi_i - \bar{\phi}\right)$, in which $\phi_i - \bar{\phi}$ is the excess force relative to the average, and $\xi = \bar{\phi}$ follows from satisfying the constraint on total probability under the assumption of small changes. The constant, $\kappa = C/\sigma_\phi$, satisfies the constraint on total path length, $F = C^2$, in which $\sigma_\phi$ is the standard deviation of the forces. We can rewrite the solution as

$$\frac{\dot{q}_i}{q_i} = \kappa\left(\phi_i - \bar{\phi}\right). \tag{30}$$

This expression shows that we can determine the frequency changes, $\dot{q}$, from the given forces, $\phi$, or we can determine the forces from the given frequency changes. The mathematics is neutral about what is given and what is derived. In this case, $\phi$ is an arbitrary force vector. Using $z \equiv \phi$ in the general Price equation does not necessarily yield $\Delta\bar{z} = \Delta\bar{\phi} = 0$. A nonconservative system does not satisfy d'Alembert's principle. Often, we can specify certain invariances associated with $\Delta\bar{z}$, and use those invariances as additional forces of constraint on $\dot{q}$ in the Lagrangian. The additional forces of constraint typically alter the dynamics and the potential equilibria, as shown in the following section.
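The extreme action solution can be illustrated with a short numerical sketch. The frequencies, forces, and path length $C$ below are arbitrary assumed values, and the function name is hypothetical. The sketch computes $\dot{q}_i = \kappa q_i(\phi_i - \bar{\phi})$ with $\kappa = C/\sigma_\phi$, then checks the two constraints, $\dot{q}\cdot\mathbf{1} = 0$ and $F = \sum_i \dot{q}_i^2/q_i = C^2$, and reports the work, $\dot{q}\cdot\phi$.

```python
import math

def extreme_action_step(q, phi, C):
    """Frequency changes q_dot_i = kappa * q_i * (phi_i - phi_bar),
    with kappa = C / sigma_phi chosen so that the path length F equals C^2."""
    phi_bar = sum(qi * fi for qi, fi in zip(q, phi))
    var_phi = sum(qi * (fi - phi_bar) ** 2 for qi, fi in zip(q, phi))
    kappa = C / math.sqrt(var_phi)
    return [kappa * qi * (fi - phi_bar) for qi, fi in zip(q, phi)]

# Assumed illustrative values, not taken from the text.
q = [0.5, 0.3, 0.2]
phi = [1.0, 0.0, -2.0]
C = 0.01

q_dot = extreme_action_step(q, phi, C)

total = sum(q_dot)                               # conservation of total probability
F = sum(d * d / qi for d, qi in zip(q_dot, q))   # total path length
work = sum(d * fi for d, fi in zip(q_dot, phi))  # action, q_dot . phi

phi_bar = sum(qi * fi for qi, fi in zip(q, phi))
sigma_phi = math.sqrt(sum(qi * (fi - phi_bar) ** 2 for qi, fi in zip(q, phi)))
print(total, F, C ** 2, work, C * sigma_phi)
```

Under these assumptions the work equals $C\sigma_\phi$, the largest value allowed by the two constraints, which follows from the Cauchy-Schwarz inequality applied to the weighted deviations.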
Across many disciplines, problems can often be solved by this variational method of writing a Lagrangian and then extremizing the action subject to the constraints (Lanczos, 1986). The difficulty is determining the correct Lagrangian for a particular problem. No general method specifies the correct form.
In this example, the Price equation essentially gave us the form of the action and the constraints.
Here, the action is the frequency displacement multiplied by the arbitrary force vector, $\dot{q}\cdot\phi$, which is analogous to the physical work done in the movement of the probability distribution. The constraints follow from the conservation of total probability and the description of total distance moved as Fisher information, $F$, which arises from the canonical Price equation.

Entropy and thermodynamics
The tendency for systems to increase in entropy provides the foundation for much of thermodynamics (Van Ness, 1983). Entropy can be studied abstractly by the information entropy quantity, $E = -q\cdot\log q$. For small changes in frequencies, the change in entropy is $dE = -\dot{q}\cdot\log q$.
System dynamics often maximize the production of entropy (Dewar et al., 2014). Maximum entropy production suggests that the dynamics may be analyzed by a Lagrangian in which the action to be maximized is the production of entropy, $-\dot{q}\cdot\log q$.
In the basic Lagrangian for dynamics given by eqn 29, the action is the abstract notion of physical work, $\dot{q}\cdot\phi$, the displacement, $\dot{q}$, multiplied by the force, $\phi$.
The force vector, $\phi$, can be related to frequency change in a growth process, $q'_i = q_i e^{\phi_i}$, with $\phi_i = m_i = \log(q'_i/q_i)$, as in eqn 27. The work becomes

$$\dot{q}\cdot\phi = \dot{q}\cdot\log q' - \dot{q}\cdot\log q, \tag{31}$$

in which the second term on the right is the production of entropy.
If the system conserves the change in some quantity, $\Delta\bar{z} = B$, then that invariant change imposes a constraint on the possible change in the probability distribution, $\dot{q} = q' - q$. Suppose that the value $z_i$ is a property of a type, $i$, such that each type does not change its property value between sets, $\Delta z_i = z'_i - z_i = 0$. Then, from the general Price equation, $\Delta\bar{z} = B$ implies $\dot{q}\cdot z = B$. This constraint acts as a force that limits the possible probability distributions, $q'$, given the initial distribution, $q$.
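We can express the constraint $\dot{q}\cdot z = B$ on $z$ in terms of a constraint on $q'$ as $\log q' = \log k - \lambda z$, for constant, $k$. Then the constraint $\dot{q}\cdot z$ has an equivalent expression in terms of $q'$ as

$$\dot{q}\cdot\log q' = -\lambda\left(\dot{q}\cdot z\right) = -\lambda B. \tag{32}$$

We can now split the total force, $\phi$, as in eqn 31 and, considering $\dot{q}\cdot\log q'$ as a force of constraint, we can rewrite the Lagrangian of eqn 29 as

$$\mathcal{L} = -\dot{q}\cdot\log q - \frac{1}{2\kappa}\left(F - C^2\right) - \xi\left(\dot{q}\cdot\mathbf{1}\right) - \lambda\left(\dot{q}\cdot z - B\right). \tag{33}$$

The action term, $dE = -\dot{q}\cdot\log q$, is the increase in entropy, $E = -q\cdot\log q$. Maximizing the action maximizes the production of entropy.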
The maximization by solving $\partial\mathcal{L}/\partial\dot{q}_i = 0$ subject to the constraints yields a solution with the same form as eqn 30. The force term is replaced by a partition of forces into components that match the direct entropy increase and the constraint on $z$ as

$$\dot{q}_i = \kappa q_i\left(E_i^* - \lambda z_i^*\right), \tag{34}$$

in which the star superscripts denote the deviations from average values,

$$E_i^* = -\log q_i - E, \qquad z_i^* = z_i - \bar{z}.$$

The value of $\kappa$ is $C/\sigma_\phi$, as in the previous section.
In this case, we use for φ the partition of the forces on the right side of eqn 34 into the direct entropy and the constraining forces.
The constraint $\dot{q}\cdot z = B$ implies

$$\lambda = \beta_{Ez} - \frac{B}{\kappa\sigma_z^2}. \tag{35}$$

The term $\beta_{Ez}$ is the regression of $-\log q$ on $z$, which acts to transform the scale for the forces of constraint imposed by $z$ to be on a common scale with the direct forces of entropy, $-\log q$. The term $B/\kappa\sigma_z^2$ describes the required force of constraint on frequency changes so that the new frequencies move $\bar{z}$ by the amount $\dot{q}\cdot z = B$. The term $\sigma_z^2$ is the variance in $z$.
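The constrained entropy dynamics can be sketched numerically under stated assumptions. Here $\kappa$ is treated as a given small constant rather than solved from the path length constraint, the frequencies and property values are made up, and the function name is hypothetical. The sketch computes $\lambda$ as the regression of $-\log q$ on $z$ minus $B/\kappa\sigma_z^2$, forms the frequency changes from the deviations of $-\log q$ and $z$ from their averages, and checks that total probability is conserved, that $\dot{q}\cdot z = B$, and that entropy production, $-\dot{q}\cdot\log q$, is positive when $B = 0$.

```python
import math

def entropy_step(q, z, B, kappa):
    """Return (q_dot, lam) for entropy-driven change constrained by q_dot . z = B.
    kappa is taken as given here, which is an assumption of this sketch."""
    e = [-math.log(qi) for qi in q]              # direct entropy forces, -log q_i
    e_bar = sum(qi * ei for qi, ei in zip(q, e))  # entropy, E
    z_bar = sum(qi * zi for qi, zi in zip(q, z))
    var_z = sum(qi * (zi - z_bar) ** 2 for qi, zi in zip(q, z))
    cov_ez = sum(qi * (ei - e_bar) * (zi - z_bar) for qi, ei, zi in zip(q, e, z))
    lam = cov_ez / var_z - B / (kappa * var_z)    # regression term minus constraint term
    q_dot = [kappa * qi * ((ei - e_bar) - lam * (zi - z_bar))
             for qi, ei, zi in zip(q, e, z)]
    return q_dot, lam

# Assumed illustrative values.
q = [0.5, 0.3, 0.2]
z = [0.0, 1.0, 2.0]
B = 0.0          # the mean of z is conserved
kappa = 0.01

q_dot, lam = entropy_step(q, z, B, kappa)
total = sum(q_dot)
moved = sum(d * zi for d, zi in zip(q_dot, z))
dE = -sum(d * math.log(qi) for d, qi in zip(q_dot, q))
print(total, moved, dE, lam)
```

With $B = 0$, the entropy production reduces to $\kappa$ times the residual variance of $-\log q$ after regression on $z$, which is why it cannot be negative in this sketch.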
In these examples of dynamics derived from Lagrangians, the action is the partial change term of the direct forces derived from the universal properties of the Price equation. Thus, the maximum entropy production in this case can be interpreted as a universal partial maximum entropy production principle, in the Price equation sense of the partial change associated with the direct forces, holding the inertial frame constant (Frank, 2017).
In many applications, causal analysis reduces to this pattern of partial change by direct focal causes, holding other causes constant. The particular partition into direct, constraining, and inertial forces is a choice that we make to isolate or highlight particular causes (Lanczos, 1986).

Entropy and statistical mechanics
When entropy reaches its maximum value subject to the forces of constraint, equilibrium occurs at $q' = q$. From the force of constraint given in the previous section, $\log q' = \log k - \lambda z$, the equilibrium can be written as

$$q = k e^{-\lambda z}, \tag{36}$$

in which I have dropped the $i$ subscript. This Gibbs-Boltzmann-exponential distribution is the principal result of statistical mechanics (Feynman, 1998). Here, we obtained the exponential distribution through a Price equation abstraction that led to maximum entropy production. This result suggests that equilibrium probability distributions are simple expressions of maximum entropy subject to the forces of constraint. Jaynes (1957a,b) developed this maximum entropy perspective in his quest to overthrow Boltzmann's canonical ensemble for statistical mechanics. The canonical ensemble describes macroscopic probability patterns by aggregation over a large number of equivalent microscopic particles.
The theory of statistical mechanics, based on the canonical ensemble, yields several commonly observed probability distributions. However, Jaynes (2003) emphasized that the same probability distributions commonly arise in economics, biology, and many other disciplines. In those nonphysical disciplines, there is no meaningful canonical ensemble of identical microscopic particles. According to Jaynes, there must be another, more general cause of the common probability patterns. The maximization of entropy is one possibility (Frank, 2009a).
Jaynes emphasized that increase in entropy is equivalent to loss of information. The inherent randomizing tendency in all systems causes loss of information. Maximum entropy is simply a consequence of that loss of information. Because systems lose all information except the forces of constraint, common probability distributions simply reflect those underlying forces of constraint.
The Gibbs-Boltzmann-exponential distribution in eqn 36 expresses the simple force of constraint on the mean of some value, $z$, associated with the system. Different constraints lead to different distributions. For example, the constraint $q\cdot(z - \mu)^2 = \sigma^2$ yields a Gaussian distribution for mean $\mu$ and variance $\sigma^2$.
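The link between a constraint on the mean and the exponential form can be illustrated on a small discrete support. This is a sketch under assumed values: the support, the target mean, the bisection bounds, and the function names are all hypothetical choices. The code finds $\lambda$ so that $q = k e^{-\lambda z}$ has the required mean, then perturbs $q$ in a direction that preserves both total probability and the mean, and confirms that entropy falls in either direction.

```python
import math

def gibbs(z, lam):
    """Distribution q_i = k * exp(-lam * z_i), with k set by total probability."""
    weights = [math.exp(-lam * zi) for zi in z]
    total = sum(weights)
    return [w / total for w in weights]

def mean(q, z):
    return sum(qi * zi for qi, zi in zip(q, z))

def entropy(q):
    return -sum(qi * math.log(qi) for qi in q)

def solve_lambda(z, target_mean, lo=-50.0, hi=50.0, steps=200):
    """Bisection on lam; the mean of the Gibbs form decreases as lam increases."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mean(gibbs(z, mid), z) > target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed illustrative support and constraint.
z = [0.0, 1.0, 2.0, 3.0, 4.0]
target = 1.0
lam = solve_lambda(z, target)
q = gibbs(z, lam)

# Direction v sums to zero and satisfies v . z = 0, so it keeps both constraints.
v = [1.0, -2.0, 1.0, 0.0, 0.0]
delta = 0.01
q_plus = [qi + delta * vi for qi, vi in zip(q, v)]
q_minus = [qi - delta * vi for qi, vi in zip(q, v)]

print(lam, mean(q, z), entropy(q), entropy(q_plus), entropy(q_minus))
```

A different constraint, such as a fixed variance, would require a different feasible direction and a different target form, but the same check applies.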
Jaynes invoked maximum entropy as a consequence of the thermodynamic principle that systems increase in entropy. Here, I developed the maximization of entropy from the abstract Price equation expression for frequency dynamics and the extreme action principle.
Extreme action simply expresses the notion that changing frequencies align with the direction of the force vector. That geometric alignment is equivalent to the maximization of frequency change multiplied by force, an abstract notion of physical work.
Jaynes argued that the fundamental notion of information sets the underlying structural unity of thermodynamics, probability, and many aspects of statistical inference. I argue for underlying unity based on abstract properties of invariance and geometry (Frank, 2017). Those properties of invariance and geometry give a common mathematical structure to any problem that can be considered abstractly by the Price equation's description of the change between two sets. The next section reviews and extends these notions of invariance and common mathematical structure.

Invariance and sufficiency
The Price equation expresses constraints on the change in probability distributions between sets, $\Delta q$. For example, if $\bar{z}$ is a constant, conserved value, then the changes, $\Delta q$, must satisfy that constraint. We may say that the conserved value of $\bar{z}$ imposes a force of constraint on the frequency changes. This section relates the Price equation's abstract notions of change and constraint to Jaynes' arguments.
Jaynes emphasized that systems tend to increase in entropy or, equivalently, to lose information. Entropy increase is a force that drives a system to an equilibrium at which entropy is maximized subject to any forces of constraint.
Because entropy increase is essentially universal, it is sufficient to know the particular forces of constraint to determine the most likely form of a probability distribution. Sufficiency expresses the forces of constraint in terms of conserved quantities.
Put another way, sufficiency partitions all possible populations into subsets. Each subset contains all of those populations with the same invariant conserved quantity. For example, if the constraint is a conserved value of $\bar{z}$, then all populations with the same invariant value of $\bar{z}$ fall into the same subset.
To analyze the force arising from constraint on z and the most likely form of the associated probability distribution, it is sufficient to know that the dynamics of populations driven by entropy increase must remain within the subset with invariant values defined by the constraints of the conserved quantities.
Jaynesian thermodynamics follows from the general force of information loss, in which the constraints sufficiently describe the only information that remains after maximum information loss.
The Price equation goes beyond Jaynes in revealing the underlying abstract mathematical structure that unifies seemingly different subjects. In all of the disciplines we have discussed, the key results for each discipline arise from the basic description of change between sets constrained by invariant conditions that we place on frequency, $q$, and value, $z$.
In addition, the Price equation expresses the intrinsic invariance to affine transformation z → a + bz.
From the perspective of the abstract Price equation, notions of information and entropy increase arise as secondary descriptions of the underlying primary geometric aspects of change between sets subject to intrinsic invariances and to invariant conditions imposed as constraints. Those aspects of geometry and invariance set the shared foundations for many seemingly different disciplines.
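One way to read the affine invariance of the Price equation is that the shift constant drops out of both terms, because total frequency is conserved, and that the stretch constant rescales both terms equally. The following sketch checks that reading numerically. The frequencies, property values, and constants are assumed illustrative values, and the function names are hypothetical.

```python
def price_terms(q, q_new, z, z_new):
    """Return the two Price equation terms: (Delta q . z, q' . Delta z)."""
    freq_term = sum((b - a) * zi for a, b, zi in zip(q, q_new, z))
    value_term = sum(b * (zn - zi) for b, zi, zn in zip(q_new, z, z_new))
    return freq_term, value_term

def mean(q, z):
    return sum(qi * zi for qi, zi in zip(q, z))

# Assumed illustrative values for two sets.
q = [0.5, 0.3, 0.2]
q_new = [0.4, 0.35, 0.25]
z = [1.0, 2.0, 4.0]
z_new = [1.5, 2.0, 3.0]

# The two terms sum to the total change in the average value.
f1, v1 = price_terms(q, q_new, z, z_new)
total_change = mean(q_new, z_new) - mean(q, z)

# Affine transformation z -> a + b z, applied to the values in both sets.
a, b = 7.0, 3.0
z_aff = [a + b * zi for zi in z]
z_new_aff = [a + b * zi for zi in z_new]
f2, v2 = price_terms(q, q_new, z_aff, z_new_aff)

print(f1, v1, total_change)
print(f2, v2)  # each term is b times its untransformed value; a has no effect
```

The shift has no effect on the frequency term only because the frequency changes sum to zero, which is the conservation of total frequency emphasized by the canonical form.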

Inference: data as a force
Jaynes considered information as a force that changes probability distributions. Entropy increase is the force that causes loss of information, driving probability distributions to maximum entropy subject to constraint. For inference, data provide an informational force that drives the Bayesian dynamics of probability distributions to provide estimates of parameter values. The parameters are typically the conserved, constrained quantities that are sufficient to define maximum entropy probability distributions.
How does the Jaynesian interpretation of data as an informational force in statistical inference follow from the underlying Price equation abstraction? Consider the estimation of a parameter, $\theta$, such as the mean of an exponential probability distribution. In the Bayesian framework, we describe the current information that we have about $\theta$ by the probability distribution, $q_\theta$. The value of $q_\theta$ represents the relative likelihood that the true value of the parameter is $\theta$. The probability distribution over alternative values of $\theta$ represents our current knowledge, or information, about $\theta$. To relate this to the Price framework, note that we are now using $\theta$ as the subscript for types instead of $i$. The vector $q$ now implicitly describes the set of values for $q_\theta$. Our problem concerns how new information about $\theta$ changes the probability values to $q'_\theta$. The new probability values summarize the combination of our prior information in $q_\theta$ and the force of the new information in the data. This problem is the Bayesian dynamics of combining a prior distribution, $q_\theta$, with new data to generate a posterior distribution, $q'_\theta$, with $\Delta q_\theta = q'_\theta - q_\theta$.
We have from our universal definitions for change given earlier the relation $q'_\theta = q_\theta w_\theta$, in which we called $w = q'/q$ the relative fitness, describing the force of change on probabilities. Here, the force arises from the way in which new data alters the net likelihood associated with a value of $\theta$.
Following Bayesian tradition, denote that force of the data as $\tilde{L}(D|\theta)$, the likelihood of observing the data, $D$, given a value for the parameter, $\theta$. To interpret a force as equivalent to relative fitness, the average value of the force must be one to satisfy the conservation of total probability. Thus, define

$$L_\theta = \frac{\tilde{L}(D|\theta)}{\sum_\theta q_\theta \tilde{L}(D|\theta)}.$$
We can now write the classic expression for Bayesian updating of a prior, $q_\theta$, driven by the force of new data, $L_\theta \equiv L(D|\theta)$, to yield the posterior, $q'_\theta$, as

$$q'_\theta = q_\theta L_\theta.$$

By recognizing $L$ as a force vector acting on frequency change, we can use all of the general results derived from the Price equation. For example, the Malthusian parameter, $m$, relates to the log-likelihood as

$$m_\theta = \log\frac{q'_\theta}{q_\theta} = \log L_\theta.$$

This equivalence for log-likelihood relates frequency change to the Kullback-Leibler expressions for the change in information

$$\Delta q\cdot m = \Delta q\cdot\log L = D\left(q' \,\Vert\, q\right) + D\left(q \,\Vert\, q'\right),$$

which we may think of as the gain of information from the force of the data. Perhaps the most general expression of change describes the relative separation within the unitary square root coordinates as the Euclidean length

$$\left\lVert\frac{\Delta q}{\sqrt{q}}\right\rVert^2 = \Delta q\cdot L,$$

which is an abstract, nondimensional expression for the work done by the displacement of the frequencies, $\Delta q$, in relation to the force of the data, $L$. I defined $L$ as a normalized form of the likelihood, $\tilde{L}$, such that the average value is one, $\bar{L} = q\cdot L = 1$. Thus, we have a canonical form of the Price equation for normalized likelihood

$$\Delta\bar{L} = \Delta q\cdot L + q'\cdot\Delta L = 0.$$

The second term shows how the inertial forces alter the frame of reference that determines the normalization of the likelihoods, $\tilde{L} \to L$. Typically, as information is gained from data, the normalizing force of the frame of reference reduces the force of the same data in subsequent updates.
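Bayesian updating as frequency change can be sketched in a few lines. The prior and the raw likelihood values below are made up for illustration, and the function name is hypothetical. The sketch normalizes the likelihood so that its prior-weighted average is one, forms the posterior as prior times normalized likelihood, and checks that the work, $\Delta q\cdot L$, equals the squared length $\sum_\theta (\Delta q_\theta)^2/q_\theta$ and that $\Delta q\cdot\log L$ equals the symmetrized Kullback-Leibler divergence.

```python
import math

def bayes_update(prior, raw_likelihood):
    """Return (posterior, L), with L the likelihood normalized to average one."""
    mean_lik = sum(p * l for p, l in zip(prior, raw_likelihood))
    L = [l / mean_lik for l in raw_likelihood]
    posterior = [p * Li for p, Li in zip(prior, L)]
    return posterior, L

def kl(p, r):
    """Kullback-Leibler divergence D(p || r)."""
    return sum(a * math.log(a / b) for a, b in zip(p, r))

# Assumed illustrative prior over three parameter values and raw likelihoods.
prior = [0.25, 0.5, 0.25]
raw = [0.1, 0.4, 0.7]

posterior, L = bayes_update(prior, raw)
dq = [b - a for a, b in zip(prior, posterior)]

avg_L = sum(p * Li for p, Li in zip(prior, L))        # equals one by construction
work = sum(d * Li for d, Li in zip(dq, L))            # Delta q . L
length_sq = sum(d * d / p for d, p in zip(dq, prior)) # squared length of Delta q / sqrt(q)
info = sum(d * math.log(Li) for d, Li in zip(dq, L))  # Delta q . log L
sym_kl = kl(posterior, prior) + kl(prior, posterior)

print(posterior, avg_L, work, length_sq, info, sym_kl)
```

Because the normalized likelihood has average one, the work also equals the prior variance of $L$, which mirrors the variance in fitness for natural selection.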
All of this simply shows that Bayesian updating describes the change in probability distributions between two sets. That change between sets follows the universal principles given by the abstract Price equation.
Prior work noted the analogy between natural selection and Bayesian updating (Shalizi, 2009; Harper, 2010; Campbell, 2016). Here, I emphasized a more general perspective that includes natural selection and Bayesian updating as examples of the common invariances and geometry that unify many topics.

Invariance and probability
In the earlier section Affine invariance, I showed that the Price equation is invariant to affine transformations $z \to a + bz$. This section suggests that the Price equation's intrinsic affine invariance explains universal aspects of probability distributions in a more general and fundamental manner than Jaynes' focus on entropy and information.
The general form of probability distributions in eqn 36 followed from the constraint $\log q = \log k - \lambda z$. Affine transformation does not change the force imposed by that constraint, because

$$\log k - \lambda z \to \log k - a\lambda - b\lambda z = \log k_a - \lambda_b z,$$

in which $k_a = k e^{-a\lambda}$ and $\lambda_b = b\lambda$. Because the constants, $k_a$ and $\lambda_b$, adjust to satisfy underlying constraints, the shift and stretch constants $a$ and $b$ do not alter the constraints or the final form of the probability distribution.
Thus, the probability distribution in eqn 36, arising from analysis of extreme action applied to a Lagrangian, is affine invariant with respect to $z$. We can make a more fundamental argument, by deriving the form of the probability distribution solely as a consequence of the intrinsic affine invariance of the Price equation.
In particular, shift invariance by itself explains why the probability distribution in eqn 36 has an exponential form (Frank, 2016a). If we assume that the functional form for the probability distribution, $q_i = f(z_i)$, is invariant to a constant shift, $a + z_i$, then, dropping the $i$ subscripts and using continuous notation, by the conservation of total probability

$$\int k_0 f(z)\,dz = \int k_a f(z + a)\,dz = 1$$

holds for any magnitude of the shift, $a$, in which the proportionality constant, $k_a$, changes with the magnitude of the shift, $a$, independently of the value of $z$, in order to satisfy the conservation of total probability.
Because $k_a$ is independent of $z$, the condition for the conservation of total probability is

$$k_a f(z + a) = k_0 f(z).$$

The invariance holds for any shift, $a$, so it must hold for an infinitesimal shift, $a = \epsilon$. Writing $\kappa_\epsilon = k_0/k_\epsilon$, we can write the Taylor series expansion for an infinitesimal shift as

$$f(z + \epsilon) = f(z) + \epsilon f'(z) = \kappa_\epsilon f(z),$$

with $\kappa_\epsilon = 1 - \lambda\epsilon$, because $\epsilon$ is small and independent of $z$, and $\kappa_0 = 1$. Thus,

$$f'(z) = -\lambda f(z)$$

is a differential equation with solution

$$f(z) = k e^{-\lambda z},$$

in which $k$ is determined by the conservation of total probability, and $\lambda$ is determined by $\bar{z}$. When $z$ ranges over positive values, $z > 0$, then $k = \lambda = 1/\bar{z}$. Invariance to stretch transformation by $b$ follows from the adjustment, $\lambda_b$, given above. Affine invariance of the probability distribution with respect to $z$ implies additional structure. In particular, we can write $z = e^{\beta w}$, in which a shift $w(z) \to \alpha + w(z)$ multiplies $z$ by a constant, which does not change the form of the probability distribution. Thus, in terms of the shift-invariant scale, $w(z)$, we obtain the canonical expression

$$q_z = k e^{-\lambda e^{\beta w}},$$

that describes nearly all commonly observed continuous probability distributions (Frank, 2016a,c) when we add a few additional details about the measure, $d\psi_z$, and the commonly observed base scales, $w(z)$. Understanding the abstract form of common probability patterns clarifies the study of many problems (Frank, 2016b,c, 2018) (see Appendix A).
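Shift invariance of the exponential form is easy to check numerically. In this sketch the rate, the shift, and the sample points are assumed illustrative values, and the function names are hypothetical. For $f(z) = k e^{-\lambda z}$, the ratio $f(z + a)/f(z)$ is the same at every $z$, so a shift changes only the normalizing constant. A power law is included as a contrast, because its ratio depends on $z$. The last part approximates the mean on $z > 0$ by a midpoint sum to illustrate $k = \lambda = 1/\bar{z}$.

```python
import math

def f(z, lam, k=1.0):
    """Exponential form k * exp(-lam * z)."""
    return k * math.exp(-lam * z)

def g(z):
    """Power law z^(-2), used only as a contrast that is not shift invariant."""
    return z ** -2.0

lam = 2.0   # assumed rate
a = 0.7     # assumed shift

# The ratio f(z + a) / f(z) is independent of z for the exponential form.
ratios_f = [f(z + a, lam) / f(z, lam) for z in [0.0, 0.5, 1.0, 3.0]]

# The same ratio for the power law changes with z.
ratios_g = [g(z + a) / g(z) for z in [0.5, 1.0, 3.0]]

# Midpoint-rule approximation of total probability and mean for k = lam on z > 0.
n = 100000
upper = 20.0            # exp(-lam * upper) is negligible for lam = 2
h = upper / n
total = 0.0
mean_z = 0.0
for i in range(n):
    zi = (i + 0.5) * h
    density = f(zi, lam, k=lam)
    total += density * h
    mean_z += zi * density * h

print(ratios_f, math.exp(-lam * a))
print(ratios_g)
print(total, mean_z, 1.0 / lam)
```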

Meaning
One cannot explain mathematical form by appeal to extrinsic physical notions. The structure of mathematical results does not follow from energy or heat or natural selection. Instead, those extrinsic phenomena arise as consistent interpretations for the structure of the mathematics. The mathematical structure can only be analyzed, explained and understood by reference to mathematical properties. For example, we may invoke invariance, conserved values, and geometry to understand why certain mathematical forms arise in the abstract Price equation description for changes in frequency, and why those same forms recur in many different applications. We may not invoke entropy or information as a cause, only as a description.
My goal has been to reveal the common mathematical structure that unifies seemingly disparate results from different subjects. The common mathematical structure arises primarily through simple invariances and their expression in geometry.
Second, new mathematical results and new insights into empirical phenomena may follow. I believe this to be true. However, the argument for novel results and insights is nearly impossible to make. For any particular result or insight, it is always possible to claim that the same could have been achieved without the broader framing. Ascribing the origins of insight to a general framework is almost always subjective.
The strongest argument I can make arises from two personal anecdotes. It is only in these cases that I understand the origin of insight in relation to the broad use of invariance as a unifying perspective.

Probability, invariance, and maximum entropy
The first anecdote shows how observations in biology motivated my search for a broader synthesis of concepts between disciplines. That synthesis, in terms of invariance, helped me to understand the observed biological patterns. It also led to a unified understanding of the commonly observed probability distributions in terms of the invariances that define scale, and an understanding of the relations between the equations of thermodynamics, natural selection in biology, and probability patterns.
In my work on cancer and other aspects of age-related disease (Frank, 2007, 2016c), I noted that a wide variety of seemingly different dynamical models of disease progression tended to converge to a few similar forms of probability distributions for the age of disease onset. At first, I used Jaynes' maximum entropy approach (Jaynes, 1957a,b, 2003) to try to understand the relations between apparently complex processes and the resulting simple patterns (Frank, 2009a). That worked, in the sense that one could find constraints that led to maximum entropy distributions that matched the data.
The problem with maximum entropy is that the constraints simply describe the patterns in the data, without giving one a sense of how patterns arise and what relates different patterns to each other. Instead, one ends up with a catalog of the commonly observed probability distributions and the matching constraints for each distribution.
Those difficulties led me to study the forms of commonly observed probability distributions. I felt that if I could understand probability patterns more deeply, I would be in a better position to understand the biological problems that interested me. And, along the way, I would perhaps better understand more general aspects of probability patterns.
Over many years, I developed a unified understanding of probability patterns in terms of invariance and scale (Frank, 2014, 2016a). I used that improved understanding of probability to enhance my analyses of age-related diseases (Frank, 2016c) and the size distributions of trees in forests (Frank, 2016b).
That work on invariance and scale in probability left open the puzzle of how that perspective related to Jaynes' classic maximum entropy approach. Although my invariance approach to probability patterns could stand separately from maximum entropy, Jaynes' approach was widely used and formed a standard against which my new work would reasonably be compared. Also, I developed my ideas by initially starting with maximum entropy, and Jaynes himself strongly hinted that invariance might be the way forward from where he left the subject (Jaynes, 2003).
How could I connect my pure invariance approach to Jaynes' work on maximum entropy, which was developed explicitly as an extension to classical thermodynamics and statistical mechanics?
My work on probability seemingly has little relation to the Price equation. However, in my other studies, I had been using the Price equation as a tool to understand natural selection in biology (Frank, 1986, 1995, 2012a). Over time, I began to see the broader connections between the Price equation and information theory (Frank, 2009b, 2012b, 2013).
Through those studies of natural selection and the Price equation, I gained understanding of the dynamics of information. I was then able to see the connections between some of the classic results of thermodynamic change in entropy and the equations of natural selection.
With that broader understanding of entropy and information dynamics, I could then synthesize Jaynes' maximum entropy approach to probability with my approach based on invariance and scale (Frank, 2017). Some fundamental aspects of physical mechanics also began to fit within the unified structure (Frank, 2015). All of that abstract work fed back into my analyses and understanding of age-related diseases, the sizes of trees, and the distribution of enzyme rates (Frank, 2016b,c).
For any of the particular insights into empirical problems or any of the particular mathematical results, it would have been possible to achieve the same without a broader perspective or an attempt to unify between disciplines. However, in fact, the broader perspective and unification of disciplines played a primary role.

The universal law of generalization in psychology
The second anecdote shows how the broad framework led to a new insight for a particular discipline. In this case, I happened to read an article in Science about an intriguing pattern in psychology (Sims, 2018).
The probability that an organism perceives two stimuli as similar typically decays exponentially with the separation between the stimuli. The exponential decay in perceptual similarity is often referred to as the universal law of generalization (Shepard, 1987; Chater & Vitányi, 2003).
Both theory and empirical analysis depend on the definition of the perceptual scale. For example, how does one translate the perceived differences between two circles with different properties into a quantitative measurement scale?
There are many different suggestions in the literature for how to define a perceptual scale. Each of those suggestions develops very specific notions of measurement based, for example, on information theory, Kolmogorov complexity theory, or multidimensional scaling descriptions derived from observations (Chater & Vitányi, 2003; Shepard, 1987; Sims, 2018).
I showed that the inevitable shift invariance of any reasonable perceptual scale determines the exponential form for the universal law of generalization in perception (Frank, 2018). All of the other details of information, complexity, and empirical scaling are superfluous with respect to understanding why the universal law of generalization has the exponential form.
Certainly, the insight that the inevitable shift invariance of scale is a sufficient explanation does not require a broad conceptual framework derived from the Price equation. However, I was able to see that solution immediately only because I had for years been working toward a unified understanding of information, scale, and invariance. Many others had worked on this central puzzle in psychology without seeing the underlying simplicity.

Table 2: Mathematical forms that highlight similarities between different disciplines, part 1

| Mathematical form | Comments | Equation |
|---|---|---|
| Price equation: | | |
| $\Delta\bar{z} = \Delta q\cdot z + q'\cdot\Delta z$ | Most general form; separates frequency, $q$, from property value, $z$; partitions frequency and property value change | 1 |
| $\Delta\bar{a} = \Delta q\cdot a + q'\cdot\Delta a = 0$ | Canonical form; emphasizes conservation of total frequency; recover general form by coordinate change $a \to z$ | |
| Mathematical relations: | | |
| $\Delta q\cdot z = \lVert\Delta q\rVert\,\lVert z\rVert\cos\omega$ | Geometric equivalence for dot product; $a \equiv F$ yields abstract expression of physical work (see below) | 19 |
| $\Delta q\cdot z = \mathrm{Cov}(w, z)$ | Equivalent statistical form | 13 |
| $q'\cdot\Delta z = \mathrm{E}(w\Delta z)$ | Equivalent statistical form | 15 |
| $\Delta q\cdot a = \lVert\Delta q/\sqrt{q}\rVert^2$ | Geometric expression for total distance between sets in terms of frequency; discrete generalization of Fisher information, $F$ | 20 |
| Physical mechanics: | | |
| $\Delta\bar{a} = (F + I)\cdot\Delta q = 0$ | Abstraction of d'Alembert's principle for physical work in conservative systems; work from direct forces, $\Delta q\cdot F = \Delta q\cdot a$, balances work from inertial forces, $\Delta q\cdot I = q'\cdot\Delta a$; generalize by coordinate transformation $a \to z$; cases in which $\Delta\bar{z} \neq 0$ describe nonconservative systems | 23 |
| $\Delta q\cdot F = \lVert\Delta q\rVert\,\lVert F\rVert\cos\omega$ | Abstract form of work as distance moved, $\lVert\Delta q\rVert$, multiplied by component of force along path, $\lVert F\rVert\cos\omega$; for given lengths of force and frequency change vectors, the frequency changes that minimize the angle between force and frequency change maximize the work | 19 |
| Information theory: | | |

Table 3: Mathematical forms that highlight similarities between different disciplines, part 2

| Mathematical form | Comments | Equation |
|---|---|---|
| $\Delta_F\bar{a} = \Delta q\cdot a = V_w$ | Natural selection moves population a distance equal to the variance in fitness; equivalent to abstract form of physical work with $a \equiv F$ | 11 |
| $\Delta_F\bar{a} = V_w = V_g + V_\epsilon$ | Partition variance (distance) into part associated with genetic predictors, $V_g$, and part associated with other environment effects, $V_\epsilon$ | 12 |
| $\Delta_{NS}\bar{a} = V_g$ | Analog of fundamental theorem, the part of total transmissible change caused by natural selection | 12 |
| $\Delta\bar{p} = \mathrm{Cov}(w, p)$ | Replicator equation with $p \equiv z$ as gene frequency within individuals and $\bar{p}$ as population gene frequency | 14 |
| $\Delta\bar{p} = \mathrm{Cov}(w, p) + \mathrm{E}(w\Delta p)$ | Group selection with $p \equiv z$ as gene frequency within groups, first term as selection between groups, and second term as selection within groups | 17 |
| Extreme action: | | |
| $\mathcal{L} = \dot{q}\cdot\phi + \text{constraints}$ | Lagrangian as work of direct forces, $\phi \equiv F$; maximizing the work (action), $\dot{q}\cdot\phi$, chooses the frequency changes, $\dot{q}$, in the direction of the forces subject to constraints | 29 |
| $\dot{q}_i = \kappa q_i\left(\phi_i - \bar{\phi}\right)$ | Dynamics for constrained total frequency and constrained total distance, $F = C^2$, with $\kappa = C/\sigma_\phi$ and $\sigma_\phi$ as standard deviation of forces | 30 |
| Thermodynamics: | | |

Table 1: Definitions of key symbols and concepts