Nonlinear Stochastic Dynamics of Complex Systems, II: Potential of Entropic Force in Markov Systems with Nonequilibrium Steady State, Generalized Gibbs Function and Criticality

In this paper we revisit the notion of the "minus logarithm of stationary probability" as a generalized potential in nonequilibrium systems and attempt to illustrate its central role in an axiomatic approach to stochastic nonequilibrium thermodynamics of complex systems. It is demonstrated that this quantity arises naturally both through monotonicity results of Markov processes and as the rate function when a stochastic process approaches a deterministic limit. We then undertake a more detailed mathematical analysis of the consequences of this quantity, culminating in a necessary and sufficient condition for the criticality of stochastic systems. This condition is then discussed in the context of recent results about criticality in biological systems.


Introduction
This is part II of a series on stochastic nonlinear dynamics of complex systems. Part I [1] presents a chemical reaction kinetic perspective on complex systems in terms of a mesoscopic stochastic nonlinear kinetic approach, e.g., Delbrück-Gillespie processes, as well as a stochastic nonequilibrium thermodynamics (stoc-NET) in phase space. One particularly important feature of the theory in [1] is that it takes the abstract mathematical concepts seriously; that is, it follows what the mathematics tells us. For example, it was shown that the widely employed local equilibrium assumption in the traditional macroscopic theory of NET can be eliminated when one recognizes the fine distinction between the set of random events, the S in a probability space (S, F, P), and a random variable that is defined as an observable on top of the measurable space, x : S → R. The local equilibrium assumption is needed only when one applies the phase space stoc-NET to physically measurable transport processes [2].
The same chemical kinetic approach can be applied to other biological systems. Biological organisms are complex systems with a large number of heterogeneous constituents, which can be thought of as "individuals". To be able to develop a scientific theory for such a complex system with any predictive power, one must use a probabilistic treatment that classifies the individuals into "statistically identical groups". Thermodynamics and statistical mechanics provide a powerful conceptual framework, as well as a set of tools with which one can comprehend and analyze these systems. The fully developed statistical thermodynamic theory taught in college physics classes is mainly a theory of equilibrium systems. The application of its fundamental ideas, however, is not limited to just equilibrium systems or molecular processes. Stoc-NET [2,3,4,5,6], along with the information theoretical approach [7,8,9,10], is a further development in this area.
One of the key elements of the theory presented in [1] was the nonequilibrium steady state (NESS) potential, or "energy", defined as the minus logarithm of the stationary probability distribution of a kinetic model. In the past, this quantity has appeared repeatedly in the literature [11,12,13,14,15], but most of the studies focus on its computation. In this paper, we attempt to illustrate its central role as a novel "law of force", a necessary theoretical element in the stoc-NET of complex systems.
The paper is organized as follows: Section 2 serves as a brief historical review of the use of the negative logarithm of a stationary probability distribution as an energy potential. In Sec. 2.1 we first look at the history of using minus-log-probability in equilibrium chemical thermodynamics and briefly review J. G. Kirkwood's fundamental idea of the potential of mean force and the notion of entropic force. In Sec. 2.2, two recent results identifying the minus-log-probability as "energy" are described: a self-contained and consistent mesoscopic stoc-NET [16], and a precise agreement between its macroscopic limit and Gibbs' theory [17,18]. These two results provide strong evidence for the validity of such an identification. In Sec. 2.3, we discuss the legitimacy and centrality of the stationary distribution in the "entropy inequality" for a Markov process from a mathematical standpoint. In Sec. 3, a definition of the "corresponding deterministic dynamics" of a stochastic process is proposed using power-scaling of probability densities. In Sec. 3.1 it is shown that the rate of convergence to this corresponding deterministic process coincides with the minus-log-probability definition of energy. With the justifications given in Secs. 2 and 3, a more detailed analysis of such a probability distribution is carried out in Sec. 4. In Sec. 4.1 terms analogous to Boltzmann's and Gibbs' entropy are defined, along with their corresponding microcanonical partition functions. The relative merits of these definitions are discussed. In Sec. 4.2, it is shown that the system has a critical temperature if and only if the Gibbs' entropy of the system is asymptotic to the energy. In Sec. 4.3 several example distributions are discussed in order to emphasize some subtleties in the definition of states. Finally, in Sec. 5 the ideas from previous sections are related to some recent results on biological systems.

A novel law of force: Potential of entropic force
In Boltzmann's statistical mechanics, phenomenological thermodynamics is given a Newtonian mechanical basis. Based on the already well developed concepts of mechanical energy and its conservation, Boltzmann derived the relation [footnote 1]

$$p^{eq}(x) \propto e^{-U(x)/k_B T}, \tag{1}$$

where $U(x)$ is the mechanical energy of a microstate [footnote 2] $x$ and $p^{eq}(x)$ is the probability of state $x$ when the system is in thermal equilibrium, a concept which had also already been well established in thermodynamics via the notion of quasi-stationary processes. In a thermodynamic equilibrium, there is no net transport of any kind [footnote 3].

Footnote 1: Boltzmann's mathematical derivation matched the modern maximum entropy principle with the constraint of a given mean value for energy, which yields an exponential law for the energy distribution. Note that the mathematical statements of energy conservation, $\sum_{k=1}^{N} E_k = C$, and fixed mean energy, $\frac{1}{N}\sum_{k=1}^{N} E_k = c$, are equivalent when $N$ is given.

Footnote 2: However, a thermodynamic state is a state of recurrent motion, defined by an entire level set $A = \{x \mid U(x) = E\}$. Thus, Boltzmann also introduced his celebrated entropy $S_B(E) = k_B \ln \Omega(E)$, where $S_B$ is the entropy and $\Omega(E)$ is the number of microstates consistent with a given energy $E$. That is, $\Omega(E)$ is the cardinality of $A$. In terms of $E$, then, $p^{eq}(E) \propto \Omega(E)\, e^{-E/k_B T} = e^{-[E - T S_B(E)]/k_B T}$.

Footnote 3: In the thermodynamics before Gibbs, macroscopic transport processes were driven by either a temperature or a pressure gradient in the three-dimensional physical space. In Gibbs' macroscopic chemical thermodynamics, a chemical equilibrium has no net flux in the abstract stoichiometric network. In the current mesoscopic, stochastic thermodynamics, an equilibrium has no net probability transport in an appropriate state space. The notion of detailed balance independently arose in physics [20,21], chemistry [22,23] and in probability theory [24].

Inspired by Boltzmann's law (1), generalizations of the concept of equilibrium thermodynamic potentials have been proposed in many studies. These generalizations go by a variety of names: generalized thermodynamic potential, kinetic potential, nonequilibrium potential, pseudo-potential, emergent landscape, etc. [11,12,13,14,15,19]. One of the common features of all these names is that the "potential function" is defined by applying Eq. 1 in reverse:

$$-\ln p^{ss}(x). \tag{2}$$

One defines a potential based on the stationary probability $p^{ss}(x)$, which can be obtained in many statistical models and whose existence can be mathematically proven for a large class of systems. Most importantly, many systems with stationary probability have non-zero transport flux(es)! In fact, this tradition of taking (2) as a legitimate potential function started in equilibrium statistical chemical thermodynamics. Note that according to Eq. 1, the term $-k_B T \ln p^{eq}(x)$ is simply the total mechanical energy of state $x$, which is known a priori. Therefore, there is no reason to define (2) in studies of a purely mechanical system. However, in statistical chemical thermodynamics, one usually does not have a full Hamiltonian function for a complex molecule in hand. It is at this juncture that the notion of a potential of mean force [25] enters the theory.

Equilibrium potential of mean force
Physical chemists deal with complex molecules and force fields. Even though in molecular dynamics (MD) a molecule has a classical mechanical representation in terms of atoms as point masses, the precise potential energy is not known. The force fields in MD have therefore been under intense development over the past 50 years [26]. With such complexities, is it even possible to do statistical mechanics?
Let us first note a very important mathematical equality in connection to Eq. 1. We consider a function $U(x)$ with $x = (x_1, x_2)$, where $x \in S = S_1 \oplus S_2$, $x_1 \in S_1$ and $x_2 \in S_2$. Then,

$$Z(T) = \int_{S} e^{-U(x)/k_B T}\,dx = \int_{S_1} dx_1 \int_{S_2} e^{-U(x_1,x_2)/k_B T}\,dx_2 = \int_{S_1} e^{-\phi(x_1)/k_B T}\,dx_1, \tag{3}$$

where

$$\phi(x_1) = -k_B T \ln \int_{S_2} e^{-U(x_1,x_2)/k_B T}\,dx_2. \tag{4}$$

We see that if one considers $\phi(x_1)$ as a "potential function" for the system in (coarse-grained) state $x_1$, then one can obtain the same $Z(T)$ using Eq. 3, which is in the exact same form as in (1). More importantly, one sees that $\phi(x_1)$ is the free energy with fluctuating $x_2$ and fixed $x_1$. After reading the calculations above, one is naturally led to the question, "what does this potential energy function $\phi(x_1)$ defined in (4) represent?" J. G. Kirkwood answered this question in a very satisfying manner [25]: it is the potential function of a "mean force", in equilibrium, acting on the system which is fixed at $x_1$:

$$-\frac{\partial \phi(x_1)}{\partial x_1} = \int_{S_2} \left(-\frac{\partial U(x_1,x_2)}{\partial x_1}\right)_{x_2} p^{eq}(x_2|x_1)\,dx_2, \tag{5}$$

in which $p^{eq}(x_2|x_1)$ is the conditional equilibrium probability distribution for $x_2$ given $x_1$, and the partial derivative $-(\partial U(x_1,x_2)/\partial x_1)_{x_2}$ is precisely the mechanical force in the $x_1$ direction, with the given $x_2$. Averaging over the fluctuating $x_2$ with distribution $p^{eq}(x_2|x_1)$, Eq. 5 is the mean force on $x_1$. In other words, Eq. 4 states that the negative logarithm of the marginal probability distribution for $x_1$ is simply the potential of mean force if one chooses the free energy of the entire system, $F(T) = -k_B T \ln Z(T)$, as the zero energy reference point.
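The coarse-graining identity behind the potential of mean force can be checked numerically. The following is a minimal sketch, assuming a small discrete grid in place of the continuous state space (so sums replace integrals) and a hypothetical energy function `U` chosen only for illustration; none of these names come from the paper.

```python
import math

kT = 1.0  # k_B * T in arbitrary energy units (assumed value)

def U(x1, x2):
    # Hypothetical coupled energy function, for illustration only.
    return 0.5 * x1**2 + 0.5 * (x2 - x1)**2 + 0.1 * x1 * x2

X1 = [-1.0, 0.0, 1.0]              # coarse-grained coordinate
X2 = [-2.0, -1.0, 0.0, 1.0, 2.0]   # fluctuating coordinate

def phi(x1):
    # Potential of mean force: free energy with x1 fixed and x2 summed out.
    return -kT * math.log(sum(math.exp(-U(x1, x2) / kT) for x2 in X2))

# Partition function from the full energy and from the coarse-grained potential.
Z_full = sum(math.exp(-U(a, b) / kT) for a in X1 for b in X2)
Z_coarse = sum(math.exp(-phi(a) / kT) for a in X1)

def marginal(x1):
    # Marginal equilibrium probability of x1.
    return sum(math.exp(-U(x1, b) / kT) for b in X2) / Z_full

F = -kT * math.log(Z_full)  # free energy of the entire system

for a in X1:
    # -kT ln p(x1) should equal phi(x1) measured from the reference F.
    print(a, -kT * math.log(marginal(a)), phi(a) - F)
```

Under these assumptions the two partition functions agree, and the minus logarithm of the marginal probability equals the potential of mean force measured relative to the total free energy.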
One of the most important facts, as is clear from (4), is that the potential of mean force $\phi(x_1)$ is itself a function of temperature. In physical chemistry, one usually builds a statistical mechanical model using such a potential of mean force rather than using a mechanical energy function. That is, one uses a free energy function with certain degrees of freedom fixed, and averaged over all the others. Since $\phi(x_1)$ is temperature dependent, it has its own energy part and entropy part: A potential of mean force can be purely entropic. One of the best known examples is rubber elasticity, which arises from a Gaussian polymer chain [27]. If the temperature is suddenly dropped to zero, the force (and its associated energy) disappears instantly.
Observing this significant conceptual distance between chemical thermodynamics and its mechanical origin, and the essential statistical nature of Gibbs' energy based on minus-log probability in all modeling practices, it is not surprising that some researchers who mainly work with biochemical thermodynamics strongly feel that one could reformulate statistical thermodynamics (at least in connection to energy) in terms of a "measure of information" and abandon the very term "entropy", along with its root in mechanics [28].

Nonequilibrium steady state potential
For stochastic models of equilibrium systems, therefore, (2) yields a meaningful free energy function, in $k_B T$ units. It embodies an exact coarse-graining procedure. For stochastic models of nonequilibrium steady state with non-zero transport flux, we now have sufficient evidence to suggest that

$$H(x) = -\ln p^{ss}(x) \tag{8}$$

(where $p^{ss}$ is a stationary distribution, but may or may not be an equilibrium distribution) is also a meaningful energy function. We start with some conceptual discussions. First, outside classical mechanics, the question "what is a force and how do we quantify it" is highly non-trivial and vague. Onsager, however, introduced the notion of a thermodynamic force in his theory of irreversible processes [29]. Intuitively, a force is the cause of an action. In Newtonian mechanics, a force is the cause of a change in the vector $dx/dt$. But in an "overdamped world", which encompasses most of chemistry, biology, and society, a force is actually needed to cause a meaningful movement (i.e., a transport).
In terms of the mathematical theory of stochastic dynamics, there is a universal conception for movement, or "dynamics": Given the option to move to one of many states, a system is most likely to move to the state with the highest stationary probability. One should immediately note that this statement is highly problematic from a rigorous mathematical standpoint. Nevertheless, at least in one class of systems, the above notion is attainable: the class of systems whose dynamics have an invariant measure that is ergodic.
When discussing statistical mechanics, Montroll and Green have stated that [32] "The aim of statistical mechanics is to develop a formalism from which one can deduce the macroscopic behavior of physical systems composed of a large number of molecules from a specification of the component molecular species, the laws of force which govern intermolecular interactions, and the nature of their surroundings." With the rise of equilibrium chemical thermodynamics, it is clear that the "laws of force" themselves can be discovered from the equilibrium distribution. In fact, most such laws of force in biophysical modeling are statistical in nature and can be seen as entropic forces.
Indeed, "[t]o date no one has succeeded in deriving the laws of nonequilibrium phenomena from the [Newtonian] equations of motion merely by allowing the number of particles involved to become infinite. However, considerable success has been achieved by introducing various statistical hypotheses." [32] Recent studies have shown that if one identifies $H(x)$ as a "generalized Helmholtz or Gibbs energy function", a complete and consistent mesoscopic thermodynamics can be formulated that includes nonequilibrium steady states [16,2]. Furthermore, if one passes the system from mesoscopic to macroscopic by allowing the number of particles involved and the system's volume to become infinite, two macroscopic thermodynamic laws can be derived [17]. If the mesoscopic system is a general chemical reaction network with detailed balance, the macroscopic emergent potential was shown mathematically to be Gibbs' function $G(x)$, where $x_i$ are the concentrations of chemical species, with $\partial G/\partial x_i$ being the chemical potential for the $i$th species. The same theory also proves the existence of, and provides an equation for computing, a generalized Gibbs function for an open chemical reaction network under a chemostat, which approaches a nonequilibrium steady state.

Stationary distribution and entropy inequalities of Markov processes
Unless stated otherwise, we will exclusively deal with a denumerable state space S (either finite or infinite) for the remainder of the paper.
A stronger monotonicity result. The strongest version of a monotonic entropy result that we are aware of is [33,34]

$$\frac{d}{dt}\sum_{x\in S} p_x(t)\ln\frac{p_x(t)}{q_x(t)} \le 0, \tag{9}$$

in which $p_x(t)$ and $q_x(t)$ are two solutions to the Kolmogorov forward equation with different initial distributions. Eq. 9 immediately yields a variety of related inequalities:
(i) When $q_x(t) \equiv \pi_x\ \forall t$, where $\{\pi_x\}$ is a stationary distribution of the Markov process, (9) is the widely known "free energy theorem" [35,36].
(ii) When $q_x(t) \equiv \pi_x\ \forall t$, and $p_i(0) = \delta_{i\ell}$, one has therefore, where $I[x_t, x_0]$ is the mutual information between $x_0$ and $x_t$ of a stationary Markov process. Similarly, This result was obtained in [37]. The term inside $(\cdots)$ is the conditional Shannon entropy $H[x_t|x_0]$ for the stationary $x_t$. It is also the Kolmogorov-Sinai (KS) entropy of every $t$ steps of the stationary $x_t$: The result is more easily understood when interpreted this way: KS entropy quantifies the randomness in a "map". The randomness does not decrease with map composition.
(iii) When $p_x(t) \equiv \pi_x$ (and when we then rename $q_x(t)$ as $p_x(t)$), we have

$$\frac{d}{dt}\sum_{x\in S} \pi_x \ln\frac{\pi_x}{p_x(t)} \le 0. \tag{11}$$

To explain this result more intuitively, we note that the sum in (11) can be interpreted as the information lost when predicting $\pi_x$ from $p_x(t)$. Roughly speaking, if $t_1 < t_2$, then it takes more information to predict the distant future ($\pi_x$) from time $t_1$ than it does from time $t_2$ because the prediction from $p_x(t_1)$ has to account for the random events that can happen within the time interval $[t_1, t_2]$.

Filtration and entropy monotonicity. Even though the original Shannon entropy used an implicit uniform prior, the necessity for an explicit prior has been widely discussed in information theory [30,31] [footnote 4]. More importantly, for a continuous random variable, the logarithm of a probability density is simply ill-defined mathematically. All the various monotonic "entropy" results in the previous section provide the legitimacy of using $\{\pi_x\}$ as the reference measure for a Markov process. We would like to argue that this is in fact necessary.
We consider a Markov process in a more general setting in this section. Let the triple $(\mathcal{S}, F, P)$ be a probability space; let $(I, \le)$ be a totally ordered index set; and let $(S, \Sigma)$ be a measurable space. If $X : I \times \mathcal{S} \to S$ is a stochastic process, then its natural filtration of $F$ with respect to $X$ is a sequence

$$F_i = \sigma\left(\left\{X_j^{-1}(A) \mid j \in I,\ j \le i,\ A \in \Sigma\right\}\right). \tag{12}$$

That is, $F_i$ is the smallest σ-algebra on $\mathcal{S}$ that contains all pre-images of Σ-measurable subsets of $S$ for times $j$ up to $i$. The definition given in (12) yields a monotonic relation

$$F_i \subseteq F_j \quad \text{for } i \le j. \tag{13}$$

Such a property is called non-anticipating; in other words, "when including the future, the dynamics are at least as random as up to now." The monotonicity in Eq. 13 can be expressed in terms of Shannon's information entropy as

$$H\big[\{X_j \mid j \le i\}\big] \le H\big[\{X_j \mid j \le i+1\}\big]. \tag{14}$$

This inequality is true because

$$H\big[\{X_j \mid j \le i+1\}\big] - H\big[\{X_j \mid j \le i\}\big] = H\big[X_{i+1} \,\big|\, \{X_j \mid j \le i\}\big],$$

which is never negative. Notice that Eqs. 13 and 14 are concerned with the sequences $\{X_j \mid j \le i\}$, but the "entropy monotonicity" results in statistical physics deal with individual $X_i$ and $X_{i+1}$, and entropy has deterministic values that are different for different times. The relationship among $X_i$, $X_{i+1}$, and the filtration is shown as We now consider the information lost from $X_i$ to $X_{i+1}$ when the event $\omega$ occurs, $\ln P_{X_{i+1}}(\omega) - \ln P_{X_i}(\omega)$. Then its expected value with respect to the stationary, invariant measure $\mu_\pi(\omega)$ is given by If both $X_i$ and $X_{i+1}$ are real valued (i.e., $S = \mathbb{R}$) with density functions where $\pi(x) = d\mu_\pi/dx$ is the density of the stationary measure. We know that Eq. 17 is never negative; therefore the mean information lost or equivalently, where $X^{ss} : \mathcal{S} \to S$ is a random variable distributed according to the stationary distribution $\pi$. This is essentially equivalent to the result in Eq. 11.
Eq. 18 states that information lost from X i to X i+1 , averaged with respect to the invariant density, is always greater than zero, while Eq. 19 suggests that "the infinitely distant future has more information to gain from X i than from X i+1 ". There is a subtle difference between these statements and the following: "when including the future, the world is at least as random as up to now." The reason for this, we suggest, is that (18) and (19) require the existence of the stationary measure. Knowing the existence of a stationary behavior, "the future is at least as random as now."
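The entropy inequalities of this section can be illustrated numerically. The sketch below is a discrete-time analogue, an assumption made for simplicity: a hypothetical 3-state transition matrix `P` replaces the Kolmogorov forward equation, and the relative entropy is tracked step by step rather than differentiated in time.

```python
import math

# Hypothetical row-stochastic transition matrix with strictly positive entries.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.1, 0.5]]

def step(p):
    # One step of the forward evolution: p_j(t+1) = sum_i p_i(t) P[i][j].
    n = len(p)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

def kl(p, q):
    # Relative entropy sum_x p_x ln(p_x / q_x).
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def kl_trajectory(p, q, n_steps):
    # Relative entropy between two solutions evolved under the same dynamics.
    out = []
    for _ in range(n_steps):
        out.append(kl(p, q))
        p, q = step(p), step(q)
    return out

p0 = [0.9, 0.05, 0.05]
q0 = [0.1, 0.2, 0.7]

# Stationary distribution by power iteration.
pi_ss = [1 / 3, 1 / 3, 1 / 3]
for _ in range(2000):
    pi_ss = step(pi_ss)

traj_pq = kl_trajectory(p0, q0, 20)       # two arbitrary solutions
traj_p_pi = kl_trajectory(p0, pi_ss, 20)  # q fixed at pi: "free energy" form
traj_pi_p = kl_trajectory(pi_ss, p0, 20)  # p fixed at pi: information-lost form

for a, b, c in zip(traj_pq, traj_p_pi, traj_pi_p):
    print(f"{a:.6f}  {b:.6f}  {c:.6f}")
```

All three printed columns should be non-increasing. The third column is the quantity interpreted in the text as the information lost when predicting the stationary distribution from the current one.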

Deterministic correspondence and infinite β
Any representation of reality requires elements of both chance and determinism. These correspond to the stochastic and deterministic components of complex dynamics. As repeatedly pointed out in [38,39,40], it is the interaction between these two that yields self-organization and complex behavior. Therefore, the ability to "envision" a corresponding deterministic dynamics to some given stochastic dynamics, even when there is no obvious "system size parameter", provides a deeper understanding of complex dynamics. The natural parameter for a stochastic differential equation (SDE) $dx(t) = b(x)\,dt + a\,dB(t)$ is the noise strength $a$; the natural parameter in classical statistical mechanics is the system's size (or one could use the temperature); and the natural parameter in a Delbrück-Gillespie process is the system's volume.
How can one envision such a deterministic correspondence when no obvious natural parameters exist? It is becoming increasingly common to use the modal value of a distribution as a "deterministic" counterpart to the stochastic system. According to this view, a bimodal distribution corresponds to a bistable system. Note it is a widely held misconception that the mean dynamics $\langle x(t)\rangle$ are the deterministic counterpart of a stochastic $x(t)$. For an SDE, $d\langle x(t)\rangle/dt \ne b(\langle x\rangle)$ in general. More importantly, while $\langle x(t)\rangle$ is a non-random function of $t$, it is not a trajectory of any meaningful, self-contained dynamical system. This point is best illustrated by the fact that the differential equation describing $\langle x(t)\rangle$ usually depends on higher moments like $\langle x^2(t)\rangle$. Moreover, for a discrete system, even if the mean is defined, it does not usually lie in the same space as $x(t)$.
We propose the following "deterministic" counterpart for a random variable $x$ with probability mass function $p^{ss}_x$, and we will show that it is intimately related to the energy defined in (8). We will define the "deterministic" variable $x_\infty$ as

$$x_\infty = \lim_{\beta\to\infty} x_\beta, \tag{20}$$

where

$$\Pr\{x_\beta = x\} = p(x,\beta) = \frac{\left[p^{ss}_x(x)\right]^{\beta}}{Z(\beta)},$$

with normalization constant

$$Z(\beta) = \sum_{x\in S}\left[p^{ss}_x(x)\right]^{\beta}.$$

The random variable $x_\infty$ will be concentrated on a finite number of states (the most probable ones of $p^{ss}_x(x)$) with probability 1. In particular, if $p^{ss}_x(x)$ is unimodal, then $x_\infty$ really will be a deterministic system. On the other hand, if $p^{ss}_x(x)$ is multimodal, then there is no unique deterministic counterpart. Applying this idea to a discrete-state Markov process, the corresponding dynamics become a deterministic transformation, as discussed in [44].
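The concentration of the power-scaled distribution on the most probable states is easy to see numerically. The following is a minimal sketch, assuming the power-scaling described in Sec. 3 (probabilities raised to the power β and renormalized); the helper `power_scale` and the two example distributions are hypothetical and serve only as illustration.

```python
def power_scale(p_ss, beta):
    # Raise each stationary probability to the power beta and renormalize.
    weights = {x: p ** beta for x, p in p_ss.items() if p > 0}
    Z = sum(weights.values())  # normalization constant Z(beta)
    return {x: w / Z for x, w in weights.items()}

# Unimodal example: a single most probable state "a".
unimodal = {"a": 0.5, "b": 0.3, "c": 0.2}
# Example with two equally probable maxima.
bimodal = {"a": 0.4, "b": 0.4, "c": 0.2}

for beta in (1, 5, 50):
    print(beta, power_scale(unimodal, beta), power_scale(bimodal, beta))
```

As β grows, the first distribution collapses onto the single state "a", while the second splits its mass evenly between "a" and "b", which is the sense in which a multimodal distribution has no unique deterministic counterpart.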
It is worth noting that similar definitions are often introduced formally as analogues to inverse-temperature without any discussion of deterministic correspondence (e.g., [8,9]). We spend so much time on the concept in order to emphasize that it arises naturally in a study of stochastic systems, without any reference to thermodynamic concepts. The scaling factor β should not just be thought of as a formal method for introducing temperature to a system, but as a natural feature of any probabilistic system.
With this definition in hand, the obvious question becomes "how fast does the limit in (20) converge?" In the next section, we will try to make this question more rigorous, and in the process provide more evidence that H(x) is an important quantity.

Large deviation principle for infinite β
We will now investigate the rate of convergence of the limit in (20). This is a question well suited to the methods of large deviation theory, but before we can use such methods we need to frame the question somewhat more rigorously. Strictly speaking, we should be dealing with limits of measures rather than limits of random variables.
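Let $(\mathcal{S}, F, P)$ be a discrete probability space with probability mass function $p^{ss}$ and define the family of measures $P_\beta$ on $(\mathcal{S}, F)$ whose probability mass functions are given by

$$p(x,\beta) = \frac{\left[p^{ss}(x)\right]^{\beta}}{\sum_{y\in\mathcal{S}} \left[p^{ss}(y)\right]^{\beta}}.$$

(As we will show later, this is always possible for $\beta \ge 1$.) In addition, let $(S, \Sigma)$ be a measurable space and choose a function $\sigma : \mathcal{S} \to S$. This defines a family of $S$-valued random variables $O_\beta$, where

$$\Pr\{O_\beta = z\} = \sum_{x\,:\,\sigma(x) = z} p(x,\beta).$$

In particular, if $\sigma$ is one-to-one, then $\Pr\{O_\beta = z\} = p(\sigma^{-1}(z), \beta)$.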
For unimodal distributions, we know that as β goes to infinity, the distribution of O β becomes concentrated on a single value z * ∈ S. However, it is not clear a priori how the rate of this convergence depends on our choice of O. It is conceivable that different observables could lead to different convergence rates. Moreover, we could eschew observables altogether and work solely with the measures P β . In this section we will show that the rate of convergence is identical for a wide range of observables, and that it is intimately related to H(x).
Case (i): Let $S = \mathbb{R}$. We will not restrict $\sigma$ to be one-to-one, but we will assume that if $\sigma(x_1) = \sigma(x_2)$ for some $x_1, x_2 \in \mathcal{S}$, then $p^{ss}(x_1) = p^{ss}(x_2)$. We will let $N(x)$ denote the (necessarily finite) number of elements $y \in \mathcal{S}$ such that $\sigma(y) = \sigma(x)$. Finally, let $x^* \in \mathcal{S}$ be a state with maximal probability. We know that for any $\eta \in \mathbb{R}_+$. In fact, $\Pr\{|O_\beta - \sigma(x^*)| \ge \eta\}$ is a non-increasing step function of $\eta$. Under reasonable conditions, we can write where If we define $x_\eta = \operatorname{argmax}_{x\in\mathcal{S}} \{|\sigma(x) - \sigma(x^*)|\}$, then we have

Case (ii):
Instead of creating a somewhat arbitrary family of observables O β , we can also work solely with the measures P β . To make this more convenient, we will introduce some additional notation.
Let $Y = H(\mathcal{S}) \subset \mathbb{R}$ and let $y^*$ be the minimum value in $Y$. For any $h > y^*$, let $\mathcal{S}_h = \{x \in \mathcal{S} \mid H(x) < h\}$ and $Y_h = \{y \in Y \mid y < h\}$. Let $\lfloor h \rfloor$ denote the minimum value of $Y \setminus Y_h$. Finally, define We know that $P_\beta(\mathcal{S} \setminus \mathcal{S}_h)$ approaches zero as $\beta$ goes to infinity. Much like the previous case, we would like to know how quickly this quantity decays. We have where In fact, this is in some sense just a special case of case (i). If we choose $\sigma = H$ and let $h = \eta + y^*$, then $I_1$ and $I_2$ are identical.
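The decay of the probability outside a sub-level set of H can be examined numerically. The sketch below assumes that the power-scaled mass function is proportional to exp(−βH(x)), and that the limiting exponential rate is the gap between the lowest energy at or above h and the minimum energy y*. That expression is inferred here from this form of the mass function, not quoted from the text, and the four-state distribution is hypothetical.

```python
import math

# Hypothetical stationary distribution on four states (illustration only).
p_ss = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
H = {x: -math.log(p) for x, p in p_ss.items()}

def log_sum_exp(values):
    # Numerically stable ln(sum(exp(v))) so that large beta does not underflow.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def decay_rate(h, beta):
    # -(1/beta) * ln of the P_beta-probability of the states with H(x) >= h.
    log_Z = log_sum_exp([-beta * e for e in H.values()])
    log_tail = log_sum_exp([-beta * e for e in H.values() if e >= h])
    return -(log_tail - log_Z) / beta

y_star = min(H.values())                         # minimum energy
h = 1.0                                          # threshold with h > y_star
floor_h = min(e for e in H.values() if e >= h)   # lowest energy not below h
expected = floor_h - y_star                      # assumed limiting rate

for beta in (1, 10, 100, 1000):
    print(beta, decay_rate(h, beta), expected)
```

For this example the printed rate settles at ln 2, the energy gap between the most probable state and the most probable state outside the sub-level set.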
Case (iii): One of the key insights from the theory of large deviations is that in the limit of $\beta \to \infty$, the probability $\Pr\{x_\beta \notin \mathcal{S}_h\}$ is determined by one particular $x^* \notin \mathcal{S}_h$, the one with $p(x^*, 1) \ge p(x, 1)\ \forall x \notin \mathcal{S}_h$. Therefore, as $\beta \to \infty$ one has $p(x, \beta) \approx e^{-\beta I_3(x)}$, for any $x \in \mathcal{S}$. This is essentially the same as the WKB ansatz. We then have

Entropy, energy and criticality in systems with generalized potential
The results of the previous section suggest that H(x) = − ln p ss (x) is a mathematically relevant quantity and that it can reasonably be interpreted as an energy. We will now investigate some of the consequences of this definition in more detail. In particular, we will shed some light on the distinction between Gibbs and Boltzmann entropies and derive a necessary and sufficient condition for the existence of a critical temperature in stationary stochastic systems.
Let us again suppose that our system takes on possible states from a discrete (finite or countably infinite) set $S$, and let $p^{ss} : S \to [0, 1]$ be the probability mass function describing the chance that event $x \in S$ occurs. As above, we will define the energy of a state $x \in S$ as

$$H(x) = -\ln p^{ss}(x).$$

In addition, we will avoid substantial difficulties later if we endow $H$ with units of energy. If we do so, then we can no longer simply write $p^{ss}(x) = e^{-H(x)}$. Instead, we need to introduce another parameter $\beta$ with units of inverse energy. This gives us

$$p^{ss}(x;\beta) = \frac{e^{-\beta H(x)}}{Z(\beta)}, \tag{30}$$

where the partition function $Z(\beta)$ is defined as

$$Z(\beta) = \sum_{x\in S} e^{-\beta H(x)}. \tag{31}$$

Note that the partition function is necessarily a dimensionless quantity, as discussed in [41,42,43]. These distributions are precisely the probability mass functions of the measures $P_\beta$ defined in Sec. 3. With this definition, there is a serious concern that the sum in (31) might not converge. Since $p^{ss}$ is a probability distribution, however, we do know that the sum converges for $\beta = 1$ (in fact, we know that $Z(1) = 1$). We will spend much of the following sections discussing the cases where the sum in (31) diverges, but for the moment we will simply assume that $Z(\beta)$ is well-defined on some subset of $\mathbb{R}$ containing $[1, \infty)$.
In classical statistical mechanics, one typically has the mechanical energy function in hand before p ss , and then shows that the system at finite "temperature" β −1 has an equilibrium distribution among the states described by (30). Note that when β → ∞, the distribution p ss (x; β) converges to a uniform probability distribution on the set of states with minimal H. For certain non-convex H(x), the phenomenon of phase transition occurs [45]. This limit gives precisely the deterministic correspondence described in Sec. 3.
In a classical statistical mechanical problem, S is a continuous space describing the positions and momenta of all particles in the system, H is a Hamiltonian for this system and β = (k B T ) −1 is the inverse temperature.
One would then be interested in level sets with constant energy h. In particular, Gibbs' and Boltzmann's entropies are concerned with the phase volume and phase surface area of such level sets.
Unlike in a classical problem, though, our state space $S$ is arbitrary, and in general may not be useful as a phase space. In particular, $S$ often does not come equipped with a metric, or even any sort of order. To remedy this, we will define the rank of a state $x$ as

$$R(x) = \#\left|\{y \in S \mid H(y) \le H(x)\}\right|,$$

where $\#|\cdot|$ denotes cardinality. That is, the rank of $x$ is the number of states whose energy is no higher than that of $x$ (equivalently, the states that are at least as probable as $x$). Since $R$ depends on $x$ only through $p^{ss}(x)$, we can unambiguously define the rank in terms of energy, $V : [0, \infty) \to \mathbb{Z}_+$, as

$$V(h) = \#\left|\{x \in S \mid H(x) \le h\}\right|,$$

so that $R(x) = V(H(x))$ for every $x \in S$. Notice that $V$, as opposed to $R$, is no longer defined on a discrete space (it is a function of the continuous variable $h$), but because $S$ is discrete, $V$ can be written as a non-decreasing piecewise constant function.
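The rank of a state and its energy-based counterpart can be computed directly for a small example. The sketch below assumes the tie-counting convention suggested by the phrase "at least as probable as x", so that a state counts itself and any equally probable states. The distribution and the helper names `rank` and `V` are hypothetical.

```python
import math
from bisect import bisect_right

# Hypothetical stationary distribution with some equally probable states.
p_ss = {"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1}
H = {x: -math.log(p) for x, p in p_ss.items()}  # energy of each state
energies = sorted(H.values())

def rank(x):
    # Number of states that are at least as probable as x (ties included).
    return sum(1 for y in p_ss if H[y] <= H[x])

def V(h):
    # Number of states with energy <= h: a non-decreasing step function of h.
    return bisect_right(energies, h)

for x in p_ss:
    print(x, round(H[x], 4), rank(x), V(H[x]))
```

The last two printed columns coincide, which is the relation R(x) = V(H(x)), and V is constant between consecutive energy levels.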
It is also worth noting that our assumption of a countable state space cannot be easily relaxed in this approach. If S were uncountable, then one could not hope to order the states by their rank. Indeed, R and V would generally be infinite for almost all input. Such issues arise because p ss is, by assumption, a probability density with respect to the counting measure. We could have instead assumed that p ss was a density with respect to some other measure (e.g., the Lebesgue measure on S = R), but this would introduce many other subtleties later on.

Microcanonical partition functions and entropy
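If we take the liberty of treating the derivative of a Heaviside function $\theta$ as a Dirac δ-function, then we can write $V$ as

$$V(h) = \sum_{x\in S} \theta\big(h - H(x)\big), \qquad \frac{\partial V(h)}{\partial h} = \sum_{x\in S} \delta\big(h - H(x)\big).$$

It is very important to note that $\partial V(h)/\partial h$ has units of inverse energy. It is tempting (and often quite useful) to define

$$\Omega(h) = \#\left|\{x \in S \mid H(x) = h\}\right|$$

and then write

$$V(h) = \sum_{h_n \le h} \Omega(h_n),$$

where the sum is taken over the values $h_n \le h$ such that $\Omega(h_n) > 0$ [footnote 5]. However, one should keep in mind that $dV/dh \ne \Omega(h)$. That is, $dV/dh$ is not really just a number of states; it is a density [footnote 6]. One of the main reasons we have introduced this notation with $V$ is that it gives us a much more convenient way to write $Z(\beta)$. In particular, we can write $Z$ without reference to the individual states $x$:

$$Z(\beta) = \int_0^\infty e^{-\beta h}\, dV(h).$$

This is exactly the Laplace-Stieltjes transform of $V$.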
It is tempting to rewrite $Z$ as (38) and to then identify $\partial V/\partial h$ as the microcanonical partition function and $k_B \ln \Omega(h)$ as the entropy. Unfortunately, this is entirely wrong. Equation 38 relies on the identification of $\partial V/\partial h$ with $\Omega(h)$, which is invalid. This method can be salvaged by introducing a factor $\Delta h$ with units of energy, so that (38) becomes and the entropy becomes In fact, if we choose $\Delta h$ as a constant, then this is exactly the Boltzmann entropy. Such a solution is somewhat unsatisfying; the introduction of arbitrary constants to correct units generally suggests a deeper misunderstanding.
(Worse yet, there is no real reason for $\Delta h$ to be constant, so long as it has the correct units.) A much more satisfying interpretation of $Z$ arises if we integrate by parts, obtaining

$$Z(\beta) = \beta \int_0^\infty V(h)\, e^{-\beta h}\, dh. \tag{41}$$

Here, we can interpret $V(h)$ as the microcanonical partition function and

$$S_G(h) = k_B \ln V(h) \tag{42}$$

as the entropy. We have chosen the subscripts G and B to emphasize that $S_B$ corresponds to Boltzmann entropy, while $S_G$ corresponds to Gibbs entropy. There has been much debate over the relative merit of these definitions of entropy in statistical mechanics (e.g., [51,52,55,54]). While we do not claim to have resolved this question, equations (38) and (41) suggest that Gibbs entropy is the more natural choice. Furthermore, as we will see in the next section, Gibbs entropy plays a central role in the notion of criticality.
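The equivalence of the different ways of writing Z can be verified on a finite example. The sketch below assumes a hypothetical five-state distribution and compares three expressions: the direct sum over states, the sum over energy levels weighted by their multiplicities, and the integrated-by-parts form β∫V(h)e^{−βh}dh. The last is evaluated exactly, because V is piecewise constant.

```python
import math
from collections import Counter

# Hypothetical stationary distribution (illustration only).
p_ss = [0.4, 0.2, 0.2, 0.1, 0.1]
H = [-math.log(p) for p in p_ss]

# Distinct energy levels h_n with multiplicities Omega(h_n), sorted by energy.
levels = sorted(Counter(H).items())

def Z_direct(beta):
    # Sum over individual states.
    return sum(math.exp(-beta * h) for h in H)

def Z_levels(beta):
    # Sum over energy levels (the Laplace-Stieltjes form for a step function V).
    return sum(omega * math.exp(-beta * h) for h, omega in levels)

def Z_by_parts(beta):
    # beta * integral of V(h) exp(-beta h) dh, done piece by piece.
    total, V = 0.0, 0
    for n, (h, omega) in enumerate(levels):
        V += omega  # value of V on the interval [h_n, h_{n+1})
        if n + 1 < len(levels):
            upper = math.exp(-beta * levels[n + 1][0])
        else:
            upper = 0.0  # exp(-beta * infinity) for beta > 0
        total += V * (math.exp(-beta * h) - upper)
    return total

for beta in (0.5, 1.0, 2.0, 7.0):
    print(beta, Z_direct(beta), Z_levels(beta), Z_by_parts(beta))
```

All three columns agree for every β > 0, and the row for β = 1 returns 1, as expected for a normalized distribution.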
It is worth noting that the terminology surrounding Boltzmann and Gibbs entropy is not entirely consistent. Most notably, some authors (e.g., [56,57]) use the phrase "Boltzmann entropy" to refer to the logarithm of the volume of any phase space region corresponding to a suitable macrostate and use "Gibbs entropy" to refer to the quantity $\int p \ln p \, dx$, where $p$ is some probability density. Using this terminology, (40) and (42) would both be Boltzmann entropies, but would use different macrostates.
Instead, we follow the convention used in, e.g., [52,51,55,54] and use "Boltzmann entropy" to indicate the logarithm of the volume of a thin shell in phase space and "Gibbs entropy" to indicate the logarithm of the volume of the interior of such a shell. If the quantity $\int p \ln p \, dx$ is needed, we will refer to it as Shannon entropy.

Analyticity of Z as a function of β
The analyticity of Z(β), which is analogous to the partition function in statistical mechanics, is intimately related to phase transitions and critical phenomena [46,47,48,49]. Our system has a critical temperature (in the statistical mechanical sense of the term) if and only if the partition function is non-analytic for some β ∈ (0, ∞). Since Z(β) is a Laplace transform, we have access to some useful theorems from classical analysis, all of which can be found in [50].
First, there is some value $\beta_c \in [-\infty, \infty]$ such that $Z(\beta)$ converges for all $\beta \in \mathbb{C}$ with real part greater than $\beta_c$ and diverges for all $\beta \in \mathbb{C}$ with real part less than $\beta_c$. The value $\beta_c$ is called the abscissa of convergence.
Second, if the state space $S$ is finite then $Z$ is a sum of finitely many terms and therefore converges for any $\beta$ (i.e., $\beta_c = -\infty$). However, if $S$ is infinite then the partition function will not be analytic for all real $\beta$. In particular, it cannot converge when $\beta = 0$ because $Z(0) = |S| = \infty$. However, by definition we know that $Z(\beta)$ converges when $\beta = 1$, since $Z(1)$ is the normalization constant of $p_{ss}$. For infinite systems, the abscissa of convergence must therefore lie somewhere in $[0, 1]$.
Since the abscissa of convergence is non-negative, we have
$$\beta_c = \limsup_{h \to \infty} \frac{\ln V(h)}{h} \qquad (43)$$
or
$$\beta_c = \limsup_{h \to \infty} \frac{S_G(h)}{k_B\, h}. \qquad (44)$$
We now know that the partition function is analytic for all complex $\beta$ with real part greater than $\beta_c$, where $\beta_c$ is found as in (44). However, we have not yet shown that $Z(\beta)$ cannot be extended analytically beyond $\beta = \beta_c$. For a general Laplace-Stieltjes transform, this might be possible. (In the worst case, a Laplace transform may have a finite abscissa of convergence, but still have an analytic continuation to the entire complex plane.) Fortunately, since $V$ is monotonic, $Z(\beta)$ has a singularity at $\beta_c$. (This also means that $\beta_c \neq 1$.) This means that the partition function $Z(\beta)$ has a singularity at some positive $\beta_c$ if and only if $S_G$ is asymptotic to $h$ in the sense of (44).
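As a rough numerical illustration of this criterion, the sketch below tabulates the ratio ln V(h)/h along the energy levels of a given list of stationary probabilities. It assumes that the energy is H(x) = -ln p_ss(x), that V(h) counts the states with energy at most h, and that k_B = 1; the function name `gibbs_entropy_ratio` is ours and purely illustrative. The test data are a truncated copy of the degenerate distribution from the Examples section (2^n states of probability 2^(-2n)).

```python
import math
from collections import Counter

def gibbs_entropy_ratio(probs):
    """Return [(h, ln V(h) / h)] for each distinct energy level h = -ln p > 0.

    Assumptions (ours, for illustration): H(x) = -ln p(x), V(h) is the
    number of states with energy <= h, and k_B = 1.
    """
    degeneracy = Counter(probs)                 # number of states per probability value
    levels = sorted(degeneracy, reverse=True)   # most probable first = lowest energy first
    volume = 0
    ratios = []
    for p in levels:
        volume += degeneracy[p]                 # V(h) accumulates every state up to this level
        h = -math.log(p)
        if h > 0:                               # skip h = 0 to avoid dividing by zero
            ratios.append((h, math.log(volume) / h))
    return ratios

# Truncated distribution: 2**n states of probability 2**(-2n), n = 1..12.
# The list need not be normalized for the ratio to make sense.
probs = [2.0 ** (-2 * n) for n in range(1, 13) for _ in range(2 ** n)]
for h, ratio in gibbs_entropy_ratio(probs)[-3:]:
    print(f"h = {h:7.3f}   ln V(h) / h = {ratio:.4f}")
```

A finite list can only hint at the limit superior, so the output should be read as a trend in h rather than as a value of the critical inverse temperature.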

Examples
So far, we have let our system be very general. The arguments above apply equally well to a wide range of systems, from the single electron of a hydrogen atom (where $S$ is the set of possible orbits) to the configuration of base pairs in a strand of DNA. It is not immediately clear how (44) might be influenced by the structures of $S$ and $p_{ss}$. To illustrate the consequences of our result, we will look at a few examples.
First, we will investigate two so-called "non-degenerate" cases where each state has a distinct probability (i.e., $\Omega \equiv 1$). Since we only care about the rank of states, we will suffer no loss of generality by assuming that $S = \mathbb{Z}^+$ and that the states are ordered so that $p_{ss}(x) > p_{ss}(y)$ whenever $x < y$. As an example, consider an exponentially decaying distribution, $p_{ss}(x) \propto e^{-\lambda x}$ with $\lambda > 0$. Here $V(h)$ grows only linearly with $h$, so we have
$$\beta_c = \limsup_{h \to \infty} \frac{\ln V(h)}{h} = 0.$$
This distribution therefore does not have a critical temperature, which should not be surprising, since it is exponential. Alternatively, consider a power law distribution,
$$p_{ss}(x) = \frac{x^{-\alpha}}{\zeta(\alpha)},$$
where $\alpha > 1$ and $\zeta$ is the Riemann zeta function. This gives us
$$\beta_c = \limsup_{x \to \infty} \frac{\ln x}{\alpha \ln x + \ln \zeta(\alpha)} = \frac{1}{\alpha}.$$
This means that power law distributions do indeed have a critical temperature. This result was already demonstrated in [8], but arises as a special case of our work. These examples highlight the main feature of criticality: a system will be critical if and only if the probability of a state decays too slowly as a function of rank. That is, critical distributions are fat-tailed in "phase space".
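The contrast between the two non-degenerate cases can be checked numerically. The sketch below evaluates ln V(h)/h at the energy of the state of rank x, using V = x for non-degenerate distributions. It assumes H(x) = -ln p_ss(x) and k_B = 1; the particular exponential form C e^(-lam x) and the helper names are our own choices for illustration.

```python
import math

def ratio_exponential(x, lam=1.0):
    """ln V / h at rank x for p(x) = C * exp(-lam * x) on x = 1, 2, ..."""
    log_c = math.log(math.exp(lam) - 1.0)   # normalization: C = e**lam - 1
    h = lam * x - log_c                      # h = -ln p(x)
    return math.log(x) / h                   # V(h_x) = x when every level holds one state

def ratio_power_law(x, alpha, log_zeta):
    """ln V / h at rank x for p(x) = x**(-alpha) / zeta(alpha).

    log_zeta = ln zeta(alpha) is supplied by the caller.
    """
    h = alpha * math.log(x) + log_zeta       # h = -ln p(x)
    return math.log(x) / h

alpha = 2.0
log_zeta_2 = math.log(math.pi ** 2 / 6.0)    # zeta(2) = pi**2 / 6
for x in (10, 10 ** 3, 10 ** 6, 10 ** 12):
    print(x, ratio_exponential(x), ratio_power_law(x, alpha, log_zeta_2))
```

The exponential ratio shrinks toward zero with increasing rank, while the power law ratio settles near 1/alpha, in line with the two limits quoted above.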
We observe a similar result when $\Omega$ is not identically 1 ("degenerate" distributions). For example, consider a distribution where, for each $n \in \mathbb{Z}^+$, there are $2^n$ states with stationary probability $2^{-2n}$. That is, for each $h_n = 2n \ln 2$, we have $\Omega(h_n) = 2^n$. In this case,
$$V(h_n) = \sum_{k=1}^{n} 2^k = 2^{n+1} - 2,$$
and we find that $\beta_c = 1/2$. In light of our previous examples, this should not be surprising: when written as a function of rank, $p_{ss}$ decays like $x^{-2}$, so this $\beta_c$ is exactly what we expect. However, it also illustrates the importance of how we label our state space. Suppose that we observed the system given above, but that we could not identify each individual state. If instead of observing $2^n$ distinct states, each with probability $2^{-2n}$, we only measured one state with probability $2^{-n}$, we would then calculate the probability distribution $p_{ss}(x) = 2^{-x}$, for which $\beta_c = 0$. Depending on how states are counted, the distribution could either have a critical temperature or not! This distinction is exactly why the partition functions in classical and quantum statistical mechanics differ by a factor of $N!$. The classical version overcounts the number of possible microstates because it assumes particles are distinguishable. Without the correction term, this would often lead to substantially different predictions between the two theories. Fortunately, we know that quantum mechanics is the correct theory, and so we are able to choose the correct definition of a microstate.
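The effect of relabeling can also be seen directly in the partition function. The sketch below compares partial sums of Z for the degenerate distribution and for its coarse-grained version. It assumes that Z(beta) is the sum of p_ss(x)^beta over states, which is how we read the definition used here; the function name `partial_z` is hypothetical.

```python
def partial_z(beta, n_max, degenerate=True):
    """Partial sum of Z(beta) = sum_x p(x)**beta over the first n_max levels."""
    total = 0.0
    for n in range(1, n_max + 1):
        if degenerate:
            # 2**n states, each of probability 2**(-2n): level n contributes 2**(n * (1 - 2*beta))
            total += 2.0 ** (n * (1.0 - 2.0 * beta))
        else:
            # coarse-grained labeling: one state of probability 2**(-n) per level
            total += 2.0 ** (-n * beta)
    return total

for beta in (0.4, 0.6):
    print(beta,
          partial_z(beta, 200), partial_z(beta, 400),                  # degenerate labeling
          partial_z(beta, 200, False), partial_z(beta, 400, False))    # coarse-grained labeling
```

With the degenerate labeling the partial sums stabilize for beta above 1/2 and keep growing below it, whereas the coarse-grained sums stabilize for both values of beta tried here.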
In many applications, however, we do not know what a true microstate looks like. For example, imagine a particle undergoing a random walk on a lattice $X$, and suppose that we can measure only the distance $r$ between a particle and the origin. It would be natural to define a microstate of this system by the distance between the particle and the origin. If $X = \mathbb{Z}^+$, then this is exactly correct, but if $X = \mathbb{Z}$, then there are really two microstates for each $r$. Worse yet, if the lattice is two-dimensional (i.e., $X = \mathbb{Z} \times \mathbb{Z}$), then each $r$ corresponds to a different number of microstates, and this number grows without bound. As discussed in section 2.1, we can still find a reasonable interpretation for the energy of such a system. If we treat each $r$ as a microstate, then $H(r)$ is the potential of mean force in the radial direction. However, our notions of entropy and criticality may change drastically depending on how we define our state space.
For a slightly more involved example, consider the so-called "zipper model" (described in, e.g., [58,59,60]). This is a highly simplified model of, among other things, the conformation of a double-stranded DNA molecule. Suppose there are $N$ base pairs along the DNA molecule (where $N$ can be a positive integer or $\infty$; if $N = \infty$ then think of the molecule as having a fixed left end, but extending infinitely to the right), each of which can either be linked or broken. We will assume that there is only one possible linked configuration for each base pair, but that there are $G$ possible broken configurations for each pair, where $G$ is a positive integer. Furthermore, we will suppose that bonds are only broken from left to right. That is, it is possible for a base pair to be in one of the $G$ broken configurations if and only if every base pair to the left is also broken.$^7$ Suppose that the energy of a linked base pair is 0 and that the energy of any of the $G$ broken configurations for a single base pair is $E > 0$ if all base pairs to the left are broken and infinite otherwise. When $N = \infty$ and $G > 1$, this system has a phase transition at $\beta = (\ln G)/E$. Otherwise, it has no critical temperature [59]. We will show that this critical behavior is reproduced using (44).
The state space $S$ of this system is the collection of all possible allowed configurations of linked and broken base pairs. Each configuration consists of $m$ broken base pairs followed by $N - m$ linked base pairs, and there are $G^m$ distinct states for each $m$. Notice that $S$ is finite whenever $N$ is and countably infinite when $N = \infty$. The probability of each of these configurations is given by
$$p_{ss}(x; N) = Q_N\, e^{-mE},$$
where $m$ is the number of broken base pairs in $x$ and $Q_N$ is a constant that depends on $N$ (but not $x$). Note that it is not immediately obvious from the previous assumptions that $p_{ss}(x; N)$ is well-defined, but one can show that $Q_\infty$ is non-zero and finite for sufficiently large $E$. (In fact, we can solve for $Q_\infty$ exactly, but for our purposes it is enough to know that it is finite.) Since $S$ is finite whenever $N$ is, we know that there is no critical temperature for $p_{ss}(\cdot; N)$ when $N < \infty$, so consider the case where $N = \infty$. The possible energy values are $h_m = mE - \ln Q_\infty$ for any $m \in \mathbb{Z}^+$. The Gibbs entropy is therefore
$$S_G(h_m) = k_B \ln V(h_m) = k_B \ln \sum_{k \le m} G^k,$$
which grows like $k_B \ln m$ if $G = 1$ and like $k_B\, m \ln G$ if $G > 1$. Applying (44), we therefore have
$$\beta_c = \begin{cases} 0 & \text{if } G = 1, \\ (\ln G)/E & \text{if } G > 1. \end{cases}$$
These critical temperatures exactly match the known values, and the mechanism for this behavior is easy to see. When $G = 1$, the phase-volume $V(h)$ grows linearly with $h$, but when $G > 1$ the phase-volume grows exponentially. This allows the entropy $S_G$ to keep pace with the energy as $h$ grows, leading to a criticality.
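The zipper result can be checked numerically with a short sketch. It rests on assumptions that are ours: the number of broken pairs m starts at 0, k_B = 1, the energy levels are h_m = mE - ln Q with Q = 1 - G e^(-E) (the value that normalizes the infinite system when E > ln G), and the helper name `zipper_ratio` is hypothetical.

```python
import math

def zipper_ratio(m, G, E):
    """ln V(h_m) / h_m for the infinite zipper model (requires E > ln G)."""
    if E <= math.log(G):
        raise ValueError("need E > ln G so that Q is positive")
    Q = 1.0 - G * math.exp(-E)                    # normalizes sum_{k>=0} G**k * Q * exp(-k*E) to 1
    volume = sum(G ** k for k in range(m + 1))    # exact count of states with at most m broken pairs
    h = m * E - math.log(Q)                       # energy level h_m = -ln p
    return math.log(volume) / h                   # math.log accepts arbitrarily large integers

E = 2.0
for G in (1, 4):
    target = math.log(G) / E
    print(G, [round(zipper_ratio(m, G, E), 4) for m in (10, 100, 2000)], "limit:", round(target, 4))
```

For G = 1 the ratio decays toward zero, while for G > 1 it approaches (ln G)/E, which is the mechanism described above: only an exponentially growing phase volume lets the entropy keep pace with the energy.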
The preceding calculations are quite similar to those used in the equilibrium statistical mechanical approach of Kittel [59], but the procedure is very different in spirit. In Kittel's approach, one finds $Q_N$ for arbitrary $N$, then uses $Q_N$ to calculate a statistic such as the expected number of broken base pairs. Finally, one takes the limit as $N \to \infty$ and demonstrates that this statistic becomes non-analytic at some finite temperature. In particular, Kittel warns that "it is dangerous to write ... the partition function for N = ∞; the correct procedure is to evaluate the thermodynamic quantities for finite N and then to examine the limit." In our approach, we start by finding $p_{ss}(x; \infty)$ (up to a constant). Once we have obtained this distribution, we can calculate $S_G(h)$ for the infinite system and directly obtain $\beta_c$. The danger that Kittel describes is still present: our method will fail if $p_{ss}(x; \infty)$ is not well-defined.

Discussion
It is worth taking a moment to discuss not only what we have shown in the previous sections, but also what we have not shown. We have demonstrated that a stationary distribution over a discrete state space has a critical temperature if and only if the Gibbs entropy of the distribution (42) satisfies the relation (44). The terminology used here is deliberately suggestive, but one should not take it too far. For one thing, there are phase transitions in equilibrium statistical mechanics that do not seem to fit the description given in section 4. The Lee-Yang theorem, for instance, describes cases where the partition function becomes zero rather than infinite, and two-dimensional Ising models can exhibit different types of phase transitions.
The key point is that we have assumed, from the outset, the existence of a well-defined stationary probability distribution on a countable state space. Such a distribution has a critical temperature $\beta_c$ if $Z(\beta)$ approaches either zero or infinity as $\beta \to \beta_c$. Because $p_{ss}(x; \beta = 1)$ is a probability mass function, $Z(\beta)$ cannot become zero for any finite $\beta$. That is, Lee-Yang type criticalities can only occur if the stationary distribution $p_{ss}$ is not well-defined for any temperature.
Ising models, on the other hand, may have well-defined equilibrium distributions even in the thermodynamic limit. However, these models typically have an uncountable state space when N → ∞. For such a distribution, the proofs of section 4 do not hold as written and other types of criticalities may be present.
Mora and Bialek have also discussed this approach with regard to Ising models [8]. In particular, they showed that systems where $p_{ss}(x; N) \propto R(x)^{-\alpha}$ follows a power law have a critical temperature given by $\beta_c = 1/\alpha$ when $N$ goes to infinity. Their result utilized the identification of $S_G$ with $S_B$, which becomes precise in the thermodynamic limit. In the present paper, we have shown that such an identification is unnecessary and that the critical temperature conditions are still exact in "smaller" systems. Moreover, we have found a broader condition for the existence of a critical temperature, of which the power law relationship is a special case.
Since Mora and Bialek's paper, there has been much discussion about the idea that biological systems are poised at a critical point. This idea arose because researchers obtained estimates of $p_{ss}$ for a wide range of biological systems, and all appeared to follow some sort of power law. Such a distribution would indicate a non-zero abscissa $\beta_c$. The result from Sec. 4.2 does seem like it should indicate a criticality, but there are some important caveats worth considering.
First, it is notoriously difficult to calculate tail properties (such as $\beta_c$) from an estimated distribution. Estimates of $p_{ss}$ are necessarily based on a finite number of samples, and therefore cannot give reliable information about arbitrarily low probability events, which is required to calculate (44).
Second, and much more insidious, many biological processes are not in a true steady state. The formal analogies we have made with statistical mechanics only make sense in the context of stationary systems. If $p_{ss}$ actually varies slowly with respect to some other variable (most importantly time), then our notion of criticality does not necessarily correspond to any interesting feature of the system. For instance, Schwab, Nemenman and Mehta [53] have shown that slowly varying latent variables can give rise to apparent power law distributions, which necessarily have a non-zero $\beta_c$, even in conditionally independent systems.