Thermodynamics Beyond Molecules: Statistical Thermodynamics of Probability Distributions

Statistical thermodynamics has a universal appeal that extends beyond molecular systems, and yet, as its tools are being transplanted to fields outside physics, the fundamental question, what is thermodynamics, has remained unanswered. We answer this question here. Generalized statistical thermodynamics is a variational calculus of probability distributions. It is independent of physical hypotheses but provides the means to incorporate our knowledge, assumptions and physical models about a stochastic process that gives rise to the probability in question. We derive the familiar calculus of thermodynamics via a probabilistic argument that makes no reference to physics. At the heart of the theory is a space of distributions and a special functional that assigns probabilities to this space. The maximization of this functional generates the mathematical network of thermodynamic relationships. We obtain statistical mechanics as a special case and make contact with Information Theory and Bayesian inference.


DERIVATIONS
This document gives derivations of results that appear in the paper. All equation numbers not prefixed by "SI" refer to the manuscript.

Homogeneous Bias (Equation 14)
Homogeneity allows us to express log W as an integral over the variational derivatives log w(x; h) = δ log W(h)/δh(x):

    log W(h) = ∫ h(x) log w(x; h) dx.    (SI.1)

For example, if log W is the linear functional

    log W(h) = ∫ h(x) a(x) dx,    (SI.2)

where a(x) is a fixed function of x, Eq. (SI.1) is satisfied with log w(x; h) = a(x), and Eq. (SI.3) below is satisfied trivially because δ log w/δh = 0. A second relationship follows by taking the variation of Eq. (SI.1):

    ∫ h(x) δ log w(x; h) dx = 0.    (SI.3)

Equations (SI.1) and (SI.3) are the equivalents of the following two results for homogeneous functions f(x1, x2, ···) of degree 1 with respect to all xi, extended to functionals:

    f = Σi xi (∂f/∂xi),    (SI.4)

    Σi xi d(∂f/∂xi) = 0.    (SI.5)
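The two identities can be checked numerically on a discrete analog. The sketch below (an assumed example, not from the paper) uses logW(h) = Σi hi log(hi/Σj hj), which is homogeneous of degree 1 in h, and verifies the discrete counterparts of Eqs. (SI.1) and (SI.3):

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete analog of a bias with log W homogeneous of degree 1 in h
# (assumed example): logW(h) = sum_i h_i * log(h_i / sum_j h_j)
def logW(h):
    return np.sum(h * np.log(h / h.sum()))

def logw(h):
    # gradient d(logW)/dh_i, the discrete analog of log w(x; h)
    return np.log(h / h.sum())

h = rng.random(50) + 0.1

# Discrete Eq. (SI.1): logW(h) = sum_i h_i * logw_i(h)  (Euler theorem)
assert abs(logW(h) - np.sum(h * logw(h))) < 1e-10

# Discrete Eq. (SI.3): sum_i h_i * d(logw_i) = 0 along any direction
d = rng.standard_normal(50)
eps = 1e-7
dlogw = (logw(h + eps * d) - logw(h - eps * d)) / (2 * eps)
assert abs(np.sum(h * dlogw)) < 1e-6
print("Euler and Gibbs-Duhem identities verified")
```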

Most Probable Distribution in Biased Sampling (Equation 20)
We maximize the generic probability functional (Eq. (16) in the paper),

    log P(h) = ∫ h(x) log( h0(x)/h(x) ) dx + log W(h) − log r,    (SI.7)

with respect to h under the normalization constraint

    ∫ h(x) dx = 1.    (SI.8)

Using the Lagrange multiplier λ0, the equivalent unconstrained maximization problem is

    max over h: ∫ h log(h0/h) dx + log W(h) − log r + λ0 ( ∫ h dx − 1 ),    (SI.9)

with h0, λ0 and r fixed. We set the variational derivative at h = h* equal to zero,

    log h0(x) − log h*(x) − 1 + log w(x; h*) + λ0 = 0,    (SI.10)

and solve for h* to obtain

    h*(x) = h0(x) w(x; h*) e^{λ0−1} ≡ h0(x) w(x; h*)/α,  α = e^{1−λ0}.    (SI.11)

Evaluating Eq. (SI.7) at h = h* and using Eq. (SI.1) we have:

    log P(h*) = log α − log r = 0,

and finally, α = r. The most probable distribution is

    h*(x) = h0(x) w(x; h*)/r.    (SI.12)

This is Eq. (20) in the text.
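As a numerical sanity check of Eq. (20), the sketch below (an assumed discrete example with a fixed, h-independent bias a(x), so that w(x) = e^{a(x)}) verifies that h* = h0 w/r dominates the functional over randomly perturbed normalized distributions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete grid standing in for the continuous space (assumed example):
# prior h0 and a fixed linear bias a(x), i.e. logW(h) = sum h*a, w = e^a
x = np.linspace(0.0, 5.0, 200)
h0 = np.exp(-x); h0 /= h0.sum()
a = 0.3 * x

def F(h):
    # generic probability functional, up to the constant -log r
    return np.sum(h * np.log(h0 / h)) + np.sum(h * a)

# candidate most probable distribution: h0*w normalized, i.e. h0*w/r
h_star = h0 * np.exp(a)
h_star /= h_star.sum()

# F(h*) must beat F over random normalized perturbations of h*
best = F(h_star)
for _ in range(200):
    h = h_star * np.exp(0.05 * rng.standard_normal(x.size))
    h /= h.sum()
    assert F(h) <= best + 1e-12
print("h* = h0*w/r maximizes the functional")
```

Here the perturbations stay inside the space of normalized positive distributions, so the comparison is a direct check of the maximization, not of the Lagrange-multiplier algebra.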

Canonical Probability Functional (Equation 22)
We obtain the canonical functional by setting h0(x) = β e^{−βx} in Eq. (SI.7):

    log P(h) = H(h) + log W(h) − βx̄ + log β − log r,    (SI.13)

where H(h) = −∫ h log h dx is the entropy functional and x̄ is the mean of h. We define q = r/β and write the canonical functional as

    log P(h) = H(h) + log W(h) − βx̄ − log q.    (SI.14)

This is Eq. (22) in the text.

Most Probable Distribution in Canonical Space (Equation 24)
The canonical functional in Eq. (SI.14) is a special case of the generic functional in Eq. (SI.7) with h0 = β e^{−βx} and q = r/β. The most probable distribution of the generic probability functional is given in Eq. (SI.12); accordingly, the most probable distribution in the canonical space is obtained from that equation with h0(x) = β e^{−βx} and r = qβ:

    h*(x) = w(x; h*) e^{−βx}/q.    (SI.15)

We write Eq. (24) as

    q = ∫ w(x; h*) e^{−βx} dx    (SI.16)

and take the derivative d(log q)/dβ:

    d log q/dβ = −∫ x h*(x) dx + ∫ h*(x) (d log w(x; h*)/dβ) dx.    (SI.17)

The last integral is identically equal to zero by virtue of Eq. (SI.3). The final result is

    d log q/dβ = −x̄.    (SI.18)
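The derivative of log q can be checked numerically. For an h-independent bias the term involving d log w/dβ vanishes identically, and d(log q)/dβ = −x̄ follows directly; the sketch below uses the assumed bias w(x) = 1 + x on [0, ∞), for which q and x̄ are available in closed form:

```python
import numpy as np

# Check d(log q)/dbeta = -xbar for the h-independent bias w(x) = 1 + x
# on [0, inf) (assumed example; the Eq. (SI.3) term vanishes identically)
def q_of(beta):
    # q = int_0^inf (1 + x) e^{-beta x} dx, in closed form
    return 1.0 / beta + 1.0 / beta**2

def xbar_of(beta):
    # mean of h*(x) = (1 + x) e^{-beta x} / q
    return (1.0 / beta**2 + 2.0 / beta**3) / q_of(beta)

beta, eps = 1.7, 1e-6
dlogq = (np.log(q_of(beta + eps)) - np.log(q_of(beta - eps))) / (2 * eps)
assert abs(dlogq + xbar_of(beta)) < 1e-8
print("d log q / d beta = -xbar confirmed at beta =", beta)
```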

Microcanonical Probability Functional (Equation 27)
The microcanonical functional in the continuous limit is

    P(h) = e^{H(h)} W(h)/r,    (SI.19)

with r such that normalization is satisfied. Taking the logarithm and writing ω for the normalizing constant,

    log P(h) = H(h) + log W(h) − log ω,    (SI.20)

which is Eq. (27) in the text.

Most Probable Distribution in Microcanonical Space (Equation 24)
We now show that the distribution that maximizes the microcanonical functional is given by the same distribution as in the canonical case (Eq. 24 of the manuscript). We maximize the microcanonical functional under the constraints

    ∫ h(x) dx = 1,  ∫ x h(x) dx = x̄.    (SI.21)

The equivalent unconstrained maximization is

    max over h: H(h) + log W(h) − log ω + λ0 ( ∫ h dx − 1 ) + λ1 ( ∫ x h dx − x̄ ),    (SI.22)

where λ0 and λ1 are Lagrange multipliers and x̄ and ω are fixed. We set the variational derivative with respect to h equal to zero:

    −log h*(x) − 1 + log w(x; h*) + λ0 + λ1 x = 0,    (SI.23)

and solve for h*:

    h*(x) = w(x; h*) e^{λ0−1+λ1 x} = w(x; h*) e^{−βx}/q,    (SI.24)

with λ1 = −β and e^{1−λ0} = q. This is the same as the most probable distribution in the canonical space.
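On a discrete grid this constrained maximization can be verified directly. The sketch below (assumed example with the fixed bias w(x) = 1 + x) projects random perturbations onto the constraint surface, so norm and mean are conserved, and checks that h* = w e^{−βx}/q remains the maximizer:

```python
import numpy as np

rng = np.random.default_rng(1)

# Check that h* = w e^{-beta x}/q maximizes entropy + logW among
# distributions with the same norm and mean (assumed discrete example
# with the fixed bias w(x) = 1 + x)
x = np.linspace(0.0, 6.0, 300)
w = 1.0 + x
beta = 1.2
h_star = w * np.exp(-beta * x)
h_star /= h_star.sum()

def G(h):
    # entropy + logW for the fixed bias: -sum h log h + sum h log w
    return -np.sum(h * np.log(h)) + np.sum(h * np.log(w))

# random perturbations projected to conserve normalization and mean
A = np.vstack([np.ones_like(x), x])          # constraint rows
for _ in range(100):
    d = rng.standard_normal(x.size)
    d -= A.T @ np.linalg.lstsq(A.T, d, rcond=None)[0]
    h = h_star + 1e-4 * d / np.abs(d).max()
    if (h > 0).all():
        assert G(h) <= G(h_star) + 1e-12
print("h* maximizes H + logW under fixed norm and mean")
```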

Relationships for log ω (Equations 29 and 31)
We write the microcanonical probability functional in the equivalent form

    log ω = H(h*) + log W(h*),    (SI.31)

which follows from the microcanonical functional with log P(h*) = 0. The entropy of the most probable distribution is

    H(h*) = −∫ h* log h* dx = −∫ h* [ log w(x; h*) − βx − log q ] dx = βx̄ + log q − log W(h*),    (SI.32)

where we used Eq. (SI.1). We substitute this result into Eq. (SI.31) to obtain

    log ω = βx̄ + log q.    (SI.33)
Here we show that log ω is a concave function of x̄. Consider the microcanonical spaces of distributions with means x̄1 and x̄2 and let h1* and h2* be the most probable distributions in these spaces. We form the distribution h by linear combination of h1* and h2*,

    h = α h1* + (1 − α) h2*,  0 ≤ α ≤ 1,    (SI.34)

whose mean is

    x̄ = α x̄1 + (1 − α) x̄2.    (SI.35)

Let h* be the most probable distribution in the space of distributions with mean x̄. We then have:

    log ω(x̄) = H(h*) + log W(h*)
             ≥ H(h) + log W(h)
             ≥ α [ H(h1*) + log W(h1*) ] + (1 − α) [ H(h2*) + log W(h2*) ]
             = α log ω(x̄1) + (1 − α) log ω(x̄2).    (SI.36)

The first inequality holds because h* is the maximizer in the space with mean x̄, and the second because H(h) + log W(h) is a concave functional of h. Equation (SI.36) states that log ω(x̄) is a concave function of x̄. It follows that

    ∂² log ω/∂x̄² ≤ 0,    (SI.37)

which is Eq. (33) in the text.
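The concavity can be seen numerically by tracing the curve log ω = βx̄ + log q as β varies. The sketch below uses the assumed fixed bias w(x) = 1 + x on [0, ∞), for which q(β) and x̄(β) have closed forms, and checks that chord slopes of log ω(x̄) are non-increasing:

```python
import numpy as np

# Concavity of log(omega) as a function of xbar for the fixed bias
# w(x) = 1 + x (assumed example): trace (xbar, log omega) along beta
# and verify that chord slopes are non-increasing
betas = np.linspace(0.2, 5.0, 400)
q = 1.0 / betas + 1.0 / betas**2                 # q(beta) for w = 1 + x
xbar = (1.0 / betas**2 + 2.0 / betas**3) / q     # mean of h*
logw_mic = betas * xbar + np.log(q)              # log omega = beta*xbar + log q

order = np.argsort(xbar)
xb, lg = xbar[order], logw_mic[order]
slopes = np.diff(lg) / np.diff(xb)
assert np.all(np.diff(slopes) <= 1e-9)           # concave curve
print("log omega is concave in xbar over the sampled range")
```

The chord slopes recover β at the corresponding mean, consistent with the thermodynamic reading of β as the slope of log ω.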

Existence of W (Equation 41)
Given the functional derivative

    log w(x) = log f(x) + a0 + a1 x,    (SI.38)

the selection functional is obtained via the Euler theorem, Eq. (SI.1):

    log W(h) = ∫ h(x) log w(x) dx = ∫ h(x) log f(x) dx + a0 + a1 x̄,    (SI.39)

and the functional on the left-hand side of Eq. (34) becomes

    H(h) + log W(h) = ∫ h(x) log( f(x)/h(x) ) dx + a0 + a1 x̄.    (SI.40)

This is maximized by h = f (a0, a1 and x̄ are constant) and its maximum is

    a0 + a1 x̄.    (SI.41)

We set log ω = a0 + a1 x̄, which satisfies Eq. (34). Equation (40) is a special case of (SI.38) with a0 = a1 = 0; therefore it also satisfies the theorem.

Entropic Selection Functional (Equation 45)
First we write the entropy functional in the homogeneous form

    log W(h) = −∫ h(x) log( h(x) / ∫ h(x′) dx′ ) dx,    (SI.46)

whose variational derivative for normalized h is

    log w(x; h) = −log h(x).    (SI.47)

The most probable distribution then satisfies f(x) = e^{−βx}/( q f(x) ), and we solve for f(x):

    f(x) = e^{−βx/2}/√q.    (SI.48)

We obtain the parameters β and q from the zeroth and first order moments:

    ∫ f(x) dx = 1,  ∫ x f(x) dx = x̄.    (SI.49)
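On the support [0, ∞) the two moment conditions can be solved in closed form; they give β = 2/x̄ and q = x̄² (values derived here as an illustration, not quoted from the paper). The sketch below confirms both moments numerically for f(x) = e^{−βx/2}/√q:

```python
import numpy as np

# Check the moments of f(x) = e^{-beta x/2}/sqrt(q) on [0, inf):
# the moment conditions give beta = 2/xbar and q = xbar**2
# (derived here as an illustration, not quoted from the paper)
xbar = 1.5
beta, q = 2.0 / xbar, xbar**2

x = np.linspace(0.0, 60.0, 120001)
f = np.exp(-beta * x / 2.0) / np.sqrt(q)

def integrate(g):
    # trapezoid rule on the grid
    return np.sum((g[1:] + g[:-1]) * np.diff(x)) / 2.0

norm, mean = integrate(f), integrate(x * f)
assert abs(norm - 1.0) < 1e-6    # zeroth moment: normalization
assert abs(mean - xbar) < 1e-6   # first moment: mean equals xbar
print("zeroth and first moments:", round(norm, 6), round(mean, 6))
```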