
*Entropy* **2019**, *21*(9), 890; https://doi.org/10.3390/e21090890

Article

Thermodynamics Beyond Molecules: Statistical Thermodynamics of Probability Distributions

Department of Chemical Engineering, Pennsylvania State University, University Park, PA 16802, USA

Received: 26 July 2019 / Accepted: 11 September 2019 / Published: 13 September 2019

## Abstract


Statistical thermodynamics has a universal appeal that extends beyond molecular systems, and yet, as its tools are being transplanted to fields outside physics, the fundamental question, what is thermodynamics, has remained unanswered. We answer this question here. Generalized statistical thermodynamics is a variational calculus of probability distributions. It is independent of physical hypotheses but provides the means to incorporate our knowledge, assumptions and physical models about the stochastic process that gives rise to the probability in question. We derive the familiar calculus of thermodynamics via a probabilistic argument that makes no reference to physics. At the heart of the theory is a space of distributions and a special functional that assigns probabilities to this space. The maximization of this functional generates the mathematical network of thermodynamic relationships. We obtain statistical mechanics as a special case and make contact with Information Theory and Bayesian inference.

Keywords:

statistical thermodynamics; statistical mechanics; biased sampling; most probable distribution

## 1. Introduction

What is thermodynamics? The question, so central to physics, has been asked numerous times and has been given nearly as many different answers. To quote just a few: thermodynamics is “the branch of science concerned with the relations between heat and other forms of energy involved in physical and chemical processes” [1]; “the study of the restrictions on the possible properties of matter that follow from the symmetry properties of the fundamental laws of physics” [2]; “concerned with the relationships between certain macroscopic properties of a system in equilibrium” [3]; “a phenomenological theory of matter” [4]. Such statements, while strictly true, focus on aspects far too narrow to yield a definition general enough to tell us what to call thermodynamics or how to carry it outside physics. And yet, since Gibbs [5], Shannon [6] and Jaynes [7] drew quantitative connections between entropy and probability distributions, thermodynamics has been spreading to new fields. The tools of statistical thermodynamics are now used in network theory [8], ecology [9], epidemics [10], neuroscience [11], financial markets [12], and in the study of complexity in general. What motivates the impulse to apply thermodynamics to such vastly diverse problems? Is thermodynamics even applicable outside classical or quantum mechanical systems? And if so, what is the scope of its applicability?

Here we answer these fundamental questions: Statistical thermodynamics is variational calculus applied to probability distributions and, by extension, to stochastic processes in general; it is independent of physical hypotheses but provides the means to incorporate our knowledge and model assumptions about the particular problem. The fundamental ensemble is a space of probability distributions sampled via a bias functional. The maximization of this functional expresses a distribution, any distribution, via a set of parameters (microcanonical partition function, canonical partition function and generalized temperature) that are connected through a set of mathematical relationships that we recognize as the familiar equations of thermodynamics. Entropy and the second law have simple interpretations in this theory. We obtain statistical mechanics as a special case and make contact with Information Theory and Bayesian inference.

## 2. The Calculus of Statistical Thermodynamics

Before we derive a theory of generalized thermodynamics we review the key elements of the standard thermodynamic calculus. The central quantity of interest in statistical thermodynamics is the probability of a microstate. For a system of N particles in volume V at temperature T this probability is given by the exponential (canonical) distribution,

$$\mathrm{Prob}(\text{microstate } i)=\frac{e^{-\beta E_i}}{Q}, \tag{1}$$

where Q is the canonical partition function, $E_i$ is the energy of microstate i, $\beta=1/k_BT$ and $k_B$ is Boltzmann’s constant. The corresponding probability to find the system in a microstate with energy E is obtained by summing over all microstates with fixed energy E and is given by

$$\mathrm{Prob}(E)=\Omega\,\frac{e^{-\beta E}}{Q}, \tag{2}$$

where $\Omega$ is the microcanonical partition function, also equal to the number of microstates with energy E, volume V and number of particles N. The mean energy $\overline{E}$ and the parameters $\Omega$, Q and $\beta$ that appear in Equation (2) are interrelated:

$$\log\Omega=\beta\overline{E}+\log Q, \tag{3}$$

$$\beta=\frac{\partial\log\Omega}{\partial\overline{E}}, \tag{4}$$

$$\overline{E}=-\frac{\partial\log Q}{\partial\beta}, \tag{5}$$

$$\frac{\partial^2\log\Omega}{\partial\overline{E}^2}\le 0. \tag{6}$$

Equations (3) and (4) establish that $\log\Omega(\overline{E},V,N)$ and $\log Q(\beta,V,N)$ are Legendre pairs; Equation (6) states that $\log\Omega$ is concave. In addition, any probability distribution $p_i$ that could be assigned to microstate i under fixed $(\overline{E},V,N)$ satisfies the inequality

$$-\sum_i p_i\log p_i\le\log\Omega, \tag{7}$$

with the equal sign only for the canonical distribution in Equation (1). This inequality is the statistical expression of the second law. If we identify $k_B\log\Omega$ with entropy and $-(\log Q)/\beta$ with free energy, Equations (3)–(6) represent the familiar relationships of classical thermodynamics. Along with Equations (2) and (7), which provide the probabilistic context, the above set comprises the core relationships of statistical thermodynamics. The physical assumptions and postulates that produce these results can be found in any standard textbook (for example [3]). We will now show that this network of mathematical relationships arises naturally via a probabilistic construction that makes no reference to physics and endows any probability distribution $f(x)$, $x\ge 0$, with the thermodynamic relationships shown here.
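As a numerical sanity check of Equations (1)–(7), the Python sketch below uses a hypothetical five-level system (energies in arbitrary units, chosen only for illustration; the helper names are illustrative too). It defines $\log\Omega$ through Equation (3), which for a finite system coincides with the Gibbs–Shannon entropy of the canonical distribution, approximates the derivatives in Equations (4)–(6) by central finite differences, and compares the canonical entropy with that of a perturbed distribution that keeps the same normalization and mean energy, as required by Equation (7).

```python
import math

# Hypothetical energy levels (arbitrary units); the two levels at 1.0 are degenerate.
ENERGIES = [0.0, 1.0, 1.0, 2.0, 3.5]

def log_Q(beta):
    """log of the canonical partition function Q = sum_i exp(-beta * E_i)."""
    return math.log(sum(math.exp(-beta * e) for e in ENERGIES))

def canonical_probs(beta):
    """Equation (1): p_i = exp(-beta * E_i) / Q."""
    Q = math.exp(log_Q(beta))
    return [math.exp(-beta * e) / Q for e in ENERGIES]

def mean_energy(beta):
    return sum(p * e for p, e in zip(canonical_probs(beta), ENERGIES))

def log_Omega(beta):
    """Equation (3): log Omega = beta * E_bar + log Q."""
    return beta * mean_energy(beta) + log_Q(beta)

def gibbs_shannon(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

beta, d = 0.7, 1e-5
E_bar = mean_energy(beta)

# Equation (5): E_bar = -d log Q / d beta.
E_from_Q = -(log_Q(beta + d) - log_Q(beta - d)) / (2 * d)

# Equation (4): beta = d log Omega / d E_bar, evaluated along the beta parametrization.
dE = mean_energy(beta + d) - mean_energy(beta - d)
beta_from_Omega = (log_Omega(beta + d) - log_Omega(beta - d)) / dE

# Equation (6): d^2 log Omega / d E_bar^2 = d beta / d E_bar, which should be negative.
dbeta_dE = (2 * d) / dE

# Equation (7): another distribution with the same mean energy has lower entropy.
p = canonical_probs(beta)
v = [1.0, -2.0, 0.0, 1.0, 0.0]     # sum(v) == 0 and sum(v_i * E_i) == 0
p_other = [pi + 0.01 * vi for pi, vi in zip(p, v)]
S_canonical = gibbs_shannon(p)
S_other = gibbs_shannon(p_other)

print(f"E_bar={E_bar:.6f}  -dlogQ/dbeta={E_from_Q:.6f}")
print(f"beta={beta}  dlogOmega/dE_bar={beta_from_Omega:.6f}  dbeta/dE_bar={dbeta_dE:.4f}")
print(f"S(canonical)={S_canonical:.6f}  log Omega={log_Omega(beta):.6f}  S(other)={S_other:.6f}")
```

The perturbation vector `v` is one arbitrary direction that preserves both constraints; any such direction lowers the entropy at second order.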

## 3. Theory

#### 3.1. Random Sampling

Consider the continuous probability distribution $h_0(x)\ge 0$, $x\in(x_a,x_b)$, normalized to unit area. We define a discrete grid $x_i=x_a+(i-1)\Delta$ with $\Delta=(x_b-x_a)/K$, $i=1,2,\cdots,K+1$, such that the probability to sample a value of x in the ith interval is

$$p_i=h_0(x_i)\,\Delta, \tag{8}$$

if $\Delta$ is sufficiently small. We sample N values from $h_0$ and construct the frequency distribution $\mathbf{n}=(n_1,n_2,\cdots)$, where $n_i$ is the number of sampled values that lie in the ith interval. The probability to observe distribution $\mathbf{n}$ in a random sample of size N is

$$P(\mathbf{n}|\mathbf{p},N)=N!\prod_i\frac{p_i^{n_i}}{n_i!}, \tag{9}$$

and its logarithm is

$$\log P(\mathbf{n}|\mathbf{p},N)=-\sum_i n_i\log\frac{n_i}{p_iN}+O(\log N), \tag{10}$$

where $\mathbf{p}=(p_1,p_2,\cdots)$. We define $h(x_i)=n_i/(N\Delta)$ and take the limit $\Delta\to 0$, $N\to\infty$ in Equation (10). We then have $P(\mathbf{n}|\mathbf{p},N)\to\delta P(h|h_0,N)$ and

$$\frac{\log\delta P(h|h_0,N)}{N}=-\int h(x)\log\frac{h(x)}{h_0(x)}\,dx\doteq -D(h\parallel h_0), \tag{11}$$

where $\delta P(h|h_0,N)$ is the probability to sample region $(h,h+\delta h)$ in the continuous space of distributions, while taking a random sample of size N from $h_0$ (all integrals are understood to be taken over the domain of $h_0$). Any probability distribution $h(x)$ defined in the domain of $h_0$ may materialize in a random sample taken from $h_0$. Clearly, the most probable distribution in this space is $h_0$, and indeed $h_0$ maximizes Equation (11). For all other distributions we must have $\delta P(h|h_0,N)\le\delta P(h_0|h_0,N)=1$, or

$$D(h\parallel h_0)\ge 0, \tag{12}$$

with the equal sign only for $h=h_0$. The probability in the limit $N\to\infty$ to obtain $h_0$ relative to the probability to obtain any other distribution is

$$\frac{\delta P(h_0|h_0,N)}{\delta P(h|h_0,N)}=e^{ND(h\parallel h_0)}\to\infty. \tag{13}$$

Accordingly, $h_0$ is overwhelmingly more probable than any other distribution in its domain.
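Equations (9)–(13) can be illustrated with exact multinomial probabilities. The sketch below uses a hypothetical four-bin distribution `p` standing in for the binned $h_0$ and a candidate `h`; both are assumptions, chosen so that the counts $Nh_i$ are integers. It evaluates the logarithm of Equation (9) with `math.lgamma`, shows that $\log P/N$ approaches $-D(h\parallel p)$ as N grows, and shows that the probability ratio of Equation (13) grows roughly as $e^{ND}$.

```python
import math

def log_multinomial(counts, probs):
    """Exact logarithm of Equation (9): log[ N! * prod_i p_i**n_i / n_i! ]."""
    N = sum(counts)
    total = math.lgamma(N + 1)
    for n, p in zip(counts, probs):
        total += n * math.log(p) - math.lgamma(n + 1)
    return total

def relative_entropy(h, p):
    """D(h || p) for discrete distributions, the discrete analogue of Equation (11)."""
    return sum(hi * math.log(hi / pi) for hi, pi in zip(h, p) if hi > 0)

# Hypothetical binned generating distribution p and a different candidate h.
p = [0.4, 0.3, 0.2, 0.1]
h = [0.25, 0.25, 0.25, 0.25]
D = relative_entropy(h, p)

# log P / N approaches -D(h || p) as N grows; the gap shrinks roughly like (log N) / N.
gaps = {}
for N in (100, 1_000, 10_000, 100_000):
    counts = [round(N * hi) for hi in h]          # exact integers because N is divisible by 4
    gaps[N] = log_multinomial(counts, p) / N + D

# Equation (13): the sample that reproduces p beats h by a factor of roughly exp(N * D).
N = 1_000
log_ratio = (log_multinomial([round(N * pi) for pi in p], p)
             - log_multinomial([round(N * hi) for hi in h], p))

print(f"D(h||p) = {D:.5f}")
for N_, gap in gaps.items():
    print(f"N = {N_:>6}: log P / N + D = {gap:+.6f}")
print(f"log ratio / N = {log_ratio / N:.5f}")
```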

These results make contact with a broader mathematical literature. The quantity $D(h\parallel h_0)$ in Equation (11) is the relative entropy (Kullback–Leibler divergence) of distribution h with respect to $h_0$, and plays an important role in Information Theory [13,14,15]; Equation (12) is the Gibbs inequality, a well-known property of relative entropy; the relationship between relative entropy and the probability of a sample drawn from $h_0$ is a known result in the theory of large deviations [16]. The key point we take from these results is that the process of sampling distribution $h_0$ establishes a probability space of distributions with the same domain as $h_0$: these are the distributions obtained as samples. The Gibbs inequality states the elementary fact that the most probable distribution in this space is $h_0$. We will now generalize this probability space and the Gibbs inequality.

#### 3.2. Biased Sampling

Random sampling always converges to the distribution from which the sample is taken; the probability of all other distributions vanishes as $N\to\infty$. We now modify the sampling process in order to obtain some different limiting distribution $h^*$ while still sampling from $h_0$. We do this by applying a bias, such that a random sample of size N from $h_0$ is accepted with probability proportional to $W[Nh]$, where $Nh$ is the frequency distribution of the sample and W is a functional with the homogeneous property $\log W[Nh]=N\log W[h]$. We require homogeneity so that the limiting distribution is independent of N when $N\to\infty$. By virtue of homogeneity, $\log W$ is written as

$$\log W[h]=\int h(x)\log w(x;h)\,dx, \tag{14}$$

where $\log w(x;h)$ is the variational derivative of $\log W[h]$ with respect to h. The probability to obtain a sample with distribution h under this biased sampling is

$$P(h|\mathbf{p},W,N)=\frac{W[Nh]}{r^N}\left(N!\prod_i\frac{p_i^{n_i}}{n_i!}\right), \tag{15}$$

where $r^N$ is a normalizing constant; the logarithm of this probability in the continuous limit is

$$\frac{\log\delta P(h|h_0,W,N)}{N}=-\int h(x)\log\frac{h(x)}{w(x;h)\,h_0(x)}\,dx-\log r. \tag{16}$$

We define the probability functional

$$\log\varrho[h|h_0,W]\doteq-\int h(x)\log\frac{h(x)}{w(x;h)\,h_0(x)}\,dx-\log r, \tag{17}$$

so that the probability to observe a distribution within $(h,h+\delta h)$ in a biased sample taken from $h_0$ is $\delta P(h|h_0,W,N)=\varrho^N[h|h_0,W]$. The ratio of the probability to sample the most probable distribution $h^*$ relative to that for any other distribution in the continuous limit is

$$\frac{\delta P(h^*|h_0,W,N)}{\delta P(h|h_0,W,N)}=\left(\frac{\varrho[h^*|h_0,W]}{\varrho[h|h_0,W]}\right)^N\to\infty. \tag{18}$$

As in random sampling, the most probable distribution is overwhelmingly more probable than any other feasible distribution. By the same reasoning as in random sampling, $\delta P(h^*|h_0,W,N)\to 1$, so that $\varrho[h^*|h_0,W]=1$, and for all distributions we must have

$$\varrho[h|h_0,W]\le 1, \tag{19}$$

with the equal sign only for the most probable distribution $h^*$. This distribution is (see Supplementary Information)

$$h^*(x)=w(x;h^*)\,\frac{h_0(x)}{r}. \tag{20}$$
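A minimal discretized check of Equations (17), (19) and (20) follows, under a simplifying assumption not made in the text: the bias is linear, so $w(x;h)=w(x)$ does not depend on h and Equation (20) gives $h^*$ directly (a nonlinear W would require solving Equation (20) self-consistently). The grid, the exponential $h_0$ and the particular `w` are hypothetical. In this linear case $\log\varrho[h|h_0,W]$ reduces to $-D(h\parallel h^*)$, so it should vanish at $h^*$ and be negative for every other normalized h.

```python
import math
import random

# Hypothetical grid and generating distribution h0 (unit exponential, binned at midpoints).
dx = 0.05
xs = [(i + 0.5) * dx for i in range(400)]           # covers (0, 20)
h0 = [math.exp(-x) for x in xs]
Z0 = sum(h0) * dx
h0 = [v / Z0 for v in h0]                            # unit area on the grid

# Hypothetical *linear* bias: w(x; h) = w(x) does not depend on h.
w = [1.0 + math.sin(x) ** 2 for x in xs]

# Equation (20): h* = w h0 / r, with r fixed by normalization.
r = sum(wi * hi for wi, hi in zip(w, h0)) * dx
h_star = [wi * hi / r for wi, hi in zip(w, h0)]

def log_rho(h):
    """Equation (17) on the grid, for a bias independent of h."""
    integral = sum(hi * math.log(hi / (wi * h0i)) for hi, wi, h0i in zip(h, w, h0)) * dx
    return -integral - math.log(r)

random.seed(1)
perturbed = []
for _ in range(20):
    # Random positive perturbation of h*, renormalized to unit area.
    h = [v * math.exp(0.3 * random.gauss(0.0, 1.0)) for v in h_star]
    Z = sum(h) * dx
    perturbed.append(log_rho([v / Z for v in h]))

print(f"log rho[h*] = {log_rho(h_star):.2e}")
print(f"largest log rho among perturbed samples = {max(perturbed):.4f}")
```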

#### 3.3. Canonical Sampling

We now choose the generating distribution $h_0$ to be the normalized exponential distribution with parameter $\beta$,

$$h_0(x)=\beta e^{-\beta x},\qquad 0\le x<\infty, \tag{21}$$

and write the probability functional $\varrho$ in Equation (17) as

$$\log\varrho[h|W,\beta]=-\int h(x)\log\frac{h(x)}{w(x;h)}\,dx-\beta\overline{x}-\log q, \tag{22}$$

where $q=r/\beta$ and $\overline{x}$ is the mean of $h(x)$. We call this probability space canonical. The probability of h is $\varrho^N[h|W,\beta]$ and by the same argument that led to Equation (19) we now have

$$\varrho[h|W,\beta]\le 1. \tag{23}$$

The equal sign defines the most probable distribution $h^*$; this distribution is

$$h^*(x)=w(x;h^*)\,\frac{e^{-\beta x}}{q}. \tag{24}$$

The parameter q is fixed by the normalization condition and satisfies

$$\overline{x}=-\frac{d\log q}{d\beta}. \tag{25}$$

(Details are given in the Supplementary Information.)
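Equation (25) can be verified numerically once w is given. The sketch below assumes the hypothetical choice $w(x)=x$, for which Equation (24) is a gamma density with $q=1/\beta^2$ and mean $2/\beta$. It evaluates q with a composite Simpson rule on a truncated interval (another assumption; the helper names are illustrative) and compares the mean of $h^*$ with $-d\log q/d\beta$ obtained by a central finite difference.

```python
import math

def w(x):
    # Hypothetical selection derivative; w(x) = x makes h* a gamma-type density.
    return x

def simpson(f, a=0.0, b=60.0, n=20_000):
    """Composite Simpson rule on [a, b]; n must be even. [a, b] truncates (0, infinity)."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3

def log_q(beta):
    """log of the normalization of h*(x) = w(x) exp(-beta x) / q, Equation (24)."""
    return math.log(simpson(lambda x: w(x) * math.exp(-beta * x)))

def mean_of_h_star(beta):
    q = math.exp(log_q(beta))
    return simpson(lambda x: x * w(x) * math.exp(-beta * x)) / q

beta, d = 0.5, 1e-4
x_bar_direct = mean_of_h_star(beta)
x_bar_eq25 = -(log_q(beta + d) - log_q(beta - d)) / (2 * d)     # Equation (25)
print(f"q = {math.exp(log_q(beta)):.6f}, mean = {x_bar_direct:.6f}, -dlogq/dbeta = {x_bar_eq25:.6f}")
```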

#### 3.4. Microcanonical Sampling

Next we define the microcanonical space as the subset of distributions with fixed mean $\overline{x}$. The generating distribution is again the exponential function, which we now write as

$$h_0(x)=\frac{e^{-x/\overline{x}}}{\overline{x}}, \tag{26}$$

with $\overline{x}$ fixed. The probability to observe distribution h while sampling $h_0$ is still given by Equation (16) except that r is replaced with a new normalizing factor $r'$. We define the microcanonical probability functional

$$\log\varrho[h|W,\overline{x}]=-\int h(x)\log\frac{h(x)}{w(x;h)}\,dx-\log\omega, \tag{27}$$

with $\log\omega=1+\log\overline{x}+\log r'$, and write the probability of h as $\varrho^N[h|W,\overline{x}]$. The argument that produced Equations (19) and (23) now gives

$$\varrho[h|W,\overline{x}]\le 1. \tag{28}$$

This functional is maximized by the same distribution $h^*$ that maximizes the canonical functional, Equation (24), except that both q and $\beta$ are now Lagrange multipliers, fixed by normalization and by the known mean $\overline{x}$. As in the canonical case, $h^*$ is overwhelmingly more probable than any other distribution in the microcanonical space and its mean satisfies Equation (25). Setting $h=h^*$ in Equation (27), where the equal sign in Equation (28) holds, and using Equation (14), we obtain

$$\log\omega=S[h^*]+\log W[h^*], \tag{29}$$

where $S[h^*]$ is the Gibbs–Shannon entropy of the most probable distribution,

$$S[h^*]=-\int_0^\infty h^*(x)\log h^*(x)\,dx. \tag{30}$$

Substituting Equation (24) for $h^*$ in (29) we obtain a relationship between $\omega$, $\beta$, q and $\overline{x}$:

$$\log\omega=\beta\overline{x}+\log q. \tag{31}$$

In combination with Equation (25), this result defines $\log\omega(\overline{x})$ as the Legendre transformation of $\log q(\beta)$ with respect to $\beta$. By the reciprocal property of the transformation we then have

$$\beta=\frac{d\log\omega}{d\overline{x}}. \tag{32}$$

Given Equation (31), the canonical probability functional in Equation (22) and the microcanonical functional in Equation (27) are seen to be the same. The difference is that in canonical maximization $\overline{x}$ is a floating parameter, whereas in the microcanonical maximization it is held constant. Both functionals are maximized by the same distribution and have the same $\beta$, q and $\omega$ at the same $\overline{x}$: the two ensembles are equivalent. Finally, the maximization of the microcanonical functional implies that $\varrho[h|W,\overline{x}]$ is a concave functional in h. It follows that $\log\omega$ is a concave function of $\overline{x}$, therefore we must have

$$\frac{d^2\log\omega}{d\overline{x}^2}=\frac{d\beta}{d\overline{x}}\le 0. \tag{33}$$

The details are shown in the Supplementary Information.
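The Legendre construction in Equations (31)–(33) can be sketched as follows, for the hypothetical choice $w(x)=x^{k-1}$, which gives the closed form $\log q=\log\Gamma(k)-k\log\beta$. For a target $\overline{x}$ the code inverts Equation (25) by bisection, forms $\log\omega=\beta\overline{x}+\log q$, and checks Equation (32) and the concavity condition (33) with finite differences. For $k=3$ and $\overline{x}=6$ one expects $\beta=k/\overline{x}=0.5$ and $d^2\log\omega/d\overline{x}^2=-k/\overline{x}^2$. Function names and step sizes are illustrative assumptions.

```python
import math

K_SHAPE = 3.0   # hypothetical: w(x) = x**(K_SHAPE - 1), so h* is a gamma density

def log_q(beta):
    """Closed form log q for w(x) = x**(k-1): q = Gamma(k) / beta**k."""
    return math.lgamma(K_SHAPE) - K_SHAPE * math.log(beta)

def mean_from_q(beta):
    """Equation (25), x_bar = -d log q / d beta, by a relative central difference."""
    d = 1e-6 * beta
    return -(log_q(beta + d) - log_q(beta - d)) / (2 * d)

def beta_for_mean(x_bar, lo=1e-6, hi=1e3):
    """Invert Equation (25) by bisection; x_bar(beta) decreases monotonically."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_from_q(mid) > x_bar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_omega(x_bar):
    """Equation (31): Legendre transform of log q evaluated at the matching beta."""
    beta = beta_for_mean(x_bar)
    return beta * x_bar + log_q(beta)

x_bar, h = 6.0, 1e-3
beta = beta_for_mean(x_bar)
slope = (log_omega(x_bar + h) - log_omega(x_bar - h)) / (2 * h)                          # Eq. (32)
curvature = (log_omega(x_bar + h) - 2 * log_omega(x_bar) + log_omega(x_bar - h)) / h**2  # Eq. (33)
print(f"beta = {beta:.6f}, dlog(omega)/dx_bar = {slope:.6f}, d2log(omega)/dx_bar2 = {curvature:.5f}")
```

Bisection is used only because $\overline{x}(\beta)$ is monotonic; any root finder would serve. Because $\beta\overline{x}+\log q$ is stationary in $\beta$ at the solution, small errors in the inverted $\beta$ barely affect $\log\omega$.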

## 4. Generalized Statistical Thermodynamics

These results can be summarized in the form of the following theorem:

**Theorem 1.**

Given normalized distribution $f(x)$, $x\ge 0$, with mean $\overline{x}$, it is possible to construct a functional W such that:

(a) All distributions $h(x)$, $x\ge 0$, with mean $\overline{x}$ satisfy the inequality

$$\log W[h]-\int_0^\infty h(x)\log h(x)\,dx\le\log\omega, \tag{34}$$

with the equal sign only for $h=f$, a condition that defines $\omega$;

(b) f can be expressed in canonical form as

$$f(x)=w(x)\,\frac{e^{-\beta x}}{q}, \tag{35}$$

where $\log w$ is the variational derivative of $\log W[h]$ at $h=f$; and

(c) parameters $\overline{x}$, $\beta$, q and $\omega$ satisfy

$$\overline{x}=-\frac{d\log q}{d\beta}, \tag{36}$$

$$\beta=\frac{d\log\omega}{d\overline{x}}, \tag{37}$$

$$\log\omega=\beta\overline{x}+\log q, \tag{38}$$

$$\frac{d^2\log\omega}{d\overline{x}^2}\le 0. \tag{39}$$

The existence of W is established by the fact that the functional

$$\log W[h]=\int_0^\infty h(x)\log f(x)\,dx \tag{40}$$

satisfies the theorem. This is a linear functional whose derivative is $\log f$ for all h. More generally, any homogeneous functional $\log W[h]$ of degree 1, linear or nonlinear, whose derivative at $h=f$ is given by

$$\left.\frac{\delta\log W[h]}{\delta h}\right|_{h=f}=\log f(x)+a_0+a_1x\doteq\log w(x), \tag{41}$$

where $a_0$ and $a_1$ satisfy

$$\frac{da_0}{da_1}=-\overline{x}, \tag{42}$$

but are otherwise arbitrary, also satisfies the theorem. The inequality in Equation (39) follows from the concavity requirement that ensures the maximization of Equation (34).
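To illustrate the construction behind the theorem, the sketch below takes a hypothetical target f (a Weibull density with $k=2$, $\lambda=3$, discretized on a grid), forms w from Equation (41) for two arbitrary choices of $(a_0,a_1)$, and reads off $\beta=a_1$ and $q=e^{a_0}$ by comparing with Equation (35). It assumes the linear functional $\log W[h]=\int h\log w\,dx$ and checks that the left side of Equation (34) equals $\log\omega$ from Equation (38) at $h=f$, and is smaller for a perturbed h with the same area and mean. The perturbation scheme and helper names are illustrative assumptions. Both choices of $(a_0,a_1)$ reproduce the same f but give different $\beta$, q and $\omega$.

```python
import math
import random

# Hypothetical target f: Weibull (k = 2, lambda = 3) sampled at grid midpoints and renormalized.
dx = 0.01
xs = [(i + 0.5) * dx for i in range(2000)]          # covers (0, 20)
k, lam = 2.0, 3.0
f = [(k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k)) for x in xs]
Zf = sum(f) * dx
f = [v / Zf for v in f]
x_bar = sum(x * v for x, v in zip(xs, f)) * dx

def integral(values):
    return sum(values) * dx

def weighted_dot(a, b):
    """Inner product weighted by f, used to build perturbations that keep the constraints."""
    return integral([fi * ai * bi for fi, ai, bi in zip(f, a, b)])

def lhs_34(h, w):
    """Left side of Equation (34) for the linear functional log W[h] = int h log w."""
    return integral([hi * (math.log(wi) - math.log(hi)) for hi, wi in zip(h, w)])

# Build h = f (1 + eps u) with int f u = int x f u = 0, so h has the same area and mean as f.
random.seed(0)
u = [random.gauss(0.0, 1.0) for _ in xs]
basis = []
for g in ([1.0] * len(xs), list(xs)):               # Gram-Schmidt on {1, x}
    for e in basis:
        c = weighted_dot(g, e)
        g = [gi - c * ei for gi, ei in zip(g, e)]
    norm = math.sqrt(weighted_dot(g, g))
    basis.append([gi / norm for gi in g])
for e in basis:
    c = weighted_dot(u, e)
    u = [ui - c * ei for ui, ei in zip(u, e)]
eps = 0.5 / max(abs(ui) for ui in u)
h = [fi * (1.0 + eps * ui) for fi, ui in zip(f, u)]

results = {}
for a0, a1 in [(0.0, 0.0), (1.3, 0.4)]:              # two arbitrary choices of a0, a1
    w = [fi * math.exp(a0 + a1 * x) for fi, x in zip(f, xs)]   # Equation (41)
    beta, q = a1, math.exp(a0)                        # read off by comparing with Eq. (35)
    log_omega = beta * x_bar + math.log(q)            # Equation (38)
    rebuilt = [wi * math.exp(-beta * x) / q for wi, x in zip(w, xs)]
    results[(a0, a1)] = {
        "log_omega": log_omega,
        "max_err": max(abs(a - b) for a, b in zip(rebuilt, f)),
        "lhs_f": lhs_34(f, w),
        "lhs_h": lhs_34(h, w),
    }

for key, row in results.items():
    print(key, {name: round(val, 6) for name, val in row.items()})
```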

We recognize Equation (35) as the canonical distribution of statistical mechanics, Equations (36)–(39), which relate its parameters, as the core set of thermodynamic relationships, and Equation (34) as the inequality of the second law. The probabilistic interpretation is that any distribution f may be obtained as the most probable distribution under a probability measure defined via a suitable functional W. Whereas in statistical mechanics the central stochastic variable is the mechanical microstate, in generalized thermodynamics it is the probability distribution itself. Thermodynamics may be condensed into the microcanonical inequality in Equation (34), a generalized expression of the second law that defines the most probable distribution in the microcanonical space. All relationships between $\omega$ (microcanonical partition function), q (canonical partition function), $\beta$ (generalized inverse temperature) and $\overline{x}$ follow from maximizing the left-hand side of this inequality and have equivalents in familiar thermodynamics. The derivatives $d\log q/d\beta$ and $d\log\omega/d\overline{x}$ in Equations (36) and (37) may be viewed as equations of change along a path (“process”) in the space of distributions under fixed bias W. This path is described parametrically in terms of $\overline{x}$ and represents a nonstationary stochastic process. We call this process quasistatic: a continuous path of distributions that locally maximize the thermodynamic functional.

#### 4.1. Contact with Statistical Mechanics

The obvious way to make contact with statistical mechanics is to take f to be the probability of a microstate at fixed temperature, volume and number of particles. The postulate of equal a priori probabilities fixes the selection functional and its derivative to $W=w=1$; if we identify x as the energy $E_i$ of microstate i, $\beta$ as $1/k_BT$, q as the thermodynamic canonical partition function and $\omega$ as the thermodynamic microcanonical partition function, Equations (24)–(33) map to standard thermodynamic relationships. From Equation (27) we obtain $\varrho=e^{S[h]}/\omega$: the canonical probability f maximizes entropy and thus we obtain the second law.

This is not the only way to establish contact with statistical mechanics. We may choose f to be some other probability distribution, for example, the probability to find a macroscopic system of fixed $(T,V,N)$ at energy E. We write the energy distribution in the form of Equation (24) with w, $\beta$ and q to be determined. From Equations (25), (31) and (32) with $\overline{x}=\overline{E}$ we make the identifications $\beta\to 1/k_BT$, $\log q\to -F/k_BT$ (free energy), and $\log\omega\to$ thermodynamic entropy. To identify w we require input from physics, and this comes via the observation that the probability density of macroscopic energy E is asymptotically a Dirac delta function at $E=\overline{E}$. Then $S[f]=0$ (this is the entropy of the energy distribution, not to be confused with thermodynamic entropy). From Equations (14) and (29) we find $\log W[f]=\log w(\overline{E};f)=\log\omega$, and conclude that $\log\omega$ is the thermodynamic entropy. This establishes the correspondence between generalized thermodynamics and macroscopic (classical) thermodynamics. If we further postulate, again motivated by physics, that $w(E)$ is the number of microstates under fixed volume and number of particles, we establish the microscopic connection. Since $f(E)$ is proportional to the number of microstates with energy E and individual microstates are unobservable, we may as well ascribe equal probability to all microstates. Thus we recover the postulate of equal a priori probabilities (statistical thermodynamics). Finally, by adopting a physical model of the microstate, classical, quantum or other, we obtain classical statistical mechanics, quantum statistical mechanics or yet-to-be-discovered statistical mechanics, depending on the model. In all cases the thermodynamic calculus is the same; only the enumeration of microstates, that is, W, depends on the physical model.

#### 4.2. What is W?

Once the selection functional W is specified, the most probable distribution is fixed and all canonical variables become known functions of $\overline{x}$. But what is W? The selection functional is a placeholder for our knowledge, hypotheses and model assumptions about the stochastic process that gives rise to the probability distribution of interest. This knowledge fully specifies the distribution. The opposite is not true: given distribution f, there is an infinite number of functionals W that produce that distribution as the most probable distribution in their probability space. This nonuniqueness is a feature, not a bug: it allows models that are quite different in their details to produce the same final distribution. Here is an example. The unbiased functional $W[h]=w(x)=1$ produces the exponential distribution

$$h^*(x)=\frac{e^{-\beta x}}{q}, \tag{43}$$

with canonical parameters

$$\beta=1/\overline{x},\qquad q=\overline{x},\qquad\log\omega=1+\log\overline{x}. \tag{44}$$

Now consider the nonlinear selection functional

$$\log W[h]=S[h]=-\int_0^\infty h(x)\log h(x)\,dx, \tag{45}$$

whose logarithm is equal to entropy. The corresponding microcanonical probability functional is obtained by inserting this into Equation (27),

$$\log\varrho[h|W,\overline{x}]=-2\int_0^\infty h(x)\log h(x)\,dx-\log\omega, \tag{46}$$

and is maximized by (see Supplementary Information)

$$h^*(x)=w(x)\,\frac{e^{-\beta x}}{q}, \tag{47}$$

with

$$w(x)=\overline{x}\,e^{x/\overline{x}},\qquad\beta=2/\overline{x},\qquad q=\overline{x}^2,\qquad\log\omega=2+2\log\overline{x}. \tag{48}$$

We have arrived at the exponential distribution, the same distribution that is obtained by the unbiased functional $w(x)=1$, but with different canonical parameters because the probability space from which it arises is now different. If all we know is that the probability distribution in a stochastic process is exponential, it is not possible to determine whether it was obtained using $W[h]=1$, $W[h]=e^{S[h]}$, or any other functional capable of reproducing the exponential distribution. While the selection bias identifies the most probable distribution uniquely, the opposite is not true.
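The comparison between the unbiased and entropic functionals can be reproduced numerically. The sketch below assumes that the entropic functional of Equation (45) is taken in its degree-1 homogeneous form $-\int h\log(h/\int h)\,dx$, whose variational derivative at a normalized $h^*$ is $-\log h^*$; this gives $w(x)=\overline{x}e^{x/\overline{x}}$, as in Equation (48). For a hypothetical $\overline{x}=2$ it computes q by a midpoint sum on a truncated grid, rebuilds $h^*$ from Equation (24), and checks that Equations (29) and (31) give the same $\log\omega$ in each case while $\beta$, q and $\omega$ differ between the two functionals.

```python
import math

x_bar = 2.0
dx = 1e-3
xs = [(i + 0.5) * dx for i in range(60_000)]          # midpoint grid truncating (0, inf) at 60
h_star = [math.exp(-x / x_bar) / x_bar for x in xs]   # exponential distribution, Eq. (43)

def integral(values):
    return sum(values) * dx

S = -integral([h * math.log(h) for h in h_star])     # Gibbs-Shannon entropy, Eq. (30)

cases = {
    # name: (w(x) at h = h*, beta, log W[h*])
    "unbiased": ([1.0] * len(xs), 1.0 / x_bar, 0.0),
    # Entropic W of Eq. (45), taken in the degree-1 homogeneous form
    # -int h log(h / int h), whose variational derivative at normalized h* is -log h*.
    "entropic": ([1.0 / h for h in h_star], 2.0 / x_bar, S),
}

summary = {}
for name, (w, beta, log_W) in cases.items():
    q = integral([wi * math.exp(-beta * x) for wi, x in zip(w, xs)])    # normalization of Eq. (24)
    rebuilt = [wi * math.exp(-beta * x) / q for wi, x in zip(w, xs)]
    summary[name] = {
        "beta": beta,
        "q": q,
        "log_omega_eq31": beta * x_bar + math.log(q),    # Legendre relation, Eq. (31)
        "log_omega_eq29": S + log_W,                     # entropy relation, Eq. (29)
        "max_err": max(abs(a - b) for a, b in zip(rebuilt, h_star)),
    }

for name, row in summary.items():
    print(name, {key: round(val, 6) for key, val in row.items()})
```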

The selection functional represents external input to thermodynamics and is fixed by the rules that govern the stochastic process that produces the distribution in question. In the case of statistical mechanics it is fixed by the postulate of equal a priori probabilities. In another example, recently given for stochastic binary clustering, it is fixed by the aggregation kernel, a function that determines the aggregation probability between clusters of different sizes [17]. The selection functional is the contact point between generalized statistical thermodynamics (a mathematical theory for generic distributions) and physics, i.e., our knowledge in the form of model assumptions and postulates about the process that gives rise to the observed distribution. It is interesting to point out that the variational derivative w in Equation (27) appears in the form of a Bayesian prior [18]. In the context of generalized thermodynamics w is not a prior distribution, although it can be one if $a_0=a_1=0$ in Equation (41). In general, w is a non-normalizable derivative of the functional that represents our knowledge about the process: an improper prior that nonetheless points to a proper distribution.

## 5. Thermodynamic Sampling of Distributions

We have shown that any distribution $f(x)$ defined in $\mathbb{R}_+$ can be viewed as the most probable distribution in an appropriately constructed probability space. Here we will show that any distribution f in this domain can be obtained as the equilibrium distribution of reacting clusters under an appropriately constructed equilibrium constant. Consider a population of M identical particles (“monomers”) distributed into N clusters and let $\mathbf{m}=(m_1,m_2,\cdots,m_N)$ be an ordered list of N cluster masses with total mass M, such that $m_k$ is the mass of the kth cluster in the list (“configuration”). The complete set of configurations with N clusters and total mass M comprises the cluster ensemble $(M,N)$. Let $\mathbf{n}=(n_1,n_2,\cdots)$ be the size distribution of the clusters in configuration $\mathbf{m}$, such that $n_i$ is the number of clusters with i monomers. With $M,N\to\infty$ at fixed $M/N=\overline{x}$, the cluster ensemble contains every discrete distribution $h_i=n_i/N$ with mean $\overline{x}$. We now construct the following stochastic process: given a configuration $\mathbf{m}$, pick two clusters at random, merge them, then split them back into two clusters at random. This amounts to an exchange of mass between two clusters that is represented schematically by the reaction

$$m_i+m_j\longrightarrow m_i'+m_j', \tag{49}$$

and transforms the parent configuration $\mathbf{m}$ into a new configuration $\mathbf{m}'$ with the same number of clusters N and total mass M. This process may also be represented as a reaction that transforms a parent configuration into an offspring,

$$\mathbf{m}\stackrel{K}{\longrightarrow}\mathbf{m}'. \tag{50}$$

We define the equilibrium constant of this reaction as

$$K_{\mathbf{m}\to\mathbf{m}'}=\frac{W(\mathbf{n}')}{W(\mathbf{n})}, \tag{51}$$

where $\mathbf{n}'$ and $\mathbf{n}$ are the cluster size distributions of the product and reactant configurations, respectively, and $W(\mathbf{n})$ is the selection functional applied to distribution $\mathbf{n}$. Upon the exchange reaction the distribution changes by $\delta\mathbf{n}$: the numbers of clusters with masses $m_i$ and $m_j$ (reactants) each decrease by 1, and the numbers of clusters with masses $m_i'$ and $m_j'$ (products) each increase by 1. By virtue of the homogeneous property of $\log W$, its change for large M and N is a differential that can be expressed in terms of the derivatives of $\log W$,

$$\log W(\mathbf{n}')-\log W(\mathbf{n})=-\log w(m_i)-\log w(m_j)+\log w(m_i')+\log w(m_j'), \tag{52}$$

where $\log w$ is the functional derivative of $\log W$ evaluated at distribution $\mathbf{n}$. Using this result the equilibrium constant becomes

$$K_{\mathbf{m}\to\mathbf{m}'}=\frac{w(m_i')\,w(m_j')}{w(m_i)\,w(m_j)}. \tag{53}$$
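Equation (52) is a first-order expansion, so its error should shrink as the cluster counts grow. The sketch below tests this for a hypothetical nonlinear choice, the discrete entropic functional $\log W(\mathbf{n})=-\sum_i n_i\log(n_i/N)$, which is homogeneous of degree 1 and has derivative $\log w(i)=-\log(n_i/N)$. The size distributions are hypothetical geometric-like counts scaled to increasing size, and the helper names are illustrative.

```python
import math

def log_W(n):
    """Hypothetical homogeneous (degree-1) entropic functional: -sum_i n_i log(n_i / N)."""
    N = sum(n.values())
    return -sum(c * math.log(c / N) for c in n.values() if c > 0)

def log_w(n, size):
    """Its partial derivative with respect to n_size: -log(n_size / N)."""
    N = sum(n.values())
    return -math.log(n[size] / N)

def exchange(n, reactants, products):
    """Size distribution after m_i + m_j -> m_i' + m_j', Equation (49)."""
    out = dict(n)
    for s in reactants:
        out[s] -= 1
    for s in products:
        out[s] = out.get(s, 0) + 1
    return out

reactants, products = (3, 5), (2, 6)     # mass is conserved: 3 + 5 = 2 + 6
errors = []
for scale in (100, 10_000, 1_000_000):
    # Hypothetical geometric-like size distribution on sizes 1..40, scaled up.
    n = {s: round(scale * 0.9 ** s) for s in range(1, 41)}
    exact = log_W(exchange(n, reactants, products)) - log_W(n)
    linear = (sum(log_w(n, s) for s in products)
              - sum(log_w(n, s) for s in reactants))          # Equation (52)
    errors.append(abs(exact - linear))
K = math.exp(linear)                                          # Equation (53), largest scale
print("linearization errors:", [f"{e:.1e}" for e in errors], " K =", K)
```

Because these counts are geometric in the cluster size, $\log w$ is linear in size here, and the mass-conserving exchange gives $K\approx 1$.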

This has the standard form of an equilibrium constant for the reaction in Equation (49). We may identify $w(x)$ as the “fugacity” of species x, where “species” x is a cluster with mass x. The reaction can be simulated by Monte Carlo using the Metropolis acceptance rule

$$P_{\mathbf{n}\to\mathbf{n}'}=\begin{cases}1\ (\text{accept}) & \text{if }\mathtt{rnd}\le K_{\mathbf{n}\to\mathbf{n}'},\\ 0\ (\text{reject}) & \text{if }\mathtt{rnd}>K_{\mathbf{n}\to\mathbf{n}'},\end{cases} \tag{54}$$

where $\mathtt{rnd}$ is a uniform random number in $(0,1)$, so that a transition is accepted with probability $\min(1,K_{\mathbf{n}\to\mathbf{n}'})$. This forms an irreducible Markov process that samples the microcanonical space of distributions $\mathbf{n}$ with fixed zeroth moment N and first moment M. Its stationary distribution is [19]

$$h^*(x)=w(x)\,\frac{e^{-\beta x}}{q}, \tag{55}$$

where $\log w(x)$ is the functional derivative of $\log W$ evaluated at $h=h^*$ and the parameters $\beta$ and q are obtained by solving the set of equations

$$q=\int_0^\infty w(x)\,e^{-\beta x}\,dx, \tag{56}$$

$$\overline{x}=\frac{1}{q}\int_0^\infty x\,w(x)\,e^{-\beta x}\,dx. \tag{57}$$

With $W[h]=w(x)=1$ we obtain the exponential distribution, which implies that the exchange reaction with equilibrium constant $K=1$ for all transitions is equivalent to unbiased sampling from an exponential distribution with fixed mean $\overline{x}=M/N$.
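Equations (56) and (57) can be solved numerically for $\beta$ and q once w is specified. The sketch below makes two assumptions not stated in the text: the integrals are replaced by sums over integer cluster sizes $1,\dots,M-N+1$, and the logarithms of the terms are shifted by their maximum so that negative values of $\beta$ cannot overflow. With $w=1$ the discrete solution is geometric, with $e^{-\beta}=1-1/\overline{x}$, which provides a check; the Weibull parameters $k=2$, $\lambda=30$ and the function name `solve_canonical` are hypothetical.

```python
import math

def solve_canonical(log_w, x_bar, x_max, lo=-1.0, hi=1.0, iters=60):
    """
    Solve Equations (56)-(57) for beta and log q by bisection, with the integrals
    replaced by sums over integer cluster sizes 1..x_max (a discretization assumption).
    """
    xs = list(range(1, x_max + 1))
    lws = [log_w(x) for x in xs]

    def shifted_terms(beta):
        # log of w(x) exp(-beta x), shifted by its maximum so exp() cannot overflow
        logs = [lw - beta * x for lw, x in zip(lws, xs)]
        top = max(logs)
        return top, [math.exp(v - top) for v in logs]

    def mean(beta):
        _, t = shifted_terms(beta)
        return sum(x * ti for x, ti in zip(xs, t)) / sum(t)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) > x_bar:      # the mean decreases as beta increases
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    top, t = shifted_terms(beta)
    total = sum(t)
    log_q = top + math.log(total)
    h_star = [ti / total for ti in t]            # Equation (55)
    return beta, log_q, xs, h_star

M, N = 3000, 100
x_max = M - N + 1                                # largest possible cluster
beta_u, logq_u, xs, h_u = solve_canonical(lambda x: 0.0, M / N, x_max)      # w = 1
k, lam = 2.0, 30.0                               # hypothetical Weibull parameters
beta_w, logq_w, _, h_w = solve_canonical(
    lambda x: (k - 1) * math.log(x) - (x / lam) ** k, M / N, x_max)
print(f"w = 1:   beta = {beta_u:.6f} (geometric value {math.log(30 / 29):.6f})")
print(f"Weibull: beta = {beta_w:.6f}, mean = {sum(x * p for x, p in zip(xs, h_w)):.4f}")
```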

Once the selection functional W is given the most probable distribution is fixed and may be obtained either by simulation or in many cases analytically. We will now construct W such that the most probable distribution is any distribution f defined in ${\mathbb{R}}_{+}$. We construct the linearized selection functional
with w from Equation (41), which we write in the form
and ${a}_{0}$ and ${a}_{1}$ arbitrary constants. It is easy to show that the selection of ${a}_{0}$ and ${a}_{1}$ is immaterial because both constants drop out of Equation (53). If we choose ${a}_{0}={a}_{1}$, then $w\left(x\right)=f\left(x\right)$; alternatively, we may choose these constants so as to obtain simpler forms for $w\left(x\right)$. We demonstrate the construction of w with three examples using the exponential, the Weibull, and the uniform distribution.

$$logW\left[h\right]={\int}_{0}^{\infty}h\left(x\right)logw\left(x\right)dx$$

$$w\left(x\right)=f\left(x\right){e}^{{a}_{0}+{a}_{1}x}$$

- Exponential distribution$$f\left(x\right)={e}^{-x/\overline{x}}/\overline{x}.$$The function w is$$w\left(x\right)=\frac{{e}^{-x/\overline{x}+{a}_{0}+{a}_{1}}}{\overline{x}}.$$Choosing ${a}_{0}=log\overline{x}$, ${a}_{1}=1/\overline{x}$ we obtain ${w}_{\mathrm{exp}}\left(x\right)=1$, which represents the unbiased selection functional.
- Weibull distribution$$f\left(x\right)=\left(\frac{k}{\lambda}\right){\left(\frac{x}{\lambda}\right)}^{k-1}{e}^{-{(x/\lambda )}^{k}}.$$Using ${a}_{0}=k\log\lambda -\log k$ and ${a}_{1}=0$ in Equation (59), we obtain$${w}_{\mathrm{Weibull}}\left(x\right)={x}^{k-1}{e}^{-{(x/\lambda )}^{k}}.$$
- Uniform distribution$$f\left(x\right)=\left\{\begin{array}{cc}1/(b-a)\hfill & a\le x\le b\hfill \\ 0\hfill & \mathrm{otherwise}.\hfill \end{array}\right.$$With ${a}_{0}={a}_{1}=0$ we obtain$${w}_{\mathrm{uniform}}\left(x\right)=f\left(x\right).$$
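The three constructions translate directly into code. The sketch below codes $f$ and $w=f\,e^{a_0+a_1x}$ for each case, using the constants chosen above. The function names are hypothetical. We also fix the Weibull scale from the mean through the standard relation $\overline{x}=\lambda\,\Gamma(1+1/k)$, which is our addition. With $\overline{x}=30$ and $k=2$, this relation gives $\lambda\approx 33.8514$, the value quoted for Figure 1.

```python
import math

XBAR = 30.0                                   # target mean, as in Figure 1
K_SHAPE = 2.0                                 # Weibull shape k
# Our addition: Weibull scale fixed by the mean, xbar = lam * Gamma(1 + 1/k).
LAM = XBAR / math.gamma(1.0 + 1.0 / K_SHAPE)
A, B = 20.0, 40.0                             # support of the uniform case


def w_from_f(f, a0, a1):
    """General construction w(x) = f(x) * exp(a0 + a1 * x)."""
    return lambda x: f(x) * math.exp(a0 + a1 * x)


def f_exp(x):
    return math.exp(-x / XBAR) / XBAR


def f_weibull(x):
    return (K_SHAPE / LAM) * (x / LAM) ** (K_SHAPE - 1) * math.exp(-(x / LAM) ** K_SHAPE)


def f_uniform(x):
    return 1.0 / (B - A) if A <= x <= B else 0.0


# Constants chosen in the text to simplify w.
w_exp = w_from_f(f_exp, a0=math.log(XBAR), a1=1.0 / XBAR)        # equals 1
w_weibull = w_from_f(f_weibull, a0=K_SHAPE * math.log(LAM) - math.log(K_SHAPE), a1=0.0)
w_uniform = w_from_f(f_uniform, a0=0.0, a1=0.0)                    # equals f


def w_weibull_closed(x):
    """Closed form x^(k-1) exp(-(x/lambda)^k) given in the text."""
    return x ** (K_SHAPE - 1) * math.exp(-(x / LAM) ** K_SHAPE)
```

Writing $w$ through the generic `w_from_f` rather than the simplified closed forms makes it easy to confirm that the chosen constants reproduce those forms.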

We implement thermodynamic sampling using Monte Carlo. We begin with an ordered list of N integers $i>0$ whose sum is M. We then pick two numbers at random and apply a random exchange reaction that produces a new pair of integers with the same combined sum. The new pair replaces the old one with the acceptance probability of Equation (54), using ${K}_{\mathrm{eq}}$ from Equation (53) and the function $w\left(x\right)$ obtained above. After each successful trial we compute the distribution of the current configuration, and the mean distribution is obtained by averaging over a large number of trials. For these simulations $N=100$, $M=3000$, $\overline{x}=30$, and the mean distribution is calculated over 20,000 trials. As we discuss elsewhere, the mean distribution and the most probable distribution converge to each other unless the system exhibits phase separation [17,19,20]. The results in Figure 1 confirm that thermodynamic sampling indeed converges to the distribution for which the w function was derived. More generally, any discrete distribution ${h}_{i}$ and, with proper scaling, any continuous distribution $h\left(x\right)$ may be obtained as the equilibrium distribution of reacting clusters under a suitable equilibrium constant.
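The sampling procedure can be sketched in a few lines of Python. This is an illustration that rests on several assumptions and does not reproduce the author's code:

- the equilibrium constant of Equation (53) has the product-ratio form $K=w(x_1')w(x_2')/[w(x_1)w(x_2)]$;
- Equation (54) is the Metropolis rule, which accepts a move with probability $\min(1,K)$;
- the new pair is obtained by splitting the combined sum uniformly at random.

The helper name `exchange_mc` is hypothetical. Its parameters `n_clusters` and `total_mass` play the roles of N and M. Every cluster starts at M/N, so that value must lie in the support of w.

```python
import random
from collections import Counter


def exchange_mc(w, n_clusters=100, total_mass=3000, trials=20_000, seed=0):
    """Hypothetical Metropolis sketch of thermodynamic sampling by exchange reactions.

    Assumptions (not quoted from the text): K = w(x1')w(x2') / (w(x1)w(x2)),
    acceptance probability min(1, K), and a uniform random split of the pair's mass.
    Returns (mean_distribution, final_configuration).
    """
    rng = random.Random(seed)
    # Start with every cluster at total_mass // n_clusters; this size must have w > 0.
    config = [total_mass // n_clusters] * n_clusters
    config[0] += total_mass - sum(config)         # absorb any remainder so the sum is exact
    counts = Counter()
    accepted = 0
    for _ in range(trials):
        i, j = rng.sample(range(n_clusters), 2)   # two distinct clusters
        s = config[i] + config[j]
        x_new = rng.randint(1, s - 1)             # both new sizes stay >= 1
        y_new = s - x_new
        k_eq = w(x_new) * w(y_new) / (w(config[i]) * w(config[j]))
        if rng.random() < min(1.0, k_eq):         # Metropolis acceptance
            config[i], config[j] = x_new, y_new
            accepted += 1
            counts.update(config)                 # distribution of the current configuration
    if accepted == 0:
        return {}, config
    norm = accepted * n_clusters
    mean_dist = {size: c / norm for size, c in sorted(counts.items())}
    return mean_dist, config


# Unbiased sampling, w(x) = 1: every proposed exchange is accepted.
dist_exp, _ = exchange_mc(lambda x: 1.0)
# Uniform target on [20, 40]: w(x) = f(x), zero outside the support.
dist_uni, final = exchange_mc(lambda x: 1.0 if 20 <= x <= 40 else 0.0, seed=1)
```

Splitting the pair's mass uniformly is a symmetric proposal, so the plain Metropolis ratio needs no correction term. In the uniform case, any exchange that would move a cluster outside $[20,40]$ has $K=0$ and is always rejected.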

The selection functionals constructed by this procedure apply the variational derivative at f to all distributions h; that is, they are linearized at the most probable distribution. Any nonlinear functional $\log W$ with the same derivative at $h=f$ will produce the same stationary distribution under exchange reactions. One example is the entropic functional in Equation (45), a nonlinear functional that produces the exponential distribution. Even though the entropic and unbiased functionals produce the same distribution (Figure 2a), their ensembles are distinctly different because each functional assigns different probabilities to the distributions of the ensemble. The difference shows up in the fluctuations. The unbiased functional picks all configurations with equal probability, whereas the entropic functional is more selective. Accordingly, fluctuations in the entropic ensemble have a narrower distribution, as Figure 2b shows for the number of monomers.

## 6. Conclusions

Stripped to its core, what we call statistical thermodynamics is a mapping between a probability distribution f and a set of functions, $\{w,\beta ,q,\omega \}$, from which the distribution may be reconstructed. What we call classical thermodynamics is the set of relationships among $\{\beta ,q,\omega ,\overline{x}\}$, relationships that are the same for all distributions. What we call the second law is the variational condition that identifies the most probable distribution in the domain of feasible distributions. What we call a quasistatic process is a path in the space of distributions under fixed W. Physics enters through W. This generic mathematical formalism applies to any distribution. To use an analogy, thermodynamics is a universal grammar that becomes a language when applied to specific problems. It is a fitting coincidence, or perhaps an inevitable consequence, that it was the human desire to maximize the amount of useful work in the steam engine that would eventually make contact with the variational foundation of thermodynamics. Gibbs’s breakthrough was to connect thermodynamics to a probability distribution, and that of Shannon and Jaynes was to transplant it outside physics. Since then, the vocabulary of statistical thermodynamics has seemed intuitively familiar across disciplines, evoking a sense of déjà vu, even as its grammar remained undeciphered. This intuition can now be understood: the common thread that runs through every discipline that has adopted the thermodynamic language is an underlying stochastic process, and where there is probability, there is statistical thermodynamics.

## Supplementary Materials

The following are available online at https://www.mdpi.com/1099-4300/21/9/890/s1.

## Funding

This research received no external funding.

## Conflicts of Interest

The author declares no conflict of interest.

## References

- The Concise Oxford English Dictionary, 11th ed.; Oxford University Press: Oxford, UK, 2008.
- Callen, H. Thermodynamics and an Introduction to Thermostatistics, 2nd ed.; Wiley: Hoboken, NJ, USA, 1985.
- Hill, T.L. Statistical Mechanics: Principles and Selected Applications; Reprint of the 1956 Edition; Dover: Mineola, NY, USA, 1987.
- Huang, K. Statistical Mechanics, 2nd ed.; Wiley: New York, NY, USA, 1963.
- Gibbs, J.W. Elementary Principles in Statistical Mechanics; Reprint of the 1902 Edition; Ox Bow Press: Woodbridge, CT, USA, 1981.
- Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
- Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. **1957**, 106, 620–630.
- Albert, R.; Barabási, A.L. Statistical mechanics of complex networks. Rev. Mod. Phys. **2002**, 74, 47–97.
- Harte, J.; Zillio, T.; Conlisk, E.; Smith, A.B. Maximum Entropy and the State-Variable approach to macroecology. Ecology **2008**, 89, 2700–2711.
- Durrett, R. Stochastic Spatial Models. SIAM Rev. **1999**, 41, 677–718.
- Timme, N.M.; Lapish, C. A Tutorial for Information Theory in Neuroscience. eNeuro **2018**.
- Voit, J. The Statistical Mechanics of Financial Markets; Springer: Berlin/Heidelberg, Germany, 2005.
- Kullback, S.; Leibler, R.A. On Information and Sufficiency. Ann. Math. Statist. **1951**, 22, 79–86.
- Gray, R.M. Entropy and Information Theory, 2nd ed.; Springer: New York, NY, USA, 2011.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley Interscience: Hoboken, NJ, USA, 2006.
- Touchette, H. The large deviation approach to statistical mechanics. Phys. Rep. **2009**, 478, 1–69.
- Matsoukas, T. Statistical Thermodynamics of Irreversible Aggregation: The Sol-Gel Transition. Sci. Rep. **2015**, 5, 8855.
- Jaynes, E.T. Prior Probabilities. IEEE Trans. Syst. Sci. Cybern. **1968**, 4, 227–241.
- Matsoukas, T. Statistical thermodynamics of clustered populations. Phys. Rev. E **2014**, 90, 022113.
- Matsoukas, T. Abrupt percolation in small equilibrated networks. Phys. Rev. E **2015**, 91, 052105.

**Figure 1.** The exchange reaction transfers mass between two clusters and samples the space of all distributions with a fixed number of clusters N and a fixed total number of monomers M. The equilibrium constant of this reaction can be constructed so that any desired distribution $f\left(x\right)$, $x\ge 0$, is obtained as the equilibrium distribution. In this example we obtain (**a**) the exponential distribution; (**b**) the Weibull distribution with $\lambda =33.8514$, $k=2$; and (**c**) the uniform distribution between $a=20$ and $b=40$. In all cases $\overline{x}=30$.

**Figure 2.** (**a**) The entropic selection functional, $W\left[h\right]={e}^{S\left[h\right]}$, and the unbiased functional, $W\left[h\right]=1$, both produce the same equilibrium distribution (exponential). Nonetheless, the two selection functionals represent distinctly different ensembles, as can be seen in the fluctuations of the number of monomers (**b**). The entropic functional is more selective than the unbiased one and produces a narrower distribution of fluctuations.

© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).