#### 2.1. The problem of macroscopic prediction

The practical problem we are trying to address is how to predict the macroscopic behaviour of non-equilibrium physical systems (e.g., the large-scale structure of turbulent heat flow within the Earth’s atmosphere and oceans, or the mean fluid velocity and temperature fields in a Rayleigh-Bénard cell). One approach would be to calculate the microscopic trajectory of the system directly, by (numerically) integrating the microscopic equations of motion forward in time from some initial microstate, typically using an approximate computational scheme involving discretization in space and time. Practical challenges here include the treatment of sub-grid scale dynamics, symmetry-breaking of the equations of motion by the computational scheme [14], and the rapid increase in computational cost with the number of microscopic degrees of freedom N. One might therefore ask whether there is an alternative approach.

Nature herself suggests that there is: at least for large N, macroscopic behaviour can be accurately reproduced experimentally through the control of a relatively small number of macroscopic degrees of freedom, implying that the initial microstate—which is not under experimental control—must be largely irrelevant to the experimental result. This remarkable degree of *microscopic redundancy* has a simple explanation—it must be that the overwhelming majority of possible initial microstates lead to microscopic trajectories that look essentially the same on macroscopic scales [10,11]. Hence, as far as macroscopic behaviour is concerned, it does not matter in which one of those initial microstates the system finds itself. Given this remarkable empirical fact, there ought to be a more efficient way to predict macroscopic behaviour directly from a tractably reduced set of physical assumptions and constraints—what one might call the *essential physics* governing the macroscopic phenomena under investigation. The essential physics should include not only the relevant experimental control parameters and other macroscopic constraints that are relatively straightforward to identify, but also certain general features of the microscopic equations of motion (e.g., conservation laws, the energy spectrum of initial states) which manifest themselves at the macroscopic scale, and which might be the subject of conjecture, as in the historical example of blackbody radiation. The essential physics is not universal but will depend on the macroscopic phenomenon under investigation, including its spatial or temporal resolution. Two practical challenges, then, are how to identify the essential physics in any given problem, and how to make macroscopic predictions from it.

#### 2.2. MaxEnt: an inference algorithm with no physical content

We may view Jaynes’ information-based formulation of statistical mechanics as a response to these two challenges, within which the MaxEnt algorithm plays a central role. Broadly speaking, statistical mechanics takes an informed guess as to the essential physics and applies MaxEnt to make macroscopic predictions from that guess. Disagreement between prediction and experiment informs a new guess and so on until, by trial and error, acceptable agreement is reached.

Operationally, the MaxEnt algorithm is very simple, although its interpretation remains a subject of controversy. As a concrete and familiar example, consider the application of MaxEnt to a closed system in thermal equilibrium with a heat bath at temperature T. By closed we mean that no matter enters or leaves the system, but energy can be transferred between the system and the heat bath. In equilibrium, only the mean energy U of the system is fixed; the actual energy E of the system fluctuates about U. Let E_i denote the system energy in microstate i. Here the given physical information consists of U, plus knowledge of (or assumptions about) the spectrum of possible microstates i and their energies E_i. Given this information (or guess), MaxEnt makes macroscopic predictions by first assigning a probability p_i that the system will be in microstate i and then taking expectation values over p_i. In its simplest version [12], MaxEnt constructs p_i by maximising the Shannon information entropy:

$$H=-{\sum}_{i}{p}_{i}\,\text{ln}\,{p}_{i}\qquad \left(1\right)$$

with respect to the p_i, subject to the mean energy constraint:

$${\sum}_{i}{p}_{i}{E}_{i}=U\qquad \left(2\right)$$

and the normalisation constraint:

$${\sum}_{i}{p}_{i}=1\qquad \left(3\right)$$

The solution to this constrained optimisation problem is:

$${p}_{i}=\frac{{e}^{-\lambda {E}_{i}}}{Z\left(\lambda \right)}\qquad \left(4\right)$$

where $Z\left(\lambda \right)={\sum}_{i}{e}^{-\lambda {E}_{i}}$ is a normalisation factor (partition function) and λ is a Lagrange multiplier associated with the mean energy constraint. The value of λ is obtained as a function of U by solving Equation 2, which is equivalent to:

$$U=-\frac{\partial \,\text{ln}\,Z}{\partial \lambda }\qquad \left(5\right)$$

The value of the maximised Shannon entropy is then:

$${H}_{\text{max}}=\text{ln}\,Z+\lambda U\qquad \left(6\right)$$

which is a function of U alone since $\lambda =\lambda \left(U\right)$ via Equation 5. Contact with traditional thermodynamics is established through the identifications $kT=1/\lambda $, $S=k{H}_{\text{max}}$ and $A=-kT\,\text{ln}\,Z$, where k is Boltzmann’s constant, T is the temperature, S is the thermodynamic entropy, and A is the Helmholtz free energy. Then Equation 4 is Gibbs’ canonical distribution and Equation 6 is the standard thermodynamic relation $A=U-TS$. Once we have constructed the MaxEnt distribution p_i, the prediction of any observable x can be obtained from the expectation value $X={\sum}_{i}{p}_{i}{x}_{i}$ (where x_i is the value of x in state i). In fact we can construct the entire probability distribution p(x) from $p\left(x\right)={\sum}_{i}{p}_{i}\,\delta \left(x-{x}_{i}\right)$ (where $\delta \left(x\right)$ is the Dirac delta function), and therefore MaxEnt predicts statistical fluctuations about the mean, not just the mean value X itself.
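The construction above is easy to carry out numerically. The following minimal sketch (my illustration; the three-level energy spectrum and the value of U are arbitrary assumptions, not taken from the text) solves the mean energy constraint for λ by bisection and then evaluates the canonical distribution and the maximised entropy:

```python
import math

# Hypothetical three-level system; the energies and target mean energy
# are arbitrary illustrative choices, not values from the text.
E = [0.0, 1.0, 2.0]   # microstate energies E_i
U = 0.8               # given mean-energy constraint

def mean_energy(lam):
    """Mean energy under the canonical distribution p_i ~ exp(-lam * E_i)."""
    w = [math.exp(-lam * e) for e in E]
    Z = sum(w)
    return sum(wi * e for wi, e in zip(w, E)) / Z

# Solve the mean-energy constraint for lambda by bisection:
# mean_energy(lam) decreases monotonically with lam.
lo, hi = -50.0, 50.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if mean_energy(mid) > U:
        lo = mid    # predicted mean energy too high -> increase lambda
    else:
        hi = mid
lam = 0.5 * (lo + hi)

# Canonical distribution and maximised entropy H_max = ln Z + lam * U.
Z = sum(math.exp(-lam * e) for e in E)
p = [math.exp(-lam * e) / Z for e in E]
H_max = math.log(Z) + lam * U

print("lambda =", lam)
print("p =", p)
print("H_max =", H_max)
```

Note that only the constraint information (U, the spectrum E_i) enters the calculation; everything else about the system is deliberately left out, which is exactly the point of the algorithm.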

What then is MaxEnt? What are we doing when we apply it? And what does it mean when MaxEnt predictions agree or disagree with experiment? In Jaynes’ interpretation, MaxEnt is a passive inference algorithm that converts given information into predictions; it has no physical content itself (Figure 1a). This point may be understood as follows. In information terms, H represents the missing information (or uncertainty) about which microstate the system is in [12]. By maximising H subject to the given physical information, that information and no more is encoded into p_i. Any other distribution ${p}_{i}^{\prime}$ reflects more information (less uncertainty) than is actually given. Therefore, the statistical distribution p(x) of any observable x, constructed as above from the MaxEnt microstate distribution p_i, faithfully reflects the given physical information—no more, no less. Then, when the predicted p(x) differs from the experimental one, it means that the given or assumed physical information is erroneous or incomplete relative to the essential physics governing the statistical behaviour of x. Conversely, when MaxEnt predictions of p(x) agree with experiment, it means that the given information captures the essential physics governing the statistical behaviour of x, all other information being irrelevant.

Experimental falsification of MaxEnt is thus a meaningless concept because, in Jaynes’ interpretation, MaxEnt is not a physical principle. When MaxEnt predictions disagree with experiment, it is the assumed essential physics (the message) that is falsified, not MaxEnt (the messenger). Jaynes’ interpretation also makes it clear that MaxEnt can be applied to any system, equilibrium or non-equilibrium, physical, biological or otherwise. In all cases, MaxEnt makes no claim as to the reality of its predictions; its role within the general programme of statistical mechanics is to ensure that the assumed essential physics (and no other physical assumptions) is faithfully represented in the predictions that are compared with experiment. MaxEnt thus plays a central role in the identification of the actual essential physics, which would otherwise be obscured by the implicit introduction of extra physical assumptions.

Nevertheless, one may still question whether MaxEnt is the appropriate algorithm to use in the first place. For example, maximum *relative entropy* (MaxREnt) is the appropriate generalisation of MaxEnt when the microstates are not *a priori* equiprobable (e.g., [15]), while Maximum Probability (e.g., [16]) is a combinatorial approach to statistical mechanics which differs significantly from MaxEnt (or MaxREnt) when N is small (Section 2.3). However, that question falls within the realm of statistical inference, not physics.
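To make the MaxREnt generalisation concrete: maximising the relative entropy $-{\sum}_{i}{p}_{i}\,\text{ln}\left({p}_{i}/{q}_{i}\right)$ under the same mean-energy and normalisation constraints gives $p_i \propto q_i e^{-\lambda E_i}$, which reduces to the MaxEnt form when the prior $q_i$ is uniform. The sketch below (my illustration, with arbitrary two-state values not taken from the cited references) compares the two solutions:

```python
import math

# Hypothetical two-state example (illustrative values only).
E = [0.0, 1.0]        # microstate energies
q = [0.25, 0.75]      # non-uniform prior over microstates
lam = 1.3             # Lagrange multiplier, here simply fixed by hand

# MaxREnt solution: p_i proportional to q_i * exp(-lam * E_i)
w = [qi * math.exp(-lam * e) for qi, e in zip(q, E)]
p_rel = [wi / sum(w) for wi in w]

# MaxEnt solution: the special case of a uniform prior q_i = const
w_u = [math.exp(-lam * e) for e in E]
p_max = [wi / sum(w_u) for wi in w_u]

print(p_rel, p_max)   # the two distributions differ when q is non-uniform
```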

#### 2.3. Does MaxEnt require N to be large?

This question arises at both a theoretical and a practical level. As an alternative to Jaynes’ information-based formulation of statistical mechanics, the Maximum Probability (MaxProb) formulation (e.g., [16]) is a combinatorial approach which takes as its basic concept the number of ways W that a given macrostate can be realised microscopically. For example, the number of ways in which N distinguishable entities can be assigned to M distinguishable states such that there are n_i entities in state i is given by the multinomial coefficient:

$$W=\frac{N!}{{\prod}_{i=1}^{M}{n}_{i}!}\qquad \left(7\right)$$

The macrostate {n_i} predicted by MaxProb is the one for which W is maximal, subject to given physical constraints. MaxProb is just Boltzmann’s state-counting principle [17] extended to all values of N (not just large N); in contrast, MaxEnt (without its information-based interpretation) is the statistical principle adopted by Gibbs [18], as illustrated in Section 2.2. The mathematical connection between MaxProb and MaxEnt occurs in the limit of large N and large n_i, for then Stirling’s approximation ($\text{ln}\,N!\approx N\,\text{ln}\,N$ and $\text{ln}\,{n}_{i}!\approx {n}_{i}\,\text{ln}\,{n}_{i}$) yields the asymptotic result:

$$\text{ln}\,W\approx -N{\sum}_{i}{p}_{i}\,\text{ln}\,{p}_{i}=NH\qquad \left(8\right)$$

where the state probability ${p}_{i}$ is defined as ${p}_{i}={n}_{i}/N$, i.e., the occupation frequency of state i. Within the MaxProb formulation of statistical mechanics, therefore, MaxEnt is an approximation to MaxProb that is only valid for large N (and large n_i), and one must apply corrections to MaxEnt when N is small (or when statistics other than multinomial are assumed) [19,20]. Conversely, if one accepts MaxEnt as the fundamental basis of statistical mechanics, it is the validity of MaxProb that is restricted to large N. MaxProb and MaxEnt also differ conceptually in their interpretation of p_i. In MaxProb, p_i is a physical frequency (a property of the real world), while in MaxEnt p_i is a Bayesian probability (describing our objective state of knowledge about the real world based on objectively-prescribed information) [12].
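The asymptotic equivalence of MaxProb and MaxEnt is easy to check numerically. The sketch below (my illustration, with assumed occupation numbers) compares the exact ln W, computed via log-factorials, against the large-N approximation NH for occupation frequencies held fixed at (0.5, 0.3, 0.2) while N grows:

```python
import math

def ln_W(n):
    """Exact ln of the multinomial coefficient W = N!/prod(n_i!) via lgamma."""
    N = sum(n)
    return math.lgamma(N + 1) - sum(math.lgamma(ni + 1) for ni in n)

def N_times_H(n):
    """Large-N approximation ln W ~ N*H with p_i = n_i/N."""
    N = sum(n)
    return -sum(ni * math.log(ni / N) for ni in n if ni > 0)

# Occupation frequencies p = (0.5, 0.3, 0.2), scaled up with N.
for N in (10, 100, 10000):
    n = [N // 2, 3 * N // 10, N // 5]
    exact, approx = ln_W(n), N_times_H(n)
    print(N, exact, approx, approx / exact)
```

For small N the approximation overestimates ln W substantially, while the ratio approaches 1 as N grows, illustrating why corrections to MaxEnt matter when N is small.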

Therefore, if one adopts MaxProb as the fundamental basis of statistical mechanics, there is a theoretical issue regarding the validity of MaxEnt for small N. In this paper, however, I am considering MEP within the framework of Jaynes’ information-based formulation of statistical mechanics, in which the fundamental concept is not W (Equation 7) but the Shannon information entropy H (Equation 1). Within Jaynes’ formulation, there is nothing intrinsic to the MaxEnt algorithm which demands that N be large; as we have seen, MaxEnt has no physical content.

Nevertheless, there remains the practical issue of whether MaxEnt still requires the assumption that N is large as part of the given physical information, if its predictions are ever to agree with experiment. Since MaxEnt predicts the probability distribution p(x) of a physical observable x, not just its mean value $X={\sum}_{i}{p}_{i}{x}_{i}$, it is pertinent here to consider how N affects the predicted and experimental p(x). If N is known to be large then, as a general rule, MaxEnt predicts that p(x) is sharply peaked about the mean X (the relative standard deviation typically declining as $1/\sqrt{N}$). An exception to this rule occurs at critical phase transitions (e.g., at the Curie temperature of a magnet). Even when N is of the order of Avogadro’s number MaxEnt predicts that, at or near the transition point, the so-called macroscopic order parameter x (e.g., the magnetisation) will fluctuate widely about its mean value—p(x) is very broad—and, indeed, the detailed nature of the predicted fluctuations (obtained from renormalisation group analyses of the MaxEnt solution [21]) has been confirmed experimentally (e.g., via the divergence of the magnetic susceptibility). The success of MaxEnt here reveals the essential physics governing the critical fluctuations (e.g., for a magnet at the Curie temperature, the thermal energy per degree of freedom is of the same order as the interaction energy between individual magnetic moments, resulting in a highly sensitive balance between the disordering effect of thermal agitation and the ordering effect of the magnetic interaction).
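The $1/\sqrt{N}$ rule is simplest to see for N independent two-state units (a hypothetical non-interacting system, chosen precisely so that no critical point arises; the parameter values are my own illustrative assumptions):

```python
import math

# N independent two-level units, each excited (energy eps) with canonical
# probability p_exc = exp(-lam*eps) / (1 + exp(-lam*eps)).
eps, lam = 1.0, 1.0   # illustrative values, not taken from the text
p_exc = math.exp(-lam * eps) / (1.0 + math.exp(-lam * eps))

def relative_std(N):
    """Relative standard deviation of the total energy of N independent units."""
    mean = N * p_exc * eps
    var = N * p_exc * (1.0 - p_exc) * eps**2   # variances add for independent units
    return math.sqrt(var) / mean

for N in (100, 10000, 1000000):
    print(N, relative_std(N))   # shrinks 10x for each 100x increase in N
```

Since the variance grows as N while the mean grows as N, the relative standard deviation falls exactly as $1/\sqrt{N}$, i.e., p(x) sharpens about the mean as N increases.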

If N is known to be small, then typically MaxEnt will predict large relative fluctuations in all observables—again, p(x) is broad. But this is also what one observes experimentally: as N decreases, the experimental behaviour becomes less reproducible under given experimental conditions, as fluctuations about the mean behaviour become more important (e.g., [22]).

**Figure 2.**
Different scenarios for MaxEnt-predicted vs. experimental probability distributions p(x) of some macroscopic observable x. N = number of microscopic degrees of freedom. Disagreement between the predicted and observed p(x) signals either (a) erroneous or (b) incomplete physics. Agreement between predicted and observed p(x) reveals the essential physics governing the statistical behaviour of x, whether that behaviour (c) is sharply peaked about the mean or (d) involves large fluctuations. MaxEnt remains practically useful whether N is large or small.


Thus, regardless of whether N is large or small, the predicted p(x) is the “best” possible one in the sense that it faithfully reflects the given physical information (which includes the size of N) and no more.

Figure 2 depicts various scenarios for the relation between the MaxEnt-predicted and experimental p(x). When there is disagreement between the predicted and experimental p(x)—whether sharply peaked or broad—it means that the given or assumed information is either erroneous (if the predicted and observed p(x) are both sharply peaked but mismatched, Figure 2a) or incomplete (if the predicted p(x) is broad but the observed p(x) is sharply peaked, Figure 2b). One can also envisage a scenario (not shown) in which the predicted p(x) is narrower than the experimental one, indicating an over-constrained set of physical assumptions. Conversely, agreement between the predicted and experimental p(x)—whether sharply peaked or broad—means that the given or assumed information captures the essential physics governing the statistical behaviour of x (Figure 2c, Figure 2d).

In summary, MaxEnt remains practically useful whether N is large or small, because MaxEnt predicts fluctuations as well as mean values, and fluctuations can be an important aspect of the macroscopic phenomena under investigation, especially for small N. However, MaxEnt is generally more efficient in exposing erroneous physical assumptions when N is large, for then both the predicted and observed p(x) are generally sharply peaked (Figure 2a).