Calibration Invariance of the MaxEnt Distribution in the Maximum Entropy Principle

The maximum entropy principle consists of two steps: The first step is to find the distribution which maximizes entropy under given constraints. The second step is to calculate the corresponding thermodynamic quantities. The second part is determined by the relation of the Lagrange multipliers to measurable physical quantities such as temperature or Helmholtz free energy/free entropy. We show that for a given MaxEnt distribution, a whole class of entropies and constraints leads to the same distribution but generally different thermodynamics. Two simple classes of transformations that preserve the MaxEnt distributions are studied: The first case is a transform of the entropy to an arbitrary increasing function of that entropy. The second case is the transform of the energetic constraint to a combination of the normalization and energetic constraints. We derive group transformations of the Lagrange multipliers corresponding to these transformations and determine their connections to thermodynamic quantities. For each case, we provide a simple example of this transformation.


Introduction
The maximum entropy principle (MEP) is one of the most fundamental concepts in equilibrium statistical mechanics. It was originally proposed by Jaynes [1,2] in order to connect information entropy introduced by Shannon and thermodynamic entropy introduced by Clausius, Boltzmann, and Gibbs. Although the MEP was originally introduced for the case of Shannon entropy, with the advent of generalized entropies [3][4][5][6][7][8][9][10][11][12][13][14][15][16][17] the natural effort was to apply the maximum entropy principle beyond the case of Shannon entropy. Another question that arose naturally is whether the MEP can be applied to constraints other than ordinary linear ones. Examples of the constraints that might be considered in connection with the MEP are escort constraints [18][19][20], Kolmogorov-Nagumo means [21,22], or more exotic types of constraints [23]. This prompted a discussion about the applicability of the principle for the case of generalized entropies [24,25] and nonlinear constraints and its thermodynamic interpretation [26][27][28][29][30]. The MEP is not the only extremal principle in statistical physics; let us mention, e.g., the principle of maximum caliber [31], which is useful in non-equilibrium physics. In this paper, however, we stick to the MEP, as it is the most widespread principle and the theory of generalized thermostatistics has been mainly focused on it. For a recent review of other principles, see [32]. For the discussion of the relation between entropy arising from information theory and thermodynamics, see [33]. For the sake of simplicity, let us consider the canonical ensemble, i.e., fluctuations in internal energy. For the case of the grand-canonical ensemble, one can obtain similar results to the ones presented in this paper for the case of a chemical potential µ.
In order to grasp the debate about the applicability of the MEP, let us emphasize that the MEP consists of two main parts: (I) Finding a distribution (MaxEnt distribution) that maximizes entropy under given constraints. (II) Plugging the distribution into the entropic functional and calculating physical quantities such as thermodynamic potentials, temperature, or response coefficients (specific heat, compressibility, etc.).
The first part is rather a mathematical procedure of finding a maximum subject to constraints. This is done by the method of Lagrange multipliers, by defining a Lagrange function in the form
$$\mathcal{L}(P) = S(P) - \alpha f_0(P) - \beta f_E(P),$$
where $S(P)$ is the entropy, $f_0(P) = 0$ and $f_E(P) = 0$ are the normalization and energy constraints, and $\alpha$ and $\beta$ are the Lagrange multipliers. The Lagrange multipliers' role at this stage is to ensure fulfillment of constraints as they are determined from the set of equations obtained from the maximization of the Lagrange function. This procedure is known in statistics as Softmax, a method used to infer distribution from given data. Shore and Johnson [34,35] therefore studied MEP as a statistical inference procedure and established a set of consistency axioms. Shore and Johnson's work sparked a debate about whether MEP for generalized entropies can also be understood as a statistical inference method satisfying the consistency requirements [24,36-41]. In [42], it was shown that the class of entropies satisfying the original Shore-Johnson axioms is wider than previously thought. Moreover, in [43], the connection between Shore-Johnson axioms and Shannon-Khinchin axioms was investigated and the equivalence of information theory and statistical inference axiomatics was established.
In the second part, the physical interpretation of entropy starts to arise. Similar to the case of Lagrangian mechanics, where the Lagrangian is the difference between kinetic and potential energy and the Lagrange multipliers play the role of the normal force to the constraints, here the entropy becomes a thermodynamic state variable. For Shannon entropy and linear constraints, the Lagrange multipliers become inverse temperature and free entropy, respectively.
The main aim of this paper is to discuss the relation between points (I) and (II). In the first part, it is possible to find a class of entropic functionals and constraints leading to the same MaxEnt distribution. However, in the second part, different entropy and/or constraints lead to different thermodynamics and different relations between physical quantities and Lagrange multipliers. The two main messages of this paper are listed below.
(i) For each MaxEnt distribution, there exists the whole class of entropies and constraints leading to generally different thermodynamics. (ii) It is possible to establish transformation relations of Lagrange parameters (and subsequently the thermodynamic quantities) for classes of entropies and constraints giving the same MaxEnt distribution.
We call the latter transformation relation calibration invariance of the MaxEnt distribution. A straightforward consequence is that in order to fully determine the statistical properties of a thermal system in equilibrium, it is not enough to measure the statistical distribution of energies.
The rest of the paper is organized as follows. In the next section, we briefly discuss the main aspects of MEP for the case of general entropic functional and general constraints. In the following two sections, we introduce two simple transformations of entropic functional (Section 3) and constraints (Section 4) that lead to the same MaxEnt distribution and derive transformations between the Lagrange multipliers. These transformations form a group. After the general derivation, we provide a few simple examples for each case. The last section is devoted to conclusions.

Maximum Entropy Principle in Statistical Physics
The maximum entropy principle is a way of obtaining a representative probability distribution from a limited amount of information. Our aim is to find the probability distribution of the system $P = \{p_i\}_{i=1}^n$ under the set of given constraints. In the simplest case, the principle can be formulated as follows: among all distributions satisfying the constraints, find the distribution $P$ that maximizes the entropy $S(P)$.
The normalization condition is considered in the regular form, i.e., $f_0(P) = \sum_i p_i - 1 = 0$. Moreover, we have a class of constraints, which originally described the average energy of the system. Therefore, we call them energy constraints. We consider only one energy constraint, for simplicity, although there can be more constraints, and they do not have to concern only internal energy but also other thermodynamic quantities. In the original formulation, the energy constraint is linear in probabilities, i.e.,
$$f_E(P) = \sum_i p_i \epsilon_i - E = 0, \tag{1}$$
where $\epsilon_i$ denotes the energy of the $i$-th state, but it can be generally any nonlinear function of probabilities; escort means provide an example. A large class of energy constraints can be written in a separable form, which means that $f_E(P) = \mathcal{E}(P) - E$, i.e., in the form expressing the "expected" internal energy (macroscopic variable) as a function of the probability distribution (microscopic variable). This class of constraints plays a dominant role in thermodynamic systems.
In order to find a solution of the maximum entropy principle, we use the common method of Lagrange multipliers, which can be done through maximization of the Lagrange function:
$$\mathcal{L}(P) = S(P) - \alpha f_0(P) - \beta f_E(P). \tag{2}$$
The maximization procedure leads to the set of equations
$$\frac{\partial \mathcal{L}}{\partial p_i} = 0, \qquad \frac{\partial \mathcal{L}}{\partial \alpha} = 0, \qquad \frac{\partial \mathcal{L}}{\partial \beta} = 0, \tag{3}$$
from which we determine the resulting MaxEnt distribution. In order to obtain a unique solution, we require that the entropic functional should be a Schur-concave symmetric function [42]. As a consequence, we obtain the values of the Lagrange multipliers $\alpha$ and $\beta$. From the strictly mathematical point of view, Lagrange multipliers are just auxiliary parameters to be solved from the set of Equation (3). However, in physics, Lagrange parameters also have a physical interpretation. In Lagrangian mechanics, Lagrange parameters play the role of the normal force to the constraints. Similarly, in ordinary statistical mechanics based on Shannon entropy $H(P) = -\sum_i p_i \log p_i$ and linear constraints (1), the Lagrange multipliers have a particular physical interpretation: $\beta$ is the inverse temperature, and $\alpha$ is given by the free entropy (up to an additive constant). Note that the free entropy is, similarly to Helmholtz free energy, a Legendre transform of entropy w.r.t. internal energy. For the case of ordinary thermodynamics (Shannon entropy and linear constraints), it is equal to the logarithm of the partition function. This interpretation is valid only in this case. When we use a different entropy functional or different constraints, these relations between Lagrange multipliers and thermodynamic quantities are no longer valid. This is the case even when the resulting MaxEnt distribution is the same.
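The role of β as a parameter fixed by the energy constraint can be illustrated numerically. The following is a minimal sketch, assuming Shannon entropy and the linear constraint (1), for which the MaxEnt distribution is the Boltzmann distribution; the energy spectrum, the target mean energy, and the helper names (`boltzmann`, `mean_energy`, `solve_beta`) are hypothetical choices made only for this illustration.

```python
import math

def boltzmann(energies, beta):
    """Shannon MaxEnt distribution for a linear energy constraint: p_i ~ exp(-beta * eps_i)."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(energies, beta):
    """Average energy sum_i p_i * eps_i of the Boltzmann distribution."""
    p = boltzmann(energies, beta)
    return sum(pi * e for pi, e in zip(p, energies))

def solve_beta(energies, target, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection on beta; the mean energy decreases monotonically with beta."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > target:
            lo = mid   # mean energy too high, so beta must be larger
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

energies = [0.0, 1.0, 2.0, 3.0]   # hypothetical spectrum
target_E = 1.0                    # hypothetical prescribed mean energy
beta = solve_beta(energies, target_E)
p = boltzmann(energies, beta)
```

Here β is obtained purely from the constraint; its identification with inverse temperature is an additional, physical step.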
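The main aim of this paper is to show how the invariance of the MaxEnt distribution affects the Lagrange multipliers and their relations to thermodynamic quantities. Let us now solve Equation (3). The first set of equations leads to
$$\frac{\partial S(P)}{\partial p_i} - \alpha \frac{\partial f_0(P)}{\partial p_i} - \beta \frac{\partial f_E(P)}{\partial p_i} = 0.$$
Let us assume the normalization in the usual way, which leads to $\partial f_0(P)/\partial p_i = 1$, and let us consider a separable energy constraint, so $\partial f_E(P)/\partial p_i = \partial \mathcal{E}(P)/\partial p_i$. The resulting probability distribution can be expressed as
$$p_i = \left(\frac{\partial S}{\partial p_i}\right)^{(-1)}\left(\alpha + \beta \frac{\partial \mathcal{E}(P)}{\partial p_i}\right),$$
where $(-1)$ denotes the inverse function of $\partial S/\partial p_i$ (provided it exists and is unique). We can express $\alpha$ by multiplying the equation by $p_i$ and summing over $i$, which leads to
$$\alpha = \langle \nabla_P S(P) \rangle - \beta \langle \nabla_P \mathcal{E}(P) \rangle,$$
where $\langle X \rangle = \sum_i x_i p_i$ and $\nabla_P = \left(\frac{\partial}{\partial p_1}, \dots, \frac{\partial}{\partial p_n}\right)$. By plugging back to the previous equation, we can get $\beta$ as
$$\beta = \frac{\Delta_i(\nabla_P S(P))}{\Delta_i(\nabla_P \mathcal{E}(P))},$$
where $\Delta_i(X) = x_i - \langle X \rangle$ is the difference from the average. The solution of Equation (3) depends on the internal energy $E$. However, in thermodynamics it is natural to invert the relation $\beta = \beta(E)$ and express the relevant quantities in terms of $\beta$, so $E = E(\beta)$. With that, we can calculate the dependence of entropy on $\beta$:
$$\frac{\partial S}{\partial \beta} = \sum_i \frac{\partial S}{\partial p_i}\frac{\partial p_i}{\partial \beta} = \sum_i \left(\alpha + \beta \frac{\partial f_E}{\partial p_i}\right)\frac{\partial p_i}{\partial \beta} = -\beta \frac{\partial f_E}{\partial E}\frac{\partial E}{\partial \beta},$$
where we used that $\sum_i \partial p_i/\partial \beta = 0$ and that the constraint $f_E(P) = 0$ holds for every $\beta$. For separable energy constraints, $\frac{\partial f_E}{\partial E} = -1$, so we obtain the well-known relation
$$\frac{\partial S}{\partial E} = \beta.$$
Let us now define the Legendre conjugate of entropy called free entropy (also called Jaynes parameter [44] or Massieu function [45]):
$$\psi = S - \beta E.$$
Free entropy is connected to Helmholtz free energy as $\psi = -\beta F$. The difference between $\alpha$ and $\psi$ can be expressed as
$$\psi - \alpha = \left(S - \langle \nabla_P S \rangle\right) - \beta \left(E - \langle \nabla_P \mathcal{E} \rangle\right).$$
Therefore, we can understand the difference $\psi - \alpha$ as the Legendre transform of $\psi$ with respect to $P$. From this, we see that the difference between $\psi$ and $\alpha$ is a constant (not depending on thermodynamic quantities) if two independent conditions are fulfilled, i.e., $E = \langle \nabla_P \mathcal{E}(P) \rangle$ and $S = \langle \nabla_P S \rangle + a$. The former condition leads to linear energy constraints, while the latter one leads to the conclusion that the entropy must be in trace form $S(P) = \sum_i g(p_i)$. Moreover, the function $g$ has to fulfill the following equation,
$$g(x) - ax = x g'(x), \tag{14}$$
leading to $g(x) = -ax \log(x) + bx$, which is equivalent to Shannon entropy.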
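The expressions for α and β in terms of gradients can be checked numerically. The sketch below assumes Shannon entropy with the linear constraint (1) and the sign convention in which the Lagrange function is S − α f_0 − β f_E; the spectrum and the value of β are hypothetical. It evaluates β from the deviations Δ_i for every state and confirms that ψ − α is the constant a = 1 expected for Shannon entropy under this convention.

```python
import math

energies = [0.0, 1.0, 2.0, 3.0]   # hypothetical spectrum
beta_true = 0.7                   # hypothetical inverse temperature

# Boltzmann distribution, i.e., the Shannon MaxEnt distribution
w = [math.exp(-beta_true * e) for e in energies]
Z = sum(w)
p = [x / Z for x in w]

def avg(x):
    """Average <X> = sum_i x_i p_i."""
    return sum(xi * pi for xi, pi in zip(x, p))

grad_S = [-math.log(pi) - 1.0 for pi in p]   # dS/dp_i for Shannon entropy
grad_E = energies[:]                          # dE/dp_i for the linear constraint

# beta = Delta_i(grad S) / Delta_i(grad E); the value is the same for every i
betas = [(gs - avg(grad_S)) / (ge - avg(grad_E))
         for gs, ge in zip(grad_S, grad_E)]

# alpha = <grad S> - beta <grad E>
alpha = avg(grad_S) - beta_true * avg(grad_E)

S = -sum(pi * math.log(pi) for pi in p)
E = avg(energies)
psi = S - beta_true * E            # free entropy, equal to log Z here
```

The state-independence of `betas` reflects the fact that the stationarity equations hold for each i with the same pair of multipliers.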
In the next sections, we will explore how the transformation of the entropy and the energy constraint that leaves the MaxEnt distribution invariant affects the Lagrange multipliers and their relation to thermodynamic quantities.

Calibration Invariance of MaxEnt Distribution with Entropy Transformation
The simplest transformation of the Lagrange functional that leaves the MaxEnt distribution invariant is to consider an arbitrary increasing function of entropy, i.e., we replace $S(P)$ by $c(S(P))$, where $c'(x) > 0$. Let us note that this transform preserves the uniqueness of the MEP because it is easy to show that if $S(P)$ is Schur-concave, $c(S(P))$ is also Schur-concave [42], which is a sufficient condition for uniqueness of the MaxEnt distribution.
In this case, the Lagrange equations are adjusted as follows,
$$c'(S(P)) \frac{\partial S(P)}{\partial p_i} - \alpha_c - \beta_c \frac{\partial \mathcal{E}(P)}{\partial p_i} = 0,$$
leading to
$$\alpha_c = c'(S(P)) \langle \nabla_P S(P) \rangle - \beta_c \langle \nabla_P \mathcal{E}(P) \rangle \tag{16}$$
and
$$\beta_c = c'(S(P)) \frac{\Delta_i(\nabla_P S(P))}{\Delta_i(\nabla_P \mathcal{E}(P))},$$
so we get that the function $c$ causes a rescaling of $\alpha$ and $\beta$,
$$\alpha_c = c'(S(P))\, \alpha, \qquad \beta_c = c'(S(P))\, \beta,$$
while their ratio remains unchanged, i.e., $\alpha_c/\beta_c = \alpha/\beta$. Actually, the set of increasing functions induces a group of transformations of the Lagrange multipliers, because it is easy to show that the Lagrange parameters related to the entropy $c_1(c_2(S(P)))$ satisfy
$$\beta_{c_1 \circ c_2} = c_1'(c_2(S(P))) \cdot c_2'(S(P))\, \beta = c_1'(c_2(S(P)))\, \beta_{c_2}, \tag{20}$$
which can be described as the group operation $(c_1 \circ c_2)' \to c_1'(c_2) \cdot c_2'$. An important property of this transformation is that it changes the extensive-intensive duality of the conjugated pair of thermodynamic variables and the respective forces while it maintains the distribution. Notably, by changing the entropic functional from extensive (i.e., S(n) ∼ U(n)) to non-extensive, it changes $\beta$ from intensive (i.e., size-independent, at least in the thermodynamic limit) to non-intensive, i.e., explicitly size-dependent. This point has been discussed in connection with q-non-extensive statistical physics in [29,30], and the relation to the zeroth law of thermodynamics was shown in [46]. As one can see from the example below, although Rényi entropy and Tsallis entropy have the same maximizer, the corresponding thermodynamics is different. While Rényi entropy is additive (and therefore extensive for systems where U(n) ∼ n) and the temperature is intensive, Tsallis entropy is non-extensive, and the corresponding temperature explicitly depends on the size of the system.
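The rescaling of the multipliers can be verified numerically. The following is a minimal sketch, assuming Shannon entropy H as the original entropy, the linear energy constraint, and c(x) = exp(x), so that c(H) is the entropy power; the spectrum, the value of β, and the helper names are hypothetical. Gradients are taken by central finite differences, so the multipliers for c(H) are computed without using the chain rule explicitly.

```python
import math

energies = [0.0, 1.0, 2.0, 3.0]   # hypothetical spectrum
beta = 0.7                        # hypothetical Shannon multiplier

w = [math.exp(-beta * e) for e in energies]
Z = sum(w)
p = [x / Z for x in w]            # MaxEnt distribution of Shannon entropy

def avg(x):
    """Average <X> = sum_i x_i p_i."""
    return sum(xi * pi for xi, pi in zip(x, p))

def shannon(q):
    return -sum(x * math.log(x) for x in q)

def entropy_power(q):
    return math.exp(shannon(q))   # c(H) with c = exp

def num_grad(f, q, h=1e-6):
    """Central finite-difference gradient, treating all p_i as independent."""
    grad = []
    for i in range(len(q)):
        up, dn = q[:], q[:]
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad

def multipliers(f):
    """alpha and beta from the gradient formulas, evaluated at the distribution p."""
    g = num_grad(f, p)
    E = avg(energies)
    b = (g[0] - avg(g)) / (energies[0] - E)
    a = avg(g) - b * E
    return a, b

alpha_H, beta_H = multipliers(shannon)
alpha_P, beta_P = multipliers(entropy_power)
scale = entropy_power(p)          # c'(H) = exp(H) for c = exp
```

Both multipliers are rescaled by the same factor `scale`, so the ratio `alpha_P / beta_P` coincides with `alpha_H / beta_H` up to finite-difference error, while the distribution `p` is untouched.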
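Let us finally mention that the difference between free entropy and the Lagrange parameter $\alpha$ transforms as
$$\psi_c - \alpha_c = c(S) - c'(S) \langle \nabla_P S \rangle - \beta_c \left(E - \langle \nabla_P \mathcal{E} \rangle\right) = c'(S)\,(\psi - \alpha) + c(S) - c'(S)\, S.$$
While free entropy and other thermodynamic potentials are transformed, the heat change remains invariant under this transformation:
$$\delta Q = T\, \mathrm{d}S = \frac{1}{\beta}\, \mathrm{d}S = \frac{1}{\beta_c}\, \mathrm{d}c(S) = T_c\, \mathrm{d}S_c,$$
where $T_c = 1/\beta_c$ and $S_c = c(S)$.
Example 1.
• Rényi entropy and Tsallis entropy: Consider Rényi entropy $R_q(P) = \frac{1}{1-q} \ln \sum_i p_i^q$ and Tsallis entropy $S_q(P) = \frac{1}{1-q}\left(\sum_i p_i^q - 1\right)$. The relation between them is $R_q(P) = c_q(S_q(P)) = \frac{1}{1-q} \ln\left[1 + (1-q) S_q(P)\right]$, and therefore we obtain that $c_q'(S_q(P)) = \frac{1}{1 + (1-q) S_q(P)} = \frac{1}{\sum_i p_i^q}$. The difference between free entropy and $\alpha$ can be obtained by inserting $c_q$ into the transformation rule above. One can therefore see that even though Rényi and Tsallis entropy lead to the same MaxEnt distribution, their thermodynamic quantities, such as temperature or free entropy, are different.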
Whether the system follows Rényi or Tsallis entropy depends on additional facts, e.g., the (non-)extensivity and (non-)intensivity of thermodynamic quantities.
• Shannon entropy and entropy power: A similar example is provided by Shannon entropy $H(P) = \sum_i p_i \ln (1/p_i)$ and entropy power $\mathcal{P}(P) = \prod_i (1/p_i)^{p_i}$. The relation between them is simply
$$H(P) = c(\mathcal{P}(P)) = \log(\mathcal{P}(P)), \tag{26}$$
so we obtain that
$$c'(\mathcal{P}(P)) = 1/\mathcal{P}(P) = \exp(-H(P)). \tag{27}$$
For the difference between free entropy and $\alpha$, we obtain that $\psi_{\mathcal{P}} - \alpha_{\mathcal{P}} = \mathcal{P}(P)\,(1 - \log \mathcal{P}(P))$.
Therefore, we see that even though the MaxEnt distribution remains unchanged, the relation between $\alpha$ and free entropy is different.
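The Rényi-Tsallis case can be checked directly. The sketch below assumes the standard definitions R_q = ln(Σ p_i^q)/(1−q) and S_q = (Σ p_i^q − 1)/(1−q); the value of q and the two test distributions are hypothetical. It verifies that the gradients of the two entropies are proportional with factor c_q'(S_q) = 1/Σ p_i^q (so their stationarity conditions select the same distribution), and contrasts the additivity of R_q with the non-additivity of S_q for independent systems.

```python
import math

def tsallis(p, q):
    return (sum(x ** q for x in p) - 1.0) / (1.0 - q)

def renyi(p, q):
    return math.log(sum(x ** q for x in p)) / (1.0 - q)

def c_q(s, q):
    """Increasing function mapping Tsallis entropy to Renyi entropy."""
    return math.log(1.0 + (1.0 - q) * s) / (1.0 - q)

def c_q_prime(s, q):
    """Derivative of c_q; equals 1 / sum_i p_i^q when s is the Tsallis entropy."""
    return 1.0 / (1.0 + (1.0 - q) * s)

def product(p1, p2):
    """Joint distribution of two independent systems."""
    return [a * b for a in p1 for b in p2]

q = 0.8                      # hypothetical entropic index
p1 = [0.5, 0.3, 0.2]         # hypothetical distributions
p2 = [0.6, 0.4]

s1 = tsallis(p1, q)
norm = sum(x ** q for x in p1)

# Analytic gradients with respect to p_i (all p_i treated as independent)
grad_tsallis = [q * x ** (q - 1.0) / (1.0 - q) for x in p1]
grad_renyi = [q * x ** (q - 1.0) / ((1.0 - q) * norm) for x in p1]

joint = product(p1, p2)
renyi_excess = renyi(joint, q) - renyi(p1, q) - renyi(p2, q)
tsallis_excess = tsallis(joint, q) - tsallis(p1, q) - tsallis(p2, q)
```

`renyi_excess` vanishes, whereas `tsallis_excess` equals (1−q) S_q(P_1) S_q(P_2), which is the non-additive term responsible for the size dependence of the corresponding temperature.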

Calibration Invariance of MaxEnt Distribution with Constraints Transformation
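Similarly, one can uncover the invariance of the MaxEnt distribution when the constraints are transformed in a certain way. Generally, if two sets of constraints define the same domain, the resulting maximum entropy principle should lead to equivalent results. We will not be so general, but we focus on a specific situation, which might be quite interesting for thermodynamic applications. Let us recall the two conditions which we assume: normalization $f_0(P) = 0$ and energy constraint $f_E(P) = 0$. Let us investigate the latter. Similarly to the previous case, it is possible to take any function $g$ of $f_E(P)$, for which $g(y) = 0$ if $y = 0$. More generally, we can also take into account the normalization constraint and replace the original energy condition by
$$g(f_0(P), f_E(P)) = 0 \tag{30}$$
for any $g(x, y)$, for which $g(x, y) = 0 \Rightarrow y = 0$. Let us investigate the maximum entropy principle for this case. We can express the Lagrange function as
$$\mathcal{L}_g(P) = S(P) - \alpha_g f_0(P) - \beta_g\, g(f_0(P), f_E(P)),$$
which leads to a set of equations
$$\frac{\partial S(P)}{\partial p_i} - \alpha_g \frac{\partial f_0(P)}{\partial p_i} - \beta_g \left[ G^{(1,0)} \frac{\partial f_0(P)}{\partial p_i} + G^{(0,1)} \frac{\partial f_E(P)}{\partial p_i} \right] = 0,$$
where $G^{(1,0)} = \left.\frac{\partial g(x,y)}{\partial x}\right|_{(0,0)}$ and $G^{(0,1)} = \left.\frac{\partial g(x,y)}{\partial y}\right|_{(0,0)}$. We take again into account that $\frac{\partial f_0(P)}{\partial p_i} = 1$, multiply the equations by $p_i$ and sum over $i$. This gives us
$$\alpha_g = \langle \nabla_P S \rangle - \beta_g \left[ G^{(1,0)} + G^{(0,1)} \langle \nabla_P \mathcal{E} \rangle \right].$$
By plugging $\alpha_g$ back, we end with the relation for $\beta_g$ (assuming $G^{(0,1)} \neq 0$):
$$\beta_g = \frac{1}{G^{(0,1)}} \frac{\Delta_i(\nabla_P S)}{\Delta_i(\nabla_P \mathcal{E})}.$$
For $\alpha_g$ we end with
$$\alpha_g = \langle \nabla_P S \rangle - \frac{\Delta_i(\nabla_P S)}{\Delta_i(\nabla_P \mathcal{E})} \left[ \frac{G^{(1,0)}}{G^{(0,1)}} + \langle \nabla_P \mathcal{E} \rangle \right].$$
Thus, we end again with a rescaling of $\alpha_g$ and $\beta_g$, which reads
$$\alpha_g = \alpha - \frac{G^{(1,0)}}{G^{(0,1)}}\, \beta, \qquad \beta_g = \frac{\beta}{G^{(0,1)}}.$$
The ratio of Lagrange multipliers is also transformed, so we get
$$\frac{\alpha_g}{\beta_g} = G^{(0,1)} \frac{\alpha}{\beta} - G^{(1,0)}.$$
Again, the set of all functions fulfilling the aforementioned condition forms a group. The group operation can be described by the relation between the coefficients $G^{(1,0)}$ and $G^{(0,1)}$ for the composite function $g(x, y) = g_1(x, g_2(x, y))$. We obtain that
$$G^{(1,0)} = G_1^{(1,0)} + G_1^{(0,1)} G_2^{(1,0)}, \qquad G^{(0,1)} = G_1^{(0,1)} G_2^{(0,1)},$$
which leads to the group relations
$$\alpha_g = \alpha_{g_2} - \frac{G_1^{(1,0)}}{G_1^{(0,1)}}\, \beta_{g_2}, \qquad \beta_g = \frac{\beta_{g_2}}{G_1^{(0,1)}}.$$
Example 2. Here we mention two simple examples of the aforementioned transformation.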
• Energy shift: Under this scheme, we can consider a constant shift $E'$ in the energy spectrum. Let us rewrite the shifted constraint $f'_E(P)$ in the following form,
$$f'_E(P) = \sum_i p_i (\epsilon_i - E') - (E - E') = f_E(P) - E' f_0(P),$$
which allows us to identify the function $g(x, y)$ as
$$g(x, y) = y - E' x.$$
We obtain $G^{(1,0)} = -E'$ and $G^{(0,1)} = 1$, which means that $\alpha = \alpha' - \beta E'$, where $\alpha'$ is the multiplier of the normalization constraint in the shifted problem.
• Latent escort means: Apart from linear means, it is possible to use some generalized approaches. One of these examples is provided by the so-called escort mean:
$$\langle E \rangle_q = \frac{\sum_i p_i^q \epsilon_i}{\sum_i p_i^q},$$
which for $q = 1$ becomes an ordinary linear mean when $P = \{p_i\}_{i=1}^n$ is normalized to one. When we use this class of means in the maximum entropy principle, the normalization is enforced by the normalization condition $f_0(P) = 0$; therefore, for $q = 1$ we obtain the same results. Nevertheless, by taking $q = 1$ for the results with the escort distribution, the energy constraint is actually expressed as
$$\frac{\sum_i p_i \epsilon_i}{\sum_i p_i} - E = 0,$$
which can be understood in the same way as considered before in this section, i.e., as a combination of a normalization constraint and an energy constraint. In this case, the function $g$ has the following form,
$$g(x, y) = \frac{y + E}{x + 1} - E.$$
Therefore, we obtain that $G^{(1,0)} = -E$ and $G^{(0,1)} = 1$, which corresponds to the previous example for $E' = E$. Therefore, the latent escort mean can be understood in terms of the MaxEnt procedure as the shift of the energy spectrum by its average energy.
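Both examples can be checked numerically. The sketch below assumes Shannon entropy, the sign convention in which the Lagrange function is S − α f_0 − β f_E, and a shift of the form ε_i → ε_i − E'; the spectrum, β, the shift, and the helper names are hypothetical. It confirms that shifting the spectrum leaves the distribution unchanged while α moves by βE', and it evaluates G^(1,0) and G^(0,1) for the function g(x, y) = (y + E)/(x + 1) − E, which is the form the q = 1 escort constraint takes when written in terms of f_0 and f_E.

```python
import math

def boltzmann(energies, beta):
    w = [math.exp(-beta * e) for e in energies]
    z = sum(w)
    return [x / z for x in w]

def alpha_shannon(energies, beta):
    """alpha = <grad S> - beta <grad E> for Shannon entropy and a linear constraint."""
    p = boltzmann(energies, beta)
    avg_grad_S = sum(pi * (-math.log(pi) - 1.0) for pi in p)
    avg_grad_E = sum(pi * e for pi, e in zip(p, energies))
    return avg_grad_S - beta * avg_grad_E

energies = [0.0, 1.0, 2.0, 3.0]   # hypothetical spectrum
beta = 0.7                        # hypothetical multiplier
shift = 0.4                       # hypothetical energy shift E'

shifted = [e - shift for e in energies]
p = boltzmann(energies, beta)
p_shifted = boltzmann(shifted, beta)

alpha = alpha_shannon(energies, beta)
alpha_shifted = alpha_shannon(shifted, beta)   # expected: alpha + beta * shift

# Escort-type constraint at q = 1 written as g(f_0, f_E)
E = sum(pi * e for pi, e in zip(p, energies))

def g(x, y):
    return (y + E) / (x + 1.0) - E

h = 1e-6
G10 = (g(h, 0.0) - g(-h, 0.0)) / (2 * h)   # derivative in x at (0, 0)
G01 = (g(0.0, h) - g(0.0, -h)) / (2 * h)   # derivative in y at (0, 0)
```

With G10 ≈ −E and G01 ≈ 1, the escort-type constraint reproduces the energy-shift case with the shift equal to the average energy.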

Conclusions
In this paper, we have discussed the calibration invariance of the MEP, which means that for a given MaxEnt distribution, there exists a whole class of entropies and constraints that lead to different thermodynamics (thermodynamic quantities and response coefficients generally have different behavior; for example, from an intensive temperature we can obtain a temperature that explicitly depends on the size of the system). We have stressed that the MEP procedure consists of two parts, where the first part, consisting of determining the MaxEnt distribution, is rather a mathematical tool, while the second part, making the connection between Lagrange multipliers and thermodynamic quantities, is specific to the application of the MEP in statistical physics. Indeed, the paper does not cover all possible transformations leading to the same MaxEnt distribution (let us mention, at least, the additive duality of Tsallis entropy, where maximizing $S_{2-q}$ with a linear constraint leads to the same result as maximizing $S_q$ with escort constraints [47]). The main lesson of this paper is that in order to fully determine a thermal system in equilibrium, we need to measure not only the probability distribution, but also all relevant thermodynamic quantities (such as entropy). Moreover, the transformation between Lagrange parameters and its connection to thermodynamic potentials can be useful in situations when one is not certain about the exact form of entropy.