Conceptual Inadequacy of the Shore and Johnson Axioms for Wide Classes of Complex Systems

It is by now well known that the Boltzmann-Gibbs-von Neumann-Shannon logarithmic entropic functional ($S_{BG}$) is inadequate for wide classes of strongly correlated systems: see, for instance, Brukner and Zeilinger's 2001 {\it Conceptual inadequacy of the Shannon information in quantum measurements}; the same holds for many other systems exhibiting various forms of complexity. On the other hand, the Shannon and Khinchin axioms uniquely mandate the BG form $S_{BG}=-k\sum_i p_i \ln p_i$, and the Shore and Johnson axioms follow the same path. Many natural, artificial and social systems have been satisfactorily approached with nonadditive entropies such as $S_q=k \frac{1-\sum_i p_i^q}{q-1}$ ($q \in {\cal R}; \,S_1=S_{BG}$), the basis of nonextensive statistical mechanics. Consistently, the Shannon 1948 and Khinchin 1953 uniqueness theorems have already been generalized in the literature, by Santos 1997 and Abe 2000 respectively, in order to uniquely mandate $S_q$. We argue here that the same remains to be done with the Shore and Johnson 1980 axioms. We arrive at this conclusion by analyzing specific classes of strongly correlated complex systems that await such a generalization.

$\sum_{i=1}^{W} p_i = 1$), which scales with $N$ like its equal-probability particular instance, namely $S_{BG} = k\ln W$. Indeed, for the independent class, we have $S(N) = k\ln W(N) \sim N$, thus providing an extensive entropy, as thermodynamically required.
In relevant contrast with the above case, strong correlations between the $N$ elements do exist in a great variety of natural, artificial and social systems. For example, it might be that $W(N) \sim N^{\rho}$ ($\rho > 0$) (referred to as the power-law class), hence $W(N+1) \sim W(N)\left(\frac{N+1}{N}\right)^{\rho} \sim W(N)\left(1+\frac{\rho}{N}\right)$. The entropy corresponding to this case is $S_q = -k\sum_{i=1}^{W} p_i \ln_{2-q} p_i$, with $\ln_q z \equiv \frac{z^{1-q}-1}{1-q}$ ($\ln_1 z = \ln z$) and $q = 1 - 1/\rho$. Indeed, the extremal $S_q$ (occurring for equal probabilities) is given by $S_q = k\ln_q W$, and, for that specific value of $q$, we verify that $S_{q=1-1/\rho}(N) = k\ln_{1-1/\rho} W(N) \sim k\rho N \propto N$, thus once again providing an extensive entropy, as thermodynamically required. Another example of strong correlations between the $N$ elements of the system occurs when $W(N) \sim \nu^{N^{\gamma}}$ ($\nu > 1$; $0 < \gamma < 1$) (referred to as the stretched-exponential class), hence $W(N+1) \sim W(N)\,\frac{\nu^{(N+1)^{\gamma}}}{\nu^{N^{\gamma}}} \sim W(N)\left(1+\frac{\gamma\ln\nu}{N^{1-\gamma}}\right)$. An entropy corresponding to this case is $S_\delta = k\sum_{i=1}^{W} p_i \left(\ln\frac{1}{p_i}\right)^{\delta}$ with $\delta = 1/\gamma$ (footnote on page 69 of [4]; see also [7]). Indeed, the extremal $S_\delta$ (occurring for equal probabilities) is given by $S_\delta = k(\ln W)^{\delta}$, and, for that specific value of $\delta$, we verify that $S_{\delta=1/\gamma}(N) \sim k(\ln\nu)^{1/\gamma} N \propto N$, thus once again providing an extensive entropy, as thermodynamically required. Notice that, in the limit $N \to \infty$, $N^{\rho} \ll \nu^{N^{\gamma}} \ll \mu^{N}$, which means that the phase-space Lebesgue measure corresponding to the power-law and stretched-exponential classes vanishes (relative to that of the exponential class). And this is the deep reason why, in the presence of strong correlations (e.g., the power-law and stretched-exponential classes), the BG entropy is generically no longer physically appropriate, since it violates the thermodynamical requirement of entropic extensivity (the reasons why entropy is in all cases required to be thermodynamically extensive remain outside the scope of the present Comment; the interested reader can, however, refer to [8,9]). The above cases are summarized in Table I.
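As a rough numerical check of the extensivity argument above, the following Python sketch evaluates the equal-probability (extremal) entropies per element for one illustrative member of each class. The parameter values ($\mu = 3$, $\rho = 2$, $\nu = 2$, $\gamma = 1/2$) and the function names are arbitrary choices made for illustration, not taken from the text; $k = 1$ throughout.

```python
import math

def ln_q(log_z, q):
    """q-logarithm ln_q z = (z**(1-q) - 1)/(1-q), taking log z as input (ln_1 z = ln z)."""
    if q == 1.0:
        return log_z
    return math.expm1((1.0 - q) * log_z) / (1.0 - q)

def S_q(probs, q, k=1.0):
    """S_q = k (1 - sum_i p_i^q)/(q - 1); at q = 1 this is S_BG = -k sum_i p_i ln p_i."""
    probs = [p for p in probs if p > 0.0]
    if q == 1.0:
        return -k * sum(p * math.log(p) for p in probs)
    return k * (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

# Illustrative parameters (our choice): W = mu^N, N^rho, nu^(N^gamma)
mu, rho, nu, gamma = 3.0, 2.0, 2.0, 0.5
q = 1.0 - 1.0 / rho      # index making S_q extensive for the power-law class
delta = 1.0 / gamma      # index making S_delta extensive for the stretched-exponential class

def per_element(N):
    """Extremal (equal-probability) entropies divided by N, with k = 1; using log W avoids overflow."""
    log_W_exp = N * math.log(mu)
    log_W_pow = rho * math.log(N)
    log_W_str = N ** gamma * math.log(nu)
    return {
        "exponential, S_BG": log_W_exp / N,
        "power-law, S_BG": log_W_pow / N,
        "power-law, S_q": ln_q(log_W_pow, q) / N,
        "stretched-exp, S_BG": log_W_str / N,
        "stretched-exp, S_delta": log_W_str ** delta / N,
    }

for N in (10**2, 10**4, 10**6):
    print(N, {name: round(value, 6) for name, value in per_element(N).items()})
```

With these assumed parameters, only $S_{BG}/N$ for the exponential class, $S_q/N$ with $q = 1-1/\rho$ for the power-law class, and $S_\delta/N$ with $\delta = 1/\gamma$ for the stretched-exponential class approach finite nonzero limits (here $\ln 3$, $\rho = 2$ and $(\ln 2)^2$), while $S_{BG}/N \to 0$ for the two strongly correlated classes. Working with $\ln W$ rather than $W$ is simply a design choice to avoid floating-point overflow for large $N$.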
We see therefore that the SJ set of axioms, demanding as it does system independence, is not applicable unless we have indisputable reasons to believe that the data we are facing correspond to a case belonging to the exponential class, and by no means to strongly correlated cases such as those belonging to the power-law or stretched-exponential classes, or even others [10][11][12]. Even if we were not particularly interested in thermodynamics, it is clear that the scenario of system independence must be assumed for the data in order to justify the use of the SJ framework. If we may insist, the actual data must be at least minimally consistent with a scenario of system independence; if they are not, the whole inference procedure needs to follow a different path. This is the central point that is missed in [1]. Unfortunately, [1] contains a few more inadvertences or inadequacies, the main ones of which we point out now. (a) On the nature of the index q: It is emphasized in [1] that $q$ would be obtained not from first principles but only as a fitting parameter. To support this viewpoint, the authors quote a nontechnical article published 12 years ago (their reference [25]). Regrettably, they seem to be unaware of the many examples where $q$ has in fact been obtained from (dynamical) first principles. Let us quote here some of them, among many others: (i) The value of $q$ (both for the sensitivity to the initial conditions and for the entropy production per unit time) at the Feigenbaum point of the logistic map is [13,14] $q = 1 - \frac{\ln 2}{\ln\alpha_F} = 0.2444877013\ldots$
(1018 exact digits are presently known: see, for instance, [4]; $\alpha_F$ is the Feigenbaum universal constant); (ii) The value of $q$ for the q-Gaussian distribution of velocities of cold atoms in optical lattices is given [15,16] by $q = 1 + 44E_R/U_0$, where $E_R$ is the recoil energy and $U_0$ the potential depth; (iii) The value of $q$ that yields an extensive block entropy in the class of one-dimensional systems at a quantum critical point characterized by the central charge $c$ is given by $q = \frac{\sqrt{9+c^2}-3}{c}$ [17]; (iv) The stationary-state distribution in space (and, in fact, also in momenta) of overdamped systems of type-II superconductors in the presence of an external confining potential is given [18] by a q-exponential of the potential with $q = 0$, which, within the variational framework that uses linear constraints, corresponds to an entropic index $q = 2 - 0 = 2$.
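The closed-form values of $q$ quoted in items (i) to (iii) can be evaluated directly. The sketch below is illustrative only: the numerical value of the Feigenbaum constant $\alpha_F$ is supplied by us from the standard literature (it is not given in the text), the expression used for item (iii) is the standard one for this class as we recall it, and the function names are hypothetical.

```python
import math

# Feigenbaum constant alpha_F, supplied from the standard literature (not stated in the text)
ALPHA_F = 2.502907875095892822

def q_feigenbaum(alpha_f=ALPHA_F):
    """Item (i): q = 1 - ln 2 / ln alpha_F at the Feigenbaum point of the logistic map."""
    return 1.0 - math.log(2.0) / math.log(alpha_f)

def q_cold_atoms(recoil_energy, potential_depth):
    """Item (ii): q = 1 + 44 E_R / U_0 (both energies in the same units)."""
    return 1.0 + 44.0 * recoil_energy / potential_depth

def q_block(c):
    """Item (iii): q = (sqrt(9 + c^2) - 3)/c for a critical 1D system with central charge c."""
    return (math.sqrt(9.0 + c * c) - 3.0) / c

print(f"{q_feigenbaum():.10f}")   # logistic map at the edge of chaos
print(q_cold_atoms(1.0, 100.0))   # an assumed lattice depth U_0 = 100 E_R
print(f"{q_block(0.5):.6f}")      # Ising universality class, c = 1/2
```

For $c = 1/2$ the last line gives $q \approx 0.0828$, which is the value relevant to the critical Ising chain discussed later; as $c$ grows, this $q$ approaches 1 from below.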
However, even if it is definitely wrong that $q$ cannot be derived from first principles, it is certainly true that, in the literature, it frequently plays the role of a fitting parameter. This is a natural consequence of the simple fact that the calculation of $q$ from first principles demands exact knowledge of the microscopic dynamics, which is very rarely available (and, even when it is available, frequently demands mathematically intractable calculations). This has no deeper epistemological significance than the well-known fact that the precise orbits of the planets of our planetary system are in practice obtained not from first principles but from astronomical observations and fits. This by no means implies that Newtonian mechanics is not a first-principle theory; it just shows that its full calculation is virtually impossible, because exact knowledge of the initial conditions of all the masses of the planetary system would be required. In spite of that gigantic difficulty, classical mechanics is nevertheless capable of easily establishing the (approximate) form of all those orbits, namely the Keplerian elliptic form.
(b) On nonindependent probabilities: We read in [1] that "the Tsallis entropy can only be justified if events i and j were to have the following joint probability, $p_{ij}^{q-1} = (u_i^{q-1} + v_j^{q-1} - 1)$", and "We apply SJ's approach to derive what joint probability for states of two systems would be required to justify the form of the Tsallis entropy". In the notation currently used for q-entropies and nonextensive statistical mechanics, this corresponds to $p_{ij} = u_i \otimes_{2-q} v_j$, where $x \otimes_q y \equiv [x^{1-q} + y^{1-q} - 1]^{1/(1-q)}$ ($x \otimes_1 y = xy$) is referred to as the q-product [19,20]. There is in [1] some degree of confusion with respect to this point. Let us be precise. In the particular instance of equal probabilities, $S_q = k\ln_q W$, hence we indeed have the extensive-like equality $S_q(W_u \otimes_q W_v) = S_q(W_u) + S_q(W_v)$, to be contrasted with the nonadditive expression $\frac{S_q(W_u W_v)}{k} = \frac{S_q(W_u)}{k} + \frac{S_q(W_v)}{k} + (1-q)\frac{S_q(W_u)}{k}\frac{S_q(W_v)}{k}$, $W_u$ and $W_v$ being the total numbers of states of the systems with probabilities $\{u_i\}$ and $\{v_j\}$ respectively. But, for the generic case (i.e., when the probabilities are not necessarily equal), we do not have such a simple expression for the composition of q-entropies.
This is an interesting and nontrivial consequence of this class of correlations (curiously enough, referred to in [1] as "spurious correlations") between probabilistic events.
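The composition rules invoked in (b) can be checked numerically. This minimal Python sketch (helper names hypothetical; $W_u = 3$, $W_v = 5$ and $q = 0.7$ are arbitrary choices) verifies, for equal marginals, that $u_i \otimes_{2-q} v_j = 1/(W_u \otimes_q W_v)$, that $\ln_q$ turns the q-product into a sum, and that the ordinary product obeys the nonadditive rule with the $(1-q)$ cross term.

```python
import math

def ln_q(x, q):
    """q-logarithm (x**(1-q) - 1)/(1-q), reducing to ln x at q = 1."""
    return math.log(x) if q == 1.0 else (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_product(x, y, q):
    """q-product x ⊗_q y = [x^(1-q) + y^(1-q) - 1]^(1/(1-q)); ordinary product at q = 1."""
    if q == 1.0:
        return x * y
    base = x ** (1.0 - q) + y ** (1.0 - q) - 1.0
    if base <= 0.0:
        # this sketch does not handle the region where the bracket is non-positive
        raise ValueError("q-product bracket is non-positive for these arguments")
    return base ** (1.0 / (1.0 - q))

q, W_u, W_v = 0.7, 3.0, 5.0

# Joint probability of the quoted form, for equal marginals u_i = 1/W_u and v_j = 1/W_v
p_ij = q_product(1.0 / W_u, 1.0 / W_v, 2.0 - q)
W_joint = q_product(W_u, W_v, q)
print(p_ij, 1.0 / W_joint)

# S_q = k ln_q W at equal probabilities (k = 1): additive under the q-product ...
print(ln_q(W_joint, q), ln_q(W_u, q) + ln_q(W_v, q))

# ... and nonadditive under the ordinary product
a, b = ln_q(W_u, q), ln_q(W_v, q)
print(ln_q(W_u * W_v, q), a + b + (1.0 - q) * a * b)
```

Each printed pair should agree to floating-point precision. For nonuniform $\{u_i\}$ and $\{v_j\}$ no such simple identity is available, which is the point made in (b).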
(c) On unconventional averages: We read in [1] that "Furthermore, unconventional averages must often be used to constrain nonadditive entropies [their Refs. [28][29][30][31][32][33]] to assure the convexity of those functions if they are to be used to infer a unique model." This claim is unclear since, for example, the nonadditive entropic functional $S_q$ is itself concave for all $q > 0$ (in contrast, for instance, with the additive Renyi entropic functional, which is concave only for $0 < q \le 1$). When we want to go one step further and produce a statistical mechanics, constraints must indeed be added, and they naturally play a relevant role. From an information-theory standpoint, constraints must be simple and robust, such as mean values and widths. These quantities are typically taken to be $\langle x\rangle$ and $\langle x^2\rangle$, since they simply characterize the location and the width (given by $\langle x^2\rangle - \langle x\rangle^2$) of the probability distribution $p(x)$ of the random variable $x$. These quantities are mathematically admissible as long as they are finite, which typically occurs if $p(x)$ decays quickly enough (for example, exponentially fast) at large values of $|x|$. When we have long-tailed power laws these quantities diverge, and we must therefore replace the conventional constraints by mathematically appropriate ones. For example, the mean value $\langle x\rangle$ for the q-exponential $p(x) \propto e_q^{-\beta x}$ ($\beta > 0$; $0 \le x < \infty$; $e_q^z \equiv [1+(1-q)z]^{1/(1-q)}$, and $e_1^z = e^z$) diverges for $q \ge 3/2$, whereas appropriately q-generalized mean values remain finite as long as the distribution $p(x)$ remains normalizable (in our example, for $q < 2$). Another example: if we had $p(x) \propto e_q^{-\beta x^2}$ ($\beta > 0$; $-\infty < x < \infty$), then $\langle x^2\rangle$ diverges for $q \ge 5/3$, whereas the appropriately q-generalized variance remains finite as long as the distribution $p(x)$ remains normalizable (in this example, for $q < 3$).
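The thresholds quoted in (c) follow from the power-law tails of these distributions. For $q > 1$, $e_q^{-\beta |x|^\lambda}$ decays as $|x|^{-\lambda/(q-1)}$, while the escort weight $p^q$ (used here as one common choice of q-generalized average; this choice is our assumption) decays as $|x|^{-q\lambda/(q-1)}$. The sketch below, with hypothetical function names, solves the resulting integrability conditions exactly and also gives closed-form means of the q-exponential, obtained by elementary integration for this illustration.

```python
from fractions import Fraction

def ordinary_threshold(lam, n):
    """<|x|^n> for p ∝ e_q^{-beta |x|^lam} (q > 1) is finite iff lam/(q-1) > n + 1, i.e. q < 1 + lam/(n+1)."""
    return 1 + Fraction(lam, n + 1)

def normalizability_threshold(lam):
    """The n = 0 case: p(x) itself is normalizable iff q < 1 + lam."""
    return ordinary_threshold(lam, 0)

def escort_threshold(lam, n):
    """The escort moment (integral of x^n p^q over integral of p^q) is finite iff q*lam/(q-1) > n + 1.

    For q > 1 this reads q*(n + 1 - lam) < n + 1; returns None when it holds for every q."""
    if n + 1 - lam <= 0:
        return None
    return Fraction(n + 1, n + 1 - lam)

def qexp_mean(q, beta):
    """Ordinary mean of p(x) ∝ e_q^{-beta x} on x >= 0: 1/(beta (3 - 2q)), finite only for q < 3/2."""
    if q >= 1.5:
        raise ValueError("<x> diverges for q >= 3/2")
    return 1.0 / (beta * (3.0 - 2.0 * q))

def qexp_escort_mean(q, beta):
    """Escort mean of the same distribution: 1/(beta (2 - q)), finite whenever p is normalizable (q < 2)."""
    if q >= 2.0:
        raise ValueError("p(x) is not normalizable for q >= 2")
    return 1.0 / (beta * (2.0 - q))

# q-exponential (lam = 1, first moment) and q-Gaussian (lam = 2, second moment)
for lam, n in ((1, 1), (2, 2)):
    print(lam, n, ordinary_threshold(lam, n), escort_threshold(lam, n), normalizability_threshold(lam))
```

The printed rows reproduce the thresholds of the text: $3/2$, $2$ and $2$ for the q-exponential mean, and $5/3$, $3$ and $3$ for the q-Gaussian second moment. Both closed-form means reduce to $1/\beta$ at $q = 1$.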
These important mathematical issues are illustrated in [21] and analytically discussed in [22]; see also [23] for transformations associated with constraints of different kinds within nonextensive statistical mechanics. In the case of power laws, the authors of [1] offer as an alternative the use of $S_{BG}$ with logarithmic constraints: "We conclude by adding that it is possible to infer power laws within a principle of maximizing the BG entropy by constraining just one average: Mandelbrot [their Ref. [35]] showed this by invoking logarithmic constraints $\langle\ln x\rangle$ [to avoid confusion, we have denoted the random variable by $x$, and not by $k$]". If we do this, we obtain $p(x) \propto 1/x^{\eta}$ ($\eta \in {\cal R}$; $0 < x < \infty$), which is not normalizable for any value of $\eta$ (the integral diverges at $x \to 0$ for $\eta \ge 1$ and at $x \to \infty$ for $\eta \le 1$). Another highly relevant aspect, which also justifies the use, within a variational framework, of both additive and nonadditive entropies, is the following. If we consider the standard (homogeneous and linear) Fokker-Planck equation in the presence of an external confining potential $V(x)$, we obtain for its unique stationary state $p(x) \propto e^{-\beta V(x)}$, which precisely coincides with the distribution that uniquely optimizes $S_{BG}$ with fixed $\langle V(x)\rangle$. If we now consider quite general inhomogeneous and/or nonlinear Fokker-Planck equations, still in the presence of the confining potential $V(x)$, their unique stationary state once again precisely coincides with the distribution extremizing specific generalized entropies that satisfy the H-theorem, and whose functional form is mandated by the specific inhomogeneity and/or nonlinearity [24][25][26][27][28][29].
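Maximizing $S_{BG}$ with $\langle\ln x\rangle$ fixed gives $p(x) \propto e^{-\eta\ln x} = x^{-\eta}$. Its non-normalizability can be made explicit by splitting $\int_0^\infty x^{-\eta}\,dx$ at $x = 1$: the piece near the origin diverges for $\eta \ge 1$ and the tail diverges for $\eta \le 1$. The Python sketch below (function names hypothetical; cutoffs $\epsilon = 10^{-c}$ and $R = 10^{c}$ chosen for illustration) evaluates both pieces in closed form and shows that their sum keeps growing as the cutoffs are removed, whatever the value of $\eta$.

```python
import math

def mass_near_origin(eta, eps):
    """Closed form of the integral of x^(-eta) from eps to 1."""
    if eta == 1.0:
        return -math.log(eps)
    return (1.0 - eps ** (1.0 - eta)) / (1.0 - eta)

def mass_in_tail(eta, R):
    """Closed form of the integral of x^(-eta) from 1 to R."""
    if eta == 1.0:
        return math.log(R)
    return (R ** (1.0 - eta) - 1.0) / (1.0 - eta)

def truncated_norm(eta, c):
    """Integral of x^(-eta) over [10^-c, 10^c]; it is unbounded in c for every real eta."""
    return mass_near_origin(eta, 10.0 ** -c) + mass_in_tail(eta, 10.0 ** c)

for eta in (-1.0, 0.5, 1.0, 1.5, 3.0):
    print(eta, [f"{truncated_norm(eta, c):.3g}" for c in (3, 6, 12)])
```

The slowest growth occurs at $\eta = 1$, where the truncated normalization grows only logarithmically, but it still diverges.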
Last but not least, a considerable number of predictions, verifications and applications of nonadditive entropies are available today in the literature, useful in theoretical, experimental, observational and computational approaches to a wide variety of systems. Among many others, we may illustrate the existing bibliography [30] with: (i) The velocities of cold atoms in dissipative optical lattices [16]; (ii) The velocities of particles in quasi-two-dimensional dusty plasma [31]; (iii) Single ions in radio-frequency traps interacting with a classical buffer gas [32]; (iv) The relaxation curves of RKKY spin glasses, like CuMn and AuFe [33]; (v) Hadronic transverse-momentum distributions at LHC experiments [34]; (vi) Long-range-interacting many-body classical Hamiltonians [35]; (vii) Nonlinear generalizations of the Schroedinger, Klein-Gordon and Dirac equations through q-plane waves [36,37]. In addition, q-generalized forms of the Central Limit Theorem have emerged from this theory [38,39].
For a better understanding of the domain of validity of the SJ System Independence Axiom (Axiom III of Ref. [14] of [1]), let us focus on a specific case. Consider, for example, a one-dimensional system at zero temperature displaying a quantum critical point, say an Ising ferromagnetic chain in the presence of a transverse magnetic field at its critical value. Let us assume that the chain has $N$ first-neighbor-connected spins, and let us focus on a subsystem made of $L$ successive spins (see, for instance, [17]). Let us denote by $\rho_L \equiv \mathrm{Tr}_{N-L}\,\rho_N$ the density matrix of the subset, obtained by tracing the density matrix $\rho_N$ of the total chain over $(N-L)$ spins. We define the block q-entropy as $S_q(L,N) = k\,\frac{1-\mathrm{Tr}\,\rho_L^q}{q-1}$, $S_1(L,N) = -k\,\mathrm{Tr}\,\rho_L\ln\rho_L$ being of course the block Boltzmann-Gibbs-von Neumann entropy. The entropy of the total chain is given by $S_q(N,N)$. For the strongly quantum-entangled state that we are considering here, the system is in a pure state (i.e., $\mathrm{Tr}\,\rho_N^2 = 1$); therefore $S_q(N,N)$ vanishes for all $N$ and all $q$, in particular for $q = 1$. Consequently, the asymptotic entropy per particle $\lim_{N\to\infty} S_q(N,N)/N$ also vanishes, $\forall q$. In remarkable contrast, the block of $L$ spins is in a very different state, namely a statistical mixture (i.e., $\mathrm{Tr}\,\rho_L^2 < 1$). Consequently, the block entropy per particle behaves very differently from the full entropy per particle. More precisely, $\lim_{L\to\infty} S_q(L,\infty)/L$ vanishes for $q > q^*$, diverges for $q < q^*$, and is finite for $q = q^*$, where $q^* < 1$ and $S_q(L,\infty) \equiv \lim_{N\to\infty} S_q(L,N)$ [17]. All these somewhat counterintuitive facts stem from the fact that, if we divide the full chain into, say, an $L$-block and an $(N-L)$-block, it is not possible to provide information on the two blocks as if they did not interact. This fact is poorly accommodated by the SJ set of axioms (and very specifically by the System Independence Axiom).
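For readers who wish to reproduce the qualitative behavior of the block entropy, here is a sketch for the critical transverse Ising chain with $N \to \infty$. It assumes the standard free-fermion (correlation-matrix) method, which is our choice and not necessarily the approach of [17]. At the critical field, the Majorana correlations of the ground state are taken as $g_n = -1/[\pi(n+\frac{1}{2})]$. The singular values $\nu_k$ of the $L\times L$ Toeplitz matrix $g_{l-m}$ define independent modes with eigenvalues $(1\pm\nu_k)/2$, so $\mathrm{Tr}\,\rho_L^q$ factorizes over these modes. Function names are hypothetical, and $k = 1$.

```python
import numpy as np

def block_modes(L):
    """Mode parameters nu_k in [0, 1] for an L-spin block of the infinite critical Ising chain.

    Assumption: at the critical transverse field the Majorana correlations are
    g_n = -1 / (pi (n + 1/2)); the nu_k are the singular values of M_lm = g_(l-m).
    """
    idx = np.arange(L)
    n = idx[:, None] - idx[None, :]
    M = -1.0 / (np.pi * (n + 0.5))
    nu = np.linalg.svd(M, compute_uv=False)
    return np.clip(nu, 0.0, 1.0)  # guard against round-off slightly above 1

def block_entropy(L, q=1.0, k=1.0):
    """S_q(L, infinity) = k (1 - Tr rho_L^q)/(q - 1); q = 1 gives -k Tr rho_L ln rho_L."""
    nu = block_modes(L)
    p_plus, p_minus = (1.0 + nu) / 2.0, (1.0 - nu) / 2.0
    if q == 1.0:
        def xlogx(p):
            # p ln p with the convention 0 ln 0 = 0
            return p * np.log(np.where(p > 0.0, p, 1.0))
        return -k * float(np.sum(xlogx(p_plus) + xlogx(p_minus)))
    # rho_L factorizes over independent modes, so Tr rho_L^q is a product over them
    log_trace = float(np.sum(np.log(p_plus ** q + p_minus ** q)))
    return float(k * (1.0 - np.exp(log_trace)) / (q - 1.0))

for L in (8, 16, 32, 64, 128):
    print(L, round(block_entropy(L), 4), round(block_entropy(L, q=0.5), 4))

slope = (block_entropy(128) - block_entropy(32)) / np.log(4.0)
print("dS_1/dln L ~", round(slope, 3))
```

As a sanity check, for $L = 1$ the single mode has $\nu = 2/\pi$, the transverse magnetization of the critical chain. The printed slope of $S_1$ versus $\ln L$ should come out close to the conformal value $c/3 = 1/6$ for the Ising central charge $c = 1/2$. The same routine evaluated at $q < 1$ can be used to explore how $S_q(L,\infty)/L$ behaves, although reaching the asymptotic regime numerically may require large $L$.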
The same happens for the power-law and stretched-exponential classes discussed earlier in this paper.
The well-known sets of axioms of Shannon and of Khinchin mandate a unique entropic functional, namely $S_{BG}$. They have both been q-generalized (in [40] and [41] respectively), and the unique entropic functional that is then mandated is $S_q$. Let us conclude by mentioning that we believe that, similarly, the Shore-Johnson axioms might be generalizable in order to cover $S_q$ or even other nonadditive entropies. Such a generalization would undoubtedly be most welcome.
I thank L.J.L. Cirto, E.M.F. Curado and F.D. Nobre for very fruitful conversations. Partial financial support from CNPq and FAPERJ (Brazilian agencies) is acknowledged.