Reply to C. Tsallis' "Conceptual Inadequacy of the Shore and Johnson Axioms for Wide Classes of Complex Systems"

In a recent PRL (2013, 111, 180604), we invoked the Shore and Johnson axioms, which demonstrate that the least-biased way to infer probability distributions \{p_i\} from data is to maximize the Boltzmann-Gibbs entropy. We then showed which biases are introduced in models obtained by maximizing nonadditive entropies. A rebuttal of our work appears in Entropy (2015, 17, 2853) and argues that the Shore and Johnson axioms are inapplicable to a wide class of complex systems. Here we highlight the errors in this reasoning.

In [1], we invoked the powerful results of Shore and Johnson (SJ) [2], who showed that, under quite general circumstances, the least-biased way to infer a probability distribution \{p_i\} is to maximize the Boltzmann-Gibbs relative entropy (or any function that is monotonic with this entropy),

H(\{p_i\}) = -\sum_i p_i \ln(p_i / q_i),  (1)

under constraints, where \{q_i\} is the prior distribution on \{p_i\} that contains any foreknowledge of the system. In [1], we showed that mathematical forms of H that are not monotonic with Equation (1), which we call noncanonical entropies, lead to unwarranted biases. In his comment on our [1], Tsallis objects to this argument on multiple grounds [3]. We show here the flaws in his objections. First, Tsallis contends that "nonadditive entropies emerge from strong correlations (among the random variables of the system) which are definitively out of the SJ hypothesis", adding that the "SJ axiomatic framework addresses essentially systems for which it is possible to provide information about their subsystems without providing information about the interactions between the subsystems, which clearly is not the case where nonadditive entropies are to be used". These statements are incorrect. SJ is not limited in any way to small interactions or weak correlations; it can handle interactions of any strength.
Standard methods of statistical physics, which are grounded in Equation (1), provide a clear recipe for treating correlated systems while assuming no prior subsystem correlations. For example, we do not need a special form of Equation (1) in order to build spin-spin correlations into an Ising model. On the contrary, the Boltzmann weights for a spin-spin-correlated Ising model are constructed by assuming that spin correlations originate from the data (which are used to constrain Equation (1)). In the absence of spin-spin correlations, standard statistical physics returns an Ising model with decoupled spins, as it should. On the other hand, if we choose to assume a priori some particular coupling between the spins that does not originate from data, then SJ prove that it should be introduced through \{q_i\}, exactly as it appears in Equation (1). In particular, models of power laws can only arise in a principled way either from data constraints or from \{q_i\} in conjunction with Equation (1). There is no principled basis for power laws that can be obtained by re-assigning the meaning of H and changing its form [1].
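The Ising point above can be made concrete with a minimal two-spin sketch (our own illustration; the correlation value 0.6 is an assumed datum): constraining only the measured correlation <s1 s2> in Equation (1), with a uniform prior, yields Boltzmann weights e^{J s1 s2} with tanh(J) equal to that correlation, and uniform weights (decoupled spins) when the measured correlation vanishes.

```python
import numpy as np
from itertools import product

# Hypothetical two-spin example: the maxent distribution under a single
# correlation constraint <s1*s2> = c is p proportional to exp(J*s1*s2),
# where the Lagrange multiplier J satisfies tanh(J) = c.

states = np.array(list(product([-1, 1], repeat=2)))  # 4 spin configurations
c = 0.6                       # assumed measured spin-spin correlation
J = np.arctanh(c)             # multiplier fixed by the data constraint

w = np.exp(J * states[:, 0] * states[:, 1])   # Boltzmann weights
p = w / w.sum()

corr = p @ (states[:, 0] * states[:, 1])
print(corr)                   # reproduces the constrained correlation
# With c = 0, J = 0 and the weights are uniform: decoupled spins.
```

No modified entropy is needed at any step; the coupling arises entirely from the data constraint applied to Equation (1).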
Second, Tsallis asserts that "We see therefore that the SJ set of axioms, demanding, as they do, system independence, are not applicable unless we have indisputable reasons to believe that the data that we are facing correspond to a case belonging to the exponential class, and by no means correspond to strongly correlated cases such as those belonging to the power-law or stretched-exponential classes, or even others". This is not correct. The SJ inference procedure is not limited to independent systems. Rather, SJ assert that when systems are independent of each other, the joint outcome for two independent systems must be the product of marginal probabilities if data are provided for the systems independently [4]. SJ is otherwise perfectly applicable across situations not involving independent systems. In the language of Bayesian inference, the SJ axioms are used to derive a form for the noninformative prior over the model (the probability distribution \{p_i\}) [5]. The SJ axioms have no input on the likelihood function that otherwise determines how data update the prior. What is more, because priors and data are confounded in the Tsallis entropy, the parametrization of the fitting parameter q in the Tsallis entropy relies on a misapplication of Bayes' theorem, as has been shown in [5].
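The system-independence axiom can be checked numerically (our own sketch; the two marginals are arbitrary assumed data): when data constrain only the two marginals separately, the entropy-maximizing joint distribution is their product, and any correlated joint with the same marginals has strictly lower entropy.

```python
import numpy as np

# Hypothetical illustration: maximizing -sum_ij P_ij ln P_ij subject only to
# fixed row sums (marginal of system 1) and column sums (marginal of
# system 2) gives the product of marginals, P_ij = p_i * r_j.

p = np.array([0.2, 0.5, 0.3])    # assumed marginal data for system 1
r = np.array([0.7, 0.3])         # assumed marginal data for system 2
P = np.outer(p, r)               # the maxent joint under these constraints

H = lambda x: -np.sum(x * np.log(x))
print(np.isclose(H(P), H(p) + H(r)))   # True: entropy adds over subsystems

# Perturb toward correlation while preserving both marginals: the entropy
# drops, confirming that the product of marginals is the maximum.
eps = 0.05
Q = P.copy()
Q[0, 0] += eps; Q[0, 1] -= eps; Q[1, 0] -= eps; Q[1, 1] += eps
print(H(Q) < H(P))               # True: correlated joint has lower entropy
```

Nothing here restricts SJ to independent systems; if data had instead constrained a joint quantity, the maxent joint would carry the corresponding correlations.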
Third, Tsallis asserts that "This is the deep reason why, in the presence of strong correlations, the BG entropy is generically not physically appropriate anymore since it violates the thermodynamical requirement of entropic extensivity." In this statement, Tsallis is incorrectly asserting that extensivity must be a foundational property from which the functional form of the entropy follows. In fact, the logic is just the opposite. Extensivity, or not, of an entropy is the outcome of the inference problem at hand, not its input. Certainly, throughout much of equilibrium thermodynamics, extensivity happens to hold. But that is a matter of the particulars of that (macroscopic) class of applications. Said differently, this argument confuses the distinction between pre-maximization and post-maximization entropies. SJ focus on pre-maximization. SJ seek a functional H that, upon maximization, achieves certain properties required for drawing consistent, unbiased inferences. At this stage, no system property (such as how entropy scales with system size) is yet relevant. This is just establishing a very general inference principle. However, once maximization has been performed, H(\{p_i = p_i^*\}) = -\sum_i p_i^* \ln p_i^* is an entropy function that may be used to make predictions about physical systems, including how properties scale with system size. The SJ argument is agnostic about whether extensivity holds or not for the post-maximization entropy H(\{p_i = p_i^*\}).
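A minimal sketch of this point (our own illustration, assuming two-state subsystems): the post-maximization entropy of N independent subsystems scales linearly with N, while a perfectly correlated composite of the same subsystems, which has only as many accessible configurations as one subsystem, has size-independent entropy. Extensivity is thus an output of the problem, not an input to the form of H.

```python
import numpy as np

# Hypothetical illustration that extensivity (or not) of the
# post-maximization entropy H(p*) = -sum_i p*_i ln p*_i is determined by
# the system, not by the entropy's functional form.

p1 = np.array([0.5, 0.5])        # one two-state subsystem (fair coin)
H = lambda x: -np.sum(x[x > 0] * np.log(x[x > 0]))

N = 4
p_indep = p1
for _ in range(N - 1):
    p_indep = np.outer(p_indep, p1).ravel()   # joint of N independent copies
print(np.isclose(H(p_indep), N * H(p1)))       # True: extensive, H = N ln 2

# Perfectly correlated composite: all N subsystems always agree, so only
# two joint configurations occur and H stays at ln 2 for any N.
p_corr = np.array([0.5, 0.5])
print(np.isclose(H(p_corr), H(p1)))            # True: non-extensive
```

In both cases the same Boltzmann-Gibbs form is used; only the distribution, shaped by the constraints, differs.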

Conclusions
In short, the great power of the SJ arguments is in showing that Equation (1) is an extremely broad and deep result, applicable across all matters of inference of probability distributions, given only priors \{q_i\} on \{p_i\} and given new information. The power of the SJ arguments is that they apply upstream of any particular application, whether it involves equilibria or dynamics, materials or informational channel capacities or other, weak or strong correlations, extensivity or not, or any other particularity. We are assured by [2] that no other form of entropy function, beyond those monotonic with H, can generate unbiased inferences.