Entropic Divergence and Entropy Related to Nonlinear Master Equations

We reverse engineer entropy formulas from entropic divergence, optimized to given classes of probability distribution function (PDF) evolution dynamical equation. For linear dynamics of the distribution function, the traditional Kullback–Leibler formula follows from using the logarithm function in the Csiszár’s f-divergence construction, while for nonlinear master equations more general formulas emerge. As applications, we review a local growth and global reset (LGGR) model for citation distributions, income distribution models and hadron number fluctuations in high energy collisions.


Motivation
The entropic convergence is being intensively studied alongside functional inequalities. The log-Sobolev inequality and a family of other related functional inequalities have been developed to control the speed of convergence of stochastic processes on the Euclidian space and on manifolds. The control is given in a form of an upper bound on the speed of convergence in one or other metrics, e.g., Wasserstein distance, Lp-norm or other. The most important development by this approach is the incorporation of the curvature condition, which replaces the condition on the lower bound of the spectral gap. The latter one can be typically used on compact sets while the former allows us to control the convergence on non-compact sets, on spaces of infinite diameter or measure.
The theory of measure concentration ("entropic distance shrinking") received a new impact from the work of Marton [1] and Talagrand [2]. This initiative later grow into a unified treatment by Otto and Villanyi [3] incorporating mass transport theory and gradient flow on manifolds of measures. All those efforts are focused on finding conditions for the convergence of the process under scrutiny. In these works several extensions of the log-Sobolev inequality, Wasserstein distance and Fisher-Donsker-Varadhan information is established [4].
The present work is dealing with processes for which such methods does not work to our best knowledge. Convergence is ensured in a different way. A metric, in particular a Kullback-type divergence [5] is created in which-thanks to the intimate relation to the structure of the master equation-it can be shown that its time derivative is negative.
For recent developments dealing with discrete Markov chains see also References [6,7] and references therein.
The classical entropy-probability relation, pioneered by Boltzmann, used and discussed by Gibbs, Planck and Shannon [8][9][10][11][12] (just to mention a few), is under steady generalization attempts, e.g., [13][14][15][16][17][18]. Most of the suggestions are mathematically motivated, formulated in terms of axioms (Khintchine or other) [19,20], sometimes abandoning the additivity property for factorizing probabilities, therefore treating statistical independence in an unorthodox way [21]. The motivation behind doing so lies in the study of complex systems, where complexity reveals itself in long-range correlations causing unusual behavior in the thermodynamical limit. It is common in such phenomena that interface effects, multiplied with a characteristic correlation length, are not smaller than bulk (volume) effects. This ruins the familiar hierarchy valid for big systems in the classical thermodynamics and is mathematically reflected in the non-extensivity property. Hence formulas generalizing the logarithmic function at the core of the entropy formula are necessarily non-extensive. Besides this, some approaches keep the logarithm but use a function of the probability in the formula. For example, in the field of sensor data analysis, the Deng entropy is in use [22][23][24].
By the generalized entropy formula, important properties, like non-negativity, expansibility, convexity are still to be satisfied [25]. It is, however, hard to convince about one or the other "next to simplest" generalization without investigating physical phenomena which realize or-if we are lucky-even suggest preferring a given formula against others [26,27]. First, in a deductive way someone suggests a formula and then checks its fundamental properties before applying to a number of physical observations. Any suggested new entropy formula must include the conventional one as a limiting case, since this has already the largest observational support. A second way to select a favorite formula may rely on starting with the study of a given class of complex system behavior and adjust the classical definition accordingly. Most importantly, one expects that the formula commands the entropy to increase in a closed system, under the given microscopic dynamics already known or assumed.
Although both ways are legitimate, in this paper we follow the second one. Given an evolution dynamics for the microstate probabilities, we first seek for an expedient formula for the entropic divergence [28]: a non-negative measure between two probability distributions (in a continuous model probability distribution functions (PDFs) which shrinks during the dynamical evolution. Possibly for two arbitrary initial distributions. Or at least between an arbitrary initial distribution and the stationary distribution. Once this convergence criterion is fulfilled, we study the entropic divergence from the uniform distribution. In special cases, we shall realize that the entropic divergence from the uniform distribution can be written as proportional to a difference of the same functionality on the uniform and on the investigated distribution. Finally this functional will be proposed as a candidate for an entropy formula. For linear master equations describing the evolution of probability, this turns out to be the Boltzmann entropy formula, while for other master equations, different formulas result. In particular, for a fractal power dependence on the probability, the Tsallis entropy formula emerges. Our method described above does not assume detailed balance while delivering the proof, in this way it is valid in a much broader class of dynamics than Boltzmann's famous H-theorem.
It is often assumed that a non-exponential energy distribution would be the sign for non-extensive entropy. This assumption is, however, mistaken. Both a non-extensive entropy formula with a linear constraint on the energy and a traditional, logarithmic entropy formula with a constraint on another function of the energy lead to non-exponential PDFs in equilibrium. Moreover, several physical systems tend to a statistical behavior with a non-exponential stationary distribution without reaching equilibrium (open systems). In the framework of local growth and global reset model (LGGR) we demonstrate that far from the detailed balance state stationary distributions show a thermodynamical behavior, akin to the fluctuation-dissipation theorem [29]. Examples in income distribution, popularity measured by citations and hadron multiplicities in high-energy collisions demonstrate the flexibility and usefulness of the LGGR model.
As part of this presentation, we repeat some formulas published already earlier [26,28]. Our purpose is to provide the reader with a self-contained chain of thoughts, followable without further reading. For helping to concentrate on the distinction, however, we summarize here briefly the main development relative to our earlier publications. The general proof of deriving of the shrinking of the entropic divergence is now given for two arbitrary distributions, contrary to the former and more traditional derivation when one of the distributions is the stationary one. We also present some details on why, in the generalized Markovian dynamics, a similar proof based on the convexity property of the definition of the entropic divergence suffices only for an approach to the stationary distribution. Arguments for the generalization to nonlinear probability dynamics are shortly mentioned, but the applications in the framework of our LGGR model is again linear, concentrating on the processes with a stationary state without detailed balance.

Shrinking Entropic Divergence
Entropic divergence (sometimes cited as "entropic distance", however, it is usually not a symmetric measure, nor does it satisfy any triangle inequality) is a non-negative real-valued functional of two probability distributions, ρ[P, Q] ≥ 0. We also want that whenever this measure is zero, then the equality of the two distributions follows: ρ[P, Q] = 0 ⇒ ∀n P n = Q n . We add a further requirement, related to the evolution dynamics of the probabilities. This irreversibility property expresses that starting with an arbitrary distribution at time t = 0, it will approach the stationary distribution, so their entropic divergence shrinks. To begin with we consider Csiszar's f-divergence formula [30,31] as a class of entropy divergence formulas investigated here. Applying the Jensen inequality [32,33] to this formula one obtains for f > 0. The non-negativity property is hence satisfied by any function in the Csiszar formula with the correct second derivative and f (1) = 0. Next we review a linear dynamics for the probability evolution, adjusted to conserve the normalization, i.e., ∑ nṖn = 0. Here the transition rates from the state m to n, w nm ≥ 0, define the probability evolution. Now we are interested in the evolution of the entropic divergence, while both distributions-initially arbitrary-evolve governed by the common transition rates, w nm . This description includes complex systems, where transitions can happen between non-neighboring, in the index space far away states, too. We make no restriction on n − m. Now using the definition in Equation (1) the evolution of the entropic divergence shows the following time derivativė with the notation ξ n = P n /Q n . Substituting the dynamical master Equation (3) we obtaiṅ In the last term we exchange the summation indices to arrive aṫ Now a definite statement can be made about the sign of this expression by utilizing the Taylor series remainder theorem in the Lagrange form [34] for the yet unspecified function, f (ξ): with c nm being an intermediate argument between ξ n and ξ m . Using this knowledge we finally obtaiṅ For any definite sign of the second derivative of f (ξ), the above formula also has a definite sign, for f > 0 functions we haveρ ≤ 0. We note that f > 0 was the very same condition necessary for the non-negativity property of ρ itself, proven via the Jensen inequality, Equation (2). Consequently, we have proven that two arbitrary initial PDFs evolving via the linear master Equation (3) show a shrinking entropic f-divergence between them. This shrinking of the f-divergence from the stationary distribution to an arbitrary one is a basic property, discussed in the literature on Markov processes (for a review see e.g., [35]).
How far can this result be generalized for nonlinear evolution scenarios? We consider a slight modification using a positive function of the probability, a(P), in the master equatioṅ Here a(P) can be any positive function, including the most known a(P) = P case. In some other complex system kinetics here powers may occur, as in chemical reaction networks [36], morphogenesis [37] or in analogy to some models of finance dynamics generalizing the Fokker-Planck equation to a nonlinear dependence on the phase space density [38]. Other functions may appear when simulating Boltzmann-Uehling-Uhlenbeck type of finite-state blocking or enhancement factors as in quantum transport models [39]. In such problems, one would apply a(P) = P/(1 ± P).
In this case, the above presented general proof does not apply. The more general trace formula for the entropic divergence, has the rate of changeρ Replacing the dynamical Equation (9) both forṖ n andQ n and re-arranging the double summation indices as well as using the notation ξ n = a(P n )/a(Q n ) we obtaiṅ For simplifying we introduce the definition for the arbitrary indexed quantities. Using this assertion the change of the divergence is straightforwardly written aṡ In order to apply the proof given in the linear master equation case and recited above, we have to achieve Whether it is possible or not, can be decided upon investigating the mixed second partial derivative of σ(P, Q), i.e., the integrability condition: In our particular case this results in This does not happen, unless a (P) = a (Q). This restricts us to the linear case or motivates investigations of non-trace like entropy divergence formulas.
It can be proven, however, that the above more general entropy divergence formula Equation (10), shrinks if we take Q n as the stationary distribution. The latter is defined by the "total balance" condition We note that the "detailed balance" condition would annulate each term in the above sum, not only the total result. That classical condition is actually a condition on the transition rates, reflecting microscopic time reversibility, while the total balance (consisting of n = 1, 2, . . . W equations) simply defines the stationary distribution Q n .
In order to use the argumentation which holds in the linear case, one has only to satisfy with the modified definition ξ n = a(P n )/a(Q n ). The above equation can be solved for any positive a(P) function, particular solutions are set by the natural constraint ρ[Q, Q] = 0. This result, stating that all distributions converge to the stationary one due to such a(P)-dynamics, does not mean that the entropic distance between two non-stationary distribution would continuously shrink during their evolution. They will all reach the stationary state and reduce their entropic divergence to zero only in the final stage.
It is traditional to use the function f (ξ) = − ln ξ for generating the entropic divergence formula. Indeed in this case f (ξ) = −1/ξ and f (ξ) = 1/ξ 2 > 0. The usual argumentation behind this choice is a further property, making the core function, f (ξ), in the Csiszar formula additive. For composite systems with subsystems (1) and (2) the probabilities namely factorize if statistical independence holds: P n (1 ⊕ 2) = P n (1)P n (2) and similarly for Q n . This makes ξ(1 ⊕ 2) = ξ(1)ξ(2) and for f (1 ⊕ 2) = f (1) + f (2) with positive second derivative only this choice remains. Using this starting point the entropic divergence formula, Equation (1), and returning to the a(P) = P case specifies to the Kullback-Leibler definition [5,40] This formula contains a so called "cross entropy" term and a pure entropy term associated to the PDF Q, but it is not yet unique what entropy formula would follow from this. To make this last step we utilize our concept of complexity: the entropic divergence of the uniform distribution, U n = 1/W for n = 1, 2, . . . W from Q n , i.e., In this case the Kullback-Leibler based complexity measure turns out to be a difference of the Boltzmann entropies, as suggested by the P-linear master equation dynamics as a proper measure. Its generalization, however, is not a difference of more general entropy formulas. The entropic divergence formula based on the corresponding probability evolution description in the master equation has to be the starting point, and its specification for a divergence from the uniform distribution is either a difference of the same formula at U and Q, or not.
As an example, we investigate a power-like nonlinearity, a(P) = P λ . In this case, we do not use the Csiszar formula, but solve Equation (19) with the traditional logarithmic ansatz, f (ξ) = − ln ξ: Its particular solution fixing ρ[Q, Q] = 0 reads as with the so called deformed logarithm function [41][42][43] parametrized with λ: We note that starting with an f (ξ) function of a deformed logarithm as above, the result again has the same form, just the new λ is the product of the starting parameter ν in the deformation and the power in the master equation level non-linearity, λ new = λν. The result Equation (24) is robust. Finally the suggested complexity measure becomes It is proportional to a difference of Tsallis entropies, and hence by construction proves that the Tsallis entropy is also maximal at the uniform distribution, since ρ ≥ 0 holds, based on the Jensen inequality applied to Equation (24). The forefactor W λ−1 , depending on the number of states, signalizes non-extensivity in this case.

LGGR: A Linear Model for Local Growth and Global Resets
The master Equation (3) and its nonlinear generalization (9) describing the evolution of a probability distribution over possible physical microstates are quite general. The familiar diffusion problem in one dimension is described by transition rates between neighboring indices only, allowing to increase or decrease the index by one in a short time ∆t. Their symmetric part is responsible for the diffusion coefficient while their difference provides a driving force for the drift. In higher dimensional diffusion problems there are also some sparse nonzero elements according to the finite size, like w n+N x ,n for a local jump in the positive y direction, etc. These systems in the continuum limit are equivalent to a simple Fokker-Planck equation [44][45][46].
On the other hand in the up to date research we often face complex systems with underlying dynamics far from the detailed balance: the ratio of reversed micro-transition rates is far from looking as a ratio of a simple given function of indices, w nm /w mn = Q n /Q m . In this paper we present a particular simple model for extremely asymmetric processes: the growth rate is local in the index, while there is a global reset rate from anywhere to the starting (ground) state with index zero. In this local growth, global reset (LGGR) model we assume the following structure for the transition rate matrix (m → n): w nm = µ m δ n,m+1 + γ m δ n,0 .
The resulting linear master equation becomeṡ for n > 0. The evolution for the n = 0 state,Ṗ 0 , then includes the summatioṅ but alternatively P 0 can be obtained from the normalization condition, P 0 = 1 − ∞ ∑ n=1 P n . Now the complexity measure is given by and we are especially interested in its value for the stationary distribution of a given LGGR model. So the missing link is the computation of the stationary PDF in terms of µ n and γ n .
Here we review some of the most important LGGR models. The stationary distribution is obtained by the recursive resolution of theQ n = 0 condition, For constant growth and reset rates, µ n = σ, γ n = γ, the result is the exponential distribution, with β = ln(1 + γ/σ). In this LGGR model with the Kullback-Leibler divergence and the traditional Boltzmannian entropy formula also non-exponential stationary distributions emerge if the transition rates are state-dependent. With a growth rate featuring linear preference, µ n = σ(n + b), the stationary solution is a Waring distribution [47], using the Pochhammer symbol (a) n = a(a + 1) . . . (a + n − 1) and c = b + 1 + γ/σ. It exemplifies a power-law decay for large n, Q n ∼ n −1−γ/σ . In the continuous LGGR version we consider the evolution equation with This quantity may vanish for some special cases where µ(0) = 0 and ∞ 0 γ(x)P (x, t)dx = 0. The corresponding stationary distribution becomes In particular the stationary PDF belonging to the linear local growth rate, µ(x) = σ(x + b), featuring linear preference and combined with a constant reset rate, γ(x) = γ, is given by the Tsallis-Pareto distribution: So power-law tailed distributions are stationary solutions to linear probability evolution equations with the traditional entropy and entropic divergence formula, if a constant global reset rate is connected with a linear preference in the local growth rate.
The total balance condition (37) determining the stationary PDF, on the other hand, can be viewed as a generalized fluctuation-dissipation theorem [48]. This form is useful for cases when the stationary PDF, Q(x), can be easily observed: than one of the rates provides information on the other. Simple integration of Equation (35) withQ(x) = 0 leads to such a relation, expressing the local growth rate with a distribution-tail average of the reset rate: For an exponential PDF, Q(x) = e −βx /Z, and constant reset rate, γ(x) = γ, one obtains µ = γ/β in analogy to the relation between diffusion rate and dissipation coefficient [49,50]. In general, for constant global reset rate γ, µ(x)/γ becomes the inverse hazard rate [51]. Such features transform LGGR to a simple and effective candidate for being an "Ising model". It can be applied to far from equilibrium, with no detailed balance, and with complex system dynamics with various transition rates describing preferential scenarios.

Citation Popularity, Income Distribution and Some Further Application Areas
Finally we would like to review a few applications of this simplified LGGR treatment. As it has been published in [26], stationary PDF-s can be reproduced in several socio-economic or physical models. Socio-economic examples include popularity ranking from citation numbers [52][53][54], income [55][56][57] and settlement sizes [58]. For biological and ecological systems population abundance [59] on different levels (species, spatial, etc.) can be studied. For physical systems a good example is the hadron number distribution in high energy accelerator collision events [60].
In exceptional cases the income data are known on the level of individuals and are available for an extended period over several years. In such cases they allow for testing the growth and the entry/exit rates experimentally. We emphasize here that simply observing a certain distribution, Q(x), does not decide whether it is stationary, and even less whether an LGGR model could satisfactorily describe the real dynamics in its background. Whenever both observed distributions and transition rates with unidirected small-step growth and big-step global resets can be observed, then the LGGR can be verified or falsified.
As already discussed in earlier publications, citation popularity data follow a continuous Tsallis-Pareto distribution [52] which, scaled with the average number of citations, contains only one nontrivial fit parameter. In our analysis we have included Web of Science citations of some journals and institutes with international reputation as well as individuals. Furthermore Facebook likes and shares were counted, and similarly Youtube likes were recorded for diverse web pages. Surprisingly, all these popularity data collapse onto the same universal curve. The observed distributions can be understood by assuming linear preference in the local growth rate and a constant reset rate in the continuous version of the dynamics [52].
It is methodically useful to report here briefly about our recent investigations on income distribution. The starting point for our study is an individual level, complete income data set for a nine year period (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009) in Cluj county (Romania) [61]. Beside the observed Q(x) PDF for income, we can follow the individual level income data to justify assumptions on the rates µ(x) and γ(x). Our preliminary findings include an improved ansatz on these rates. The reset rate, γ(x), occurs to be intelligent: at low income level it is negative, people enter into the income system mainly at this level, while at high income it is positive, the income is naturally high when they are close to retirement. The following fit describes well the averaged data with K, b and q freely adjustable parameters. The local growth rate, µ(x), contains a simple linear term since the income (salary) increase is as a rule in per cent, so the amount of increase is proportional to the income at that moment. The assumed functional form fits well the income data in Romania, Cluj district in the years 2001-2009. For these growth and reset rates the resulting PDF is analytically obtained: using Equation (29), and normalizing it we obtain the Beta Prime distribution This formula, normalized and expressed in terms of z = x/ x contains only one parameter, and income distributions from Romania, Hungary, USA, Finland and Australia fall onto the common universal curve, x Q(x) = 12z(1 + z) −5 .
Finally, let us mention the hadronization process in high energy elementary particle and heavy-ion collisions as a possible application field. Here, two distributions seem not to have a microscopic dynamical explanation, namely the multiplicity distribution of hadron number in a series of collisions with the same total energy, and the one-particle kinetic energy distribution averages over such collision events. The former seems to follow a negative binomial distribution (NBD) [62], although at the Large Hadron Collider (LHC) energies this is just a first approximation: the real distribution, including higher and higher kinetic energies in the count might require more sophisticated descriptions [63]. The latter on the other hand is beautifully fitted by a Tsallis-Pareto distribution in the mid-rapidity range, best interpolating between low, mid and higher transverse momenta in the range p T = 0.8 . . . 12 GeV [60,64].
We note that these distributions can be related to each other: a simple filling of a phase space volume whose dimensionality changes with the fluctuating number of produced hadrons event by event relates the NBD and the Tsallis-Pareto power-law tailed distribution in the individual kinetic energy. It is easy to demonstrate as follows.
In kinetic models, the microcanonical phase space is an energy shell in high dimension. The constant total energy constraint is a norm on the individual momenta, in a one-dimensional extreme relativistic jet this norm is L 1 , E = n ∑ i=1 |p i |, in a two-dimensional section at mid-rapidity of massive, non-relativistic particles on the other hand it is Pythagorean, E = 2n ∑ i=1 p 2 i /2m. Both formulas are given for n particles. In a general asset we are interested in the surface of an N-ball with radius R in L p norm, which is known to be [65] V (p) One specifies N = n, p = 1 and R(E) = E for the jets and N = 2n, p = 2 with R(E) = √ 2mE for the two-dimensional ideal non-relativistic gas. The calculations deliver the results V N (R(E))/dE. Finally the phase space ratio (conditional probability) for a selected single particle having ε energy from this total of E in both cases is given by This ratio is to be averaged over an NBD distribution of the newly produced n − 2 hadrons (2 is minimally needed for making the collision), The result is a Tsallis-Pareto distribution [66] with some modifying factors: For large n 1, characteristic for central heavy ion collisions, it approaches the classical form with T = E/ n = f (k + 1), while for small n 1, more proper for describing elementary pp collisions, a similar Tsallis-Pareto law emerges with a modified power: Equation (46) represents a monotonic falling distribution in ε in the interval [0, E] and its integral over this interval is normalized to 1. Several measurements agree with this or similar functional forms, although some details can also be fitted with different forefactors to the Tsallis-Pareto term [67].

Summary
In conclusion, we presented an H-theorem like proof for stochastic evolution without detailed balance, in particular we have shown that Csiszár's f-divergence formula for the entropic divergence shrinks between two arbitrary starting probability distributions for f > 0. The condition for this statement is that the same master equation, linear in the probability, governs the evolution of both. Further we presented the local growth global reset (LGGR) model, as a simple approach to processes far from the detailed balance and yet leading to interesting stationary distributions. This model utilizes only two rates, µ(x) for the growth and γ(x) for the reset to the x = 0 point in the state space. A fluctuation-dissipation relation connects these rates via the normalized tail probability [68] of the stationary PDF, cf. Equation (39). Application fields for LGGR type approaches are numerous, here we examplified the citation popularity, the income distribution and the hadron distribution in high energy experiments.
Author Contributions: Conceptualization and methodology was performed by T.S.B.; Formal analysis, data validation and discussion was shared by all authors; Conference talk and draft preparation as well as funding acquisition for this, including APC was provided by T.S.B.
Funding: This research and the APC for this paper was funded by the Hungarian National Research, Development and Innovation Office, grant number K123815. Further support was achieved from the UEFISCDI research grant PN-III-P4-ID-PCCF-2016-0084.

Conflicts of Interest:
The authors declare no conflict of interest.