Entropic Divergence and Entropy Related to Nonlinear Master Equations

We reverse engineer entropy formulas from entropic divergence, optimized for given classes of dynamical equations governing the evolution of a probability distribution function (PDF). For linear dynamics of the distribution function, the traditional Kullback–Leibler formula follows from using the logarithm function in Csiszár’s f-divergence construction, while for nonlinear master equations more general formulas emerge. As applications, we review a local growth and global reset (LGGR) model for citation distributions, income distribution models and hadron number fluctuations in high energy collisions.

Keywords:

entropy; entropic divergence; master equation; reset; preferential growth

1. Motivation

The entropic convergence is being intensively studied alongside functional inequalities.

The log–Sobolev inequality and a family of related functional inequalities have been developed to control the speed of convergence of stochastic processes on Euclidean space and on manifolds. The control is given in the form of an upper bound on the speed of convergence in one metric or another, e.g., Wasserstein distance or Lp-norm. The most important development of this approach is the incorporation of the curvature condition, which replaces the condition on the lower bound of the spectral gap. The latter can typically be used on compact sets, while the former allows us to control the convergence on non-compact sets, on spaces of infinite diameter or measure.

The theory of measure concentration (“entropic distance shrinking”) received a new impetus from the work of Marton [1] and Talagrand [2]. This initiative later grew into a unified treatment by Otto and Villani [3], incorporating mass transport theory and gradient flow on manifolds of measures. All those efforts are focused on finding conditions for the convergence of the process under scrutiny. In these works several extensions of the log-Sobolev inequality, Wasserstein distance and Fisher–Donsker–Varadhan information are established [4].

The present work deals with processes for which, to the best of our knowledge, such methods do not work. Convergence is ensured in a different way. A distance-like measure, in particular a Kullback-type divergence [5], is constructed for which, thanks to its intimate relation to the structure of the master equation, the time derivative can be shown to be non-positive.

For recent developments dealing with discrete Markov chains see also References [6,7] and references therein.

The classical entropy—probability relation, pioneered by Boltzmann, used and discussed by Gibbs, Planck and Shannon [8,9,10,11,12] (just to mention a few), is under steady generalization attempts, e.g., [13,14,15,16,17,18]. Most of the suggestions are mathematically motivated, formulated in terms of axioms (Khintchine or other) [19,20], sometimes abandoning the additivity property for factorizing probabilities, therefore treating statistical independence in an unorthodox way [21]. The motivation behind doing so lies in the study of complex systems, where complexity reveals itself in long-range correlations causing unusual behavior in the thermodynamical limit. It is common in such phenomena that interface effects, multiplied with a characteristic correlation length, are not smaller than bulk (volume) effects. This ruins the familiar hierarchy valid for big systems in the classical thermodynamics and is mathematically reflected in the non-extensivity property. Hence formulas generalizing the logarithmic function at the core of the entropy formula are necessarily non-extensive. Besides this, some approaches keep the logarithm but use a function of the probability in the formula. For example, in the field of sensor data analysis, the Deng entropy is in use [22,23,24].

A generalized entropy formula must still satisfy important properties such as non-negativity, expansibility and convexity [25]. It is, however, hard to argue convincingly for one or the other “next to simplest” generalization without investigating physical phenomena which realize, or if we are lucky even suggest, a preference for a given formula over others [26,27]. One way to proceed is deductive: someone suggests a formula and then checks its fundamental properties before applying it to a number of physical observations. Any suggested new entropy formula must include the conventional one as a limiting case, since this already has the largest observational support. A second way to select a favorite formula starts with the study of a given class of complex system behavior and adjusts the classical definition accordingly. Most importantly, one expects that the formula makes the entropy increase in a closed system, under the given microscopic dynamics already known or assumed.

Although both ways are legitimate, in this paper we follow the second one. Given an evolution dynamics for the microstate probabilities, we first seek an expedient formula for the entropic divergence [28]: a non-negative measure between two probability distributions (in a continuous model, probability distribution functions, PDFs) which shrinks during the dynamical evolution, possibly for two arbitrary initial distributions, or at least between an arbitrary initial distribution and the stationary distribution. Once this convergence criterion is fulfilled, we study the entropic divergence from the uniform distribution. In special cases, we shall find that the entropic divergence from the uniform distribution can be written as proportional to a difference of the same functional evaluated on the uniform and on the investigated distribution. Finally, this functional will be proposed as a candidate for an entropy formula. For linear master equations describing the evolution of probability, this turns out to be the Boltzmann entropy formula, while for other master equations, different formulas result. In particular, for a fractal power dependence on the probability, the Tsallis entropy formula emerges. The method described above does not assume detailed balance in the proof; in this way it is valid for a much broader class of dynamics than Boltzmann’s famous H-theorem.

It is often assumed that a non-exponential energy distribution would be the sign of a non-extensive entropy. This assumption is, however, mistaken. Both a non-extensive entropy formula with a linear constraint on the energy and a traditional, logarithmic entropy formula with a constraint on another function of the energy lead to non-exponential PDFs in equilibrium. Moreover, several physical systems tend to a statistical behavior with a non-exponential stationary distribution without reaching equilibrium (open systems). In the framework of the local growth and global reset (LGGR) model we demonstrate that, far from the detailed balance state, stationary distributions show a thermodynamical behavior, akin to the fluctuation–dissipation theorem [29]. Examples in income distribution, popularity measured by citations and hadron multiplicities in high-energy collisions demonstrate the flexibility and usefulness of the LGGR model.

As part of this presentation, we repeat some formulas published earlier [26,28]. Our purpose is to provide the reader with a self-contained chain of thought that can be followed without further reading. To help the reader see the distinction, however, we briefly summarize here the main developments relative to our earlier publications. The general proof of the shrinking of the entropic divergence is now given for two arbitrary distributions, in contrast to the former and more traditional derivation in which one of the distributions is the stationary one. We also present some details on why, in the generalized Markovian dynamics, a similar proof based on the convexity property of the definition of the entropic divergence suffices only for an approach to the stationary distribution. Arguments for the generalization to nonlinear probability dynamics are briefly mentioned, but the applications in the framework of our LGGR model are again linear, concentrating on processes with a stationary state without detailed balance.

2. Shrinking Entropic Divergence

Entropic divergence (sometimes cited as “entropic distance”, however, it is usually not a symmetric measure, nor does it satisfy any triangle inequality) is a non-negative real-valued functional of two probability distributions,

ρ [P, Q] \geq 0

. We also require that whenever this measure is zero, the two distributions are equal:

ρ [P, Q] = 0 \Rightarrow \forall n P_{n} = Q_{n}

. We add a further requirement, related to the evolution dynamics of the probabilities. This irreversibility property expresses that starting with an arbitrary distribution at time

t = 0

, it will approach the stationary distribution, so their entropic divergence shrinks. To begin with, we consider Csiszár’s f-divergence formula [30,31]

ρ [P, Q] = \sum_{n} Q_{n} f (P_{n} / Q_{n}),

(1)

as a class of entropy divergence formulas investigated here. Applying the Jensen inequality [32,33] to this formula one obtains

ρ [P, Q] \geq f (\sum_{n} P_{n}) = f (1)

(2)

for

f^{″} > 0

. The non-negativity property is hence satisfied by any function in the Csiszár formula with a positive second derivative and

f (1) = 0

.

Next we review a linear dynamics for the probability evolution,

{\dot{P}}_{n} = \sum_{m} (w_{n m} P_{m} - w_{m n} P_{n}),

(3)

adjusted to conserve the normalization, i.e.,

\sum_{n} {\dot{P}}_{n} = 0

. Here the transition rates from the state m to n,

w_{n m} \geq 0

, define the probability evolution.

Now we are interested in the evolution of the entropic divergence, while both distributions—initially arbitrary—evolve governed by the common transition rates,

w_{n m}

. This description includes complex systems, where transitions can also happen between states that are not neighbors but lie far apart in the index space. We make no restriction on

n - m

. Now using the definition in Equation (1) the evolution of the entropic divergence shows the following time derivative

\dot{ρ} = \sum_{n} (\dot{P_{n}} f^{'} (ξ_{n}) + {\dot{Q}}_{n} [f (ξ_{n}) - ξ_{n} f^{'} (ξ_{n})]),

(4)

with the notation

ξ_{n} = P_{n} / Q_{n}

. Substituting the dynamical master Equation (3) we obtain

\dot{ρ} = \sum_{n m} (w_{n m} Q_{m} [ξ_{m} f^{'} (ξ_{n}) + f (ξ_{n}) - ξ_{n} f^{'} (ξ_{n})] - w_{m n} Q_{n} [f (ξ_{n})]) .

(5)

In the last term we exchange the summation indices to arrive at

\dot{ρ} = \sum_{n m} w_{n m} Q_{m} [(ξ_{m} - ξ_{n}) f^{'} (ξ_{n}) + f (ξ_{n}) - f (ξ_{m})] .

(6)

Now a definite statement can be made about the sign of this expression by utilizing the Taylor series remainder theorem in the Lagrange form [34] for the yet unspecified function,

f (ξ)

:

f (ξ_{m}) = f (ξ_{n}) + (ξ_{m} - ξ_{n}) f^{'} (ξ_{n}) + \frac{1}{2} {(ξ_{m} - ξ_{n})}^{2} f^{″} (c_{n m}),

(7)

with

c_{n m}

being an intermediate argument between

ξ_{n}

and

ξ_{m}

.

Using this knowledge we finally obtain

\dot{ρ} = - \frac{1}{2} \sum_{n m} w_{n m} Q_{m} {(ξ_{m} - ξ_{n})}^{2} f^{″} (c_{n m}) .

(8)

For any definite sign of the second derivative of

f (ξ)

, the above formula also has a definite sign; for

f^{″} > 0

functions we have

\dot{ρ} \leq 0

. We note that

f^{″} > 0

was the very same condition necessary for the non-negativity property of

ρ

itself, proven via the Jensen inequality, Equation (2). Consequently, we have proven that two arbitrary initial PDFs evolving via the linear master Equation (3) show a shrinking entropic f-divergence between them. This shrinking of the f-divergence from the stationary distribution to an arbitrary one is a basic property, discussed in the literature on Markov processes (for a review see e.g., [35]).
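The shrinking stated by Equation (8) can be illustrated numerically. The following is a minimal sketch, not part of the original derivation: it assumes a five-state system with randomly drawn rates, a fixed random seed, an explicit Euler integrator and the choice f(ξ) = −ln ξ; the names `f_divergence`, `euler_step` and `random_distribution` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def f_divergence(P, Q, f):
    """Csiszar f-divergence of Eq. (1): sum_n Q_n f(P_n / Q_n)."""
    return sum(q * f(p / q) for p, q in zip(P, Q))


def euler_step(P, w, dt):
    """One explicit Euler step of the linear master equation, Eq. (3).

    w[n][m] is the transition rate from state m to state n.
    """
    N = len(P)
    return [
        P[n] + dt * sum(w[n][m] * P[m] - w[m][n] * P[n] for m in range(N))
        for n in range(N)
    ]


def random_distribution(N, rng):
    """A strictly positive, normalized random distribution."""
    x = [rng.random() + 0.05 for _ in range(N)]
    total = sum(x)
    return [v / total for v in x]


def f_kl(xi):
    """f(xi) = -ln xi, with f(1) = 0 and f'' > 0."""
    return -math.log(xi)


rng = random.Random(0)        # fixed seed (illustrative assumption)
N, dt, steps = 5, 0.01, 2000  # illustrative sizes, not taken from the text

# Random non-negative rates; nothing here enforces detailed balance.
w = [[rng.random() if n != m else 0.0 for m in range(N)] for n in range(N)]

# Two arbitrary initial distributions evolving under the same rates.
P = random_distribution(N, rng)
Q = random_distribution(N, rng)

history = [f_divergence(P, Q, f_kl)]
for _ in range(steps):
    P, Q = euler_step(P, w, dt), euler_step(Q, w, dt)
    history.append(f_divergence(P, Q, f_kl))

# The divergence should never increase from one step to the next.
monotone = all(cur <= prev + 1e-12 for prev, cur in zip(history, history[1:]))
print(f"rho(0) = {history[0]:.4e}, rho(T) = {history[-1]:.4e}, monotone: {monotone}")
```

With a sufficiently small time step the Euler map is itself a stochastic matrix, so the discrete iteration is expected to inherit the monotonic decrease; other choices of f with f″ > 0 can be passed to `f_divergence` in the same way.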

How far can this result be generalized for nonlinear evolution scenarios? We consider a slight modification using a positive function of the probability,

a (P)

, in the master equation

{\dot{P}}_{n} = \sum_{m} (w_{n m} a (P_{m}) - w_{m n} a (P_{n})) .

(9)

Here

a (P)

can be any positive function, including the best-known

a (P) = P

case. In some other complex system kinetics here powers may occur, as in chemical reaction networks [36], morphogenesis [37] or in analogy to some models of finance dynamics generalizing the Fokker-Planck equation to a nonlinear dependence on the phase space density [38]. Other functions may appear when simulating Boltzmann–Uehling–Uhlenbeck type of finite-state blocking or enhancement factors as in quantum transport models [39]. In such problems, one would apply

a (P) = P / (1 \pm P)

.

In this case, the above presented general proof does not apply. The more general trace formula for the entropic divergence,

ρ [P, Q] = \sum_{n} σ (P_{n}, Q_{n}),

(10)

has the rate of change

\dot{ρ} = \sum_{n} ({\dot{P}}_{n} \frac{\partial σ}{\partial P_{n}} + {\dot{Q}}_{n} \frac{\partial σ}{\partial Q_{n}}) .

(11)

Replacing the dynamical Equation (9) both for

{\dot{P}}_{n}

and

{\dot{Q}}_{n}

and re-arranging the double summation indices as well as using the notation

ξ_{n} = a (P_{n}) / a (Q_{n})

we obtain

\dot{ρ} = \sum_{n, m} w_{n m} a (Q_{m}) [ξ_{m} (\frac{\partial σ}{\partial P_{n}} - \frac{\partial σ}{\partial P_{m}}) + (\frac{\partial σ}{\partial Q_{n}} - \frac{\partial σ}{\partial Q_{m}})] .

(12)

To simplify, we introduce the definition

f (ξ) \equiv ξ \frac{\partial σ}{\partial P} + \frac{\partial σ}{\partial Q}

(13)

for arbitrarily indexed quantities. Using this definition, the change of the divergence is straightforwardly written as

\dot{ρ} = \sum_{n, m} w_{n m} a (Q_{m}) [(ξ_{m} - ξ_{n}) \frac{\partial σ}{\partial P_{n}} + f (ξ_{n}) - f (ξ_{m})] .

(14)

In order to apply the proof given in the linear master equation case and recited above, we have to achieve

\frac{\partial σ}{\partial P_{n}} = f^{'} (ξ_{n}) and \frac{\partial σ}{\partial Q_{n}} = f (ξ_{n}) - ξ_{n} f^{'} (ξ_{n}) .

(15)

Whether this is possible can be decided by investigating the mixed second partial derivative of

σ (P, Q)

, i.e., the integrability condition:

\frac{\partial^{2} σ}{\partial Q_{n} \partial P_{n}} = \frac{\partial^{2} σ}{\partial P_{n} \partial Q_{n}} .

(16)

In our particular case this results in

- ξ_{n} f^{″} (ξ_{n}) \frac{a^{'} (Q_{n})}{a (Q_{n})} = - ξ_{n} f^{″} (ξ_{n}) \frac{a^{'} (P_{n})}{a (Q_{n})} .

(17)

This does not happen, unless

a^{'} (P) = a^{'} (Q)

. This restricts us to the linear case or motivates investigations of non-trace like entropy divergence formulas.

It can be proven, however, that the more general entropic divergence formula above, Equation (10), shrinks if we take

Q_{n}

as the stationary distribution. The latter is defined by the “total balance” condition

0 = \sum_{m} (w_{n m} a (Q_{m}) - w_{m n} a (Q_{n})) .

(18)

We note that the “detailed balance” condition would annul each term in the above sum, not only the total result. That classical condition is actually a condition on the transition rates, reflecting microscopic time reversibility, while the total balance (consisting of

n = 1, 2, \dots W

equations) simply defines the stationary distribution

Q_{n}

.

In order to use the argumentation which holds in the linear case, one has only to satisfy

\frac{\partial σ}{\partial P_{n}} = f^{'} (ξ_{n}),

(19)

with the modified definition

ξ_{n} = a (P_{n}) / a (Q_{n})

.

The above equation can be solved for any positive

a (P)

function; particular solutions are fixed by the natural constraint

ρ [Q, Q] = 0

. This result, stating that all distributions converge to the stationary one due to such

a (P)

-dynamics, does not mean that the entropic divergence between two non-stationary distributions would shrink continuously during their evolution. They will all reach the stationary state, and their mutual entropic divergence is guaranteed to vanish only in the final stage.
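The approach to the stationary distribution under the nonlinear dynamics of Equation (9) can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes a(P) = P^λ with λ = 2, an arbitrary three-state rate matrix, explicit Euler integration, and the power-law divergence that the text derives further on as Equation (24); the helper names `drift`, `relax`, `ln_def` and `rho` are hypothetical.

```python
def drift(P, w, a):
    """Right-hand side of Eq. (9): sum_m (w[n][m] a(P_m) - w[m][n] a(P_n))."""
    N = len(P)
    return [
        sum(w[n][m] * a(P[m]) - w[m][n] * a(P[n]) for m in range(N))
        for n in range(N)
    ]


def relax(P, w, a, dt, steps):
    """Explicit Euler integration of Eq. (9) for a given number of steps."""
    for _ in range(steps):
        d = drift(P, w, a)
        P = [p + dt * dp for p, dp in zip(P, d)]
    return P


lam = 2.0                  # illustrative exponent in a(P) = P**lam
w = [[0.0, 1.0, 0.5],      # w[n][m]: rate for m -> n, arbitrary values
     [2.0, 0.0, 1.5],
     [0.7, 0.3, 0.0]]


def a_pow(p):
    return p ** lam


# Total balance, Eq. (18), is linear in a(Q): relax the linear chain to get
# v_n proportional to a(Q_n), then invert a and normalize.
v = relax([1.0 / 3.0] * 3, w, lambda p: p, dt=0.01, steps=20000)
root = [x ** (1.0 / lam) for x in v]
Q = [r / sum(root) for r in root]


def ln_def(x):
    """Deformed logarithm (1 - x**(lam - 1)) / (1 - lam)."""
    return (1.0 - x ** (lam - 1.0)) / (1.0 - lam)


def rho(P):
    """Divergence from the stationary Q: sum_n Q_n ln_lam(Q_n / P_n)."""
    return sum(q * ln_def(q / p) for p, q in zip(P, Q))


P = [0.6, 0.3, 0.1]        # arbitrary initial distribution
history = [rho(P)]
for _ in range(2000):      # sample the divergence every 10 Euler steps
    P = relax(P, w, a_pow, dt=0.001, steps=10)
    history.append(rho(P))

residual = max(abs(x) for x in drift(Q, w, a_pow))   # total-balance check
monotone = all(cur <= prev + 1e-12 for prev, cur in zip(history, history[1:]))
print(f"balance residual = {residual:.2e}, rho: {history[0]:.4f} -> {history[-1]:.2e}")
```

Only the divergence from the stationary Q is monitored here, in line with the statement above that the shrinking is guaranteed for this case rather than for two arbitrary evolving distributions.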

It is traditional to use the function

f (ξ) = - ln ξ

for generating the entropic divergence formula. Indeed in this case

f^{'} (ξ) = - 1 / ξ

and

f^{″} (ξ) = 1 / ξ^{2} > 0

. The usual argumentation behind this choice is a further property, making the core function,

f (ξ)

, in the Csiszár formula additive. For composite systems with subsystems

(1)

and

(2)

the probabilities namely factorize if statistical independence holds:

P_{n} (1 \oplus 2) = P_{n} (1) P_{n} (2)

and similarly for

Q_{n}

. This makes

ξ (1 \oplus 2) = ξ (1) ξ (2)

and for

f (1 \oplus 2) = f (1) + f (2)

with positive second derivative only this choice remains. With this choice, and returning to the

a (P) = P

case, the entropic divergence formula, Equation (1), reduces to the Kullback–Leibler definition [5,40]

ρ^{(K L)} [P, Q] = \sum_{n} Q_{n} ln \frac{Q_{n}}{P_{n}} .

(20)

This formula contains a so-called “cross entropy” term and a pure entropy term associated with the PDF Q, but it does not yet determine uniquely which entropy formula follows from it. To make this last step we utilize our concept of complexity: the entropic divergence of the uniform distribution,

U_{n} = 1 / W

for

n = 1, 2, \dots W

from

Q_{n}

, i.e.,

ρ^{(K L)} [U, Q] = \sum_{n = 1}^{W} Q_{n} ln (W Q_{n}) = ln W + \sum_{n = 1}^{W} Q_{n} ln Q_{n},

(21)

equals a difference,

ρ^{(K L)} [U, Q] = S^{(B G)} [U] - S^{(B G)} [Q]

, with the traditional definition of the Boltzmann–Gibbs entropy

S^{(B G)} [Q] \equiv - \sum_{n} Q_{n} ln Q_{n} .

(22)

In this case the Kullback–Leibler based complexity measure turns out to be a difference of Boltzmann entropies, as suggested by the P-linear master equation dynamics as a proper measure. Its generalization, however, is not necessarily a difference of more general entropy formulas. The entropic divergence formula based on the corresponding probability evolution description in the master equation has to be the starting point, and its specification for a divergence from the uniform distribution is either a difference of the same formula at U and Q, or not.

As an example, we investigate a power-like nonlinearity,

a (P) = P^{λ}

. In this case, we do not use the Csiszár formula, but solve Equation (19) with the traditional logarithmic ansatz,

f (ξ) = - ln ξ

:

\frac{\partial σ}{\partial P_{n}} = f^{'} (ξ_{n}) = - \frac{Q_{n}^{λ}}{P_{n}^{λ}} .

(23)

Its particular solution fixing

ρ [Q, Q] = 0

reads as

ρ^{(P O W)} [P, Q] = \sum_{n} Q_{n} {ln}_{λ} \frac{Q_{n}}{P_{n}},

(24)

with the so called deformed logarithm function [41,42,43] parametrized with

λ

:

{ln}_{λ} x \equiv \frac{1 - x^{λ - 1}}{1 - λ} .

(25)

We note that starting with an

f (ξ)

function of a deformed logarithm as above, the result again has the same form, just the new

λ

is the product of the starting parameter

ν

in the deformation and the power in the master equation level non-linearity,

λ_{new} = λ ν

. The result Equation (24) is robust. Finally the suggested complexity measure becomes

ρ^{(P O W)} [U, Q] = \sum_{n = 1}^{W} Q_{n} {ln}_{λ} (W Q_{n}) = W^{λ - 1} (S_{T} [U] - S_{T} [Q]) .

(26)

It is proportional to a difference of Tsallis entropies,

S_{T} [Q] \equiv - \sum_{n} Q_{n} {ln}_{λ} Q_{n}, S_{T} [U] = - {ln}_{λ} (1 / W),

(27)

and hence by construction proves that the Tsallis entropy is also maximal at the uniform distribution, since

ρ \geq 0

holds, based on the Jensen inequality applied to Equation (24). The prefactor

W^{λ - 1}

, depending on the number of states, signals non-extensivity in this case.
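The identity in Equation (26) is easy to verify numerically. The sketch below assumes an arbitrary four-state distribution and λ = 0.7, both chosen only for illustration; the function names are hypothetical. It also checks that the λ → 1 limit reproduces the Kullback–Leibler result of Equation (21).

```python
import math


def ln_def(x, lam):
    """Deformed logarithm of Eq. (25); taken as ln x for lam = 1."""
    if abs(lam - 1.0) < 1e-12:
        return math.log(x)
    return (1.0 - x ** (lam - 1.0)) / (1.0 - lam)


def tsallis_entropy(Q, lam):
    """S_T[Q] = -sum_n Q_n ln_lam(Q_n), Eq. (27)."""
    return -sum(q * ln_def(q, lam) for q in Q)


def divergence_from_uniform(Q, lam):
    """rho[U, Q] = sum_n Q_n ln_lam(W Q_n), left-hand side of Eq. (26)."""
    W = len(Q)
    return sum(q * ln_def(W * q, lam) for q in Q)


Q = [0.5, 0.25, 0.15, 0.10]   # arbitrary example distribution
W = len(Q)
U = [1.0 / W] * W
lam = 0.7                      # illustrative value of the exponent

lhs = divergence_from_uniform(Q, lam)
rhs = W ** (lam - 1.0) * (tsallis_entropy(U, lam) - tsallis_entropy(Q, lam))

# lam -> 1: Kullback-Leibler divergence from the uniform distribution, Eq. (21).
kl = divergence_from_uniform(Q, 1.0)
bg = math.log(W) + sum(q * math.log(q) for q in Q)

print(f"Eq. (26): lhs = {lhs:.12f}, rhs = {rhs:.12f}")
print(f"Eq. (21): {kl:.12f} vs ln W - S_BG = {bg:.12f}")
```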

3. LGGR: A Linear Model for Local Growth and Global Resets

The master Equation (3) and its nonlinear generalization (9) describing the evolution of a probability distribution over possible physical microstates are quite general. The familiar diffusion problem in one dimension is described by transition rates between neighboring indices only, allowing to increase or decrease the index by one in a short time

Δ t

. Their symmetric part is responsible for the diffusion coefficient while their difference provides a driving force for the drift. In higher dimensional diffusion problems there are also some sparse nonzero elements according to the finite size, like

w_{n + N_{x}, n}

for a local jump in the positive y direction, etc. These systems in the continuum limit are equivalent to a simple Fokker–Planck equation [44,45,46].

On the other hand, in current research we often face complex systems with underlying dynamics far from detailed balance: the ratio of reversed micro-transition rates is far from being a ratio of a simple given function of the indices,

w_{n m} / w_{m n} \neq Q_{n} / Q_{m}

. In this paper we present a particularly simple model for extremely asymmetric processes: the growth rate is local in the index, while there is a global reset rate from anywhere to the starting (ground) state with index zero. In this local growth, global reset (LGGR) model we assume the following structure for the transition rate matrix

(m \to n)

:

w_{n m} = μ_{m} δ_{n, m + 1} + γ_{m} δ_{n, 0} .

(28)

The resulting linear master equation becomes

{\dot{P}}_{n} = μ_{n - 1} P_{n - 1} - (μ_{n} + γ_{n}) P_{n}

(29)

for

n > 0

. The evolution for the

n = 0

state,

{\dot{P}}_{0}

, then includes the summation

{\dot{P}}_{0} = \sum_{m} γ_{m} P_{m} - (μ_{0} + γ_{0}) P_{0},

(30)

but alternatively

P_{0}

can be obtained from the normalization condition,

P_{0} = 1 - \sum_{n = 1}^{\infty} P_{n}

. Now the complexity measure is given by

ρ^{(L G G R)} [U, Q] = \sum_{n = 1}^{W} Q_{n} f^{(K L)} (\frac{1}{W Q_{n}}) = ln W + \sum_{n = 1}^{W} Q_{n} ln Q_{n},

(31)

and we are especially interested in its value for the stationary distribution of a given LGGR model. So the missing link is the computation of the stationary PDF in terms of

μ_{n}

and

γ_{n}

.

Here we review some of the most important LGGR models. The stationary distribution is obtained by the recursive resolution of the

{\dot{Q}}_{n} = 0

condition,

Q_{n} = \frac{μ_{n - 1}}{μ_{n} + γ_{n}} Q_{n - 1} = Q_{0} \prod_{j = 1}^{n} \frac{μ_{j - 1}}{μ_{j} + γ_{j}} .

(32)

For constant growth and reset rates,

μ_{n} = σ

,

γ_{n} = γ

, the result is the exponential distribution,

Q_{n} = Q_{0} e^{- β n}

(33)

with

β = ln (1 + γ / σ)

. In this LGGR model, with the Kullback–Leibler divergence and the traditional Boltzmannian entropy formula, non-exponential stationary distributions also emerge if the transition rates are state-dependent. With a growth rate featuring linear preference,

μ_{n} = σ (n + b)

, the stationary solution is a Waring distribution [47],

Q_{n} = Q_{0} \frac{{(b)}_{n}}{{(c)}_{n}},

(34)

using the Pochhammer symbol

{(a)}_{n} = a (a + 1) \dots (a + n - 1)

and

c = b + 1 + γ / σ

. It exemplifies a power-law decay for large n,

Q_{n} \sim n^{- 1 - γ / σ}

.
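The recursion of Equation (32) translates directly into a few lines of code. The sketch below is illustrative only: the parameter values σ = 1, γ = 0.5, b = 1 and the truncation indices are assumptions, and `lggr_stationary` is a hypothetical helper name. It checks the exponential slope β of Equation (33) and the ratio and tail exponent implied by Equation (34).

```python
import math


def lggr_stationary(mu, gamma, n_max):
    """Stationary Q_n from the recursion of Eq. (32), truncated at n_max.

    mu and gamma are callables returning the growth and reset rate of state n.
    The truncated sequence is normalized to sum to one.
    """
    q = [1.0]
    for n in range(1, n_max + 1):
        q.append(q[-1] * mu(n - 1) / (mu(n) + gamma(n)))
    total = sum(q)
    return [x / total for x in q]


sigma, gam, b = 1.0, 0.5, 1.0   # illustrative parameter values

# Constant rates: geometric decay with beta = ln(1 + gamma / sigma), Eq. (33).
Q_const = lggr_stationary(lambda n: sigma, lambda n: gam, n_max=200)
beta_measured = math.log(Q_const[3] / Q_const[4])
beta_expected = math.log(1.0 + gam / sigma)

# Linear preference: Waring form of Eq. (34) with c = b + 1 + gamma / sigma.
# Ratios of Q_n do not depend on the truncated normalization.
Q_pref = lggr_stationary(lambda n: sigma * (n + b), lambda n: gam, n_max=2000)
c = b + 1.0 + gam / sigma
ratio_measured = Q_pref[10] / Q_pref[9]
ratio_expected = (b + 9.0) / (c + 9.0)

# Local log-log slope between n = 500 and n = 1000 estimates the tail exponent.
tail_slope = math.log(Q_pref[1000] / Q_pref[500]) / math.log(2.0)

print(f"beta: {beta_measured:.6f} (expected {beta_expected:.6f})")
print(f"tail slope: {tail_slope:.4f} (expected about {-(1.0 + gam / sigma):.4f})")
```

Because the power-law tail decays slowly, the truncated normalization is only approximate in the linear-preference case; the checks above therefore use ratios of Q_n.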

In the continuous LGGR version we consider the evolution equation

\frac{\partial}{\partial t} P (x, t) = - \frac{\partial}{\partial x} (μ (x) P (x, t)) - γ (x) P (x, t) + A (t) δ (x)

(35)

with

A (t) \equiv \int_{0}^{\infty} γ (x) P (x, t) d x - μ (0) P (0, t) .

(36)

This quantity may vanish for some special cases where

μ (0) = 0

and

\int_{0}^{\infty} γ (x) P (x, t) d x = 0

. The corresponding stationary distribution becomes

Q (x) = \frac{μ (0) Q (0)}{μ (x)} exp (- \int_{0}^{x} \frac{γ (u)}{μ (u)} d u) .

(37)

In particular the stationary PDF belonging to the linear local growth rate,

μ (x) = σ (x + b)

, featuring linear preference and combined with a constant reset rate,

γ (x) = γ

, is given by the Tsallis–Pareto distribution:

Q (x) = \frac{γ}{σ b} {(1 + \frac{x}{b})}^{- 1 - γ / σ} .

(38)

So power-law tailed distributions are stationary solutions to linear probability evolution equations with the traditional entropy and entropic divergence formula, if a constant global reset rate is connected with a linear preference in the local growth rate.

The total balance condition (37) determining the stationary PDF, on the other hand, can be viewed as a generalized fluctuation–dissipation theorem [48]. This form is useful for cases when the stationary PDF,

Q (x)

, can be easily observed: then one of the rates provides information on the other. Simple integration of Equation (35) with

\dot{Q} (x) = 0

leads to such a relation, expressing the local growth rate with a distribution-tail average of the reset rate:

μ (x) = \frac{1}{Q (x)} \int_{x}^{\infty} γ (u) Q (u) d u .

(39)

For an exponential PDF,

Q (x) = e^{- β x} / Z

, and constant reset rate,

γ (x) = γ

, one obtains

μ = γ / β

in analogy to the relation between diffusion rate and dissipation coefficient [49,50]. In general, for constant global reset rate

γ

,

μ (x) / γ

becomes the inverse hazard rate [51]. Such features make LGGR a simple and effective candidate for an “Ising model” of systems far from equilibrium: it requires no detailed balance and accommodates complex system dynamics with various transition rates describing preferential scenarios.
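As a consistency check of Equation (39), one can insert the Tsallis–Pareto density of Equation (38) together with a constant reset rate γ and recover the linear growth rate. The short worked derivation below uses only symbols already defined in the text and is meant as an illustration, not as an additional result:

```latex
\begin{aligned}
\int_{x}^{\infty} \gamma \, Q(u) \, du
  &= \frac{\gamma^{2}}{\sigma b} \int_{x}^{\infty}
     \left(1 + \frac{u}{b}\right)^{-1-\gamma/\sigma} du
   = \gamma \left(1 + \frac{x}{b}\right)^{-\gamma/\sigma}, \\
\mu(x)
  &= \frac{1}{Q(x)} \int_{x}^{\infty} \gamma \, Q(u) \, du
   = \frac{\sigma b}{\gamma} \left(1 + \frac{x}{b}\right)^{1+\gamma/\sigma}
     \gamma \left(1 + \frac{x}{b}\right)^{-\gamma/\sigma}
   = \sigma \, (x + b) .
\end{aligned}
```

The ratio μ(x)/γ = (x + b) σ/γ grows linearly in x, which is the inverse hazard rate of the Tsallis–Pareto distribution.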

4. Citation Popularity, Income Distribution and Some Further Application Areas

Finally we would like to review a few applications of this simplified LGGR treatment. As published in [26], stationary PDFs can be reproduced in several socio-economic or physical models. Socio-economic examples include popularity ranking from citation numbers [52,53,54], income [55,56,57] and settlement sizes [58]. For biological and ecological systems population abundance [59] on different levels (species, spatial, etc.) can be studied. For physical systems a good example is the hadron number distribution in high energy accelerator collision events [60].

In exceptional cases the income data are known on the level of individuals and are available for an extended period over several years. In such cases they allow for testing the growth and the entry/exit rates experimentally. We emphasize here that simply observing a certain distribution,

Q (x)

, does not decide whether it is stationary, and even less whether an LGGR model could satisfactorily describe the real dynamics in its background. Whenever both observed distributions and transition rates with unidirected small-step growth and big-step global resets can be observed, then the LGGR can be verified or falsified.

As already discussed in earlier publications, citation popularity data follow a continuous Tsallis–Pareto distribution [52] which, scaled with the average number of citations, contains only one nontrivial fit parameter. In our analysis we have included Web of Science citations of some journals and institutes with international reputation as well as individuals. Furthermore, Facebook likes and shares were counted, and similarly YouTube likes were recorded for diverse web pages. Surprisingly, all these popularity data collapse onto the same universal curve. The observed distributions can be understood by assuming linear preference in the local growth rate and a constant reset rate in the continuous version of the dynamics [52].

It is methodically useful to report here briefly on our recent investigations of income distribution. The starting point for our study is an individual-level, complete income data set for a nine-year period (2001–2009) in Cluj county (Romania) [61]. Besides the observed

Q (x)

PDF for income, we can follow the individual level income data to justify assumptions on the rates

μ (x)

and

γ (x)

. Our preliminary findings include an improved ansatz on these rates. The reset rate,

γ (x)

, turns out to behave sensibly: at low income levels it is negative, since people enter the income system mainly at this level, while at high income it is positive, since income is naturally high when people are close to retirement. The following fit describes the averaged data well

γ (x) = K - \frac{b}{x + q},

(40)

with K, b and q freely adjustable parameters.

The local growth rate,

μ (x)

, contains a simple linear term since the income (salary) increase is as a rule in per cent, so the amount of increase is proportional to the income at that moment. The assumed

μ (x) = β x

(41)

functional form fits the income data from Cluj county, Romania, in the years 2001–2009 well.

For these growth and reset rates the resulting PDF is obtained analytically: using Equation (37) and normalizing the result, we obtain the Beta Prime distribution

Q (x) = q^{(\frac{K}{β} - \frac{b}{β q})} \frac{Γ (\frac{b}{q β})}{Γ (\frac{b}{q β} - \frac{K}{β}) Γ (\frac{K}{β})} {(1 + \frac{x}{q})}^{- \frac{b}{β q}} x^{\frac{b}{β q} - \frac{K}{β} - 1} .

(42)

This formula, normalized and expressed in terms of

z = x / ⟨ x ⟩

contains only one parameter, and income distributions from Romania, Hungary, USA, Finland and Australia fall onto the common universal curve,

⟨ x ⟩ Q (x) = 12 z {(1 + z)}^{- 5}

.
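The collapse onto the universal curve can be reproduced from Equation (42). Matching exponents with 12 z (1 + z)^{−5} suggests b/(βq) = 5 and K/β = 3; this identification is inferred here from the formulas and is not stated in the text. The sketch below uses hypothetical rate values satisfying these two conditions and a simple midpoint rule; all function names are illustrative.

```python
import math


def income_pdf(x, K, b, q, beta):
    """Beta prime density of Eq. (42) for gamma(x) = K - b/(x + q), mu(x) = beta*x."""
    s = b / (beta * q)      # exponent of the (1 + x/q) factor
    r = K / beta
    norm = q ** (r - s) * math.gamma(s) / (math.gamma(s - r) * math.gamma(r))
    return norm * (1.0 + x / q) ** (-s) * x ** (s - r - 1.0)


def universal_curve(z):
    """The scaled curve quoted in the text: 12 z (1 + z)^(-5)."""
    return 12.0 * z * (1.0 + z) ** (-5)


def integrate_half_line(g, scale, n=20000):
    """Midpoint rule on (0, inf) using x = scale * t / (1 - t), t in (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x = scale * t / (1.0 - t)
        total += g(x) * scale / (1.0 - t) ** 2 * h
    return total


# Hypothetical values chosen so that b/(beta*q) = 5 and K/beta = 3.
K, b, q, beta = 0.3, 1.0, 2.0, 0.1

norm = integrate_half_line(lambda x: income_pdf(x, K, b, q, beta), q)
mean = integrate_half_line(lambda x: x * income_pdf(x, K, b, q, beta), q)

# Scale by the mean income and compare with the universal curve.
collapse_error = max(
    abs(mean * income_pdf(z * mean, K, b, q, beta) - universal_curve(z))
    for z in (0.1, 0.5, 1.0, 3.5)
)
print(f"norm = {norm:.8f}, mean = {mean:.8f}, collapse error = {collapse_error:.2e}")
```

For these parameter values the mean income equals q, so the scaling variable is z = x/q.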

Finally, let us mention the hadronization process in high energy elementary particle and heavy-ion collisions as a possible application field. Here, two distributions seem not to have a microscopic dynamical explanation, namely the multiplicity distribution of the hadron number in a series of collisions with the same total energy, and the one-particle kinetic energy distribution averaged over such collision events. The former seems to follow a negative binomial distribution (NBD) [62], although at Large Hadron Collider (LHC) energies this is just a first approximation: the real distribution, including higher and higher kinetic energies in the count, might require more sophisticated descriptions [63]. The latter, on the other hand, is beautifully fitted by a Tsallis–Pareto distribution in the mid-rapidity range, best interpolating between low, mid and higher transverse momenta in the range

p_{T} = 0.8 \dots 12

GeV [60,64].

We note that these distributions can be related to each other: a simple filling of a phase space volume whose dimensionality changes with the fluctuating number of produced hadrons event by event relates the NBD and the Tsallis–Pareto power-law tailed distribution in the individual kinetic energy. It is easy to demonstrate as follows.

In kinetic models, the microcanonical phase space is an energy shell in high dimension. The constant total energy constraint is a norm on the individual momenta; in a one-dimensional extreme relativistic jet this norm is

L_{1}

,

E = \sum_{i = 1}^{n} | p_{i} |

, while in a two-dimensional section at mid-rapidity of massive, non-relativistic particles it is Pythagorean,

E = \sum_{i = 1}^{2 n} p_{i}^{2} / 2 m

. Both formulas are given for n particles. In a general setting we are interested in the volume of an N-ball with radius R in

L_{p}

norm, which is known to be [65]

V_{N}^{(p)} (R) = \frac{{[2 R Γ (1 + 1 / p)]}^{N}}{Γ (1 + N / p)} .

(43)

One specifies

N = n, p = 1

and

R (E) = E

for the jets and

N = 2 n, p = 2

with

R (E) = \sqrt{2 m E}

for the two-dimensional ideal non-relativistic gas. The calculations deliver the results

V_{n}^{(1)} (E) = {(2 E)}^{n} / n!

and

V_{2 n}^{(2)} (\sqrt{2 m E}) = {(2 π m E)}^{n} / n!

. The microcanonical constraint volumes are obtained by simple differentiation of these formulas,

Ω_{N}^{(p)} (E) = d V_{N}^{(p)} (R (E)) / d E

. Finally the phase space ratio (conditional probability) for a selected single particle having

ε

energy from this total of E in both cases is given by

r_{n}^{(1)} (ε, E) = r_{2 n}^{(2)} (ε, E) = \frac{Ω (E - ε)}{Ω (E)} = \frac{n - 1}{E} {(1 - \frac{ε}{E})}^{n - 2} .

(44)
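The ball volumes of Equation (43) and the normalization of the single-particle ratio in Equation (44) can be checked with the standard library alone. The sketch below assumes illustrative values n = 4, E = 3 and m = 1.5; the helper names are hypothetical, and the two-dimensional gas volume is compared with the standard closed form (π R²)^n / n! for a 2n-dimensional Euclidean ball of radius R.

```python
import math


def lp_ball_volume(N, p, R):
    """Volume of the N-dimensional L_p ball of radius R, Eq. (43)."""
    return (2.0 * R * math.gamma(1.0 + 1.0 / p)) ** N / math.gamma(1.0 + N / p)


def single_particle_ratio(eps, E, n):
    """Phase-space ratio of Eq. (44) for one particle with energy eps out of E."""
    return (n - 1) / E * (1.0 - eps / E) ** (n - 2)


n, E, m = 4, 3.0, 1.5   # illustrative values

# One-dimensional jet: N = n, p = 1, R = E, expected (2E)^n / n!.
jet_volume = lp_ball_volume(n, 1, E)
jet_expected = (2.0 * E) ** n / math.factorial(n)

# Two-dimensional non-relativistic gas: N = 2n, p = 2, R = sqrt(2 m E).
R_gas = math.sqrt(2.0 * m * E)
gas_volume = lp_ball_volume(2 * n, 2, R_gas)
gas_expected = (math.pi * R_gas ** 2) ** n / math.factorial(n)

# Midpoint-rule check that Eq. (44) integrates to one over 0 <= eps <= E.
steps = 2000
h = E / steps
norm = sum(single_particle_ratio((i + 0.5) * h, E, n) for i in range(steps)) * h

print(f"jet: {jet_volume:.6f} vs {jet_expected:.6f}")
print(f"gas: {gas_volume:.6f} vs {gas_expected:.6f}")
print(f"normalization of Eq. (44): {norm:.8f}")
```

Both volumes scale as E^n, which is why the same single-particle ratio appears in the two cases.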

This ratio is to be averaged over an NBD of the newly produced

n - 2

hadrons (at least 2 are needed to make the collision),

Q_{n} = \binom{n + k}{n} f^{n} {(1 + f)}^{- n - k - 1} .

(45)

The result is a Tsallis–Pareto distribution [66] with some modifying factors:

Q (ε) = \frac{1}{E} [1 + f (k + 1) - f k \frac{ε}{E}] {(1 + f \frac{ε}{E})}^{- k - 2} .

(46)

For large

⟨ n ⟩ ≫ 1

, characteristic for central heavy ion collisions, it approaches the classical form

Q (ε) \approx \frac{1}{T} {(1 + \frac{1}{k + 1} \frac{ε}{T})}^{- k - 2},

(47)

with

T = E / ⟨ n ⟩, ⟨ n ⟩ = f (k + 1)

, while for small

⟨ n ⟩ ≪ 1

, more proper for describing elementary

p p

collisions, a similar Tsallis–Pareto law emerges with a modified power:

Q (ε) \approx \frac{1}{E} {(1 + \frac{1}{k + 1} \frac{ε}{T})}^{- k - 1} .

(48)
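The large-⟨n⟩ limit can be illustrated by comparing Equation (46) with Equation (47) directly. The sketch below reads the temperature relation as T = E/⟨n⟩ with ⟨n⟩ = f(k + 1), the mean of the NBD in Equation (45). The parameter values are illustrative assumptions, and only the large-⟨n⟩ case is covered.

```python
def exact_spectrum(eps: float, E: float, k: float, f: float) -> float:
    """Equation (46)."""
    x = eps / E
    return (1.0 + f * (k + 1) - f * k * x) * (1.0 + f * x) ** (-k - 2) / E


def large_n_limit(eps: float, E: float, k: float, f: float) -> float:
    """Equation (47), with T = E / <n> and <n> = f (k + 1)."""
    T = E / (f * (k + 1))
    return (1.0 + eps / ((k + 1) * T)) ** (-k - 2) / T


E, k, f = 100.0, 4.0, 1000.0  # illustrative values, giving <n> = 5000
T = E / (f * (k + 1))
ratios = {}
for multiple in (1, 5, 10):
    eps = multiple * T
    ratios[multiple] = exact_spectrum(eps, E, k, f) / large_n_limit(eps, E, k, f)
    print(f"eps = {multiple:2d} T   exact / approximation = {ratios[multiple]:.5f}")
```

With this reading, the power-law factors of the two expressions coincide and their ratio is 1 + 1/⟨n⟩ - k ε / [(k + 1) E], so the approximation is expected to be good for ε much smaller than E.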

Equation (46) represents a monotonically decreasing distribution in $ε$ on the interval $[0, E]$, and its integral over this interval is normalized to 1. Several measurements agree with this or similar functional forms, although some details can also be fitted with different prefactors to the Tsallis–Pareto term [67].
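Both statements about Equation (46), normalization on [0, E] and monotonic decrease, can be probed numerically for a given parameter set. The following sketch is a spot check for one illustrative choice of E, k and f, not a proof. The helper names are hypothetical.

```python
def spectrum(eps: float, E: float, k: float, f: float) -> float:
    """Equation (46)."""
    x = eps / E
    return (1.0 + f * (k + 1) - f * k * x) * (1.0 + f * x) ** (-k - 2) / E


def midpoint_integral(func, a: float, b: float, steps: int = 200_000) -> float:
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


E, k, f = 10.0, 2.5, 3.0  # illustrative values
norm = midpoint_integral(lambda e: spectrum(e, E, k, f), 0.0, E)

# Sample the spectrum on a uniform grid and test that it never increases.
grid = [E * i / 1000 for i in range(1001)]
values = [spectrum(e, E, k, f) for e in grid]
is_decreasing = all(left > right for left, right in zip(values, values[1:]))

print("integral over [0, E]:", norm)
print("monotonically decreasing on the grid:", is_decreasing)
```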

5. Summary

In conclusion, we presented an H-theorem-like proof for stochastic evolution without detailed balance. In particular, we have shown that Csiszár’s f-divergence formula for the entropic divergence shrinks between two arbitrary starting probability distributions for $f'' > 0$. The condition for this statement is that the same master equation, linear in the probability, governs the evolution of both. Further, we presented the local growth global reset (LGGR) model as a simple approach to processes far from detailed balance and yet leading to interesting stationary distributions. This model utilizes only two rates, $μ (x)$ for the growth and $γ (x)$ for the reset to the $x = 0$ point in the state space. A fluctuation-dissipation relation connects these rates via the normalized tail probability [68] of the stationary PDF, cf. Equation (39). Application fields for LGGR-type approaches are numerous; here we exemplified citation popularity, the income distribution and the hadron distribution in high energy experiments.
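The shrinking of the entropic divergence under a common linear master equation can be illustrated with a small numerical experiment. The sketch below is a toy example under stated assumptions: a three-state system with hypothetical rates that violate detailed balance, explicit Euler time stepping, the divergence convention D_f(p, q) = Σ_i q_i f(p_i / q_i), and the convex choice f(ξ) = ξ ln ξ. None of these specifics are taken from the text.

```python
import math

# Hypothetical rates: RATES[i][j] is the rate of jumping from state j to state i.
# The cycle 0 -> 1 -> 2 -> 0 is strongly favoured, so detailed balance is violated.
RATES = [
    [0.0, 0.2, 1.5],
    [1.0, 0.0, 0.3],
    [0.1, 2.0, 0.0],
]


def euler_step(p, dt):
    """One explicit Euler step of the linear master equation (gain minus loss)."""
    n = len(p)
    new_p = []
    for i in range(n):
        gain = sum(RATES[i][j] * p[j] for j in range(n) if j != i)
        loss = sum(RATES[j][i] for j in range(n) if j != i) * p[i]
        new_p.append(p[i] + dt * (gain - loss))
    return new_p


def f_divergence(p, q, f=lambda xi: xi * math.log(xi)):
    """Assumed convention: D_f(p, q) = sum_i q_i f(p_i / q_i)."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def divergence_history(p, q, dt=0.01, steps=500):
    """Evolve p and q with the same master equation and record D_f after each step."""
    history = [f_divergence(p, q)]
    for _ in range(steps):
        p, q = euler_step(p, dt), euler_step(q, dt)
        history.append(f_divergence(p, q))
    return history


history = divergence_history([0.7, 0.2, 0.1], [0.1, 0.3, 0.6])
print("initial divergence:", history[0])
print("final divergence:", history[-1])
```

The time step is chosen small enough that each Euler step is itself a stochastic map with non-negative entries, so the recorded divergence should not increase from one step to the next.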

Author Contributions

Conceptualization and methodology were performed by T.S.B.; formal analysis, data validation and discussion were shared by all authors; the conference talk and draft preparation, as well as funding acquisition for this, including the APC, were provided by T.S.B.

Funding

This research and the APC for this paper were funded by the Hungarian National Research, Development and Innovation Office, grant number K123815. Further support was received from the UEFISCDI research grant PN-III-P4-ID-PCCF-2016-0084.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Marton, K. A measure concentration inequality for contracting Markov chains. Geom. Funct. Anal. 1996, 6, 556–571.
2. Talagrand, M. Transportation cost for Gaussian and other product measures. Geom. Funct. Anal. 1996, 6, 587–600.
3. Villani, C. Optimal Transport: Old and New; Springer: Berlin/Heidelberg, Germany, 2008; Volume 338.
4. Guillin, A.; Léonard, C.; Wu, L.; Yao, N. Transportation-information inequalities for Markov processes. Probab. Theory Relat. Field 2009, 144, 669–695.
5. Kullback, S. Information Theory and Statistics; Courier Corporation: Paris, France, 1997.
6. Fathi, M.; Shu, Y. Curvature and transport inequalities for Markov chains in discrete spaces. Bernoulli 2018, 24, 672–698.
7. Cheng, L.; Li, R.; Wu, L. Ricci curvature and W_1-exponential convergence of Markov processes on graphs. arXiv 2019, arXiv:1907.11036.
8. Boltzmann, L. Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung, respective den Sätzen über das Wärmegleichgewicht; Kk Hof-und Staatsdruckerei: Wien, Austria, 1877.
9. Gibbs, J.W. Elementary Principles in Statistical Mechanics; Yale University Press: New Haven, CT, USA, 2014.
10. Planck, M. Über den zweiten Hauptsatz der mechanischen Wärmetheorie. Inaugural Dissertation zur Erlangung der Philosophischen Doktorwürde, K. Universität München, 1879, Theodor Ackermann, München, 1879. Available online: https://edoc.hu-berlin.de/bitstream/handle/18452/734/planck.pdf?sequence=1 (accessed on 9 October 2019).
11. Planck, M. Entropy and Temperature of Radiant Heat. Ann. Phys.-Berlin 1900, 1, 719–737.
12. Shannon, C.E. A mathematical theory of communication. Bell Labs Tech. J. 1948, 27, 379–423.
13. Rényi, A. On the dimension and entropy of probability distributions. Acta Math. Hung. 1959, 10, 193–215.
14. Rényi, A. On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, Vol 1: Contributions to the Theory of Statistics, Berkeley, CA, USA, 20 June–30 July 1960; University of California Press: Oakland, CA, USA, 1961; p. 767.
15. Aczél, J.; Daróczy, Z. On measures of information and their characterizations. In Mathematics in Science and Engineering; Academic Press: New York, NY, USA, 1975; Volume 115, p. 168.
16. Aczel, J.; Forte, B. Generalized entropies and the maximum entropy principle. In Maximum Entropy and Bayesian Methods in Applied Statistics; Cambridge University Press: Cambridge, UK, 1986; pp. 95–100.
17. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
18. Deng, Y. Deng entropy. Chaos Solitons Fractals 2016, 91, 549–553.
19. Khintchine, A. Korrelationstheorie der stationären stochastischen Prozesse. Math. Ann. 1934, 109, 604–615.
20. Khinchin, A.Y. The concept of entropy in the theory of probability. Uspekhi Mat. Nauk 1953, 8, 3–20.
21. Tsallis, C. Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World; Springer: Berlin/Heidelberg, Germany, 2009.
22. Song, Y.; Deng, Y. Divergence Measure of Belief Function and Its Application in Data Fusion. IEEE Access 2019, 7, 107465–107472.
23. Song, Y.; Deng, Y. A new method to measure the divergence in evidential sensor data fusion. Int. J. Distrib. Sens. Netw. 2019, 15, 1550147719841295.
24. Fei, L.; Deng, Y. A new divergence measure for basic probability assignment and its applications in extremely uncertain environments. Int. J. Intell. Syst. 2019, 34, 584–600.
25. Aczél, J.; Forte, B.; Ng, C.T. Why the Shannon and Hartley entropies are ‘natural’. Adv. Appl. Probab. 1974, 6, 131–146.
26. Biró, T.S.; Néda, Z. Unidirectional random growth with resetting. Physica A 2018, 499, 335–361.
27. Chen, A.; Lu, Y.; Ng, K.; Zhang, H. Markov branching processes with killing and resurrection. Sci. China-Math. 2016, 59, 573–588.
28. Biró, T.; Telcs, A.; Néda, Z. Entropic Distance for Nonlinear Master Equation. Universe 2018, 4, 10.
29. Kubo, R. The fluctuation-dissipation theorem. Rep. Prog. Phys. 1966, 29, 255.
30. Csiszár, I. I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 1975, 146–158.
31. Csiszár, I. Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud. Akad. Mat. Kut. Int. Közl. 1963, 8, 85–108.
32. Jensen, J.L.W.V. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Math. 1906, 30, 175–193.
33. Lieb, E.H. Some convexity and subadditivity properties of entropy. In Inequalities; Springer: Berlin/Heidelberg, Germany, 2002; pp. 67–79.
34. Lagrange, J.L. Théorie des Fonctions Analytiques; Ve. Courcier: Paris, France, 1797.
35. Gorban, A.N.; Gorban, P.A.; Judge, G. Entropy: The Markov Ordering Approach. Entropy 2010, 12, 1145–1193.
36. Zhou, Y.; Yin, C. Power-law Fokker-Planck equation of unimolecular reaction based on the approximation to master equation. Physica A 2016, 463, 445–451.
37. Boon, J.P.; Lutsko, J.F.; Lutsko, C. Microscopic approach to nonlinear reaction-diffusion: The case of morphogen gradient formation. Phys. Rev. E 2012, 85, 021126.
38. Borland, L. Microscopic dynamics of the nonlinear Fokker-Planck equation: A phenomenological model. Phys. Rev. E 1998, 57, 6634–6642.
39. Bertsch, G.F.; Gupta, S.D. A guide to microscopic models for intermediate energy heavy ion collisions. Phys. Rep. 1988, 160, 189–233.
40. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
41. Naudts, J. Deformed exponentials and logarithms in generalized thermostatistics. Physica A 2002, 316, 323–334.
42. Borges, E.P. A possible deformed algebra and calculus inspired in nonextensive thermostatistics. Physica A 2004, 340, 95–101.
43. Kaniadakis, G.; Lissia, M.; Scarfone, A.M. Two-parameter deformations of logarithm, exponential, and entropy: A consistent framework for generalized statistical mechanics. Phys. Rev. E 2005, 71, 046128.
44. Planck, M. Fokker-Planck equation. Sitzungsber. Preuß. Akad. Wiss 1917, 3, 324–341.
45. Fokker, A.D. Ein invarianter Variationssatz für die Bewegung mehrerer elektrischer Massenteilchen. Z. Phys. 1929, 58, 386–393.
46. Risken, H. The Fokker-Planck Equation; Springer: Berlin/Heidelberg, Germany, 1996; pp. 63–95.
47. Irwin, J.O. The generalized Waring distribution applied to accident theory. J. R. Stat. Soc. Ser. A-Stat. Soc. 1968, 131, 205–225.
48. Bochkov, G.N.; Kuzovlev, Y.E. Nonlinear fluctuation-dissipation relations and stochastic models in nonequilibrium thermodynamics: I. generalized fluctuation-dissipation theorem. Physica A 1981, 106, 443–479.
49. Einstein, A. Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Ann. Phys. (Berl.) 1905, 322, 549–560.
50. Einstein, A. Zur Theorie der Brownschen Bewegung. Ann. Phys. (Berl.) 1906, 324, 371–381.
51. Rausand, M.; Hoyland, A. System Reliability Theory: Models, Statistical Methods, and Applications; John Wiley & Sons: Chichester, UK, 2004.
52. Néda, Z.; Varga, L.; Biró, T.S. Science and Facebook: The same popularity law! PLoS ONE 2017, 12, e0179656.
53. Schubert, A.; Glänzel, W. A dynamic look at a class of skew distributions. A model with scientometric applications. Scientometrics 1984, 6, 149.
54. Glänzel, W.; Schubert, A. Predictive aspects of a stochastic model for citation processes. Inf. Process. Manag. 1995, 31, 69.
55. Chakraborti, A.; Chatterjee, A.; Chakrabarti, B.; Chakravarty, S.R. Econophysics of Income and Wealth Distributions; Cambridge University Press: Cambridge, UK, 2013.
56. Piketty, T.; Goldhammer, A. Capital in the Twenty-First Century; The Belknap Press of Harvard University Press: Cambridge, MA, USA, 2014.
57. Bouchaud, J.P.; Mézard, M. Wealth condensation in a simple model of economy. Physica A 2000, 282, 536–545.
58. Bee, M.; Riccaboni, M.; Schiavo, S. Distribution of City Size: Gibrat, Pareto, Zipf. In The Mathematics of Urban Morphology; Birkhäuser: Cham, Switzerland, 2019; pp. 77–91.
59. Horvát, S.; Derzsi, A.; Néda, Z.; Balog, A. A spatially explicit model for tropical tree diversity patterns. J. Theor. Biol. 2010, 265, 517–523.
60. Bíró, G.; Barnaföldi, G.G.; Papp, G.; Biró, T.S. Multiplicity Dependence in the Non-Extensive Hadronization Model Calculated by the HIJING Framework++. Universe 2019, 5, 134.
61. Derzsy, N.; Neda, Z.; Santos, M.A. Income distribution patterns from a complete social security database. Physica A 2012, 391, 5611–5619.
62. Fisher, R.A. The negative binomial distribution. Ann. Eugen. 1941, 11, 182–187.
63. Grosse-Oetringhaus, J.I.; Reygers, K. Charged-Particle Multiplicity in Proton-Proton Collisions. J. Phys. G 2010, 37, 083001.
64. Bíró, G. Application of New-Generation High Energy Physical Detector Simulators to the Investigation of Identified Hadron Spectra. MSc Thesis, Eötvös University, Budapest, Hungary, 2015.
65. Barthe, F.; Guédon, O.; Mendelson, S.; Naor, A. A probabilistic approach to the geometry of the ℓp n-ball. Ann. Probab. 2005, 33, 480–513.
66. Pareto, V. Manuale di Economia Politica; Societa Editrice: Torino, Italy, 1906; Volume 13.
67. Bíró, G.; Barnaföldi, G.G.; Biró, T.S.; Ürmössy, K.; Takács, A. Systematic Analysis of the Non-Extensive Statistical Approach in High Energy Particle Collisions-Experiments vs Theory. Entropy 2017, 19, 88.
68. Telcs, A.; Glänzel, W.; Schubert, A. Characterization and statistical test using truncated expectations for a class of skew distributions. Math. Soc. Sci. 1985, 10, 169.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
