Generalized (c,d)-entropy and aging random walks

Complex systems are often inherently non-ergodic and non-Markovian, and for such systems Shannon entropy loses its applicability. In particular, accelerating, path-dependent, and aging random walks offer an intuitive picture of these non-ergodic and non-Markovian systems. It was shown that the entropy of non-ergodic systems can still be derived from three of the Shannon-Khinchin axioms when the fourth, the so-called composition axiom, is violated. The corresponding entropy is of the form $S_{c,d} \sim \sum_i \Gamma(1+d,1-c\ln p_i)$ and depends on two system-specific scaling exponents, $c$ and $d$. This entropy contains many recently proposed entropy functionals as special cases, including Shannon and Tsallis entropy. It was shown that this entropy is relevant for a special class of non-Markovian random walks. In this work we generalize these walks to a much wider class of stochastic systems that can be characterized as `aging' systems. These are systems whose transition rates between states are path- and time-dependent. We show that for particular aging walks $S_{c,d}$ is again the correct extensive entropy. Before the central part of the paper we review the concept of $(c,d)$-entropy in a self-contained way.


INTRODUCTION: MINI-REVIEW OF (c, d)-ENTROPY
In their seminal works, Shannon and Khinchin showed that, assuming four information-theoretic axioms, the entropy must be of Boltzmann-Gibbs type, $S = -\sum_i p_i \log p_i$. In many physical systems one of these axioms may be violated. For non-ergodic systems the so-called separability axiom (Shannon-Khinchin axiom 4) is not valid. We show that whenever this axiom is violated the entropy takes a more general form, $S_{c,d} \propto \sum_i^W \Gamma(d+1, 1 - c \log p_i)$, where $c$ and $d$ are scaling exponents and $\Gamma(a, b)$ is the incomplete gamma function. These exponents $(c, d)$ define equivalence classes for all systems, interacting and non-interacting, and unambiguously characterize any statistical system in its thermodynamic limit. The proof is possible because of two newly discovered scaling laws which any entropic form has to fulfill if the first three Shannon-Khinchin axioms hold [1]. A series of known entropies can be classified in terms of these equivalence classes. We show that the corresponding distribution functions are special forms of Lambert-W exponentials containing, as special cases, Boltzmann, stretched exponential, and Tsallis distributions (power-laws). We go on to show how the dependence of the phase space volume $W(N)$ of a classical system on its size $N$ uniquely determines its extensive entropy, and in particular that the requirement of extensivity fixes the exponents $(c, d)$ [2]. We give a concise criterion for when this entropy is not of Boltzmann-Gibbs type but has to assume a generalized (non-additive) form. We showed that generalized entropies can only exist when the dynamically (statistically) relevant fraction of degrees of freedom in the system vanishes in the thermodynamic limit [2]. These are systems where the bulk of the degrees of freedom is frozen and is practically statistically inactive. Systems governed by generalized entropies are therefore systems whose phase space volume effectively collapses to a lower-dimensional 'surface'. We explicitly illustrate the situation for binomial processes and argue that generalized entropies could be relevant for self-organized critical systems such as sand piles, for spin systems which form meta-structures such as vortices, domains, instantons, etc., and for problems associated with anomalous diffusion [2]. In this contribution we largely follow the lines of thought presented in [1][2][3].
Theorem number 2 in the seminal 1948 paper, The Mathematical Theory of Communication [4], by Claude Shannon, proves the existence of the one and only form of entropy, given that three fundamental requirements hold. A few years later A.I. Khinchin remarked in his Mathematical Foundations of Information Theory [5]: "However, Shannon's treatment is not always sufficiently complete and mathematically correct so that, besides having to free the theory from practical details, in many instances I have amplified and changed both the statement of definitions and the statement of proofs of theorems." Khinchin adds a fourth axiom. The three fundamental requirements of Shannon, in the 'amplified' version of Khinchin, are known as the Shannon-Khinchin (SK) axioms. These axioms list the requirements needed for an entropy to be a reasonable measure of the 'uncertainty' about a finite probabilistic system. Khinchin further suggests to also use entropy as a measure of the information gained about a system when making an 'experiment', i.e. by observing a realization of the probabilistic system.
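• Khinchin's first axiom states that for a system with $W$ potential outcomes (states), each of which is given by a probability $p_i \geq 0$ with $\sum_{i=1}^{W} p_i = 1$, the entropy $S(p_1, \cdots, p_W)$ as a measure of uncertainty about the system must take its maximum for the equi-distribution $p_i = 1/W$, for all $i$.
• Khinchin's second axiom states that the entropy remains unchanged when a zero-probability state is added to the system, $S(p_1, \cdots, p_W) = S(p_1, \cdots, p_W, 0)$.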
• Khinchin's third axiom (separability axiom) makes a statement about the composition of two finite probabilistic systems $A$ and $B$. If the systems are independent of each other, entropy should be additive, meaning that the entropy of the combined system $A + B$ should be the sum of the entropies of the individual systems, $S(A + B) = S(A) + S(B)$. If the two systems are dependent on each other, the entropy of the combined system, i.e. the information given by the realization of the two finite schemes $A$ and $B$, $S(A + B)$, is equal to the information gained by a realization of system $A$, $S(A)$, plus the mathematical expectation of the information gained by a realization of system $B$ after the realization of system $A$, $S(A + B) = S(A) + S|_A(B)$.
• Khinchin's fourth axiom is the requirement that entropy is a continuous function of all its arguments p i and does not depend on anything else.
Given these axioms, the Uniqueness theorem [5] states that the one and only possible entropy is $S(p_1, \cdots, p_W) = -k \sum_{i=1}^{W} p_i \log p_i$, (1) where $k$ is an arbitrary positive constant. The result is of course the same as Shannon's. We call the combination of the four axioms the Shannon-Khinchin (SK) axioms. We now turn from information theory to physics, where systems may exist that violate the separability axiom. This might especially be the case for non-ergodic, complex systems exhibiting long-range and strong interactions. Such complex systems may show extremely rich behavior in contrast to simple ones, such as gases. There exists some hope that it should be possible to understand such systems also on a thermodynamical basis, meaning that a few measurable quantities would be sufficient to understand their macroscopic phenomena. If this were possible then, through an equivalent to the second law of thermodynamics, some appropriate entropy would enter as a fundamental concept relating the number of microstates in the system to its macroscopic properties. Guided by this hope, a series of so-called generalized entropies have been suggested over the past decades, see [6][7][8][9][10][11] and Table 1. These entropies have been designed for different purposes and have not been related to a fundamental origin. Here we ask what generalized entropies can look like if they fulfill some of the Shannon-Khinchin axioms, but explicitly violate the separability axiom. We do this axiomatically as first presented in [1]. By doing so we can relate a large class of generalized entropies to a single fundamental origin.
The reason why this axiom is violated in some physical, biological or social systems is broken ergodicity, i.e. not all regions in phase space are visited and many microstates are effectively 'forbidden'. Entropy relates the number of microstates of a system to an extensive quantity, which plays the fundamental role in the system's thermodynamical description. Extensive means that if two initially isolated, i.e. sufficiently separated, systems $A$ and $B$, with $W_A$ and $W_B$ the respective numbers of states, are brought together, the entropy of the combined system $A + B$ is $S(W_{A+B}) = S(W_A) + S(W_B)$, where $W_{A+B}$ is the number of states in the combined system $A + B$. This is not to be confused with additivity, which is the property that $S(W_A W_B) = S(W_A) + S(W_B)$. Extensivity and additivity coincide if the number of states in the combined system is $W_{A+B} = W_A W_B$. Clearly, for a non-interacting system Boltzmann-Gibbs-Shannon entropy, $S_{BG}[p] = -\sum_i^W p_i \ln p_i$, is extensive and additive. By 'non-interacting' (short-range, ergodic, sufficiently mixing, Markovian, ...) systems we mean $W_{A+B} = W_A W_B$. For interacting statistical systems the latter is in general not true; phase space is only partly visited and $W_{A+B} < W_A W_B$. In this case, an additive entropy such as Boltzmann-Gibbs-Shannon can no longer be extensive and vice versa. To ensure extensivity of entropy, an entropic form should be found for the particular interacting statistical system at hand. These entropic forms are called generalized entropies and usually assume trace form [6][7][8][9][10][11], $S_g[p] = \sum_{i=1}^{W} g(p_i)$, (2) $W$ being the number of states. Obviously not all generalized entropic forms are of this type. Rényi entropy, for example, is of the form $G(\sum_i^W g(p_i))$, with $G$ a monotonic function. We use trace forms, Eq. (2), for simplicity. Rényi forms can be studied in exactly the same way as will be shown, however at more technical cost.
Let us revisit the Shannon-Khinchin axioms in the light of generalized entropies of trace form, Eq. (2). Specifically, axioms SK1-SK3 (now re-ordered) have implications for the functional form of $g$:
• SK1: The requirement that $S$ depends continuously on $p$ implies that $g$ is a continuous function.
• SK2: The requirement that the entropy is maximal for the equi-distribution p i = 1/W (for all i) implies that g is a concave function.
• SK3: The requirement that adding a zero-probability state to a system, $W + 1$ with $p_{W+1} = 0$, does not change the entropy, implies that $g(0) = 0$.
• SK4 (separability axiom): The entropy of a system composed of sub-systems $A$ and $B$ equals the entropy of $A$ plus the expectation value of the entropy of $B$, conditional on $A$. Note that this also corresponds exactly to Markovian processes.
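The role of the individual axioms can be checked numerically for trace-form entropies. The following is a minimal sketch, assuming two small independent example distributions `pa` and `pb` (illustrative values, not taken from the paper) and the standard Boltzmann-Gibbs and Tsallis choices of $g$. It shows that both entropies respect SK2 and SK3, while only the Boltzmann-Gibbs form is additive for independent systems; for Tsallis entropy the additivity defect is the well-known term $(1-q) S_q(A) S_q(B)$.

```python
import math

def trace_entropy(p, g):
    """Trace-form entropy S_g[p] = sum_i g(p_i), Eq. (2)."""
    return sum(g(x) for x in p)

def g_bg(x):
    # Boltzmann-Gibbs-Shannon: g(x) = -x ln x, with g(0) = 0 by continuity
    return -x * math.log(x) if x > 0.0 else 0.0

def make_g_tsallis(q):
    # Tsallis: g_q(x) = (x - x^q) / (q - 1)
    return lambda x: (x - x ** q) / (q - 1.0)

def joint(pa, pb):
    """Joint distribution of two independent systems A and B."""
    return [a * b for a in pa for b in pb]

# Illustrative distributions (assumed values for this sketch)
pa = [0.5, 0.3, 0.2]
pb = [0.6, 0.4]
uniform = [1.0 / 3.0] * 3
g_q = make_g_tsallis(0.5)

for name, g in (("BG", g_bg), ("Tsallis q=0.5", g_q)):
    s_a, s_b = trace_entropy(pa, g), trace_entropy(pb, g)
    sk2_ok = trace_entropy(uniform, g) >= s_a             # maximum at equi-distribution
    sk3_shift = trace_entropy(pa + [0.0], g) - s_a         # adding a zero-probability state
    sk4_defect = trace_entropy(joint(pa, pb), g) - s_a - s_b  # deviation from additivity
    print(f"{name}: SK2 holds: {sk2_ok}, SK3 shift: {sk3_shift:.1e}, "
          f"additivity defect: {sk4_defect:.4f}")
```

The additivity defect vanishes (up to rounding) only for the Boltzmann-Gibbs form, which illustrates why relaxing SK4 opens the door to a larger family of entropies.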
As mentioned, if SK1 to SK4 hold, the only possible entropy is the Boltzmann-Gibbs-Shannon entropy. We are now going to derive the extensive entropy when the separability axiom SK4 is violated. Obviously this entropy will be more general and should contain BG entropy as a special case.
We now assume that axioms SK1, SK2, SK3 hold, i.e. we restrict ourselves to trace form entropies with g continuous, concave and g(0) = 0. These systems we call admissible systems. Admissible systems when combined with a maximum entropy principle show remarkably simple mathematical properties [12,13].
This generalized entropy for (large) admissible statistical systems (SK1-SK3 hold) is derived from two hitherto unexplored fundamental scaling laws of extensive entropies [1]. The two scaling laws are characterized by exponents $c$ and $d$, respectively, which allow one to uniquely define equivalence classes of entropies, meaning that two entropies are equivalent in the thermodynamic limit if their exponents $(c, d)$ coincide. Each admissible system belongs to one of these equivalence classes $(c, d)$ [1].
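In terms of the exponents $(c, d)$ we showed in [1] that all generalized entropies have the form $S_{c,d}[p] \propto \sum_{i=1}^{W} \Gamma(d+1, 1 - c \ln p_i)$, (3) up to a multiplicative and an additive constant, which depend on $c$, $d$ and a constant $r$. Here $\Gamma(a, b) = \int_b^{\infty} dt\, t^{a-1} e^{-t}$ is the incomplete gamma function.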

Special cases of equivalence classes
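Let us look at some specific equivalence classes $(c, d)$.
• Boltzmann-Gibbs entropy belongs to the $(c, d) = (1, 1)$ class. From Eq. (3) one gets $\sum_i \Gamma(2, 1 - \ln p_i) = e^{-1} (2 - \sum_i p_i \ln p_i)$, which is $-\sum_i p_i \ln p_i$ up to constants.
• Tsallis entropy belongs to the $(c, d) = (c, 0)$ class. From Eq. (3) and a suitable choice of $r$ one gets $\sum_i \Gamma(1, 1 - c \ln p_i) = e^{-1} \sum_i p_i^c$, which is the Tsallis form $(1 - \sum_i p_i^c)/(c - 1)$ up to constants. Note that although the pointwise limit $c \to 1$ of Tsallis entropy yields BG entropy, the asymptotic properties $(c, 0)$ do not change continuously to $(1, 1)$ in this limit! In other words, the thermodynamic limit and the limit $c \to 1$ do not commute.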
As a specific example we compute the $(c, d) = (1, 2)$ case, $\sum_i \Gamma(3, 1 - \ln p_i) = e^{-1} [5 - 4 \sum_i p_i \ln p_i + \sum_i p_i (\ln p_i)^2]$, leading to a superposition of two entropy terms, the asymptotic behavior being dominated by the second.
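The special cases above can be checked numerically. The following is a minimal sketch, assuming the unnormalized form $\sum_i \Gamma(d+1, 1 - c \ln p_i)$ quoted in the abstract and restricting $d$ to non-negative integers, for which the upper incomplete gamma function is a finite sum. The helper names and the test distribution `p` are illustrative choices, not part of the original work.

```python
import math

def upper_gamma_int(n, x):
    """Upper incomplete gamma function Gamma(n + 1, x) for integer n >= 0.

    Uses the finite sum Gamma(n + 1, x) = n! * exp(-x) * sum_{k=0}^{n} x^k / k!.
    """
    partial = sum(x ** k / math.factorial(k) for k in range(n + 1))
    return math.factorial(n) * math.exp(-x) * partial

def gamma_sum(p, c, d):
    """Unnormalized (c, d)-entropy sum_i Gamma(d + 1, 1 - c ln p_i), integer d, p_i > 0."""
    return sum(upper_gamma_int(d, 1.0 - c * math.log(x)) for x in p)

p = [0.5, 0.3, 0.2]                              # illustrative distribution
h1 = sum(x * math.log(x) for x in p)             # sum_i p_i ln p_i
h2 = sum(x * math.log(x) ** 2 for x in p)        # sum_i p_i (ln p_i)^2

# (c, d) = (1, 1): Boltzmann-Gibbs up to constants
print(gamma_sum(p, 1.0, 1), (2.0 - h1) / math.e)
# (c, d) = (c, 0): Tsallis up to constants
c = 0.7
print(gamma_sum(p, c, 0), sum(x ** c for x in p) / math.e)
# (c, d) = (1, 2): superposition of two entropy terms
print(gamma_sum(p, 1.0, 2), (5.0 - 4.0 * h1 + h2) / math.e)
```

Each printed pair should agree to rounding error. Non-integer $d$ would require a numerical routine for the incomplete gamma function, which is left out of this sketch.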
Other entropies which are special cases of our scheme are found in Table 1. Conversely, for any given entropy we are now in the remarkable position to characterize all large SK1-SK3 systems by a pair of exponents $(c, d)$, see Fig. 1. For example, for $g_{BG}(x) = -x \ln(x)$ we have $c = 1$ and $d = 1$. $S_{BG}$ therefore belongs to the universality class $(c, d) = (1, 1)$. For $g_q(x) = (x - x^q)/(q - 1)$ (Tsallis entropy) and $0 < q < 1$ one finds $c = q$ and $d = 0$, and Tsallis entropy, $S_q$, belongs to the universality class $(c, d) = (q, 0)$. Other examples are listed in Table 1.
The universality classes $(c, d)$ are equivalence classes with the equivalence relation given by: $g_\alpha \equiv g_\beta \Leftrightarrow c_\alpha = c_\beta$ and $d_\alpha = d_\beta$. This relation partitions the space of all admissible $g$ into equivalence classes completely specified by the pair $(c, d)$.
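The exponents of a given $g$ can be estimated numerically from its behavior near $x = 0$. The sketch below assumes that the two scaling laws of [1] take the form $\lim_{x \to 0} g(zx)/g(x) = z^c$ and $\lim_{x \to 0} g(x^{1+a}) / (x^{ac} g(x)) = (1+a)^d$; the first matches the function $f_g(z)$ used in the note on Rényi-type entropies, the second is stated here as an assumption. The probe values `z`, `a` and `x` are arbitrary illustrative choices.

```python
import math

def g_bg(x):
    # Boltzmann-Gibbs-Shannon: g(x) = -x ln x
    return -x * math.log(x)

def g_tsallis(x, q=0.5):
    # Tsallis: g_q(x) = (x - x^q) / (q - 1)
    return (x - x ** q) / (q - 1.0)

def estimate_c(g, z=0.5, x=1e-300):
    """First scaling law: g(z x) / g(x) -> z^c as x -> 0."""
    return math.log(g(z * x) / g(x)) / math.log(z)

def estimate_d(g, c, a=1.0, x=1e-100):
    """Second scaling law (assumed form): g(x^(1+a)) / (x^(a c) g(x)) -> (1 + a)^d.

    The limiting value of c must be supplied; a finite-x estimate of c is not
    accurate enough here because convergence can be only logarithmic.
    """
    ratio = g(x ** (1.0 + a)) / (x ** (a * c) * g(x))
    return math.log(ratio) / math.log(1.0 + a)

print("BG:      c ~", estimate_c(g_bg), " d ~", estimate_d(g_bg, c=1.0))
print("Tsallis: c ~", estimate_c(g_tsallis), " d ~", estimate_d(g_tsallis, c=0.5))
```

For the Boltzmann-Gibbs form the estimate of $c$ approaches 1 only logarithmically in $x$, which is why the limiting value is passed to `estimate_d` explicitly.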

Distribution functions
Distribution functions associated with our Γ-entropy, Eq. (3), can be derived from so-called generalized logarithms $\Lambda$ of the entropy. Under the maximum entropy principle [15] (given ordinary constraints) the inverse functions of these logarithms, $E = \Lambda^{-1}$, are the distribution functions, $p(\epsilon) = E_{c,d,r}(-\epsilon)$. The functions $E_{c,d,r}$ are Lambert-W exponentials that involve the constant $B \equiv \frac{(1-c)r}{1-(1-c)r} \exp\left(\frac{(1-c)r}{1-(1-c)r}\right)$, and $r$ is a constant that can be chosen for convenience. The function $W_k$ is the $k$'th branch of the Lambert-W function which, as a solution to the equation $x = W(x) \exp(W(x))$, has only two real solutions $W_k$, the branch $k = 0$ and the branch $k = -1$. Branch $k = 0$ covers the classes for $d \geq 0$, branch $k = -1$ those for $d < 0$.

TABLE I: Order in the zoo of recently introduced entropies for which SK1-SK3 hold. All of them are special cases of the entropy given in Eq. (3) and their asymptotic behavior is uniquely determined by $c$ and $d$. It can be seen immediately that $S_{q>1}$, $S_b$ and $S_E$ are asymptotically identical; so are $S_{q<1}$ and $S_\kappa$, as well as $S_\eta$ and $S_\gamma$.

Special cases of distribution functions
It is easy to verify that the class $(c, d) = (1, 1)$ leads to Boltzmann distributions, and the class $(c, d) = (c, 0)$ yields power-laws or, more precisely, Tsallis distributions, i.e. q-exponentials. All classes $(c, d) = (1, d)$ with $d > 0$ are associated with stretched exponential distributions. Expanding the $k = 0$ branch of the Lambert-W function, $W_0(x) \sim x - x^2 + \dots$ for $1 \gg |x|$, the limit $c \to 1$ is shown to be a stretched exponential. It was shown that $r$ does not affect the asymptotic properties (tail of the distributions), but can be used to incorporate finite size properties of the distribution function for small $x$.
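The small-argument expansion of the Lambert-W function used above can be checked with a few lines of code. The following is a minimal sketch of the principal branch $W_0$ for non-negative arguments, computed by Newton iteration on $w e^w = x$; the function name and tolerances are illustrative choices, and the branch $k = -1$ is not covered.

```python
import math

def lambert_w0(x, tol=1e-14, max_iter=100):
    """Principal branch W_0(x) for x >= 0, via Newton iteration on w * exp(w) = x."""
    if x < 0.0:
        raise ValueError("this sketch only covers x >= 0")
    w = math.log1p(x)  # starting guess, exact at x = 0 and never below the root
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w

for x in (1e-3, 1e-2, 1e-1):
    w = lambert_w0(x)
    # Compare with the two-term expansion W_0(x) ~ x - x^2
    print(x, w, x - x * x, abs(w - (x - x * x)))
```

The difference between $W_0(x)$ and $x - x^2$ shrinks like $x^3$, consistent with the expansion quoted in the text.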
How to determine the exponents c and d?
In [2] we have shown that the requirement of extensivity uniquely determines both exponents $c$ and $d$. What does extensivity mean? Consider a system with $N$ elements. The number of system configurations (microstates) as a function of $N$ is denoted by $W(N)$. Starting with SK2, $p_i = 1/W$ (for all $i$), we have $S_g = \sum_{i=1}^{W} g(p_i) = W g(1/W)$. As mentioned above, extensivity for two subsystems $A$ and $B$ means that $W_{A+B}\, g(1/W_{A+B}) = W_A\, g(1/W_A) + W_B\, g(1/W_B)$. (8) Using this equation one can straightforwardly derive the formulas (for details see [2]) $\frac{1}{1-c} = \lim_{N \to \infty} N \frac{W'(N)}{W(N)}$ (9) and $d = \lim_{N \to \infty} \log W \left( \frac{1}{N} \frac{W}{W'} + c - 1 \right)$. (10) Here $W'$ means the derivative with respect to $N$.
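The determination of $(c, d)$ from the growth of phase-space volume can be illustrated numerically. The sketch below assumes that Eqs. (9) and (10) read $1/(1-c) = \lim_{N\to\infty} N W'/W$ and $d = \lim_{N\to\infty} \log W\, (W/(N W') + c - 1)$, a form that reproduces the classes quoted in this paper. It works with $\ln W$ to avoid overflow, uses a central difference for $W'/W$, and evaluates the expressions at a large but finite $N$; the three growth laws and all names are illustrative.

```python
import math

def log_derivative(log_w, n, h=1.0):
    """Central-difference estimate of d(ln W)/dN = W'/W at N = n."""
    return (log_w(n + h) - log_w(n - h)) / (2.0 * h)

def c_at(log_w, n):
    """Finite-N value of c from 1/(1 - c) = N W'/W."""
    return 1.0 - 1.0 / (n * log_derivative(log_w, n))

def d_at(log_w, n, c):
    """Finite-N value of d = ln W * (W / (N W') + c - 1), for the limiting value c."""
    return log_w(n) * (1.0 / (n * log_derivative(log_w, n)) + c - 1.0)

def log_w_exponential(n):       # W(N) = 2^N
    return n * math.log(2.0)

def log_w_stretched(n, alpha=0.5):   # W(N) = 2^(N^alpha)
    return n ** alpha * math.log(2.0)

def log_w_power(n, b=3.0):      # W(N) = N^b
    return b * math.log(n)

N = 1.0e6
print("2^N:         c ->", c_at(log_w_exponential, N), " d ->", d_at(log_w_exponential, N, 1.0))
print("2^(N^alpha): c ->", c_at(log_w_stretched, N), " d ->", d_at(log_w_stretched, N, 1.0))
print("N^b:         c ->", c_at(log_w_power, N), " d ->", d_at(log_w_power, N, 2.0 / 3.0))
```

Note that `d_at` takes the limiting value of $c$ as input: inserting the finite-$N$ value from `c_at` would make the bracket vanish identically.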

A note on Rényi-type entropies
Rényi entropy is obtained by relaxing SK4 to the unconditional additivity condition. Following the same scaling idea for Rényi-type entropies, $S = G(\sum_{i=1}^{W} g(p_i))$, with $G$ and $g$ some functions, one gets an analogous scaling relation that involves $f_g(z) = \lim_{x \to 0} g(zx)/g(x)$. The expression $f_G(y) \equiv \lim_{s} G(sy)/G(s)$ provides the starting point for a deeper analysis, which now gets more involved. In particular, for Rényi entropy with $G(x) \equiv \ln(x)/(1 - \alpha)$ and $g(x) \equiv x^{\alpha}$, the asymptotic properties yield the class $(c, d) = (1, 1)$ (BG entropy), meaning that Rényi entropy is additive. However, in contrast to the trace form entropies used above, Rényi entropy can be shown not to be Lesche stable, as was observed before [17][18][19][20][21]. All of the $S = \sum_i^W g(p_i)$ entropies can be shown to be Lesche stable, see [3].

AGING RANDOM WALKS
In [2] we have discussed a particular type of accelerating random walk that requires generalized entropy. We first revisit the example of this auto-correlated random walk $x$ and point out that all moments of this random walk are identical to the moments of an accelerating random walk with independent decisions. This means that the two processes, of which the first requires a generalized entropy and the second requires Shannon entropy, asymptotically have the same distribution function. We then show that auto-correlated random walks are asymptotically equivalent to aging random walks.
Random walks of length $N$ consist of sequences of $N$ decisions $\omega_n$ with $n = 1, 2, \cdots, N$. Each decision determines whether to take a step of size $\Delta x$ to the right, $\omega_n = 1$, or to the left, $\omega_n = -1$, at time $t = n \Delta t$, with probabilities $q_+$ and $q_-$, respectively. The path $x(N \Delta t)$ is given by $x(N \Delta t) = \Delta x \sum_{n=1}^{N} \omega_n$. (12) In the following we set the time increment $\Delta t = 1$ and the step size $\Delta x = 1$. For the usual random walk each decision $\omega_n$ has no bias for any direction, i.e. $q_+ = q_- = 1/2$, and the expectation value is $\langle \omega_n \rangle = q_+ - q_- = 0$. Further, all decisions are independent, meaning that $\langle \omega_m \omega_n \rangle = \delta_{nm}$, where $\delta_{nm}$ is the Kronecker delta. The number of possible paths $W$ such a random walk can take, its phase-space volume for $N$ steps, is given by $W(N) = 2^N$. Using Eq. (9) and Eq. (10) one immediately finds $(c, d) = (1, 1)$. Random walks consisting of independent decisions are described by Shannon's entropy.

Accelerating and auto-correlated random walks
In [2] we considered a different type of random walk where again decisions have no a priori bias on the direction of the walk, i.e. $\langle \omega_n \rangle = 0$. However, decisions $\omega_n$ and $\omega_m$ are not independent anymore. In particular we considered a constant $0 < \alpha \leq 1$ such that $\langle \omega_m \omega_n \rangle = 1$ if $z \leq n^{\alpha}, m^{\alpha} < z + 1$ (13) for some $z = 0, 1, 2, \cdots$, and $\langle \omega_m \omega_n \rangle = 0$ otherwise. This means that the process is correlated with its history, and that after $n$ steps the number of free decisions is given by $z \sim n^{\alpha}$. As the walk progresses it heads persistently in the same direction for approximately $\frac{1}{\alpha} n^{1-\alpha}$ steps at the $n$'th step. Therefore, the number of possible paths $W$ grows like $W(N) = 2^{N^{\alpha}}$ and the random walk has a stretched exponential growth of phase-space volume. Using Eq. (9) and Eq. (10), the process belongs to the universality class $(c, d) = (1, 1/\alpha)$. Increasing persistence of a process over time can therefore be seen as the hallmark of processes that follow generalized extensive entropies.
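The class $(c, d) = (1, 1/\alpha)$ can be made explicit with a short worked calculation. The derivation below is a sketch that assumes Eqs. (9) and (10) have the form $1/(1-c) = \lim_{N\to\infty} N W'/W$ and $d = \lim_{N\to\infty} \ln W\, (W/(N W') + c - 1)$, and inserts the stretched exponential growth $W(N) = 2^{N^{\alpha}}$.

```latex
\begin{align}
\ln W(N) &= N^{\alpha} \ln 2 ,
\qquad
\frac{W'(N)}{W(N)} = \alpha N^{\alpha-1} \ln 2 , \\
\frac{1}{1-c} &= \lim_{N\to\infty} N \frac{W'(N)}{W(N)}
 = \lim_{N\to\infty} \alpha N^{\alpha} \ln 2 = \infty
 \quad\Rightarrow\quad c = 1 , \\
d &= \lim_{N\to\infty} \ln W \left( \frac{1}{N} \frac{W}{W'} + c - 1 \right)
 = \lim_{N\to\infty} \frac{N^{\alpha} \ln 2}{\alpha N^{\alpha} \ln 2}
 = \frac{1}{\alpha} .
\end{align}
```

Setting $\alpha = 1$ recovers the ordinary random walk with $W(N) = 2^N$ and $(c, d) = (1, 1)$.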
Computing the moments of $x$ one finds that odd moments vanish, $\langle x^{2r+1}(N) \rangle = 0$, where $r$ is a natural number, and that even moments grow asymptotically as $\langle x^{2r}(N) \rangle \propto N^{r(2-\alpha)}$. (14) The auto-correlated random walk therefore possesses the same moments as an accelerated random walk, i.e. a random walk with independent decisions, $\langle \omega_m \omega_n \rangle = \delta_{nm}$, however with a time-dependent step size $\Delta x(n) = D(n) \Delta x$ that increases proportionally to $n^{(1-\alpha)/2}$. Here $D(n)$ is the time-dependent 'diffusion constant' of the process. In particular, the second moment scales as $\langle x^2(N) \rangle \propto N^{2-\alpha}$. (15) We conclude that observable distribution functions do not necessarily tell us which entropy class the process belongs to. In this example the auto-correlated random walk of class $(c, d) = (1, 1/\alpha)$ has all moments in common with the accelerated random walk, which is of class $(c, d) = (1, 1)$.
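The growth of the second moment of the auto-correlated walk can be computed without sampling. The sketch below assumes one concrete reading of Eq. (13): steps $n$ and $m$ share a single decision whenever $\lfloor n^{\alpha} \rfloor = \lfloor m^{\alpha} \rfloor$, and different blocks carry independent unbiased signs. Under that assumption $\langle x^2(N) \rangle$ is exactly the sum of squared block lengths. The function names and the values of `alpha` and `N` are illustrative.

```python
import math

def block_lengths(n_steps, alpha):
    """Lengths of the runs of fully correlated decisions under the rule of Eq. (13)."""
    lengths = []
    current_z = None
    for n in range(1, n_steps + 1):
        z = math.floor(n ** alpha)
        if z != current_z:          # a new free decision starts a new block
            lengths.append(0)
            current_z = z
        lengths[-1] += 1
    return lengths

def second_moment(n_steps, alpha):
    """Exact <x^2(N)>: independent unbiased block signs give sum_z L_z^2."""
    return sum(length * length for length in block_lengths(n_steps, alpha))

alpha = 0.5
n_small, n_large = 10_000, 100_000

free_decisions = len(block_lengths(n_small, alpha))
exponent = (math.log(second_moment(n_large, alpha) / second_moment(n_small, alpha))
            / math.log(n_large / n_small))

print("free decisions:", free_decisions, " N^alpha:", n_small ** alpha)
print("growth exponent of <x^2>:", exponent, " expected 2 - alpha:", 2.0 - alpha)
```

The number of blocks tracks $N^{\alpha}$, so the number of paths is $2^{N^{\alpha}}$, while the second moment grows with an exponent close to $2 - \alpha$, as for an accelerated walk with independent steps of size proportional to $n^{(1-\alpha)/2}$.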

Generalization to aging (path-dependent) random walks
The above generating rule, Eq. (13), for incorporating auto-correlations into random walks is somewhat artificial. We now show that it is possible to get a completely analogous auto-correlated behavior by considering aging in the decision process $\omega$. This can be done as follows. Consider a second process $\eta_n$, such that $\omega_n = \eta_n \omega_{n-1}$. This process indicates whether at step $n$ the random walk will proceed in the direction of the previous time step ($\eta_n = 1$) or whether the walk reverses direction ($\eta_n = -1$). Let $k_+(N)$ ($k_-(N)$) be the number of times that $\eta_n = +1$ ($\eta_n = -1$) for $1 \leq n \leq N$, i.e. $k = (k_+, k_-)$ is the histogram of the process $\eta$ up to time step $N$. Aging can now be incorporated by considering conditional probabilities for reversing direction or not. In particular we have $p(-1 \mid k) = \alpha [1 + k_+(n)]^{\alpha-1}$ and $p(+1 \mid k) = 1 - \alpha [1 + k_+(n)]^{\alpha-1}$, (16) where $0 < \alpha \leq 1$ takes the same numerical values as in the auto-correlated random walk. As a consequence these aging random walks are non-Markovian processes with memory, since the conditional probabilities for making the next decision depend on the entire history of the process. The dependence is such that the process conditions its next decisions on the histogram of decisions made in the past, not on its precise trajectory, and again the decisions become increasingly persistent. This type of process is difficult to handle analytically. However, we can demonstrate numerically that the first three even moments $\langle x^2 \rangle$, $\langle x^4 \rangle$, and $\langle x^6 \rangle$ of the auto-correlated and the aging random walk are identical, and that the number of reversal decisions $k_-$ of both processes asymptotically behaves in exactly the same way. This shows that the effective number of different paths, i.e. the phase-space volume, of both processes grows in the same way and therefore the aging random walk belongs to the equivalence class $(c, d) = (1, 1/\alpha)$.
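The aging rule can be explored with a small simulation. The sketch below assumes that the reversal probability at each step is $\alpha [1 + k_+]^{\alpha - 1}$, the aging function quoted for this example in the section on general classes of aging random walks, and that the first step is unbiased. The seed, the number of walks and the walk length are arbitrary illustrative choices, and the run only checks that the mean number of reversals is of the order $N^{\alpha}$; it is not a reproduction of the numerical results reported in the paper.

```python
import random

def simulate_aging_walk(n_steps, alpha, rng):
    """One realization of the aging walk; returns (x, k_plus, k_minus)."""
    omega = rng.choice((-1, 1))      # unbiased first step
    x = omega
    k_plus = k_minus = 0
    for _ in range(n_steps - 1):
        # Assumed reversal probability, decreasing with the number of
        # non-reversal decisions made so far
        p_reverse = alpha * (1 + k_plus) ** (alpha - 1.0)
        if rng.random() < p_reverse:
            omega = -omega           # eta = -1: reverse direction
            k_minus += 1
        else:
            k_plus += 1              # eta = +1: keep direction
        x += omega
    return x, k_plus, k_minus

rng = random.Random(2024)
alpha, n_steps, n_walks = 0.5, 2000, 200

reversals = [simulate_aging_walk(n_steps, alpha, rng)[2] for _ in range(n_walks)]
mean_reversals = sum(reversals) / n_walks

print("mean number of reversals:", mean_reversals)
print("N^alpha:                 ", n_steps ** alpha)
```

Because the reversal probability depends only on the histogram $k$, the simulation needs to store two counters rather than the full trajectory.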
It is possible to show that one can arrive at a different equivalence class by altering the expression $n^{\alpha}$ in Eq. (13). In particular, by exchanging $n^{\alpha}$ with $a \log n$ (and likewise for $m$), one arrives at the Tsallis equivalence class $(c, d) = (q, 0)$.

General classes of aging random walks
We are now in the position to generalize aging random walks to different classes $(c, d)$ of entropies. This can be done by generalizing the path-dependent conditional probabilities of Eq. (16) in the following way: $p(-1 \mid k) = g(k_+)$ and $p(+1 \mid k) = 1 - g(k_+)$, (17) where $g(k_+)$ is a monotonically decreasing function ($\lim_{k_+ \to \infty} g(k_+) = 0$). In the above example $g(k_+) = \alpha [1 + k_+(n)]^{\alpha-1}$ corresponds to an aging process in the entropy class $(c, d) = (1, 1/\alpha)$. Different choices of the function $g$ will in general lead to different entropy classes $(c, d)$, depending on the asymptotic behavior of $k_-(N)$, which corresponds to the effective number of free decisions occurring during the walk and therefore to the way phase-space grows with $N$. Again, a precise analytical analysis of how the choice of $g$ determines $(c, d)$ is complicated and goes beyond the scope of the paper. However, it is known that systems with $0 < c < 1$ allow only a finite effective number of free decisions, e.g. [2,22]. This can for instance be achieved with an aging function $g$, Eq. (18), that depends on two parameters $0 < \nu \leq 1$ and $\lambda > 1$. By using a 'mean field' approach and setting $\frac{dk_+(N)}{dN} = p(+1 \mid k(N))$ and $\frac{dk_-(N)}{dN} = p(-1 \mid k(N))$ (19) one can derive an asymptotic expression for $k_-(N)$, Eq. (20), in terms of the lower incomplete gamma function $\gamma(a, b) = \int_0^b dt\, t^{a-1} e^{-t}$. Consequently the effective number of free decisions in these aging walks can be estimated by $k_-(\infty)$. The behavior of $k_-(\infty)$ is shown in Fig. (4).
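The mean-field equations (19) are easy to iterate numerically for any aging function. The sketch below treats $g(k_+)$ as the reversal probability and compares two choices: the power-law aging function $\alpha [1 + k_+]^{\alpha-1}$ from the example above, and a rapidly decaying function $\lambda^{-(1 + k_+)^{\nu}}$. The second function is a hypothetical stand-in chosen only because it depends on two parameters $\nu$ and $\lambda$ and decays fast enough for $k_-$ to saturate; it is not claimed to be the function of Eq. (18).

```python
def mean_field(g, n_steps):
    """Iterate dk_+/dN = 1 - g(k_+), dk_-/dN = g(k_+) with unit steps in N."""
    k_plus = k_minus = 0.0
    for _ in range(n_steps):
        p_reverse = g(k_plus)
        k_plus += 1.0 - p_reverse
        k_minus += p_reverse
    return k_plus, k_minus

def g_power(k_plus, alpha=0.5):
    # Aging function of the (c, d) = (1, 1/alpha) example
    return alpha * (1.0 + k_plus) ** (alpha - 1.0)

def g_fast(k_plus, nu=0.5, lam=2.0):
    # Hypothetical fast-decaying aging function (assumption for illustration only)
    return lam ** (-((1.0 + k_plus) ** nu))

for n_steps in (1_000, 10_000):
    _, km_power = mean_field(g_power, n_steps)
    _, km_fast = mean_field(g_fast, n_steps)
    print(f"N = {n_steps}: k_-(power law) = {km_power:.2f}, k_-(fast decay) = {km_fast:.6f}")
```

For the power-law function $k_-$ keeps growing roughly like $N^{\alpha}$, whereas for the fast-decaying function it converges to a finite value, which is the situation described for classes with $0 < c < 1$.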
The fact that only a finite number, $k_-$, of direction reversal decisions happen during such a random walk leads to a peculiar cross-over phenomenon that can be observed by studying the second moment $\langle x^2(N) \rangle$ of the walk. In particular $\langle x^2(N) \rangle \sim N$ for small $N$. For large $N \gg 1$ the random walk persistently heads into one direction and $\langle x^2(N) \rangle \sim N^2$. At an intermediate range of $N$ that depends on the value of $\lambda$ the behavior of $\langle x^2(N) \rangle$ crosses over from $N$ to $N^2$, see Fig. (5). Deriving the exact function that relates $\nu$ and $\lambda$ to $c$ and $d$ is beyond the scope of this paper. However, we conjecture that $c = 1 - \nu$, since $\nu = 0$ corresponds to the usual random walk and therefore we require $c = 1$ in this case.
It would be desirable to have a comprehensive classification of aging random walks in terms of equivalence classes $(c, d)$. We conjecture that this is in fact possible by exchanging the expression $n^{\alpha}$ in Eq. (13) with more general forms, $n^{\alpha} \to n^{\alpha} (\log n)^{\beta}$, where $\alpha$ and $\beta$ are directly related to $c$ and $d$.
Finally, let us remark that it is not straightforward to relate aging random walks and their class $(c, d)$ to more traditional scaling exponents such as, for example, the Hurst exponent. The very nature of aging walks is that their persistence changes over time.

CONCLUSIONS
Based on recently discovered scaling laws for trace form entropies we can classify all statistical systems and assign them a unique system-specific (extensive) generalized entropy. For non-ergodic systems these entropies may deviate from the Shannon form. The exponents for BG systems are $(c, d) = (1, 1)$, systems characterized by stretched exponentials belong to the class $(c, d) = (1, d)$, and Tsallis systems have $(c, d) = (q, 0)$. A further interesting feature of all admissible systems is that they are all Lesche stable, and that the classification scheme for generalized entropies of type $S = \sum_i g(p_i)$ can be easily extended to entropies of Rényi type, i.e. $S = G(\sum_i g(p_i))$. For proofs see [3].
We demonstrated that the auto-correlated random walk characterized by $0 < \alpha \leq 1$ introduced in [2] cannot be distinguished from accelerating random walks. Although the presented auto-correlated random walk is of entropy class $(c, d) = (1, 1/\alpha)$ and the accelerated random walk is of class $(c, d) = (1, 1)$, both processes have the same distribution function since all moments $\langle x^n \rangle$ are identical. We have shown that other classes of random walks can naturally be obtained, including those belonging to the $(c, d) = (q, 0)$, or Tsallis, equivalence class. Moreover, we showed numerically that the auto-correlated random walk is asymptotically equivalent to a particular aging random walk, where the probability of a decision to reverse the direction of the walk depends on the path the random walk has taken. This concept of aging can easily be generalized to different forms of aging, and it can be expected that many of the admissible systems can be represented by a specific type of aging that is specified by the aging function $g$, Eq. (17). Finally, we have seen that different equivalence classes $(c, d)$ can be realized by specifying an aging function $g$. For the aging function of Eq. (18), the effective number of direction reversal decisions remains finite and therefore the associated generalized entropy requires a class $(c, d)$ with $0 < c < 1$. We believe that the scheme of aging random walks can be naturally extended to aging processes in physical, biological, and social systems in general.

* Electronic address: stefan.thurner@meduniwien.ac.at

1. Hanel R.; Thurner S. A comprehensive classification of complex statistical systems and an axiomatic derivation of their entropy and distribution functions. Europhys