Alternative Entropy Measures and Generalized Khinchin–Shannon Inequalities

The generalized Khinchin–Shannon inequalities for entropy measures in Information Theory are a paradigm which can be used to test the Synergy of the distributions of probabilities of occurrence in physical systems. The rich algebraic structure associated with the introduction of escort probabilities seems to be essential for deriving these inequalities for the two-parameter Sharma–Mittal set of entropy measures. We also emphasize the derivation of these inequalities for the special cases of the one-parameter Havrda–Charvat, Rényi and Landsberg–Vedral entropy measures.


Introduction
In the present contribution we derive the Generalized Khinchin-Shannon inequalities (GKS) [1,2] associated with the entropy measures of the Sharma-Mittal (SM) set [3]. We stress that the derivations presented here are a tentative way of implementing ideas from the interdisciplinary literature on Statistical Mechanics and Information Theory [4][5][6]. The algebraic structure of the escort probability distributions seems to be essential to these derivations, in contrast with the intuitive derivation of the usual Khinchin-Shannon inequalities for the Gibbs-Shannon entropy measures. We start in Section 2 with the construction of a generic probability space whose elements, the probabilities of occurrence, are arranged in blocks of m rows and n columns. We then introduce the definitions of simple, joint, conditional and marginal probabilities through the use of Bayes' law. In Section 3, we use the assumption of concavity to unveil the Synergy of the distribution of values of Gibbs-Shannon entropy measures [2]. In Section 4, we present the same development for the SM set of entropy measures, after introducing the concept of escort probabilities. We then specialize the derivations to the Havrda-Charvat, Rényi and Landsberg-Vedral entropies [7][8][9]. A detailed study is also undertaken in this section of the eventual ordering between the probabilities of occurrence and their associated escort probabilities. This is enough for deriving the GKS inequalities for the SM entropy measures. In Section 5, we present a proposal for an information measure associated with SM entropies and we derive its related inequalities [10]. At this point we stress once more the upsurge of the synergy effect in the comparison of the information obtained from the entropy calculated with joint probabilities of occurrence and the entropies corresponding to simple probabilities.
In Section 6, we present an alternative derivation of the GKS inequalities based on Hölder inequalities [11]. In association with Bayes' law, these provide the same assumptions of concavity used in Sections 3 and 4 and, consequently, a derivation of the GKS inequalities identical to that given in Section 4.

The Probability Space. Probabilities of Occurrence
We consider that the data can be represented as two-dimensional arrays of m rows and n columns. We then have m × n blocks of data on which to undertake the statistical analysis.
The joint probabilities of occurrence of a set of t variables a_1, ..., a_t in columns j_1, ..., j_t, respectively, are given by

p_{j_1...j_t}(a_1, ..., a_t) = n_{j_1...j_t}(a_1, ..., a_t) / m ,   (1)

where m is the number of rows of the m × t subarray of the m × n array and n_{j_1...j_t}(a_1, ..., a_t) is the number of occurrences of the set a_1, ..., a_t. The values assumed by the variables j_1, ..., j_t, with j_1 < j_2 < ... < j_t, are respectively given by

j_1 = 1, ..., n − t + 1 ;  j_2 = j_1 + 1, ..., n − t + 2 ;  ... ;  j_t = j_{t−1} + 1, ..., n ,   (2)

or, compactly,

j_l = j_{l−1} + 1, ..., n − t + l ,  l = 2, ..., t .   (3)

There are then (n choose t) = n!/(t!(n−t)!) objects of t columns each, 1 ≤ t ≤ n, and if the variables a_1, ..., a_t take on values 1, ..., W, we will have W^t components for each of these objects. Since

∑_{a_1,...,a_t} n_{j_1...j_t}(a_1, ..., a_t) = m ,   (4)

we can write

∑_{a_1,...,a_t} p_{j_1...j_t}(a_1, ..., a_t) = 1 .   (5)

In the study of distributions of nucleotide bases or distributions of amino acids in proteins, the related values of W are W = 4 and W = 20, respectively.
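As a concrete illustration, the counting definition of the probabilities of occurrence above can be sketched in Python. The data block, the column indices and the helper name are hypothetical; this is a minimal sketch, not part of the original derivation:

```python
from collections import Counter
from itertools import combinations

# Hypothetical m x n data block: m rows (e.g., aligned sequences), n columns (sites).
# For nucleotide bases W = 4 possible symbols per column; for amino acids W = 20.
data = ["AC", "AG", "AC", "TG"]
m, n = len(data), len(data[0])

def joint_prob(rows, cols):
    """Estimate p_{j1...jt}(a1, ..., at) = n_{j1...jt}(a1, ..., at) / m
    by counting t-tuples of symbols over the m rows."""
    counts = Counter(tuple(row[j] for j in cols) for row in rows)
    return {tup: c / len(rows) for tup, c in counts.items()}

p_1 = joint_prob(data, (0,))        # simple probabilities of the first column
p_12 = joint_prob(data, (0, 1))     # joint probabilities of the two columns

# Normalization: the probabilities sum to 1 for every choice of t columns.
for cols in combinations(range(n), 2):
    assert abs(sum(joint_prob(data, cols).values()) - 1.0) < 1e-12
```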
Bayes' law for the probabilities of occurrence of Equation (1) is written as

p_{j_1...j_t}(a_1, ..., a_t) = p_{j_1...j_t}(a_1, ..., a_{t−1} | a_t) p_{j_t}(a_t) ,   (6)

where p_{j_1...j_t}(a_1, ..., a_{t−1} | a_t) stands for the conditional probability of occurrence of the values associated with the variables a_1, ..., a_{t−1} in the columns j_1, ..., j_{t−1}, respectively, if the value associated with a_t in the j_t-th column is given a priori. This also means that

∑_{a_1,...,a_{t−1}} p_{j_1...j_t}(a_1, ..., a_{t−1} | a_t) = 1 .   (7)
The marginal probabilities related to p_{j_1...j_t}(a_1, ..., a_t) are then given by

p_{j_t}(a_t) = ∑_{a_1,...,a_{t−1}} p_{j_1...j_t}(a_1, ..., a_t) .   (8)

We then have, from Equations (6) and (8),

∑_{a_t} p_{j_t}(a_t) = 1 ,   (9)

which is the same result as Equation (5).
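The interplay of joint, conditional and marginal probabilities above can be checked numerically. The toy joint distribution below is hypothetical, chosen only to exercise Bayes' law and the normalization of the conditional distributions:

```python
from collections import Counter

# A toy joint distribution p_{j1 j2}(a1, a2) over two columns (hypothetical values).
p_joint = {("A", "C"): 0.5, ("A", "G"): 0.25, ("T", "G"): 0.25}

# Marginal probabilities of the last column: sum the joint over a1.
p_j2 = Counter()
for (a1, a2), p in p_joint.items():
    p_j2[a2] += p

# Conditional probabilities from Bayes' law:
# p_{j1 j2}(a1 | a2) = p_{j1 j2}(a1, a2) / p_{j2}(a2).
p_cond = {(a1, a2): p / p_j2[a2] for (a1, a2), p in p_joint.items()}

# Bayes' law recomposes the joint probabilities ...
for (a1, a2), p in p_joint.items():
    assert abs(p_cond[(a1, a2)] * p_j2[a2] - p) < 1e-12

# ... and each conditional distribution is normalized.
for b in p_j2:
    assert abs(sum(p for (a1, a2), p in p_cond.items() if a2 == b) - 1.0) < 1e-12
```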

The Assumption of Concavity and the Synergy of Gibbs-Shannon Entropy Measures
A concave function F of several variables satisfies the inequality

F( ∑_k λ_k x_k ) ≥ ∑_k λ_k F(x_k) ,  λ_k ≥ 0 ,  ∑_k λ_k = 1 .   (10)

We shall apply Equation (10) to the Gibbs-Shannon entropies

S_{j_1...j_t} = −k ∑_{a_1,...,a_t} p_{j_1...j_t}(a_1, ..., a_t) log p_{j_1...j_t}(a_1, ..., a_t) ,   (11)

S_{j_1...j_{t−1}}(a_t) = −k ∑_{a_1,...,a_{t−1}} p_{j_1...j_t}(a_1, ..., a_{t−1} | a_t) log p_{j_1...j_t}(a_1, ..., a_{t−1} | a_t) ,   (12)

where Equation (12) stands for the definition of the Gibbs-Shannon entropy related to the conditional probabilities p_{j_1...j_t}(a_1, ..., a_{t−1} | a_t). It is a measure of the uncertainty [2] of the distribution of probabilities on the columns j_1, ..., j_{t−1} when we have previous information on the distribution of the column j_t. From Bayes' law, Equation (6), and from Equations (8), (11) and (12), we get

S_{j_1...j_t} = S_{j_t} + ∑_{a_t} p_{j_t}(a_t) S_{j_1...j_{t−1}}(a_t) .   (13)

We now use the correspondences

λ_k → p_{j_t}(a_t) ,   (14)

x_k → p_{j_1...j_t}(a_1, ..., a_{t−1} | a_t) ,   (15)

and we then have, from Equations (6) and (8),

∑_{a_t} p_{j_t}(a_t) p_{j_1...j_t}(a_1, ..., a_{t−1} | a_t) = p_{j_1...j_{t−1}}(a_1, ..., a_{t−1})   (16)

and

F( ∑_{a_t} p_{j_t}(a_t) p_{j_1...j_t}(a_1, ..., a_{t−1} | a_t) ) = S_{j_1...j_{t−1}} .   (17)

After substituting Equations (12), (16) and (17) into Equation (10), we get

S_{j_1...j_{t−1}} ≥ ∑_{a_t} p_{j_t}(a_t) S_{j_1...j_{t−1}}(a_t) ,   (18)

or,

∑_{a_t} p_{j_t}(a_t) S_{j_1...j_{t−1}}(a_t) ≤ S_{j_1...j_{t−1}} .   (19)

This means that the uncertainty of the distribution on the columns j_1, ..., j_{t−1} cannot be increased when we have previous information on the distribution of column j_t.
From Equations (13) and (19), we then write

S_{j_1...j_t} ≤ S_{j_1...j_{t−1}} + S_{j_t} ,   (20)

and by iteration we get the Khinchin-Shannon inequality for the Gibbs-Shannon entropy measure:

S_{j_1...j_t} ≤ ∑_{l=1}^{t} S_{j_l} .   (21)

The usual meaning given to Equation (21) is that the minimum of the information to be obtained from the analysis of the joint probabilities of a set of t columns is given by the sum of the pieces of information associated with the t columns considered as independent [1,2,10]. This is also seen as an aspect of Synergy [12,13] of the distribution of probabilities of occurrence.
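The Khinchin-Shannon inequality, Equation (21), can be illustrated numerically for t = 2. The correlated two-column block below is a hypothetical example, with k = 1 and natural logarithms:

```python
import math
from collections import Counter

def shannon(p):
    """Gibbs-Shannon entropy (k = 1, natural logarithm) of a distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

# A correlated two-column block (hypothetical data): the columns are not independent.
rows = ["AC", "AC", "AG", "TG", "TG", "TG"]
m = len(rows)

S_joint = shannon([c / m for c in Counter(rows).values()])
S_1 = shannon([c / m for c in Counter(r[0] for r in rows).values()])
S_2 = shannon([c / m for c in Counter(r[1] for r in rows).values()])

# Khinchin-Shannon inequality: S_{j1 j2} <= S_{j1} + S_{j2},
# with equality only when the columns are independent.
assert S_joint <= S_1 + S_2 + 1e-12
```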

The Assumption of Concavity and the Synergy of Sharma-Mittal (SM) Entropy Measures. The GKS Inequalities
We shall now use the assumption of concavity given by Equation (10), where the Sharma-Mittal entropy measures are given by

(SM)_{j_1...j_t} = (k/(1 − r)) [ (α_{j_1...j_t})^{(1−r)/(1−s)} − 1 ] ,   (22)

with the α-symbols

α_{j_1...j_t} = ∑_{a_1,...,a_t} [ p_{j_1...j_t}(a_1, ..., a_t) ]^s ,   (23)

and r, s are non-dimensional parameters. Analogously to Equation (12), we also introduce the "conditional entropy measure"

(SM)_{j_1...j_{t−1}}(a_t) = (k/(1 − r)) [ (α_{j_1...j_{t−1}}(a_t))^{(1−r)/(1−s)} − 1 ] ,   (24)

where

α_{j_1...j_{t−1}}(a_t) = ∑_{a_1,...,a_{t−1}} [ p_{j_1...j_t}(a_1, ..., a_{t−1} | a_t) ]^s ,   (25)

and p̂_{j_t}(a_t) stands for the escort probability

p̂_{j_t}(a_t) = [ p_{j_t}(a_t) ]^s / ∑_{a′_t} [ p_{j_t}(a′_t) ]^s .   (26)

We have in general:

p̂_{j_1...j_t}(a_1, ..., a_t) = [ p_{j_1...j_t}(a_1, ..., a_t) ]^s / α_{j_1...j_t} .   (27)

The inverse transformations are given by:

p_{j_t}(a_t) = [ p̂_{j_t}(a_t) ]^{1/s} / ∑_{a′_t} [ p̂_{j_t}(a′_t) ]^{1/s} .   (28)

A range of variation for the parameters r, s of the Sharma-Mittal entropies, Equation (22), should be derived from a requirement of strict concavity. In order to do so, let us remember that for each set of t columns (an m × t subarray) there are m rows of t values each (t-sequences). We denote these t-sequences by q_µ, µ = 1, ..., m. A sufficient requirement for strict concavity is the negative definiteness of the quadratic form associated with the Hessian matrix [14], whose elements are given by

H_{q_µ q_ν} = ∂²(SM)_{j_1...j_t} / ∂p(q_µ) ∂p(q_ν) .   (29)

We then consider the m submatrices along the diagonal of the Hessian matrix. Their determinants should be alternately negative or positive according to whether their order is odd or even [15], respectively. We then choose the leading principal minors of orders 1, 2 and 3, and we have from Equations (34)-(36):

det H_{q_µ q_ν} (µ, ν = 1, 2) > 0 ,  det H_{q_µ q_ν} (µ, ν = 1, 2, 3) < 0 .
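The definitions above can be sketched numerically. The following Python fragment (the distribution and the parameter values are merely illustrative, with k = 1) computes the Sharma-Mittal entropy, recovers the three one-parameter special cases as limits or sections of the (r, s) plane, and checks the escort/inverse transformations:

```python
import math

def sharma_mittal(p, r, s, k=1.0):
    """Two-parameter Sharma-Mittal entropy measure:
    S = k/(1-r) * [ (sum_a p_a^s)**((1-r)/(1-s)) - 1 ]."""
    alpha = sum(x ** s for x in p)  # the alpha-symbol of the distribution
    return k / (1.0 - r) * (alpha ** ((1.0 - r) / (1.0 - s)) - 1.0)

def escort(p, s):
    """Escort distribution: p_hat_a = p_a^s / sum_b p_b^s."""
    alpha = sum(x ** s for x in p)
    return [x ** s / alpha for x in p]

p = [0.5, 0.25, 0.25]
s = 0.8
alpha = sum(x ** s for x in p)

# One-parameter special cases as limits/sections of the (r, s) plane:
havrda_charvat = sharma_mittal(p, r=s + 1e-9, s=s)      # r -> s
renyi = sharma_mittal(p, r=1.0 - 1e-9, s=s)             # r -> 1
landsberg_vedral = sharma_mittal(p, r=2.0 - s, s=s)     # r = 2 - s

# Closed forms of the three special cases, for comparison:
assert abs(havrda_charvat - (alpha - 1.0) / (1.0 - s)) < 1e-6
assert abs(renyi - math.log(alpha) / (1.0 - s)) < 1e-6
assert abs(landsberg_vedral - (1.0 - 1.0 / alpha) / (1.0 - s)) < 1e-12

# The inverse transformation recovers p from its escort distribution:
p_hat = escort(p, s)
back = [x ** (1.0 / s) for x in p_hat]
norm = sum(back)
assert all(abs(b / norm - x) < 1e-12 for b, x in zip(back, p))
```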
We then have generally

(−1)^l det H_{q_µ q_ν} (µ, ν = 1, ..., l) > 0 ,  l = 1, ..., m .   (37)

This completes the proof. From Bayes' law, Equation (6), and from Equations (8) and (22)-(25), we can write

α_{j_1...j_t} = ∑_{a_t} [ p_{j_t}(a_t) ]^s α_{j_1...j_{t−1}}(a_t) .   (38)

We are now ready to use the concavity assumption, Equation (10), for deriving the GKS inequalities. In order to do so, we make the correspondences

λ_k → p̂_{j_t}(a_t) ,  x_k → p_{j_1...j_t}(a_1, ..., a_{t−1} | a_t) ,

from which the corresponding concavity inequalities follow. An additional piece of information should be taken into consideration before we derive the GKS inequalities: on each column j_t, t = 1, ..., n, of an m × n block, there will be values a′_t and a″_t of a_t such that

p̂_{j_t}(a′_t) ≥ p_{j_t}(a′_t)   (48)

and

p̂_{j_t}(a″_t) ≤ p_{j_t}(a″_t) .   (49)

After multiplying inequalities (48) and (49) by p_{j_1...j_t}(a_1, ..., a_{t−1} | a′_t) and p_{j_1...j_t}(a_1, ..., a_{t−1} | a″_t), respectively, and summing over a′_t and a″_t, respectively, we get

∑_{a′_t} p_{j_t}(a′_t) p_{j_1...j_t}(a_1, ..., a_{t−1} | a′_t) ≤ ∑_{a′_t} p̂_{j_t}(a′_t) p_{j_1...j_t}(a_1, ..., a_{t−1} | a′_t)   (50)

and

∑_{a″_t} p̂_{j_t}(a″_t) p_{j_1...j_t}(a_1, ..., a_{t−1} | a″_t) ≤ ∑_{a″_t} p_{j_t}(a″_t) p_{j_1...j_t}(a_1, ..., a_{t−1} | a″_t) .   (51)
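The two-set ordering of inequalities (48) and (49) can be checked numerically: since the differences p̂ − p sum to zero over a column, values of both signs must coexist unless the column distribution is uniform. The column distribution and the parameter value below are hypothetical:

```python
def escort(p, s):
    """Escort distribution: p_hat_a = p_a^s / sum_b p_b^s."""
    alpha = sum(x ** s for x in p)
    return [x ** s / alpha for x in p]

p = [0.6, 0.3, 0.1]     # a non-uniform column distribution
s = 0.5
p_hat = escort(p, s)

# The differences p_hat - p sum to zero, so values of both signs coexist
# on the same column, as inequalities (48) and (49) require.
diff = [ph - x for ph, x in zip(p_hat, p)]
assert abs(sum(diff)) < 1e-12
assert any(d > 0 for d in diff) and any(d < 0 for d in diff)

# For s < 1 the escort mapping lifts small probabilities and lowers large ones.
assert p_hat[0] < p[0] and p_hat[-1] > p[-1]
```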
From Equations (48) and (49), any sum over the a_t values can be partitioned into sums over the sets of values a′_t and a″_t:

∑_{a_t} p_{j_t}(a_t) p_{j_1...j_t}(a_1, ..., a_{t−1} | a_t) = ∑_{a′_t} p_{j_t}(a′_t) p_{j_1...j_t}(a_1, ..., a_{t−1} | a′_t) + ∑_{a″_t} p_{j_t}(a″_t) p_{j_1...j_t}(a_1, ..., a_{t−1} | a″_t) .   (52)

Substituting Equation (52) into Equations (50) and (51) gives Equations (53) and (54), respectively. After applying Bayes' law, Equation (6), to the first term on the left-hand side of Equations (53) and (54), we get Equations (55) and (56). We now write the concavity assumption, Equation (10), as Equations (59) and (60), which can be put in the form of Equations (61) and (62), where

A ≡ ∑_{a_1,...,a_{t−1}} [ p_{j_1...j_{t−1}}(a_1, ..., a_{t−1}) ]^s = α_{j_1...j_{t−1}}

and B, B̄ stand for the remaining terms. Since B ≤ 0 and B̄ ≥ 0, we trivially obtain Equations (64) and (65). The set of inequalities given by Equations (61), (64) and (69), and the set given by Equations (61), (65) and (70), can then be arranged as two chains of inequalities, Equations (75) and (76), respectively. The inequality A ≥ D is common to the two chains above; with the definition of the α-symbols, Equation (23), it can be written as

α_{j_1...j_t} ≤ α_{j_1...j_{t−1}} α_{j_t} .   (77)

We then get by iteration

α_{j_1...j_t} ≤ ∏_{l=1}^{t} α_{j_l} .   (78)

Equation (78) corresponds to the Generalized Khinchin-Shannon inequalities (GKS) derived here for Sharma-Mittal entropies.
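A minimal numerical check of a GKS-type bound in its pseudo-additive t = 2 form can be made for a fully correlated pair of columns. Here k = 1, and the parameter values are assumed, for illustration, to lie in the admissible (r, s) region discussed in the text:

```python
def sharma_mittal(p, r, s):
    """Sharma-Mittal entropy with k = 1:
    S = [ (sum_a p_a^s)**((1-r)/(1-s)) - 1 ] / (1 - r)."""
    alpha = sum(x ** s for x in p)
    return (alpha ** ((1.0 - r) / (1.0 - s)) - 1.0) / (1.0 - r)

# Fully correlated two-column block (hypothetical): a1 determines a2,
# so p(A, C) = p(T, G) = 1/2 and all other pairs have probability 0.
p_joint = [0.5, 0.5]
p_1, p_2 = [0.5, 0.5], [0.5, 0.5]

r, s = 0.7, 0.5
S_12 = sharma_mittal(p_joint, r, s)
S_1 = sharma_mittal(p_1, r, s)
S_2 = sharma_mittal(p_2, r, s)

# GKS-type pseudo-additive bound for t = 2 (reduces to the
# Khinchin-Shannon inequality in the limit r -> 1):
# S_{j1 j2} <= S_{j1} + S_{j2} + (1 - r) S_{j1} S_{j2}
assert S_12 <= S_1 + S_2 + (1.0 - r) * S_1 * S_2 + 1e-12
```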
From Equations (22) and (37), we can also write the GKS inequalities as

(SM)_{j_1...j_t} ≤ (k/(1 − r)) [ ∏_{l=1}^{t} ( 1 + (1 − r) (SM)_{j_l} / k ) − 1 ] .   (79)

The remarks made after Equation (21) apply here as well, as far as the aspect of Synergy of the Sharma-Mittal entropy measures is concerned. We will introduce a proposal for an information measure to stress this aspect in the next section.
After comparing Equations (42) and (92), we get the result of Equation (79) again.

An Information Measure Proposal Associated with Sharma-Mittal Entropy Measures
We are looking for a proposal of an information measure which fulfills a requirement of clear interpretation of the upsurge of Synergy in a probability distribution and is supported by the usual idea of entropy as a measure of uncertainty.
For the Sharma-Mittal set of entropy measures, the proposal for the associated information measure would be Equation (94), where (SM)_{j_1...j_t} and α_{j_1...j_t} are given by Equations (22) and (23), respectively. From the GKS inequalities, Equation (79), and from Equation (94), we get

I_{j_1...j_t} ≥ (1/(1 − r)) [ ∏_{l=1}^{t} ( 1 + (1 − r) I_{j_l} ) − 1 ] .   (95)

The meaning of Equation (95) is that the minimum of information associated with t columns of probabilities of occurrence is given by the sum of the pieces of information associated with each column. This corresponds to the expression of Synergy of the distribution of probabilities of occurrence which we derived in the previous section.
The inequalities (95), for t = 2, 3, are written as

I_{j_1 j_2} ≥ I_{j_1} + I_{j_2} + (1 − r) I_{j_1} I_{j_2} ,   (96)

I_{j_1 j_2 j_3} ≥ I_{j_1} + I_{j_2} + I_{j_3} + (1 − r) ( I_{j_1} I_{j_2} + I_{j_2} I_{j_3} + I_{j_1} I_{j_3} ) + (1 − r)^2 I_{j_1} I_{j_2} I_{j_3} .   (97)

It seems worthwhile to derive yet another result which unveils once more the fundamental aspect of synergy of the distribution of probabilities of occurrence. From Equation (93) we obtain Equation (98), and from the GKS inequalities, Equation (78), we then write Equation (99). We then get

1 + (1 − r) I_{j_1...j_t} ≥ ∏_{l=1}^{t} ( 1 + (1 − r) I_{j_l} ) .   (100)

Equation (100) corresponds to another result which originates from the Synergy of the distribution of probabilities of occurrence. It can be stated as follows: the minimum of the rate of information increase with decreasing entropy in a probability distribution for sets of t columns is given by the product of the rates of information increase pertaining to each of the t columns.
A fundamental aim would be the derivation of a dynamical theory able to describe the process of formation of these structures: a theory based on the evolution of the entropy values of databases, which we hope could be realized by the methods introduced in the exhaustive study of Fokker-Planck equations.
Some introductory results along this promising line of research have already been published [16], and a forthcoming comprehensive review will summarize all of them.