Article

Alternative Entropy Measures and Generalized Khinchin–Shannon Inequalities

by Rubem P. Mondaini * and Simão C. de Albuquerque Neto
COPPE, Centre of Technology, Federal University of Rio de Janeiro, Rio de Janeiro 21941-901, Brazil
* Author to whom correspondence should be addressed.
Entropy 2021, 23(12), 1618; https://doi.org/10.3390/e23121618
Submission received: 2 October 2021 / Revised: 12 November 2021 / Accepted: 21 November 2021 / Published: 1 December 2021
(This article belongs to the Special Issue Sample Entropy: Theory and Application)

Abstract: The Khinchin–Shannon generalized inequalities for entropy measures in Information Theory are a paradigm which can be used to test the Synergy of the distributions of probabilities of occurrence in physical systems. The rich algebraic structure associated with the introduction of escort probabilities seems to be essential for deriving these inequalities for the two-parameter Sharma–Mittal set of entropy measures. We also emphasize the derivation of these inequalities for the special cases of the one-parameter Havrda–Charvat, Rényi and Landsberg–Vedral entropy measures.

1. Introduction

In the present contribution we derive the Generalized Khinchin–Shannon inequalities (GKS) [1,2] associated with the entropy measures of the Sharma–Mittal (SM) set [3]. We stress that the derivations presented here are a tentative way of implementing ideas from the interdisciplinary literature on Statistical Mechanics and Information Theory [4,5,6]. The algebraic structure of the escort probability distributions seems to be essential in these derivations, in contrast with the intuitive derivation of the usual Khinchin–Shannon inequalities for the Gibbs–Shannon entropy measures. We start in Section 2 with the construction of a generic probability space whose elements, the probabilities of occurrence, are arranged in blocks of m rows and n columns. We then introduce the definitions of simple, joint, conditional and marginal probabilities through the use of Bayes' law. In Section 3, we use the assumption of concavity in order to unveil the Synergy of the distribution of values of Gibbs–Shannon entropy measures [2]. In Section 4, we present the same development for the SM set of entropy measures, after introducing the concept of escort probabilities. We then specialize the derivations to the Havrda–Charvat, Rényi and Landsberg–Vedral entropies [7,8,9]. A detailed study is undertaken in this section of the eventual ordering between the probabilities of occurrence and their associated escort probabilities, which is enough for deriving the GKS inequalities for the SM entropy measures. In Section 5, we present a proposal for an information measure associated with SM entropies and we derive its related inequalities [10]. At this point we stress once more the upsurge of the Synergy effect in the comparison between the information obtained from the entropy calculated with joint probabilities of occurrence and the entropies corresponding to simple probabilities. In Section 6, we present an alternative derivation of the GKS inequalities based on Hölder inequalities [11]. These provide, in association with Bayes' law, the same assumptions of concavity used in Section 3 and Section 4, and consequently a derivation identical to that of Section 4.

2. The Probability Space. Probabilities of Occurrence

We consider that the data can be represented as two-dimensional arrays of m rows and n columns. We then have $m \times n$ blocks of data on which to undertake the statistical analysis. The joint probabilities of occurrence of a set of t variables $a_1, \ldots, a_t$ in columns $j_1, \ldots, j_t$, respectively, are given by

$$p_{j_1 \ldots j_t}(a_1, \ldots, a_t) = \frac{n_{j_1 \ldots j_t}(a_1, \ldots, a_t)}{m}, \qquad (1)$$

where m is the number of rows of the $m \times t$ subarray of the $m \times n$ array, and $n_{j_1 \ldots j_t}(a_1, \ldots, a_t)$ is the number of occurrences of the set $a_1, \ldots, a_t$. The values assumed by the variables $j_1, \ldots, j_t$, with $j_1 < j_2 < \ldots < j_t$, are respectively given by:

$$j_1 = 1, 2, \ldots, n-t+1; \quad j_2 = j_1+1, j_1+2, \ldots, n-t+2; \quad \ldots; \quad j_{t-1} = j_{t-2}+1, j_{t-2}+2, \ldots, n-1; \quad j_t = j_{t-1}+1, j_{t-1}+2, \ldots, n, \qquad (2)$$

or,

$$j_1 = 1, 2, \ldots, n-t+1; \quad j_2 = j_1+1, j_1+2, \ldots, n-t+2; \quad \ldots; \quad j_{t-1} = j_1+t-2, j_1+t-1, \ldots, n-1; \quad j_t = j_1+t-1, j_1+t, \ldots, n. \qquad (3)$$

There are then $\binom{n}{t} = \frac{n!}{t!(n-t)!}$ objects of t columns each, $1 \le t \le n$, and if the variables $a_1, \ldots, a_t$ take on the values $1, \ldots, W$, then we will have $(W)^t$ components for each of these objects.
Since

$$\sum_{a_1, \ldots, a_t = 1}^{W} n_{j_1 \ldots j_t}(a_1, \ldots, a_t) = m, \qquad (4)$$

we can write:

$$\sum_{a_1, \ldots, a_t = 1}^{W} p_{j_1 \ldots j_t}(a_1, \ldots, a_t) = 1. \qquad (5)$$

In the study of distributions of nucleotide bases or distributions of amino acids in proteins, the related values of W are W = 4 and W = 20, respectively.
Bayes' law for the probabilities of occurrence of Equation (1) is written as:

$$p_{j_1 \ldots j_t}(a_1, \ldots, a_t) = p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) \cdot p_{j_t}(a_t) = p_{j_t j_1 \ldots j_{t-1}}(a_t \mid a_1, \ldots, a_{t-1}) \cdot p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1}) = p_{j_t j_1 \ldots j_{t-1}}(a_t, a_1, \ldots, a_{t-1}), \qquad (6)$$

where $p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)$ stands for the conditional probability of occurrence of the values associated with the variables $a_1, \ldots, a_{t-1}$ in the columns $j_1, \ldots, j_{t-1}$, respectively, if the values associated with $a_t$ in the column $j_t$ are given a priori. This also means that:

$$\sum_{a_1, \ldots, a_{t-1}} p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) = 1. \qquad (7)$$

The marginal probabilities related to $p_{j_1 \ldots j_t}(a_1, \ldots, a_t)$ are then given by

$$p_{j_t}(a_t) = \sum_{a_1, \ldots, a_{t-1}} p_{j_1 \ldots j_t}(a_1, \ldots, a_t) = \sum_{a_1, \ldots, a_{t-1}} p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)\, p_{j_t}(a_t). \qquad (8)$$

We then have, from Equations (6) and (8):

$$1 = \sum_{a_t} p_{j_t}(a_t) = \sum_{a_1, \ldots, a_t} p_{j_1 \ldots j_t}(a_1, \ldots, a_t), \qquad (9)$$

which is the same result as Equation (5).
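As an illustration of the constructions above, the following minimal Python sketch builds the joint probabilities of Equation (1) for a pair of columns of a randomly generated m × n block and checks the normalization, marginalization and Bayes' law relations, Equations (5)–(9). The block size, the number of values W and the chosen columns are illustrative assumptions of ours, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, W = 1000, 5, 4                       # m rows, n columns, W values (e.g., W = 4 for nucleotides)
data = rng.integers(0, W, size=(m, n))     # a hypothetical m x n block of occurrences

j1, j2 = 0, 2                              # a pair of columns (t = 2), j1 < j2

# Joint probabilities of occurrence, Equation (1): p_{j1 j2}(a1, a2) = n_{j1 j2}(a1, a2) / m
p_joint = np.zeros((W, W))
for a1, a2 in zip(data[:, j1], data[:, j2]):
    p_joint[a1, a2] += 1.0 / m
assert np.isclose(p_joint.sum(), 1.0)      # normalization, Equation (5)

# Marginal probabilities, Equation (8)
p_j1 = p_joint.sum(axis=1)                 # p_{j1}(a1)
p_j2 = p_joint.sum(axis=0)                 # p_{j2}(a2)
assert np.isclose(p_j2.sum(), 1.0)         # Equation (9)

# Conditional probabilities and Bayes' law, Equations (6) and (7)
p_cond = np.divide(p_joint, p_j2, out=np.zeros_like(p_joint), where=p_j2 > 0)
assert np.allclose(p_cond * p_j2, p_joint)              # p(a1, a2) = p(a1 | a2) p(a2)
assert np.allclose(p_cond.sum(axis=0)[p_j2 > 0], 1.0)   # Equation (7)
```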

3. The Assumption of Concavity and the Synergy of Gibbs–Shannon Entropy Measures

A concave function of several variables should satisfy the following inequality:

$$\sum_l \lambda_l f(x_l) \le f\Big(\sum_l \lambda_l x_l\Big); \qquad \lambda_l \ge 0; \qquad \sum_l \lambda_l = 1. \qquad (10)$$

We shall apply Equation (10) to the Gibbs–Shannon entropies:

$$S_{j_1 \ldots j_t} = -\sum_{a_1, \ldots, a_t} p_{j_1 \ldots j_t}(a_1, \ldots, a_t) \log p_{j_1 \ldots j_t}(a_1, \ldots, a_t), \qquad (11)$$

$$S_{j_1 \ldots j_{t-1} \mid j_t} = -\sum_{a_1, \ldots, a_t} p_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) \log p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t), \qquad (12)$$

where Equation (12) stands for the definition of the Gibbs–Shannon entropy related to the conditional probabilities $p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)$. It is a measure of the uncertainty [2] of the distribution of probabilities of the columns $j_1, \ldots, j_{t-1}$ when we have previous information on the distribution of the column $j_t$.
From Bayes' law, Equation (6), and from Equations (8), (11) and (12), we get:

$$S_{j_1 \ldots j_t} = S_{j_1 \ldots j_{t-1} \mid j_t} + S_{j_t}. \qquad (13)$$
We now use the correspondences:

$$\lambda_l \to p_{j_t}(a_t); \qquad x_l \to p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t), \qquad f(x_l) \to -\sum_{a_1, \ldots, a_{t-1}} p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) \log p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t), \qquad (14)$$

and we then have:

$$\sum_l \lambda_l x_l \to \sum_{a_t} p_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) = \sum_{a_t} p_{j_1 \ldots j_t}(a_1, \ldots, a_t) = p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1}), \qquad (15)$$

$$\sum_l \lambda_l f(x_l) \to -\sum_{a_t, a_1, \ldots, a_{t-1}} p_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) \log p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t), \qquad (16)$$

and

$$f\Big(\sum_l \lambda_l x_l\Big) \to -\sum_{a_1, \ldots, a_{t-1}} p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1}) \log p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1}). \qquad (17)$$

After substituting Equations (12), (16) and (17) into Equation (10), we get:

$$S_{j_1 \ldots j_{t-1} \mid j_t} = -\sum_{a_1, \ldots, a_t} p_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) \log p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) \le -\sum_{a_1, \ldots, a_{t-1}} p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1}) \log p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1}), \qquad (18)$$

or,

$$S_{j_1 \ldots j_{t-1} \mid j_t} \le S_{j_1 \ldots j_{t-1}}. \qquad (19)$$
This means that the uncertainty of the distribution on the columns $j_1, \ldots, j_{t-1}$ cannot be increased when we have previous information on the distribution of the column $j_t$.
From Equations (13) and (19), we then write:

$$S_{j_1 \ldots j_t} \le S_{j_1 \ldots j_{t-1}} + S_{j_t}, \qquad (20)$$

and by iteration we get the Khinchin–Shannon inequality for the Gibbs–Shannon entropy measure:

$$S_{j_1 \ldots j_t} \le \sum_{l=1}^{t} S_{j_l}. \qquad (21)$$

The usual meaning given to Equation (21) is that the minimum of the information to be obtained from the analysis of the joint probabilities of a set of t columns is given by the sum of the informations associated with the t columns if considered as independent [1,2,10]. This is also seen as an aspect of Synergy [12,13] of the distribution of probabilities of occurrence.
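As a numerical illustration, the following sketch checks the chain rule, Equation (13), and the Khinchin–Shannon inequality, Equation (21), for a randomly generated joint distribution of two columns; the value of W and the random distribution are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
W = 4
p12 = rng.random((W, W))
p12 /= p12.sum()                            # joint probabilities p_{j1 j2}(a1, a2)
p1, p2 = p12.sum(axis=1), p12.sum(axis=0)   # marginal probabilities

def shannon(p):
    """Gibbs-Shannon entropy, Equation (11), with the convention 0 log 0 = 0."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Chain rule, Equation (13): S_{j1 j2} = S_{j1 | j2} + S_{j2}
p_cond = p12 / p2                           # p_{j1 j2}(a1 | a2)
S_cond = -np.sum(p2 * np.sum(p_cond * np.log(p_cond), axis=0))   # Equation (12)
assert np.isclose(shannon(p12.ravel()), S_cond + shannon(p2))

# Khinchin-Shannon inequality, Equation (21): S_{j1 j2} <= S_{j1} + S_{j2}
assert shannon(p12.ravel()) <= shannon(p1) + shannon(p2) + 1e-12
```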

4. The Assumption of Concavity and the Synergy of Sharma–Mittal (SM) Entropy Measures. The GKS Inequalities

We shall now use the assumption of concavity given by Equation (10) on the Sharma–Mittal (SM) entropy measures:

$$(SM)_{j_1 \ldots j_t} = \frac{(\alpha_{j_1 \ldots j_t})^{\frac{1-r}{1-s}} - 1}{1-r}, \qquad (22)$$

where

$$\alpha_{j_1 \ldots j_t} = \sum_{a_1, \ldots, a_t} \big(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\big)^s, \qquad (23)$$

and r, s are non-dimensional parameters.
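A minimal sketch of Equations (22) and (23) in Python; the parameter values and the uniform test distribution are our own illustrative choices.

```python
import numpy as np

def sm_entropy(p, r, s):
    """Sharma-Mittal entropy, Equations (22)-(23): ((sum p^s)^((1-r)/(1-s)) - 1) / (1 - r)."""
    alpha = np.sum(np.asarray(p, dtype=float) ** s)
    return (alpha ** ((1.0 - r) / (1.0 - s)) - 1.0) / (1.0 - r)

# Example: a uniform distribution over W = 20 values (e.g., amino acids), with r, s in (0, 1).
p_uniform = np.full(20, 1.0 / 20)
print(sm_entropy(p_uniform, r=0.9, s=0.7))
```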
Analogously to Equation (12), we also introduce the “conditional entropy measure”

$$(SM)_{j_1 \ldots j_{t-1} \mid j_t} = \frac{(\beta_{j_1 \ldots j_{t-1} \mid j_t})^{\frac{1-r}{1-s}} - 1}{1-r}, \qquad (24)$$

where

$$\beta_{j_1 \ldots j_{t-1} \mid j_t} = \sum_{a_1, \ldots, a_t} \hat{p}_{j_t}(a_t)\, \big(p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)\big)^s, \qquad (25)$$

and $\hat{p}_{j_t}(a_t)$ stands for the escort probability

$$\hat{p}_{j_t}(a_t) = \frac{\big(p_{j_t}(a_t)\big)^s}{\alpha_{j_t}} = \frac{\big(p_{j_t}(a_t)\big)^s}{\sum_{a_t} \big(p_{j_t}(a_t)\big)^s}. \qquad (26)$$

We have in general:

$$\hat{p}_{j_1 \ldots j_t}(a_1, \ldots, a_t) = \frac{\big(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\big)^s}{\alpha_{j_1 \ldots j_t}} = \frac{\big(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\big)^s}{\sum_{a_1, \ldots, a_t} \big(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\big)^s}. \qquad (27)$$

The inverse transformations are given by:

$$p_{j_t}(a_t) = \frac{\big(\hat{p}_{j_t}(a_t)\big)^{1/s}}{\sum_{a_t} \big(\hat{p}_{j_t}(a_t)\big)^{1/s}}, \qquad (28)$$

$$p_{j_1 \ldots j_t}(a_1, \ldots, a_t) = \frac{\big(\hat{p}_{j_1 \ldots j_t}(a_1, \ldots, a_t)\big)^{1/s}}{\sum_{a_1, \ldots, a_t} \big(\hat{p}_{j_1 \ldots j_t}(a_1, \ldots, a_t)\big)^{1/s}}, \qquad (29)$$

with

$$\sum_{a_t} p_{j_t}(a_t) = 1 = \sum_{a_t} \hat{p}_{j_t}(a_t), \qquad (30)$$

$$\sum_{a_1, \ldots, a_t} p_{j_1 \ldots j_t}(a_1, \ldots, a_t) = 1 = \sum_{a_1, \ldots, a_t} \hat{p}_{j_1 \ldots j_t}(a_1, \ldots, a_t). \qquad (31)$$
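The escort transformation and its inverse, Equations (26)–(31), can be checked numerically as follows; the random distribution and the value of s are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
W, s = 4, 0.7
p = rng.random(W)
p /= p.sum()                                              # a probability distribution p_{jt}(at)

p_hat = p ** s / np.sum(p ** s)                           # escort probabilities, Equation (26)
p_back = p_hat ** (1.0 / s) / np.sum(p_hat ** (1.0 / s))  # inverse transformation, Equation (28)

assert np.isclose(p_hat.sum(), 1.0)                       # Equation (30)
assert np.allclose(p_back, p)                             # the transformation is invertible
```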
A range of variation for the parameters r, s of the Sharma–Mittal entropies, Equation (22), should be derived from a requirement of strict concavity. In order to do so, let us remember that for each set of t columns ($m \times t$ subarray) there are m rows of t values each (t-sequences). We now denote these t-sequences by:

$$(a_1^{q_\mu}, \ldots, a_t^{q_\mu}), \qquad \mu = 1, \ldots, m. \qquad (32)$$

A sufficient requirement for strict concavity is the negative definiteness of the quadratic form associated with the Hessian matrix [14], whose elements are given by:

$$H_{q_\mu q_\nu} = \frac{\partial^2 (SM)_{j_1 \ldots j_t}}{\partial p_{j_1 \ldots j_t}(a_1^{q_\mu}, \ldots, a_t^{q_\mu})\, \partial p_{j_1 \ldots j_t}(a_1^{q_\nu}, \ldots, a_t^{q_\nu})}, \qquad \mu, \nu = 1, \ldots, m. \qquad (33)$$

We then consider the m leading submatrices along the diagonal of the Hessian matrix. Their determinants should be alternately negative or positive according to whether their order is odd or even [15], respectively:

$$\det H_{q_\mu q_\nu}\,(\mu, \nu = 1) = s\, (\alpha_{j_1 \ldots j_t})^{\frac{s-r}{1-s}} \big(p_{j_1 \ldots j_t}(a_1^{q_1}, \ldots, a_t^{q_1})\big)^{s-2} \left[\frac{s(s-r)}{(1-s)^2}\, \hat{p}_{j_1 \ldots j_t}(a_1^{q_1}, \ldots, a_t^{q_1}) - 1\right], \qquad (34)$$

$$\det H_{q_\mu q_\nu}\,(\mu, \nu = 1, 2) = -s^2 (\alpha_{j_1 \ldots j_t})^{\frac{2(s-r)}{1-s}} \big(p_{j_1 \ldots j_t}(a_1^{q_1}, \ldots, a_t^{q_1}) \cdot p_{j_1 \ldots j_t}(a_1^{q_2}, \ldots, a_t^{q_2})\big)^{s-2} \left[\frac{s(s-r)}{(1-s)^2} \big(\hat{p}_{j_1 \ldots j_t}(a_1^{q_1}, \ldots, a_t^{q_1}) + \hat{p}_{j_1 \ldots j_t}(a_1^{q_2}, \ldots, a_t^{q_2})\big) - 1\right], \qquad (35)$$

$$\det H_{q_\mu q_\nu}\,(\mu, \nu = 1, 2, 3) = s^3 (\alpha_{j_1 \ldots j_t})^{\frac{3(s-r)}{1-s}} \big(p_{j_1 \ldots j_t}(a_1^{q_1}, \ldots, a_t^{q_1}) \cdot p_{j_1 \ldots j_t}(a_1^{q_2}, \ldots, a_t^{q_2}) \cdot p_{j_1 \ldots j_t}(a_1^{q_3}, \ldots, a_t^{q_3})\big)^{s-2} \left[\frac{s(s-r)}{(1-s)^2} \big(\hat{p}_{j_1 \ldots j_t}(a_1^{q_1}, \ldots, a_t^{q_1}) + \hat{p}_{j_1 \ldots j_t}(a_1^{q_2}, \ldots, a_t^{q_2}) + \hat{p}_{j_1 \ldots j_t}(a_1^{q_3}, \ldots, a_t^{q_3})\big) - 1\right]. \qquad (36)$$
We then choose:

$$1 > r \ge s > 0, \qquad (37)$$

and we have, from Equations (34)–(36):

$$\det H_{q_\mu q_\nu}\,(\mu, \nu = 1) < 0, \qquad (38)$$

$$\det H_{q_\mu q_\nu}\,(\mu, \nu = 1, 2) > 0, \qquad (39)$$

$$\det H_{q_\mu q_\nu}\,(\mu, \nu = 1, 2, 3) < 0. \qquad (40)$$

We then have, in general:

$$\det H_{q_\mu q_\nu}\,(\mu, \nu = 1, \ldots, m) = (-1)^{m-1}\, s^m (\alpha_{j_1 \ldots j_t})^{\frac{m(s-r)}{1-s}} \left(\prod_{\mu=1}^{m} p_{j_1 \ldots j_t}(a_1^{q_\mu}, \ldots, a_t^{q_\mu})\right)^{s-2} \left[\frac{s(s-r)}{(1-s)^2} \sum_{\mu=1}^{m} \hat{p}_{j_1 \ldots j_t}(a_1^{q_\mu}, \ldots, a_t^{q_\mu}) - 1\right]. \qquad (41)$$

This completes the proof.
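The concavity established above for 1 > r ≥ s > 0 can also be probed directly through the defining inequality, Equation (10); in the sketch below, the random probability vectors, the convex weights and the parameter values are illustrative assumptions of ours.

```python
import numpy as np

def sm_entropy(p, r, s):
    """Sharma-Mittal entropy of a probability vector, Equations (22)-(23)."""
    alpha = np.sum(np.asarray(p, dtype=float) ** s)
    return (alpha ** ((1.0 - r) / (1.0 - s)) - 1.0) / (1.0 - r)

rng = np.random.default_rng(3)
r, s, W, L = 0.9, 0.6, 5, 8                   # parameters chosen with 1 > r >= s > 0, Equation (37)
lam = rng.random(L)
lam /= lam.sum()                              # convex weights lambda_l
X = rng.random((L, W))
X /= X.sum(axis=1, keepdims=True)             # probability vectors x_l

lhs = sum(lam[l] * sm_entropy(X[l], r, s) for l in range(L))   # sum_l lambda_l f(x_l)
rhs = sm_entropy(lam @ X, r, s)                                # f(sum_l lambda_l x_l)
assert lhs <= rhs + 1e-12                     # concavity, Equation (10)
```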
From Bayes' law, Equation (6), and from Equations (8) and (22)–(25), we can write:

$$(SM)_{j_1 \ldots j_t} = (SM)_{j_1 \ldots j_{t-1} \mid j_t} + (SM)_{j_t} + (1-r)\, (SM)_{j_1 \ldots j_{t-1} \mid j_t} \cdot (SM)_{j_t}. \qquad (42)$$
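Equation (42) is an exact identity; the following sketch verifies it numerically for t = 2, with a randomly generated joint distribution and parameter values chosen by us for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
W, r, s = 4, 0.9, 0.7
p12 = rng.random((W, W))
p12 /= p12.sum()                              # p_{j1 j2}(a1, a2)
p2 = p12.sum(axis=0)                          # marginal probabilities of column j2

def sm_from_alpha(alpha, r, s):
    """Equations (22) and (24), written in terms of alpha or beta."""
    return (alpha ** ((1.0 - r) / (1.0 - s)) - 1.0) / (1.0 - r)

alpha_12 = np.sum(p12 ** s)                   # Equation (23)
alpha_2 = np.sum(p2 ** s)
p2_hat = p2 ** s / alpha_2                    # escort probabilities, Equation (26)
p_cond = p12 / p2                             # p_{j1 j2}(a1 | a2)
beta = np.sum(p2_hat * np.sum(p_cond ** s, axis=0))   # Equation (25)

SM_12 = sm_from_alpha(alpha_12, r, s)         # (SM)_{j1 j2}
SM_2 = sm_from_alpha(alpha_2, r, s)           # (SM)_{j2}
SM_1_given_2 = sm_from_alpha(beta, r, s)      # (SM)_{j1 | j2}

# Equation (42): (SM)_{j1 j2} = (SM)_{j1|j2} + (SM)_{j2} + (1 - r)(SM)_{j1|j2}(SM)_{j2}
assert np.isclose(SM_12, SM_1_given_2 + SM_2 + (1.0 - r) * SM_1_given_2 * SM_2)
```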
We are now ready to use the concavity assumption, Equation (10), to derive the GKS inequalities. In order to do so, we make the correspondences:

$$\lambda_l \to \hat{p}_{j_t}(a_t); \qquad x_l \to p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t), \qquad f(x_l) \to \sum_{a_1, \ldots, a_{t-1}} \big(p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)\big)^s. \qquad (43)$$

We can then write:

$$\sum_l \lambda_l x_l \to \sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t), \qquad (44)$$

$$\sum_l \lambda_l f(x_l) \to \sum_{a_t, a_1, \ldots, a_{t-1}} \hat{p}_{j_t}(a_t)\, \big(p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)\big)^s, \qquad (45)$$

and

$$f\Big(\sum_l \lambda_l x_l\Big) \to \sum_{a_1, \ldots, a_{t-1}} \Big(\sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)\Big)^s. \qquad (46)$$

With the correspondences above, Equation (10) turns into:

$$\sum_{a_1, \ldots, a_{t-1}} \Big(\sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)\Big)^s \ge \sum_{a_1, \ldots, a_t} \hat{p}_{j_t}(a_t)\, \big(p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)\big)^s. \qquad (47)$$
An additional piece of information should be taken into account before we derive the GKS inequalities.
In each column $j_t$, $t = 1, \ldots, n$, of an $m \times n$ block, there will be values $a_t'$ and $a_t''$ of $a_t$ such that

$$p_{j_t}(a_t') \ge \hat{p}_{j_t}(a_t'), \qquad (48)$$

and

$$p_{j_t}(a_t'') \le \hat{p}_{j_t}(a_t''). \qquad (49)$$

After multiplying inequalities (48) and (49) by $p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t')$ and $p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t'')$, respectively, and summing over $a_t'$ and $a_t''$, respectively, we get:

$$\sum_{a_t'} p_{j_t}(a_t')\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t') \ge \sum_{a_t'} \hat{p}_{j_t}(a_t')\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t'), \qquad (50)$$

and

$$\sum_{a_t''} p_{j_t}(a_t'')\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t'') \le \sum_{a_t''} \hat{p}_{j_t}(a_t'')\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t''). \qquad (51)$$

From Equations (48) and (49), any sum over the $a_t$ values can be partitioned as sums over the sets of values $a_t'$ and $a_t''$:

$$\sum_{a_t} p_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) = \sum_{a_t'} p_{j_t}(a_t')\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t') + \sum_{a_t''} p_{j_t}(a_t'')\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t''). \qquad (52)$$

Substituting Equation (52) into Equations (50) and (51), we have:

$$\sum_{a_t} p_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) - \sum_{a_t''} p_{j_t}(a_t'')\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t'') \ge \sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) - \sum_{a_t''} \hat{p}_{j_t}(a_t'')\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t''), \qquad (53)$$

and

$$\sum_{a_t} p_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) - \sum_{a_t'} p_{j_t}(a_t')\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t') \le \sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) - \sum_{a_t'} \hat{p}_{j_t}(a_t')\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t'), \qquad (54)$$

respectively.
After applying Bayes' law, Equation (6), to the first term on the left-hand side of Equations (53) and (54), we get:

$$p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1}) \ge \sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) + B'', \qquad (55)$$

$$p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1}) \le \sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) + B', \qquad (56)$$

where

$$B' = \sum_{a_t'} \big[p_{j_t}(a_t') - \hat{p}_{j_t}(a_t')\big]\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t'), \qquad (57)$$

$$B'' = \sum_{a_t''} \big[p_{j_t}(a_t'') - \hat{p}_{j_t}(a_t'')\big]\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t''), \qquad (58)$$

and we have $B' \ge 0$ and $B'' \le 0$, according to Equations (48) and (49), respectively.
After taking the s-power of Equations (55) and (56) and summing over $a_1, \ldots, a_{t-1}$, we have:

$$\sum_{a_1, \ldots, a_{t-1}} \big(p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1})\big)^s \ge \sum_{a_1, \ldots, a_{t-1}} \Big(\sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) + B''\Big)^s, \qquad (59)$$

$$\sum_{a_1, \ldots, a_{t-1}} \big(p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1})\big)^s \le \sum_{a_1, \ldots, a_{t-1}} \Big(\sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) + B'\Big)^s. \qquad (60)$$

We now write the concavity assumption, Equation (10), as:

$$C \ge D, \qquad (61)$$

where

$$C \equiv \sum_{a_1, \ldots, a_{t-1}} \Big(\sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)\Big)^s, \qquad (62)$$

and

$$D \equiv \sum_{a_1, \ldots, a_t} \hat{p}_{j_t}(a_t)\, \big(p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)\big)^s. \qquad (63)$$

Equations (59) and (60) are now written as:

$$A \ge C'', \qquad (64)$$

$$A \le C', \qquad (65)$$

where

$$A \equiv \sum_{a_1, \ldots, a_{t-1}} \big(p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1})\big)^s, \qquad (66)$$

and

$$C'' \equiv \sum_{a_1, \ldots, a_{t-1}} \Big(\sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) + B''\Big)^s, \qquad (67)$$

$$C' \equiv \sum_{a_1, \ldots, a_{t-1}} \Big(\sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) + B'\Big)^s. \qquad (68)$$

Since $B'' \le 0$ and $B' \ge 0$, we have trivially that:

$$C'' \le C, \qquad (69)$$

and

$$C' \ge C. \qquad (70)$$
The set of inequalities, Equations (61), (64) and (69), or

$$C \ge D; \qquad A \ge C''; \qquad C'' \le C, \qquad (71)$$

and the set of inequalities, Equations (61), (65) and (70), or

$$C \ge D; \qquad A \le C'; \qquad C' \ge C, \qquad (72)$$

can be arranged as the chains of inequalities

$$C'' \le A \le C', \qquad (73)$$

and

$$D \le C \le C'. \qquad (74)$$

The inequality $A \ge D$ is common to the two chains above and it can be written as:

$$\sum_{a_1, \ldots, a_{t-1}} \big(p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1})\big)^s \ge \sum_{a_1, \ldots, a_t} \hat{p}_{j_t}(a_t)\, \big(p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)\big)^s. \qquad (75)$$
From the definition of the escort probabilities, Equations (26) and (27), we can write the right-hand side of Equation (75) as:

$$\sum_{a_1, \ldots, a_t} \hat{p}_{j_t}(a_t)\, \big(p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)\big)^s = \frac{\sum_{a_1, \ldots, a_t} \big(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\big)^s}{\sum_{a_t} \big(p_{j_t}(a_t)\big)^s}. \qquad (76)$$

From Equations (75) and (76) and the definition of the $\alpha$-symbols, Equation (23), we have:

$$\alpha_{j_1 \ldots j_t} \le \alpha_{j_1 \ldots j_{t-1}} \cdot \alpha_{j_t}. \qquad (77)$$

We then get, by iteration,

$$\alpha_{j_1 \ldots j_t} \le \prod_{l=1}^{t} \alpha_{j_l}. \qquad (78)$$

Equation (78) corresponds to the Generalized Khinchin–Shannon inequalities (GKS) derived here for Sharma–Mittal entropies.
From Equations (22), (37) and (78), we can also write the GKS inequalities as:

$$(SM)_{j_1 \ldots j_t} \le \frac{\prod_{l=1}^{t} \big[1 + (1-r)(SM)_{j_l}\big] - 1}{1-r}. \qquad (79)$$
The same comments made after Equation (21) could also be written here for the Sharma–Mittal entropy measures, as far as the aspect of Synergy is concerned. We will introduce a proposal for an information measure to stress this aspect in the next section.
For t = 2, 3, we can write from Equation (79):

$$(SM)_{j_1 j_2} \le (SM)_{j_1} + (SM)_{j_2} + (1-r)(SM)_{j_1}(SM)_{j_2}, \qquad (80)$$

$$(SM)_{j_1 j_2 j_3} \le (SM)_{j_1} + (SM)_{j_2} + (SM)_{j_3} + (1-r)\big[(SM)_{j_1}(SM)_{j_2} + (SM)_{j_2}(SM)_{j_3} + (SM)_{j_1}(SM)_{j_3}\big] + (1-r)^2 (SM)_{j_1}(SM)_{j_2}(SM)_{j_3}. \qquad (81)$$
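The inequality (80) can be illustrated on a concrete case; the 2 × 2 joint distribution and the parameter values below are our own choices, used only to evaluate both sides of the inequality.

```python
import numpy as np

r, s = 0.8, 0.5                                    # 1 > r >= s > 0, Equation (37)
p12 = np.array([[0.4, 0.1],
                [0.1, 0.4]])                       # an illustrative joint distribution p_{j1 j2}
p1, p2 = p12.sum(axis=1), p12.sum(axis=0)          # marginal distributions

def sm_entropy(p, r, s):
    """Sharma-Mittal entropy, Equations (22)-(23)."""
    alpha = np.sum(np.asarray(p, dtype=float) ** s)
    return (alpha ** ((1.0 - r) / (1.0 - s)) - 1.0) / (1.0 - r)

SM_12 = sm_entropy(p12.ravel(), r, s)
SM_1, SM_2 = sm_entropy(p1, r, s), sm_entropy(p2, r, s)

# GKS inequality for t = 2, Equation (80)
print(SM_12, SM_1 + SM_2 + (1.0 - r) * SM_1 * SM_2)   # approx. 1.46 <= 1.60 for this example
assert SM_12 <= SM_1 + SM_2 + (1.0 - r) * SM_1 * SM_2
```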
The Havrda–Charvat, Rényi and Landsberg–Vedral entropies are easily obtained by taking convenient limits in Equation (22):

$$\lim_{r \to s} (SM)_{j_1 \ldots j_t} = (HC)_{j_1 \ldots j_t} = \frac{\alpha_{j_1 \ldots j_t} - 1}{1-s}, \qquad (82)$$

$$\lim_{r \to 1} (SM)_{j_1 \ldots j_t} = R_{j_1 \ldots j_t} = \frac{\log(\alpha_{j_1 \ldots j_t})}{1-s}, \qquad (83)$$

$$\lim_{r \to 2-s} (SM)_{j_1 \ldots j_t} = (LV)_{j_1 \ldots j_t} = \frac{\alpha_{j_1 \ldots j_t} - 1}{(1-s)\,\alpha_{j_1 \ldots j_t}}. \qquad (84)$$

The Gibbs–Shannon entropy measure $S_{j_1 \ldots j_t}$, Equation (11), is included in all these entropies through:

$$\lim_{s \to 1} (HC)_{j_1 \ldots j_t} = \lim_{s \to 1} R_{j_1 \ldots j_t} = \lim_{s \to 1} (LV)_{j_1 \ldots j_t} = S_{j_1 \ldots j_t}. \qquad (85)$$

Equations (83) and (85) have been derived via l'Hôpital's rule.
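The limits in Equations (82)–(85) can be checked numerically by evaluating the Sharma–Mittal expression close to the limiting parameter values; the test distribution and the small offset eps are assumptions made for this sketch.

```python
import numpy as np

def sm_entropy(p, r, s):
    """Sharma-Mittal entropy, Equations (22)-(23)."""
    alpha = np.sum(np.asarray(p, dtype=float) ** s)
    return (alpha ** ((1.0 - r) / (1.0 - s)) - 1.0) / (1.0 - r)

p = np.array([0.4, 0.3, 0.2, 0.1])                  # an illustrative distribution
s, eps = 0.5, 1e-6
alpha = np.sum(p ** s)

HC = (alpha - 1.0) / (1.0 - s)                      # Havrda-Charvat, Equation (82)
R = np.log(alpha) / (1.0 - s)                       # Renyi, Equation (83)
LV = (alpha - 1.0) / ((1.0 - s) * alpha)            # Landsberg-Vedral, Equation (84)
S = -np.sum(p * np.log(p))                          # Gibbs-Shannon, Equation (11)

assert np.isclose(sm_entropy(p, r=s + eps, s=s), HC)          # r -> s
assert np.isclose(sm_entropy(p, r=1.0 - eps, s=s), R)         # r -> 1
assert np.isclose(sm_entropy(p, r=2.0 - s - eps, s=s), LV)    # r -> 2 - s
assert np.isclose((np.sum(p ** (1.0 - eps)) - 1.0) / eps, S)  # s -> 1, Equation (85)
```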
For t = 2, 3, we write from Equations (82)–(84):

$$(HC)_{j_1 j_2} \le (HC)_{j_1} + (HC)_{j_2} + (1-s)(HC)_{j_1}(HC)_{j_2}, \qquad (86)$$

$$(HC)_{j_1 j_2 j_3} \le (HC)_{j_1} + (HC)_{j_2} + (HC)_{j_3} + (1-s)\big[(HC)_{j_1}(HC)_{j_2} + (HC)_{j_2}(HC)_{j_3} + (HC)_{j_1}(HC)_{j_3}\big] + (1-s)^2 (HC)_{j_1}(HC)_{j_2}(HC)_{j_3}, \qquad (87)$$

$$R_{j_1 j_2} \le R_{j_1} + R_{j_2}, \qquad (88)$$

$$R_{j_1 j_2 j_3} \le R_{j_1} + R_{j_2} + R_{j_3}, \qquad (89)$$

$$(LV)_{j_1 j_2} \le (LV)_{j_1} + (LV)_{j_2} - (1-s)(LV)_{j_1}(LV)_{j_2}, \qquad (90)$$

$$(LV)_{j_1 j_2 j_3} \le (LV)_{j_1} + (LV)_{j_2} + (LV)_{j_3} - (1-s)\big[(LV)_{j_1}(LV)_{j_2} + (LV)_{j_2}(LV)_{j_3} + (LV)_{j_1}(LV)_{j_3}\big] + (1-s)^2 (LV)_{j_1}(LV)_{j_2}(LV)_{j_3}. \qquad (91)$$
As a last result of this section, we note that Equation (79) could also be derived from Equation (75), since this equation can also be written as:

$$(SM)_{j_1 \ldots j_{t-1} \mid j_t} \le (SM)_{j_1 \ldots j_{t-1}}, \qquad (92)$$

where we have used Equations (22)–(25).
After comparing Equations (42) and (92), we get the result of Equation (79) again.

5. An Information Measure Proposal Associated with Sharma–Mittal Entropy Measures

We are looking for an information measure which fulfills the requirement of a clear interpretation of the upsurge of Synergy in a probability distribution and which is supported by the usual idea of entropy as a measure of uncertainty.
For the Sharma–Mittal set of entropy measures, the proposal for the associated information measure is:

$$I_{j_1 \ldots j_t} = \frac{(SM)_{j_1 \ldots j_t}}{(\alpha_{j_1 \ldots j_t})^{\frac{1-r}{1-s}}}, \qquad 1 > r \ge s > 0, \qquad (93)$$

where $(SM)_{j_1 \ldots j_t}$ and $\alpha_{j_1 \ldots j_t}$ are given by Equations (22) and (23). We then have, from Equation (93),

$$(SM)_{j_1 \ldots j_t} = \frac{I_{j_1 \ldots j_t}}{1 - (1-r)\, I_{j_1 \ldots j_t}}. \qquad (94)$$
From the GKS inequalities, Equation (79), and from Equation (94), we get:

$$I_{j_1 \ldots j_t} \ge \frac{\prod_{l=1}^{t} \big[1 + (1-r)\, I_{j_l}\big] - 1}{1-r} \ge \sum_{l=1}^{t} I_{j_l}. \qquad (95)$$

The meaning of Equation (95) is that the minimum of the information associated with t columns of probabilities of occurrence is given by the sum of the informations associated with each column. This corresponds to the expression of the Synergy of the distribution of probabilities of occurrence which we have derived in the previous section.
For t = 2, 3, the inequalities (95) are written as:

$$I_{j_1 j_2} \ge I_{j_1} + I_{j_2} + (1-r)\, I_{j_1} I_{j_2} \ge I_{j_1} + I_{j_2}, \qquad (96)$$

$$I_{j_1 j_2 j_3} \ge I_{j_1} + I_{j_2} + I_{j_3} + (1-r)\big[I_{j_1} I_{j_2} + I_{j_2} I_{j_3} + I_{j_1} I_{j_3}\big] + (1-r)^2 I_{j_1} I_{j_2} I_{j_3} \ge I_{j_1} + I_{j_2} + I_{j_3}. \qquad (97)$$
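The algebraic relations between the information measure and the Sharma–Mittal entropy, Equations (93), (94) and (98), are exact and can be verified directly, as in the sketch below; the distribution and the parameter values are illustrative assumptions of ours.

```python
import numpy as np

r, s = 0.8, 0.5
p = np.array([0.4, 0.3, 0.2, 0.1])                 # an illustrative distribution
alpha = np.sum(p ** s)                             # Equation (23)
c = (1.0 - r) / (1.0 - s)

SM = (alpha ** c - 1.0) / (1.0 - r)                # Sharma-Mittal entropy, Equation (22)
I = SM / alpha ** c                                # information measure, Equation (93)

# Equation (94): (SM) = I / (1 - (1 - r) I)
assert np.isclose(SM, I / (1.0 - (1.0 - r) * I))

# Equation (98): dI/d(SM) = alpha^(-2(1-r)/(1-s)), checked by a finite difference,
# using I(SM) = SM / (1 + (1 - r) SM), which follows from Equations (22) and (93).
dSM = 1e-6
I_plus = (SM + dSM) / (1.0 + (1.0 - r) * (SM + dSM))
assert np.isclose((I_plus - I) / dSM, alpha ** (-2.0 * c), rtol=1e-4)
```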
It seems worthwhile to derive yet another result which unveils once more the fundamental aspect of Synergy of the distribution of probabilities of occurrence. From Equation (93), we have:

$$dI_{j_1 \ldots j_t} = \frac{d(SM)_{j_1 \ldots j_t}}{(\alpha_{j_1 \ldots j_t})^{2\frac{1-r}{1-s}}}, \qquad (98)$$

and we then write, from the GKS inequalities, Equation (78):

$$\frac{1}{(\alpha_{j_1 \ldots j_t})^{2\frac{1-r}{1-s}}} \ge \frac{1}{\prod_{l=1}^{t} (\alpha_{j_l})^{2\frac{1-r}{1-s}}} = \prod_{l=1}^{t} \frac{1}{(\alpha_{j_l})^{2\frac{1-r}{1-s}}}. \qquad (99)$$

We then get:

$$\frac{dI_{j_1 \ldots j_t}}{d(SM)_{j_1 \ldots j_t}} \ge \prod_{l=1}^{t} \frac{dI_{j_l}}{d(SM)_{j_l}}. \qquad (100)$$

Equation (100) corresponds to another result which originates from the Synergy of the distribution of probabilities of occurrence. It can be stated as follows: the minimum of the rate of information increase with decreasing entropy in the probability distributions of sets of t columns is given by the product of the rates of information increase pertaining to each of the t columns.

6. The Use of Hölder’s Inequality for an Alternative Derivation of the GKS Inequalities

We first note that:

$$\sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t) = \frac{\sum_{a_t} p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\, \big(p_{j_t}(a_t)\big)^{s-1}}{\sum_{a_t} \big(p_{j_t}(a_t)\big)^s}, \qquad (101)$$

and we now introduce Hölder's inequality [11]:

$$\sum_l x_l y_l \le \Big(\sum_l (x_l)^a\Big)^{\frac{1}{a}} \cdot \Big(\sum_l (y_l)^b\Big)^{\frac{1}{b}}, \qquad ab = a + b, \quad a \ge 1. \qquad (102)$$

We can also write:

$$\sum_l x_l y_l \ge \Big(\sum_l (x_l)^a\Big)^{\frac{1}{a}} \cdot \Big(\sum_l (y_l)^b\Big)^{\frac{1}{b}}, \qquad 0 < a < 1, \quad b < 0, \qquad (103)$$

or

$$\Big(\sum_l x_l y_l\Big)^a \ge \sum_l (x_l)^a \cdot \Big(\sum_l (y_l)^b\Big)^{\frac{a}{b}}, \qquad 0 < a < 1, \quad b < 0. \qquad (104)$$
We now make the correspondences:

$$x_l \to p_{j_1 \ldots j_t}(a_1, \ldots, a_t); \qquad y_l \to \big(p_{j_t}(a_t)\big)^{s-1}. \qquad (105)$$

We take the s-power of both sides of Equation (101) and, after using Equations (104) and (105) with $a = s$ and $b = \frac{s}{s-1}$, we get:

$$\Big(\sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)\Big)^s = \left(\frac{\sum_{a_t} p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\, \big(p_{j_t}(a_t)\big)^{s-1}}{\sum_{a_t} \big(p_{j_t}(a_t)\big)^s}\right)^s \ge \frac{\sum_{a_t} \big(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\big)^s \Big(\sum_{a_t} \big(p_{j_t}(a_t)\big)^s\Big)^{s-1}}{\Big(\sum_{a_t} \big(p_{j_t}(a_t)\big)^s\Big)^s} = \frac{\sum_{a_t} \big(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\big)^s}{\sum_{a_t} \big(p_{j_t}(a_t)\big)^s}. \qquad (106)$$

After summing over $a_1, \ldots, a_{t-1}$, we then have:

$$\sum_{a_1, \ldots, a_{t-1}} \Big(\sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} \mid a_t)\Big)^s \ge \frac{\sum_{a_1, \ldots, a_t} \big(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\big)^s}{\sum_{a_t} \big(p_{j_t}(a_t)\big)^s}. \qquad (107)$$

From the application of Bayes' law to the right-hand side of Equation (107), we get Equation (47) again, i.e., the correspondence with the assumption of concavity has been derived once more. All the subsequent development of Section 4 then follows accordingly.
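The Hölder-based bound, Equation (106), and the resulting concavity relation C ≥ D of Equation (47) can be checked numerically; the random joint distribution and the value of s below are illustrative assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(6)
W, s = 4, 0.5                                    # 0 < s < 1, so a = s in (0, 1) and b = s/(s-1) < 0
p12 = rng.random((W, W))
p12 /= p12.sum()                                 # p_{j1 j2}(a1, a2)
p2 = p12.sum(axis=0)                             # p_{j2}(a2)
p2_hat = p2 ** s / np.sum(p2 ** s)               # escort probabilities, Equation (26)
p_cond = p12 / p2                                # p_{j1 j2}(a1 | a2)

a1 = 0                                           # a fixed value of a_1
lhs = np.sum(p2_hat * p_cond[a1, :]) ** s        # [sum_{a2} p_hat(a2) p(a1 | a2)]^s
rhs = np.sum(p12[a1, :] ** s) / np.sum(p2 ** s)  # sum_{a2} p(a1, a2)^s / sum_{a2} p(a2)^s
assert lhs >= rhs - 1e-12                        # Equation (106)

# Summing over a1 gives Equation (107), i.e., the relation C >= D of Equations (47) and (61).
C = np.sum(np.sum(p2_hat * p_cond, axis=1) ** s)
D = np.sum(p2_hat * p_cond ** s)
assert C >= D - 1e-12
```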

7. Concluding Remarks

It should be stressed that the introduction of escort probabilities has proved effective in the construction of generalized entropy measures. These can be used for the classification of databases in terms of their clustering, as driven by their intrinsic synergy and the resulting formation of more complex structures such as families and clans.
A fundamental aim would be the derivation of a dynamical theory able to describe the process of formation of these structures: a theory based on the evolution of the entropy values of databases, which we hope can be realized through methods introduced by the exhaustive study of Fokker–Planck equations.
Some introductory results on this promising line of research have already been published [16], and a forthcoming comprehensive review will summarize all of them.

Author Contributions

Conceptualization, R.P.M. and S.C.d.A.N.; methodology, R.P.M. and S.C.d.A.N.; formal analysis, R.P.M. and S.C.d.A.N.; writing—original draft preparation, R.P.M.; writing—review and editing, R.P.M. and S.C.d.A.N.; visualization, R.P.M. and S.C.d.A.N.; supervision, R.P.M.; project administration, R.P.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
GKS   Generalized Khinchin–Shannon
SM    Sharma–Mittal

References

  1. Mondaini, R.P.; de Albuquerque Neto, S.C. Khinchin–Shannon Generalized Inequalities for “Non-additive” Entropy Measures. In Trends in Biomathematics 2; Mondaini, R.P., Ed.; Springer International Publishing: Cham, Switzerland, 2019; pp. 177–190.
  2. Khinchin, A.I. Mathematical Foundations of Information Theory; Dover Publications: New York, NY, USA, 1957.
  3. Sharma, B.D.; Mittal, D.P. New Non-additive Measures of Entropy for Discrete Probability Distributions. J. Math. Sci. 1975, 10, 28–40.
  4. Volkenstein, M.V. Entropy and Information; Birkhäuser: Basel, Switzerland, 2009.
  5. Beck, C. Generalized Information and Entropy Measures in Physics. Contemp. Phys. 2009, 50, 495–510.
  6. Lavenda, B.H. A New Perspective on Thermodynamics; Springer Science+Business Media: New York, NY, USA, 2010.
  7. Havrda, J.; Charvat, F. Quantification Method of Classification Processes. Concept of Structural α-entropy. Kybernetica 1967, 3, 30–35.
  8. Rényi, A. On Measures of Entropy and Information. In Contributions to the Theory of Statistics, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1960; Neyman, J., Ed.; University of California Press: Berkeley, CA, USA, 1961; Volume 1, pp. 547–561.
  9. Landsberg, P.T.; Vedral, V. Distributions and Channel Capacities in Generalized Statistical Mechanics. Phys. Lett. A 1997, 224, 326–330.
  10. Mondaini, R.P.; de Albuquerque Neto, S.C. The Statistical Analysis of Protein Domain Family Distributions via Jaccard Entropy Measures. In Trends in Biomathematics 3; Mondaini, R.P., Ed.; Springer International Publishing: Cham, Switzerland, 2020; pp. 169–207.
  11. Hardy, G.H.; Littlewood, J.E.; Pólya, G. Inequalities; Cambridge University Press: London, UK, 1934.
  12. Ay, N.; Olbrich, E.; Bertschinger, N.; Jost, J. A Geometric Approach to Complexity. Chaos 2011, 21, 037103.
  13. Olbrich, E.; Bertschinger, N.; Rauh, J. Information Decomposition and Synergy. Entropy 2015, 17, 3501–3517.
  14. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: New York, NY, USA, 2009.
  15. Marsden, J.E.; Tromba, A. Vector Calculus; W. H. Freeman and Company Publishers: New York, NY, USA, 2012.
  16. Mondaini, R.P.; de Albuquerque Neto, S.C. A Jaccard-like Symbol and its Usefulness in the Derivation of Amino Acid Distributions in Protein Domain Families. In Trends in Biomathematics 4; Mondaini, R.P., Ed.; Springer International Publishing: Cham, Switzerland, 2021; pp. 201–220.