Abstract
The Khinchin–Shannon generalized inequalities for entropy measures in Information Theory are a paradigm which can be used to test the synergy of the distributions of probabilities of occurrence in physical systems. The rich algebraic structure associated with the introduction of escort probabilities seems to be essential for deriving these inequalities for the two-parameter Sharma–Mittal set of entropy measures. We also emphasize the derivation of these inequalities for the special cases of the one-parameter Havrda–Charvat, Rényi and Landsberg–Vedral entropy measures.
1. Introduction
In the present contribution we derive the Generalized Khinchin–Shannon inequalities (GKS) [1,2] associated with entropy measures of the Sharma–Mittal (SM) set [3]. We stress that the derivations to be presented here are a tentative way of implementing ideas from the literature on interdisciplinary topics of Statistical Mechanics and Information Theory [4,5,6]. The algebraic structure of the escort probability distributions seems to be essential in these derivations, contrary to the intuitive derivation of the usual Khinchin–Shannon inequalities for the Gibbs–Shannon entropy measures. We start in Section 2 with the construction of a generic probabilistic space whose elements, the probabilities of occurrence, are arranged in blocks of m rows and n columns. We then introduce the definitions of simple, joint, conditional and marginal probabilities through the use of Bayes’ law. In Section 3, we make use of the assumption of concavity in order to unveil the synergy of the distribution of values of Gibbs–Shannon entropy measures [2]. In Section 4, we present the same development for the SM set of entropy measures, after introducing the concept of escort probabilities. We then specialize the derivations to the Havrda–Charvat, Rényi and Landsberg–Vedral entropies [7,8,9]. A detailed study is then undertaken in this section to treat the eventual ordering between the probabilities of occurrence and their associated escort probabilities. This is enough for deriving the GKS inequalities for the SM entropy measures. In Section 5, we present a proposal for an information measure associated with SM entropies and we derive its related inequalities [10]. At this point we stress once more the upsurge of the synergy effect in the comparison of the information obtained from the entropy calculated with joint probabilities of occurrence and the entropies corresponding to simple probabilities. In Section 6, we present an alternative derivation of the GKS inequalities based on Hölder inequalities [11]. These provide, in association with Bayes’ law, the same assumptions of concavity used in Section 3 and Section 4 and a consequent identical derivation of the GKS inequalities given in Section 4.
2. The Probability Space. Probabilities of Occurrence
We consider that the data can be represented in two-dimensional arrays of m rows and n columns. We then have blocks of data on which to undertake the statistical analysis. The joint probabilities of occurrence of a set of t variables, one associated with each of t chosen columns, are given by
where m is the number of rows of the corresponding subarray of the array, and the numerator is the number of occurrences of the given set of values. The values assumed by the variables in these columns are respectively given by:
or,
There are then objects of t columns each, and if the variables take on a finite set of values, then we will have a number W of components for each of these objects.
Since:
we can write:
In the study of distributions of nucleotide bases or of distributions of amino acids in proteins, the corresponding values of W follow from the four-letter and twenty-letter alphabets, respectively.
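Since the display equations of this section are not reproduced above, a minimal sketch of the frequency-based definitions assumed in what follows may help; the notation (column indices $j_1,\ldots,j_t$, values $a_j$, counts $n_{j_1\ldots j_t}$) is illustrative only.

```latex
% Joint probability of occurrence as a relative frequency (assumed standard form):
p_{j_1 \ldots j_t}(a_{j_1},\ldots,a_{j_t}) \;=\;
  \frac{n_{j_1 \ldots j_t}(a_{j_1},\ldots,a_{j_t})}{m},
% where n_{j_1...j_t} counts the rows of the m x t subarray displaying the
% t-sequence (a_{j_1},...,a_{j_t}). Normalization over the W possible t-sequences:
\sum_{a_{j_1},\ldots,a_{j_t}} p_{j_1 \ldots j_t}(a_{j_1},\ldots,a_{j_t}) \;=\; 1 .
```

Under this reading, W counts the possible t-sequences, so that the four-letter and twenty-letter alphabets would give W = 4^t and W = 20^t, respectively.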
Bayes’ law for the probabilities of occurrence of Equation (1) is written as:
where the conditional factor stands for the probability of occurrence of the values associated with the variables in the other columns, given a priori the value associated with the jth column. This also means that:
The marginal probabilities related to the joint distribution are then given by
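As the corresponding equations are not shown above, a hedged sketch of Bayes’ law and of the marginal probabilities, written with the conditioning column taken as $j_t$ purely for illustration, would read:

```latex
% Bayes' law (Equation (6) in the text), conditioning here on column j_t:
p_{j_1 \ldots j_t}(a_{j_1},\ldots,a_{j_t}) \;=\;
  p_{j_1 \ldots j_t}\!\left(a_{j_1},\ldots,a_{j_{t-1}} \,\middle|\, a_{j_t}\right)\,
  p_{j_t}(a_{j_t}),
\qquad
\sum_{a_{j_1},\ldots,a_{j_{t-1}}}
  p_{j_1 \ldots j_t}\!\left(a_{j_1},\ldots,a_{j_{t-1}} \,\middle|\, a_{j_t}\right) \;=\; 1 .

% Marginal probability of a single column, obtained by summing the joint
% probabilities over the values of all the other columns:
p_{j_k}(a_{j_k}) \;=\;
  \sum_{\{a_{j_l}\},\; l \neq k} p_{j_1 \ldots j_t}(a_{j_1},\ldots,a_{j_t}) .
```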
3. The Assumption of Concavity and the Synergy of Gibbs–Shannon Entropy Measures
A concave function of several variables should satisfy the following inequality:
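Equation (10) is not reproduced above; the standard concavity requirement assumed in what follows, for convex weights $\lambda_k$, is:

```latex
F\!\left(\sum_{k} \lambda_k \,\vec{x}_k\right) \;\ge\; \sum_{k} \lambda_k\, F(\vec{x}_k),
\qquad \lambda_k \ge 0, \quad \sum_{k} \lambda_k = 1 .
```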
We shall apply Equation (10) to the Gibbs–Shannon entropies:
where Equation (12) stands for the definition of the Gibbs–Shannon entropy related to the conditional probabilities. It is a measure of the uncertainty [2] in the distribution of probabilities of the columns when we have previous information on the distribution of the conditioning column.
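Since Equations (11) and (12) are not displayed above, the standard forms assumed here for the Gibbs–Shannon entropy of a single column and for its conditional counterpart are sketched below; the notation is illustrative only.

```latex
% Gibbs-Shannon entropy of column j (Equation (11)):
S_j \;=\; -\sum_{a_j} p_j(a_j)\,\log p_j(a_j),

% Conditional entropy of the columns j_1,...,j_{t-1}, given the value a_{j_t}
% of the conditioning column (assumed reading of Equation (12)):
S_{j_1 \ldots j_{t-1}}(a_{j_t}) \;=\;
  -\sum_{a_{j_1},\ldots,a_{j_{t-1}}}
    p\!\left(a_{j_1},\ldots,a_{j_{t-1}} \,\middle|\, a_{j_t}\right)
    \log p\!\left(a_{j_1},\ldots,a_{j_{t-1}} \,\middle|\, a_{j_t}\right).
```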
We now use the correspondences:
and we then have:
and
This means that the uncertainty of the distribution on the columns cannot be increased when we have previous information on the distribution of the conditioning column.
From Equations (13) and (19), we then write:
and by iteration we get the Khinchin–Shannon inequality for the Gibbs–Shannon entropy measure:
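In the notation sketched above, Equations (20) and (21) would take the standard subadditivity form:

```latex
% Equation (20): conditioning on one column cannot increase the uncertainty
S_{j_1 \ldots j_t} \;\le\; S_{j_1 \ldots j_{t-1}} + S_{j_t},
% and, iterating over the remaining columns, the Khinchin-Shannon inequality (Equation (21)):
S_{j_1 \ldots j_t} \;\le\; \sum_{k=1}^{t} S_{j_k}.
```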
The usual meaning given to Equation (21) is that the minimum of the information to be obtained from the analysis of the joint probabilities of a set of t columns is given by the sum of the information contributions associated with the t columns when these are considered as independent [1,2,10]. This is also seen as an aspect of the synergy [12,13] of the distribution of probabilities of occurrence.
4. The Assumption of Concavity and the Synergy of Sharma–Mittal (SM) Entropy Measures. The GKS Inequalities
We shall now use the assumption of concavity given by Equation (10) on Sharma–Mittal (SM) entropy measures:
where,
and r, s are non-dimensional parameters.
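The two-parameter Sharma–Mittal form of Equation (22) is not displayed above; the standard form assumed in this rewrite, written for the joint distribution of the chosen columns, is:

```latex
(S_{rs})_{j_1 \ldots j_t} \;=\;
  \frac{1}{1-s}
  \left[
    \left( \sum_{a_{j_1},\ldots,a_{j_t}}
      \bigl(p_{j_1 \ldots j_t}(a_{j_1},\ldots,a_{j_t})\bigr)^{r}
    \right)^{\!\frac{1-s}{1-r}} - 1
  \right],
\qquad r \neq 1,\; s \neq 1 .
```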
Analogously to Equation (12), we also introduce the “conditional entropy measure”
where
and stands for the escort probability
We have in general:
The inverse transformations are given by:
with
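Since Equations (26)–(29) are not reproduced above, the escort probabilities and their inverse transformation are assumed here in their standard form, written for a single column j (the extension to joint distributions is immediate):

```latex
% Escort probability of order r:
P_j(a_j) \;=\; \frac{\bigl(p_j(a_j)\bigr)^{r}}{\sum_{b_j} \bigl(p_j(b_j)\bigr)^{r}},
\qquad
% Inverse transformation, recovering the original probabilities:
p_j(a_j) \;=\; \frac{\bigl(P_j(a_j)\bigr)^{1/r}}{\sum_{b_j} \bigl(P_j(b_j)\bigr)^{1/r}} .
```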
The range of variation of the parameters r, s of the Sharma–Mittal entropies, Equation (22), should be derived from a requirement of strict concavity. In order to do so, let us remember that for each set of t columns (a subarray) there are m rows of t values each (t-sequences). We now denote these t-sequences by:
A sufficient requirement for strict concavity is the negative definiteness of the quadratic form associated with the Hessian matrix [14], whose elements are given by:
We then consider the m leading principal submatrices along the diagonal of the Hessian matrix. Their determinants should be negative or positive according to whether their order is odd or even [15], respectively:
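The alternating-sign condition on the leading principal minors is the standard Sylvester-type criterion for negative definiteness; writing $H_k$ for the $k \times k$ leading principal submatrix of the Hessian, it reads:

```latex
(-1)^{k}\,\det H_k \;>\; 0, \qquad k = 1, 2, \ldots, m,
% i.e., det H_1 < 0, det H_2 > 0, det H_3 < 0, and so on.
```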
We then have generally:
This completes the proof.
We are now ready to use the concavity assumption, Equation (10), to derive the GKS inequalities. In order to do so, we make the correspondences:
We can then write:
and
With the correspondences above, Equation (10) will turn into:
An additional piece of information should be taken into consideration before we derive the GKS inequalities:
In each column of a block, there will be values such that
and
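Inequalities (48) and (49) are not displayed above; however, the partition of each column into two sets of values follows directly from the definition of the escort probabilities, since comparing $P_j(a_j)$ with $p_j(a_j)$ singles out a threshold value. A minimal sketch of this comparison:

```latex
P_j(a_j) \;\ge\; p_j(a_j)
\;\Longleftrightarrow\;
\bigl(p_j(a_j)\bigr)^{r-1} \;\ge\; \sum_{b_j} \bigl(p_j(b_j)\bigr)^{r}
\;\Longleftrightarrow\;
p_j(a_j) \;\le\;
  \left( \sum_{b_j} \bigl(p_j(b_j)\bigr)^{r} \right)^{\!\frac{1}{r-1}}
\quad (0 < r < 1),
% with the reversed inequality on the complementary set of values;
% for r > 1 the direction of the last equivalence is reversed.
```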
After multiplying inequalities (48) and (49) by and , respectively, and summing up in and , respectively, we get:
and
From Equations (48) and (49), any sum over the values of a column can be partitioned into sums over the two sets of values:
After applying Bayes’ law, Equation (6), to the first term on the left-hand side of Equations (53) and (54), we get:
where
and we have, , , according to Equations (48) and (49), respectively.
Since and , we have trivially that:
and
The set of inequalities, Equations (61), (64) and (69), or
and the set of inequalities, Equations (61), (65) and (70), or
can be arranged as the chains of inequalities
and
respectively.
The inequality which is common to the two chains above can be written as:
From the definition of the escort probabilities, Equations (26) and (27), we can write the right-hand side of Equation (75) as:
We then get by iteration,
Equation (78) corresponds to the Generalized Khinchin–Shannon inequalities (GKS) derived here for Sharma–Mittal entropies.
The same words written after Equation (21) could also be written here for the Sharma–Mittal entropy measures as far as the aspect of synergy is concerned. We will introduce a proposal for an information measure to stress this aspect in the next section.
The Havrda–Charvat, Rényi and Landsberg–Vedral entropies are easily obtained by taking convenient limits in Equation (24):
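With the Sharma–Mittal form assumed above, the standard limiting cases are (writing $p$ generically for the distribution at hand):

```latex
% Havrda-Charvat (Tsallis-type) entropy: limit s -> r
S_r^{HC} \;=\; \frac{1}{1-r}\left( \sum_{a} p(a)^{r} - 1 \right),
% Renyi entropy: limit s -> 1
S_r^{R} \;=\; \frac{1}{1-r}\,\log \sum_{a} p(a)^{r},
% Landsberg-Vedral entropy: limit s -> 2 - r
S_r^{LV} \;=\; \frac{1}{1-r}\left( 1 - \frac{1}{\sum_{a} p(a)^{r}} \right).
```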
The Gibbs–Shannon entropy measure, Equation (11), is included in all these entropies through:
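As a quick numerical illustration of the definitions assumed in this section (not code from the original paper), the following sketch computes the Sharma–Mittal entropy of a discrete distribution, its escort probabilities, and checks the Havrda–Charvat and Rényi limits, as well as the fact that the Gibbs–Shannon measure is approached when both r and s tend to 1. All function and variable names are illustrative only.

```python
import numpy as np

def escort(p, r):
    """Escort distribution of order r: P_j = p_j**r / sum_k p_k**r."""
    w = p ** r
    return w / w.sum()

def sharma_mittal(p, r, s):
    """Two-parameter Sharma-Mittal entropy (standard form assumed here)."""
    z = np.sum(p ** r)
    return (z ** ((1.0 - s) / (1.0 - r)) - 1.0) / (1.0 - s)

def havrda_charvat(p, r):
    """One-parameter Havrda-Charvat (Tsallis-type) entropy: limit s -> r."""
    return (np.sum(p ** r) - 1.0) / (1.0 - r)

def renyi(p, r):
    """Renyi entropy: limit s -> 1 of the Sharma-Mittal measure."""
    return np.log(np.sum(p ** r)) / (1.0 - r)

if __name__ == "__main__":
    p = np.array([0.5, 0.3, 0.15, 0.05])   # illustrative single-column distribution
    r = 0.7
    # For r < 1 the escort probabilities re-weight the distribution towards uniformity.
    print("escort probabilities:", escort(p, r))
    # The limits s -> r and s -> 1 should be approached smoothly by the two-parameter measure.
    print("SM(r, s ~ r) vs Havrda-Charvat:", sharma_mittal(p, r, r + 1e-6), havrda_charvat(p, r))
    print("SM(r, s ~ 1) vs Renyi:         ", sharma_mittal(p, r, 1.0 - 1e-6), renyi(p, r))
    # Gibbs-Shannon limit: both parameters close to 1.
    print("SM(r ~ 1, s ~ 1) vs Gibbs-Shannon:",
          sharma_mittal(p, 1.0 - 1e-6, 1.0 - 1e-6), -np.sum(p * np.log(p)))
```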
5. An Information Measure Proposal Associated to Sharma–Mittal Entropy Measures
We are looking for a proposal of an information measure which fulfills the requirement of a clear interpretation of the upsurge of synergy in a probability distribution and is supported by the usual idea of entropy as a measure of uncertainty.
For the Sharma–Mittal set of entropy measures the proposal for the associated information measure would be:
where the quantities involved are given by Equations (22) and (23). We then have, from Equation (93):
The meaning of Equation (95) is that the minimum of the information associated with t columns of probabilities of occurrence is given by the sum of the information contributions associated with each column. This corresponds to the expression of the synergy of the distribution of probabilities of occurrence which we have derived in the previous section.
It seems worthwhile to derive yet another result which unveils once more the fundamental aspect of synergy of the distribution of probabilities of occurrence. From Equation (93), we have:
and we then write from the GKS inequalities, Equation (78):
We then get:
Equation (100) corresponds to another result which originates from the synergy of the distribution of probabilities of occurrence. It can be stated as follows: the minimum of the rate of information increase with decreasing entropy in the probability distribution for sets of t columns is given by the product of the rates of information increase pertaining to each of the t columns.
6. The Use of Hölder’s Inequality for an Alternative Derivation of the GKS Inequalities
We firstly note that:
and we now introduce Hölder’s inequality [11]:
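The form of Hölder’s inequality assumed for this derivation is the standard one for conjugate exponents; for non-negative sequences $a_k$, $b_k$ and $\lambda \in (0,1)$:

```latex
\sum_{k} a_k\, b_k \;\le\;
\left( \sum_{k} a_k^{1/\lambda} \right)^{\!\lambda}
\left( \sum_{k} b_k^{1/(1-\lambda)} \right)^{\!1-\lambda},
% equivalently, with p = 1/lambda and q = 1/(1-lambda), so that 1/p + 1/q = 1;
% the inequality is reversed when one of the exponents lies in (0,1).
```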
We can also write:
or
We now make the correspondences:
We take the s-power of both sides of Equation (101) and, after using Equations (104) and (105) with the appropriate parameter choices, we get:
After summing over the remaining index, we then have:
7. Concluding Remarks
It should be stressed that the introduction of escort probabilities has been effective in the construction of generalized entropy measures. These can be used for the classification of databases in terms of their clustering, as driven by their intrinsic synergy and the resulting formation of more complex structures such as families and clans.
A fundamental aim would be the derivation of a dynamical theory able to describe the process of formation of these structures: a theory based on the evolution of the entropy values of databases, which we hope could be realized by methods introduced in the exhaustive study of Fokker–Planck equations.
Some introductory results on this promising line of research have already been published [16], and a forthcoming comprehensive review will summarize all of them.
Author Contributions
Conceptualization, R.P.M. and S.C.d.A.N.; methodology, R.P.M. and S.C.d.A.N.; formal analysis, R.P.M. and S.C.d.A.N.; writing—original draft preparation, R.P.M.; writing—review and editing, R.P.M. and S.C.d.A.N.; visualization, R.P.M. and S.C.d.A.N.; supervision, R.P.M.; project administration, R.P.M. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| GKS | Generalized Khinchin–Shannon |
| SM | Sharma–Mittal |
References
- Mondaini, R.P.; de Albuquerque Neto, S.C. Khinchin–Shannon Generalized Inequalities for “Non-additive” Entropy Measures. In Trends in Biomathematics 2; Mondaini, R.P., Ed.; Springer International Publishing: Cham, Switzerland, 2019; pp. 177–190.
- Khinchin, A.I. Mathematical Foundations of Information Theory; Dover Publications: New York, NY, USA, 1957.
- Sharma, B.D.; Mittal, D.P. New Non-additive Measures of Entropy for Discrete Probability Distributions. J. Math. Sci. 1975, 10, 28–40.
- Volkenstein, M.V. Entropy and Information; Birkhäuser: Basel, Switzerland, 2009.
- Beck, C. Generalized Information and Entropy Measures in Physics. Contemp. Phys. 2009, 50, 495–510.
- Lavenda, B.H. A New Perspective on Thermodynamics; Springer Science+Business Media: New York, NY, USA, 2010.
- Havrda, J.; Charvat, F. Quantification Method of Classification Processes. Concept of Structural α-entropy. Kybernetika 1967, 3, 30–35.
- Rényi, A. On Measures of Entropy and Information. In Contributions to the Theory of Statistics, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1960; Neyman, J., Ed.; University of California Press: Berkeley, CA, USA, 1961; Volume 1, pp. 547–561.
- Landsberg, P.T.; Vedral, V. Distributions and Channel Capacities in Generalized Statistical Mechanics. Phys. Lett. A 1997, 224, 326–330.
- Mondaini, R.P.; de Albuquerque Neto, S.C. The Statistical Analysis of Protein Domain Family Distributions via Jaccard Entropy Measures. In Trends in Biomathematics 3; Mondaini, R.P., Ed.; Springer International Publishing: Cham, Switzerland, 2020; pp. 169–207.
- Hardy, G.H.; Littlewood, J.E.; Pólya, G. Inequalities; Cambridge University Press: London, UK, 1934.
- Ay, N.; Olbrich, E.; Bertschinger, N.; Jost, J. A Geometric Approach to Complexity. Chaos 2011, 21, 037103.
- Olbrich, E.; Bertschinger, N.; Rauh, J. Information Decomposition and Synergy. Entropy 2015, 17, 3501–3517.
- Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: New York, NY, USA, 2009.
- Marsden, J.E.; Tromba, A. Vector Calculus; W. H. Freeman and Company Publishers: New York, NY, USA, 2012.
- Mondaini, R.P.; de Albuquerque Neto, S.C. A Jaccard-like Symbol and its Usefulness in the Derivation of Amino Acid Distributions in Protein Domain Families. In Trends in Biomathematics 4; Mondaini, R.P., Ed.; Springer International Publishing: Cham, Switzerland, 2021; pp. 201–220.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).