On some properties of Tsallis hypoentropies and hypodivergences

Both the Kullback-Leibler and the Tsallis divergence have a strong limitation: if the value $0$ appears in the probability distributions $\left( p_{1},\cdots ,p_{n}\right)$ and $\left( q_{1},\cdots ,q_{n}\right)$, it must appear in the same positions for the divergence to be meaningful. To avoid that limitation in the framework of Shannon statistics, Ferreri introduced the hypoentropy in 1980: "such conditions rarely occur in practice". The aim of the present paper is to extend Ferreri's hypoentropy to the Tsallis statistics. We introduce the Tsallis hypoentropy and the Tsallis hypodivergence and describe their mathematical behavior. Fundamental properties such as nonnegativity, monotonicity, the chain rule and subadditivity are established.

In what follows we consider $|X| = |Y| = |U| = n$, unless otherwise specified. Making use of the concavity of the logarithmic function, one can easily check that the equiprobable states maximize the entropy, that is,
$H(X) \le H(U) = \log n$. (2)
The right-hand side of this inequality has been known since 1928 as the Hartley entropy [10]. For two random variables $X$ and $Y$ following the distributions $\{p(x_i)\}$ and $\{p(y_i)\}$, the Kullback-Leibler [12] discrimination function (divergence or relative entropy) $^{1}$ is defined by
$D(X\|Y) \equiv -\sum_{i=1}^{n} p(x_i) \log \frac{p(y_i)}{p(x_i)}$. (3)
Here the conventions $^{2}$ $a \cdot \log \frac{0}{a} = -\infty$ $(a > 0)$ and $0 \cdot \log \frac{b}{0} = 0$ $(b \ge 0)$ are used. In what follows, we use such conventions in the definitions of the entropies and divergences, without stating them repeatedly.
C. Tsallis introduced a one-parameter extension of the entropy in 1988 [18], to handle systems which appear to deviate from standard statistical distributions. It plays an important role in the nonextensive statistical mechanics of complex systems, and is defined as
$T_q(X) \equiv \sum_{i=1}^{n} p(x_i) \ln_q \frac{1}{p(x_i)}$ $(q \ge 0,\ q \ne 1)$.
Here the $q$-logarithmic function for $x > 0$ is defined by $\ln_q(x) \equiv \frac{x^{1-q}-1}{1-q}$, which converges to the usual logarithmic function $\log(x)$ in the limit $q \to 1$. The Tsallis divergence (relative entropy) [19] is given by
$D_q(X\|Y) \equiv -\sum_{i=1}^{n} p(x_i) \ln_q \frac{p(y_i)}{p(x_i)}$.

$^{1}$ The relative entropy is usually defined for two probability distributions $P = \{p_i\}$ and $Q = \{q_i\}$ as $D(P\|Q) \equiv -\sum_{i=1}^{n} p_i \log \frac{q_i}{p_i}$ in the standard notation of information theory. $D(P\|Q)$ is often rewritten as $D(X\|Y)$ for random variables $X$ and $Y$ following the distributions $P$ and $Q$. Throughout this paper, we use the style of Eq. (3) for relative entropies to unify the notation with simple descriptions.

$^{2}$ The convention is often given in the following way with the definition of $D(X\|Y)$. If there exists $i$ such that $p(x_i) \ne 0 = p(y_i)$, then we define $D(X\|Y) \equiv +\infty$ (in this case, $D(X\|Y)$ is no longer significant as an information measure). Otherwise, $D(X\|Y)$ is defined by Eq. (3) with the convention $0 \cdot \log \frac{0}{0} = 0$. This fact has been mentioned in the abstract of the paper.
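As a quick numerical illustration of the definitions above, the following Python sketch evaluates the $q$-logarithm and the Tsallis entropy and compares them with the Shannon entropy near $q = 1$. The function names are illustrative and not taken from the paper; natural logarithms are assumed, and terms with $p(x_i) = 0$ are skipped, which is justified for $q > 0$.

```python
import math

def ln_q(x, q):
    """q-logarithm: (x**(1-q) - 1) / (1-q), with the natural log at q = 1."""
    if x <= 0:
        raise ValueError("ln_q requires x > 0")
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_entropy(p, q):
    """T_q(X) = sum_i p_i * ln_q(1/p_i); terms with p_i = 0 are skipped (q > 0)."""
    return sum(pi * ln_q(1.0 / pi, q) for pi in p if pi > 0)

def shannon_entropy(p):
    """H(X) = -sum_i p_i * log(p_i), in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [0.5, 0.3, 0.2, 0.0]
for q in (0.5, 0.99, 1.0, 1.01, 2.0):
    print(f"q = {q:<5} T_q(X) = {tsallis_entropy(p, q):.6f}")
print(f"Shannon H(X) = {shannon_entropy(p):.6f}")
```

For the uniform distribution the same function returns $\ln_q n$, the Tsallis analogue of the Hartley entropy.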

Hypoentropy and hypodivergence
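For nonnegative real numbers $a_i$ and $b_i$ $(i = 1, \cdots, n)$, we define the generalized relative entropy (for incomplete probability distributions):
$D^{(gen)}(a_1, \cdots, a_n \| b_1, \cdots, b_n) \equiv \sum_{i=1}^{n} a_i \log \frac{a_i}{b_i}$. (9)
Then we have the so-called "log-sum" inequality:
$\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \ge \left( \sum_{i=1}^{n} a_i \right) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}$,
with equality if and only if $\frac{a_i}{b_i} = \mathrm{const.}$ for all $i = 1, \cdots, n$. If we impose the condition $\sum_{i=1}^{n} a_i = \sum_{i=1}^{n} b_i = 1$, then $D^{(gen)}(a_1, \cdots, a_n \| b_1, \cdots, b_n)$ is just the relative entropy $D(a_1, \cdots, a_n \| b_1, \cdots, b_n)$. We put $a_i = \frac{1}{\lambda} + p(x_i)$ and $b_i = \frac{1}{\lambda} + p(y_i)$ for $\lambda > 0$, which gives the hypodivergence at the level $\lambda$:
$K_\lambda(X\|Y) \equiv \frac{1}{\lambda} \sum_{i=1}^{n} (1 + \lambda p(x_i)) \log \frac{1 + \lambda p(x_i)}{1 + \lambda p(y_i)}$.
Clearly we have $\lim_{\lambda \to \infty} K_\lambda(X\|Y) = D(X\|Y)$. Using the "log-sum" inequality, we have the nonnegativity
$K_\lambda(X\|Y) \ge 0$, (14)
with equality if and only if $p(x_i) = p(y_i)$ for all $i = 1, \cdots, n$. The hypoentropy at the level $\lambda$ ($\lambda$-entropy) was introduced in 1980 by Ferreri [5] as an alternative measure of information in the following form:
$F_\lambda(X) \equiv \frac{1}{\lambda} (\lambda + 1) \log(\lambda + 1) - \frac{1}{\lambda} \sum_{i=1}^{n} (1 + \lambda p(x_i)) \log(1 + \lambda p(x_i))$
for $\lambda > 0$. According to Ferreri [5], the parameter $\lambda$ can be interpreted as a measure of the information inaccuracy of economic forecast. For this quantity $F_\lambda(X)$, we have the following fundamental relations.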
Proposition 2.1 For $\lambda > 0$, we have the following inequalities:
$0 \le F_\lambda(X) \le F_\lambda(U)$.
The equality in the first inequality holds if and only if $p(x_j) = 1$ for some $j$ (then $p(x_i) = 0$ for all $i \ne j$). The equality in the second inequality holds if and only if $p(x_i) = 1/n$ for all $i = 1, \cdots, n$.
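Proof: From the nonnegativity of the hypodivergence, Eq. (14), applied to $Y = U$, we get
$\frac{1}{\lambda} \sum_{i=1}^{n} (1 + \lambda p(x_i)) \log \frac{1 + \lambda p(x_i)}{1 + \lambda/n} \ge 0$.
Thus we have
$-\frac{1}{\lambda} \sum_{i=1}^{n} (1 + \lambda p(x_i)) \log(1 + \lambda p(x_i)) \le -\frac{1}{\lambda} (n + \lambda) \log\left(1 + \frac{\lambda}{n}\right)$.
Adding $\frac{1}{\lambda} (\lambda + 1) \log(\lambda + 1)$ to both sides, we have $F_\lambda(X) \le F_\lambda(U)$, with equality if and only if $p(x_i) = 1/n$ for all $i = 1, \cdots, n$. For the first inequality it is sufficient to prove
$(1 + \lambda) \log(1 + \lambda) - \sum_{i=1}^{n} (1 + \lambda p(x_i)) \log(1 + \lambda p(x_i)) \ge 0$.
Since $\sum_{i=1}^{n} p(x_i) = 1$, the above inequality is written as
$\sum_{i=1}^{n} \left\{ p(x_i) (1 + \lambda) \log(1 + \lambda) - (1 + \lambda p(x_i)) \log(1 + \lambda p(x_i)) \right\} \ge 0$,
so that we have only to prove
$p(x_i) (1 + \lambda) \log(1 + \lambda) - (1 + \lambda p(x_i)) \log(1 + \lambda p(x_i)) \ge 0$
for any $\lambda > 0$ and $0 \le p(x_i) \le 1$. Lemma 2.2 below shows this inequality and the equality condition.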
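Lemma 2.2 For any $a > 0$ and $0 \le x \le 1$, we have
$x (1 + a) \log(1 + a) \ge (1 + ax) \log(1 + ax)$,
with equality if and only if $x = 0$ or $x = 1$.
Proof: We set $f(x) \equiv x (1 + a) \log(1 + a) - (1 + ax) \log(1 + ax)$. For any $a > 0$ we then have $\frac{d^2 f(x)}{dx^2} = -\frac{a^2}{1 + ax} < 0$ and $f(0) = f(1) = 0$. Thus we have the inequality.
It is a known fact that $F_\lambda(X)$ is monotonically increasing as a function of $\lambda$ and tends to the Shannon entropy $H(X)$ as $\lambda \to \infty$, whence its name. Thus the hypoentropy appears as a generalization of Shannon's entropy. One can see that the hypoentropy, like the entropy, equals zero in the case of certainty (i.e., for a so-called pure state, when all probabilities vanish but one).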
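It also holds that $\lim_{\lambda \to 0^{+}} F_\lambda(X) = 0$. It is of some interest for the reader to look at the hypoentropy which arises for equiprobable states,
$F_\lambda(U) = \left(1 + \frac{1}{\lambda}\right) \log(1 + \lambda) - \left(1 + \frac{n}{\lambda}\right) \log\left(1 + \frac{\lambda}{n}\right)$.
Seen as a function of two variables, $n$ and $\lambda$, it increases in each variable [5]. Since $\lim_{\lambda \to \infty} F_\lambda(U) = \log n$, we shall call it Hartley hypoentropy $^{3}$. We have the cross-hypoentropy It holds . This enables us to state the following lemma.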

Lemma 2.3
We have the following inequality for all λ > 0.
As direct consequences we have some interesting inequalities as follows.

Proposition 2.4 It holds that
for all λ > 0.
Proof: From Lemma 2.3, for $X = U$ we get and the conclusion follows.
An upper bound for $F_\lambda(X)$ can be found as follows:
Proposition 2.5 The following inequality holds.
Proof : In Lemma 2.3, if for a fixed k one takes the probability of the k-th component of Y to be p(y k ) = 1, then This implies that Since k is arbitrarily fixed, the conclusion follows.
Remark 2.6 It is of interest to notice now that, for the particular case X = U , we have We add here one more detail: the inequality (36) can be verified using Bernoulli's inequality.
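The properties of the hypoentropy collected in this section (zero value for a pure state, maximum at the uniform distribution, monotonicity in $\lambda$, and the Shannon limit) can be checked numerically. The Python sketch below assumes the explicit form $F_\lambda(X) = \frac{1}{\lambda}(\lambda+1)\log(\lambda+1) - \frac{1}{\lambda}\sum_i (1+\lambda p(x_i))\log(1+\lambda p(x_i))$ with natural logarithms; the function names and the test distribution are illustrative only.

```python
import math

def hypoentropy(p, lam):
    """Ferreri hypoentropy F_lam(X) in nats (explicit form assumed in the lead-in)."""
    head = (1.0 + lam) * math.log1p(lam)
    tail = sum((1.0 + lam * pi) * math.log1p(lam * pi) for pi in p)
    return (head - tail) / lam

def shannon_entropy(p):
    """H(X) = -sum_i p_i * log(p_i), in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

n = 4
x = [0.6, 0.3, 0.1, 0.0]        # a zero entry causes no difficulty
u = [1.0 / n] * n               # equiprobable states
pure = [1.0, 0.0, 0.0, 0.0]     # certainty

levels = [0.1, 1.0, 10.0, 100.0, 1e6]
values = [hypoentropy(x, lam) for lam in levels]
for lam, f in zip(levels, values):
    print(f"lambda = {lam:<9g} F(pure) = {hypoentropy(pure, lam):.3e}  "
          f"F(X) = {f:.6f}  F(U) = {hypoentropy(u, lam):.6f}")
print(f"H(X) = {shannon_entropy(x):.6f}, log n = {math.log(n):.6f}")
```

In this run $F_\lambda(X)$ grows with $\lambda$ and stays below both $F_\lambda(U)$ and $\log n$, in line with Proposition 2.1 and the monotonicity recalled above.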

Tsallis hypoentropy and hypodivergence
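Now we turn our attention to the Tsallis statistics. We extend the definition of hypodivergences as follows:
Definition 3.1 The Tsallis hypodivergence ($q$-hypodivergence, Tsallis relative hypoentropy) is defined by
$D_{\lambda,q}(X\|Y) \equiv -\frac{1}{\lambda} \sum_{i=1}^{n} (1 + \lambda p(x_i)) \ln_q \frac{1 + \lambda p(y_i)}{1 + \lambda p(x_i)}$
for $\lambda > 0$ and $q \ge 0$.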
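Then we have the relations
$\lim_{\lambda \to \infty} D_{\lambda,q}(X\|Y) = -\sum_{i=1}^{n} p(x_i) \ln_q \frac{p(y_i)}{p(x_i)}$,
which is the Tsallis divergence, and
$\lim_{q \to 1} D_{\lambda,q}(X\|Y) = \frac{1}{\lambda} \sum_{i=1}^{n} (1 + \lambda p(x_i)) \log \frac{1 + \lambda p(x_i)}{1 + \lambda p(y_i)}$,
which is the hypodivergence.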
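The motivating point of the hypodivergence, namely that zeros may sit in different positions of the two distributions, can be illustrated with a short Python sketch. It assumes the form $D_{\lambda,q}(X\|Y) = -\frac{1}{\lambda}\sum_i (1+\lambda p(x_i)) \ln_q \frac{1+\lambda p(y_i)}{1+\lambda p(x_i)}$; function names and sample distributions are illustrative. The Tsallis divergence helper is only used on strictly positive distributions.

```python
import math

def ln_q(x, q):
    """q-logarithm: (x**(1-q) - 1) / (1-q), with the natural log at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_hypodivergence(p, r, lam, q):
    """D_{lam,q}(X||Y) for distributions p (of X) and r (of Y); assumed form."""
    total = sum((1.0 + lam * pi) * ln_q((1.0 + lam * ri) / (1.0 + lam * pi), q)
                for pi, ri in zip(p, r))
    return -total / lam

def tsallis_divergence(p, r, q):
    """D_q(X||Y) = -sum_i p_i ln_q(r_i / p_i); all entries assumed positive here."""
    return -sum(pi * ln_q(ri / pi, q) for pi, ri in zip(p, r))

# Zeros in different positions: every argument of ln_q stays strictly positive.
p = [0.5, 0.5, 0.0]
r = [0.0, 0.5, 0.5]
for lam in (0.5, 1.0, 10.0):
    for q in (0.5, 1.0, 2.0):
        d = tsallis_hypodivergence(p, r, lam, q)
        print(f"lambda = {lam:<4} q = {q:<3} D = {d:.6f}")

# Large-lambda behaviour on strictly positive distributions.
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
print(tsallis_hypodivergence(a, b, 1e8, 1.5), tsallis_divergence(a, b, 1.5))
```

Because every argument of $\ln_q$ has the form $(1+\lambda s)/(1+\lambda t)$ with $s, t \ge 0$, no convention for $0 \cdot \log\frac{0}{0}$ or division by zero is needed.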
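Remark 3.2 This definition can also be obtained from the generalized Tsallis relative entropy (for incomplete probability distributions $\{a_1, \cdots, a_n\}$ and $\{b_1, \cdots, b_n\}$)
$D_q^{(gen)}(a_1, \cdots, a_n \| b_1, \cdots, b_n) \equiv -\sum_{i=1}^{n} a_i \ln_q \frac{b_i}{a_i}$ (40)
by putting $a_i = \frac{1}{\lambda} + p(x_i)$ and $b_i = \frac{1}{\lambda} + p(y_i)$. The generalized relative entropy (9) and the generalized Tsallis relative entropy (40) can be written as the generalized $f$-divergence (for incomplete probability distributions):
$D_f^{(gen)}(a_1, \cdots, a_n \| b_1, \cdots, b_n) \equiv \sum_{i=1}^{n} a_i f\left(\frac{b_i}{a_i}\right)$
for a convex function $f$ on $(0, \infty)$ and $a_i \ge 0$, $b_i \ge 0$ $(i = 1, \cdots, n)$.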
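By the concavity of the $q$-logarithmic function, we have the following "$\ln_q$-sum" inequality
$-\sum_{i=1}^{n} a_i \ln_q \frac{b_i}{a_i} \ge -\left( \sum_{i=1}^{n} a_i \right) \ln_q \frac{\sum_{i=1}^{n} b_i}{\sum_{i=1}^{n} a_i}$,
with equality if and only if $\frac{a_i}{b_i} = \mathrm{const.}$ for all $i = 1, \cdots, n$. Using the "$\ln_q$-sum" inequality, we have the nonnegativity of the Tsallis hypodivergence:
$D_{\lambda,q}(X\|Y) \ge 0$, (43)
with equality if and only if $p(x_i) = p(y_i)$ for all $i = 1, \cdots, n$. (The equality condition comes from the equality condition of the "$\ln_q$-sum" inequality and the condition $\sum_{i=1}^{n} p(x_i) = \sum_{i=1}^{n} p(y_i) = 1$.)
Definition 3.3 The Tsallis hypoentropy is defined by
$H_{\lambda,q}(X) \equiv \frac{h(\lambda,q)}{\lambda} \left\{ -(1 + \lambda) \ln_q \frac{1}{1 + \lambda} + \sum_{i=1}^{n} (1 + \lambda p(x_i)) \ln_q \frac{1}{1 + \lambda p(x_i)} \right\}$,
where the function $h(\lambda, q) > 0$ satisfies two conditions,
$\lim_{q \to 1} h(\lambda, q) = 1$ (45)
and
$\lim_{\lambda \to \infty} \frac{h(\lambda, q)}{\lambda^{1-q}} = 1$. (46)
These conditions are equivalent to $\lim_{q \to 1} H_{\lambda,q}(X) = F_\lambda(X)$ = hypoentropy and, respectively, $\lim_{\lambda \to \infty} H_{\lambda,q}(X) = T_q(X)$ = Tsallis entropy.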
Remark 3.4 It is worth discussing the Tsallis cross-hypoentropy. The first candidate for the definition of the Tsallis cross-hypoentropy is which recovers the cross-hypoentropy defined in Eq. (29) in the limit $q \to 1$. Then we have The last inequality is due to the nonnegativity given in Eq. (43). Since $\lim_{q \to 1} h(\lambda, q) = 1$ by the definition of the Tsallis hypoentropy (see Eq. (45)), the above relation recovers the inequality (30) in the limit $q \to 1$.
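We now show the nonnegativity and maximality of the Tsallis hypoentropy.
Lemma 3.5 For any $a > 0$, $q \ge 0$ and $0 \le x \le 1$, we have
$x (1 + a) \ln_q \frac{1}{1 + a} \le (1 + ax) \ln_q \frac{1}{1 + ax}$.
Proof: We set $g(x) \equiv x (1 + a) \ln_q \frac{1}{1 + a} - (1 + ax) \ln_q \frac{1}{1 + ax}$. For any $a > 0$ and $q \ge 0$ we then have $\frac{d^2 g(x)}{dx^2} = q a^2 \left( \frac{1}{1 + ax} \right)^{2-q} \ge 0$ and $g(0) = g(1) = 0$. Thus we have the inequality.
Proposition 3.6 For $\lambda > 0$, $q \ge 0$ and $h(\lambda, q) > 0$ satisfying (45) and (46), we have the following inequalities:
$0 \le H_{\lambda,q}(X) \le H_{\lambda,q}(U)$.
The equality in the first inequality holds if and only if $p(x_j) = 1$ for some $j$ (then $p(x_i) = 0$ for all $i \ne j$). The equality in the second inequality holds if and only if $p(x_i) = 1/n$ for all $i = 1, \cdots, n$.
Proof: In a similar way to the proof of Proposition 2.1, for the first inequality it is sufficient to prove
$-(1 + \lambda) \ln_q \frac{1}{1 + \lambda} + \sum_{i=1}^{n} (1 + \lambda p(x_i)) \ln_q \frac{1}{1 + \lambda p(x_i)} \ge 0$,
so that we have only to prove
$(1 + \lambda p(x_i)) \ln_q \frac{1}{1 + \lambda p(x_i)} \ge p(x_i) (1 + \lambda) \ln_q \frac{1}{1 + \lambda}$
for any $\lambda > 0$, $q \ge 0$ and $0 \le p(x_i) \le 1$. Lemma 3.5 shows this inequality together with the equality condition.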
The second inequality is proven by the use of the nonnegativity of the Tsallis hypodivergence in the following way: which implies (by the use of the formula, ln q The equality condition of the second inequality follows from the equality condition of the nonnegativity of the Tsallis hypodivergence (43).
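The two bounds of Proposition 3.6 and the two limiting cases of the Tsallis hypoentropy can be probed numerically. The sketch below assumes the form $H_{\lambda,q}(X) = \frac{h(\lambda,q)}{\lambda}\{-(1+\lambda)\ln_q\frac{1}{1+\lambda} + \sum_i (1+\lambda p(x_i))\ln_q\frac{1}{1+\lambda p(x_i)}\}$ with the particular choice $h(\lambda,q) = \lambda^{1-q}$, which is one admissible function among others. All names, the random seed and the sampled parameter values are illustrative.

```python
import math
import random

def ln_q(x, q):
    """q-logarithm: (x**(1-q) - 1) / (1-q), with the natural log at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_hypoentropy(p, lam, q):
    """H_{lam,q}(X) with the assumed choice h(lam, q) = lam**(1-q)."""
    h = lam ** (1.0 - q)
    head = -(1.0 + lam) * ln_q(1.0 / (1.0 + lam), q)
    tail = sum((1.0 + lam * pi) * ln_q(1.0 / (1.0 + lam * pi), q) for pi in p)
    return h * (head + tail) / lam

def random_distribution(n, rng):
    """Draw n positive weights and normalize them to sum to one."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

rng = random.Random(0)
n = 5
u = [1.0 / n] * n
pure = [1.0] + [0.0] * (n - 1)

violations = 0
for _ in range(1000):
    x = random_distribution(n, rng)
    for lam in (0.5, 2.0, 20.0):
        for q in (0.5, 1.0, 1.5, 3.0):
            hx = tsallis_hypoentropy(x, lam, q)
            hu = tsallis_hypoentropy(u, lam, q)
            if not (-1e-12 <= hx <= hu + 1e-12):
                violations += 1
print("violations of 0 <= H(X) <= H(U):", violations)
print("pure state:", tsallis_hypoentropy(pure, 2.0, 1.5))

# Large-lambda limit against the Tsallis entropy (sum_i p_i**q - 1) / (1 - q).
x0, q0 = [0.6, 0.3, 0.1], 1.5
tq = (sum(pi ** q0 for pi in x0) - 1.0) / (1.0 - q0)
print(tsallis_hypoentropy(x0, 1e8, q0), tq)
```

A random search of this kind is only a sanity check, not a proof; it would merely be expected to flag a sign or normalization error in the assumed formula.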
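We may call
$H_{\lambda,q}(U) = \frac{h(\lambda,q)}{\lambda} \left\{ -(1 + \lambda) \ln_q \frac{1}{1 + \lambda} + (n + \lambda) \ln_q \frac{1}{1 + \lambda/n} \right\}$
the Hartley-Tsallis hypoentropy. We study the monotonicity of the Hartley-Tsallis hypoentropy $H_{\lambda,q}(U)$ and the Tsallis hypoentropy $H_{\lambda,q}(X)$.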
is monotonically increasing in x, for any q ≥ 0.
Proof: By direct calculations, we have and
Proof: Note that Putting $x = \frac{n}{\lambda} > 0$ for $\lambda > 0$ fixed in Lemma 3.7, we get the function which is a monotonically increasing function of $n$. Thus we have the present proposition.

Remark 3.9
We have the relation We notice from the condition (46) that and conclude that the result is independent of the choice of h(λ, q). For the limit λ → 0 we consider two cases.
Closing this subsection, we give a q-extended version for Proposition 2.5 and Proposition 2.4.
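Proof: From the "$\ln_q$-sum" inequality, we have $D_{\lambda,q}(X\|Y) \ge 0$. Since $\lambda > 0$, we have
$-\sum_{i=1}^{n} (1 + \lambda p(x_i)) \ln_q \frac{1 + \lambda p(y_i)}{1 + \lambda p(x_i)} \ge 0$,
which is equivalent to
$\sum_{i=1}^{n} (1 + \lambda p(x_i))^q \left\{ \ln_q (1 + \lambda p(x_i)) - \ln_q (1 + \lambda p(y_i)) \right\} \ge 0$.
Thus we have
$\sum_{i=1}^{n} (1 + \lambda p(x_i))^q \ln_q (1 + \lambda p(x_i)) \ge \sum_{i=1}^{n} (1 + \lambda p(x_i))^q \ln_q (1 + \lambda p(y_i))$,
which extends the result of Lemma 2.3. For arbitrarily fixed $k$, we set $p(y_k) = 1$ (and $p(y_i) = 0$ for $i \ne k$) in the above inequality, then we have
$\sum_{i=1}^{n} (1 + \lambda p(x_i))^q \ln_q (1 + \lambda p(x_i)) \ge (1 + \lambda p(x_k))^q \ln_q (1 + \lambda)$.
Since Multiplying both sides by $\frac{h(\lambda,q)}{\lambda} > 0$ and then adding to both sides, we have Since $k$ is arbitrary, we have this proposition.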
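Letting $q \to 1$ in the above proposition, we recover Proposition 2.5. We introduce some notation before we state the next proposition. For any $x, y > 0$ satisfying $x^{1-q} + y^{1-q} - 1 > 0$, we define the $q$-product [16] by
$x \otimes_q y \equiv \left[ x^{1-q} + y^{1-q} - 1 \right]^{\frac{1}{1-q}}$.
Then we have $\lim_{q \to 1} x \otimes_q y = xy$ and $\ln_q (x \otimes_q y) = \ln_q x + \ln_q y$. We also use the notation $x^{\otimes_q^n} \equiv x \otimes_q \cdots \otimes_q x$ ($n$ factors).
Proposition 3.12 for all $\lambda > 0$ and $0 \le q < 1$.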
Proof: In the inequality (60), we put $p(x_i) = \frac{1}{n}$ for all $i = 1, \cdots, n$. Then we have which implies this proposition.
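The limit $q \to 1$ in the above proposition recovers Proposition 2.4. In addition, it is known that $\lim_{n \to \infty} \left( 1 + \frac{\lambda}{n} \right)^{\otimes_q^n} = \exp_q(\lambda)$, where $\exp_q(x)$ is the inverse function of $\ln_q(x)$, defined by $\exp_q(x) \equiv \left[ 1 + (1-q)x \right]^{\frac{1}{1-q}}$ for $1 + (1-q)x > 0$.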

The subadditivities of the Tsallis hypoentropies
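Throughout this section we assume $|X| = n$, $|Y| = m$, $|Z| = l$. We define the joint Tsallis hypoentropy at the level $\lambda$ by
$H_{\lambda,q}(X, Y) \equiv \frac{h(\lambda,q)}{\lambda} \left\{ -(1 + \lambda) \ln_q \frac{1}{1 + \lambda} + \sum_{i=1}^{n} \sum_{j=1}^{m} (1 + \lambda p(x_i, y_j)) \ln_q \frac{1}{1 + \lambda p(x_i, y_j)} \right\}$.
For all $i = 1, \cdots, n$ for which $p(x_i) \ne 0$, we define the Tsallis hypoentropy of $Y$ given $X = x_i$, at the level $\lambda p(x_i)$, by
$H_{\lambda p(x_i),q}(Y|x_i) \equiv \frac{h(\lambda p(x_i),q)}{\lambda p(x_i)} \left\{ -(1 + \lambda p(x_i)) \ln_q \frac{1}{1 + \lambda p(x_i)} + \sum_{j=1}^{m} (1 + \lambda p(x_i) p(y_j|x_i)) \ln_q \frac{1}{1 + \lambda p(x_i) p(y_j|x_i)} \right\}$. (66)
For $n = 1$, this coincides with the hypoentropy $H_{\lambda,q}(Y)$. As for the particular case $m = 1$, we get $H_{\lambda p(x_i),q}(Y|x_i) = 0$.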

Definition 4.1 The Tsallis conditional hypoentropy at the level $\lambda$ is defined by
$H_{\lambda,q}(Y|X) \equiv \sum_{i=1}^{n} p(x_i)^q H_{\lambda p(x_i),q}(Y|x_i)$.
(As a usual convention, the corresponding summand is defined as $0$ if $p(x_i) = 0$.) Throughout this section we consider the particular function $h(\lambda, q) = \lambda^{1-q}$ for $\lambda > 0$, $q \ge 0$.

Lemma 4.2
We assume $h(\lambda, q) = \lambda^{1-q}$. The chain rule for the Tsallis hypoentropy holds:
$H_{\lambda,q}(X, Y) = H_{\lambda,q}(X) + H_{\lambda,q}(Y|X)$. (68)
Proof: The proof is done by straightforward computation as follows.
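In the limit $\lambda \to \infty$, the identity (68) becomes $T_q(X, Y) = T_q(X) + T_q(Y|X)$, where $T_q(Y|X) \equiv \sum_{i=1}^{n} p(x_i)^q T_q(Y|x_i)$ is the Tsallis conditional entropy and $T_q(X, Y) \equiv \sum_{i=1}^{n} \sum_{j=1}^{m} p(x_i, y_j) \ln_q \frac{1}{p(x_i, y_j)}$ is the Tsallis joint entropy (see also [6, p.3]). In the limit $q \to 1$ in Lemma 4.2, we also obtain the identity $F_\lambda(X, Y) = F_\lambda(X) + F_\lambda(Y|X)$, which naturally leads to the definition of $F_\lambda(Y|X)$ as the conditional hypoentropy.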
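The chain rule can be verified numerically on a small joint distribution. The following Python sketch assumes the forms used in this section: $h(\lambda,q) = \lambda^{1-q}$, the conditional hypoentropy of $Y$ given $X = x_i$ taken at the level $\lambda p(x_i)$, and the weights $p(x_i)^q$ in the conditional hypoentropy. Function names and the sample joint table are illustrative.

```python
import math

def ln_q(x, q):
    """q-logarithm: (x**(1-q) - 1) / (1-q), with the natural log at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def hypoentropy(p, lam, q):
    """Tsallis hypoentropy H_{lam,q} with h(lam, q) = lam**(1-q)."""
    head = -(1.0 + lam) * ln_q(1.0 / (1.0 + lam), q)
    tail = sum((1.0 + lam * pi) * ln_q(1.0 / (1.0 + lam * pi), q) for pi in p)
    return lam ** (1.0 - q) * (head + tail) / lam

def conditional_hypoentropy(joint, lam, q):
    """H_{lam,q}(Y|X) = sum_i p(x_i)**q * H_{lam p(x_i), q}(Y | x_i)."""
    total = 0.0
    for row in joint:                 # row i holds p(x_i, y_j) for all j
        px = sum(row)
        if px == 0.0:                 # convention: the summand is 0
            continue
        cond = [pxy / px for pxy in row]
        total += px ** q * hypoentropy(cond, lam * px, q)
    return total

joint = [[0.10, 0.20, 0.05],
         [0.30, 0.00, 0.35]]
p_x = [sum(row) for row in joint]
flat = [v for row in joint for v in row]

for lam in (0.5, 3.0):
    for q in (0.5, 1.0, 1.5, 2.0):
        lhs = hypoentropy(flat, lam, q)
        rhs = hypoentropy(p_x, lam, q) + conditional_hypoentropy(joint, lam, q)
        print(f"lambda = {lam:<4} q = {q:<4} H(X,Y) = {lhs:.12f}  H(X)+H(Y|X) = {rhs:.12f}")
```

The identity relies on the level $\lambda p(x_i)$ inside the conditional term: using the level $\lambda$ there instead would break the telescoping of the terms $(1 + \lambda p(x_i))^q$.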
In order to obtain the subadditivity for the Tsallis hypoentropy, we prove the monotonicity of the Tsallis hypoentropy.

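Proof: Note that
$H_{\lambda,q}(X) = \sum_{i=1}^{n} Ln_{\lambda,q}(p(x_i))$,
where
$Ln_{\lambda,q}(x) \equiv \frac{(1 + \lambda x)^q - x (1 + \lambda)^q - (1 - x)}{(1 - q) \lambda^q}$
is defined on $0 \le x \le 1$ and $\lambda > 0$. Then we have
$\frac{d\, Ln_{\lambda,q}(x)}{d\lambda} = \lambda^{-q-1}\, l_{\lambda,q}(x)$,
where
$l_{\lambda,q}(x) \equiv \frac{q}{1 - q} \left\{ x (1 + \lambda)^{q-1} + (1 - x) - (1 + \lambda x)^{q-1} \right\}$
is defined on $0 \le x \le 1$ and $\lambda > 0$. By elementary computations, we obtain
$\frac{d^2 l_{\lambda,q}(x)}{dx^2} = q (q - 2) \lambda^2 (1 + \lambda x)^{q-3}$.
Since we have $l_{\lambda,q}(0) = l_{\lambda,q}(1) = 0$, we find that $l_{\lambda,q}(x) \ge 0$ for $0 \le q \le 2$ and any $\lambda > 0$. We also find that $l_{\lambda,q}(x) \le 0$ for $q \ge 2$ (or $q \le 0$) and any $\lambda > 0$. Therefore we have $\frac{d H_{\lambda,q}(X)}{d\lambda} \ge 0$ when $0 \le q \le 2$, and $\frac{d H_{\lambda,q}(X)}{d\lambda} \le 0$ when $q \ge 2$. This result also agrees with the known fact that the usual (Ferreri) hypoentropy is increasing as a function of $\lambda$.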
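Proof: We note that $Ln_{\lambda,q}(x)$ is a nonnegative and concave function in $x$, when $0 \le x \le 1$, $\lambda > 0$ and $q \ge 0$. Here we use the notation for the conditional probability $p(y_j|x_i) = \frac{p(x_i, y_j)}{p(x_i)}$ when $p(x_i) \ne 0$. By the concavity of $Ln_{\lambda,q}(x)$, we have
$\sum_{i=1}^{n} p(x_i)\, Ln_{\lambda,q}(p(y_j|x_i)) \le Ln_{\lambda,q}\left( \sum_{i=1}^{n} p(x_i) p(y_j|x_i) \right) = Ln_{\lambda,q}\left( \sum_{i=1}^{n} p(x_i, y_j) \right) = Ln_{\lambda,q}(p(y_j))$.
Summing both sides of the above inequality over $j$, we have
$\sum_{i=1}^{n} p(x_i) \sum_{j=1}^{m} Ln_{\lambda,q}(p(y_j|x_i)) \le \sum_{j=1}^{m} Ln_{\lambda,q}(p(y_j))$.
Since $p(x_i)^q \le p(x_i)$ for $1 \le q \le 2$ and $Ln_{\lambda,q}$ is nonnegative, we have $p(x_i)^q \sum_{j=1}^{m} Ln_{\lambda,q}(p(y_j|x_i)) \le p(x_i) \sum_{j=1}^{m} Ln_{\lambda,q}(p(y_j|x_i))$. Summing both sides of the above inequality over $i$, we have
$\sum_{i=1}^{n} p(x_i)^q \sum_{j=1}^{m} Ln_{\lambda,q}(p(y_j|x_i)) \le \sum_{i=1}^{n} p(x_i) \sum_{j=1}^{m} Ln_{\lambda,q}(p(y_j|x_i))$.
Here we can see that $\sum_{j=1}^{m} Ln_{\lambda,q}(p(y_j|x_i))$ is the Tsallis hypoentropy for fixed $x_i$, and the Tsallis hypoentropy is a monotonically increasing function of $\lambda$ in the case $1 \le q \le 2$, due to Lemma 4.3. Thus we have
$\sum_{i=1}^{n} p(x_i)^q \sum_{j=1}^{m} Ln_{\lambda p(x_i),q}(p(y_j|x_i)) \le \sum_{i=1}^{n} p(x_i)^q \sum_{j=1}^{m} Ln_{\lambda,q}(p(y_j|x_i))$.
Combining the inequalities above, we finally have
$\sum_{i=1}^{n} p(x_i)^q \sum_{j=1}^{m} Ln_{\lambda p(x_i),q}(p(y_j|x_i)) \le \sum_{j=1}^{m} Ln_{\lambda,q}(p(y_j))$,
which implies $H_{\lambda,q}(Y|X) \le H_{\lambda,q}(Y)$, since we have, for all fixed $x_i$, $H_{\lambda p(x_i),q}(Y|x_i) = \sum_{j=1}^{m} Ln_{\lambda p(x_i),q}(p(y_j|x_i))$.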
Proof: The proof is easily done by Lemma 4.2 and Theorem 4.4.
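The subadditivity obtained from the chain rule and the inequality $H_{\lambda,q}(Y|X) \le H_{\lambda,q}(Y)$, as well as the monotonicity in $\lambda$ used in the proof, can be probed on random joint distributions. The sketch below assumes $h(\lambda,q) = \lambda^{1-q}$ and reads the subadditivity as $H_{\lambda,q}(X,Y) \le H_{\lambda,q}(X) + H_{\lambda,q}(Y)$ for $1 \le q \le 2$; function names, the seed and the sampled parameters are illustrative.

```python
import math
import random

def ln_q(x, q):
    """q-logarithm: (x**(1-q) - 1) / (1-q), with the natural log at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def hypoentropy(p, lam, q):
    """Tsallis hypoentropy H_{lam,q} with h(lam, q) = lam**(1-q)."""
    head = -(1.0 + lam) * ln_q(1.0 / (1.0 + lam), q)
    tail = sum((1.0 + lam * pi) * ln_q(1.0 / (1.0 + lam * pi), q) for pi in p)
    return lam ** (1.0 - q) * (head + tail) / lam

def random_joint(n, m, rng):
    """Random n-by-m joint probability table."""
    w = [[rng.random() for _ in range(m)] for _ in range(n)]
    s = sum(sum(row) for row in w)
    return [[v / s for v in row] for row in w]

rng = random.Random(1)
gaps = []   # H(X) + H(Y) - H(X,Y), expected to be nonnegative for 1 <= q <= 2
for _ in range(500):
    joint = random_joint(3, 4, rng)
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    flat = [v for row in joint for v in row]
    for lam in (0.5, 5.0):
        for q in (1.0, 1.5, 2.0):
            gaps.append(hypoentropy(p_x, lam, q) + hypoentropy(p_y, lam, q)
                        - hypoentropy(flat, lam, q))
print("smallest subadditivity gap:", min(gaps))

# Monotonicity in lambda for a fixed distribution and several q in [0, 2].
x = [0.6, 0.3, 0.1]
levels = [0.1, 1.0, 10.0, 100.0]
for q in (0.5, 1.0, 1.5):
    print(q, [round(hypoentropy(x, lam, q), 6) for lam in levels])
```

For $q = 2$ the hypoentropy with this choice of $h$ reduces to $1 - \sum_i p(x_i)^2$ and no longer depends on $\lambda$, which is why that value is left out of the monotonicity printout.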
We are now in a position to prove the strong subadditivity for the Tsallis hypoentropies. The strong subadditivity for entropy is one of the interesting subjects in entropy theory [14]. For this purpose, we first give a chain rule for three random variables $X$, $Y$ and $Z$.
Lemma 4.6 We assume $h(\lambda, q) = \lambda^{1-q}$. The following chain rule holds: Proof: The proof can be done following the recipe used in Lemma 4.2.
Proof: This theorem is proven in a similar way as Theorem 4.4. By the concavity of the function Ln λp(z k ),q (x) in x, we have Multiplying both sides by p(z k ) q and summing over i and k, we have since m j=1 p(y j |z k )p(x i |y j , z k ) = p(x i |z k ). By p(y j |z k ) q ≤ p(y j |z k ) for all j, k and 1 ≤ q ≤ 2, and by the nonnegativity of the function Ln λp(z k ),q , we have Ln λp(z k ),q (p(x i |y j , z k )) ≤ p(y j |z k ) Ln λp(z k ),q (p(x i |y j , z k )).
Multiplying both sides by p(z k ) q and summing over j and k in the above inequality, we have Ln λp(z k ),q (p(x i |y j , z k )). (87) From the two inequalities (86) and (87) we have Ln λp(z k ),q (p(x i |z k )), since p(y j , z k ) ≤ p(z k ) (because of m j=1 p(y j , z k ) = p(z k )) for all j and k and the function Ln λp(z k ),q is monotonically increasing in λp(z k ) > 0, when 1 ≤ q ≤ 2. Thus we have H λ,q (X|Y, Z) ≤ H λ,q (X|Z) which is equivalent to the inequality by Lemma 4.2 and Lemma 4.6.
Remark 4.8 Passing to the limit $\lambda \to \infty$ in Corollary 4.5 and Theorem 4.7, we recover the subadditivity and the strong subadditivity [7] for the Tsallis entropy:
$T_q(X, Y) \le T_q(X) + T_q(Y)$ $(q \ge 1)$
and
$T_q(X, Y, Z) + T_q(Z) \le T_q(X, Z) + T_q(Y, Z)$ $(q \ge 1)$.
Thanks to the subadditivities, we may define the Tsallis mutual hypoentropies for 1 ≤ q ≤ 2 and λ > 0. Definition 4.9 Let 1 ≤ q ≤ 2 and λ > 0. The Tsallis mutual hypoentropy is defined by and the Tsallis conditional mutual hypoentropy is defined by From the chain rule given in Lemma 4.2, we find that the Tsallis mutual hypoentropy is symmetric, that is, In addition, we have from the subadditivity given in Theorem 4.4 and nonnegativity of the Tsallis conditional hypoentropy. We also find I λ,q (X; Y |Z) ≥ 0 from the strong subadditivity given in Theorem 4.7.
Moreover we have the chain rule for the Tsallis mutual hypoentropies in the following.

Jeffreys and Jensen-Shannon hypodivergences
In what follows we indicate extensions of two known information measures.
Definition 5.1 ([4], [11]) The Jeffreys divergence is defined by
$J(X\|Y) \equiv D(X\|Y) + D(Y\|X)$
and the Jensen-Shannon divergence is defined as
$JS(X\|Y) \equiv \frac{1}{2} D\left(X \Big\| \frac{X+Y}{2}\right) + \frac{1}{2} D\left(Y \Big\| \frac{X+Y}{2}\right)$.
The Jensen-Shannon divergence was introduced in 1991 in [13], but its roots may be older, since analogous formulae were used in thermodynamics under the name entropy of mixing [17, p.598], for the study of gaseous, liquid or crystalline mixtures.
Jeffreys and Jensen-Shannon divergences have been extended to the context of Tsallis theory in [8]: and the Jensen-Shannon-Tsallis divergence is Note that This expression was used in [1] as Jensen-Tsallis divergence.
In accordance with the above definition, we define the directed Jeffreys and Jensen-Shannon q-hypodivergence measures between two distributions and emphasize the mathematical significance of our definitions. and the Jensen-Shannon-Tsallis hypodivergence is Here we point out that again one has where JS λ (X||Y ) ≡ lim q→1 JS λ,q (X||Y ).
The next two results of the present paper are stated in order to establish the counterpart of Theorem 3.5 in [8] for hypodivergences. for q ≥ 0 and λ > 0.
Using Lemma 5.4, we have the following inequality.