On a General Definition of Conditional Rényi Entropies

Abstract: In recent decades, different definitions of conditional Rényi entropy (CRE) have been introduced. Thus, Arimoto proposed a definition that found an application in information theory, Jizba and Arimitsu proposed a definition that found an application in time series analysis, and Renner-Wolf, Hayashi and Cachin proposed definitions that are suitable for cryptographic applications. However, there is still no commonly accepted definition, nor a general treatment of the CREs, which can essentially and intuitively be represented as an average uncertainty about a random variable X if a random variable Y is given. In this paper we fill this gap and propose a three-parameter CRE, which contains all of the previous definitions as special cases that can be obtained by a proper choice of the parameters. Moreover, it satisfies all of the properties that are simultaneously satisfied by the previous definitions, so that it can successfully be used in the aforementioned applications. Thus, we show that the proposed CRE is positive, continuous, symmetric, permutation invariant, equal to Rényi entropy for independent X and Y, equal to zero for X = Y, and monotonic. In addition, as an example of further usage, we discuss the properties of generalized mutual information, which is defined using the proposed CRE.


Introduction
Rényi entropy (RE) is a well-known one-parameter generalization of Shannon entropy. It has successfully been used in a number of different fields, such as statistical physics, quantum mechanics, communication theory and data processing [1,2]. On the other hand, the generalization of conditional Shannon entropy to the Rényi entropy case is not uniquely defined. Thus, different definitions of conditional Rényi entropy have been proposed in the context of channel coding [3], secure communication [4][5][6] and multifractal analysis [1]. However, none of these generalizations satisfies the full set of basic properties satisfied by conditional Shannon entropy, and there is no general agreement about the proper definition, so the choice of definition depends on the application purpose. Essentially, in all of the previous discussions, CRE can be represented as an average uncertainty about a random variable X if a random variable Y is given.
In this paper, we introduce three-parameter CRE which contains previously defined conditional entropies as special cases that can be obtained by a proper choice of the parameters. Moreover, it satisfies all of the properties that are simultaneously satisfied by the previous definitions, so that it can successfully be used in aforementioned applications. Thus, we show that the proposed CRE is positive, continuous, symmetric, permutation invariant, equal to Rényi entropy for independent X and Y, equal to zero for X = Y and monotonic.
One of the most frequent usages of conditional entropies is in the definition of mutual information (MI). The MI represents the information transfer between the channel input and output, which can be defined as the reduction of input uncertainty once the output symbols are known. Thus, the MI which corresponds to the α-β-γ entropy is measured as the difference between the input RE and the input α-β-γ CRE when the output is given. We analyze the properties of the α-β-γ MI and show that the basic properties of Shannon MI, such as continuity, annulation for independent input and output, and reduction to the output entropy for independent events, are also satisfied in the case of the α-β-γ MI, which further validates the usage of the α-β-γ CRE.
The paper is organized as follows. In Section 2 we review basic notions about Rényi entropy. The definition of α-β-γ CRE is introduced in Section 3 and its properties are considered in Section 4. The α-β-γ MI is considered in Section 5.

Rényi Entropy
Let X be a discrete random variable taking values from a sample space {x_1, . . . , x_n} and distributed according to P_X = (p_1, . . . , p_n). The Rényi entropy of X of order α (α > 0) is defined as

R_\alpha(X) = \frac{1}{1-\alpha} \log_2 \sum_{i=1}^{n} p_i^{\alpha}.   (1)

For two discrete random variables X and Y, with joint distribution P_{X,Y}, the joint Rényi entropy is defined as

R_\alpha(X, Y) = \frac{1}{1-\alpha} \log_2 \sum_{i,j} P_{X,Y}(x_i, y_j)^{\alpha}.   (2)

The following properties hold [1]:
(A1) R_α(X) is continuous with respect to P_X;
(A2) adding a zero-probability event to the sample space of X does not change R_α(X);
(A3) R_α(X) takes its largest value for the uniformly distributed random variable, i.e.,

R_\alpha(X) \le \log_2 n,   (3)

with equality iff P_X = (1/n, . . . , 1/n);
(A4) R_α(X) is positive: R_α(X) ≥ 0;
(A5) if Y is distributed according to P_Y = (p_{π(1)}, . . . , p_{π(n)}), where π is any permutation of {1, . . . , n}, then R_α(X) = R_α(Y) for all P_X;
(A6) R_α(X) is continuous with respect to α and in the limit case reduces to the Shannon entropy S(X) [7]:

\lim_{\alpha \to 1} R_\alpha(X) = -\sum_{i=1}^{n} p_i \log_2 p_i = S(X);   (4)

(A7) Rényi entropy can be represented as a quasi-linear mean of the Hartley information content H(x_i) = −log_2 p_i as

R_\alpha(X) = g_\alpha^{-1}\left( \sum_{i=1}^{n} p_i \, g_\alpha(-\log_2 p_i) \right),   (5)

where the function g_α is defined as

g_\alpha(x) = 2^{(1-\alpha)x} \text{ for } \alpha \neq 1; \qquad g_\alpha(x) = x \text{ for } \alpha = 1,   (6)

or by any linear function of Equation (6) (this follows from a well-known result from the theory of quasi-linear means: if one function is a linear function of another one, they generate the same quasi-linear mean).
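As an illustration, the definition of Rényi entropy and its quasi-linear mean representation (property A7) can be checked numerically. The following Python sketch (the function names are ours, introduced only for illustration) computes R_α(X) both directly and through the Kolmogorov-Nagumo function g_α, and verifies the Shannon limit of property A6:

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha, in bits; alpha = 1 gives Shannon entropy."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log2(p))
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def renyi_via_mean(p, alpha):
    """The same entropy as a quasi-linear mean of the Hartley information
    -log2 p, with Kolmogorov-Nagumo function g_a(x) = 2^((1 - a) x)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log2(p))          # g_1 is the identity
    g = lambda x: 2.0 ** ((1.0 - alpha) * x)
    g_inv = lambda y: np.log2(y) / (1.0 - alpha)
    return g_inv(np.sum(p * g(-np.log2(p))))

p = [0.5, 0.25, 0.125, 0.125]
for a in (0.5, 2.0, 5.0):
    assert np.isclose(renyi_entropy(p, a), renyi_via_mean(p, a))
# property A6: the alpha -> 1 limit recovers Shannon entropy (1.75 bits here)
assert np.isclose(renyi_entropy(p, 1.0 + 1e-9), 1.75, atol=1e-6)
```

Both routes agree for every α, which is exactly the content of property A7.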

α-β-γ Conditional Rényi Entropy
Previously, several definitions of the CRE have been proposed [8] as a measure of the average uncertainty about a random variable Y when X is known. In this section, we unify all of these definitions: we define the CRE as a three-parameter function from which all of the previous definitions are obtained by a special choice of the parameters.
Let (X, Y) ∼ P_{X,Y} and X ∼ P_X. The Rényi entropy of the conditional random variable Y|X = x, distributed according to P_{Y|X=x}, is denoted by R_α(Y|X = x). The conditional Rényi entropy is defined as

R_\alpha^{\beta,\gamma}(Y|X) = g_\gamma^{-1}\left( \sum_{x} P_X^{(\beta)}(x) \, g_\gamma\big( R_\alpha(Y|X = x) \big) \right),   (7)

where g_γ is given by Equation (6) and the escort distribution of P_X is defined as

P_X^{(\beta)}(x) = \frac{P_X(x)^{\beta}}{\sum_{x'} P_X(x')^{\beta}}.   (8)

The definition can straightforwardly be extended to the joint conditional entropy: for random variables X, Y, Z, we define the joint conditional entropy as

R_\alpha^{\beta,\gamma}(Y, Z|X) = g_\gamma^{-1}\left( \sum_{x} P_X^{(\beta)}(x) \, g_\gamma\big( R_\alpha(Y, Z|X = x) \big) \right).   (9)

The definition extends to the case β = ∞ by taking the limit β → ∞, in which the escort distribution P_X^{(β)} concentrates on the most probable symbols of P_X. In the case of α = β = γ = 1, the definition reduces to the Shannon case. By choosing appropriate values of β and γ, we obtain the previously considered definitions as follows:

[JA-CRE] β = γ = α: Jizba and Arimitsu [1];
[RW-CRE] β = ∞: Renner and Wolf [5];
[H-CRE] α = γ, β = 1: Hayashi [6].
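For concreteness, the definition in Equation (7) can be transcribed directly into code. The following Python sketch (the helper names are ours) computes the α-β-γ CRE of Y given X from a finite joint distribution and checks that α = β = γ = 1 recovers the conditional Shannon entropy:

```python
import numpy as np

def renyi(p, alpha):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log2(p))
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def escort(p, beta):
    """Escort distribution P^(beta); beta = inf puts all mass on the
    most probable symbol."""
    p = np.asarray(p, dtype=float)
    if np.isinf(beta):
        e = np.zeros_like(p)
        e[np.argmax(p)] = 1.0
        return e
    return p ** beta / np.sum(p ** beta)

def cre(P_xy, alpha, beta, gamma):
    """alpha-beta-gamma CRE R^{beta,gamma}_alpha(Y|X) of Equation (7),
    for a joint pmf P_xy whose rows are indexed by x."""
    P_xy = np.asarray(P_xy, dtype=float)
    p_x = P_xy.sum(axis=1)
    r = np.array([renyi(P_xy[i] / p_x[i], alpha) for i in range(len(p_x))])
    w = escort(p_x, beta)
    if np.isclose(gamma, 1.0):
        return np.sum(w * r)                     # g_1 is the identity
    g = lambda x: 2.0 ** ((1.0 - gamma) * x)
    return np.log2(np.sum(w * g(r))) / (1.0 - gamma)

P = np.array([[0.3, 0.1],
              [0.2, 0.4]])
# alpha = beta = gamma = 1 recovers the conditional Shannon entropy
shannon = -sum(P[i, j] * np.log2(P[i, j] / P[i].sum())
               for i in range(2) for j in range(2))
assert np.isclose(cre(P, 1.0, 1.0, 1.0), shannon)
```

The special cases above are then obtained by fixing the parameters, e.g., `cre(P, a, np.inf, g)` for the RW-type β = ∞ limit.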

Properties of α-β-γ CRE
The α-β-γ CRE satisfies the following set of important properties for all α, β, γ:
(B1) R^{β,γ}_α(Y|X) ≥ 0;
(B2) R^{β,γ}_α(Y|X) is continuous with respect to P_X and the distributions P_{Y|X=x};
(B3) R^{β,γ}_α(Y|X) is symmetric with respect to P_{Y|X=x} for all P_X ∈ Δ_n, i = 1, . . . , n and P_{Y|X=x};
(B4) if P_{Y|X=x} is a permutation of P_{Y|X=x_1} for all x, then R^{β,γ}_α(Y|X) = R_α(Y|X = x_1);
(B5) if X and Y are independent, then R^{β,γ}_α(Y|X) = R_α(Y);
(B6) if X = Y, then R^{β,γ}_α(Y|X) = 0;
(B7) in the case of α = β = γ = 1, the definition reduces to the Shannon case;
(B8) let X, Y, Z be random variables distributed according to the joint distribution P_{X,Y,Z}, with corresponding marginal distributions; then R^{β,γ}_α(Y|X) ≤ R^{β,γ}_α(Y, Z|X).
The proofs of properties B1-B6 follow straightforwardly from the definition of the conditional Rényi entropy in Equation (7) and from the properties A1-A5 of Rényi entropy. Here, we give the proof of property B8.
Proof of Property B8. Let P_{Y,Z|x}, P_{Z|x}, P_{Y|x,z} be the corresponding conditional distributions. Similarly as in [9], marginalizing out Z cannot increase the Rényi entropy, so that R_α(Y|X = x) ≤ R_α(Y, Z|X = x) for every x, and the result follows from the definition of conditional entropy in Equation (7), since g_γ is increasing.
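The inequality R_α(Y|X = x) ≤ R_α(Y, Z|X = x) used here simply states that marginalizing out Z cannot increase the Rényi entropy. A quick randomized check in Python (our own sketch, not from the paper) confirms it across orders α:

```python
import numpy as np

def renyi(p, alpha):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log2(p))
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

rng = np.random.default_rng(0)
for alpha in (0.5, 1.0, 2.0, 5.0):
    for _ in range(200):
        # a random conditional joint pmf P(y, z | x) on a 4 x 3 alphabet
        P_yz = rng.random((4, 3))
        P_yz /= P_yz.sum()
        P_y = P_yz.sum(axis=1)                   # marginalize out z
        assert renyi(P_y, alpha) <= renyi(P_yz.ravel(), alpha) + 1e-12
```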
Additional properties which are satisfied in the case of Shannon entropy are not satisfied in general, and are limited to special choices of β and γ.
(B9) The chain rule,

R_\alpha(X, Y) = R_\alpha(X) + R_\alpha^{\beta,\gamma}(Y|X),

is satisfied in general only in the case of JA-CRE. Jizba and Arimitsu [1] used it (under the assumption that g_γ is invertible and positive) as one of the generalized Shannon-Khinchin axioms, along with the properties A1-A3 of Rényi entropy, and showed that Rényi entropy can be characterized as the unique function which satisfies them. The chain rule also implies the symmetry of the generalized mutual information introduced in the next section.
(B11) Conditioning reduces the entropy (CRE), i.e., R^{β,γ}_α(Y|X) ≤ R_α(Y), is satisfied in the cases of H-CRE, A-CRE and RW-CRE (for α ≥ 1). The CRE property states that additional knowledge cannot increase the uncertainty. Although it can intuitively be treated as an ineluctable property, breaking the CRE can still be interpreted using the concept of spoiling knowledge, as in [4].
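The failure of this property outside the listed cases can be made concrete. Taking the plain arithmetic average β = γ = 1 in Equation (7) with α = 2, a rare input symbol that leads to a highly uncertain output can push the average conditional entropy above the marginal one; the following numerical illustration is our own construction:

```python
import numpy as np

def renyi(p, alpha):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

alpha, m = 2.0, 1024
# X = 1 (prob. 0.9) makes Y deterministic; X = 0 (prob. 0.1) makes Y
# uniform over m other symbols, so the marginal of Y is sharply peaked.
p_y = np.concatenate(([0.9], np.full(m, 0.1 / m)))   # marginal of Y
avg_cond = 0.9 * 0.0 + 0.1 * np.log2(m)              # beta = gamma = 1 average
assert avg_cond > renyi(p_y, alpha)                  # conditioning increased R_2
```

Here the average conditional entropy is 1 bit while R_2(Y) is about 0.3 bits, so conditioning has increased the order-2 entropy.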
(B12) Monotonicity says that if X, Y and Z form a Markov chain X → Y → Z, then

R_\alpha^{\beta,\gamma}(Z|Y) \le R_\alpha^{\beta,\gamma}(Z|X).

It holds in the cases of A-CRE and H-CRE, and it implies the data processing inequality (defined in the next section), which is an important property for applications of Rényi entropy in cryptography [5].
The previous discussion is summarized in Table 1 [9]. Table 1. Properties of different CREs (✓ stands for satisfied, ✗ for not satisfied, and ✓* for satisfied for α ≥ 1).

                   C    JA   RW   H    A
Chain Rule         ✗    ✓    ✗    ✗    ✗
Weak Chain Rule
CRE                ✗    ✗    ✓*   ✓    ✓
Monotonicity       ✗    ✗    ✗    ✓    ✓

Since the additional information-theoretic properties are not satisfied in general, none of the previous definitions has been commonly accepted, and the choice of the definition depends on the application purpose. On the other hand, the set of properties B1-B8, which is satisfied by all of them, also holds in the case of the α-β-γ CRE defined by Equation (7), which justifies the definition.

α-β-γ Mutual Information
Consider a communication channel with input X and output Y, described by the transition matrix P_{Y|X}. The mutual information is a measure of the information transfer between the input and the output of the communication channel. Thus, if the uncertainty about the input is measured by R_α(X), and its uncertainty after all symbols are received by R^{β,γ}_α(X|Y), the α-β-γ mutual information between X and Y is defined as

I_\alpha^{\beta,\gamma}(X; Y) = R_\alpha(X) - R_\alpha^{\beta,\gamma}(X|Y).   (10)

This definition generalizes the one previously introduced by Arimoto [3], who used A-CRE, and the one by Jizba et al. [10], who used JA-CRE.
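The following Python sketch (the helper names are ours) implements this definition of mutual information for a finite joint distribution and checks the annulation property for an independent input and output:

```python
import numpy as np

def renyi(p, alpha):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log2(p))
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def cre_x_given_y(P_xy, alpha, beta, gamma):
    """R^{beta,gamma}_alpha(X|Y): Equation (7) applied to the columns of P_xy."""
    P_yx = np.asarray(P_xy, dtype=float).T
    p_y = P_yx.sum(axis=1)
    r = np.array([renyi(P_yx[j] / p_y[j], alpha) for j in range(len(p_y))])
    w = p_y ** beta / np.sum(p_y ** beta)        # escort distribution of P_Y
    if np.isclose(gamma, 1.0):
        return np.sum(w * r)
    g = lambda x: 2.0 ** ((1.0 - gamma) * x)
    return np.log2(np.sum(w * g(r))) / (1.0 - gamma)

def mi(P_xy, alpha, beta, gamma):
    """I^{beta,gamma}_alpha(X; Y) = R_alpha(X) - R^{beta,gamma}_alpha(X|Y)."""
    p_x = np.asarray(P_xy, dtype=float).sum(axis=1)
    return renyi(p_x, alpha) - cre_x_given_y(P_xy, alpha, beta, gamma)

# annulation for independent input and output: the MI vanishes
P_ind = np.outer([0.6, 0.4], [0.2, 0.3, 0.5])
for a, b, c in [(0.5, 0.5, 0.5), (2.0, 2.0, 2.0), (2.0, 1.0, 2.0)]:
    assert abs(mi(P_ind, a, b, c)) < 1e-9
```

Independence makes every column conditional equal to P_X, so the quasi-linear mean in Equation (7) returns R_α(X) and the difference vanishes for any choice of the parameters.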
Using the properties of the α-β-γ CRE, it is easy to conclude that the basic properties D1-D3 (continuity, annulation for independent input and output, and reduction to the output entropy for independent events) are satisfied for all α, β, γ. In Table 2 we list the properties D4-D6 (non-negativity, symmetry and the data processing inequality, DPI) and their fulfilment for the cases when the mutual information is defined using the C, JA, RW, H and A conditional Rényi entropies [9]. Thus, if the MI is defined using any of the previously considered CRE definitions, it fails to satisfy some of the properties D4-D6. On the other hand, the set of properties D1-D3, which is satisfied for all of them, is also satisfied by the α-β-γ MI, which justifies its usage as a measure of information transfer. Table 2. Properties of different MI definitions (✓ stands for satisfied, ✗ for not satisfied, and ✓* for satisfied for α ≥ 1).

                   C    JA   RW   H    A
Non-Negativity     ✗    ✗    ✓*   ✓    ✓
Symmetry           ✗    ✓    ✗    ✗    ✗
DPI                ✗    ✗    ✗    ✓    ✓

Conclusions
We introduced the α-β-γ conditional Rényi entropy (CRE), which contains previously defined conditional entropies as special cases that can be obtained by a proper choice of the parameters [1,3-6]. It satisfies all of the properties that are simultaneously satisfied by the previous definitions, so that it can successfully be used in channel coding, secure communication and multifractal analysis.
In addition, we analyzed the properties of mutual information (MI) which is defined using α-β-γ CRE. The resulting MI measure satisfies the set of basic properties which further validates the usage of α-β-γ CRE.