1. Introduction
Rényi entropy (RE) is a well-known one-parameter generalization of Shannon entropy. It has successfully been used in a number of different fields, such as statistical physics, quantum mechanics, communication theory and data processing [1,2]. On the other hand, the generalization of conditional Shannon entropy to the Rényi entropy case is not uniquely defined. Thus, different definitions of conditional Rényi entropy (CRE) have been proposed in the context of channel coding [3], secure communication [4,5,6] and multifractal analysis [1]. However, none of these generalizations satisfies the full set of basic properties satisfied by conditional Shannon entropy, and there is no general agreement about the proper definition, so the choice of definition depends on the application purpose. Essentially, in all of the previous discussions, the CRE can be represented as an average uncertainty about a random variable X if a random variable Y is given.
In this paper, we introduce a three-parameter CRE, the α-β-γ CRE, which contains the previously defined conditional entropies as special cases that can be obtained by a proper choice of the parameters. Moreover, it satisfies all of the properties that are simultaneously satisfied by the previous definitions, so that it can successfully be used in the aforementioned applications. Thus, we show that the proposed CRE is positive, continuous, symmetric, permutation invariant, equal to the Rényi entropy for independent X and Y, equal to zero when Y is uniquely determined by X, and monotonic.
One of the most frequent usages of conditional entropies is for the definition of mutual information (MI). The MI represents the information transfer between the channel input and output, which can be defined as the reduction of the input uncertainty once the output symbols are known. Thus, the MI which corresponds to the α-β-γ entropy is measured as the difference between the input RE and the α-β-γ CRE of the input when the output is given. We analyze the properties of the α-β-γ MI and show that the basic properties of Shannon MI, such as continuity, vanishing for independent input and output, and reduction to the output entropy when the input and the output coincide, are also satisfied in the case of the α-β-γ MI, which further validates the usage of the α-β-γ CRE.
The paper is organized as follows. In Section 2 we review basic notions about Rényi entropy. The definition of the α-β-γ CRE is introduced in Section 3 and its properties are considered in Section 4. The α-β-γ MI is considered in Section 5.
2. Rényi Entropy
Let X be a discrete random variable taking values from a sample space $\mathcal{X} = \{x_1, \dots, x_n\}$ and distributed according to $p_X$. The Rényi entropy of X of order $\alpha$ ($\alpha > 0$, $\alpha \neq 1$) is defined as
$$H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_{x \in \mathcal{X}} p_X(x)^{\alpha}.$$
For two discrete random variables X and Y, with joint distribution $p_{X,Y}$, the joint Rényi entropy is defined with
$$H_\alpha(X, Y) = \frac{1}{1-\alpha} \log \sum_{x, y} p_{X,Y}(x, y)^{\alpha}.$$
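As a quick numerical illustration of these two definitions, the following Python sketch evaluates the Rényi entropy of a marginal and of a joint distribution (function names and the toy distribution are ours; logarithms are taken to base 2, so entropies are in bits):
```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy (in bits) of a discrete distribution p, for alpha > 0, alpha != 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # zero-probability events do not contribute (cf. property A2 below)
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

# Toy joint distribution p_{X,Y} stored as a matrix indexed by (x, y).
p_xy = np.array([[0.3, 0.2],
                 [0.1, 0.4]])

print(renyi_entropy(p_xy.sum(axis=1), alpha=2.0))  # marginal entropy H_2(X)
print(renyi_entropy(p_xy.ravel(), alpha=2.0))      # joint entropy H_2(X, Y)
```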
The following properties hold [1]:
- (A1) $H_\alpha(X)$ is continuous with respect to the distribution $p_X$;
- (A2) Adding a zero-probability event to the sample space of X does not change $H_\alpha(X)$;
- (A3) $H_\alpha(X)$ takes its largest value for the uniformly distributed random variable, i.e., $H_\alpha(X) \leq \log n$, with equality iff $p_X(x) = 1/n$ for all $x \in \mathcal{X}$;
- (A4) If $\alpha_1 < \alpha_2$, then $H_{\alpha_1}(X) \geq H_{\alpha_2}(X)$;
- (A5) $H_\alpha(X)$ is symmetric with respect to the event probabilities, i.e., if $q_X(x_k) = p_X(x_{\pi(k)})$, where $\pi$ is any permutation of $\{1, \dots, n\}$, then $H_\alpha(q_X) = H_\alpha(p_X)$ for all $\alpha$;
- (A6) $H_\alpha(X)$ is continuous with respect to $\alpha$ and in the limit case $\alpha \to 1$ reduces to Shannon entropy [7]:
$$\lim_{\alpha \to 1} H_\alpha(X) = -\sum_{x} p_X(x) \log p_X(x);$$
- (A7) Rényi entropy can be represented as a quasi-linear mean of the Hartley information content $\iota(x) = -\log p_X(x)$ as:
$$H_\alpha(X) = g_\alpha^{-1}\left(\sum_{x} p_X(x)\, g_\alpha(\iota(x))\right),$$
where the function $g_\lambda$ is defined with
$$g_\lambda(t) = 2^{(1-\lambda)\, t} \qquad (6)$$
or any linear function of Equation (6) (it follows from a well-known result of the theory of quasi-linear means: if one function is a linear function of another one, they generate the same quasi-linear mean).
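Property A7 can be checked numerically. The sketch below is a minimal check, assuming the exponential Kolmogorov-Nagumo function $g_\alpha(t) = 2^{(1-\alpha)t}$ reconstructed in Equation (6) and base-2 logarithms:
```python
import numpy as np

def renyi_entropy(p, alpha):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def g(t, lam):
    # Kolmogorov-Nagumo function g_lam(t) = 2**((1 - lam) * t), as in Equation (6)
    return 2.0 ** ((1.0 - lam) * t)

def g_inv(y, lam):
    return np.log2(y) / (1.0 - lam)

alpha = 0.5
p = np.array([0.5, 0.25, 0.125, 0.125])
iota = -np.log2(p)  # Hartley information content of each outcome
quasi_linear_mean = g_inv(np.sum(p * g(iota, alpha)), alpha)
print(np.isclose(quasi_linear_mean, renyi_entropy(p, alpha)))  # True
```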
3. α-β-γ Conditional Rényi Entropy
Previously, several definitions of the CRE have been proposed [8] as a measure of the average uncertainty about a random variable Y when X is known. In this section, we unify all of these definitions by defining the CRE as a three-parameter function from which each of the previous definitions can be recovered by a special choice of the parameters.
Let $p_{X,Y}$ be the joint distribution of X and Y, and let $p_{Y|X}(\cdot \mid x)$ denote the conditional distribution of Y given $X = x$. The Rényi entropy of the conditional random variable $Y \mid X = x$, distributed according to $p_{Y|X}(\cdot \mid x)$, is denoted with $H_\alpha(Y \mid X = x)$. The conditional Rényi entropy is defined with
$$H_{\alpha}^{\beta,\gamma}(Y \mid X) = g_\gamma^{-1}\left(\sum_{x} p_X^{(\beta)}(x)\; g_\gamma\big(H_\alpha(Y \mid X = x)\big)\right), \qquad (7)$$
where $g_\gamma$ is given with Equation (6) and the escort distribution of $p_X$ is defined with:
$$p_X^{(\beta)}(x) = \frac{p_X(x)^{\beta}}{\sum_{u} p_X(u)^{\beta}}.$$
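A direct implementation of this definition reads as follows. This is a sketch of our reconstruction of Equation (7): the parameter roles (escort weights of order β, quasi-linear mean generated by $g_\gamma$) follow the reconstruction above, and the $\gamma = 1$ branch implements the linear-mean limit:
```python
import numpy as np

def renyi_entropy(p, alpha):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def escort(p, beta):
    """Escort distribution of order beta."""
    w = np.asarray(p, dtype=float) ** beta
    return w / w.sum()

def cre(p_xy, alpha, beta, gamma):
    """alpha-beta-gamma conditional Rényi entropy H(Y | X); rows of p_xy are indexed by x."""
    p_x = p_xy.sum(axis=1)
    w = escort(p_x, beta)  # escort weights over the conditioning variable
    h = np.array([renyi_entropy(row / px, alpha) for row, px in zip(p_xy, p_x)])
    if gamma == 1:  # linear-mean limit of the quasi-linear mean
        return float(np.sum(w * h))
    return np.log2(np.sum(w * 2.0 ** ((1.0 - gamma) * h))) / (1.0 - gamma)
```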
The definition can straightforwardly be extended to the joint conditional entropy. For random variables X, Y, Z, we define the joint conditional entropy as:
$$H_{\alpha}^{\beta,\gamma}(Y, Z \mid X) = g_\gamma^{-1}\left(\sum_{x} p_X^{(\beta)}(x)\; g_\gamma\big(H_\alpha(Y, Z \mid X = x)\big)\right).$$
The definition extends to the remaining parameter values by taking the corresponding limits in Equation (7). In the case of $\alpha = \beta = \gamma = 1$, the definition reduces to the Shannon case. By choosing appropriate values of the parameters we get the previously considered definitions as follows:
- C-CRE: Cachin [4];
- JA-CRE: Jizba and Arimitsu [1];
- RW-CRE: Renner and Wolf [5];
- A-CRE: Arimoto [3];
- H-CRE: Hayashi [6].
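For the special cases whose closed forms are standard in the literature, such a reconstruction can be sanity-checked numerically. For instance, Hayashi's conditional entropy is usually stated as $H_\alpha^{H}(Y \mid X) = \frac{1}{1-\alpha}\log\sum_x p_X(x)\sum_y p_{Y|X}(y \mid x)^\alpha$; under our reconstruction of Equation (7) it should be recovered with ordinary (non-escort) weights and a quasi-linear mean of order α. The parameter assignment in the sketch below is our assumption, not a statement of the paper's Table 1:
```python
import numpy as np

def H(p, a):  # Rényi entropy in bits
    p = np.asarray(p, float)
    p = p[p > 0]
    return np.log2(np.sum(p ** a)) / (1.0 - a)

def cre(pxy, a, b, g):  # reconstructed Eq. (7); rows of pxy indexed by x
    px = pxy.sum(axis=1)
    w = px ** b / np.sum(px ** b)  # escort weights of order b
    h = np.array([H(r / s, a) for r, s in zip(pxy, px)])
    return float(np.sum(w * h)) if g == 1 else np.log2(np.sum(w * 2.0 ** ((1 - g) * h))) / (1 - g)

rng = np.random.default_rng(0)
p_xy = rng.random((3, 4))
p_xy /= p_xy.sum()
a = 1.7
p_x = p_xy.sum(axis=1)

# Hayashi's closed form, as usually stated in the literature.
hayashi = np.log2(np.sum([px * np.sum((row / px) ** a) for row, px in zip(p_xy, p_x)])) / (1 - a)
print(np.isclose(cre(p_xy, a, b=1.0, g=a), hayashi))  # True under our parameter assignment
```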
4. Properties of α-β-γ CRE
The α-β-γ CRE satisfies the following set of important properties for all $\alpha$, $\beta$ and $\gamma$:
- (B1) Positivity: $H_{\alpha}^{\beta,\gamma}(Y \mid X) \geq 0$;
- (B2) $H_{\alpha}^{\beta,\gamma}(Y \mid X)$ is continuous with respect to the joint distribution $p_{X,Y}$;
- (B3) $H_{\alpha}^{\beta,\gamma}(Y \mid X)$ is symmetric with respect to the event probabilities for all $\alpha$, $\beta$ and $\gamma$;
- (B4) Permutation invariance: if $q_{Y|X}(\cdot \mid x)$ is a permutation of $p_{Y|X}(\cdot \mid x)$ for all x, then the corresponding conditional entropies coincide;
- (B5) If X and Y are independent, then $H_{\alpha}^{\beta,\gamma}(Y \mid X) = H_\alpha(Y)$;
- (B6) If Y is uniquely determined by X, then $H_{\alpha}^{\beta,\gamma}(Y \mid X) = 0$;
- (B7) In the case of $\alpha = \beta = \gamma = 1$, the definition reduces to the Shannon case;
- (B8) For random variables X, Y, Z distributed according to a joint distribution $p_{X,Y,Z}$, with corresponding marginal distributions,
$$H_{\alpha}^{\beta,\gamma}(Y \mid X) \leq H_{\alpha}^{\beta,\gamma}(Y, Z \mid X).$$
The proofs of properties B1–B6 follow straightforwardly from the definition of the conditional Rényi entropy in Equation (7) and from properties A1–A5 of the Rényi entropy. Here, we give the proof of property B8.
Proof of Property B8. Let $p_{Y|X}(\cdot \mid x)$ and $p_{Y,Z|X}(\cdot \mid x)$ be the conditional distributions of Y and of the pair (Y, Z) given $X = x$. Similarly as in [9], we have:
$$H_\alpha(Y \mid X = x) \leq H_\alpha(Y, Z \mid X = x) \quad \text{for all } x,$$
so that we have
$$g_\gamma^{-1}\left(\sum_x p_X^{(\beta)}(x)\, g_\gamma\big(H_\alpha(Y \mid X = x)\big)\right) \leq g_\gamma^{-1}\left(\sum_x p_X^{(\beta)}(x)\, g_\gamma\big(H_\alpha(Y, Z \mid X = x)\big)\right),$$
and the result follows from the definition of the conditional entropy, since the quasi-linear mean generated by $g_\gamma$ is monotone. □
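The inequality in property B8 can also be probed numerically. Below is a minimal sketch under our reconstruction of Equation (7), treating the pair (Y, Z) as a single variable by flattening the corresponding axes:
```python
import numpy as np

def H(p, a):
    p = np.asarray(p, float)
    p = p[p > 0]
    return np.log2(np.sum(p ** a)) / (1.0 - a)

def cre(pxy, a, b, g):  # reconstructed Eq. (7); rows of pxy indexed by x
    px = pxy.sum(axis=1)
    w = px ** b / np.sum(px ** b)
    h = np.array([H(r / s, a) for r, s in zip(pxy, px)])
    return float(np.sum(w * h)) if g == 1 else np.log2(np.sum(w * 2.0 ** ((1 - g) * h))) / (1 - g)

rng = np.random.default_rng(1)
p_xyz = rng.random((3, 2, 4))
p_xyz /= p_xyz.sum()
p_xy = p_xyz.sum(axis=2)       # marginal joint distribution of (X, Y)
p_x_yz = p_xyz.reshape(3, -1)  # (Y, Z) flattened into a single variable, given X

for a, b, g in [(0.5, 0.5, 2.0), (2.0, 1.0, 0.3), (1.3, 2.5, 1.0)]:
    assert cre(p_xy, a, b, g) <= cre(p_x_yz, a, b, g) + 1e-12  # H(Y|X) <= H(Y,Z|X)
```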
Additional properties which are satisfied in the case of Shannon entropy do not hold in general and are limited to special choices of $\alpha$, $\beta$ and $\gamma$.
- (B9) Chain rule:
$$H_\alpha(X, Y) = H_\alpha(X) + H_{\alpha}^{\beta,\gamma}(Y \mid X)$$
is satisfied in general only in the case of JA-CRE; a numerical check is sketched after this list. Jizba and Arimitsu [1] used it (with an assumption that $g$ is invertible and positive) as one of the generalized Shannon–Khinchin axioms, along with the properties A1–A3 of Rényi entropy, and showed that Rényi entropy can be characterized as the unique function which satisfies them. It also implies the symmetry of the generalized mutual information introduced in the next section.
- (B10)
- (B11) Conditioning reduces entropy:
$$H_{\alpha}^{\beta,\gamma}(Y \mid X) \leq H_\alpha(Y)$$
is satisfied in the cases of H-CRE, A-CRE and RW-CRE (for a restricted range of the parameters). This property states that additional knowledge cannot increase the information. Although it can intuitively be treated as an indispensable property, breaking it can still be interpreted using the concept of spoiling knowledge, as in [4].
- (B12) Monotonicity: if X, Y and Z form a Markov chain $X \to Y \to Z$, then
$$H_{\alpha}^{\beta,\gamma}(X \mid Y) \leq H_{\alpha}^{\beta,\gamma}(X \mid Z).$$
It holds in the cases of A-CRE and H-CRE and implies the data processing inequality (defined in the next section), which is an important property for applications of Rényi entropy in cryptography [5].
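As referenced in B9, the chain rule can be verified numerically for the JA case. Under our reconstruction of Equation (7), the Jizba–Arimitsu definition corresponds to escort weights and a quasi-linear mean both of order α (our assumption); an elementary computation then shows that Equation (7) collapses to $H_\alpha(X, Y) - H_\alpha(X)$, which the sketch checks:
```python
import numpy as np

def H(p, a):
    p = np.asarray(p, float)
    p = p[p > 0]
    return np.log2(np.sum(p ** a)) / (1.0 - a)

def cre(pxy, a, b, g):  # reconstructed Eq. (7); rows of pxy indexed by x
    px = pxy.sum(axis=1)
    w = px ** b / np.sum(px ** b)
    h = np.array([H(r / s, a) for r, s in zip(pxy, px)])
    return float(np.sum(w * h)) if g == 1 else np.log2(np.sum(w * 2.0 ** ((1 - g) * h))) / (1 - g)

rng = np.random.default_rng(2)
p_xy = rng.random((4, 3))
p_xy /= p_xy.sum()

for a in (0.5, 1.7, 3.0):
    chain_rule_cre = H(p_xy.ravel(), a) - H(p_xy.sum(axis=1), a)  # H(X,Y) - H(X)
    assert np.isclose(cre(p_xy, a, b=a, g=a), chain_rule_cre)     # JA-CRE satisfies the chain rule
```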
The previous discussion is summarized in Table 1 [9].
Since the additional information-theoretic properties are not satisfied in general, none of the previous definitions has been commonly accepted, and the choice of definition depends on the application purpose. On the other hand, the set of properties B1–B8, which is satisfied by all of them, also holds in the case of the α-β-γ CRE defined by Equation (7), which justifies the definition.
5. α-β-γ Mutual Information
Consider a communication channel with input X and output Y described by a transition matrix $p_{Y|X}$. The mutual information is a measure of the information transfer between the input and the output of the channel. Thus, if the uncertainty about the input is measured by $H_\alpha(X)$, and the remaining uncertainty after all output symbols are received by $H_{\alpha}^{\beta,\gamma}(X \mid Y)$, the α-β-γ mutual information between X and Y is defined as
$$I_{\alpha}^{\beta,\gamma}(X; Y) = H_\alpha(X) - H_{\alpha}^{\beta,\gamma}(X \mid Y).$$
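In code, the definition amounts to one line on top of the CRE sketch from Section 3 (repeated here so the snippet runs standalone); note that conditioning on the output corresponds to transposing the joint distribution so that its rows are indexed by y:
```python
import numpy as np

def H(p, a):
    p = np.asarray(p, float)
    p = p[p > 0]
    return np.log2(np.sum(p ** a)) / (1.0 - a)

def cre(pxy, a, b, g):  # reconstructed Eq. (7); rows indexed by the conditioning variable
    px = pxy.sum(axis=1)
    w = px ** b / np.sum(px ** b)
    h = np.array([H(r / s, a) for r, s in zip(pxy, px)])
    return float(np.sum(w * h)) if g == 1 else np.log2(np.sum(w * 2.0 ** ((1 - g) * h))) / (1 - g)

def mi(p_xy, a, b, g):
    """alpha-beta-gamma mutual information: input Rényi entropy minus the CRE of the input given the output."""
    return H(p_xy.sum(axis=1), a) - cre(p_xy.T, a, b, g)
```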
This definition generalizes the one previously introduced by Arimoto, who used the A-CRE, and the one by Jizba et al. [10], who used the JA-CRE.
By using the properties of the α-β-γ CRE, it is easy to conclude that the following basic properties are satisfied for all $\alpha$, $\beta$ and $\gamma$:
- (D1) $I_{\alpha}^{\beta,\gamma}(X; Y)$ is continuous with respect to the joint distribution $p_{X,Y}$;
- (D2) If X and Y are independent, then $I_{\alpha}^{\beta,\gamma}(X; Y) = 0$;
- (D3) If $Y = X$, then $I_{\alpha}^{\beta,\gamma}(X; Y) = H_\alpha(Y)$.
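Property D2, for instance, is immediate to observe numerically: for a product distribution every conditional of X given y equals the marginal of X, so the CRE reduces to $H_\alpha(X)$ and the MI vanishes for any parameter choice. A minimal sketch (helpers repeated so it runs standalone):
```python
import numpy as np

def H(p, a):
    p = np.asarray(p, float)
    p = p[p > 0]
    return np.log2(np.sum(p ** a)) / (1.0 - a)

def cre(pxy, a, b, g):
    px = pxy.sum(axis=1)
    w = px ** b / np.sum(px ** b)
    h = np.array([H(r / s, a) for r, s in zip(pxy, px)])
    return float(np.sum(w * h)) if g == 1 else np.log2(np.sum(w * 2.0 ** ((1 - g) * h))) / (1 - g)

def mi(p_xy, a, b, g):
    return H(p_xy.sum(axis=1), a) - cre(p_xy.T, a, b, g)

p_x = np.array([0.2, 0.5, 0.3])
p_y = np.array([0.4, 0.1, 0.25, 0.25])
p_xy = np.outer(p_x, p_y)  # independent X and Y

for a, b, g in [(0.5, 2.0, 3.0), (2.0, 0.7, 1.0), (3.0, 1.0, 0.5)]:
    assert abs(mi(p_xy, a, b, g)) < 1e-9  # property D2
```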
However, other properties which hold in the Shannon case are not satisfied in general:
- (D4) Non-negativity: $I_{\alpha}^{\beta,\gamma}(X; Y) \geq 0$;
- (D5) Symmetry: $I_{\alpha}^{\beta,\gamma}(X; Y) = I_{\alpha}^{\beta,\gamma}(Y; X)$;
- (D6) Data processing inequality (DPI): if X, Y and Z form a Markov chain $X \to Y \to Z$, then $I_{\alpha}^{\beta,\gamma}(X; Z) \leq I_{\alpha}^{\beta,\gamma}(X; Y)$.
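The failures of D4 and D5 for particular parameter choices can be probed with a brute-force random search. The sketch below simply reports the most negative MI value and the largest symmetry gap it encounters; it asserts nothing, since which violations occur depends on the sampled parameter region:
```python
import numpy as np

def H(p, a):
    p = np.asarray(p, float)
    p = p[p > 0]
    return np.log2(np.sum(p ** a)) / (1.0 - a)

def cre(pxy, a, b, g):
    px = pxy.sum(axis=1)
    w = px ** b / np.sum(px ** b)
    h = np.array([H(r / s, a) for r, s in zip(pxy, px)])
    return float(np.sum(w * h)) if g == 1 else np.log2(np.sum(w * 2.0 ** ((1 - g) * h))) / (1 - g)

def mi(p_xy, a, b, g):
    return H(p_xy.sum(axis=1), a) - cre(p_xy.T, a, b, g)

rng = np.random.default_rng(3)
most_negative, largest_gap = 0.0, 0.0
for _ in range(1000):
    p_xy = rng.random((3, 3))
    p_xy /= p_xy.sum()
    a, b, g = rng.uniform(0.2, 3.0, size=3)
    i_xy, i_yx = mi(p_xy, a, b, g), mi(p_xy.T, a, b, g)
    most_negative = min(most_negative, i_xy)          # < 0 would witness a D4 violation
    largest_gap = max(largest_gap, abs(i_xy - i_yx))  # > 0 witnesses a D5 violation
print(most_negative, largest_gap)
```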
In Table 2 we list these properties and their fulfillment for the cases when the mutual information is defined using the C, JA, RW, A and H conditional Rényi entropies [9]. Thus, if the MI is defined using any of the previously considered CRE definitions, it fails to satisfy some of the properties D4–D6. On the other hand, the set of properties D1–D3, which is satisfied by all of them, is also satisfied by the α-β-γ MI, which justifies its usage as a measure of information transfer.
6. Conclusions
We introduced the α-β-γ conditional Rényi entropy (CRE), which contains previously defined conditional entropies as special cases that can be obtained by a proper choice of the parameters [1,3,4,5,6]. It satisfies all of the properties that are simultaneously satisfied by the previous definitions, so that it can successfully be used in channel coding, secure communication and multifractal analysis.
In addition, we analyzed the properties of the mutual information (MI) defined by means of the α-β-γ CRE. The resulting MI measure satisfies the set of basic properties, which further validates the usage of the α-β-γ CRE.