1. Introduction
Rényi entropy (RE) is a well-known one-parameter generalization of Shannon entropy. It has successfully been used in a number of different fields, such as statistical physics, quantum mechanics, communication theory and data processing [1,2]. On the other hand, the generalization of conditional Shannon entropy to the Rényi entropy case is not uniquely defined. Thus, different definitions of conditional Rényi entropy (CRE) have been proposed in the context of channel coding [3], secure communication [4,5,6] and multifractal analysis [1]. However, none of these generalizations satisfies the full set of basic properties satisfied by conditional Shannon entropy, and there is no general agreement about the proper definition, so the choice of definition depends on the application purpose. Essentially, in all of the previous discussions, the CRE can be represented as an average uncertainty about a random variable X if a random variable Y is given.
In this paper, we introduce a three-parameter CRE, the α-β-γ CRE, which contains the previously defined conditional entropies as special cases that can be obtained by a proper choice of the parameters. Moreover, it satisfies all of the properties that are simultaneously satisfied by the previous definitions, so that it can successfully be used in the aforementioned applications. Thus, we show that the proposed CRE is positive, continuous, symmetric, permutation invariant, equal to the Rényi entropy for independent X and Y, equal to zero when Y is uniquely determined by X, and monotonic.
One of the most frequent usages of conditional entropies is for the definition of mutual information (MI). The MI represents the information transfer between the channel input and output, which can be defined as the reduction of the input uncertainty once the output symbols are known. Thus, the MI which corresponds to the α-β-γ entropy is measured as the difference between the input RE and the α-β-γ CRE of the input when the output is given. We analyze the properties of the α-β-γ MI and show that the basic properties of Shannon MI, such as continuity, vanishing for independent input and output, and reduction to the output entropy when the input and the output coincide, are also satisfied in the case of the α-β-γ MI, which further validates the usage of the α-β-γ CRE.
The paper is organized as follows. In Section 2 we review basic notions about Rényi entropy. The definition of the α-β-γ CRE is introduced in Section 3 and its properties are considered in Section 4. The α-β-γ MI is considered in Section 5.
2. Rényi Entropy
Let X be a discrete random variable taking values from a sample space $\mathcal{X} = \{x_1, \dots, x_n\}$ and distributed according to $p_X$. The Rényi entropy of X of order $\alpha$ ($\alpha > 0$, $\alpha \neq 1$) is defined as
$$H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_{x \in \mathcal{X}} p_X(x)^{\alpha}.$$
For two discrete random variables X and Y, with joint distribution $p_{X,Y}$, the joint Rényi entropy is defined with
$$H_\alpha(X, Y) = \frac{1}{1-\alpha} \log \sum_{x, y} p_{X,Y}(x, y)^{\alpha}.$$
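As a quick numerical illustration of these two definitions, the following Python sketch evaluates the Rényi entropy of a marginal and of a joint distribution (function names and the toy distribution are ours; logarithms are taken to base 2, so entropies are in bits):
```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy (in bits) of a discrete distribution p, for alpha > 0, alpha != 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # zero-probability events do not contribute (cf. property A2 below)
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

# Toy joint distribution p_{X,Y} stored as a matrix indexed by (x, y).
p_xy = np.array([[0.3, 0.2],
                 [0.1, 0.4]])

print(renyi_entropy(p_xy.sum(axis=1), alpha=2.0))  # marginal entropy H_2(X)
print(renyi_entropy(p_xy.ravel(), alpha=2.0))      # joint entropy H_2(X, Y)
```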
The following properties hold [1]:
- (A1) $H_\alpha(X)$ is continuous with respect to the distribution $p_X$;
- (A2) Adding a zero-probability event to the sample space of X does not change $H_\alpha(X)$;
- (A3) $H_\alpha(X)$ takes its largest value for the uniformly distributed random variable, i.e., $H_\alpha(X) \leq \log n$, with equality iff $p_X(x) = 1/n$ for all $x \in \mathcal{X}$;
- (A4) If $\alpha_1 < \alpha_2$, then $H_{\alpha_1}(X) \geq H_{\alpha_2}(X)$;
- (A5) $H_\alpha(X)$ is symmetric with respect to the event probabilities, i.e., if $q_X(x_k) = p_X(x_{\pi(k)})$, where $\pi$ is any permutation of $\{1, \dots, n\}$, then $H_\alpha(q_X) = H_\alpha(p_X)$ for all $\alpha$;
- (A6) $H_\alpha(X)$ is continuous with respect to $\alpha$ and in the limit case $\alpha \to 1$ reduces to Shannon entropy [7]:
$$\lim_{\alpha \to 1} H_\alpha(X) = -\sum_{x} p_X(x) \log p_X(x);$$
- (A7) Rényi entropy can be represented as a quasi-linear mean of the Hartley information content $\iota(x) = -\log p_X(x)$ as:
$$H_\alpha(X) = g_\alpha^{-1}\left(\sum_{x} p_X(x)\, g_\alpha(\iota(x))\right),$$
where the function $g_\lambda$ is defined with
$$g_\lambda(t) = 2^{(1-\lambda)\, t} \qquad (6)$$
or any linear function of Equation (6) (it follows from a well-known result of the theory of quasi-linear means: if one function is a linear function of another one, they generate the same quasi-linear mean).
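Property A7 can be checked numerically. The sketch below is a minimal check, assuming the exponential Kolmogorov-Nagumo function $g_\alpha(t) = 2^{(1-\alpha)t}$ reconstructed in Equation (6) and base-2 logarithms:
```python
import numpy as np

def renyi_entropy(p, alpha):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def g(t, lam):
    # Kolmogorov-Nagumo function g_lam(t) = 2**((1 - lam) * t), as in Equation (6)
    return 2.0 ** ((1.0 - lam) * t)

def g_inv(y, lam):
    return np.log2(y) / (1.0 - lam)

alpha = 0.5
p = np.array([0.5, 0.25, 0.125, 0.125])
iota = -np.log2(p)  # Hartley information content of each outcome
quasi_linear_mean = g_inv(np.sum(p * g(iota, alpha)), alpha)
print(np.isclose(quasi_linear_mean, renyi_entropy(p, alpha)))  # True
```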
3. α-β-γ Conditional Rényi Entropy
Previously, several definitions of the CRE have been proposed [8] as a measure of the average uncertainty about a random variable Y when X is known. In this section, we unify all of these definitions by defining the CRE as a three-parameter function from which each of the previous definitions can be recovered by a special choice of the parameters.
Let $p_{X,Y}$ be the joint distribution of X and Y, and let $p_{Y|X}(\cdot \mid x)$ denote the conditional distribution of Y given $X = x$. The Rényi entropy of the conditional random variable $Y \mid X = x$, distributed according to $p_{Y|X}(\cdot \mid x)$, is denoted with $H_\alpha(Y \mid X = x)$. The conditional Rényi entropy is defined with
$$H_{\alpha}^{\beta,\gamma}(Y \mid X) = g_\gamma^{-1}\left(\sum_{x} p_X^{(\beta)}(x)\; g_\gamma\big(H_\alpha(Y \mid X = x)\big)\right), \qquad (7)$$
where $g_\gamma$ is given with Equation (6) and the escort distribution of $p_X$ is defined with:
$$p_X^{(\beta)}(x) = \frac{p_X(x)^{\beta}}{\sum_{u} p_X(u)^{\beta}}.$$
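A direct implementation of this definition reads as follows. This is a sketch of our reconstruction of Equation (7): the parameter roles (escort weights of order β, quasi-linear mean generated by $g_\gamma$) follow the reconstruction above, and the $\gamma = 1$ branch implements the linear-mean limit:
```python
import numpy as np

def renyi_entropy(p, alpha):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def escort(p, beta):
    """Escort distribution of order beta."""
    w = np.asarray(p, dtype=float) ** beta
    return w / w.sum()

def cre(p_xy, alpha, beta, gamma):
    """alpha-beta-gamma conditional Rényi entropy H(Y | X); rows of p_xy are indexed by x."""
    p_x = p_xy.sum(axis=1)
    w = escort(p_x, beta)  # escort weights over the conditioning variable
    h = np.array([renyi_entropy(row / px, alpha) for row, px in zip(p_xy, p_x)])
    if gamma == 1:  # linear-mean limit of the quasi-linear mean
        return float(np.sum(w * h))
    return np.log2(np.sum(w * 2.0 ** ((1.0 - gamma) * h))) / (1.0 - gamma)
```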
The definition can straightforwardly be extended to the joint conditional entropy. For random variables X, Y, Z, we define the joint conditional entropy as:
$$H_{\alpha}^{\beta,\gamma}(Y, Z \mid X) = g_\gamma^{-1}\left(\sum_{x} p_X^{(\beta)}(x)\; g_\gamma\big(H_\alpha(Y, Z \mid X = x)\big)\right).$$
The definition extends to the remaining parameter values by taking the corresponding limits in Equation (7). In the case of $\alpha = \beta = \gamma = 1$, the definition reduces to the Shannon case. By choosing appropriate values of the parameters we get the previously considered definitions as follows:
- C-CRE: Cachin [4];
- JA-CRE: Jizba and Arimitsu [1];
- RW-CRE: Renner and Wolf [5];
- A-CRE: Arimoto [3];
- H-CRE: Hayashi [6].
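For the special cases whose closed forms are standard in the literature, such a reconstruction can be sanity-checked numerically. For instance, Hayashi's conditional entropy is usually stated as $H_\alpha^{H}(Y \mid X) = \frac{1}{1-\alpha}\log\sum_x p_X(x)\sum_y p_{Y|X}(y \mid x)^\alpha$; under our reconstruction of Equation (7) it should be recovered with ordinary (non-escort) weights and a quasi-linear mean of order α. The parameter assignment in the sketch below is our assumption, not a statement of the paper's Table 1:
```python
import numpy as np

def H(p, a):  # Rényi entropy in bits
    p = np.asarray(p, float)
    p = p[p > 0]
    return np.log2(np.sum(p ** a)) / (1.0 - a)

def cre(pxy, a, b, g):  # reconstructed Eq. (7); rows of pxy indexed by x
    px = pxy.sum(axis=1)
    w = px ** b / np.sum(px ** b)  # escort weights of order b
    h = np.array([H(r / s, a) for r, s in zip(pxy, px)])
    return float(np.sum(w * h)) if g == 1 else np.log2(np.sum(w * 2.0 ** ((1 - g) * h))) / (1 - g)

rng = np.random.default_rng(0)
p_xy = rng.random((3, 4))
p_xy /= p_xy.sum()
a = 1.7
p_x = p_xy.sum(axis=1)

# Hayashi's closed form, as usually stated in the literature.
hayashi = np.log2(np.sum([px * np.sum((row / px) ** a) for row, px in zip(p_xy, p_x)])) / (1 - a)
print(np.isclose(cre(p_xy, a, b=1.0, g=a), hayashi))  # True under our parameter assignment
```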
4. Properties of α-β-γ CRE
The α-β-γ CRE satisfies the following set of important properties for all $\alpha$, $\beta$ and $\gamma$:
- (B1) Positivity: $H_{\alpha}^{\beta,\gamma}(Y \mid X) \geq 0$;
- (B2) $H_{\alpha}^{\beta,\gamma}(Y \mid X)$ is continuous with respect to the joint distribution $p_{X,Y}$;
- (B3) $H_{\alpha}^{\beta,\gamma}(Y \mid X)$ is symmetric with respect to the event probabilities for all $\alpha$, $\beta$ and $\gamma$;
- (B4) Permutation invariance: if $q_{Y|X}(\cdot \mid x)$ is a permutation of $p_{Y|X}(\cdot \mid x)$ for all x, then the corresponding conditional entropies coincide;
- (B5) If X and Y are independent, then $H_{\alpha}^{\beta,\gamma}(Y \mid X) = H_\alpha(Y)$;
- (B6) If Y is uniquely determined by X, then $H_{\alpha}^{\beta,\gamma}(Y \mid X) = 0$;
- (B7) In the case of $\alpha = \beta = \gamma = 1$, the definition reduces to the Shannon case;
- (B8) For random variables X, Y, Z distributed according to a joint distribution $p_{X,Y,Z}$, with corresponding marginal distributions,
$$H_{\alpha}^{\beta,\gamma}(Y \mid X) \leq H_{\alpha}^{\beta,\gamma}(Y, Z \mid X).$$
The proofs of properties B1–B6 follow straightforwardly from the definition of the conditional Rényi entropy in Equation (7) and from properties A1–A5 of the Rényi entropy. Here, we give the proof of property B8.
Proof of Property B8. Let $p_{Y|X}(\cdot \mid x)$ and $p_{Y,Z|X}(\cdot \mid x)$ be the conditional distributions of Y and of the pair (Y, Z) given $X = x$. Similarly as in [9], we have:
$$H_\alpha(Y \mid X = x) \leq H_\alpha(Y, Z \mid X = x) \quad \text{for all } x,$$
so that we have
$$g_\gamma^{-1}\left(\sum_x p_X^{(\beta)}(x)\, g_\gamma\big(H_\alpha(Y \mid X = x)\big)\right) \leq g_\gamma^{-1}\left(\sum_x p_X^{(\beta)}(x)\, g_\gamma\big(H_\alpha(Y, Z \mid X = x)\big)\right),$$
and the result follows from the definition of the conditional entropy, since the quasi-linear mean generated by $g_\gamma$ is monotone. □
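The inequality in property B8 can also be probed numerically. Below is a minimal sketch under our reconstruction of Equation (7), treating the pair (Y, Z) as a single variable by flattening the corresponding axes:
```python
import numpy as np

def H(p, a):
    p = np.asarray(p, float)
    p = p[p > 0]
    return np.log2(np.sum(p ** a)) / (1.0 - a)

def cre(pxy, a, b, g):  # reconstructed Eq. (7); rows of pxy indexed by x
    px = pxy.sum(axis=1)
    w = px ** b / np.sum(px ** b)
    h = np.array([H(r / s, a) for r, s in zip(pxy, px)])
    return float(np.sum(w * h)) if g == 1 else np.log2(np.sum(w * 2.0 ** ((1 - g) * h))) / (1 - g)

rng = np.random.default_rng(1)
p_xyz = rng.random((3, 2, 4))
p_xyz /= p_xyz.sum()
p_xy = p_xyz.sum(axis=2)       # marginal joint distribution of (X, Y)
p_x_yz = p_xyz.reshape(3, -1)  # (Y, Z) flattened into a single variable, given X

for a, b, g in [(0.5, 0.5, 2.0), (2.0, 1.0, 0.3), (1.3, 2.5, 1.0)]:
    assert cre(p_xy, a, b, g) <= cre(p_x_yz, a, b, g) + 1e-12  # H(Y|X) <= H(Y,Z|X)
```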
Additional properties which are satisfied in the case of Shannon entropy do not hold in general and are limited to special choices of $\alpha$, $\beta$ and $\gamma$.
- (B9) Chain rule:
$$H_\alpha(X, Y) = H_\alpha(X) + H_{\alpha}^{\beta,\gamma}(Y \mid X)$$
is satisfied in general only in the case of JA-CRE; a numerical check is sketched after this list. Jizba and Arimitsu [1] used it (with an assumption that $g$ is invertible and positive) as one of the generalized Shannon–Khinchin axioms, along with the properties A1–A3 of Rényi entropy, and showed that Rényi entropy can be characterized as the unique function which satisfies them. It also implies the symmetry of the generalized mutual information introduced in the next section.
- (B10)
- (B11) Conditioning reduces entropy:
$$H_{\alpha}^{\beta,\gamma}(Y \mid X) \leq H_\alpha(Y)$$
is satisfied in the cases of H-CRE, A-CRE and RW-CRE (for a restricted range of the parameters). This property states that additional knowledge cannot increase the information. Although it can intuitively be treated as an indispensable property, breaking it can still be interpreted using the concept of spoiling knowledge, as in [4].
- (B12) Monotonicity: if X, Y and Z form a Markov chain $X \to Y \to Z$, then
$$H_{\alpha}^{\beta,\gamma}(X \mid Y) \leq H_{\alpha}^{\beta,\gamma}(X \mid Z).$$
It holds in the cases of A-CRE and H-CRE and implies the data processing inequality (defined in the next section), which is an important property for applications of Rényi entropy in cryptography [5].
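As referenced in B9, the chain rule can be verified numerically for the JA case. Under our reconstruction of Equation (7), the Jizba–Arimitsu definition corresponds to escort weights and a quasi-linear mean both of order α (our assumption); an elementary computation then shows that Equation (7) collapses to $H_\alpha(X, Y) - H_\alpha(X)$, which the sketch checks:
```python
import numpy as np

def H(p, a):
    p = np.asarray(p, float)
    p = p[p > 0]
    return np.log2(np.sum(p ** a)) / (1.0 - a)

def cre(pxy, a, b, g):  # reconstructed Eq. (7); rows of pxy indexed by x
    px = pxy.sum(axis=1)
    w = px ** b / np.sum(px ** b)
    h = np.array([H(r / s, a) for r, s in zip(pxy, px)])
    return float(np.sum(w * h)) if g == 1 else np.log2(np.sum(w * 2.0 ** ((1 - g) * h))) / (1 - g)

rng = np.random.default_rng(2)
p_xy = rng.random((4, 3))
p_xy /= p_xy.sum()

for a in (0.5, 1.7, 3.0):
    chain_rule_cre = H(p_xy.ravel(), a) - H(p_xy.sum(axis=1), a)  # H(X,Y) - H(X)
    assert np.isclose(cre(p_xy, a, b=a, g=a), chain_rule_cre)     # JA-CRE satisfies the chain rule
```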
The previous discussion is summarized in Table 1 [9].
Since the additional information-theoretic properties are not satisfied in general, none of the previous definitions has been commonly accepted, and the choice of definition depends on the application purpose. On the other hand, the set of properties B1–B8, which is satisfied by all of them, also holds in the case of the α-β-γ CRE defined by Equation (7), which justifies the definition.
5. α-β-γ Mutual Information
Consider a communication channel with input X and output Y described by a transition matrix $p_{Y|X}$. The mutual information is a measure of the information transfer between the input and the output of the channel. Thus, if the uncertainty about the input is measured by $H_\alpha(X)$, and the remaining uncertainty after all output symbols are received by $H_{\alpha}^{\beta,\gamma}(X \mid Y)$, the α-β-γ mutual information between X and Y is defined as
$$I_{\alpha}^{\beta,\gamma}(X; Y) = H_\alpha(X) - H_{\alpha}^{\beta,\gamma}(X \mid Y).$$
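In code, the definition amounts to one line on top of the CRE sketch from Section 3 (repeated here so the snippet runs standalone); note that conditioning on the output corresponds to transposing the joint distribution so that its rows are indexed by y:
```python
import numpy as np

def H(p, a):
    p = np.asarray(p, float)
    p = p[p > 0]
    return np.log2(np.sum(p ** a)) / (1.0 - a)

def cre(pxy, a, b, g):  # reconstructed Eq. (7); rows indexed by the conditioning variable
    px = pxy.sum(axis=1)
    w = px ** b / np.sum(px ** b)
    h = np.array([H(r / s, a) for r, s in zip(pxy, px)])
    return float(np.sum(w * h)) if g == 1 else np.log2(np.sum(w * 2.0 ** ((1 - g) * h))) / (1 - g)

def mi(p_xy, a, b, g):
    """alpha-beta-gamma mutual information: input Rényi entropy minus the CRE of the input given the output."""
    return H(p_xy.sum(axis=1), a) - cre(p_xy.T, a, b, g)
```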
This definition generalizes the one previously introduced by Arimoto, who used the A-CRE, and the one by Jizba et al. [10], who used the JA-CRE.
By using the properties of the α-β-γ CRE, it is easy to conclude that the following basic properties are satisfied for all $\alpha$, $\beta$ and $\gamma$:
- (D1) $I_{\alpha}^{\beta,\gamma}(X; Y)$ is continuous with respect to the joint distribution $p_{X,Y}$;
- (D2) If X and Y are independent, then $I_{\alpha}^{\beta,\gamma}(X; Y) = 0$;
- (D3) If $Y = X$, then $I_{\alpha}^{\beta,\gamma}(X; Y) = H_\alpha(Y)$.
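Property D2, for instance, is immediate to observe numerically: for a product distribution every conditional of X given y equals the marginal of X, so the CRE reduces to $H_\alpha(X)$ and the MI vanishes for any parameter choice. A minimal sketch (helpers repeated so it runs standalone):
```python
import numpy as np

def H(p, a):
    p = np.asarray(p, float)
    p = p[p > 0]
    return np.log2(np.sum(p ** a)) / (1.0 - a)

def cre(pxy, a, b, g):
    px = pxy.sum(axis=1)
    w = px ** b / np.sum(px ** b)
    h = np.array([H(r / s, a) for r, s in zip(pxy, px)])
    return float(np.sum(w * h)) if g == 1 else np.log2(np.sum(w * 2.0 ** ((1 - g) * h))) / (1 - g)

def mi(p_xy, a, b, g):
    return H(p_xy.sum(axis=1), a) - cre(p_xy.T, a, b, g)

p_x = np.array([0.2, 0.5, 0.3])
p_y = np.array([0.4, 0.1, 0.25, 0.25])
p_xy = np.outer(p_x, p_y)  # independent X and Y

for a, b, g in [(0.5, 2.0, 3.0), (2.0, 0.7, 1.0), (3.0, 1.0, 0.5)]:
    assert abs(mi(p_xy, a, b, g)) < 1e-9  # property D2
```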
However, other properties which hold in the Shannon case are not satisfied in general:
- (D4) Non-negativity: $I_{\alpha}^{\beta,\gamma}(X; Y) \geq 0$;
- (D5) Symmetry: $I_{\alpha}^{\beta,\gamma}(X; Y) = I_{\alpha}^{\beta,\gamma}(Y; X)$;
- (D6) Data processing inequality (DPI): if X, Y and Z form a Markov chain $X \to Y \to Z$, then $I_{\alpha}^{\beta,\gamma}(X; Z) \leq I_{\alpha}^{\beta,\gamma}(X; Y)$.
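The failures of D4 and D5 for particular parameter choices can be probed with a brute-force random search. The sketch below simply reports the most negative MI value and the largest symmetry gap it encounters; it asserts nothing, since which violations occur depends on the sampled parameter region:
```python
import numpy as np

def H(p, a):
    p = np.asarray(p, float)
    p = p[p > 0]
    return np.log2(np.sum(p ** a)) / (1.0 - a)

def cre(pxy, a, b, g):
    px = pxy.sum(axis=1)
    w = px ** b / np.sum(px ** b)
    h = np.array([H(r / s, a) for r, s in zip(pxy, px)])
    return float(np.sum(w * h)) if g == 1 else np.log2(np.sum(w * 2.0 ** ((1 - g) * h))) / (1 - g)

def mi(p_xy, a, b, g):
    return H(p_xy.sum(axis=1), a) - cre(p_xy.T, a, b, g)

rng = np.random.default_rng(3)
most_negative, largest_gap = 0.0, 0.0
for _ in range(1000):
    p_xy = rng.random((3, 3))
    p_xy /= p_xy.sum()
    a, b, g = rng.uniform(0.2, 3.0, size=3)
    i_xy, i_yx = mi(p_xy, a, b, g), mi(p_xy.T, a, b, g)
    most_negative = min(most_negative, i_xy)          # < 0 would witness a D4 violation
    largest_gap = max(largest_gap, abs(i_xy - i_yx))  # > 0 witnesses a D5 violation
print(most_negative, largest_gap)
```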
In Table 2 we list these properties and their fulfillment for the cases when the mutual information is defined using the C, JA, RW, A and H conditional Rényi entropies [9]. Thus, if the MI is defined using any of the previously considered CRE definitions, it fails to satisfy some of the properties D4–D6. On the other hand, the set of properties D1–D3, which is satisfied by all of them, is also satisfied by the α-β-γ MI, which justifies its usage as a measure of information transfer.
6. Conclusions
We introduced the α-β-γ conditional Rényi entropy (CRE), which contains previously defined conditional entropies as special cases that can be obtained by a proper choice of the parameters [1,3,4,5,6]. It satisfies all of the properties that are simultaneously satisfied by the previous definitions, so that it can successfully be used in channel coding, secure communication and multifractal analysis.
In addition, we analyzed the properties of the mutual information (MI) defined by means of the α-β-γ CRE. The resulting MI measure satisfies the set of basic properties, which further validates the usage of the α-β-γ CRE.