Conditional Rényi Divergence Saddlepoint and the Maximization of α-Mutual Information
Abstract
1. Introduction
- As mentioned above, the most straightforward generalization has not yet found wide applicability.
 - In the discrete case, Arimoto [28] proposed the definition of the nonnegative quantity $I_\alpha^{\mathrm{A}}(X;Y) = H_\alpha(X) - H_\alpha^{\mathrm{A}}(X|Y)$, where the Rényi entropy [1] and the Arimoto-Rényi conditional entropy [28] are $H_\alpha(X) = \frac{\alpha}{1-\alpha}\log \|P_X\|_\alpha$ and $H_\alpha^{\mathrm{A}}(X|Y) = \frac{\alpha}{1-\alpha}\log \mathbb{E}\big[\|P_{X|Y}(\cdot\,|\,Y)\|_\alpha\big]$, with the $\alpha$-norm of a probability mass function $P$ denoted as $\|P\|_\alpha = \big(\sum_{x} P^\alpha(x)\big)^{1/\alpha}$. Arimoto extended his algorithm in [14] to compute what he called the capacity of order $\alpha$, $C_\alpha^{\mathrm{A}} = \max_{P_X} I_\alpha^{\mathrm{A}}(X;Y)$, for finite-alphabet random transformations, and showed that for $\alpha \in (\tfrac12, 1)$ there exist codes of rate $R$ and blocklength $n$ whose error probability is upper bounded by $\exp\big(-n\,\tfrac{1-\alpha}{\alpha}\,(C_\alpha^{\mathrm{A}} - R)\big)$.
 - Augustin [29] and, later, Csiszár [4] defined $\min_{Q_Y} \mathbb{E}\big[D_\alpha\big(P_{Y|X}(\cdot\,|\,X)\,\|\,Q_Y\big)\big]$ (14), whose maximum over all input distributions is dubbed the Augustin capacity of order $\alpha$ in [30]. Csiszár [4] showed that for $\alpha \in (0,1)$, (14) is the intercept on the R-axis of a supporting line of slope $\frac{\alpha-1}{\alpha}$ of the error exponent function for codes of rate R with constant composition $P_X$. Unfortunately, the minimization in (14) is not amenable to explicit solution.
 - For the purpose of coming up with a measure of the similarity among a finite collection of probability measures $P_1, \ldots, P_m$, weighted by a probability mass function $(w_1, \ldots, w_m)$, Sibson [31] proposed the information radius of order $\alpha$ as $\min_{Q}\, \frac{1}{\alpha-1}\log \sum_{i=1}^{m} w_i \exp\big((\alpha-1)\, D_\alpha(P_i\|Q)\big)$ (15). As we will see, the minimization in (15) can be solved explicitly. This is the generalization of mutual information we adopt in this paper and which, as in [27], we refer to as $\alpha$-mutual information, $I_\alpha(X;Y)$, obtained when the collection is $\{P_{Y|X=x}\}$ weighted by $P_X$. A word of caution is that [4] uses different symbols for these quantities than we do. $\max_{P_X} I_\alpha(X;Y)$ is dubbed the Rényi capacity of order $\alpha$ in [26]. A numerical comparison of (15) with Arimoto's definition is sketched right after this list.
 - Independently, Lapidoth-Pfister [32] and Tomamichel-Hayashi [33] proposed $J_\alpha(X;Y) = \min_{Q_X,\, Q_Y} D_\alpha\big(P_{XY}\,\|\,Q_X \times Q_Y\big)$ and showed that it determines the performance of composite hypothesis tests for independence in which the hypothesized joint distribution is known but, under the independence hypothesis, the marginals are unknown. Its relationship to the other order-$\alpha$ generalizations of mutual information is studied in [34].
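To make the preceding definitions concrete, here is a minimal NumPy sketch (the example channel, input distribution, and function names are ours, chosen purely for illustration) that evaluates Arimoto's quantity and Sibson's $\alpha$-mutual information from their discrete closed forms; for a fixed input distribution the two generally differ, and both approach the Shannon mutual information as $\alpha \to 1$.

```python
import numpy as np

def shannon_mi(P_X, W):
    """Shannon mutual information I(X;Y) in nats; W[x, y] = P(y | x)."""
    P_XY = P_X[:, None] * W
    P_Y = P_XY.sum(axis=0)
    mask = P_XY > 0
    return float(np.sum(P_XY[mask] * np.log(P_XY[mask] / (P_X[:, None] * P_Y[None, :])[mask])))

def arimoto_alpha_mi(P_X, W, alpha):
    """Arimoto's I_alpha^A = H_alpha(X) - H_alpha^A(X|Y), discrete case, natural logs."""
    H_alpha = alpha / (1 - alpha) * np.log(np.sum(P_X ** alpha) ** (1 / alpha))
    P_XY = P_X[:, None] * W
    P_Y = P_XY.sum(axis=0)
    posteriors = P_XY / P_Y[None, :]                      # column y holds P_{X|Y=y}
    alpha_norms = np.sum(posteriors ** alpha, axis=0) ** (1 / alpha)
    H_alpha_cond = alpha / (1 - alpha) * np.log(np.sum(P_Y * alpha_norms))
    return H_alpha - H_alpha_cond

def sibson_alpha_mi(P_X, W, alpha):
    """Sibson's alpha-mutual information from its closed form."""
    inner = np.sum(P_X[:, None] * W ** alpha, axis=0)     # E[ W(y|X)^alpha ] for each y
    return alpha / (alpha - 1) * np.log(np.sum(inner ** (1 / alpha)))

# An arbitrary 2-input, 3-output channel and input distribution, chosen only for illustration.
W = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.1, 0.8]])
P_X = np.array([0.4, 0.6])

for alpha in (0.5, 0.999, 2.0):
    print(f"alpha={alpha}:  Arimoto {arimoto_alpha_mi(P_X, W, alpha):.4f}"
          f"   Sibson {sibson_alpha_mi(P_X, W, alpha):.4f}")
print(f"Shannon mutual information: {shannon_mi(P_X, W):.4f}")
```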
 
2. Notation, Definitions and Properties
- If $(\mathcal{A}, \mathscr{F}, P)$ is a probability space, $X \sim P$ indicates $\mathbb{P}[X \in F] = P(F)$ for all $F \in \mathscr{F}$.
 - Let $(\mathcal{A}, \mathscr{F})$ and $(\mathcal{B}, \mathscr{G})$ be measurable spaces, which we refer to as the input and output spaces, respectively, with $\mathcal{A}$ and $\mathcal{B}$ referred to as the input and output alphabets, respectively. $P_{Y|X} \colon \mathcal{A} \to \mathcal{B}$ denotes a random transformation from $\mathcal{A}$ to $\mathcal{B}$, i.e., for any $x \in \mathcal{A}$, $P_{Y|X=x}(\cdot)$ is a probability measure on $(\mathcal{B}, \mathscr{G})$, and for any $G \in \mathscr{G}$, $P_{Y|X=\cdot}(G)$ is an $\mathscr{F}$-measurable function. For brevity, we will usually drop mention of the underlying $\sigma$-fields. If P is a probability measure on $\mathcal{A}$ and $P_{Y|X}$ is a random transformation, the corresponding joint probability measure on $\mathcal{A}\times\mathcal{B}$ is denoted by $P\, P_{Y|X}$ (or, interchangeably, $P_{Y|X}\, P$). The notation $P \to P_{Y|X} \to Q$ indicates that the output marginal of the joint probability measure $P\, P_{Y|X}$ is denoted by Q.
 - The relative information between two probability measures P and Q on the same measurable space such that $P \ll Q$ is defined as $\imath_{P\|Q}(x) = \log \frac{\mathrm{d}P}{\mathrm{d}Q}(x)$, where $\frac{\mathrm{d}P}{\mathrm{d}Q}$ is the Radon-Nikodym derivative of P with respect to Q. The relative entropy is $D(P\|Q) = \mathbb{E}\big[\imath_{P\|Q}(X)\big]$ with $X \sim P$.
 - Given $P_X \to P_{Y|X} \to P_Y$, the information density is defined as $\imath_{X;Y}(x;y) = \imath_{P_{Y|X=x}\|P_Y}(y)$.
 - Fix $\alpha \in (0,1)\cup(1,\infty)$, $P_{Y|X}$, and a probability measure $P_X$ on the input space. Then, the output probability measure $Q^*_\alpha$ is called the $\alpha$-response to $P_X$ if $\frac{\mathrm{d}Q^*_\alpha}{\mathrm{d}\nu}(y) = \frac{1}{\kappa_\alpha}\big(\mathbb{E}\big[\big(\tfrac{\mathrm{d}P_{Y|X=X}}{\mathrm{d}\nu}(y)\big)^{\alpha}\big]\big)^{1/\alpha}$ (24), where $X \sim P_X$, $\nu$ is a $\sigma$-finite measure dominating $\{P_{Y|X=x},\ x \in \mathcal{A}\}$, and $\kappa_\alpha$ is a scalar that guarantees that $Q^*_\alpha$ is a probability measure. For notational convenience, we omit the dependence of $Q^*_\alpha$ and $\kappa_\alpha$ on $P_X$ and $P_{Y|X}$. Equivalently, if $q^*_\alpha$ and $p_{Y|X}$ denote the densities with respect to some dominating measure, then (24) becomes $q^*_\alpha(y) = \frac{1}{\kappa_\alpha}\big(\mathbb{E}\big[p^{\alpha}_{Y|X}(y|X)\big]\big)^{1/\alpha}$ (25). In particular, the 1-response to $P_X$ is the output distribution $P_Y$ with $P_X \to P_{Y|X} \to P_Y$. In [26], the $\alpha$-response to $P_X$ is dubbed the order-$\alpha$ Rényi mean for prior $P_X$.
 - Given two probability measures P and Q on the same measurable space and a scalar $\alpha \in (0,1)\cup(1,\infty)$, the Rényi divergence of order $\alpha$ between P and Q is defined as [1] $D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log\int p^{\alpha}\, q^{1-\alpha}\, \mathrm{d}\mu$ (26), where p and q are the Radon-Nikodym derivatives of P and Q, respectively, with respect to a common dominating $\sigma$-finite measure $\mu$. We define $D_1(P\|Q) = D(P\|Q)$, as this coincides with the limit from the left at $\alpha = 1$; it is also the limit from the right whenever $D_\alpha(P\|Q) < \infty$ for some $\alpha > 1$. The cases $\alpha = 0$ and $\alpha = \infty$ can be defined by taking the corresponding limits. In this work, we only focus on the simple orders of $\alpha$, i.e., $\alpha \in (0,1)\cup(1,\infty)$. As we saw in (1), if $P \ll Q$ and $X \sim P$, then (26) becomes $D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log\mathbb{E}\big[\exp\big((\alpha-1)\,\imath_{P\|Q}(X)\big)\big]$.
 - If $\alpha \in (0,1)\cup(1,\infty)$ and $p, q \in [0,1]$, then the binary Rényi divergence of order $\alpha$ is given by $d_\alpha(p\|q) = \frac{1}{\alpha-1}\log\big(p^\alpha q^{1-\alpha} + (1-p)^\alpha (1-q)^{1-\alpha}\big)$. Note that, as $\alpha \to 1$, it converges to the binary relative entropy $d(p\|q) = -h(p) - p\log q - (1-p)\log(1-q)$, where the usual binary entropy function is denoted by $h(\cdot)$. Given $p_0 \ne p_1$, the solution to the equalizer equation $d_\alpha(p_0\|q) = d_\alpha(p_1\|q)$ is given explicitly by $\frac{q}{1-q} = \Big(\frac{(1-p_1)^\alpha - (1-p_0)^\alpha}{p_0^\alpha - p_1^\alpha}\Big)^{\frac{1}{1-\alpha}}$.
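As a quick numerical illustration of (26) and of its binary special case (the pmfs and function names below are ours, for illustration only), the following sketch evaluates $D_\alpha(P\|Q)$ at several simple orders, returns $\infty$ when $\alpha > 1$ and $P \not\ll Q$, and shows the values approaching the relative entropy as $\alpha \to 1$.

```python
import numpy as np

def renyi_divergence(P, Q, alpha):
    """D_alpha(P || Q) in nats for pmfs on a common finite alphabet, simple order alpha != 1."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    if alpha > 1 and np.any((P > 0) & (Q == 0)):
        return np.inf                         # P is not absolutely continuous w.r.t. Q
    mask = P > 0                              # terms with P(x) = 0 contribute nothing
    return float(np.log(np.sum(P[mask] ** alpha * Q[mask] ** (1 - alpha))) / (alpha - 1))

def binary_renyi_divergence(p, q, alpha):
    """d_alpha(p || q): Rényi divergence between Bernoulli(p) and Bernoulli(q)."""
    return renyi_divergence([p, 1 - p], [q, 1 - q], alpha)

def relative_entropy(P, Q):
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

P, Q = [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]
for alpha in (0.5, 0.9, 0.999, 2.0):
    print(f"D_{alpha}(P||Q) = {renyi_divergence(P, Q, alpha):.4f}")
print(f"D(P||Q)        = {relative_entropy(P, Q):.4f}   (limit as alpha -> 1)")
print(f"d_0.5(0.3||0.6) = {binary_renyi_divergence(0.3, 0.6, 0.5):.4f}")
```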
- $D_\alpha(P\|Q) \ge 0$, with equality only if $P = Q$.
 - $D_\alpha(P\|Q)$ is monotonically non-decreasing in $\alpha$.
 - While we may have $P \not\ll Q$ and $D_\alpha(P\|Q) < \infty$ simultaneously, for any $\alpha \in (0,1)$, $D_\alpha(P\|Q) = \infty$ is equivalent to P and Q being orthogonal. Conversely, if $D_\alpha(P\|Q) < \infty$ for some $\alpha \ge 1$, then $P \ll Q$.
 - The Rényi divergence satisfies the data-processing inequality: if $P_X \to P_{Y|X} \to P_Y$ and $Q_X \to P_{Y|X} \to Q_Y$, then $D_\alpha(P_Y\|Q_Y) \le D_\alpha(P_X\|Q_X)$; a quick numerical check is given below.
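The following short check (with an arbitrary input pair and random transformation of our choosing) prints, for several orders, the divergence before and after the channel; on this example the values are non-negative, non-decreasing in $\alpha$, and satisfy the data-processing inequality, as the properties above require. It is a sanity check, not a proof.

```python
import numpy as np

def renyi_div(P, Q, alpha):
    # Rényi divergence of order alpha between pmfs with full support (no zero handling needed here).
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    return float(np.log(np.sum(P ** alpha * Q ** (1 - alpha))) / (alpha - 1))

P_X = np.array([0.6, 0.3, 0.1])
Q_X = np.array([0.2, 0.3, 0.5])
W = np.array([[0.8, 0.2],        # a ternary-input, binary-output random transformation
              [0.3, 0.7],
              [0.5, 0.5]])
P_Y, Q_Y = P_X @ W, Q_X @ W      # outputs of P_X -> W -> P_Y and Q_X -> W -> Q_Y

for alpha in (0.3, 0.7, 1.5, 3.0):
    unprocessed = renyi_div(P_X, Q_X, alpha)
    processed = renyi_div(P_Y, Q_Y, alpha)
    print(f"alpha={alpha}:  D(P_X||Q_X) = {unprocessed:.4f}   "
          f"D(P_Y||Q_Y) = {processed:.4f}   DPI holds: {processed <= unprocessed}")
```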
 - Gilardoni [38] gave a strengthened Pinsker inequality upper bounding the square of the total variation distance in terms of the Rényi divergence of order $\alpha \in (0,1]$; combined with the monotonicity of $D_\alpha(P\|Q)$ in $\alpha$, it yields such a bound for every positive order.
 - The Rényi divergence is lower semicontinuous in the topology of setwise convergence, i.e., if for every event $F$, $P_n(F) \to P(F)$ and $Q_n(F) \to Q(F)$, then $D_\alpha(P\|Q) \le \liminf_{n\to\infty} D_\alpha(P_n\|Q_n)$ (36). In particular, note that (36) holds if $P_n \to P$ and $Q_n \to Q$ in total variation.
 - In the theory of robust lossless source coding [22,25], the following scalar, called the $\alpha$-minimax redundancy of the collection $\{P_\theta : \theta \in \Theta\}$, is an important measure of the worst-case redundancy penalty that ensues when the encoder only knows that the data is generated according to one of the probability measures in the collection: $\inf_{Q}\, \sup_{\theta \in \Theta} D_\alpha(P_\theta\,\|\,Q)$, where the infimum is over all the probability measures on the data alphabet.
 - Given input distribution $P_X$ and random transformations $P_{Y|X}$, $Q_{Y|X}$, the conditional Rényi divergence of order $\alpha$ is $D_\alpha(P_{Y|X}\,\|\,Q_{Y|X}\,|\,P_X) = D_\alpha(P_X\, P_{Y|X}\,\|\,P_X\, Q_{Y|X})$ (38). Although (38) also holds for the familiar $\alpha = 1$ case, in general the conditional Rényi divergence is not the arithmetic average of $D_\alpha(P_{Y|X=x}\,\|\,Q_{Y|X=x})$ with respect to $P_X$ if $\alpha \ne 1$. Instead, it is a generalized mean, or a scaled cumulant generating function evaluated at $\alpha - 1$. Specifically, if $X \sim P_X$, then $D_\alpha(P_{Y|X}\,\|\,Q_{Y|X}\,|\,P_X) = \frac{1}{\alpha-1}\log\mathbb{E}\big[\exp\big((\alpha-1)\, D_\alpha(P_{Y|X=X}\,\|\,Q_{Y|X=X})\big)\big]$ (39). Regardless of whether $\alpha < 1$ or $\alpha > 1$, (39) implies that $\sup_{P_X} D_\alpha(P_{Y|X}\,\|\,Q_{Y|X}\,|\,P_X) = \sup_{x \in \mathcal{A}} D_\alpha(P_{Y|X=x}\,\|\,Q_{Y|X=x})$ (41), with the supremum in (41) over all input probability measures.
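To see how the generalized mean (39) differs from a plain arithmetic average, the sketch below (with hypothetical $P_{Y|X}$, $Q_{Y|X}$ and $P_X$ chosen only for illustration) computes the conditional Rényi divergence both from (39) and from the joint-distribution definition (38), and contrasts it with the arithmetic average of the per-input divergences.

```python
import numpy as np

def renyi_div(P, Q, alpha):
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    return float(np.log(np.sum(P ** alpha * Q ** (1 - alpha))) / (alpha - 1))

def conditional_renyi_div(P_X, W, V, alpha):
    """D_alpha(W || V | P_X) via the tilted average (39) of the per-input divergences."""
    d = np.array([renyi_div(W[x], V[x], alpha) for x in range(len(P_X))])
    return float(np.log(np.sum(P_X * np.exp((alpha - 1) * d))) / (alpha - 1))

# Hypothetical random transformations W, V and input pmf, chosen only for illustration.
W = np.array([[0.7, 0.3], [0.2, 0.8]])
V = np.array([[0.5, 0.5], [0.4, 0.6]])
P_X = np.array([0.5, 0.5])
alpha = 2.0

d = [renyi_div(W[x], V[x], alpha) for x in range(2)]
print("per-input divergences        :", [round(v, 4) for v in d])
print("arithmetic average           :", round(float(np.dot(P_X, d)), 4))
print("generalized mean, (39)       :", round(conditional_renyi_div(P_X, W, V, alpha), 4))
# (38): the same value from the Rényi divergence between the joints P_X W and P_X V.
print("joint-distribution form, (38):", round(renyi_div((P_X[:, None] * W).ravel(),
                                                        (P_X[:, None] * V).ravel(), alpha), 4))
```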
 - Given $\alpha \in (0,1)\cup(1,\infty)$, $P_X$ and $P_{Y|X}$, the $\alpha$-mutual information is [27,31] $I_\alpha(X;Y) = \min_{Q_Y} D_\alpha(P_{Y|X}\,\|\,Q_Y\,|\,P_X)$, where the minimum is over all probability measures on the output space and is achieved by the $\alpha$-response $Q^*_\alpha$ to $P_X$. It can be checked that the constant $\kappa_\alpha$ in (24) is equal to $\kappa_\alpha = \exp\big(\tfrac{\alpha-1}{\alpha}\, I_\alpha(X;Y)\big)$. Note that $I_1(X;Y) = I(X;Y)$ but, in general, $I_\alpha(X;Y) \ne I_\alpha(Y;X)$.
 - An alternative expression for $\alpha$-mutual information, which will come in handy in our analysis and which does not involve either the output distribution $P_Y$ or the $\alpha$-response, is obtained by introducing an auxiliary probability measure $Q$ dominating the collection $\{P_{Y|X=x},\ x \in \mathcal{A}\}$ [27]: $I_\alpha(X;Y) = \frac{\alpha}{\alpha-1}\log\int_{\mathcal{B}}\big(\mathbb{E}\big[\big(\tfrac{\mathrm{d}P_{Y|X=X}}{\mathrm{d}Q}(y)\big)^{\alpha}\big]\big)^{1/\alpha}\,\mathrm{d}Q(y)$, where $X \sim P_X$ and the value does not depend on the choice of $Q$. As usual, sometimes it is convenient to fix $\sigma$-finite measures $\mu$ and $\nu$ on the input and output spaces which dominate $P_X$ and $\{P_{Y|X=x}\}$, respectively, and denote the densities with respect to the reference measures by $p_X = \frac{\mathrm{d}P_X}{\mathrm{d}\mu}$ and $p_{Y|X}(\cdot|x) = \frac{\mathrm{d}P_{Y|X=x}}{\mathrm{d}\nu}$. Then, we can write the $\alpha$-mutual information as $I_\alpha(X;Y) = \frac{\alpha}{\alpha-1}\log\int_{\mathcal{B}}\big(\int_{\mathcal{A}} p_X(x)\, p^{\alpha}_{Y|X}(y|x)\,\mathrm{d}\mu(x)\big)^{1/\alpha}\mathrm{d}\nu(y)$.
 - In the special case of discrete alphabets, $E_0(\rho, P_X) = \rho\, I_{\frac{1}{1+\rho}}(X;Y)$, where the left side is the familiar Gallager function defined in [36] for $\rho > -1$ as $E_0(\rho, P_X) = -\log\sum_{y\in\mathcal{B}}\big(\sum_{x\in\mathcal{A}} P_X(x)\, P^{\frac{1}{1+\rho}}_{Y|X}(y|x)\big)^{1+\rho}$.
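Putting the last few items together, the following sketch (discrete case; the array names and example channel are ours) computes the $\alpha$-response and its normalizing constant, and verifies numerically that the conditional Rényi divergence from $P_{Y|X}$ to the $\alpha$-response equals $I_\alpha(X;Y)$, that $\kappa_\alpha = \exp\big(\tfrac{\alpha-1}{\alpha} I_\alpha(X;Y)\big)$, and that $E_0(\rho, P_X) = \rho\, I_{1/(1+\rho)}(X;Y)$.

```python
import numpy as np

def alpha_response(P_X, W, alpha):
    """The alpha-response to P_X and its normalizing constant kappa_alpha (discrete case)."""
    inner = np.sum(P_X[:, None] * W ** alpha, axis=0) ** (1 / alpha)
    kappa = float(inner.sum())
    return inner / kappa, kappa

def sibson_alpha_mi(P_X, W, alpha):
    inner = np.sum(P_X[:, None] * W ** alpha, axis=0)
    return float(alpha / (alpha - 1) * np.log(np.sum(inner ** (1 / alpha))))

def conditional_renyi_div(P_X, W, Q, alpha):
    """D_alpha(P_{Y|X} || Q | P_X) as a tilted average of the per-input divergences."""
    d = np.array([np.log(np.sum(W[x] ** alpha * Q ** (1 - alpha))) / (alpha - 1)
                  for x in range(len(P_X))])
    return float(np.log(np.sum(P_X * np.exp((alpha - 1) * d))) / (alpha - 1))

def gallager_E0(rho, P_X, W):
    """Gallager's E_0(rho, P_X) for a discrete memoryless channel."""
    return float(-np.log(np.sum(np.sum(P_X[:, None] * W ** (1 / (1 + rho)), axis=0) ** (1 + rho))))

W = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.1, 0.8]])
P_X = np.array([0.4, 0.6])
alpha = 0.5

Q_star, kappa = alpha_response(P_X, W, alpha)
I_alpha = sibson_alpha_mi(P_X, W, alpha)
print("I_alpha                               :", I_alpha)
print("D_alpha(P_{Y|X} || alpha-response|P_X):", conditional_renyi_div(P_X, W, Q_star, alpha))
print("kappa_alpha vs exp((a-1)/a * I_alpha) :", kappa, np.exp((alpha - 1) / alpha * I_alpha))
rho = (1 - alpha) / alpha                      # so that alpha = 1 / (1 + rho)
print("E_0(rho, P_X) vs rho * I_alpha        :", gallager_E0(rho, P_X, W), rho * I_alpha)
```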
 - Fix $\alpha \in (0,1)\cup(1,\infty)$, $P_{Y|X}$, and a collection $\mathcal{P}$ of probability measures on the input space. Then, we denote $C_\alpha(\mathcal{P}) = \sup_{P_X \in \mathcal{P}} I_\alpha(X;Y)$. When $\mathcal{P}$ is the set of all input probability measures, we write simply $C_\alpha$, dubbed the Rényi capacity in [26]. $C_\alpha$ is a measure of the similarity of the family $\{P_{Y|X=x},\ x \in \mathcal{A}\}$, which plays an important role in the analysis of the fundamental limits of information transmission through noisy channels, particularly in the regime of exponentially small error probability. For a long time (e.g., [39]) the cutoff rate $C_{1/2}$ was conjectured to be the maximal rate for which reliable codes with manageable decoding complexity can be found. The zero-error capacity of the discrete memoryless channel with feedback is equal to either zero or [40] $C_0 = \lim_{\alpha\downarrow 0} C_\alpha$, depending on whether there is an output $y \in \mathcal{B}$ such that $P_{Y|X}(y|x) > 0$ for all $x \in \mathcal{A}$.
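As a small illustration of $C_\alpha$ (a brute-force sketch of ours, independent of the methods developed later in the paper), the code below maximizes $I_{1/2}$ over binary input distributions for a binary symmetric channel and compares the result with the familiar closed form of the BSC cutoff rate, $\log 2 - \log\big(1 + 2\sqrt{\delta(1-\delta)}\big)$ nats.

```python
import numpy as np

def sibson_alpha_mi(P_X, W, alpha):
    inner = np.sum(P_X[:, None] * W ** alpha, axis=0)
    return float(alpha / (alpha - 1) * np.log(np.sum(inner ** (1 / alpha))))

def renyi_capacity_binary_input(W, alpha, grid=20001):
    """C_alpha for a two-input channel by brute-force search over input pmfs (p, 1-p)."""
    ps = np.linspace(0.0, 1.0, grid)
    vals = [sibson_alpha_mi(np.array([p, 1 - p]), W, alpha) for p in ps]
    k = int(np.argmax(vals))
    return vals[k], ps[k]

delta = 0.1
W = np.array([[1 - delta, delta],      # binary symmetric channel with crossover delta
              [delta, 1 - delta]])
C_half, p_star = renyi_capacity_binary_input(W, alpha=0.5)
print(f"cutoff rate C_1/2 by search : {C_half:.6f} nats, at P_X = ({p_star:.3f}, {1 - p_star:.3f})")
print(f"BSC closed form             : {np.log(2) - np.log(1 + 2 * np.sqrt(delta * (1 - delta))):.6f}")
```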
 - While $D(P\|Q)$ is convex in the pair $(P, Q)$, the picture for Rényi divergence is somewhat more nuanced:
- (a) If $\alpha \in (0,1]$, then $D_\alpha(P\|Q)$ is convex in the pair $(P, Q)$.
 - (b) For any fixed pair $(\alpha, Q)$, $\exp\big((\alpha-1)\, D_\alpha(P\|Q)\big)$ is concave (resp. convex) in $P$ if $\alpha \in (0,1)$ (resp. $\alpha \in (1,\infty)$) (see [43]).
 
3. Conditional Rényi Divergence Game
3.1. Saddle point
- Regardless of whether $\alpha < 1$ or $\alpha > 1$, we see from (45) that if there exists some input $x$ for which (68) holds, then the assumed optimality of the input distribution is contradicted. Moreover, if there exists some input $x$ for which (68) holds with the strict inequality reversed, then (59) would be violated, again contradicting the assumed optimality of the input distribution.
 
3.2. Minimax identity
4. Finding
5. Proofs
5.1. Proof of Theorem 1
5.2. Proof of Theorem 2
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Rényi, A. On measures of information and entropy. In Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability; Neyman, J., Ed.; University of California Press: Berkeley, CA, USA, 1961; pp. 547–561. [Google Scholar]
 - Van Erven, T.; Harremoës, P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820. [Google Scholar] [CrossRef]
 - Blahut, R.E. Hypothesis testing and information theory. IEEE Trans. Inf. Theory 1974, 20, 405–417. [Google Scholar] [CrossRef]
 - Csiszár, I. Generalized cutoff rates and Rényi’s information measures. IEEE Trans. Inf. Theory 1995, 41, 26–34. [Google Scholar] [CrossRef]
 - Shayevitz, O. On Rényi measures and hypothesis testing. In Proceedings of the 2011 IEEE International Symposium on Information Theory, Saint Petersburg, Russia, 31 July–5 August 2011; pp. 894–898. [Google Scholar]
 - Harremoës, P. Interpretations of Rényi entropies and divergences. Phys. A Stat. Mech. Its Appl. 2006, 365, 57–62. [Google Scholar] [CrossRef]
 - Sason, I.; Verdú, S. Improved bounds on lossless source coding and guessing moments via Rényi measures. IEEE Trans. Inf. Theory 2018, 64, 4323–4346. [Google Scholar] [CrossRef]
 - Haroutunian, E.A. Estimates of the exponent of the error probability for a semicontinuous memoryless channel. Probl. Inf. Transm. 1968, 4, 29–39. [Google Scholar]
 - Haroutunian, E.A.; Haroutunian, M.E.; Harutyunyan, A.N. Reliability criteria in information theory and in statistical hypothesis testing. Found. Trends Commun. Inf. Theory 2007, 4, 97–263. [Google Scholar] [CrossRef]
 - Polyanskiy, Y.; Verdú, S. Arimoto channel coding converse and Rényi divergence. In Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing, Allerton, IL, USA, 29 September–1 October 2010; pp. 1327–1333. [Google Scholar]
 - Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487. [Google Scholar] [CrossRef]
 - Liese, F.; Vajda, I. Convex Statistical Distances; Number 95 in Teubner Texte zur Mathematik; Teubner: Leipzig, Germany, 1987. [Google Scholar]
 - Tridenski, S.; Zamir, R.; Ingber, A. The Ziv-Zakai-Renyi bound for joint source-channel coding. IEEE Trans. Inf. Theory 2015, 61, 4293–4315. [Google Scholar] [CrossRef]
 - Arimoto, S. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Trans. Inf. Theory 1972, 18, 14–20. [Google Scholar] [CrossRef]
 - Blahut, R.E. Computation of channel capacity and rate-distortion functions. IEEE Trans. Inf. Theory 1972, 18, 460–473. [Google Scholar] [CrossRef]
 - Blachman, N.M. Communication as a game. Proc. IRE Wescon 1957, 2, 61–66. [Google Scholar]
 - Borden, J.M.; Mason, D.M.; McEliece, R.J. Some information theoretic saddlepoints. SIAM J. Control Optim. 1985, 23, 129–143. [Google Scholar] [CrossRef]
 - Lapidoth, A.; Narayan, P. Reliable communication under channel uncertainty. IEEE Trans. Inf. Theory 1998, 44, 2148–2177. [Google Scholar] [CrossRef]
 - Kemperman, J. On the Shannon capacity of an arbitrary channel. In Koninklijke Nederlandse Akademie van Wetenschappen, Indagationes Mathematicae; Elsevier: Amsterdam, The Netherlands, 1974; Volume 77, pp. 101–115. [Google Scholar]
 - Gallager, R.G. Source Coding With Side Information and Universal Coding; Technical Report LIDS-P-937; Lab. Information Decision Systems, Massachusetts Institute of Technology: Cambridge, MA, USA, 1979. [Google Scholar]
 - Ryabko, B. Encoding a source with unknown but ordered probabilities. Probl. Inf. Transm. 1979, 15, 134–138. [Google Scholar]
 - Davisson, L.; Leon-Garcia, A. A source matching approach to finding minimax codes. IEEE Trans. Inf. Theory 1980, 26, 166–174. [Google Scholar] [CrossRef]
 - Ryabko, B. Comments on “A Source Matching Approach to Finding Minimax Codes”. IEEE Trans. Inf. Theory 1981, 27, 780–781. [Google Scholar] [CrossRef]
 - Haussler, D. A general minimax result for relative entropy. IEEE Trans. Inf. Theory 1997, 43, 1276–1280. [Google Scholar] [CrossRef]
 - Yagli, S.; Altuğ, Y.; Verdú, S. Minimax Rényi redundancy. IEEE Trans. Inf. Theory 2018, 64, 3715–3733. [Google Scholar] [CrossRef]
 - Nakiboğlu, B. The Rényi capacity and center. IEEE Trans. Inf. Theory 2019, 65, 841–860. [Google Scholar] [CrossRef]
 - Verdú, S. α-mutual information. In Proceedings of the 2015 Information Theory and Applications Workshop, San Diego, CA, USA, 1–6 February 2015; pp. 1–6. [Google Scholar]
 - Arimoto, S. Information Measures and Capacity of Order α for Discrete Memoryless Channels. In Topics in Information Theory, Proceedings of the Coll. Math. Soc. János Bolyai; Bolyai: Keszthely, Hungary, 1975; pp. 41–52. [Google Scholar]
 - Augustin, U. Noisy Channels. Ph.D. Thesis, Universität Erlangen-Nürnberg, Erlangen, Germany, 1978. [Google Scholar]
 - Nakiboğlu, B. The Augustin capacity and center. arXiv 2018, arXiv:1606.00304. [Google Scholar]
 - Sibson, R. Information radius. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 1969, 14, 149–160. [Google Scholar] [CrossRef]
 - Lapidoth, A.; Pfister, C. Two measures of dependence. Entropy 2019, 21, 778. [Google Scholar] [CrossRef]
 - Tomamichel, M.; Hayashi, M. Operational interpretation of Rényi information measures via composite hypothesis testing against product and Markov distributions. IEEE Trans. Inf. Theory 2018, 64, 1064–1082. [Google Scholar] [CrossRef]
 - Aishwarya, G.; Madiman, M. Remarks on Rényi versions of conditional entropy and mutual information. In Proceedings of the 2019 IEEE International Symposium on Information Theory, Paris, France, 7–12 July 2019; pp. 1117–1121. [Google Scholar]
 - Shannon, C.E.; Gallager, R.G.; Berlekamp, E. Lower bounds to error probability for coding on discrete memoryless channels, I. Inf. Control 1967, 10, 65–103. [Google Scholar] [CrossRef]
 - Gallager, R.G. A simple derivation of the coding theorem and some applications. IEEE Trans. Inf. Theory 1965, 11, 3–18. [Google Scholar] [CrossRef]
 - Yagli, S.; Cuff, P. Exact soft-covering exponent. IEEE Trans. Inf. Theory 2019, 65, 6234–6262. [Google Scholar] [CrossRef]
 - Gilardoni, G.L. On Pinsker’s and Vajda’s type inequalities for Csiszár’s f-divergences. IEEE Trans. Inf. Theory 2010, 56, 5377–5386. [Google Scholar] [CrossRef]
 - Massey, J.L. Coding and modulation in digital communications. In Proceedings of the 1974 International Zurich Seminar on Digital Communications, Zurich, Switzerland, 12–15 March 1974; pp. E2(1)–E2(4). [Google Scholar]
 - Shannon, C.E. The zero error capacity of a noisy channel. IRE Trans. Inf. Theory 1956, 2, 8–19. [Google Scholar] [CrossRef]
 - Sundaresan, R. Guessing under source uncertainty. IEEE Trans. Inf. Theory 2007, 53, 269–287. [Google Scholar] [CrossRef]
 - Bunte, C.; Lapidoth, A. Encoding tasks and Rényi entropy. IEEE Trans. Inf. Theory 2014, 60, 5065–5076. [Google Scholar] [CrossRef]
 - Ho, S.W.; Verdú, S. Convexity/concavity of Rényi entropy and α-mutual information. In Proceedings of the 2015 IEEE International Symposium on Information Theory, Hong Kong, China, 14–19 June 2015; pp. 745–749. [Google Scholar]
 - Gallager, R.G. Information Theory and Reliable Communication; John Wiley and Sons: New York, USA, 1968; Volume 2. [Google Scholar]
 - Verdú, S. Channel Capacity. In The Electrical Engineering Handbook, 2nd ed.; IEEE Press: Piscataway, NJ, USA; CRC Press: Boca Raton, FL, USA, 1997; Chapter 73.5; pp. 1671–1678. [Google Scholar]
 - Luenberger, D. Optimization by vector space methods; John Wiley and Sons: Hoboken, NJ, USA, 1997. [Google Scholar]
 - Polyanskiy, Y.; Wu, Y. Lecture Notes on Information Theory. 2017. Available online: http://people.lids.mit.edu/yp/homepage/data/itlectures_v5.pdf (accessed on 30 April 2019).
 