On Relations Between the Relative Entropy and χ2-Divergence, Generalizations and Applications
Abstract
1. Introduction
1.1. Paper Contributions
1.2. Paper Organization
2. Preliminaries and Notation
- If , then
- By the continuous extension of ,
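The definitions in this section are stated for general probability measures. Purely as an illustration (the function names and the toy distributions below are ours, not the paper's), the following sketch evaluates the relative entropy and the chi-squared divergence on a finite alphabet, with the usual convention 0 log 0 = 0.

```python
import numpy as np

def kl_divergence(p, q):
    """Relative entropy D(P||Q) in nats on a finite alphabet, assuming P << Q; 0 log 0 = 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def chi2_divergence(p, q):
    """Chi-squared divergence sum_x (p(x) - q(x))^2 / q(x), assuming q(x) > 0 wherever p(x) > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = q > 0
    return float(np.sum((p[m] - q[m]) ** 2 / q[m]))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(kl_divergence(p, q), chi2_divergence(p, q))  # both non-negative, zero iff p == q
```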
3. Relations between Divergences
3.1. Relations between the Relative Entropy and the Chi-Squared Divergence
- (a) New consequences and applications of it are obtained, including new shorter proofs of some known results;
- (b) An interesting extension provides new relations between f-divergences (see Section 3.3).
- (a) Pinsker’s inequality (illustrated numerically in the sketch following this list):
- (b) Furthermore, let be a sequence of probability measures that is defined on a measurable space , and which converges to a probability measure P in the sense that
- (c) For all , (33) holds; moreover, under the assumption in (31), (34) holds for all .
- (d) [15] [Theorem 2]:
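As a numerical aside only (this is not a verification of Theorem 1 or of the new results above), the sketch below checks two classical relations on randomly drawn distributions: Pinsker’s inequality, recalled in item (a), and the bound D(P‖Q) ≤ log(1 + χ²(P‖Q)), which follows from Jensen’s inequality. The helper functions repeat those of the Section 2 sketch.

```python
import numpy as np

def kl(p, q):    # relative entropy in nats (0 log 0 = 0), as in the Section 2 sketch
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def chi2(p, q):  # chi-squared divergence
    return float(np.sum((p - q) ** 2 / q))

def tv(p, q):    # total variation distance
    return 0.5 * float(np.sum(np.abs(p - q)))

rng = np.random.default_rng(0)
for _ in range(1000):
    p, q = rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4))
    assert kl(p, q) >= 2.0 * tv(p, q) ** 2 - 1e-9      # Pinsker's inequality (in nats)
    assert kl(p, q) <= np.log1p(chi2(p, q)) + 1e-9     # D(P||Q) <= log(1 + chi^2(P||Q))
print("both classical inequalities hold on 1000 random pairs")
```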
3.2. Implications of Theorem 1
- (a) If , then
- (b) The lower bound on the right side of (40) is attained for P and Q which are defined on the two-element set , and
- (c) If and and are selected arbitrarily, then
- (a) The means under and are both equal to m (independently of n);
- (b) The variance under is equal to , and the variance under is equal to (independently of n);
- (c) The relative entropy vanishes as we let n → ∞ (one construction with these properties is sketched below).
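The specific sequence used in the paper to realize properties (a)–(c) is not recoverable from the text above; the following is one hypothetical construction, in our own notation, that has all three properties.

```latex
% A point mass at m mixed with a widening Gaussian (our construction, not necessarily the paper's):
\[
  P_n = \Bigl(1-\tfrac{1}{n}\Bigr)\delta_m + \tfrac{1}{n}\,\mathcal{N}\bigl(m,\, n\sigma_P^2\bigr),
  \qquad
  Q_n = \Bigl(1-\tfrac{1}{n}\Bigr)\delta_m + \tfrac{1}{n}\,\mathcal{N}\bigl(m,\, n\sigma_Q^2\bigr).
\]
% Both means equal m and the variances equal \sigma_P^2 and \sigma_Q^2 for every n, while the joint
% convexity of the relative entropy gives
\[
  D(P_n \,\|\, Q_n)
  \;\le\; \tfrac{1}{n}\, D\bigl(\mathcal{N}(m, n\sigma_P^2)\,\big\|\,\mathcal{N}(m, n\sigma_Q^2)\bigr)
  \;=\; \tfrac{1}{n}\Bigl(\ln\tfrac{\sigma_Q}{\sigma_P} + \tfrac{\sigma_P^2}{2\sigma_Q^2} - \tfrac{1}{2}\Bigr)
  \;\xrightarrow[\,n\to\infty\,]{}\; 0.
\]
```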
3.3. Monotonic Sequences of f-Divergences and an Extension of Theorem 1
- (a) is a non-increasing (and non-negative) sequence of f-divergences.
- (b) For all and ,
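The sequences studied in this subsection are built from f-divergences. As a reminder of the underlying functional only (the function name, the choices of f, and the toy distributions are ours), a finite-alphabet f-divergence can be evaluated as below; the choices f(t) = t log t and f(t) = (t − 1)² recover the two divergences this paper relates.

```python
import numpy as np

def f_divergence(p, q, f):
    """Finite-alphabet f-divergence D_f(P||Q) = sum_x q(x) f(p(x)/q(x)),
    for convex f with f(1) = 0; for simplicity, p and q are assumed strictly positive."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(f_divergence(p, q, lambda t: t * np.log(t)))   # f(t) = t log t    -> relative entropy
print(f_divergence(p, q, lambda t: (t - 1.0) ** 2))  # f(t) = (t - 1)^2  -> chi-squared divergence
```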
3.4. On Probabilities and f-Divergences
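The statements of Section 3.4 are not reproduced above. As a hedged illustration of one standard way probabilities of events enter f-divergence bounds (not necessarily the result proved here), the data-processing inequality applied to the indicator of an event A gives d(P(A)‖Q(A)) ≤ D(P‖Q) for the binary relative entropy d; the sketch below checks this numerically.

```python
import numpy as np

def kl(p, q):
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def binary_kl(a, b):
    """Relative entropy between Bernoulli(a) and Bernoulli(b), in nats."""
    return kl(np.array([a, 1.0 - a]), np.array([b, 1.0 - b]))

rng = np.random.default_rng(2)
p = rng.dirichlet(np.ones(6))
q = rng.dirichlet(np.ones(6))
A = np.array([True, False, True, False, False, True])   # an arbitrary event A

# Data-processing inequality for the deterministic map x -> 1{x in A}:
assert binary_kl(p[A].sum(), q[A].sum()) <= kl(p, q) + 1e-9
print("binary divergence of the event probabilities is bounded by D(P||Q)")
```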
4. Applications
4.1. Application of Corollary 3: Shannon Code for Universal Lossless Compression
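This subsection applies Corollary 3 in the context of Shannon codes for universal lossless compression. Since its body is not reproduced above, the sketch below only recalls the standard mismatch bound that such applications typically build on (it is not Corollary 3 itself): a Shannon code designed for Q, used on symbols drawn from P, has an expected length between H(P) + D(P‖Q) and H(P) + D(P‖Q) + 1 bits.

```python
import numpy as np

# Standard background fact (not the paper's Corollary 3): with lengths l(x) = ceil(log2(1/Q(x)))
# and X ~ P, one has  H(P) + D(P||Q) <= E_P[l(X)] < H(P) + D(P||Q) + 1  (all in bits).
p = np.array([0.5, 0.25, 0.125, 0.125])   # true distribution P (toy example of ours)
q = np.array([0.25, 0.25, 0.25, 0.25])    # mismatched design distribution Q

lengths = np.ceil(np.log2(1.0 / q))       # Shannon code lengths designed for Q
avg_len = float(np.sum(p * lengths))      # expected length under P
H_p     = float(-np.sum(p * np.log2(p)))  # entropy of P in bits
D_pq    = float(np.sum(p * np.log2(p / q)))  # relative entropy in bits

print(H_p + D_pq <= avg_len < H_p + D_pq + 1)   # True for this example
```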
4.2. Application of Theorem 2 in the Context of the Method of Types and Large Deviations Theory
- (i) If , then it follows from (80) that .
- (ii) Consider a richer alphabet size of the i.i.d. samples where, e.g., . By relying on the same universal lower bound , which holds independently of the value of ( can possibly be an infinite set), it follows from (80) that is the minimal value such that the upper bound in (78) does not exceed . (A toy calculation in this spirit appears after this list.)
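Equations (78) and (80) are not reproduced above. As a stand-in with the same flavor (all numbers below are placeholders of ours, and the bound used is the textbook method-of-types estimate (n+1)^|X| e^(−nD), not necessarily the paper's (78)), the sketch searches for the smallest sample size n driving that bound below a target, for a binary versus a richer alphabet.

```python
import math

def min_samples(alphabet_size, divergence_gap, eps):
    """Smallest n with (n+1)**|X| * exp(-n * D) <= eps, i.e. the textbook method-of-types
    upper bound on the probability that the empirical distribution deviates from the true
    one by at least D in relative entropy."""
    n = 1
    while (n + 1) ** alphabet_size * math.exp(-n * divergence_gap) > eps:
        n += 1
    return n

# Toy numbers (ours, not the paper's): |X| = 2 vs. a richer alphabet |X| = 10,
# a divergence gap of 0.1 nats, and a target probability of 0.001.
print(min_samples(2, 0.1, 1e-3))    # binary alphabet
print(min_samples(10, 0.1, 1e-3))   # richer alphabet requires more samples
```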
- (a) The relative entropy between real-valued Gaussian distributions is given by the standard closed form (evaluated with placeholder parameters in the sketch following this list).
- (b) Let denote a random variable which is exponentially distributed with mean ; its probability density function is given by . Then, for and , . In this case, the means under P and Q are and , respectively, and the variances are and . Hence, for obtaining the required means and variances, set
- (i) If , then the lower bound in (40) is equal to 0.521 nats, and the two relative entropies in (82) and (84) are equal to 0.625 and 1.118 nats, respectively.
- (ii) If , then the lower bound in (40) is equal to 2.332 nats, and the two relative entropies in (82) and (84) are equal to 5.722 and 3.701 nats, respectively.
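The parameters behind cases (i) and (ii) are not recoverable from the text above. The sketch below therefore only evaluates the standard closed forms for the two families in items (a) and (b), with placeholder parameters chosen so that the Gaussian pair and the exponential pair share the same means and variances; the printed values are not the 0.521, 0.625, 1.118, 2.332, 5.722, or 3.701 nats quoted above.

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """D(N(m1, s1^2) || N(m2, s2^2)) in nats (standard closed form)."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

def kl_exp(mean_p, mean_q):
    """D(P||Q) in nats for exponential P, Q with the given means (standard closed form)."""
    return math.log(mean_q / mean_p) + mean_p / mean_q - 1.0

# Placeholder parameters (ours, not the paper's). An exponential with mean m has variance m^2,
# so the matching Gaussian pair uses standard deviations equal to the means.
m_p, m_q = 1.0, 2.0                   # means under P and Q
print(kl_gauss(m_p, m_p, m_q, m_q))   # Gaussian case with the same means and variances
print(kl_exp(m_p, m_q))               # exponential case with the same means and variances
```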
4.3. Strong Data–Processing Inequalities and Maximal Correlation
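The statements of this subsection are not reproduced above. As background only, the following sketch (a toy construction of ours) computes the maximal correlation of a finite joint distribution as the second-largest singular value of the matrix with entries π(x, y)/√(Q_X(x) Q_Y(y)), and numerically checks the classical fact (Sarmanov [60]) that χ²(PW‖QW) ≤ ρ_max² · χ²(P‖Q) for every input distribution P, where ρ_max is the maximal correlation under the reference input Q.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.dirichlet(np.ones(4), size=3)   # random 3x4 channel: row W[x, :] is P(Y = y | X = x)
q = rng.dirichlet(np.ones(3))           # reference input distribution Q_X

joint = q[:, None] * W                  # joint distribution of (X, Y) when X ~ Q_X
qy = joint.sum(axis=0)                  # output marginal Q_Y = Q_X W

# Maximal correlation of (X, Y): second-largest singular value of B[x, y] = joint / sqrt(q(x) qy(y)).
B = joint / np.sqrt(np.outer(q, qy))
rho_max = np.linalg.svd(B, compute_uv=False)[1]   # the largest singular value equals 1

def chi2(p, r):
    return float(np.sum((p - r) ** 2 / r))

# Check the chi-squared strong data-processing inequality with constant rho_max**2 on random inputs.
for _ in range(1000):
    p = rng.dirichlet(np.ones(3))
    assert chi2(p @ W, qy) <= rho_max ** 2 * chi2(p, q) + 1e-9
print("chi-squared SDPI with constant rho_max^2 verified numerically")
```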
5. Proofs
5.1. Proof of Theorem 1
5.2. Proof of Proposition 1
- (a) We need the weaker inequality , proved by the Cauchy–Schwarz inequality. By combining (24) and (139)–(141), it follows that
- (b) Proof of (30) and its local tightness: We next show the local tightness of inequality (30) by proving that (31) yields (32). Let be a sequence of probability measures, defined on a measurable space , and assume that converges to a probability measure P in the sense that (31) holds. In view of [16] [Theorem 7] (see also [15] [Section 4.F] and [63]), it follows that
- (c) Proof of (33) and (34): The proof of (33) relies on (28) and the following lemma. Lemma 1. For all , . Proof. From (28) and (155), for all , . This proves (33). Furthermore, under the assumption in (31), for all ,
- (d) Proof of (35): From (24), we get (169). Referring to the integrand of the first term on the right side of (169), for all , . From (170)–(174), an upper bound on the right side of (169) results. This gives . It should be noted that [15] [Theorem 2(a)] shows that inequality (35) is tight. To that end, let , and define probability measures and on the set with and . Then,
5.3. Proof of Theorem 2
- (i) If , then implies that , and ;
- (ii) If , then implies that , and .
5.4. Proof of Theorem 3
5.5. Proof of Theorem 4
5.6. Proof of Corollary 5
5.7. Proof of Theorem 5 and Corollary 6
5.8. Proof of Theorem 6
5.9. Proof of Corollary 7
5.10. Proof of Proposition 3
5.11. Proof of Proposition 4
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
- Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1900, 50, 157–175.
- Csiszár, I.; Shields, P.C. Information Theory and Statistics: A Tutorial. Found. Trends Commun. Inf. Theory 2004, 1, 417–528.
- Ali, S.M.; Silvey, S.D. A general class of coefficients of divergence of one distribution from another. J. R. Stat. Soc. 1966, 28, 131–142.
- Csiszár, I. Eine Informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publ. Math. Inst. Hungar. Acad. Sci. 1963, 8, 85–108.
- Csiszár, I. Information-type measures of difference of probability distributions and indirect observations. Stud. Sci. Math. Hung. 1967, 2, 299–318.
- Csiszár, I. On topological properties of f-divergences. Stud. Sci. Math. Hung. 1967, 2, 329–339.
- Csiszár, I. A class of measures of informativity of observation channels. Period. Math. Hung. 1972, 2, 191–213.
- Van Erven, T.; Harremoës, P. Rényi divergence and Kullback–Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820.
- Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics; University of California Press: Berkeley, CA, USA, 1961; pp. 547–561.
- Liese, F.; Vajda, I. Convex Statistical Distances; Teubner-Texte Zur Mathematik: Leipzig, Germany, 1987.
- Liese, F.; Vajda, I. On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory 2006, 52, 4394–4412.
- DeGroot, M.H. Uncertainty, information and sequential experiments. Ann. Math. Stat. 1962, 33, 404–419.
- Polyanskiy, Y.; Poor, H.V.; Verdú, S. Channel coding rate in the finite blocklength regime. IEEE Trans. Inf. Theory 2010, 56, 2307–2359.
- Sason, I.; Verdú, S. f-divergence inequalities. IEEE Trans. Inf. Theory 2016, 62, 5973–6006.
- Sason, I. On f-divergences: Integral representations, local behavior, and inequalities. Entropy 2018, 20, 383.
- Melbourne, J.; Madiman, M.; Salapaka, M.V. Relationships between certain f-divergences. In Proceedings of the 57th Annual Allerton Conference on Communication, Control and Computing, Urbana, IL, USA, 24–27 September 2019; pp. 1068–1073.
- Melbourne, J.; Talukdar, S.; Bhaban, S.; Madiman, M.; Salapaka, M.V. The Differential Entropy of Mixtures: New Bounds and Applications. Available online: https://arxiv.org/pdf/1805.11257.pdf (accessed on 22 April 2020).
- Audenaert, K.M.R. Quantum skew divergence. J. Math. Phys. 2014, 55, 112202.
- Gibbs, A.L.; Su, F.E. On choosing and bounding probability metrics. Int. Stat. Rev. 2002, 70, 419–435.
- Györfi, L.; Vajda, I. A class of modified Pearson and Neyman statistics. Stat. Decis. 2001, 19, 239–251.
- Le Cam, L. Asymptotic Methods in Statistical Decision Theory; Series in Statistics; Springer: New York, NY, USA, 1986.
- Vincze, I. On the concept and measure of information contained in an observation. In Contributions to Probability; Gani, J., Rohatgi, V.K., Eds.; Academic Press: New York, NY, USA, 1981; pp. 207–214.
- Nishiyama, T. A New Lower Bound for Kullback–Leibler Divergence Based on Hammersley-Chapman-Robbins Bound. Available online: https://arxiv.org/abs/1907.00288v3 (accessed on 2 November 2019).
- Sason, I. On Csiszár’s f-divergences and informativities with applications. In Workshop on Channels, Statistics, Information, Secrecy and Randomness for the 80th birthday of I. Csiszár; The Rényi Institute of Mathematics, Hungarian Academy of Sciences: Budapest, Hungary, 2018.
- Makur, A.; Polyanskiy, Y. Comparison of channels: Criteria for domination by a symmetric channel. IEEE Trans. Inf. Theory 2018, 64, 5704–5725.
- Simic, S. On a new moments inequality. Stat. Probab. Lett. 2008, 78, 2671–2678.
- Chapman, D.G.; Robbins, H. Minimum variance estimation without regularity assumptions. Ann. Math. Stat. 1951, 22, 581–586.
- Hammersley, J.M. On estimating restricted parameters. J. R. Stat. Soc. Ser. B 1950, 12, 192–240.
- Verdú, S. Information Theory, in preparation.
- Wang, L.; Madiman, M. Beyond the entropy power inequality, via rearrangements. IEEE Trans. Inf. Theory 2014, 60, 5116–5137.
- Lewin, L. Polylogarithms and Associated Functions; Elsevier North Holland: Amsterdam, The Netherlands, 1981.
- Marton, K. Bounding d¯-distance by informational divergence: A method to prove measure concentration. Ann. Probab. 1996, 24, 857–866.
- Marton, K. Distance-divergence inequalities. IEEE Inf. Theory Soc. Newsl. 2014, 64, 9–13.
- Boucheron, S.; Lugosi, G.; Massart, P. Concentration Inequalities: A Nonasymptotic Theory of Independence; Oxford University Press: Oxford, UK, 2013.
- Raginsky, M.; Sason, I. Concentration of Measure Inequalities in Information Theory, Communications and Coding: Third Edition. In Foundations and Trends in Communications and Information Theory; NOW Publishers: Boston, MA, USA; Delft, The Netherlands, 2018.
- Csiszár, I. Sanov property, generalized I-projection and a conditional limit theorem. Ann. Probab. 1984, 12, 768–793.
- Clarke, B.S.; Barron, A.R. Information-theoretic asymptotics of Bayes methods. IEEE Trans. Inf. Theory 1990, 36, 453–471.
- Evans, R.J.; Boersma, J.; Blachman, N.M.; Jagers, A.A. The entropy of a Poisson distribution. SIAM Rev. 1988, 30, 314–317.
- Knessl, C. Integral representations and asymptotic expansions for Shannon and Rényi entropies. Appl. Math. Lett. 1998, 11, 69–74.
- Merhav, N.; Sason, I. An integral representation of the logarithmic function with applications in information theory. Entropy 2020, 22, 51.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006.
- Csiszár, I. The method of types. IEEE Trans. Inf. Theory 1998, 44, 2505–2523.
- Corless, R.M.; Gonnet, G.H.; Hare, D.E.G.; Jeffrey, D.J.; Knuth, D.E. On the Lambert W function. Adv. Comput. Math. 1996, 5, 329–359.
- Tamm, U. Some reflections about the Lambert W function as inverse of x·log(x). In Proceedings of the 2014 IEEE Information Theory and Applications Workshop, San Diego, CA, USA, 9–14 February 2014.
- Cohen, J.E.; Kemperman, J.H.B.; Zbăganu, G. Comparison of Stochastic Matrices with Applications in Information Theory, Statistics, Economics and Population Sciences; Birkhäuser: Boston, MA, USA, 1998.
- Cohen, J.E.; Iwasa, Y.; Rautu, G.; Ruskai, M.B.; Seneta, E.; Zbăganu, G. Relative entropy under mappings by stochastic matrices. Linear Algebra Its Appl. 1993, 179, 211–235.
- Makur, A.; Zheng, L. Bounds between contraction coefficients. In Proceedings of the 53rd Annual Allerton Conference on Communication, Control and Computing, Urbana, IL, USA, 29 September–2 October 2015; pp. 1422–1429.
- Makur, A. Information Contraction and Decomposition. Ph.D. Thesis, MIT, Cambridge, MA, USA, May 2019.
- Polyanskiy, Y.; Wu, Y. Strong data processing inequalities for channels and Bayesian networks. In Convexity and Concentration; The IMA Volumes in Mathematics and its Applications; Carlen, E., Madiman, M., Werner, E.M., Eds.; Springer: New York, NY, USA, 2017; Volume 161, pp. 211–249.
- Raginsky, M. Strong data processing inequalities and Φ-Sobolev inequalities for discrete channels. IEEE Trans. Inf. Theory 2016, 62, 3355–3389.
- Sason, I. On data-processing and majorization inequalities for f-divergences with applications. Entropy 2019, 21, 1022.
- Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd ed.; Cambridge University Press: Cambridge, UK, 2011.
- Burbea, J.; Rao, C.R. On the convexity of some divergence measures based on entropy functions. IEEE Trans. Inf. Theory 1982, 28, 489–495.
- Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151.
- Menéndez, M.L.; Pardo, J.A.; Pardo, L.; Pardo, M.C. The Jensen–Shannon divergence. J. Frankl. Inst. 1997, 334, 307–318.
- Topsøe, F. Some inequalities for information divergence and related measures of discrimination. IEEE Trans. Inf. Theory 2000, 46, 1602–1609.
- Nielsen, F. On a generalization of the Jensen–Shannon divergence and the Jensen–Shannon centroids. Entropy 2020, 22, 221.
- Asadi, M.; Ebrahimi, N.; Kharazmi, O.; Soofi, E.S. Mixture models, Bayes Fisher information, and divergence measures. IEEE Trans. Inf. Theory 2019, 65, 2316–2321.
- Sarmanov, O.V. Maximum correlation coefficient (non-symmetric case). Sel. Transl. Math. Stat. Probab. 1962, 2, 207–210. (In Russian)
- Gilardoni, G.L. Corrigendum to the note on the minimum f-divergence for given total variation. Comptes Rendus Math. 2010, 348, 299.
- Reid, M.D.; Williamson, R.C. Information, divergence and risk for binary experiments. J. Mach. Learn. Res. 2011, 12, 731–817.
- Pardo, M.C.; Vajda, I. On asymptotic properties of information-theoretic divergences. IEEE Trans. Inf. Theory 2003, 49, 1860–1868.