Bounds on the Excess Minimum Risk via Generalized Information Divergence Measures
Abstract
1. Introduction
- We extend the bound in [14] by introducing a family of bounds based on generalized information divergence measures, namely, the Rényi divergence, the α-Jensen–Shannon divergence, and the Sibson mutual information, parameterized by the order α (one common form of each measure is recalled right after this list). Unlike [11] and [17], where the sub-Gaussian parameter is assumed to be constant, our setup allows this parameter to depend on the (target) random vector being estimated. This makes our bounds applicable to a broader class of joint distributions over the random vectors involved.
- For the Rényi divergence-based bounds, we adopt an approach similar to that of [11], deriving upper bounds by making use of the variational representation of the Rényi divergence.
- We provide simple conditions under which the α-Jensen–Shannon divergence bound is tighter than the other two bounds for bounded loss functions.
- We compare the bounds based on the aforementioned information divergence measures with mutual information-based bounds by providing numerical examples.
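For orientation, the three order-α measures named in the first item above have standard definitions in the cited literature (Rényi for the Rényi divergence; Lin and Nielsen for the skewed Jensen–Shannon divergence; Sibson and Verdú for the α-mutual information). The sketch below records one common convention for each; the exact forms and admissible ranges of α used in the paper may differ slightly.

```latex
% A sketch of one common convention for the order-\alpha measures; the paper's
% exact definitions may differ in normalization and admissible range of \alpha.

% Renyi divergence of order \alpha \in (0,1) \cup (1,\infty); it recovers the
% Kullback--Leibler divergence as \alpha \to 1:
D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1}
  \log \int \left( \frac{\mathrm{d}P}{\mathrm{d}Q} \right)^{\!\alpha} \mathrm{d}Q .

% Skewed (\alpha-)Jensen--Shannon divergence, \alpha \in (0,1), with mixture
% M_\alpha = \alpha P + (1-\alpha) Q; \alpha = 1/2 gives the usual JS divergence:
\mathrm{JS}_\alpha(P \,\|\, Q) = \alpha\, D_{\mathrm{KL}}(P \,\|\, M_\alpha)
  + (1 - \alpha)\, D_{\mathrm{KL}}(Q \,\|\, M_\alpha) .

% Sibson mutual information of order \alpha, defined through the Renyi divergence;
% it recovers Shannon's mutual information as \alpha \to 1:
I_\alpha(X; Y) = \min_{Q_Y} D_\alpha\!\left( P_{XY} \,\middle\|\, P_X \times Q_Y \right) .
```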
2. Preliminaries
2.1. Problem Setup
2.2. Definitions
3. Bounding Excess Minimum Risk
3.1. Rényi Divergence-Based Upper Bound
3.2. α-Jensen–Shannon Divergence-Based Upper Bound
3.3. Sibson Mutual Information-Based Upper Bound
3.4. Comparison of Proposed Upper Bounds
4. Numerical Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Appendix B
References
- Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, Berkeley, CA, USA, 20 June–30 July 1960; University of California Press: Berkeley, CA, USA, 1961; Volume 4, pp. 547–562.
- Nielsen, F. On a generalization of the Jensen–Shannon divergence and the Jensen–Shannon centroid. Entropy 2020, 22, 221.
- Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151.
- Sibson, R. Information radius. Z. Wahrscheinlichkeitstheorie Verwandte Geb. 1969, 14, 149–160.
- Csiszár, I. Generalized cutoff rates and Rényi's information measures. IEEE Trans. Inf. Theory 1995, 41, 26–34.
- Xu, A.; Raginsky, M. Information-theoretic analysis of generalization capability of learning algorithms. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Bu, Y.; Zou, S.; Veeravalli, V.V. Tightening mutual information-based bounds on generalization error. IEEE J. Sel. Areas Inf. Theory 2020, 1, 121–130.
- Esposito, A.R.; Gastpar, M.; Issa, I. Robust generalization via f-mutual information. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020; pp. 2723–2728.
- Esposito, A.R.; Gastpar, M.; Issa, I. Variational characterizations of Sibson's α-mutual information. In Proceedings of the 2024 IEEE International Symposium on Information Theory (ISIT), Athens, Greece, 7–12 July 2024; pp. 2110–2115.
- Esposito, A.R.; Gastpar, M.; Issa, I. Generalization error bounds via Rényi-, f-divergences and maximal leakage. IEEE Trans. Inf. Theory 2021, 67, 4986–5004.
- Modak, E.; Asnani, H.; Prabhakaran, V.M. Rényi divergence based bounds on generalization error. In Proceedings of the 2021 IEEE Information Theory Workshop (ITW), Kanazawa, Japan, 17–21 October 2021; pp. 1–6.
- Ji, K.; Zhou, Y.; Liang, Y. Understanding estimation and generalization error of generative adversarial networks. IEEE Trans. Inf. Theory 2021, 67, 3114–3129.
- Xu, A.; Raginsky, M. Minimum excess risk in Bayesian learning. IEEE Trans. Inf. Theory 2022, 68, 7935–7955.
- Györfi, L.; Linder, T.; Walk, H. Lossless transformations and excess risk bounds in statistical inference. Entropy 2023, 25, 1394.
- Hafez-Kolahi, H.; Moniri, B.; Kasaei, S. Information-theoretic analysis of minimax excess risk. IEEE Trans. Inf. Theory 2023, 69, 4659–4674.
- Aminian, G.; Bu, Y.; Toni, L.; Rodrigues, M.R.; Wornell, G.W. Information-theoretic characterizations of generalization error for the Gibbs algorithm. IEEE Trans. Inf. Theory 2023, 70, 632–655.
- Aminian, G.; Masiha, S.; Toni, L.; Rodrigues, M.R. Learning algorithm generalization error bounds via auxiliary distributions. IEEE J. Sel. Areas Inf. Theory 2024, 5, 273–284.
- Birrell, J.; Dupuis, P.; Katsoulakis, M.A.; Rey-Bellet, L.; Wang, J. Variational representations and neural network estimation of Rényi divergences. SIAM J. Math. Data Sci. 2021, 3, 1093–1116.
- Atar, R.; Chowdhary, K.; Dupuis, P. Robust bounds on risk-sensitive functionals via Rényi divergence. SIAM/ASA J. Uncertain. Quantif. 2015, 3, 18–33.
- Anantharam, V. A variational characterization of Rényi divergences. IEEE Trans. Inf. Theory 2018, 64, 6979–6989.
- Csiszár, I. Information-type measures of difference of probability distributions and indirect observations. Stud. Sci. Math. Hung. 1967, 2, 299–318.
- Huang, Y.; Xiao, F.; Cao, Z.; Lin, C.T. Fractal belief Rényi divergence with its applications in pattern classification. IEEE Trans. Knowl. Data Eng. 2024, 36, 8297–8312.
- Huang, Y.; Xiao, F.; Cao, Z.; Lin, C.T. Higher order fractal belief Rényi divergence with its applications in pattern classification. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 14709–14726.
- Zhang, L.; Xiao, F. Belief Rényi divergence of divergence and its application in time series classification. IEEE Trans. Knowl. Data Eng. 2024, 36, 3670–3681.
- McAllester, D.A. PAC-Bayesian model averaging. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (COLT), Santa Cruz, CA, USA, 7–9 July 1999; pp. 164–170.
- Alquier, P.; Ridgway, J.; Chopin, N. Properties of variational approximations of Gibbs posteriors. J. Mach. Learn. Res. 2016, 17, 8372–8414.
- Lopez, A.; Jog, V. Generalization bounds via Wasserstein distance based algorithmic stability. In Proceedings of the 37th International Conference on Machine Learning (ICML), Virtual, 13–18 July 2020; pp. 6326–6335.
- Esposito, A.R.; Gastpar, M. From generalisation error to transportation-cost inequalities and back. In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; pp. 294–299.
- Lugosi, G.; Neu, G. Generalization bounds via convex analysis. In Proceedings of Machine Learning Research (PMLR), London, UK, 2–5 July 2022; pp. 1–23.
- Welfert, M.; Kurri, G.R.; Otstot, K.; Sankar, L. Addressing GAN training instabilities via tunable classification losses. IEEE J. Sel. Areas Inf. Theory 2024, 5, 534–553.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada, 8–13 December 2014; Volume 27.
- Esposito, A.R.; Vandenbroucque, A.; Gastpar, M. Lower bounds on the Bayesian risk via information measures. J. Mach. Learn. Res. 2024, 25, 1–45.
- Donsker, M.; Varadhan, S. Asymptotic evaluation of certain Markov process expectations for large time. IV. Commun. Pure Appl. Math. 1983, 36, 183–212.
- Van Erven, T.; Harremoës, P. Rényi divergence and Kullback–Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820.
- Verdú, S. α-mutual information. In Proceedings of the 2015 Information Theory and Applications Workshop (ITA), San Diego, CA, USA, 1–6 February 2015; pp. 1–6.
- Esposito, A.R.; Gastpar, M.; Issa, I. Sibson's α-mutual information and its variational representations. arXiv 2024, arXiv:2405.08352.
- Hoeffding, W. Probability inequalities for sums of bounded random variables. In The Collected Works of Wassily Hoeffding; Springer: New York, NY, USA, 1994; pp. 409–426.
- Palomar, D.P.; Verdú, S. Lautum information. IEEE Trans. Inf. Theory 2008, 54, 964–975.
- Buldygin, V.V.; Kozachenko, Y.V. Sub-Gaussian random variables. Ukr. Math. J. 1980, 32, 483–489.
- Rivasplata, O. Subgaussian Random Variables: An Expository Note. 2012. Available online: https://www.stat.cmu.edu/~arinaldo/36788/subgaussians.pdf (accessed on 24 May 2025).
- Endres, D.M.; Schindelin, J.E. A new metric for probability distributions. IEEE Trans. Inf. Theory 2003, 49, 1858–1860.
- Omanwar, A.; Alajaji, F.; Linder, T. Bounding excess minimum risk via Rényi's divergence. In Proceedings of the 2024 International Symposium on Information Theory and Its Applications (ISITA), Taipei, Taiwan, 10–13 November 2024; pp. 59–63.