Two Types of Geometric Jensen–Shannon Divergences
Abstract
1. Introduction
1.1. Kullback–Leibler and Jensen–Shannon Divergences
1.2. Jensen–Shannon Symmetrization of Dissimilarities with Generalized Mixtures
1.3. Paper Outline
2. A Novel Definition: The G-JSD, Extended to Positive Measures
2.1. Definition and Properties
- First, we replace the KLD with the extended KLD between positive densities $\tilde{p}$ and $\tilde{q}$ instead of normalized densities $p$ and $q$: $D_{\mathrm{KL}}^{+}[\tilde{p}:\tilde{q}]=\int\left(\tilde{p}\log\frac{\tilde{p}}{\tilde{q}}+\tilde{q}-\tilde{p}\right)\mathrm{d}\mu$ (with $D_{\mathrm{KL}}^{+}[p:q]=D_{\mathrm{KL}}[p:q]$ when both densities are normalized);
- Second, we consider unnormalized M-mixture densities: the pointwise mean $M(p(x),q(x))$ of the two densities is used as is, without dividing by its normalizer (a minimal numerical sketch follows this list).
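To make these two ingredients concrete, here is a minimal numerical sketch in Python on a 1D grid (an illustration, not the paper's reference implementation; the function names, the grid discretization, and the choice of the geometric mean as the M-mixture are assumptions of the example). It evaluates the extended KLD between positive densities and compares the extended G-JSD, built on the unnormalized geometric mixture, with the normalized G-JSD, built on the same mixture divided by its normalizer.

```python
import numpy as np

def extended_kl(p_tilde, q_tilde, dx):
    """Extended KLD between positive (possibly unnormalized) densities on a uniform grid:
    integral of p~ log(p~/q~) + q~ - p~."""
    integrand = p_tilde * np.log(p_tilde / q_tilde) + q_tilde - p_tilde
    return np.sum(integrand) * dx

def geometric_mixture(p, q, alpha=0.5):
    """Unnormalized alpha-skewed geometric mixture p^(1-alpha) * q^alpha."""
    return p ** (1.0 - alpha) * q ** alpha

# Two normalized Gaussian densities on a grid.
x = np.linspace(-12.0, 12.0, 24001)
dx = x[1] - x[0]
p = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2.0 * np.pi)
q = np.exp(-0.5 * (x + 1.0) ** 2) / np.sqrt(2.0 * np.pi)

g_tilde = geometric_mixture(p, q)     # unnormalized geometric mixture
Z = np.sum(g_tilde) * dx              # its normalizer (here, the Bhattacharyya coefficient)
g = g_tilde / Z                       # normalized geometric mixture

# Extended G-JSD uses the unnormalized mixture; normalized G-JSD uses g.
extended_gjsd = 0.5 * (extended_kl(p, g_tilde, dx) + extended_kl(q, g_tilde, dx))
normalized_gjsd = 0.5 * (extended_kl(p, g, dx) + extended_kl(q, g, dx))
print(extended_gjsd, normalized_gjsd, Z)
```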
2.2. Power JSDs and (Extended) Min-JSD and Max-JSD
- When $\alpha\to+\infty$ (respectively, $\alpha\to-\infty$), the power mean $P_{\alpha}(a,b)=\left(\frac{a^{\alpha}+b^{\alpha}}{2}\right)^{1/\alpha}$ tends to $\max(a,b)$ (respectively, $\min(a,b)$), yielding the max-JSD and the min-JSD;
- In these limit cases, we let $m_{\max}(x)=\max(p(x),q(x))$ and $m_{\min}(x)=\min(p(x),q(x))$, where $\max(a,b)=\frac{a+b+|a-b|}{2}$ and $\min(a,b)=\frac{a+b-|a-b|}{2}$. The total variation is given by $D_{\mathrm{TV}}[p,q]=\frac{1}{2}\int|p(x)-q(x)|\,\mathrm{d}\mu(x)$ (a numerical check follows this list).
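The following short Python check (a sketch with assumed function names, on two 1D Gaussians discretized on a grid) illustrates these limit cases: the min- and max-mixtures are unnormalized, and their masses equal $1-D_{\mathrm{TV}}$ and $1+D_{\mathrm{TV}}$ respectively, since $\int(\max(p,q)-\min(p,q))\,\mathrm{d}\mu=\int|p-q|\,\mathrm{d}\mu=2D_{\mathrm{TV}}$.

```python
import numpy as np

def power_mean(a, b, alpha):
    """Symmetric power mean of order alpha; alpha -> 0 gives the geometric mean,
    alpha -> -inf the minimum, alpha -> +inf the maximum."""
    if alpha == 0.0:
        return np.sqrt(a * b)
    return (0.5 * (a ** alpha + b ** alpha)) ** (1.0 / alpha)

def total_variation(p, q, dx):
    """Total variation distance (1/2) * integral of |p - q| on a uniform grid."""
    return 0.5 * np.sum(np.abs(p - q)) * dx

x = np.linspace(-12.0, 12.0, 24001)
dx = x[1] - x[0]
p = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2.0 * np.pi)
q = np.exp(-0.5 * (x + 1.0) ** 2) / np.sqrt(2.0 * np.pi)

m_min = np.minimum(p, q)   # limit mixture as alpha -> -inf
m_max = np.maximum(p, q)   # limit mixture as alpha -> +inf
tv = total_variation(p, q, dx)

# Masses of the extreme mixtures bracket 1 and differ by 2 * TV.
print(np.sum(m_min) * dx, 1.0 - tv)
print(np.sum(m_max) * dx, 1.0 + tv)
# A finite-order power-mean mixture has mass between these two extremes.
print(np.sum(power_mean(p, q, 0.0)) * dx)   # geometric-mixture mass (Bhattacharyya coefficient)
```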
3. Geometric JSDs Between Gaussian Distributions
3.1. Exponential Families
3.2. Closed-Form Formula for Gaussian Distributions
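Because Gaussians form an exponential family, the geometric mixture of two Gaussians is, after normalization, again a Gaussian with linearly interpolated precision matrix and precision-weighted mean, which is what makes a closed form possible. The sketch below computes a skewed geometric JSD between multivariate Gaussians through this route; the helper names and the particular weighting convention ($1-\alpha$ on the first argument, $\alpha$ on the second) are assumptions of the example and may differ from the paper's exact formula.

```python
import numpy as np

def kl_gaussian(mu0, S0, mu1, S1):
    """KL(N(mu0, S0) : N(mu1, S1)) between multivariate Gaussians (closed form)."""
    d = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    dm = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + dm @ S1_inv @ dm - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def geometric_gaussian(mu0, S0, mu1, S1, alpha=0.5):
    """Normalized alpha-geometric mixture of two Gaussians: a Gaussian whose
    precision matrix and precision-weighted mean are interpolated linearly."""
    P0, P1 = np.linalg.inv(S0), np.linalg.inv(S1)
    Pa = (1.0 - alpha) * P0 + alpha * P1
    Sa = np.linalg.inv(Pa)
    mua = Sa @ ((1.0 - alpha) * P0 @ mu0 + alpha * P1 @ mu1)
    return mua, Sa

def geometric_jsd_gaussian(mu0, S0, mu1, S1, alpha=0.5):
    """Skewed geometric JSD via KLDs to the normalized geometric mixture."""
    mua, Sa = geometric_gaussian(mu0, S0, mu1, S1, alpha)
    return ((1.0 - alpha) * kl_gaussian(mu0, S0, mua, Sa)
            + alpha * kl_gaussian(mu1, S1, mua, Sa))

mu0, S0 = np.array([0.0, 0.0]), np.eye(2)
mu1, S1 = np.array([1.0, -1.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
print(geometric_jsd_gaussian(mu0, S0, mu1, S1))
```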
4. The Extended and Normalized G-JSDs as Regularizations of the Ordinary JSD
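Under the extended definition sketched in Section 2.1, writing the unnormalized geometric mixture as $\tilde{g}=Z\,g$ with normalizer $Z$ and normalized mixture $g$, each extended KLD term expands as $D_{\mathrm{KL}}^{+}[p:\tilde{g}]=D_{\mathrm{KL}}[p:g]-\log Z+Z-1$, so the two G-JSDs differ only through $Z$ (this short derivation assumes that definition; the paper's statement may use different notation):

$$\widetilde{D}_{G\text{-}JS}[p:q]\;=\;D_{G\text{-}JS}[p:q]\;+\;\big(Z-1-\log Z\big).$$

Since $x-1\ge\log x$ for all $x>0$, the gap $Z-1-\log Z$ is nonnegative, so the extended G-JSD upper-bounds the normalized one, consistent with viewing one as a regularization of the other.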
5. Estimation and Approximation of the Extended and Normalized M-JSDs
5.1. Monte Carlo Estimators
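When the densities are not members of an exponential family, so no closed form is available, the divergences can be estimated by Monte Carlo sampling. The sketch below is one simple such estimator for the extended skewed G-JSD, written for densities whose log can be evaluated pointwise; the function names and the importance-sampling estimate of the mixture mass are assumptions of the example, not necessarily the estimator analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_extended_gjsd(log_p, log_q, sample_p, sample_q, n=200_000, alpha=0.5):
    """Monte Carlo estimate of the extended alpha-skewed geometric JSD.
    The unnormalized geometric mixture has log-density (1-alpha)*log_p + alpha*log_q;
    each extended KLD term needs E[log f - log g~] under f plus the mass of g~,
    which is estimated by importance sampling under f."""
    def kl_plus(x, log_f):
        lf = log_f(x)
        lg = (1.0 - alpha) * log_p(x) + alpha * log_q(x)
        return np.mean(lf - lg) + np.mean(np.exp(lg - lf)) - 1.0
    return ((1.0 - alpha) * kl_plus(sample_p(n), log_p)
            + alpha * kl_plus(sample_q(n), log_q))

# Sanity check on two 1D Gaussians (where a closed form also exists).
log_p = lambda x: -0.5 * (x - 1.0) ** 2 - 0.5 * np.log(2.0 * np.pi)
log_q = lambda x: -0.5 * (x + 1.0) ** 2 - 0.5 * np.log(2.0 * np.pi)
sample_p = lambda n: rng.normal(1.0, 1.0, size=n)
sample_q = lambda n: rng.normal(-1.0, 1.0, size=n)
print(mc_extended_gjsd(log_p, log_q, sample_p, sample_q))
```

Estimating the normalized M-JSD additionally requires a Monte Carlo estimate of the mixture normalizer inside the logarithm (cf. the Nomenclature), which is where the bias and variance considerations of this section enter.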
5.2. Approximations via γ-Divergences
6. Summary and Concluding Remarks
Funding
Acknowledgments
Conflicts of Interest
Nomenclature
Means:
- weighted scalar mean
- weighted quasi-arithmetic scalar mean (for a given generator)
- arithmetic mean
- weighted arithmetic mean
- weighted geometric mean
- geometric mean
- power mean of order α (α ≠ 0)
- weighted power mean

Densities on a measure space:
- normalized density
- unnormalized density
- density normalizer
- normalizer of the M-mixture
- Monte Carlo estimator of the M-mixture normalizer
- normalizer of the weighted M-mixture
- M-mixture
- weighted M-mixture

Dissimilarities, divergences, and distances:
- Kullback–Leibler divergence (KLD)
- extended Kullback–Leibler divergence
- reverse Kullback–Leibler divergence
- cross-entropy
- Shannon discrete or differential entropy
- Jeffreys divergence
- total variation distance
- Bhattacharyya “distance” (not a metric)
- α-skewed Bhattacharyya “distance”
- Chernoff information (Chernoff distance)
- Taneja T-divergence
- Ali–Silvey–Csiszár f-divergence
- arbitrary dissimilarity measure
- reverse dissimilarity measure
- extended dissimilarity measure
- projective dissimilarity measure
- γ-divergence
- Monte Carlo estimation of a dissimilarity

Jensen–Shannon divergences and generalizations:
- Jensen–Shannon divergence (JSD)
- weighted α-skewed mixture JSD
- M-JSD for M-mixtures
- geometric JSD
- extended geometric JSD
- left-sided geometric JSD (and its right-sided counterpart)
- min-JSD
- max-JSD
- gap between the extended and normalized M-JSDs