On the Jensen–Shannon Symmetrization of Distances Relying on Abstract Means
Abstract
1. Introduction and Motivations
1.1. Kullback–Leibler Divergence and Its Symmetrizations
1.2. Statistical Distances and Parameter Divergences
1.3. J-Symmetrization and JS-Symmetrization of Distances
1.4. Contributions and Paper Outline
2. Jensen–Shannon Divergence in Mixture and Exponential Families
3. Generalized Jensen–Shannon Divergences
Definitions
- the arithmetic mean $A_\alpha(x,y) = (1-\alpha) x + \alpha y$,
- the geometric mean $G_\alpha(x,y) = x^{1-\alpha} y^{\alpha}$, and
- the harmonic mean $H_\alpha(x,y) = \frac{x y}{(1-\alpha) y + \alpha x}$.
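For concreteness, here is a minimal Maxima sketch (in the style of Appendix B; the function names `A`, `G`, `H` are illustrative, not the paper's code) implementing the three weighted means and checking the classical inequality $A \geq G \geq H$ at $\alpha = \frac{1}{2}$:

```maxima
/* weighted arithmetic, geometric, and harmonic means, alpha in [0,1] */
A(x, y, alpha) := (1-alpha)*x + alpha*y;
G(x, y, alpha) := x^(1-alpha) * y^alpha;
H(x, y, alpha) := (x*y)/((1-alpha)*y + alpha*x);

/* sanity check at alpha = 1/2 with x = 1, y = 4: A >= G >= H */
[A(1, 4, 1/2), G(1, 4, 1/2), H(1, 4, 1/2)];  /* => [5/2, 2, 8/5] */
```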
4. Some Closed-Form Formulas for the M-Jensen–Shannon Divergences
- the geometric G-Jensen–Shannon divergence for exponential families (Section 4.1), and
- the harmonic H-Jensen–Shannon divergence for the family of Cauchy scale distributions (Section 4.2).
4.1. The Geometric G-Jensen–Shannon Divergence
4.1.1. Case Study: The Multivariate Gaussian Family
4.1.2. Applications to k-Means Clustering
4.2. The Harmonic Jensen–Shannon Divergence (H-JSD)
4.3. The M-Jensen–Shannon Matrix Distances
5. Conclusions and Perspectives
Funding
Conflicts of Interest
Appendix A. Summary of Distances and Their Notations
Name | Definition
---|---
Weighted mean | $M_\alpha(x,y)$, $\alpha \in [0,1]$
Arithmetic mean | $A_\alpha(x,y) = (1-\alpha) x + \alpha y$
Geometric mean | $G_\alpha(x,y) = x^{1-\alpha} y^{\alpha}$
Harmonic mean | $H_\alpha(x,y) = \frac{x y}{(1-\alpha) y + \alpha x}$
Power mean | $P^p_\alpha(x,y) = \left((1-\alpha) x^p + \alpha y^p\right)^{\frac{1}{p}}$, $p \neq 0$
Quasi-arithmetic mean | $M^f_\alpha(x,y) = f^{-1}\left((1-\alpha) f(x) + \alpha f(y)\right)$, $f$ strictly monotone
M-mixture | $(pq)^{M}_\alpha(x) = \frac{M_\alpha(p(x), q(x))}{Z^{M}_\alpha(p,q)}$ with $Z^{M}_\alpha(p,q) = \int M_\alpha(p(x), q(x))\, d\mu(x)$
Statistical distance | $D(p:q)$
Dual/reverse distance | $D^*(p:q) = D(q:p)$
Kullback-Leibler divergence | $\mathrm{KL}(p:q) = \int p(x) \log \frac{p(x)}{q(x)}\, d\mu(x)$
reverse Kullback-Leibler divergence | $\mathrm{KL}^*(p:q) = \mathrm{KL}(q:p)$
Jeffreys divergence | $J(p;q) = \mathrm{KL}(p:q) + \mathrm{KL}(q:p)$
Resistor divergence | $\frac{1}{R(p;q)} = \frac{1}{\mathrm{KL}(p:q)} + \frac{1}{\mathrm{KL}(q:p)}$
skew K-divergence | $K_\alpha(p:q) = \mathrm{KL}\left(p : (1-\alpha) p + \alpha q\right)$
Jensen-Shannon divergence | $\mathrm{JS}(p;q) = \frac{1}{2} \mathrm{KL}\left(p : \frac{p+q}{2}\right) + \frac{1}{2} \mathrm{KL}\left(q : \frac{p+q}{2}\right)$
skew Bhattacharyya divergence | $B_\alpha(p:q) = -\log \int p(x)^{1-\alpha} q(x)^{\alpha}\, d\mu(x)$
Hellinger distance | $D_H(p,q) = \sqrt{\frac{1}{2} \int \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 d\mu(x)}$
α-divergences | $I_\alpha(p:q) = \frac{1}{\alpha (1-\alpha)} \left(1 - \int p(x)^{1-\alpha} q(x)^{\alpha}\, d\mu(x)\right)$, $\alpha \neq 0, 1$
Mahalanobis distance | $\Delta_Q(\theta_1, \theta_2) = \sqrt{(\theta_1 - \theta_2)^\top Q\, (\theta_1 - \theta_2)}$ for a positive-definite matrix $Q \succ 0$
f-divergence | $I_f(p:q) = \int p(x)\, f\!\left(\frac{q(x)}{p(x)}\right) d\mu(x)$, with $f$ strictly convex at 1
reverse f-divergence | $I_f^*(p:q) = I_f(q:p) = I_{f^\diamond}(p:q)$ for $f^\diamond(u) = u f\!\left(\frac{1}{u}\right)$
J-symmetrized f-divergence | $I_f^{J_\alpha}(p:q) = (1-\alpha) I_f(p:q) + \alpha I_f(q:p)$
JS-symmetrized f-divergence | $I_f^{JS_\alpha}(p:q) = (1-\alpha) I_f\left(p:(pq)_\alpha\right) + \alpha I_f\left(q:(pq)_\alpha\right)$ for $(pq)_\alpha = (1-\alpha) p + \alpha q$
Parameter distance | $P(\theta_1:\theta_2)$
Bregman divergence | $B_F(\theta_1:\theta_2) = F(\theta_1) - F(\theta_2) - (\theta_1 - \theta_2)^\top \nabla F(\theta_2)$
skew Jeffreys-Bregman divergence | $J_{B_F}^{\alpha}(\theta_1:\theta_2) = (1-\alpha) B_F(\theta_1:\theta_2) + \alpha B_F(\theta_2:\theta_1)$
skew Jensen divergence | $J_{F,\alpha}(\theta_1:\theta_2) = (1-\alpha) F(\theta_1) + \alpha F(\theta_2) - F\left((1-\alpha)\theta_1 + \alpha \theta_2\right)$
Jensen-Bregman divergence | $JB_F(\theta_1;\theta_2) = \frac{1}{2} B_F\left(\theta_1:\frac{\theta_1+\theta_2}{2}\right) + \frac{1}{2} B_F\left(\theta_2:\frac{\theta_1+\theta_2}{2}\right) = J_{F,\frac{1}{2}}(\theta_1;\theta_2)$
Generalized Jensen-Shannon divergences | (the four skew constructions below)
skew J-symmetrization | $J_D^{\alpha}(p:q) = (1-\alpha) D(p:q) + \alpha D(q:p)$
skew JS-symmetrization | $JS_D^{\alpha}(p:q) = (1-\alpha) D\left(p:(pq)_\alpha\right) + \alpha D\left(q:(pq)_\alpha\right)$
skew M-Jensen-Shannon divergence | $JS^{M_\alpha}(p:q) = (1-\alpha) \mathrm{KL}\left(p:(pq)^{M}_\alpha\right) + \alpha\, \mathrm{KL}\left(q:(pq)^{M}_\alpha\right)$
skew M-JS-symmetrization | $JS_D^{M_\alpha}(p:q) = (1-\alpha) D\left(p:(pq)^{M}_\alpha\right) + \alpha D\left(q:(pq)^{M}_\alpha\right)$
N-Jeffreys divergence | $J^{N}(p;q) = N\left(\mathrm{KL}(p:q), \mathrm{KL}(q:p)\right)$
N-J D divergence | $J_D^{N}(p;q) = N\left(D(p:q), D(q:p)\right)$
skew (M,N)-D JS divergence | $JS_D^{M_\alpha, N_\beta}(p:q) = N_\beta\left(D\left(p:(pq)^{M}_\alpha\right), D\left(q:(pq)^{M}_\alpha\right)\right)$
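As a consistency check on the generalized definitions above (a short derivation added here for clarity, not part of the original appendix), instantiating the skew M-JS-symmetrization with $D = \mathrm{KL}$, the arithmetic mean $M = A$, and $\alpha = \frac{1}{2}$ recovers the classical Jensen-Shannon divergence, since the arithmetic mixture is already normalized ($Z^{A}_\alpha = 1$):

$$JS_{\mathrm{KL}}^{A_{1/2}}(p:q) = \frac{1}{2}\, \mathrm{KL}\!\left(p : \frac{p+q}{2}\right) + \frac{1}{2}\, \mathrm{KL}\!\left(q : \frac{p+q}{2}\right) = \mathrm{JS}(p;q).$$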
Appendix B. Symbolic Calculations in Maxima
```maxima
assume(gamma > 0);
Cauchy(x, gamma) := gamma/(%pi*(x**2 + gamma**2));
assume(alpha > 0);
assume(alpha < 1);
h(x, y, alpha) := (x*y)/((1-alpha)*y + alpha*x);
assume(gamma1 > 0);
assume(gamma2 > 0);
m(x, alpha) := ratsimp(h(Cauchy(x, gamma1), Cauchy(x, gamma2), alpha));
/* calculate Z */
integrate(m(x, alpha), x, -inf, inf);
```
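As an extra numeric sanity check (not part of the original script), the symbolic normalizer returned by the `integrate()` call above can be compared against adaptive quadrature for concrete parameter values; `quad_qagi` is Maxima's standard QUADPACK routine for integrals over infinite intervals:

```maxima
/* Numeric check of Z for gamma1 = 1, gamma2 = 2, alpha = 1/3:
   quad_qagi returns [value, abs. error, #evaluations, error code]. */
quad_qagi(subst([gamma1 = 1, gamma2 = 2], m(x, 1/3)), x, minf, inf);
```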
Summary of parametric families for which the M-Jensen–Shannon divergence admits a closed form:

Mean M | Parametric Family
---|---
arithmetic A | mixture family
geometric G | exponential family
harmonic H | Cauchy scale family
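For intuition on why the geometric row yields closed forms (a standard derivation sketched here under the natural-exponential-family assumption $p_\theta(x) = \exp\left(\theta^\top t(x) - F(\theta)\right)$), the weighted geometric mixture of two members stays in the family, and its normalizer is an exponentiated negative skew Jensen divergence:

$$(p_{\theta_1} p_{\theta_2})^{G}_\alpha = p_{(1-\alpha)\theta_1 + \alpha \theta_2}, \qquad Z^{G}_\alpha(p_{\theta_1}, p_{\theta_2}) = \int p_{\theta_1}^{1-\alpha} p_{\theta_2}^{\alpha}\, d\mu = \exp\left(-J_{F,\alpha}(\theta_1:\theta_2)\right),$$

with $J_{F,\alpha}$ the skew Jensen divergence of Appendix A.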
© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).