# Guaranteed Bounds on Information-Theoretic Measures of Univariate Mixtures Using Piecewise Log-Sum-Exp Inequalities


## Abstract


## 1. Introduction

#### 1.1. Prior Work

#### 1.2. Contributions

#### 1.3. Paper Outline

## 2. A Generic Combinatorial Bounding Algorithm Based on Density Envelopes

#### 2.1. Tighter Adaptive Bounds

#### 2.2. Another Derivation Using the Arithmetic-Geometric Mean Inequality

#### 2.3. Case Studies

#### 2.3.1. The Case of Exponential Mixture Models

#### 2.3.2. The Case of Rayleigh Mixture Models

#### 2.3.3. The Case of Gaussian Mixture Models

#### 2.3.4. The Case of Gamma Distributions

## 3. Upper-Bounding the Differential Entropy of a Mixture

**Lemma 1.**

**Lemma 2.**

## 4. Bounding the α-Divergence

#### 4.1. Basic Bounds

#### 4.2. Adaptive Bounds

#### 4.3. Variance-Reduced Bounds

## 5. Lower Bounds of the $\mathit{f}$-Divergence

## 6. Experiments

- ${\mathtt{EMM}}_{1}$’s components, in the form $({\lambda}_{i},{w}_{i})$, are given by $(0.1,1/3)$, $(0.5,1/3)$, $(1,1/3)$; ${\mathtt{EMM}}_{2}$’s components are $(2,0.2)$, $(10,0.4)$, $(20,0.4)$.
- ${\mathtt{RMM}}_{1}$’s components, in the form $({\sigma}_{i},{w}_{i})$, are given by $(0.5,1/3)$, $(2,1/3)$, $(10,1/3)$; ${\mathtt{RMM}}_{2}$ consists of $(5,0.25)$, $(60,0.25)$, $(100,0.5)$.
- ${\mathtt{GMM}}_{1}$’s components, in the form $({\mu}_{i},{\sigma}_{i},{w}_{i})$, are $(-5,1,0.05)$, $(-2,0.5,0.1)$, $(5,0.3,0.2)$, $(10,0.5,0.2)$, $(15,0.4,0.05)$, $(25,0.5,0.3)$, $(30,2,0.1)$; ${\mathtt{GMM}}_{2}$ consists of $(-16,0.5,0.1)$, $(-12,0.2,0.1)$, $(-8,0.5,0.1)$, $(-4,0.2,0.1)$, $(0,0.5,0.2)$, $(4,0.2,0.1)$, $(8,0.5,0.1)$, $(12,0.2,0.1)$, $(16,0.5,0.1)$.
- ${\mathtt{GaMM}}_{1}$’s components, in the form $({k}_{i},{\lambda}_{i},{w}_{i})$, are $(2,0.5,1/3)$, $(2,2,1/3)$, $(2,4,1/3)$; ${\mathtt{GaMM}}_{2}$ consists of $(2,5,1/3)$, $(2,8,1/3)$, $(2,10,1/3)$.
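
The parameter lists above can be encoded directly for experimentation. The following Python sketch is only an illustration under stated assumptions: it transcribes ${\mathtt{GMM}}_{1}$ and ${\mathtt{GMM}}_{2}$ and checks the elementary log-sum-exp sandwich $\max_i \log(w_i p_i(x)) \le \log m(x) \le \max_i \log(w_i p_i(x)) + \log k$ pointwise. It is not the piecewise bounding algorithm of the paper, and the function names (`log_weighted_components`, `log_density`, `lse_bounds`) are hypothetical, not taken from the authors' software.

```python
import math

# Components as (mu, sigma, weight), transcribed from the list above.
GMM1 = [(-5, 1, 0.05), (-2, 0.5, 0.1), (5, 0.3, 0.2), (10, 0.5, 0.2),
        (15, 0.4, 0.05), (25, 0.5, 0.3), (30, 2, 0.1)]
GMM2 = [(-16, 0.5, 0.1), (-12, 0.2, 0.1), (-8, 0.5, 0.1), (-4, 0.2, 0.1),
        (0, 0.5, 0.2), (4, 0.2, 0.1), (8, 0.5, 0.1), (12, 0.2, 0.1),
        (16, 0.5, 0.1)]


def log_weighted_components(gmm, x):
    """Return the list [log(w_i * p_i(x))] over all Gaussian components."""
    return [math.log(w) - math.log(s * math.sqrt(2.0 * math.pi))
            - 0.5 * ((x - mu) / s) ** 2
            for mu, s, w in gmm]


def log_density(gmm, x):
    """log m(x), computed with a numerically stable log-sum-exp."""
    terms = log_weighted_components(gmm, x)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def lse_bounds(gmm, x):
    """Pointwise (lower, upper) bounds on log m(x) from the dominant term."""
    top = max(log_weighted_components(gmm, x))
    return top, top + math.log(len(gmm))


if __name__ == "__main__":
    for x in (-5.0, 0.0, 7.5, 25.0):
        lo, hi = lse_bounds(GMM1, x)
        print(x, lo, log_density(GMM1, x), hi)
```

The gap between the two bounds is $\log k$ at every point, independent of $x$, which is why the dominant component alone already gives a usable envelope.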

## 7. Concluding Remarks and Perspectives

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Appendix A. The Kullback–Leibler Divergence of Mixture Models Is Not Analytic [6]

## Appendix B. Closed-Form Formula for the Kullback–Leibler Divergence between Scaled and Truncated Exponential Families

## Appendix C. On the Approximation of KL between Smooth Mixtures by a Bregman Divergence [5]

- First, continuous mixture distributions have smooth densities that can be approximated arbitrarily closely by a single (potentially multi-modal) distribution belonging to the Polynomial Exponential Families (PEFs) [53,54]. A polynomial exponential family of order $D$ has log-likelihood $l(x;\theta)\propto \sum_{i=1}^{D}\theta_{i}x^{i}$; a PEF is therefore an exponential family with polynomial sufficient statistics $t(x)=(x,x^{2},\dots,x^{D})$. However, the log-normalizer $F_{D}(\theta)=\log\int \exp\left(\theta^{\top}t(x)\right)\mathrm{d}x$ of a $D$-order PEF is not available in closed form and is computationally intractable. Nevertheless, the KL between two mixtures $m(x)$ and $m^{\prime}(x)$ can, in theory, be closely approximated by a Bregman divergence between the two corresponding PEFs: $\mathrm{KL}(m(x):m^{\prime}(x))\simeq \mathrm{KL}(p(x;\theta):p(x;\theta^{\prime}))=B_{F_{D}}(\theta^{\prime}:\theta)$, where $\theta$ and $\theta^{\prime}$ are the natural parameters of the members of the PEF family $\{p(x;\theta)\}$ approximating $m(x)$ and $m^{\prime}(x)$, respectively (i.e., $m(x)\simeq p(x;\theta)$ and $m^{\prime}(x)\simeq p(x;\theta^{\prime})$). Notice that the Bregman divergence of PEFs is necessarily finite, whereas the KL of two smooth mixtures can potentially diverge (infinite value), hence the conditions required for the Jeffreys divergence to be finite.
- Second, consider two finite mixtures $m(x)=\sum_{i=1}^{k}w_{i}p_{i}(x)$ and $m^{\prime}(x)=\sum_{j=1}^{k^{\prime}}w_{j}^{\prime}p_{j}^{\prime}(x)$ of $k$ and $k^{\prime}$ components, respectively (possibly with heterogeneous components $p_{i}(x)$ and $p_{j}^{\prime}(x)$). The word "mixture" is used differently in the two fields: in statistics, a mixture is a convex combination of parametric components, whereas in information geometry, a mixture family is the set of convex combinations of fixed component densities. Let us consider the mixture family $\{g(x;(w,w^{\prime}))\}$ generated by the $D=k+k^{\prime}$ fixed components $p_{1}(x),\dots,p_{k}(x),p_{1}^{\prime}(x),\dots,p_{k^{\prime}}^{\prime}(x)$: $$\left\{g(x;(w,w^{\prime}))=\sum_{i=1}^{k}w_{i}p_{i}(x)+\sum_{j=1}^{k^{\prime}}w_{j}^{\prime}p_{j}^{\prime}(x)\ :\ \sum_{i=1}^{k}w_{i}+\sum_{j=1}^{k^{\prime}}w_{j}^{\prime}=1\right\}$$
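
The identity $\mathrm{KL}(p(x;\theta):p(x;\theta^{\prime}))=B_{F}(\theta^{\prime}:\theta)$ used in the first item holds for any exponential family with log-normalizer $F$, where $B_{F}(\theta^{\prime}:\theta)=F(\theta^{\prime})-F(\theta)-(\theta^{\prime}-\theta)^{\top}\nabla F(\theta)$. Since the PEF log-normalizer $F_{D}$ has no closed form, the Python sketch below checks the identity on a swapped-in family, the exponential distribution, for which $F$ is explicit: with rate $\lambda$, the natural parameter is $\theta=-\lambda$ and $F(\theta)=-\log(-\theta)$. This choice of family, the integration settings, and all function names are assumptions made for illustration only.

```python
import math


def log_normalizer(theta):
    """F(theta) for p(x; theta) = exp(theta * x - F(theta)), x >= 0, theta < 0."""
    return -math.log(-theta)


def grad_log_normalizer(theta):
    """F'(theta) = -1 / theta, the mean of the distribution."""
    return -1.0 / theta


def bregman(theta_a, theta_b):
    """B_F(theta_a : theta_b) = F(a) - F(b) - (a - b) * F'(b)."""
    return (log_normalizer(theta_a) - log_normalizer(theta_b)
            - (theta_a - theta_b) * grad_log_normalizer(theta_b))


def kl_exponential(lam, lam_prime):
    """Closed-form KL(Exp(lam) : Exp(lam_prime))."""
    return math.log(lam / lam_prime) + lam_prime / lam - 1.0


def kl_numeric(lam, lam_prime, upper=100.0, steps=200_000):
    """Midpoint-rule approximation of the integral of p * log(p / q) on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        p = lam * math.exp(-lam * x)
        log_ratio = math.log(lam / lam_prime) - (lam - lam_prime) * x
        total += p * log_ratio * h
    return total


if __name__ == "__main__":
    lam, lam_prime = 0.5, 2.0
    # Note the swapped argument order: KL(p_theta : p_theta') = B_F(theta' : theta).
    print(bregman(-lam_prime, -lam), kl_exponential(lam, lam_prime),
          kl_numeric(lam, lam_prime))
```

For $\lambda=0.5$ and $\lambda^{\prime}=2$, all three quantities agree at $\log(1/4)+3\approx 1.6137$, up to the quadrature error of the numerical integral.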

## References

1. Huang, Z.K.; Chau, K.W. A new image thresholding method based on Gaussian mixture model. Appl. Math. Comput. **2008**, 205, 899–907.
2. Seabra, J.; Ciompi, F.; Pujol, O.; Mauri, J.; Radeva, P.; Sanches, J. Rayleigh mixture model for plaque characterization in intravascular ultrasound. IEEE Trans. Biomed. Eng. **2011**, 58, 1314–1324.
3. Julier, S.J.; Bailey, T.; Uhlmann, J.K. Using Exponential Mixture Models for Suboptimal Distributed Data Fusion. In Proceedings of the 2006 IEEE Nonlinear Statistical Signal Processing Workshop, Cambridge, UK, 13–15 September 2006; IEEE: New York, NY, USA, 2006; pp. 160–163.
4. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2012.
5. Banerjee, A.; Merugu, S.; Dhillon, I.S.; Ghosh, J. Clustering with Bregman divergences. J. Mach. Learn. Res. **2005**, 6, 1705–1749.
6. Watanabe, S.; Yamazaki, K.; Aoyagi, M. Kullback Information of Normal Mixture is Not an Analytic Function; Technical Report of IEICE Neurocomputing; The Institute of Electronics, Information and Communication Engineers: Tokyo, Japan, 2004; pp. 41–46. (In Japanese)
7. Michalowicz, J.V.; Nichols, J.M.; Bucholtz, F. Calculation of differential entropy for a mixed Gaussian distribution. Entropy **2008**, 10, 200–206.
8. Pichler, G.; Koliander, G.; Riegler, E.; Hlawatsch, F. Entropy for singular distributions. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Honolulu, HI, USA, 29 June–4 July 2014; pp. 2484–2488.
9. Huber, M.F.; Bailey, T.; Durrant-Whyte, H.; Hanebeck, U.D. On entropy approximation for Gaussian mixture random vectors. In Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, Seoul, Korea, 20–22 August 2008; IEEE: New York, NY, USA, 2008; pp. 181–188.
10. Yamada, M.; Sugiyama, M. Direct importance estimation with Gaussian mixture models. IEICE Trans. Inf. Syst. **2009**, 92, 2159–2162.
11. Durrieu, J.L.; Thiran, J.P.; Kelly, F. Lower and upper bounds for approximation of the Kullback-Leibler divergence between Gaussian Mixture Models. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012; IEEE: New York, NY, USA, 2012; pp. 4833–4836.
12. Schwander, O.; Marchand-Maillet, S.; Nielsen, F. Comix: Joint estimation and lightspeed comparison of mixture models. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, 20–25 March 2016; pp. 2449–2453.
13. Moshksar, K.; Khandani, A.K. Arbitrarily Tight Bounds on Differential Entropy of Gaussian Mixtures. IEEE Trans. Inf. Theory **2016**, 62, 3340–3354.
14. Mezuman, E.; Weiss, Y. A Tight Convex Upper Bound on the Likelihood of a Finite Mixture. arXiv **2016**.
15. Amari, S.-I. Information Geometry and Its Applications; Springer: Tokyo, Japan, 2016; Volume 194.
16. Nielsen, F.; Sun, K. Guaranteed Bounds on the Kullback–Leibler Divergence of Univariate Mixtures. IEEE Signal Process. Lett. **2016**, 23, 1543–1546.
17. Nielsen, F.; Garcia, V. Statistical exponential families: A digest with flash cards. arXiv **2009**.
18. Calafiore, G.C.; El Ghaoui, L. Optimization Models; Cambridge University Press: Cambridge, UK, 2014.
19. Shen, C.; Li, H. On the dual formulation of boosting algorithms. IEEE Trans. Pattern Anal. Mach. Intell. **2010**, 32, 2216–2231.
20. Beck, A. Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2014.
21. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
22. De Berg, M.; van Kreveld, M.; Overmars, M.; Schwarzkopf, O.C. Computational Geometry; Springer: Heidelberg, Germany, 2000.
23. Setter, O.; Sharir, M.; Halperin, D. Constructing Two-Dimensional Voronoi Diagrams via Divide-and-Conquer of Envelopes in Space; Springer: Heidelberg, Germany, 2010.
24. Devillers, O.; Golin, M.J. Incremental algorithms for finding the convex hulls of circles and the lower envelopes of parabolas. Inf. Process. Lett. **1995**, 56, 157–164.
25. Nielsen, F.; Yvinec, M. An output-sensitive convex hull algorithm for planar objects. Int. J. Comput. Geom. Appl. **1998**, 8, 39–65.
26. Nielsen, F.; Nock, R. Entropies and cross-entropies of exponential families. In Proceedings of the 17th IEEE International Conference on Image Processing (ICIP), Hong Kong, China, 26–29 September 2010; IEEE: New York, NY, USA, 2010; pp. 3621–3624.
27. Sharir, M.; Agarwal, P.K. Davenport-Schinzel Sequences and Their Geometric Applications; Cambridge University Press: Cambridge, UK, 1995.
28. Bronstein, M. Algorithms and computation in mathematics. In Symbolic Integration. I. Transcendental Functions; Springer: Berlin, Germany, 2005.
29. Carreira-Perpinan, M.A. Mode-finding for mixtures of Gaussian distributions. IEEE Trans. Pattern Anal. Mach. Intell. **2000**, 22, 1318–1323.
30. Aprausheva, N.N.; Sorokin, S.V. Exact equation of the boundary of unimodal and bimodal domains of a two-component Gaussian mixture. Pattern Recognit. Image Anal. **2013**, 23, 341–347.
31. Learned-Miller, E.; DeStefano, J. A probabilistic upper bound on differential entropy. IEEE Trans. Inf. Theory **2008**, 54, 5223–5230.
32. Amari, S.-I. α-Divergence Is Unique, Belonging to Both f-Divergence and Bregman Divergence Classes. IEEE Trans. Inf. Theory **2009**, 55, 4925–4931.
33. Cichocki, A.; Amari, S.I. Families of Alpha- Beta- and Gamma-Divergences: Flexible and Robust Measures of Similarities. Entropy **2010**, 12, 1532–1568.
34. Póczos, B.; Schneider, J. On the Estimation of α-Divergences. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, USA, 11–13 April 2011; pp. 609–617.
35. Nielsen, F.; Nock, R. On Rényi and Tsallis entropies and divergences for exponential families. arXiv **2011**.
36. Minka, T. Divergence Measures and Message Passing; Technical Report MSR-TR-2005-173; Microsoft Research: Cambridge, UK, 2005.
37. Améndola, C.; Drton, M.; Sturmfels, B. Maximum Likelihood Estimates for Gaussian Mixtures Are Transcendental. arXiv **2015**.
38. Hellinger, E. Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. J. Reine Angew. Math. **1909**, 136, 210–271. (In German)
39. Van Erven, T.; Harremos, P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inf. Theory **2014**, 60, 3797–3820.
40. Nielsen, F.; Nock, R. A closed-form expression for the Sharma-Mittal entropy of exponential families. J. Phys. A Math. Theor. **2012**, 45, 032003.
41. Nielsen, F.; Nock, R. On the Chi Square and Higher-Order Chi Distances for Approximating f-Divergences. IEEE Signal Process. Lett. **2014**, 21, 10–13.
42. Nielsen, F.; Boltz, S. The Burbea-Rao and Bhattacharyya centroids. IEEE Trans. Inf. Theory **2011**, 57, 5455–5466.
43. Jarosz, W. Efficient Monte Carlo Methods for Light Transport in Scattering Media. Ph.D. Thesis, University of California, San Diego, CA, USA, 2008.
44. Fujisawa, H.; Eguchi, S. Robust parameter estimation with a small bias against heavy contamination. J. Multivar. Anal. **2008**, 99, 2053–2081.
45. Havrda, J.; Charvát, F. Quantification method of classification processes. Concept of structural α-entropy. Kybernetika **1967**, 3, 30–35.
46. Liang, X. A Note on Divergences. Neural Comput. **2016**, 28, 2045–2062.
47. Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory **1991**, 37, 145–151.
48. Endres, D.M.; Schindelin, J.E. A new metric for probability distributions. IEEE Trans. Inf. Theory **2003**, 49, 1858–1860.
49. Nielsen, F.; Boissonnat, J.D.; Nock, R. On Bregman Voronoi diagrams. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–9 January 2007; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2007; pp. 746–755.
50. Boissonnat, J.D.; Nielsen, F.; Nock, R. Bregman Voronoi diagrams. Discret. Comput. Geom. **2010**, 44, 281–307.
51. Foster, D.V.; Grassberger, P. Lower bounds on mutual information. Phys. Rev. E **2011**, 83, 010101.
52. Nielsen, F.; Sun, K. PyKLGMM: Python Software for Computing Bounds on the Kullback-Leibler Divergence between Mixture Models. 2016. Available online: https://www.lix.polytechnique.fr/~nielsen/KLGMM/ (accessed on 6 December 2016).
53. Cobb, L.; Koppstein, P.; Chen, N.H. Estimation and moment recursion relations for multimodal distributions of the exponential family. J. Am. Stat. Assoc. **1983**, 78, 124–130.
54. Nielsen, F.; Nock, R. Patch matching with polynomial exponential families and projective divergences. In Proceedings of the 9th International Conference Similarity Search and Applications (SISAP), Tokyo, Japan, 24–26 October 2016.

**Figure 1.** Lower envelope of parabolas corresponding to the upper envelope of weighted components of a Gaussian mixture with ${k}^{\prime}=3$ components.

**Figure 2.** Lower and upper bounds on the KL divergence between mixture models. The y-axis shows the KL divergence. Solid and dashed lines represent the combinatorial and adaptive bounds, respectively. The error bars show the 0.95 confidence interval of the Monte Carlo estimate at the corresponding sample size (x-axis). The narrow dotted bars show the CGQLB estimate as a function of the sample size.

**Figure 3.** Lower and upper bounds on the differential entropy of Gaussian mixture models. The left of each subfigure shows the simulated GMM signal; the right shows the estimates of its differential entropy. Note that some of the bounds coincide in several cases.

**Figure 4.** Two pairs of Gaussian mixture models and their α-divergences for different values of α. The “true” value of ${D}_{\alpha}$ is estimated by MC using ${10}^{4}$ random samples. VR(L) and VR(U) denote the variance-reduced lower and upper bounds, respectively. The range of α is selected separately for each pair for clear visualization.

| Mixtures | α | MC(${10}^{2}$) | MC(${10}^{3}$) | MC(${10}^{4}$) | Basic L | Basic U | Adaptive L | Adaptive U | VR L | VR U |
|---|---|---|---|---|---|---|---|---|---|---|
| ${\mathtt{GMM}}_{1}$ & ${\mathtt{GMM}}_{2}$ | 0 | $15.96\pm 3.9$ | $12.30\pm 1.0$ | $13.63\pm 0.3$ | 11.75 | 15.89 | 12.96 | 14.63 | | |
| | 0.01 | $13.36\pm 2.9$ | $10.63\pm 0.8$ | $11.66\pm 0.3$ | −700.50 | 11.73 | −77.33 | 11.73 | 11.40 | 12.27 |
| | 0.5 | $3.57\pm 0.3$ | $3.47\pm 0.1$ | $3.47\pm 0.07$ | −0.60 | 3.42 | 3.01 | 3.42 | 3.17 | 3.51 |
| | 0.99 | $40.04\pm 7.7$ | $37.22\pm 2.3$ | $38.58\pm 0.8$ | −333.90 | 39.04 | 5.36 | 38.98 | 38.28 | 38.96 |
| | 1 | $104.01\pm 28$ | $84.96\pm 7.2$ | $92.57\pm 2.5$ | 91.44 | 95.59 | 92.76 | 94.41 | | |
| ${\mathtt{GMM}}_{3}$ & ${\mathtt{GMM}}_{4}$ | 0 | $0.71\pm 0.2$ | $0.63\pm 0.07$ | $0.62\pm 0.02$ | 0.00 | 1.76 | 0.00 | 1.16 | | |
| | 0.01 | $0.71\pm 0.2$ | $0.63\pm 0.07$ | $0.62\pm 0.02$ | −179.13 | 7.63 | −38.74 | 4.96 | 0.29 | 1.00 |
| | 0.5 | $0.82\pm 0.3$ | $0.57\pm 0.1$ | $0.62\pm 0.04$ | −5.23 | 0.93 | −0.71 | 0.85 | −0.18 | 1.19 |
| | 0.99 | $0.79\pm 0.3$ | $0.76\pm 0.1$ | $0.80\pm 0.03$ | −165.72 | 12.10 | −59.76 | 9.11 | 0.37 | 1.28 |
| | 1 | $0.80\pm 0.3$ | $0.77\pm 0.1$ | $0.81\pm 0.03$ | 0.00 | 1.82 | 0.31 | 1.40 | | |

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Nielsen, F.; Sun, K.
Guaranteed Bounds on Information-Theoretic Measures of Univariate Mixtures Using Piecewise Log-Sum-Exp Inequalities. *Entropy* **2016**, *18*, 442.
https://doi.org/10.3390/e18120442
