# Parametric Bayesian Estimation of Differential Entropy and Relative Entropy


## Abstract


## 1. Introduction

#### 1.1. Notation and Background

## 2. Related Work

#### 2.1. Prior Work on Parametric Differential Entropy Estimation

#### 2.2. Prior Work on Nonparametric Differential Entropy Estimation

#### 2.3. Prior Work on Relative Entropy Estimation

## 3. Functional Estimates that Minimize Expected Bregman Loss

**Proposition 1.**

## 4. Bayesian Differential Entropy Estimate of the Uniform Distribution

#### 4.1. No Prior Knowledge About the Uniform

**Figure 1.** Example comparison of differential entropy estimators. Left: for each of 10,000 runs of the simulation, n samples were drawn i.i.d. from a uniform distribution on $[-5,5]$. The proposed estimate (9) is compared to the maximum likelihood estimate and to the nearest-neighbor estimate given in (6). Right: for each of 10,000 runs of the simulation, n samples were drawn i.i.d. from a Gaussian distribution; for each run, a new Gaussian distribution with diagonal covariance was randomly generated by drawing each of the variances i.i.d. from a uniform on $[0,1]$. The Bayesian estimator prior parameters were $q=d$ and $B=0.5qI$. The proposed estimate (12) is compared to the only feasible estimator for this range of n, the nearest-neighbor estimate given in (6).
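The left panel of this comparison can be reproduced in miniature. The sketch below is illustrative only: the function names are my own, and I use the standard first-nearest-neighbor (Kozachenko–Leonenko) form as a stand-in for the nearest-neighbor estimate the caption cites as (6). It compares the plug-in maximum likelihood estimate $\ln(\max - \min)$ against the nearest-neighbor estimate for samples from a uniform on $[-5,5]$, whose true differential entropy is $\ln 10 \approx 2.30$ nats:

```python
import math
import random

def nn_entropy_1d(samples):
    """1-D first-nearest-neighbor (Kozachenko-Leonenko) entropy estimate, in nats."""
    n = len(samples)
    xs = sorted(samples)
    rho = []  # distance from each point to its nearest neighbor
    for i, x in enumerate(xs):
        if i == 0:
            r = xs[1] - x
        elif i == n - 1:
            r = x - xs[-2]
        else:
            r = min(x - xs[i - 1], xs[i + 1] - x)
        rho.append(r)
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    # H_hat = gamma + ln(n-1) + ln(volume of 1-D unit ball) + (1/n) sum ln rho_i
    return gamma + math.log(n - 1) + math.log(2.0) + sum(math.log(r) for r in rho) / n

def mle_entropy_uniform(samples):
    """Plug-in (maximum likelihood) entropy of a fitted uniform: ln(max - min)."""
    return math.log(max(samples) - min(samples))

random.seed(0)
n = 5000
samples = [random.uniform(-5.0, 5.0) for _ in range(n)]
true_h = math.log(10.0)  # differential entropy of Uniform[-5, 5]
print("true:", true_h)
print("nearest-neighbor:", nn_entropy_1d(samples))
print("MLE plug-in:", mle_entropy_uniform(samples))
```

Note that the plug-in estimate is always negatively biased here, since the sample range never exceeds the true support width.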

#### 4.2. Pareto Prior Knowledge About the Uniform

## 5. Gaussian Distribution

#### 5.1. Differential Entropy Estimate of the Gaussian Distribution
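Although the derivation itself is omitted here, the quantity being estimated in this section is the classical closed-form differential entropy of a $d$-dimensional Gaussian $\mathcal{N}(\mu,\Sigma)$ (in nats), which depends only on the covariance:

```latex
h\bigl(\mathcal{N}(\mu,\Sigma)\bigr)
  = \frac{1}{2}\ln\!\left((2\pi e)^{d}\,|\Sigma|\right)
  = \frac{d}{2}\ln(2\pi e) + \frac{1}{2}\ln|\Sigma|.
```

A plug-in estimate substitutes the sample covariance for $\Sigma$; the Bayesian estimate developed in this section instead accounts for uncertainty in $\Sigma$ through a prior on the covariance.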

#### 5.2. Relative Entropy Estimate between Gaussian Distributions

## 6. Wishart and Inverse Wishart Distributions

#### 6.1. Wishart Differential Entropy and Relative Entropy

#### 6.2. Inverse Wishart Differential Entropy and Relative Entropy

#### 6.3. Bayesian Estimation of Wishart Differential Entropy

#### 6.4. Bayesian Estimation of Relative Entropy between Two Wisharts

#### 6.5. Bayesian Estimation of Inverse Wishart Differential Entropy

#### 6.6. Bayesian Estimation of Relative Entropy between Two Inverse Wisharts

## 7. Discussion

## Acknowledgments


## Appendix

#### A.1. Proof of Proposition 1

#### A.2. Derivation of Uniform Differential Entropy Estimate

#### A.3. Derivation of Uniform Differential Entropy Given Pareto Prior

#### A.4. Propositions Used in Remaining Derivations

**Identity 1.**

**Identity 2.**

**Proposition 2.**

**Proposition 3.**

**Proposition 4.**

**Proposition 5.**

**Proposition 6.**

#### A.5. Derivation of Bayesian Gaussian Differential Entropy Estimate

#### A.6. Derivation of Bayesian Gaussian Relative Entropy Estimate

#### A.7. Derivation of Wishart Differential Entropy

#### A.8. Derivation of Wishart Relative Entropy

#### A.9. Derivation of Inverse Wishart Differential Entropy

#### A.10. Derivation of Inverse Wishart Relative Entropy

#### A.11. Derivation of Bayesian Estimate of Wishart Differential Entropy

#### A.12. Derivation of Bayesian Estimate of Relative Entropy Between Wisharts

#### A.13. Derivation of Bayesian Estimate of Inverse Wishart Differential Entropy

#### A.14. Derivation of Bayesian Estimate of Relative Entropy Between Inverse Wisharts

© 2010 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Gupta, M.; Srivastava, S. Parametric Bayesian Estimation of Differential Entropy and Relative Entropy. *Entropy* **2010**, *12*, 818–843.
https://doi.org/10.3390/e12040818
