Open Access Article

Ensemble Estimation of Information Divergence

Genetics Department and Applied Math Program, Yale University, New Haven, CT 06520, USA
Intuit Inc., Mountain View, CA 94043, USA
IBM Research, Cambridge, MA 02142, USA
Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, MI 48109, USA
Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in the 2016 IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, 10–15 July 2016; pp. 1133–1137.
Current address: Department of Mathematics and Statistics, Utah State University, Logan, UT 84322, USA
Entropy 2018, 20(8), 560
Received: 29 June 2018 / Revised: 23 July 2018 / Accepted: 26 July 2018 / Published: 27 July 2018
(This article belongs to the Special Issue Information Theory in Machine Learning and Data Science)


Recent work has focused on the problem of nonparametric estimation of information divergence functionals between two continuous random variables. Many existing approaches require either restrictive assumptions about the density support set or difficult calculations at the support set boundary, which must be known a priori. We derive the mean squared error (MSE) convergence rate of a leave-one-out kernel density plug-in divergence functional estimator for general bounded density support sets, where knowledge of the support boundary, and therefore boundary correction, is not required. The theory of optimally weighted ensemble estimation is generalized to derive a divergence estimator that achieves the parametric rate when the densities are sufficiently smooth. Guidelines for tuning parameter selection and the asymptotic distribution of this estimator are provided. Based on the theory, an empirical estimator of Rényi-α divergence is proposed that greatly outperforms the standard kernel density plug-in estimator in terms of MSE, especially in high dimensions. The estimator is shown to be robust to the choice of tuning parameters. Extensive simulation results verify the theoretical findings, and the proposed estimator is applied to estimate bounds on the Bayes error rate of a cell classification problem.
Keywords: divergence; differential entropy; nonparametric estimation; central limit theorem; convergence rates; Bayes error rate
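The baseline method the abstract compares against, a kernel density plug-in estimate of the Rényi-α divergence with a leave-one-out density estimate for the first sample, can be sketched as follows. This is an illustrative sketch only, not the paper's optimally weighted ensemble estimator: the Gaussian kernel, the fixed bandwidth `h`, and the function names are assumptions made for the example.

```python
import numpy as np

def gaussian_kde(points, queries, h):
    """Gaussian kernel density estimate built from `points`,
    evaluated at `queries` (both arrays of shape (n, dim))."""
    diffs = queries[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs**2, axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * h**2))
    norm = (2.0 * np.pi * h**2) ** (points.shape[1] / 2.0)
    return kernel.sum(axis=1) / (len(points) * norm)

def renyi_divergence_plugin(X, Y, alpha=0.8, h=0.3):
    """Plug-in estimate of the Renyi-alpha divergence D_alpha(p || q)
    from samples X ~ p and Y ~ q:
        D_alpha = log( E_p[(p/q)^(alpha-1)] ) / (alpha - 1),
    using a leave-one-out KDE for p at its own sample points."""
    n, dim = X.shape
    norm = (2.0 * np.pi * h**2) ** (dim / 2.0)
    # Leave-one-out estimate of p at each x_i: zero out the self-kernel term.
    diffs = X[:, None, :] - X[None, :, :]
    kernel = np.exp(-np.sum(diffs**2, axis=-1) / (2.0 * h**2))
    np.fill_diagonal(kernel, 0.0)
    p_hat = kernel.sum(axis=1) / ((n - 1) * norm)
    q_hat = gaussian_kde(Y, X, h)
    return np.log(np.mean((p_hat / q_hat) ** (alpha - 1))) / (alpha - 1)
```

The paper's ensemble approach improves on such a plug-in by combining estimates at multiple bandwidths with optimized weights so that the leading bias terms cancel; the sketch above exhibits exactly the slow-converging bias that the weighting is designed to remove.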


This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Moon, K.R.; Sricharan, K.; Greenewald, K.; Hero, A.O., III. Ensemble Estimation of Information Divergence. Entropy 2018, 20, 560.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.



Entropy EISSN 1099-4300, published by MDPI AG, Basel, Switzerland.