Article
Peer-Review Record

Empirical Squared Hellinger Distance Estimator and Generalizations to a Family of α-Divergence Estimators

Entropy 2023, 25(4), 612; https://doi.org/10.3390/e25040612
by Rui Ding * and Andrew Mullhaupt
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 14 March 2023 / Revised: 31 March 2023 / Accepted: 2 April 2023 / Published: 4 April 2023
(This article belongs to the Section Information Theory, Probability and Statistics)

Round 1

Reviewer 1 Report

In many applications, the metric property of the squared Hellinger distance is unimportant. There is a distinguished distribution Q, and a measure of discrepancy between various measures P and Q is important, e.g. to find the minimum discrepancy among a convex set of measures P. Hence the KL divergence could be rationalized even though it is not a metric. Moreover, it has a frequentist interpretation found in the statistical theory of large deviations, which has proven important in statistics and in applications like statistical mechanics.

A possible compensating advantage of the proposed squared Hellinger estimators would be a faster rate of convergence in important examples than that achieved by the empirical KL divergence estimator cited here. The authors should provide some numerical examples that give insight into this question. I would find that more informative than some of the seemingly endless numerical explorations that occupy the latter pages of their manuscript.

Author Response

We want to thank the reviewer for the comments. Here's our response to the suggestions:

In this paper we are interested in the squared Hellinger distance mainly because of its power for meaningful statistical inference. It is important to note that the Hellinger distance is a metric: it is symmetric, always bounded, and has close connections to the total variation distance, which is exactly what inference depends on (the KL divergence does not admit a useful lower bound on the TVD). Since we are interested in two-sample estimation problems, we are not dealing with optimization problems where P comes from a (convex) uncertainty set. To demonstrate the desirability of the squared Hellinger distance over the KL divergence, we added an example comparing a standard Cauchy distribution and a standard normal distribution: the KL divergence between these two distributions is infinite, while the squared Hellinger distance is bounded. Our estimator converges to the ground-truth value, whereas the empirical KL estimator cannot converge to any meaningful value.
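
A minimal quadrature sketch of the population quantities behind this example (assuming SciPy is available; the function names sq_hellinger and truncated_kl are illustrative and this is not our empirical estimator) shows the squared Hellinger distance staying bounded while the truncated KL integral grows without bound as the truncation widens:

```python
import numpy as np
from scipy import integrate
from scipy.stats import cauchy, norm

def sq_hellinger(p, q):
    # H^2(P, Q) = 1 - integral of sqrt(p(x) * q(x)) dx, always in [0, 1]
    bc, _ = integrate.quad(lambda x: np.sqrt(p.pdf(x) * q.pdf(x)), -np.inf, np.inf)
    return 1.0 - bc

def truncated_kl(p, q, r):
    # integral over [-r, r] of p(x) * log(p(x) / q(x)) dx
    val, _ = integrate.quad(lambda x: p.pdf(x) * (p.logpdf(x) - q.logpdf(x)), -r, r)
    return val

print("H^2(Cauchy, Normal):", sq_hellinger(cauchy, norm))
for r in (10.0, 100.0, 1000.0):
    # grows roughly linearly in r, reflecting KL(Cauchy || Normal) = infinity
    print(f"KL truncated at |x| <= {r}:", truncated_kl(cauchy, norm, r))
```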

Reviewer 2 Report

See attached file.

Comments for author File: Comments.pdf

Author Response

We want to thank the reviewer for the detailed comments. Following the reviewer's bullet points, we have made the revisions below:

  1. The multivariate case uses a similar but slightly different technique for the empirical estimator. It is based on the k-NN density estimator proposed by Perez-Cruz, and its convergence results follow from a different waiting time argument. The main concepts behind this technique are introduced in Section 3 when reviewing the KL estimator, and the following sections use these results for general alpha-divergences. We added the reference to that paper for clarity (see the sketch after this list).
  2. We added notation and definitions for P_e, Q_e, P_c, Q_c to distinguish the empirical CDFs from the interpolated CDFs.
  3. The wording was changed from "demonstrate" to "show" for the numerical sections.
  4. A brief derivation of this inequality is now included in the paper, in the discussion immediately following it, together with relevant references.
  5. All citations have been revised to follow the journal style.
  6. We have corrected the typos mentioned by the reviewer.
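
As referenced in item 1, the following is a minimal sketch of a 1-nearest-neighbor divergence estimator in the spirit of the Perez-Cruz k-NN construction (assuming SciPy and NumPy; the function name knn_kl_estimate and the Gaussian test samples are illustrative, and this is not the paper's alpha-divergence estimator itself):

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_kl_estimate(x, y):
    # 1-NN KL divergence estimate of D(P || Q) from samples x ~ P, y ~ Q
    n, d = x.shape
    m = y.shape[0]
    # distance from each x_i to its nearest neighbor within x, excluding itself
    rho = cKDTree(x).query(x, k=2)[0][:, 1]
    # distance from each x_i to its nearest neighbor in y
    nu = cKDTree(y).query(x, k=1)[0]
    return (d / n) * np.sum(np.log(nu / rho)) + np.log(m / (n - 1))

rng = np.random.default_rng(0)
p_sample = rng.normal(0.0, 1.0, size=(2000, 2))
q_sample = rng.normal(0.5, 1.0, size=(2000, 2))
# closed-form KL between these Gaussians is 0.5 * ||mu_P - mu_Q||^2 = 0.25
print("1-NN KL estimate:", knn_kl_estimate(p_sample, q_sample))
```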