# Computing Accurate Probabilistic Estimates of One-D Entropy from Equiprobable Random Samples

## Abstract

## 1. Introduction

If the mathematical form of $p\left(x\right)=\left\{p\left({x}^{\left(j\right)}\right),j=1,\dots ,{N}_{X}\right\}$ is known, then ${H}_{p}\left(X\right)$ can be computed by applying the mathematical definition of discrete entropy:
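The equation referred to at this point is not reproduced in this extract. For orientation, the standard Shannon definition [1,2], written in the notation used above and assuming natural logarithms (an assumption, chosen to match the $\ln\left(\Delta\right)$ term used in Section 2), is:

$$
{H}_{p}\left(X\right)=-\sum_{j=1}^{{N}_{X}}p\left({x}^{\left(j\right)}\right)\,\ln p\left({x}^{\left(j\right)}\right)
$$

with the usual convention that terms with $p\left({x}^{\left(j\right)}\right)=0$ contribute zero.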

## 2. Popular Approaches to Estimating Distributions from Data

where $\Delta$ is the bin width, provided $\Delta$ is sufficiently small [2]. This allows a discrete entropy estimate to be converted to a differential entropy estimate simply by adding $\ln\left(\Delta\right)$ to it.
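The conversion can be checked numerically. The following is a minimal sketch, not the authors' code: it bins a standard Gaussian sample into equal-width bins, computes the discrete entropy of the bin frequencies, adds $\ln\left(\Delta\right)$, and compares the result with the known differential entropy $\frac{1}{2}\ln\left(2\pi e\right)$. The function name, sample size, and bin count are illustrative assumptions.

```python
import math
import numpy as np


def differential_entropy_from_bins(sample, n_bins):
    """Equal-width histogram estimate of differential entropy, in nats."""
    counts, edges = np.histogram(sample, bins=n_bins)
    delta = edges[1] - edges[0]            # common bin width
    p = counts[counts > 0] / counts.sum()  # empty bins contribute zero
    h_discrete = -np.sum(p * np.log(p))    # discrete entropy of the bin masses
    return float(h_discrete + math.log(delta))


rng = np.random.default_rng(0)
x = rng.normal(size=100_000)               # illustrative sample (assumption)
h_est = differential_entropy_from_bins(x, n_bins=200)
h_true = 0.5 * math.log(2 * math.pi * math.e)
print(f"estimate = {h_est:.4f} nats, theory = {h_true:.4f} nats")
```

With a large sample and narrow bins the two values should agree closely; with wide bins or small samples the agreement degrades, which is the regime the rest of the paper addresses.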

## 3. Proposed Quantile Spacing (QS) Approach

#### 3.1. Step 1—Assumption about Support Interval

#### 3.2. Step 2—Assumption about Approximate Form of $p\left(X\right)$

#### 3.3. Step 3—Estimation of the Quantiles of $p\left(x\right)$

#### 3.4. Random Variability Associated with the QS-Based Entropy Estimate

## 4. Properties of the Proposed Approach

- (i) **A1:** The piecewise constant approximation $\hat{p}\left(x|Z\right)$ of $p\left(x\right)$ on the intervals between the quantile positions is adequate.
- (ii) **A2:** The quantile positions $Z=\left\{{z}_{0},{z}_{1},{z}_{2},\dots ,{z}_{{N}_{Z}}\right\}$ of $p\left(x\right)$ have been estimated accurately.
- (iii) **A3:** The pdf $p\left(x\right)$ exists only on the support interval $\left[{x}_{min},{x}_{max}\right]$, which has been properly chosen.
- (iv) **A4:** The sample set $S$ is consistent, representative and sufficiently informative about the underlying nature of $p\left(x\right)$.

#### 4.1. Implications of the Piecewise Constant Assumption

#### 4.2. Implications of Imperfect Quantile Position Estimation

#### 4.3. Implications of the Finite Support Assumption

#### 4.4. Combined Effect of the Piecewise Constant Assumption, Finite Support Assumption, and Quantile Position Estimation Using Finite Sample Sizes

#### 4.5. Implications of Informativeness of the Data Sample

#### 4.6. Summary of Properties of the Proposed Quantile Spacing Approach

#### 4.7. Algorithm for Estimating Entropy via the Quantile Spacing Approach

- (1) Set ${x}_{min}=\mathrm{min}\left\{S\right\}$ and ${x}_{max}=\mathrm{max}\left\{S\right\}$.
- (2) Select values for $\psi =\left\{{N}_{Z},{N}_{K},{N}_{B}\right\}$. Recommended default values are ${N}_{Z}=\alpha \cdot {N}_{S}$, ${N}_{K}=500$ and ${N}_{B}=500$, with $\alpha =0.25$.
- (3) Bootstrap a sample set ${S}_{b}$ of size ${N}_{S}$ from $S$ with replacement.
- (4) Compute the entropy estimate ${\hat{H}}_{\hat{p}}\left(X|{\hat{Z}}_{b}\right)$ using Equation (9) and the procedure outlined in Section 3.
- (5) Repeat steps (3) and (4) ${N}_{B}$ times to generate the bootstrapped distribution of ${\hat{H}}_{\hat{p}}\left(X|{\hat{Z}}_{b}\right)$ as an empirical probabilistic estimate $p\left({\hat{H}}_{\hat{p}}\left(X|S\right)\right)$ of the entropy ${H}_{p}\left(X\right)$ of $p\left(x\right)$ given $S$.
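The steps above can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation, because Equation (9) and the quantile-estimation procedure of Section 3 are not reproduced in this extract. Two assumptions are made and flagged in the comments: (a) the entropy of the piecewise-constant density with equal mass $1/{N}_{Z}$ on each quantile spacing is used in place of Equation (9), and (b) interior quantile positions are estimated by averaging the sorted values of ${N}_{K}$ random subsamples of size ${N}_{Z}-1$. All function names are hypothetical.

```python
import numpy as np


def qs_entropy(sample, x_min, x_max, n_z, n_k, rng):
    """Entropy (nats) of a piecewise-constant pdf with mass 1/n_z per spacing."""
    # ASSUMPTION (b): average the order statistics of n_k subsamples drawn
    # without replacement; the i-th of n_z - 1 sorted values has expected
    # cumulative probability i / n_z.
    interior = np.mean(
        [np.sort(rng.choice(sample, size=n_z - 1, replace=False))
         for _ in range(n_k)],
        axis=0,
    )
    z = np.concatenate(([x_min], interior, [x_max]))
    spacings = np.diff(z)
    # ASSUMPTION (a): the density on spacing i is (1/n_z) / spacings[i], so
    # H = -sum (1/n_z) * ln((1/n_z) / spacings[i]) = mean(ln(n_z * spacings)).
    return float(np.mean(np.log(n_z * spacings)))


def qs_entropy_bootstrap(sample, alpha=0.25, n_k=500, n_b=500, seed=0):
    """Return n_b bootstrapped QS entropy estimates for a 1-D sample."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    n_s = sample.size
    x_min, x_max = sample.min(), sample.max()       # step (1)
    n_z = max(2, int(round(alpha * n_s)))           # step (2)
    estimates = np.empty(n_b)
    for b in range(n_b):                            # step (5)
        s_b = rng.choice(sample, size=n_s, replace=True)              # step (3)
        estimates[b] = qs_entropy(s_b, x_min, x_max, n_z, n_k, rng)   # step (4)
    return estimates


# Usage: a standard Gaussian sample, with reduced n_k and n_b to keep it fast.
data = np.random.default_rng(1).normal(size=400)
h_boot = qs_entropy_bootstrap(data, n_k=20, n_b=20, seed=2)
q25, q50, q75 = np.percentile(h_boot, [25, 50, 75])
print(f"median = {q50:.3f} nats, IQR = [{q25:.3f}, {q75:.3f}]")
```

The returned array is the empirical distribution described in step (5); its spread gives a rough indication of the sampling uncertainty in the estimate. The defaults follow the recommended values above, which imply ${N}_{B}\times {N}_{K}$ sorts per call, so smaller values are used in the usage lines.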

## 5. Relationship to the Bin Counting Approach

## 6. Experimental Comparison with the Bin Counting Method

As with QS, the bias in entropy computed using the BC equal bin-width piecewise-constant approximation declines to zero with increasing numbers of bins, and becomes less than $1\%$ when ${N}_{Bin}\ge 100$; in fact, it can decline somewhat faster than for the QS approach. Clearly, for the Gaussian (blue) and Exponential (red) densities, the BC constant bin-width approximation can provide better entropy estimates with fewer bins than the QS variable bin-width approach. However, for the skewed Log-Normal density (orange) the behavior of the BC approximation is more complicated, whereas the QS approach shows an exponential rate of improvement with increasing number of bins for all three density types. This suggests that the variable bin-width QS approximation may provide a more consistent approach for more complex distributional forms (see Section 7).
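The no-sampling BC bias discussed above can be reproduced for the standard Gaussian case with a short sketch. It uses exact bin masses from the Gaussian CDF, so any bias is due only to the equal-width piecewise-constant approximation and the truncated support. The support limits follow the $\epsilon ={10}^{-5}$ convention stated in the caption of Figure 1; renormalizing the bin masses to sum to one and the function name are assumptions made here, not details taken from the paper.

```python
import math
from statistics import NormalDist


def bc_bias_percent(n_bins, eps=1e-5):
    """Percent bias of the equal-width piecewise-constant entropy, N(0, 1)."""
    nd = NormalDist()
    x_min, x_max = nd.inv_cdf(eps), nd.inv_cdf(1.0 - eps)
    delta = (x_max - x_min) / n_bins
    edges = [x_min + i * delta for i in range(n_bins + 1)]
    # Exact probability mass in each bin (no sampling involved).
    masses = [nd.cdf(b) - nd.cdf(a) for a, b in zip(edges[:-1], edges[1:])]
    total = sum(masses)  # ASSUMPTION: renormalize to a proper density
    h_bc = -sum((m / total) * math.log(m / total / delta) for m in masses)
    h_true = 0.5 * math.log(2.0 * math.pi * math.e)
    return 100.0 * (h_bc - h_true) / h_true


for n in (10, 20, 50, 100):
    print(n, f"{bc_bias_percent(n):+.4f} %")
```

This covers only the Gaussian case; the Exponential and Log-Normal curves would need their own CDFs and quantile functions.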

$\left\{100,200,500,1000,2000,5000\right\}$. The yellow marker symbols indicate where each curve crosses the zero-bias line; clearly ${N}_{Bin}$ is not a constant fraction of ${N}_{S}$, and for any given sample size the ratio $\frac{{N}_{Bin}}{{N}_{S}}$ changes with the form of the pdf.

## 7. Testing on Multi-Modal PDF Forms

## 8. Experimental Comparison with the Kernel Density Method

## 9. Discussion and Conclusions

$\Delta$ for BC) was optimized for each random sample by finding the value that maximizes the likelihood of the sample. As can clearly be seen, the QS-based estimates remain relatively unbiased even for samples as small as 100 data points, whereas the KD- and BC-based estimates tend to get progressively worse (negatively biased) as sample sizes are decreased. Overall, QS is both easier to apply (no hyper-parameter tuning required) and likely to be more accurate than BC or KD when applied to data from an unknown distributional form, particularly since the piecewise linear interpolation between CDF points makes it applicable to pdfs of arbitrary shape, including those with sharp discontinuities in slope and/or magnitude. A follow-up study investigating the accuracy of these methods when faced with data drawn from complex, arbitrarily shaped pdfs is currently in progress and will be reported in due course.

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
2. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: Hoboken, NJ, USA, 2006.
3. Gong, W.; Yang, D.; Gupta, H.V.; Nearing, G. Estimating information entropy for hydrological data: One dimensional case. Tech. Note Water Resour. Res. **2014**, 50, 5003–5018.
4. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Chapman and Hall: New York, NY, USA, 1986.
5. Beirlant, J.; Dudewicz, E.J.; Gyorfi, L.; van der Meulen, E.C. Nonparametric entropy estimation: An overview. Int. J. Math. Stat. Sci. **1997**, 6, 17–39.
6. Scott, D.W. Multivariate Density Estimation: Theory, Practice, and Visualization; John Wiley: New York, NY, USA, 2008.
7. Loftsgarden, D.O.; Quesenberry, C.D. A Nonparametric Estimate of a Multivariate Density Function. Ann. Math. Stat. **1965**, 36, 1049–1051.
8. Scott, D.W. Optimal and data-based histograms. Biometrika **1979**, 66, 605–610.
9. Scott, D.W.; Thompson, J.R. Probability Density Estimation in Higher Dimensions. In Computer Science and Statistics: Proceedings of the Fifteenth Symposium on the Interface; Gentle, J.E., Ed.; North-Holland: Amsterdam, The Netherlands, 1983; pp. 173–179.
10. Scott, D.W. Handbook of Computational Statistics: Concepts and Methods; Springer: New York, NY, USA, 2004.
11. Scott, D.W. Averaged shifted histograms: Effective nonparametric density estimators in several dimensions. Ann. Stat. **1985**, 13, 1024–1040.
12. Wegman, E.J. Maximum likelihood estimation of a probability density function. Sankhyā Indian J. Stat. **1975**, 3, 211–224.
13. Denison, D.G.T.; Adams, N.M.; Holmes, C.C.; Hand, D.J. Bayesian partition modelling. Comput. Stat. Data Anal. **2002**, 38, 475–485.
14. Jackson, B.; Scargle, J.; Barnes, D.; Arabhi, S.; Alt, A.; Gioumousis, P.; Gwin, E.; Sangtrakulcharoen, P.; Tan, L.; Tsai, T.T. An algorithm for optimal partitioning of data on an interval. IEEE Signal Process. Lett. **2005**, 12, 105–108.
15. Endres, A.; Foldiak, P. Bayesian bin distribution inference and mutual information. IEEE Trans. Inf. Theory **2005**, 51, 3766–3779.
16. Hutter, M. Exact Bayesian regression of piecewise constant functions. Bayesian An. **2007**, 2, 635–664.
17. Darscheid, P.; Guthke, A.; Ehret, U. A maximum-entropy method to estimate discrete distributions from samples ensuring nonzero probabilities. Entropy **2018**, 20, 601.
18. Sturges, H.A. The choice of a class interval. J. Am. Stat. Assoc. **1926**, 21, 65–66.
19. Freedman, D.; Diaconis, P. On the histogram as a density estimator: L2 theory. Z. Wahrscheinlichkeitstheorie Verwandte Geb. **1981**, 57, 453–476.
20. Knuth, K.H. Optimal data-based binning for histograms and histogram-based probability density models. Digit. Signal Process. **2019**, 95, 102581.
21. Schwartz, S.; Zibulevsky, M.; Schechner, Y.Y. Fast kernel entropy estimation and optimization. Signal Process. **2005**, 85, 1045–1058.
22. Viola, P.A. Alignment by Maximization of Mutual Information. Ph.D. Thesis, Massachusetts Institute of Technology–Artificial Intelligence Laboratory, Cambridge, MA, USA, 1995.
23. Hyndman, R.J.; Fan, Y. Sample quantiles in statistical packages. Am. Stat. **1996**, 50, 361–365.
24. Harrell, F.E.; Davis, C.E. A new distribution-free quantile estimator. Biometrika **1982**, 69, 635–640.
25. Kaigh, W.D.; Lachenbruch, P.A. A Generalized Quantile Estimator. Commun. Stat. Part A Theory Methods **1982**, 11, 2217–2238.
26. Brodin, E. On quantile estimation by bootstrap. Comput. Stat. Data Anal. **2006**, 50, 1398–1406.
27. Parzen, E. Nonparametric Statistical Data Modeling. J. Am. Stat. Assoc. **1979**, 74, 105–131.
28. Sheather, J.S.; Marron, J.S. Kernel quantile estimators. J. Am. Stat. Assoc. **1990**, 85, 410–416.
29. Cheng, C.; Parzen, E. Unified estimators of smooth quantile and quantile density functions. J. Stat. Plan. Inference **1997**, 59, 291–307.
30. Park, C. Smooth Nonparametric Estimation of a Quantile Function under Right Censoring Using Beta Kernels; Technical Report (TR 2006-01-CP); Department of Mathematical Sciences, Clemson University: Clemson, SC, USA, 2006.
31. Yang, S.S. A smooth nonparametric estimator of a quantile function. J. Am. Stat. Assoc. **1985**, 80, 1004–1011.
32. Sfakianakis, M.E.; Verginis, D.G. A new family of nonparametric quantile estimators. Commun. Stat. Simul. Comput. **2008**, 37, 337–345.
33. Navruz, G.; Özdemir, A.F. A new quantile estimator with weights based on a subsampling approach. Br. J. Math. Stat. Psychol. **2020**, 73, 506–521.
34. Cheng, C. The Bernstein polynomial estimator of a smooth quantile function. Stat. Probab. Lett. **1995**, 24, 321–330.
35. Pepelyshev, A.; Rafajłowicz, E.; Steland, A. Estimation of the quantile function using Bernstein–Durrmeyer polynomials. J. Nonparametric Stat. **2014**, 26, 1–20.
36. Kohler, M.; Tent, R. Nonparametric quantile estimation using surrogate models and importance sampling. Metrika **2020**, 83, 141–169.

**Figure 1.** Plots showing how entropy estimation bias associated with the piecewise-constant approximation of various theoretical pdf forms varies with (**a**) the number of quantiles (QS method) or (**b**) the number of equal-width bins (BC method) used in the approximation. The dashed horizontal lines indicate $\pm 1\%$ and $\pm 5\%$ bias error. No sampling is involved and the bias is due purely to the piecewise constant assumption. For QS, the locations of the quantiles are set to their theoretical values. To address the “infinite support” issue, $\left[{x}_{min},{x}_{max}\right]$ were set to be the locations where $P\left({z}_{0}\right)=\epsilon $ and $P\left({z}_{{N}_{Z}}\right)=1-\epsilon $ respectively, with $\epsilon ={10}^{-5}$. In both cases, bias approaches zero as the number of piecewise-constant units is increased. For the QS method, the decline in bias is approximately linear in log-log space (see inset in the left subplot).

**Figure 2.** Plots showing bias and uncertainty associated with estimates of the quantiles derived from random samples, for the Log-Normal pdf. Uncertainty associated with random sampling variability is estimated by repeating each experiment $500$ times. In both subplots, for each case, the box plots are shown side by side to improve legibility. (**a**) Subplot showing results varying ${N}_{Z}=\left[100,200,500,1000,2000,5000,10000\right]$ for fixed ${N}_{K}=500$. (**b**) Subplot showing results varying ${N}_{K}=\left[10,20,50,100,200,500\right]$ for fixed ${N}_{Z}=1000$.

**Figure 3.** Plots showing percentage entropy fraction associated with each quantile spacing for the Gaussian, Exponential and Log-Normal pdfs, for ${N}_{Z}=100$ (**a**), and ${N}_{Z}=1000$ (**b**). For the Uniform pdf (not shown to avoid complicating the figures) the percentage entropy fraction associated with each quantile spacing is a horizontal line (at 1% in the left panel, and at 0.1% in the right panel). Note that the entropy fractions can be proportionally quite large or small at the extremes, depending on the form of the pdf. However, the overall entropy fraction associated with each quantile spacing diminishes with increasing ${N}_{Z}$. For the examples shown, the maximum contributions associated with a quantile spacing are less than $6\%$ for ${N}_{Z}=100$ (**a**), and become less than $1\%$ for ${N}_{Z}=1000$ (**b**).

**Figure 4.** Plots showing expected percent error in the QS-based estimate of entropy derived from random samples, as a function of $\alpha =100\cdot \left({N}_{Z}/{N}_{S}\right)$, which expresses the number of quantiles ${N}_{Z}$ as a fractional percentage of the sample size ${N}_{S}$. Results are averaged over $500$ trials obtained by drawing sample sets of size ${N}_{S}$ from the theoretical pdf, where ${x}_{min}$ and ${x}_{max}$ are set to be the smallest and largest data values in the particular sample. Results are shown for different sample sizes ${N}_{S}=\left[100,200,500,1000,2000,5000\right]$, for the Gaussian (**a**), Exponential (**b**) and Log-Normal (**c**) densities. In each case, when $\alpha $ is small the estimation bias is positive (overestimation) and can be greater than $10\%$ for $\alpha <10\%$, and crosses zero to become negative (underestimation) when $\alpha $ exceeds roughly $25$–$35\%$. The marginal cost of setting $\alpha $ too large is low compared to setting $\alpha $ too small. As ${N}_{S}$ increases, the bias diminishes. The optimal choice is $\alpha \approx 25$–$30\%$ and is relatively insensitive to pdf shape or sample size.

**Figure 5.** Plots showing bias and uncertainty in the QS-based estimate of entropy derived from random samples, as a function of sample size ${N}_{S}$, when the number of quantiles ${N}_{Z}$ is set to $25\%$ of the sample size ($\alpha =0.25$), and ${x}_{min}$ and ${x}_{max}$ are respectively set to be the smallest and largest data values in the particular sample. The uncertainty shown is due to random sampling variability, estimated by drawing $500$ different samples from the parent density. Results are shown for the Gaussian (blue), Exponential (red) and Log-Normal (orange) densities; box plots are shown side by side to improve legibility. As sample size ${N}_{S}$ increases, the uncertainty diminishes.

**Figure 6.** Plots showing, for different sample sizes and $\alpha =25\%$, the ratio of the interquartile range (IQR) of the QS-based estimate of entropy obtained using bootstrapping to the actual IQR arising due to random sampling variability. Here, each sample set drawn from the parent density is bootstrapped to obtain ${N}_{B}=500$ different estimates of the associated entropy, and the width of the resulting interquartile range is computed. The procedure is repeated for $500$ different sample sets drawn from the parent population, and the graph shows the resulting variability as box plots. The ideal result would be a ratio of 1.0.

**Figure 7.** Plots showing how expected percentage error in the BC-based estimate of entropy derived from random samples varies as a function of the number of bins ${N}_{Bin}$ for the (**a**) Gaussian, (**b**) Exponential, and (**c**) Log-Normal densities. Results are averaged over $500$ trials obtained by drawing sample sets of size ${N}_{S}$ from the theoretical pdf, where ${x}_{min}$ and ${x}_{max}$ are set to be the smallest and largest data values in the particular sample. Results are shown for different sample sizes ${N}_{S}=\left[100,200,500,1000,2000,5000\right]$. When the number of bins is small the estimation bias is positive (overestimation) but rapidly declines to cross zero and become negative (underestimation) as the number of bins is increased. In general, the overall ranges of overestimation and underestimation bias are larger than for the QS method (see Figure 4).

**Figure 8.** Box plots showing the sampling variability distribution of the optimal fractional number of bins (as a percentage of sample size) to achieve zero bias, when using the BC method for estimating entropy from random samples. Results are shown for the Gaussian (blue), Exponential (red) and Log-Normal (orange) densities. The uncertainty estimates are computed by drawing $500$ different sample data sets of a given size from the parent distribution. Note that the expected optimal fractional number of bins varies with the shape of the pdf, and is not constant but declines as the sample size increases. This is in contrast with the QS method, where the optimal fractional number of bins is constant at $\sim 25\%$ for different sample sizes and pdf shapes. Further, the variability in the optimal fractional number of bins can be large and highly skewed at smaller sample sizes.

**Figure 9.** Plots showing results for the Bimodal pdf. (**a**) Pdf and Cdf for the Gaussian Mixture model. (**b**) Convergence of entropy computed using the piecewise constant approximation as the number of quantiles ${N}_{Z}$ is increased. (**c**) Bias and sampling variability of the QS-based estimate of entropy plotted against ${N}_{Z}$ as a percentage of sample size. (**d**) Expected bias of the QS-based estimate of entropy plotted against ${N}_{Z}$ as a percentage of sample size, for different sample sizes ${N}_{S}=\left[100,200,500,1000,2000,5000\right]$.

**Figure 10.** Plot showing how expected percentage error in the KD-based estimate of entropy derived from random samples varies as a function of $K={\sigma}_{k}\cdot \sqrt{{N}_{S}}$ when using a Gaussian kernel. Results are averaged over $500$ trials obtained by drawing sample sets of size ${N}_{S}$ from the theoretical pdf, where ${x}_{min}$ and ${x}_{max}$ are set to be the smallest and largest data values in the particular sample. Results are shown for different sample sizes ${N}_{S}=\left[100,200,500,1000,2000,5000\right]$, for the (**a**) Gaussian, (**b**) Exponential, (**c**) Log-Normal, and (**d**) Bimodal densities. When the kernel standard deviation ${\sigma}_{k}$ (and hence $K$) is small the estimation bias is negative (underestimation) but rapidly increases to cross zero and become positive (overestimation) as the kernel standard deviation is increased. The location of the crossing point (corresponding to the optimal value for $K$, and hence ${\sigma}_{k}$) varies with sample size and shape of the pdf.

**Figure 11.** Plot showing how the optimal value of the KD hyper-parameter $K={\sigma}_{k}\cdot \sqrt{{N}_{S}}$ varies as a function of sample size ${N}_{S}$ and pdf type when using a Gaussian kernel. In disagreement with Parzen-window theory, the optimal value for $K$ does not remain approximately constant as the sample size ${N}_{S}$ is varied. Further, the value of $K$ varies significantly with the shape of the underlying pdf.

**Figure 12.** Plots showing expected percent error in the QS- (blue), KD- (purple) and BC-based (green) estimates of entropy derived from random samples, as a function of sample size ${N}_{S}$ for the (**a**) Gaussian, (**b**) Log-Normal, (**c**) Exponential, and (**d**) Bimodal densities; box plots are shown side by side to improve legibility. Results are averaged over $500$ trials obtained by drawing sample sets of size ${N}_{S}$ from the theoretical pdf, where ${x}_{min}$ and ${x}_{max}$ are set to be the smallest and largest data values in the particular sample. For QS, the fractional number of bins was fixed at $\alpha =25\%$ regardless of pdf form or sample size. For KD and BC, the corresponding hyperparameter (kernel standard deviation ${\sigma}_{k}$ and bin width $\Delta$ respectively) was optimized for each random sample by finding the value that maximizes the likelihood of the sample. Results show clearly that QS-based estimates are relatively unbiased, even for small sample sizes, whereas KD- and BC-based estimates can have significant negative bias when sample sizes are small.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Gupta, H.V.; Ehsani, M.R.; Roy, T.; Sans-Fuentes, M.A.; Ehret, U.; Behrangi, A.
Computing Accurate Probabilistic Estimates of One-D Entropy from Equiprobable Random Samples. *Entropy* **2021**, *23*, 740.
https://doi.org/10.3390/e23060740
