1. Introduction
Goodness-of-fit testing is a fundamental problem in statistical inference with widespread applications in model selection, validation, and diagnostics. In the parametric framework, the classical approach is often based on the likelihood ratio test (LRT), as motivated by the Neyman–Pearson lemma. While the lemma guarantees the optimality of the LRT for testing simple hypotheses, the LRT is also widely applied to composite hypotheses due to its favorable asymptotic properties [1].
Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed (i.i.d.) observations from a continuous distribution $F$ with density $f$. We consider the hypothesis-testing problem
$$H_0 : f = f_0(\cdot\,; \theta) \quad \text{versus} \quad H_1 : f \neq f_0(\cdot\,; \theta),$$
where under the null hypothesis the density $f_0(\cdot\,; \theta)$ is known up to a parameter vector $\theta$, and under the alternative, $f$ is completely unspecified. The classical likelihood ratio test statistic is given by
$$\Lambda_n = \frac{\prod_{i=1}^{n} f(X_i)}{\prod_{i=1}^{n} f_0(X_i; \hat{\theta}_n)}. \tag{1}$$
We reject $H_0$ in favor of $H_1$ for large values of $\Lambda_n$, or equivalently, when the log-likelihood ratio $2 \log \Lambda_n$ exceeds a critical threshold. Under standard regularity conditions and assuming $H_0$ holds, the distribution of $2 \log \Lambda_n$ converges asymptotically to a chi-squared distribution with degrees of freedom equal to the difference in dimensionality between the null and alternative models [2].
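As a quick illustration of this classical setup (our toy example, not part of the paper; the function names and the known-variance simplification are ours), the sketch below computes $2 \log \Lambda_n$ for testing a fixed normal mean against a free mean, a nested pair for which Wilks' theorem gives an asymptotic $\chi^2_1$ null distribution:

```python
import math
import random

def loglik_normal(xs, mu, sigma):
    """Gaussian log-likelihood with known standard deviation sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

def lrt_statistic(xs, mu0=0.0, sigma=1.0):
    """2 log Lambda_n for H0: mu = mu0 versus a free mean (sigma known)."""
    mu_hat = sum(xs) / len(xs)  # MLE of mu under the alternative
    return 2.0 * (loglik_normal(xs, mu_hat, sigma)
                  - loglik_normal(xs, mu0, sigma))

random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
stat = lrt_statistic(xs)  # approximately chi-squared(1) under H0
```

For this nested normal pair the statistic reduces algebraically to $n(\bar{X} - \mu_0)^2 / \sigma^2$, which provides a simple sanity check.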
However, in many practical situations—particularly in nonparametric settings—the alternative density is unknown, rendering direct application of the LRT infeasible. To address this, nonparametric density-based methods have been developed as flexible alternatives. Among these, empirical likelihood ratio tests and entropy-based statistics have attracted considerable attention.
Vexler and Gurevich [3] introduced an empirical likelihood ratio test in which the unknown density $f$ is estimated nonparametrically using Vasicek's [4] entropy estimator. They defined the empirical likelihood ratio using the spacing-based density estimate
$$\hat{f}_m\left( X_{(i)} \right) = \frac{F_n\left( X_{(i+m)} \right) - F_n\left( X_{(i-m)} \right)}{X_{(i+m)} - X_{(i-m)}} = \frac{2m}{n \left( X_{(i+m)} - X_{(i-m)} \right)},$$
where $F_n$ denotes the empirical cumulative distribution function. Here, $\hat{\theta}_n$ is the maximum likelihood estimate under $H_0$, and $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the order statistics. Boundary corrections are made such that $X_{(i-m)} = X_{(1)}$ if $i - m < 1$, and $X_{(i+m)} = X_{(n)}$ if $i + m > n$. The integer $m$, called the window size, is a positive number less than $n/2$. Vexler and Gurevich [3] proposed the test statistic
$$V_{mn} = \prod_{i=1}^{n} \frac{\hat{f}_m\left( X_{(i)} \right)}{f_0\left( X_{(i)}; \hat{\theta}_n \right)}. \tag{2}$$
Since $V_{mn}$ depends on $m$, they further suggested the minimization
$$T_n = \min_{1 \le m < n^{1-\delta}} V_{mn}, \tag{3}$$
where $\delta \in (0, 1)$.
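The spacing-based construction can be sketched in a few lines. This is a minimal illustration on the log scale, not the authors' implementation; the function name, the plug-in log-density argument, and the toy standard normal null are our assumptions:

```python
import math
import random

def vexler_gurevich_log(xs, log_f0, m):
    """log V_mn: Vasicek-type m-spacing density estimate against a
    fitted null log-density log_f0 (a callable).  Boundary spacings are
    clipped to X_(1) and X_(n) as in the construction above."""
    n = len(xs)
    x = sorted(xs)
    log_v = 0.0
    for i in range(1, n + 1):
        lo = x[max(i - m, 1) - 1]         # X_(i-m), set to X_(1) if i - m < 1
        hi = x[min(i + m, n) - 1]         # X_(i+m), set to X_(n) if i + m > n
        f_hat = 2.0 * m / (n * (hi - lo))  # spacing-based density estimate
        log_v += math.log(f_hat) - log_f0(x[i - 1])
    return log_v

# Toy use: standard normal null with no estimated parameters.
log_phi = lambda t: -0.5 * math.log(2.0 * math.pi) - 0.5 * t * t
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(100)]
log_v = vexler_gurevich_log(data, log_phi, m=10)
```

The statistic itself would then be obtained by minimizing over the admissible window sizes $m$.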
This general methodology has since been extended to various settings, including logistic distributions [5], skew normality [6], Laplace distributions [7], and Rayleigh distributions [8]. Although the method of Vexler and Gurevich [3] is generally effective, it is known to suffer from boundary bias, particularly near the endpoints, a limitation also discussed by Ebrahimi et al. [9] in the context of entropy estimation.
To address these shortcomings, we propose two test statistics. The first is a corrected version of the Vexler–Gurevich statistic (2) and (3), incorporating a position-dependent correction factor to properly account for boundary effects. The second is a new test statistic based on Correa's [10] local linear entropy estimator, which improves density estimation by locally interpolating the quantile function. Together, these methods aim to enhance the performance and reliability of goodness-of-fit testing.
The remainder of the paper is organized as follows.
Section 2 introduces the two proposed test statistics.
Section 3 presents their theoretical properties.
Section 4 describes the computational implementation using a bootstrap procedure.
Section 5 provides simulation studies and real-data applications to evaluate the tests’ performance. Finally,
Section 6 concludes and outlines potential future research directions.
4. Computational Algorithm
To implement the proposed test statistics $\widetilde{V}_{mn}$ and $V^{C}_{mn}$, it is necessary to select an appropriate window size parameter $m$. A commonly used rule, suggested by Grzegorzewski and Wieczorkowski [12], is
$$m = \left\lfloor \sqrt{n} + 0.5 \right\rfloor,$$
where $\lfloor \cdot \rfloor$ denotes the floor function. This choice effectively balances bias and variance in the entropy-based density estimators and is widely adopted in practice.
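In code, the rule is a one-liner (the function name is ours):

```python
import math

def window_size(n):
    """Grzegorzewski-Wieczorkowski rule: m = floor(sqrt(n) + 0.5)."""
    return int(math.floor(math.sqrt(n) + 0.5))
```

For example, a sample of size 100 yields $m = 10$, while a sample of size 50 yields $m = 7$.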
After computing the test statistic, the next step is to assess whether its value provides sufficient evidence to reject the null hypothesis. Values of the test statistic close to one indicate agreement between the empirical and theoretical densities, whereas large values suggest model misspecification and provide evidence against the null. Since the asymptotic distributions of the proposed statistics are analytically intractable, we employ a bootstrap procedure to approximate the null distribution and obtain the corresponding p-value.
The following algorithm outlines the steps for conducting the bootstrap-based goodness-of-fit test:
- 1.
Given an observed sample $X_1, \ldots, X_n$, compute the test statistic $V^{C}_{mn}$ as defined in Equation (7).
- 2.
Fit the null model $f_0(\cdot\,; \hat{\theta}_n)$ to the data, where $\hat{\theta}_n$ denotes the maximum likelihood estimator under $H_0$.
- 3.
Generate $B$ bootstrap samples $X^{*(b)}_1, \ldots, X^{*(b)}_n$, for $b = 1, \ldots, B$, by sampling from the fitted null model $f_0(\cdot\,; \hat{\theta}_n)$.
- 4.
For each bootstrap sample, compute the corresponding test statistic $V^{C, *(b)}_{mn}$.
- 5.
Estimate the bootstrap
p-value as
$$\hat{p} = \frac{1}{B} \sum_{b=1}^{B} \mathbb{I}\left( V^{C, *(b)}_{mn} \ge V^{C}_{mn} \right), \tag{9}$$
where $\mathbb{I}(\cdot)$ denotes the indicator function.
- 6.
Reject the null hypothesis at significance level $\alpha$ if $\hat{p} \le \alpha$.
This resampling approach avoids reliance on the asymptotic distribution of the test statistic, making the procedure suitable for small to moderate sample sizes and for complex models. An identical bootstrap procedure is applied to compute the p-value for the statistic $\widetilde{V}_{mn}$ defined in Equation (5).
Note that, although the test statistics $\widetilde{V}_{mn}$ and $V^{C}_{mn}$ are defined in multiplicative form, their theoretical properties are most naturally expressed on the log scale. When the null hypothesis $H_0$ holds, the entropy-based density estimators consistently estimate the parametric density $f_0(\cdot\,; \theta)$. By Lemma 1(i), the normalized log-statistics $n^{-1} \log \widetilde{V}_{mn}$ and $n^{-1} \log V^{C}_{mn}$ converge in probability to zero. Consequently, the observed values of $\log \widetilde{V}_{mn}$ and $\log V^{C}_{mn}$ fluctuate around zero in finite samples. Since the bootstrap samples are generated from $f_0(\cdot\,; \hat{\theta}_n)$, their corresponding statistics $\log \widetilde{V}^{*}_{mn}$ and $\log V^{C, *}_{mn}$ also concentrate near zero, ensuring that the bootstrap distribution provides a valid approximation to the null distribution. Thus, the bootstrap p-values are approximately uniform under $H_0$, thereby controlling the Type I error.
In contrast, when $H_0$ is false, Lemma 1(ii) shows that
$$\frac{1}{n} \log \widetilde{V}_{mn} \quad \text{and} \quad \frac{1}{n} \log V^{C}_{mn} \;\xrightarrow{P}\; \mathrm{KL}\left( f \,\|\, f_0(\cdot\,; \theta^{*}) \right) > 0,$$
where $\theta^{*}$ minimizes the Kullback–Leibler divergence between $f$ and $f_0(\cdot\,; \theta)$. Thus, the observed statistics become much larger than their bootstrap replicates, which remain centered near zero. This separation forces the bootstrap p-values to converge to zero as $n \to \infty$, thereby guaranteeing the consistency and power of the proposed tests.
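As a self-contained numerical check of the kind of Kullback–Leibler limit invoked here (our toy example, not the paper's setting), one can verify by Monte Carlo that $\mathrm{KL}\left( N(0,1) \,\|\, N(\mu,1) \right) = \mu^2 / 2$:

```python
import random

def kl_normal_mc(mu, n=200000, seed=7):
    """Monte Carlo estimate of KL(N(0,1) || N(mu,1)).

    For equal unit variances the log-density ratio
    log f(x) - log g(x) simplifies to (mu**2 - 2*mu*x) / 2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)            # draw from f = N(0, 1)
        total += (mu * mu - 2.0 * mu * x) / 2.0
    return total / n

est = kl_normal_mc(1.0)  # analytic value: mu**2 / 2 = 0.5
```

A strictly positive limit of this kind is exactly what separates the observed statistic from its bootstrap replicates under a fixed alternative.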
5. Simulation Study
In this section, we evaluate the performance of the proposed test statistics through both simulation studies and real data applications. Due to the complexity and nonparametric nature of the estimators, the exact sampling distributions under the null hypothesis are analytically intractable. Therefore, we employ the bootstrap procedure described in
Section 4 to approximate the null distribution and compute the corresponding
p-values.
For each scenario in Examples 1 and 2, samples of size $n$ are generated from the specified true distributions, as detailed in Table 1 and Table 2. The test statistics $V_{mn}$, $\widetilde{V}_{mn}$, and $V^{C}_{mn}$, as defined in Equations (2), (5), and (7), respectively, are computed for each sample. Corresponding p-values are then estimated using $B$ bootstrap replications from the fitted null model. To ensure reproducibility, the random seed is set in R via set.seed(2025). The R code implementing the proposed methods is available upon request from the corresponding author. In what follows, we denote the bootstrap p-values by $\hat{p}$, as defined in (9).
Example 1. (Testing the Normal Distribution with Unknown Mean as the Null): We begin by testing the null hypothesis $H_0 : f = f_0$, where $f_0$ is the density of the normal distribution with mean μ unknown and estimated by $\hat{\mu} = \bar{X}$ for each sample.
Samples are generated from various true distributions, as listed in
Table 1, to evaluate the sensitivity of the proposed test statistics to departures from normality. These alternatives include normal distributions with different means and variances, a symmetric mixture, as well as heavy-tailed and skewed distributions.
Table 1 displays the resulting p-values. When the data are truly normal, whether matching the null mean or with a shifted mean, the tests do not reject the null hypothesis and p-values are large for all sample sizes, as expected. As the alternative distributions deviate further from normality (e.g., increased variance or heavier tails), the tests become more sensitive. For instance, under the increased-variance normal alternative, the null is not rejected for small samples but is rejected for larger $n$, reflecting the increase in power. For non-normal alternatives like the Cauchy and exponential distributions, the tests exhibit high power, yielding near-zero p-values even for moderate sample sizes.
Notably, $V_{mn}$ and $\widetilde{V}_{mn}$ yield identical results in our simulations (see Table 1), while $V^{C}_{mn}$ can be marginally more sensitive in some cases, especially for moderate sample sizes or challenging alternatives. Overall, the results demonstrate that the proposed methods maintain the correct Type I error rate under the null and reliably detect departures from normality as the sample size increases.
Although $\widetilde{V}_{mn}$ modifies the construction of the statistic by introducing a boundary correction, it is in fact equivalent to the original statistic up to a constant factor. To see the equivalence between $\widetilde{V}_{mn}$ and $V_{mn}$, we fix the window size $m$. By (3) and (5), we obtain
$$\frac{\widetilde{V}_{mn}}{V_{mn}} = \prod_{i=1}^{n} \frac{c_i}{2}.$$
The following lemma provides an explicit form of this product.
Lemma 2.
For each $n$ and $m$, we have
$$\widetilde{V}_{mn} = C_{n,m} \, V_{mn}, \qquad \text{where} \qquad C_{n,m} = \left( \frac{(2m-1)!}{(m-1)! \, (2m)^{m}} \right)^{2}.$$
Proof. From the definition of the correction factor $c_i$ in (4), we can distinguish between interior indices and boundary indices.
First, consider the interior indices, i.e., $m + 1 \le i \le n - m$. For these indices we have $c_i = 2$. Hence, each term contributes
$$\frac{c_i}{2} = 1.$$
Since there are $n - 2m$ such indices, their total contribution to the product is simply 1.
Next, consider the lower boundary indices, i.e., $1 \le i \le m$. For these indices we have
$$c_i = 1 + \frac{i - 1}{m} = \frac{m + i - 1}{m},$$
so that
$$\frac{c_i}{2} = \frac{m + i - 1}{2m}.$$
Reindex the product by letting $k = m + i - 1$. Then, as $i$ runs from 1 to $m$, $k$ runs from $m$ to $2m - 1$. Thus,
$$\prod_{i=1}^{m} \frac{c_i}{2} = \prod_{k=m}^{2m-1} \frac{k}{2m}.$$
Now, consider the upper boundary indices, i.e., $n - m + 1 \le i \le n$. For these indices we have
$$\frac{c_i}{2} = \frac{1}{2} \left( 1 + \frac{n - i}{m} \right) = \frac{m + n - i}{2m}.$$
Let $j = n + 1 - i$. Then, as $i$ runs from $n - m + 1$ to $n$, $j$ runs over $1, \ldots, m$. Substituting gives
$$\prod_{i=n-m+1}^{n} \frac{c_i}{2} = \prod_{j=1}^{m} \frac{m + j - 1}{2m}.$$
Therefore, the set of factors from the upper boundary is
$$\left\{ \frac{m}{2m}, \frac{m + 1}{2m}, \ldots, \frac{2m - 1}{2m} \right\},$$
which is exactly the same as the set from the lower boundary. Thus,
$$\prod_{i=n-m+1}^{n} \frac{c_i}{2} = \prod_{k=m}^{2m-1} \frac{k}{2m}.$$
Combining the lower and upper boundary contributions, we obtain
$$\left( \prod_{k=m}^{2m-1} \frac{k}{2m} \right)^{2}.$$
Since the interior indices contribute 1, the overall constant is
$$C_{n,m} = \left( \prod_{k=m}^{2m-1} \frac{k}{2m} \right)^{2}.$$
To simplify the product, observe that
$$\prod_{k=m}^{2m-1} \frac{k}{2m} = \frac{1}{(2m)^{m}} \prod_{k=m}^{2m-1} k.$$
The first part is $(2m)^{-m}$, while the second part is exactly $\frac{(2m-1)!}{(m-1)!}$. Therefore,
$$\prod_{k=m}^{2m-1} \frac{k}{2m} = \frac{(2m-1)!}{(m-1)! \, (2m)^{m}}.$$
Substituting back yields
$$C_{n,m} = \left( \frac{(2m-1)!}{(m-1)! \, (2m)^{m}} \right)^{2}.$$
□
These constants depend only on the pair $(n, m)$ and not on the observed data. Consequently, the boundary-corrected statistic $\widetilde{V}_{mn}$ differs from the original $V_{mn}$ only by multiplication with $C_{n,m}$. Since the same constant rescales both the observed test statistic and the bootstrap distribution under the null, the resulting p-values remain identical. More explicitly, under the null we have
$$\hat{p}\left( \widetilde{V}_{mn} \right) = \frac{1}{B} \sum_{b=1}^{B} \mathbb{I}\left( C_{n,m} V^{*(b)}_{mn} \ge C_{n,m} V_{mn} \right) = \frac{1}{B} \sum_{b=1}^{B} \mathbb{I}\left( V^{*(b)}_{mn} \ge V_{mn} \right) = \hat{p}\left( V_{mn} \right).$$
Thus, the boundary correction reduces edge bias in the density estimate but introduces only a constant multiplicative factor in the test statistic. For hypothesis testing based on parametric bootstrap calibration, it has no impact on the p-values. This explains why Table 1, Table 2 and Table 3 report identical p-values for $V_{mn}$ and $\widetilde{V}_{mn}$ across all scenarios. Accordingly, in the following example, we omit reporting $\widetilde{V}_{mn}$ separately.
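The closed form in Lemma 2 is easy to check numerically. The sketch below (ours, assuming the Ebrahimi-type correction factors stated in the proof) compares the direct product of the factors $c_i / 2$ against the factorial expression:

```python
import math

def correction_constant_direct(n, m):
    """Product of c_i / 2 over i = 1..n, with the correction factors
    used in the proof: c_i = 1+(i-1)/m, 2, or 1+(n-i)/m."""
    prod = 1.0
    for i in range(1, n + 1):
        if i <= m:
            c = 1.0 + (i - 1) / m          # lower boundary indices
        elif i <= n - m:
            c = 2.0                        # interior indices contribute 1
        else:
            c = 1.0 + (n - i) / m          # upper boundary indices
        prod *= c / 2.0
    return prod

def correction_constant_closed(m):
    """Closed form ((2m-1)! / ((m-1)! (2m)^m))**2 from Lemma 2."""
    return (math.factorial(2 * m - 1)
            / (math.factorial(m - 1) * (2 * m) ** m)) ** 2
```

For $m = 2$, both routes give $(3/8)^2 = 9/64$, matching the hand calculation in the proof.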
Example 2. (Testing the Normal Distribution with Unknown Mean and Variance as the Null): In this simulation, we test the null hypothesis $H_0 : f = f_0$, where $f_0$ is the density of the normal distribution with both mean μ and variance $\sigma^2$ unknown. The maximum likelihood estimators $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2 = n^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ are used to fit the null model.
Samples are generated from the same set of alternative distributions as in Example 1, including normal distributions with different means and variances, symmetric mixtures, as well as heavy-tailed and skewed distributions. For each sample size $n$, the test statistics $V_{mn}$ and $V^{C}_{mn}$ are computed, and bootstrap p-values are estimated using $B$ replications.
Table 2 reports the resulting p-values, which provide insight into the performance of the tests when both location and scale are unknown. When the data are drawn from a normal distribution (either with the same or shifted mean), the tests maintain the nominal Type I error rate, as p-values remain large across all sample sizes. For alternatives with heavier tails, skewness, or mixtures, the p-values decrease rapidly as the sample size increases, demonstrating the ability of the proposed tests to detect deviations from the null model. The tests are especially powerful against the Cauchy and exponential alternatives, with p-values near zero even for small samples. When both location and scale are unknown, the tests show some loss of sensitivity to changes in variance alone, as seen in the increased-variance normal case, where p-values only decrease substantially for larger samples. Overall, $V_{mn}$ and $V^{C}_{mn}$ provide complementary perspectives, with the latter sometimes exhibiting greater sensitivity in challenging cases and with moderate sample sizes. These results highlight the robustness and power of the proposed methods for model assessment under a composite normal null hypothesis.
Example 3. (Real Data–Yarn Strength): We apply the proposed tests to the breaking strength values of 100 yarns, originally reported by Duncan [13].
We assess whether these data are adequately modeled by a Laplace distribution,
$$f_0(x; \mu, \sigma) = \frac{1}{2\sigma} \exp\left( -\frac{|x - \mu|}{\sigma} \right),$$
where $\mu$ and $\sigma$ denote the location and scale parameters, respectively. The maximum likelihood estimates are consistent with the values reported by Alizadeh Noughabi [7]. Using these estimates, we compute the p-values for $V_{mn}$ and $V^{C}_{mn}$ via a bootstrap procedure with $B$ replications. The resulting p-values for $V_{mn}$ and $V^{C}_{mn}$ are both well above the conventional significance level, providing no evidence against the Laplace model. These results support the adequacy of the Laplace distribution for the yarn strength data and further indicate that the Correa-based statistic tends to be more conservative, offering stronger support in this setting.
Example 4. (Real Data–River Flow): We assess the suitability of the three-parameter gamma distribution for modeling river flow measurements. The density is given by
$$f_0(x; \alpha, \beta, \mu) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} (x - \mu)^{\alpha - 1} e^{-\beta (x - \mu)}, \qquad x > \mu,$$
where $\alpha$, $\beta$, and $\mu$ are the shape, rate, and location parameters, respectively. The dataset comprises river flow measurements (in millions of cubic feet per second) from the Susquehanna River at Harrisburg, Pennsylvania, recorded over the five-year period 1980–1984.
Following Al-Labadi and Evans [14], we use the maximum likelihood estimates of the three parameters. Using these estimates, we apply the bootstrap algorithm with $B$ replications to compute the p-values for the $V_{mn}$ and $V^{C}_{mn}$ test statistics.
The resulting p-values for $V_{mn}$ and $V^{C}_{mn}$ are both well above conventional significance thresholds, indicating strong agreement between the observed data and the fitted three-parameter gamma model. These results confirm the suitability of the gamma distribution for describing the river flow data and show that the Correa-based statistic tends to be slightly more conservative. This conclusion is consistent with the Bayesian nonparametric test of Al-Labadi and Evans [14], which likewise does not spuriously reject the adequacy of the gamma model.
To further evaluate the effectiveness of the proposed tests in detecting model misspecification, we conduct a power analysis under several alternative distributions, as well as under the null hypothesis. For each scenario, we fix the significance level at $\alpha$ and generate 1000 independent samples for each sample size $n$ considered.
For each sample, the test statistics $\widetilde{V}_{mn}$ and $V^{C}_{mn}$ are computed, and the corresponding p-values are estimated using $B$ bootstrap replications from the null model $f_0(\cdot\,; \hat{\theta}_n)$, as described in Section 4. The empirical rejection rate is calculated as the proportion of samples for which the null hypothesis is rejected.
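The rejection-rate computation can be sketched generically; in this illustration (ours), `p_value_fn` stands in for the bootstrap p-value of either proposed statistic and `draw_alt` for a sampler from the alternative distribution:

```python
import random

def empirical_power(p_value_fn, draw_alt, n, reps=1000, alpha=0.05, seed=1):
    """Empirical rejection rate: the fraction of simulated samples whose
    p-value falls at or below the significance level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = draw_alt(n, rng)       # draw one sample from the alternative
        if p_value_fn(sample) <= alpha:
            rejections += 1
    return rejections / reps
```

When `draw_alt` samples from the null model itself, the same routine estimates the Type I error rate, which should then be close to `alpha`.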
To assess Type I error control, we also report results under the null hypothesis, where samples are drawn from the null distribution itself. The empirical rejection rates in this case should be close to the nominal level $\alpha$, indicating the validity of the bootstrap calibration and the reliability of the testing procedure.
The alternative distributions considered highlight various types of departures from the null, including a mean shift, heavy tails (Cauchy(0, 1)), and a symmetric distribution with different kurtosis (Logistic(0, 1)). This range of alternatives enables a comprehensive assessment of the sensitivity and robustness of the proposed methods.
The results, presented in Table 3, demonstrate that both tests maintain appropriate Type I error rates when $H_0$ is true. Moreover, the power increases rapidly with sample size and is highest when the true distribution deviates substantially from the null. Notably, the test based on Correa's local linear entropy estimator ($V^{C}_{mn}$) consistently outperforms the boundary-corrected $m$-spacing test ($\widetilde{V}_{mn}$), particularly for small to moderate samples and challenging alternatives. These findings confirm the strong performance of the proposed methods, both in maintaining control of Type I error and in delivering high power against a range of alternatives.
Taken together, the simulation results and real-data examples demonstrate that both proposed tests provide reliable inference for model adequacy across a range of scenarios. The tests maintain proper Type I error control under the null, deliver substantial power under diverse alternatives, and do not spuriously reject in well-specified real-data settings. The new test based on Correa’s entropy estimator in particular offers notable advantages for moderate samples and challenging alternatives, while the boundary-corrected m-spacing approach retains strong and consistent performance. These findings illustrate the utility and flexibility of entropy-based density estimation methods for modern goodness-of-fit testing.
6. Conclusions
In this paper, we proposed two bootstrap-based test statistics for assessing the goodness-of-fit of parametric models. The first test is a boundary-corrected version of the empirical likelihood ratio statistic originally introduced by Vexler and Gurevich [3], which incorporates a position-dependent correction factor to improve density estimation near the boundaries. Although the correction modifies the construction of the statistic, in practice it yields results that are identical up to a fixed multiplicative constant and therefore leads to the same p-values as the uncorrected version. The second test is based on Correa's [10] local linear entropy estimator, which provides a flexible and accurate alternative by utilizing local linear regression to approximate the derivative of the quantile function.
We established the theoretical properties of the proposed statistics, demonstrating their consistency and showing that, under fixed alternatives, they converge to the Kullback–Leibler divergence. Since the asymptotic distributions of these statistics are analytically intractable, we developed a bootstrap algorithm for practical implementation.
Comprehensive simulation studies demonstrated that both test statistics perform well in terms of controlling Type I error and detecting model misspecification. In particular, the test based on Correa’s estimator exhibited superior power across a wide range of alternatives, especially in small to moderate sample sizes. Applications to real data further confirmed the utility and flexibility of the proposed methods.
Unlike composite likelihood ratio tests, which typically involve parameter estimation under both the null and alternative models, the proposed approach only requires parameter estimation under the null hypothesis (if at all). In the case of a fully specified null, parameter estimation is entirely avoided. This significantly simplifies implementation and enhances robustness, especially in settings where maximum likelihood estimation is computationally challenging or unreliable.
Future research directions include extending these tests to multivariate distributions, regression models, and right-censored data.