Article

Goodness-of-Fit Tests via Entropy-Based Density Estimation Techniques

1 Department of Mathematics and Statistics, American University of Sharjah, Sharjah 26666, United Arab Emirates
2 Department of Mathematical & Computational Sciences, University of Toronto Mississauga, Mississauga, ON L5L 1C6, Canada
3 Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada
* Author to whom correspondence should be addressed.
Stats 2025, 8(4), 97; https://doi.org/10.3390/stats8040097
Submission received: 9 September 2025 / Revised: 26 September 2025 / Accepted: 2 October 2025 / Published: 14 October 2025

Abstract

Goodness-of-fit testing remains a fundamental problem in statistical inference with broad practical importance. In this paper, we introduce two new goodness-of-fit tests grounded in entropy-based density estimation techniques. The first is a boundary-corrected empirical likelihood ratio test, which refines the classic approach by addressing bias near the support boundaries, though, in practice, it yields results very similar to the uncorrected version. The second is a novel test built on Correa’s local linear entropy estimator, leveraging quantile regression to improve density estimation accuracy. We establish the theoretical properties of both test statistics and demonstrate their practical effectiveness through extensive simulation studies and real-data applications. The results show that the proposed methods deliver strong power and flexibility in assessing model adequacy in a wide range of settings.

1. Introduction

Goodness-of-fit testing is a fundamental problem in statistical inference with widespread applications in model selection, validation, and diagnostics. In the parametric framework, the classical approach is often based on the likelihood ratio test (LRT), as motivated by the Neyman–Pearson lemma. While the lemma guarantees the optimality of the LRT for testing simple hypotheses, the LRT is also widely applied to composite hypotheses due to its favorable asymptotic properties [1].
Let $X_1, \ldots, X_n$ be independent and identically distributed (i.i.d.) observations from a continuous distribution $F$ with density $f$. We consider the hypothesis-testing problem
$$H_0: f = f_0 \quad \text{vs.} \quad H_1: f = f_1,$$
where under the null hypothesis $f_0(x) = f_0(x; \theta)$ is known up to a parameter vector $\theta = (\theta_1, \ldots, \theta_d)$, and under the alternative, $f_1$ is completely unspecified. The classical likelihood ratio test statistic is given by
$$\Lambda = \frac{\prod_{i=1}^{n} f_1(X_i)}{\prod_{i=1}^{n} f_0(X_i)}.$$
We reject $H_0$ in favor of $H_1$ for large values of $\Lambda$, or equivalently, when the log-likelihood ratio
$$2 \log \Lambda = 2 \sum_{i=1}^{n} \log \frac{f_1(X_i)}{f_0(X_i)}$$
exceeds a critical threshold. Under standard regularity conditions and assuming $H_0$ holds, the distribution of $2 \log \Lambda$ converges asymptotically to a chi-squared distribution with degrees of freedom equal to the difference in dimensionality between the null and alternative models [2].
However, in many practical situations—particularly in nonparametric settings—the alternative density f 1 is unknown, rendering direct application of the LRT infeasible. To address this, nonparametric density-based methods have been developed as flexible alternatives. Among these, empirical likelihood ratio tests and entropy-based statistics have attracted considerable attention.
Vexler and Gurevich [3] introduced an empirical likelihood ratio test in which the unknown density $f_1$ is estimated nonparametrically using Vasicek's [4] entropy estimator. They defined the empirical likelihood ratio using
$$\prod_{i=1}^{n} f_{1,n}(X_{(i)}),$$
where
$$f_{1,n}(X_{(i)}) = \frac{F_n(X_{(i+m)}) - F_n(X_{(i-m)})}{X_{(i+m)} - X_{(i-m)}} = \frac{2m/n}{X_{(i+m)} - X_{(i-m)}},$$
and $F_n(x)$ denotes the empirical cumulative distribution function. Here, $\hat{\theta}$ is the maximum likelihood estimate under $H_0$, and $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the order statistics. Boundary corrections are made such that $X_{(i-m)} = X_{(1)}$ if $i \le m$, and $X_{(i+m)} = X_{(n)}$ if $i \ge n - m$. The integer $m$, called the window size, is a positive number less than $n/2$. Vexler and Gurevich [3] proposed the test statistic
$$T_{mn} = \frac{\prod_{i=1}^{n} \frac{2m}{n \left( X_{(i+m)} - X_{(i-m)} \right)}}{\prod_{i=1}^{n} f_0(X_i; \hat{\theta})}. \tag{2}$$
Since $T_{mn}$ depends on $m$, they further suggested the minimization
$$T_{mn} = \min_{1 \le m \le n^{\delta}} \frac{\prod_{i=1}^{n} \frac{2m}{n \left( X_{(i+m)} - X_{(i-m)} \right)}}{\prod_{i=1}^{n} f_0(X_i; \hat{\theta})}, \tag{3}$$
where $\delta \in (0, 1)$.
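To make the construction concrete, here is a minimal Python sketch of the log of the Vexler–Gurevich statistic for a fixed window size and a fully specified standard-normal null. The paper's own implementation is in R and available from the authors; the names `log_t_mn` and `log_phi` below are illustrative, not from the paper.

```python
import math

def log_t_mn(x, m, log_f0):
    """Log of the Vexler-Gurevich ratio statistic for a fixed window size m:
    sum of log m-spacing density estimates minus the log-likelihood under H0."""
    xs = sorted(x)
    n = len(xs)
    log_num = 0.0
    for i in range(n):
        lo = xs[max(i - m, 0)]       # X_(i-m), clamped to X_(1) at the boundary
        hi = xs[min(i + m, n - 1)]   # X_(i+m), clamped to X_(n) at the boundary
        log_num += math.log(2 * m / n) - math.log(hi - lo)
    return log_num - sum(log_f0(v) for v in xs)

def log_phi(v):
    """Fully specified N(0,1) null density (no parameter estimation needed)."""
    return -0.5 * v * v - 0.5 * math.log(2 * math.pi)
```

Minimizing over admissible window sizes, as in the definition above, amounts to calling `log_t_mn` for each candidate `m` and taking the smallest value.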
This general methodology has since been extended to various settings, including logistic distributions [5], skew normality [6], Laplace distributions [7], and Rayleigh distributions [8]. Although the method of Vexler and Gurevich [3] is generally effective, it is known to suffer from boundary bias, particularly near the endpoints—a limitation also discussed by Ebrahimi et al. [9] in the context of entropy estimation.
To address these shortcomings, we propose two test statistics. The first is a corrected version of the Vexler–Gurevich statistic (2) and (3), incorporating a position-dependent correction factor to properly account for boundary effects. The second is a new test statistic based on Correa’s [10] local linear entropy estimator, which improves density estimation by locally interpolating the quantile function. Together, these methods aim to enhance the performance and reliability of goodness-of-fit testing.
The remainder of the paper is organized as follows. Section 2 introduces the two proposed test statistics. Section 3 presents their theoretical properties. Section 4 describes the computational implementation using a bootstrap procedure. Section 5 provides simulation studies and real-data applications to evaluate the tests’ performance. Finally, Section 6 concludes and outlines potential future research directions.

2. Two Approaches for Goodness-of-Fit Testing

In this section, we introduce two test statistics for assessing model adequacy. Both are constructed as likelihood-type ratios, where the numerator provides a nonparametric estimate of the true density, obtained either through the boundary-corrected m-spacing method or through Correa’s local linear entropy estimator. The denominator corresponds to the fitted parametric model under the null hypothesis. Values of the statistic close to one indicate good agreement between the empirical and theoretical densities, whereas large values provide evidence against the adequacy of the hypothesized model.
The first test is a boundary-corrected version of the empirical likelihood ratio statistic of Vexler and Gurevich [3], designed to mitigate bias near the support boundaries. The second test is based on Correa’s [10] local linear entropy estimator, which improves density estimation by locally approximating the derivative of the quantile function.

2.1. Corrected Test Statistic of Vexler and Gurevich (2010)

As noted in Section 1, the original test statistic proposed by Vexler and Gurevich is subject to boundary bias, particularly when the index i lies near the extremes of the support. This bias arises because the slope between X ( i m ) and X ( i + m ) in the empirical distribution is not accurately estimated at the boundaries. To address this issue, we propose a boundary-corrected estimator, following Ebrahimi et al. [9] and Al-Labadi et al. [11]. The estimator is defined as
$$f_{1,n}(X_{(i)}) = \frac{F_n(X_{(i+m)}) - F_n(X_{(i-m)})}{X_{(i+m)} - X_{(i-m)}} = \frac{c_i m/n}{X_{(i+m)} - X_{(i-m)}},$$
where $X_{(i-m)} = X_{(1)}$ if $i \le m$, and $X_{(i+m)} = X_{(n)}$ if $i \ge n - m$. The correction factor $c_i$ is defined as
$$c_i = \begin{cases} \dfrac{m+i-1}{m}, & 1 \le i \le m, \\[4pt] 2, & m+1 \le i \le n-m, \\[4pt] \dfrac{n+m-i}{m}, & n-m+1 \le i \le n. \end{cases} \tag{4}$$
This position-dependent factor replaces the constant value 2 used in the original formulation and leads to improved accuracy in density estimation near the boundaries. The resulting test statistic takes the form
$$\tilde{T}_{mn} = \frac{\prod_{i=1}^{n} \frac{c_i m}{n \left( X_{(i+m)} - X_{(i-m)} \right)}}{\prod_{i=1}^{n} f_0(X_i; \hat{\theta})}, \tag{5}$$
where $\hat{\theta}$ denotes the maximum likelihood estimator under the null model $f_0$. The selection method for $m$ is described in (8).
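The piecewise correction factor can be written as a short Python helper (an illustrative sketch; the name `c_factor` is ours): it ramps linearly from 1 to 2 over the lower boundary window, equals 2 in the interior, and ramps back down at the upper boundary.

```python
def c_factor(i, n, m):
    """Position-dependent correction factor c_i (1-based index i):
    (m+i-1)/m for the lower boundary, 2 in the interior,
    and (n+m-i)/m for the upper boundary."""
    if i <= m:
        return (m + i - 1) / m
    if i <= n - m:
        return 2.0
    return (n + m - i) / m
```

Replacing the constant $2m/n$ in the numerator of the original statistic by $c_i m/n$ yields the boundary-corrected statistic.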

2.2. Correa’s Entropy-Based Density Estimator

Recognizing that the test statistic in (5) is constructed as a likelihood-type ratio, where the numerator represents a nonparametric estimate of the true density, we propose a new test statistic based on the entropy estimator of Correa [10], which relies on estimating the derivative of the quantile function via local linear regression. Unlike spacing-based methods, Correa's estimator directly models quantile derivatives, avoiding boundary issues and adaptively smoothing noise, which is particularly advantageous for multimodal or skewed densities. Let $X$ be a continuous random variable with cumulative distribution function $F$ and density $f$, and define the quantile function as $q(p) = F^{-1}(p)$. The Shannon entropy can be expressed in terms of the quantile function as
$$H(f) = \int_0^1 \log q'(p)\, dp,$$
so that entropy estimation reduces to estimating $q'(p)$. Correa's method estimates this derivative locally using linear regression.
Given a sample $X_1, \ldots, X_n$, let $X_{(1)} \le \cdots \le X_{(n)}$ denote the order statistics and define the empirical quantiles $p_i = i/n$. Since $X_{(i)} \approx F^{-1}(p_i)$, a local linear regression is performed in a neighborhood of each $p_i$:
$$X_{(j)} = a_i + b_i p_j + \varepsilon_j, \quad j = i-m, \ldots, i+m,$$
where the slope $b_i$ is estimated via least squares as
$$b_i = \frac{\sum_{j=i-m}^{i+m} (p_j - \bar{p}_i)(X_{(j)} - \bar{X}_i)}{\sum_{j=i-m}^{i+m} (p_j - \bar{p}_i)^2}, \tag{6}$$
with $\bar{p}_i$ and $\bar{X}_i$ denoting local means. According to Correa [10], an estimator for $H(f)$ is then
$$\hat{H}(f) = \frac{1}{n} \sum_{i=1}^{n} \log b_i.$$
Since $q'(p_i) \approx b_i \approx 1/f(X_{(i)})$, the corresponding density estimate is defined as
$$f_{1,n}(X_{(i)}) = \frac{1}{b_i}.$$
The proposed test statistic based on Correa's estimator is therefore given by
$$T_{mn}^{\text{new}} = \frac{\prod_{i=1}^{n} f_{1,n}(X_{(i)})}{\prod_{i=1}^{n} f_0(X_i; \hat{\theta})} = \frac{\prod_{i=1}^{n} b_i^{-1}}{\prod_{i=1}^{n} f_0(X_i; \hat{\theta})}, \tag{7}$$
where $b_i$ is as defined in (6). In Section 4, we demonstrate that use of (7) yields a powerful goodness-of-fit test.
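The local least-squares slopes can be computed directly from the order statistics. The sketch below is illustrative Python (the paper's code is in R); truncating the regression window at the sample boundaries is our reading of the boundary handling, and the name `correa_slopes` is ours.

```python
def correa_slopes(x, m):
    """Least-squares slopes b_i of X_(j) on p_j = j/n over the window
    j = i-m, ..., i+m (window truncated at the sample boundaries)."""
    xs = sorted(x)
    n = len(xs)
    slopes = []
    for i in range(n):
        j0, j1 = max(0, i - m), min(n - 1, i + m)
        ps = [(j + 1) / n for j in range(j0, j1 + 1)]   # empirical quantiles p_j
        vs = xs[j0:j1 + 1]                               # order statistics X_(j)
        p_bar = sum(ps) / len(ps)
        v_bar = sum(vs) / len(vs)
        num = sum((p - p_bar) * (v - v_bar) for p, v in zip(ps, vs))
        den = sum((p - p_bar) ** 2 for p in ps)
        slopes.append(num / den)
    return slopes
```

Since $b_i \approx q'(p_i) = 1/f(X_{(i)})$, plugging $1/b_i$ into the numerator of the likelihood-type ratio yields the proposed statistic. As a sanity check, for an evenly spaced sample on $(0,1]$ the slopes are all 1, matching the uniform density.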

3. Theoretical Results

In this section, we investigate the asymptotic properties of the proposed test statistic T m n new introduced in the previous section. In particular, we show that the normalized logarithm of the test statistic (7) converges to zero under the null hypothesis and to a strictly positive limit under fixed alternatives. These results hold under mild regularity conditions and do not require asymptotic normality or parametric assumptions. Instead, they follow from standard consistency properties of nonparametric density estimators and laws of large numbers.
Let $X_1, \ldots, X_n$ be i.i.d. observations from a continuous distribution with density $f$, and let $f_0(x) = f_0(x; \theta)$ denote the fully specified null density. The test statistic $T_{mn}^{\text{new}}$ is as defined in (7). We impose the following regularity conditions:
(C1)
The density f is continuous and strictly positive on its support.
(C2)
The null density f 0 is continuous and strictly positive on the support of f, and log f 0 ( x ; θ ) is integrable under f.
It is well known that under (C1), the local linear estimator $1/b_i$ is consistent in probability for $f(X_{(i)})$ at each fixed $i$ [10]; that is,
$$1/b_i \xrightarrow{p} f(X_{(i)}) \quad \text{as } n \to \infty.$$
Lemma 1. 
Under conditions (C1)–(C2), the following hold:
(i) Under $H_0: f = f_0$, we have
$$\frac{1}{n} \log T_{mn}^{\text{new}} \xrightarrow{p} 0.$$
(ii) Under $H_1: f \ne f_0$, we have
$$\frac{1}{n} \log T_{mn}^{\text{new}} \xrightarrow{p} E_f\!\left[ \log \frac{f(X)}{f_0(X)} \right] > 0,$$
where the expectation is taken with respect to the true density $f$.
Proof. 
We express the log of the test statistic as
$$\log T_{mn}^{\text{new}} = \sum_{i=1}^{n} \left[ \log f_{1,n}(X_{(i)}) - \log f_0(X_i) \right] = \sum_{i=1}^{n} \left[ \log \frac{1}{b_i} - \log f_0(X_i) \right].$$
Thus, the normalized log statistic becomes
$$\frac{1}{n} \log T_{mn}^{\text{new}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \log \frac{1}{b_i} - \log f_0(X_i) \right].$$
Part (i): Under the null hypothesis $f = f_0$, by the consistency of the estimator, $1/b_i \xrightarrow{p} f_0(X_i)$ for each $i$. By the continuous mapping theorem, $\log(1/b_i) \xrightarrow{p} \log f_0(X_i)$, so each summand converges in probability to zero. To extend this pointwise convergence to the average, it is necessary that the sequence
$$\left\{ \log(1/b_i) - \log f_0(X_i) \right\}_{i=1}^{n}$$
is uniformly integrable. This condition guarantees that convergence in probability of the summands, together with integrability of $\log f_0(X_i)$ by (C2), implies convergence of the average. Hence, by uniform integrability,
$$\frac{1}{n} \log T_{mn}^{\text{new}} \xrightarrow{p} 0.$$
Part (ii): Under the alternative $f \ne f_0$, $1/b_i \xrightarrow{p} f(X_i)$, so
$$\log \frac{1}{b_i f_0(X_i)} \xrightarrow{p} \log \frac{f(X_i)}{f_0(X_i)}.$$
The weak law of large numbers then gives
$$\frac{1}{n} \sum_{i=1}^{n} \log \frac{1}{b_i f_0(X_i)} \xrightarrow{p} E_f\!\left[ \log \frac{f(X)}{f_0(X)} \right].$$
The expectation on the right is the Kullback–Leibler divergence $D_{\mathrm{KL}}(f \,\|\, f_0)$, which is strictly positive under $H_1$. This completes the proof. □
This result confirms that the proposed test statistic T m n new is consistent and does not require knowledge of the null distribution’s asymptotic form or any parametric estimation procedure under the alternative.
Remark 1. 
1. The requirement of uniform integrability here is not restrictive. Since $\log f_0(X_i)$ is integrable by (C2) and $1/b_i$ is a consistent estimator of $f_0(X_i)$, the difference $\log(1/b_i) - \log f_0(X_i)$ inherits uniform integrability under the null. Thus, the additional condition is automatically satisfied in this setting.
2. The same arguments and consistency results hold for the boundary-corrected m-spacing test statistic $\tilde{T}_{mn}$ defined in Equation (5). The boundary correction, as in Ebrahimi et al. [9], ensures that the m-spacing estimator is consistent in probability for $f(X_{(i)})$ at each sample point, even near the endpoints of the support. Thus, the asymptotic properties established above for $T_{mn}^{\text{new}}$ also apply to $\tilde{T}_{mn}$ under the same regularity conditions.
Unlike classical likelihood ratio tests, which rely on parameter estimation under both the null and alternative models, the proposed statistic $T_{mn}^{\text{new}}$ depends only on parameter estimation under the null hypothesis (if required). In fully specified null cases, it avoids parameter estimation entirely. This feature simplifies implementation, enhances robustness, and broadens applicability, especially in settings where maximum likelihood estimation is computationally challenging or unreliable.

4. Computational Algorithm

To implement the proposed test statistics $\tilde{T}_{mn}$ and $T_{mn}^{\text{new}}$, it is necessary to select an appropriate window size parameter $m$. A commonly used rule, suggested by Grzegorzewski and Wieczorkowski [12], is
$$m = \left\lfloor \sqrt{n} + 0.5 \right\rfloor, \tag{8}$$
where $\lfloor \cdot \rfloor$ denotes the floor function. This choice effectively balances bias and variance in the entropy-based density estimators and is widely adopted in practice.
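The rule rounds $\sqrt{n}$ to the nearest integer. A one-line Python sketch (the name `window_size` is illustrative):

```python
import math

def window_size(n):
    """Grzegorzewski-Wieczorkowski rule m = floor(sqrt(n) + 0.5),
    i.e., sqrt(n) rounded to the nearest integer."""
    return math.floor(math.sqrt(n) + 0.5)
```

For the sample sizes used in the simulations, $n = 20, 50, 100, 200$ give $m = 4, 7, 10, 14$, respectively.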
After computing the test statistic, the next step is to assess whether its value provides sufficient evidence to reject the null hypothesis. Values of the test statistic close to one indicate agreement between the empirical and theoretical densities, whereas large values suggest model misspecification and provide evidence against the null. Since the asymptotic distribution of T m n new is analytically intractable, we employ a bootstrap procedure to approximate its null distribution and obtain the corresponding p-value.
The following algorithm outlines the steps for conducting the bootstrap-based goodness-of-fit test:
1. Given an observed sample $X_1, \ldots, X_n$, compute the test statistic $T_{mn}^{\text{new}}$ as defined in Equation (7).
2. Fit the null model $f_0(x; \hat{\theta})$ to the data, where $\hat{\theta}$ denotes the maximum likelihood estimator under $H_0$.
3. Generate $B$ bootstrap samples $\{X_1^{*(b)}, \ldots, X_n^{*(b)}\}$, for $b = 1, \ldots, B$, by sampling from the fitted null model $f_0(x; \hat{\theta})$.
4. For each bootstrap sample, compute the corresponding test statistic $T_{mn}^{\text{new},*(b)}$.
5. Estimate the bootstrap p-value as
$$p_{\text{boot}} = \frac{1}{B} \sum_{b=1}^{B} I\!\left( T_{mn}^{\text{new},*(b)} \ge T_{mn}^{\text{new}} \right), \tag{9}$$
where $I(\cdot)$ denotes the indicator function.
6. Reject the null hypothesis $H_0$ at significance level $\alpha$ if $p_{\text{boot}} < \alpha$.
This resampling approach avoids reliance on the asymptotic distribution of the test statistic, making the procedure suitable for small to moderate sample sizes and for complex models. An identical bootstrap procedure is applied to compute the p-value for the statistic T ˜ m n defined in Equation (5).
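The bootstrap loop can be sketched generically in Python (the paper's implementation is in R). The statistic, fitting, and sampling routines are passed in as placeholders (`stat_fn`, `fit_fn`, `sample_fn`, all names ours); re-fitting $\hat{\theta}$ on each bootstrap sample follows the usual parametric-bootstrap convention and is our assumption.

```python
import random

def bootstrap_pvalue(x, stat_fn, fit_fn, sample_fn, B=1000, seed=2025):
    """Parametric bootstrap p-value (steps 1-5 above): the fraction of
    bootstrap statistics at least as large as the observed one."""
    rng = random.Random(seed)
    theta_hat = fit_fn(x)                  # step 2: fit H0 to the data
    t_obs = stat_fn(x, theta_hat)          # step 1: observed statistic
    exceed = 0
    for _ in range(B):                     # steps 3-4: resample and recompute
        xb = sample_fn(theta_hat, len(x), rng)
        if stat_fn(xb, fit_fn(xb)) >= t_obs:
            exceed += 1
    return exceed / B                      # step 5: bootstrap p-value
```

Step 6 then rejects $H_0$ at level $\alpha$ when the returned value falls below $\alpha$; for a fully specified null, `fit_fn` can simply return the fixed parameters.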
Note that, although the test statistics $\tilde{T}_{mn}$ and $T_{mn}^{\text{new}}$ are defined in multiplicative form, their theoretical properties are most naturally expressed on the log scale. When the null hypothesis $H_0$ holds, the entropy-based density estimators $\hat{f}_n$ consistently estimate the parametric density $f_0(\cdot; \theta_0)$. By Lemma 1(i), the normalized log-statistics
$$\frac{1}{n} \log \tilde{T}_{mn} \quad \text{and} \quad \frac{1}{n} \log T_{mn}^{\text{new}}$$
converge in probability to zero. Consequently, the observed values $\log \tilde{T}_{mn}$ and $\log T_{mn}^{\text{new}}$ fluctuate around zero in finite samples. Since the bootstrap samples are generated from $f_0(\cdot; \hat{\theta})$, their corresponding statistics $\log \tilde{T}_{mn}^{*(b)}$ and $\log T_{mn}^{\text{new},*(b)}$ also concentrate near zero, ensuring that the bootstrap distribution provides a valid approximation to the null distribution. Thus, the bootstrap p-values are approximately uniform under $H_0$, thereby controlling the Type I error.
In contrast, when $H_0$ is false, Lemma 1(ii) shows that
$$\frac{1}{n} \log \tilde{T}_{mn} \xrightarrow{p} E_f\!\left[ \log \frac{f(X)}{f_0(X; \theta^*)} \right] > 0 \quad \text{and} \quad \frac{1}{n} \log T_{mn}^{\text{new}} \xrightarrow{p} E_f\!\left[ \log \frac{f(X)}{f_0(X; \theta^*)} \right] > 0,$$
where $\theta^*$ minimizes the Kullback–Leibler divergence between $f$ and $f_0(\cdot; \theta)$. Thus, the observed statistics become much larger than their bootstrap replicates, which remain centered near zero. This separation forces the bootstrap p-values to converge to zero as $n \to \infty$, thereby guaranteeing the consistency and power of the proposed tests.

5. Simulation Study

In this section, we evaluate the performance of the proposed test statistics through both simulation studies and real data applications. Due to the complexity and nonparametric nature of the estimators, the exact sampling distributions under the null hypothesis are analytically intractable. Therefore, we employ the bootstrap procedure described in Section 4 to approximate the null distribution and compute the corresponding p-values.
For each scenario in Examples 1 and 2, samples of size $n \in \{20, 50, 100, 200\}$ are generated from the specified true distributions, as detailed in Table 1 and Table 2. The test statistics $T_{mn}^{\text{origin}} = T_{mn}$, $\tilde{T}_{mn}$, and $T_{mn}^{\text{new}}$, as defined in Equations (2), (5), and (7), respectively, are computed for each sample. Corresponding p-values are then estimated using $B = 1000$ bootstrap replications from the fitted null model. To ensure reproducibility, the random seed is set in R via set.seed(2025). The R code implementing the proposed methods is available upon request from the corresponding author. In what follows, we denote the bootstrap p-values by $p_{\text{boot}}$, as defined in (9).
Example 1. 
(Testing $N(\mu, 1)$ as the Null): We begin by testing the null hypothesis $H_0: f = f_0$, where $f_0$ is the density of $N(\mu, 1)$ with $\mu$ unknown and estimated by $\hat{\mu} = \bar{x}$ for each sample.
Samples are generated from various true distributions, as listed in Table 1, to evaluate the sensitivity of the proposed test statistics to departures from normality. These alternatives include normal distributions with different means and variances, a symmetric mixture, as well as heavy-tailed and skewed distributions.
Table 1 displays the resulting p-values. When the data are truly normal—whether matching the null mean ( N ( 0 , 1 ) ) or with a shifted mean ( N ( 5 , 1 ) )—the tests do not reject the null hypothesis and p-values are large for all sample sizes, as expected. As the alternative distributions deviate further from normality (e.g., increased variance or heavier tails), the tests become more sensitive. For instance, under the t 3 distribution, the null is not rejected for small samples ( n = 20 ) but is rejected for larger n, reflecting the increase in power. For non-normal alternatives like the Cauchy and exponential distributions, the tests exhibit high power, yielding near-zero p-values even for moderate sample sizes.
Notably, $\tilde{T}_{mn}$ and $T_{mn}^{\text{origin}}$ yield identical results in our simulations (see Table 1), while $T_{mn}^{\text{new}}$ can be marginally more sensitive in some cases, especially for moderate sample sizes or challenging alternatives. Overall, the results demonstrate that the proposed methods maintain the correct Type I error rate under the null and reliably detect departures from normality as the sample size increases.
Although $\tilde{T}_{mn}$ modifies the construction of the statistic by introducing a boundary correction, it is in fact equivalent to the original statistic up to a constant factor. To see the equivalence between $T_{mn}^{\text{origin}} = T_{mn}$ and $\tilde{T}_{mn}$, we fix the window size $m$. By (3) and (5), we obtain
$$\frac{\tilde{T}_{mn}}{T_{mn}} = \frac{\prod_{i=1}^{n} \frac{c_i m}{n \left( X_{(i+m)} - X_{(i-m)} \right)} \Big/ \prod_{i=1}^{n} f_0(X_i; \hat{\theta})}{\prod_{i=1}^{n} \frac{2m}{n \left( X_{(i+m)} - X_{(i-m)} \right)} \Big/ \prod_{i=1}^{n} f_0(X_i; \hat{\theta})} = \prod_{i=1}^{n} \frac{c_i}{2}.$$
The following lemma provides an explicit form of this product.
Lemma 2. 
For each $n$ and $m$, we have
$$\prod_{i=1}^{n} \frac{c_i}{2} = C(n, m),$$
where
$$C(n, m) = 2^{-2m} \left( \frac{(2m-1)!}{(m-1)!\, m^m} \right)^2.$$
Proof. 
From the definition of c i in (4), we can distinguish between interior indices and boundary indices.
First, consider the interior indices, i.e., $i = m+1, \ldots, n-m$. For these indices we have $c_i = 2$. Hence, each term contributes
$$\frac{c_i}{2} = \frac{2}{2} = 1.$$
Since there are $n - 2m$ such indices, their total contribution to the product $\prod_{i=1}^{n} c_i/2$ is simply 1.
Next, consider the lower boundary indices, i.e., $i = 1, \ldots, m$. For these indices we have
$$c_i = \frac{m+i-1}{m},$$
so that
$$\prod_{i=1}^{m} \frac{c_i}{2} = \prod_{i=1}^{m} \frac{m+i-1}{2m}.$$
Reindex the product by letting $k = m+i-1$. Then, as $i$ runs from 1 to $m$, $k$ runs from $m$ to $2m-1$. Thus,
$$\prod_{i=1}^{m} \frac{c_i}{2} = \frac{1}{(2m)^m} \prod_{k=m}^{2m-1} k.$$
Now, consider the upper boundary indices, i.e., $i = n-m+1, \ldots, n$. For these indices we have
$$c_i = \frac{n+m-i}{m}.$$
Let $j = n+1-i$. Then, as $i$ runs from $n-m+1$ to $n$, $j$ runs from 1 to $m$. Substituting gives
$$c_{n+1-j} = \frac{m-1+j}{m}, \quad j = 1, \ldots, m.$$
Hence,
$$\frac{c_{n+1-j}}{2} = \frac{m-1+j}{2m}, \quad j = 1, \ldots, m.$$
Therefore, the set of factors from the upper boundary is
$$\left\{ \frac{m}{2m}, \frac{m+1}{2m}, \ldots, \frac{2m-1}{2m} \right\},$$
which is exactly the same as the set from the lower boundary. Thus,
$$\prod_{i=n-m+1}^{n} \frac{c_i}{2} = \frac{1}{(2m)^m} \prod_{k=m}^{2m-1} k.$$
Combining the lower and upper boundary contributions, we obtain
$$\left( \frac{1}{(2m)^m} \prod_{k=m}^{2m-1} k \right)^2.$$
Since the interior indices contribute 1, the overall constant is
$$C(n, m) = \prod_{i=1}^{n} \frac{c_i}{2} = \left( \frac{1}{(2m)^m} \prod_{k=m}^{2m-1} k \right)^2.$$
To simplify the product, observe that
$$(2m-1)! = 1 \cdot 2 \cdots (m-1) \cdot m \cdot (m+1) \cdots (2m-1).$$
The first part is $(m-1)!$, while the second part is exactly $\prod_{k=m}^{2m-1} k$. Therefore,
$$\prod_{k=m}^{2m-1} k = \frac{(2m-1)!}{(m-1)!}.$$
Substituting back yields
$$C(n, m) = 2^{-2m} \left( \frac{(2m-1)!}{(m-1)!\, m^m} \right)^2. \qquad \square$$
These constants depend only on the pair $(n, m)$ and not on the observed data. Consequently, the boundary-corrected statistic $\tilde{T}_{mn}$ differs from the original $T_{mn}$ only by multiplication with $C(n, m)$. Since the same constant rescales both the observed test statistic and the bootstrap distribution under the null, the resulting p-values remain identical. More explicitly, under the null we have
$$\Pr\{ \tilde{T}_{mn}^{*} \ge \tilde{t}_{\text{obs}} \} = \Pr\{ C(n, m)\, T_{mn}^{*} \ge C(n, m)\, t_{\text{obs}} \} = \Pr\{ T_{mn}^{*} \ge t_{\text{obs}} \}.$$
Thus, the boundary correction reduces edge bias in the density estimate but introduces only a constant multiplicative factor in the test statistic. For hypothesis testing based on parametric bootstrap calibration, it has no impact on the p-values. This explains why Table 1, Table 2 and Table 3 report identical p-values for T m n and T ˜ m n across all scenarios. Accordingly, in the following example, we omit reporting T m n separately.
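As a quick numerical sanity check (ours, not from the paper), the following Python sketch verifies that the product of the correction factors over 2 matches the closed-form constant of Lemma 2; the helper names `c_factor` and `C_closed_form` are illustrative.

```python
import math

def c_factor(i, n, m):
    """Position-dependent correction factor c_i (1-based index i)."""
    if i <= m:
        return (m + i - 1) / m
    if i <= n - m:
        return 2.0
    return (n + m - i) / m

def C_closed_form(m):
    """Closed form of Lemma 2; equivalently 2^(-2m) * ((2m-1)!/((m-1)! m^m))^2.
    It depends on m only, since the interior factors all equal 1 (n > 2m)."""
    return (math.factorial(2 * m - 1) / (2 ** m * math.factorial(m - 1) * m ** m)) ** 2

n, m = 60, 5
product = 1.0
for i in range(1, n + 1):
    product *= c_factor(i, n, m) / 2.0
```

Running the check for, say, $n = 60$ and $m = 5$ confirms that the ratio of the two statistics is a data-independent constant, which is why bootstrap calibration yields identical p-values.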
Example 2. 
(Testing $N(\mu, \sigma^2)$ as the Null): In this simulation, we test the null hypothesis $H_0: f = f_0$, where $f_0$ is the density of the normal distribution $N(\mu, \sigma^2)$ with both mean $\mu$ and variance $\sigma^2$ unknown. The maximum likelihood estimators $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$ are used to fit the null model.
Samples are generated from the same set of alternative distributions as in Example 1, including normal distributions with different means and variances, symmetric mixtures, as well as heavy-tailed and skewed distributions. For each sample size n { 20 , 50 , 100 , 200 } , the test statistics T ˜ m n and T m n new are computed, and bootstrap p-values are estimated using B = 1000 replications.
Table 2 reports the resulting p-values, which provide insight into the performance of the tests when both location and scale are unknown. When the data are drawn from a normal distribution (either with the same or shifted mean), the tests maintain the nominal Type I error rate, as p-values remain large across all sample sizes. For alternatives with heavier tails, skewness, or mixtures, the p-values decrease rapidly as the sample size increases, demonstrating the ability of the proposed tests to detect deviations from the null model. The tests are especially powerful against the Cauchy and exponential alternatives, with p-values near zero even for small samples. When both location and scale are unknown, the tests show some loss of sensitivity to changes in variance alone, as seen in the N ( 0 , 4 ) case, where p-values only decrease substantially for larger samples. Overall, T ˜ m n and T m n new provide complementary perspectives, with the latter sometimes exhibiting greater sensitivity in challenging cases and with moderate sample sizes. These results highlight the robustness and power of the proposed methods for model assessment under a composite normal null hypothesis.
Example 3. 
(Real Data–Yarn Strength): We apply the proposed tests to the breaking strength values of 100 yarns, originally reported by Duncan [13]:
66 , 117 , 132 , 111 , 107 , 85 , 89 , 79 , 91 , 97 , 138 , 103 , 111 , 86 , 78 , 96 , 93 , 101 , 102 , 110 , 95 , 96 , 88 , 122 , 115 , 92 , 137 , 91 , 84 , 96 , 97 , 100 , 105 , 104 , 137 , 80 , 104 , 104 , 106 , 84 , 92 , 86 , 104 , 132 , 94 , 99 , 102 , 101 , 104 , 107 , 99 , 85 , 95 , 89 , 102 , 100 , 98 , 97 , 104 , 114 , 111 , 98 , 99 , 102 , 91 , 95 , 111 , 104 , 97 , 98 , 102 , 109 , 88 , 91 , 103 , 94 , 105 , 103 , 96 , 100 , 101 , 98 , 97 , 97 , 101 , 102 , 98 , 94 , 100 , 98 , 99 , 92 , 102 , 87 , 99 , 62 , 92 , 100 , 96 , 98
We assess whether these data are adequately modeled by a Laplace distribution,
$$f(x) = \frac{1}{2\sigma} \exp\!\left( -\frac{|x - \mu|}{\sigma} \right),$$
where $\mu$ and $\sigma > 0$ denote the location and scale parameters, respectively. The maximum likelihood estimates are $\hat{\mu} = 99$ and $\hat{\sigma} = 8.33$, consistent with the values reported by Alizadeh Noughabi [7]. Using these estimates, we compute the p-values for $\tilde{T}_{mn}$ and $T_{mn}^{\text{new}}$ via a bootstrap procedure with $B = 1000$ replications. The resulting p-values, 0.493 for $\tilde{T}_{mn}$ and 0.875 for $T_{mn}^{\text{new}}$, are both well above the conventional 0.05 significance level, providing no evidence against the Laplace model. These results support the adequacy of the Laplace distribution for the yarn strength data and further indicate that the Correa-based statistic tends to be more conservative, offering stronger support in this setting.
Example 4. (Real Data–River Flow): We assess the suitability of the three-parameter gamma distribution for modeling river flow measurements. The density is given by
$$f(x; \alpha, \beta, \gamma) = \frac{\beta^{\alpha} (x - \gamma)^{\alpha - 1} e^{-\beta (x - \gamma)}}{\Gamma(\alpha)}, \quad x > \gamma,$$
where $\alpha > 0$, $\beta > 0$, and $\gamma > 0$ are the shape, rate, and location parameters, respectively.
The dataset comprises river flow measurements (in millions of cubic feet per second) from the Susquehanna River at Harrisburg, Pennsylvania, recorded over the five-year period 1980–1984:
0.654 , 0.613 , 0.315 , 0.449 , 0.297 , 0.402 , 0.379 , 0.423 , 0.379 , 0.324 , 0.269 , 0.740 , 0.418 , 0.412 , 0.494 , 0.416 , 0.338 , 0.392 , 0.484 , 0.265
Following Al-Labadi and Evans [14], the maximum likelihood estimates for the parameters are $\hat{\alpha} = 1.7050$, $\hat{\beta} = 0.0955$, and $\hat{\gamma} = 0.2602$. Using these estimates, we apply the bootstrap algorithm with $B = 1000$ replications to compute the p-values for the $\tilde{T}_{mn}$ and $T_{mn}^{\text{new}}$ test statistics.
The resulting p-values are 0.810 for $\tilde{T}_{mn}$ and 0.916 for $T_{mn}^{\text{new}}$. Both values are well above conventional significance thresholds, indicating strong agreement between the observed data and the fitted three-parameter gamma model. These results confirm the suitability of the gamma distribution for describing the river flow data and show that the Correa-based statistic tends to be slightly more conservative. This conclusion is consistent with the Bayesian nonparametric test of Al-Labadi and Evans [14], which likewise does not spuriously reject the adequacy of the gamma model.
To further evaluate the effectiveness of the proposed tests in detecting model misspecification, we conduct a power analysis under several alternative distributions, as well as under the null hypothesis. For each scenario, we fix the significance level at α = 0.05 and generate 1000 independent samples for each sample size n { 20 , 50 , 100 , 200 } .
For each sample, the test statistics T ˜ m n and T m n new are computed, and the corresponding p-values are estimated using B = 1000 bootstrap replications from the null model N ( 0 , 1 ) , as described in Section 4. The empirical rejection rate is calculated as the proportion of samples for which the null hypothesis is rejected.
To assess Type I error control, we also report results under the null hypothesis H 0 , where samples are drawn from N ( 0 , 1 ) . The empirical rejection rates in this case should be close to the nominal level α , indicating the validity of the bootstrap calibration and the reliability of the testing procedure.
The alternative distributions considered highlight various types of departures from the null, including a mean shift ( N ( 1 , 1 ) ), heavy tails (Cauchy (0, 1)), and a symmetric distribution with different kurtosis (Logistic (0, 1)). This range of alternatives enables a comprehensive assessment of the sensitivity and robustness of the proposed methods.
The results, presented in Table 3, demonstrate that both tests maintain appropriate Type I error rates when $H_0$ is true. Moreover, the power increases rapidly with sample size and is highest when the true distribution deviates substantially from the null. Notably, the test based on Correa's local linear entropy estimator ($T_{mn}^{\text{new}}$) consistently outperforms the boundary-corrected m-spacing test ($\tilde{T}_{mn}$), particularly for small to moderate samples and challenging alternatives. These findings confirm the strong performance of the proposed methods, both in maintaining control of Type I error and in delivering high power against a range of alternatives.
Taken together, the simulation results and real-data examples demonstrate that both proposed tests provide reliable inference for model adequacy across a range of scenarios. The tests maintain proper Type I error control under the null, deliver substantial power under diverse alternatives, and do not spuriously reject in well-specified real-data settings. The new test based on Correa’s entropy estimator in particular offers notable advantages for moderate samples and challenging alternatives, while the boundary-corrected m-spacing approach retains strong and consistent performance. These findings illustrate the utility and flexibility of entropy-based density estimation methods for modern goodness-of-fit testing.

6. Conclusions

In this paper, we proposed two bootstrap-based test statistics for assessing the goodness-of-fit of fully specified parametric models. The first test is a boundary-corrected version of the empirical likelihood ratio statistic originally introduced by Vexler and Gurevich [3], which incorporates a position-dependent correction factor to improve density estimation near the boundaries. Although the correction modifies the construction of the statistic, in practice it yields results that are numerically identical up to a fixed multiplicative constant and therefore lead to the same p-values as the uncorrected version. The second test is based on Correa’s [10] local linear entropy estimator, which provides a flexible and accurate alternative by utilizing local linear regression to approximate the derivative of the quantile function.
We established the theoretical properties of the proposed statistics, demonstrating their consistency and showing that, under fixed alternatives, they converge to the Kullback–Leibler divergence. Since the asymptotic distributions of these statistics are analytically intractable, we developed a bootstrap algorithm for practical implementation.
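A generic parametric-bootstrap scheme of the kind used for such statistics can be sketched as follows; the `statistic` and `sampler` arguments are placeholders for any entropy-based statistic and fully specified null density, not the paper's exact algorithm:

```python
import numpy as np

def bootstrap_pvalue(x, statistic, sampler, B=1000):
    """Parametric bootstrap p-value for a goodness-of-fit statistic.

    statistic : callable mapping a sample to a scalar; large values
                indicate departure from the null (e.g., an estimated
                Kullback-Leibler divergence from f0).
    sampler   : callable n -> sample of size n drawn from the null f0.
    B         : number of bootstrap replicates.
    """
    x = np.asarray(x, dtype=float)
    t_obs = statistic(x)
    n = len(x)
    # Recompute the statistic on B samples drawn under the null
    t_boot = np.array([statistic(sampler(n)) for _ in range(B)])
    # Right-tail p-value with the standard +1 continuity adjustment
    return (1 + np.sum(t_boot >= t_obs)) / (B + 1)
```

For instance, with a fully specified null N(0, 1), `sampler` would simply draw standard normal samples, and no parameter estimation is required at any stage.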
Comprehensive simulation studies demonstrated that both test statistics perform well in terms of controlling Type I error and detecting model misspecification. In particular, the test based on Correa’s estimator exhibited superior power across a wide range of alternatives, especially in small to moderate sample sizes. Applications to real data further confirmed the utility and flexibility of the proposed methods.
Unlike composite likelihood ratio tests, which typically involve parameter estimation under both the null and alternative models, the proposed approach only requires parameter estimation under the null hypothesis (if at all). In the case of a fully specified null, parameter estimation is entirely avoided. This significantly simplifies implementation and enhances robustness, especially in settings where maximum likelihood estimation is computationally challenging or unreliable.
Future research directions include extending these tests to multivariate distributions, regression models, and right-censored data.

Author Contributions

All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Acknowledgments

The authors gratefully acknowledge the editor and the two reviewers for their valuable comments and suggestions, which have significantly improved the quality of this paper. We would also like to thank Karen Mabroukeh for proofreading the manuscript and enhancing its readability.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lehmann, E.L.; Romano, J.P. Testing Statistical Hypotheses, 4th ed.; Springer: New York, NY, USA, 2022.
  2. Casella, G.; Berger, R.L. Statistical Inference, 2nd ed.; Duxbury Press: Pacific Grove, CA, USA, 2002.
  3. Vexler, A.; Gurevich, G. Empirical likelihood ratios applied to goodness-of-fit tests based on sample entropy. Comput. Stat. Data Anal. 2010, 54, 531–545.
  4. Vasicek, O. A test for normality based on sample entropy. J. R. Stat. Soc. Ser. B (Methodol.) 1976, 38, 54–59.
  5. Alizadeh Noughabi, H. Empirical likelihood ratio-based goodness-of-fit test for the logistic distribution. J. Appl. Stat. 2015, 42, 1973–1983.
  6. Ning, W.; Ngunkeng, G. An empirical likelihood ratio based goodness-of-fit test for skew normality. Stat. Methods Appl. 2013, 22, 209–226.
  7. Alizadeh Noughabi, H. Empirical likelihood ratio-based goodness-of-fit test for the Laplace distribution. Commun. Math. Stat. 2016, 4, 459–471.
  8. Safavinejad, M.; Jomhoori, S.; Alizadeh Noughabi, H. A density-based empirical likelihood ratio goodness-of-fit test for the Rayleigh distribution and power comparison. J. Stat. Comput. Simul. 2015, 85, 3322–3334.
  9. Ebrahimi, N.; Pflughoeft, K.; Soofi, E.S. Two measures of sample entropy. Stat. Probab. Lett. 1994, 20, 225–234.
  10. Correa, J.C. A new estimator of entropy. Commun. Stat. Theory Methods 1995, 24, 2439–2449.
  11. Al-Labadi, L.; Chu, Z.; Xu, Y. Advancements in Rényi entropy and divergence estimation for model assessment. Comput. Stat. 2025, 40, 633–650.
  12. Grzegorzewski, P.; Wieczorkowski, R. Entropy-based goodness-of-fit test for exponentiality. Commun. Stat. Theory Methods 1999, 28, 1183–1202.
  13. Duncan, A.J. Quality Control and Industrial Statistics, 5th ed.; Irwin: Homewood, IL, USA, 1974.
  14. Al-Labadi, L.; Evans, M. Goodness-of-fit for the three-parameter gamma model. J. Appl. Stat. 2018, 45, 317–334.
Table 1. Bootstrap p-values (p_boot) for various true distributions under H0: f = f0, where f0 is the density of N(μ, 1), using T̃_mn, T_mn^new, and the original (uncorrected) statistic T_mn^orig.

Sample Size  True Distribution           T̃_mn    T_mn^new  T_mn^orig
20           N(0, 1)                     0.995   0.949     0.995
             N(5, 1)                     0.992   0.927     0.992
             N(0, 4)                     0.002   0.000     0.002
             0.5N(−1, 1) + 0.5N(1, 1)    0.006   0.010     0.006
             t_3                         0.663   0.571     0.663
             Logistic(0, 1)              0.131   0.076     0.131
             Cauchy(0, 1)                0.000   0.000     0.000
             Exponential(1)              0.034   0.031     0.034
50           N(0, 1)                     0.757   0.745     0.757
             N(5, 1)                     0.776   0.751     0.776
             N(0, 4)                     0.000   0.000     0.000
             0.5N(−1, 1) + 0.5N(1, 1)    0.005   0.000     0.005
             t_3                         0.002   0.000     0.002
             Logistic(0, 1)              0.000   0.000     0.000
             Cauchy(0, 1)                0.000   0.000     0.000
             Exponential(1)              0.000   0.000     0.000
100          N(0, 1)                     0.917   0.958     0.917
             N(5, 1)                     0.928   0.965     0.928
             N(0, 4)                     0.000   0.000     0.000
             0.5N(−1, 1) + 0.5N(1, 1)    0.000   0.000     0.000
             t_3                         0.000   0.000     0.000
             Logistic(0, 1)              0.000   0.000     0.000
             Cauchy(0, 1)                0.000   0.000     0.000
             Exponential(1)              0.000   0.000     0.000
200          N(0, 1)                     0.737   0.530     0.737
             N(5, 1)                     0.740   0.541     0.740
             N(0, 4)                     0.000   0.000     0.000
             0.5N(−1, 1) + 0.5N(1, 1)    0.000   0.000     0.000
             t_3                         0.000   0.000     0.000
             Logistic(0, 1)              0.000   0.000     0.000
             Cauchy(0, 1)                0.000   0.000     0.000
             Exponential(1)              0.000   0.000     0.000
Table 2. Bootstrap p-values (p_boot) for various true distributions under H0: f = f0, where f0 is the density of the normal distribution N(μ, σ²), using T̃_mn and T_mn^new.

Sample Size  True Distribution           T̃_mn    T_mn^new
20           N(0, 1)                     0.999   0.986
             N(5, 1)                     0.998   0.978
             N(0, 4)                     0.524   0.411
             0.5N(−1, 1) + 0.5N(1, 1)    0.605   0.795
             t_3                         0.983   0.992
             Logistic(0, 1)              0.936   0.888
             Cauchy(0, 1)                0.000   0.000
             Exponential(1)              0.047   0.039
50           N(0, 1)                     0.767   0.745
             N(5, 1)                     0.774   0.773
             N(0, 4)                     0.765   0.744
             0.5N(−1, 1) + 0.5N(1, 1)    0.424   0.676
             t_3                         0.891   0.736
             Logistic(0, 1)              0.899   0.641
             Cauchy(0, 1)                0.000   0.000
             Exponential(1)              0.000   0.000
100          N(0, 1)                     0.919   0.961
             N(5, 1)                     0.928   0.959
             N(0, 4)                     0.930   0.965
             0.5N(−1, 1) + 0.5N(1, 1)    0.000   0.000
             t_3                         0.000   0.000
             Logistic(0, 1)              0.722   0.481
             Cauchy(0, 1)                0.000   0.000
             Exponential(1)              0.000   0.000
200          N(0, 1)                     0.756   0.543
             N(5, 1)                     0.709   0.533
             N(0, 4)                     0.733   0.518
             0.5N(−1, 1) + 0.5N(1, 1)    0.030   0.001
             t_3                         0.000   0.000
             Logistic(0, 1)              0.939   0.952
             Cauchy(0, 1)                0.000   0.000
             Exponential(1)              0.000   0.000
Table 3. Empirical rejection rates at significance level α = 0.05 based on 1000 simulations.

Alternative Distribution   Sample Size  T̃_mn    T_mn^new
Normal(0, 1) (Null)        20           0.041   0.044
                           50           0.049   0.041
                           100          0.048   0.045
                           200          0.056   0.049
Normal(1, 1)               20           0.942   0.953
                           50           1.000   1.000
                           100          1.000   1.000
                           200          1.000   1.000
Cauchy(0, 1)               20           0.980   0.983
                           50           1.000   1.000
                           100          1.000   1.000
                           200          1.000   1.000
Logistic(0, 1)             20           0.830   0.846
                           50           0.977   0.991
                           100          1.000   1.000
                           200          1.000   1.000

Share and Cite

Al-Labadi, L.; Yu, R.; Bao, K. Goodness-of-Fit Tests via Entropy-Based Density Estimation Techniques. Stats 2025, 8, 97. https://doi.org/10.3390/stats8040097