Abstract
Goodness-of-fit testing remains a fundamental problem in statistical inference with broad practical importance. In this paper, we introduce two new goodness-of-fit tests grounded in entropy-based density estimation techniques. The first is a boundary-corrected empirical likelihood ratio test, which refines the classic approach by addressing bias near the support boundaries; in practice, however, it yields p-values identical to those of the uncorrected version. The second is a novel test built on Correa's local linear entropy estimator, which leverages quantile regression to improve the accuracy of the density estimate. We establish the theoretical properties of both test statistics and demonstrate their practical effectiveness through extensive simulation studies and real-data applications. The results show that the proposed methods deliver strong power and flexibility in assessing model adequacy across a wide range of settings.
1. Introduction
Goodness-of-fit testing is a fundamental problem in statistical inference with widespread applications in model selection, validation, and diagnostics. In the parametric framework, the classical approach is often based on the likelihood ratio test (LRT), as motivated by the Neyman–Pearson lemma. While the lemma guarantees the optimality of the LRT for testing simple hypotheses, the LRT is also widely applied to composite hypotheses due to its favorable asymptotic properties [1].
Let $X_1, \ldots, X_n$ be independent and identically distributed (i.i.d.) observations from a continuous distribution $F$ with density $f$. We consider the hypothesis-testing problem
$$H_0: f = f_0(\cdot; \theta) \quad \text{versus} \quad H_1: f \neq f_0(\cdot; \theta),$$
where under the null hypothesis $f_0(\cdot; \theta)$ is known up to a parameter vector $\theta$, and under the alternative, $f$ is completely unspecified. The classical likelihood ratio test statistic is given by
$$\Lambda_n = \frac{\prod_{i=1}^{n} f(X_i)}{\sup_{\theta} \prod_{i=1}^{n} f_0(X_i; \theta)}.$$
We reject $H_0$ in favor of $H_1$ for large values of $\Lambda_n$, or equivalently, when the log-likelihood ratio $\log \Lambda_n$
exceeds a critical threshold. Under standard regularity conditions and assuming $H_0$ holds, the distribution of $2 \log \Lambda_n$ converges asymptotically to a chi-squared distribution with degrees of freedom equal to the difference in dimensionality between the null and alternative models [2].
However, in many practical situations—particularly in nonparametric settings—the alternative density is unknown, rendering direct application of the LRT infeasible. To address this, nonparametric density-based methods have been developed as flexible alternatives. Among these, empirical likelihood ratio tests and entropy-based statistics have attracted considerable attention.
Vexler and Gurevich [3] introduced an empirical likelihood ratio test in which the unknown density is estimated nonparametrically using Vasicek's [4] entropy estimator. They defined the empirical likelihood ratio using
$$\prod_{i=1}^{n} \frac{\tilde{f}_m\big(X_{(i)}\big)}{f_0\big(X_{(i)}; \hat{\theta}\big)},$$
where
$$\tilde{f}_m\big(X_{(i)}\big) = \frac{2m}{n\,\big(X_{(i+m)} - X_{(i-m)}\big)}$$
approximates the slope of the empirical cumulative distribution function $F_n$ between $X_{(i-m)}$ and $X_{(i+m)}$. Here, $\hat{\theta}$ is the maximum likelihood estimate under $H_0$, and $X_{(1)} \le \cdots \le X_{(n)}$ are the order statistics. Boundary corrections are made such that $X_{(i-m)} = X_{(1)}$ if $i \le m$, and $X_{(i+m)} = X_{(n)}$ if $i \ge n - m$. The integer $m$, called the window size, is a positive integer less than $n/2$. Vexler and Gurevich [3] proposed the test statistic
$$V_{mn} = \prod_{i=1}^{n} \frac{\tilde{f}_m\big(X_{(i)}\big)}{f_0\big(X_{(i)}; \hat{\theta}\big)}. \qquad (2)$$
Since $V_{mn}$ depends on $m$, they further suggested the minimization
$$V_n = \min_{m} V_{mn}, \qquad (3)$$
where the minimum is taken over a prescribed range of admissible window sizes [3].
This general methodology has since been extended to various settings, including logistic distributions [5], skew normality [6], Laplace distributions [7], and Rayleigh distributions [8]. Although the method of Vexler and Gurevich [3] is generally effective, it is known to suffer from bias near the endpoints of the support, a limitation also discussed by Ebrahimi et al. [9] in the context of entropy estimation.
To address these shortcomings, we propose two test statistics. The first is a corrected version of the Vexler–Gurevich statistic (2) and (3), incorporating a position-dependent correction factor to properly account for boundary effects. The second is a new test statistic based on Correa’s [10] local linear entropy estimator, which improves density estimation by locally interpolating the quantile function. Together, these methods aim to enhance the performance and reliability of goodness-of-fit testing.
The remainder of the paper is organized as follows. Section 2 introduces the two proposed test statistics. Section 3 presents their theoretical properties. Section 4 describes the computational implementation using a bootstrap procedure. Section 5 provides simulation studies and real-data applications to evaluate the tests’ performance. Finally, Section 6 concludes and outlines potential future research directions.
2. Two Approaches for Goodness-of-Fit Testing
In this section, we introduce two test statistics for assessing model adequacy. Both are constructed as likelihood-type ratios, where the numerator provides a nonparametric estimate of the true density, obtained either through the boundary-corrected m-spacing method or through Correa’s local linear entropy estimator. The denominator corresponds to the fitted parametric model under the null hypothesis. Values of the statistic close to one indicate good agreement between the empirical and theoretical densities, whereas large values provide evidence against the adequacy of the hypothesized model.
The first test is a boundary-corrected version of the empirical likelihood ratio statistic of Vexler and Gurevich [3], designed to mitigate bias near the support boundaries. The second test is based on Correa’s [10] local linear entropy estimator, which improves density estimation by locally approximating the derivative of the quantile function.
2.1. Corrected Test Statistic of Vexler and Gurevich (2010)
As noted in Section 1, the original test statistic proposed by Vexler and Gurevich is subject to boundary bias, particularly when the index $i$ lies near the extremes of the support. This bias arises because the slope of the empirical distribution function between $X_{(i-m)}$ and $X_{(i+m)}$ is not accurately estimated at the boundaries. To address this issue, we propose a boundary-corrected estimator, following Ebrahimi et al. [9] and Al-Labadi et al. [11]. The estimator is defined as
$$\hat{f}_m\big(X_{(i)}\big) = \frac{c_i\, m}{n\,\big(X_{(i+m)} - X_{(i-m)}\big)}, \qquad i = 1, \ldots, n,$$
where $X_{(i-m)} = X_{(1)}$ if $i \le m$, and $X_{(i+m)} = X_{(n)}$ if $i \ge n - m$. The correction factor is defined as
$$c_i = \begin{cases} 1 + \dfrac{i-1}{m}, & 1 \le i \le m, \\[4pt] 2, & m + 1 \le i \le n - m, \\[4pt] 1 + \dfrac{n-i}{m}, & n - m + 1 \le i \le n. \end{cases} \qquad (4)$$
This position-dependent factor replaces the constant value 2 used in the original formulation and leads to improved accuracy in density estimation near the boundaries. The resulting test statistic takes the form given by
$$\tilde{V}_{mn} = \prod_{i=1}^{n} \frac{\hat{f}_m\big(X_{(i)}\big)}{f_0\big(X_{(i)}; \hat{\theta}\big)}, \qquad (5)$$
where $\hat{\theta}$ denotes the maximum likelihood estimator under the null model $H_0$. The selection method for $m$ is described in (8).
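To make the construction concrete, the following is a minimal R sketch of the boundary-corrected density estimate and the statistic (5). The function names are illustrative, the window indices are clipped at the sample boundaries as described above, $m < n/2$ is assumed, and no numerical safeguards are included (in practice one would accumulate the product on the log scale).

```r
# Boundary-corrected m-spacing density estimate evaluated at the order
# statistics (sketch).  The window indices i - m and i + m are clipped at the
# sample boundaries, and c_i is the position-dependent correction factor (4).
corrected_spacing_density <- function(x, m) {
  n  <- length(x)
  xs <- sort(x)
  ci <- rep(2, n)
  ci[1:m]           <- 1 + ((1:m) - 1) / m
  ci[(n - m + 1):n] <- 1 + (n - ((n - m + 1):n)) / m
  lo <- pmax((1:n) - m, 1)
  hi <- pmin((1:n) + m, n)
  ci * m / (n * (xs[hi] - xs[lo]))
}

# Likelihood-type ratio (5): nonparametric estimate over a fitted null density
# f0 (any vectorized density, e.g. function(t) dnorm(t, mu.hat, sd.hat)).
corrected_spacing_statistic <- function(x, m, f0) {
  xs <- sort(x)
  prod(corrected_spacing_density(x, m) / f0(xs))
}
```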
2.2. Correa’s Entropy-Based Density Estimator
Recognizing that the test statistic in (5) is constructed as a likelihood-type ratio whose numerator is a nonparametric estimate of the true density, we propose a new test statistic based on the entropy estimator of Correa [10], which relies on estimating the derivative of the quantile function via local linear regression. Unlike spacing-based methods, which difference the order statistics directly, Correa's estimator fits a local linear model to the quantile function, which mitigates boundary issues, smooths noise adaptively, and is particularly advantageous for multimodal or skewed densities. Let $X$ be a continuous random variable with cumulative distribution function $F$ and density $f$, and define the quantile function as $Q(p) = F^{-1}(p)$ for $p \in (0, 1)$. The Shannon entropy can be expressed in terms of the quantile function as
$$H(f) = -\int_{-\infty}^{\infty} f(x) \log f(x)\, dx = \int_{0}^{1} \log Q'(p)\, dp,$$
so that entropy estimation reduces to estimating the derivative $Q'(p)$. Correa's method estimates this derivative locally using linear regression.
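For completeness, the identity above follows from the standard change of variables $x = Q(p)$: since $F(Q(p)) = p$, differentiating gives $f\big(Q(p)\big)\,Q'(p) = 1$, so that
$$H(f) = -\int_{-\infty}^{\infty} f(x)\log f(x)\,dx = -\int_{0}^{1} \log f\big(Q(p)\big)\,dp = \int_{0}^{1} \log Q'(p)\,dp.$$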
Given a sample $X_1, \ldots, X_n$, let $X_{(1)} \le \cdots \le X_{(n)}$ denote the order statistics and define the empirical quantile levels $p_i = i/n$. Since $X_{(i)}$ approximates $Q(p_i)$, a local linear regression is performed in a neighborhood of each $p_i$:
$$Q(p) \approx a_i + b_i\,(p - p_i), \qquad p \in \{p_{i-m}, \ldots, p_{i+m}\},$$
where the slope is estimated via least squares as
$$\hat{b}_i = \frac{\sum_{j=i-m}^{i+m} \big(p_j - \bar{p}_i\big)\big(X_{(j)} - \bar{X}_{(i)}\big)}{\sum_{j=i-m}^{i+m} \big(p_j - \bar{p}_i\big)^{2}},$$
with $\bar{p}_i$ and $\bar{X}_{(i)}$ denoting local means. According to Correa [10], an estimator for $H(f)$ is then
$$\hat{H}_C = \frac{1}{n} \sum_{i=1}^{n} \log \hat{b}_i.$$
Since $f\big(Q(p)\big) = 1/Q'(p)$, the corresponding density estimate is defined as
$$\hat{f}_C\big(X_{(i)}\big) = \frac{1}{\hat{b}_i},$$
and the resulting test statistic is the likelihood-type ratio
$$T_n = \prod_{i=1}^{n} \frac{\hat{f}_C\big(X_{(i)}\big)}{f_0\big(X_{(i)}; \hat{\theta}\big)}. \qquad (7)$$
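A minimal R sketch of this construction is given below. It follows the quantile-regression formulation described above (regressing the order statistics on the quantile levels and inverting the fitted slope); the truncation of the window at the boundaries and the function names are illustrative assumptions rather than the exact implementation of [10].

```r
# Local linear estimate of the quantile derivative Q'(p_i) at each order
# statistic, with the implied density and entropy estimates (sketch).
correa_density <- function(x, m) {
  n  <- length(x)
  xs <- sort(x)
  p  <- (1:n) / n
  slope <- numeric(n)
  for (i in 1:n) {
    j  <- max(1, i - m):min(n, i + m)      # window, truncated at the boundaries
    pb <- mean(p[j]); xb <- mean(xs[j])    # local means
    slope[i] <- sum((p[j] - pb) * (xs[j] - xb)) / sum((p[j] - pb)^2)
  }
  list(density = 1 / slope,         # f-hat(X_(i)) = 1 / Q'-hat(p_i)
       entropy = mean(log(slope)))  # H-hat = (1/n) sum log Q'-hat(p_i)
}

# Correa-based likelihood-type ratio (7) against a fitted null density f0.
correa_statistic <- function(x, m, f0) {
  xs <- sort(x)
  prod(correa_density(x, m)$density / f0(xs))
}
```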
3. Theoretical Results
In this section, we investigate the asymptotic properties of the proposed test statistic introduced in the previous section. In particular, we show that the normalized logarithm of the test statistic (7) converges to zero under the null hypothesis and to a strictly positive limit under fixed alternatives. These results hold under mild regularity conditions and do not require asymptotic normality or parametric assumptions. Instead, they follow from standard consistency properties of nonparametric density estimators and laws of large numbers.
Let $X_1, \ldots, X_n$ be i.i.d. observations from a continuous distribution with density $f$, and let $f_0$ denote the fully specified null density. The test statistic $T_n$ is as defined in (7). We impose the following regularity conditions:
- (C1) The density $f$ is continuous and strictly positive on its support.
- (C2) The null density $f_0$ is continuous and strictly positive on the support of $f$, and $\log\{f(X)/f_0(X)\}$ is integrable under $f$.
It is well known that under (C1), the local linear estimator $\hat{f}_C(X_{(i)})$ is consistent in probability for $f(X_{(i)})$ at each fixed $i$ [10]; that is, $\hat{f}_C(X_{(i)})/f(X_{(i)}) \xrightarrow{\;p\;} 1$ as $n \to \infty$.
Lemma 1.
Under conditions (C1)–(C2), the following hold:
- (i) Under $H_0$, we have
$$\frac{1}{n} \log T_n \xrightarrow{\;p\;} 0.$$
- (ii) Under $H_1$, we have
$$\frac{1}{n} \log T_n \xrightarrow{\;p\;} \mathbb{E}_f\!\left[ \log \frac{f(X)}{f_0(X)} \right] > 0,$$
where the expectation is taken with respect to the true density $f$.
Proof.
We express the log of the test statistic as
$$\log T_n = \sum_{i=1}^{n} \log \frac{\hat{f}_C\big(X_{(i)}\big)}{f_0\big(X_{(i)}\big)}.$$
Thus, the normalized log statistic becomes
$$\frac{1}{n} \log T_n = \frac{1}{n} \sum_{i=1}^{n} \log \frac{\hat{f}_C\big(X_{(i)}\big)}{f_0\big(X_{(i)}\big)}.$$
Part (i): Under the null hypothesis $H_0$, $f = f_0$, and by the consistency of the estimator, $\hat{f}_C(X_{(i)})/f_0(X_{(i)}) \xrightarrow{\;p\;} 1$ for each $i$. By the continuous mapping theorem, $\log\{\hat{f}_C(X_{(i)})/f_0(X_{(i)})\} \xrightarrow{\;p\;} 0$, so each summand converges in probability to zero. To extend this pointwise convergence to the average, it is necessary that the sequence
$$\Big\{ \log \tfrac{\hat{f}_C(X_{(i)})}{f_0(X_{(i)})} : i = 1, \ldots, n \Big\}$$
is uniformly integrable. This condition guarantees that convergence in probability of the summands, together with integrability of $\log(f/f_0)$ by (C2), implies convergence of the average. Hence, by uniform integrability,
$$\frac{1}{n} \log T_n \xrightarrow{\;p\;} 0.$$
Part (ii): Under the alternative $H_1$, $\hat{f}_C(X_{(i)})/f(X_{(i)}) \xrightarrow{\;p\;} 1$, so
$$\log \frac{\hat{f}_C\big(X_{(i)}\big)}{f_0\big(X_{(i)}\big)} = \log \frac{f\big(X_{(i)}\big)}{f_0\big(X_{(i)}\big)} + o_p(1).$$
The weak law of large numbers then gives
$$\frac{1}{n} \sum_{i=1}^{n} \log \frac{f\big(X_{(i)}\big)}{f_0\big(X_{(i)}\big)} \xrightarrow{\;p\;} \mathbb{E}_f\!\left[ \log \frac{f(X)}{f_0(X)} \right].$$
The expectation on the right is the Kullback–Leibler divergence $\mathrm{KL}(f \,\|\, f_0)$, which is strictly positive under $H_1$. This completes the proof. □
This result confirms that the proposed test statistic is consistent and does not require knowledge of the null distribution’s asymptotic form or any parametric estimation procedure under the alternative.
Remark 1.
- 1. The requirement of uniform integrability here is not restrictive. Since $\log(f/f_0)$ is integrable by (C2) and $\hat{f}_C$ is a consistent estimator of $f$, the log-ratio $\log\{\hat{f}_C(X_{(i)})/f_0(X_{(i)})\}$ inherits uniform integrability under the null. Thus, the additional condition is automatically satisfied in this setting.
- 2. The same arguments and consistency results hold for the boundary-corrected m-spacing test statistic defined in Equation (5). The boundary correction, as in Ebrahimi et al. [9], ensures that the m-spacing estimator is consistent in probability for $f$ at each sample point, even near the endpoints of the support. Thus, the asymptotic properties established above for $T_n$ also apply to $\tilde{V}_{mn}$ under the same regularity conditions.
Unlike classical likelihood ratio tests, which rely on parameter estimation under both the null and alternative models, the proposed statistic depends only on parameter estimation under the null hypothesis (if required); in fully specified null cases, it avoids parameter estimation entirely. This simplifies implementation, enhances robustness, and broadens applicability, especially in settings where maximum likelihood estimation is unstable or computationally intensive.
4. Computational Algorithm
To implement the proposed test statistics $\tilde{V}_{mn}$ and $T_n$, it is necessary to select an appropriate window size parameter $m$. A commonly used rule, suggested by Grzegorzewski and Wieczorkowski [12], is
$$m = \left\lfloor \sqrt{n} + 0.5 \right\rfloor, \qquad (8)$$
where $\lfloor \cdot \rfloor$ denotes the floor function. This choice effectively balances bias and variance in the entropy-based density estimators and is widely adopted in practice.
After computing the test statistic, the next step is to assess whether its value provides sufficient evidence to reject the null hypothesis. Values of the test statistic close to one indicate agreement between the empirical and theoretical densities, whereas large values suggest model misspecification and provide evidence against the null. Since the asymptotic null distribution of $T_n$ is analytically intractable, we employ a bootstrap procedure to approximate its null distribution and obtain the corresponding p-value.
The following algorithm outlines the steps for conducting the bootstrap-based goodness-of-fit test:
1. Given an observed sample $X_1, \ldots, X_n$, compute the test statistic $T_n$ as defined in Equation (7).
2. Fit the null model $f_0(\cdot; \hat{\theta})$ to the data, where $\hat{\theta}$ denotes the maximum likelihood estimator under $H_0$.
3. Generate $B$ bootstrap samples $X_1^{*(b)}, \ldots, X_n^{*(b)}$, for $b = 1, \ldots, B$, by sampling from the fitted null model $f_0(\cdot; \hat{\theta})$.
4. For each bootstrap sample, compute the corresponding test statistic $T_n^{*(b)}$.
5. Estimate the bootstrap p-value as
$$\hat{p} = \frac{1}{B} \sum_{b=1}^{B} \mathbb{1}\big\{ T_n^{*(b)} \ge T_n \big\}, \qquad (9)$$
where $\mathbb{1}\{\cdot\}$ denotes the indicator function.
6. Reject the null hypothesis at significance level $\alpha$ if $\hat{p} \le \alpha$.
This resampling approach avoids reliance on the asymptotic distribution of the test statistic, making the procedure suitable for small to moderate sample sizes and for complex models. An identical bootstrap procedure is applied to compute the p-value for the statistic defined in Equation (5).
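A schematic R implementation of this procedure is sketched below; stat_fn, fit_fn, and rgen_fn are placeholder names for the test statistic, the null fitting step, and the null sampler, and the commented example assumes a normal null with known variance purely for illustration.

```r
# Parametric bootstrap p-value for a generic goodness-of-fit statistic (sketch).
#   stat_fn(x):        computes the test statistic for a sample x (refitting any
#                      unknown null parameters internally if the null is composite)
#   fit_fn(x):         returns the fitted null parameters used to simulate data
#   rgen_fn(n, theta): draws n observations from the fitted null model
bootstrap_pvalue <- function(x, stat_fn, fit_fn, rgen_fn, B = 1000) {
  n      <- length(x)
  t_obs  <- stat_fn(x)
  theta  <- fit_fn(x)
  t_boot <- replicate(B, stat_fn(rgen_fn(n, theta)))
  mean(t_boot >= t_obs)   # proportion of bootstrap statistics at least as large
}

# Illustration for a N(mu, 1) null with mu estimated by the sample mean and
# the window size chosen as floor(sqrt(n) + 0.5):
# x <- rnorm(50)
# m <- floor(sqrt(length(x)) + 0.5)
# bootstrap_pvalue(x,
#                  stat_fn = function(z) correa_statistic(z, m, function(t) dnorm(t, mean(z), 1)),
#                  fit_fn  = function(z) mean(z),
#                  rgen_fn = function(k, th) rnorm(k, th, 1),
#                  B = 500)
```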
Note that, although the test statistics $\tilde{V}_{mn}$ and $T_n$ are defined in multiplicative form, their theoretical properties are most naturally expressed on the log scale. When the null hypothesis holds, the entropy-based density estimators consistently estimate the parametric density $f_0(\cdot; \theta)$. By Lemma 1(i), the normalized log-statistics
$$\frac{1}{n} \log \tilde{V}_{mn} \quad \text{and} \quad \frac{1}{n} \log T_n$$
converge in probability to zero. Consequently, the observed values of these normalized log-statistics fluctuate around zero in finite samples. Since the bootstrap samples are generated from $f_0(\cdot; \hat{\theta})$, their corresponding statistics also concentrate near zero, ensuring that the bootstrap distribution provides a valid approximation to the null distribution. Thus, the bootstrap p-values are approximately uniform under $H_0$, thereby controlling the Type I error.
In contrast, when $H_0$ is false, Lemma 1(ii) shows that
$$\frac{1}{n} \log T_n \xrightarrow{\;p\;} \mathrm{KL}\big(f \,\|\, f_0(\cdot; \theta^{*})\big) > 0,$$
where $\theta^{*}$ minimizes the Kullback–Leibler divergence between $f$ and $f_0(\cdot; \theta)$. Thus, the observed statistics become much larger than their bootstrap replicates, which remain centered near zero. This separation forces the bootstrap p-values to converge to zero as $n \to \infty$, thereby guaranteeing the consistency and power of the proposed tests.
5. Simulation Study
In this section, we evaluate the performance of the proposed test statistics through both simulation studies and real data applications. Due to the complexity and nonparametric nature of the estimators, the exact sampling distributions under the null hypothesis are analytically intractable. Therefore, we employ the bootstrap procedure described in Section 4 to approximate the null distribution and compute the corresponding p-values.
For each scenario in Examples 1 and 2, samples of various sizes $n$ are generated from the specified true distributions, as detailed in Table 1 and Table 2. The test statistics $V_{mn}$, $\tilde{V}_{mn}$, and $T_n$, as defined in Equations (2), (5), and (7), respectively, are computed for each sample. Corresponding p-values are then estimated using bootstrap replications from the fitted null model. To ensure reproducibility, the random seed is set in R via set.seed(2025). The R code implementing the proposed methods is available upon request from the corresponding author. In what follows, we denote the bootstrap p-values by $\hat{p}$, as defined in (9).
Table 1.
Bootstrap p-values ($\hat{p}$) for various true distributions under $H_0$, where $f_0$ is the density of $N(\mu, 1)$, for several sample sizes $n$ and $B$ bootstrap replications.
Table 2.
Bootstrap p-values ($\hat{p}$) for various true distributions under $H_0$, where $f_0$ is the density of the normal distribution $N(\mu, \sigma^2)$, for several sample sizes $n$ and $B$ bootstrap replications.
Example 1.
(Testing $N(\mu, 1)$ as the Null): We begin by testing the null hypothesis $H_0: f = f_0$, where $f_0$ is the density of $N(\mu, 1)$ with $\mu$ unknown and estimated by the sample mean $\bar{X}$ for each sample.
Samples are generated from various true distributions, as listed in Table 1, to evaluate the sensitivity of the proposed test statistics to departures from normality. These alternatives include normal distributions with different means and variances, a symmetric mixture, as well as heavy-tailed and skewed distributions.
Table 1 displays the resulting p-values. When the data are truly normal (whether matching the null mean or with a shifted mean), the tests do not reject the null hypothesis and p-values are large for all sample sizes, as expected. As the alternative distributions deviate further from normality (e.g., increased variance or heavier tails), the tests become more sensitive. For alternatives closer to the null, the null hypothesis is not rejected for small samples but is rejected for larger $n$, reflecting the increase in power. For non-normal alternatives like the Cauchy and exponential distributions, the tests exhibit high power, yielding near-zero p-values even for moderate sample sizes.
Notably, $V_{mn}$ and $\tilde{V}_{mn}$ yield identical results in our simulations (see Table 1), while $T_n$ can be marginally more sensitive in some cases, especially for moderate sample sizes or challenging alternatives. Overall, the results demonstrate that the proposed methods maintain the correct Type I error rate under the null and reliably detect departures from normality as the sample size increases.
Although $\tilde{V}_{mn}$ modifies the construction of the statistic by introducing a boundary correction, it is in fact equivalent to the original statistic up to a constant factor. To see the equivalence between $V_{mn}$ and $\tilde{V}_{mn}$, we fix the window size $m$. By (2) and (5), we obtain
$$\frac{\tilde{V}_{mn}}{V_{mn}} = \prod_{i=1}^{n} \frac{c_i}{2}.$$
The following lemma provides an explicit form of this product.
Lemma 2.
For each $n$ and $m$, we have
$$\tilde{V}_{mn} = C_{n,m}\, V_{mn},$$
where
$$C_{n,m} = \prod_{i=1}^{n} \frac{c_i}{2} = \left( \frac{(2m-1)!}{(m-1)!\,(2m)^{m}} \right)^{2}.$$
Proof.
From the definition of $c_i$ in (4), we can distinguish between interior indices and boundary indices.
First, consider the interior indices, i.e., $m + 1 \le i \le n - m$. For these indices we have $c_i = 2$. Hence, each term contributes
$$\frac{c_i}{2} = 1.$$
Since there are $n - 2m$ such indices, their total contribution to the product is simply 1.
Next, consider the lower boundary indices, i.e., $1 \le i \le m$. For these indices we have
$$c_i = 1 + \frac{i-1}{m} = \frac{m + i - 1}{m},$$
so that
$$\frac{c_i}{2} = \frac{m + i - 1}{2m}.$$
Reindex the product by letting $k = m + i - 1$. Then, as $i$ runs from 1 to $m$, $k$ runs from $m$ to $2m - 1$. Thus,
$$\prod_{i=1}^{m} \frac{c_i}{2} = \prod_{k=m}^{2m-1} \frac{k}{2m}.$$
Now, consider the upper boundary indices, i.e., $n - m + 1 \le i \le n$. For these indices we have
$$c_i = 1 + \frac{n-i}{m} = \frac{m + n - i}{m}.$$
Let $j = i - (n - m)$. Then, as $i$ runs from $n - m + 1$ to $n$, $j$ runs from 1 to $m$. Substituting gives
$$\frac{c_i}{2} = \frac{2m - j}{2m}.$$
Hence,
$$\prod_{i=n-m+1}^{n} \frac{c_i}{2} = \prod_{j=1}^{m} \frac{2m - j}{2m} = \prod_{k=m}^{2m-1} \frac{k}{2m}.$$
Therefore, the set of factors from the upper boundary is
$$\left\{ \frac{k}{2m} : k = m, \ldots, 2m - 1 \right\},$$
which is exactly the same as the set from the lower boundary. Thus,
$$\prod_{i=n-m+1}^{n} \frac{c_i}{2} = \prod_{i=1}^{m} \frac{c_i}{2}.$$
Combining the lower and upper boundary contributions, we obtain
$$\prod_{i=1}^{m} \frac{c_i}{2} \prod_{i=n-m+1}^{n} \frac{c_i}{2} = \left( \prod_{k=m}^{2m-1} \frac{k}{2m} \right)^{2}.$$
Since the interior indices contribute 1, the overall constant is
$$C_{n,m} = \left( \prod_{k=m}^{2m-1} \frac{k}{2m} \right)^{2}.$$
To simplify the product, observe that
$$\prod_{k=m}^{2m-1} \frac{k}{2m} = \left( \prod_{k=m}^{2m-1} k \right) \cdot \frac{1}{(2m)^{m}}.$$
The first part is $\prod_{k=m}^{2m-1} k = \frac{(2m-1)!}{(m-1)!}$, while the second part is exactly $(2m)^{-m}$. Therefore,
$$\prod_{k=m}^{2m-1} \frac{k}{2m} = \frac{(2m-1)!}{(m-1)!\,(2m)^{m}}.$$
Substituting back yields
$$C_{n,m} = \left( \frac{(2m-1)!}{(m-1)!\,(2m)^{m}} \right)^{2}.$$
□
These constants depend only on the pair $(n, m)$ and not on the observed data. Consequently, the boundary-corrected statistic differs from the original only by multiplication with $C_{n,m}$. Since the same constant rescales both the observed test statistic and the bootstrap distribution under the null, the resulting p-values remain identical. More explicitly, under the null we have
$$\mathbb{P}\big( \tilde{V}_{mn}^{*} \ge \tilde{V}_{mn} \big) = \mathbb{P}\big( C_{n,m} V_{mn}^{*} \ge C_{n,m} V_{mn} \big) = \mathbb{P}\big( V_{mn}^{*} \ge V_{mn} \big),$$
where the asterisk denotes a bootstrap replicate.
Thus, the boundary correction reduces edge bias in the density estimate but introduces only a constant multiplicative factor in the test statistic. For hypothesis testing based on parametric bootstrap calibration, it has no impact on the p-values. This explains why Table 1, Table 2 and Table 3 report identical p-values for $V_{mn}$ and $\tilde{V}_{mn}$ across all scenarios. Accordingly, in the following example, we report only one of these two equivalent statistics.
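The data-independence of this factor is easy to check numerically. The sketch below reuses the hypothetical helpers from Section 2.1 and compares the ratio of the corrected to the uncorrected statistic on two unrelated samples; both ratios coincide with the constant $\prod_i c_i/2$.

```r
# Numerical check (sketch): the ratio of the corrected to the uncorrected
# statistic depends only on (n, m), not on the data.
uncorrected_spacing_statistic <- function(x, m, f0) {
  n  <- length(x)
  xs <- sort(x)
  lo <- pmax((1:n) - m, 1); hi <- pmin((1:n) + m, n)
  prod((2 * m / (n * (xs[hi] - xs[lo]))) / f0(xs))
}

n <- 40; m <- floor(sqrt(n) + 0.5); f0 <- dnorm
x1 <- rnorm(n); x2 <- rexp(n)
ratio1 <- corrected_spacing_statistic(x1, m, f0) / uncorrected_spacing_statistic(x1, m, f0)
ratio2 <- corrected_spacing_statistic(x2, m, f0) / uncorrected_spacing_statistic(x2, m, f0)

ci <- rep(2, n)
ci[1:m]           <- 1 + ((1:m) - 1) / m
ci[(n - m + 1):n] <- 1 + (n - ((n - m + 1):n)) / m
all.equal(ratio1, ratio2)        # TRUE: the ratio does not depend on the sample
all.equal(ratio1, prod(ci / 2))  # TRUE: it equals the constant C_{n,m}
```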
Table 3.
Empirical rejection rates at the nominal significance level $\alpha$ based on 1000 simulations.
Example 2.
(Testing $N(\mu, \sigma^2)$ as the Null): In this simulation, we test the null hypothesis $H_0: f = f_0$, where $f_0$ is the density of the normal distribution $N(\mu, \sigma^2)$ with both mean $\mu$ and variance $\sigma^2$ unknown. The maximum likelihood estimators $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$ are used to fit the null model.
Samples are generated from the same set of alternative distributions as in Example 1, including normal distributions with different means and variances, symmetric mixtures, as well as heavy-tailed and skewed distributions. For each sample size $n$, the test statistics $\tilde{V}_{mn}$ and $T_n$ are computed, and bootstrap p-values are estimated using $B$ replications.
Table 2 reports the resulting p-values, which provide insight into the performance of the tests when both location and scale are unknown. When the data are drawn from a normal distribution (either with the same or a shifted mean), the tests maintain the nominal Type I error rate, as p-values remain large across all sample sizes. For alternatives with heavier tails, skewness, or mixtures, the p-values decrease rapidly as the sample size increases, demonstrating the ability of the proposed tests to detect deviations from the null model. The tests are especially powerful against the Cauchy and exponential alternatives, with p-values near zero even for small samples. When both location and scale are unknown, the tests show some loss of sensitivity to changes in variance alone, as seen in the increased-variance normal case, where p-values only decrease substantially for larger samples. Overall, $\tilde{V}_{mn}$ and $T_n$ provide complementary perspectives, with the latter sometimes exhibiting greater sensitivity in challenging cases and with moderate sample sizes. These results highlight the robustness and power of the proposed methods for model assessment under a composite normal null hypothesis.
Example 3.
(Real Data–Yarn Strength): We apply the proposed tests to the breaking strength values of 100 yarns, originally reported by Duncan [13]:
We assess whether these data are adequately modeled by a Laplace distribution,
$$f_0(x; a, b) = \frac{1}{2b} \exp\!\left( -\frac{|x - a|}{b} \right), \qquad x \in \mathbb{R},$$
where $a$ and $b$ denote the location and scale parameters, respectively. The maximum likelihood estimates of $a$ and $b$ are consistent with the values reported by Alizadeh Noughabi [7]. Using these estimates, we compute the p-values for $\tilde{V}_{mn}$ and $T_n$ via the bootstrap procedure of Section 4. The resulting p-values for both statistics are well above the conventional significance level, providing no evidence against the Laplace model. These results support the adequacy of the Laplace distribution for the yarn strength data and further indicate that the Correa-based statistic tends to be more conservative, offering stronger support in this setting.
Example 4. (Real Data–River Flow): We assess the suitability of the three-parameter gamma distribution for modeling river flow measurements. The density is given by
$$f_0(x; \alpha, \beta, \mu) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,(x - \mu)^{\alpha - 1} e^{-\beta (x - \mu)}, \qquad x > \mu,$$
where $\alpha$, $\beta$, and $\mu$ are the shape, rate, and location parameters, respectively.
The dataset comprises river flow measurements (in millions of cubic feet per second) from the Susquehanna River at Harrisburg, Pennsylvania, recorded over the five-year period 1980–1984:
Following Al-Labadi and Evans [14], maximum likelihood estimates of the three parameters are obtained. Using these estimates, we apply the bootstrap algorithm of Section 4 to compute the p-values for the $\tilde{V}_{mn}$ and $T_n$ test statistics.
The resulting p-values for both $\tilde{V}_{mn}$ and $T_n$ are well above conventional significance thresholds, indicating strong agreement between the observed data and the fitted three-parameter gamma model. These results confirm the suitability of the gamma distribution for describing the river flow data and show that the Correa-based statistic tends to be slightly more conservative. This conclusion is consistent with the Bayesian nonparametric test of Al-Labadi and Evans [14], which likewise does not spuriously reject the adequacy of the gamma model.
To further evaluate the effectiveness of the proposed tests in detecting model misspecification, we conduct a power analysis under several alternative distributions, as well as under the null hypothesis. For each scenario, we fix the significance level at a nominal value $\alpha$ and generate 1000 independent samples for each sample size considered.
For each sample, the test statistics $\tilde{V}_{mn}$ and $T_n$ are computed, and the corresponding p-values are estimated using bootstrap replications from the null model, as described in Section 4. The empirical rejection rate is calculated as the proportion of samples for which the null hypothesis is rejected.
To assess Type I error control, we also report results under the null hypothesis, where samples are drawn from the null model itself. The empirical rejection rates in this case should be close to the nominal level $\alpha$, indicating the validity of the bootstrap calibration and the reliability of the testing procedure.
The alternative distributions considered highlight various types of departures from the null, including a mean shift, heavy tails (Cauchy(0, 1)), and a symmetric distribution with different kurtosis (Logistic(0, 1)). This range of alternatives enables a comprehensive assessment of the sensitivity and robustness of the proposed methods.
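The rejection rates reported in Table 3 can be approximated with a loop of the following form. The sketch assumes, for concreteness, a fully specified standard normal null and reuses the hypothetical bootstrap_pvalue and correa_statistic helpers from the earlier sketches; nsim and B are kept configurable so the computational cost can be controlled.

```r
# Empirical rejection rate of the Correa-based test for a chosen data-generating
# distribution (sketch; a fully specified N(0, 1) null is assumed here).
rejection_rate <- function(rgen_alt, n, nsim = 1000, B = 500, alpha = 0.05) {
  m <- floor(sqrt(n) + 0.5)
  mean(replicate(nsim, {
    x  <- rgen_alt(n)
    pv <- bootstrap_pvalue(x,
                           stat_fn = function(z) correa_statistic(z, m, dnorm),
                           fit_fn  = function(z) NULL,
                           rgen_fn = function(k, th) rnorm(k),
                           B = B)
    pv <= alpha
  }))
}

# Size under the null and power against a heavy-tailed alternative:
# rejection_rate(function(k) rnorm(k),   n = 50)   # should be close to alpha
# rejection_rate(function(k) rcauchy(k), n = 50)   # should be close to one
```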
The results, presented in Table 3, demonstrate that both tests maintain appropriate Type I error rates when $H_0$ is true. Moreover, the power increases rapidly with sample size and is highest when the true distribution deviates substantially from the null. Notably, the test based on Correa's local linear entropy estimator ($T_n$) consistently outperforms the boundary-corrected m-spacing test ($\tilde{V}_{mn}$), particularly for small to moderate samples and challenging alternatives. These findings confirm the strong performance of the proposed methods, both in maintaining control of Type I error and in delivering high power against a range of alternatives.
Taken together, the simulation results and real-data examples demonstrate that both proposed tests provide reliable inference for model adequacy across a range of scenarios. The tests maintain proper Type I error control under the null, deliver substantial power under diverse alternatives, and do not spuriously reject in well-specified real-data settings. The new test based on Correa’s entropy estimator in particular offers notable advantages for moderate samples and challenging alternatives, while the boundary-corrected m-spacing approach retains strong and consistent performance. These findings illustrate the utility and flexibility of entropy-based density estimation methods for modern goodness-of-fit testing.
6. Conclusions
In this paper, we proposed two bootstrap-based test statistics for assessing the goodness-of-fit of fully specified parametric models. The first test is a boundary-corrected version of the empirical likelihood ratio statistic originally introduced by Vexler and Gurevich [3], which incorporates a position-dependent correction factor to improve density estimation near the boundaries. Although the correction modifies the construction of the statistic, in practice it changes the statistic only by a fixed multiplicative constant and therefore leads to the same p-values as the uncorrected version. The second test is based on Correa's [10] local linear entropy estimator, which provides a flexible and accurate alternative by utilizing local linear regression to approximate the derivative of the quantile function.
We established the theoretical properties of the proposed statistics, demonstrating their consistency and showing that, under fixed alternatives, they converge to the Kullback–Leibler divergence. Since the asymptotic distributions of these statistics are analytically intractable, we developed a bootstrap algorithm for practical implementation.
Comprehensive simulation studies demonstrated that both test statistics perform well in terms of controlling Type I error and detecting model misspecification. In particular, the test based on Correa’s estimator exhibited superior power across a wide range of alternatives, especially in small to moderate sample sizes. Applications to real data further confirmed the utility and flexibility of the proposed methods.
Unlike composite likelihood ratio tests, which typically involve parameter estimation under both the null and alternative models, the proposed approach only requires parameter estimation under the null hypothesis (if at all). In the case of a fully specified null, parameter estimation is entirely avoided. This significantly simplifies implementation and enhances robustness, especially in settings where maximum likelihood estimation is computationally challenging or unreliable.
Future research directions include extending these tests to multivariate distributions, regression models, and right-censored data.
Author Contributions
All authors have contributed equally to this work. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Not applicable.
Acknowledgments
The authors gratefully acknowledge the editor and the two reviewers for their valuable comments and suggestions, which have significantly improved the quality of this paper. We would also like to thank Karen Mabroukeh for proofreading the manuscript and enhancing its readability.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Lehmann, E.L.; Romano, J.P. Testing Statistical Hypotheses, 4th ed.; Springer: New York, NY, USA, 2022. [Google Scholar]
- Casella, G.; Berger, R.L. Statistical Inference, 2nd ed.; Duxbury Press: Pacific Grove, CA, USA, 2002. [Google Scholar]
- Vexler, A.; Gurevich, G. Empirical likelihood ratios applied to goodness-of-fit tests based on sample entropy. Comput. Stat. Data Anal. 2010, 54, 531–545. [Google Scholar] [CrossRef]
- Vasicek, O. A test for normality based on sample entropy. J. R. Stat. Soc. Ser. B (Methodol.) 1976, 38, 54–59. [Google Scholar] [CrossRef]
- Alizadeh Noughabi, H. Empirical likelihood ratio-based goodness-of-fit test for the logistic distribution. J. Appl. Stat. 2015, 42, 1973–1983. [Google Scholar] [CrossRef]
- Ning, W.; Ngunkeng, G. An empirical likelihood ratio based goodness-of-fit test for skew normality. Stat. Methods Appl. 2013, 22, 209–226. [Google Scholar] [CrossRef]
- Alizadeh Noughabi, H. Empirical likelihood ratio-based goodness-of-fit test for the Laplace distribution. Commun. Math. Stat. 2016, 4, 459–471. [Google Scholar] [CrossRef]
- Safavinejad, M.; Jomhoori, S.; Alizadeh Noughabi, H. A density-based empirical likelihood ratio goodness-of-fit test for the Rayleigh distribution and power comparison. J. Stat. Comput. Simul. 2015, 85, 3322–3334. [Google Scholar] [CrossRef]
- Ebrahimi, N.; Pflughoeft, K.; Soofi, E.S. Two measures of sample entropy. Stat. Probab. Lett. 1994, 20, 225–234. [Google Scholar] [CrossRef]
- Correa, J.C. A new estimator of entropy. Commun. Stat. Theory Methods 1995, 24, 2439–2449. [Google Scholar] [CrossRef]
- Al-Labadi, L.; Chu, Z.; Xu, Y. Advancements in Rényi entropy and divergence estimation for model assessment. Comput. Stat. 2025, 40, 633–650. [Google Scholar]
- Grzegorzewski, P.; Wieczorkowski, R. Entropy-based goodness-of-fit test for exponentiality. Commun. Stat. Theory Methods 1999, 28, 1183–1202. [Google Scholar] [CrossRef]
- Duncan, A.J. Quality Control and Industrial Statistics, 5th ed.; Irwin: Homewood, IL, USA, 1974. [Google Scholar]
- Al-Labadi, L.; Evans, M. Goodness-of-fit for the three-parameter gamma model. J. Appl. Stat. 2018, 45, 317–334. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).