Article

A Nonparametric Lack-of-Fit Test of Constant Regression in the Presence of Heteroscedastic Variances

1 Department of Mathematics, Al al-Bayt University, Mafraq 25113, Jordan
2 Department of Statistics, University of California, Davis, CA 95616, USA
3 Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
4 Department of Statistics, Texas A&M University, College Station, TX 77843, USA
* Author to whom correspondence should be addressed.
Current address: AbbVie, South San Francisco, CA 94080, USA.
Symmetry 2021, 13(7), 1264; https://doi.org/10.3390/sym13071264
Submission received: 17 June 2021 / Revised: 9 July 2021 / Accepted: 12 July 2021 / Published: 14 July 2021
(This article belongs to the Special Issue Nonparametric Statistics and Biostatistical Methods)

Abstract: We consider a k-nearest neighbor-based nonparametric lack-of-fit test of constant regression in the presence of heteroscedastic variances. The asymptotic distribution of the test statistic is derived under the null and local alternatives for a fixed number of nearest neighbors. Advantages of our test compared to classical methods include: (1) the response variable can be discrete or continuous, its conditional distribution may be symmetric or asymmetric, and its variance may depend on the predictor, which gives the test broad applicability to data from many practical fields; (2) the approach does not require estimating the nonlinear regression function, which often affects the power for moderate sample sizes; (3) the test statistic achieves the parametric standardizing rate, which gives more power than smoothing-based nonparametric methods for moderate sample sizes. Our numerical simulations show that the proposed test is powerful and performs noticeably better than some well known tests when the data are generated from high frequency alternatives or are binary. The test is illustrated with an application to gene expression data and an assessment of the Richards growth curve fit to COVID-19 data.

1. Introduction

Nonparametric lack-of-fit tests in which constant regression is assumed for the null hypothesis have been considered by many authors. The order selection test [1], the rank-based order selection test [2], and the Bayes sum test [3] are among the top few that are intuitive and easy to compute. A classical textbook review of the extensive work on nonparametric lack-of-fit tests based on smoothing methods is available in Reference [4]. Hart [2] extended the order selection method of Reference [1] to a rank-based test under the constant variance assumption so that the test statistic is relatively insensitive to misspecification of distributional assumptions. These two order selection tests show excellent performance under low frequency alternatives. However, they may have low power under high frequency alternatives.
In another paper, Hart proposed several new tests based on Laplace approximations to better handle the high frequency alternatives [3]. In particular, one test with overall good power is the Bayes sum test. It is a modified cusum statistic with a better use of the sample Fourier coefficients arranged in the order of increasing frequency. Two versions of approximating the critical values were given in Reference [3], one based on normally generated data, and the other based on bootstrap resampling of the residuals under the null hypothesis of constant regression. It is interesting to note that, even though the response variable may not be from the normal distribution, the normal approximation approach tends to give even higher power than the bootstrap approach. An explanation for this is that the Bayes sum test starts with the canonical model that the estimators of the Fourier coefficients are normally distributed, and here the sample Fourier coefficients are approximately normally distributed for large sample sizes. Thus, the Bayes sum test works well for large sample sizes and is more powerful than the order selection test and the rank-based order selection test.
A major motivation for the current work is that practical data may have variances that vary with the covariate, whereas the order selection (OS), rank-based order selection (ROS), and Bayes sum tests were derived for homoscedastic regression problems. The scale parameter of the error term is assumed to be a constant in these three tests. Even in such a case, different estimators of the scale parameter may be used depending on whether the null or the alternative hypothesis is assumed to be true.
To deal with the presence of heteroscedasticity for testing the no-effect null hypothesis, Chen et al. [5] proposed another test statistic in addition to bootstrapping the version of the order selection test given in Reference [6]. The approximate sampling distribution of that test statistic was obtained using the wild bootstrap method. In the case of heteroscedasticity, it was shown in Reference [5] that the asymptotic distribution of the order selection test of Reference [6] depends on the unknown variance function of the errors. Moreover, they showed that their statistic is more robust to heteroscedasticity than that of Reference [6] and has better level accuracy. It was further shown in Reference [5] that the wild bootstrap technique has an overall good performance in terms of level accuracy and power properties in the case of heteroscedasticity.
Other consistent nonparametric lack-of-fit tests using smoothing techniques have been proposed (cf. References [7,8,9,10,11,12,13,14]). Some of them are difficult to compute and rely on complicated conditions that are hard to justify. All of the aforementioned methods require the response variable to be continuous.
In this paper, we consider a nonparametric lack-of-fit test of constant regression in the presence of heteroscedastic variances. This test has better power for data from high frequency alternatives than the four tests reviewed above. In addition, our test can also be applied to discrete data. The test statistic is derived using a k-nearest neighbor augmentation defined through the ranks of the predictor. This idea was first proposed in Reference [15] for the analysis of covariance model, and further used in Reference [16] for a diagnostic test and in Reference [17] for a test of independence between a response variable and a covariate in the presence of treatments. A test statistic was defined in Reference [16] for the lack-of-fit test in the present regression setting. The authors considered each distinct covariate value as a factor level. Then, they augmented the observed data to construct what they called an artificial balanced one-way ANOVA (see Section 2.1 for further description of the augmentation). This way of constructing test statistics has great potential to gain power over smoothing-based methods. However, we found that the asymptotic variance estimator of the test statistic in Reference [16] seriously underestimates the true variance for intermediate sample sizes. As a consequence, regardless of the error distribution, their test has highly inflated type I error rates when k is small and becomes very conservative when k gets large.
In this paper, we present a very different asymptotic variance formula for the test statistic. In the special case of homoscedastic variance, our derived asymptotic variance contains one more term (a function of k) than that in Reference [16]. This explains the unstable behavior of the type I error pattern of their test. On the other hand, our test has consistent type I error rates across different sample sizes and different k values, and they are very close to the nominal alpha levels.
In Section 2, we state the hypotheses and define the test statistic as a difference of two quadratic forms, both of which estimate a common quantity but one under the null hypothesis and the other under the alternatives. Then, the asymptotic distribution of the test statistic is obtained under the null and the local alternatives for a fixed number of nearest neighbors. Moreover, we consider the idea of the Least Squares Cross-Validation (LSCV) procedure of Reference [18] to estimate the number of nearest neighbors. In Section 3, we present simulation studies with data generated having symmetric normal, light-tailed uniform, heavy-tailed T, and asymmetric heteroscedastic error distributions. The numerical results show that our test has encouragingly better performance in terms of type I error and power compared to the existing tests. In addition to the simulation comparisons, we present in Section 4 an application to gene expression data from patients undergoing radical prostatectomy and an application to assess COVID-19 model fit. A summary is given in Section 5. Technical proofs are provided in Appendix A.

2. Theoretical Results

2.1. The Hypotheses and Test Statistic

Let $(X_j, Y_j)$, $j = 1, \ldots, N$, be an independent and identically distributed random sample. Let $f(x)$ and $F(x)$ denote the marginal probability density function and cumulative distribution function of $X_j$, respectively. Denote $\mathrm{Var}(Y_i \mid X_i = x) = \sigma^2(x)$ and $\varepsilon_i = Y_i - E(Y_i \mid X_i)$.
We wish to test the hypotheses:
$$H_0: E(Y \mid X = x) = m_0(x), \text{ where } m_0(\cdot) \text{ is a known function, vs.} \tag{1}$$
$$H_1: E(Y \mid X = x) = m(x), \text{ which depends on } x \text{ via other functions instead of } m_0(\cdot).$$
This formulation works for both a continuous and a categorical response variable Y. For simplicity of presentation, we assume that there are no duplicated observations for each value of the covariate X. If there are duplicated observations, we can use the middle ranks to take care of this issue. In regression settings, the nonlinear conditional mean regression $E(Y \mid X)$ is often estimated by pooling observations from neighbors with one of the smoothing methods, such as loess, smoothing splines, kernel estimation, etc. For the smoothing spline or kernel method, the number of observations in a window essentially needs to go to infinity as the sample size goes to infinity. The k-nearest neighbor approach is a popular method for classification, but the theory for a fixed k is very difficult for general regression. In this work, we use a fixed number of k-nearest neighbors in a data augmentation to help define a statistic for conducting a lack-of-fit test. This augmentation is done for each unique value $x_i$ of the predictor by generating a cell that contains k values of the response Y whose corresponding x values are among the k closest to $x_i$ in rank. We take k to be an odd number for convenience so that the augmentation contains half of the $(k-1)$ values symmetrically on each side of $x_i$ when $x_i$ is an inner point. Let $c$ denote an index defined by the covariate value $X_{j_1}$, where $c = j_1$, and let $\hat F(x) = N^{-1}\sum_{j=1}^{N} I(X_j \le x)$ denote the empirical distribution of X. We make the augmentation for each cell $(c)$ by selecting $k-1$ pairs of observations whose covariate values are among the k closest to $X_{j_1}$ in rank, in addition to $(X_{j_1}, Y_{j_1})$. Let $C_c$ denote the set of indices for the covariate values used in the augmented cell $(c)$. Thus, for any pair $(X_j, Y_j)$ to be selected in the augmentation of cell $(c)$, the difference between the ranks of $X_j$ and $X_{j_1}$ is no more than $(k-1)/2$ if $X_{j_1}$ is an interior point whose rank is between $(k-1)/2$ and $N-(k-1)/2$, i.e., $N|\hat F(X_{j_1}) - \hat F(X_j)| \le (k-1)/2$. For $X_{j_1}$ whose rank is less than $(k-1)/2$ or greater than $N-(k-1)/2$, the difference between the ranks of $X_j$ and $X_{j_1}$ is no more than $k-1$. This idea was first proposed in Reference [15] and further used in References [16,17] for different problems. A test statistic was derived in Reference [16] for lack-of-fit testing in the present regression setting by considering each distinct covariate value as a factor level. Then, the observed data were augmented by considering a window around each $x_i$ that contains the $k_n$ nearest covariate values to construct what the authors called an artificial balanced one-way ANOVA. A similar augmentation was considered in Reference [17] when there is more than one treatment. Their results cannot be applied here since the asymptotic variance calculation is ill-defined when there is no treatment factor, as in our lack-of-fit setting.
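For readers who want to experiment with the augmentation, the following short sketch builds the augmented cells from the ranks of the covariate. It is an illustrative implementation under the conventions above (k odd, boundary windows shifted inward), not the authors' code, and the function name augment_cells is ours.

```python
import numpy as np

def augment_cells(x, k=3):
    """Illustrative rank-based k-nearest-neighbor augmentation (k odd).

    For each covariate value x_c, return the indices of the k observations
    whose x-values are closest to x_c in rank; the window is shifted inward
    at the two boundaries so that every cell contains exactly k indices.
    """
    x = np.asarray(x)
    n = len(x)
    ranks = np.argsort(np.argsort(x))      # rank of each x_i (0-based)
    order = np.argsort(x)                  # order[r] = index of the observation with rank r
    half = (k - 1) // 2
    lo = np.clip(ranks - half, 0, n - k)   # left end of each rank window
    return [order[l:l + k] for l in lo]    # cell (c) holds the indices in C_c
```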
Let $R_{ct}$, $t = 1, \ldots, k$, denote the augmented response values in cell $(c)$ under the null hypothesis. Define $g_{Nk}(X_1, X_2) = I\!\left(N|\hat F(X_1) - \hat F(X_2)| \le \frac{k-1}{2}\right)$ to be the indicator that the difference between the ranks of $X_1$ and $X_2$ is no more than $(k-1)/2$. Let $B_N$ and $W_N$ denote the average between-cell and within-cell variations defined as follows:
$$B_N = \frac{k}{N-1}\sum_{c=1}^{N}\big(\bar R_{c\cdot} - \bar R_{\cdot\cdot}\big)^2 \quad \text{and} \quad W_N = \frac{1}{N(k-1)}\sum_{c=1}^{N}\sum_{t=1}^{k}\big(R_{ct} - \bar R_{c\cdot}\big)^2,$$
where $\bar R_{c\cdot} = k^{-1}\sum_{t=1}^{k} R_{ct}$ and $\bar R_{\cdot\cdot} = N^{-1}\sum_{c=1}^{N} \bar R_{c\cdot}$. Note that $B_N$ and $W_N$ can be easily calculated since they resemble the mean squares of an ANOVA model; the calculation is carried out on the augmented data. In most cases in the literature, $B_N/W_N$ is used to construct the test statistic when $B_N$ has a fixed number of degrees of freedom. In our case, however, the degrees of freedom for $B_N$ is $N-1$, which goes to infinity. The statistic typically used in this case is $\sqrt{N}\,[(B_N/W_N) - 1]$ (see Reference [19]), which involves showing that $\sqrt{N}(B_N - W_N)$ converges in distribution to normality and $W_N$ converges in probability to a constant. With augmented data, it is complicated to show that $W_N$ converges in probability. So, we define the difference-based statistic
$$G_N = \sqrt{N}\,(B_N - W_N)\big/\sqrt{\hat\lambda_N}$$
as our test statistic instead of a $B_N/W_N$-based one, where $\hat\lambda_N$ is a variance estimator for $\sqrt{N}(B_N - W_N)$ given later in (9). This test statistic is similar to that proposed in Reference [16], but with a different variance estimator.
To express B N and W N in terms of the original data, we can write
$$B_N = \frac{k}{N-1}\sum_{j_1=1}^{N}\left[\frac{1}{k}\sum_{j=1}^{N} Y_j\, g_{Nk}(X_{j_1}, X_j) - \frac{1}{Nk}\sum_{j_2=1}^{N}\sum_{j=1}^{N} Y_j\, g_{Nk}(X_{j_2}, X_j)\right]^2 + O_p(N^{-1}),$$
$$W_N = \frac{1}{N(k-1)}\sum_{j_1=1}^{N}\left[\sum_{j=1}^{N} Y_j^2\, g_{Nk}(X_{j_1}, X_j) - \frac{1}{k}\left(\sum_{j_2=1}^{N} Y_{j_2}\, g_{Nk}(X_{j_1}, X_{j_2})\right)^2\right] + O_p(N^{-1}).$$
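A direct computation of $B_N$ and $W_N$ on the augmented data is then a one-way-ANOVA-style calculation. The sketch below, which reuses the augment_cells function sketched in the previous code block, is only meant to make the definitions concrete; it is not the authors' implementation.

```python
import numpy as np

def between_within(x, y, k=3):
    """Average between-cell (B_N) and within-cell (W_N) variations
    computed on the augmented data (illustrative sketch)."""
    y = np.asarray(y, float)
    cells = augment_cells(x, k)            # rank-based augmentation sketched above
    R = np.array([y[c] for c in cells])    # N x k matrix of augmented responses R_ct
    n = R.shape[0]
    cell_means = R.mean(axis=1)            # R-bar_{c.}
    grand_mean = cell_means.mean()         # R-bar_{..}
    B = k * np.sum((cell_means - grand_mean) ** 2) / (n - 1)
    W = np.sum((R - cell_means[:, None]) ** 2) / (n * (k - 1))
    return B, W
```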

2.2. Asymptotic Distribution of the Test Statistic under the Null Hypothesis

Even though the test statistic is easy to calculate, the derivation of the asymptotic distribution is challenging since the augmented data in neighboring cells are correlated. In this subsection, we derive the asymptotic distribution of the test statistic with a different strategy than that used in Reference [16]. We first simplify it by finding its projection. Specifically, define
$$V_{ct} = R_{ct} - E(R_{ct} \mid \mathbf{X}),$$
where $\mathbf{X} = (X_1, \ldots, X_N)$. Then, we project $B_N$ onto the space
$$\text{Extended span}\{\mathbf{V}_c,\ c = 1, \ldots, N\} \tag{3}$$
of the form $\sum_{c=1}^{N} a_c\, g_c(\mathbf{V}_c)$, where the $a_c$ are constants, $\mathbf{V}_c = (V_{c1}, \ldots, V_{ck})$, and $g_c(\mathbf{V}_c)$ is some, possibly nonlinear, function. This projection helps us split $B_N$ into two terms, one of which involves a summation over $c$ and the other a summation over $c$ and $c'$ with $c \ne c'$:
$$B_N = P_B(\mathbf{V}) + S_B(\mathbf{V}),$$
$$P_B(\mathbf{V}) = \frac{k}{N}\sum_{c=1}^{N} \bar V_{c\cdot}^2, \qquad S_B(\mathbf{V}) = -\frac{k}{N(N-1)}\sum_{c \ne c'}^{N} \bar V_{c\cdot}\,\bar V_{c'\cdot}, \tag{4}$$
where $\mathbf{V} = (\mathbf{V}_1, \ldots, \mathbf{V}_N)$ and $\bar V_{c\cdot} = k^{-1}\sum_{t=1}^{k} V_{ct}$. Then, $P_B(\mathbf{V})$ is in the space defined in (3) and $B_N - W_N = (P_B(\mathbf{V}) - W_N) + S_B(\mathbf{V}) = T_B + S_B(\mathbf{V})$, where
$$T_B = \frac{1}{N(k-1)}\sum_{c=1}^{N}\sum_{t \ne t'}^{k} V_{ct} V_{ct'} = \frac{1}{N(k-1)}\sum_{c=1}^{N}\sum_{t \ne t'}^{k} \big(R_{ct} - E(R_{ct} \mid \mathbf{X})\big)\big(R_{ct'} - E(R_{ct'} \mid \mathbf{X})\big) = \frac{1}{N(k-1)}\sum_{j \ne j'}^{N} \big(Y_j - E(Y_j \mid \mathbf{X})\big)\big(Y_{j'} - E(Y_{j'} \mid \mathbf{X})\big)\sum_{c=1}^{N} I(j \in C_c)\, I(j' \in C_c) = \frac{1}{N(k-1)}\sum_{j \ne j'}^{N} \big(Y_j - E(Y_j \mid \mathbf{X})\big)\big(Y_{j'} - E(Y_{j'} \mid \mathbf{X})\big)\, K_{jj'}, \tag{5}$$
and
$$K_{jj'} = \sum_{c=1}^{N} I(j \in C_c)\, I(j' \in C_c). \tag{6}$$
Note that the term in (5) is closely related to the expected covariance between every pair of response values, with the correlation induced by their dependence on $\mathbf{X}$. The $K_{jj'}$ in (6) serves as a weight function that associates the responses locally through the empirical distribution function of X. The $T_B$ term in (5) is more intuitive than $\sqrt{N}(B_N - W_N)$ for evaluating the lack-of-fit. However, $T_B$ cannot be calculated from the sample since $E(Y \mid \mathbf{X})$ is unknown. On the other hand, $\sqrt{N}(B_N - W_N)$ can be obtained directly from the sample.
We assume the following condition to obtain the result under the null hypothesis:
Assumption 1.
For all x, suppose that $F(x)$ is differentiable, and the fourth conditional central moment of $Y_j$ given $X_j$ is uniformly bounded.
The advantage of using a small or fixed k instead of a large k can be seen here. Even though $S_B(\mathbf{V})$ is a quadratic form, only nearby cells have correlated observations due to the fixed number of nearest neighbors used in the augmentation. On the other hand, when the number of nearest neighbors tends to infinity, the augmented data in many more cells will be correlated; therefore, $S_B(\mathbf{V})$ might diverge, and the derivation of the asymptotic distribution would require unnecessarily strong conditions on the magnitude of the correlation. It is straightforward to show that $S_B(\mathbf{V}) = O_P(N^{-1})$ with a small or fixed k. Hence, $\sqrt{N}\, S_B(\mathbf{V})$ is asymptotically negligible. We state this result in Lemma 1 below.
Lemma 1
(Projection of $B_N$). Let $S_B(\mathbf{V})$ be as defined in (4). If Assumption 1 is satisfied, then
$$\sqrt{N}\, S_B(\mathbf{V}) \stackrel{p}{\longrightarrow} 0, \quad \text{as } N \to \infty,$$
where the notation $\stackrel{p}{\longrightarrow}$ denotes convergence in probability.
To obtain the asymptotic distribution of the test statistic under the null hypothesis, we work with
$$\sqrt{N}\, T_B = \frac{\sqrt{N}}{N(k-1)}\sum_{j \ne j'}^{N} \big(Y_j - E(Y_j \mid \mathbf{X})\big)\big(Y_{j'} - E(Y_{j'} \mid \mathbf{X})\big)\, K_{jj'}, \tag{7}$$
where $K_{jj'}$ is defined in (6). We first give the large-sample behavior of the variance of this term.
Theorem 1.
Under Assumption 1, $\lambda := \lim_{N\to\infty} \lambda_N = \lim_{N\to\infty} \mathrm{Var}(\sqrt{N}\, T_B)$ exists and
$$\lambda = E\big(\lim_{N\to\infty} \delta_N\big),$$
where
$$\delta_N = \sum_{j < j'}^{N} \frac{4\,\sigma^2(X_j)\,\sigma^2(X_{j'})}{N(k-1)^2}\Big\{\big[k - |j^* - j'^*|\big]^2 + \big[k - |j^* - j'^*|\big] - 2\, I\big(|j^* - j'^*| \le \tfrac{k-1}{2}\big) + O(N^{-1})\Big\}\, I\big(|j^* - j'^*| \le k-1\big),$$
and $j^*$, $j'^*$ are the ranks of $X_j$ and $X_{j'}$ among the covariate values $X_1, \ldots, X_N$.
To estimate the asymptotic variance, let $j^*$ be the rank of $X_j$ among all covariate values. Then, it is readily seen that a consistent estimator of $\lambda$ under $H_0$ is
$$\hat\lambda_N = \sum_{j < j'}^{N} \frac{4\,\hat\sigma^2(X_j)\,\hat\sigma^2(X_{j'})}{N(k-1)^2}\Big\{\big[k - |j^* - j'^*|\big]^2 + \big[k - |j^* - j'^*|\big] - 2\, I\big(|j^* - j'^*| \le \tfrac{k-1}{2}\big)\Big\}\, I\big(|j^* - j'^*| \le k-1\big), \tag{9}$$
where $\hat\sigma^2(X_j)$ is the sample variance based on the augmented observations for the cell determined by $X_j$, i.e.,
$$\hat\sigma^2(X_j) = \frac{1}{k-1}\left[\sum_{l=1}^{N} Y_l^2\, g_{Nk}(X_l, X_j) - \frac{1}{k}\left(\sum_{l=1}^{N} Y_l\, g_{Nk}(X_l, X_j)\right)^2\right].$$
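Putting the pieces together, the studentized statistic $G_N$ can be computed as in the sketch below. This is a hedged illustration assuming the reconstruction of (9) above (in particular, standardization by $\sqrt{\hat\lambda_N}$), and it again builds on the augment_cells sketch from Section 2.1 rather than being the authors' implementation.

```python
import numpy as np
from itertools import combinations

def g_n_statistic(x, y, k=3):
    """Sketch of G_N = sqrt(N)(B_N - W_N)/sqrt(lambda_hat_N); k should be a small odd integer."""
    y = np.asarray(y, float)
    n = len(y)
    cells = augment_cells(x, k)                 # rank-based augmentation (Section 2.1 sketch)
    R = np.array([y[c] for c in cells])
    cell_means = R.mean(axis=1)
    B = k * np.sum((cell_means - cell_means.mean()) ** 2) / (n - 1)
    W = np.sum((R - cell_means[:, None]) ** 2) / (n * (k - 1))
    sig2 = R.var(axis=1, ddof=1)                # sigma_hat^2(X_j) from each augmented cell
    ranks = np.argsort(np.argsort(np.asarray(x)))
    half = (k - 1) // 2
    lam = 0.0
    for j, jp in combinations(range(n), 2):     # only pairs with nearby ranks contribute
        d = abs(int(ranks[j]) - int(ranks[jp]))
        if d <= k - 1:
            m = (k - d) ** 2 + (k - d) - 2 * (d <= half)
            lam += 4.0 * sig2[j] * sig2[jp] * m / (n * (k - 1) ** 2)
    return np.sqrt(n) * (B - W) / np.sqrt(lam)  # approximately N(0,1) under H_0
```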
Note that the $K_{jj'}$ are bounded counts and (7) is a clean quadratic form as defined in Reference [20]. The central limit theorem for clean quadratic forms (Proposition 3.2 in Reference [20]) can be applied to obtain the following result. We omit the details of the proof.
Theorem 2.
Under $H_0$ in (1) and Assumption 1,
$$G_N \stackrel{d}{\longrightarrow} N(0, 1), \quad \text{as } N \to \infty,$$
where the notation $\stackrel{d}{\longrightarrow}$ denotes convergence in distribution.

2.3. Results under Local or Fixed Alternatives

In this subsection, we consider the theoretical properties of the test under fixed or local alternatives in which the conditional expectation of Y given X is $m(x) = E(Y \mid X = x)$. Let $A(x)$ be a univariate function of x.
Under a fixed alternative, $m(x)$ can be expressed as
$$m(x) = m_0(x) + A(x), \tag{10}$$
where $m_0(x) = E_0(Y \mid X = x)$ is the conditional expectation of Y given X under the null hypothesis.
For local alternatives, consider the sequence of conditional expectations $E_N(Y \mid X = x)$ that approach $m_0(x)$ at the rate $N^{-1/4}$:
$$m(x) = E_N(Y \mid X = x) = m_0(x) + N^{-1/4} A(x). \tag{11}$$
Both alternatives are valid for either a discrete or a continuous response variable and allow the data to have a different conditional variance under the alternative hypotheses from that under the null. For example, if $Y \mid X$ has a Poisson distribution with mean $m(x)$ under the alternative, then the variance is $m(x)$ instead of $m_0(x)$.
Suppose $(X_i, Y_i)$, $i = 1, \ldots, N$, are observed data under either the fixed alternative in (10) or the local alternatives in (11). Let $Q = \{Q_{ct};\ c = 1, \ldots, N,\ t = 1, \ldots, k\}$ be the augmented response values. Note that $Q_{ct}$ is equal to the observed response whose covariate value is one of the following:
$$\begin{cases} X_{(t)} & \text{if } c < (k-1)/2, \\ X_{(c + t - (k+1)/2)} & \text{if } (k-1)/2 \le c \le N - (k-1)/2, \\ X_{(N-k+t)} & \text{if } c > N - (k-1)/2. \end{cases}$$
Then, $Q_{ct}$ can be written as $Q_{ct} = \varepsilon_{ct} + E(Q_{ct} \mid \mathbf{X})$, where $E(Q_{ct} \mid \mathbf{X})$ includes the conditional mean under the null hypothesis and the departure from the null. Note that $\varepsilon_{ct} = Q_{ct} - E(Q_{ct} \mid \mathbf{X})$ satisfies the null hypothesis and can be viewed as the augmented data for $Z_i = Y_i - m_0(X_i) - A(X_i)$ if $(X_i, Y_i)$ are under the fixed alternative in (10), or for $Z_i = Y_i - m_0(X_i) - N^{-1/4} A(X_i)$ if $(X_i, Y_i)$ are under the local alternatives in (11). In either case, the conditional mean of $Z_i$ given $X_i$ satisfies the null hypothesis, but with $\mathrm{Var}(Z_i \mid X_i)$ equal to $\mathrm{Var}(Y_i \mid X_i)$ under the alternative hypotheses. For convenience, define $A_{ct}$ to be the function $A(x)$ evaluated at the covariate value of the augmented observation $Q_{ct}$. Let $\bar A_{c\cdot} = k^{-1}\sum_{t=1}^{k} A_{ct}$, $\bar A_{\cdot\cdot} = N^{-1}\sum_{c=1}^{N} \bar A_{c\cdot}$, $\bar Q_{c\cdot} = k^{-1}\sum_{t=1}^{k} Q_{ct}$, $\bar Q_{\cdot\cdot} = N^{-1}\sum_{c=1}^{N} \bar Q_{c\cdot}$, $\bar\varepsilon_{c\cdot} = k^{-1}\sum_{t=1}^{k} \varepsilon_{ct}$, and $\bar\varepsilon_{\cdot\cdot} = N^{-1}\sum_{c=1}^{N} \bar\varepsilon_{c\cdot}$. Denote by $B_N(Q)$ and $W_N(Q)$ the average between-cell and within-cell variations under the alternative hypotheses, respectively.
Under the local alternatives,
$$B_N(Q) = \frac{k}{N-1}\sum_{c=1}^{N}\big(\bar Q_{c\cdot} - \bar Q_{\cdot\cdot}\big)^2 = \frac{k}{N-1}\sum_{c=1}^{N}\Big[\big(\bar\varepsilon_{c\cdot} - \bar\varepsilon_{\cdot\cdot}\big) + N^{-1/4}\big(\bar A_{c\cdot} - \bar A_{\cdot\cdot}\big)\Big]^2 = \frac{k}{N-1}\sum_{c=1}^{N}\Big[\big(\bar\varepsilon_{c\cdot} - \bar\varepsilon_{\cdot\cdot}\big)^2 + N^{-1/2}\big(\bar A_{c\cdot} - \bar A_{\cdot\cdot}\big)^2 + 2 N^{-1/4}\big(\bar A_{c\cdot} - \bar A_{\cdot\cdot}\big)\big(\bar\varepsilon_{c\cdot} - \bar\varepsilon_{\cdot\cdot}\big)\Big],$$
and
$$W_N(Q) = \frac{1}{N(k-1)}\sum_{c=1}^{N}\sum_{t=1}^{k}\big(Q_{ct} - \bar Q_{c\cdot}\big)^2 = \frac{1}{N(k-1)}\sum_{c=1}^{N}\sum_{t=1}^{k}\Big[\big(\varepsilon_{ct} - \bar\varepsilon_{c\cdot}\big) + N^{-1/4}\big(A_{ct} - \bar A_{c\cdot}\big)\Big]^2 = \frac{1}{N(k-1)}\sum_{c=1}^{N}\sum_{t=1}^{k}\Big[\big(\varepsilon_{ct} - \bar\varepsilon_{c\cdot}\big)^2 + N^{-1/2}\big(A_{ct} - \bar A_{c\cdot}\big)^2 + 2 N^{-1/4}\big(\varepsilon_{ct} - \bar\varepsilon_{c\cdot}\big)\big(A_{ct} - \bar A_{c\cdot}\big)\Big].$$
In this case, the numerator of the test statistic can be written as
$$\sqrt{N}\big(B_N(Q) - W_N(Q)\big) = \Delta_{N,0} + \Delta_{N,1} + \Delta_{N,2} - \Delta_{N,3} - \Delta_{N,4}, \tag{12}$$
where
$$\Delta_{N,0} = \sqrt{N}\left[\frac{k}{N-1}\sum_{c=1}^{N}\big(\bar\varepsilon_{c\cdot} - \bar\varepsilon_{\cdot\cdot}\big)^2 - \frac{1}{N(k-1)}\sum_{c=1}^{N}\sum_{t=1}^{k}\big(\varepsilon_{ct} - \bar\varepsilon_{c\cdot}\big)^2\right], \tag{13}$$
$$\Delta_{N,1} = \sqrt{N}\,\frac{k}{N-1}\sum_{c=1}^{N} N^{-1/2}\big(\bar A_{c\cdot} - \bar A_{\cdot\cdot}\big)^2, \tag{14}$$
$$\Delta_{N,2} = \sqrt{N}\,\frac{k}{N-1}\sum_{c=1}^{N} 2 N^{-1/4}\big(\bar A_{c\cdot} - \bar A_{\cdot\cdot}\big)\big(\bar\varepsilon_{c\cdot} - \bar\varepsilon_{\cdot\cdot}\big), \tag{15}$$
$$\Delta_{N,3} = \sqrt{N}\,\frac{1}{N(k-1)}\sum_{c=1}^{N}\sum_{t=1}^{k} N^{-1/2}\big(A_{ct} - \bar A_{c\cdot}\big)^2, \tag{16}$$
$$\Delta_{N,4} = 2\sqrt{N}\,\frac{1}{N(k-1)}\sum_{c=1}^{N}\sum_{t=1}^{k} \big(\varepsilon_{ct} - \bar\varepsilon_{c\cdot}\big)\, N^{-1/4}\big(A_{ct} - \bar A_{c\cdot}\big). \tag{17}$$
Similarly, under the fixed alternatives,
$$\sqrt{N}\big(B_N(Q) - W_N(Q)\big) = \Delta_{N,0} + \sqrt{N}\,\Delta_{N,1} + N^{1/4}\,\Delta_{N,2} - \sqrt{N}\,\Delta_{N,3} - N^{1/4}\,\Delta_{N,4}, \tag{18}$$
where $\Delta_{N,i}$, $i = 0, \ldots, 4$, are given in (13)–(17).
The following additional condition is needed for the result under the alternative hypotheses:
Assumption 2.
Suppose that $X_i$ has bounded support $[a, b]$, and $A(x)$ is locally Lipschitz continuous on $[a, b]$: for each $z \in [a, b]$, there exists an $L > 0$ such that $A(x)$ is Lipschitz continuous on the neighborhood $B_L(z) = \{y \in [a, b] : |y - z| < L\}$. Further, we assume that the fourth central moments of $A(X_i)$ are uniformly bounded.
Before we give the asymptotic distribution of the test statistic under the alternatives, we state the following results which are valid under both the local and fixed alternative hypotheses.
Lemma 2.
Under Assumptions 1 and 2, as $N \to \infty$,
$$(i)\ \Delta_{N,2} = O_p(N^{-\frac14}); \qquad (ii)\ \Delta_{N,3} = O_p(N^{-2}); \qquad (iii)\ \Delta_{N,4} = O_p(N^{-\frac34}),$$
where $\Delta_{N,2}$, $\Delta_{N,3}$, and $\Delta_{N,4}$ are defined in (15), (16), and (17), respectively.
The proof of Lemma 2 is given in Appendix A. From this lemma and Equations (12) and (18), we can see that $\Delta_{N,0}$ and $\Delta_{N,1}$ are the main terms that provide power under the alternative hypotheses. We state the results separately for fixed and local alternatives.
Theorem 3.
Let $\lambda_N^A$ be defined in the same way as $\lambda_N$ in Theorem 1, but with $\sigma^2(X_j)$ calculated under the alternatives in (11). For the sequence of local alternatives $E_N(Y \mid X)$ in (11) and under Assumptions 1 and 2, the limit $\lambda^A = \lim_{N\to\infty} \lambda_N^A$ exists and
$$G_N \stackrel{d}{\longrightarrow} N\big(k\,\sigma_A^2\big/\sqrt{\lambda^A},\ 1\big),$$
where
$$\sigma_A^2 = \int A^2(x) f(x)\, dx - \left(\int A(x) f(x)\, dx\right)^2 = \mathrm{Var}\big(A(X)\big).$$
Note that $\lambda_N$ in Theorem 1 and $\lambda_N^A$ in Theorem 3 share the same formula, except that $\sigma^2(X_j) = \mathrm{Var}(Y_j \mid X_j)$ in $\lambda_N^A$ needs to be calculated under the alternatives in (11). For example, if Y given X has a Bernoulli distribution, then the conditional variance of Y given X under the local alternatives in (11) is $\sigma^2(x) = E_N(Y \mid X = x)\big(1 - E_N(Y \mid X = x)\big) = m(x)\big(1 - m(x)\big)$, which is different from that under the null hypothesis, $E_0(Y \mid X = x)\big(1 - E_0(Y \mid X = x)\big) = m_0(x)\big(1 - m_0(x)\big)$.
Theorem 4.
For the fixed alternative in (10), under Assumptions 1 and 2, the power of the test using the statistic $G_N$ goes to one as $N \to \infty$.
The proofs of Theorems 3 and 4 are given in Appendix A.
In heteroscedastic regression, it is common in the literature to write $Y_i = m(X_i) + \sigma(X_i) e_i$ with $e_i$ independent of $X_i$. In this formulation, the entire error term $\sigma(X_i) e_i$ is uncorrelated with $X_i$. In the ideal case of no lack-of-fit, such a model is reasonable. However, when there is a lack-of-fit because a wrong regression function is specified, the error term still contains some systematic information about $E(Y_i \mid X_i)$. Then, it is possible that the error resulting from the specified regression function is still correlated with $X_i$.

2.4. Selection of the Number of Nearest Neighbors

The number of nearest neighbors k in the test statistic specifies the number of values augmented in each cell. Our theory requires it to be a small, finite odd integer. In simulations, we have found that the type I error remains close to the nominal level for different small k values and stays stable across a broad range of sample sizes and error distributions. Under the alternative hypothesis, different k may lead to different power for our test statistic. This section discusses how to select the parameter k.
Under the alternative hypothesis, our k-nearest neighbor augmentation is parallel to local constant regression based on k-nearest neighbors. For a continuous response variable, Hardle et al. [18] suggested the Least Squares Cross-Validation (LSCV) method for smoothing parameter (bandwidth) selection in kernel regression estimation. Chen et al. [5] recommended using the one-sided cross-validation procedure of Reference [21] to select the smoothing parameter (bandwidth) for hypothesis testing. The number of nearest neighbors k in our setting plays a similar role to the smoothing parameter in kernel regression.
For a categorical response variable, Holmes et al. [22] proposed an approach to select the parameter k in the k-nearest neighbor (KNN) classification algorithm using likelihood-based inference. Choosing k in this method can be considered a generalized linear model variable-selection problem. In particular, for multinomial data $(y_i, \mathbf{x}_i)$, $i = 1, \ldots, n$, where $y_i \in \{C_0, \ldots, C_Q\}$ denotes the class label of the ith observation and $\mathbf{x}_i$ is a vector of p predictor variables, they considered the probability model
$$pr\big(y_i = C_j \mid y_{[i]}, \mathbf{x}_i, k\big) = \frac{\exp\big(z_i(k, j)\,\theta\big)}{\sum_{\upsilon=0}^{Q} \exp\big(z_i(k, \upsilon)\,\theta\big)},$$
where $y_{[i]} = \{y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n\}$ denotes the data with the ith observation deleted, $\theta$ is a single regression parameter, and $z_i(k, \upsilon)$ is the difference between the proportion of observations in class $C_\upsilon$ and that in class $C_0$ within the k-nearest neighbors of $\mathbf{x}_i$, i.e.,
$$z_i(k, \upsilon) = \frac{1}{k}\sum_{j \sim_k i}\big\{I(y_j = C_\upsilon) - I(y_j = C_0)\big\},$$
where the notation $j \sim_k i$ indicates that the summation is over the k-nearest neighbors of $\mathbf{x}_i$ in the set $\{\mathbf{x}_1, \ldots, \mathbf{x}_{i-1}, \mathbf{x}_{i+1}, \ldots, \mathbf{x}_n\}$, with the neighbors defined by Euclidean distance. The prediction for a new point $y_{n+1} \mid \mathbf{x}_{n+1}$ is given by the most common class among the k-nearest neighbors of $\mathbf{x}_{n+1}$. The value that maximizes the profile pseudolikelihood is then chosen as the estimate of the parameter k. However, this method is only valid when the response variable is categorical and the nearest neighbors are defined using the Euclidean distance.
In our case, the response variable could be continuous or categorical, and our nearest neighbors are defined through ranks. So, we do not recommend using our test statistic with an estimate of k obtained from the aforementioned procedures. We consider an alternative method to estimate k which uses ranks to define nearest neighbors and can be applied to both categorical and continuous responses. Here, we adopt the idea of the Least Squares Cross-Validation (LSCV) procedure of Reference [18] to select the parameter k. Different from Reference [18], where the regression function is estimated by kernel estimation, we consider k-nearest neighbor estimates with the neighbors defined through the ranks of the predictor variable. In the case of a categorical response variable with Q classes, we re-code the response variable to have integer values from 1 to Q. To estimate the class of the response variable, we use the majority vote (the most common value) among the k-nearest neighbors. In the case of ties, where multiple classes achieve the same highest frequency, one of them is randomly assigned as the estimated response. In the case of a continuous response variable, the regression function is estimated by the average of the k-nearest neighbors.
In a leave-one-out procedure, for each $c \in \{1, \ldots, N\}$, we eliminate $(X_c, Y_c)$ and use the remaining observations to estimate the regression function, which is then used to predict the response value Y at $X_c$. Here are our steps (a short code sketch implementing them follows the definition of $\hat k$ below):
1.
Find the observation in $X_{[c]} = \{\text{all } X_i,\ i = 1, \ldots, N \text{ and } i \ne c\}$ such that the absolute difference between this observation and $X_c$ is minimized. Denote
$$J(c) = \underset{j}{\arg\min}\ |X_j - X_c|, \quad j = 1, \ldots, N,\ j \ne c.$$
Then, $X_{J(c)}$ is the closest to $X_c$.
2.
Find the k-nearest neighbors of $X_{J(c)}$ in terms of ranks. We use the corresponding $Y_i$ values such that
$$N\big|\hat F(X_{J(c)}) - \hat F(X_i)\big| \le \frac{k-1}{2} \quad \text{for } i \ne c,$$
to obtain the leave-one-out estimate of the regression function at $X_c$. That is,
$$\hat m_{k,c}(X_c) = \begin{cases} k^{-1}\displaystyle\sum_{i=1, i \ne c}^{N} Y_i\, I\!\left(N\big|\hat F(X_{J(c)}) - \hat F(X_i)\big| \le \frac{k-1}{2}\right), & \text{continuous case}, \\[2ex] \text{Mode of } \Big\{Y_i : \text{all } i \ne c \text{ such that } N\big|\hat F(X_{J(c)}) - \hat F(X_i)\big| \le \frac{k-1}{2}\Big\}, & \text{categorical case}, \end{cases}$$
where the Mode is defined as the most frequently observed value in a set of numbers. In the case where the most frequently observed values are not unique, one of them is randomly selected.
3.
Repeat steps 1 and 2 for $c = 1, \ldots, N$ to obtain all leave-one-out estimates.
Then, define the leave-one-out Least Squares Cross-Validation error as
$$LSCV(k) = \frac{1}{N}\sum_{c=1}^{N}\big(\hat m_{k,c}(X_c) - Y_c\big)^2. \tag{19}$$
Finally, the number of nearest neighbors is estimated by
$$\hat k = \underset{k \in \kappa}{\arg\min}\ LSCV(k), \tag{20}$$
where the set $\kappa$ consists of small odd integers.
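The sketch below implements the three steps above and the criterion in (19)-(20) for a small candidate set of odd k values. It is an illustrative reading of the procedure (ranks are taken on the full sample, and ties in the mode are broken by first occurrence rather than at random); the function name lscv_select_k is ours.

```python
import numpy as np
from collections import Counter

def lscv_select_k(x, y, candidates=(3, 5), categorical=False):
    """Leave-one-out least-squares cross-validation choice of k
    with neighbors defined through the ranks of the predictor (sketch)."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(y)
    ranks = np.argsort(np.argsort(x))                # ranks under the full-sample empirical F
    lscv = {}
    for k in candidates:
        half = (k - 1) // 2
        sse = 0.0
        for c in range(n):
            others = np.delete(np.arange(n), c)
            jc = others[np.argmin(np.abs(x[others] - x[c]))]          # step 1: closest remaining x
            nbr = others[np.abs(ranks[others] - ranks[jc]) <= half]   # step 2: rank-based neighbors
            if categorical:
                pred = Counter(y[nbr]).most_common(1)[0][0]           # mode (ties: first encountered)
            else:
                pred = y[nbr].sum() / k                               # k^{-1} sum, as in the text
            sse += (pred - y[c]) ** 2
        lscv[k] = sse / n                                             # LSCV(k) in (19)
    return min(lscv, key=lscv.get)                                    # k-hat in (20)
```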
When the response variable is categorical, the estimate of k from this algorithm depends on how well the covariate values from different classes are separated and how many observations are in each class. For large class sizes, it is very possible that the resulting estimate is much greater than 10 if we leave κ unconstrained. However, our theory requires k to be a finite, positive, and odd integer.
In the continuous case with k-nearest neighbor estimation, the average of a large proportion of Y values is used to approximate the response variable if a large value of k is used. As a consequence, a bigger k tends to give a larger least squares error when the regression function is under the alternative hypothesis. This is especially true when the regression function has substantial curvature, such as under high frequency alternatives. On the other hand, a larger k tends to give a smaller least squares error when the data were generated under the constant regression null hypothesis.
In either case, the smallest value for k is 3 (note that k = 1 corresponds to the case of no data augmentation). In order to keep the least squares error small under the alternative hypothesis and reasonable under the null hypothesis, we recommend letting $\kappa$ contain a few small odd integers, for example, $\kappa = \{3, 5\}$, which is a safe choice for both moderate and large sample sizes.
Figure 1 shows the typical pattern of $LSCV(k)$ as a function of k for $k = 3, 5, 7, 9$ when the response variable was generated as (1) $Y_i = e_i$; (2) $Y_i = 2 X_i^2 + e_i$; (3) $Y_i = 10\sin(8\pi X_i) + e_i$; and (4) $Y_i = 10\sin(8\pi X_i) + e_i X_i$, where $e_i$ and $X_i$ are i.i.d. $N(0, 1)$.

3. Monte Carlo Simulation Studies

In this section, we present the results of simulation studies investigating the type I error and power performance of our test. The test has a parameter k that specifies the number of nearest neighbors for data augmentation. The inference for our test requires k to be a small odd positive integer. We report the results for k = 3 and 5 and denote them as $G_{N3}$ and $G_{N5}$, respectively, so that the user has an idea of how the test behaves for a given k. Furthermore, we report the results of our test with k selected from 3 and 5 using the method considered in Section 2.4 and denote it as $G_N$. For $G_N$ applied to each generated data set, the value of k is selected using $\hat k$ in (20), and our test with parameter $\hat k$ is used to obtain the p-value.
For comparison, we also report the corresponding results for the test of Reference [16], the order selection (OS) test of Reference [1], the rank-based order selection (ROS) test of Reference [2], the bootstrap order selection (BOS) test of Reference [5], and the Bayes sum test of Reference [3]. As argued in Section 7.1 of Reference [4], evenly spaced design points should be used for the calculation of these four test statistics even when the design points are unevenly spaced. So, the generated covariate values, in increasing order, were replaced by evenly spaced design points on $(0, 1)$ for all four tests. For BOS, we apply the wild bootstrap algorithm of Reference [5] based on the residuals $Y_i - \bar Y$, $i = 1, \ldots, n$, and use their test statistic with 1000 bootstrap samples for each replication. For the Bayes sum test, we use the statistic that has been reported to have good power in a comprehensive simulation study in Reference [3]. For approximating the p-values of the Bayes sum test, Hart [3] gave two versions of the approximation, one assuming normality (BN) and one using the bootstrap (BB). For BN, a random sample of the same size as the data was generated from the standard normal distribution, and the Bayes sum test statistic was calculated from the data so generated, regardless of the actual distribution of the response variable. The process was independently repeated 10,000 times, and the p-value was obtained based on the empirical distribution of these 10,000 values. For BB, the bootstrap samples were drawn from the empirical distribution of the residuals $Y_i - \bar Y$, $i = 1, \ldots, n$, rather than the normal distribution, and the p-value approximation was carried out similarly. The scale parameter $\sigma^2$ for a given data set $Y_1, \ldots, Y_n$ in both the BB and BN statistics was estimated by $\hat\sigma^2 = (n-2)^{-1}\sum_{i=2}^{n-1}\big(0.809\, Y_{i-1} - 0.5\, Y_i - 0.309\, Y_{i+1}\big)^2$, as suggested in Reference [3]. It was reported in Reference [3] that the results obtained using the normality assumption were in basic agreement with those obtained using the bootstrap. So, we only report the simulation results for BN.
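For concreteness, the difference-based scale estimate quoted above can be written as a few lines of code; this is only a transcription of the displayed formula, not code from Reference [3].

```python
import numpy as np

def diff_based_sigma2(y):
    """Difference-based estimate of sigma^2 used for the Bayes sum statistics (sketch)."""
    y = np.asarray(y, float)
    d = 0.809 * y[:-2] - 0.5 * y[1:-1] - 0.309 * y[2:]   # terms for i = 2, ..., n-1
    return np.sum(d ** 2) / (len(y) - 2)
```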
The values for the covariate X were independently generated from Uniform ( 0 , 1 ) . First, we consider the performance of different tests under the H 0 . The data were generated from
$$\text{Model } M_0:\quad Y_i = 10 + \epsilon_i,$$
where the error terms $\epsilon_i$ were independently generated from one of the following four distributions:
1. $\epsilon_i \sim \text{Uniform}(-0.1, 0.1)$ (denoted as Unif in Table 1 and Table 2);
2. $\epsilon_i \sim \text{Normal}(0, 0.02^2)$ (denoted as Normal in Table 1 and Table 2);
3. $\epsilon_i = V_i / 30$, where $V_i$ follows a t-distribution with 5 degrees of freedom (denoted as T(5)/30 in Table 1 and Table 2); and
4. $\epsilon_i = X_i \cdot e_i$, where $e_i \sim \text{Uniform}(-0.1, 0.1)$. This is a heteroscedastic regression model, denoted as Heter in Table 1 and Table 2.
The empirical type I error rates (in percentage) under $H_0$ are reported in Table 1. It can be seen that the test of Reference [16] with $k_n = 3$ or 5 generally has inflated type I error, particularly serious for smaller sample sizes. For $k_n = 7$, their test has type I error close to 0.05 when the error distribution is Normal or T(5)/30, but it is about twice the significance level in the heteroscedastic case. Its performance for $k_n = 7$ is better than with other $k_n$ values but is still inflated in the heteroscedastic case. The order selection test of Reference [1] and the proposed test with different k-values have better type I error control. Among the three k-values, a larger k pulls more observations around each covariate value as pseudo replicates. This could make the test less sensitive to curvature departures from the null hypothesis. Hence, we recommend choosing k between 3 and 5.
Next, we consider the performance of the tests with data generated from nonlinear models. The response values were independently generated according to the following four models for i = 1 , , n , with the moderate sample size of n = 50 in all cases:
• Model $M_1$: $Y_i = 10\cos(q\pi X_i) + \epsilon_i$,
• Model $M_2$: $Y_i = 10\sin(q\pi X_i) + \epsilon_i$,
• Model $M_3$: $Y_i = e^{2X_i}\cos(q\pi X_i) + \epsilon_i$, and
• Model $M_4$: $Y_i = 0.2\,e^{2X_i}\cos(q\pi X_i) + \epsilon_i$,
where q in Models $M_1$–$M_4$ represents the frequency. We considered $q = 8$ and $q = 2$. The case with $q = 8$ is a higher frequency alternative than those reported in Reference [3]. The data for the error term $\epsilon_i$ in each model were independently generated from one of the four error distributions listed earlier. Model $M_0$ serves as the null model used to obtain the type I error rates for all tests. For each error distribution, the data were generated from Models $M_0$–$M_4$ with sample size $n = 50$, 2000 times, and the rejection rates (in percentage) at significance level 0.05 are reported in Table 2.
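The following small sketch generates one data set from these simulation settings; it is an illustrative reconstruction of the described designs (random seed and function names are ours).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(model, error, n=50, q=8):
    """Generate one data set from Models M0-M4 with one of the four error distributions."""
    x = rng.uniform(0, 1, n)
    if error == "Unif":
        e = rng.uniform(-0.1, 0.1, n)
    elif error == "Normal":
        e = rng.normal(0, 0.02, n)
    elif error == "T(5)/30":
        e = rng.standard_t(5, n) / 30
    else:                                         # "Heter": heteroscedastic errors x_i * e_i
        e = x * rng.uniform(-0.1, 0.1, n)
    means = {
        "M0": 10 * np.ones(n),
        "M1": 10 * np.cos(q * np.pi * x),
        "M2": 10 * np.sin(q * np.pi * x),
        "M3": np.exp(2 * x) * np.cos(q * np.pi * x),
        "M4": 0.2 * np.exp(2 * x) * np.cos(q * np.pi * x),
    }
    return x, means[model] + e
```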
It can be seen that the type I error estimates for all tests were below or close to the nominal level 0.05 for all models with homoscedastic errors. For the heteroscedastic regression model, the variance of the error depends on the covariate, while the conditional mean of the response variable given the covariate is a constant under Model M 0 . In this case, all the tests tend to be liberal.
The rows $M_1$ to $M_4$ in Table 2 show the power comparison for the different combinations of Models $M_1$–$M_4$ and the four types of error distribution. The power of our test with $\hat k$ ($G_N$) is higher than that of all other tests in all cases. BN has power close to our test; OS, ROS, and BOS fall far behind. The low power of BOS under high frequency alternatives was mentioned in Reference [5], and the authors suggested (without details) using smoothed squared residuals to deal with it.
It is noticeable that the power of our test is 1 for Models M 1 and M 2 for all different types of the error distribution and very close to 1 for Models M 3 and M 4 . In addition, the power for OS was slightly higher than that for ROS in all cases.
Models $M_3$ and $M_4$ are similar, except that Model $M_4$ has a lower signal-to-noise ratio than Model $M_3$. With the lower signal-to-noise ratio, the power for ROS, OS, and BOS drops drastically. To take a closer look at the numerical performance of all tests under local alternatives, we considered the model $Y_i = C e^{2X_i}\cos(8\pi X_i) + \epsilon_i$, with $C = 0.1, 0.12, 0.14, 0.16, 0.18$ and $\epsilon_i \sim \text{Uniform}(-0.1, 0.1)$. The empirical power curves are given in Figure 2. It is obvious that our test $G_N$ has consistently higher power than the other tests.
The discussion above is for high frequency alternatives with $q = 8$ and moderate sample size $n = 50$. When the sample size increases while the frequency stays the same, the power of each test also increases. For a sample size of 100, the empirical power is 1 for all of the compared tests BN, OS, ROS, and $G_N$ under Models $M_1$–$M_3$. Under Model $M_4$, OS and ROS have power slightly below 1 for the case with uniform errors, while the remaining tests have power close to 1. Similarly, for lower frequency alternatives, for example, when $q = 2$ and $n = 50$, all these tests have power close to 1.
To examine how the power of these tests changes with the sample size, we generated data from the model $Y_i = N^{-1/4} A(X_i) + \epsilon_i$, where $A(X_i) = 0.3\, e^{2X_i}\cos(8\pi X_i)$, $\epsilon_i \sim \text{Uniform}(-0.1, 0.1)$, for $N = 15, 25, 50, 75, 100, 125, 150, 175, 200, 250$. The empirical power of these tests is presented in Figure 3, where $G_N$ is our test with $\hat k$ selected from $k = 3$ and 5 based on (20). It is obvious that the proposed test $G_N$ consistently has the highest power over all the sample sizes considered.
Even though BN showed a comparable performance to our test G N in many cases, the running time of BN is much longer than G N . In particular, the average running time from 10,000 runs from BEOCAT cluster machines for G N is 0.03 s, while that for BN is 9.7 s. So, G N is more than 300 times faster than BN.

4. Applications to Real Data

4.1. Application to Gene Expression Data from Patients Undergoing Radical Prostatectomy

In this subsection, we present an application of our test to gene expression data from patients undergoing radical prostatectomy, collected in order to predict the behavior of prostate cancer. This data set was collected between 1995 and 1997 at the Brigham and Women's Hospital from 52 tumor and 50 normal prostate samples using oligonucleotide microarrays containing probes for 12,600 genes and expressed sequence tags (the data are available at https://www.ncbi.nlm.nih.gov/gds/?linkname=pubmed_gds&from_uid=12086878, accessed on 11 July 2021). The data show heterogeneity and have a binary response variable, the patient outcome (tumor or normal). Applying our test to the expression data from each gene, we identified 980 genes that are significantly associated with the response variable after Bonferroni correction ($p \le 0.001/12{,}600$). In comparison, Singh et al. [23] used a permutation test to identify important genes and found 456 genes whose expression values are significantly correlated with patient outcome ($p \le 0.001$). Note that the significance declared in Reference [23] is at the 0.001 level without any multiple comparison adjustment. Ours is obtained at the same significance level but with the Bonferroni control, which is a very conservative method for multiple comparison adjustment. With such conservative control, we still identified more than twice as many genes as Reference [23]. It is worth mentioning that our test was developed under very general assumptions that are expected to hold for the microarray data here. These results suggest that our test is much more powerful than the permutation test of Reference [23]. Furthermore, we performed k-nearest neighbor (KNN) classification on the data for the top i genes (the i genes with smallest p-values, $i = 1, 2, \ldots, 980$) to predict the patient outcomes. Leave-one-out cross validation (LOOCV) was used as the validation method. The parameter k in KNN was estimated from the training part of the data in the LOOCV procedure by the profile pseudolikelihood method of Reference [22]. The leave-one-out accuracy curve with increasing number of selected top i genes is shown in Figure 4. We note that these genes were selected individually; this simple application of the test is not meant to find the combination of genes with the best classification accuracy. Even so, the top genes found with our test give good LOOCV accuracy.
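The per-gene screening step can be organized as in the schematic below. It is a hedged sketch of the workflow just described, not the code used in the paper; pvalue_fn stands for any single-gene test that returns a p-value, for example one based on the $G_N$ statistic of Section 2.

```python
import numpy as np

def bonferroni_screen(expr, outcome, pvalue_fn, alpha=0.001):
    """Gene-by-gene screening with Bonferroni control (schematic sketch).

    expr: (n_samples, n_genes) expression matrix; outcome: binary labels (0/1);
    pvalue_fn(values, outcome): user-supplied single-gene test returning a p-value.
    """
    n_genes = expr.shape[1]
    pvals = np.array([pvalue_fn(expr[:, j], outcome) for j in range(n_genes)])
    keep = np.flatnonzero(pvals <= alpha / n_genes)   # e.g. p <= 0.001 / 12,600 in the application
    return keep[np.argsort(pvals[keep])]              # selected genes, ordered by significance
```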

4.2. Application to Assess Richards Growth Curve Fit for the COVID Cases and Deaths

In this application, we would like to assess whether the popular Richards growth model can fit the COVID-19 cases and deaths for the U.S.
The Richards growth curve model has recently been adapted for real-time prediction of disease outbreaks in epidemiology. Below is a form of the Richards curve $f(t; \theta_1, \theta_2, \theta_3, \xi)$ that was used to fit the COVID-19 outbreak in Reference [24]:
$$f(t; \theta_1, \theta_2, \theta_3, \xi) = \theta_1 \cdot \big[1 + \xi \cdot \exp\{-\theta_2 \cdot (t - \theta_3)\}\big]^{-1/\xi},$$
where $\theta_1$, $\theta_2$, and $\theta_3$ are real numbers, and $\xi$ is a positive real number.
Using this parameterization of the Richards curve, Lee et al. [24] stated that, given a progression constant $\gamma$ with $0 < \gamma < 1$, the flat time point $t_{\text{flat},\gamma}$ is given by
$$t_{\text{flat},\gamma} = \theta_3 - \frac{1}{\theta_2} \cdot \log\left\{\frac{1}{\xi} \cdot \big(\gamma^{-\xi} - 1\big)\right\}, \quad 0 < \gamma < 1.$$
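For reference, the two formulas above translate directly into code; the sketch below assumes the reconstruction of the flat-time expression given here (one can check that $f(t_{\text{flat},\gamma}) = \gamma\,\theta_1$). In practice the parameters could be estimated by nonlinear least squares, for example with scipy.optimize.curve_fit applied to the richards function.

```python
import numpy as np

def richards(t, theta1, theta2, theta3, xi):
    """Richards growth curve in the parameterization quoted above."""
    return theta1 * (1 + xi * np.exp(-theta2 * (t - theta3))) ** (-1 / xi)

def flat_time(theta2, theta3, xi, gamma):
    """Flat time point t_{flat, gamma} for a progression constant 0 < gamma < 1."""
    return theta3 - np.log((gamma ** (-xi) - 1) / xi) / theta2
```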
They predicted the posterior means of the flat time points $t_{\text{flat},\gamma}$ for the U.S. to be 30 May, 16 July, 30 August, and 15 October when the corresponding $\gamma$'s are chosen as 0.9, 0.99, 0.999, and 0.9999, respectively. However, as of the end of 2020, the COVID-19 confirmed cases were still continuing to climb. This is evidence that the Richards curve does not fit the COVID-19 infection growth well.
We downloaded the cumulative number of confirmed U.S. COVID-19 cases from the historical data at https://covidtracking.com/data/download/national-history.csv on 12 July 2021; this website stopped tracking COVID-19 in late March 2021. We also downloaded the daily number of U.S. COVID-19 death counts from the CSSE data repository of Johns Hopkins University at https://github.com/CSSEGISandData/COVID-19 on the same day. These data contain death counts up to 11 July 2021.
To fit a Richards curve to each data set, we only included data on days for which the cumulative count was at least seven and removed the last ten days of data so that those ten days' counts could be used as an additional out-of-sample assessment of the model fit. A separate Richards curve was fitted to the daily cumulative confirmed cases and to the deaths. Figure 5 shows the nonlinear least squares fitted Richards growth curves and the observed counts. The fitted curves also contain predictions for the 10 days beyond the observed days. Table 3 gives the correlation coefficients between the observed and estimated counts. Even though the $R^2$ values between the fitted curves and observed numbers are over 99%, the p-values for the lack-of-fit tests are both essentially 0. The Richards curve appears to be better at predicting the number of confirmed cases than the number of deaths. It appears that the pandemic progression is more complicated than the Richards curve can capture, due to changes in social distancing, stay-at-home policies, and vaccine availability. U.S. holidays, such as Thanksgiving and Christmas, also contributed drastically to the increase in counts in late November. The $\theta_1$ parameter predicts the epidemic size. For the total number of deaths, the model-predicted epidemic size is 594,331. For the confirmed cases, the model-predicted size is around 27.9 million. Both numbers underestimate the true epidemic size. As of 12 July 2021, the U.S. confirmed cases exceeded 33.8 million, and the number of U.S. deaths was about 606,000. These numbers are much larger than the model-predicted sizes, suggesting that the Richards curves could not model the confirmed cases or deaths adequately.

5. Conclusions

In this paper, we derived the asymptotic distribution of a nonparametric lack-of-fit test of constant regression in the presence of heteroscedastic variances. We considered a test statistic obtained using the augmentation of a small number of k-nearest neighbors defined through the ranks of the predictor variable. The test statistic is the studentized version of the difference of two quadratic forms. Both quadratic forms estimate a common quantity under the null hypothesis, but they converge to different quantities under the alternatives. The asymptotic distribution of the difference was also given in Reference [16], but with a biased asymptotic variance. We derived the correct form of the asymptotic distribution of the test statistic under both the null hypothesis and local alternatives. In addition, we provided a procedure to choose the parameter k based on the Least Squares Cross-Validation idea used in k-nearest neighbor regression. Our test has several advantages. It provides a unified framework for testing the lack-of-fit of a given regression function when the response is either a discrete or a continuous random variable and the covariate is continuous. This makes it convenient for unified inference and applications. There is no need to assume the distribution of the data, which makes the test widely applicable to many practical data sets. The fixed number of nearest neighbors in the augmentation ensures good power to detect both low and high frequency alternatives even for moderate sample sizes, and the parametric standardizing rate for the test statistic is achieved. The test statistic is easy and fast to calculate. Our simulation studies show that the test is more powerful than some well known competing test procedures when data are generated under high frequency alternatives. Therefore, the results in this paper offer a useful tool for lack-of-fit testing.

Author Contributions

Conceptualization, M.M.G. and H.W.; methodology, M.M.G. and H.W.; software, M.S. and H.W.; validation, M.M.G., M.S., H.W., and S.W.; formal analysis, M.S. and H.W.; investigation, M.M.G. and H.W.; data curation, H.W.; writing—original draft preparation, M.M.G. and H.W.; writing—review and editing, H.W. and S.W.; visualization, H.W.; supervision, H.W. and S.W.; project administration, H.W.; funding acquisition, H.W. and S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by two grants #246077 and #499650 from the Simons Foundation.

Institutional Review Board Statement

This study used publicly available data. No institutional review is needed.

Informed Consent Statement

Not applicable.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Proof of Theorem 1

We can write $\lambda_N = \mathrm{Var}(\sqrt{N}\, T_B) = E\big(\mathrm{Var}(\sqrt{N}\, T_B \mid \mathbf{X})\big) + \mathrm{Var}\big(\sqrt{N}\, E(T_B \mid \mathbf{X})\big)$. It is clear that $\mathrm{Var}\big(\sqrt{N}\, E(T_B \mid \mathbf{X})\big) = 0$ since, by the definition of $T_B$ in (5),
$$E\big(\sqrt{N}\, T_B \mid \mathbf{X}\big) = E\left[\frac{N^{-1/2}}{k-1}\sum_{j \ne j'}\big(Y_j - E(Y_j \mid \mathbf{X})\big)\big(Y_{j'} - E(Y_{j'} \mid \mathbf{X})\big)\, K_{jj'} \,\Big|\, \mathbf{X}\right] = 0 \quad a.s.$$
Therefore, we only need to consider $E\big(\mathrm{Var}(\sqrt{N}\, T_B \mid \mathbf{X})\big)$ to obtain $\lambda_N$. Let $t_{jj'} = \big(Y_j - E(Y_j \mid \mathbf{X})\big)\big(Y_{j'} - E(Y_{j'} \mid \mathbf{X})\big)\, K_{jj'}$. Then,
$$N(k-1)^2\, E\big(\mathrm{Var}(\sqrt{N}\, T_B \mid \mathbf{X})\big) = E\Big[\mathrm{Var}\Big(\sum_{j \ne j'} t_{jj'} \,\Big|\, \mathbf{X}\Big)\Big] = 2\, E\Big[\sum_{j \ne j'} E\big(t_{jj'}^2 \mid \mathbf{X}\big)\Big] = 2 \sum_{j \ne j'} E\big[\sigma^2(X_j)\,\sigma^2(X_{j'})\, K_{jj'}^2\big].$$
Let $X_{(j^*)}$ be the order statistic for $X_j$, so that $j^*$ is the rank of $X_j$ among $\{X_{j_1},\ j_1 = 1, \ldots, N\}$. Then,
$$\lambda_N = E\big(\mathrm{Var}(\sqrt{N}\, T_B \mid \mathbf{X})\big) = \frac{4}{N(k-1)^2}\, E\Big[\sum_{j < j'} \sigma^2(X_j)\,\sigma^2(X_{j'})\, E\big(K_{jj'}^2 \mid X_j, X_{j'}, j^*, j'^*\big)\Big] = \frac{4}{N(k-1)^2}\, E\Big[\sum_{j < j'} \sigma^2(X_j)\,\sigma^2(X_{j'})\, \big\{E^2\big(K_{jj'} \mid X_j, X_{j'}, j^*, j'^*\big) + \mathrm{Var}\big(K_{jj'} \mid X_j, X_{j'}, j^*, j'^*\big)\big\}\Big]. \tag{A1}$$
To find the conditional expectation, without loss of generality, assume that $X_j < X_{j'}$, so that $j^* < j'^*$. Let
$$\Lambda_{jj'} = E\big[I(j \in C_c,\ j' \in C_c) \mid X_j, X_{j'}, j^*, j'^*\big] = P\big(X_j \in C_c,\ X_{j'} \in C_c \mid X_j, X_{j'}, j^*, j'^*\big) = \left[\int_{X_j - L_j}^{X_j + D_j} f(x)\, dx\right] I\big(j'^* - j^* \le k-1\big),$$
where $D_j$ is the upper $k/2$ spacing and $L_j$ is the lower $\big(k/2 - (j'^* - j^*)\big)$ spacing from $X_j$. Applying Taylor's expansion twice, we can write
$$\Lambda_{jj'} = \Big\{\big[F(X_j + D_j) - F(X_j - L_j)\big] + O_p(N^{-2})\Big\}\, I\big(j'^* - j^* \le k-1\big).$$
From the properties of spacings in Reference [25], we have
$$E\big(F(X_j + D_j) - F(X_j - L_j) \mid X_j, X_{j'}, j^*, j'^*\big) = \frac{k - (j'^* - j^*)}{N+1}\cdot I\big(j'^* - j^* \le k-1\big).$$
Therefore, for $X_{j_1} \ne X_j$ and $X_{j_1} \ne X_{j'}$, we have
$$E\big(\Lambda_{jj'} \mid X_j, X_{j'}, j^*, j'^*\big) = \Big\{\big[k - (j'^* - j^*) - 2\, I\big(j'^* - j^* \le (k-1)/2\big)\big]\big/(N+1) + O_p(N^{-2})\Big\}\times I\big(j'^* - j^* \le k-1\big); \tag{A2}$$
if $X_{j_1} = X_j$ (or, symmetrically, $X_{j_1} = X_{j'}$), then
$$\Lambda_{jj'} = I\big(j' \in C_{X_{(j^*)}}\big) = I\big(j'^* - j^* \le (k-1)/2\big). \tag{A3}$$
Collecting terms from (A2) and (A3), we have
$$E\big(K_{jj'} \mid X_j, X_{j'}, j^*, j'^*\big) = \big[k - (j'^* - j^*) + O_p(N^{-1})\big]\, I\big(j'^* - j^* \le k-1\big). \tag{A4}$$
Now, consider the conditional variance. Note that, when $X_c \in \{X_j, X_{j'}\}$, the corresponding term in $K_{jj'}$ is a constant. Therefore,
$$\mathrm{Var}\big(K_{jj'} \mid X_j, X_{j'}, j^*, j'^*\big) = \mathrm{Var}\Big(\sum_{c=1}^{N} I(j \in C_c)\, I(j' \in C_c)\, I\big(X_c \notin \{X_j, X_{j'}\}\big) \,\Big|\, X_j, X_{j'}, j^*, j'^*\Big)$$
$$= \sum_{c_1=1}^{N}\sum_{c_2=1}^{N}\Big\{ E\big[I(j \in C_{c_1})\, I(j' \in C_{c_1})\, I(j \in C_{c_2})\, I(j' \in C_{c_2}) \mid X_j, X_{j'}, j^*, j'^*\big] - E\big[I(j \in C_{c_1})\, I(j' \in C_{c_1}) \mid X_j, X_{j'}, j^*, j'^*\big]\, E\big[I(j \in C_{c_2})\, I(j' \in C_{c_2}) \mid X_j, X_{j'}, j^*, j'^*\big]\Big\}\times I\big(X_{c_1} \notin \{X_j, X_{j'}\}\big)\, I\big(X_{c_2} \notin \{X_j, X_{j'}\}\big)$$
$$= \sum_{c=1}^{N} E\big[I(j \in C_c)\, I(j' \in C_c)\, I\big(X_c \notin \{X_j, X_{j'}\}\big) \mid X_j, X_{j'}, j^*, j'^*\big] - \sum_{c=1}^{N} \big[E\big(I(j \in C_c)\, I(j' \in C_c) \mid X_j, X_{j'}, j^*, j'^*\big)\big]^2\, I\big(X_c \notin \{X_j, X_{j'}\}\big),$$
where the last equality is due to the fact that the indicator functions involving $c_1$ and $c_2$ are conditionally independent when $c_1 \ne c_2$ and neither $X_{c_1}$ nor $X_{c_2}$ is $X_j$ or $X_{j'}$. Plugging (A2) through (A4) into the right-hand side of the equation above, we obtain
$$\mathrm{Var}\big(K_{jj'} \mid X_j, X_{j'}, j^*, j'^*\big) = \Big[k - (j'^* - j^*) - 2\, I\big(j'^* - j^* \le \tfrac{k-1}{2}\big) + O_p(N^{-1})\Big]\, I\big(j'^* - j^* \le k-1\big). \tag{A5}$$
Putting (A4) and (A5) into (A1), we have
$$\lambda_N = \sum_{j < j'}^{N} E\left\{\frac{4\,\sigma^2(X_j)\,\sigma^2(X_{j'})}{N(k-1)^2}\Big[\big(k - (j'^* - j^*)\big)^2 + \big(k - (j'^* - j^*)\big) - 2\, I\big(j'^* - j^* \le \tfrac{k-1}{2}\big) + O_p(N^{-1})\Big]\, I\big(j'^* - j^* \le k-1\big)\right\}.$$
Next, we will show that the limit of $\lambda_N$ exists. Note that $\lambda_N = E(\delta_N)$, where
$$\delta_N = \sum_{j < j'}^{N} \frac{4\,\sigma^2(X_j)\,\sigma^2(X_{j'})}{N(k-1)^2}\Big[\big(k - |j^* - j'^*|\big)^2 + \big(k - |j^* - j'^*|\big) - 2\, I\big(|j^* - j'^*| \le \tfrac{k-1}{2}\big) + O_p(N^{-1})\Big]\, I\big(|j^* - j'^*| \le k-1\big).$$
It is clear that $\big(k - |j^* - j'^*|\big)^2\, I\big(|j^* - j'^*| \le k-1\big)$ and $\big(k - |j^* - j'^*|\big)\, I\big(|j^* - j'^*| \le k-1\big)$ are both at least 1. Therefore, $\Big[\big(k - |j^* - j'^*|\big)^2 + \big(k - |j^* - j'^*|\big) - 2\, I\big(|j^* - j'^*| \le \tfrac{k-1}{2}\big)\Big]\, I\big(|j^* - j'^*| \le k-1\big)$ is nonnegative. Consequently, $\delta_N$ is a summation of nonnegative terms.
Under Assumption 1, the conditional variance of $Y_j$ given $X_j$ is uniformly bounded (i.e., there exists a constant $C > 0$ such that $\sigma^2(X_j) \le C$ for all j). We have
$$\delta_N = \sum_{j < j'}^{N} \frac{4\,\sigma^2(X_j)\,\sigma^2(X_{j'})}{N(k-1)^2}\Big[\big(k - |j^* - j'^*|\big)^2 + \big(k - |j^* - j'^*|\big) - 2\, I\big(|j^* - j'^*| \le \tfrac{k-1}{2}\big) + O_p(N^{-1})\Big]\, I\big(|j^* - j'^*| \le k-1\big) \le \sum_{j < j'}^{N} \frac{4 C^2}{N(k-1)^2}\Big[\big(k - |j^* - j'^*|\big)^2 + \big(k - |j^* - j'^*|\big) - 2\, I\big(|j^* - j'^*| \le \tfrac{k-1}{2}\big) + O_p(N^{-1})\Big]\, I\big(|j^* - j'^*| \le k-1\big). \tag{A6}$$
If we replace the summation in (A6) over the original sample indices $j, j'$ by the summation over the ranks $j^*, j'^*$ and denote
$$M\big(|j^* - j'^*|\big) = \Big[\big(k - |j^* - j'^*|\big)^2 + \big(k - |j^* - j'^*|\big) - 2\, I\big(|j^* - j'^*| \le \tfrac{k-1}{2}\big) + O_p(N^{-1})\Big]\, I\big(|j^* - j'^*| \le k-1\big),$$
then we have
$$\delta_N \le \sum_{j^* < j'^*}^{N} \frac{4 C^2}{N(k-1)^2}\, M\big(|j^* - j'^*|\big). \tag{A7}$$
The right-hand side of the inequality (A7) converges to
$$\frac{2k(2k-1)}{3(k-1)}\, C^2 + \frac{2(k-2)}{k-1}\, C^2, \tag{A8}$$
which is finite for finite C and fixed $k > 1$ (note that, in our augmentation, k is a finite odd integer with minimum value 3). Note that $\delta_N$ is a summation of nonnegative terms (with probability 1) due to the fact that $M(|j^* - j'^*|) \ge 0$. Hence, the limit of $\delta_N$ exists as a result of the Comparison Test from calculus.
The convergence of $\lambda_N = E(\delta_N)$ follows from the Dominated Convergence Theorem after noticing that the expectation of (A8) is finite. Applying the Dominated Convergence Theorem to $\lambda_N$, we get $\lim_{N\to\infty} \lambda_N = \lim_{N\to\infty} E(\delta_N) = E(\lim_{N\to\infty} \delta_N)$. This completes the proof.

Appendix A.2. Lemma A1 and Its Proof

The following lemma will be needed in the proof of Lemma 2.
Lemma A1.
For a locally Lipschitz continuous function $A(x)$ on a bounded support $[a, b]$, we have
$$A(X_i)\, I(i \in C_c) - A(X_j)\, I(j \in C_c) = O_p(N^{-1}),$$
uniformly in $i, j = 1, 2, \ldots, N$, for a given $c = 1, 2, \ldots, N$.
Sketch Proof of Lemma A1.
Recall that f ( x ) and F ( x ) are the marginal probability density function and cumulative distribution function of X j , respectively. Let Y 1 , Y 2 , , Y N be independent Exponential random variables with mean 1, and U 1 , U 2 , , U N be independent Uniform random variables on ( 0 , 1 ) . Without loss of generality, assume that X 1 , X 2 , , X N are ordered. Define D i = X i X i 1 , for 2 i N . Then, from the properties of spacings on page 406 of Reference [25], there exists an a i [ a , b ] such that F ( a i ) ( U ( i 1 ) , U ( i ) ) and D i = ( N i + 1 ) 1 Y i { 1 F ( a i ) } { f ( a i ) } 1 for 2 i N . For j > i ,
$X_j - X_i = D_{i+1} + D_{i+2} + \cdots + D_j = \sum_{l=i+1}^{j} \frac{1}{N-l+1} Y_l \frac{1 - F(a_l)}{f(a_l)} \le \sum_{l=i+1}^{j} \frac{1}{N-l+1} Y_l \frac{1 - U_{(l-1)}}{f(a_l)} \le \frac{1}{\inf_{l \in [i+1, j]} f(a_l)} \sum_{l=i+1}^{j} \frac{1}{N-l+1} Y_l \big(1 - U_{(l-1)}\big) = K^* \sum_{l=i+1}^{j} \frac{1}{N-l+1} Y_l \big(1 - U_{(l-1)}\big),$
where K * is some positive constant.
Note that the random variables $Y_l$ and $U_{(l)}$ are independent, $1 \le l \le N$, and $U_{(l-1)}$ has a $\mathrm{Beta}(l-1, N-l+2)$ distribution. Therefore,
$E\Big[\frac{1}{N-l+1} Y_l \big(1 - U_{(l-1)}\big)\Big] = \frac{1}{N-l+1} E(Y_l)\, E\big(1 - U_{(l-1)}\big) = \frac{N-l+2}{(N-l+1)(N+1)} = O(N^{-1}),$
and
$\mathrm{Var}\Big[\frac{1}{N-l+1} Y_l \big(1 - U_{(l-1)}\big)\Big] = \frac{1}{(N-l+1)^2}\Big[E(Y_l^2)\, E\big\{(1 - U_{(l-1)})^2\big\} - \big\{E(Y_l)\, E(1 - U_{(l-1)})\big\}^2\Big] = \frac{1}{(N-l+1)^2}\Big[\frac{2(l-1)(N-l+2)}{(N+1)^2(N+2)} + \frac{(N-l+2)^2}{(N+1)^2}\Big] = \frac{1}{(N+1)^2(N+2)}\Big[\frac{2(l-1)(N-l+2)}{(N-l+1)^2} + \frac{(N+2)(N-l+2)^2}{(N-l+1)^2}\Big] \le \frac{1}{(N+1)^2(N+2)} \cdot \frac{2(N+2)(N-l+2)^2}{(N-l+1)^2} = O(N^{-2}),$
where the inequality in (A10) is due to the fact that $2(l-1) < (N+2)(N-l+2)$. Due to (A9) and (A10) and by Theorem 14.4-1 in Reference [26], we have
$\frac{1}{N-l+1} Y_l \big(1 - U_{(l-1)}\big) = O_p(N^{-1}), \quad \text{for all } l = 2, \ldots, N.$
Consequently, for X i , X j in the same cell,
$X_j - X_i \le K^* \sum_{l=i+1}^{j} \frac{1}{N-l+1} Y_l \big(1 - U_{(l-1)}\big) = O_p\Big(\frac{j-i}{N}\Big) = O_p(N^{-1}),$
where the last equality in (A11) is due to $j - i \le 2k$ since $X_i, X_j$ are included in the same cell.
From the local Lipschitz continuity of $A(x)$ on $[a, b]$, when $N$ is large, the following condition is satisfied for any two $X_i, X_j$ inside the same cell:
$|A(X_j) - A(X_i)| \le L^* |X_j - X_i|, \quad \text{for } i, j \in C_c,$
where L * is a positive constant. From (A11) and (A12), we have
$|A(X_j) - A(X_i)| = O_p(N^{-1}), \quad \text{for } i, j \in C_c.$
This completes the proof. □
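As an informal numerical companion to Lemma A1 (our own illustration, not part of the proof), the following Python sketch simulates ordered covariates and tracks the distance between two observations whose ranks differ by at most $2k$, i.e., observations that can fall into a common augmented cell; the function name same_cell_gap and the uniform design are hypothetical choices made only for this illustration.

import numpy as np

def same_cell_gap(N, k=3, seed=None):
    # Distance between X_(i) and X_(i+2k) for i near the median rank:
    # two points whose ranks differ by at most 2k and hence can share
    # an augmented cell, the situation covered by Lemma A1.
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(0.0, 1.0, size=N))
    i = N // 2
    return x[i + 2 * k] - x[i]

if __name__ == "__main__":
    for N in (100, 1000, 10000):
        gaps = [same_cell_gap(N, k=3, seed=s) for s in range(200)]
        # N * (mean gap) stays roughly constant, consistent with an O_p(1/N) gap.
        print(f"N={N:6d}  mean gap={np.mean(gaps):.6f}  N x mean gap={N * np.mean(gaps):.2f}")

With a design density bounded away from zero, the printed product $N \times$ (mean gap) stabilizes near $2k$, in line with the $O_p(N^{-1})$ rate used in the lemma.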

Appendix A.3. Proof of Lemma 2

The proof of each part is given separately.
Sketch Proof of Lemma 2 Part (i).
From (15), we have
$\Delta_{N,2} = \sqrt{N}\, k (N-1)^{-1} \sum_{c=1}^{N} 2 N^{-1/4} \big(\bar{A}_{c\cdot} - \bar{A}_{\cdot\cdot}\big)\big(\bar{\varepsilon}_{c\cdot} - \bar{\varepsilon}_{\cdot\cdot}\big).$
By Lemma A1 and Assumption 2,
$\bar{A}_{c\cdot} = k^{-1} \sum_{t=1}^{k} A_{ct} = k^{-1} \sum_{i=1}^{N} A(X_i) I(i \in C_c) = A(X_c) + O_p(N^{-1}),$
$\bar{A}_{\cdot\cdot} = N^{-1} \sum_{c=1}^{N} \bar{A}_{c\cdot} = \overline{A(X)} + O_p(N^{-1}),$
where $\overline{A(X)} = N^{-1} \sum_{c=1}^{N} A(X_c)$. Therefore, $\Delta_{N,2}$ can be written as
$\Delta_{N,2} = \sqrt{N}\, k (N-1)^{-1} \sum_{c=1}^{N} 2 N^{-1/4} \big(A(X_c) - \overline{A(X)}\big)\big(\bar{\varepsilon}_{c\cdot} - \bar{\varepsilon}_{\cdot\cdot}\big) + o_p(1).$
Denote $U_c = A(X_c) - E(A(X_c))$ and $\bar{U}_{\cdot} = N^{-1} \sum_{c=1}^{N} U_c$. Then, we can write
$\Delta_{N,2} = \frac{2 k N^{1/4}}{N-1} \sum_{c=1}^{N} \big(A(X_c) - \overline{A(X)}\big)\big(\bar{\varepsilon}_{c\cdot} - \bar{\varepsilon}_{\cdot\cdot}\big) + o_p(1) = \frac{2 k N^{1/4}}{N-1} \sum_{c=1}^{N} \big[\{A(X_c) - E(A(X_c))\} - \{\overline{A(X)} - E(A(X_c))\}\big]\big(\bar{\varepsilon}_{c\cdot} - \bar{\varepsilon}_{\cdot\cdot}\big) + o_p(1) = \frac{2 k N^{1/4}}{N-1} \sum_{c=1}^{N} \big(U_c - \bar{U}_{\cdot}\big)\big(\bar{\varepsilon}_{c\cdot} - \bar{\varepsilon}_{\cdot\cdot}\big) + o_p(1) = \frac{2 k N^{1/4}}{N-1} \Big[\sum_{c=1}^{N} U_c \bar{\varepsilon}_{c\cdot} - N \bar{U}_{\cdot} \bar{\varepsilon}_{\cdot\cdot}\Big] + o_p(1)$
$= \frac{2 k N^{1/4}}{\sqrt{N}} \cdot \frac{\sqrt{N}}{N-1} \sum_{c=1}^{N} U_c \bar{\varepsilon}_{c\cdot} - \frac{2 k N^{1/4}}{N-1}\, \sqrt{N} \bar{U}_{\cdot}\, \sqrt{N} \bar{\varepsilon}_{\cdot\cdot} + o_p(1).$
First, we will show that
$\frac{\sqrt{N}}{N-1} \sum_{c=1}^{N} U_c \bar{\varepsilon}_{c\cdot} = O_p(1);$
therefore, the first term in (A15) is $o_p(1)$. Note that $E(\bar{\varepsilon}_{c\cdot} \mid X) = E\big(\bar{Q}_{c\cdot} - E(\bar{Q}_{c\cdot} \mid X) \mid X\big) = 0$, and $U_c$ is a function of $X_c$. Therefore, we have
$E\Big[\frac{\sqrt{N}}{N-1} \sum_{c=1}^{N} U_c \bar{\varepsilon}_{c\cdot}\Big] = \frac{\sqrt{N}}{N-1} \sum_{c=1}^{N} E\big[U_c\, E(\bar{\varepsilon}_{c\cdot} \mid X)\big] = 0,$
and
$\mathrm{Var}\Big(\frac{\sqrt{N}}{N-1} \sum_{c=1}^{N} U_c \bar{\varepsilon}_{c\cdot}\Big) = \frac{N}{(N-1)^2} E\Big(\sum_{c=1}^{N} U_c \bar{\varepsilon}_{c\cdot}\Big)^2 = \frac{N}{(N-1)^2} E\Big(\sum_{c=1}^{N} U_c^2 \bar{\varepsilon}_{c\cdot}^2 + \sum_{c \neq c'}^{N} U_c \bar{\varepsilon}_{c\cdot} U_{c'} \bar{\varepsilon}_{c'\cdot}\Big) = \frac{N}{(N-1)^2} \sum_{c=1}^{N} E\big(U_c^2 \bar{\varepsilon}_{c\cdot}^2\big) + \frac{N}{(N-1)^2} \sum_{c \neq c'}^{N} E\big(U_c U_{c'} \bar{\varepsilon}_{c\cdot} \bar{\varepsilon}_{c'\cdot}\big).$
Denote the first term and second term in (A18) as $\upsilon_{N,1}$ and $\upsilon_{N,2}$, respectively. Let $r_i = Y_i - E(Y_i \mid X)$, for all $i$. Then,
$\upsilon_{N,1} = \frac{N}{(N-1)^2} \sum_{c=1}^{N} E\big\{U_c^2\, E(\bar{\varepsilon}_{c\cdot}^2 \mid X)\big\} = \frac{N}{(N-1)^2} \sum_{c=1}^{N} E\big\{U_c^2\, E\big((\bar{Q}_{c\cdot} - E(\bar{Q}_{c\cdot} \mid X))^2 \mid X\big)\big\} = \frac{N}{(N-1)^2} \sum_{c=1}^{N} E\Big\{U_c^2\, E\Big(\Big(\frac{1}{k} \sum_{i=1}^{N} r_i I(i \in C_c)\Big)^2 \Big|\, X\Big)\Big\} = \frac{N}{k^2 (N-1)^2} \sum_{c=1}^{N} E\Big\{U_c^2\, E\Big(\sum_{i=1}^{N} r_i^2 I(i \in C_c) + \sum_{i \neq i'}^{N} r_i I(i \in C_c)\, r_{i'} I(i' \in C_c) \Big|\, X\Big)\Big\}$
$= \frac{N}{k^2 (N-1)^2} \sum_{c=1}^{N} \sum_{i=1}^{N} E\big\{U_c^2\, E(r_i^2 \mid X)\, I(i \in C_c)\big\}$
$= \frac{N}{k^2 (N-1)^2} \sum_{i=1}^{N} \sum_{c=1}^{N} E\big\{U_c^2\, \sigma^2(X_i)\, I(i \in C_c)\big\},$
where the equality in (A19) is due to the fact that $Y_i$ and $Y_{i'}$ are independent when $i \neq i'$. Similarly,
$\upsilon_{N,2} = \frac{N}{(N-1)^2} \sum_{c \neq c'}^{N} E\big\{U_c U_{c'}\, E(\bar{\varepsilon}_{c\cdot} \bar{\varepsilon}_{c'\cdot} \mid X)\big\} = \frac{N}{(N-1)^2} \sum_{c \neq c'}^{N} E\Big\{U_c U_{c'}\, E\Big(\frac{1}{k} \sum_{i=1}^{N} r_i I(i \in C_c) \times \frac{1}{k} \sum_{i'=1}^{N} r_{i'} I(i' \in C_{c'}) \Big|\, X\Big)\Big\} = \frac{N}{k^2 (N-1)^2} \sum_{i=1}^{N} \sum_{c \neq c'}^{N} E\big\{U_c U_{c'}\, E(r_i^2 \mid X)\, I(i \in C_c) I(i \in C_{c'})\big\} = \frac{N}{k^2 (N-1)^2} \sum_{i=1}^{N} \sum_{c \neq c'}^{N} E\big\{U_c U_{c'}\, \sigma^2(X_i)\, I(i \in C_c \cap C_{c'})\big\}.$
Consider individual summands in (A20) and (A21). By the Cauchy-Schwarz inequality and Assumptions 1 and 2,
$E\big\{U_c^2 \sigma^2(X_i) I(i \in C_c)\big\} \le E\big\{U_c^2 \sigma^2(X_i)\big\} \le \big\{E(U_c^4)\big\}^{1/2} \big\{E[\sigma^2(X_i)]^2\big\}^{1/2} = \big\{E(U_c^4)\big\}^{1/2} \big\{E[E(r_i^2 \mid X)]^2\big\}^{1/2} \le \big\{E(U_c^4)\big\}^{1/2} \big\{E[E(r_i^4 \mid X)]\big\}^{1/2} < \infty.$
Similarly,
$E\big\{U_c U_{c'} \sigma^2(X_i) I(i \in C_c \cap C_{c'})\big\} \le E\big\{|U_c U_{c'}| \sigma^2(X_i) I(i \in C_c \cap C_{c'})\big\} \le E\big\{|U_c U_{c'}| \sigma^2(X_i)\big\} \le \big\{E(U_c U_{c'})^2\big\}^{1/2} \big\{E[\sigma^2(X_i)]^2\big\}^{1/2} = \big\{E(U_c^2)\big\}^{1/2} \big\{E(U_{c'}^2)\big\}^{1/2} \big\{E[E(r_i^2 \mid X)]^2\big\}^{1/2} \le \big\{E(U_c^4)\big\}^{1/4} \big\{E(U_{c'}^4)\big\}^{1/4} \big\{E[E(r_i^4 \mid X)]\big\}^{1/2} < \infty.$
Note that $X_i$ can only be used to augment at most $2k$ cells. That is, if the rank of $X_i$ is $r$, then $X_i$ cannot be used to augment cells whose $x$ values have ranks not in the set of positive integers $\{\max\{1, r-k\}, \ldots, \min\{r+k, N\}\}$. Therefore, the summation over $c$ in (A20) and that over $c$ and $c'$ in (A21) each contain no more than $2k$ terms. As a result, the two terms $\upsilon_{N,1}$ and $\upsilon_{N,2}$ are $O(1)$; therefore,
$\mathrm{Var}\Big(\frac{\sqrt{N}}{N-1} \sum_{c=1}^{N} U_c \bar{\varepsilon}_{c\cdot}\Big) = O(1).$
Due to (A17) and (A22), the proof of (A16) is completed by applying Theorem 14.4-1 in Reference [26].
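The repeated appeals to Theorem 14.4-1 of Reference [26] in this appendix amount, in essence, to the following elementary Chebyshev-type bound, recorded here in our own notation as a reading aid: if $E(Z_N) = O(a_N)$ and $\mathrm{Var}(Z_N) = O(b_N^2)$, then $Z_N = E(Z_N) + O_p(b_N) = O_p(a_N + b_N)$, since $P\big(|Z_N - E(Z_N)| > M b_N\big) \le \mathrm{Var}(Z_N)/(M^2 b_N^2)$ for any $M > 0$ by Chebyshev's inequality. In (A16), the mean is 0 and the variance is $O(1)$, which gives the claimed $O_p(1)$ bound.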
Next, we will show that the second term in (A15) is $o_p(1)$. Using the same technique as in the proof of (A16), it can be shown that $\sqrt{N}\, \bar{\varepsilon}_{\cdot\cdot} = O_p(1)$. In addition,
$\sqrt{N}\, \bar{U}_{\cdot} = O_p(1)$
is a result of the Central Limit Theorem applied to U 1 , , U N since they are i.i.d. due to the fact that X 1 , , X N are i.i.d. Consequently,
$\Delta_{N,2} = O_p(N^{-1/4}) + O_p(N^{1/4} N^{-1}) + o_p(1) = o_p(1), \quad \text{as } N \to \infty.$
This completes the proof. □
Sketch Proof of Lemma 2 Part (ii).
From (16), we have
$\Delta_{N,3} = \sqrt{N} \{N(k-1)\}^{-1} \sum_{c=1}^{N} \sum_{t=1}^{k} N^{-1/2} \big(A_{ct} - \bar{A}_{c\cdot}\big)^2 = \{N(k-1)\}^{-1} \sum_{c=1}^{N} \sum_{t=1}^{k} \big(A_{ct} - \bar{A}_{c\cdot}\big)^2.$
By Lemma A1, we have $A_{ct} - \bar{A}_{c\cdot} = O_p(N^{-1})$. Thus,
$\Delta_{N,3} = O_p(N^{-2});$
therefore, Δ N , 3 is o p ( 1 ) . This completes the proof. □
Sketch Proof of Lemma 2 Part (iii).
From (17), we have
$\Delta_{N,4} = 2 \sqrt{N} \{N(k-1)\}^{-1} \sum_{c=1}^{N} \sum_{t=1}^{k} \big(\varepsilon_{ct} - \bar{\varepsilon}_{c\cdot}\big)\, N^{-1/4} \big(A_{ct} - \bar{A}_{c\cdot}\big).$
By Hölder’s inequality,
$|\Delta_{N,4}| \le \Big[\frac{2\sqrt{N}}{N(k-1)} \sum_{c=1}^{N} \sum_{t=1}^{k} \big(\varepsilon_{ct} - \bar{\varepsilon}_{c\cdot}\big)^2\Big]^{1/2} \Big[\frac{2\sqrt{N}}{N(k-1)} \sum_{c=1}^{N} \sum_{t=1}^{k} \big(N^{-1/4} (A_{ct} - \bar{A}_{c\cdot})\big)^2\Big]^{1/2}.$
Next, we show that
$\sum_{c=1}^{N} \sum_{t=1}^{k} \big(\varepsilon_{ct} - \bar{\varepsilon}_{c\cdot}\big)^2 = O_p(N).$
We can write
$\sum_{c=1}^{N} \sum_{t=1}^{k} \big(\varepsilon_{ct} - \bar{\varepsilon}_{c\cdot}\big)^2 = \sum_{c=1}^{N} \sum_{t=1}^{k} \varepsilon_{ct}^2 - k \sum_{c=1}^{N} \bar{\varepsilon}_{c\cdot}^2.$
Note that
$E\Big(\sum_{c=1}^{N} \sum_{t=1}^{k} \varepsilon_{ct}^2\Big) = E\Big\{E\Big(\sum_{c=1}^{N} \sum_{t=1}^{k} \varepsilon_{ct}^2 \,\Big|\, X\Big)\Big\} = E\Big\{E\Big(\sum_{c=1}^{N} \sum_{i=1}^{N} \big(Y_i - E(Y_i \mid X)\big)^2 I(i \in C_c) \,\Big|\, X\Big)\Big\} = \sum_{c=1}^{N} \sum_{i=1}^{N} E\Big\{E\Big(\big(Y_i - E(Y_i \mid X)\big)^2 \,\Big|\, X\Big) I(i \in C_c)\Big\} = \sum_{c=1}^{N} \sum_{i=1}^{N} E\big\{\sigma^2(X_i)\, I(i \in C_c)\big\} = O(N),$
where the last equality in (A28) is due to the fact that $\sigma^2(X_i)$ is uniformly bounded by Assumption 1, and the summation over $i$ in (A28) contains only $k$ terms. Denote $r_i = Y_i - E(Y_i \mid X)$, for all $i$. Then,
$E\Big(\sum_{c=1}^{N} \sum_{t=1}^{k} \varepsilon_{ct}^2\Big)^2 = E\Big\{E\Big[\Big(\sum_{c=1}^{N} \sum_{t=1}^{k} \varepsilon_{ct}^2\Big)^2 \Big|\, X\Big]\Big\} = E\Big\{E\Big[\Big(\sum_{c=1}^{N} \sum_{i=1}^{N} r_i^2 I(i \in C_c)\Big)^2 \Big|\, X\Big]\Big\} = E\Big\{E\Big[\sum_{c=1}^{N} \sum_{i=1}^{N} r_i^4 I(i \in C_c) + \sum_{c=1}^{N} \sum_{i \neq i'}^{N} r_i^2 I(i \in C_c)\, r_{i'}^2 I(i' \in C_c) + \sum_{c \neq c'}^{N} \sum_{i=1}^{N} r_i^2 I(i \in C_c)\, r_i^2 I(i \in C_{c'}) + \sum_{c \neq c'}^{N} \sum_{i \neq i'}^{N} r_i^2 I(i \in C_c)\, r_{i'}^2 I(i' \in C_{c'}) \Big|\, X\Big]\Big\}$
$= \sum_{c=1}^{N} \sum_{i=1}^{N} E\big\{E(r_i^4 \mid X)\, I(i \in C_c)\big\} + \sum_{c=1}^{N} \sum_{i \neq i'}^{N} E\big\{\sigma^2(X_i) \sigma^2(X_{i'})\, I(i, i' \in C_c)\big\}$
$+ \sum_{c \neq c'}^{N} \sum_{i=1}^{N} E\big\{E(r_i^4 \mid X)\, I(i \in C_c \cap C_{c'})\big\} + \sum_{c \neq c'}^{N} \sum_{i \neq i'}^{N} E\big\{\sigma^2(X_i) \sigma^2(X_{i'})\, I(i \in C_c) I(i' \in C_{c'})\big\} = O(N^2),$
where the equality in (A30) is due to the fact that $\sigma^2(X_i)$ and $E\big[(Y_i - E(Y_i \mid X))^4 \mid X\big]$ are uniformly bounded by Assumption 1, and the summation over $c$ in (A29) and that over $c$ and $c'$ in (A30) each contain no more than $2k$ terms.
From (A28) and (A30), we have
$\mathrm{Var}\Big(\sum_{c=1}^{N} \sum_{t=1}^{k} \varepsilon_{ct}^2\Big) = O(N^2).$
Due to (A28) and (A31) and by Theorem 14.4-1 in Reference [26], we have
$\sum_{c=1}^{N} \sum_{t=1}^{k} \varepsilon_{ct}^2 = O_p\Big(E\Big(\sum_{c=1}^{N} \sum_{t=1}^{k} \varepsilon_{ct}^2\Big)\Big) + O_p\Big(\Big\{\mathrm{Var}\Big(\sum_{c=1}^{N} \sum_{t=1}^{k} \varepsilon_{ct}^2\Big)\Big\}^{1/2}\Big) = O_p(N).$
Similarly, it can be shown that the second term in (A27) is O p ( N ) ; therefore, the proof of (A26) is completed.
From (A24)–(A26),
$|\Delta_{N,4}| \le \Big[\frac{2\sqrt{N}}{N(k-1)}\, O_p(N)\Big]^{1/2} \big[O_p(N^{-2})\big]^{1/2} = O_p(N^{-3/4}) = o_p(1), \quad \text{as } N \to \infty.$
This completes the proof. □

Appendix A.4. Sketch Proof of Theorem 3

The proof of the existence of $\lim_{N\to\infty} \lambda_N^A$ is similar to that for $\lim_{N\to\infty} \lambda_N$ in Theorem 1. Now, we show that
$\sqrt{N}\big(B_N(Q) - W_N(Q)\big) \xrightarrow{d} N\big(k \sigma_A^2,\ \lim_{N\to\infty} \lambda_N^A\big).$
From (12), we have
$\sqrt{N}\big(B_N(Q) - W_N(Q)\big) = \Delta_{N,0} + \Delta_{N,1} + \Delta_{N,2} - \Delta_{N,3} - \Delta_{N,4} = \sqrt{N}\big(B_N(\varepsilon) - W_N(\varepsilon)\big) + \Delta_{N,1} + \Delta_{N,2} - \Delta_{N,3} - \Delta_{N,4},$
where $\Delta_{N,0}$ through $\Delta_{N,4}$ are defined in (13)–(17). Here $B_N(\varepsilon)$ and $W_N(\varepsilon)$ are the average between-cell and within-cell variations for augmented observations with $Z_i = Y_i - m_0(X_i) - N^{-1/4} A(X_i)$ as the response. Note that the conditional mean of $Z_i$ given $X_i = x$ satisfies the null hypothesis, while $\mathrm{Var}(Z_i \mid X_i = x)$ is equal to $\mathrm{Var}(Y_i \mid X_i = x)$. Theorem 2 implies that
$\Delta_{N,0} = \sqrt{N}\big(B_N(\varepsilon) - W_N(\varepsilon)\big) \xrightarrow{d} N\big(0,\ \lim_{N\to\infty} \lambda_N^A\big).$
By Lemma 2, we have
$\Delta_{N,i} \xrightarrow{p} 0, \quad \text{as } N \to \infty, \text{ for } i = 2, 3, 4.$
Thus, we only need to consider Δ N , 1 to obtain the asymptotic mean under the alternatives.
Note that $A(X_1), A(X_2), \ldots, A(X_N)$ are i.i.d. since $X_1, X_2, \ldots, X_N$ are i.i.d. From (A13) and (A14), we can write $\Delta_{N,1}$ in (14) as
$\Delta_{N,1} = \sqrt{N}\, k (N-1)^{-1} \sum_{c=1}^{N} N^{-1/2} \big(\bar{A}_{c\cdot} - \bar{A}_{\cdot\cdot}\big)^2 = k (N-1)^{-1} \sum_{c=1}^{N} \big(A(X_c) - \overline{A(X)}\big)^2 + o_p(1) = k \hat{\sigma}_A^2 + o_p(1),$
where σ ^ A 2 is the sample variance of A ( X 1 ) , A ( X 2 ) , , A ( X N ) . By the Weak Law of Large Numbers,
$k \hat{\sigma}_A^2 \xrightarrow{p} k \sigma_A^2 = k\, \mathrm{Var}(A(X)) = k \Big[\int A^2(x) f(x)\, dx - \Big(\int A(x) f(x)\, dx\Big)^2\Big],$
as $N \to \infty$ while $k$ stays fixed.
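As an illustration of (A37) (our own worked example; it strips the exponential envelope from the simulation alternatives used in Figures 2 and 3), take a pure high-frequency cosine deviation:
$A(x) = \cos(8\pi x)$ and $X \sim \mathrm{Uniform}(0, 1)$ give $\sigma_A^2 = \int_0^1 \cos^2(8\pi x)\, dx - \big(\int_0^1 \cos(8\pi x)\, dx\big)^2 = \tfrac{1}{2} - 0 = \tfrac{1}{2},$
so the asymptotic mean shift of the test statistic under such a local alternative is $k\sigma_A^2 = k/2$.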
From (A35)–(A37), we have
$\Delta_{N,1} + \Delta_{N,2} - \Delta_{N,3} - \Delta_{N,4} \xrightarrow{p} k \sigma_A^2.$
From (A33), (A34), and (A38) and by applying Slutsky’s Theorem, we have
$\sqrt{N}\big(B_N(Q) - W_N(Q)\big) \xrightarrow{d} N\big(k \sigma_A^2, \lambda^A\big).$
Theorem 3 then follows immediately since it is readily seen that $\hat{\lambda}_N \xrightarrow{p} \lambda^A$ under the local alternative (11).

Appendix A.5. Sketch Proof of Theorem 4

From the proof of Theorem 3, we know $\Delta_{N,0} \xrightarrow{d} N(0, \lambda^A)$ (see (A34)), where $\lambda^A = \lim_{N\to\infty} \lambda_N^A$ is defined similarly to $\lambda_N$ in Theorem 1 but with $\sigma^2(X_j)$ calculated under the fixed alternative hypothesis.
We also know $\Delta_{N,1} \xrightarrow{p} k \sigma_A^2$ (see (A37)), where $\sigma_A^2$ is given in (19). Hence, $\sqrt{N}\, \Delta_{N,1} \xrightarrow{p} \infty$.
Compared with $\sqrt{N}\, \Delta_{N,1}$, the remaining three terms in (18) involving $\Delta_{N,2}$, $\Delta_{N,3}$, or $\Delta_{N,4}$ are all negligible. This is because Lemma 2 implies that
$N^{1/4} \Delta_{N,2} = O_p(1), \quad \sqrt{N}\, \Delta_{N,3} = O_p(N^{-3/2}), \quad N^{1/4} \Delta_{N,4} = O_p(N^{-1/2}).$
Putting all terms together as in Equation (18) and applying Slutsky’s Theorem, we know that
$\sqrt{N}\big(B_N(Q) - W_N(Q)\big) - \sqrt{N}\, \Delta_{N,1} - N^{1/4} \Delta_{N,2} \xrightarrow{d} N(0, \lambda^A).$
Since $G_N = \sqrt{N}\big(B_N(Q) - W_N(Q)\big) / \hat{\lambda}_N^{1/2}$ and the test rejects the null hypothesis at significance level $\alpha$ if $G_N > z_\alpha$, the power of the test is approximately
$P(G_N > z_\alpha) = P\big(\sqrt{N}(B_N(Q) - W_N(Q)) > z_\alpha \hat{\lambda}_N^{1/2}\big) = P\Big(\sqrt{N}(B_N(Q) - W_N(Q)) - \sqrt{N}\Delta_{N,1} - N^{1/4}\Delta_{N,2} > \hat{\lambda}_N^{1/2} z_\alpha - \sqrt{N}\Delta_{N,1} - N^{1/4}\Delta_{N,2}\Big) \approx 1 - \Phi\Big(\big(\sqrt{\lambda^A}\, z_\alpha - \sqrt{N}\, k \sigma_A^2 - N^{1/4}\Delta_{N,2}\big)\big/\sqrt{\lambda^A}\Big) \to 1,$
due to (A39). Essentially, the asymptotic mean of the test statistic $G_N$ diverges to infinity. Hence, the power tends to one.
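For readers who want to trace the final standardization step in code, the sketch below computes $G_N$ and the rejection decision from an already-constructed $N \times k$ matrix Q of augmented observations and an estimate lambda_hat of the variance parameter. It is a minimal illustration only: the cell construction and the variance estimator $\hat{\lambda}_N$ are assumed to be available, the between/within normalizations follow the $k/(N-1)$ and $1/\{N(k-1)\}$ factors appearing in the $\Delta$-terms of this appendix, and all function and variable names are ours.

import numpy as np
from scipy.stats import norm

def lack_of_fit_decision(Q, lambda_hat, alpha=0.05):
    # Q: (N, k) array of augmented observations, one row per cell C_c.
    # lambda_hat: estimate of the asymptotic variance parameter lambda.
    N, k = Q.shape
    cell_means = Q.mean(axis=1)                                   # Q-bar_{c.}
    grand_mean = cell_means.mean()                                # Q-bar_{..}
    B_N = k * np.sum((cell_means - grand_mean) ** 2) / (N - 1)    # between-cell variation
    W_N = np.sum((Q - cell_means[:, None]) ** 2) / (N * (k - 1))  # within-cell variation
    G_N = np.sqrt(N) * (B_N - W_N) / np.sqrt(lambda_hat)
    return G_N, bool(G_N > norm.ppf(1 - alpha))                   # reject H0 if G_N > z_alpha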

References

  1. Eubank, R.L.; Hart, J.D. Testing goodness-of-fit in regression via order selection criteria. Ann. Statist. 1992, 20, 1412–1425.
  2. Hart, J. Smoothing-inspired lack-of-fit tests based on ranks. In Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen; Institute of Mathematical Statistics: Beachwood, OH, USA, 2008; Volume 1, pp. 138–155.
  3. Hart, J. Frequentist-Bayes lack-of-fit tests based on Laplace approximations. J. Stat. Theory Pract. 2009, 3, 681–704.
  4. Hart, J. Nonparametric Smoothing and Lack-of-Fit Tests; Springer: New York, NY, USA, 1997.
  5. Chen, C.-F.; Hart, J.D.; Wang, S. Bootstrapping the order selection test. J. Nonparametr. Stat. 2001, 13, 851–882.
  6. Kuchibhatla, M.; Hart, J.D. Smoothing-based lack-of-fit tests: Variations on a theme. J. Nonparametr. Stat. 1996, 7, 1–22.
  7. Lee, B.J. A Nonparametric Model Specification Test Using a Kernel Regression Method. Ph.D. Thesis, University of Wisconsin, Madison, WI, USA, 1988.
  8. Yatchew, A.J. Nonparametric regression tests based on least squares. Econom. Theory 1992, 8, 435–451.
  9. Eubank, R.L.; Spiegelman, C.H. Testing the goodness of fit of a linear model via nonparametric regression techniques. J. Am. Stat. Assoc. 1990, 85, 387–392.
  10. Hardle, W.; Mammen, E. Comparing nonparametric versus parametric regression fits. Ann. Statist. 1993, 21, 1926–1947.
  11. Zheng, J.X. A consistent test of functional form via nonparametric estimation techniques. J. Econom. 1996, 75, 263–289.
  12. Horowitz, J.L.; Spokoiny, V.G. An adaptive, rate-optimal test of a parametric mean-regression model against a nonparametric alternative. Econometrica 2001, 69, 599–631.
  13. Guerre, E.; Lavergne, P. Data-driven rate-optimal specification testing in regression models. Ann. Statist. 2005, 33, 840–870.
  14. Song, W.; Du, J. A note on testing the regression functions via nonparametric smoothing. Can. J. Stat. 2011, 39, 108–125.
  15. Wang, L.; Akritas, M. Testing for covariate effects in the fully nonparametric analysis of covariance model. J. Am. Stat. Assoc. 2006, 101, 722–736.
  16. Wang, L.; Akritas, M.G.; Van Keilegom, I. An ANOVA-type nonparametric diagnostic test for heteroscedastic regression models. J. Nonparametr. Stat. 2008, 20, 365–382.
  17. Wang, H.; Tolos, S.; Wang, S. A distribution free test to detect general dependence between a response variable and a covariate in the presence of heteroscedastic treatment effects. Can. J. Stat. 2010, 38, 408–433.
  18. Hardle, W.; Hall, P.; Marron, J.S. How far are automatically chosen regression smoothing parameters from their optimum? J. Am. Stat. Assoc. 1988, 83, 86–95.
  19. Wang, H.; Akritas, M. Asymptotically distribution free tests in heteroscedastic unbalanced high dimensional ANOVA. Stat. Sin. 2011, 21, 1341–1377.
  20. de Jong, P. A central limit theorem for generalized quadratic forms. Probab. Theory Relat. Fields 1987, 75, 261–277.
  21. Hart, J.D.; Yi, S. One-sided cross-validation. J. Am. Stat. Assoc. 1998, 93, 620–631.
  22. Holmes, C.C.; Adams, N.M. Likelihood inference in nearest-neighbour classification models. Biometrika 2003, 90, 99–112.
  23. Singh, D.; Febbo, P.G.; Ross, K.; Jackson, D.G.; Manola, J.; Ladd, C.; Tamayo, P.; Renshaw, A.A.; D'Amico, A.V.; Richie, J.P.; et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002, 1, 203–209.
  24. Lee, S.Y.; Lei, B.; Mallick, B.; Samy, A.M. Estimation of COVID-19 spread curves integrating global data and borrowing information. PLoS ONE 2020, 15, e0236860.
  25. Pyke, R. Spacings (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 1965, 27, 395–449.
  26. Bishop, Y.M.; Fienberg, S.E.; Holland, P.W. Discrete Multivariate Analysis; Springer: New York, NY, USA, 2007.
Figure 1. Typical patterns of $LSCV(k)$ versus $k$ in continuous data.
Figure 2. Empirical power of the tests for data generated from $Y_i = C e^{2X_i} \cos(8\pi X_i) + \epsilon_i$ with sample size n = 50 and $\epsilon_i \sim$ Uniform(−0.1, 0.1). Due to small values of C, the signal-to-noise ratio is low. WAI is not included since it could not keep its type I error under control for this sample size and the uniform error distribution.
Figure 3. Empirical power for different sample sizes. The data were generated from $Y_i = N^{-1/4}\, 0.3\, e^{2X_i} \cos(8\pi X_i) + \epsilon_i$, where $\epsilon_i \sim$ Uniform(−0.1, 0.1). As the sample size N increases, the power of all tests increases. The $G_N$ test approaches 1 much faster than the other tests. WAI is not included since it could not keep its type I error under control for these sample sizes and the uniform error distribution.
Figure 4. The leave-one-out accuracy curve with increasing number of selected genes.
Figure 5. Observed U.S. COVID-19 mortality and confirmed cases with the Richards growth curve estimates.
Table 1. Percent of rejection at 0.05 level under H_0 for data generated from Model M_0. OS: order selection test of Reference [1]; BN: Reference [3] Bayes sum test with Normal approximation; BOS: order selection test with wild bootstrap of Reference [5]; ROS: rank-based test of Reference [2]; WAI3, WAI5, WAI7: test of Reference [16] with k = 3, 5, and 7, respectively; G_N: the proposed test with k = 3, 5, 7, and k̂ from (20). Highly inflated empirical type I errors are marked with red color.

             ϵ_i Normal, n =                     ϵ_i Unif, n =
Test         25    50    75    100   200   |    25    50    75    100   200
BN           4.25  4.90  5.40  5.70  4.45  |    5.20  6.15  5.45  4.85  4.95
OS           4.35  5.55  5.05  5.50  3.65  |    4.70  5.60  5.00  4.95  5.10
ROS          3.50  4.95  4.95  5.55  3.70  |    3.95  5.90  5.20  4.60  5.10
BOS          3.85  5.45  5.05  5.85  4.00  |    4.35  5.65  6.00  5.15  5.15
WAI k_n=3    8.75  7.25  7.10  7.35  6.40  |    8.90  9.05  8.20  7.55  6.65
WAI k_n=5    6.85  5.85  6.25  6.40  6.15  |    6.85  8.40  7.05  6.80  6.45
WAI k_n=7    5.00  5.20  5.40  6.25  6.30  |    5.75  8.10  6.30  6.35  6.40
G_N k=3      5.60  4.05  4.10  3.95  3.45  |    5.40  4.95  4.30  3.30  3.05
G_N k=5      4.40  3.45  3.80  3.90  3.35  |    4.45  5.15  3.95  3.80  3.80
G_N k=7      3.65  3.45  4.15  4.40  4.00  |    3.60  5.20  4.25  3.95  4.25
G_N k̂       6.20  5.95  4.95  4.65  4.60  |    6.00  4.85  4.85  5.15  4.35

             ϵ_i T(5)/30, n =                    ϵ_i Heter, n =
Test         25    50    75    100   200   |    25    50    75    100   200
BN           4.15  4.95  4.75  4.70  5.15  |    8.95  9.20  7.95  8.55  7.05
OS           3.70  4.35  5.20  4.20  4.65  |    6.40  7.65  7.85  7.95  7.20
ROS          4.40  4.45  4.85  4.50  5.10  |    5.70  7.00  7.55  7.25  7.10
BOS          3.75  4.90  5.25  4.45  5.20  |    4.70  5.45  5.10  5.45  4.95
WAI k_n=3    8.75  8.85  8.40  7.20  6.95  |   12.50 10.05 11.50 10.20  8.10
WAI k_n=5    6.50  7.25  6.85  6.10  5.90  |    9.05  9.10  9.55  9.15  7.50
WAI k_n=7    4.65  6.30  5.60  5.35  6.00  |    6.80  7.80  7.50  8.45  7.95
G_N k=3      3.80  3.45  2.50  1.95  1.95  |    7.50  5.80  6.45  6.50  5.00
G_N k=5      3.10  3.25  2.65  1.70  1.80  |    5.30  5.65  6.30  6.15  4.40
G_N k=7      2.50  2.95  2.80  2.25  1.95  |    4.85  5.40  5.30  6.10  5.20
G_N k̂       4.30  3.95  3.60  2.55  2.05  |    8.20  6.70  7.40  6.90  5.00
Table 2. Percent of rejection under high frequency alternatives M_1 to M_4 with q = 8 and sample size n = 50. The legends of the tests are the same as in Table 1. WAI is not included since it could not keep its type I error under control. Note: BN also has a highly elevated type I error in the heteroscedastic case.

             ϵ_i Normal                            ϵ_i Unif
Model        BN     OS     ROS    BOS    G_N   |  BN     OS     ROS    BOS    G_N
M_1          100    78.35  71.65  66.70  100   |  100    78.35  72.20  66.85  100
M_2          100    87.75  80.70  91.55  100   |  100    87.70  81.00  91.95  100
M_3          99.65  70.70  70.25  43.15  99.70 |  99.50  67.85  63.00  41.15  99.60
M_4          99.05  61.75  52.20  34.45  99.60 |  81.15  17.35  11.60  10.25  93.20

             ϵ_i T(5)/30                           ϵ_i Heter
Model        BN     OS     ROS    BOS    G_N   |  BN     OS     ROS    BOS    G_N
M_1          100    78.35  71.70  67.30  100   |  100    78.40  72.25  66.45  100
M_2          100    87.50  80.55  92.00  100   |  100    87.75  80.75  91.95  100
M_3          99.60  69.20  66.95  42.40  99.65 |  99.60  69.85  67.65  43.40  99.65
M_4          92.30  29.80  21.20  15.90  97.35 |  97.05  44.75  32.20  23.45  99.40
Table 3. Parameters of the Richards growth models, R² between the observed and model-estimated counts, and the p-values of the lack-of-fit test.

                         θ_1          θ_2     θ_3       ξ        R²      p-Value of G_N
U.S. deaths              594,331      0.0452  −17.1062  6.8505   0.9950  0.0000
U.S. confirmed cases     27,890,138   0.1270  −43.7943  11.5151  0.9982  0.0000
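To relate the parameters in Table 3 to the fitted curves in Figure 5, one needs the Richards growth function. The sketch below assumes the parameterization used in Reference [24], $f(t) = \theta_1\{1 + \xi \exp(-\theta_2 (t - \theta_3))\}^{-1/\xi}$; if the fit reported in the paper used a different parameterization or time origin, the formula and the day index must be adjusted accordingly, and all variable names here are ours.

import numpy as np

def richards(t, theta1, theta2, theta3, xi):
    # Richards growth curve under the parameterization assumed above:
    # theta1: final size, theta2: growth rate, theta3: shift, xi: shape.
    return theta1 * (1.0 + xi * np.exp(-theta2 * (t - theta3))) ** (-1.0 / xi)

# Illustrative evaluation with the U.S. mortality parameters from Table 3;
# the day index t is hypothetical and must match the origin used in the fit.
t = np.arange(0, 500)
us_deaths_curve = richards(t, theta1=594331, theta2=0.0452, theta3=-17.1062, xi=6.8505)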
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Gharaibeh, M.M.; Sahtout, M.; Wang, H.; Wang, S. A Nonparametric Lack-of-Fit Test of Constant Regression in the Presence of Heteroscedastic Variances. Symmetry 2021, 13, 1264. https://doi.org/10.3390/sym13071264

AMA Style

Gharaibeh MM, Sahtout M, Wang H, Wang S. A Nonparametric Lack-of-Fit Test of Constant Regression in the Presence of Heteroscedastic Variances. Symmetry. 2021; 13(7):1264. https://doi.org/10.3390/sym13071264

Chicago/Turabian Style

Gharaibeh, Mohammed M., Mohammad Sahtout, Haiyan Wang, and Suojin Wang. 2021. "A Nonparametric Lack-of-Fit Test of Constant Regression in the Presence of Heteroscedastic Variances" Symmetry 13, no. 7: 1264. https://doi.org/10.3390/sym13071264

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop