Article

Optimal Minimax Rate of Smoothing Parameter in Distributed Nonparametric Specification Test

1 Department of Biostatistics, School of Public Health, Shandong University, Jinan 250021, China
2 School of Mathematical Sciences, Soochow University, Suzhou 215006, China
3 School of Mathematics and Statistics, Huaiyin Normal University, Huai’an 223300, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Axioms 2025, 14(3), 228; https://doi.org/10.3390/axioms14030228
Submission received: 5 February 2025 / Revised: 15 March 2025 / Accepted: 18 March 2025 / Published: 19 March 2025
(This article belongs to the Special Issue Advances in Statistical Simulation and Computing)

Abstract

A model specification test is a statistical procedure for assessing whether a given statistical model accurately represents the underlying data-generating process. The smoothing-based nonparametric specification test is widely used because of its efficiency against “singular” local alternatives. However, large modern datasets create computational problems when implementing nonparametric specification tests. The divide-and-conquer algorithm is highly effective for handling large datasets, since it breaks a large dataset into manageable subsets. By applying divide-and-conquer, the nonparametric specification test can cope with the computational burden induced by massive sample sizes, improving scalability and reducing processing time. The selection of the smoothing parameter that yields optimal power for the distributed algorithm, however, remains an important problem. The rate of the smoothing parameter that ensures rate optimality when testing the specification of a nonlinear parametric regression function has been studied in the literature. In this paper, we verify the uniqueness of the rate of the smoothing parameter that ensures the rate optimality of divide-and-conquer-based tests. By employing a penalty method to select the smoothing parameter, we obtain a test with an asymptotically normal null distribution and adaptiveness properties. The performance of this test is further illustrated through numerical simulations.

1. Introduction

Big datasets characterized by large sample sizes N and/or high dimension p are increasingly accessible. In this paper, we focus on datasets with massive sample size N and low dimension p. However, directly making inferences from such large datasets is computationally infeasible due to limitations in processor memory, which makes selecting an appropriate model for big data particularly challenging. The divide-and-conquer approach is intuitive and has been widely employed across various fields to tackle diverse problems. Zaremba et al. [1] utilized this strategy to address two-sample test problems. In situations involving large sample sizes or high-dimensional predictors, Chen and Xie [2] applied the divide-and-conquer methodology for variable selection in generalized linear models. Battey et al. [3] integrated the divide-and-conquer algorithm with high-dimensional hypothesis testing and estimation. Additionally, as noted in [4], samples in big datasets are often aggregated from multiple sources. Therefore, feasible and robust specification testing methods, essential for addressing model misspecification, are critical for handling massive datasets.
Suppose we have a sequence of independent observations $\{(y_i, x_i)\}_{i=1}^{N}$ drawn from a population $(Y, X) \in \mathbb{R}\times[0,1]^p$, where the unknown regression function $E(Y \mid X = x) = m(x)$ is assumed to be smooth. In this context, a specification test is needed to assess the functional form of the regression and justify the use of a parametric model. Given a parametric family of known real functions $g(x;\theta)$, the null and alternative hypotheses can be described as follows:
$$H_0: m(x) = g(x, \theta_0) \quad \text{for some } \theta_0 \in \Theta,$$
$$H_1: m(x) \neq g(x, \theta_0) \quad \text{for all } \theta_0 \in \Theta,$$
where $\Theta \subset \mathbb{R}^q$ denotes the parameter space. This hypothesis testing problem has been widely studied in the literature. One category of approaches measures the distance between the estimator under the null and the nonparametric estimator under alternative models (see Hardle and Mammen [5], Neumeyer and Van Keilegom [6], González-Manteiga and Crujeiras [7] and the references therein). Another competing approach relies on the empirical process of the residuals from the parametric model [8,9]. An important criterion for evaluating the behavior of these tests is their power performance under local alternatives (see, e.g., [10]). Additionally, Ingster [11,12] proposed an alternative way to investigate the asymptotic power properties of tests via the minimax approach. Guerre and Lavergne [13] further provided the optimal minimax rate for the smoothing parameter that ensures the rate optimality of the test in the context of testing the specification of a nonlinear parametric regression function. Conditional on a subset of covariates in regression modeling, Cai et al. [14] proposed a significance test for the partial mean independence problem based on machine learning methods and data splitting. Tan and Zhu [15] proposed a residual-marked empirical process that adapts to the underlying model, forming the basis of a goodness-of-fit test for parametric single-index models with a diverging number of predictors. However, existing methods that work well for moderate-sized datasets are not feasible for massive datasets due to computational limitations. Han et al. [16] developed an optimal sampling strategy that selects a small subset from a large pool of data to reduce the computational budget of model checking for big data. For test statistics that are quadratic forms [5,17], the computational complexity is $O(N^2)$, which presents a significant computational burden for large-scale data.
To address this issue, a divide-and-conquer-based test statistic was proposed in [18,19]. Zhao et al. [18] incorporated a divide-and-conquer strategy into the nonparametric test statistic of [17], along with a data-driven bandwidth selection procedure. However, this integrated approach can easily inflate the type I error rate. To mitigate issues associated with choosing smoothing parameters while preserving the type I error rate, Zhao et al. [19] proposed randomly splitting the observations into two subsets. In the first subset, an “optimal” smoothing parameter is selected based on a straightforward criterion. A lack-of-fit test grounded in asymptotic theory is then conducted on the second subset. This data-splitting strategy effectively controls the type I error rate, but sample splitting reduces power, as only a subset of the sample is used to construct the test statistic. Furthermore, the uniqueness of the rate of the smoothing parameter that ensures the rate optimality of the divide-and-conquer-based test statistic is not addressed in [18,19]. In this paper, we establish and verify the uniqueness of the rate of the smoothing parameter that guarantees the rate optimality of the divide-and-conquer-based test statistic.
Moreover, it is well known that the optimal smoothing parameters for testing differ from those that are optimal for estimation [11,12,20]. As a result, there has been growing interest in adaptive testing methods. One approach is to consider a set of suitable values for the bandwidth and proceed from there, as discussed in [21,22]. In this paper, we integrate the smoothing parameter selection method in [22] with the divide-and-conquer-based test statistic proposed in [18]. This combination leads to a computationally feasible and adaptive test statistic which retains its asymptotic normality under the null hypothesis.
The paper is organized as follows: Section 2 describes the test statistics and their corresponding asymptotic behavior under the null hypothesis. In Section 3, we demonstrate the unique rate of the smoothing parameter that ensures rate optimality in the DZH test. Section 4 presents simulation studies for illustration. The proofs of the theorems are provided in Section 5.

2. The Divide-And-Conquer-Based Test Statistics

The distributed test statistic proposed in Zhao et al. [18] is based on the test statistic in Zheng [17], where the kernel method is used to estimate the conditional moment $E\{\zeta_i E(\zeta_i \mid X_i) f(X_i)\}$, with $\zeta_i = y_i - g(x_i;\theta_0)$ and $f(\cdot)$ the density function of $x_i$. The kernel-based sample estimator of this quantity is
$$Q_N(h_N) = \frac{1}{N(N-1)}\sum_{i=1}^{N}\sum_{j\neq i}^{N} K_{h_N}(x_i - x_j)\, e_i e_j,$$
where $K_{h_N}(\cdot) = K(\cdot/h_N)/h_N^p$ denotes a p-dimensional kernel function, $h_N$ is a bandwidth depending on N, $e_i = y_i - g(x_i, \hat\theta_N)$, and $\hat\theta_N$ is an estimate of $\theta_0$ under the null hypothesis.
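As an illustration, the quadratic-form statistic $Q_N(h_N)$ can be computed directly from its definition. The Python sketch below (with a product Gaussian kernel as an assumed choice of K; function and variable names are ours) makes the $O(N^2)$ pairwise cost explicit:

```python
import numpy as np

def zheng_statistic(x, e, h):
    """Kernel quadratic-form statistic Q_N(h) in the spirit of Zheng (1996).

    x : (N, p) array of covariates, e : (N,) residuals under H0,
    h : bandwidth. Uses a product standard-normal kernel; cost is O(N^2).
    """
    N, p = x.shape
    # Pairwise scaled differences, shape (N, N, p)
    d = (x[:, None, :] - x[None, :, :]) / h
    # K_h(u) = K(u / h) / h^p with K the p-variate standard normal density
    K = np.exp(-0.5 * (d ** 2).sum(axis=2)) / ((2 * np.pi) ** (p / 2) * h ** p)
    np.fill_diagonal(K, 0.0)  # exclude the i == j terms
    return (e @ K @ e) / (N * (N - 1))
```

Because every pair $(i, j)$ contributes a kernel evaluation, both time and memory grow quadratically in N; this is exactly the bottleneck that the divide-and-conquer construction below addresses.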
When handling exceptionally large datasets where the sample size N becomes unmanageable, a test statistic combined with the divide-and-conquer procedure was proposed in Zhao et al. [18]. First, the dataset is partitioned into K equally sized subsets, each containing n observations. The test statistic based on the observations in the kth subset is
$$V_k(h_n) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{j\neq i}^{n} K_{h_n}(x_{ik}-x_{jk})\, e_{ik} e_{jk},$$
where the $e_{ik}$ are the fitted residuals from $\{(x_{ik}, y_{ik})\}_{i=1}^{n}$. Under mild conditions [17], $n h_n^{p/2} V_k(h_n)$ is asymptotically normal with mean zero and variance $\delta^2$, where
$$\delta^2 = 2\int K^2(u)\,du \int \{\sigma^2(x)\}^2 f^2(x)\,dx, \qquad \sigma^2(x) = E(\varepsilon_i^2 \mid x), \qquad \varepsilon_i = y_i - m(x_i).$$
Zhao et al. [18] then combined the subset statistics by averaging:
$$T_N(h_n) = \frac{1}{K\,\hat\delta(h_n)}\sum_{k=1}^{K} V_k(h_n),$$
where $\hat\delta^2(h_n)$ is an estimate of $\delta^2$. A natural estimator is $\hat\delta^2(h_n) = K^{-1}\sum_{k=1}^{K}\hat\delta_k^2(h_n)$, where
$$\hat\delta_k^2(h_n) = \frac{2}{n(n-1)}\sum_{i=1}^{n}\sum_{j\neq i}^{n} h_n^p K_{h_n}^2(x_{ik}-x_{jk})\, e_{ik}^2 e_{jk}^2$$
is a consistent estimate of $\delta^2$ based on the kth subset. The test based on the statistic $T_N(h_n)$ is denoted the DZH test in [18]. $T_N(h_n)$ is asymptotically normal under the null hypothesis given some mild conditions [18]. In this paper, we study the asymptotic behavior of $T_N(h_n)$ by relaxing the condition $n h_n^p/\ln n \to \infty$ to $n h_n^p \to \infty$.
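The construction above can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: a Gaussian kernel is assumed, and `subsets` stands in for the K partitioned blocks of the data.

```python
import numpy as np

def subset_stat_and_var(x, e, h):
    """V_k(h) and the variance estimate delta_k^2(h) on one subset of size n."""
    n, p = x.shape
    d = (x[:, None, :] - x[None, :, :]) / h
    K = np.exp(-0.5 * (d ** 2).sum(axis=2)) / ((2 * np.pi) ** (p / 2) * h ** p)
    np.fill_diagonal(K, 0.0)
    e2 = e ** 2
    V = (e @ K @ e) / (n * (n - 1))                              # V_k(h)
    delta2 = 2.0 * h ** p * (e2 @ (K ** 2) @ e2) / (n * (n - 1))  # delta_k^2(h)
    return V, delta2

def distributed_test(subsets, h):
    """Normalized DZH statistic n h^{p/2} K^{1/2} T_N(h).

    subsets : list of (x_k, e_k) pairs, all with the same subset size n.
    """
    stats = [subset_stat_and_var(x, e, h) for x, e in subsets]
    V_bar = np.mean([s[0] for s in stats])
    delta = np.sqrt(np.mean([s[1] for s in stats]))
    n, p = subsets[0][0].shape
    return n * h ** (p / 2) * np.sqrt(len(subsets)) * V_bar / delta
```

Each subset costs $O(n^2)$, so the total cost is $O(Kn^2) = O(N^2/K)$, a K-fold saving over the full quadratic form, and the K blocks can be processed in parallel before the cheap averaging step.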
Assumption 1.
The density function $f(x)$ of $x$ and its first-order derivatives are uniformly bounded, with $0 < \underline{f} \le f(x) \le \bar f < \infty$ for all $x \in [0,1]^p$.
Assumption 2.
Suppose that $E(\varepsilon_i \mid X_i) = 0$, $\sigma^2(x_i) \le \bar\sigma^2$, and $E(\varepsilon_i^4 \mid X_i) = \sigma_4(x_i) \le C$ uniformly in i. We also assume that $\sigma^2(x_i)$ is differentiable and that its first-order derivatives are uniformly bounded for all i.
Assumption 3.
For any $m(\cdot)$, not necessarily in $H_0$, let
$$\theta^* = \arg\min_{\theta\in\Theta} E\{m(X) - g(X;\theta)\}^2.$$
Under $H_0$, $\theta^* = \theta_0$. For any $m(\cdot)$, $\theta^*$ is unique. $\hat\theta_n$ is an estimator of $\theta^*$ such that $\sqrt{nK}(\hat\theta_n - \theta^*) = O_p(1)$ uniformly with respect to $m(\cdot)$ with $E\{m^4(X)\} \le C < \infty$, i.e.,
$$\forall \eta > 0,\ \exists \epsilon > 0:\ \limsup_{n,K\to\infty}\ \sup_{E\{m^4(X)\}\le C} P\Big(\sqrt{nK}\,\big|\hat\theta_n - \theta^*\big| > \epsilon\Big) \le \eta.$$
Assumption 4.
$g(\cdot,\cdot)$ is uniformly bounded in $x$ and $\theta$ and is twice continuously differentiable with respect to $\theta$, with first- and second-order derivatives $g_\theta(\cdot,\cdot)$ and $g_{\theta\theta}(\cdot,\cdot)$ uniformly bounded in $x$ and $\theta\in\Theta$, with upper bounds $\bar g_\theta$ and $\bar g_{\theta\theta}$, respectively.
Assumption 5.
$K(u)$ is a nonnegative, bounded, continuous, and symmetric function such that $\int K(u)\,du = 1$.
Assumption 6.
Suppose that the Fourier transform of $K(u)$, $\hat K(u) = \int \exp(\mathrm{i}tu)\, K(t)\,dt$, is strictly positive on its nonempty support.
Theorem 1
(Null hypothesis). Suppose Assumptions 1–5 hold. If $n h_n^p \to \infty$, $h_n \to 0$, and $K \to \infty$, then $n h_n^{p/2} K^{1/2} T_N(h_n) \stackrel{d}{\to} N(0,1)$.
This result suggests that we can reject $H_0$ at an α level of significance if the normalized statistic $n h_n^{p/2} K^{1/2} T_N(h_n)$ is larger than $z_\alpha$, the upper α quantile of the standard normal distribution. Given that our focus is on the null asymptotic result under the specific bandwidth $h_n$ in Theorem 1, the condition can be relaxed from $n h_n^p/\ln n \to \infty$ to $n h_n^p \to \infty$. The proof closely resembles that of Theorem 1 in Zhao et al. [18] apart from this relaxation, so we omit the details. However, how to choose an appropriate K in practice by balancing the computational budget against statistical efficiency remains an open question.
To develop an adaptive test, we integrate the smoothing parameter selection procedure proposed by Guerre and Lavergne [22] with $T_N(h_n)$. The procedure favors a larger smoothing parameter under the null hypothesis and selects h via the penalized criterion
$$h_n^* = \arg\max_{h\in H_n}\Big\{K^{-1}\sum_{k=1}^{K} V_k(h) - \gamma_n\,\hat\upsilon_{h,h_0}\Big\},$$
where $H_n$ is the given candidate set for $h_n$ and $\hat\upsilon_{h,h_0}$ is an estimator of the asymptotic null standard deviation of $K^{-1}\sum_{k=1}^{K}\{V_k(h) - V_k(h_0)\}$. The asymptotic null variance of $K^{-1}\sum_{k=1}^{K}\{V_k(h) - V_k(h_0)\}$ is
$$\upsilon_{h,h_0}^2 = \frac{2}{K n(n-1)}\iint \big\{K_h(x_1 - x_2) - K_{h_0}(x_1 - x_2)\big\}^2\,\sigma^2(x_1)\sigma^2(x_2) f(x_1)f(x_2)\,dx_1\,dx_2,$$
for which an intuitive estimator is
$$\hat\upsilon_{h,h_0}^2 = \frac{2}{K^2 n^2(n-1)^2}\sum_{k=1}^{K}\sum_{i=1}^{n}\sum_{j\neq i}\big\{K_h(x_{ik} - x_{jk}) - K_{h_0}(x_{ik} - x_{jk})\big\}^2 e_{ik}^2 e_{jk}^2.$$
Let $h_0$ denote the largest element of $H_n$. The test statistic based on the selected $h_n^*$ is
$$T_N^*(h_n^*) = \frac{1}{K\,\hat\delta(h_0)}\sum_{k=1}^{K} V_k(h_n^*).$$
Under the null hypothesis, as $\gamma_n \to \infty$, the test statistic $T_N^*(h_n^*)$ tends to coincide with $T_N(h_0)$. Given that $T_N(h_0)$ is asymptotically normal under Assumptions 1–5 when $n h_0^p \to \infty$, $h_0 \to 0$, and $K \to \infty$, $T_N^*(h_n^*)$ also achieves asymptotic normality under the additional condition $\gamma_n \to \infty$. Moreover, this statistic exhibits an adaptiveness property, enhancing its suitability across a broader range of alternative hypotheses.
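The penalized selection rule can be sketched as follows, again in Python with an assumed Gaussian kernel; the function and argument names are illustrative, not from the paper. Note that for $h = h_0$ the penalty vanishes, so a large $\gamma_n$ pushes the choice toward $h_0$, mirroring the behavior under the null described above.

```python
import numpy as np

def select_bandwidth(subsets, H, gamma, h0=None):
    """Penalized bandwidth choice in the spirit of Guerre & Lavergne (2005),
    applied to the distributed statistic: maximize
        (1/K) sum_k V_k(h) - gamma * vhat(h, h0)
    over the candidate set H, where h0 is the largest candidate."""
    if h0 is None:
        h0 = max(H)
    K_num = len(subsets)
    n, p = subsets[0][0].shape

    def kernel_mat(x, h):
        d = (x[:, None, :] - x[None, :, :]) / h
        M = np.exp(-0.5 * (d ** 2).sum(axis=2)) / ((2 * np.pi) ** (p / 2) * h ** p)
        np.fill_diagonal(M, 0.0)
        return M

    best_h, best_crit = None, -np.inf
    for h in H:
        Vbar, s2 = 0.0, 0.0
        for x, e in subsets:
            Kh, Kh0 = kernel_mat(x, h), kernel_mat(x, h0)
            e2 = e ** 2
            Vbar += (e @ Kh @ e) / (n * (n - 1)) / K_num
            # estimated null variance of the averaged contrast V_k(h) - V_k(h0)
            s2 += 2.0 * (e2 @ ((Kh - Kh0) ** 2) @ e2) / (K_num ** 2 * n ** 2 * (n - 1) ** 2)
        crit = Vbar - gamma * np.sqrt(s2)
        if crit > best_crit:
            best_h, best_crit = h, crit
    return best_h
```

With a very large penalty the criterion is dominated by the standard-deviation term for every $h \neq h_0$, so the largest bandwidth is returned; with a moderate penalty the data can override this preference under an alternative.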

3. The Unique Rate of the Smoothing Parameter Ensuring Rate-Optimality in the DZH Test

Our previous study in Zhao et al. [18] demonstrated that the optimal power of the test depends heavily on the set of bandwidth candidates $H_n$: this set should contain the optimal bandwidth rate $\tilde h_n \asymp N^{-2/(4s+p)}$ to achieve the desired performance. However, the uniqueness of this bandwidth rate was not established there. In this section, we demonstrate the uniqueness of the rate $N^{-2/(4s+p)}$, which is critical for ensuring the rate optimality of the DZH test. Let the Hölder class $C_p(L,s)$ be the set of maps $\Delta(\cdot)$ from $[0,1]^p$ to $\mathbb{R}$ with
$$C_p(L,s) = \big\{\Delta(\cdot): |\Delta(x) - \Delta(y)| \le L\|x - y\|^s \text{ for all } x, y \in [0,1]^p\big\}, \qquad s \in (0,1],$$
$$C_p(L,s) = \big\{\Delta(\cdot): \text{the } \lfloor s\rfloor\text{th partial derivatives of } \Delta(\cdot) \text{ are in } C_p(L, s - \lfloor s\rfloor)\big\}, \qquad s > 1,$$
where $\lfloor s\rfloor$ is the lower integer part of s. Consider the following alternative hypothesis:
$$H_1(\kappa_N\rho_N) = \big\{m_N(x): E\big(\Delta_N^2(X)\big) \ge \kappa_N^2\rho_N^2,\ \Delta_N(\cdot) \in C_p(L,s) \text{ for fixed } s > p/4\big\},$$
where $\Delta_N(\cdot) = m_N(\cdot) - g(\cdot;\theta^*)$ and $\rho_N = N^{-2s/(4s+p)}$. Here, $\rho_N$ is the optimal minimax rate for nonparametric specification testing in regression models with known $s > p/4$ for the Hölder class above (see Guerre and Lavergne [13]).
Theorem 2.
Suppose Assumptions 1–6 hold. Then $\tilde h_n \asymp N^{-2/(4s+p)}$ is the only bandwidth rate such that the test $I\{n h_n^{p/2} K^{1/2} T_N(h_n) \ge z_\alpha\}$ can be uniformly consistent against $\{m_N(\cdot)\}_{N\ge 1} \subset H_1(\kappa_N\rho_N)$ for any $\kappa_N \to \infty$.

4. Simulation Studies

In this section, we present simulation studies to examine the size and power of the tests based on the statistics $T_N(h_n)$ and $T_N^*(h_n^*)$, denoted DZH and MD, respectively. We choose $p = 2$, $N = 2000, 4000, 8000$, and $K = 10, 20, 40$. To demonstrate the adaptiveness of the MD test relative to the DZH test while maintaining the type I error rate of both tests, we select $H_n = \{h_1 = 0.28, h_2 = 0.14, h_3 = 0.07\}$. For the MD test, we adopt the penalty sequence $\gamma_n = c\sqrt{2\ln(|H_n|)}$ with $c = 2$, as recommended in Guerre and Lavergne [22], where $|H_n|$ denotes the cardinality of $H_n$. Two models are used to generate the response variable Y.
  • M1: $Y = 1 + X_1 + X_2 + \varepsilon$
  • M2: $Y = 1 + X_1 + X_2 + \sin(bX_1) + \varepsilon$, $b \in \{0.8, 1, 10\}$
We define the variables $X_1 = Z_1$, a simple assignment, and $X_2 = (Z_1 + Z_2)/\sqrt{2}$, which combines the influences of two independent factors under a transformation that maintains variance consistency. To rigorously test the robustness of the proposed method against different designs, $Z_1$ and $Z_2$ are independently drawn either from the standard normal distribution, which provides a baseline due to its well-known properties, or from Student’s t-distribution with 5 degrees of freedom, known for its heavier tails and greater kurtosis; this enables an examination of the test’s sensitivity to deviations from normality. Furthermore, to assess the impact of the error distribution on test performance, we explore three distributions for the error term ε: the standard normal distribution, representing ideal conditions; the standardized exponential distribution, which introduces skewness; and Student’s t-distribution with 5 degrees of freedom, which tests the resilience of the method against heavy-tailed errors. This comprehensive design allows us to determine the test’s effectiveness and reliability across scenarios reflective of real-world data complexities. The kernel function K used is the bivariate standard normal density function. M1 is used to assess the size of the tests. To investigate power against high- and low-frequency alternatives, M2 is considered, in which small (large) values of b represent low-frequency (high-frequency) alternatives.
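The simulation designs M1 and M2 can be generated as in the following Python sketch (the paper's analyses were run in R; the $\sqrt{2}$ scaling of $X_2$ and the function name are our reading of the variance-consistency transformation, so treat both as assumptions):

```python
import numpy as np

def generate_data(N, b=0.0, dist="normal", seed=0):
    """Simulate from model M1 (b = 0) or M2 (b != 0):
        Y = 1 + X1 + X2 + sin(b * X1) + eps,
    with X1 = Z1 and X2 = (Z1 + Z2) / sqrt(2), so both predictors have unit
    variance when Z1 and Z2 are standardized."""
    rng = np.random.default_rng(seed)
    if dist == "normal":
        Z = rng.standard_normal((N, 3))
    else:
        # Student's t with 5 df, rescaled to unit variance (Var = 5/3)
        Z = rng.standard_t(5, size=(N, 3)) / np.sqrt(5 / 3)
    X1, X2 = Z[:, 0], (Z[:, 0] + Z[:, 1]) / np.sqrt(2)
    Y = 1 + X1 + X2 + np.sin(b * X1) + Z[:, 2]
    return np.column_stack([X1, X2]), Y
```

Setting `b=0` recovers the null model M1; `b=10` gives the high-frequency alternative, where a small bandwidth is needed to resolve the oscillation.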
The empirical sizes for the different error distributions are reported in Table 1, Table 2, Table 3 and Table 4, demonstrating that both methods effectively maintain the type I error rate. The variation of power with sample size N and with K is described in Table 5 and Table 6 and Figure 1, Figure 2, Figure 3 and Figure 4. DZH tests under the three bandwidths are included in the power comparison tables. The results indicate that power loss increases with larger K, a consequence of the information loss inherent in the divide-and-conquer procedure. For low-frequency alternatives ($b = 0.8, 1$), the power of the DZH test improves as h increases. For the high-frequency setting $b = 10$, however, the DZH test exhibits the opposite trend in h. The MD test performs comparably to the best scenario of the DZH test for both low- and high-frequency alternatives, demonstrating its adaptive capability; this adaptiveness makes it suitable for a broader range of alternative hypotheses. The comparison between Figure 1 and Figure 4 shows that the proposed test exhibits higher power when $Z_1$ and $Z_2$ are generated from the Student’s t(5) distribution rather than the normal distribution, underscoring the robustness of our method against heavy-tailed covariates. The power of all tests under the exponential error distribution is comparable to that under the normal distribution, but there is a noticeable decrease in power when the errors follow the Student’s t distribution. All analyses were conducted using R version 4.3.2.

5. Proofs

Some Lemmas

In this section, we restate three lemmas from Zhao et al. [18], omitting detailed proofs for brevity. Lemma 2 is restated under the assumption $n h_n^p \to \infty$. We assume $q = 1$ without loss of generality. Denote
$$e_{ik} = y_{ik} - g(x_{ik};\hat\theta_n), \quad \varepsilon_{ik} = y_{ik} - m(x_{ik}), \quad u_{ik} = g(x_{ik};\hat\theta_n) - g(x_{ik};\theta^*), \quad \triangle_{ik} = m(x_{ik}) - g(x_{ik};\theta^*).$$
We introduce the following matrix notation:
$$e_k = (e_{ik})_{1\le i\le n}, \quad \varepsilon_k = (\varepsilon_{ik})_{1\le i\le n}, \quad u_k = (u_{ik})_{1\le i\le n}, \quad \triangle_k = (\triangle_{ik})_{1\le i\le n},$$
$$W_k(h_n) = (w_{ik,jk})_{1\le i,j\le n}, \qquad w_{ik,jk} = \begin{cases} \dfrac{1}{n(n-1)} K_{h_n}(x_{ik}-x_{jk}), & i \neq j; \\ 0, & i = j. \end{cases}$$
Under $H_1$, $\triangle_{ik} \neq 0$ and
$$V_k = (\triangle_k + \varepsilon_k)^\top W_k(h_n)(\triangle_k + \varepsilon_k) - 2\triangle_k^\top W_k(h_n) u_k - 2\varepsilon_k^\top W_k(h_n) u_k + u_k^\top W_k(h_n) u_k = V_{1k} - 2V_{2k} - 2V_{3k} + V_{4k}.$$
Under $H_0$, $\triangle_{ik} = 0$, so $V_k = V_{1k} - 2V_{3k} + V_{4k}$, where $V_{1k} = \varepsilon_k^\top W_k(h_n)\varepsilon_k$. Following Zhao et al. [18], we decompose $n h_n^{p/2} K^{1/2} T_N$ in the following way:
$$n h_n^{p/2} K^{1/2} T_N = n h_n^{p/2}\hat\delta^{-1}K^{-1/2}\sum_{k=1}^{K} V_k = n h_n^{p/2}\hat\delta^{-1}K^{-1/2}\sum_{k=1}^{K} V_{1k} - 2 n h_n^{p/2}\hat\delta^{-1}K^{-1/2}\sum_{k=1}^{K} V_{2k} - 2 n h_n^{p/2}\hat\delta^{-1}K^{-1/2}\sum_{k=1}^{K} V_{3k} + n h_n^{p/2}\hat\delta^{-1}K^{-1/2}\sum_{k=1}^{K} V_{4k} = \bar V_1 - 2\bar V_2 - 2\bar V_3 + \bar V_4.$$
Lemma 1.
Given Assumptions 1–5, under the null hypothesis, $\delta^2$ can be consistently estimated by $\hat\delta^2(h_n)$ as $n h_n^p \to \infty$, $h_n \to 0$, and $K \to \infty$.
Lemma 2.
Given Assumptions 1–5,
1. $\bar V_2 = O_p\big(n h_n^{p/2} E(\Delta_N^2(x))\big)$ uniformly for $m(\cdot)$ under $H_1$, as $h_n \to 0$, $n h_n^p \to \infty$, $K \to \infty$.
2. $\bar V_3 = O_p\big(h_n^{p/2}/K\big) = o_p(1)$ uniformly for $m(\cdot)$ under $H_1$ and $H_0$, as $h_n \to 0$, $n h_n^p \to \infty$, $K \to \infty$.
3. $\bar V_4 = O_p\big(h_n^{p/2}/K\big) = o_p(1)$ uniformly for $m(\cdot)$ under $H_1$ and $H_0$, as $h_n \to 0$, $n h_n^p \to \infty$, $K \to \infty$.
Lemma 3.
Denote $\tilde V_1 = n h_n^{p/2} K^{-1/2}\sum_{k=1}^{K} V_{1k}$. Under Assumptions 1–6, for any $m_N(\cdot) \in H_1(\kappa_N\rho_N)$ and n large enough, where $\rho_N = N^{-2s/(4s+p)}$, we have
$$E(\tilde V_1) \ge C_1 K^{1/2} n h_n^{p/2}\, E\big(\Delta_N^2(X)\big), \qquad \mathrm{Var}(\tilde V_1) \le C_2\, n h_n^p\, E\big(\Delta_N^2(x)\big) + C_3.$$
When
$$\kappa_N\rho_N \ge \sqrt{\frac{\Lambda_n + r(P_{h_n})/\Lambda_n}{n h_n^p}} + C_0 L h_n^s, \qquad \Lambda_n = \frac{E(\pi_k^\top P_{h_n}\pi_k)}{E(\pi_k^\top \pi_k)},$$
we have
$$E(\tilde V_1) \ge C_1 K^{1/2} n h_n^{p/2}\,\Lambda_n\Big\{\sqrt{E\big(\Delta_N^2(x)\big)} - \big(\Lambda_n + r(P_{h_n})\big)\, C_0 L h_n^s\Big\}^2.$$
Proof of Theorem 2.
We first construct an alternative $m_N(\cdot)$ based on $\tilde h_n$:
$$m_N(\cdot) = g(\cdot;\theta^*) + \Delta_N(\cdot).$$
Using the method in Guerre and Lavergne [13] to construct the alternatives $\Delta_N(\cdot)$, define
$$I_t = \prod_{i=1}^{p}\big[t_i\tilde h_n, (t_i+1)\tilde h_n\big)$$
for $t \in \mathcal{Y}_n$, where
$$\mathcal{Y}_n = \{t: t = (t_1,\dots,t_p) \in \mathbb{N}^p,\ 0 \le t_i \le \gamma_n - 1\}.$$
Then $I_t \subset [0,1]^p$; without loss of generality, we assume that $\gamma_n = 1/\tilde h_n$ is an integer. Let
$$\varphi_t(x) = \varphi\Big(\frac{x - t\tilde h_n}{\tilde h_n}\Big), \qquad t \in \mathcal{Y}_n,$$
where the $\varphi_t(\cdot)$ are orthogonal with disjoint supports $I_t$, and $\varphi(\cdot)$ is bounded and nonnegative. Let $\{B_t, t \in \mathcal{Y}_n\}$ be any sequence with $|B_t| = 1$ for all $t$, and set
$$\Delta_N(x) = \kappa_N\rho_N\sum_{t\in\mathcal{Y}_n} B_t\varphi_t(x) = \kappa_N\rho_N\sum_{t\in\mathcal{Y}_n} B_t\varphi\Big(\frac{x - t\tilde h_n}{\tilde h_n}\Big).$$
Under Assumption 3, there exists a constant C such that $E\{m_N^4(X)\} \le C < \infty$ and $m_N(\cdot) \in H_1(\kappa_N\rho_N)$. Since
$$\inf_{m(\cdot)\in H_1(\kappa_N\rho_N)} P\Big(n h_n^{p/2}K^{1/2}T_N(h_n) \ge z_\alpha\Big) \le P\Big(n h_n^{p/2}K^{1/2}T_N(h_n) \ge z_\alpha\Big)$$
for any $m(\cdot) \in H_1(\kappa_N\rho_N)$, the main idea is as follows: if $I\{n h_n^{p/2}K^{1/2}T_N(h_n) \ge z_\alpha\}$ cannot be consistent against the alternatives $m_N(\cdot)$ as $\kappa_N \to \infty$, then it is also not uniformly consistent against $m_N(\cdot) \in H_1(\kappa_N\rho_N)$ as $\kappa_N \to \infty$.
By Lemmas 1–3, the test statistic satisfies
$$n h_n^{p/2}K^{1/2}T_N(h_n) = n h_n^{p/2}\,\hat\delta(h_n)^{-1}K^{-1/2}\sum_{k=1}^{K} V_k = \hat\delta(h_n)^{-1}\tilde V_1 - 2 n h_n^{p/2}\,\hat\delta(h_n)^{-1}K^{-1/2}\sum_{k=1}^{K} V_{2k} + o_p(1) = O_p(1)\Big\{E(\tilde V_1) + O_p\big(\sqrt{\mathrm{Var}(\tilde V_1)}\big)\Big\} - 2\bar V_2 + o_p(1).$$
Without loss of generality, we assume that $K(\cdot)$ has support $[-1,1]^p$ in the following proofs.
(i) For $h_n = a_N\tilde h_n$ with $a_N \to \infty$:
$$\begin{aligned}
E(\tilde V_1) &= K^{1/2} n h_n^{p/2} E(V_{1k}) = K^{1/2} n h_n^{p/2} E\Big[\frac{1}{h_n^p}K\Big(\frac{X_1 - X_2}{h_n}\Big)\Delta(X_1)\Delta(X_2)\Big] \\
&= K^{1/2} n h_n^{p/2}\kappa_N^2\rho_N^2\,\frac{1}{h_n^p}\sum_{t\in\mathcal{Y}_n}\int_{I_t}\int_{I_t} K\Big(\frac{x_1 - x_2}{h_n}\Big)\varphi\Big(\frac{x_1 - t\tilde h_n}{\tilde h_n}\Big)\varphi\Big(\frac{x_2 - t\tilde h_n}{\tilde h_n}\Big) f(x_1)f(x_2)\,dx_1\,dx_2 \\
&= K^{1/2} n h_n^{p/2}\kappa_N^2\rho_N^2\,\frac{\tilde h_n^{2p}}{h_n^p}\sum_{t\in\mathcal{Y}_n}\int_0^1\int_0^1 K\Big(\frac{u - v}{a_N}\Big)\varphi(u)\varphi(v)\, f(u\tilde h_n + t\tilde h_n)f(v\tilde h_n + t\tilde h_n)\,du\,dv \\
&\le C\, K^{1/2} n h_n^{p/2}\kappa_N^2\rho_N^2\,\frac{\tilde h_n^{p}}{h_n^p}\,(\tilde h_n\gamma_n)^p\Big(\int_0^1\varphi(u)\,du\Big)^2 = O\big(\kappa_N^2 K^{-1/2} a_N^{-p/2}\big).
\end{aligned}$$
(ii) For $h_n = \tilde h_n/a_N$ with $a_N \to \infty$:
$$\begin{aligned}
E(\tilde V_1) &= K^{1/2} n h_n^{p/2} E(V_{1k}) = K^{1/2} n h_n^{p/2}\kappa_N^2\rho_N^2\,\frac{1}{h_n^p}\sum_{t\in\mathcal{Y}_n}\int_{I_t}\int_{I_t} K\Big(\frac{x_1 - x_2}{h_n}\Big)\varphi\Big(\frac{x_1 - t\tilde h_n}{\tilde h_n}\Big)\varphi\Big(\frac{x_2 - t\tilde h_n}{\tilde h_n}\Big) f(x_1)f(x_2)\,dx_1\,dx_2 \\
&\le C\, K^{1/2} n h_n^{p/2}\kappa_N^2\rho_N^2\,\tilde h_n^{p}\sum_{t\in\mathcal{Y}_n}\int_0^1\int_{-1}^1 K(u)\,\varphi\Big(u\,\frac{h_n}{\tilde h_n} + v\Big)\varphi(v)\, f(u h_n + t\tilde h_n + v\tilde h_n)f(v\tilde h_n + t\tilde h_n)\,du\,dv \\
&= O\big(K^{1/2} n h_n^{p/2}\kappa_N^2\rho_N^2\,(\tilde h_n\gamma_n)^p\big) = O\big(\kappa_N^2 K^{-1/2} a_N^{-p/2}\big).
\end{aligned}$$
Through a tedious calculation, we can also obtain $\mathrm{Var}(\tilde V_1) = O(1)$ and $\bar V_2 = o_p(1)$ as $h_n \to 0$ in both cases above. Therefore, for any $a_N \to \infty$, there exists $\kappa_N \to \infty$ such that $n h_n^{p/2}K^{1/2}T_N(h_n) = O_p(1)$ as $h_n \to 0$, so $P\big(n h_n^{p/2}K^{1/2}T_N(h_n) \ge z_\alpha\big) \to 1$ cannot hold. The same conclusion follows for $\inf_{m(\cdot)\in H_1(\kappa_N\rho_N)} P\big(n h_n^{p/2}K^{1/2}T_N(h_n) \ge z_\alpha\big)$. Thus, the theorem is proved. □

Author Contributions

Conceptualization, Y.Z.; methodology, Y.Z.; formal analysis, P.L. and Y.Z.; investigation, Y.Z. and L.X.; writing—original draft preparation, P.L. and Y.Z.; writing—review and editing, P.L., Y.Z. and L.X.; visualization, P.L., Y.Z. and T.W.; supervision, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (12201351), the Natural Science Foundation of Shandong Province (ZR2022QA013), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (24KJB110024), the Qinglan Project of Jiangsu Province of China [2022], and the Huai’an City Science and Technology Project (HAB202357).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Zaremba, W.; Gretton, A.; Blaschko, M. B-test: A Non-parametric, Low Variance Kernel Two-sample Test. Adv. Neural Inf. Process. Syst. 2013, 26, 755–763. [Google Scholar]
  2. Chen, X.Y.; Xie, M.G. A split-and-conquer approach for analysis of extraordinarily large data. Stat. Sin. 2014, 24, 1655–1684. [Google Scholar]
  3. Battey, H.; Fan, J.; Liu, H.; Lu, J.; Zhu, Z. Distributed Estimation and Inference with Statistical Guarantees. arXiv 2015, arXiv:1509.05457. [Google Scholar]
  4. Fan, J.; Han, F.; Liu, H. Challenges of Big Data analysis. Natl. Sci. Rev. 2014, 1, 293–314. [Google Scholar] [CrossRef] [PubMed]
  5. Hardle, W.; Mammen, E. Comparing nonparametric versus parametric regression fits. Ann. Statist. 1993, 21, 1926–1947. [Google Scholar] [CrossRef]
  6. Neumeyer, N.; Van Keilegom, I. Estimating the error distribution in nonparametric multiple regression with applications to model testing. J. Multivar. Anal. 2010, 101, 1067–1078. [Google Scholar] [CrossRef]
  7. González-Manteiga, W.; Crujeiras, R.M. An updated review of Goodness-of-Fit tests for regression models. Test 2013, 22, 361–411. [Google Scholar] [CrossRef] [PubMed]
  8. Delgado, M. Testing the equality of nonparametric regression curves. Stat. Probab. Lett. 1993, 17, 199–204. [Google Scholar] [CrossRef]
  9. Bierens, H.J. A consistent conditional moment test of functional form. Econometrica 1990, 58, 1443–1458. [Google Scholar] [CrossRef]
  10. Hart, J.D. Nonparametric Smoothing and Lack-of-Fit Tests, 1st ed.; Springer: New York, NY, USA, 1997. [Google Scholar]
  11. Ingster, Y.I. Minimax nonparametric detection of signals in white Gaussian noise. Probl. Inf. Transm. 1982, 18, 130–140. [Google Scholar]
  12. Ingster, Y.I. Asymptotically minimax hypothesis testing for nonparametric alternatives I, II, III. Math. Methods Stat. 1993, 2, 85–114. [Google Scholar]
  13. Guerre, E.; Lavergne, P. Optimal minimax rates for nonparametric specification testing in regression models. Econom. Theory 2002, 18, 1139–1171. [Google Scholar] [CrossRef]
  14. Cai, L.; Guo, X.; Zhong, W. Test and Measure for Partial Mean Dependence Based on Machine Learning Methods. J. Am. Stat. Assoc. 2024, 1–13. [Google Scholar] [CrossRef]
  15. Tan, F.; Zhu, L. Adaptive-to-model checking for regressions with diverging number of predictors. Ann. Stat. 2019, 47, 1960–1994. [Google Scholar] [CrossRef]
  16. Han, Y.; Ma, P.; Ren, H.; Wang, Z. Model checking in large-scale data set via structure-adaptive-sampling. Stat. Sin. 2023, 33, 303–329. [Google Scholar]
  17. Zheng, J.X. A consistent test of functional form via nonparametric estimation techniques. J. Econom. 1996, 75, 263–289. [Google Scholar] [CrossRef]
  18. Zhao, Y.; Zou, C.; Wang, Z. A scalable nonparametric specification testing for massive data. J. Stat. Plan. Inference 2019, 200, 161–175. [Google Scholar] [CrossRef]
  19. Zhao, Y.; Zou, C.; Wang, Z. An adaptive lack of fit test for big data. Stat. Theory Relat. Fields 2017, 1, 59–68. [Google Scholar] [CrossRef]
  20. Ibragimov, I.A.; Khasminski, R.Z. Statistical Estimation: Asymptotic Theory, 1st ed.; Springer: Berlin/Heidelberg, Germany, 1981. [Google Scholar]
  21. Horowitz, J.; Spokoiny, V. An adaptive, rate-optimal test of parametric mean-regression model against a nonparametric alternative. Econometrica 2001, 69, 599–631. [Google Scholar] [CrossRef]
  22. Guerre, E.; Lavergne, P. Data-driven rate-optimal specification testing in regression models. Ann. Stat. 2005, 33, 840–870. [Google Scholar] [CrossRef]
Figure 1. Power comparison of the MD test and DZH test based on different bandwidths under model M2 with b = 0.8 and K = 40 . Error term ε is generated from standard normal distribution.
Figure 2. Power comparison of the MD test and DZH test based on different bandwidths under model M2 with b = 0.8 and K = 40 . Error term ε is generated from standardized exponential distribution.
Figure 3. Power comparison of the MD test and DZH test based on different bandwidths under model M2 with b = 0.8 and K = 40 . Error term ε is generated from Student’s t distribution with 5 degrees of freedom.
Figure 4. Power comparison of the MD test and the DZH test based on different bandwidths under model M2 with b = 0.8 and K = 40. The error term ε is generated from the standard normal distribution. Z1 and Z2 are generated from Student's t distribution with 5 degrees of freedom.
Table 1. Empirical sizes (%) for different values of N with α = 1%, 5%, 10% when the error term follows the standard normal distribution.

                                      MD                                  DZH
                    h1                h2                h3
 N     α\K   10    20    40    10    20    40    10    20    40    10    20    40
 2000  1%    1.3   1.3   1.1   0.8   1.0   1.0   0.7   0.7   0.6   0.6   0.9   0.8
       5%    5.1   5.8   5.3   4.4   5.3   4.8   5.3   4.3   4.4   4.6   5.8   5.5
       10%   11.1  10.1  11.3  9.9   9.7   10.0  10.3  10.0  10.7  11.6  12.1  11.6
 4000  1%    1.3   1.2   1.1   1.0   1.0   0.9   1.0   0.8   0.7   0.9   0.5   0.6
       5%    5.4   5.7   4.8   4.7   5.1   4.2   4.9   4.6   4.6   5.6   4.9   4.3
       10%   9.7   10.4  9.8   9.2   9.9   8.9   9.2   8.8   9.8   10.1  9.2   8.8
 8000  1%    0.5   1.0   0.5   0.4   0.7   0.4   0.4   0.8   0.5   1.0   1.0   0.6
       5%    5.2   5.1   4.1   4.2   4.5   3.4   5.0   4.9   4.3   5.2   5.5   4.3
       10%   10.8  10.7  10.2  10.1  9.5   9.3   9.9   9.4   9.3   10.9  10.4  8.7
Table 2. Empirical sizes (%) for different values of N with α = 1%, 5%, 10% when the error term follows the standardized exponential distribution.

                                      MD                                  DZH
                    h1                h2                h3
 N     α\K   10    20    40    10    20    40    10    20    40    10    20    40
 2000  1%    1.7   2.2   1.6   1.5   1.9   1.5   0.7   0.8   1.5   1.1   0.7   1.1
       5%    5.7   5.5   5.5   5.2   5.1   5.0   5.8   4.9   4.9   6.2   5.1   5.1
       10%   10.6  10.8  9.9   10.0  9.9   9.6   12.1  10.5  9.7   11.5  11.0  9.8
 4000  1%    1.3   0.8   0.7   1.2   0.5   0.7   1.0   0.5   1.2   1.2   0.6   0.6
       5%    5.8   5.2   4.8   4.9   4.6   4.3   5.1   4.9   4.6   4.8   4.9   3.8
       10%   11.3  9.7   10.4  10.3  9.0   9.8   8.9   9.4   9.3   8.9   9.8   8.6
 8000  1%    1.0   1.1   1.0   0.9   1.0   0.8   1.4   1.2   0.5   1.1   1.3   0.6
       5%    4.9   5.5   3.4   4.3   4.5   2.9   4.5   4.6   2.9   4.7   5.4   4.1
       10%   10.1  10.3  8.1   9.3   9.5   7.4   8.3   9.4   8.3   9.0   9.3   9.8
Table 3. Empirical sizes (%) for different values of N with α = 1%, 5%, 10% when the error term follows Student's t distribution with 5 degrees of freedom.

                                      MD                                  DZH
                    h1                h2                h3
 N     α\K   10    20    40    10    20    40    10    20    40    10    20    40
 2000  1%    1.1   0.7   1.1   1.1   0.6   0.7   0.6   0.6   0.4   0.7   0.6   0.6
       5%    4.8   4.7   4.7   4.3   4.2   4.4   4.0   4.0   4.9   4.3   3.8   4.8
       10%   9.9   9.8   10.4  9.2   9.1   9.2   8.0   8.6   10.1  10.4  8.8   9.0
 4000  1%    1.2   1.6   1.5   1.1   1.1   1.2   1.0   1.3   1.1   0.8   1.1   0.6
       5%    5.3   5.6   5.8   4.9   5.1   5.2   5.0   5.5   5.5   5.2   5.4   5.2
       10%   10.0  10.4  10.5  9.0   9.5   9.5   10.2  10.4  9.1   11.6  9.3   9.0
 8000  1%    1.2   1.0   1.5   1.0   0.9   1.2   0.5   0.5   1.3   0.6   0.9   1.1
       5%    5.2   3.7   5.2   4.5   3.2   4.5   3.4   3.4   4.8   4.7   4.4   5.0
       10%   9.1   8.8   9.9   8.3   7.9   9.1   8.5   8.6   10.1  9.9   10.1  11.4
Table 4. Empirical sizes (%) for different values of N with α = 1%, 5%, 10% when Z1 and Z2 are generated from Student's t(5) distribution.

                                      MD                                  DZH
                    h1                h2                h3
 N     α\K   10    20    40    10    20    40    10    20    40    10    20    40
 2000  1%    1.2   1.3   1.7   0.9   1.1   1.6   1.5   0.8   1.1   1.0   1.1   1.1
       5%    6.0   6.6   6.0   5.6   6.1   5.3   5.0   5.8   5.9   5.3   5.9   5.7
       10%   10.9  10.8  11.5  10.0  10.3  10.5  10.2  10.4  10.2  10.9  10.9  11.0
 4000  1%    0.9   0.5   1.1   0.7   0.5   0.8   0.9   1.0   1.4   1.2   1.3   1.1
       5%    5.7   4.9   4.9   4.9   4.3   4.6   4.8   4.2   4.5   5.4   5.4   5.3
       10%   11.3  10.4  9.9   9.6   9.8   8.6   10.1  9.2   9.6   10.8  9.8   10.1
 8000  1%    0.9   1.8   0.5   0.8   1.2   0.3   0.7   1.1   1.1   0.8   1.0   0.6
       5%    4.6   5.6   4.6   4.0   4.7   4.3   3.9   5.9   4.3   4.3   4.2   5.5
       10%   9.8   11.7  9.4   9.1   11.1  8.8   9.1   10.8  8.7   8.8   9.4   10.5
Table 5. Empirical power (%) of the MD and DZH tests with α = 5% when the error term follows the standard normal distribution.

                                      MD                                  DZH
                    h1                h2                h3
 N     b\K   10    20    40    10    20    40    10    20    40    10    20    40
 2000  1     83.4  60.9  36.8  81.9  59    35.5  45.1  25.9  14.5  17.3  10.9  7.4
       10    100   100   100   18.2  12.4  8.4   100   100   100   100   100   100
 4000  1     100   99    88.9  100   98.9  88.3  94.3  74.4  46.5  48.5  27    15.7
       10    100   100   100   62.8  36.6  20    100   100   100   100   100   100
 8000  1     100   100   100   100   100   100   100   99.9  94.9  96.3  75.9  48.4
       10    100   100   100   100   93.5  67.1  100   100   100   100   100   100
Table 6. Empirical power (%) of the MD and DZH tests with α = 5% when the error term follows the standardized exponential distribution.

                                      MD                                  DZH
                    h1                h2                h3
 N     b\K   10    20    40    10    20    40    10    20    40    10    20    40
 2000  1     82.7  61.2  38.2  82.4  59.9  36.7  44.4  26.5  16.1  15.8  9.6   5.6
       10    100   100   100   19.1  11.8  7.5   100   100   99.9  100   100   100
 4000  1     99.8  98.5  89.7  99.8  98.4  88.9  94.9  72    44.1  46.9  25.2  13.3
       10    100   100   100   66.3  34.2  19.1  100   100   100   100   100   100
 8000  1     100   100   100   100   100   100   100   99.7  96.1  97    78.8  50.6
       10    100   100   100   99.8  94.1  66.4  100   100   100   100   100   100
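The entries in Tables 1–6 are rejection frequencies over repeated simulations: each replication generates a dataset, splits it into K blocks, computes a smoothing-based lack-of-fit statistic on each block, combines the block statistics, and rejects when the combined statistic exceeds the normal critical value. The following is a minimal Monte Carlo sketch of this workflow under H0. It is illustrative only: the per-block statistic is a generic Zheng-type kernel statistic (not the paper's MD or DZH statistics), and the linear null model, Gaussian kernel, and the values of N, K, h, and the replication count are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def block_stat(y, x, h):
    """Standardized kernel lack-of-fit statistic for one block
    (a generic Zheng-type statistic; a toy stand-in, not MD/DZH)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # fit the null linear model
    e = y - X @ beta                                # parametric residuals
    W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)  # Gaussian kernel
    np.fill_diagonal(W, 0.0)                        # drop own-observation terms
    num = e @ W @ e                                 # sum_{i != j} W_ij e_i e_j
    var = 2.0 * (e**2) @ (W**2) @ (e**2)            # plug-in variance estimate
    return num / np.sqrt(var)

def empirical_size(N=400, K=4, h=0.3, alpha=0.05, reps=200):
    """Rejection frequency of the distributed test under H0."""
    zcrit = 1.6449                                  # one-sided 5% N(0,1) quantile
    rej = 0
    for _ in range(reps):
        x = rng.uniform(size=N)
        y = 1.0 + 2.0 * x + rng.standard_normal(N)  # H0: linear model is true
        blocks = np.array_split(rng.permutation(N), K)
        # Average the K standardized block statistics; under H0 the
        # sqrt(K)-scaled average is approximately N(0, 1).
        T = np.sqrt(K) * np.mean([block_stat(y[i], x[i], h) for i in blocks])
        rej += (T > zcrit)
    return rej / reps

print(empirical_size())
```

With enough replications the rejection frequency under H0 should be roughly near the nominal 5% level, mirroring the 5% rows of Tables 1–4; replacing the data-generating line with a departure from the null model would instead produce power entries as in Tables 5 and 6.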
Liu, P.; Zhao, Y.; Xu, L.; Wang, T. Optimal Minimax Rate of Smoothing Parameter in Distributed Nonparametric Specification Test. Axioms 2025, 14, 228. https://doi.org/10.3390/axioms14030228
