Article

Optimal Minimax Rate of Smoothing Parameter in Distributed Nonparametric Specification Test

1 Department of Biostatistics, School of Public Health, Shandong University, Jinan 250021, China
2 School of Mathematical Sciences, Soochow University, Suzhou 215006, China
3 School of Mathematics and Statistics, Huaiyin Normal University, Huai’an 223300, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Axioms 2025, 14(3), 228; https://doi.org/10.3390/axioms14030228
Submission received: 5 February 2025 / Revised: 15 March 2025 / Accepted: 18 March 2025 / Published: 19 March 2025
(This article belongs to the Special Issue Advances in Statistical Simulation and Computing)

Abstract

A model specification test is a statistical procedure for assessing whether a given statistical model accurately represents the underlying data-generating process. The smoothing-based nonparametric specification test is widely used because of its efficiency against “singular” local alternatives. However, large modern datasets create computational problems when implementing nonparametric specification tests. The divide-and-conquer algorithm is highly effective for handling large datasets, since it breaks a large dataset into manageable subsets. By applying divide-and-conquer, the nonparametric specification test can cope with the computational burden induced by massive sample sizes, improving scalability and reducing processing time. The selection of the smoothing parameter that yields optimal power for the distributed algorithm, however, remains an important problem. The rate of the smoothing parameter that ensures rate optimality when testing the specification of a nonlinear parametric regression function has been studied in the literature. In this paper, we verify the uniqueness of the rate of the smoothing parameter that ensures the rate optimality of divide-and-conquer-based tests. By employing a penalty method to select the smoothing parameter, we obtain a test with an asymptotically normal null distribution and adaptiveness properties. The performance of this test is further illustrated through numerical simulations.

1. Introduction

Big datasets characterized by large sample sizes N and/or high dimension p are increasingly accessible. In this paper, we focus on datasets with massive sample size N and low dimension p. However, directly making inferences from such large datasets is computationally infeasible due to limitations in processor memory, which makes selecting an appropriate model for big data particularly challenging. The divide-and-conquer approach is intuitive and has been widely employed across various fields to tackle diverse problems. Zaremba et al. [1] utilized this strategy to address two-sample test problems. In situations involving large sample sizes or high-dimensional predictors, Chen and Xie [2] applied the divide-and-conquer methodology for variable selection in generalized linear models. Battey et al. [3] integrated the divide-and-conquer algorithm with high-dimensional hypothesis testing and estimation. Additionally, as noted in [4], samples in big datasets are often aggregated from multiple sources. Therefore, feasible and robust specification testing methods, essential for addressing model misspecification, are critical for handling massive datasets.
Suppose we have a sequence of independent observations $\{(y_i, x_i)\}_{i=1}^{N}$ drawn from a population $(Y, X) \in \mathbb{R}\times[0,1]^p$, where the unknown regression function $E(Y \mid X = x) = m(x)$ is assumed to be smooth. In this context, a specification test is needed to assess the functional form of the regression and justify the use of a parametric model. Given a parametric family of known real functions $g(x;\theta)$, the null and alternative hypotheses can be described as follows:
$$H_0: m(x) = g(x, \theta_0) \quad \text{for some } \theta_0 \in \Theta,$$
$$H_1: m(x) \neq g(x, \theta_0) \quad \text{for all } \theta_0 \in \Theta,$$
where $\Theta \subset \mathbb{R}^q$ denotes the parameter space. This hypothesis testing problem has been widely studied in the literature. One category of approaches measures the distance between the estimator under the null and the nonparametric estimator under alternative models (see Hardle and Mammen [5], Neumeyer and Van Keilegom [6], González-Manteiga and Crujeiras [7] and the references therein). Another competing approach relies on the empirical process of the residuals from the parametric model [8,9]. An important criterion for evaluating the behavior of these tests is their power performance under local alternatives (see, e.g., [10]). Additionally, Ingster [11,12] proposed an alternative way to investigate the asymptotic power properties of tests via the minimax approach. Guerre and Lavergne [13] further provided the optimal minimax rate for the smoothing parameter that ensures the rate optimality of the test in the context of testing the specification of a nonlinear parametric regression function. Conditional on a subset of covariates in regression modeling, Cai et al. [14] proposed a significance test for the partial mean independence problem based on machine learning methods and data splitting. Tan and Zhu [15] proposed a residual-marked empirical process that adapts to the underlying model, forming the basis of a goodness-of-fit test for parametric single-index models with a diverging number of predictors. However, existing methods that work well for moderate-sized datasets are not feasible for massive datasets due to computational limitations. Han et al. [16] developed an optimal sampling strategy that selects a small subset from a large pool of data to reduce the computational budget of model checking for big data. For test statistics that are quadratic forms [5,17], the computational complexity is $O(N^2)$, which presents a significant computational burden for large-scale data.
To address this issue, a divide-and-conquer-based test statistic was proposed in [18,19]. Zhao et al. [18] incorporated a divide-and-conquer strategy into the nonparametric test statistic of [17], along with a data-driven bandwidth selection procedure. However, this integrated approach can easily inflate the type I error rate. To mitigate issues associated with choosing smoothing parameters while preserving the type I error rate, Zhao et al. [19] proposed randomly splitting the observations into two subsets. In the first subset, an “optimal” smoothing parameter is selected based on a straightforward criterion. A lack-of-fit test grounded in asymptotic theory is then conducted on the second subset. This data-splitting strategy effectively controls the type I error rate, but sample splitting reduces power, as only a subset of the sample is used to construct the test statistic. Furthermore, the uniqueness of the rate of the smoothing parameter that ensures the rate optimality of the divide-and-conquer-based test statistic is not addressed in [18,19]. In this paper, we establish and verify the uniqueness of the rate of the smoothing parameter that guarantees the rate optimality of the divide-and-conquer-based test statistic.
Moreover, it is well known that the optimal smoothing parameters for testing differ from those that are optimal for estimation [11,12,20]. As a result, there has been growing interest in adaptive testing methods. One approach is to consider a set of suitable values for the bandwidth and proceed from there, as discussed in [21,22]. In this paper, we integrate the smoothing parameter selection method in [22] with the divide-and-conquer-based test statistic proposed in [18]. This combination leads to a computationally feasible and adaptive test statistic which retains its asymptotic normality under the null hypothesis.
The paper is organized as follows: Section 2 describes the test statistics and their corresponding asymptotic behavior under the null hypothesis. In Section 3, we demonstrate the unique rate of the smoothing parameter that ensures rate optimality in the DZH test. Section 4 presents simulation studies for illustration. The proofs of the theorems are provided in Section 5.

2. The Divide-And-Conquer-Based Test Statistics

The distributed test statistic proposed in Zhao et al. [18] is based on the test statistic in Zheng [17], where the kernel method is used to estimate the conditional moment $E\{\zeta_i E(\zeta_i \mid X_i) f(X_i)\}$, with $\zeta_i = y_i - g(x_i;\theta_0)$ and $f(\cdot)$ the density function of $x_i$. The kernel-based sample estimator of this quantity is
$$Q_N(h_N) = \frac{1}{N(N-1)}\sum_{i=1}^{N}\sum_{j\neq i}^{N} K_{h_N}(x_i - x_j)\, e_i e_j,$$
where $K_{h_N}(\cdot) = K(\cdot/h_N)/h_N^p$ denotes a p-dimensional kernel function, $h_N$ is a bandwidth depending on N, $e_i = y_i - g(x_i, \hat\theta_N)$, and $\hat\theta_N$ is an estimate of $\theta_0$ under the null hypothesis.
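As an illustration, the quadratic-form statistic $Q_N(h_N)$ can be computed directly from its definition. The Python sketch below (with a product Gaussian kernel as an assumed choice of K; function and variable names are ours) makes the $O(N^2)$ pairwise cost explicit:

```python
import numpy as np

def zheng_statistic(x, e, h):
    """Kernel quadratic-form statistic Q_N(h) in the spirit of Zheng (1996).

    x : (N, p) array of covariates, e : (N,) residuals under H0,
    h : bandwidth. Uses a product standard-normal kernel; cost is O(N^2).
    """
    N, p = x.shape
    # Pairwise scaled differences, shape (N, N, p)
    d = (x[:, None, :] - x[None, :, :]) / h
    # K_h(u) = K(u / h) / h^p with K the p-variate standard normal density
    K = np.exp(-0.5 * (d ** 2).sum(axis=2)) / ((2 * np.pi) ** (p / 2) * h ** p)
    np.fill_diagonal(K, 0.0)  # exclude the i == j terms
    return (e @ K @ e) / (N * (N - 1))
```

Because every pair $(i, j)$ contributes a kernel evaluation, both time and memory grow quadratically in N; this is exactly the bottleneck that the divide-and-conquer construction below addresses.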
When handling exceptionally large datasets where the sample size N becomes unmanageable, a test statistic combined with the divide-and-conquer procedure was proposed in Zhao et al. [18]. First, the dataset is partitioned into K equally sized subsets, each containing n observations. The test statistic based on the observations in the kth subset is
$$V_k(h_n) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{j\neq i}^{n} K_{h_n}(x_{ik}-x_{jk})\, e_{ik} e_{jk},$$
where the $e_{ik}$ are the fitted residuals from $\{(x_{ik}, y_{ik})\}_{i=1}^{n}$. Under mild conditions [17], $n h_n^{p/2} V_k(h_n)$ is asymptotically normal with mean zero and variance $\delta^2$, where
$$\delta^2 = 2\int K^2(u)\,du \int \{\sigma^2(x)\}^2 f^2(x)\,dx, \qquad \sigma^2(x) = E(\varepsilon_i^2 \mid x), \qquad \varepsilon_i = y_i - m(x_i).$$
Zhao et al. [18] then combined the subset statistics by averaging:
$$T_N(h_n) = \frac{1}{K\,\hat\delta(h_n)}\sum_{k=1}^{K} V_k(h_n),$$
where $\hat\delta^2(h_n)$ is an estimate of $\delta^2$. A natural estimator is $\hat\delta^2(h_n) = K^{-1}\sum_{k=1}^{K}\hat\delta_k^2(h_n)$, where
$$\hat\delta_k^2(h_n) = \frac{2}{n(n-1)}\sum_{i=1}^{n}\sum_{j\neq i}^{n} h_n^p K_{h_n}^2(x_{ik}-x_{jk})\, e_{ik}^2 e_{jk}^2$$
is a consistent estimate of $\delta^2$ based on the kth subset. The test based on the statistic $T_N(h_n)$ is denoted the DZH test in [18]. $T_N(h_n)$ is asymptotically normal under the null hypothesis given some mild conditions [18]. In this paper, we study the asymptotic behavior of $T_N(h_n)$ by relaxing the condition $n h_n^p/\ln n \to \infty$ to $n h_n^p \to \infty$.
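The construction above can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: a Gaussian kernel is assumed, and `subsets` stands in for the K partitioned blocks of the data.

```python
import numpy as np

def subset_stat_and_var(x, e, h):
    """V_k(h) and the variance estimate delta_k^2(h) on one subset of size n."""
    n, p = x.shape
    d = (x[:, None, :] - x[None, :, :]) / h
    K = np.exp(-0.5 * (d ** 2).sum(axis=2)) / ((2 * np.pi) ** (p / 2) * h ** p)
    np.fill_diagonal(K, 0.0)
    e2 = e ** 2
    V = (e @ K @ e) / (n * (n - 1))                              # V_k(h)
    delta2 = 2.0 * h ** p * (e2 @ (K ** 2) @ e2) / (n * (n - 1))  # delta_k^2(h)
    return V, delta2

def distributed_test(subsets, h):
    """Normalized DZH statistic n h^{p/2} K^{1/2} T_N(h).

    subsets : list of (x_k, e_k) pairs, all with the same subset size n.
    """
    stats = [subset_stat_and_var(x, e, h) for x, e in subsets]
    V_bar = np.mean([s[0] for s in stats])
    delta = np.sqrt(np.mean([s[1] for s in stats]))
    n, p = subsets[0][0].shape
    return n * h ** (p / 2) * np.sqrt(len(subsets)) * V_bar / delta
```

Each subset costs $O(n^2)$, so the total cost is $O(Kn^2) = O(N^2/K)$, a K-fold saving over the full quadratic form, and the K blocks can be processed in parallel before the cheap averaging step.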
Assumption 1.
The density function $f(x)$ of $x$ and its first-order derivatives are uniformly bounded, with $0 < \underline{f} \le f(x) \le \bar f < \infty$ for all $x \in [0,1]^p$.
Assumption 2.
Suppose that $E(\varepsilon_i \mid X_i) = 0$, $\sigma^2(x_i) \le \bar\sigma^2$, and $E(\varepsilon_i^4 \mid X_i) = \sigma_4(x_i) \le C$ uniformly in i. We also assume that $\sigma^2(x_i)$ is differentiable and that its first-order derivatives are uniformly bounded for all i.
Assumption 3.
For any $m(\cdot)$, not necessarily in $H_0$, let
$$\theta^* = \arg\min_{\theta\in\Theta} E\{m(X) - g(X;\theta)\}^2.$$
Under $H_0$, $\theta^* = \theta_0$. For any $m(\cdot)$, $\theta^*$ is unique. $\hat\theta_n$ is an estimator of $\theta^*$ such that $\sqrt{nK}(\hat\theta_n - \theta^*) = O_p(1)$ uniformly with respect to $m(\cdot)$ with $E\{m^4(X)\} \le C < \infty$, i.e.,
$$\forall \eta > 0,\ \exists \epsilon > 0:\ \limsup_{n,K\to\infty}\ \sup_{E\{m^4(X)\}\le C} P\Big(\sqrt{nK}\,\big|\hat\theta_n - \theta^*\big| > \epsilon\Big) \le \eta.$$
Assumption 4.
$g(\cdot,\cdot)$ is uniformly bounded in $x$ and $\theta$ and is twice continuously differentiable with respect to $\theta$, with first- and second-order derivatives $g_\theta(\cdot,\cdot)$ and $g_{\theta\theta}(\cdot,\cdot)$ uniformly bounded in $x$ and $\theta\in\Theta$, with upper bounds $\bar g_\theta$ and $\bar g_{\theta\theta}$, respectively.
Assumption 5.
$K(u)$ is a nonnegative, bounded, continuous, and symmetric function such that $\int K(u)\,du = 1$.
Assumption 6.
Suppose that the Fourier transform of $K(u)$, $\hat K(u) = \int \exp(\mathrm{i}tu)\, K(t)\,dt$, is strictly positive on its nonempty support.
Theorem 1
(Null hypothesis). Suppose Assumptions 1–5 hold. If $n h_n^p \to \infty$, $h_n \to 0$, and $K \to \infty$, then $n h_n^{p/2} K^{1/2} T_N(h_n) \stackrel{d}{\to} N(0,1)$.
This result suggests that we can reject $H_0$ at an α level of significance if the normalized statistic $n h_n^{p/2} K^{1/2} T_N(h_n)$ is larger than $z_\alpha$, the upper α quantile of the standard normal distribution. Given that our focus is on the null asymptotic result under the specific bandwidth $h_n$ in Theorem 1, the condition can be relaxed from $n h_n^p/\ln n \to \infty$ to $n h_n^p \to \infty$. The proof closely resembles that of Theorem 1 in Zhao et al. [18] apart from this relaxation, so we omit the details. However, how to choose an appropriate K in practice by balancing the computational budget against statistical efficiency remains an open question.
To develop an adaptive test, we integrate the smoothing parameter selection procedure proposed by Guerre and Lavergne [22] with $T_N(h_n)$. The procedure favors a larger smoothing parameter under the null hypothesis and selects h via the penalized criterion
$$h_n^* = \arg\max_{h\in H_n}\Big\{K^{-1}\sum_{k=1}^{K} V_k(h) - \gamma_n\,\hat\upsilon_{h,h_0}\Big\},$$
where $H_n$ is the given candidate set for $h_n$ and $\hat\upsilon_{h,h_0}$ is an estimator of the asymptotic null standard deviation of $K^{-1}\sum_{k=1}^{K}\{V_k(h) - V_k(h_0)\}$. The asymptotic null variance of $K^{-1}\sum_{k=1}^{K}\{V_k(h) - V_k(h_0)\}$ is
$$\upsilon_{h,h_0}^2 = \frac{2}{K n(n-1)}\iint \big\{K_h(x_1 - x_2) - K_{h_0}(x_1 - x_2)\big\}^2\,\sigma^2(x_1)\sigma^2(x_2) f(x_1)f(x_2)\,dx_1\,dx_2,$$
for which an intuitive estimator is
$$\hat\upsilon_{h,h_0}^2 = \frac{2}{K^2 n^2(n-1)^2}\sum_{k=1}^{K}\sum_{i=1}^{n}\sum_{j\neq i}\big\{K_h(x_{ik} - x_{jk}) - K_{h_0}(x_{ik} - x_{jk})\big\}^2 e_{ik}^2 e_{jk}^2.$$
Let $h_0$ denote the largest element of $H_n$. The test statistic based on the selected $h_n^*$ is
$$T_N^*(h_n^*) = \frac{1}{K\,\hat\delta(h_0)}\sum_{k=1}^{K} V_k(h_n^*).$$
Under the null hypothesis, as $\gamma_n \to \infty$, the test statistic $T_N^*(h_n^*)$ tends to coincide with $T_N(h_0)$. Given that $T_N(h_0)$ is asymptotically normal under Assumptions 1–5 when $n h_0^p \to \infty$, $h_0 \to 0$, and $K \to \infty$, $T_N^*(h_n^*)$ also achieves asymptotic normality under the additional condition $\gamma_n \to \infty$. Moreover, this statistic exhibits an adaptiveness property, enhancing its suitability across a broader range of alternative hypotheses.
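The penalized selection rule can be sketched as follows, again in Python with an assumed Gaussian kernel; the function and argument names are illustrative, not from the paper. Note that for $h = h_0$ the penalty vanishes, so a large $\gamma_n$ pushes the choice toward $h_0$, mirroring the behavior under the null described above.

```python
import numpy as np

def select_bandwidth(subsets, H, gamma, h0=None):
    """Penalized bandwidth choice in the spirit of Guerre & Lavergne (2005),
    applied to the distributed statistic: maximize
        (1/K) sum_k V_k(h) - gamma * vhat(h, h0)
    over the candidate set H, where h0 is the largest candidate."""
    if h0 is None:
        h0 = max(H)
    K_num = len(subsets)
    n, p = subsets[0][0].shape

    def kernel_mat(x, h):
        d = (x[:, None, :] - x[None, :, :]) / h
        M = np.exp(-0.5 * (d ** 2).sum(axis=2)) / ((2 * np.pi) ** (p / 2) * h ** p)
        np.fill_diagonal(M, 0.0)
        return M

    best_h, best_crit = None, -np.inf
    for h in H:
        Vbar, s2 = 0.0, 0.0
        for x, e in subsets:
            Kh, Kh0 = kernel_mat(x, h), kernel_mat(x, h0)
            e2 = e ** 2
            Vbar += (e @ Kh @ e) / (n * (n - 1)) / K_num
            # estimated null variance of the averaged contrast V_k(h) - V_k(h0)
            s2 += 2.0 * (e2 @ ((Kh - Kh0) ** 2) @ e2) / (K_num ** 2 * n ** 2 * (n - 1) ** 2)
        crit = Vbar - gamma * np.sqrt(s2)
        if crit > best_crit:
            best_h, best_crit = h, crit
    return best_h
```

With a very large penalty the criterion is dominated by the standard-deviation term for every $h \neq h_0$, so the largest bandwidth is returned; with a moderate penalty the data can override this preference under an alternative.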

3. The Unique Rate of the Smoothing Parameter Ensuring Rate-Optimality in the DZH Test

Our previous study in Zhao et al. [18] demonstrated that the optimal power of the test depends heavily on the set of bandwidth candidates $H_n$: this set should contain the optimal bandwidth rate $\tilde h_n \asymp N^{-2/(4s+p)}$ to achieve the desired performance. However, the uniqueness of this bandwidth rate was not established there. In this section, we demonstrate the uniqueness of the rate $N^{-2/(4s+p)}$, which is critical for ensuring the rate optimality of the DZH test. Let the Hölder class $C_p(L,s)$ be the set of maps $\Delta(\cdot)$ from $[0,1]^p$ to $\mathbb{R}$ with
$$C_p(L,s) = \big\{\Delta(\cdot): |\Delta(x) - \Delta(y)| \le L\|x - y\|^s \text{ for all } x, y \in [0,1]^p\big\}, \qquad s \in (0,1],$$
$$C_p(L,s) = \big\{\Delta(\cdot): \text{the } \lfloor s\rfloor\text{th partial derivatives of } \Delta(\cdot) \text{ are in } C_p(L, s - \lfloor s\rfloor)\big\}, \qquad s > 1,$$
where $\lfloor s\rfloor$ is the lower integer part of s. Consider the following alternative hypothesis:
$$H_1(\kappa_N\rho_N) = \big\{m_N(x): E\big(\Delta_N^2(X)\big) \ge \kappa_N^2\rho_N^2,\ \Delta_N(\cdot) \in C_p(L,s) \text{ for fixed } s > p/4\big\},$$
where $\Delta_N(\cdot) = m_N(\cdot) - g(\cdot;\theta^*)$ and $\rho_N = N^{-2s/(4s+p)}$. Here, $\rho_N$ is the optimal minimax rate for nonparametric specification testing in regression models with known $s > p/4$ for the Hölder class above (see Guerre and Lavergne [13]).
Theorem 2.
Suppose Assumptions 1–6 hold. Then $\tilde h_n \asymp N^{-2/(4s+p)}$ is the only bandwidth rate such that the test $I\{n h_n^{p/2} K^{1/2} T_N(h_n) \ge z_\alpha\}$ can be uniformly consistent against $\{m_N(\cdot)\}_{N\ge 1} \subset H_1(\kappa_N\rho_N)$ for any $\kappa_N \to \infty$.

4. Simulation Studies

In this section, we present simulation studies to examine the size and power of the tests based on the statistics $T_N(h_n)$ and $T_N^*(h_n^*)$, denoted DZH and MD, respectively. We choose $p = 2$, $N = 2000, 4000, 8000$, and $K = 10, 20, 40$. To demonstrate the adaptiveness of the MD test relative to the DZH test while maintaining the type I error rate of both tests, we select $H_n = \{h_1 = 0.28, h_2 = 0.14, h_3 = 0.07\}$. For the MD test, we adopt the penalty sequence $\gamma_n = c\sqrt{2\ln(|H_n|)}$ with $c = 2$, as recommended in Guerre and Lavergne [22], where $|H_n|$ denotes the cardinality of $H_n$. Two models are used to generate the response variable Y.
  • M1: $Y = 1 + X_1 + X_2 + \varepsilon$
  • M2: $Y = 1 + X_1 + X_2 + \sin(bX_1) + \varepsilon$, $b \in \{0.8, 1, 10\}$
We define the variables $X_1 = Z_1$, a simple assignment, and $X_2 = (Z_1 + Z_2)/\sqrt{2}$, which combines the influences of two independent factors under a transformation that maintains variance consistency. To rigorously test the robustness of the proposed method against different designs, $Z_1$ and $Z_2$ are independently drawn either from the standard normal distribution, which provides a baseline due to its well-known properties, or from Student’s t-distribution with 5 degrees of freedom, known for its heavier tails and greater kurtosis; this enables an examination of the test’s sensitivity to deviations from normality. Furthermore, to assess the impact of the error distribution on test performance, we explore three distributions for the error term ε: the standard normal distribution, representing ideal conditions; the standardized exponential distribution, which introduces skewness; and Student’s t-distribution with 5 degrees of freedom, which tests the resilience of the method against heavy-tailed errors. This comprehensive design allows us to determine the test’s effectiveness and reliability across scenarios reflective of real-world data complexities. The kernel function K used is the bivariate standard normal density function. M1 is used to assess the size of the tests. To investigate power against high- and low-frequency alternatives, M2 is considered, in which small (large) values of b represent low-frequency (high-frequency) alternatives.
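The simulation designs M1 and M2 can be generated as in the following Python sketch (the paper's analyses were run in R; the $\sqrt{2}$ scaling of $X_2$ and the function name are our reading of the variance-consistency transformation, so treat both as assumptions):

```python
import numpy as np

def generate_data(N, b=0.0, dist="normal", seed=0):
    """Simulate from model M1 (b = 0) or M2 (b != 0):
        Y = 1 + X1 + X2 + sin(b * X1) + eps,
    with X1 = Z1 and X2 = (Z1 + Z2) / sqrt(2), so both predictors have unit
    variance when Z1 and Z2 are standardized."""
    rng = np.random.default_rng(seed)
    if dist == "normal":
        Z = rng.standard_normal((N, 3))
    else:
        # Student's t with 5 df, rescaled to unit variance (Var = 5/3)
        Z = rng.standard_t(5, size=(N, 3)) / np.sqrt(5 / 3)
    X1, X2 = Z[:, 0], (Z[:, 0] + Z[:, 1]) / np.sqrt(2)
    Y = 1 + X1 + X2 + np.sin(b * X1) + Z[:, 2]
    return np.column_stack([X1, X2]), Y
```

Setting `b=0` recovers the null model M1; `b=10` gives the high-frequency alternative, where a small bandwidth is needed to resolve the oscillation.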
The empirical sizes for the different error distributions are reported in Table 1, Table 2, Table 3 and Table 4, demonstrating that both methods effectively maintain the type I error rate. The variation of power with sample size N and with K is described in Table 5 and Table 6 and Figure 1, Figure 2, Figure 3 and Figure 4. DZH tests under the three bandwidths are included in the power comparison tables. The results indicate that power loss increases with larger K, a consequence of the information loss inherent in the divide-and-conquer procedure. For low-frequency alternatives ($b = 0.8, 1$), the power of the DZH test improves as h increases. For the high-frequency setting $b = 10$, however, the DZH test exhibits the opposite trend in h. The MD test performs comparably to the best scenario of the DZH test for both low- and high-frequency alternatives, demonstrating its adaptive capability; this adaptiveness makes it suitable for a broader range of alternative hypotheses. The comparison between Figure 1 and Figure 4 shows that the proposed test exhibits higher power when $Z_1$ and $Z_2$ are generated from the Student’s t(5) distribution rather than the normal distribution, underscoring the robustness of our method against heavy-tailed covariates. The power of all tests under the exponential error distribution is comparable to that under the normal distribution, but there is a noticeable decrease in power when the errors follow the Student’s t distribution. All analyses were conducted using R version 4.3.2.

5. Proofs

Some Lemmas

In this section, we restate three lemmas from Zhao et al. [18], omitting detailed proofs for brevity. Lemma 2 is restated under the assumption $n h_n^p \to \infty$. We assume $q = 1$ without loss of generality. Denote
$$e_{ik} = y_{ik} - g(x_{ik};\hat\theta_n), \quad \varepsilon_{ik} = y_{ik} - m(x_{ik}), \quad u_{ik} = g(x_{ik};\hat\theta_n) - g(x_{ik};\theta^*), \quad \triangle_{ik} = m(x_{ik}) - g(x_{ik};\theta^*).$$
We introduce the following matrix notation:
$$e_k = (e_{ik})_{1\le i\le n}, \quad \varepsilon_k = (\varepsilon_{ik})_{1\le i\le n}, \quad u_k = (u_{ik})_{1\le i\le n}, \quad \triangle_k = (\triangle_{ik})_{1\le i\le n},$$
$$W_k(h_n) = (w_{ik,jk})_{1\le i,j\le n}, \qquad w_{ik,jk} = \begin{cases} \dfrac{1}{n(n-1)} K_{h_n}(x_{ik}-x_{jk}), & i \neq j; \\ 0, & i = j. \end{cases}$$
Under $H_1$, $\triangle_{ik} \neq 0$ and
$$V_k = (\triangle_k + \varepsilon_k)^\top W_k(h_n)(\triangle_k + \varepsilon_k) - 2\triangle_k^\top W_k(h_n) u_k - 2\varepsilon_k^\top W_k(h_n) u_k + u_k^\top W_k(h_n) u_k = V_{1k} - 2V_{2k} - 2V_{3k} + V_{4k}.$$
Under $H_0$, $\triangle_{ik} = 0$, so $V_k = V_{1k} - 2V_{3k} + V_{4k}$, where $V_{1k} = \varepsilon_k^\top W_k(h_n)\varepsilon_k$. Following Zhao et al. [18], we decompose $n h_n^{p/2} K^{1/2} T_N$ in the following way:
$$n h_n^{p/2} K^{1/2} T_N = n h_n^{p/2}\hat\delta^{-1}K^{-1/2}\sum_{k=1}^{K} V_k = n h_n^{p/2}\hat\delta^{-1}K^{-1/2}\sum_{k=1}^{K} V_{1k} - 2 n h_n^{p/2}\hat\delta^{-1}K^{-1/2}\sum_{k=1}^{K} V_{2k} - 2 n h_n^{p/2}\hat\delta^{-1}K^{-1/2}\sum_{k=1}^{K} V_{3k} + n h_n^{p/2}\hat\delta^{-1}K^{-1/2}\sum_{k=1}^{K} V_{4k} = \bar V_1 - 2\bar V_2 - 2\bar V_3 + \bar V_4.$$
Lemma 1.
Given Assumptions 1–5, under the null hypothesis, $\delta^2$ can be consistently estimated by $\hat\delta^2(h_n)$ as $n h_n^p \to \infty$, $h_n \to 0$, and $K \to \infty$.
Lemma 2.
Given Assumptions 1–5,
1. $\bar V_2 = O_p\big(n h_n^{p/2} E(\Delta_N^2(x))\big)$ uniformly for $m(\cdot)$ under $H_1$, as $h_n \to 0$, $n h_n^p \to \infty$, $K \to \infty$.
2. $\bar V_3 = O_p\big(h_n^{p/2}/K\big) = o_p(1)$ uniformly for $m(\cdot)$ under $H_1$ and $H_0$, as $h_n \to 0$, $n h_n^p \to \infty$, $K \to \infty$.
3. $\bar V_4 = O_p\big(h_n^{p/2}/K\big) = o_p(1)$ uniformly for $m(\cdot)$ under $H_1$ and $H_0$, as $h_n \to 0$, $n h_n^p \to \infty$, $K \to \infty$.
Lemma 3.
Denote $\tilde V_1 = n h_n^{p/2} K^{-1/2}\sum_{k=1}^{K} V_{1k}$. Under Assumptions 1–6, for any $m_N(\cdot) \in H_1(\kappa_N\rho_N)$ and n large enough, where $\rho_N = N^{-2s/(4s+p)}$, we have
$$E(\tilde V_1) \ge C_1 K^{1/2} n h_n^{p/2}\, E\big(\Delta_N^2(X)\big), \qquad \mathrm{Var}(\tilde V_1) \le C_2\, n h_n^p\, E\big(\Delta_N^2(x)\big) + C_3.$$
When
$$\kappa_N\rho_N \ge \sqrt{\frac{\Lambda_n + r(P_{h_n})/\Lambda_n}{n h_n^p}} + C_0 L h_n^s, \qquad \Lambda_n = \frac{E(\pi_k^\top P_{h_n}\pi_k)}{E(\pi_k^\top \pi_k)},$$
we have
$$E(\tilde V_1) \ge C_1 K^{1/2} n h_n^{p/2}\,\Lambda_n\Big\{\sqrt{E\big(\Delta_N^2(x)\big)} - \big(\Lambda_n + r(P_{h_n})\big)\, C_0 L h_n^s\Big\}^2.$$
Proof of Theorem 2.
We first construct an alternative $m_N(\cdot)$ based on $\tilde h_n$:
$$m_N(\cdot) = g(\cdot;\theta^*) + \Delta_N(\cdot).$$
Using the method in Guerre and Lavergne [13] to construct the alternatives $\Delta_N(\cdot)$, define
$$I_t = \prod_{i=1}^{p}\big[t_i\tilde h_n, (t_i+1)\tilde h_n\big)$$
for $t \in \mathcal{Y}_n$, where
$$\mathcal{Y}_n = \{t: t = (t_1,\dots,t_p) \in \mathbb{N}^p,\ 0 \le t_i \le \gamma_n - 1\}.$$
Then $I_t \subset [0,1]^p$; without loss of generality, we assume that $\gamma_n = 1/\tilde h_n$ is an integer. Let
$$\varphi_t(x) = \varphi\Big(\frac{x - t\tilde h_n}{\tilde h_n}\Big), \qquad t \in \mathcal{Y}_n,$$
where the $\varphi_t(\cdot)$ are orthogonal with disjoint supports $I_t$, and $\varphi(\cdot)$ is bounded and nonnegative. Let $\{B_t, t \in \mathcal{Y}_n\}$ be any sequence with $|B_t| = 1$ for all $t$, and set
$$\Delta_N(x) = \kappa_N\rho_N\sum_{t\in\mathcal{Y}_n} B_t\varphi_t(x) = \kappa_N\rho_N\sum_{t\in\mathcal{Y}_n} B_t\varphi\Big(\frac{x - t\tilde h_n}{\tilde h_n}\Big).$$
Under Assumption 3, there exists a constant C such that $E\{m_N^4(X)\} \le C < \infty$ and $m_N(\cdot) \in H_1(\kappa_N\rho_N)$. Since
$$\inf_{m(\cdot)\in H_1(\kappa_N\rho_N)} P\Big(n h_n^{p/2}K^{1/2}T_N(h_n) \ge z_\alpha\Big) \le P\Big(n h_n^{p/2}K^{1/2}T_N(h_n) \ge z_\alpha\Big)$$
for any $m(\cdot) \in H_1(\kappa_N\rho_N)$, the main idea is as follows: if $I\{n h_n^{p/2}K^{1/2}T_N(h_n) \ge z_\alpha\}$ cannot be consistent against the alternatives $m_N(\cdot)$ as $\kappa_N \to \infty$, then it is also not uniformly consistent against $m_N(\cdot) \in H_1(\kappa_N\rho_N)$ as $\kappa_N \to \infty$.
By Lemmas 1–3, the test statistic satisfies
$$n h_n^{p/2}K^{1/2}T_N(h_n) = n h_n^{p/2}\,\hat\delta(h_n)^{-1}K^{-1/2}\sum_{k=1}^{K} V_k = \hat\delta(h_n)^{-1}\tilde V_1 - 2 n h_n^{p/2}\,\hat\delta(h_n)^{-1}K^{-1/2}\sum_{k=1}^{K} V_{2k} + o_p(1) = O_p(1)\Big\{E(\tilde V_1) + O_p\big(\sqrt{\mathrm{Var}(\tilde V_1)}\big)\Big\} - 2\bar V_2 + o_p(1).$$
Without loss of generality, we assume that $K(\cdot)$ has support $[-1,1]^p$ in the following proofs.
(i) For $h_n = a_N\tilde h_n$ with $a_N \to \infty$:
$$\begin{aligned}
E(\tilde V_1) &= K^{1/2} n h_n^{p/2} E(V_{1k}) = K^{1/2} n h_n^{p/2} E\Big[\frac{1}{h_n^p}K\Big(\frac{X_1 - X_2}{h_n}\Big)\Delta(X_1)\Delta(X_2)\Big] \\
&= K^{1/2} n h_n^{p/2}\kappa_N^2\rho_N^2\,\frac{1}{h_n^p}\sum_{t\in\mathcal{Y}_n}\int_{I_t}\int_{I_t} K\Big(\frac{x_1 - x_2}{h_n}\Big)\varphi\Big(\frac{x_1 - t\tilde h_n}{\tilde h_n}\Big)\varphi\Big(\frac{x_2 - t\tilde h_n}{\tilde h_n}\Big) f(x_1)f(x_2)\,dx_1\,dx_2 \\
&= K^{1/2} n h_n^{p/2}\kappa_N^2\rho_N^2\,\frac{\tilde h_n^{2p}}{h_n^p}\sum_{t\in\mathcal{Y}_n}\int_0^1\int_0^1 K\Big(\frac{u - v}{a_N}\Big)\varphi(u)\varphi(v)\, f(u\tilde h_n + t\tilde h_n)f(v\tilde h_n + t\tilde h_n)\,du\,dv \\
&\le C\, K^{1/2} n h_n^{p/2}\kappa_N^2\rho_N^2\,\frac{\tilde h_n^{p}}{h_n^p}\,(\tilde h_n\gamma_n)^p\Big(\int_0^1\varphi(u)\,du\Big)^2 = O\big(\kappa_N^2 K^{-1/2} a_N^{-p/2}\big).
\end{aligned}$$
(ii) For $h_n = \tilde h_n/a_N$ with $a_N \to \infty$:
$$\begin{aligned}
E(\tilde V_1) &= K^{1/2} n h_n^{p/2} E(V_{1k}) = K^{1/2} n h_n^{p/2}\kappa_N^2\rho_N^2\,\frac{1}{h_n^p}\sum_{t\in\mathcal{Y}_n}\int_{I_t}\int_{I_t} K\Big(\frac{x_1 - x_2}{h_n}\Big)\varphi\Big(\frac{x_1 - t\tilde h_n}{\tilde h_n}\Big)\varphi\Big(\frac{x_2 - t\tilde h_n}{\tilde h_n}\Big) f(x_1)f(x_2)\,dx_1\,dx_2 \\
&\le C\, K^{1/2} n h_n^{p/2}\kappa_N^2\rho_N^2\,\tilde h_n^{p}\sum_{t\in\mathcal{Y}_n}\int_0^1\int_{-1}^1 K(u)\,\varphi\Big(u\,\frac{h_n}{\tilde h_n} + v\Big)\varphi(v)\, f(u h_n + t\tilde h_n + v\tilde h_n)f(v\tilde h_n + t\tilde h_n)\,du\,dv \\
&= O\big(K^{1/2} n h_n^{p/2}\kappa_N^2\rho_N^2\,(\tilde h_n\gamma_n)^p\big) = O\big(\kappa_N^2 K^{-1/2} a_N^{-p/2}\big).
\end{aligned}$$
Through a tedious calculation, we can also obtain $\mathrm{Var}(\tilde V_1) = O(1)$ and $\bar V_2 = o_p(1)$ as $h_n \to 0$ in both cases above. Therefore, for any $a_N \to \infty$, there exists $\kappa_N \to \infty$ such that $n h_n^{p/2}K^{1/2}T_N(h_n) = O_p(1)$ as $h_n \to 0$, so $P\big(n h_n^{p/2}K^{1/2}T_N(h_n) \ge z_\alpha\big) \to 1$ cannot hold. The same conclusion follows for $\inf_{m(\cdot)\in H_1(\kappa_N\rho_N)} P\big(n h_n^{p/2}K^{1/2}T_N(h_n) \ge z_\alpha\big)$. Thus, the theorem is proved. □

Author Contributions

Conceptualization, Y.Z.; methodology, Y.Z.; formal analysis, P.L. and Y.Z.; investigation, Y.Z. and L.X.; writing—original draft preparation, P.L. and Y.Z.; writing—review and editing, P.L., Y.Z. and L.X.; visualization, P.L., Y.Z. and T.W.; supervision, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (12201351), the Natural Science Foundation of Shandong Province (ZR2022QA013), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (24KJB110024), the Qinglan Project of Jiangsu Province of China [2022], and the Huai’an City Science and Technology Project (HAB202357).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Zaremba, W.; Gretton, A.; Blaschko, M. B-test: A Non-parametric, Low Variance Kernel Two-sample Test. Adv. Neural Inf. Process. Syst. 2013, 26, 755–763. [Google Scholar]
  2. Chen, X.Y.; Xie, M.G. A split-and-conquer approach for analysis of extraordinarily large data. Stat. Sin. 2014, 24, 1655–1684. [Google Scholar]
  3. Battey, H.; Fan, J.; Liu, H.; Lu, J.; Zhu, Z. Distributed Estimation and Inference with Statistical Guarantees. arXiv 2015, arXiv:1509.05457. [Google Scholar]
  4. Fan, J.; Han, F.; Liu, H. Challenges of Big Data analysis. Natl. Sci. Rev. 2014, 1, 293–314. [Google Scholar] [CrossRef] [PubMed]
  5. Hardle, W.; Mammen, E. Comparing nonparametric versus parametric regression fits. Ann. Statist. 1993, 21, 1926–1947. [Google Scholar] [CrossRef]
  6. Neumeyer, N.; Van Keilegom, I. Estimating the error distribution in nonparametric multiple regression with applications to model testing. J. Multivar. Anal. 2010, 101, 1067–1078. [Google Scholar] [CrossRef]
  7. González-Manteiga, W.; Crujeiras, R.M. An updated review of Goodness-of-Fit tests for regression models. Test 2013, 22, 361–411. [Google Scholar] [CrossRef] [PubMed]
  8. Delgado, M. Testing the equality of nonparametric regression curves. Stat. Probab. Lett. 1993, 17, 199–204. [Google Scholar] [CrossRef]
  9. Bierens, H.J. A consistent conditional moment test of functional form. Econometrica 1990, 58, 1443–1458. [Google Scholar] [CrossRef]
  10. Hart, J.D. Nonparametric Smoothing and Lack-of-Fit Tests, 1st ed.; Springer: New York, NY, USA, 1997. [Google Scholar]
  11. Ingster, Y.I. Minimax nonparametric detection of signals in white Gaussian noise. Probl. Inf. Transm. 1982, 18, 130–140. [Google Scholar]
  12. Ingster, Y.I. Asymptotically minimax hypothesis testing for nonparametric alternatives I, II, III. Math. Methods Stat. 1993, 2, 85–114. [Google Scholar]
  13. Guerre, E.; Lavergne, P. Optimal minimax rates for nonparametric specification testing in regression models. Econom. Theory 2002, 18, 1139–1171. [Google Scholar] [CrossRef]
  14. Cai, L.; Guo, X.; Zhong, W. Test and Measure for Partial Mean Dependence Based on Machine Learning Methods. J. Am. Stat. Assoc. 2024, 1–13. [Google Scholar] [CrossRef]
  15. Tan, F.; Zhu, L. Adaptive-to-model checking for regressions with diverging number of predictors. Ann. Stat. 2019, 47, 1960–1994. [Google Scholar] [CrossRef]
  16. Han, Y.; Ma, P.; Ren, H.; Wang, Z. Model checking in large-scale data set via structure-adaptive-sampling. Stat. Sin. 2023, 33, 303–329. [Google Scholar]
  17. Zheng, J.X. A consistent test of functional form via nonparametric estimation techniques. J. Econom. 1996, 75, 263–289. [Google Scholar] [CrossRef]
  18. Zhao, Y.; Zou, C.; Wang, Z. A scalable nonparametric specification testing for massive data. J. Stat. Plan. Inference 2019, 200, 161–175. [Google Scholar] [CrossRef]
  19. Zhao, Y.; Zou, C.; Wang, Z. An adaptive lack of fit test for big data. Stat. Theory Relat. Fields 2017, 1, 59–68. [Google Scholar] [CrossRef]
  20. Ibragimov, I.A.; Khasminski, R.Z. Statistical Estimation: Asymptotic Theory, 1st ed.; Springer: Berlin/Heidelberg, Germany, 1981. [Google Scholar]
  21. Horowitz, J.; Spokoiny, V. An adaptive, rate-optimal test of parametric mean-regression model against a nonparametric alternative. Econometrica 2001, 69, 599–631. [Google Scholar] [CrossRef]
  22. Guerre, E.; Lavergne, P. Data-driven rate-optimal specification testing in regression models. Ann. Stat. 2005, 33, 840–870. [Google Scholar] [CrossRef]
Figure 1. Power comparison of the MD test and DZH test based on different bandwidths under model M2 with b = 0.8 and K = 40 . Error term ε is generated from standard normal distribution.
Figure 2. Power comparison of the MD test and DZH test based on different bandwidths under model M2 with b = 0.8 and K = 40 . Error term ε is generated from standardized exponential distribution.
Figure 3. Power comparison of the MD test and DZH test based on different bandwidths under model M2 with b = 0.8 and K = 40 . Error term ε is generated from Student’s t distribution with 5 degrees of freedom.
Figure 4. Power comparison of the MD test and the DZH test based on different bandwidths under model M2 with b = 0.8 and K = 40. The error term ε is generated from the standard normal distribution. Z1 and Z2 are generated from Student's t distribution with 5 degrees of freedom.
Table 1. Empirical sizes (%) for different values of N with α = 1%, 5%, 10% when the error term follows the standard normal distribution.

                                      MD                                  DZH
                    h1                h2                h3
 N     α\K   10    20    40    10    20    40    10    20    40    10    20    40
 2000  1%    1.3   1.3   1.1   0.8   1.0   1.0   0.7   0.7   0.6   0.6   0.9   0.8
       5%    5.1   5.8   5.3   4.4   5.3   4.8   5.3   4.3   4.4   4.6   5.8   5.5
       10%   11.1  10.1  11.3  9.9   9.7   10.0  10.3  10.0  10.7  11.6  12.1  11.6
 4000  1%    1.3   1.2   1.1   1.0   1.0   0.9   1.0   0.8   0.7   0.9   0.5   0.6
       5%    5.4   5.7   4.8   4.7   5.1   4.2   4.9   4.6   4.6   5.6   4.9   4.3
       10%   9.7   10.4  9.8   9.2   9.9   8.9   9.2   8.8   9.8   10.1  9.2   8.8
 8000  1%    0.5   1.0   0.5   0.4   0.7   0.4   0.4   0.8   0.5   1.0   1.0   0.6
       5%    5.2   5.1   4.1   4.2   4.5   3.4   5.0   4.9   4.3   5.2   5.5   4.3
       10%   10.8  10.7  10.2  10.1  9.5   9.3   9.9   9.4   9.3   10.9  10.4  8.7
Table 2. Empirical sizes (%) for different values of N with α = 1%, 5%, 10% when the error term follows the standardized exponential distribution.

                                      MD                                  DZH
                    h1                h2                h3
 N     α\K   10    20    40    10    20    40    10    20    40    10    20    40
 2000  1%    1.7   2.2   1.6   1.5   1.9   1.5   0.7   0.8   1.5   1.1   0.7   1.1
       5%    5.7   5.5   5.5   5.2   5.1   5.0   5.8   4.9   4.9   6.2   5.1   5.1
       10%   10.6  10.8  9.9   10.0  9.9   9.6   12.1  10.5  9.7   11.5  11.0  9.8
 4000  1%    1.3   0.8   0.7   1.2   0.5   0.7   1.0   0.5   1.2   1.2   0.6   0.6
       5%    5.8   5.2   4.8   4.9   4.6   4.3   5.1   4.9   4.6   4.8   4.9   3.8
       10%   11.3  9.7   10.4  10.3  9.0   9.8   8.9   9.4   9.3   8.9   9.8   8.6
 8000  1%    1.0   1.1   1.0   0.9   1.0   0.8   1.4   1.2   0.5   1.1   1.3   0.6
       5%    4.9   5.5   3.4   4.3   4.5   2.9   4.5   4.6   2.9   4.7   5.4   4.1
       10%   10.1  10.3  8.1   9.3   9.5   7.4   8.3   9.4   8.3   9.0   9.3   9.8
Table 3. Empirical sizes (%) for different values of N with α = 1%, 5%, 10% when the error term follows Student's t distribution with 5 degrees of freedom.

                                      MD                                  DZH
                    h1                h2                h3
 N     α\K   10    20    40    10    20    40    10    20    40    10    20    40
 2000  1%    1.1   0.7   1.1   1.1   0.6   0.7   0.6   0.6   0.4   0.7   0.6   0.6
       5%    4.8   4.7   4.7   4.3   4.2   4.4   4.0   4.0   4.9   4.3   3.8   4.8
       10%   9.9   9.8   10.4  9.2   9.1   9.2   8.0   8.6   10.1  10.4  8.8   9.0
 4000  1%    1.2   1.6   1.5   1.1   1.1   1.2   1.0   1.3   1.1   0.8   1.1   0.6
       5%    5.3   5.6   5.8   4.9   5.1   5.2   5.0   5.5   5.5   5.2   5.4   5.2
       10%   10.0  10.4  10.5  9.0   9.5   9.5   10.2  10.4  9.1   11.6  9.3   9.0
 8000  1%    1.2   1.0   1.5   1.0   0.9   1.2   0.5   0.5   1.3   0.6   0.9   1.1
       5%    5.2   3.7   5.2   4.5   3.2   4.5   3.4   3.4   4.8   4.7   4.4   5.0
       10%   9.1   8.8   9.9   8.3   7.9   9.1   8.5   8.6   10.1  9.9   10.1  11.4
Table 4. Empirical sizes (%) for different values of N with α = 1%, 5%, 10% when Z1 and Z2 are generated from Student's t(5) distribution.

                                      MD                                  DZH
                    h1                h2                h3
 N     α\K   10    20    40    10    20    40    10    20    40    10    20    40
 2000  1%    1.2   1.3   1.7   0.9   1.1   1.6   1.5   0.8   1.1   1.0   1.1   1.1
       5%    6.0   6.6   6.0   5.6   6.1   5.3   5.0   5.8   5.9   5.3   5.9   5.7
       10%   10.9  10.8  11.5  10.0  10.3  10.5  10.2  10.4  10.2  10.9  10.9  11.0
 4000  1%    0.9   0.5   1.1   0.7   0.5   0.8   0.9   1.0   1.4   1.2   1.3   1.1
       5%    5.7   4.9   4.9   4.9   4.3   4.6   4.8   4.2   4.5   5.4   5.4   5.3
       10%   11.3  10.4  9.9   9.6   9.8   8.6   10.1  9.2   9.6   10.8  9.8   10.1
 8000  1%    0.9   1.8   0.5   0.8   1.2   0.3   0.7   1.1   1.1   0.8   1.0   0.6
       5%    4.6   5.6   4.6   4.0   4.7   4.3   3.9   5.9   4.3   4.3   4.2   5.5
       10%   9.8   11.7  9.4   9.1   11.1  8.8   9.1   10.8  8.7   8.8   9.4   10.5
Table 5. Empirical power (%) of the MD and DZH tests with α = 5% when the error term follows the standard normal distribution.

                                      MD                                  DZH
                    h1                h2                h3
 N     b\K   10    20    40    10    20    40    10    20    40    10    20    40
 2000  1     83.4  60.9  36.8  81.9  59    35.5  45.1  25.9  14.5  17.3  10.9  7.4
       10    100   100   100   18.2  12.4  8.4   100   100   100   100   100   100
 4000  1     100   99    88.9  100   98.9  88.3  94.3  74.4  46.5  48.5  27    15.7
       10    100   100   100   62.8  36.6  20    100   100   100   100   100   100
 8000  1     100   100   100   100   100   100   100   99.9  94.9  96.3  75.9  48.4
       10    100   100   100   100   93.5  67.1  100   100   100   100   100   100
Table 6. Empirical power (%) of the MD and DZH tests with α = 5% when the error term follows the standardized exponential distribution.

                                      MD                                  DZH
                    h1                h2                h3
 N     b\K   10    20    40    10    20    40    10    20    40    10    20    40
 2000  1     82.7  61.2  38.2  82.4  59.9  36.7  44.4  26.5  16.1  15.8  9.6   5.6
       10    100   100   100   19.1  11.8  7.5   100   100   99.9  100   100   100
 4000  1     99.8  98.5  89.7  99.8  98.4  88.9  94.9  72    44.1  46.9  25.2  13.3
       10    100   100   100   66.3  34.2  19.1  100   100   100   100   100   100
 8000  1     100   100   100   100   100   100   100   99.7  96.1  97    78.8  50.6
       10    100   100   100   99.8  94.1  66.4  100   100   100   100   100   100
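The entries in Tables 1–6 are rejection frequencies over repeated simulations: each replication generates a dataset, splits it into K blocks, computes a smoothing-based lack-of-fit statistic on each block, combines the block statistics, and rejects when the combined statistic exceeds the normal critical value. The following is a minimal Monte Carlo sketch of this workflow under H0. It is illustrative only: the per-block statistic is a generic Zheng-type kernel statistic (not the paper's MD or DZH statistics), and the linear null model, Gaussian kernel, and the values of N, K, h, and the replication count are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def block_stat(y, x, h):
    """Standardized kernel lack-of-fit statistic for one block
    (a generic Zheng-type statistic; a toy stand-in, not MD/DZH)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # fit the null linear model
    e = y - X @ beta                                # parametric residuals
    W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)  # Gaussian kernel
    np.fill_diagonal(W, 0.0)                        # drop own-observation terms
    num = e @ W @ e                                 # sum_{i != j} W_ij e_i e_j
    var = 2.0 * (e**2) @ (W**2) @ (e**2)            # plug-in variance estimate
    return num / np.sqrt(var)

def empirical_size(N=400, K=4, h=0.3, alpha=0.05, reps=200):
    """Rejection frequency of the distributed test under H0."""
    zcrit = 1.6449                                  # one-sided 5% N(0,1) quantile
    rej = 0
    for _ in range(reps):
        x = rng.uniform(size=N)
        y = 1.0 + 2.0 * x + rng.standard_normal(N)  # H0: linear model is true
        blocks = np.array_split(rng.permutation(N), K)
        # Average the K standardized block statistics; under H0 the
        # sqrt(K)-scaled average is approximately N(0, 1).
        T = np.sqrt(K) * np.mean([block_stat(y[i], x[i], h) for i in blocks])
        rej += (T > zcrit)
    return rej / reps

print(empirical_size())
```

With enough replications the rejection frequency under H0 should be roughly near the nominal 5% level, mirroring the 5% rows of Tables 1–4; replacing the data-generating line with a departure from the null model would instead produce power entries as in Tables 5 and 6.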
Liu, P.; Zhao, Y.; Xu, L.; Wang, T. Optimal Minimax Rate of Smoothing Parameter in Distributed Nonparametric Specification Test. Axioms 2025, 14, 228. https://doi.org/10.3390/axioms14030228
