An Adaptive-to-Model Test for Parametric Functional Single-Index Model

Xia, Lili; Lai, Tingyu; Zhang, Zhongzhan

doi:10.3390/math11081812

Open AccessArticle

An Adaptive-to-Model Test for Parametric Functional Single-Index Model

by

Lili Xia

¹

,

Tingyu Lai

² and

Zhongzhan Zhang

^1,*

¹

Faculty of Science, Beijing University of Technology, Beijing 100124, China

²

School of Mathematics and Statistics, Guangxi Normal University, Guilin 541004, China

^*

Author to whom correspondence should be addressed.

Mathematics 2023, 11(8), 1812; https://doi.org/10.3390/math11081812

Submission received: 6 March 2023 / Revised: 2 April 2023 / Accepted: 6 April 2023 / Published: 11 April 2023

Download

Browse Figures

Versions Notes

Abstract

:

Model checking methods based on non-parametric estimation are widely used because of their tractable limiting null distributions and being sensitive to high-frequency oscillation alternative models. However, this kind of test suffers from the curse of dimensionality, resulting in slow convergence, especially for functional data with infinite dimensional features. In this paper, we propose an adaptive-to-model test for a parametric functional single-index model by using the orthogonality of residual and its conditional expectation. The test achieves model adaptation by sufficient dimension reduction which utilizes functional sliced inverse regression. This test procedure can be easily extended to other non-parametric test methods. Under certain conditions, we prove the asymptotic properties of the test statistic under the null hypothesis, fixed alternative hypothesis and local alternative hypothesis. Simulations show that our test has better performance than the method that does not use functional sufficient dimension reduction. An analysis of COVID-19 data verifies our conclusion.

Keywords:

functional data; model checking; sufficient dimension reduction; sliced inverse regression; single-index model

MSC:

62R10; 62P10

1. Introduction

Consider the functional single-index regression model

\begin{matrix} Y = g (〈 β_{0}, X 〉) + ε, \end{matrix}

(1)

where Y is a scalar response, X is a predictor function in a functional space

H

with an inner product denoted by

〈 \cdot, \cdot 〉

,

β_{0} \in H

is an unknown functional index,

∥ β_{0} ∥ = 1

,

ε

is a scalar random noise and

E (ε | X) = 0

,

g (\cdot)

is a unknown square integrable continuous function. If the model (1) is correctly specified, Refs. [1,2,3,4] proposed consistent estimates of

β_{0}

and

g (\cdot)

in different ways. However, compared to non-parametric regression models without specifying the model form of

E (Y | X)

, parametric functional single-index models are estimated more accurately, especially for functional data with infinite dimension. On the other hand, a wrongly specified model structure would lead to misleading statistical analysis. Therefore, we develop a goodness-of-fit test for model (1) with a given link function

g (\cdot)

. In some cases, the presence of outliers may affect the quality of model fitting, and it is necessary to conduct an outlier test to determine whether outliers exist in the data and then perform the goodness-of-fit test based on the data with outliers removed. The method for the outlier test can be referred to [5], while the primary focus of this paper is on the goodness-of-fit test of model (1).

The special case of the model (1) in which g is linear has been extensively studied. Among them, the research on hypothesis testing has made great progress. For example, testing the nullity of the slope parameter was studied in [6,7,8], and testing whether the conditional expectation of the response given the covariate is almost surely zero or not was considered in [9,10,11]. The model checking method relies on linear model assumptions and is not flexible enough in practical applications. Thus, the proposed model-adaptive approach in this paper overcomes these limitations and offers a more flexible and reliable solution for model checking in parametric functional single-index models. In terms of model testing, the two widely used tests are global smoothing methods based on empirical processes (see [12,13]) and local smoothing methods based on non-parametric estimation (see [9,10,11]). Similar to the conclusion summarized by [14] on vector space, each of these two kinds of tests has advantages and disadvantages. For example, as summarized in [15], the global smoothing methods converge to the limiting distribution at the rate of order

O (n^{- 1 / 2})

, which is the faster rate that the existing methods can reach. However, they are insensitive to high-frequency oscillation alternative models and often require the use of resampling techniques to obtain critical values. In contrast, as summarized in [14,16], the local smoothing methods have high power in detecting high-frequency oscillation alternative models, but these methods converge to the limiting distribution at the rate of order

O (n^{- 1 / 2} h^{- p / 4})

, where

p = d i m (X)

. Thus, the local smoothing methods suffer from the curse of dimensionality due to the use of multivariate non-parametric estimation, especially for functional data with infinite dimensional features. To overcome this shortcoming, most of the local smoothing methods mentioned above are based on the semi-metric of the function space to compute univariate non-parametric kernel functions (see [11]), which requires functional principal component analysis for dimensionality reduction. The other method is based on projection technology (see [9,10,17]), which is more complicated for choosing the projection directions. In this paper, we employ sufficient dimension reduction to construct a model adaptive test that achieves a convergence rate of

O (n^{- 1 / 2} h^{- 1 / 4})

, which is faster compared to that of the locally smooth method when the dimensionality is reduced to 1 under the null hypothesis.

Our aim is to test the parametric model (1) against a parsimonious alternative model class through functional sufficient dimension reduction (FSDR) [18]. We consider the alternative model:

\begin{matrix} Y = G (〈 B, X 〉) + ε, \end{matrix}

(2)

where

B = (β_{1}, \dots, β_{K})

is composed of K linearly independent functions in

H

and column vectors

〈 B, X 〉 = {(〈 β_{1}, X 〉, \dots, 〈 β_{K}, X 〉)}^{⊤}

is a column vector,

G (\cdot)

is a function from

R^{K} \to R

and

ε

is a scalar error independent of X. The FSDR method is based on the above functional multiple index model. Since the FSDR method can effectively use the information between the functional predictor and the response variable, it is widely used in regression models. In the FSDR literature, the subspace spanned by

β_{1}, \dots, β_{K}

is called the functional sufficient dimension reduction subspace, which is first estimated by functional sliced inverse regression (FSIR) in [19]. Then, functional inverse regression [20], functional K-means inverse regression [21] and functional sliced average variance estimation [18,22] were created. For the test problem studied in this paper, when

K > 1

, the model (2) is a functional multiple-index model; when

K = 1

and

B = β_{0} / ∥ β_{0} ∥

, model (2) reduces to the functional single-index model in (1). Therefore, we need construct a test statistic which can adapt to the related model structure through estimating

B

and its dimension K. In this way, we can achieve that under a null hypothesis, the estimate of

B

is consistent to

β_{1}

, and under an alternative hypothesis, it is consistent to a multiple-index functional

(β_{1}, \dots, β_{K})

. To achieve this goal, the most critical step is to determine K. We use the modified BIC criterion introduced in [14] to determine K.

This paper is organized as follows. In Section 2, we construct a test statistic based on the orthogonality of residual and its conditional expectation, which is the extension of the model testing method for vector data in [23] to a functional model. In Section 3, we describe in detail the procedure of identifying and estimating

B

and its dimension K using the FSIR method. The asymptotic properties under null, alternative and local alternative hypotheses are given in Section 4. Simulation results and a real data analysis are reported in Section 5. The corresponding conclusions and discussions are given in Section 6. All proofs are given in Appendix A.

2. The Test Statistic

To verify the validity of model (1), we should develop an appropriate model checking method. Therefore, for a given

g (\cdot)

, we consider the following null hypothesis

\begin{matrix} H_{0} : \exists β_{0} \in H, s u c h t h a t E [Y | X] = g (〈 β_{0}, X 〉), \end{matrix}

(3)

and for any K linearly independent function set

B = (β_{1}, \dots, β_{K})

, under model (2), the alternative hypothesis is

\begin{matrix} H_{1} : sup_{β \in H} P (E [Y | X] = E [Y | 〈 B, X 〉] = g (〈 β, X 〉)) < 1 . \end{matrix}

(4)

Without loss of generality, we take

H = L^{2} ([0, 1])

. It is worth noting that under

H_{0}

,

K = 1

and

B = c_{1} β_{0}

for some constant

c_{1}

; under

H_{1}

,

K \geq 1

. Similar to the interpretation of vector spaces introduced in Stute and Zhu [24], the test statistic based on

E [Y - g (〈 β_{0}, X 〉) | 〈 β_{0}, X 〉]

is directional, while the test statistic based on

E [Y - g (〈 β_{0}, X 〉) | 〈 B, X 〉]

is omnibus. For example, we consider the alternative model:

Y = 〈 β_{0}, X 〉 + sin (〈 β_{1}, X 〉) + ε

, where

β_{0}

and

β_{1}

are orthogonal functions. When

E [sin (〈 β_{1}, X 〉)] = 0

, then

E [Y - g (〈 β_{0}, X 〉) | 〈 β_{0}, X 〉] = 0

providing that

〈 β_{0}, X 〉

is independent of

〈 β_{1}, X 〉

and X is a Gaussian process with

E (X) = 0

; similar examples can be found in [14]. Thus, the test based on

E [Y - g (〈 β_{0}, X 〉) | 〈 β_{0}, X 〉]

cannot detect the above alternative. Therefore, we can construct a test statistic based on

E [Y - g (〈 β_{0}, X 〉) | 〈 B, X 〉]

.

Define

ε = Y - g (〈 X, β 〉)

,

E (ε | 〈 B, X 〉) = 0

almost surely under

H_{0}

and

P (E (ε | 〈 B, X 〉) = G (〈 B, X 〉) - g (〈 β_{0}, X 〉) \neq 0) > 0

under

H_{1}

. Thus, it is natural to construct a test statistic based on the orthogonality between model’s residuals

ε

and a non-parametric estimation of

E (ε | 〈 B, X 〉)

. Thus, we construct a test statistic based on

E {ε E (ε | 〈 B, X 〉)} = E {E^{2} (ε | 〈 B, X 〉)}

.

Given the sample

{X_{i}, Y_{i}}_{i = 1}^{n}

and

\hat{B}

with its structural dimension

\hat{K}

(the estimation procedure for

\hat{B}

and

\hat{K}

will be introduced in Section 3), the kernel regression estimator of

E (ε_{i} | 〈 B, X_{i} 〉)

is

\begin{matrix} \hat{E} (ε_{i} | 〈 \hat{B}, X_{i} 〉) = \frac{\sum_{j \neq i}^{n} ε_{j} K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉)}{\sum_{j \neq i}^{n} K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉)}, \end{matrix}

where

K_{h} (\cdot) = K (\cdot / h) / h^{\hat{K}}

is a

\hat{K}

-dimensional kernel function, and h is a bandwidth.

When

g (\cdot)

is known, similar to the estimation methods in [25] (Refs. [26,27,28]’s methods are also available), it is easy to obtain a consistent estimator of

β

, which is denoted as

\hat{β}

. Define

{\hat{ε}}_{i} \equiv Y_{i} - g (〈 \hat{β}, X_{i} 〉)

. Next, to simplify the formula, let

\hat{ε} = {({\hat{ε}}_{1}, \dots, {\hat{ε}}_{n})}^{⊤}

,

\hat{U}

be the

n \times n

diagonal matrix with

(i i)

th element

{\hat{ε}}_{i}

,

{\hat{w}}_{i j} \equiv K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉) / \sum_{l \neq i}^{n} K_{h} (〈 \hat{B}, X_{i} - X_{l} 〉)

and

{\hat{w}}_{i i} = 0

,

{\hat{W}}_{n} = {{\hat{w}}_{i j}}

be the

n \times n

matrix with

(i, j)

th element

{\hat{w}}_{i j}

, where

\hat{B}

is an FSDR estimate by [18] in following Section 3. Then, by Proposition 1 and Corollary 1 in [23], the finite sample corrected and standardized quadratic statistic is

\begin{matrix} T_{n}^{F S I R} = \frac{{\hat{ε}}^{⊤} {\hat{W}}_{n}^{s} \hat{ε}}{\sqrt{2} s (\hat{U} {\hat{W}}_{n}^{s} \hat{U})} + \frac{\hat{K}}{\sqrt{2} s ({\hat{W}}_{n}^{s})}, \end{matrix}

(5)

where

{\hat{W}}_{n}^{s} = ({\hat{W}}_{n} + {\hat{W}}_{n}^{⊤}) / 2

which satisfies the symmetry condition required in Corollary 1 of [23],

s (A) = {(\sum_{i, j} a_{i j}^{2})}^{1 / 2}

is the Frobenius norm of matrix

A = (a_{i j})

. We call

\frac{\hat{K}}{\sqrt{2} s ({\hat{W}}_{n}^{s})}

the finite sample correction term (FSC). When

T_{n}^{F S I R}

is large enough, the null hypothesis is rejected.

Remark 1.

This paper mainly extends the problem considered in [14] to the functional data model. In addition to this, there is a clear difference in the construction of the test statistic. For example, [14] constructed the test statistic based on

E {ε E (ε | B^{⊤} X) ω (B^{⊤} X)}

for a positive weight

ω (B^{⊤} X)

. When

ω (B^{⊤} X)

was the denominator of the non-parametric estimate of

E (ε | B^{⊤} X)

, they obtained a simple statistic. Thus,

T_{n}^{F S I R}

is different from the test of [14] even without the standardization and the bias correction.

3. Identification and Estimation of $B$ and $K$

Functional sufficient dimension reduction (FSDR) is concerned with the situations where the distribution of Y given X depends on X through a set of linear combinations of X; that is, there is a minimal set of members

B = (β_{1}, \dots, β_{K})

in

H

, such that

Y ⊥ X | 〈 B, X 〉

. In the applications where the conditional mean is of main interest, we can formulate FSDR through conditional mean as follows

\begin{matrix} Y ⊥ E (Y | X) | 〈 B, X 〉, \end{matrix}

where ⊥ stands for the conditional independence of Y and

E (Y | X)

given

〈 B, X 〉

. This is a fundamental assumption for functional sufficient dimension reduction [19,22,29]. It is worth noting that

Y ⊥ E (Y | X) | 〈 B, X 〉

is equivalent to

E (Y | X) = E (Y | 〈 B, X 〉)

by Proposition 8.1 in [29]. For any

K \times K

orthogonal matrix

D

, the above relation is unchanged if we replace

B

by

D B

, since

E (Y | 〈 B, X 〉) = E (Y | D^{⊤} 〈 B, X 〉)

. In fact, the identifiable parameter is the space spanned by the columns of

B

.

We denote the column space of

B

by

s p a n (B)

.

S_{E [Y | X]}

is defined as the central mean subspace and is the intersection of all subspace

s p a n (B)

such that

Y ⊥ E (Y | X) | 〈 B, X 〉

.

S_{Y | X}

is defined as the central subspace and is the intersection of all subspace

s p a n (B)

of minimal dimension such that

Y ⊥ X | 〈 B, X 〉

. Most of the FSDR methods are looking for central subspace

S_{Y | X}

, such as the FSIR method in [19], FSAVE method in [22], and FSDR via distance covariance in [30]. As proposed in [29] Theorem 8.1,

S_{E (Y | X)}

is a subspace of

S_{Y | X}

. To ensure that

S_{Y | X} = S_{E (Y | X)}

for

ε

in model (2), we assume that

ε = G_{1} (〈 B, X 〉) \tilde{ε}

, where

G_{1} (\cdot)

is an unknown smooth function and

\tilde{ε} ⊥ X

.

ε ⊥ X

is a special case. Next, we uniformly use

S_{Y | X}

to represent the FSDR space spanned by

B

. Here, we only introduce a popular FSDR method, functional sliced inverse regression (FSIR) which refers to [18], as an example. Of course, other FSDR methods are also applicable.

3.1. A Brief Review on FSIR

Most of the FSDR methods are based on the functional multiple index model (2). As introduced in following Proposition 1, with the FSDR technique, we can achieve with probability tending to 1

\hat{K} = 1

under

H_{0}

;

\hat{K} \geq 1

under

H_{1}

. In the following, we will introduce the specific dimensionality reduction technique of FSIR.

Given

(X, Y)

, by Karhunen–Loéve expansion of X, we have

\begin{matrix} X = \sum_{j = 1}^{\infty} ξ_{j} ϕ_{j}, \end{matrix}

where

E ξ_{j}^{2} = γ_{j}

,

γ_{i}

and

ϕ_{j}

are the eigenvalues and eigenfunctions, respectively, of the covariance operator of X, that is

Γ = V a r (X) = E (X \otimes X)

, where for

x, y \in L^{2} ([0, 1])

,

x \otimes y : L^{2} ([0, 1]) \to L^{2} ([0, 1])

is defined by

(x \otimes y) (z) = 〈 x, z 〉 y

. Define

Γ S_{Y | X}

as the space spanned by

Γ β_{1}, \dots, Γ β_{K}

. In addition, we write

Γ - V a r (X | Y) \in Γ S_{Y | X}

when

(Γ - V a r (X | Y)) β \in Γ S_{Y | X}

for all

β \in L^{2} ([0, 1])

. To ensure that the estimation of FSIR belongs to the space of

Γ S_{Y | X}

, we need the following Lemma (refer to Theorem 1 in [18]).

Lemma 1.

(a) Suppose for all

b \in L^{2} ([0, 1])

, the conditional expectation is a linear condition

E {〈 b, X 〉 |

〈 B, X 〉} = c_{0} + \sum_{i = 1}^{K} c_{i} 〈 β_{i}, X 〉

for constants

c_{0}, \dots, c_{K}

, then

E (X | Y) \in Γ S_{Y | X}

; (b) If in addition

V a r (X | 〈 B, X 〉)

is nonrandom,

Γ - V a r (X | Y) \in Γ S_{Y | X}

.

The conditions in (a) and (b) hold when X is a Gaussian process, although Gaussianity is not necessary. Furthermore, there is no requirement for specific distribution forms in the following theoretical study. In the simulation of Section 4, we will use Gaussian and other continuous distributions.

Since for functional data, the operator

Γ^{- 1}

is not bounded, we replace

Γ^{- 1}

by

{(Γ + ρ I)}^{- 1}

, where

ρ \to 0

as

n \to \infty

. Then, the FSDR space, which is obtained by FSIR, is the space spanned by the top eigenvectors of

\begin{matrix} {(Γ + ρ I)}^{- 1} V a r (E [X | Y]) . \end{matrix}

Given independent and identically distributed samples

{(X_{i}, Y_{i})}_{i = 1}^{n}

, referring to [29]’s steps in Section 3 on vector data, we summarize the estimation procedure of functional data as the following algorithm (Algorithm 1).

Algorithm 1: Estimation of

B

step 1. Compute the sample mean and sample variance functions

\bar{X} (t) = \frac{1}{n} \sum_{i = 1}^{n} X_{i} (t)

and

{\hat{Γ}}_{n} = \frac{1}{n} \sum_{i = 1}^{n} [X_{i} (t) - \bar{X} (t)] [X_{i} (t^{'}) - \bar{X} (t^{'})]

, where

t, t^{'} \in [0, 1]

, then, define

Z_{i} (t) = X_{i} (t) - \bar{X} (t)

,

i = 1, \dots, n

.

step 2. Discretize Y: let

{(I_{\dot{s}})}_{\dot{s} = 1}^{\dot{S}}

be slice intervals in

[min (Y), max (Y)]

, define

p_{\dot{s}} = P (Y \in I_{\dot{s}})

and

e_{\dot{s}} = E (Z | Y \in I_{\dot{s}})

,

v_{\dot{s}} = V a r (Z | Y \in I_{\dot{s}})

, then,

{\hat{p}}_{\dot{s}} = E_{n} [I (Y \in I_{\dot{s}})]

,

{\hat{e}}_{\dot{s}} = \sum_{i = 1}^{n} Z_{i} (t) I (Y_{i} \in I_{\dot{s}}) / n_{\dot{s}}

, where

E_{n} (\cdot)

is the empirical distribution function,

n_{s}

is the number of elements that satisfy

Y_{i} \in I_{\dot{s}}

.

step 3. Approximate

V a r [E (Z | Y)]

by

{\hat{Γ}}_{\dot{S}}^{F S I R} = \sum_{\dot{s} = 1}^{\dot{S}} {\hat{p}}_{\dot{s}} {\hat{e}}_{\dot{s}} \otimes {\hat{e}}_{\dot{s}} / \dot{S}

.

step 4. Let

{\hat{β}}_{1}, \dots, {\hat{β}}_{r}

and

{\hat{λ}}_{1}, \dots, {\hat{λ}}_{r}

be the top r eigenvectors and eigenvalues of

{({\hat{Γ}}_{n} + ρ I)}^{- 1} {\hat{Γ}}_{\dot{S}}^{F S I R}

, where

r \geq K

and

{\hat{λ}}_{1} \geq {\hat{λ}}_{2} \geq \dots \geq {\hat{λ}}_{r} \geq 0

. The sufficient predictors are

〈 {\hat{β}}_{1}, Z 〉, \dots, 〈 {\hat{β}}_{r}, Z 〉 .

Remark 2.

The way of discretization in step 2 may affect the outcome of Algorithm 1. A lot of discussions on the discretization were carried out, and some methods were recommended; see [18,31,32], among others.

3.2. Estimating the Structural Dimension $\hat{K}$

There are several methods to estimate the structural dimension K of

S_{Y | X}

for vector data. Ref. [33] introduced a Bayesian information criterion (BIC), Refs. [34,35] showed a modified BIC method, and Refs. [36,37] used a bootstrap method. Through simulation, we find that [34]’s method verifies the conclusion of Proposition 1 better. That is,

\begin{matrix} D (k) = \frac{n}{2} \frac{\sum_{l = 1}^{k} {log ({\hat{λ}}_{l} + 1) - {\hat{λ}}_{l}}}{\sum_{l = 1}^{r} {log ({\hat{λ}}_{l} + 1) - {\hat{λ}}_{l}}} - 2 C_{n} \frac{k (k + 1)}{2 r} . \end{matrix}

(6)

where

{\hat{λ}}_{1} \geq {\hat{λ}}_{2} \geq \dots \dots \geq {\hat{λ}}_{r} \geq 0

are eigenvalues of

{({\hat{Γ}}_{n} + ρ I)}^{- 1} {\hat{Γ}}_{\dot{S}}^{F S I R}

and

r \geq K

.

We maximize

D (k)

over

k = 1, \dots, r

, and denote the maximizer as

\hat{K}

. Then, if

C_{n} / n \to 0

and

C_{n} \to \infty

, the estimate

\hat{K}

by (6) satisfies:

P (\hat{K} = 1) \to 1

under

H_{0}

;

P (\hat{K} = K \geq 1) \to 1

under

H_{1}

; refer to Theorem 4 in [34]. The choice of

C_{n} = \sqrt{n}

can ensure the consistency of the estimators under hypotheses

H_{0}

and

H_{1}

.

4. Asymptotic Properties

Define semi-metric

d_{β} (x_{1}, x_{2}) = | 〈 β, x_{1} - x_{2} 〉 |

. The following assumptions are used to prove the asymptotic properties of the test statistic.

Assumption 1.

{E ∥ X ∥}^{4} < \infty

, Γ has eigenvalues

λ_{1} > λ_{2} > \dots > 0

, and

E ξ_{j}^{4} \leq C λ_{j}^{2}

.

Assumption 2.

For some constants

a, b

that satisfy

a > 1

and

a - 1 / 2 < b

,

j^{- a} \leq C λ_{j}

,

| ξ_{k j}^{*} | \leq C j^{- b}

, for some

C > 0

, where

ξ_{k j}^{*} = 〈 β_{k}, ϕ_{j} 〉, j = 1, 2, \dots

are the generalized Fourier coefficients of

β_{k}, 1 \leq k \leq K

.

Assumption 3.

The regularization parameter is chosen as

ρ \sim n^{- a / (a + 2 b)}

.

Assumption 4.

h = h_{n}

is a sequence of positive numbers satisfying

C_{2} n^{- τ_{2}} < h < C_{1} n^{- τ_{1}}

with

0 < τ_{1} < τ_{2} < 1

.

Assumption 5.

K is an asymmetrical strictly dereasing kernel in

[0, 1]

such that

\exists c_{1}, c_{2} > 0, c_{1} I_{[0, 1]} (t) < K (t) < c_{2} I_{[0, 1]} (t)

for

\forall t \in R

.

Assumption 6.

(1)

C

is a compact subset of

H

such that

X \in C

, a.s.; (2)

\forall β \in Θ_{n}

such that

c a r d Θ_{n} = n^{α}

with

α > 0

, we have

P (d_{β} (X_{1}, X_{2}) < h | X_{1}) = C_{X_{1}, β} h + o (h)

, a.s., where

X_{1}

and

X_{2}

are independent copies of X; (3)

0 < inf_{β \in Θ_{n}} C_{X_{1}, β} \leq sup_{β \in Θ_{n}} C_{X_{1}, β} < \infty

, a.s.

Assumption 7.

Let

g_{β} (\cdot) = E [Y | 〈 β, \cdot 〉]

.

g_{β}

satisfies, for

\forall β

, the Holder-type condition:

\exists C > 0,

\exists b_{1} > 0, \forall (x_{1}, x_{2}) \in C^{2}, | g_{β} (x_{1}) - g_{β} (x_{2}) | \leq C d_{β} {(x_{1}, x_{2})}^{b_{1}}

.

Assumption 8.

For any positive integer

c_{3}

,

{E (| Y |}^{c_{3}} | X) \leq C_{c_{3}, X} < \infty

, a.s. and

\exists ϱ > 0

,

E (Y^{2} | X = x) = σ (x) \geq ϱ

, with

σ (\cdot)

continuous on

C

.

Assumption 9.

The density

f (〈 B, X 〉)

of

〈 B, X 〉

on its support

C_{1}

exists and has two bounded derivatives and satisfies

0 < {inf}_{〈 B, X 〉 \in C_{1}} f (〈 B, X 〉) \leq {sup}_{〈 B, X 〉 \in C_{1}} f (〈 B, X 〉) < \infty

.

Remark 3.

Assumptions 1–3 are used to obtain

∥ {\hat{β}}_{j} - β_{j} ∥^{2} = O_{p} (n^{- (2 b - 1) / (a + 2 b)}) \equiv O_{p} (r_{n}^{2}),

j = 1, \dots, K

and

r_{n} = n^{- \frac{2 b - 1}{2 (a + 2 b)}}

, which was shown in [18]. By choosing the regularization parameter as

ρ \sim n^{- a / (a + 2 b)}

in Assumption 3, we aim to balance the bias and variance of the estimator and avoid overfitting. The choice of ρ in this form is motivated by the theoretical analysis of the convergence rates of the FSIR methods. Assumptions 4–8 mainly come from [1], which are necessary assumptions to obtain the following Theorem 1. Assumption 9 is needed for the asymptotic normality of our statistic.

For a given

g (\cdot)

,

{\hat{ε}}_{i} = Y_{i} - g (〈 \hat{β}, X_{i} 〉)

. Similar to the proof of [25], under Assumptions 1–3 and

H_{0}

, we have

∥ \hat{β} {- β ∥}^{2} = O_{p} (r_{n}^{2})

for least squares estimate (or penalized likelihood estimation in [38,39], but other assumptions may be needed). Therefore, the limiting behavior of the test statistic can be obtained.

Theorem 1.

Given Assumptions 1–9, if

n h^{1 / 2} r_{n}^{2} \to 0

and

r_{n} / h \to 0

as

n \to \infty

, then, under

H_{0}

, we have

T_{n}^{F S I R} \overset{d}{⟶} N (0, 1)

Proof.

See Appendix A (Part I). □

Remark 4.

It should be noted that although theoretically, we find that

F S C \overset{p}{⟶} 0

as

n \to \infty

, through simulation, we found that FSC can improve the power to some degree for finite samples.

We now investigate the power performance of the test statistic

T_{n}

under the fixed alternatives model (2). Suppose that under

H_{1}

, there exists a unique function

θ \in L^{2} ([0, 1])

satisfying

θ = a r g m i n_{β} E {(Y - g (〈 β, X 〉))}^{2}

; then,

{\hat{ε}}_{i} = Y - g (〈 θ, X_{i} 〉) - {g (〈 \hat{β}, X_{i} 〉) - g (〈 θ, X_{i} 〉)}

.

Theorem 2.

Under the alternative model (2) and Assumptions 1–9, it holds that

∥ \hat{β} - θ ∥ = O_{p} (r_{n})

and

T_{n}^{F S I R} \to \infty .

Proof.

See Appendix A (Part II). □

Next, we consider the following sequence of the local alternative model as

\begin{matrix} H_{1 n} : Y = g (〈 β_{0}, X 〉) + C_{1 n} G (〈 B, X 〉) + ε, \end{matrix}

(7)

where

C_{1 n}

goes to zero,

G (\cdot)

satisfies

E [G^{2} (〈 B, X 〉)] < \infty

and

{inf}_{β \in L^{2} ([0, 1])} E {[G (〈 B, X 〉) - g (〈 β, X 〉)]}^{2} > 0

.

In order to obtain the main results about the power behaviors under

H_{1 n}

of (7), we need to show the asymptotic property of

\hat{K}

when

C_{1 n} \to 0

.

Proposition 1.

Under the local alternative hypothesis

H_{1 n}

in (7) and Assumptions 1–3, when

C_{n} h^{1 / 2} \to \infty

(

C_{n}

was introduced in (6)), if

C_{1 n} = O (n^{- 1 / 2} h^{- 1 / 4})

, we have

P (\hat{K} = 1) \to 1

.

Proof.

See Appendix A (Part III). □

We now present the power performance of our tests under the local alternative hypothesis. Before this, the following condition needs to be met.

Similar to the parameter estimation method under

H_{0}

, we can obtain the parameter estimate

\hat{β}

with observations

{X_{i}, Y_{i}}_{i = 1}^{n}

under

H_{1 n}

. Then, we have

{\hat{ε}}_{i} = ε_{i} + C_{1 n} G (〈 B, X 〉) - {g (〈 \hat{β}, X_{i} 〉) - g (〈 β_{0}, X_{i} 〉)}

.

Theorem 3.

If Assumptions 1–9 hold,

n h^{1 / 2} r_{n}^{2} \to 0

and

r_{n} / h \to 0

as

n \to \infty

. Then, under

H_{1 n}

, if

C_{1 n} = n^{- 1 / 2} h^{- 1 / 4}

and

∥ \hat{β} - β_{0} ∥ = O_{p} (r_{n})

, we have

T_{n}^{F S I R} \overset{d}{⟶} N (ω, 1),

where

\frac{E [G^{2} (〈 B, X 〉)]}{\sqrt{2 σ^{2} E [f^{- 1} (〈 B, X 〉)]}}

,

σ^{2} = V a r (ε)

.

Proof.

See Appendix A (Part III). □

5. Simulations Results and Real Data Analysis

We now perform simulations to check the performance of the proposed test. In the non-parametric estimation, we use the Quartic kernel function

K (u) = 15 / 16 {(1 - u^{2})}^{2} I (u \in [0, 1])

, and the bandwidth h is selected by generalized cross-validation (GCV), which is similar to the method in [40]. Similar to the kernel estimators in Euclid spaces, the choice of the kernel affects less, and other kernel functions are also applicable, such as Gaussian kernel function and Triangular kernel function. The reason for choosing the Quartic kernel function is usually because it has a better fitting effect than the Triangular kernel, and compared to Gaussian kernel function, the Quartic kernel function is relatively more flexible in bandwidth selection. In the functional regression analysis, the optimal bandwidth h can be chosen by the R function

f r e g r e . n p

in the

f d a . u s c

package. Since the conclusion in [18] shows that the results are less sensitive to the slice number

\dot{S}

, we choose

\dot{S} = 3

in the simulations if not specifically stated. The regularization parameter is chosen as

ρ = 0.02

, which is suggested by the results in [18]. All studies are based on 1000 repetitions, and the significance level is always taken as

α = 0.05

. In the following subsections, we check the adequacy of the parametric functional single-index model by considering that

g (\cdot)

is both linear and nonlinear.

5.1. Study 1: Linear Link Function

When

g (\cdot)

is a linear function, we need to check the adequacy of the functional linear models (FLM). Among the existing model testing methods for FLM, [13] constructed two test statistics

CvM

and

KS

based on a random projection empirical process. We also compare the proposed test with the weighted kernel smoothing test suggested in [11], and we denote the test by

S_{n}

when its critical value is directly given by the asymptotic normal distribution and

S_{n}^{B}

when its critical value is obtained by bootstrap. In addition, we also extend the method of [11] to sufficient dimension reduction with the kernel function

K (〈 B, X_{i} - X_{j} 〉 / h)

and denote the test as

S_{n}^{FSIR}

(see the following Algorithm 2). Similar to

S_{n}

and

S_{n}^{B}

, we can drop out the FSDR and just use a univariate kernel

K (∥ X_{i} - X_{j} ∥ / h)

to substitute the multivariate kernel in

T_{n}^{FSIR}

; then, we obtain the corresponding statistics as

T_{n}

and

T_{n}^{B}

, respectively. In the simulation, the

KS

and

CvM

approach can be directly implemented by the

r p . f l m . t e s t

function in R package

f d a . u s c

when

g (\cdot)

is a linear link function, and the number of projections is taken as 3. The bootstrap resampling number is 500 in each case. The specific steps to calculate the test statistic

T_{n}^{FSIR}

and

S_{n}^{FSIR}

are as follows:

Algorithm 2: Calculate

T_{n}^{FSIR}

and

S_{n}^{FSIR}

step 1. To obtain

\hat{ε}

in

T_{n}^{FSIR}

and

S_{n}^{FSIR}

, we use the

p f r

function in the

r e f u n d

package of R for the FLM parameter estimation.

step 2.

\hat{B}

in the kernel function

K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉)

of the test statistic

T_{n}^{FSIR}

is estimated by Algorithm 1.

step 3.

\hat{K}

is determined by (6). When

\hat{K} = 1

,

K_{h} (\cdot)

is a one-dimensional kernel function; when

\hat{K} > 1

,

K_{h} (\cdot)

is the product of

\hat{K}

one-dimensional kernel functions.

step 4. Then, we compute

S_{n}^{F S I R} = \frac{1}{n (n - 1)} \sum_{j \neq i}^{n} \frac{1}{h} K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉) {\hat{ε}}_{i} {\hat{ε}}_{j}

and

T_{n}^{F S I R}

in (5).

We consider the following functional linear link model

Y = 〈 X, β_{1} 〉 + c \cdot m (〈 X, B 〉) + ε,

where X is a standard Brownian motion on

[0, 1]

and is observed on a grid of equally spaced 101 points,

β_{j} = sin (j π t)

is the Fourier orthogonal basis with

j = 1, 2, \dots

,

ε \sim N (0, 0 . 1^{2})

, c is a constant, and the null hypothesis

H_{0}

holds when

c = 0

. We carry out simulations from different perspectives with two examples.

Example 1.

In this example, our aim is to examine the performance of different test statistics under following alternative models:

$H_{11} : Y = 〈 X, β_{1} 〉 + 0.2 c \cdot cos (2 π 〈 X, β_{1} 〉) + ε$ ;
$H_{12} : Y = 〈 X, β_{1} 〉 + 0.2 c \cdot exp (- 〈 X, β_{1} 〉) + ε$ ;
$H_{13} : Y = 〈 X, β_{1} 〉 + 0.2 c \cdot 〈 X, β_{1} 〉 〈 X, β_{2} 〉 + ε$ ;
$H_{14} : Y = 〈 X, β_{1} 〉 + c \{exp (- 〈 X, β_{2} 〉) + cos (〈 X, β_{3} 〉)\} + ε$ ;
$H_{15} : Y = 〈 X, β_{1} 〉 + c \{exp (- 〈 X, β_{2} 〉) + cos (〈 X, β_{3} 〉) + {〈 X, β_{4} 〉}^{2}\} + ε$ ;
$H_{16} : Y = 〈 X, β_{1} 〉 + c \{exp (- 〈 X, β_{2} 〉) + cos (〈 X, β_{3} 〉) + {〈 X, β_{4} 〉}^{2} + {〈 X, β_{5} 〉}^{3}\} + ε$ ,

where

H_{11}

and

H_{12}

are used to examine whether and how much power our test loses by sufficient dimension reduction.

H_{13}

is the case of the multiplicative effect,

H_{14}

–

H_{16}

are used to analyze the effect of dimensionality on the different test statistics.

Table A1 shows the empirical sizes and powers of different test statistics with the nominal size

α = 0.05

. Through the comparative analysis in different scenarios, we conclude that: (1) when the critical value is directly given by the asymptotic normal distribution, the empirical sizes of

T_{n}^{FSIR}

are relatively closer to the pre-specified significance level

α = 0.05

, while

S_{n}

and

T_{n}

are more conservative. (2)

T_{n}^{FSIR}

has the highest power, except for small and moderate cs in the cases of

H_{12}

and

H_{13}

. (3)

S_{n}

,

CvM

and

KS

test statistics are mainly applicable to the cases of

H_{11}

and

H_{12}

, see [11,13]. In

H_{11}

and

H_{12}

,

B = β_{1}

under both the null and alternative hypothesis. Therefore, FSDR is not needed in these two alternative models, but we can examine whether and how much the power would be lost if FSDR were used. Comparing the results of using FSDR and not using FSDR in Table A1 for

H_{11}

and

H_{12}

, it is evident that the use of FSDR would result in some loss of power in

H_{12}

, while

T_{n}^{FSIR}

still has the highest power under

H_{1 n}

. (4) Under the multiplicative effects model

H_{13}

, the considered test statistics remain effective, and the test statistic using FSIR is slightly more powerful for large c and n. (5) From

H_{14}

to

H_{16}

, the power of the considered test statistic increases to varying degrees with the increasing dimensionality. (6) From

H_{14}

to

H_{16}

, there is a significant improvement in the powers of almost all test statistics. This indicates that the test statistics are more sensitive to the quadratic model than the high-frequency oscillation model. The power is slightly improved after adding a cubic model from

H_{15}

to

H_{16}

. (7) The powers of all statistics increase significantly as the sample size increases, and the power of the FSIR method increases more rapidly with increasing c, especially for

T_{n}^{FSIR}

.

Example 2.

To analyze the effect of different data from what we considered in Example 1, we take model

Y = 〈 X, β_{1} 〉 + c exp (- 〈 X, β_{2} 〉) + ε

as an example to analyze the performance of different statistics under the following data generation mechanism (DGM):

$D G M_{1} : X \sim O r n s t e i n U h l e n b e c k p r o c e s s, β_{1} = sin (π t), β_{2} = sin (2 π t)$ ;
$D G M_{2} : X \sim s t a n d a r d B r o w n i a n m o t i o n, β_{1} = sin (π t), β_{2} = sin (2 π t)$ ;
$D G M_{3} : X \sim s t a n d a r d B r o w n i a n m o t i o n, β_{1} = 2 {(2 t - 1)}^{2} - 1, β_{2} = 4 {(2 t - 1)}^{3} - 3 (2 t - 1)$ ;
$D G M_{4} : X \sim s t a n d a r d B r o w n i a n m o t i o n, β_{1} = 2 {(2 t - 1)}^{2} - 1, β_{2} = sin (2 π t)$ ,

where

ε \sim N (0, 0 . 1^{2})

,

β_{1}

and

β_{2}

in

{DGM}_{1} - {DGM}_{2}

and

{DGM}_{3}

are the Fourier orthogonal basis and Chebyshev orthogonal basis, respectively, and

{DGM}_{4}

has a non-orthogonal basis.

We present the test results for Example 2 in Table A2. In

{DGM}_{1}

and

{DGM}_{2}

, although X is different, the test results are similar. Comparing the results of

{DGM}_{2} - {DGM}_{4}

, which has different

β

s, we find that the powers in the Chebyshev basis are lower than those in the Fourier basis, while the powers in the case of the non-orthogonal basis are somewhere in between.

5.2. Study 2: Non-Linear Link Function

When

g (\cdot)

is a given nonlinear function, we only give results of

T_{n}, T_{n}^{B}

and

T_{n}^{FSIR}

, since

S_{n}

,

S_{n}^{B}

,

T_{n}

,

T_{n}^{B}

,

CvM

and

KS

are studied for FLM in [11,13]. Except for step 1 in Algorithm 2, the other steps are similar. Step 1 can be implemented in a similar way to functional principal component analysis in [25] when

g (\cdot)

is known, and

\hat{ε}

can be obtained through nonlinear least squares estimation.

We consider the nonlinear link function model:

Y = g (〈 X, β_{1} 〉) + c \cdot m (〈 X, B 〉) + ε,

where

g (\cdot)

is a smooth function. Similar to Study 1, we mainly analyze the performance of the test under different alternative models. The data generation mechanisms are the same as those in

{DGM}_{2}

. In order to make our test more widely applicable, we also analyze the case of monotonic and non-monotonic link functions, as demonstrated by the following two examples, respectively.

Example 3.

In this example, we consider the case where

g (\cdot)

is monotonic. The following models with the exponential link function are implemented.

$H_{31} : Y = exp (〈 X, β_{1} 〉) + c \cdot cos (〈 X, β_{2} 〉) + ε$ ;
$H_{32} : Y = exp (〈 X, β_{1} 〉) + c \cdot (cos (〈 X, β_{2} 〉) + | 〈 X, β_{3} 〉 |) + ε$ ;
$H_{33} : Y = exp (〈 X, β_{1} 〉) + c \cdot (cos (〈 X, β_{2} 〉) + | 〈 X, β_{3} 〉 | + {〈 X, β_{4} 〉}^{2}) + ε$ .

Example 4.

To examine the power of the test when

g (\cdot)

is non-monotonic, we analyze the following models with the sin link function,

$H_{41} : Y = sin (〈 X, β_{1} 〉) + c \cdot exp (〈 X, β_{2} 〉) + ε$ ;
$H_{42} : Y = sin (〈 X, β_{1} 〉) + c \cdot (exp (〈 X, β_{2} 〉) + | 〈 X, β_{3} 〉 |) + ε$ ;
$H_{43} : Y = sin (〈 X, β_{1} 〉) + c \cdot (exp (〈 X, β_{2} 〉) + | 〈 X, β_{3} 〉 | + {〈 X, β_{4} 〉}^{2}) + ε$ .

We present the results for Examples 3 and 4 in Figure A1. We make three observations. First, for the testing of nonlinear models, the statistic constructed based on the FSIR method always has higher power than others. Second, the difference between monotonic and non-monotonic power is not significant, and the effect of dimensionality on power is not very large, but the differences between

T_{n}^{FSIR}

and

T_{n}^{B}, T_{n}

in the monotonic cases are more obvious than the non-monotonic cases. For example, the power increment of

T_{n}^{FSIR}

relative to

T_{n}

in

H_{33}

is slightly larger than that in

H_{43}

when

n = 200

. Third, there is a significant increase in power with increasing sample size.

To summarize, through the above examples, including different data generation mechanisms, multiplicative effect models, and monotone/non-monotone cases under functional data, we find that the use of the FSDR method in functional data not only increases the power with the increase of sample size n and deviation parameter c but also improves the power of the test to some extent compared with the method without using FSDR [11,13] even under the multiplicative effect model. The above conclusion is largely consistent with the findings in vector data analysis [14,41,42]. In functional data analysis, similar research has been conducted, such as [43], but Ref. [43]’s method is only applicable to linear models. Our study is not only effective in linear situations but also performs well in nonlinear cases.

5.3. Analysis of the COVID-19 Data

As one of the most important aggregate indicators reflecting the overall macroeconomic operation of each country, Gross Domestic Product (GDP) is an important reference value for decision makers to make rational decisions based on the economic situation. The large shock to the world’s macroeconomy caused by the COVID-19 crisis has imposed tremendous challenges on now accurately forecasting a country’s GDP. Performing relevant data modeling is an important way to make predictions. Goodness-of-fit tests are a prerequisite for accurate modeling. Therefore, when the relationship between the number of new COVID-19 cases per day and the GDP of a country is known, the model can be tested first using the method in this paper, and then the related statistical analysis can be performed. Below, we describe the specific operations.

If X and Y have the following relationship model

E [Y_{i}^{G D P} | X_{i}] = g (〈 X_{i}, β 〉),

where

Y_{i}

is the DP growth (annual %) (https://advisor.visualcapitalist.com/gdp-growth-by-country-in-2021/, accessed on 18 January 2023) of country i in 2021,

X_{i} (t)

is the rolling mean of newly confirmed COVID-19 cases (in R package

t i d y c o v i d 19

) per day as reported by JHU CSSE (https://www.worldbank.org/en/home, accessed on 18 January 2023) in 2021,

i = 1, 2, \dots, 183

. Rolling mean can be implemented by the

r o l l m e a n

function in the R language and the width of the rolling window is 7; missing values are replaced by the nearby rolling mean. The specific trends of X and Y and the selected countries are shown in Figure A2, where the legend shows the ISO numbers for each country. In the legend of X, we have only indicated the three countries with a high number of confirmed cases. It is worth mentioning that there are more than 183 countries in the world, and we selected 183 countries for our analysis mainly because we excluded those countries that did not publish their GDP in 2021 and had zero cumulative confirmed cases in 2021.

Using the test statistic and parameter values, such as bandwidth and slice number, as mentioned in the simulation, if g is a linear function, the test results are displayed in Table A3. Table A3 shows that except for

CvM

and

K S

, all other statistics reject functional linear models, where

\hat{K} = 3

with

FSIR

. These conclusions suggests that a nonlinear model should be used: for example, a functional multi-index model may be appropriate. However,

CvM

and

KS

fail to detect this nonlinear model structure.

6. Conclusions and Discussion

In this paper, we extend [14]’s method to functional data and construct a different test statistic for checking a single-index model. We theoretically prove the large-sample properties of the constructed test statistic under the null hypothesis, alternative hypothesis, and local alternative hypothesis. In addition, the simulations of Section 4 also show that the FSIR-based statistic has higher power compared to other statistic in most cases and can be used to check the adequacy of functional linear or nonlinear models.

We can extend our methodology to missing, censored or dependent data. For example, the case of missing data can be referred to the method of [41]. Of course, in addition to mean regression, we can also consider the quantile regression or composite quantile regression.

Author Contributions

L.X.: Conceptualization, Formal analysis, Methodology, Software, Writing—original draft, Data curation; T.L.: Formal analysis, Methodology, Software, Writing—review and editing; Z.Z.: Conceptualization, Formal analysis, Methodology, Investigation, Supervision, Funding acquisition, Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research is partly supported by the National Natural Science Foundation of China (Grant No. 12271014 and No. 11971045).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

“GDP growth (annual %) Data” at https://advisor.visualcapitalist.com/gdp-growth-by-country-in-2021/ (accessed on 18 January 2023), “COVID-19 Data (in R package

t i d y c o v i d 19

)” at https://www.worldbank.org/en/home (accessed on 18 January 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Part I

Proof of Theorem 1.

We first give several notations used in the proof. Specifically,

Σ_{n}

is an

n \times n

diagonal matrix with

σ_{i i} = V a r^{1 / 2} (ε_{i})

, and for any

n \times n

matrix

A_{n}

, denote the spectral radius of

A_{n}

by

\begin{matrix} ρ (A_{n}) \equiv sup_{v \in R^{n}, v \neq 0} \frac{∥ A_{n} v ∥}{∥ v ∥} . \end{matrix}

When

A_{n}

is symmetric and the eigenvalue satisfies

| k_{1} | \geq | k_{2} | \geq \dots \geq | k_{n} |

,

ρ (A_{n}) = | k_{1} |

.

Next, we prove Theorem 1 in five steps.

Step 1:

\frac{ε^{⊤} W_{n}^{s} ε}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} \overset{d}{⟶} N (0, 1)

.

Both

W_{n}

and

W_{n}^{s}

are

n \times n

matrices, with their

(i j)

th elements being

w_{i j} = K_{h} (〈 B, X_{i} - X_{j} 〉) / \sum_{l \neq i}^{n} K_{h} (〈 B, X_{i} - X_{l} 〉)

and

(w_{i j} + w_{j i}) / 2

, respectively.

To present the proof of Step 1, we first introduce two pivotal lemmas. The first lemma is a direct conclusion of Theorem 1.1 in [44].

Lemma A1.

Assume that

{ε}

is a sequence of independent uniformly square integrable random variables with mean zero, and

ε = {(ε_{1}, \dots, ε_{n})}^{⊤}

. Define

Σ_{n}

as an

n \times n

diagonal matrix with ii-th element

σ_{i i}

. For a real symmetric matrix

A_{n} = {(a_{i j})}_{n \times n}

with zero diagonal, if

ρ (A_{n}) / s (A_{n}) \to 0

as

n \to \infty

, then

\begin{matrix} T = \frac{ε^{⊤} A_{n} ε}{\sqrt{2} s (Σ_{n} A_{n} Σ_{n})} \overset{d}{⟶} N (0, 1) . \end{matrix}

(A1)

The second lemma comes from [23].

Lemma A2.

Let

X_{n} = {(X_{1}, X_{2}, \dots, X_{n})}^{⊤}

be an i.i.d. sample from X, and

x_{n}

for a realization of

X_{n}

.

ε

is the same as in Lemma 1.

P_{n} (X_{n})

and

T (X_{n}, ε)

are measurable functions. If

P_{n} (x_{n}) ⟶ 0 \Rightarrow T (X_{n}, ε) \overset{d}{⟶} N (0, 1),

and

P_{n} (X_{n}) \overset{p}{⟶} 0,

then

T (X_{n}, ε) \overset{d}{⟶} N (0, 1)

.

By Lemmas A1 and A2, to prove the conclusion in Step 1, we need to prove

ρ (W_{n}^{s}) / s (W_{n}^{s}) \overset{p}{⟶} 0

as

n ⟶ \infty

. Firstly, since

W_{n}

and

W_{n}^{⊤}

have the same eigenvalues, it is easy to show

\begin{matrix} ρ (W_{n}^{s}) & = ρ (W_{n}) . \end{matrix}

Secondly, noting that

w_{i j} \geq 0

, it holds

\begin{matrix} s^{2} (W_{n}^{s}) & \geq \frac{1}{2} s^{2} (W_{n}) . \end{matrix}

Thus,

\frac{ρ (W_{n}^{s})}{s (W_{n}^{s})} \leq \sqrt{2} \frac{ρ (W_{n})}{s (W_{n})},

and it suffices to show that

ρ (W_{n}) / s (W_{n}) \overset{p}{⟶} 0

.

Since the row sum of

W_{n}

is 1,

ρ (W_{n}) \leq 1

. Thus, we need only show

s (W_{n}) \overset{p}{⟶} \infty

. Define

Z = 〈 B, X_{j} 〉

as a random variable and the density function exists (Assumption 9). Since

\begin{matrix} E [\sum_{j = 1}^{n - 1} K_{h}^{2} (〈 B, X_{j} - x 〉)] = & (n - 1) E [K_{h}^{2} (Z - 〈 B, x 〉)] \\ = & (n - 1) \int K^{2} (\frac{Z - 〈 B, x 〉}{h}) f (Z) d Z \\ = & (n - 1) \int K^{2} (u) f (u h + 〈 B, x 〉) h^{K} d u \\ = & (n - 1) h^{K} \int K^{2} (u) [f (〈 B, x 〉) + h u^{⊤} f^{'} (〈 B, x 〉) + o (h)] d u \\ = & (n - 1) h^{K} f (〈 B, x 〉) \int K^{2} (u) d u + (n - 1) h^{K + 1} \int u^{⊤} f^{'} (〈 B, x 〉) K^{2} (u) d u + o (n h^{K + 1}) . \end{matrix}

Similarly, we have

\begin{matrix} E [\sum_{j = 1}^{n - 1} K_{h} (〈 B, X_{j} - x 〉)] = & (n - 1) E [K_{h} (〈 B, X - x 〉) | x] \\ = & (n - 1) h^{K} f (〈 B, x 〉) + (n - 1) h^{K + 1} f^{'} (〈 B, x 〉) \int u K (u) d u + o (n h^{K + 1}) . \end{matrix}

These two equations lead to

\begin{matrix} s^{2} (W_{n}) = & \sum_{i \neq j} w_{i j}^{2} \\ = & \sum_{i = 1}^{n} \frac{1}{n h^{K} f (〈 B, X_{i} 〉)} (\int K^{2} (u) d u + o_{p} (1)) \\ = & \frac{1}{h^{K}} E [f^{- 1} (〈 B, X 〉)] (\int K^{2} (u) d u + o_{p} (1)) \\ = & O_{p} (h^{- K}) . \end{matrix}

It is easy to derive

s (W_{n}) = O_{p} (h^{- K / 2})

. This result will be used in the following steps.

By Assumption 5:

\exists c_{1}, c_{2} > 0, c_{1} I_{[0, 1]} (t) < K (t) < c_{2} I_{[0, 1]} (t)

for

\forall t \in R

, we have

\begin{matrix} E [\sum_{j = 1}^{n - 1} K_{h}^{2} (〈 B, X_{j} - x 〉)] \geq (n - 1) h^{K} f (〈 B, x 〉) c_{1}^{2} + o_{p} (1) . \end{matrix}

Thus,

\begin{matrix} s^{2} (W_{n}) & \geq \frac{1}{h^{K}} E [f^{- 1} (〈 B, X 〉)] (c_{1}^{2} + o_{p} (1)) . \end{matrix}

Under Assumption 4, when

K = 1

under

H_{0}

, we have

s (W_{n}) \to \infty

.

Step 2:

\frac{ε^{⊤} {\hat{W}}_{n}^{s} ε - ε^{⊤} W_{n}^{s} ε}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} \overset{p}{⟶} 0

.

To begin, note that

\begin{matrix} ε^{⊤} {\hat{W}}_{n}^{s} ε - ε^{⊤} W_{n}^{s} ε \\ = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} ({\hat{w}}_{i j}^{s} - w_{i j}^{s}) ε_{i} ε_{j} \\ = & \frac{1}{2} \sum_{i = 1}^{n} \sum_{j \neq i}^{n} ((\frac{K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 \hat{B}, X_{i} - X_{l} 〉)} - \frac{K_{h} (〈 B, X_{i} - X_{j} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i} - X_{l} 〉)}) + (\frac{K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉)}{\sum_{l \neq j}^{n} K_{h} (〈 \hat{B}, X_{j} - X_{l} 〉)} - \frac{K_{h} (〈 B, X_{i} - X_{j} 〉)}{\sum_{l \neq j}^{n} K_{h} (〈 B, X_{j} - X_{l} 〉)})) ε_{i} ε_{j} \\ = & \frac{1}{2} (\sum_{i = 1}^{n} \sum_{j \neq i}^{n} (\frac{K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 \hat{B}, X_{i} - X_{l} 〉)} - \frac{K_{h} (〈 B, X_{i} - X_{j} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i} - X_{l} 〉)}) ε_{i} ε_{j} + \sum_{i = 1}^{n} \sum_{j \neq i}^{n} (\frac{K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉)}{\sum_{l \neq j}^{n} K_{h} (〈 \hat{B}, X_{j} - X_{l} 〉)} - \frac{K_{h} (〈 B, X_{i} - X_{j} 〉)}{\sum_{l \neq j}^{n} K_{h} (〈 B, X_{j} - X_{l} 〉)}) ε_{i} ε_{j}) \\ \equiv & \frac{1}{2} (D_{1 n} + D_{2 n}) \end{matrix}

Then, an application of Taylor expansion of

D_{1 n}

yields

\begin{matrix} D_{1 n} = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} (\frac{K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 \hat{B}, X_{i} - X_{l} 〉)} - \frac{K_{h} (〈 B, X_{i} - X_{j} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i} - X_{l} 〉)}) ε_{i} ε_{j} \\ = & D_{1 n}^{★} + o_{p} (D_{1 n}^{★}), \end{matrix}

where

\begin{matrix} D_{1 n}^{★} = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} {(\frac{K_{h}^{'} (〈 B, X_{i} - X_{j} 〉) \sum_{l \neq i}^{n} K_{h} (〈 B, X_{i} - X_{l} 〉) - K_{h} (〈 B, X_{i} - X_{j} 〉) \sum_{l \neq i}^{n} K_{h}^{'} (〈 B, X_{i} - X_{l} 〉)}{{(\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i}, X_{l} 〉))}^{2}})}^{⊤} \cdot \frac{〈 \hat{B} - B, X_{i} - X_{j} 〉}{h^{\hat{K}}} ε_{i} ε_{j} \\ = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{1}{h^{\hat{K}}} s_{i j}^{⊤} 〈 \hat{B} - B, X_{i} - X_{j} 〉 ε_{i} ε_{j}, \end{matrix}

and superscript “′” represents the gradient vector,

〈 \hat{B} - B, X_{i} - X_{j} 〉 = {(〈 {\hat{β}}_{1} - β_{1}, X_{i} - X_{j} 〉, 〈 {\hat{β}}_{2} - β_{2}, X_{i} - X_{j} 〉, \dots, 〈 {\hat{β}}_{K} - β_{K}, X_{i} - X_{j} 〉)}^{⊤}

and we define

s_{i j}

as the K-dimensional column vectors.

Similar to

D_{1 n}

,

D_{2 n} = D_{2 n}^{★} + o_{p} (D_{2 n}^{★}),

where

\begin{matrix} D_{2 n}^{★} = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} {(\frac{K_{h}^{'} (〈 B, X_{i} - X_{j} 〉) \sum_{l \neq j}^{n} K_{h} (〈 B, X_{j} - X_{l} 〉) - K_{h} (〈 B, X_{i} - X_{j} 〉) \sum_{l \neq j}^{n} K_{h}^{'} (〈 B, X_{j} - X_{l} 〉)}{{(\sum_{l \neq j}^{n} K_{h} (〈 B, X_{j} - X_{l} 〉))}^{2}})}^{⊤} \cdot \frac{〈 \hat{B} - B, X_{i} - X_{j} 〉}{h^{\hat{K}}} ε_{i} ε_{j} \\ \equiv & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{1}{h^{\hat{K}}} {(s_{i j}^{★})}^{⊤} 〈 \hat{B} - B, X_{i} - X_{j} 〉 ε_{i} ε_{j}, \end{matrix}

and

s_{i j}^{★}

is the K-dimensional column vector corresponding to the coefficients of

\frac{〈 \hat{B} - B, X_{i} - X_{j} 〉}{h^{\hat{K}}} ε_{i} ε_{j}

.

Let the elements in the K-dimensional vector

1 / 2 {(s_{i j} + s_{i j}^{★})}^{⊤} ∥ X_{i} - X_{j} ∥

be

a_{i j}^{k}

,

A_{n}^{k}

is a

n \times n

matrix with the element

a_{i j}^{k}

. Then,

\begin{matrix} \frac{ε^{⊤} {\hat{W}}_{n}^{s} ε - ε^{⊤} W_{n}^{s} ε}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} = & \frac{1}{2} (D_{1 n}^{★} + D_{2 n}^{★}) + \frac{1}{2} (o_{p} (D_{1 n}^{★}) + o_{p} (D_{2 n}^{★})), \end{matrix}

where

\begin{matrix} D_{1 n}^{★} + D_{2 n}^{★} = & \frac{\sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{1}{2 h^{\hat{K}}} {(s_{i j} + s_{i j}^{★})}^{⊤} 〈 \hat{B} - B, X_{i} - X_{j} 〉 ε_{i} ε_{j}}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} \\ \leq & \frac{\sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{1}{2} {(s_{i j} + s_{i j}^{★})}^{⊤} ε_{i} ε_{j} ∥ X_{i} - X_{j} ∥}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} \cdot \frac{∥ \hat{B} - B ∥}{h^{\hat{K}}} \\ = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{1}{2} {(s_{i j} + s_{i j}^{★})}^{⊤} ∥ X_{i} - X_{j} ∥ ε_{i} ε_{j} d i a g (\sqrt{2} s {(Σ_{n} A_{n}^{k} Σ_{n})}^{- 1}) \cdot \frac{d i a g (\sqrt{2} s (Σ_{n} A_{n}^{k} Σ_{n}))}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} \cdot \frac{∥ \hat{B} - B ∥}{h^{\hat{K}}}, \end{matrix}

(A2)

where

∥ \hat{B} - B ∥ = (∥ {\hat{β}}_{1} - β_{1} ∥, ∥ {\hat{β}}_{2} - β_{2} ∥, \dots, ∥ {\hat{β}}_{K} - β_{K} {∥)}^{⊤}

.

Similar to the proof in Step 1, the elements of the K-dimensional vector

\sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{1}{2} {(s_{i j} + s_{i j}^{★})}^{⊤} ∥ X_{i} - X_{j} ∥ ε_{i} ε_{j} \cdot d i a g ({(\sqrt{2} s (Σ_{n} A_{n}^{k} Σ_{n}))}^{- 1})

tend to a multivariate standard normal distribution. Thus, the first term of (A2) is

O_{p} (1)

.

We define the k-th component of the K-dimensional vector

s_{i j}

as

s_{i j}^{k}, k = 1, \dots, K

, where

s_{i j}

comes from

D_{1 n}^{★}

. Letting the K-dimension vector

R = \sum_{i = 1}^{n} \sum_{j = 1}^{n} {(s_{i j}^{k})}^{2} {∥ X_{i} - X_{j} ∥}^{2}

, it is easy to see that

\sqrt{2} s (Σ_{n} A_{n}^{k} Σ_{n})

and

R^{1 / 2}

have the same convergence rate. By Assumption 5, we have

\begin{matrix} R = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{{[K_{h}^{'} (〈 B, X_{i} - X_{j} 〉) \sum_{l \neq i}^{n} K_{h} (〈 B, X_{i} - X_{l} 〉) - K_{h} (〈 B, X_{i} - X_{j} 〉) \sum_{l \neq i}^{n} K_{h}^{'} (〈 B, X_{i} - X_{l} 〉)]}^{2} {∥ X_{i} - X_{j} ∥}^{2}}{{(\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i}, X_{l} 〉))}^{4}} \\ = & \sum_{i = 1}^{n} \frac{\sum_{j = 1, j \neq i}^{n} {[K_{h}^{'} (〈 B, X_{i} - X_{j} 〉)]}^{2} {∥ X_{i} - X_{j} ∥}^{2}}{{(\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i}, X_{l} 〉))}^{2}} + \sum_{i = 1}^{n} \frac{\sum_{j = 1, j \neq i}^{n} K_{h}^{2} (〈 B, X_{i} - X_{j} 〉) {[\sum_{l \neq i}^{n} K_{h}^{'} (〈 B, X_{i} - X_{l} 〉)]}^{2} {∥ X_{i} - X_{j} ∥}^{2}}{{(\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i}, X_{l} 〉))}^{4}} \\ - 2 \sum_{i = 1}^{n} \frac{\sum_{j = 1, j \neq i}^{n} K_{h}^{'} (〈 B, X_{i} - X_{j} 〉) K_{h} (〈 B, X_{i} - X_{j} 〉) {∥ X_{i} - X_{j} ∥}^{2} \sum_{l \neq i} K_{h}^{'} (〈 B, X_{i} - X_{l} 〉)}{{(\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i}, X_{l} 〉))}^{3}} \\ = & O_{p} (h^{- K}) . \end{matrix}

Then, the second term of (A2) satisfies

d i a g (\sqrt{2} s (Σ_{n} A_{n}^{k} Σ_{n})) / \sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n}) = O_{p} (1)

. Under the above results, together with

∥ \hat{B} - B ∥ = O_{p} (r_{n})

(refer to [18]) and

r_{n} / h ⟶ 0

, under

H_{0}

, we have

\begin{matrix} \frac{ε^{⊤} {\hat{W}}_{n}^{s} ε - ε^{⊤} W_{n}^{s} ε}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} = O_{p} (\frac{r_{n}}{h}) = o_{p} (1) . \end{matrix}

Step 3:

\frac{{\hat{ε}}^{⊤} {\hat{W}}_{n}^{s} \hat{ε} - ε^{⊤} {\hat{W}}_{n}^{s} ε}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} \overset{p}{⟶} 0

.

Decompose the quantity into two parts:

\begin{matrix} \frac{{\hat{ε}}^{⊤} {\hat{W}}_{n}^{s} \hat{ε} - ε^{⊤} {\hat{W}}_{n}^{s} ε}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} = & \frac{{(\hat{ε} - ε)}^{⊤} {\hat{W}}_{n}^{s} (\hat{ε} - ε)}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} + \frac{2 {(\hat{ε} - ε)}^{⊤} {\hat{W}}_{n}^{s} ε}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} . \end{matrix}

(A3)

First, we consider the numerator of the first term on the right side of Equation (A3). It is easy to see that

{(\hat{ε} - ε)}^{⊤} {\hat{W}}_{n}^{s} (\hat{ε} - ε) = {(\hat{ε} - ε)}^{⊤} W_{n}^{s} (\hat{ε} - ε) (1 + o_{p} (1))

; then,

\begin{matrix} {(\hat{ε} - ε)}^{⊤} W_{n}^{s} (\hat{ε} - ε) \\ = \frac{1}{2} \sum_{i = 1}^{n} \sum_{j \neq i}^{n} (\frac{K_{h} (〈 B, X_{i} - X_{j} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i} - X_{l} 〉)} + \frac{K_{h} (〈 B, X_{i} - X_{j} 〉)}{\sum_{l \neq j}^{n} K_{h} (〈 B, X_{j} - X_{l} 〉)}) ({\hat{ε}}_{i} - ε_{i}) ({\hat{ε}}_{j} - ε_{j}) \\ = \frac{1}{2} \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{K_{h} (〈 B, X_{i} - X_{j} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i} - X_{l} 〉)} ({\hat{ε}}_{i} - ε_{i}) ({\hat{ε}}_{j} - ε_{j}) + \frac{1}{2} \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{K_{h} (〈 B, X_{i} - X_{j} 〉)}{\sum_{l \neq j}^{n} K_{h} (〈 B, X_{j} - X_{l} 〉)} \cdot ({\hat{ε}}_{i} - ε_{i}) ({\hat{ε}}_{j} - ε_{j}) \\ \equiv \frac{1}{2} D_{31 n} + \frac{1}{2} D_{32 n} . \end{matrix}

Then, since

\hat{ε} - ε = g (〈 \hat{β}, X 〉) - g (〈 β, X 〉)

, the Taylor expansion for

D_{31 n}

with respect to

β

leads to

\begin{matrix} D_{31 n} = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{K_{h} (〈 B, X_{i} - X_{j} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i} - X_{l} 〉)} ({\hat{ε}}_{i} - ε_{i}) ({\hat{ε}}_{j} - ε_{j}) \\ = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} w_{i j} [g^{'} (〈 X_{i}, β 〉) g^{'} (〈 X_{j}, β 〉) 〈 X_{i}, \hat{β} - β 〉 〈 X_{j}, \hat{β} - β 〉 + o (∥ \hat{β} - β ∥)] \\ = & O_{p} (n r_{n}^{2}), \end{matrix}

(A4)

where

∥ \hat{β} - β ∥ = O_{p} (r_{n})

and

K = 1

under

H_{0}

.

Similar to

D_{31 n}

, we can prove

D_{32 n} = O_{P} (n r_{n}^{2})

. Then, since

s (W_{n}^{s}) = O_{p} (h^{- 1 / 2})

(see Step 1),

r_{n} / h \to 0

and

n h^{1 / 2} r_{n}^{2} \to 0

, we have

\begin{matrix} \frac{{(\hat{ε} - ε)}^{⊤} {\hat{W}}_{n}^{s} (\hat{ε} - ε)}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} \overset{p}{⟶} 0 . \end{matrix}

Next, we consider the numerator of the second term on the right side of Equation (A3), that is,

\begin{matrix} {(\hat{ε} - ε)}^{⊤} {\hat{W}}_{n}^{s} ε = & \frac{1}{2} \sum_{i = 1}^{n} \sum_{j \neq i}^{n} (\frac{K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 \hat{B}, X_{i} - X_{l} 〉)} + \frac{K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉)}{\sum_{l \neq j}^{n} K_{h} (〈 \hat{B}, X_{j} - X_{l} 〉)}) ({\hat{ε}}_{i} - ε_{i}) ε_{j} \\ = & \frac{1}{2} \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 \hat{B}, X_{i} - X_{l} 〉)} ({\hat{ε}}_{i} - ε_{i}) ε_{j} + \frac{1}{2} \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{K_{h} (〈 \hat{B}, X_{i} - X_{j} 〉)}{\sum_{l \neq j}^{n} K_{h} (〈 \hat{B}, X_{j} - X_{l} 〉)} ({\hat{ε}}_{i} - ε_{i}) ε_{j} \\ \equiv & \frac{1}{2} D_{33 n} + \frac{1}{2} D_{34 n} . \end{matrix}

(A5)

Similar to

D_{31 n}

, by Taylor expansion of

D_{33 n}

with respect to

B

and

β

, we have

\begin{matrix} D_{33 n} = D_{33 n}^{★} + o_{p} (D_{33 n}^{★}), \end{matrix}

where

\begin{matrix} D_{33 n}^{★} = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} w_{i j} ({\hat{ε}}_{i} - ε_{i}) ε_{j} + \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{1}{h} s_{i j}^{⊤} 〈 X_{i} - X_{j}, \hat{B} - B 〉 ({\hat{ε}}_{i} - ε_{i}) ε_{j} \\ = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} w_{i j} g^{'} (〈 X_{i}, β 〉) ε_{j} 〈 X_{i}, \hat{β} - β 〉 + \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{1}{h} s_{i j}^{⊤} 〈 X_{i} - X_{j}, \hat{B} - B 〉 g^{'} (〈 X_{i}, β 〉) ε_{j} 〈 X_{i}, \hat{β} - β 〉 \\ = & D_{331 n} + D_{332 n} . \end{matrix}

Similar to the Lemma 5 in Ait-Saïdi et al. [1], under

H_{0}

with

K = 1

, we have

\begin{matrix} \sum_{j} w_{i j} ε_{j} = & \frac{C}{\sqrt{n h}} + o_{p} (\frac{1}{\sqrt{n h}}) + o_{p} (\sqrt{\frac{log (n)}{n h}}) + o_{p} (h^{b_{1}}) . \end{matrix}

Since

∥ \hat{ε} - ε ∥ = O_{p} (r_{n})

,

∥ \hat{B} - B ∥ = O_{p} (r_{n})

,

∥ \hat{β} - β ∥ = O_{p} (r_{n})

and

s (W_{n}) = O_{p} (h^{- 1 / 2})

under

H_{0}

, we have

\begin{matrix} \frac{D_{331 n}}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} = & O_{p} (r_{n}) . \end{matrix}

(A6)

Combining the proofs of

D_{31 n}

and

D_{331 n}

, we obtain

\begin{matrix} D_{332 n} = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{1}{h} \frac{K_{h}^{'} (〈 B, X_{i} - X_{j} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i} - X_{l} 〉)} ε_{j} 〈 X_{i} - X_{j}, \hat{B} - B 〉 g^{'} (〈 X_{i}, β 〉) 〈 X_{i}, \hat{β} - β 〉 \\ - \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{1}{h} \frac{\sum_{l \neq i}^{n} K_{h}^{'} (〈 B, X_{i} - X_{l} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i} - X_{l} 〉)} w_{i j} ε_{j} 〈 X_{i} - X_{j}, \hat{B} - B 〉 g^{'} (〈 X_{i}, β 〉) 〈 X_{i}, \hat{β} - β 〉 \\ = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{1}{h} \frac{K_{h}^{'} (〈 B, X_{i} - X_{j} 〉)}{K_{h} (〈 B, X_{i} - X_{j} 〉)} w_{i j} ε_{j} 〈 X_{i} - X_{j}, \hat{B} - B 〉 g^{'} (〈 X_{i}, β 〉) 〈 X_{i}, \hat{β} - β 〉 \\ - \sum_{i = 1}^{n} \sum_{j \neq i}^{n} \frac{1}{h} \frac{\sum_{l \neq i}^{n} K_{h}^{'} (〈 B, X_{i} - X_{l} 〉)}{\sum_{l \neq i}^{n} K_{h} (〈 B, X_{i} - X_{l} 〉)} w_{i j} ε_{j} 〈 X_{i} - X_{j}, \hat{B} - B 〉 g^{'} (〈 X_{i}, β 〉) 〈 X_{i}, \hat{β} - β 〉 \\ = & D_{3321 n} + D_{3322 n} . \end{matrix}

Combining Lemma A1 (i) in [11], and similar to (A6), we have

\begin{matrix} \frac{D_{3321 n}}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} = O_{p} (\frac{r_{n}^{2}}{h^{2}}) = o_{p} (1), \frac{D_{3322 n}}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} = O_{p} (\frac{r_{n}^{2}}{h}) = o_{p} (1) . \end{matrix}

Then,

D_{33 n} = o (1)

. Similar to

D_{33 n}

, it is easy to obtain

D_{34 n} = o_{p} (1)

; then, we have

\begin{matrix} \frac{2 {(\hat{ε} - ε)}^{⊤} {\hat{W}}_{n}^{s} ε}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} \overset{p}{⟶} 0 . \end{matrix}

Step 4:

\frac{s (\hat{U} {\hat{W}}_{n}^{s} \hat{U})}{s (Σ_{n} W_{n}^{s} Σ_{n})} \overset{p}{⟶} 1

.

We prove this in two steps. That is,

\begin{matrix} \frac{\sum_{i j} {(w_{i j}^{s})}^{2} ε_{i}^{2} ε_{j}^{2} - s (Σ_{n} W_{n}^{s} Σ_{n})}{s^{2} (Σ_{n} W_{n}^{s} Σ_{n})} \overset{p}{⟶} 0, \end{matrix}

(A7)

and

\begin{matrix} \frac{\sum_{i j} {({\hat{w}}_{i j}^{s})}^{s} {\hat{ε}}_{i}^{2} {\hat{ε}}_{j}^{2} - \sum_{i j} {(w_{i j}^{s})}^{s} ε_{i}^{2} ε_{j}^{2}}{s^{2} (Σ_{n} W_{n}^{s} Σ_{n})} \overset{p}{⟶} 0, \end{matrix}

(A8)

From Step 3, we find that we only need to prove that the rate of

W_{n}

(or

{\hat{W}}_{n}

) in the formula is the same as the rate of

W_{n}^{s}

(or

{\hat{W}}_{n}^{s}

). Then, referring to Lemma 3 and Lemma A.3 in [23], we can easily obtain the conclusion of (A7). Next, combining Step 2 and Step 3, we can obtain (A8).

Step 5:

\frac{\hat{K}}{s ({\hat{W}}_{n}^{s})} \overset{p}{⟶} 0

.

By Taylor expansion of

W_{n}^{s}

in (A4), and

∥ \hat{B} - B ∥ = O_{p} (r_{n})

, we know that

s ({\hat{W}}_{n}^{s}) = O_{p} (h^{- K / 2})

with

K = 1

under

H_{0}

. Since

P (\hat{K} = 1) \overset{p}{⟶} 1

under

H_{0}

, then

\begin{matrix} \frac{\hat{K}}{s ({\hat{W}}_{n}^{s})} \overset{p}{⟶} 0 . \end{matrix}

Theorem 1 is proved. □

Appendix A.2. Part II

Proof of Theorem 2.

Define

{\tilde{ε}}_{i} = Y_{i} - g (〈 X_{i}, θ 〉)

and

\tilde{ε} = {({\tilde{ε}}_{1}, {\tilde{ε}}_{2}, \dots, {\tilde{ε}}_{n})}^{⊤}

. Then, combining the results in Step 2, we have

\begin{matrix} \frac{{\hat{ε}}^{⊤} {\hat{W}}_{n}^{s} \hat{ε}}{\sqrt{2} s (W_{n}^{s})} & = \frac{{\tilde{ε}}^{⊤} W_{n}^{s} \tilde{ε}}{\sqrt{2} s (W_{n}^{s})} + \frac{{(\hat{ε} - \tilde{ε})}^{⊤} W_{n}^{s} (\hat{ε} - \tilde{ε})}{\sqrt{2} s (W_{n}^{s})} + \frac{2 {(\hat{ε} - \tilde{ε})}^{⊤} W_{n}^{s} \tilde{ε}}{\sqrt{2} s (W_{n}^{s})} + o (1) . \end{matrix}

It is easy to show that the first part dominates the third part and goes to infinity in probability and the second part is bounded in probability. The details are similar to the proof of Theorem 1 and are therefore omitted. □

Appendix A.3. Part III

Proof of Proposition 1.

For the FSIR method, we first define

\begin{matrix} M^{\dot{s}} = & {(Γ + ρ I)}^{- 1} V a r [E (X | Y \in I_{\dot{s}})] \\ = & {(Γ + ρ I)}^{- 1} (\frac{E [X I (Y \in I_{\dot{s}})]}{p_{\dot{s}}} - \frac{E [X I (Y \notin I_{\dot{s}})]}{1 - p_{\dot{s}}}) \otimes (\frac{E [X I (Y \in I_{\dot{s}})]}{p_{\dot{s}}} - \frac{E [X I (Y \notin I_{\dot{s}})]}{1 - p_{\dot{s}}}) p_{\dot{s}} (1 - p_{\dot{s}}) \\ = & {(Γ + ρ I)}^{- 1} (\frac{E [X I (Y \in I_{\dot{s}})] - E (X) E [I (Y \in I_{\dot{s}})]}{p_{\dot{s}} (1 - p_{\dot{s}})}) \otimes (\frac{E [X I (Y \in I_{\dot{s}})] - E (X) E [I (Y \in I_{\dot{s}})]}{p_{\dot{s}} (1 - p_{\dot{s}})}) p_{\dot{s}} (1 - p_{\dot{s}}) \\ = & {(Γ + ρ I)}^{- 1} (E {(X - E (X)) I (Y \in I_{s})}) \otimes (E {(X - E (X)) I (Y \in I_{s})}) / (p_{\dot{s}} (1 - p_{\dot{s}})), \end{matrix}

where

\dot{s}

is the number of slicing space, which is described in detail in Algorithm 1 of Section 3.1. By Lemma 1 in [14],

\exists ρ (\cdot)

, such that

s p a n [E {M_{\dot{s}}}] = s p a n [E {M_{\dot{s}} ρ (\dot{s})}]

, then

M^{\dot{s}}

can also be taken to be

\begin{matrix} M^{\dot{s}} = {(Γ + ρ I)}^{- 1} (E {(X - E (X)) I (Y \in I_{s})}) \otimes (E {(X - E (X)) I (Y \in I_{s})}) . \end{matrix}

the sample estimate of

M^{\dot{s}}

is

\begin{matrix} M_{n}^{\dot{s}} = {(\hat{Γ} + ρ I)}^{- 1} (\frac{1}{n} \sum_{i = 1}^{n} (X_{i} - \bar{X}) I (Y_{i} \in I_{s})) \otimes (\frac{1}{n} \sum_{i = 1}^{n} (X_{i} - \bar{X}) I (Y_{i} \in I_{s})) . \end{matrix}

Referring to Theorem 4 in [34], we find that to prove

P (D (1) > D (k)) \to 1

under

H_{1 n}

for

k > 1

, we only need to prove

\begin{matrix} P (\frac{n \sum_{l = 2}^{k} {log ({\hat{λ}}_{l} + 1) - {\hat{λ}}_{l}}}{2 \sum_{l = 1}^{r} {log ({\hat{λ}}_{l} + 1) - {\hat{λ}}_{l}}} < C_{n} \frac{k (k + 1) - 2}{r}) \to 1 . \end{matrix}

(A9)

The biggest difference between the above formula under

H_{0}

and

H_{1 n}

is that the convergence rate about

\hat{λ}

is different. To obtain the rate of convergence about

\hat{λ}

under

H_{1 n}

, we first have to prove

∥ M_{n}^{\dot{s}} - M^{\dot{s}} ∥ = O_{p} (C_{1 n})

(

C_{1 n}

is defined in model (7)).

Let Y and

Y_{n}

be the responser under

H_{0}

and

H_{1 n}

, respectively. Then,

Y_{n} = Y + C_{1 n} G (〈 B, X 〉)

. Define

I_{\dot{s}} = (a_{\dot{s}}, b_{\dot{s}}]

, if

C_{1 n} = n^{- 1 / 2} h^{- 1 / 4}

, then

\begin{matrix} \frac{1}{n} \sum_{i = 1}^{n} X_{i} I (Y_{i n} \in I_{\dot{s}}) - E [X I (Y \in I_{\dot{s}})] \\ = & \frac{1}{n} \sum_{i = 1}^{n} (X_{i} I (Y_{i n} \in I_{\dot{s}}) - E [X I (Y_{n} \in I_{\dot{s}})]) + E [X I (Y_{n} \in I_{\dot{s}})] - E [X I (Y \in I_{\dot{s}})] \\ = & O_{p} (n^{- 1 / 2}) + E (X [P (Y_{n} \in I_{\dot{s}} | X) - P (Y \in I_{\dot{s}} | X)]) \\ = & E (X [F_{Y | X} (b_{\dot{s}} - C_{1 n} G (〈 B, X 〉)) - F_{Y | X} (b_{\dot{s}}) - F_{Y | X} (a_{\dot{s}} - C_{1 n} G (〈 B, X 〉)) + F_{Y | X} (a_{\dot{s}})]) + O_{p} (n^{- 1 / 2}) \\ = & E (X [C_{1 n} G (〈 B, X 〉) {f_{Y | X} (a_{\dot{s}}) - f_{Y | X} (b_{\dot{s}})} + O_{p} (C_{1 n})]) + O_{p} (n^{- 1 / 2}) \\ = & O_{p} (C_{1 n}) . \end{matrix}

Similarly,

M_{n}^{\dot{s}} - M^{\dot{s}} = O_{p} (C_{1 n})

. Then, by Theorem 3 in [45], we have a

{\hat{λ}}_{i} - λ_{i} = O_{p} (C_{1 n})

. Then,

\sum_{l = 2}^{k} {log ({\hat{λ}}_{l} + 1) - {\hat{λ}}_{l}} = O_{p} (C_{1 n}^{2})

. To obtain the result of (A9), we need to make

n C_{1 n}^{2} / C_{n} \to 0

hold. When

C_{1 n} = n^{- 1 / 2} h^{- 1 / 4}

, since

C_{n} h^{1 / 2} \to \infty

, we have

\begin{matrix} \frac{n C_{1 n}^{2}}{C_{n}} = \frac{1}{C_{n} \sqrt{h}} \to 0 . \end{matrix}

Therefore,

P (D (1) > D (k)) ⟶ 1

under

H_{1 n}

. □

Proof of Theorem 3.

Define

δ_{i} = \hat{ε_{i}} - ε_{i}

,

i = 1, \dots, n

,

δ = {(δ_{1}, \dots, δ_{n})}^{⊤}

. Similar to the proof of Theorem 1, we have,

\begin{matrix} T_{n} & = \frac{ε^{⊤} W_{n}^{s} ε}{\sqrt{2} σ s (W_{n}^{s})} + \frac{2 δ^{⊤} W_{n}^{s} ε}{\sqrt{2} σ s (W_{n}^{s})} + \frac{δ^{⊤} W_{n}^{s} δ}{\sqrt{2} σ s (W_{n}^{s})} + o (1) \\ = D_{41 n} + D_{42 n} + D_{43 n} + o (1) . \end{matrix}

Similar to the proof in Theorem 1, it is easy to obtain

\frac{D_{41 n}}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} \overset{d}{⟶} N (0, 1)

. Similar to (A5), we can prove that

\frac{D_{42 n}}{\sqrt{2} s (Σ_{n} W_{n}^{s} Σ_{n})} = o (1)

.

Lastly, we consider the term

D_{43 n}

. Define

Z = 〈 B, X_{j} 〉

as a random variable and the density function exists (Assumption 9). From step 1 in Theorem 1 and Proposition 1, we know that

\begin{matrix} s^{2} (W_{n}) = & \frac{1}{h} E [f^{- 1} (〈 B, X 〉)] (\int K^{2} (u) d u + o_{p} (1)) . \end{matrix}

Similar to the first part of (A3), we have

\begin{matrix} δ^{⊤} W_{n} δ = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} w_{i j} [g (〈 \hat{β}, X_{i} 〉) - g (〈 β_{0}, X_{i} 〉) + C_{1 n} G (〈 B, X_{i} 〉)] [g (〈 \hat{β}, X_{j} 〉) - g (〈 β_{0}, X_{j} 〉) + C_{1 n} G (〈 B, X_{j} 〉)] \\ = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} w_{i j} [g (〈 \hat{β}, X_{i} 〉) - g (〈 β_{0}, X_{i} 〉)] [g (〈 \hat{β}, X_{j} 〉) - g (〈 β_{0}, X_{j} 〉)] + 2 C_{1 n} \sum_{i = 1}^{n} \sum_{j \neq i}^{n} w_{i j} [g (〈 \hat{β}, X_{i} 〉) - g (〈 β_{0}, X_{i} 〉)] G (〈 B, X_{j} 〉) \\ + C_{1 n}^{2} \sum_{i = 1}^{n} \sum_{j \neq i}^{n} w_{i j} G (〈 B, X_{i} 〉) G (〈 B, X_{j} 〉) \\ = & \sum_{i = 1}^{n} \sum_{j \neq i}^{n} w_{i j} g^{'} (〈 β, X_{i} 〉) g^{'} (〈 β, X_{j} 〉) 〈 \hat{β} - β_{0}, X_{i} 〉 〈 \hat{β} - β_{0}, X_{j} 〉 [1 + o_{p} (1)] + 2 C_{1 n} \sum_{i = 1}^{n} \sum_{j \neq i}^{n} w_{i j} g^{'} (〈 β, X_{i} 〉) 〈 \hat{β} - β_{0}, X_{i} 〉 G (〈 B, X_{j} 〉) \\ + C_{1 n}^{2} \sum_{i = 1}^{n} \sum_{j \neq i}^{n} w_{i j} G (〈 B, X_{i} 〉) G (〈 B, X_{j} 〉) \\ \equiv & Q_{1 n} + C_{1 n} Q_{2 n} + C_{1 n}^{2} Q_{3 n} . \end{matrix}

Similar to Theorem 1, when

C_{1 n} = n^{- 1 / 2} h^{- 1 / 4}

, we have

\begin{matrix} D_{43 n} = & \frac{δ^{⊤} W_{n}^{s} δ}{\sqrt{2} σ s (W_{n}^{s})} \\ = & \frac{Q_{1 n}}{\sqrt{2} σ s (W_{n}^{s})} + \frac{C_{1 n} Q_{2 n}}{\sqrt{2} σ s (W_{n}^{s})} + \frac{C_{1 n}^{2} Q_{3 n}}{\sqrt{2} σ s (W_{n}^{s})} \\ = & O_{p} (n r_{n}^{2} h^{1 / 2}) + O_{p} (C_{1 n} n r_{n} h^{1 / 2}) + \frac{n h^{1 / 2} C_{1 n}^{2} E [G^{2} (〈 B, X 〉)]}{\sqrt{2 σ^{2} E [f^{- 1} (〈 B, X 〉)]}} \\ = & o_{p} (1) + o_{p} (1) + \frac{E [G^{2} (〈 B, X 〉)]}{\sqrt{2 σ^{2} E [f^{- 1} (〈 B, X 〉)]}}, \end{matrix}

which leads to

\begin{matrix} D_{43 n} \overset{p}{⟶} \frac{E [G^{2} (〈 B, X 〉)]}{\sqrt{2 σ^{2} E [f^{- 1} (〈 B, X 〉)]}} \equiv ω . \end{matrix}

This completes the proof of Theorem 3. □

Appendix B

Table A1. Empirical sizes and powers of test statistics under five alternative models in Example 1.

Models	Sample	c	$S_{n}$	$S_{n}^{B}$	$S_{n}^{FSIR}$	$T_{n}$	$T_{n}^{B}$	$T_{n}^{FSIR}$	$CvM$	$KS$
$H_{11}$	$n = 100$	0	0.032	0.047	0.035	0.036	0.052	0.042	0.054	0.059
		2	0.175	0.246	0.390	0.186	0.258	0.424	0.136	0.125
		4	0.699	0.771	0.941	0.749	0.814	0.957	0.327	0.270
		6	0.981	0.990	0.997	0.989	0.995	0.999	0.559	0.471
	$n = 200$	0	0.038	0.045	0.040	0.039	0.051	0.048	0.052	0.047
		2	0.397	0.492	0.740	0.403	0.531	0.740	0.229	0.196
		4	0.981	0.988	0.999	0.989	0.991	0.999	0.582	0.493
		6	1.000	1.000	1.000	1.000	1.000	1.000	0.812	0.723
$H_{12}$	$n = 100$	0	0.032	0.047	0.035	0.036	0.052	0.042	0.054	0.059
		2	0.170	0.217	0.390	0.204	0.250	0.557	0.404	0.299
		4	0.762	0.742	0.590	0.820	0.790	0.650	0.761	0.673
		6	0.978	0.957	0.877	0.983	0.968	0.893	0.893	0.839
	$n = 200$	0	0.038	0.045	0.040	0.039	0.051	0.048	0.052	0.047
		2	0.426	0.490	0.798	0.491	0.569	0.935	0.691	0.556
		4	0.989	0.990	0.931	0.997	0.996	0.951	0.972	0.938
		6	1.000	1.000	0.997	1.000	1.000	0.997	0.993	0.989
$H_{13}$	$n = 100$	0	0.032	0.047	0.035	0.036	0.052	0.042	0.054	0.059
		2	0.090	0.141	0.123	0.115	0.153	0.198	0.257	0.181
		4	0.453	0.494	0.579	0.528	0.550	0.646	0.634	0.522
		6	0.819	0.798	0.844	0.877	0.848	0.879	0.788	0.710
	$n = 200$	0	0.038	0.045	0.040	0.039	0.051	0.048	0.052	0.047
		2	0.187	0.265	0.263	0.270	0.345	0.434	0.503	0.368
		4	0.848	0.877	0.917	0.928	0.932	0.944	0.883	0.818
		6	0.992	0.993	0.995	0.999	0.999	1.000	0.954	0.926
$H_{14}$	$n = 100$	0	0.032	0.047	0.035	0.036	0.052	0.042	0.054	0.059
		2	0.117	0.166	0.151	0.146	0.201	0.233	0.195	0.158
		4	0.523	0.547	0.758	0.613	0.624	0.799	0.324	0.279
		6	0.853	0.843	0.950	0.872	0.883	0.954	0.371	0.326
	$n = 200$	0	0.038	0.045	0.040	0.039	0.051	0.048	0.052	0.047
		2	0.291	0.375	0.352	0.399	0.486	0.521	0.285	0.235
		4	0.835	0.848	0.977	0.876	0.876	0.985	0.460	0.414
		6	1.000	1.000	1.000	1.000	1.000	0.999	0.509	0.467
$H_{15}$	$n = 100$	0	0.032	0.047	0.035	0.036	0.052	0.042	0.054	0.059
		2	0.244	0.314	0.348	0.283	0.350	0.444	0.349	0.278
		4	0.826	0.836	0.937	0.857	0.858	0.956	0.541	0.496
		6	0.881	0.876	0.996	0.883	0.867	0.995	0.570	0.520
	$n = 200$	0	0.038	0.045	0.040	0.039	0.051	0.048	0.052	0.047
		2	0.581	0.661	0.704	0.679	0.745	0.832	0.517	0.454
		4	0.896	0.898	1.000	0.898	0.899	0.999	0.717	0.671
		6	1.000	1.000	1.000	1.000	1.000	1.000	0.726	0.687
$H_{16}$	$n = 100$	0	0.032	0.047	0.035	0.036	0.052	0.042	0.054	0.059
		2	0.242	0.315	0.343	0.281	0.350	0.479	0.344	0.296
		4	0.730	0.724	0.946	0.755	0.743	0.958	0.530	0.469
		6	0.890	0.880	0.995	0.890	0.882	0.996	0.579	0.528
	$n = 200$	0	0.038	0.045	0.040	0.039	0.051	0.048	0.052	0.047
		2	0.582	0.661	0.737	0.676	0.734	0.857	0.529	0.466
		4	0.898	0.897	1.000	0.901	0.902	1.000	0.717	0.670
		6	1.000	1.000	1.000	1.000	1.000	1.000	0.740	0.703

Table A2. Empirical sizes and powers of test statistics under four DGM in Example 2.

DGM	Sample	c	$S_{n}$	$S_{n}^{B}$	$S_{n}^{FSIR}$	$T_{n}$	$T_{n}^{B}$	$T_{n}^{FSIR}$	$CvM$	$KS$
$D G M_{1}$	$n = 100$	0	0.037	0.045	0.043	0.039	0.044	0.046	0.053	0.049
		2	0.168	0.242	0.258	0.185	0.254	0.370	0.262	0.192
		4	0.667	0.703	0.907	0.716	0.743	0.947	0.417	0.369
		6	0.848	0.843	0.990	0.865	0.851	0.994	0.544	0.486
	$n = 200$	0	0.041	0.054	0.038	0.047	0.041	0.053	0.057	0.059
		2	0.401	0.490	0.560	0.442	0.562	0.750	0.396	0.319
		4	0.885	0.884	1.000	0.896	0.895	1.000	0.649	0.601
		6	1.000	1.000	1.000	1.000	1.000	1.000	0.698	0.646
$D G M_{2}$	$n = 100$	0	0.039	0.049	0.052	0.038	0.048	0.056	0.050	0.054
		2	0.238	0.298	0.359	0.262	0.310	0.491	0.342	0.266
		4	0.806	0.814	0.966	0.838	0.831	0.985	0.563	0.485
		6	0.875	0.870	0.999	0.884	0.874	0.999	0.666	0.586
	$n = 200$	0	0.033	0.061	0.057	0.039	0.046	0.052	0.062	0.049
		2	0.534	0.614	0.730	0.609	0.692	0.867	0.538	0.443
		4	0.897	0.899	1.000	0.897	0.898	1.000	0.725	0.661
		6	1.000	1.000	1.000	1.000	1.000	1.000	0.805	0.773
$D G M_{3}$	$n = 100$	0	0.030	0.058	0.044	0.040	0.063	0.052	0.061	0.066
		2	0.059	0.097	0.060	0.059	0.100	0.058	0.113	0.100
		4	0.175	0.251	0.284	0.185	0.252	0.392	0.211	0.177
		6	0.430	0.506	0.794	0.479	0.542	0.882	0.337	0.274
	$n = 200$	0	0.036	0.047	0.045	0.038	0.046	0.048	0.054	0.053
		2	0.100	0.165	0.149	0.118	0.192	0.166	0.186	0.165
		4	0.468	0.558	0.683	0.530	0.620	0.818	0.352	0.292
		6	0.778	0.786	0.992	0.812	0.833	0.997	0.501	0.424
$D G M_{4}$	$n = 100$	0	0.038	0.054	0.042	0.039	0.053	0.055	0.053	0.061
		2	0.225	0.302	0.266	0.242	0.319	0.245	0.315	0.284
		4	0.781	0.785	0.953	0.819	0.810	0.975	0.564	0.506
		6	0.883	0.872	0.997	0.885	0.875	0.998	0.646	0.591
	$n = 200$	0	0.039	0.042	0.041	0.043	0.045	0.051	0.061	0.071
		2	0.527	0.623	0.586	0.619	0.684	0.620	0.513	0.456
		4	0.996	0.997	1.000	0.997	0.997	1.000	0.747	0.686
		6	1.000	1.000	1.000	1.000	1.000	1.000	0.816	0.776

Table A3. The p-value of different test statistics in real data analysis.

Statistic	$S_{n}$	$S_{n}^{B}$	$S_{n}^{FSIR}$	$T_{n}$	$T_{n}^{B}$	$T_{n}^{FSIR}$	$CvM$	$KS$
p-value	0.000	0.009	0.000	0.000	0.013	0.000	0.528	0.374

Figure A1. Empirical sizes and powers of

T_{n}

,

T_{n}^{B}

and

T_{n}^{FSIR}

against the value of c at

5 %

significance level in Examples 3 and 4.

Figure A1. Empirical sizes and powers of

T_{n}

,

T_{n}^{B}

and

T_{n}^{FSIR}

against the value of c at

5 %

significance level in Examples 3 and 4.

Figure A2. COVID-19 data in real data analysis.

References

Ait-Saïdi, A.; Ferraty, F.; Kassa, R.; Vieu, P. Cross-validated estimations in the single-functional index model. Statistics 2008, 42, 475–494. [Google Scholar] [CrossRef]
Ma, S. Estimation and inference in functional single-index models. Ann. Inst. Stat. Math. 2016, 68, 181–208. [Google Scholar] [CrossRef]
Wang, G.; Feng, X.N.; Chen, M. Functional partial linear single-index model. Scand. J. Stat. 2016, 43, 261–274. [Google Scholar] [CrossRef]
Jiang, F.; Baek, S.; Cao, J.; Ma, Y. A Functional single-index model. Stat. Sin. 2020, 30, 303–324. [Google Scholar] [CrossRef]
Jäntschi, L. A test detecting the outliers for continuous distributions based on the cumulative distribution function of the data being tested. Symmetry 2019, 11, 835. [Google Scholar] [CrossRef] [Green Version]
Cardot, H.; Ferraty, F.; Mas, A.; Sarda, P. Testing hypotheses in the functional linear model. Scand. J. Stat. 2003, 30, 241–255. [Google Scholar] [CrossRef]
Kokoszka, P.; Maslova, I.; Sojka, J.; Zhu, L. Testing for lack of dependence in the functional linear model. Can. J. Stat. 2008, 36, 207–222. [Google Scholar] [CrossRef]
Hilgert, N.; Mas, A.; Verzelen, N. Minimax adaptive tests for the functional linear model. Ann. Stat. 2013, 41, 838–869. [Google Scholar] [CrossRef]
Patilea, V.; Sánchez-Sellero, C.; Saumard, M. Projection-based nonparametric goodness-of-fit testing with functional covariates. arXiv 2012, arXiv:1205.5578. [Google Scholar]
Patilea, V.; Sánchez-Sellero, C.; Saumard, M. Testing the predictor effect on a functional response. J. Am. Stat. Assoc. 2016, 111, 1684–1695. [Google Scholar] [CrossRef]
Shi, G.; Du, J.; Sun, Z.; Zhang, Z. Checking the adequacy of functional linear quantile regression model. J. Stat. Plan. Inference 2021, 210, 64–75. [Google Scholar] [CrossRef]
García-Portugués, E.; González-Manteiga, W.; Febrero-Bande, M. A goodness-of-fit test for the functional linear model with scalar response. J. Comput. Graph. Stat. 2014, 23, 761–778. [Google Scholar] [CrossRef] [Green Version]
Cuesta-Albertos, J.A.; García-Portugués, E.; Febrero-Bande, M.; González-Manteiga, W. Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes. Ann. Stat. 2019, 47, 439–467. [Google Scholar] [CrossRef] [Green Version]
Guo, X.; Wang, T.; Zhu, L. Model checking for parametric single-index models: A dimension reduction model-adaptive approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 2016, 78, 1013–1035. [Google Scholar] [CrossRef]
Stute, W.; Thies, S.; Zhu, L.X. Model checks for regression: An innovation process approach. Ann. Stat. 1998, 26, 1916–1934. [Google Scholar] [CrossRef]
Hardle, W.; Mammen, E. Comparing nonparametric versus parametric regression fits. Ann. Stat. 1993, 21, 1926–1947. [Google Scholar] [CrossRef]
Patilea, V.; Sánchez-Sellero, C. Testing for lack-of-fit in functional regression models against general alternatives. J. Stat. Plan. Inference 2020, 209, 229–251. [Google Scholar] [CrossRef]
Lian, H. Functional sufficient dimension reduction: Convergence rates and multiple functional case. J. Stat. Plan. Inference 2015, 167, 58–68. [Google Scholar] [CrossRef]
Ferré, L.; Yao, A.F. Functional sliced inverse regression analysis. Statistics 2003, 37, 475–488. [Google Scholar] [CrossRef]
Ferré, L.; Yao, A.F. Smoothed functional inverse regression. Stat. Sin. 2005, 15, 665–683. [Google Scholar]
Wang, G.; Lin, N.; Zhang, B. Functional k-means inverse regression. Comput. Stat. Data Anal. 2014, 70, 172–182. [Google Scholar] [CrossRef]
Lian, H.; Li, G. Series expansion for functional sufficient dimension reduction. J. Multivar. Anal. 2014, 124, 150–165. [Google Scholar] [CrossRef]
Ellison, G.; Ellison, S.F. A simple framework for nonparametric specification testing. J. Econom. 2000, 96, 1–23. [Google Scholar] [CrossRef] [Green Version]
Stute, W.; Zhu, L.X. Model checks for generalized linear models. Scand. J. Stat. 2002, 29, 535–545. [Google Scholar] [CrossRef]
Hall, P.; Horowitz, J.L. Methodology and convergence rates for functional linear regression. Ann. Stat. 2007, 35, 70–91. [Google Scholar] [CrossRef] [Green Version]
Yuan, M.; Cai, T.T. A reproducing kernel Hilbert space approach to functional linear regression. Ann. Stat. 2010, 38, 3412–3444. [Google Scholar] [CrossRef]
Crambes, C.; Kneip, A.; Sarda, P. Smoothing splines estimators for functional linear regression. Ann. Stat. 2009, 37, 35–72. [Google Scholar] [CrossRef]
Zhao, Y.; Ogden, R.T.; Reiss, P.T. Wavelet-based LASSO in functional linear regression. J. Comput. Graph. Stat. 2012, 21, 600–617. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Li, B. Sufficient Dimension Reduction: Methods and Applications with R; Chapman and Hall/CRC: Boca Raton, FL, USA, 2018. [Google Scholar]
Xu, J.; Cheng, H.; Cui, W.; Li, Y. Sufficient dimension reduction via distance covariance for functional and longitudinal data. arXiv 2022, arXiv:2202.13579. [Google Scholar]
Li, K.C. Sliced inverse regression for dimension reduction. J. Am. Stat. Assoc. 1991, 86, 316–327. [Google Scholar] [CrossRef]
Lin, Q.; Zhao, Z.; Liu, J.S. On consistency and sparsity for sliced inverse regression in high dimensions. Ann. Stat. 2018, 46, 580–610. [Google Scholar] [CrossRef] [Green Version]
Zhu, L.; Miao, B.; Peng, H. On sliced inverse regression with high-dimensional covariates. J. Am. Stat. Assoc. 2006, 101, 630–643. [Google Scholar] [CrossRef]
Zhu, L.; Wang, T.; Zhu, L.; Ferré, L. Sufficient dimension reduction through discretization-expectation estimation. Biometrika 2010, 97, 295–304. [Google Scholar] [CrossRef]
Zhu, L.P.; Zhu, L.X.; Feng, Z.H. Dimension reduction in regressions through cumulative slicing estimation. J. Am. Stat. Assoc. 2010, 105, 1455–1466. [Google Scholar] [CrossRef]
Sheng, W.; Yin, X. Sufficient dimension reduction via distance covariance. J. Comput. Graph. Stat. 2016, 25, 91–104. [Google Scholar] [CrossRef]
Zhang, J.; Chen, X. Robust sufficient dimension reduction via ball covariance. Comput. Stat. Data Anal. 2019, 140, 144–154. [Google Scholar] [CrossRef]
Du, P.; Wang, X. Penalized likelihood functional regression. Stat. Sin. 2014, 24, 1017–1041. [Google Scholar] [CrossRef] [Green Version]
Cardot, H.; Sarda, P. Estimation in generalized linear models for functional data via penalized likelihood. J. Multivar. Anal. 2005, 92, 24–41. [Google Scholar] [CrossRef] [Green Version]
Zheng, J.X. A consistent nonparametric test of parametric regression models under conditional quantile restrictions. Econom. Theory 1998, 14, 123–138. [Google Scholar] [CrossRef]
Niu, C.; Zhu, L. An adaptive-to-model test for parametric single-index models with missing responses. Electron. J. Stat. 2017, 11, 1491–1526. [Google Scholar] [CrossRef]
Zhu, X.; Guo, X.; Zhu, L. An adaptive-to-model test for partially parametric single-index models. Stat. Comput. 2017, 27, 1193–1204. [Google Scholar] [CrossRef] [Green Version]
Shi, E.; Liu, Y.; Sun, K.; Li, L.; Kong, L. An adaptive model checking test for functional linear model. arXiv 2022, arXiv:2204.01831. [Google Scholar]
Mikosch, T. Functional limit theorems for random quadratic forms. Stoch. Process. Their Appl. 1991, 37, 81–98. [Google Scholar] [CrossRef] [Green Version]
Li, B.; Wen, S.; Zhu, L. On a projective resampling method for dimension reduction with multivariate responses. J. Am. Stat. Assoc. 2008, 103, 1177–1186. [Google Scholar] [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xia, L.; Lai, T.; Zhang, Z. An Adaptive-to-Model Test for Parametric Functional Single-Index Model. Mathematics 2023, 11, 1812. https://doi.org/10.3390/math11081812

AMA Style

Xia L, Lai T, Zhang Z. An Adaptive-to-Model Test for Parametric Functional Single-Index Model. Mathematics. 2023; 11(8):1812. https://doi.org/10.3390/math11081812

Chicago/Turabian Style

Xia, Lili, Tingyu Lai, and Zhongzhan Zhang. 2023. "An Adaptive-to-Model Test for Parametric Functional Single-Index Model" Mathematics 11, no. 8: 1812. https://doi.org/10.3390/math11081812

APA Style

Xia, L., Lai, T., & Zhang, Z. (2023). An Adaptive-to-Model Test for Parametric Functional Single-Index Model. Mathematics, 11(8), 1812. https://doi.org/10.3390/math11081812

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

An Adaptive-to-Model Test for Parametric Functional Single-Index Model

Abstract

1. Introduction

2. The Test Statistic

3. Identification and Estimation of $B$ and $K$

3.1. A Brief Review on FSIR

3.2. Estimating the Structural Dimension $\hat{K}$

4. Asymptotic Properties

5. Simulations Results and Real Data Analysis

5.1. Study 1: Linear Link Function

5.2. Study 2: Non-Linear Link Function

5.3. Analysis of the COVID-19 Data

6. Conclusions and Discussion

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A

Appendix A.1. Part I

Appendix A.2. Part II

Appendix A.3. Part III

Appendix B

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

An Adaptive-to-Model Test for Parametric Functional Single-Index Model

Abstract

1. Introduction

2. The Test Statistic

3. Identification and Estimation of B and K

3.1. A Brief Review on FSIR

3.2. Estimating the Structural Dimension K ^

4. Asymptotic Properties

5. Simulations Results and Real Data Analysis

5.1. Study 1: Linear Link Function

5.2. Study 2: Non-Linear Link Function

5.3. Analysis of the COVID-19 Data

6. Conclusions and Discussion

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A

Appendix A.1. Part I

Appendix A.2. Part II

Appendix A.3. Part III

Appendix B

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3. Identification and Estimation of $B$ and $K$

3.2. Estimating the Structural Dimension $\hat{K}$