Article

GMM Estimation of a Partially Linear Additive Spatial Error Model

1 College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117, China
2 College of Mathematics and Statistics, Chongqing Technology and Business University, Chongqing 400067, China
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(6), 622; https://doi.org/10.3390/math9060622
Submission received: 19 February 2021 / Revised: 6 March 2021 / Accepted: 9 March 2021 / Published: 15 March 2021
(This article belongs to the Special Issue Recent Advances of Computational Statistics in Industry and Business)

Abstract

This article presents a partially linear additive spatial error model (PLASEM) specification and its corresponding generalized method of moments (GMM) estimation. It also derives the consistency and asymptotic normality of the estimators for the case with a single nonparametric term and for an arbitrary number of nonparametric additive terms under regularity conditions. In addition, the finite sample performance of our estimates is assessed by Monte Carlo simulations. Lastly, the proposed method is illustrated by analyzing the Boston housing data.

1. Introduction

Linear regression is one of the most classical and widely used techniques in statistics. Linear regression models usually assume that the mean response is linear in the covariates; however, this assumption does not always hold in practice (Fan and Gijbels [1]). Nonparametric models were proposed to handle the nonlinear relationships encountered in practice more flexibly. The partially linear additive regression model (PLARM) can be used to study the linear and nonlinear relationships between a response and its covariates simultaneously. Opsomer and Ruppert [2] first proposed the PLARM and presented consistent backfitting estimators for the parametric part. Manzan and Zerom [3] introduced a kernel estimator of the finite-dimensional parameter in the PLARM and derived the consistency and asymptotic normality of the estimators under regularity conditions. Zhou et al. [4] considered variable selection for the PLARM with measurement error. Wei et al. [5] investigated the empirical likelihood method for the PLARM. Hoshino [6] proposed a two-step estimation approach for the partially linear additive quantile regression model. Lou et al. [7] introduced the sparse PLARM and discussed the model selection problem through convex relaxation. Liu et al. [8] presented the spline-backfitted kernel estimator of the PLARM and studied the statistical inference properties of the estimator. Manghi et al. [9] also discussed statistical inference for the generalized PLARM.
The linear spatial autoregressive model, which extends the ordinary linear regression model by including a spatially lagged term of the response variable, has been one of the most important statistical tools for modeling spatial dependence among spatial units (Li and Mei [10]). Theories and estimation methods based on linear spatial autoregressive models have attracted widespread attention and have been studied extensively (Anselin [11], Kelejian and Prucha [12], Elhorst [13], among others). In practice, however, highly nonlinear relationships between response variables and regressors are common, especially for spatial data. Su and Jin [14] investigated profile quasi-maximum likelihood estimation of a partially linear spatial autoregressive model. Su [15] proposed generalized method of moments (GMM) estimation of a nonparametric spatial autoregressive (SAR) model, allowing for heteroscedasticity and spatial dependence in the error terms. Li and Mei [10] studied statistical inference for the profile quasi-maximum likelihood estimator of the partially linear spatial autoregressive model of Su and Jin [14]. However, these models face the “curse of dimensionality”: their nonparametric estimation precision decreases rapidly as the dimension of the explanatory variables increases. To overcome this drawback, several semiparametric SAR models with dimension-reduction structures, including partially linear single-index, varying coefficient, and additive SAR models, have been developed in recent years. Sun [16] studied GMM estimators of single-index SAR models and their asymptotic distributions. Cheng et al. [17] extended the single-index SAR model to the partially linear single-index SAR model and explored GMM estimation of the model and the asymptotic properties of the estimators. Wei and Sun [18] proposed GMM estimation of the varying coefficient SAR model and obtained the asymptotic properties of the estimators. Dai et al. [19] considered a quantile regression approach for the partially linear varying coefficient SAR model and established the asymptotic properties of the estimators and test statistics. Du et al. [20] presented a partially linear additive SAR model, constructed GMM estimation of the model, and proved the asymptotic properties of the estimators. To the best of our knowledge, research on the statistical inference of the partially linear additive spatial error model (PLASEM) is still lacking. In this paper, we study GMM estimation of this model and the statistical properties of the estimators.
The remainder of this paper is organized as follows. In Section 2, we present the PLASEM and its corresponding estimation method. In Section 3, we derive the large sample properties of the estimators under regularity assumptions. Monte Carlo simulations are conducted in Section 4 to assess the finite sample performance of the estimates. The developed method is illustrated with the Boston housing price data in Section 5. Conclusions are summarized in Section 6. Detailed proofs are given in Appendix A.

2. Model and Estimation

2.1. Model Specification

Consider the following PLASEM:
$$y_{n,i} = \beta^T x_{n,i} + \sum_{j=1}^{d} m_j(z_{n,ij}) + \eta_{n,i}, \qquad \eta_{n,i} = \lambda \sum_{k=1}^{n} w_{n,ik}\,\eta_{n,k} + \varepsilon_{n,i}, \qquad i = 1, 2, \ldots, n, \tag{1}$$
where $y_{n,i}$ is the $i$th observation of the response variable $y_n$, $x_{n,i}$ is the $i$th observation of the $p$-dimensional exogenous regressor $x_n$, $z_{n,ij}$ is the $j$th component of the exogenous regressor $z_{n,i}$, $\beta$ is a $p$-dimensional linear regression coefficient vector, $\lambda$ is a spatial autocorrelation coefficient, $m_j(\cdot)$ $(j = 1, 2, \ldots, d)$ are unknown smooth functions, $\eta_{n,i}$ is the regression disturbance, $w_{n,ij}$ is a non-stochastic spatial weight, and the innovations $\varepsilon_{n,i}$ are independent with $E(\varepsilon_{n,i}) = 0$ and $E(\varepsilon_{n,i}^2) = \sigma^2$. Furthermore, we assume $E\,m_j(z_{n,ij}) = 0$ $(j = 1, 2, \ldots, d)$ for identification purposes. Denote $Y_n = (y_{n,1}, y_{n,2}, \ldots, y_{n,n})^T$, and similarly for $X_n$, $\eta_n$ and $\varepsilon_n$. Write the spatial weight matrix and the vectors of additive functions as $W_n = (w_{n,ij})_{1 \le i, j \le n}$ and $m_j = \big(m_j(z_{n,1j}), m_j(z_{n,2j}), \ldots, m_j(z_{n,nj})\big)^T$, respectively. In matrix notation, Model (1) can be rewritten as
$$Y_n = X_n\beta + m_1 + m_2 + \cdots + m_d + \eta_n, \qquad \eta_n = \lambda W_n \eta_n + \varepsilon_n. \tag{2}$$
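Since $I_n - \lambda W_n$ is nonsingular under Assumption 1 below, the error equation can be solved explicitly; this reduced form is used repeatedly in the estimation steps and in the proofs:
$$\eta_n = (I_n - \lambda W_n)^{-1}\varepsilon_n, \qquad Y_n = X_n\beta + \sum_{j=1}^{d} m_j + (I_n - \lambda W_n)^{-1}\varepsilon_n.$$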

2.2. Estimation Procedures

We first study the simple case $d = 1$, in which Model (2) reduces to
$$Y_n = X_n\beta + m + \eta_n, \qquad \eta_n = \lambda W_n\eta_n + \varepsilon_n, \tag{3}$$
where $m = \big(m(z_{n,1}), \ldots, m(z_{n,n})\big)^T$. The estimation procedure consists of the following steps (a numerical sketch is given after the list):
  • Step 1. Initial estimation of the unknown function $m(\cdot)$. Following the idea of “working independence” in Lin and Carroll [21], the correlation structure of $\eta_n$ can be ignored when estimating $m(\cdot)$, and a local linear method is used to fit $m(\cdot)$. Let $s_z$ denote the equivalent kernel for the local linear regression at $z$; it can be written as $s_z = e_1^T\big(Z^TKZ\big)^{-1}Z^TK$, where $e_1 = (1, 0)^T$,
    $$Z = \begin{pmatrix} 1 & \cdots & 1 \\ \frac{z_{n,1}-z}{h} & \cdots & \frac{z_{n,n}-z}{h} \end{pmatrix}^T, \qquad K = \mathrm{diag}\big(k_h(z_{n,1}-z), \ldots, k_h(z_{n,n}-z)\big),$$
    with $k_h(\cdot) = \frac{1}{h}k(\cdot/h)$ for a kernel function $k(\cdot)$ and bandwidth $h$. Thus, we obtain the initial estimator of $m(\cdot)$:
    $$\tilde m(z) = s_z\big(Y_n - X_n\beta\big).$$
    Let $S = \big(s_{z_{n,1}}^T, \ldots, s_{z_{n,n}}^T\big)^T$ be the smoother matrix whose rows are the equivalent kernels; then
    $$\tilde m = S\big(Y_n - X_n\beta\big). \tag{4}$$
  • Step 2. Initial GMM estimation of $\beta$. Let $H_n = (h_{n,1}, \ldots, h_{n,n})^T$ be an $n \times r$ $(r \ge p+1)$ matrix of instrumental variables, where $h_{n,i} = (h_{n,i1}, h_{n,i2}, \ldots, h_{n,ir})^T$, so that the moment conditions $E(H_n^T\eta_n) = 0$ hold. Replacing $m$ in Model (3) by $\tilde m$ gives the corresponding moment functions
    $$l_n(\beta) = H_n^T\big(Y_n - X_n\beta - \tilde m\big) = H_n^T\big(\tilde Y_n - \tilde X_n\beta\big),$$
    where $\tilde Y_n = (I_n - S)Y_n$ and $\tilde X_n = (I_n - S)X_n$. Let $A_r$ be an $r \times r$ positive definite constant matrix; the initial estimator of $\beta$ is chosen by minimizing the objective function
    $$Q_n(\beta) = l_n^T(\beta)A_r l_n(\beta) = \big(\tilde Y_n - \tilde X_n\beta\big)^T H_n A_r H_n^T \big(\tilde Y_n - \tilde X_n\beta\big).$$
    It is easy to see that
    $$\tilde\beta = \arg\min_{\beta} Q_n(\beta) = \big(\tilde X_n^T H_n A_r H_n^T \tilde X_n\big)^{-1}\tilde X_n^T H_n A_r H_n^T \tilde Y_n.$$
  • Step 3. Estimation of the parameters $\lambda$ and $\sigma^2$. For notational convenience, let $\bar\eta_n = W_n\eta_n$, $\bar{\bar\eta}_n = W_n^2\eta_n$ and $\bar\varepsilon_n = W_n\varepsilon_n$; then $\eta_n = \lambda\bar\eta_n + \varepsilon_n$ and $\bar\eta_n = \lambda\bar{\bar\eta}_n + \bar\varepsilon_n$, where $\bar\eta_{n,i}$, $\bar{\bar\eta}_{n,i}$ and $\bar\varepsilon_{n,i}$ denote the $i$th elements of $\bar\eta_n$, $\bar{\bar\eta}_n$ and $\bar\varepsilon_n$, respectively. Therefore, we get the following equations:
    $$\eta_{n,i} - \lambda\bar\eta_{n,i} = \varepsilon_{n,i}, \tag{5}$$
    $$\bar\eta_{n,i} - \lambda\bar{\bar\eta}_{n,i} = \bar\varepsilon_{n,i}. \tag{6}$$
    By squaring (5) and summing, squaring (6) and summing, and multiplying (5) by (6) and summing, we obtain the following equations:
    $$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\eta_{n,i}^2 &= \frac{2\lambda}{n}\sum_{i=1}^{n}\eta_{n,i}\bar\eta_{n,i} - \frac{\lambda^2}{n}\sum_{i=1}^{n}\bar\eta_{n,i}^2 + \frac{1}{n}\sum_{i=1}^{n}\varepsilon_{n,i}^2, \\ \frac{1}{n}\sum_{i=1}^{n}\bar\eta_{n,i}^2 &= \frac{2\lambda}{n}\sum_{i=1}^{n}\bar\eta_{n,i}\bar{\bar\eta}_{n,i} - \frac{\lambda^2}{n}\sum_{i=1}^{n}\bar{\bar\eta}_{n,i}^2 + \frac{1}{n}\sum_{i=1}^{n}\bar\varepsilon_{n,i}^2, \\ \frac{1}{n}\sum_{i=1}^{n}\eta_{n,i}\bar\eta_{n,i} &= \frac{\lambda}{n}\sum_{i=1}^{n}\big(\eta_{n,i}\bar{\bar\eta}_{n,i} + \bar\eta_{n,i}^2\big) - \frac{\lambda^2}{n}\sum_{i=1}^{n}\bar\eta_{n,i}\bar{\bar\eta}_{n,i} + \frac{1}{n}\sum_{i=1}^{n}\varepsilon_{n,i}\bar\varepsilon_{n,i}. \end{aligned} \tag{7}$$
    Assumption 2 implies $E\big(\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{n,i}^2\big) = \sigma^2$; moreover, $E\big(\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{n,i}\bar\varepsilon_{n,i}\big) = 0$ and $E\big(\frac{1}{n}\sum_{i=1}^{n}\bar\varepsilon_{n,i}^2\big) = \frac{\sigma^2}{n}\mathrm{tr}(W_n^TW_n)$, where $\mathrm{tr}(\cdot)$ denotes the trace operator. Let $\theta = (\lambda, \lambda^2, \sigma^2)^T$. Taking expectations in Equation (7), we have
    $$\Gamma_n\theta = \gamma_n, \tag{8}$$
    where
    $$\Gamma_n = \frac{1}{n}\begin{pmatrix} 2E(\eta_n^T\bar\eta_n) & -E(\bar\eta_n^T\bar\eta_n) & n \\ 2E(\bar\eta_n^T\bar{\bar\eta}_n) & -E(\bar{\bar\eta}_n^T\bar{\bar\eta}_n) & \mathrm{tr}(W_n^TW_n) \\ E(\eta_n^T\bar{\bar\eta}_n + \bar\eta_n^T\bar\eta_n) & -E(\bar\eta_n^T\bar{\bar\eta}_n) & 0 \end{pmatrix}, \qquad \gamma_n = \frac{1}{n}\begin{pmatrix} E(\eta_n^T\eta_n) \\ E(\bar\eta_n^T\bar\eta_n) \\ E(\eta_n^T\bar\eta_n) \end{pmatrix}.$$
    If $\Gamma_n$ and $\gamma_n$ were known, Equation (8) would determine the estimator of $\theta$ as
    $$\hat\theta = \Gamma_n^{-1}\gamma_n. \tag{9}$$
    In general, $\Gamma_n$ and $\gamma_n$ are unknown, so we estimate $\theta$ by the two-stage procedure of Kelejian and Prucha [12]. Let $\bar{\tilde\eta}_n = W_n\tilde\eta_n$ and $\bar{\bar{\tilde\eta}}_n = W_n^2\tilde\eta_n$, where $\tilde\eta_n = \tilde Y_n - \tilde X_n\tilde\beta$ is obtained using $\tilde\beta$ from the second step, and denote by $\tilde\eta_{n,i}$, $\bar{\tilde\eta}_{n,i}$ and $\bar{\bar{\tilde\eta}}_{n,i}$ the $i$th elements of $\tilde\eta_n$, $\bar{\tilde\eta}_n$ and $\bar{\bar{\tilde\eta}}_n$, respectively. The estimators of $\Gamma_n$ and $\gamma_n$ are then
    $$G_n = \frac{1}{n}\begin{pmatrix} 2\sum_{i=1}^{n}\tilde\eta_{n,i}\bar{\tilde\eta}_{n,i} & -\sum_{i=1}^{n}\bar{\tilde\eta}_{n,i}^2 & n \\ 2\sum_{i=1}^{n}\bar{\tilde\eta}_{n,i}\bar{\bar{\tilde\eta}}_{n,i} & -\sum_{i=1}^{n}\bar{\bar{\tilde\eta}}_{n,i}^2 & \mathrm{tr}(W_n^TW_n) \\ \sum_{i=1}^{n}\big(\tilde\eta_{n,i}\bar{\bar{\tilde\eta}}_{n,i} + \bar{\tilde\eta}_{n,i}^2\big) & -\sum_{i=1}^{n}\bar{\tilde\eta}_{n,i}\bar{\bar{\tilde\eta}}_{n,i} & 0 \end{pmatrix}, \qquad g_n = \frac{1}{n}\begin{pmatrix} \sum_{i=1}^{n}\tilde\eta_{n,i}^2 \\ \sum_{i=1}^{n}\bar{\tilde\eta}_{n,i}^2 \\ \sum_{i=1}^{n}\tilde\eta_{n,i}\bar{\tilde\eta}_{n,i} \end{pmatrix}.$$
    Then, the empirical form of Formula (8) is
    $$g_n = G_n\theta + v_n, \tag{10}$$
    where $v_n$ can be viewed as a vector of regression residuals. It follows from Formula (9) that the empirical estimator of $\theta$ can be given by
    $$\hat\theta = G_n^{-1}g_n.$$
    Based on Formula (10), the nonlinear least squares estimator of Kelejian and Prucha [12] can also be defined as
    $$\tilde\theta = \arg\min_{\theta}\big(g_n - G_n\theta\big)^T\big(g_n - G_n\theta\big) = \big(G_n^TG_n\big)^{-1}G_n^Tg_n.$$
    Kelejian and Prucha [12] proved that both $\hat\theta$ and $\tilde\theta$ are consistent estimators of $\theta$, so either may be used; in the following we work with $\tilde\theta$.
  • Step 4. Final estimation of $\beta$ and $m$. Applying a Cochrane–Orcutt type transformation to Model (3) yields
    $$Y_n^{*} = X_n^{*}\beta + \varepsilon_n,$$
    where $Y_n^{*} = \tilde Y_n - \lambda W_n\tilde Y_n$ and $X_n^{*} = \tilde X_n - \lambda W_n\tilde X_n$. Therefore, we get the final estimator of $\beta$ as follows:
    $$\hat\beta = \big(X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda)\big)^{-1}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TY_n^{*}(\tilde\lambda),$$
    where $X_n^{*}(\tilde\lambda) = \tilde X_n - \tilde\lambda W_n\tilde X_n$ and $Y_n^{*}(\tilde\lambda) = \tilde Y_n - \tilde\lambda W_n\tilde Y_n$.
    Finally, we replace $\beta$ in Formula (4) by $\hat\beta$ and obtain the final estimator of $m$ as
    $$\hat m = S\big(Y_n - X_n\hat\beta\big).$$
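The four steps above can be summarized in a short numerical sketch. The following Python code (rather than the Matlab code used for the simulations in Section 4) is only a minimal illustration of the $d = 1$ pipeline; the function names, the simple Epanechnikov kernel on $[-1,1]$ and the weighting matrix $A_r = (H_n^TH_n/n)^{-1}$ are our own assumptions, not the authors' implementation.

```python
import numpy as np

def local_linear_smoother(z, h, kernel=lambda u: 0.75 * np.maximum(1.0 - u**2, 0.0)):
    """n x n smoother matrix S whose rows are the equivalent kernels s_z."""
    n = len(z)
    S = np.zeros((n, n))
    for i in range(n):
        u = (z - z[i]) / h
        K = np.diag(kernel(u) / h)                        # K = diag(k_h(z_j - z_i))
        Zd = np.column_stack([np.ones(n), u])             # local linear design at z_i
        S[i] = np.linalg.solve(Zd.T @ K @ Zd, Zd.T @ K)[0]  # e1'(Z'KZ)^{-1}Z'K
    return S

def estimate_plasem_d1(Y, X, z, W, H, h):
    """GMM estimation of the PLASEM with a single nonparametric term (Steps 1-4)."""
    n = len(Y)
    I = np.eye(n)
    S = local_linear_smoother(z, h)
    Yt, Xt = (I - S) @ Y, (I - S) @ X                     # profiled data (I - S)Y, (I - S)X
    A = np.linalg.inv(H.T @ H / n)                        # A_r = (H'H/n)^{-1}, an assumed choice

    def gmm(Yv, Xv):                                      # (X'H A H'X)^{-1} X'H A H'Y
        M = Xv.T @ H @ A @ H.T
        return np.linalg.solve(M @ Xv, M @ Yv)

    beta0 = gmm(Yt, Xt)                                   # Step 2: initial GMM estimate of beta
    e = Yt - Xt @ beta0                                   # Step 3: residuals and moment equations
    eb, ebb = W @ e, W @ (W @ e)
    G = np.array([[2 * e @ eb,        -eb @ eb,   n],
                  [2 * eb @ ebb,      -ebb @ ebb, np.trace(W.T @ W)],
                  [e @ ebb + eb @ eb, -eb @ ebb,  0.0]]) / n
    g = np.array([e @ e, eb @ eb, e @ eb]) / n
    theta = np.linalg.lstsq(G, g, rcond=None)[0]          # theta_tilde = (G'G)^{-1}G'g
    lam, sig2 = theta[0], theta[2]                        # use the lambda and sigma^2 components
    Ys, Xs = Yt - lam * (W @ Yt), Xt - lam * (W @ Xt)     # Step 4: Cochrane-Orcutt transformation
    beta = gmm(Ys, Xs)
    m_hat = S @ (Y - X @ beta)
    return beta, lam, sig2, m_hat
```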
Now, let us discuss the case $d > 1$. We estimate the unknown functions by the popular “backfitting” method (Buja et al. [22]; Härdle and Hall [23]), and the estimation of the unknown parameters is similar to the case $d = 1$. The detailed steps are as follows (a code sketch of the backfitting loop follows the list):
  • Step 1. Initial estimation of $m_j$ for a fixed $j \in \{1, \ldots, d\}$. Suppose that $\beta$ and $m_i$ $(i \ne j)$ are given. For any given $z_j$, we fit the unknown function $m_j(\cdot)$ and its derivative $m_j'(\cdot)$ by the local linear estimation method:
    $$\begin{pmatrix} \tilde m_j(z_j) \\ h_j\,\tilde m_j'(z_j) \end{pmatrix} = \big(Z_j^TK_jZ_j\big)^{-1}Z_j^TK_j\Big(Y_n - X_n\beta - \sum_{k \ne j} m_k\Big),$$
    where
    $$Z_j = \begin{pmatrix} 1 & \cdots & 1 \\ \frac{z_{n,1j}-z_j}{h_j} & \cdots & \frac{z_{n,nj}-z_j}{h_j} \end{pmatrix}^T, \qquad K_j = \mathrm{diag}\big(k_{h_j}(z_{n,1j}-z_j), \ldots, k_{h_j}(z_{n,nj}-z_j)\big).$$
    Let $s_{j,z_j} = e_1^T\big(Z_j^TK_jZ_j\big)^{-1}Z_j^TK_j$; then $S_j = \big(s_{j,z_{n,1j}}^T, \ldots, s_{j,z_{n,nj}}^T\big)^T$ is the smoother matrix for the local linear regression at the observations of $z_j$. As in Opsomer and Ruppert [2], the smoother matrix $S_j$ is replaced by the centered version $S_j^{*} = (I_n - \mathbf{1}\mathbf{1}^T/n)S_j$, where $I_n$ is the $n \times n$ identity matrix and $\mathbf{1} = (1, 1, \ldots, 1)^T$. Therefore, we get
    $$\tilde m_j = S_j^{*}\Big(Y_n - X_n\beta - \sum_{k \ne j} m_k\Big).$$
  • Step 2. Initial estimation of $m_k(z_{n,ik})$ $(k \ne j)$. For the backfitting algorithm we adopt the common Gauss–Seidel iteration scheme, which updates one component at a time using the most recently updated components (Buja et al. [22]). Let $\tilde m_s^{(l)}$ denote the $l$th update of the estimate of $m_s$; then
    $$\tilde m_s^{(l)} = S_s^{*}\Big(Y_n - X_n\beta - \sum_{i=1}^{s-1}\tilde m_i^{(l)} - \sum_{i=s+1}^{d}\tilde m_i^{(l-1)}\Big).$$
    Iterating this equation until convergence, we obtain
    $$\tilde m_j = S_j^{*}\Big(Y_n - X_n\beta - \sum_{k \ne j}\tilde m_k\Big), \qquad j = 1, 2, \ldots, d.$$
    Hastie and Tibshirani [24] and Opsomer and Ruppert [25] proposed the backfitting estimator of $m_j$ in the bivariate additive model $Y_n = \alpha + m_1 + m_2 + \varepsilon$. Similar to Opsomer and Ruppert [2] and Opsomer [26], the estimators of the additive component functions can be obtained by solving the following normal equations:
    $$\begin{pmatrix} I_n & S_1^{*} & \cdots & S_1^{*} \\ S_2^{*} & I_n & \cdots & S_2^{*} \\ \vdots & \vdots & \ddots & \vdots \\ S_d^{*} & S_d^{*} & \cdots & I_n \end{pmatrix}\begin{pmatrix} \tilde m_1 \\ \tilde m_2 \\ \vdots \\ \tilde m_d \end{pmatrix} = \begin{pmatrix} S_1^{*}(Y_n - X_n\beta) \\ S_2^{*}(Y_n - X_n\beta) \\ \vdots \\ S_d^{*}(Y_n - X_n\beta) \end{pmatrix}.$$
    Generally, if the matrix
    $$M_n = \begin{pmatrix} I_n & S_1^{*} & \cdots & S_1^{*} \\ S_2^{*} & I_n & \cdots & S_2^{*} \\ \vdots & \vdots & \ddots & \vdots \\ S_d^{*} & S_d^{*} & \cdots & I_n \end{pmatrix}$$
    is invertible, we may write the above estimators directly as
    $$\begin{pmatrix} \tilde m_1 \\ \tilde m_2 \\ \vdots \\ \tilde m_d \end{pmatrix} = M_n^{-1}C_n\big(Y_n - X_n\beta\big),$$
    where $C_n = \big(S_1^{*T}, S_2^{*T}, \ldots, S_d^{*T}\big)^T$. Let $E_j = (0_n, \ldots, I_n, \ldots, 0_n)_{n \times nd}$ be a partitioned matrix with the $n \times n$ identity matrix as the $j$th “block” and zeros elsewhere. Then we have
    $$\tilde m_j = E_jM_n^{-1}C_n\big(Y_n - X_n\beta\big) = F_j\big(Y_n - X_n\beta\big), \qquad \tilde m_{+} = \sum_{j=1}^{d}\tilde m_j = \sum_{j=1}^{d}F_j\big(Y_n - X_n\beta\big) = F_n\big(Y_n - X_n\beta\big), \tag{16}$$
    where $F_j = E_jM_n^{-1}C_n$ and $F_n = \sum_{j=1}^{d}F_j$. Next, we need the $(d-1)$-dimensional smoother matrix $F_n^{[j]}$, which is derived in the same way from the data generated by the model
    $$y_{n,i} = \beta_0^Tx_{n,i} + \sum_{k=1}^{j-1}m_k(z_{n,ik}) + \sum_{k=j+1}^{d}m_k(z_{n,ik}) + \varepsilon_{n,i}.$$
    Based on Lemma 2.1 in Opsomer [26], the backfitting estimators converge to a unique solution if $\|S_j^{*}F_n^{[j]}\| < 1$ for some $j$ $(1 \le j \le d)$. In this case, $F_j$ can be rewritten as
    $$F_j = I_n - \big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}\big(I_n - S_j^{*}\big) = \big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}S_j^{*}\big(I_n - F_n^{[j]}\big).$$
  • Step 3. Initial GMM estimation of $\beta$. Let $H_n = (h_{n,1}, h_{n,2}, \ldots, h_{n,n})^T$ be an $n \times r$ $(r \ge p+1)$ matrix of instrumental variables (IV), and let $\bar Y_n = (I_n - F_n)Y_n$ and $\bar X_n = (I_n - F_n)X_n$; the corresponding moment functions are
    $$l_n(\beta) = H_n^T\big(Y_n - X_n\beta - \tilde m_{+}\big) = H_n^T\big(\bar Y_n - \bar X_n\beta\big).$$
    Let $A_r$ be an $r \times r$ positive definite constant matrix; the initial estimator of $\beta$ is obtained by minimizing
    $$Q_n(\beta) = l_n^T(\beta)A_rl_n(\beta) = \big(\bar Y_n - \bar X_n\beta\big)^TH_nA_rH_n^T\big(\bar Y_n - \bar X_n\beta\big).$$
    It is easy to see that
    $$\tilde\beta = \arg\min_{\beta}Q_n(\beta) = \big(\bar X_n^TH_nA_rH_n^T\bar X_n\big)^{-1}\bar X_n^TH_nA_rH_n^T\bar Y_n.$$
  • Step 4. Estimation of $\theta$. Similar to Step 3 in the case $d = 1$, we obtain the estimator of $\theta$ by the two-stage estimation of Kelejian and Prucha [12]. Denoting by $\tilde\eta_n = \bar Y_n - \bar X_n\tilde\beta$ the estimator of $\eta_n$, we have
    $$\tilde\theta = \big(G_n^TG_n\big)^{-1}G_n^Tg_n.$$
  • Step 5. Final estimation of $\beta$ and $m$. Apply a Cochrane–Orcutt type transformation to Model (2) and let $Y_n^{*} = \bar Y_n - \lambda W_n\bar Y_n$ and $X_n^{*} = \bar X_n - \lambda W_n\bar X_n$. The final estimator of $\beta$ is
    $$\hat\beta = \big(X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda)\big)^{-1}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TY_n^{*}(\tilde\lambda),$$
    where $X_n^{*}(\tilde\lambda) = \bar X_n - \tilde\lambda W_n\bar X_n$ and $Y_n^{*}(\tilde\lambda) = \bar Y_n - \tilde\lambda W_n\bar Y_n$.
    By substituting $\hat\beta$ into the expression (16), we obtain the final estimators of $m_j$ and $m_{+}$, respectively, as
    $$\hat m_j = F_j\big(Y_n - X_n\hat\beta\big), \qquad \hat m_{+} = F_n\big(Y_n - X_n\hat\beta\big).$$
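For $d > 1$, the inner backfitting loop of Step 2 can be sketched as follows. This Python snippet is a hedged illustration of the Gauss–Seidel updates with the centered smoothers $S_j^{*}$; the helper names and the convergence rule are ours, and in a full implementation this loop would be combined with Steps 3–5 above.

```python
import numpy as np

def centered_smoother(Sj):
    """S_j^* = (I - 11'/n) S_j, the centered local linear smoother."""
    n = Sj.shape[0]
    return (np.eye(n) - np.ones((n, n)) / n) @ Sj

def backfit_additive(Y, X, beta, smoothers, tol=1e-6, max_iter=200):
    """Gauss-Seidel backfitting for m_1, ..., m_d given beta.

    `smoothers` is a list of the centered smoother matrices S_1^*, ..., S_d^*.
    Returns the list of fitted component vectors m_tilde_1, ..., m_tilde_d.
    """
    n, d = len(Y), len(smoothers)
    r = Y - X @ beta                                    # partial residual w.r.t. the linear part
    m = [np.zeros(n) for _ in range(d)]
    for _ in range(max_iter):
        m_old = [mj.copy() for mj in m]
        for j in range(d):
            others = sum(m[k] for k in range(d) if k != j)
            m[j] = smoothers[j] @ (r - others)          # update m_j with the freshest m_k, k != j
        if max(np.max(np.abs(m[j] - m_old[j])) for j in range(d)) < tol:
            break
    return m
```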

3. Asymptotic Properties

To establish the consistency and asymptotic normality of the parametric and nonparametric components rigorously, we state the following basic regularity assumptions.

3.1. Assumptions

Assumption 1.
(1) 
The elements of the spatial weight matrix $W_n$ are non-random, and all elements on the main diagonal satisfy $w_{n,ii} = 0$ $(i = 1, 2, \ldots, n)$.
(2)
The matrix $I_n - \lambda W_n$ is nonsingular for all $|\lambda| < 1$.
(3) 
The matrices $W_n$ and $(I_n - \lambda W_n)^{-1}$ are uniformly bounded in both row and column sums in absolute value for all $|\lambda| < 1$.
Assumption 2.
(1) 
The covariate X n is a non-stochastic matrix and has full row rank, and the elements of X n are uniformly bounded in absolute value.
(2) 
The column vectors of covariate Z n are i.i.d. random variables.
(3) 
The instrumental variables matrix H n is uniformly bounded in both row and column sums in absolute value.
(4) 
The innovation sequence satisfies $E|\varepsilon_{n,i}|^{2+\delta} = c_\delta < \infty$ for some small $\delta > 0$, where $c_\delta$ is a positive constant.
(5) 
The density $f(\cdot)$ of the random variable $z$ is positive and uniformly bounded away from zero on its compact support. Furthermore, both $f(\cdot)$ and its derivative $f'(\cdot)$ are bounded on this support.
Assumption 3.
(1) 
The kernel function k ( · ) is a bounded and continuous nonnegative symmetric function on its closed support set.
(2) 
Let $\mu_l = \int v^lk(v)dv$ and $\nu_l = \int v^lk^2(v)dv$, where $l$ is a nonnegative integer.
(3) 
The second derivative of m ( · ) exists and is bounded and continuous.
Assumption 4.
For notational simplicity, denote
(1) 
$A_r = A + o_P(1)$, where $A$ is a positive semidefinite matrix.
(2) 
$\frac{1}{n}H_n^T(I_n - S)X_n = Q_{HX} + o_P(1)$.
(3) 
$\frac{1}{n}H_n^TW_n(I_n - S)X_n = Q_{HWX} + o_P(1)$.
(4) 
$\frac{1}{n}H_n^TH_n = Q_{HH} + o_P(1)$.
(5) 
$\Sigma_1 = \lim_{n\to\infty}\frac{1}{n}H_n^T(I_n - S)(I_n - \lambda_0W_n)^{-1}(I_n - \lambda_0W_n^T)^{-1}(I_n - S)^TH_n$.
Assumption 5.
As $n \to \infty$, $h \to 0$, $nh^4 \to \infty$ and $nh^5 \to 0$.
Assumption 6.
(1) 
The kernel function k j ( · ) ( j = 1 , 2 , , d ) is a bounded and continuous nonnegative symmetric function on its closed support set.
(2) 
Let $\mu_{lj} = \int v^lk_j(v)dv$ and $\nu_{lj} = \int v^lk_j^2(v)dv$ $(j = 1, 2, \ldots, d)$, where $l$ is a nonnegative integer.
(3) 
The second derivative of m j ( · ) ( j = 1 , 2 , , d ) exists and is bounded and continuous.
Assumption 7.
Denote
(1) 
$\frac{1}{n}H_n^T(I_n - F_n)X_n = R_{HX} + o_P(1)$.
(2) 
$\frac{1}{n}H_n^TW_n(I_n - F_n)X_n = R_{HWX} + o_P(1)$.
(3) 
$\frac{1}{n}H_n^TH_n = Q_{HH} + o_P(1)$.
(4) 
$\Sigma_2 = \lim_{n\to\infty}\frac{1}{n}H_n^T(I_n - F_n)(I_n - \lambda_0W_n)^{-1}(I_n - \lambda_0W_n^T)^{-1}(I_n - F_n)^TH_n$.
Assumption 8.
As $n \to \infty$, $h_j \to 0$, $nh_j^4 \to \infty$ and $nh_j^5 \to 0$, $j = 1, 2, \ldots, d$.
Remark 1.
Assumption 1 provides the basic features of the spatial weight matrix. In some empirical applications, it is common practice to row-normalize $W_n$, namely $\sum_{j=1}^{n}w_{n,ij} = 1$. Assumption 2 concerns the regressors, the error term, the instrumental variables and the density function of the model. Assumption 3 concerns the kernel function and the nonparametric function for the case $d = 1$. Assumptions 4–5 are needed for the asymptotic properties of the estimators in the case $d = 1$. Assumptions 6–8 are the corresponding conditions on the kernel functions, bandwidths and asymptotic properties of the estimators for an arbitrary number of nonparametric additive terms.

3.2. Asymptotic Properties

In this subsection, we first discuss the asymptotic normality of the estimators for the case with a single nonparametric term, and then study the asymptotic properties of the estimators for an arbitrary number of nonparametric additive terms.
Theorem 1.
Suppose that Assumptions 1–4 hold. Then $\tilde\beta$ is a consistent estimator of $\beta$ and
$$\sqrt n\big(\tilde\beta - \beta\big) \xrightarrow{D} N(0, \Omega_1),$$
where $\Omega_1 = \sigma^2\big(Q_{HX}^TAQ_{HX}\big)^{-1}Q_{HX}^TA\Sigma_1AQ_{HX}\big(Q_{HX}^TAQ_{HX}\big)^{-1}$.
Theorem 2.
Suppose that Assumptions 1–5 hold. Then
$$\tilde\theta \xrightarrow{P} \theta.$$
Theorem 3.
Suppose that Assumptions 1–5 hold and $A_r = \big(\frac{1}{n}H_n^TH_n\big)^{-1}$. Then
$$\sqrt n\big(\hat\beta - \beta\big) \xrightarrow{D} N\big(0, \sigma^2\bar Q^{-1}\big),$$
where $\bar Q = \big(Q_{HX} - \lambda Q_{HWX}\big)^TQ_{HH}^{-1}\big(Q_{HX} - \lambda Q_{HWX}\big)$.
Theorem 4.
Suppose that Assumptions 1–5 hold. Then
$$\sqrt{nh}\big(\hat m(z) - m(z)\big) \xrightarrow{D} N\big(0, f^{-2}(z)\Gamma_{11}\big),$$
where $\Gamma_{11}$ is the $(1,1)$th element of $\Gamma$ and $\Gamma = \lim_{n\to\infty}\frac{h\sigma^2}{n}Z^TK(I_n - \lambda W_n)^{-1}(I_n - \lambda W_n^T)^{-1}KZ$.
Theorem 5.
Suppose that Assumptions 1–2 and Assumptions 6–8 hold. Then
$$\sqrt n\big(\tilde\beta - \beta\big) \xrightarrow{D} N(0, \Omega_2),$$
where $\Omega_2 = \sigma^2\big(R_{HX}^TAR_{HX}\big)^{-1}R_{HX}^TA\Sigma_2AR_{HX}\big(R_{HX}^TAR_{HX}\big)^{-1}$.
Theorem 6.
Suppose that Assumptions 1–2 and Assumptions 6–8 hold. Then
$$\tilde\theta \xrightarrow{P} \theta.$$
Theorem 7.
Suppose that Assumptions 1–2 and Assumptions 6–8 hold and $A_r = \big(\frac{1}{n}H_n^TH_n\big)^{-1}$. Then
$$\sqrt n\big(\hat\beta - \beta\big) \xrightarrow{D} N\big(0, \sigma^2\bar R^{-1}\big),$$
where $\bar R = \big(R_{HX} - \lambda R_{HWX}\big)^TQ_{HH}^{-1}\big(R_{HX} - \lambda R_{HWX}\big)$.
Theorem 8.
Suppose that Assumptions 1–2 and Assumptions 6–8 hold. Then
$$E\big(\hat m_j - m_j\big) = \frac{1}{2}h_j^2\mu_{2j}\big(m_j'' - E(m_j'')\big) - S_j^{*}B_j + O_P\big(n^{-1/2}\big) + o_P\big(h_j^2\big), \qquad \mathrm{Var}\big(\hat m_j - m_j\big) = \sigma^2F_j\big(I_n - \lambda W_n\big)^{-1}\big(I_n - \lambda W_n^T\big)^{-1}F_j^T,$$
where $B_j = \big(F_n^{[j]} - I_n\big)m_{(j)}$ with $m_{(j)} = m_1 + \cdots + m_{j-1} + m_{j+1} + \cdots + m_d$.

4. Simulation Studies

In this section, we illustrate the finite sample performance of our GMM estimators through Monte Carlo experiments. Similar to Cheng et al. [17], we generate the spatial weight matrix according to Rook contiguity (Anselin [11], pp. 17–18). The data are generated from the following model:
$$y_{n,i} = \sum_{k=1}^{2}x_{n,ik}\beta_k + m_1(z_{n,i1}) + m_2(z_{n,i2}) + \eta_{n,i}, \qquad \eta_{n,i} = \lambda\sum_{k=1}^{n}w_{n,ik}\eta_{n,k} + \varepsilon_{n,i}, \tag{18}$$
where the covariates $x_{n,ik}$ $(k = 1, 2)$ are independently generated from the bivariate normal distribution $N(\mathbf{0}, \Sigma)$ with $\mathbf{0} = (0, 0)^T$ and $\Sigma = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}$, and the covariates $z_{n,i1}$ and $z_{n,i2}$ are independently generated from the uniform distributions $U(-1, 1)$ and $U(0, 1)$, respectively. The additive functions are $m_1(z_{n,i1}) = 2\sin(\pi z_{n,i1})$ and $m_2(z_{n,i2}) = z_{n,i2}^3 + 3z_{n,i2}^2 - 2z_{n,i2} - 1$. The innovations $\varepsilon_{n,i}$ are independently generated from the normal distribution $N(0, 1)$. The true value of $\beta$ is $(1, 1.5)^T$. For comparison, three values $\lambda = 0.25$, $\lambda = 0.5$ and $\lambda = 0.75$, representing weak to strong spatial dependence in the error terms, are investigated.
For the parametric estimates, the sample mean (MEAN), the sample standard deviation (SD) and mean square error (MSE) are used as the evaluation criteria. For the nonparametric function, we use the root of average squared error (RASE, as in Fan and Wu [27]) as the evaluation criterion
$$\mathrm{RASE}_j = \bigg\{Q^{-1}\sum_{q=1}^{Q}\big[\hat m_j(z_q) - m_j(z_q)\big]^2\bigg\}^{1/2},$$
where $\{z_q,\ q = 1, 2, \ldots, Q\}$ are grid points and the number of grid points is fixed at $Q = 40$.
For the bandwidth sequences, we use cross-validation to choose the optimal bandwidth. For all estimators, we use the standardized Epanechnikov kernel $k(u) = \frac{3}{4\sqrt 5}\big(1 - \frac{1}{5}u^2\big)I(u^2 \le 5)$. Furthermore, we set the instrumental variable matrix to $H_n = (X_n, W_nX_n)$. The sample sizes under investigation are $n = 49, 64, 81, 100, 225, 400$. For each case, 500 replications are carried out using Matlab.
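A minimal Python sketch of this data generating process is given below (the paper itself uses Matlab). The Rook-contiguity weights are built on a $\sqrt n \times \sqrt n$ lattice, which is an assumption about the layout since the paper only cites Anselin [11] for the design; the function names and the random seed are ours.

```python
import numpy as np

def rook_weights(side):
    """Row-normalized Rook-contiguity weight matrix on a side x side lattice."""
    n = side * side
    W = np.zeros((n, n))
    for i in range(side):
        for j in range(side):
            k = i * side + j
            if i > 0:        W[k, k - side] = 1.0   # north neighbor
            if i < side - 1: W[k, k + side] = 1.0   # south neighbor
            if j > 0:        W[k, k - 1] = 1.0      # west neighbor
            if j < side - 1: W[k, k + 1] = 1.0      # east neighbor
    return W / W.sum(axis=1, keepdims=True)

def simulate_model18(side, lam, beta=(1.0, 1.5), rng=None):
    """Generate one sample from Model (18) with n = side * side observations."""
    rng = rng or np.random.default_rng(0)
    n = side * side
    W = rook_weights(side)
    X = rng.standard_normal((n, 2))                   # x ~ N(0, I_2)
    z1, z2 = rng.uniform(-1, 1, n), rng.uniform(0, 1, n)
    m1 = 2.0 * np.sin(np.pi * z1)
    m2 = z2**3 + 3.0 * z2**2 - 2.0 * z2 - 1.0
    eps = rng.standard_normal(n)                      # innovations (N(0, 1) as stated in the text)
    eta = np.linalg.solve(np.eye(n) - lam * W, eps)   # eta = (I - lam W)^{-1} eps
    Y = X @ np.asarray(beta) + m1 + m2 + eta
    return Y, X, z1, z2, W

def rase(m_hat, m_true):
    """Root of average squared error over the Q evaluation grid points."""
    return np.sqrt(np.mean((np.asarray(m_hat) - np.asarray(m_true)) ** 2))
```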
Table 1, Table 2 and Table 3 report the finite sample performance of all estimates under the three values of the spatial correlation coefficient $\lambda$. First, the biases of the estimates of $\lambda$ are slightly larger when the sample size is $n = 49$, but they decrease as the sample size increases; the SDs and MSEs of the estimates of $\lambda$ are small in all cases and show a downward trend as the sample size increases. Second, the biases, SDs and MSEs of the estimates of $\beta_1$, $\beta_2$ and $\sigma^2$ are small in all cases and decrease as the sample size increases. Third, the MEANs and SDs of the 500 RASEs decrease fairly rapidly as the sample size increases.
Figure 1 shows the fitted curves and 95% confidence intervals of the unknown function $m_1$ for the three values of $\lambda$ when $n = 225$ and $n = 400$. The short dashed and solid curves display the true function $m_1$ and its estimates, respectively, and the long dashed curves describe the corresponding 95% confidence intervals. Figure 2 shows the corresponding results for the unknown function $m_2$. The fits and confidence intervals in Figure 1 and Figure 2 indicate that the finite sample performance of the nonparametric estimates is good.

5. A Real Data Example

In this section, we analyze the well-known 1970 Boston housing data using the estimation procedures proposed in Section 2. The data set contains 506 observations on 15 exogenous regressors and one response variable. It includes the following variables:
  • MEDV: The median value of owner-occupied homes in USD 1000s.
  • LON: Longitude coordinates.
  • LAT: Lattitude coordinates.
  • CRIM: Per capita crime rate by town.
  • ZN: Proportion of residential land zoned for lots over 25,000 sq.ft.
  • INDUS: Proportion of non-retail business acres per town.
  • CHAS: Charles river dummy variable.
  • NOX: Nitric oxides concentration.
  • RM: Average number of rooms per dwelling.
  • AGE: Proportion of owner-occupied units built prior to 1940.
  • DIS: Weighted distances to five Boston employment centers.
  • RAD: Index of accessibility to radial highways.
  • TAX: Full-value property tax per USD 10,000.
  • PTRATIO: Pupil–teacher ratio by town.
  • B: 1000(B-0.63) 2 where B is the proportion of blacks by town.
  • LSTAT: Percentage of lower status of the population.
Zhang et al. [28] analyzed these data via the partially linear additive model. Moreover, they investigated this data set by using variable selection and concluded that the variables RAD and PTRATIO have linear effects on the response variable MEDV; the variables CRIM, NOX, RM, DIS, TAX and LSTAT have nonlinear effects on MEDV; and the other variables are insignificant and were removed from the final model. Based on the conclusion drawn by Zhang et al. [28], Du et al. [20] accounted for the spatial effect using the partially linear additive spatial autoregressive model. Cheng et al. [17] investigated these data using the partially linear single-index spatial autoregressive model.
In our real data analysis, based on the conclusions drawn by Cheng et al. [17] and Du et al. [20], we consider that the dependent variable is MEDV; variables RAD and PTRATIO have linear effects on the response variable MEDV; and variables CRIM, NOX, RM, DIS and LSTAT have nonlinear effects on the response variable MEDV. We consider the partially linear additive spatial error model as follows:
$$y_{n,i} = x_{n,1i}\beta_1 + x_{n,2i}\beta_2 + \sum_{k=1}^{5}m_k(z_{n,ki}) + \eta_{n,i}, \qquad \eta_{n,i} = \lambda\sum_{k=1}^{n}w_{n,ik}\eta_{n,k} + \varepsilon_{n,i}, \tag{19}$$
where the response variable is $y_{n,i} = \log(\mathrm{MEDV}_i)$, the covariates are $x_{n,1i} = \log(\mathrm{RAD}_i)$ and $x_{n,2i} = \log(\mathrm{PTRATIO}_i)$, and $z_{n,1i}, \ldots, z_{n,5i}$ are the $i$th elements of CRIM, NOX, RM, DIS and log(LSTAT), respectively. As in Cheng et al. [17] and Du et al. [20], logarithmic transformations are applied to alleviate problems caused by large gaps in the domains of the variables. The spatial weight $w_{n,ij}$ is calculated from the Euclidean distance between the longitude and latitude coordinates of any two houses, and the spatial weight matrix is row-normalized. Table 4 lists the estimates of the unknown parametric coefficients and their corresponding 95% confidence intervals. The estimates of the nonparametric components are displayed in Figure 3.
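A hedged sketch of the weight matrix construction is given below in Python. The paper only states that the weights are computed from the Euclidean distance between coordinate pairs and then normalized; the inverse-distance form used here is therefore an assumption, and the function name is ours.

```python
import numpy as np

def distance_weights(lon, lat):
    """Row-normalized spatial weights from pairwise Euclidean distances of (lon, lat)."""
    coords = np.column_stack([lon, lat])
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # Euclidean distance matrix
    W = np.zeros_like(dist)
    off_diag = dist > 0
    W[off_diag] = 1.0 / dist[off_diag]                # inverse distance (assumed); zero diagonal
    return W / W.sum(axis=1, keepdims=True)           # row normalization
```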
From Table 4, we can draw the following conclusions. First, the estimate of the spatial coefficient is 0.4689, with standard deviation 0.0916 and mean square error 0.0085, and its confidence interval does not include 0, which shows that there is positive spatial dependence among the regression disturbances. Second, for the linear part, the regression coefficient of RAD is positive, which indicates that the housing price increases as the index of accessibility to radial highways increases. The regression coefficient of PTRATIO is negative, which reveals that the pupil–teacher ratio by town has a negative effect on the housing price. Third, for the nonlinear part, Figure 3 shows that the housing price decreases as each of the other variables increases, except for RM. The variables CRIM, NOX, RM, DIS and log(LSTAT) have nonlinear effects on the response, which is slightly different from the conclusion obtained by Du et al. [20]; the main reason is that our model has a different spatial structure from the model given in Du et al. [20].

6. Conclusions

In this paper, we present a partially linear additive spatial error model. This model not only effectively avoids the “curse of dimensionality” of the nonparametric spatial autoregressive model and enhances the robustness of estimation, but also accounts for the spatial autocorrelation of the error terms and captures linear and nonlinear relationships between the response variable and the regressors of interest simultaneously. Local linear estimators of the nonparametric additive components and GMM estimators of the unknown parameters are constructed for the cases $d = 1$ and $d > 1$, respectively. The consistency and asymptotic normality of the estimators are established under regularity conditions for both cases. Monte Carlo simulations illustrate the good finite sample performance of the estimators, and the real data analysis shows that the proposed methodology fits the Boston housing data well and is easy to apply in practice.
This paper focuses only on independent, homoscedastic errors. However, many real spatial data sets do not satisfy these conditions, and it would be worthwhile to extend the model to cases with heteroscedasticity. Furthermore, we have not considered the issue of variable selection in the model. These topics are left for future research.

Author Contributions

Conceptualization, J.C.; methodology, J.C. and S.C.; software, S.C.; validation, S.C.; formal analysis, J.C.; investigation, J.C. and S.C.; resources, J.C.; data curation, S.C.; writing—original draft preparation, S.C.; writing—review and editing, J.C.; visualization, J.C. and S.C.; supervision, J.C.; funding acquisition, J.C. and S.C. Both authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by NSF of Fujian Province, PR China (2018J05002, 2020J01170), Fujian Normal University Innovation Team Foundation, PR China (IRTL1704), Scientific Research Foundation of Chongqing Technology and Business University, PR China (2056015) and Science and Technology Research Program of Chongqing Municipal Education Commission, PR China (KJQN202000843).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Lemma A1.
(Su [15]) The row and column sums of $S = (s_{ij})_{n\times n}$ are uniformly bounded in absolute value for sufficiently large $n$, and the elements satisfy $\sum_{j=1}^{n}s_{ij} = 1$.
Lemma A2.
Suppose that Assumptions 1–3 hold. Then
$$\frac{1}{n}Z^TKZ = f(z)\begin{pmatrix}1 & 0 \\ 0 & \mu_2\end{pmatrix} + O_P(h).$$
Proof of Lemma A2.
It follows from $Z = \begin{pmatrix}1 & \cdots & 1\\ \frac{z_{n,1}-z}{h} & \cdots & \frac{z_{n,n}-z}{h}\end{pmatrix}^T$ and $K = \mathrm{diag}\{k_h(z_{n,1}-z), \ldots, k_h(z_{n,n}-z)\}$ that
$$\frac{1}{n}Z^TKZ = \begin{pmatrix}\Lambda_{1.11} & \Lambda_{1.12}\\ \Lambda_{1.21} & \Lambda_{1.22}\end{pmatrix},$$
where $\Lambda_{1.11} = \frac{1}{n}\sum_{i=1}^{n}k_h(z_{n,i}-z)$, $\Lambda_{1.12} = \Lambda_{1.21} = \frac{1}{n}\sum_{i=1}^{n}\frac{z_{n,i}-z}{h}k_h(z_{n,i}-z)$ and $\Lambda_{1.22} = \frac{1}{n}\sum_{i=1}^{n}\big(\frac{z_{n,i}-z}{h}\big)^2k_h(z_{n,i}-z)$. According to Assumption 3, we get
$$E(\Lambda_{1.11}) = E\big[k_h(z_{n,1}-z)\big] = \int k_h(z_1-z)f(z_1)dz_1 = \int k(v)f(hv+z)dv = f(z) + O(h^2).$$
Similarly, $E(\Lambda_{1.12}) = O(h)$ and $E(\Lambda_{1.22}) = f(z)\mu_2 + O(h^2)$. Therefore, we obtain
$$\frac{1}{n}Z^TKZ = f(z)\begin{pmatrix}1 & 0\\ 0 & \mu_2\end{pmatrix} + O_P(h).$$
 □
Lemma A3.
Suppose that Assumptions 1–3 hold. Then
$$s_z(I_n - \lambda W_n)^{-1}\varepsilon_n = O_P\big(n^{-1/2}\big).$$
Proof of Lemma A3.
Denote $D_n = (I_n - \lambda W_n)^{-1} = (d_{n,ij})_{n\times n}$. By Assumption 1, the row and column sums of $D_n$ are uniformly bounded in absolute value; in particular, there exists a positive constant $C_d$ such that $\sum_{i=1}^{n}|d_{n,ij}| \le C_d$. Note that
$$\frac{1}{n}Z^TK(I_n - \lambda W_n)^{-1}\varepsilon_n = \begin{pmatrix}\Lambda_{1.11}\\ \Lambda_{1.21}\end{pmatrix} = \Lambda_1,$$
where $\Lambda_{1.11} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}k_h(z_{n,i}-z)d_{n,ij}\varepsilon_{n,j}$ and $\Lambda_{1.21} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{z_{n,i}-z}{h}k_h(z_{n,i}-z)d_{n,ij}\varepsilon_{n,j}$. Clearly, $E(\Lambda_{1.11}) = 0$ and $E(\Lambda_{1.21}) = 0$. It follows from Assumptions 1 and 3 that
$$\begin{aligned} E(\Lambda_{1.11}^2) &= \frac{1}{n^2}E\Big[\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{s=1}^{n}\sum_{t=1}^{n}k_h(z_{n,i}-z)k_h(z_{n,s}-z)d_{n,ij}d_{n,st}\varepsilon_{n,j}\varepsilon_{n,t}\Big] = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{s=1}^{n}E\big[k_h(z_{n,i}-z)k_h(z_{n,s}-z)\big]d_{n,ij}d_{n,sj}\,\sigma_0^2 \\ &\le \frac{C_d^2\sigma_0^2}{nh^2}\iint k\Big(\frac{z_{n,i}-z}{h}\Big)k\Big(\frac{z_{n,s}-z}{h}\Big)f(z_{n,i})f(z_{n,s})\,dz_{n,i}dz_{n,s} = \frac{C_d^2\sigma_0^2}{nh^2}\Big[\int k\Big(\frac{z_{n,i}-z}{h}\Big)f(z_{n,i})dz_{n,i}\Big]^2 = \frac{C_d^2\sigma_0^2}{n}\big(f(z) + o(h^2)\big)^2. \end{aligned}$$
Therefore, $E(\Lambda_{1.11}^2) = O(n^{-1})$, and it follows from the Chebyshev inequality that $\Lambda_{1.11} = O_P(n^{-1/2})$; the same argument applies to $\Lambda_{1.21}$. Furthermore, we obtain
$$s_z(I_n - \lambda W_n)^{-1}\varepsilon_n = (1, 0)\Big(\frac{1}{n}Z^TKZ\Big)^{-1}\Lambda_1 = O_P\big(n^{-1/2}\big).$$
 □
Lemma A4.
Suppose that Assumptions 1–3 hold. Then the row and column sums of $F_n$ are uniformly bounded in absolute value for sufficiently large $n$.
Proof of Lemma A4.
Note that $F_n = \sum_{j=1}^{d}F_j$, where $F_j = I_n - \big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}\big(I_n - S_j^{*}\big)$. To prove that the row and column sums of $F_n$ are uniformly bounded in absolute value, it suffices to prove that the row and column sums of each $F_j$ are uniformly bounded in absolute value. By Lemma 2 in Opsomer [26], we obtain
$$\big(I_n - S_j^{*}F_n^{[j]}\big)^{-1} = I_n + O_P\big(\mathbf{1}\mathbf{1}^T/n\big).$$
Thus,
$$F_j = I_n - \big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}\big(I_n - S_j^{*}\big) = I_n - \big[I_n + O_P\big(\mathbf{1}\mathbf{1}^T/n\big)\big]\big(I_n - S_j^{*}\big) = S_j^{*} + O_P\big(\mathbf{1}\mathbf{1}^T/n\big).$$
It follows from Lemma A1 and the definition of $S_j^{*}$ that the row and column sums of $S_j^{*}$ are uniformly bounded in absolute value. Therefore, the row and column sums of $F_j$, and hence those of $F_n$, are uniformly bounded in absolute value. □
Proof of Theorem 1.
By the definition of β ˜ , we have
$$\begin{aligned}\tilde\beta &= \big(\tilde X_n^TH_nA_rH_n^T\tilde X_n\big)^{-1}\tilde X_n^TH_nA_rH_n^T\tilde Y_n = \big(\tilde X_n^TH_nA_rH_n^T\tilde X_n\big)^{-1}\tilde X_n^TH_nA_rH_n^T(I_n - S)Y_n\\ &= \big(\tilde X_n^TH_nA_rH_n^T\tilde X_n\big)^{-1}\tilde X_n^TH_nA_rH_n^T(I_n - S)\big[X_n\beta + m + (I_n - \lambda W_n)^{-1}\varepsilon_n\big]\\ &= \beta + \big(\tilde X_n^TH_nA_rH_n^T\tilde X_n\big)^{-1}\tilde X_n^TH_nA_rH_n^T(I_n - S)m + \big(\tilde X_n^TH_nA_rH_n^T\tilde X_n\big)^{-1}\tilde X_n^TH_nA_rH_n^T(I_n - S)(I_n - \lambda W_n)^{-1}\varepsilon_n.\end{aligned}$$
It follows from the Taylor expansion of $m(z_{n,i})$ at $z$ that
$$m(z_{n,i}) = m(z) + m'(z)(z_{n,i}-z) + \frac{m''(z)}{2}(z_{n,i}-z)^2 + o(h^2) = \Big(1, \frac{z_{n,i}-z}{h}\Big)\begin{pmatrix}m(z)\\ hm'(z)\end{pmatrix} + \frac{m''(z)}{2}(z_{n,i}-z)^2 + o(h^2), \qquad i = 1, 2, \ldots, n.$$
Let $Q_m(z) = \big((z_{n,1}-z)^2, \ldots, (z_{n,n}-z)^2\big)^Tm''(z)$; we can write
$$m = Z\begin{pmatrix}m(z)\\ hm'(z)\end{pmatrix} + \frac{1}{2}Q_m(z) + o(h^2),$$
and hence
$$Sm = m + \frac{1}{2}Q + o(h^2),$$
where $Q = \big(s_{z_{n,1}}Q_m(z_{n,1}), \ldots, s_{z_{n,n}}Q_m(z_{n,n})\big)^T$. Because
$$\frac{1}{n}Z^TKQ_m(z_1) = \begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}k_h(z_{n,i}-z_1)(z_{n,i}-z_1)^2m''(z_1)\\ \frac{1}{n}\sum_{i=1}^{n}\frac{z_{n,i}-z_1}{h}k_h(z_{n,i}-z_1)(z_{n,i}-z_1)^2m''(z_1)\end{pmatrix},$$
it is easy to see that
$$\begin{aligned}E\Big(\frac{1}{n}\sum_{i=1}^{n}k_h(z_{n,i}-z_1)(z_{n,i}-z_1)^2m''(z_1)\Big) &= E\big[k_h(z_{n,1}-z_1)(z_{n,1}-z_1)^2m''(z_1)\big] = \int k_h(z_{n,1}-z_1)(z_{n,1}-z_1)^2m''(z_1)f(z_{n,1})dz_{n,1}\\ &= h^2\int v^2k(v)m''(z_1)\Big(f(z_1) + hvf'(z_1) + \frac{h^2v^2}{2}f''(z_1)\Big)dv = h^2\mu_2m''(z_1)f(z_1) + O(h^4).\end{aligned}$$
Similarly, $E\big(\frac{1}{n}\sum_{i=1}^{n}\frac{z_{n,i}-z_1}{h}k_h(z_{n,i}-z_1)(z_{n,i}-z_1)^2m''(z_1)\big) = h^3\mu_4m''(z_1)f'(z_1) = O(h^3)$. It follows from Lemma A2 that
$$s_{z_1}Q_m(z_1) = e_1^T\big(Z^TKZ\big)^{-1}Z^TKQ_m(z_1) = h^2\mu_2m''(z_1) + O_P(h^4).$$
Furthermore, we have
$$Q = h^2\mu_2m'' + O_P(h^4)$$
and
$$(I_n - S)m = -\frac{1}{2}Q + o(h^2) = -\frac{1}{2}h^2\mu_2m'' + O_P(h^4) + o(h^2) = -\frac{1}{2}h^2\mu_2m'' + o_P(h^2),$$
where $m'' = \big(m''(z_{n,1}), \ldots, m''(z_{n,n})\big)^T$. Therefore, we obtain
$$\frac{1}{n}H_n^T(I_n - S)m = -\frac{1}{2}h^2\mu_2\frac{1}{n}\begin{pmatrix}\sum_{i=1}^{n}h_{n,i1}m''(z_{n,i})\\ \vdots\\ \sum_{i=1}^{n}h_{n,ir}m''(z_{n,i})\end{pmatrix} + o_P(h^2) = -\frac{1}{2}h^2\mu_2E\big[h_{n,1}m''(z_{n,1})\big] + o_P(h^2).$$
Combining this with Assumption 4, we get
$$\big(\tilde X_n^TH_nA_rH_n^T\tilde X_n\big)^{-1}\tilde X_n^TH_nA_rH_n^T(I_n - S)m = -\frac{1}{2}h^2\mu_2\big(Q_{HX}^TAQ_{HX}\big)^{-1}Q_{HX}^TAE\big[h_{n,1}m''(z_{n,1})\big] + o_P(h^2).$$
Let $D_n = H_n^T(I_n - S)(I_n - \lambda W_n)^{-1}$. Using Assumptions 1–2 and Lemma A1, we can show that
$$E(D_n\varepsilon_n) = 0, \qquad \mathrm{Var}(D_n\varepsilon_n) = \sigma^2D_nD_n^T.$$
Thus, according to Assumption 4, we have
$$E\big(\tilde\beta - \beta\big) = -\frac{1}{2}h^2\mu_2\big(Q_{HX}^TAQ_{HX}\big)^{-1}Q_{HX}^TAE\big[h_{n,1}m''(z_{n,1})\big] + o_P(h^2),$$
$$\begin{aligned}\mathrm{Var}\big(\tilde\beta\big) &= \sigma^2\big(\tilde X_n^TH_nA_rH_n^T\tilde X_n\big)^{-1}\tilde X_n^TH_nA_rH_n^T(I_n - S)(I_n - \lambda W_n)^{-1}(I_n - \lambda W_n^T)^{-1}(I_n - S)^TH_nA_rH_n^T\tilde X_n\big(\tilde X_n^TH_nA_rH_n^T\tilde X_n\big)^{-1}\\ &= \frac{\sigma^2}{n}\big(Q_{HX}^TAQ_{HX}\big)^{-1}Q_{HX}^TA\Sigma_1AQ_{HX}\big(Q_{HX}^TAQ_{HX}\big)^{-1} + o_P(1).\end{aligned}$$
Denoting $\Omega_1 = \sigma^2\big(Q_{HX}^TAQ_{HX}\big)^{-1}Q_{HX}^TA\Sigma_1AQ_{HX}\big(Q_{HX}^TAQ_{HX}\big)^{-1}$, we obtain
$$\sqrt n\big(\tilde\beta - \beta\big) \xrightarrow{D} N(0, \Omega_1).$$
 □
Proof of Theorem 2.
Similar to the proof of Theorem 2 in Kelejian and Prucha [12], we only need to prove $\tilde\eta_n \xrightarrow{P} \eta_n$. Write
$$\tilde\eta_n = \tilde Y_n - \tilde X_n\tilde\beta = (I_n - S)Y_n - (I_n - S)X_n\tilde\beta = (I_n - S)(X_n\beta + m + \eta_n) - (I_n - S)X_n\tilde\beta = (I_n - S)X_n(\beta - \tilde\beta) + (I_n - S)m + (I_n - S)\eta_n.$$
Thus
$$\tilde\eta_n - \eta_n = (I_n - S)X_n(\beta - \tilde\beta) + (I_n - S)m - S\eta_n.$$
According to Lemma A3, we get
$$s_z\eta_n = s_z(I_n - \lambda W_n)^{-1}\varepsilon_n = O_P\big((nh)^{-1/2}\big).$$
Therefore, we have
$$S\eta_n = o_P(1).$$
It follows from the result in the proof of Theorem 1 that
$$(I_n - S)m = -\frac{1}{2}h^2\mu_2m'' + o_P(h^2).$$
The result then follows from Lemma A1 and the proof of Theorem 1. □
Proof of Theorem 3.
Recall that
$$\hat\beta = \big(X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda)\big)^{-1}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TY_n^{*}(\tilde\lambda) = \beta + \big(X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda)\big)^{-1}X_n^{*T}(\tilde\lambda)H_nA_rH_n^T\eta_n^{*}(\tilde\lambda),$$
where
$$\eta_n^{*}(\tilde\lambda) = Y_n^{*}(\tilde\lambda) - X_n^{*}(\tilde\lambda)\beta = \tilde Y_n - \tilde\lambda W_n\tilde Y_n - \big(\tilde X_n - \tilde\lambda W_n\tilde X_n\big)\beta = \eta_n - \tilde\lambda W_n\eta_n = \varepsilon_n - (\tilde\lambda - \lambda)W_n\eta_n.$$
Therefore, we have
$$\sqrt n\big(\hat\beta - \beta\big) = \Big(\frac{1}{n}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda)\Big)^{-1}n^{-1/2}X_n^{*T}(\tilde\lambda)H_nA_rH_n^T\varepsilon_n - \Big(\frac{1}{n}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda)\Big)^{-1}(\tilde\lambda - \lambda)n^{-1/2}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TW_n\eta_n.$$
To show the asymptotic normality of $\hat\beta$, it suffices to show that
$$\frac{1}{n}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda) \xrightarrow{P} \bar Q, \tag{A1}$$
$$n^{-1/2}X_n^{*T}(\tilde\lambda)H_nA_rH_n^T\varepsilon_n \xrightarrow{D} N\big(0, \sigma^2\bar Q\big), \tag{A2}$$
$$(\tilde\lambda - \lambda)n^{-1/2}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TW_n\eta_n \xrightarrow{P} 0. \tag{A3}$$
First, we prove (A1). Write
$$\frac{1}{n}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda) = \frac{1}{n}\big(\tilde X_n - \tilde\lambda W_n\tilde X_n\big)^TH_nA_rH_n^T\big(\tilde X_n - \tilde\lambda W_n\tilde X_n\big) = \frac{1}{n}\big[(I_n - S)X_n - \tilde\lambda W_n(I_n - S)X_n\big]^TH_nA_rH_n^T\big[(I_n - S)X_n - \tilde\lambda W_n(I_n - S)X_n\big].$$
When $A_r = \big(\frac{1}{n}H_n^TH_n\big)^{-1}$, we get
$$\frac{1}{n}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda) = \Big[\frac{1}{n}H_n^T(I_n - S)X_n - \frac{\tilde\lambda}{n}H_n^TW_n(I_n - S)X_n\Big]^T\Big(\frac{1}{n}H_n^TH_n\Big)^{-1}\Big[\frac{1}{n}H_n^T(I_n - S)X_n - \frac{\tilde\lambda}{n}H_n^TW_n(I_n - S)X_n\Big].$$
It follows from Assumption 4 and Theorem 2 that
$$\frac{1}{n}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda) = \bar Q + o_P(1),$$
where $\bar Q = \big(Q_{HX} - \lambda_0Q_{HWX}\big)^TQ_{HH}^{-1}\big(Q_{HX} - \lambda_0Q_{HWX}\big)$.
Next, we prove (A2). Write
$$n^{-1/2}X_n^{*T}(\tilde\lambda)H_nA_rH_n^T\varepsilon_n = n^{-1/2}\big(\tilde X_n - \tilde\lambda W_n\tilde X_n\big)^TH_nA_rH_n^T\varepsilon_n = \Big[\frac{1}{n}H_n^T(I_n - S)X_n - \frac{\tilde\lambda}{n}H_n^TW_n(I_n - S)X_n\Big]^T\Big(\frac{1}{n}H_n^TH_n\Big)^{-1}\big(n^{-1/2}H_n^T\varepsilon_n\big).$$
By Assumptions 2 and 4, we have $n^{-1/2}H_n^T\varepsilon_n \xrightarrow{D} N(0, \sigma^2Q_{HH})$. It follows from the proof of (A1) that
$$n^{-1/2}X_n^{*T}(\tilde\lambda)H_nA_rH_n^T\varepsilon_n \xrightarrow{D} N\big(0, \sigma^2\bar Q\big).$$
Finally, we prove (A3). Write
$$(\tilde\lambda - \lambda)n^{-1/2}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TW_n\eta_n = (\tilde\lambda - \lambda)n^{-1/2}\big[(I_n - S)X_n - \tilde\lambda W_n(I_n - S)X_n\big]^TH_nA_rH_n^TW_n\eta_n = (\tilde\lambda - \lambda)\Big[\frac{1}{n}H_n^T(I_n - S)X_n - \frac{\tilde\lambda}{n}H_n^TW_n(I_n - S)X_n\Big]^T\Big(\frac{1}{n}H_n^TH_n\Big)^{-1}n^{-1/2}H_n^TW_n\eta_n.$$
It follows from Assumptions 1–2 that
$$E\big(n^{-1/2}H_n^TW_n\eta_n\big) = 0,$$
$$E\big(n^{-1}H_n^TW_n\eta_n\eta_n^TW_n^TH_n\big) = \frac{\sigma^2}{n}H_n^TW_n(I_n - \lambda W_n)^{-1}(I_n - \lambda W_n^T)^{-1}W_n^TH_n.$$
By Assumptions 1–2, it is easy to see that
$$H_n^TW_n(I_n - \lambda W_n)^{-1}(I_n - \lambda W_n^T)^{-1}W_n^TH_n$$
is bounded. Thus,
$$n^{-1}H_n^TW_n\eta_n\eta_n^TW_n^TH_n = O_P(1),$$
and hence $n^{-1/2}H_n^TW_n\eta_n = O_P(1)$. Combining this with the proof of (A1) and the consistency of $\tilde\lambda$, we can show that
$$(\tilde\lambda - \lambda)n^{-1/2}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TW_n\eta_n \xrightarrow{P} 0.$$
Using results (A1)–(A3), we obtain
$$\sqrt n\big(\hat\beta - \beta\big) \xrightarrow{D} N\big(0, \sigma^2\bar Q^{-1}\big).$$
 □
Proof of Theorem 4.
It follows from
$$\hat m(z) = s_z\big(Y_n - X_n\hat\beta\big) = s_z\big(X_n\beta + m + \eta_n - X_n\hat\beta\big) = s_zm + s_z\eta_n + s_zX_n\big(\beta - \hat\beta\big)$$
that
$$\sqrt{nh}\big(\hat m(z) - m(z)\big) = \sqrt{nh}\big(s_zm - m(z)\big) + \sqrt{nh}\,s_z\eta_n + \sqrt{nh}\,s_zX_n\big(\beta - \hat\beta\big).$$
Using the result in the proof of Theorem 1, we can show that
$$\sqrt{nh}\big(s_zm - m(z)\big) = \sqrt{nh}\Big(\frac{1}{2}h^2\mu_2m''(z) + o_P(h^2)\Big) = O_P\big(n^{1/2}h^{5/2}\big) = o_P(1).$$
It follows from $\hat\beta - \beta = O_P(n^{-1/2})$ that $\hat\beta = \beta + O_P(n^{-1/2})$. By some tedious calculations, Assumption 2 and Lemma A1, we obtain
$$\sqrt{nh}\,s_zX_n\big(\beta - \hat\beta\big) = o_P(1).$$
On the other hand, we have
$$\sqrt{nh}\,s_z\eta_n = \sqrt{nh}\,s_z(I_n - \lambda W_n)^{-1}\varepsilon_n = e_1^T\Big(\frac{1}{n}Z^TKZ\Big)^{-1}\sqrt{h/n}\,Z^TK(I_n - \lambda W_n)^{-1}\varepsilon_n.$$
Let $e_n = \sqrt{h/n}\,Z^TK(I_n - \lambda W_n)^{-1}\varepsilon_n$; we now show that $e_n \xrightarrow{D} N(0, \Gamma)$, where $\Gamma = \lim_{n\to\infty}\frac{h\sigma^2}{n}Z^TK(I_n - \lambda W_n)^{-1}(I_n - \lambda W_n^T)^{-1}KZ$. By the Cramér–Wold device, it suffices to show that $c_1^Te_n \xrightarrow{D} N(0, c_1^T\Gamma c_1)$, where $c_1$ is an arbitrary $2 \times 1$ vector with $\|c_1\| = 1$. Clearly, $E(c_1^Te_n) = 0$. Let $s_1^2 = E(c_1^Te_n)^2$ and $\tilde e_n = c_1^Te_n/s_1$; then $E(\tilde e_n) = 0$ and $E(\tilde e_n^2) = 1$. Denote $(I_n - \lambda W_n)^{-1} = T_n = (t_{n,ij})_{n\times n}$ and write
$$\tilde e_n = \frac{c_1^Te_n}{s_1} = \sqrt{h/n}\,\frac{c_1^TZ^TK(I_n - \lambda W_n)^{-1}\varepsilon_n}{s_1} = \sqrt{h/n}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{2}c_{1,k}z_{n,ki}k_h(z_{n,i} - z)t_{n,ij}\varepsilon_{n,j}/s_1 = \sum_{j=1}^{n}\tilde\varepsilon_{n,j},$$
where $\tilde\varepsilon_{n,j} = \sqrt{h/n}\sum_{i=1}^{n}\sum_{k=1}^{2}c_{1,k}z_{n,ki}k_h(z_{n,i} - z)t_{n,ij}\varepsilon_{n,j}/s_1$ and $z_{n,ki}$ denotes the $(i,k)$th element of $Z$. To show $\tilde e_n \xrightarrow{D} N(0, 1)$, it is sufficient to show $\sum_{j=1}^{n}E|\tilde\varepsilon_{n,j}|^{2+\delta} = o(1)$ for arbitrarily small $\delta > 0$ (Davidson [29], Theorems 23.6 and 23.11). According to Assumptions 1–2, we have
$$\begin{aligned}\sum_{j=1}^{n}E|\tilde\varepsilon_{n,j}|^{2+\delta} &= \frac{1}{s_1^{2+\delta}}\frac{h^{(2+\delta)/2}}{n^{(2+\delta)/2}}\sum_{j=1}^{n}\Big|\sum_{i=1}^{n}\sum_{k=1}^{2}c_{1,k}z_{n,ki}k_h(z_{n,i} - z)t_{n,ij}\Big|^{2+\delta}E|\varepsilon_{n,j}|^{2+\delta} = \frac{c_\delta}{s_1^{2+\delta}}\frac{h^{(2+\delta)/2}}{n^{(2+\delta)/2}}\sum_{j=1}^{n}\Big|\sum_{i=1}^{n}\sum_{k=1}^{2}c_{1,k}z_{n,ki}k_h(z_{n,i} - z)t_{n,ij}\Big|^{2+\delta}\\ &\le \frac{c_\delta}{s_1^{2+\delta}}\frac{h^{(2+\delta)/2}}{n^{(2+\delta)/2}}\sum_{j=1}^{n}\Big|\sum_{k=1}^{2}c_{1,k}z_{n,ki}k_h(z_{n,i} - z)\Big|^{2+\delta}\sum_{i=1}^{n}|t_{n,ij}|^{2+\delta} \le \frac{c_\delta c_t^{2+\delta}}{s_1^{2+\delta}}\frac{h^{(2+\delta)/2}}{n^{(2+\delta)/2}}\sum_{j=1}^{n}\Big|\sum_{k=1}^{2}c_{1,k}z_{n,ki}k_h(z_{n,i} - z)\Big|^{2+\delta} = O\big((nh)^{-\delta/2}\big) = o(1),\end{aligned}$$
where $c_t = \sup_{1\le j\le n}\sum_{i=1}^{n}|t_{n,ij}|$. Moreover,
$$s_1^2 = E\big(c_1^Te_n\big)^2 = \frac{h\sigma^2}{n}c_1^TZ^TK(I_n - \lambda W_n)^{-1}(I_n - \lambda W_n^T)^{-1}KZc_1 \to c_1^T\Gamma c_1.$$
By the above results and Lemma A2, we obtain
$$\sqrt{nh}\,s_z\eta_n \xrightarrow{D} N\big(0, f^{-2}(z)\Gamma_{11}\big),$$
where $\Gamma_{11}$ is the $(1,1)$th element of $\Gamma$. Combining the above expressions, we conclude that
$$\sqrt{nh}\big(\hat m(z) - m(z)\big) \xrightarrow{D} N\big(0, f^{-2}(z)\Gamma_{11}\big).$$
 □
Proof of Theorem 5.
Noting that
$$\begin{aligned}\tilde\beta &= \big(\bar X_n^TH_nA_rH_n^T\bar X_n\big)^{-1}\bar X_n^TH_nA_rH_n^T\bar Y_n = \big(\bar X_n^TH_nA_rH_n^T\bar X_n\big)^{-1}\bar X_n^TH_nA_rH_n^T(I_n - F_n)Y_n\\ &= \big(\bar X_n^TH_nA_rH_n^T\bar X_n\big)^{-1}\bar X_n^TH_nA_rH_n^T(I_n - F_n)\big[X_n\beta + m_{+} + (I_n - \lambda W_n)^{-1}\varepsilon_n\big]\\ &= \beta + \big(\bar X_n^TH_nA_rH_n^T\bar X_n\big)^{-1}\bar X_n^TH_nA_rH_n^T(I_n - F_n)m_{+} + \big(\bar X_n^TH_nA_rH_n^T\bar X_n\big)^{-1}\bar X_n^TH_nA_rH_n^T(I_n - F_n)(I_n - \lambda W_n)^{-1}\varepsilon_n.\end{aligned}$$
Similar to the proof of Theorem 1, we have
$$S_jm_j = m_j + \frac{1}{2}Q_j + o(h_j^2),$$
where $Q_j = \big(s_{j,z_{n,1j}}Q_{m_j}(z_{n,1j}), \ldots, s_{j,z_{n,nj}}Q_{m_j}(z_{n,nj})\big)^T$, $Q_{m_j}(z_j) = \big((z_{n,1j} - z_j)^2, \ldots, (z_{n,nj} - z_j)^2\big)^Tm_j''(z_j)$ and $D^2m_j = \big(m_j''(z_{n,1j}), \ldots, m_j''(z_{n,nj})\big)^T$. According to the definition of $S_j^{*}$, we get
$$\big(I_n - S_j^{*}\big)m_j = \Big[I_n - \Big(I_n - \frac{\mathbf{1}\mathbf{1}^T}{n}\Big)S_j\Big]m_j = m_j - \Big(I_n - \frac{\mathbf{1}\mathbf{1}^T}{n}\Big)S_jm_j = m_j - \Big(I_n - \frac{\mathbf{1}\mathbf{1}^T}{n}\Big)\Big[m_j + \frac{1}{2}Q_j + o(h_j^2)\Big] = \bar m_j - \frac{1}{2}Q_j^{*} + o(h_j^2),$$
where $\bar m_j = \frac{\mathbf{1}\mathbf{1}^T}{n}m_j$ and $Q_j^{*} = \big(I_n - \frac{\mathbf{1}\mathbf{1}^T}{n}\big)Q_j$. Combining this with Lemma A4, it is easy to see that
$$\big(I_n - F_j\big)m_j = \big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}\big(I_n - S_j^{*}\big)m_j = \big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}\Big[\bar m_j - \frac{1}{2}Q_j^{*} + o(h_j^2)\Big] = \bar m_j - \frac{1}{2}\big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}Q_j^{*} + o(h_j^2).$$
Denote $m_{(j)} = m_1 + \cdots + m_{j-1} + m_{j+1} + \cdots + m_d$, so that $m_{+} = m_j + m_{(j)}$. Furthermore, we get
$$\big(I_n - F_j\big)m_{(j)} = \big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}\big(I_n - S_j^{*}\big)m_{(j)} = \big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}\big(I_n - S_j^{*}F_n^{[j]} + S_j^{*}F_n^{[j]} - S_j^{*}\big)m_{(j)} = m_{(j)} + \big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}S_j^{*}B_j,$$
where $B_j = \big(F_n^{[j]} - I_n\big)m_{(j)}$. Therefore, we obtain
$$F_jm_{+} = F_jm_j + F_jm_{(j)} = m_j - \bar m_j + \big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}\Big(\frac{1}{2}Q_j^{*} - S_j^{*}B_j\Big) + o(h_j^2).$$
By Theorem 2 in Opsomer and Ruppert [2] and Lemma 5 in Fan and Jiang [30], we can show that
$$\big(I_n - F_n\big)m_{+} = m_{+} - \sum_{j=1}^{d}F_jm_{+} = m_{+} - \sum_{j=1}^{d}\Big[m_j - \bar m_j + \big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}\Big(\frac{1}{2}Q_j^{*} - S_j^{*}B_j\Big) + o(h_j^2)\Big] = \sum_{j=1}^{d}\bar m_j + O\Big(\sum_{j=1}^{d}h_j^2\Big).$$
According to the proof of Theorem 4.1 in Opsomer and Ruppert [25], we have $\sum_{j=1}^{d}\bar m_j = O_P(n^{-1/2})$. Combining this with Assumption 4, we obtain
$$\frac{1}{n}H_n^T\big(I_n - F_n\big)m_{+} = \frac{1}{n}H_n^T\Big[\sum_{j=1}^{d}\bar m_j + O\Big(\sum_{j=1}^{d}h_j^2\Big)\Big] = O\Big(\sum_{j=1}^{d}h_j^2 + n^{-3/2}\Big).$$
Let $D = H_n^T\big(I_n - F_n\big)\big(I_n - \lambda W_n\big)^{-1}$; it follows from Assumptions 1–2 and Lemma A4 that
$$E\big[H_n^T(I_n - F_n)(I_n - \lambda W_n)^{-1}\varepsilon_n\big] = 0, \qquad \mathrm{Var}\big[H_n^T(I_n - F_n)(I_n - \lambda W_n)^{-1}\varepsilon_n\big] = \sigma^2DD^T.$$
By Assumption 6 and the proof of Theorem 1, we get
$$E\big(\tilde\beta - \beta\big) = \big(R_{HX}^TAR_{HX}\big)^{-1}R_{HX}^TA\,O_P\Big(\sum_{j=1}^{d}h_j^2 + n^{-3/2}\Big) = O_P\Big(\sum_{j=1}^{d}h_j^2 + n^{-3/2}\Big),$$
$$\mathrm{Var}\big(\tilde\beta - \beta\big) = \frac{\sigma^2}{n}\big(R_{HX}^TAR_{HX}\big)^{-1}R_{HX}^TA\Sigma_2AR_{HX}\big(R_{HX}^TAR_{HX}\big)^{-1}.$$
Therefore, we have
$$\sqrt n\big(\tilde\beta - \beta\big) \xrightarrow{D} N(0, \Omega_2),$$
where $\Omega_2 = \sigma^2\big(R_{HX}^TAR_{HX}\big)^{-1}R_{HX}^TA\Sigma_2AR_{HX}\big(R_{HX}^TAR_{HX}\big)^{-1}$. □
Proof of Theorem 6.
Similar to the proof of Theorem 2, it is not difficult to show that analogous statements hold for this theorem; we only need to prove $\tilde\eta_n \xrightarrow{P} \eta_n$. Recall that
$$\tilde\eta_n = \bar Y_n - \bar X_n\tilde\beta = (I_n - F_n)Y_n - (I_n - F_n)X_n\tilde\beta = (I_n - F_n)(X_n\beta + m_{+} + \eta_n) - (I_n - F_n)X_n\tilde\beta = (I_n - F_n)X_n(\beta - \tilde\beta) + (I_n - F_n)m_{+} + (I_n - F_n)\eta_n,$$
so we get
$$\tilde\eta_n - \eta_n = (I_n - F_n)X_n(\beta - \tilde\beta) + (I_n - F_n)m_{+} - F_n\eta_n.$$
By Theorem 5 and the proof of Theorem 4.1 in Opsomer and Ruppert [25], we obtain
$$(I_n - F_n)m_{+} = \sum_{j=1}^{d}\bar m_j + O\Big(\sum_{j=1}^{d}h_j^2\Big) = O_P\big(n^{-1/2}\big) + O\Big(\sum_{j=1}^{d}h_j^2\Big) = O_P\Big(n^{-1/2} + \sum_{j=1}^{d}h_j^2\Big).$$
It follows from Assumption 2, Theorem 5 and Lemma A4 that
$$(I_n - F_n)X_n(\beta - \tilde\beta) = o_P(1).$$
By Assumption 1 and Lemma A4, we have
$$E\big[(I_n - F_n)\eta_n\big] = E\big[(I_n - F_n)(I_n - \lambda W_n)^{-1}\varepsilon_n\big] = 0, \qquad \mathrm{Var}\big[(I_n - F_n)\eta_n\big] = \sigma^2(I_n - F_n)(I_n - \lambda W_n)^{-1}(I_n - \lambda W_n^T)^{-1}(I_n - F_n)^T,$$
and $\mathrm{Var}\big[(I_n - F_n)\eta_n\big]$ is bounded. It follows from the Chebyshev law of large numbers that $(I_n - F_n)\eta_n \xrightarrow{P} 0$. Therefore, we obtain
$$\tilde\eta_n - \eta_n \xrightarrow{P} 0.$$
 □
Proof of Theorem 7.
The reasoning for this theorem is completely analogous to that of Theorem 3. Note that
$$\hat\beta = \big(X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda)\big)^{-1}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TY_n^{*}(\tilde\lambda) = \beta + \big(X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda)\big)^{-1}X_n^{*T}(\tilde\lambda)H_nA_rH_n^T\zeta_n^{*}(\tilde\lambda),$$
where
$$\zeta_n^{*}(\tilde\lambda) = Y_n^{*}(\tilde\lambda) - X_n^{*}(\tilde\lambda)\beta = \bar Y_n - \tilde\lambda W_n\bar Y_n - \big(\bar X_n - \tilde\lambda W_n\bar X_n\big)\beta = \eta_n - \tilde\lambda W_n\eta_n = \varepsilon_n - (\tilde\lambda - \lambda)W_n\eta_n.$$
Thus
$$\sqrt n\big(\hat\beta - \beta\big) = \Big(\frac{1}{n}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda)\Big)^{-1}n^{-1/2}X_n^{*T}(\tilde\lambda)H_nA_rH_n^T\varepsilon_n + \Big(\frac{1}{n}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda)\Big)^{-1}(\lambda - \tilde\lambda)n^{-1/2}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TW_n\eta_n.$$
If $A_r = \big(\frac{1}{n}H_n^TH_n\big)^{-1}$, then
$$\begin{aligned}\frac{1}{n}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda) &= \frac{1}{n}\big(\bar X_n - \tilde\lambda W_n\bar X_n\big)^TH_nA_rH_n^T\big(\bar X_n - \tilde\lambda W_n\bar X_n\big) = \frac{1}{n}\big[(I_n - F_n)X_n - \tilde\lambda W_n(I_n - F_n)X_n\big]^TH_nA_rH_n^T\big[(I_n - F_n)X_n - \tilde\lambda W_n(I_n - F_n)X_n\big]\\ &= \Big[\frac{1}{n}H_n^T(I_n - F_n)X_n - \frac{\tilde\lambda}{n}H_n^TW_n(I_n - F_n)X_n\Big]^T\Big(\frac{1}{n}H_n^TH_n\Big)^{-1}\Big[\frac{1}{n}H_n^T(I_n - F_n)X_n - \frac{\tilde\lambda}{n}H_n^TW_n(I_n - F_n)X_n\Big].\end{aligned}$$
It follows from Assumption 6 and Theorem 6 that
$$\frac{1}{n}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TX_n^{*}(\tilde\lambda) = \bar R + o_P(1), \tag{A9}$$
where $\bar R = \big(R_{HX} - \lambda_0R_{HWX}\big)^TQ_{HH}^{-1}\big(R_{HX} - \lambda_0R_{HWX}\big)$.
Using Assumptions 2 and 6, it is easy to show that
$$n^{-1/2}H_n^T\varepsilon_n \xrightarrow{D} N\big(0, \sigma^2Q_{HH}\big).$$
Combining this with (A9), we can obtain
$$n^{-1/2}X_n^{*T}(\tilde\lambda)H_nA_rH_n^T\varepsilon_n = n^{-1/2}\big(\bar X_n - \tilde\lambda W_n\bar X_n\big)^TH_nA_rH_n^T\varepsilon_n = \Big[\frac{1}{n}H_n^T(I_n - F_n)X_n - \frac{\tilde\lambda}{n}H_n^TW_n(I_n - F_n)X_n\Big]^T\Big(\frac{1}{n}H_n^TH_n\Big)^{-1}\big(n^{-1/2}H_n^T\varepsilon_n\big) \xrightarrow{D} N\big(0, \sigma^2\bar R\big). \tag{A10}$$
By the proof of Theorem 3, we have
$$E\big(n^{-1/2}H_n^TW_n\eta_n\big) = 0,$$
$$E\big(n^{-1}H_n^TW_n\eta_n\eta_n^TW_n^TH_n\big) = \frac{\sigma^2}{n}H_n^TW_n(I_n - \lambda W_n)^{-1}(I_n - \lambda W_n^T)^{-1}W_n^TH_n.$$
It follows from Assumptions 1–2 that $H_n^TW_n(I_n - \lambda W_n)^{-1}(I_n - \lambda W_n^T)^{-1}W_n^TH_n$ is bounded. Then
$$n^{-1}H_n^TW_n\eta_n\eta_n^TW_n^TH_n = O(1).$$
Thus
$$n^{-1/2}H_n^TW_n\eta_n = O_P(1).$$
Therefore, we get
$$(\tilde\lambda - \lambda)n^{-1/2}X_n^{*T}(\tilde\lambda)H_nA_rH_n^TW_n\eta_n = (\tilde\lambda - \lambda)n^{-1/2}\big[(I_n - F_n)X_n - \tilde\lambda W_n(I_n - F_n)X_n\big]^TH_nA_rH_n^TW_n\eta_n = (\tilde\lambda - \lambda)\Big[\frac{1}{n}H_n^T(I_n - F_n)X_n - \frac{\tilde\lambda}{n}H_n^TW_n(I_n - F_n)X_n\Big]^T\Big(\frac{1}{n}H_n^TH_n\Big)^{-1}n^{-1/2}H_n^TW_n\eta_n \xrightarrow{P} 0. \tag{A11}$$
According to (A9)–(A11), we obtain
$$\sqrt n\big(\hat\beta - \beta\big) \xrightarrow{D} N\big(0, \sigma^2\bar R^{-1}\big).$$
 □
Proof of Theorem 8.
Recall that
$$\hat m_j = F_j\big(Y_n - X_n\hat\beta\big) = F_j\big(X_n\beta + m_{+} + \eta_n - X_n\hat\beta\big) = F_jX_n\big(\beta - \hat\beta\big) + F_jm_{+} + F_j\eta_n.$$
It follows from the proof of Theorem 7 that $\hat\beta - \beta = O_P(n^{-1/2})$. Combining Assumption 2 and Lemma A4, we obtain
$$F_jX_n\big(\beta - \hat\beta\big) = F_jX_n\,O_P\big(n^{-1/2}\big) = O_P\big(n^{-1/2}\big).$$
According to the proof of Theorem 5,
$$F_jm_{+} = m_j - \bar m_j + \big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}\Big(\frac{1}{2}Q_j^{*} - S_j^{*}B_j\Big) + o_P(h_j^2).$$
Using Assumption 1 and Lemma A4, we get
$$E\big(F_j\eta_n\big) = E\big[F_j(I_n - \lambda W_n)^{-1}\varepsilon_n\big] = 0, \qquad \mathrm{Var}\big(F_j\eta_n\big) = \mathrm{Var}\big[F_j(I_n - \lambda W_n)^{-1}\varepsilon_n\big] = \sigma^2F_j(I_n - \lambda W_n)^{-1}(I_n - \lambda W_n^T)^{-1}F_j^T.$$
By the above arguments, we have
$$E\big(\hat m_j - m_j\big) = -\bar m_j + \big(I_n - S_j^{*}F_n^{[j]}\big)^{-1}\Big(\frac{1}{2}Q_j^{*} - S_j^{*}B_j\Big) + o_P\big(h_j^2\big).$$
Combining this with the proof of Theorem 4.1 in Opsomer and Ruppert [25], we obtain
$$E\big(\hat m_j - m_j\big) = \frac{1}{2}h_j^2\mu_{2j}\big(m_j'' - E(m_j'')\big) - S_j^{*}B_j + O_P\big(n^{-1/2}\big) + o_P\big(h_j^2\big), \qquad \mathrm{Var}\big(\hat m_j - m_j\big) = \sigma^2F_j(I_n - \lambda W_n)^{-1}(I_n - \lambda W_n^T)^{-1}F_j^T.$$
 □

References

  1. Fan, J.; Gijbels, I. Local Polynomial Modelling and Its Applications; Chapman & Hall: London, UK, 1996; pp. 19–56. [Google Scholar]
  2. Opsomer, J.D.; Ruppert, D. A root-n consistent backfitting estimator for semiparametric additive modeling. J. Comput. Graph. Stat. 1999, 8, 715–732. [Google Scholar] [CrossRef]
  3. Manzan, S.; Zerom, D. Kernel estimation of a partially linear additive model. Stat. Probabil. Lett. 2005, 72, 313–332. [Google Scholar] [CrossRef]
  4. Zhou, Z.; Jiang, R.; Qiao, W. Variable selection for additive partially linear models with measurement error. Metrika 2011, 74, 185–202. [Google Scholar] [CrossRef]
  5. Wei, C.; Luo, Y.; Wu, X. Empirical likelihood for partially linear additive errors–in–variables models. Stat. Pap. 2012, 53, 485–496. [Google Scholar] [CrossRef]
  6. Hoshino, T. Quantile regression estimation of partially linear additive models. J. Nonparametr. Stat. 2014, 26, 509–536. [Google Scholar] [CrossRef]
  7. Lou, Y.; Bien, J.; Caruana, R.; Gehrke, J. Sparse partially linear additive models. J. Comput. Graph. Stat. 2016, 25, 1126–1140. [Google Scholar] [CrossRef] [Green Version]
  8. Liu, R.; Härdle, W.K.; Zhang, G. Statistical inference for generalized additive partially linear models. J. Multivar. Anal. 2017, 162, 1–15. [Google Scholar] [CrossRef]
  9. Manghi, R.F.; Cysneiros, F.J.A.; Paula, G.A. Generalized additive partial linear models for analyzing correlated data. Comput. Stat. Data. An. 2019, 129, 47–60. [Google Scholar] [CrossRef]
  10. Li, T.; Mei, C. Statistical inference on the parametric component in partially linear spatial autoregressive models. Commun. Stat. Simul. Comput. 2016, 45, 1991–2006. [Google Scholar] [CrossRef]
  11. Anselin, L. Spatial Econometrics: Methods and Models; Kluwer Academic Publisher: Dordrecht, The Netherlands, 1988; pp. 16–28. [Google Scholar]
  12. Kelejian, H.H.; Prucha, I.R. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J. Real. Estate. Financ. Econ. 1998, 17, 99–121. [Google Scholar] [CrossRef]
  13. Elhorst, J.P. Spatial Econometrics from Cross-Sectional Data to Spatial Panels; Springer: New York, NY, USA, 2014; pp. 5–23. [Google Scholar]
  14. Su, L.; Jin, S. Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J. Econom. 2010, 157, 18–33. [Google Scholar] [CrossRef]
  15. Su, L. Semiparametric GMM estimation of spatial autoregressive models. J. Econom. 2012, 167, 543–560. [Google Scholar] [CrossRef]
  16. Sun, Y. Estimation of single-index model with spatial interaction. Reg. Sci. Urban. Econ. 2017, 62, 36–45. [Google Scholar] [CrossRef] [Green Version]
  17. Cheng, S.; Chen, J.; Liu, X. GMM estimation of partially linear single-index spatial autoregressive model. Spat. Stat. 2019, 31, 100354. [Google Scholar] [CrossRef]
  18. Wei, H.; Sun, Y. Heteroskedasticity-robust semi-parametric GMM estimation of a spatial model with space-varying coefficients. Spat. Econ. Anal. 2017, 12, 113–128. [Google Scholar] [CrossRef]
  19. Dai, X.; Li, S.; Tian, M. Quantile Regression for Partially Linear Varying Coefficient Spatial Autoregressive Models. Available online: https://arxiv.org/pdf/1608.01739.pdf (accessed on 5 August 2016).
  20. Du, J.; Sun, X.; Cao, R.; Zhang, Z. Statistical inference for partially linear additive spatial autoregressive models. Spat. Stat. 2018, 25, 52–67. [Google Scholar] [CrossRef]
  21. Lin, X.; Carroll, R.J. Nonparametric function estimation for clustered data when the predictor is measured without/with error. J. Am. Stat. Assoc. 2000, 95, 520–534. [Google Scholar] [CrossRef]
  22. Buja, A.; Hastie, T.; Tibshirani, R. Linear smoothers and additive models. Ann. Stat. 1989, 17, 453–510. [Google Scholar] [CrossRef]
  23. Härdle, W.; Hall, P. On the backfitting algorithm for additive regression models. Stat. Neerl. 1993, 47, 43–57. [Google Scholar] [CrossRef]
  24. Hastie, T.J.; Tibshirani, R.J. Generalized Additive Models; Chapman & Hall: London, UK, 1990; pp. 136–167. [Google Scholar]
  25. Opsomer, J.D.; Ruppert, D. Fitting a bivariate additive model by local polynomial regression. Ann. Stat. 1997, 25, 186–211. [Google Scholar] [CrossRef]
  26. Opsomer, J.D. Asymptotic properties of backfitting estimators. J. Multivar. Anal. 2000, 73, 166–179. [Google Scholar] [CrossRef] [Green Version]
  27. Fan, J.; Wu, Y. Semiparametric estimation of covariance matrices for longitudinal data. J. Am. Stat. Assoc. 2008, 103, 1520–1533. [Google Scholar] [CrossRef] [Green Version]
  28. Zhang, H.H.; Cheng, G.; Liu, Y. Linear or nonlinear? Automatic structure discovery for partially linear models. J. Am. Stat. Assoc. 2011, 106, 1099–1112. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  29. Davidson, J. Stochastic Limit Theory: An Introduction for Econometricians; Oxford University Press: Oxford, UK, 1994; pp. 369–373. [Google Scholar]
  30. Fan, J.; Jiang, J. Nonparametric inferences for additive models. J. Am. Stat. Assoc. 2005, 100, 890–907. [Google Scholar] [CrossRef]
Figure 1. The fitting results of m 1 in Model (18).
Figure 2. The fitting results of m 2 in Model (18).
Figure 3. The estimates of the nonparametric functions in Model (19).
Table 1. The results of parametric estimates in Model (18) with λ = 0.25.

Parameter | True Value | MEAN | SD | MSE | MEAN | SD | MSE
n = 49 | n = 64
λ | 0.2500 | 0.2838 | 0.0560 | 0.0043 | 0.2837 | 0.0201 | 0.0015
β1 | 1.0000 | 1.0067 | 0.1318 | 0.0174 | 1.0020 | 0.1057 | 0.0112
β2 | 1.5000 | 1.4828 | 0.1256 | 0.0161 | 1.4996 | 0.1071 | 0.0115
σ2 | 0.0100 | 0.0535 | 0.0678 | 0.0065 | 0.0335 | 0.0469 | 0.0028
m1 | - | 1.1696 | 0.7665 | - | 0.5390 | 0.4070 | -
m2 | - | 3.4316 | 0.4778 | - | 2.9212 | 0.3052 | -
n = 81 | n = 100
λ | 0.2500 | 0.2733 | 0.0172 | 0.0008 | 0.2730 | 0.0133 | 0.0007
β1 | 1.0000 | 0.9974 | 0.0923 | 0.0085 | 0.9934 | 0.0824 | 0.0068
β2 | 1.5000 | 1.4974 | 0.0905 | 0.0082 | 1.5025 | 0.0823 | 0.0068
σ2 | 0.0100 | 0.0181 | 0.0276 | 0.0008 | 0.0132 | 0.0204 | 0.0004
m1 | - | 0.5216 | 0.4050 | - | 0.3775 | 0.2905 | -
m2 | - | 2.7251 | 0.2396 | - | 2.3652 | 0.1914 | -
n = 225 | n = 400
λ | 0.2500 | 0.2618 | 0.0053 | 0.0002 | 0.2614 | 0.0033 | 0.0001
β1 | 1.0000 | 1.0013 | 0.0573 | 0.0033 | 0.9979 | 0.0373 | 0.0014
β2 | 1.5000 | 1.5009 | 0.0546 | 0.0030 | 1.5026 | 0.0409 | 0.0017
σ2 | 0.0100 | 0.0116 | 0.0049 | 2.66E-5 | 0.0113 | 0.0035 | 1.39E-5
m1 | - | 0.1673 | 0.1303 | - | 0.0858 | 0.0654 | -
m2 | - | 1.5636 | 0.0852 | - | 1.1394 | 0.0473 | -
Table 2. The results of parametric estimates in Model (18) with λ = 0.5.

Parameter | True Value | MEAN | SD | MSE | MEAN | SD | MSE
n = 49 | n = 64
λ | 0.5000 | 0.5438 | 0.0341 | 0.0031 | 0.5366 | 0.0209 | 0.0018
β1 | 1.0000 | 1.0019 | 0.1289 | 0.0166 | 0.9948 | 0.1147 | 0.0132
β2 | 1.5000 | 1.5015 | 0.1236 | 0.0153 | 1.4997 | 0.1106 | 0.0122
σ2 | 0.0100 | 0.0477 | 0.0652 | 0.0057 | 0.0319 | 0.0434 | 0.0024
m1 | - | 1.1397 | 0.7554 | - | 0.5453 | 0.4035 | -
m2 | - | 3.4167 | 0.4098 | - | 2.9159 | 0.3232 | -
n = 81 | n = 100
λ | 0.5000 | 0.5348 | 0.0208 | 0.0016 | 0.5295 | 0.0129 | 0.0010
β1 | 1.0000 | 0.9979 | 0.0919 | 0.0085 | 0.9999 | 0.0824 | 0.0068
β2 | 1.5000 | 1.5027 | 0.0901 | 0.0081 | 1.4992 | 0.0877 | 0.0077
σ2 | 0.0100 | 0.0200 | 0.0309 | 0.0011 | 0.0149 | 0.0228 | 0.0005
m1 | - | 0.5001 | 0.4017 | - | 0.3653 | 0.2852 | -
m2 | - | 2.7182 | 0.2304 | - | 2.3661 | 0.1948 | -
n = 225 | n = 400
λ | 0.5000 | 0.5184 | 0.0049 | 0.0004 | 0.5137 | 0.0028 | 0.0002
β1 | 1.0000 | 1.0013 | 0.0566 | 0.0032 | 1.0024 | 0.0414 | 0.0017
β2 | 1.5000 | 1.5005 | 0.0572 | 0.0033 | 1.4998 | 0.0408 | 0.0017
σ2 | 0.0100 | 0.0032 | 0.0043 | 0.0001 | 0.0031 | 0.0031 | 0.0001
m1 | - | 0.1689 | 0.1238 | - | 0.0872 | 0.0645 | -
m2 | - | 1.5603 | 0.0897 | - | 1.1407 | 0.0501 | -
Table 3. The results of parametric estimates in Model (18) with λ = 0.75.

Parameter | True Value | MEAN | SD | MSE | MEAN | SD | MSE
n = 49 | n = 64
λ | 0.7500 | 0.7838 | 0.0559 | 0.0043 | 0.7837 | 0.0194 | 0.0015
β1 | 1.0000 | 0.9926 | 0.1317 | 0.0174 | 0.9894 | 0.1093 | 0.0121
β2 | 1.5000 | 1.5051 | 0.1252 | 0.0157 | 1.4942 | 0.1070 | 0.0115
σ2 | 0.0100 | 0.0548 | 0.0735 | 0.0074 | 0.0343 | 0.0470 | 0.0028
m1 | - | 1.2188 | 0.8086 | - | 0.5739 | 0.4374 | -
m2 | - | 3.4220 | 0.4281 | - | 2.9343 | 0.2837 | -
n = 81 | n = 100
λ | 0.7500 | 0.7734 | 0.0187 | 0.0009 | 0.7730 | 0.0139 | 0.0007
β1 | 1.0000 | 0.9945 | 0.0940 | 0.0089 | 1.0052 | 0.0842 | 0.0071
β2 | 1.5000 | 1.4976 | 0.1030 | 0.0106 | 1.4965 | 0.0853 | 0.0073
σ2 | 0.0100 | 0.0226 | 0.0340 | 0.0013 | 0.0131 | 0.0201 | 0.0004
m1 | - | 0.5159 | 0.3918 | - | 0.4090 | 0.2959 | -
m2 | - | 2.7176 | 0.2388 | - | 2.3667 | 0.1983 | -
n = 225 | n = 400
λ | 0.7500 | 0.7618 | 0.0049 | 0.0002 | 0.7614 | 0.0031 | 0.0001
β1 | 1.0000 | 1.0010 | 0.0564 | 0.0032 | 0.9997 | 0.0390 | 0.0015
β2 | 1.5000 | 1.5030 | 0.0538 | 0.0029 | 1.5013 | 0.0402 | 0.0016
σ2 | 0.0100 | 0.0124 | 0.0046 | 2.69E-5 | 0.0103 | 0.0033 | 1.09E-5
m1 | - | 0.1671 | 0.1265 | - | 0.0818 | 0.0615 | -
m2 | - | 1.5586 | 0.0850 | - | 1.1395 | 0.0471 | -
Table 4. Estimation results of unknown parameters in Model (19).

Parameter | Estimate | SD | MSE | Lower Bound | Upper Bound
λ | 0.4689 | 0.0916 | 0.0085 | 0.3797 | 0.5216
β1 | 0.1625 | 0.0468 | 0.0023 | 0.0708 | 0.2105
β2 | −0.3455 | 0.0790 | 0.0065 | −0.4336 | −0.1831
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

