Profile Maximum Likelihood Estimation of Single-Index Spatial Dynamic Panel Data Model

Mengqi Zhang; Boping Tian

doi:10.3390/math11132947


Department of Mathematics, Harbin Institute of Technology, Harbin 150001, China

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(13), 2947; https://doi.org/10.3390/math11132947


Abstract

In this paper, the spatial dynamic panel data (SDPD) model is extended to the single-index spatial dynamic panel data (Si-SDPD) model by introducing a nonlinear connection function to reflect the interaction between explanatory variables. The Si-SDPD model retains the advantages of the parametric SDPD model in dealing with spatial and temporal interaction effects and spatio-temporal dependencies, and it also addresses the limitation of the parametric SDPD model that a preset functional form may lead to misspecification bias. It reduces the data dimension of non-parametric models and enhances the practicability and explanatory power of parametric models. Since the parts of the model to be estimated contain an unknown function, we propose a new estimation method, the profile maximum likelihood (PML) method, to solve the problem of incidental parameters in the estimation. Under the assumption that the spatial coefficients are known, we first estimate the unknown function by local polynomial estimation, which transforms the model into a parametric form. We then estimate the resulting dynamic panel parametric model via quasi-maximum likelihood (QML) estimation. We derive the asymptotic properties of the profile maximum likelihood estimators (PMLEs) and find that, under certain regularity conditions, both the parametric and non-parametric estimators are consistent. Monte Carlo results show that the PMLEs have good finite sample performance.

Keywords:

profile maximum likelihood; nonparametric estimation; spatial dynamic panel data; single-index panel model

MSC:

62F12; 62G20; 62H11

1. Introduction

Spatial panel data have long been an active research topic in econometrics. Scholars have increasingly applied spatial panel data models to social and economic issues such as the effective allocation of resources, the factors influencing economic growth, energy consumption, environmental pollution, and foreign direct investment. Compared with time series or cross-sectional data, panel data increase the degrees of freedom and reduce collinearity, thereby improving the accuracy of parameter estimation. Spatial panel data models can be divided into parametric and non-parametric models according to the assumptions made about how the explanatory variables enter the model. The parametric panel data model describes the relationship between dependent and independent variables simply and clearly. Refs. [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] and others have made outstanding contributions to parametric panel data modelling and its use in the inference of economic phenomena. However, in many practical problems the relationship between variables is complex, and it is often difficult to describe it accurately using the preset form of a parametric model. To address this problem, non-parametric panel data models have been widely studied. Because the functional form of a non-parametric model can be arbitrary, the model is highly adaptable. However, when a non-parametric model has many explanatory variables, the so-called “curse of dimensionality” often arises and reduces the reliability of the estimates. Therefore, new models have been proposed, and one effective approach is to build a single-index model.

The basic mathematical form of the single-index model is

Y = g (x^{'} β) + ε

(Ref. [16]), where

β

is the unknown parameter and

g (\cdot)

is the unknown connection function. Its advantages are that it achieves dimension reduction through the connection function, effectively avoids the “curse of dimensionality”, and, at the same time, better reflects the relationship between variables. In recent years, research on the various extended single-index models has focused on finding efficient estimators of

β

and

g (\cdot)

. For the single-index model of cross-section data, Ref. [17] proposed a semi-parametric maximum likelihood estimation method, while Ref. [18] proposed non-iterative methods for estimating

β

and

g (\cdot)

based on the parameter constraint assumption, which improves the operation speed. Ref. [19] proposed root-n-consistent estimators for the variance components and a local linear smoother for the connection function. Ref. [20] constructed a semi-parametric minimum average variance estimation method in a partially linear single-index panel data model with fixed effects, and demonstrated the asymptotic properties of the estimation. Compared with the cross-sectional data model, there are very few studies on the single-index model for panel data, and even fewer on the single-index spatial panel data model because of its greater estimation complexity.
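To make the dimension-reduction idea concrete, the following minimal sketch simulates data from the basic single-index form Y = g(x'β) + ε. The function name `single_index_sample`, the uniform design on [-1, 1], and the noise level are illustrative assumptions rather than choices taken from the cited references; the point is only that, however many regressors there are, the unknown function g is evaluated on the scalar index x'β.

```python
import math
import random

def single_index_sample(n, beta, g, noise_sd, rng):
    """Draw n observations from y = g(x'beta) + eps (illustrative design)."""
    d = len(beta)
    xs, us, ys = [], [], []
    for _ in range(n):
        # Assumed design: each regressor is uniform on [-1, 1].
        x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        # The d-dimensional regressor enters only through the scalar index.
        u = sum(xj * bj for xj, bj in zip(x, beta))
        xs.append(x)
        us.append(u)
        ys.append(g(u) + rng.gauss(0.0, noise_sd))
    return xs, us, ys

# Illustrative parameter: unit Euclidean norm and positive first component.
beta = [math.sqrt(1.0 / 3.0), math.sqrt(2.0 / 3.0)]
g = lambda u: 2.0 * math.exp(-u * u)
xs, us, ys = single_index_sample(200, beta, g, 0.1, random.Random(0))
```

Any smoother applied to the pairs (u_i, y_i) is then a one-dimensional problem, which is what allows the single-index model to sidestep the curse of dimensionality.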

Following these research trends in spatial econometric models, we design a Si-SDPD model by introducing a nonlinear connection function to reflect the interaction between explanatory variables. The Si-SDPD model retains the advantages of the parametric SDPD model in dealing with spatial and temporal interaction effects and spatio-temporal dependencies, and it also addresses the limitation of the parametric SDPD model that a preset functional form may lead to misspecification bias. It reduces the data dimension of non-parametric models and enhances the practicability and explanatory power of parametric models. It is worth noting that, due to the presence of a spatial lag term and the multi-directionality of spatial correlation, the traditional estimation methods for the single-index model cannot be directly applied to the Si-SDPD model. To overcome this difficulty, we propose a new estimation method, namely PML, which first estimates the unknown function via local polynomial estimation under the assumption that the spatial coefficients are known, and then solves the dynamic panel parametric model by QML estimation. We then derive the asymptotic properties of the PMLEs of the Si-SDPD model and find that, under certain regularity conditions, both the parametric and non-parametric estimators are consistent. Finally, we examine some of their finite-sample properties through Monte Carlo experiments.

This paper is organized as follows. In Section 2, we introduce the Si-SDPD model and explain our PML estimation method. Using the law of large numbers and central limit theorem developed for our setting in Appendix A, Section 3 establishes the consistency of the parametric and nonparametric parts of the PMLE. Section 4 presents a Monte Carlo simulation to verify that the estimators have good finite sample performance. Section 5 concludes the paper. Some useful lemmas and results are provided in Appendix A.

2. The Model and Profile Maximum Likelihood Estimators

2.1. The Model

The model considered in this paper refers to the single-index spatial dynamic panel data (Si-SDPD) model:

Y_{n t} = ρ_{0} W_{n} Y_{n t} + γ_{0} Y_{n, t - 1} + τ_{0} W_{n} Y_{n, t - 1} + G_{n t} + V_{n t}, t = 1, 2, \dots, T

(1)

where

Y_{n t} = (y_{1 t}, y_{2 t}, \dots, y_{n t})'

and

V_{n t} = (v_{1 t}, v_{2 t}, \dots, v_{n t})'

are n × 1 column vectors;

v_{i t}

is independent and identically distributed across i and t with a zero mean and variance

σ_{0}^{2}

;

W_{n}

is the

n \times n

spatial weight matrix, which is predetermined and generates spatial dependence between cross-sectional units

y_{i t}

;

G_{n t} = (g (x_{1}' β), g (x_{2}' β), \dots, g (x_{n}' β))'

,

g (\cdot)

is an unknown univariate function,

||β|| = 1

; and the first component of

β

is positive; both restrictions are imposed for identifiability.

Define

Z_{n t} = (Y_{n, t - 1}, W_{n} Y_{n, t - 1})

,

A_{n} (ρ) = I - ρ W_{n}

,

θ = (β, δ', ρ, σ^{2})'

and

ζ = (β, δ', ρ)'

, where

δ = (γ, τ)'

. At the true value,

A_{n} = A_{n} (ρ_{0}) = I - ρ_{0} W_{n}

,

θ_{0} = (β_{0}, δ_{0}^{'}, ρ_{0}, σ_{0}^{2})'

,

ζ_{0} = (β_{0}, δ_{0}^{'}, ρ_{0})'

, where

δ_{0} = (γ_{0}, τ_{0})'

. Then, presuming

A_{n}

is invertible, (1) can be rewritten as

Y_{n t} = A_{n}^{- 1} (Z_{n t} δ_{0} + G_{n t} + V_{n t})

. The log-likelihood function of (1) is

\ln L_{n, T} (θ, G_{n t}) = - \frac{n T}{2} \ln (2 π σ^{2}) + T \ln |A_{n} (ρ)| - \frac{1}{2 σ^{2}} \sum_{t = 1}^{T} V_{n t}' (ζ) V_{n t} (ζ)

(2)

where

V_{n t} (ζ) = A_{n} (ρ) Y_{n t} - Z_{n t} δ - G_{n t}

and

V_{n t} (ζ) = (v_{1}, v_{2}, \dots, v_{n})'

. Thus,

V_{n t} = V_{n t} (ζ_{0})

.
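As an illustration of how (2) can be evaluated for given parameter values, the sketch below computes the three terms of the log-likelihood directly. It is a minimal pure-Python version under stated assumptions: the function names are hypothetical, `Y` is a list holding the initial vector Y_{n,0} followed by Y_{n,1}, ..., Y_{n,T}, and `G` holds one vector G_{nt} per period, supplied by the user since g is unknown in practice.

```python
import math

def log_abs_det(a):
    """log|det(a)| by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n = len(m)
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0.0:
            raise ValueError("matrix is singular")
        m[c], m[p] = m[p], m[c]
        total += math.log(abs(m[c][c]))
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return total

def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def log_likelihood(Y, W, G, rho, gamma, tau, sigma2):
    """Evaluate the log-likelihood (2) for given parameters and a given G."""
    n, T = len(W), len(Y) - 1
    # A_n(rho) = I - rho * W_n
    A = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)]
         for i in range(n)]
    ssr = 0.0
    for t in range(1, T + 1):
        AY = matvec(A, Y[t])
        WY_lag = matvec(W, Y[t - 1])
        for i in range(n):
            # i-th element of V_nt(zeta) = A_n(rho) Y_nt - Z_nt delta - G_nt
            v = AY[i] - gamma * Y[t - 1][i] - tau * WY_lag[i] - G[t - 1][i]
            ssr += v * v
    return (-0.5 * n * T * math.log(2.0 * math.pi * sigma2)
            + T * log_abs_det(A) - ssr / (2.0 * sigma2))
```

For large n, a sparse or eigenvalue-based evaluation of ln|A_n(ρ)| would normally replace the dense elimination used here.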

The QMLE

{\hat{θ}}_{n T}

is the extremum estimator derived from the maximization of (2). When the

V_{n t}

s are normally distributed,

{\hat{θ}}_{n T}

is the MLE; when the

V_{n t}

s are not normally distributed,

{\hat{θ}}_{n T}

is the QMLE.

2.2. The Profile Maximum Likelihood Estimation

For the likelihood function shown in (2), the parameter estimation method is not feasible because

g (x_{i}' β), i = 1, \dots, n

is unknown. In order to obtain a feasible estimate, we propose adopting the PML method. First, we consider the parameter

θ = (β, δ', ρ, σ^{2})'

as known, and then (1) becomes a general spatial nonparametric model. The initial estimate

{\hat{g}}_{I N} (x_{i}' β)

of

g (\cdot)

can be obtained using the kernel estimation. Obviously,

{\hat{g}}_{I N} (x_{i}' β)

is a function of the parameter

θ

. By replacing

g (\cdot)

in (2) with

{\hat{g}}_{I N} (x_{i}' β)

, we obtain the likelihood function with parameter

θ

. Then, by maximizing the likelihood function, we obtain the estimator

\hat{θ} = (\hat{β}, \hat{δ'}, \hat{ρ}, \hat{σ^{2}})'

of

θ

. Finally, the final estimate of

g (\cdot)

,

\hat{g} (x_{i}' β)

, is obtained by replacing

θ

in

{\hat{g}}_{I N} (x_{i}' β)

with

\hat{θ}

.

The specific steps of the profile maximum likelihood method are as follows.

Step 1: Considering $θ = (β, δ', ρ, σ^{2})'$ as known, we obtain ${\hat{g}}_{I N} (x_{i}' β)$ that is an initial estimate of $g (\cdot)$ using local polynomial estimation.

We denote

Y_{t} = A_{n} (ρ) Y_{n t} - Z_{n t} δ = G_{n t} + V_{n t} (ζ)

and

Y_{t} = (y_{1}, y_{2}, \dots, y_{n})'

, and (1) can be written as

y_{i} = g (x_{i}' β) + v_{i} = g (u_{i}) + v_{i}, i = 1, \dots, n

. At

u

, the p-order Taylor expansion of

g (u_{i})

is:

\begin{matrix} g (u_{i}) \approx g (u) + g' (u) (u_{i} - u) + \dots + \frac{1}{p!} g^{(p)} (u) {(u_{i} - u)}^{p} \\ ≜ λ_{0} + λ_{1} (u_{i} - u) + \dots + λ_{p} {(u_{i} - u)}^{p} \end{matrix}

(3)

Therefore, we can use the samples near

u

to perform weighted regression to estimate

g (u_{i})

and its higher-order derivatives, i.e., solving the following minimization problem:

\min_{\{λ_{0}, λ_{1}\}} \frac{1}{n} \sum_{i = 1}^{n} {\{y_{i} - λ_{0} - λ_{1} (u_{i} - u)\}}^{2} k_{h} (u_{i} - u)

(4)

where

k_{h} (u_{i} - u) = h^{- p} k (\frac{u_{i} - u}{h})

,

k (\frac{u_{i} - u}{h}), i = 1, \dots, n

is the multivariate kernel function and h is the bandwidth. To simplify the theoretical derivation, all variables in this paper share the same bandwidth; the conclusions remain valid when different bandwidths are used.

For the convenience of the following matrix operations, we denote

p = n - 1

. Then, we denote

M (u, β) = (\begin{matrix} 1 & \dots & 1 \\ \frac{u_{1} - u}{h} & \dots & \frac{u_{n} - u}{h} \end{matrix})', λ = (λ_{0}, λ_{1})'

and

κ (u, β) = d i a g (k (\frac{u_{1} - u}{h}), \dots, k (\frac{u_{n} - u}{h}))

, so

M (u, β)

is a

n \times 2

matrix. We can rewrite the objective function (4) as:

\underset{λ}{m i n} \frac{1}{n} (Y_{t} - M (u, β) λ)' κ (u, β) (Y_{t} - M (u, β) λ)

After minimizing, the estimator of

λ

is:

\hat{λ} = {(M (u, β)' κ (u, β) M (u, β))}^{- 1} M (u, β)' κ (u, β) Y_{t}

The first component of

\hat{λ}

is an estimate of

g (x_{i}' β)

. We denote

{(M (u, β)' κ (u, β) M (u, β))}^{- 1} M (u, β)' κ (u, β) = S (u, β)

,

s (u, β) = e_{1} S (u, β)

, and the initial estimate of

g (x_{i}' β)

is

{\hat{g}}_{I N} (u) = e_{1} \hat{λ} = e_{1} S (u, β) Y_{t} = s (u, β) Y_{t}

(5)

where

e_{1} = (1, 0, \dots, 0)

. Then, the initial estimate of

G_{n t}

is

\begin{matrix} {\hat{G}}_{I N} = ({\hat{g}}_{I N} (u_{1}), {\hat{g}}_{I N} (u_{2}), \dots, {\hat{g}}_{I N} (u_{n}))' \\ = (s (u_{1}) Y_{t}, s (u_{2}) Y_{t}, \dots, s (u_{n}) Y_{t})' \\ = (s (u_{1})', s (u_{2})', \dots, s (u_{n})')' Y_{t} \\ = S (β) Y_{t} \end{matrix}

(6)

where

S (β) = (s (x_{1}' β)', s (x_{2}' β)', \dots, s (x_{n}' β)')'

.
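Step 1 can be sketched in a few lines for the local linear case of (4). The code below is only an illustration under stated assumptions: it uses the Epanechnikov kernel as one example of a compactly supported even kernel, the function names are hypothetical, and the 2 x 2 matrix M'κM is inverted in closed form so that the first row of S(u, β) is returned directly. The inputs `us` play the role of the indices u_i = x_i'β for a given β.

```python
def epanechnikov(m):
    """Even, non-negative kernel supported on [-1, 1] (illustrative choice)."""
    return 0.75 * (1.0 - m * m) if abs(m) <= 1.0 else 0.0

def local_linear_weights(u, us, h, kernel=epanechnikov):
    """Return the row s(u, beta), so that g_hat(u) = sum_i s_i * y_i."""
    z = [(ui - u) / h for ui in us]
    k = [kernel(zi) for zi in z]
    s0 = sum(k)
    s1 = sum(ki * zi for ki, zi in zip(k, z))
    s2 = sum(ki * zi * zi for ki, zi in zip(k, z))
    det = s0 * s2 - s1 * s1          # determinant of M' kappa M
    if det <= 0.0:
        raise ValueError("too few distinct points within the bandwidth of u")
    # First row of (M' kappa M)^{-1} M' kappa, written out for the 2 x 2 case.
    return [ki * (s2 - s1 * zi) / det for ki, zi in zip(k, z)]

def smoother_matrix(us, h, kernel=epanechnikov):
    """Stack the rows s(u_1), ..., s(u_n) into the n x n matrix S(beta)."""
    return [local_linear_weights(u, us, h, kernel) for u in us]
```

A useful sanity check is that the weights in each row sum to one and reproduce any linear function of u exactly, which is a standard property of the local linear smoother.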

Step 2: By substituting

{\hat{G}}_{I N}

for

G_{n t}

in (2), we obtain the approximate log-likelihood function:

\begin{matrix} \ln L_{n, T} (θ) = - \frac{n T}{2} \ln (2 π σ^{2}) + T \ln |A_{n} (ρ)| - \frac{1}{2 σ^{2}} \sum_{t = 1}^{T} V_{n t}' (ζ) V_{n t} (ζ) \\ = - \frac{n T}{2} \ln (2 π σ^{2}) + T \ln |A_{n} (ρ)| - \frac{1}{2 σ^{2}} \sum_{t = 1}^{T} (Y_{t} - {\hat{G}}_{I N})' (Y_{t} - {\hat{G}}_{I N}) \\ = - \frac{n T}{2} \ln (2 π σ^{2}) + T \ln |A_{n} (ρ)| - \frac{1}{2 σ^{2}} \sum_{t = 1}^{T} [(I_{n} - S (β)) Y_{t}]' [(I_{n} - S (β)) Y_{t}] \end{matrix}

(7)

The

{\hat{θ}}_{n T}

that can maximize the above formula is the estimate of

θ

, i.e.,

{\hat{θ}}_{n T} = \arg \max_{θ} \frac{1}{n T} \ln L_{n, T} (θ)

Computationally and analytically, it is convenient to work with the concentrated log-likelihood by concentrating out the

σ^{2}

. Based on the log-likelihood function, the initial estimate of

σ^{2}

is

\begin{matrix} {\hat{σ^{2}}}_{I N} = \frac{1}{n T} \sum_{t = 1}^{T} [(I_{n} - S (β)) Y_{t}]' [(I_{n} - S (β)) Y_{t}] \\ = \frac{1}{n T} \sum_{t = 1}^{T} [(I_{n} - S (β)) (A_{n} (ρ) Y_{n t} - Z_{n t} δ)]' [(I_{n} - S (β)) (A_{n} (ρ) Y_{n t} - Z_{n t} δ)] \\ = \frac{1}{n T} \sum_{t = 1}^{T} [(I_{n} - S (β)) ({\hat{G}}_{I N} + V_{n t} (ζ))]' [(I_{n} - S (β)) ({\hat{G}}_{I N} + V_{n t} (ζ))] \end{matrix}

(8)

The concentrated log-likelihood function of

ζ

is

\ln L_{n, T} (ζ) = - \frac{n T}{2} (\ln (2 π) + 1) - \frac{n T}{2} \ln {\hat{σ^{2}}}_{I N} + T \ln |A_{n} (ρ)|

(9)
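The concentration step can be written out explicitly. The following short derivation starts from the approximate log-likelihood (7), solves the first-order condition for σ^2, and substitutes the solution back; the shorthand \tilde{V}_{nt} is introduced here only for readability and is not notation used elsewhere in the paper.

```latex
\begin{aligned}
\frac{\partial \ln L_{n,T}(\theta)}{\partial \sigma^{2}}
  &= -\frac{nT}{2\sigma^{2}}
     + \frac{1}{2\sigma^{4}} \sum_{t=1}^{T} \tilde{V}_{nt}'\tilde{V}_{nt} = 0,
  \qquad \tilde{V}_{nt} = (I_{n} - S(\beta))\,Y_{t}, \\
\Rightarrow\quad \hat{\sigma}^{2}_{IN}
  &= \frac{1}{nT} \sum_{t=1}^{T} \tilde{V}_{nt}'\tilde{V}_{nt}, \\
\ln L_{n,T}(\zeta)
  &= -\frac{nT}{2}\ln\bigl(2\pi\hat{\sigma}^{2}_{IN}\bigr)
     + T\ln|A_{n}(\rho)| - \frac{nT}{2} \\
  &= -\frac{nT}{2}\bigl(\ln(2\pi) + 1\bigr)
     - \frac{nT}{2}\ln\hat{\sigma}^{2}_{IN} + T\ln|A_{n}(\rho)| .
\end{aligned}
```

The last term in the third line arises because the quadratic form evaluated at \hat{\sigma}^{2}_{IN} equals nT\hat{\sigma}^{2}_{IN}, so the factor 1/(2\hat{\sigma}^{2}_{IN}) reduces it to nT/2.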

The estimate of

ζ

can be defined as:

{\hat{ζ}}_{n T} = \arg \max_{ζ} \frac{1}{n T} \ln L_{n, T} (ζ)

(10)

Equation (10) is a nonlinear optimization problem, which can be solved iteratively in practice. After

{\hat{ζ}}_{n T}

is obtained, the final estimate of

σ^{2}

can be obtained by replacing

ζ

in Equation (8) with

{\hat{ζ}}_{n T}

:

\begin{matrix} \hat{σ^{2}} = \frac{1}{n T} \sum_{t = 1}^{T} [(I_{n} - S (\hat{β})) Y_{t}]' [(I_{n} - S (\hat{β})) Y_{t}] \\ = \frac{1}{n T} \sum_{t = 1}^{T} [(I_{n} - S (\hat{β})) ({\hat{G}}_{I N} + V_{n t} ({\hat{ζ}}_{n T}))]' [(I_{n} - S (\hat{β})) ({\hat{G}}_{I N} + V_{n t} ({\hat{ζ}}_{n T}))] \end{matrix}

(11)

Step 3: By using

\hat{θ} = (\hat{β}, \hat{δ'}, \hat{ρ}, \hat{σ^{2}})'

obtained in Step 2 to replace the parameters in the model, we describe the final estimate of the non-parametric part

g (x_{i}' β)

as:

\hat{g} (u) = s (u, \hat{β}) [A_{n} (\hat{ρ}) Y_{n t} - Z_{n t} \hat{δ}] = s (u, \hat{β}) {\hat{Y}}_{t}

3. Profile Likelihood Estimators and Their Asymptotic Properties

To analyze the asymptotic properties of the estimators, we need the following assumptions:

Assumption 1.

W_{n}

is a constant spatial weight matrix and its diagonal elements satisfy

w_{n, i i} = 0

for

i = 1, 2, \dots, n

. In addition,

W_{n}

is uniformly bounded in both row and column sums in absolute value (UB for short).

Assumption 2.

The disturbances

\{v_{i t}\}, i = 1, 2, \dots, n

and

t = 1, 2, \dots, T

, are

i . i . d .

across

i

and

t

with a zero mean, a variance of

σ^{2}

, and

E {|v_{i t}|}^{4 + η} < \infty

for some

η > 0

.

Assumption 3.

A_{n} (ρ)

is invertible for all

ρ \in Λ_{ρ}

, where

Λ_{ρ}

is compact and

ρ_{0}

is in the interior of

Λ_{ρ}

. Furthermore,

A_{n}^{- 1} (ρ)

is UB.

Assumption 4.

{\{x_{i}\}}_{i = 1}^{n}

is an independent and identically distributed random sequence and

E (v_{i t} x_{i}') < \infty, i = 1, \dots, n

.

x_{i}' β

has marginal density functions

f (u_{i})

, where

f (u_{i})

is continuously differentiable near

x_{i}' β \in Λ_{u}

and

Λ_{u}

is the support set of

k (\cdot)

.

Assumption 5.

g (\cdot)

has continuous derivatives

g^{(p)} (\cdot), p = 1, \dots, n

and

|g (\cdot)| \leq 𝜛_{g}

, where

𝜛_{g}

is a positive constant.

Assumption 6.

When

n \to \infty

,

h \to 0

and

n h^{p} \to \infty, p = 1, \dots, n

.

Assumption 7.

The kernel function

k (\cdot)

is a bounded continuous non-negative function whose support set is bounded and closed:

[- 𝜛_{k}, 𝜛_{k}] \subset R^{p}

, where

𝜛_{k} > 0

is a constant, i.e.,

k (m) > 0

only if

|m| \leq 𝜛_{k}

. In addition,

M (u, β)

,

κ (u, β)

, and

{(M (u, β)' κ (u, β) M (u, β))}^{- 1}

are UB.

Assumption 8.

k (\cdot)

is an even function and

\int k (m) m^{l} d m = 𝜛_{l}, \int k^{2} (m) m^{l} d m = ς_{l}

. For any positive odd number

l

,

𝜛_{l} = ς_{l} = 0

; also,

𝜛_{1} = 0, 𝜛_{2} \neq 0

.

Assumption 9.

β_{0}

is an interior point of

D

, where

D \subset R^{p}

is a convex compact set.

||β_{0}|| = 1

and the first component of the vector

β_{0}

is positive, where

||\cdot||

is the Euclidean norm.

Assumption 1 is a standard normalization assumption in spatial econometrics. Here, the matrix

A = (A_{i j})

is UB, meaning that there is a non-negative constant

𝜛_{A}

such that

\sum_{i = 1}^{n} |A_{i j}| \leq 𝜛_{A}

and

\sum_{j = 1}^{n} |A_{i j}| \leq 𝜛_{A}

. The constant

𝜛_{A}

is defined differently in the following sections depending on the matrices. Assumption 2 provides regularity assumptions for

v_{i t}

. The reversibility and compactness of

A_{n} (ρ)

in Assumption 3 were derived from Kelejian and Prucha (1998, 2001) and have also been used in many articles about spatial correlation. When exogenous variables

u_{i}

are included in the model, it is convenient to assume that the exogenous regressors are uniformly bounded, as in Assumption 4. Assumption 5 is a necessary condition for (3). Assumptions 6–8 state the conditions required for kernel estimation. The bandwidth of the kernel function,

h

, is an important parameter that affects the estimation result of the kernel function. Kernel functions that satisfy Assumptions 7–8 exist, such as the product kernel,

k (m) = Π_{i = 1}^{p} k (m_{i})

, where

k (m_{i})

is a symmetric kernel of one variable on the closed interval

[- 𝜛_{k}, 𝜛_{k}]

. Assumption 9 makes model (1) identifiable.
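The moment conditions in Assumption 8 are easy to check numerically for a candidate kernel. The sketch below does this for the Epanechnikov kernel, which is used here only as an example of a bounded, even, compactly supported kernel; the helper `kernel_moment` and the midpoint-rule integration are illustrative assumptions, not part of the estimation procedure.

```python
def epanechnikov(m):
    """Even, non-negative kernel supported on [-1, 1] (illustrative choice)."""
    return 0.75 * (1.0 - m * m) if abs(m) <= 1.0 else 0.0

def kernel_moment(kernel, l, power=1, lo=-1.0, hi=1.0, steps=20000):
    """Midpoint-rule approximation of the integral of kernel(m)**power * m**l."""
    width = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        m = lo + (j + 0.5) * width
        total += kernel(m) ** power * m ** l
    return total * width

# power=1 gives the moments of k; power=2 gives the moments of k^2.
first_moment = kernel_moment(epanechnikov, 1)
second_moment = kernel_moment(epanechnikov, 2)
first_moment_sq = kernel_moment(epanechnikov, 1, power=2)
```

For this kernel the odd moments of both k and k^2 vanish by symmetry, and the second moment of k equals 1/5, so the requirements that the first moment is zero and the second is non-zero are met.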

For the concentrated log-likelihood Function (4) divided by the sample size

n T

, the corresponding expected value function is

Q_{n, T} (θ) = \max_{σ^{2}} E [\frac{1}{n T} \ln L_{n, T} (θ)]

, which is

\begin{matrix} Q_{n, T} (θ) = \frac{1}{n T} E [\ln L_{n, T} (θ)] \\ = - \frac{1}{2} \ln (2 π σ^{2}) + \frac{1}{n} \ln |A_{n} (ρ)| - \frac{1}{2 σ^{2}} E \frac{1}{n T} \sum_{t = 1}^{T} V_{n t} (ζ)' V_{n t} (ζ) \end{matrix}

(12)

To show the consistency of

{\hat{θ}}_{n T}

, we need the following uniform convergence results.

Lemma 1.

Under Assumptions 1–9, for an

n \times n

nonstochastic UB matrix

B_{n}

,

\begin{matrix} \frac{1}{n T} \sum_{t = 1}^{T} V_{n t}' B_{n} V_{n t} - E (\frac{1}{n T} \sum_{t = 1}^{T} V_{n t}' B_{n} V_{n t}) = O_{p} (\frac{1}{\sqrt{n T}}) \\ \frac{1}{n T} \sum_{t = 1}^{T} Z_{n t}' B_{n} Z_{n t} - E (\frac{1}{n T} \sum_{t = 1}^{T} Z_{n t}' B_{n} Z_{n t}) = O_{p} (\frac{1}{\sqrt{n T}}) \\ \frac{1}{n T} \sum_{t = 1}^{T} Z_{n t}' B_{n} V_{n t} - E (\frac{1}{n T} \sum_{t = 1}^{T} Z_{n t}' B_{n} V_{n t}) = O_{p} (\frac{1}{\sqrt{n T}}) \end{matrix}

(13)

where

E (\frac{1}{n T} \sum_{t = 1}^{T} V_{n t}' B_{n} V_{n t}) = \frac{1}{n} σ_{0}^{2} t r (B_{n}) = O (1)

,

E (\frac{1}{n T} \sum_{t = 1}^{T} Z_{n t}' B_{n} Z_{n t}) = O (1)

, and

E (\frac{1}{n T} \sum_{t = 1}^{T} Z_{n t}' B_{n} V_{n t}) = O (\frac{1}{T})

.

Lemma 2.

Let Θ be any compact parameter space. Then, under Assumptions 1–9,

\frac{1}{n T} l n L_{n, T} (θ) - Q_{n, T} (θ) \overset{P}{\to} 0

uniformly in

θ \in Θ

.

Lemma 3.

Let Θ be any compact parameter space. Then, under Assumptions 1–8,

Q_{n, T} (ζ)

is uniformly equicontinuous for

θ \in Θ

.

Before obtaining the information matrix, we need to compute the first and second derivatives of the logarithmic likelihood function. The asymptotic distribution of the QMLE

{\hat{θ}}_{n T}

can be derived from the Taylor expansion of

\frac{\partial l n L_{n, T} ({\hat{θ}}_{n T})}{\partial θ}

around

θ_{0}

. We define

θ^{β} = θ (\hat{β}) = (δ', ρ, σ^{2})'

and

ζ^{β} = ζ (\hat{β}) = (δ', ρ)'

at the true value

θ_{0}^{β} = θ_{0} (\hat{β}) = (δ_{0}^{'}, ρ_{0}, σ_{0}^{2})'

and

ζ_{0}^{β} = ζ_{0} (\hat{β}) = (δ_{0}^{'}, ρ_{0})'

. The first-order derivative of the concentrated likelihood function involves both linear and quadratic functions of

V_{n t}

, which are as follows:

\begin{matrix} \frac{1}{\sqrt{n T}} \frac{\partial l n L_{n, T} (θ_{0}^{β})}{\partial θ^{β}} \\ = \frac{1}{\sqrt{n T}} [\begin{matrix} \frac{\partial l n L_{n, T} (θ_{0}^{β})}{\partial δ} & \frac{\partial l n L_{n, T} (θ_{0}^{β})}{\partial ρ} & \frac{\partial l n L_{n, T} (θ_{0}^{β})}{\partial σ^{2}} \end{matrix}] \\ = \frac{1}{\sqrt{n T}} {[\begin{matrix} \frac{1}{σ_{0}^{2}} \sum_{t = 1}^{T} Z_{n s}' V_{n t} \\ \frac{1}{σ_{0}^{2}} \sum_{t = 1}^{T} [F_{n s} Z_{n t} δ_{0}]' V_{n t} - \sum_{t = 1}^{T} [V_{n t}' F_{n} V_{n t} - σ_{0}^{2} t r (F_{n})] \\ \frac{1}{2 σ_{0}^{4}} (\sum_{t = 1}^{T} V_{n t}' V_{n t} - n T σ_{0}^{2}) \end{matrix}]}^{T} \end{matrix}

(14)

where

Z_{n s} = [I_{n} - S (\hat{β})] Z_{n t}

and

Y_{n s} = [I_{n} - S (\hat{β})] W_{n} Y_{n t}

. Then, the second-order derivatives are:

\begin{matrix} \frac{1}{n T} \frac{\partial^{2} l n L_{n, T} (θ^{β})}{\partial θ^{β} \partial θ^{β}'} = - \frac{1}{n T} \times \\ [\begin{matrix} \frac{1}{σ^{2}} \sum_{t = 1}^{T} Z_{n s}' Z_{n s} & \frac{1}{σ^{2}} \sum_{t = 1}^{T} Z_{n s}' Y_{ns} & \frac{1}{σ^{4}} \sum_{t = 1}^{T} Z_{n s}' V_{n t} (ζ^{β}) \\ * & \frac{1}{σ^{2}} \sum_{t = 1}^{T} Y_{ns} {' Y}_{ns} - T t r (F_{n}^{2}) & \frac{1}{σ^{4}} \sum_{t = 1}^{T} Y_{ns}' V_{n t} (ζ^{β}) \\ * & * & - \frac{n T}{2 σ^{4}} + \frac{1}{σ^{6}} \sum_{t = 1}^{T} V_{n t} (ζ^{β})' V_{n t} (ζ^{β}) \end{matrix}] \end{matrix}

(15)

The information matrix is as follows:

\begin{matrix} Σ_{θ_{0}^{β}, n T} = - E (\frac{1}{n T} \frac{\partial^{2} l n L_{n, T} (θ_{0}^{β})}{\partial θ^{β} \partial θ^{β}'}) = \\ [\begin{matrix} \frac{1}{n T} \frac{1}{σ_{0}^{2}} \sum_{t = 1}^{T} Z_{n s}' Z_{n s} & \frac{1}{n T} \frac{1}{σ_{0}^{2}} \sum_{t = 1}^{T} Z_{n s}' R_{n s} & 0 \\ * & \frac{1}{n T} \frac{1}{σ_{0}^{2}} E \sum_{t = 1}^{T} R_{n s}' R_{n s} + \frac{1}{n} [t r (F_{n}^{2}) + t r (F_{n s}' F_{n s})] & \frac{1}{n σ_{0}^{2}} t r [F_{n s}] \\ * & * & \frac{1}{2 σ_{0}^{4}} \end{matrix}] \end{matrix}

where

F_{n s} = [I_{n} - S (\hat{β})] F_{n}

and

R_{n s} = F_{n s} (Z_{n t} δ_{0} + {\hat{G}}_{I N})

.

Assumption 10.

\lim_{n \to \infty} \frac{1}{n} [t r (F_{n}^{2}) + t r (F_{n s}' F_{n s}) - \frac{2}{n} {(t r F_{n s})}^{2}] \neq 0

.

Assumption 10 is an important condition for the non-singularity of the limiting information matrix

Σ_{θ_{0}^{β}, n T}

in addition to the global identification in Lemma 4 and Theorem 1.

Lemma 4.

The information matrix

Σ_{θ_{0}^{β}, n T}

is non-singular.

Theorem 1.

Under Assumptions 1–10,

θ_{0}

is globally identifiable and

\hat{θ}

is a consistent estimator of

θ_{0}

(similar to Yu (2008)).

Theorem 2.

Under Assumptions 1–10,

θ_{0}

is globally identifiable and

{\hat{θ}}_{n T} \overset{P}{\to} θ_{0}

if

\lim_{n \to \infty} (\frac{1}{n} \ln |σ_{0}^{2} A_{n}^{- 1} {A_{n}'}^{- 1}| - \frac{1}{n} \ln |σ_{n}^{2} (ρ) A_{n}^{- 1} (ρ) {A_{n}'}^{- 1} (ρ)|) \neq 0

for

ρ \neq ρ_{0}

(similar to Yu (2008)).

Lemma 5.

Under Assumptions 1–10,

\frac{1}{n} M (u, β)' κ (u, β) M (u, β) \overset{P}{\to} f (x) (\begin{matrix} 1 & 0 \\ 0 & 𝜛_{2} I_{n} \end{matrix})

.

Lemma 6.

Under Assumptions 1–10,

S (β) V_{n t} = o_{p} (1)

.

Lemma 7.

Under Assumptions 1–9,

(I_{n} - S) G_{n t} = o_{p} (1)

.

Theorem 3.

Under Assumptions 1–9,

{\hat{G}}_{n t} \overset{P}{\to} G_{n t}

.

4. Monte Carlo Results

In this section, Monte Carlo experiments are carried out for the PML estimation of the Si-SDPD model constructed above, and the simulation results are used to assess the performance of the PML method in finite samples and, in turn, the practical value of the PMLEs. All experiments are implemented in R, and figures are plotted using the 'ggplot2' package.

For the parametric part, we generate samples from (1) and use

θ_{1}^{β} = (- 0.3, 0.4, 1, - 0.2)'

and

θ_{2}^{β} = (0.25, - 0.6, 0.5, 0.1)'

, where

θ_{0}^{β} = (γ_{0}, ρ_{0}, σ_{0}^{2}, τ_{0})'

. The component of the two-dimensional random variable

X = (\begin{matrix} x_{11} & x_{12} & \dots & x_{1 n} \\ x_{21} & x_{22} & \dots & x_{2 n} \end{matrix})' = (\begin{matrix} X_{1} \\ X_{2} \end{matrix})'

and the random error term

V_{n t} = (v_{1 t}, v_{2 t}, \dots, v_{n t})'

are generated from the uniform distribution

U [- 3, 3]

and the independent normal distribution

N (0, σ_{0}^{2})

, respectively. The spatial weight matrix

W_{n}

that we use is the

Rook

matrix, which is one of the main types of spatial weight matrix in spatial econometrics. For the non-parametric part, we take the unknown function to be the Gaussian-shaped function

g (X) = 2 e^{- {(β_{1} X_{1} + β_{2} X_{2})}^{2}}

, where

(β_{1}, β_{2}) = (\sqrt{1 / 3}, \sqrt{2 / 3}) \approx (0.5774, 0.8165)

. As it is difficult to select the optimal window width, we simply select the window width using the rule-of-thumb method. Finally, we use

n = 10, 49, 100

as the sample size and

T = 4, 10

as the number of periods. For each set of

n

and

T

, the sampling observations are generated with the Metropolis–Hastings sampling algorithm.
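For readers who wish to reproduce the design, a rook-contiguity weight matrix on a regular lattice can be built as in the minimal sketch below. The lattice shape (for example 7 x 7 for n = 49) and the row normalisation are assumptions made for illustration, since neither is specified above, and the function name `rook_weights` is hypothetical.

```python
def rook_weights(rows, cols):
    """Row-normalised rook-contiguity matrix for a rows x cols lattice."""
    n = rows * cols
    W = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            # Rook neighbours share an edge: up, down, left, right.
            nbrs = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs = [(a, b) for a, b in nbrs if 0 <= a < rows and 0 <= b < cols]
            for a, b in nbrs:
                W[i][a * cols + b] = 1.0 / len(nbrs)
    return W

W = rook_weights(7, 7)                      # n = 49
row_sums = [sum(row) for row in W]
col_sums = [sum(W[i][j] for i in range(len(W))) for j in range(len(W))]
```

With this construction the diagonal is zero, every row sums to one, and the column sums stay bounded as the lattice grows, which is in line with Assumption 1.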

The evaluation of simulation results should also be divided into parametric and non-parametric parts. In the parametric part, for each estimator, we calculate the standard deviation (Std) and the root-mean-squared error (RMSE), where

RMSE = {[\frac{1}{reps} \sum_{i = 1}^{reps} {({\hat{θ}}_{i} - θ_{0})}^{2}]}^{1 / 2}, reps

is the number of simulations and

\hat{θ_{i}}, i = 1, \dots, r e p s

are the parameter estimates obtained from each simulation. In order to accurately estimate the parameter values, according to Su (2012), we take the window width

h = std (x) \cdot n^{- 1 / 5}

here, where

std (x)

represents the standard deviation of sequence

x

. In the non-parametric part, we refer to Chen (2012) when choosing the mean absolute deviation error (MADE) as the evaluation standard, which is

MADE_{j} = \frac{1}{M} \sum_{m = 1}^{M} |{\hat{g}}_{j} (x_{m}) - g (x_{m})|, j = 1, \dots, reps,

where

{\{x_{m}\}}_{m = 1}^{M}

is the

M

fixed grid points selected within the support set of

x

. We select 20 fixed lattice points in

(- 2, 2)

, namely

M = 20

. When estimating the non-parametric part, we use the method of leave-one-out cross-validation to select the window width, i.e., the window width minimizes

\frac{1}{n T} \sum_{t = 1}^{T} \sum_{i = 1}^{n} {({\hat{y}}_{i} - {\hat{g}}_{- i})}^{2}

, where

{\hat{y}}_{i}

is the ith element of the

{\hat{Y}}_{t} = A_{n} (\hat{ρ}) Y_{n t} - Z_{n t} \hat{δ}

evaluated at the estimated value

\hat{θ}

and

{\hat{g}}_{- i}

is the

g (x_{i})

estimate obtained from all observations except the ith.
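The two evaluation criteria can be computed as in the following minimal sketch. The helper names are hypothetical, and the equally spaced interior grid is only one possible way of placing 20 fixed points inside (-2, 2); the exact grid used in the experiments is not stated.

```python
import math

def rmse(estimates, truth):
    """Root-mean-squared error of the estimates across simulation runs."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))

def made(g_hat, g_true, grid):
    """Mean absolute deviation error of one fitted curve over fixed grid points."""
    return sum(abs(g_hat(x) - g_true(x)) for x in grid) / len(grid)

# Assumed grid: M = 20 equally spaced points strictly inside (-2, 2).
M = 20
grid = [-2.0 + 4.0 * (m + 0.5) / M for m in range(M)]
```

In a simulation study, `rmse` would be applied to the vector of estimates of one parameter across all replications, and `made` to the fitted function from each replication before averaging.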

For different cases of

n

and

T

, 100 simulations are carried out in R. In each simulation, the Metropolis–Hastings sampling algorithm is used to draw 1000 samples from the PML function. To let the chain reach a stable state and obtain samples close to the target distribution, the first 200 draws are discarded. With two different values of

θ_{0}

for each

n

and

T

, finite sample properties of both estimators are summarized in Table 1 and Table 2, in which we report the means, variances (Vars), root mean square error (RMSE), and coverage probability (CP).
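The sampling scheme described above (1000 draws with the first 200 discarded) can be illustrated with a generic random-walk Metropolis–Hastings sampler. The sketch below is not the implementation used in the experiments: it targets a toy scalar log-density rather than the PML function, and the proposal scale, seed, and function name are assumptions chosen for illustration.

```python
import math
import random

def metropolis_hastings(log_target, start, n_draws, step, rng):
    """Random-walk Metropolis-Hastings for a scalar parameter."""
    x, lp = start, log_target(start)
    draws = []
    for _ in range(n_draws):
        prop = x + rng.gauss(0.0, step)           # symmetric Gaussian proposal
        lp_prop = log_target(prop)
        # Accept with probability min(1, target(prop) / target(x)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        draws.append(x)
    return draws

# Toy target: log-density of N(1, 1) up to an additive constant.
rng = random.Random(1)
draws = metropolis_hastings(lambda x: -0.5 * (x - 1.0) ** 2, 0.0, 1000, 1.0, rng)
kept = draws[200:]                                # discard 200 burn-in draws
sample_mean = sum(kept) / len(kept)
```

Discarding the initial draws reduces the influence of the starting value, which is the purpose of the burn-in period mentioned in the text.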

Table 1. The performance of spatial coefficients estimators with

θ_{1}^{β}

.

Table 2. The performance of spatial coefficients estimators with

θ_{2}^{β}

.

For each case, the estimated value of the parameter, i.e., the mean, is relatively close to the real value, and we can see that for each given

n

, when

T

is larger, the variance of estimators will be smaller; for each given

T

, when

n

is larger, the biases between the real value and the estimators will be nearly the same, but the variance will be smaller. When both

n

and

T

take their largest values, i.e.,

n = 100, T = 10

, the variance and RMSEs of the parameter estimators are the smallest in all cases, which indicates that the parameter estimators will converge with the increase in the

n

value, which is consistent with the large sample property, as demonstrated. For different values of

θ_{0}

, the variances are almost all less than 0.01, which indicates that the estimation error is small. In addition, the variance and root mean square error change only slightly across simulation runs and sample sizes, which further indicates that the estimators are stable. Because the estimation variance is small, the mean stays close to the true value, and the width of the confidence interval is also relatively stable. In only a few cases do the confidence intervals fail to cover the true value, and with an increase in

n

and

T

, the coverage improves, i.e., the CP gradually approaches 1.

Table 3 shows the average absolute error and variance of unknown function estimators

g (X) = 2 e^{- {(β_{1} X_{1} + β_{2} X_{2})}^{2}}

under different samples. As shown in Table 3, with an increase in the sample size and total number of periods under the same parameter setting, both the value and estimation error of the unknown function at 20 fixed lattice points decrease, indicating that the estimation of the non-parametric part is convergent. This is consistent with the theoretical results of Theorem 3.

Table 3. The performance of unknown function estimators

{\hat{G}}_{n t}

.

5. Conclusions

In this paper, we propose a Si-SDPD model that can overcome the “Curse of Dimensionality” problem and deal with spatial and temporal correlation. With this model, we construct a PML method when the traditional maximum likelihood estimation method is not applicable. The theoretical results show that, under certain regularity conditions, both parametric and nonparametric estimators are consistent. Numerical simulation results show that PMLEs have good small sample characteristics and estimation accuracy increases with an increase in the sample size and time periods. Our research results will enrich and improve the estimation methods of single-index panel data models in spatial econometrics, and provide a new research tool and perspective for the applied research of related disciplines.

Proving the asymptotic normality of the PML estimators is a complicated problem because the Si-SDPD model contains a large number of parameters and unknown quantities, and the representation of the covariance matrix becomes very complicated. Cross-sectional heteroskedasticity (space-varying error variances) in the Si-SDPD model is another interesting extension to consider. Most estimators are generally inconsistent in the presence of an unknown form of heteroskedasticity in the disturbance term of the SDPD model, and even more so in the Si-SDPD model with more unknown parameters. These problems are considerably more challenging than those addressed in this paper and will be the topics of our future research.

Author Contributions

Conceptualization, M.Z. and B.T.; methodology, M.Z. and B.T.; software, M.Z.; validation, M.Z. and B.T.; formal analysis, M.Z. and B.T.; investigation, M.Z. and B.T.; resources, M.Z.; data curation, M.Z.; writing—original draft preparation, M.Z.; writing—review and editing, M.Z. and B.T.; visualization, M.Z. and B.T.; supervision, B.T.; project administration, B.T.; funding acquisition, B.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 91646106).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the requirements of related projects supported by the National Natural Science Foundation of China.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Some Basic Lemmas

Proof of Lemma 2.

Based on

[I_{n} - S (\hat{β})] Y_{t} = V_{n t} (ζ) = A_{n} (ρ) Y_{n t} - Z_{n t} δ - {\hat{G}}_{I N}

, we have

V_{n t} (ζ) = V_{n t} - Z_{n t} (δ - δ_{0}) - (ρ - ρ_{0}) W_{n} Y_{n t}

. Hence,

\begin{matrix} V_{n t} (ζ)' V_{n t} (ζ) = V_{n t}' V_{n t} - 2 (δ - δ_{0})' Z_{n t}' V_{n t} - 2 (ρ - ρ_{0}) (W_{n} Y_{n t})' V_{n t} + {(ρ - ρ_{0})}^{2} (W_{n} Y_{n t})' (W_{n} Y_{n t}) \\ + (δ - δ_{0})' Z_{n t}' Z_{n t} (δ - δ_{0}) \\ + 2 (ρ - ρ_{0}) (W_{n} Y_{n t})' Z_{n t} (δ - δ_{0}) \end{matrix}

(A1)

where, using

W_{n} Y_{n t} = W_{n} A_{n}^{- 1} \{Z_{n t} δ_{0} + {[I_{n} - S (\hat{β})]}^{- 1} V_{n t}\}

and

F_{n} = W_{n} A_{n}^{- 1}

,

\begin{matrix} [W_{n} Y_{n t}]' [W_{n} Y_{n t}] = \\ δ_{0}' Z_{n t}' F_{n}' F_{n} Z_{n t} δ_{0} + \{F_{n} {[I_{n} - S (\hat{β})]}^{- 1} V_{n t}\}' \{F_{n} {[I_{n} - S (\hat{β})]}^{- 1} V_{n t}\} \\ + 2 δ_{0}' Z_{n t}' F_{n}' \{F_{n} {[I_{n} - S (\hat{β})]}^{- 1} V_{n t}\} \end{matrix}

(A2)

Using Lemma 1,

\begin{matrix} \frac{1}{n T} \sum_{t = 1}^{T} V_{n t}' V_{n t} - E (\frac{1}{n T} \sum_{t = 1}^{T} V_{n t}' V_{n t}) \overset{P}{\to} 0 \\ \frac{1}{n T} \sum_{t = 1}^{T} Z_{n t}' Z_{n t} - E (\frac{1}{n T} \sum_{t = 1}^{T} Z_{n t}' Z_{n t}) \overset{P}{\to} 0 \\ \frac{1}{n T} \sum_{t = 1}^{T} Z_{n t}' V_{n t} - E (\frac{1}{n T} \sum_{t = 1}^{T} Z_{n t}' V_{n t}) \overset{P}{\to} 0 \\ \frac{1}{n T} \sum_{t = 1}^{T} [W_{n} Y_{n t}]' V_{n t} - E (\frac{1}{n T} \sum_{t = 1}^{T} [W_{n} Y_{n t}]' V_{n t}) \overset{P}{\to} 0 \\ \frac{1}{n T} \sum_{t = 1}^{T} [W_{n} Y_{n t}]' Z_{n t} - E (\frac{1}{n T} \sum_{t = 1}^{T} [W_{n} Y_{n t}]' Z_{n t}) \overset{P}{\to} 0 \\ \frac{1}{n T} \sum_{t = 1}^{T} [W_{n} Y_{n t}]' [W_{n} Y_{n t}] - E (\frac{1}{n T} \sum_{t = 1}^{T} [W_{n} Y_{n t}]' [W_{n} Y_{n t}]) \overset{P}{\to} 0 \end{matrix}

(A3)

As

δ

and

ρ

are bounded in Θ, we have

\frac{1}{n T} \sum_{t = 1}^{T} V_{n t} (ζ)' V_{n t} (ζ) - \frac{1}{n T} E \sum_{t = 1}^{T} V_{n t} (ζ)' V_{n t} (ζ) \overset{P}{\to} 0

uniformly in θ ∈ Θ. Using the fact that

σ^{2}

is bounded away from zero in Θ and

Q_{n, T} (θ) = \frac{1}{n T} E [l n L_{n, T} (θ)]

,

\begin{matrix} \frac{1}{n T} l n L_{n, T} (θ) - Q_{n, T} (θ) \\ = - \frac{1}{2 σ^{2}} (\frac{1}{n T} \sum_{t = 1}^{T} V_{n t} (ζ)' V_{n t} (ζ) - \frac{1}{n T} E \sum_{t = 1}^{T} V_{n t} (ζ)' V_{n t} (ζ)) \overset{P}{\to} 0 \end{matrix}

(A4)

uniformly in

θ

. □
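The residual identity that opens this proof can be checked numerically. The following is a minimal sketch, assuming A_n(ρ) = I_n - ρ W_n (consistent with the relation A_n(ρ) A_n^{-1} = I_n - (ρ - ρ_0) F_n used in the proofs) and treating the nonparametric component as a fixed vector `G`; the weight matrix, dimensions, and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2

# Row-normalised spatial weight matrix with zero diagonal (hypothetical).
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

rho0, delta0 = 0.4, np.array([1.0, -0.5])   # "true" values (assumed)
rho, delta = 0.1, np.array([0.7, 0.2])      # arbitrary trial values

Z = rng.normal(size=(n, k))
G = rng.normal(size=n)      # stands in for the fitted nonparametric part
V = rng.normal(size=n)

A0 = np.eye(n) - rho0 * W
Y = np.linalg.solve(A0, Z @ delta0 + G + V)   # Y = A_n^{-1}(Z delta0 + G + V)

# Left side: V(zeta) = A_n(rho) Y - Z delta - G
lhs = (np.eye(n) - rho * W) @ Y - Z @ delta - G
# Right side: V - Z (delta - delta0) - (rho - rho0) W Y
rhs = V - Z @ (delta - delta0) - (rho - rho0) * (W @ Y)

max_gap = float(np.max(np.abs(lhs - rhs)))
```

The two sides agree up to floating-point error, which is the algebra behind the expansion (A1).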

Proof of Lemma 3

(similar to Ref. [3]). Given

{\hat{G}}_{I N} = S (\hat{β}) Y_{t}

and

V_{n t} (ζ) = [I_{n} - S (\hat{β})] Y_{t}

, we have

\begin{matrix} V_{n t} (ζ) = A_{n} (ρ) A_{n}^{- 1} (Z_{n t} δ_{0} + {\hat{G}}_{I N} + V_{n t}) - Z_{n t} δ - {\hat{G}}_{I N} \\ = A_{n} (ρ) A_{n}^{- 1} Z_{n t} δ_{0} - Z_{n t} δ + A_{n} (ρ) A_{n}^{- 1} V_{n t} + A_{n} (ρ) A_{n}^{- 1} {\hat{G}}_{I N} - {\hat{G}}_{I N} \\ = A_{n} (ρ) A_{n}^{- 1} Z_{n t} δ_{0} - Z_{n t} δ + [A_{n} (ρ) A_{n}^{- 1} - S (\hat{β})] {[I_{n} - S (\hat{β})]}^{- 1} V_{n t} \end{matrix}

(A5)

Moreover,

Q_{n, T} (θ) = \frac{1}{n T} E [l n L_{n, T} (θ)] = - \frac{1}{2} l n (2 π) - \frac{1}{2} l n σ^{2} + \frac{1}{n} l n |A_{n} (ρ)| - \frac{1}{2 σ^{2}} E \frac{1}{n T} \sum_{t = 1}^{T} V_{n t} (ζ)' V_{n t} (ζ)

, where

\begin{matrix} E \frac{1}{n T} \sum_{t = 1}^{T} V_{n t} (ζ)' V_{n t} (ζ) \\ = \frac{1}{n T} E \sum_{t = 1}^{T} [A_{n} (ρ) A_{n}^{- 1} Z_{n t} δ_{0} - Z_{n t} δ]' [A_{n} (ρ) A_{n}^{- 1} Z_{n t} δ_{0} - Z_{n t} δ] \\ + \frac{1}{n} σ_{0}^{2} t r \{[(A_{n} (ρ) A_{n}^{- 1} - S (\hat{β})) {(I_{n} - S (\hat{β}))}^{- 1}]' [(A_{n} (ρ) A_{n}^{- 1} - S (\hat{β})) {(I_{n} - S (\hat{β}))}^{- 1}]\} \\ + \frac{2}{n T} \sum_{t = 1}^{T} [A_{n} (ρ) A_{n}^{- 1} Z_{n t} δ_{0} - Z_{n t} δ]' [(A_{n} (ρ) A_{n}^{- 1} - S (\hat{β})) {(I_{n} - S (\hat{β}))}^{- 1}] V_{n t} \end{matrix}

(A6)

According to Lemma 1, the third term is

\frac{2}{n T} \sum_{t = 1}^{T} [A_{n} (ρ) A_{n}^{- 1} Z_{n t} δ_{0} - Z_{n t} δ]' [(A_{n} (ρ) A_{n}^{- 1} - S (\hat{β})) {(I_{n} - S (\hat{β}))}^{- 1}] V_{n t} = O (\frac{1}{T})

, where the

O (\frac{1}{T})

bound holds uniformly in

θ \in Θ

because the term is a polynomial function of

θ

and Θ is a bounded set. The second term is equal to

σ_{n}^{2} (ρ)

, where

σ_{n}^{2} (ρ) = \frac{1}{n} σ_{0}^{2} t r \{[(A_{n} (ρ) A_{n}^{- 1} - S (\hat{β})) {(I_{n} - S (\hat{β}))}^{- 1}]' [(A_{n} (ρ) A_{n}^{- 1} - S (\hat{β})) {(I_{n} - S (\hat{β}))}^{- 1}]\}

, which is a polynomial function of

θ

and is bounded uniformly on Θ. Using

A_{n} (ρ) A_{n}^{- 1} = I_{n} - (ρ - ρ_{0}) F_{n}

in the first term, we have

\begin{matrix} \frac{1}{n T} E \sum_{t = 1}^{T} [A_{n} (ρ) A_{n}^{- 1} Z_{n t} δ_{0} - Z_{n t} δ]' [A_{n} (ρ) A_{n}^{- 1} Z_{n t} δ_{0} - Z_{n t} δ] \\ = \frac{1}{n T} E \sum_{t = 1}^{T} [Z_{n t} (δ - δ_{0}) + (ρ - ρ_{0}) F_{n} Z_{n t} δ_{0}]' [Z_{n t} (δ - δ_{0}) + (ρ - ρ_{0}) F_{n} Z_{n t} δ_{0}] \\ = (δ' - δ_{0}', ρ - ρ_{0}) E Λ_{n T} {(δ' - δ_{0}', ρ - ρ_{0})}' \end{matrix}

where

Λ_{n T} = [\begin{matrix} \frac{1}{n T} \sum_{t = 1}^{T} Z_{n t}' Z_{n t} & \frac{1}{n T} \sum_{t = 1}^{T} Z_{n t}' (F_{n} Z_{n t} δ_{0}) \\ \frac{1}{n T} \sum_{t = 1}^{T} (F_{n} Z_{n t} δ_{0})' Z_{n t} & \frac{1}{n T} \sum_{t = 1}^{T} (F_{n} Z_{n t} δ_{0})' (F_{n} Z_{n t} δ_{0}) \end{matrix}]

(A7)

To prove that

Q_{n, T} (θ)

is uniformly equicontinuous for

θ \in Θ

, the following four conditions must be true. (1)

l n σ^{2}

is uniformly continuous;

σ^{2}

is bounded away from zero in

Θ

, so (1) is obvious. (2)

\frac{1}{n} l n |A_{n} (ρ)|

is uniformly equicontinuous; we know

\frac{1}{n} l n |A_{n} (ρ_{2})| - \frac{1}{n} l n |A_{n} (ρ_{1})| = - \frac{1}{n} t r (W_{n} A_{n}^{- 1} (\bar{ρ})) (ρ_{2} - ρ_{1})

, where

\bar{ρ}

falls between

ρ_{1}

and

ρ_{2}

. Because

A_{n}^{- 1} (ρ)

is UB uniformly in

ρ

over

Θ

,

\frac{1}{n} t r (W_{n} A_{n}^{- 1} (\bar{ρ}))

is bounded. Hence, (2) is true. (3) The first term, i.e.,

(δ' - δ_{0}', ρ - ρ_{0}) E Λ_{n T} {(δ' - δ_{0}', ρ - ρ_{0})}'

, is uniformly equicontinuous; both

ρ

and

δ

are bounded, with

E Λ_{n T} = O (1)

, meaning (3) is true. (4)

σ_{n}^{2} (ρ)

is uniformly equicontinuous given that

σ_{n}^{2} (ρ) = \frac{1}{n} σ_{0}^{2} t r \{[(A_{n} (ρ) A_{n}^{- 1} - S (\hat{β})) {(I_{n} - S (\hat{β}))}^{- 1}]' [(A_{n} (ρ) A_{n}^{- 1} - S (\hat{β})) {(I_{n} - S (\hat{β}))}^{- 1}]\}

and

A_{n} (ρ) A_{n}^{- 1} = I_{n} - (ρ - ρ_{0}) F_{n}

, we have

σ_{n}^{2} (ρ_{2}) - σ_{n}^{2} (ρ_{1}) = σ_{0}^{2} (ρ_{2} - ρ_{1}) \frac{1}{n} [(ρ_{2} + ρ_{1} - 2 ρ_{0}) t r F_{n}' F_{n} - t r (F_{n}' + F_{n})]

And because

F_{n}' F_{n}

and

F_{n}

are UB, (4) is true. □
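Condition (2) rests on the derivative of the averaged log-determinant being bounded. A minimal numerical sketch, assuming A_n(ρ) = I_n - ρ W_n and a hypothetical row-normalised weight matrix, compares a central finite difference of (1/n) ln|A_n(ρ)| with the trace expression -(1/n) tr(W_n A_n^{-1}(ρ)).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8

# Hypothetical row-normalised weight matrix with zero diagonal.
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

def avg_logdet(rho):
    """(1/n) ln|A_n(rho)| with A_n(rho) = I_n - rho W_n (assumed form)."""
    _, logabs = np.linalg.slogdet(np.eye(n) - rho * W)
    return logabs / n

def slope(rho):
    """Analytic derivative: -(1/n) tr(W_n A_n(rho)^{-1})."""
    A_inv = np.linalg.inv(np.eye(n) - rho * W)
    return -np.trace(W @ A_inv) / n

rho, eps = 0.3, 1e-6
finite_diff = (avg_logdet(rho + eps) - avg_logdet(rho - eps)) / (2 * eps)
analytic = slope(rho)
```

Because the trace term stays bounded on a compact set of ρ values where A_n(ρ) is invertible, the mean value argument yields equicontinuity.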

Proof of Lemma 4.

Following Lee (2004), we prove the result by contradiction. First, let

α = (α_{1}, α_{2}, α_{3})',

where

α_{1}, α_{2}

and

α_{3}

are scalars. Next, for

Σ_{θ_{0}^{β}} \equiv \underset{T \to \infty}{l i m} Σ_{θ_{0}^{β}, n T}

, we need to prove that

Σ_{θ_{0}^{β}} α = 0

implies

α = 0

. If this holds, then the columns of

Σ_{θ_{0}^{β}}

are linearly independent and

Σ_{θ_{0}^{β}}

is nonsingular.

From (21),

Σ_{θ_{0}^{β}} = \frac{1}{σ_{0}^{2}} \times [\begin{matrix} E Φ_{Z Z} & E Φ_{Z R} & 0 \\ E Φ_{R Z} & E Φ_{R R} + \underset{n \to \infty}{l i m} \frac{σ_{0}^{2}}{n} [t r (F_{n}^{2}) + t r (F_{n s}' F_{n s})] & \underset{n \to \infty}{l i m} \frac{1}{n} t r [F_{n s}] \\ 0 & \underset{n \to \infty}{l i m} \frac{1}{n} t r [F_{n s}] & \frac{1}{2 σ_{0}^{2}} \end{matrix}]

where

Φ_{Z Z} = \lim_{T \to \infty} \frac{1}{n T} \sum_{t = 1}^{T} \{[I_{n} - S (\hat{β})] Z_{n t}\}' [I_{n} - S (\hat{β})] Z_{n t}

,

Φ_{R Z} = Φ_{Z R}'

,

Φ_{Z R} = \lim_{T \to \infty} \frac{1}{n T} \sum_{t = 1}^{T} \{[I_{n} - S (\hat{β})] Z_{n t}\}' R_{n}

, and

Φ_{R R} = \underset{T \to \infty}{l i m} \frac{1}{n T} \sum_{t = 1}^{T} R_{n}' R_{n}

. Hence,

Σ_{θ_{0}^{β}} α = 0

implies

\begin{matrix} \frac{1}{σ_{0}^{2}} E Φ_{Z Z} \times α_{1} + \frac{1}{σ_{0}^{2}} E Φ_{Z R} \times α_{2} = 0 \\ \frac{1}{σ_{0}^{2}} E Φ_{R Z} \times α_{1} + \{\frac{1}{σ_{0}^{2}} E Φ_{R R} + \underset{n \to \infty}{l i m} \frac{1}{n} [t r (F_{n}^{2}) + t r (F_{n s}' F_{n s})]\} \times α_{2} \\ + \underset{n \to \infty}{l i m} \frac{1}{n σ_{0}^{2}} t r [F_{n s}] \times α_{3} = 0 \\ \underset{n \to \infty}{l i m} \frac{1}{n σ_{0}^{2}} t r [F_{n s}] \times α_{2} + \frac{1}{2 σ_{0}^{4}} \times α_{3} = 0 \end{matrix}

The first and third equations imply, respectively,

α_{1} = - {(E Φ_{Z Z})}^{- 1} E Φ_{Z R} \times α_{2}

and

α_{3} = - 2 \underset{n \to \infty}{l i m} \frac{σ_{0}^{2}}{n} t r [F_{n s}] \times α_{2}

. By eliminating

α_{1}

and

α_{3}

, the second equation becomes

\{\frac{1}{σ_{0}^{2}} [E Φ_{R R} - E Φ_{R Z} {(E Φ_{Z Z})}^{- 1} E Φ_{Z R}] + \underset{n \to \infty}{l i m} \frac{1}{n} [t r (F_{n}^{2}) + t r (F_{n s}' F_{n s}) - \frac{2}{n} {(t r (F_{n s}))}^{2}]\} \times α_{2} = 0

.

Under Assumption 10, we assume that

Φ_{n T} = \frac{1}{σ_{0}^{2}} [\begin{matrix} Φ_{Z Z} & Φ_{Z R} \\ Φ_{R Z} & Φ_{R R} \end{matrix}]

and that

E Φ_{n T}

is nonsingular, so the above equation holds only if

α_{2} = 0

, which in turn gives

α = 0

. Hence, the information matrix

Σ_{θ_{0}^{β}, n T}

is nonsingular. □
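The elimination step in this proof can be written out explicitly. The lines below are a sketch of the substitution only, assuming that the limits exist so that the product of the two trace limits can be written as a single limit.

```latex
% Substitute alpha_1 and alpha_3 (from the first and third equations)
% into the second equation.
\begin{aligned}
\frac{1}{\sigma_0^2} E\Phi_{RZ}\,\alpha_1
  &= -\frac{1}{\sigma_0^2} E\Phi_{RZ}\,(E\Phi_{ZZ})^{-1} E\Phi_{ZR}\,\alpha_2, \\
\lim_{n\to\infty} \frac{1}{n\sigma_0^2}\operatorname{tr}(F_{ns})\,\alpha_3
  &= \lim_{n\to\infty} \frac{1}{n\sigma_0^2}\operatorname{tr}(F_{ns})
     \Bigl(-2\,\frac{\sigma_0^2}{n}\operatorname{tr}(F_{ns})\Bigr)\alpha_2
   = -\lim_{n\to\infty} \frac{2}{n^2}\bigl[\operatorname{tr}(F_{ns})\bigr]^2 \alpha_2 .
\end{aligned}
```

Collecting these two contributions with the remaining α_2 terms of the second equation gives the single scalar condition on α_2.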

Proof of Lemma 5.

As known from the previous definition,

\begin{matrix} \frac{1}{n} M (u, β)' κ (u, β) M (u, β) = \\ (\begin{matrix} \frac{1}{n} \sum_{i = 1}^{n} k_{h} (u_{i} - u) & \frac{1}{n} \sum_{i = 1}^{n} k_{h} (u_{i} - u) (\frac{u_{i} - u}{h})' \\ \frac{1}{n} \sum_{i = 1}^{n} k_{h} (u_{i} - u) (\frac{u_{i} - u}{h}) & \frac{1}{n} \sum_{i = 1}^{n} k_{h} (u_{i} - u) {(\frac{u_{i} - u}{h})}^{2} \end{matrix}) \\ ≝ (\begin{matrix} Ψ_{11}^{1} & Ψ_{12}^{1} \\ Ψ_{21}^{1} & Ψ_{22}^{1} \end{matrix}) = Ψ^{1} \end{matrix}

From Assumption 4,

{\{x_{i}\}}_{i = 1}^{n}

is an independent, identically distributed random sequence, so

{\{u_{i}\}}_{i = 1}^{n}

is an independent, identically distributed sequence.

\begin{matrix} E Ψ_{11}^{1} = \int h^{- 1} k (\frac{u_{1} - u}{h}) f (u_{1}) d u_{1} = \int k (m) f (u + h m) d m = \int k (m) [f (u) + h f' (u) m + \frac{1}{2} h^{2} f'' (ξ) m^{2}] d m \\ = f (u) + O (h^{2}) \end{matrix}

where

m = \frac{u_{1} - u}{h}

is a variable substitution, and

ξ

falls between

u_{1}

and

u

.

Similarly,

E Ψ_{21}^{1} = E k_{h} (u_{1} - u) (\frac{u_{1} - u}{h}) = \int h^{- 1} k (\frac{u_{1} - u}{h}) f (u_{1}) (\frac{u_{1} - u}{h}) d u_{1} = \int k (m) f (u + h m) m d m = \int k (m) [f (u) + h f' (u) m + \frac{1}{2} h^{2} f'' (ξ) m^{2}] m d m = O (h)

and

E Ψ_{12}^{1} = (E Ψ_{21}^{1})' = O (h)

.

Moreover,

E Ψ_{22}^{1} = E k_{h} (u_{1} - u) {(\frac{u_{1} - u}{h})}^{2} = \int h^{- 1} k (\frac{u_{1} - u}{h}) f (u_{1}) {(\frac{u_{1} - u}{h})}^{2} d u_{1} = \int k (m) f (u + h m) m^{2} d m = \int k (m) [f (u) + h f' (u) m + \frac{1}{2} h^{2} f'' (ξ) m^{2}] m^{2} d m = f (u) \int k (m) m^{2} d m + O (h^{2})

.

According to the Khintchine law of large numbers,

Ψ_{i j}^{1} \overset{P}{\to} E Ψ_{i j}^{1}, i, j = 1, 2,

and

Ψ^{1} \overset{P}{\to} (\begin{matrix} E Ψ_{11}^{1} & E Ψ_{12}^{1} \\ E Ψ_{21}^{1} & E Ψ_{22}^{1} \end{matrix}) = (\begin{matrix} f (u) + O (h^{2}) & O (h) \\ O (h) & f (u) \int k (m) m^{2} d m + O (h^{2}) \end{matrix})

Based on Assumption 6, when

n \to \infty

,

h \to 0

, meaning that

Ψ^{1} \overset{P}{\to} f (u) (\begin{matrix} 1 & 0 \\ 0 & \int k (m) m^{2} d m \end{matrix}) .

□
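The limits in Lemma 5 can be illustrated by simulation. This is a minimal sketch under assumed choices that are not taken from the paper: a Gaussian kernel (for which the second kernel moment equals 1), a standard normal index u_i, evaluation point u = 0, and bandwidth h = 0.05.

```python
import numpy as np

def gauss_kernel(m):
    """Standard Gaussian kernel; its second moment is 1."""
    return np.exp(-0.5 * m**2) / np.sqrt(2.0 * np.pi)

def kernel_moments(u_sample, u, h):
    """Sample versions of Psi_11, Psi_21 and Psi_22 at the point u."""
    m = (u_sample - u) / h
    k_h = gauss_kernel(m) / h            # k_h(u_i - u) = k((u_i - u)/h) / h
    return np.mean(k_h), np.mean(k_h * m), np.mean(k_h * m**2)

rng = np.random.default_rng(2)
u_sample = rng.normal(size=200_000)      # index values with N(0, 1) density f
u, h = 0.0, 0.05

psi11, psi21, psi22 = kernel_moments(u_sample, u, h)
f_u = gauss_kernel(u)                    # N(0, 1) density evaluated at u
```

With these settings `psi11` and `psi22` should both be close to f(u), and `psi21` should be close to zero, matching the diagonal limit stated in the lemma.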

Proof of Lemma 6.

As

s (u, β) V_{n t} = e_{1} {(M (u, β)' κ (u, β) M (u, β))}^{- 1} M (u, β)' κ (u, β) V_{n t} = e_{1} {(\frac{1}{n} M (u, β)' κ (u, β) M (u, β))}^{- 1} \frac{1}{n} M (u, β)' κ (u, β) V_{n t}

, we describe

\frac{1}{n} M (u, β)' κ (u, β) V_{n t} = (\begin{matrix} \frac{1}{n} \sum_{i = 1}^{n} k_{h} (u_{i} - u) v_{i t} \\ \frac{1}{n} \sum_{i = 1}^{n} k_{h} (u_{i} - u) (\frac{u_{i} - u}{h}) v_{i t} \end{matrix}) ≝ (\begin{matrix} Ψ_{1}^{2} \\ Ψ_{2}^{2} \end{matrix}) = Ψ^{2}

. Based on Assumptions 2 and 4, since

{\{x_{i}\}}_{i = 1}^{n}

and

{\{v_{i}\}}_{i = 1}^{n}

are independent,

{\{u_{i}\}}_{i = 1}^{n}

and

{\{v_{i}\}}_{i = 1}^{n}

are also independent, and hence

E Ψ_{1}^{2} = 0, E Ψ_{2}^{2} = 0

. Furthermore,

{\{u_{i}\}}_{i = 1}^{n}

and

{\{v_{i}\}}_{i = 1}^{n}

are independent, identically distributed sequences, so

\begin{matrix} V a r Ψ_{1}^{2} = \frac{1}{n} E k_{h}^{2} (u_{i} - u) v_{i}^{2} = \frac{1}{n} E [k_{h}^{2} (u_{i} - u)] E v_{i}^{2} = \frac{σ_{0}^{2}}{n} E k_{h}^{2} (u_{i} - u) = \frac{σ_{0}^{2}}{n} h^{- 1} \int k^{2} (m) f (u + h m) d m = O ({(n h)}^{- 1}); \\ V a r Ψ_{2}^{2} = \frac{1}{n} E k_{h}^{2} (u_{i} - u) {(\frac{u_{i} - u}{h})}^{2} v_{i}^{2} = \frac{1}{n} E [k_{h}^{2} (u_{i} - u) {(\frac{u_{i} - u}{h})}^{2} E (v_{i}^{2})] = \frac{σ_{0}^{2}}{n} E [k_{h}^{2} (u_{i} - u) {(\frac{u_{i} - u}{h})}^{2}] \\ = \frac{σ_{0}^{2}}{n} h^{- 1} \int k^{2} (m) f (u + h m) m^{2} d m = O ({(n h)}^{- 1}) . \end{matrix}

Based on Assumption 5,

Ψ_{1}^{2} = O_{p} ({(n h)}^{- 1 / 2}) = o_{p} (1)

and

Ψ_{2}^{2} = O_{p} ({(n h)}^{- 1 / 2}) = o_{p} (1)

, so

\frac{1}{n} M (u, β)' κ (u, β) V_{n t} = O_{p} ({(n h)}^{- 1 / 2}) = o_{p} (1)

. Then, from Lemma 5,

S (β) V_{n t} = o_{p} (1)

. □
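The variance order O((nh)^{-1}) derived above can be checked by Monte Carlo. The sketch below assumes a Gaussian kernel, standard normal u_i, errors with variance 1, and hypothetical values of n, h, and the number of replications; it compares the simulated variance of the first component with the leading-order approximation σ_0^2 f(u) ∫k^2(m)dm / (nh).

```python
import numpy as np

def gauss_kernel(m):
    return np.exp(-0.5 * m**2) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(3)
n, h, u, sigma0, reps = 2000, 0.2, 0.0, 1.0, 2000   # all values assumed

U = rng.normal(size=(reps, n))                      # index values u_i
V = rng.normal(scale=sigma0, size=(reps, n))        # errors, independent of u_i

# One draw of (1/n) sum_i k_h(u_i - u) v_i per replication.
psi = np.mean(gauss_kernel((U - u) / h) / h * V, axis=1)

mc_var = float(np.var(psi))
roughness = 1.0 / (2.0 * np.sqrt(np.pi))            # integral of k^2, Gaussian kernel
approx_var = sigma0**2 * gauss_kernel(u) * roughness / (n * h)
ratio = mc_var / approx_var
```

A ratio near one indicates that the simulated variance scales as 1/(nh), so the term vanishes when nh grows without bound.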

Proof of Lemma 7.

From (6),

\begin{matrix} s (u, β) G_{n t} = e_{1} {(\frac{1}{n} M (u, β)' κ (u, β) M (u, β))}^{- 1} \frac{1}{n} M (u, β)' κ (u, β) G_{n t}, \\ \frac{1}{n} M (u, β)' κ (u, β) G_{n t} = (\begin{matrix} \frac{1}{n} \sum_{i = 1}^{n} k_{h} (u_{i} - u) g (u_{i}) \\ \frac{1}{n} \sum_{i = 1}^{n} k_{h} (u_{i} - u) (\frac{u_{i} - u}{h}) g (u_{i}) \end{matrix}) ≝ (\begin{matrix} Ψ_{1}^{3} \\ Ψ_{2}^{3} \end{matrix}) = Ψ^{3} . \end{matrix}

Based on Assumptions 2 and 4,

\begin{matrix} E Ψ_{1}^{3} = E k_{h} (u_{i} - u) g (u_{i}) = \int h^{- 1} k (\frac{u_{i} - u}{h}) g (u_{i}) f (u_{i}) d u_{i} \\ = \int k (m) f (u + h m) g (u + h m) d m \\ = \int k (m) [g (u) + h g' (u) m + \frac{1}{2} h^{2} g'' (ξ_{1}) m^{2}] [f (u) + h f' (u) m + \frac{1}{2} h^{2} f'' (ξ_{2}) m^{2}] d m \\ = g (u) f (u) + O (h^{2}), \\ V a r Ψ_{1}^{3} = \frac{1}{n} V a r k_{h} (u_{i} - u) g (u_{i}) = \frac{1}{n} \{E k_{h}^{2} (u_{i} - u) g^{2} (u_{i}) - {[E k_{h} (u_{i} - u) g (u_{i})]}^{2}\}, \end{matrix}

where

ξ_{i}, i = 1, 2,

each falls between

u_{i}

and

u

. And then,

\begin{matrix} E k_{h}^{2} (u_{i} - u) g^{2} (u_{i}) = h^{- 1} \int k^{2} (m) g^{2} (u + h m) f (u + h m) d m = O (h^{- 1}), \\ V a r Ψ_{1}^{3} = \frac{1}{n} \{O (h^{- 1}) - {[g (u) f (u) + O (h^{2})]}^{2}\} = O ({(n h)}^{- 1}), \\ Ψ_{1}^{3} = g (u) f (u) + O (h^{2}) + O_{p} ({(n h)}^{- 1 / 2}) = g (u) f (u) + O_{p} (h^{2} + {(n h)}^{- 1 / 2}) . \end{matrix}

Similarly,

\begin{matrix} E Ψ_{2}^{3} = E k_{h} (u_{i} - u) (\frac{u_{i} - u}{h}) g (u_{i}) = \int k (m) f (u + h m) g (u + h m) m d m \\ = \int k (m) [g (u) + h g' (ξ_{1}) m] [f (u) + h f' (ξ_{2}) m] m d m = O (h), \\ E {(Ψ_{2}^{3})}^{2} = \frac{1}{n} E k_{h}^{2} (u_{i} - u) {(\frac{u_{i} - u}{h})}^{2} g^{2} (u_{i}) + \frac{n - 1}{n} {(E Ψ_{2}^{3})}^{2} \\ = {(n h)}^{- 1} \int k^{2} (m) f (u + h m) g^{2} (u + h m) m^{2} d m + O (h^{2}) = O (h^{2} + {(n h)}^{- 1}) . \end{matrix}

In that way,

V a r Ψ_{2}^{3} = E {(Ψ_{2}^{3})}^{2} - {(E Ψ_{2}^{3})}^{2} = O (h^{2} + {(n h)}^{- 1})

and

Ψ_{2}^{3} = O_{p} (h + {(n h)}^{- 1 / 2}) = o_{p} (1)

. This suggests that

\frac{1}{n} M (u, β)' κ (u, β) G_{n t} = (\begin{matrix} g (u) f (u) + O_{p} (h^{2} + {(n h)}^{- 1 / 2}) \\ O_{p} (h + {(n h)}^{- 1 / 2}) \end{matrix}) \overset{P}{\to} (\begin{matrix} g (u) f (u) \\ 0 \end{matrix}) .

Then, by Lemma 5 and Assumption 4,

s (u, β) G_{n t} = e_{1} {(\frac{1}{n} M (u, β)' κ (u, β) M (u, β))}^{- 1} \frac{1}{n} M (u, β)' κ (u, β) G_{n t} = g (u) + o_{p} (1)

, i.e.,

s (u_{i}, β) G_{n t} = g (u_{i}) + o_{p} (1), i = 1, 2, \dots, n

. Therefore,

S (β) G_{n t} = (s (x_{1}' β)' G_{n t}, s (x_{2}' β)' G_{n t}, \dots, s (x_{n}' β)' G_{n t})' = G_{n t} + o_{p} (1)

. Hence,

(I_{n} - S (β)) G_{n t} = o_{p} (1)

.

□
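Lemmas 6 and 7 together say that one row of the local linear smoother reproduces g(u) and averages the noise away. The following is a minimal sketch of such a row, s(u) = e_1'(M'κM)^{-1}M'κ, under assumed choices that are not from the paper: a Gaussian kernel, a uniform index on [-1, 1], the hypothetical link g = sin, and illustrative values of n and h.

```python
import numpy as np

def gauss_kernel(m):
    return np.exp(-0.5 * m**2) / np.sqrt(2.0 * np.pi)

def local_linear_weights(u_sample, u, h):
    """Row s(u) = e1' (M' K M)^{-1} M' K of a local linear fit at u."""
    m = (u_sample - u) / h
    M = np.column_stack([np.ones_like(m), m])   # design matrix [1, (u_i - u)/h]
    k = gauss_kernel(m) / h                     # diagonal of the kernel matrix K
    MtK = M.T * k                               # M' K without forming diag(k)
    return np.linalg.solve(MtK @ M, MtK)[0]     # first row picks the intercept

rng = np.random.default_rng(4)
n, h = 5000, 0.1
u_sample = rng.uniform(-1.0, 1.0, size=n)
g = np.sin                                      # hypothetical link function
G = g(u_sample)
V = rng.normal(scale=0.5, size=n)

u0 = 0.3
s = local_linear_weights(u_sample, u0, h)
signal_fit = float(s @ G)    # smoother applied to g(u_i): close to g(u0)
noise_fit = float(s @ V)     # smoother applied to pure noise: close to 0
weight_sum = float(s.sum())  # local linear weights sum to one
```

Stacking such rows for u = u_1, ..., u_n gives a smoother matrix of the kind denoted S(β) in the text.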

Appendix B. Proof of Theoretical Results

Proof of Theorem 1.

As

E \sum_{t = 1}^{T} V_{n t}' V_{n t} = n T σ_{0}^{2}

, at

θ_{0}

, (10) implies

E [l n L_{n, T} (θ_{0})] = - \frac{n T}{2} l n 2 π - \frac{n T}{2} l n σ_{0}^{2} + T l n |A_{n}| - \frac{n T}{2}

. Then, we have

\begin{matrix} \frac{1}{n T} E [l n L_{n, T} (θ)] - \frac{1}{n T} E [l n L_{n, T} (θ_{0})] \\ = - \frac{1}{2} (l n σ^{2} - l n σ_{0}^{2}) + \frac{1}{n} l n |A_{n} (ρ)| - \frac{1}{n} l n |A_{n}| - [\frac{1}{n T} \frac{1}{2 σ^{2}} E \sum_{t = 1}^{T} V_{n t} (ζ)' V_{n t} (ζ) - \frac{1}{2}] \\ = T_{1, n} (ρ, σ^{2}) - \frac{1}{2 σ^{2}} T_{2, n} (ρ, δ) + o (1), \end{matrix}

where

T_{1, n} (ρ, σ^{2}) = - \frac{1}{2} (l n σ^{2} - l n σ_{0}^{2}) + \frac{1}{n} l n |A_{n} (ρ)| - \frac{1}{n} l n |A_{n}| - \frac{1}{2 σ^{2}} [σ_{n}^{2} (ρ) - σ^{2}]

and

T_{2, n} (ρ, δ) = \frac{1}{n T} E \sum_{t = 1}^{T} [Z_{n t} (δ - δ_{0}) + (ρ - ρ_{0}) F_{n} Z_{n t} δ_{0}]' [Z_{n t} (δ - δ_{0}) + (ρ - ρ_{0}) F_{n} Z_{n t} δ_{0}] .

Consider the process

Y_{n t} = ρ_{0} W_{n} Y_{n t} + V_{n t}

for a single period t, whose log-likelihood function is

l n L_{p, n} (ρ, σ^{2}) = - \frac{n}{2} l n (2 π σ^{2}) + l n |A_{n} (ρ)| - \frac{1}{2 σ^{2}} [A_{n} (ρ) Y_{n t}]' [A_{n} (ρ) Y_{n t}]

By letting

E_{p} (\cdot)

be the expectation operator for

Y_{n t}

, we have

\begin{matrix} E_{p} [\frac{1}{n} l n L_{p, n} (ρ, σ^{2})] - E_{p} [\frac{1}{n} l n L_{p, n} (ρ_{0}, σ_{0}^{2})] \\ = - \frac{1}{2} [l n σ^{2} - l n σ_{0}^{2}] + \frac{1}{n} l n |A_{n} (ρ)| - \frac{1}{n} l n |A_{n}| - \frac{1}{2 σ^{2}} [σ_{n}^{2} (ρ) - σ^{2}] \\ = T_{1, n} (ρ, σ^{2}) \end{matrix}

By the information inequality,

E_{p} [l n L_{p, n} (ρ, σ^{2})] - E_{p} [l n L_{p, n} (ρ_{0}, σ_{0}^{2})] \leq 0

. Thus,

T_{1, n} (ρ, σ^{2}) \leq 0

for any

(ρ, σ^{2})

. Also,

T_{2, n} (ρ, δ)

is a quadratic function of

ρ

and

δ

. Under the condition that

E Φ_{n T}

is nonsingular,

T_{2, n} (ρ, δ) > 0

whenever

(ρ, δ) \neq (ρ_{0}, δ_{0})

, so

(ρ, δ)

is globally identified, given that

ρ_{0}, σ_{0}^{2}

is a unique maximizer of

T_{1, n} (ρ, σ^{2})

. Hence,

(ρ, δ, σ^{2})

is globally identified. Combined with the uniform convergence and equicontinuity established in Lemmas 2 and 3, the consistency follows. □
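The sign of T_{1,n} can be explored numerically. The sketch below is a simplification rather than the paper's exact setting: it assumes A_n(ρ) = I_n - ρ W_n, uses a hypothetical row-normalised weight matrix, and drops the smoother so that σ_n^2(ρ) reduces to (σ_0^2/n) tr[(A_n(ρ)A_n^{-1})'(A_n(ρ)A_n^{-1})], as in the pure spatial autoregressive process considered in this proof.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10

# Hypothetical row-normalised weight matrix with zero diagonal.
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

rho0, sigma0_sq = 0.4, 1.0                       # "true" values (assumed)
A0_inv = np.linalg.inv(np.eye(n) - rho0 * W)

def avg_logdet(rho):
    return np.linalg.slogdet(np.eye(n) - rho * W)[1] / n

def t1(rho, sigma_sq):
    """T_{1,n}(rho, sigma^2) with the smoother omitted (simplifying assumption)."""
    B = (np.eye(n) - rho * W) @ A0_inv           # A_n(rho) A_n^{-1}
    sigma_n_sq = sigma0_sq * np.trace(B.T @ B) / n
    return (-0.5 * (np.log(sigma_sq) - np.log(sigma0_sq))
            + avg_logdet(rho) - avg_logdet(rho0)
            - (sigma_n_sq - sigma_sq) / (2.0 * sigma_sq))

grid = [(r, s) for r in np.linspace(-0.8, 0.8, 17)
        for s in np.linspace(0.3, 3.0, 10)]
values = np.array([t1(r, s) for r, s in grid])
t1_max = float(values.max())
t1_at_truth = float(t1(rho0, sigma0_sq))
```

On this grid the function never exceeds zero and vanishes at the true values, in line with the information inequality argument.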

Proof of Theorem 2.

From the proof of Theorem 1,

\frac{1}{n T} E [l n L_{n, T} (θ)] - \frac{1}{n T} E [l n L_{n, T} (θ_{0})] = T_{1, n} (ρ, σ^{2}) - \frac{1}{2 σ^{2}} T_{2, n} (ρ, δ) + o (1)

. When

E Φ_{n T}

is singular,

δ_{0}

and

ρ_{0}

cannot be identified from

T_{2, n} (ρ, δ)

. Global identification then requires that the limit of

T_{1, n} (ρ, σ^{2})

be strictly less than zero whenever

(ρ, σ^{2}) \neq (ρ_{0}, σ_{0}^{2})

. As

T_{1, n} (ρ, σ^{2}) \leq 0

by the information inequality, this requirement is equivalent to

\underset{n \to \infty}{l i m} (\frac{1}{n} l n |σ_{0}^{2} A_{n}^{- 1} {A_{n}'}^{- 1}| - \frac{1}{n} l n |σ_{n}^{2} (ρ) A_{n}^{- 1} (ρ) {A_{n}' (ρ)}^{- 1}|) \neq 0

for any

ρ \neq ρ_{0}

(see Lee (2004)). After

ρ_{0}

and

σ_{0}^{2}

are identified, given

ρ_{0}

,

δ_{0}

can be identified from

T_{2, n} (ρ, δ)

. Combined with the uniform convergence and equicontinuity established in Lemmas 2 and 3, the consistency follows. □

Proof of Theorem 3.

From (6), we have

\begin{matrix} {\hat{G}}_{n t} = S (\hat{β}) {\hat{Y}}_{t} = S (\hat{β}) (Y_{n t} - \hat{ρ} W_{n} Y_{n t} - Z_{n t} \hat{δ}) \\ = S (\hat{β}) (Y_{n t} - \hat{ρ} W_{n} Y_{n t} + ρ_{0} W_{n} Y_{n t} - ρ_{0} W_{n} Y_{n t} - Z_{n t} \hat{δ} + Z_{n t} δ_{0} - Z_{n t} δ_{0}) \\ = S (\hat{β}) [(Y_{n t} - ρ_{0} W_{n} Y_{n t} - Z_{n t} δ_{0}) - (\hat{ρ} - ρ_{0}) W_{n} Y_{n t} - Z_{n t} (\hat{δ} - δ_{0})] \\ = S (\hat{β}) [G_{n t} + V_{n t} - (\hat{ρ} - ρ_{0}) W_{n} Y_{n t} - Z_{n t} (\hat{δ} - δ_{0})] \end{matrix}

, so that

{\hat{G}}_{n t} - G_{n t} = S (\hat{β}) V_{n t} - [I_{n} - S (\hat{β})] G_{n t} - [(\hat{ρ} - ρ_{0}) S (\hat{β}) W_{n} Y_{n t} + S (\hat{β}) Z_{n t} (\hat{δ} - δ_{0})]

.

From Theorems 1 and 2,

\hat{ρ} - ρ_{0} = o_{p} (1), \hat{δ} - δ_{0} = o_{p} (1)

. And from Lemmas 6 and 7,

[I_{n} - S (\hat{β})] G_{n t} = o_{p} (1), S (\hat{β}) V_{n t} = o_{p} (1)

. Since the remaining factors are bounded under the assumptions,

{\hat{G}}_{n t} - G_{n t} = o_{p} (1)

. □
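The decomposition of the estimation error used in this proof is a purely algebraic identity, so it can be verified for any linear smoother. The sketch below uses a random row-normalised matrix as a stand-in for S(β̂) and hypothetical parameter values; none of these numbers come from the paper.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 7, 2

# Hypothetical spatial weights and a generic linear smoother matrix.
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
S = rng.random((n, n))
S /= S.sum(axis=1, keepdims=True)

rho0, delta0 = 0.4, np.array([1.0, -0.5])           # "true" values (assumed)
rho_hat, delta_hat = 0.35, np.array([1.1, -0.45])   # hypothetical estimates

Z = rng.normal(size=(n, k))
G = rng.normal(size=n)     # values of the unknown function
V = rng.normal(size=n)     # disturbances

# Data generated so that Y - rho0 W Y - Z delta0 = G + V.
Y = np.linalg.solve(np.eye(n) - rho0 * W, Z @ delta0 + G + V)

G_hat = S @ (Y - rho_hat * (W @ Y) - Z @ delta_hat)
decomposition = (S @ V
                 - (np.eye(n) - S) @ G
                 - (rho_hat - rho0) * (S @ W @ Y)
                 - S @ Z @ (delta_hat - delta0))
max_gap = float(np.max(np.abs((G_hat - G) - decomposition)))
```

Each term on the right corresponds to one source of error: smoothed noise, smoothing bias, and the two parametric estimation errors.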

References

Ai, C.; Chen, X. Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica 2003, 71, 1795–1843.
Baltagi, B.H.; Song, S.H.; Koh, W. Testing panel data regression models with spatial error correlation. J. Econom. 2003, 117, 123–150.
Chen, J.; Gao, J.T.; Li, D.G. Estimation in partially linear single-index panel data models with fixed effects. J. Bus. Econ. Stat. 2013, 31, 315–330.
Elhorst, J.P. Specification and estimation of spatial panel data models. Int. Reg. Sci. Rev. 2003, 26, 244–268.
Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer: New York, NY, USA, 2016.
Jin, B.; Wu, Y.; Rao, C.R.; Hou, L. Estimation and model selection in general spatial dynamic panel data models. Proc. Natl. Acad. Sci. USA 2020, 117, 5235–5241.
Lee, L.F.; Yu, J. Estimation of spatial panel model with fixed effects. J. Econom. 2010, 154, 165–185.
Lee, L.F.; Yu, J. A spatial dynamic panel data model with both time and individual fixed effects. Econom. Theory 2010, 26, 564–597.
Lee, L.F.; Yu, J. Some recent developments in spatial panel data models. Reg. Sci. Urban Econ. 2010, 40, 255–271.
Pang, Z.; Xue, L.G. Estimation for the single-index models with random effects. Comput. Stat. Data Anal. 2012, 56, 1837–1853.
Parent, O.; LeSage, J.P. A space-time filter for panel data models containing random effects. Comput. Stat. Data Anal. 2011, 55, 475–490.
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022.
Su, L.; Yang, Z. QML Estimation of Dynamic Panel Data Models with Spatial Errors; Singapore Management University: Singapore, 2007.
Su, L.; Jin, S.N. Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J. Econom. 2010, 157, 18–33.
Wang, J.L.; Xue, L.G.; Zhu, L.X.; Chong, Y.S. Estimation for a partial-linear single-index model. Ann. Stat. 2010, 38, 246–274.
Yu, J.; de Jong, R.; Lee, L.F. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. J. Econom. 2008, 146, 118–134.
Yu, J.; Lee, J.F. Efficient GMM estimation of spatial dynamic panel data models. J. Econom. 2010, 180, 174–197.
Yu, J.; Lee, J.F. Estimation of unit root spatial dynamic panel data models. Econom. Theory 2010, 26, 1332–1362.
Yu, J.; de Jong, R.; Lee, L.F. Estimation for spatial dynamic panel data with fixed effects: The case of spatial cointegration. J. Econom. 2012, 167, 16–37.
Zhang, Y.Q.; Shen, D.M. Estimation of semiparametric varying-coefficient spatial panel data models with random effects. J. Stat. Plan. Inference 2015, 159, 64–80.

Table 1. The performance of spatial coefficients estimators with $θ_{1}^{β}$.

$n$	Parameter	$T = 4$				$T = 10$
		RMSE	Means	Vars	CP	RMSE	Means	Vars	CP
10	$β_{1}$	0.0337	0.562	0.0074	0.82	0.0323	0.584	0.0066	0.84
	$β_{2}$	0.0341	0.832	0.0072	0.81	0.0319	0.794	0.0069	0.85
	$γ$	0.0282	−0.319	0.0058	0.84	0.0254	−0.313	0.0052	0.86
	$ρ$	0.0317	0.419	0.0064	0.82	0.0301	0.416	0.0054	0.84
	$σ^{2}$	0.1088	1.021	0.0069	0.85	0.0867	1.019	0.0062	0.88
	$τ$	0.0246	−0.224	0.0057	0.84	0.0238	−0.218	0.0051	0.87
49	$β_{1}$	0.0312	0.586	0.0068	0.85	0.0298	0.571	0.0052	0.88
	$β_{2}$	0.0325	0.828	0.0067	0.83	0.0296	0.809	0.0053	0.89
	$γ$	0.0234	−0.315	0.0054	0.86	0.0222	−0.288	0.0049	0.89
	$ρ$	0.0241	0.413	0.0057	0.84	0.0225	0.389	0.0048	0.88
	$σ^{2}$	0.0834	0.988	0.0059	0.87	0.0676	1.008	0.0054	0.90
	$τ$	0.0235	−0.179	0.0053	0.87	0.0226	−0.211	0.0047	0.91
100	$β_{1}$	0.0298	0.569	0.0053	0.87	0.0282	0.572	0.0045	0.90
	$β_{2}$	0.0302	0.807	0.0055	0.87	0.0283	0.811	0.0049	0.91
	$γ$	0.0217	−0.309	0.0049	0.88	0.0205	−0.297	0.0043	0.92
	$ρ$	0.0225	0.391	0.0047	0.89	0.0208	0.405	0.0039	0.91
	$σ^{2}$	0.0649	1.009	0.0048	0.90	0.0521	0.994	0.0041	0.93
	$τ$	0.0221	−0.185	0.0047	0.89	0.0213	−0.192	0.0042	0.92

Table 2. The performance of spatial coefficients estimators with $θ_{2}^{β}$.

$n$	Parameter	$T = 4$				$T = 10$
		RMSE	Means	Vars	CP	RMSE	Means	Vars	CP
10	$β_{1}$	0.0406	0.564	0.0088	0.79	0.0395	0.569	0.0077	0.82
	$β_{2}$	0.0512	0.829	0.0091	0.81	0.0494	0.806	0.0086	0.83
	$γ$	0.0332	0.226	0.0076	0.83	0.0318	0.234	0.0069	0.85
	$ρ$	0.0343	−0.620	0.0081	0.82	0.0335	−0.617	0.0077	0.83
	$σ^{2}$	0.1039	0.531	0.0084	0.82	0.1021	0.526	0.0078	0.84
	$τ$	0.0342	0.122	0.0078	0.84	0.0329	0.115	0.0071	0.86
49	$β_{1}$	0.0372	0.586	0.0071	0.82	0.0368	0.571	0.0056	0.85
	$β_{2}$	0.0433	0.802	0.0074	0.84	0.0419	0.812	0.0063	0.85
	$γ$	0.0325	0.239	0.0068	0.87	0.0299	0.258	0.0057	0.90
	$ρ$	0.0301	−0.612	0.0068	0.84	0.0298	−0.592	0.0055	0.88
	$σ^{2}$	0.0914	0.518	0.0069	0.85	0.0867	0.489	0.0062	0.90
	$τ$	0.0329	0.114	0.0064	0.88	0.0314	0.089	0.0054	0.92
100	$β_{1}$	0.0355	0.581	0.0061	0.84	0.0316	0.573	0.0056	0.89
	$β_{2}$	0.0367	0.813	0.0057	0.87	0.0345	0.810	0.0051	0.91
	$γ$	0.0307	0.243	0.0052	0.90	0.0281	0.253	0.0049	0.92
	$ρ$	0.0281	−0.592	0.0053	0.87	0.0282	−0.604	0.0046	0.90
	$σ^{2}$	0.0823	0.512	0.0061	0.89	0.0606	0.494	0.0057	0.92
	$τ$	0.0298	0.092	0.0058	0.91	0.0289	0.105	0.0051	0.94

Table 3. The performance of unknown function estimators $\hat{G}_{n t}$.

$n$	$θ_{0}^{a} = (- 0.3, 0.4, 1, - 0.2)'$				$θ_{0}^{b} = (0.25, - 0.6, 0.5, 0.1)'$
	$T = 4$		$T = 10$		$T = 4$		$T = 10$
	MADE	Std. dev.	MADE	Std. dev.	MADE	Std. dev.	MADE	Std. dev.
10	0.0406	0.0216	0.0344	0.0187	0.0397	0.0187	0.0321	0.0145
49	0.0314	0.0125	0.0229	0.0112	0.0252	0.0138	0.0176	0.0098
100	0.0198	0.0092	0.0106	0.0076	0.0142	0.0096	0.0094	0.0071

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
