Estimation of the Non-Parametric Spatial Dynamic Panel Data Model with Fixed Effects

Mengqi Zhang; Boping Tian

doi:10.3390/math11132865


Department of Mathematics, Harbin Institute of Technology, Harbin 150001, China

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(13), 2865; https://doi.org/10.3390/math11132865


Abstract

In this paper, the spatial dynamic panel data (SDPD) model with fixed effects is extended to a non-parametric form by relaxing the linear or nonlinear parametric structure imposed on the explanatory variables. The resulting non-parametric spatial dynamic panel data (NSDPD) model with fixed effects retains the advantages of the SDPD model, which can handle spatial and/or temporal individual characteristics and spatio-temporal dependence, while avoiding the specification errors that a fixed functional form may cause. It also enhances the flexibility and practicability of spatial econometric models. Since the model contains an unknown function, we propose a profile maximum likelihood (PML) method that also deals with the incidental parameter problem in estimation. Treating the spatial coefficients as known, we first eliminate the fixed effects by substitution (time-demeaning) and then use local polynomial estimation to obtain a preliminary estimate of the unknown function, which transforms the model into a parametric form that can be solved directly. We derive the asymptotic properties of the PMLEs and show that, under certain regularity conditions, both the parametric and the non-parametric estimators are consistent. Monte Carlo results show that the estimators perform well in finite samples. We illustrate the empirical relevance of the model by applying it to examine the impact of tourism dynamics on economic development in the Yangtze River Delta region of China.

Keywords:

profile maximum likelihood; non-parametric estimation; spatial dynamic panel data; fixed effects; individual effects

MSC:

62G05; 62G20

1. Introduction

In recent years, spatial econometric models have been used to study many economic issues, such as regional employment growth rates, housing prices and the introduction of technology. They are an important analytical tool in fields such as regional science, geography and economics. Spatial econometric models can be divided into parametric and non-parametric models according to how the explanatory variables enter the model. Parametric models assume that the functional relationship between the explanatory variables and the dependent variable is known, and this relationship is usually assumed to be linear. Through the efforts of many researchers, parametric spatial models have been extended from cross-sectional data to dynamically correlated panel data, giving rise to the spatial dynamic panel data (SDPD) model, which adds spatio-temporal dynamic elements. The SDPD model not only accounts for individual effects, which are often heterogeneous, but also takes spatio-temporal dependence into account, improving the flexibility of the spatial panel data model.

A typical SDPD model includes individual (or regional) and time fixed effects to control for unobserved factors that are time-invariant or individual-invariant, respectively. A data transformation can be used to eliminate either the individual or the time fixed effects from the model, which alleviates the incidental parameter problem (Ref. [1]). For example, Refs. [2,3] established the asymptotic properties of quasi-maximum likelihood (QML) estimators and generalized method of moments (GMM) estimators, respectively, for the spatial dynamic panel model with fixed effects when both the number of individuals $n$ and the number of time periods $T$ are large. Ref. [4] examined the asymptotics of QML estimators for unit root SDPD models with fixed effects. Ref. [5] then investigated the asymptotic properties of QML estimators in the presence of unstable unit roots generated by temporal and spatial correlations and proposed a bias correction for the estimators. Ref. [6] investigated the first difference (FD) estimation of SDPD models with fixed effects using the QML approach, where both $n$ and $T$ are large. These studies show that SDPD models with fixed effects have attracted considerable attention in recent years.

However, as a parametric model, the SDPD model has a serious limitation: it assumes that the explanatory variables enter the model linearly, through the term $X_{nt}β$. If the functional relationship between the economic variables is not linear, or if the true relationship is unknown, the use of a parametric model may have serious consequences, such as inconsistent estimates. A non-parametric model only requires the regression function to be smooth and to satisfy certain moment conditions; it does not require the functional form to be specified in advance, which effectively avoids the risk of misspecification. Research on non-parametric spatial panel data models is still limited. Ref. [7] proved that the estimator of the regression function in non-parametric panel data models is consistent when the variance is known, whereas in most applications the variance is unknown. Ref. [8] proposed a test statistic for the null hypothesis of random effects against fixed effects in non-parametric panel data regression models. These results have not been extended to spatial econometric models, because the spatial and temporal lag terms prevent a direct application of non-parametric estimation methods. This creates difficulties for both computation and theoretical derivation, and it is the problem addressed in this paper.

Following this line of research, we propose a non-parametric spatial dynamic panel data (NSDPD) model with fixed effects, in which the explanatory variables are no longer required to enter through a known linear or nonlinear parametric structure. The model retains the advantage of the parametric SDPD model in handling spatial and/or temporal individual characteristics and spatio-temporal dependence, avoids the specification errors that a fixed functional form may cause, and enhances the flexibility and explanatory power of the SDPD model. Estimation methods for finite-dimensional parametric spatial models cannot be applied directly to non-parametric models, because the model contains an unknown function, which amounts to an infinite-dimensional estimation problem. To overcome this difficulty, we propose a profile maximum likelihood (PML) method, which eliminates the fixed effects through a transformation and thereby avoids the incidental parameter problem in estimation. We then present a rigorous theoretical analysis of the asymptotic properties of the PMLE and verify some of its finite-sample properties through Monte Carlo experiments. Finally, we illustrate the empirical relevance of the model by applying it to examine the impact of tourism dynamics on economic development in the Yangtze River Delta region of China.

2. The Model and Profile Maximum Likelihood Estimators

2.1. The Model

The model considered in this paper is the non-parametric spatial dynamic panel data (NSDPD) model:

Y_{n t} = ρ_{0} W_{n} Y_{n t} + γ_{0} Y_{n, t - 1} + τ_{0} W_{n} Y_{n, t - 1} + G_{n t} (\vec{x}) + α_{n 0} + V_{n t}, t = 1, 2, \dots, T

(1)

where $Y_{nt} = (y_{1t}, y_{2t}, \dots, y_{nt})'$, $\vec{x} = (x_1, x_2, \dots, x_n)'$ and $V_{nt} = (v_{1t}, v_{2t}, \dots, v_{nt})'$ are $n \times 1$ column vectors, and $v_{it}$ is i.i.d. across $i$ and $t$ with zero mean and variance $σ_0^2$; $W_n$ is a predetermined $n \times n$ spatial weights matrix that generates the spatial dependence among the cross-sectional units $y_{it}$; $α_{n0}$ is an $n \times 1$ vector of space-specific effects, which controls for all space-specific, time-invariant variables (omitting these variables would bias the estimates); and $G_{nt}(\vec{x}) = (g(x_1), g(x_2), \dots, g(x_n))'$, where $g(\cdot)$ is an unknown function.

Define $Z_{nt} = (Y_{n,t-1}, W_nY_{n,t-1})$, $A_n(ρ) = I_n - ρW_n$, $θ = (δ', ρ, σ^2)'$ and $ζ = (δ', ρ)'$, where $δ = (γ, τ)'$. At the true values, $A_n = A_n(ρ_0) = I_n - ρ_0W_n$, $θ_0 = (δ_0', ρ_0, σ_0^2)'$ and $ζ_0 = (δ_0', ρ_0)'$, where $δ_0 = (γ_0, τ_0)'$. Assuming that $A_n$ is invertible, (1) can be rewritten as

Y_{nt} = A_n^{-1}(Z_{nt}δ_0 + G_{nt}(\vec{x}) + α_{n0} + V_{nt})

The log-likelihood function of (1) is:

\ln L_{n,T}(θ, α_n) = -\frac{nT}{2}\ln(2πσ^2) + T\ln|A_n(ρ)| - \frac{1}{2σ^2}\sum_{t=1}^{T}V_{nt}'(ζ)V_{nt}(ζ)

(2)

where $V_{nt}(ζ) = A_n(ρ)Y_{nt} - Z_{nt}δ - G_{nt}(\vec{x}) - α_n = (v_1, v_2, \dots, v_n)'$. Thus, $V_{nt} = V_{nt}(ζ_0)$.
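To make the likelihood (2) concrete, the following Python sketch evaluates it at given parameter values. It assumes the data are stored as $(T \times n)$ NumPy arrays whose $t$-th rows are $Y_{nt}'$ and $Y_{n,t-1}'$, and that $G_{nt}(\vec{x})$ is supplied as a time-invariant vector; the function name `loglik_full` is hypothetical. $\ln|A_n(ρ)|$ is read as the log of the absolute determinant and computed with `numpy.linalg.slogdet` for numerical stability.

```python
import numpy as np

def loglik_full(Y, Y_lag, W, G, alpha, gamma, tau, rho, sigma2):
    """Log-likelihood (2) of the NSDPD model at given parameter values.

    Y, Y_lag : (T, n) arrays whose t-th rows are Y_nt' and Y_{n,t-1}'
    W        : (n, n) spatial weights matrix W_n
    G        : (n,) vector (g(x_1), ..., g(x_n))', assumed time-invariant
    alpha    : (n,) vector of fixed effects alpha_n
    """
    T, n = Y.shape
    A = np.eye(n) - rho * W                    # A_n(rho) = I_n - rho W_n
    _, logdet = np.linalg.slogdet(A)           # ln|det A_n(rho)|
    ssr = 0.0
    for t in range(T):
        # V_nt(zeta) = A_n(rho) Y_nt - Z_nt delta - G - alpha_n,
        # with Z_nt delta = gamma Y_{n,t-1} + tau W_n Y_{n,t-1}
        v = A @ Y[t] - gamma * Y_lag[t] - tau * (W @ Y_lag[t]) - G - alpha
        ssr += v @ v
    return -0.5 * n * T * np.log(2 * np.pi * sigma2) + T * logdet - ssr / (2 * sigma2)
```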

The estimators $\hat{θ}_{nT}$ and $\hat{α}_{nT}$ are obtained by maximizing (2). When the $V_{nt}$ are normally distributed, $\hat{θ}_{nT}$ is the MLE; when they are not, $\hat{θ}_{nT}$ is the QMLE. Because the dimension of $α_n$ grows with $n$, it is convenient to concentrate $α_n$ out and analyze the estimator of $θ_0$ through the concentrated log-likelihood function. For the concentrated log-likelihood function, the dimension of the parameter space does not change as $n$ and/or $T$ increase.

For ease of presentation, define $\tilde{Y}_{nt} = Y_{nt} - \bar{Y}_{nT}$ and $\tilde{Y}_{n,t-1} = Y_{n,t-1} - \bar{Y}_{nT,-1}$, $t = 1, 2, \dots, T$, where $\bar{Y}_{nT} = \frac{1}{T}\sum_{t=1}^{T}Y_{nt}$ and $\bar{Y}_{nT,-1} = \frac{1}{T}\sum_{t=1}^{T}Y_{n,t-1}$. Similarly, define $\tilde{V}_{nt} = V_{nt} - \bar{V}_{nT}$, $\tilde{G}_{nt} = G_{nt}(\vec{x}) - \bar{G}_{nT}$ and

\tilde{Z}_{nt} = Z_{nt} - \bar{Z}_{nT} = (\tilde{Y}_{n,t-1}, W_n\tilde{Y}_{n,t-1}) = (Y_{n,t-1} - \bar{Y}_{nT,-1}, W_nY_{n,t-1} - W_n\bar{Y}_{nT,-1})

From the first derivative of Equation (2),

\frac{\partial \ln L_{n, T} (θ, α_{n})}{\partial α_{n}} = \frac{1}{σ^{2}} \sum_{t = 1}^{T} V_{n t} (ζ)

Setting this derivative to zero, for a given $θ$ the concentrated estimator of $α_{n0}$ is

\hat{α}_{nT}(θ) = \frac{1}{T}\sum_{t=1}^{T}[A_n(ρ)Y_{nt} - Z_{nt}δ - G_{nt}(\vec{x})]

Substituting it into Equation (2) yields the concentrated log-likelihood function:

\ln L_{n,T}(θ) = -\frac{nT}{2}\ln(2πσ^2) + T\ln|A_n(ρ)| - \frac{1}{2σ^2}\sum_{t=1}^{T}\tilde{V}_{nt}'(ζ)\tilde{V}_{nt}(ζ)

(3)

where $\tilde{V}_{nt}(ζ) = A_n(ρ)\tilde{Y}_{nt} - \tilde{Z}_{nt}δ - \tilde{G}_{nt}$.

The estimator $\hat{θ}_{nT}$ maximizes the concentrated log-likelihood function (3), and the QMLE of $α_{n0}$ is $\hat{α}_{nT}(\hat{θ}_{nT})$. The first and second derivatives of the concentrated log-likelihood function can be derived from (3).
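The concentration step can be sketched as follows (hypothetical helper names, same $(T \times n)$ array layout as for (2)). The sketch computes $\hat{α}_{nT}(θ)$ and the concentrated log-likelihood (3) from time-demeaned data. It assumes, as the notation $G_{nt}(\vec{x})$ with a time-invariant $\vec{x}$ suggests, that $G_{nt}$ does not vary over $t$, so that $\tilde{G}_{nt} = 0$ in this illustration. Because $V_{nt}(ζ)$ evaluated at $α_n = \hat{α}_{nT}(θ)$ equals $\tilde{V}_{nt}(ζ)$, (3) coincides with (2) evaluated at $\hat{α}_{nT}(θ)$, which gives a simple numerical check.

```python
import numpy as np

def alpha_hat(Y, Y_lag, W, G, gamma, tau, rho):
    """Concentrated estimator of alpha_{n0}: (1/T) sum_t [A_n(rho) Y_nt - Z_nt delta - G]."""
    n = Y.shape[1]
    A = np.eye(n) - rho * W
    # Row t of R is (A Y_nt - gamma Y_{n,t-1} - tau W Y_{n,t-1} - G)'
    R = Y @ A.T - gamma * Y_lag - tau * Y_lag @ W.T - G
    return R.mean(axis=0)

def loglik_conc(Y, Y_lag, W, gamma, tau, rho, sigma2):
    """Concentrated log-likelihood (3) built from time-demeaned data.

    G is assumed time-invariant, so tilde G_nt = 0 and G drops out.
    """
    T, n = Y.shape
    A = np.eye(n) - rho * W
    Yd = Y - Y.mean(axis=0)                        # tilde Y_nt
    Yld = Y_lag - Y_lag.mean(axis=0)               # tilde Y_{n,t-1}
    V = Yd @ A.T - gamma * Yld - tau * Yld @ W.T   # rows are tilde V_nt(zeta)'
    _, logdet = np.linalg.slogdet(A)
    return -0.5 * n * T * np.log(2 * np.pi * sigma2) + T * logdet - (V * V).sum() / (2 * sigma2)
```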

2.2. Profile Maximum Likelihood Estimation

The likelihood function (2) cannot be maximized directly, because $g(x_i)$, $i = 1, \dots, n$, are unknown. To obtain a feasible estimator, we adopt the PML method. First, we treat the parameter $θ = (δ', ρ, σ^2)'$ as known; (1) then becomes a spatial non-parametric model, and an initial estimate $\hat{g}_{IN}$ of $g(\cdot)$ can be obtained by local polynomial estimation. Clearly, $\hat{g}_{IN}$ depends on $θ$. Replacing $g(\cdot)$ in (2) with $\hat{g}_{IN}$ gives a likelihood function of $θ$ alone, and maximizing it yields the estimator $\hat{θ} = (\hat{δ}', \hat{ρ}, \hat{σ}^2)'$ of $θ$. Finally, the estimate $\hat{g}(x_i)$ of $g(\cdot)$ is obtained by replacing $θ$ in $\hat{g}_{IN}$ with $\hat{θ}$.

The specific steps of profile maximum likelihood are as follows:

Step 1:

Treating $θ = (δ', ρ, σ^2)'$ as known, we obtain the initial estimate $\hat{g}_{IN}$ of $g(\cdot)$ by local polynomial estimation.

Denote $Y_t = A_n(ρ)Y_{nt} - Z_{nt}δ - α_n = G_{nt}(\vec{x}) + V_{nt}(ζ)$ and write $Y_t = (y_1, y_2, \dots, y_n)'$. Then (1) can be written as $y_i = g(x_i) + v_i$, $i = 1, \dots, n$. Assuming that $g(\cdot)$ has a continuous derivative of order $p + 1$, the $p$th-order Taylor expansion of $g(x_i)$ at $x_0$ is:

g(x_i) \approx g(x_0) + g'(x_0)(x_i - x_0) + \dots + \frac{1}{p!}g^{(p)}(x_0)(x_i - x_0)^p ≜ β_0 + β_1(x_i - x_0) + \dots + β_p(x_i - x_0)^p

(4)

We can therefore fit a weighted regression to the observations near $x_0$ to estimate $g(x_0)$ and its derivatives, that is, solve the following minimization problem:

\min_{\{β_0, \dots, β_p\}}\sum_{i=1}^{n}\{y_i - \sum_{j=0}^{p}β_j(x_i - x_0)^j\}^2 k_h(x_i - x_0)

(5)

where $k_h(x_i - x_0) = h^{-p}k(\frac{x_i - x_0}{h})$, $k(\cdot)$ is the multivariate kernel function and $h$ is the bandwidth. To simplify the theoretical derivation, all variables share the same bandwidth; the conclusions remain valid when different bandwidths are used.
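A minimal sketch of the weighted least-squares problem (5) for a scalar regressor, in Python with NumPy. The Epanechnikov kernel, the name `local_poly`, and the use of a small fixed degree $p$ (rather than $p = n - 1$) are illustrative assumptions; for a scalar $x$ the kernel normalization is $1/h$. The returned $\hat{β}_0$ estimates $g(x_0)$ and $j!\,\hat{β}_j$ estimates $g^{(j)}(x_0)$.

```python
import numpy as np

def epanechnikov(u):
    """Univariate Epanechnikov kernel on [-1, 1] (illustrative choice)."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def local_poly(x, y, x0, h, p=1, kernel=epanechnikov):
    """Weighted least squares (5) at x0 for a scalar regressor.

    Returns (beta_0, ..., beta_p): beta_0 estimates g(x0) and
    j! * beta_j estimates the j-th derivative g^{(j)}(x0).
    """
    X = np.vander(x - x0, N=p + 1, increasing=True)  # columns 1, (x - x0), ..., (x - x0)^p
    w = kernel((x - x0) / h) / h                     # k_h(x_i - x0)
    XtW = X.T * w                                    # X' kappa(x0), without forming the diagonal matrix
    return np.linalg.solve(XtW @ X, XtW @ y)
```

For example, with noise-free data from $g(x) = 1 + 2x + 3x^2$ and $p = 2$, the fit at $x_0 = 0.1$ returns $(1.23, 2.6, 3)$, i.e. $g(x_0)$, $g'(x_0)$ and $g''(x_0)/2$, because a local polynomial of degree $p$ reproduces polynomials of degree at most $p$.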

For convenience in the matrix operations that follow, we set $p = n - 1$ and denote

X = \begin{pmatrix} 1 & x_1 - x_0 & \cdots & (x_1 - x_0)^{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_n - x_0 & \cdots & (x_n - x_0)^{n-1} \end{pmatrix}, \quad β = (β_0, \dots, β_{n-1})'

and $κ(x_0) = \mathrm{diag}(k_h(x_1 - x_0), \dots, k_h(x_n - x_0))$, so that $X$ is an $n \times n$ square matrix. The objective function in (5) can then be rewritten as:

(Y_t - Xβ)'κ(x_0)(Y_t - Xβ)

Minimizing it gives the estimator of $β$:

\hat{β} = (X'κ(x_0)X)^{-1}X'κ(x_0)Y_t

The first component of $\hat{β}$ is an estimate of $g(x_0)$. Writing $x$ for a generic evaluation point $x_0$, denote $S(x) = (X'κ(x)X)^{-1}X'κ(x)$ and $s(x) = e_1S(x)$. The initial estimate of $g(x)$ is:

\hat{g}_{IN}(x) = e_1\hat{β} = e_1S(x)Y_t = s(x)Y_t

(6)

where $e_1 = (1, 0, \dots, 0)$. The initial estimate of $G_{nt}(\vec{x})$ is then:

\hat{G}_{IN,t} = (\hat{g}_{IN}(x_1), \hat{g}_{IN}(x_2), \dots, \hat{g}_{IN}(x_n))' = (s(x_1)Y_t, s(x_2)Y_t, \dots, s(x_n)Y_t)' = (s(x_1)', s(x_2)', \dots, s(x_n)')'Y_t = SY_t

(7)

where $S = (s(x_1)', s(x_2)', \dots, s(x_n)')'$.
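The rows $s(x_i)$ can be stacked into the smoother matrix $S$ of (7), so that $\hat{G}_{IN,t} = SY_t$. The sketch below assumes a scalar regressor, a Gaussian kernel and a small degree $p$; `smoother_matrix` is a hypothetical name. A useful sanity check is that, for $p \ge 1$, each row of $S$ sums to one and $S$ reproduces linear functions of $x$ exactly.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def smoother_matrix(x, h, p=1, kernel=gaussian_kernel):
    """Stack s(x_i) = e_1 (X' kappa X)^{-1} X' kappa, evaluated at x0 = x_i, into S."""
    n = len(x)
    S = np.empty((n, n))
    for i in range(n):
        X = np.vander(x - x[i], N=p + 1, increasing=True)
        w = kernel((x - x[i]) / h) / h
        XtW = X.T * w
        # solve returns (X' kappa X)^{-1} X' kappa with shape (p + 1, n); row 0 is s(x_i)
        S[i] = np.linalg.solve(XtW @ X, XtW)[0]
    return S
```

Given the working responses stacked as a $(T \times n)$ array `Yw` whose rows are $Y_t'$, the initial estimates for all periods are `Yw @ S.T`.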

Step 2:

Substituting $\hat{G}_{IN,t}$ for $G_{nt}(\vec{x})$ in the concentrated log-likelihood (3), we obtain the approximate log-likelihood function:

\ln L_{n,T}(θ) = -\frac{nT}{2}\ln(2πσ^2) + T\ln|A_n(ρ)| - \frac{1}{2σ^2}\sum_{t=1}^{T}\tilde{V}_{nt}'(ζ)\tilde{V}_{nt}(ζ) = -\frac{nT}{2}\ln(2πσ^2) + T\ln|A_n(ρ)| - \frac{1}{2σ^2}\sum_{t=1}^{T}[(I_n - S)Y_t]'[(I_n - S)Y_t]

(8)

where $\tilde{V}_{nt}(ζ) = A_n(ρ)\tilde{Y}_{nt} - \tilde{Z}_{nt}δ - \tilde{G}_{IN,t}$, $\tilde{G}_{IN,t} = \hat{G}_{IN,t} - \bar{G}_{IN,T}$ and $\bar{G}_{IN,T} = \frac{1}{T}\sum_{t=1}^{T}\hat{G}_{IN,t}$.

The estimate $\hat{θ}_{nT}$ of $θ$ maximizes the above function, i.e.:

\hat{θ}_{nT} = \arg\max_{θ}\frac{1}{nT}\ln L_{n,T}(θ)

Computationally and analytically, it is convenient to work with the concentrated log-likelihood obtained by concentrating out $σ^2$. From the log-likelihood function, the QMLE of $σ^2$ is:

\hat{σ}^2(ζ) = \frac{1}{nT}\sum_{t=1}^{T}\tilde{V}_{nt}'(ζ)\tilde{V}_{nt}(ζ) = \frac{1}{nT}\sum_{t=1}^{T}[(I_n - S)Y_t]'[(I_n - S)Y_t]

(9)

The concentrated log-likelihood function of $ζ$ is:

\ln L_{n,T}(ζ) = -\frac{nT}{2}(\ln(2π) + 1) - \frac{nT}{2}\ln\hat{σ}^2(ζ) + T\ln|A_n(ρ)|

(10)
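The concentration over $σ^2$ in (8)–(10) can be checked numerically: for any residual array whose rows are $[(I_n - S)Y_t]'$ and any value of $\ln|A_n(ρ)|$, the concentrated log-likelihood (10) equals (8) evaluated at $σ^2 = \hat{σ}^2(ζ)$, and (8) at any other $σ^2$ is smaller. The helper names below are hypothetical, and the residuals are treated as given.

```python
import numpy as np

def sigma2_hat(resid):
    """QMLE (9): resid is a (T, n) array whose t-th row is [(I_n - S) Y_t]'."""
    T, n = resid.shape
    return float((resid * resid).sum() / (n * T))

def loglik_theta(resid, logdet_A, sigma2):
    """Approximate log-likelihood (8) for a given sigma^2 and ln|A_n(rho)|."""
    T, n = resid.shape
    ssr = (resid * resid).sum()
    return -0.5 * n * T * np.log(2 * np.pi * sigma2) + T * logdet_A - ssr / (2 * sigma2)

def loglik_zeta(resid, logdet_A):
    """Concentrated log-likelihood (10): sigma^2 replaced by its QMLE."""
    T, n = resid.shape
    s2 = sigma2_hat(resid)
    return -0.5 * n * T * (np.log(2 * np.pi) + 1) - 0.5 * n * T * np.log(s2) + T * logdet_A
```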

Step 3:

Replacing the parameters in the model with $\hat{θ} = (\hat{δ}', \hat{ρ}, \hat{σ}^2)'$ obtained in Step 2, we obtain the final estimate of the non-parametric component $g(x_i)$:

\hat{g}(x) = s(x)Y_t = s(x)[A_n(\hat{ρ})Y_{nt} - Z_{nt}\hat{δ} - α_n]
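A sketch of Step 3: plug the estimates into the working response and smooth it at a set of evaluation points. It assumes a scalar regressor, a local linear fit with a Gaussian kernel (the kernel's normalizing constant cancels in $s(x)$), and that an estimate of $α_n$ is supplied by the caller, for example $\hat{α}_{nT}(\hat{θ})$; `final_g` is a hypothetical name. It returns one smoothed curve per period $t$.

```python
import numpy as np

def final_g(x, grid, Y, Y_lag, W, rho_hat, gamma_hat, tau_hat, alpha_hat, h, p=1):
    """Step 3 sketch: g_hat(x0) = s(x0) Y_hat_t for every period t and grid point x0.

    Returns a (T, M) array: row t holds s(x_m) Y_hat_t for the M grid points.
    """
    n = len(x)
    A = np.eye(n) - rho_hat * W
    # Row t: Y_hat_t' with Y_hat_t = A_n(rho_hat) Y_nt - Z_nt delta_hat - alpha_hat
    Yhat = Y @ A.T - gamma_hat * Y_lag - tau_hat * Y_lag @ W.T - alpha_hat
    out = np.empty((Y.shape[0], len(grid)))
    for m, x0 in enumerate(grid):
        X = np.vander(x - x0, N=p + 1, increasing=True)
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # Gaussian kernel weights (constant factor cancels)
        XtW = X.T * w
        s = np.linalg.solve(XtW @ X, XtW)[0]      # s(x0) = e_1 S(x0)
        out[:, m] = Yhat @ s
    return out
```

How the $T$ per-period curves are combined (for instance by averaging over $t$) is left open here, since the text states the estimator for a single $Y_t$.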

3. Profile Likelihood Estimators and Their Asymptotic Properties

For our analysis of the asymptotic properties of the estimators, we need the following assumptions:

Assumption 1.

$W_n$ is a constant spatial weights matrix, and its diagonal elements satisfy $w_{n,ii} = 0$ for $i = 1, 2, \dots, n$. Also, $W_n$ is uniformly bounded in row and column sums in absolute value (UB for short).

Assumption 2.

The disturbances $\{v_{it}\}$, $i = 1, 2, \dots, n$ and $t = 1, 2, \dots, T$, are i.i.d. across $i$ and $t$ with zero mean, variance $σ_0^2$ and $E|v_{it}|^{4+η} < \infty$ for some $η > 0$.

Assumption 3.

$A_n(ρ)$ is invertible for all $ρ \in Λ$. Furthermore, $Λ$ is compact and $ρ_0$ is in the interior of $Λ$. Also, $A_n^{-1}(ρ)$ is UB, uniformly in $ρ \in Λ$.

Assumption 4.

$\{x_i\}_{i=1}^{n}$ is an i.i.d. sequence, which is treated as nonstochastic and is bounded uniformly in $n$ and $T$. The $x_i$, $i = 1, \dots, n$, have a twice continuously differentiable probability density function $f(\cdot)$ with $0 < f(x) < \infty$ for any $x$ on the support.

Assumption 5.

$g(\cdot)$ has continuous derivatives $g^{(p)}(\cdot)$, $p = 1, \dots, n$, and $|g(\cdot)| \leq \bar{ϖ}_g$, where $\bar{ϖ}_g$ is a positive constant.

Assumption 6.

As $n \to \infty$, $h \to 0$ and $nh^p \to \infty$, $p = 1, \dots, n$.

Assumption 7.

The kernel function $k(\cdot)$ is a bounded, continuous, non-negative function whose support is the compact set $\prod_{1}^{p}[-\bar{ϖ}_k, \bar{ϖ}_k] \subset \mathbb{R}^p$, where $\bar{ϖ}_k > 0$ is a constant. In addition, $X$, $κ(x)$ and $(X'κ(x)X)^{-1}$ are UB.

Assumption 8.

$k(\cdot)$ is a compactly supported bounded kernel such that $\int mm'k(m)\,dm = m_2I_n$, where $m_2 \neq 0$ is a scalar. Furthermore, all odd moments of $k(\cdot)$ vanish, namely $\int m_1^{l_1}\cdots m_d^{l_d}k(m)\,dm = 0$ for all non-negative integers $l_1, \dots, l_d$ whose sum is odd. (The last condition is satisfied by spherically symmetric kernels and by product kernels built from a symmetric univariate kernel function.)

Assumption 1 is a standard normalization assumption in spatial econometrics, and Assumption 2 provides regularity conditions for $v_{it}$. The invertibility of $A_n(ρ)$ and the compactness of $Λ$ in Assumption 3 originate from Kelejian and Prucha (1998, 2001). When exogenous variables $x_i$ are included in the model, it is convenient to assume that they are uniformly bounded, as in Assumption 4. Assumption 5 is a necessary condition for (3). Assumptions 6–8 are standard conditions for kernel estimation. The bandwidth $h$ is an important parameter that affects the kernel estimate. Kernel functions satisfying Assumptions 7 and 8 exist, for example the product kernel $k(m) = \prod_{i=1}^{p}k(m_i)$, where $k(m_i)$ is a symmetric univariate kernel on the closed interval $[-\bar{ϖ}_k, \bar{ϖ}_k]$.
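The moment conditions of Assumption 8 can be checked numerically for a product kernel. The sketch below uses the product of two univariate Epanechnikov kernels on $[-1, 1]^2$ (an illustrative choice, with $\bar{ϖ}_k = 1$ and a two-dimensional argument) and a midpoint-rule grid; the function name is hypothetical. The total mass should be close to 1, $\int mm'k(m)\,dm$ close to $m_2$ times the identity with $m_2 = 1/5$, and an odd moment such as $\int m_1^2m_2k(m)\,dm$ close to 0.

```python
import numpy as np

def epanechnikov(u):
    """Univariate Epanechnikov kernel on [-1, 1]."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def product_kernel_moments(N=400):
    """Midpoint-rule moments of the product kernel k(m) = k(m_1) k(m_2) on [-1, 1]^2."""
    du = 2.0 / N
    u = -1 + (np.arange(N) + 0.5) * du            # symmetric midpoint grid
    M1, M2 = np.meshgrid(u, u, indexing="ij")
    K = epanechnikov(M1) * epanechnikov(M2)       # product kernel values
    cell = du * du
    mass = float(K.sum() * cell)                  # integral of k, close to 1
    second = np.array([[(M1 * M1 * K).sum(), (M1 * M2 * K).sum()],
                       [(M2 * M1 * K).sum(), (M2 * M2 * K).sum()]]) * cell
    odd = float((M1 ** 2 * M2 * K).sum() * cell)  # l_1 + l_2 = 3 is odd
    return mass, second, odd
```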

For the log-likelihood function (8) divided by the sample size $nT$, the corresponding expected value function is $Q_{n,T}(θ) = \max_{σ^2}E[\frac{1}{nT}\ln L_{n,T}(θ)]$, which is:

\begin{array}{l} Q_{n,T}(θ) = \frac{1}{nT}E[\ln L_{n,T}(θ)] \\ = -\frac{1}{2}(\ln(2π) + 1) + \frac{1}{n}\ln|A_n(ρ)| - \frac{1}{2σ^2}E\frac{1}{nT}\sum_{t=1}^{T}\tilde{V}_{nt}'(ζ)\tilde{V}_{nt}(ζ) \end{array}

(11)

To show the consistency of $\hat{θ}_{nT}$, we need the following uniform convergence result.

Lemma 1.

Under Assumptions 1–6, for an $n \times n$ nonstochastic UB matrix $B_n$:

\frac{1}{nT}\sum_{t=1}^{T}\tilde{V}_{nt}'B_n\tilde{V}_{nt} - E(\frac{1}{nT}\sum_{t=1}^{T}\tilde{V}_{nt}'B_n\tilde{V}_{nt}) = O_p(\frac{1}{\sqrt{nT}})

\frac{1}{nT}\sum_{t=1}^{T}\tilde{Z}_{nt}'B_n\tilde{Z}_{nt} - E(\frac{1}{nT}\sum_{t=1}^{T}\tilde{Z}_{nt}'B_n\tilde{Z}_{nt}) = O_p(\frac{1}{\sqrt{nT}})

\frac{1}{nT}\sum_{t=1}^{T}\tilde{Z}_{nt}'B_n\tilde{V}_{nt} - E(\frac{1}{nT}\sum_{t=1}^{T}\tilde{Z}_{nt}'B_n\tilde{V}_{nt}) = O_p(\frac{1}{\sqrt{nT}})

\frac{1}{nT}\sum_{t=1}^{T}\tilde{G}_{IN,t}'B_n\tilde{G}_{IN,t} - E(\frac{1}{nT}\sum_{t=1}^{T}\tilde{G}_{IN,t}'B_n\tilde{G}_{IN,t}) = O_p(\frac{1}{\sqrt{nT}})

\frac{1}{nT}\sum_{t=1}^{T}\tilde{G}_{IN,t}'B_n\tilde{V}_{nt} - E(\frac{1}{nT}\sum_{t=1}^{T}\tilde{G}_{IN,t}'B_n\tilde{V}_{nt}) = O_p(\frac{1}{\sqrt{nT}})

(12)

where

E(\frac{1}{nT}\sum_{t=1}^{T}\tilde{V}_{nt}'B_n\tilde{V}_{nt}) = \frac{1}{n}σ_0^2\,\mathrm{tr}(B_n) = O(1), \quad E(\frac{1}{nT}\sum_{t=1}^{T}\tilde{Z}_{nt}'B_n\tilde{Z}_{nt}) = O(1), \quad E(\frac{1}{nT}\sum_{t=1}^{T}\tilde{Z}_{nt}'B_n\tilde{V}_{nt}) = O(\frac{1}{T})

(these three results are from Ref. [2], Lemma 15), and

E(\frac{1}{nT}\sum_{t=1}^{T}\tilde{G}_{IN,t}'B_n\tilde{G}_{IN,t}) = O_p(\frac{1}{\sqrt{nT}}), \quad E(\frac{1}{nT}\sum_{t=1}^{T}\tilde{G}_{IN,t}'B_n\tilde{V}_{nt}) = O_p(\frac{1}{\sqrt{nT}})

Lemma 2.

Under Assumptions 1–6, for an $n \times n$ nonstochastic UB matrix $B_n$:

\frac{1}{nT}\sum_{t=1}^{T}\bar{V}_{nt}'B_n\bar{V}_{nt} - E(\frac{1}{nT}\sum_{t=1}^{T}\bar{V}_{nt}'B_n\bar{V}_{nt}) = O_p(\frac{1}{\sqrt{nT}})

\frac{1}{nT}\sum_{t=1}^{T}\bar{Z}_{nt}'B_n\bar{Z}_{nt} - E(\frac{1}{nT}\sum_{t=1}^{T}\bar{Z}_{nt}'B_n\bar{Z}_{nt}) = O_p(\frac{1}{\sqrt{nT}})

\frac{1}{nT}\sum_{t=1}^{T}\bar{Z}_{nt}'B_n\bar{V}_{nt} - E(\frac{1}{nT}\sum_{t=1}^{T}\bar{Z}_{nt}'B_n\bar{V}_{nt}) = O_p(\frac{1}{\sqrt{nT^2}})

\frac{1}{nT}\sum_{t=1}^{T}\bar{G}_{IN,t}'B_n\bar{G}_{IN,t} - E(\frac{1}{nT}\sum_{t=1}^{T}\bar{G}_{IN,t}'B_n\bar{G}_{IN,t}) = O_p(\frac{1}{\sqrt{nT}})

\frac{1}{nT}\sum_{t=1}^{T}\bar{G}_{IN,t}'B_n\bar{V}_{nt} - E(\frac{1}{nT}\sum_{t=1}^{T}\bar{G}_{IN,t}'B_n\bar{V}_{nt}) = O_p(\frac{1}{\sqrt{nT}})

(13)

where

E(\frac{1}{nT}\sum_{t=1}^{T}\bar{V}_{nt}'B_n\bar{V}_{nt}) = O(\frac{1}{T}), \quad E(\frac{1}{nT}\sum_{t=1}^{T}\bar{Z}_{nt}'B_n\bar{Z}_{nt}) = O(1), \quad E(\frac{1}{nT}\sum_{t=1}^{T}\bar{Z}_{nt}'B_n\bar{V}_{nt}) = O(\frac{1}{T})

(these three results are from Ref. [2], Lemma 16), and

E(\frac{1}{nT}\sum_{t=1}^{T}\bar{G}_{nt}'B_n\bar{G}_{nt}) = O(\frac{1}{T}), \quad E(\frac{1}{nT}\sum_{t=1}^{T}\bar{G}_{nt}'B_n\bar{V}_{nt}) = O(\frac{1}{T})

The consistency of $\hat{θ}_{nT}$ follows from the uniform convergence of $\frac{1}{nT}\ln L_{n,T}(θ) - Q_{n,T}(θ)$ to zero on $Θ$, together with the unique identification condition (White (1994), Theorem 3.4). The behavior of each component of $\frac{1}{nT}\ln L_{n,T}(θ) - Q_{n,T}(θ)$ is given by Lemmas 1 and 2, from which the following results follow:

Lemma 3.

Let $Θ$ be any compact parameter space. Then, under Assumptions 1–7, $\frac{1}{nT}\ln L_{n,T}(θ) - Q_{n,T}(θ) \overset{P}{\to} 0$ uniformly in $θ \in Θ$.

Lemma 4.

Let $Θ$ be any compact parameter space. Then, under Assumptions 1–7, $Q_{n,T}(θ)$ is uniformly equicontinuous in $θ \in Θ$.

Before obtaining the information matrix, we compute the first and second derivatives of the log-likelihood function. The asymptotic distribution of the QMLE $\hat{θ}_{nT}$ can be derived from the Taylor expansion of $\frac{\partial\ln L_{n,T}(\hat{θ}_{nT})}{\partial θ}$ around $θ_0$. The first-order derivative of the concentrated likelihood function involves both linear and quadratic functions of $V_{nt}$:

\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{n,T}(θ_0)}{\partial θ} = \frac{1}{\sqrt{nT}}\left(\frac{\partial\ln L_{n,T}(θ_0)}{\partial δ'}, \frac{\partial\ln L_{n,T}(θ_0)}{\partial ρ}, \frac{\partial\ln L_{n,T}(θ_0)}{\partial σ^2}\right)' = \frac{1}{\sqrt{nT}}\begin{pmatrix} \frac{1}{σ^2}\sum_{t=1}^{T}\tilde{Z}_{ns}'\tilde{V}_{nt}(ζ) \\ \frac{1}{σ^2}\sum_{t=1}^{T}\tilde{Y}_{ns}'\tilde{V}_{nt}(ζ) - T\,\mathrm{tr}(F_n) \\ \frac{1}{2σ^4}(\sum_{t=1}^{T}\tilde{V}_{nt}'(ζ)\tilde{V}_{nt}(ζ) - nTσ^2) \end{pmatrix}

(14)

where $\tilde{Z}_{ns} = (I_n - S)\tilde{Z}_{nt}$ and $\tilde{Y}_{ns} = (I_n - S)W_n\tilde{Y}_{nt}$. The second-order derivatives are:

\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(θ)}{\partial θ\partial θ'} = -\frac{1}{nT}\begin{pmatrix} \frac{1}{σ^2}\sum_{t=1}^{T}\tilde{Z}_{ns}'\tilde{Z}_{ns} & \frac{1}{σ^2}\sum_{t=1}^{T}\tilde{Z}_{ns}'\tilde{Y}_{ns} & \frac{1}{σ^4}\sum_{t=1}^{T}\tilde{Z}_{ns}'\tilde{V}_{nt}(ζ) \\ * & \frac{1}{σ^2}\sum_{t=1}^{T}\tilde{Y}_{ns}'\tilde{Y}_{ns} - T\,\mathrm{tr}(F_n^2) & \frac{1}{σ^4}\sum_{t=1}^{T}\tilde{Y}_{ns}'\tilde{V}_{nt}(ζ) \\ * & * & -\frac{nT}{2σ^4} + \frac{1}{σ^6}\sum_{t=1}^{T}\tilde{V}_{nt}'(ζ)\tilde{V}_{nt}(ζ) \end{pmatrix}

(15)

The information matrix is:

Σ_{θ_{0}, n T} = - E (\frac{1}{n T} \frac{\partial^{2} \ln L_{n, T} (θ_{0})}{\partial θ \partial θ'}) = Σ_{θ_{0}, n T}^{(1)} - Σ_{θ_{0}, n T}^{(2)}

(16)

Σ_{θ_0,nT}^{(1)} = \begin{pmatrix} \frac{1}{nTσ_0^2}E\sum_{t=1}^{T}\tilde{Z}_{ns}'\tilde{Z}_{ns} & \frac{1}{nTσ_0^2}E\sum_{t=1}^{T}\tilde{Z}_{ns}'\tilde{R}_{ns} & 0 \\ * & \frac{1}{nTσ_0^2}E\sum_{t=1}^{T}\tilde{R}_{ns}'\tilde{R}_{ns} + \frac{1}{n}[\mathrm{tr}(F_n^2) + \mathrm{tr}(F_{ns}'F_{ns})] & \frac{1}{nσ_0^2}\mathrm{tr}(F_{ns}) \\ * & * & \frac{1}{2σ_0^4} + \frac{1}{nσ_0^4} \end{pmatrix}

Σ_{θ_0,nT}^{(2)} = \begin{pmatrix} 0 & \frac{1}{nσ_0^2}E(\bar{Z}_{ns}'F_{ns}\bar{V}_{nT}) & \frac{1}{nσ_0^4}E(\bar{Z}_{ns}'\bar{V}_{nT}) \\ * & \frac{1}{nσ_0^2}E(\bar{R}_{ns}'\bar{V}_{nT}) + \frac{1}{nT}\mathrm{tr}(F_{ns}'F_{ns}) & \frac{1}{nσ_0^4}E(\bar{R}_{ns}'\bar{V}_{nT}) + \frac{1}{nTσ_0^2}\mathrm{tr}(F_{ns}) \\ * & * & \frac{1}{nTσ_0^4} \end{pmatrix}

where $F_{ns} = (I_n - S)F_n$, $\tilde{R}_{ns} = F_{ns}(\tilde{Z}_{nt}δ_0 + \tilde{G}_{IN})$ and $\bar{R}_{ns} = F_{ns}(\bar{Z}_{nT}δ_0 + \hat{G}_{IN,T})$.

From Lemma 2, $Σ_{θ_0,nT}^{(2)} = O(\frac{1}{T})$, so that $Σ_{θ_0,nT} = Σ_{θ_0,nT}^{(1)} + O(\frac{1}{T})$.

Assumption 9.

\lim_{n\to\infty}\frac{1}{n}[\mathrm{tr}(F_n^2) + \mathrm{tr}(F_{ns}'F_{ns}) - \frac{2}{n}(\mathrm{tr}F_{ns})^2] \neq 0

Assumption 9 is an important condition for the non-singularity of the limiting information matrix $Σ_{θ_0,nT}$ (Lemma 5), in addition to its role in the global identification in Theorem 1.

Lemma 5.

The information matrix $Σ_{θ_0,nT}$ is non-singular.

The proofs of Lemmas 3–5 are given in Appendix A. Building on Lemmas 3 and 4, Theorem 1 establishes the consistency of $\hat{θ}_{nT}$ when Assumption 9 holds, while Theorem 2 establishes its consistency when Assumption 9 is not satisfied.

Theorem 1.

Under Assumptions 1–9, $θ_0$ is globally identifiable and $\hat{θ}_{nT}$ is a consistent estimator of $θ_0$ (similar to Ref. [2]).

Theorem 2.

Under Assumptions 1–8, $θ_0$ is globally identifiable and $\hat{θ}_{nT} \overset{P}{\to} θ_0$ if

\lim_{n\to\infty}\left(\frac{1}{n}\ln|σ_0^2A_n^{-1}A_n'^{-1}| - \frac{1}{n}\ln|σ_n^2(ρ)A_n^{-1}(ρ)A_n'^{-1}(ρ)|\right) \neq 0

for $ρ \neq ρ_0$ (similar to Ref. [2]).

Lemma 6.

Under Assumptions 1–9, if $p$ is odd:

E[\hat{g}_{IN}(x)|\{x_i\}_{i=1}^{n}] - g(x) = h^{p+1}c_1[\frac{g^{(p+1)}(x)}{(p+1)!}] + o(h^{p+1})

If $p$ is even:

E[\hat{g}_{IN}(x)|\{x_i\}_{i=1}^{n}] - g(x) = h^{p+2}\{c_2[\frac{g^{(p+1)}(x)f^{(1)}(x)}{f(x)(p+1)!}] + c_3[\frac{g^{(p+2)}(x)}{(p+2)!}]\} + o(h^{p+2})

In either case, the variance is:

\mathrm{Var}[\hat{g}_{IN}(x)|\{x_i\}_{i=1}^{n}] = \frac{c_4σ^2(x)}{nhf(x)} + o(\frac{1}{nh})

where $c_i$, $i = 1, 2, 3, 4$, are constants defined in Ruppert and Wand (1994).

Lemma 6 shows that the leading term of the conditional bias depends on whether $p$ is odd or even. By the Taylor expansion, when $|x_i - x| < h$ the remainder of a $p$th-order polynomial expansion is of order $O(h^{p+1})$, so the result for odd $p$ is easy to understand. When $p$ is even, $p + 1$ is odd, so the $h^{p+1}$ term is associated with $\int m^l k(m)\,dm$ for odd $l$. Because $k(m)$ is an even function, $\int m^l k(m)\,dm = 0$. Therefore there is no $h^{p+1}$ term, and the leading bias term is of order $h^{p+2}$. Consequently, whether $p$ is odd or even, the leading bias term is an even power of $h$. This is similar to local constant estimation with higher-order kernels built from symmetric (even) kernel functions, where the bias is always an even power of $h$. In summary, if $p$ is odd, $\hat{g}_{IN}(x) - g(x) = O_p(h^{p+1} + (nh)^{-1/2})$, and if $p$ is even, $\hat{g}_{IN}(x) - g(x) = O_p(h^{p+2} + (nh)^{-1/2})$.
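As a worked illustration of these rates (standard bias-variance balancing, not part of the original derivation), equating the squared bias and the variance gives the order of the bandwidth that minimizes the mean squared error. Here $C_1, C_2 > 0$ denote generic constants collecting the bias and variance factors of Lemma 6.

```latex
\begin{aligned}
&\text{odd } p: && \mathrm{MSE}(h) \approx C_1 h^{2(p+1)} + \frac{C_2}{nh},\\
& && 2(p+1)C_1 h^{2p+1} - \frac{C_2}{nh^{2}} = 0
\;\Longrightarrow\; h^{*} = \Big(\frac{C_2}{2(p+1)C_1\,n}\Big)^{1/(2p+3)} \asymp n^{-1/(2p+3)},\\
& && \hat{g}_{IN}(x) - g(x) = O_p\big(n^{-(p+1)/(2p+3)}\big);\\
&\text{even } p: && h^{*} \asymp n^{-1/(2p+5)}, \qquad \hat{g}_{IN}(x) - g(x) = O_p\big(n^{-(p+2)/(2p+5)}\big).
\end{aligned}
```

For $p = 1$ (local linear) this gives $h \asymp n^{-1/5}$ and the rate $n^{-2/5}$, the order used by common rule-of-thumb bandwidths.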

Theorem 3.

Under Assumptions 1–9, $\hat{G}_{IN,t} \overset{P}{\to} G_{nt}$.

The proofs of Theorems 1–3 are given in Appendix B. Theorems 1 and 2 show that the PMLEs of the parameters $θ = (δ', ρ, σ^2)'$ are consistent, and Theorem 3 shows that the PMLE $\hat{g}_{IN}(x)$ of the unknown function $g(x)$ is also consistent.

4. Monte Carlo Simulations: Methods and Results

In this section, all experiments were implemented in R, and the figures were plotted with the ‘ggplot2’ package.

For the parametric part, we generated samples from (1) with $θ_a = (-0.3, 0.4, 1, -0.2)'$ and $θ_b = (0.25, -0.6, 0.5, 0.1)'$, where $θ_0 = (γ_0, ρ_0, σ_0^2, τ_0)'$. The regressors $\vec{x} = (x_1, x_2, \dots, x_n)'$ and the disturbances $V_{nt} = (v_{1t}, v_{2t}, \dots, v_{nt})'$ were generated from the uniform distribution $U[-3, 3]$ and the independent normal distribution $N(0, σ_0^2)$, respectively. The spatial weights matrix $W_n$ was the Rook matrix, one of the main types of spatial weights matrices in spatial econometrics. For the non-parametric part, we used the common Gaussian kernel function, and the unknown function was $g(x) = 1.5e^{-x^2}$. As it is difficult to select the optimal bandwidth, we simply used the rule-of-thumb method. The space-specific effects $α_{n0}$, which control for all space-specific, time-invariant variables, were drawn from the standard normal distribution. Finally, we used the sample sizes $n = 10, 49, 100$ and the numbers of periods $T = 4, 10$. For each combination of $n$ and $T$, the sample observations were generated with the Metropolis–Hastings sampling algorithm.
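The data-generating design described above can be sketched as follows. This is not the authors' code: it simply iterates the reduced form $Y_{nt} = A_n^{-1}(Z_{nt}δ_0 + G_{nt}(\vec{x}) + α_{n0} + V_{nt})$ from a zero initial value and discards a burn-in period, rather than using the Metropolis–Hastings scheme mentioned in the text. It assumes a row-normalized rook contiguity matrix on a square grid, so $n$ must be a perfect square (e.g. 49 or 100); the function names and the burn-in length are hypothetical.

```python
import numpy as np

def rook_weights(side):
    """Row-normalized rook contiguity matrix for a side x side regular grid (n = side**2)."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[i, rr * side + cc] = 1.0          # shared edge => neighbours
    return W / W.sum(axis=1, keepdims=True)            # zero diagonal, rows sum to 1

def simulate_nsdpd(side, T, gamma, rho, sigma2, tau, burn=50, seed=0):
    """Generate Y_{n0}, ..., Y_{nT} from (1) by iterating the reduced form."""
    rng = np.random.default_rng(seed)
    W = rook_weights(side)
    n = W.shape[0]
    x = rng.uniform(-3, 3, n)                  # x_i ~ U[-3, 3]
    g = 1.5 * np.exp(-x ** 2)                  # g(x) = 1.5 exp(-x^2)
    alpha = rng.standard_normal(n)             # fixed effects ~ N(0, 1)
    A_inv = np.linalg.inv(np.eye(n) - rho * W)
    y = np.zeros(n)
    path = []
    for t in range(burn + T + 1):
        v = rng.normal(0.0, np.sqrt(sigma2), n)
        y = A_inv @ (gamma * y + tau * (W @ y) + g + alpha + v)
        if t >= burn:
            path.append(y)
    return x, W, np.array(path)                # rows: Y_{n0}, ..., Y_{nT}
```

For example, `simulate_nsdpd(7, 10, -0.3, 0.4, 1.0, -0.2)` generates a design with $n = 49$, $T = 10$ and $θ_a$ in the order $(γ_0, ρ_0, σ_0^2, τ_0)$.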

The evaluation of the simulation results is likewise divided into parametric and non-parametric parts. In the parametric part, for each estimator we calculate the standard deviation (Std) and the root mean squared error (RMSE), where

RMSE = {[\frac{1}{\mathrm{reps}} \sum_{i = 1}^{\mathrm{reps}} {({\hat{θ}}_{i} - θ_{0})}^{2}]}^{1 / 2},

$\mathrm{reps}$ is the number of simulations, and ${\hat{θ}}_{i}$, $i = 1, \dots, \mathrm{reps}$, are the parameter estimates obtained in each simulation. Following Su (2012), when estimating the parameters we took the window width

h = \mathrm{std} (x) \cdot n^{- 1 / 5},

where $\mathrm{std} (x)$ is the standard deviation of the sequence $x$. The negative exponent ensures that $h \to 0$ as $n \to \infty$. In the non-parametric part, following Chen (2012), we used the mean absolute deviation error (MADE) as the evaluation criterion:

\mathrm{MADE}_{j} = \frac{1}{M} \sum_{m = 1}^{M} |{\hat{g}}_{j} (x_{m}) - g (x_{m})|, \quad j = 1, \dots, \mathrm{reps},

where ${\{x_{m}\}}_{m = 1}^{M}$ are $M$ fixed grid points selected within the support of $x$. We selected $M = 20$ fixed grid points in $(- 2, 2)$. When estimating the non-parametric part, we selected the window width by leave-one-out cross-validation, that is, the window width minimizes

\frac{1}{n T} \sum_{t = 1}^{T} \sum_{i = 1}^{n} {({\hat{y}}_{i t} - {\hat{g}}_{- i} (x_{i}))}^{2},

where ${\hat{y}}_{i t}$ is the $i$th element of ${\hat{Y}}_{t} = A_{n} (\hat{ρ}) Y_{n t} - Z_{n t} \hat{δ}$ evaluated at the estimate $\hat{θ}$, and ${\hat{g}}_{- i} (x_{i})$ is the estimate of $g (x_{i})$ obtained from all observations except the $i$th.
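The two non-parametric ingredients described above can be sketched in plain Python: leave-one-out cross-validation for the window width, and the MADE criterion. This is a minimal illustration, not the authors' implementation, and it makes the following assumptions:

- The smoother is a local linear fit ($p = 1$) with the Gaussian kernel.
- The pairs $(x_{i}, \hat{y}_{it})$ are pooled, and one pair at a time is left out.
- The $M = 20$ grid points are the midpoints of 20 equal cells of $(-2, 2)$.

Function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_linear(x0, xs, ys, h, skip=None):
    """Local linear estimate of g(x0); the observation with index `skip` is left out."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        if j == skip:
            continue
        d = xj - x0
        k = gaussian_kernel(d / h)
        s0 += k
        s1 += k * d
        s2 += k * d * d
        t0 += k * yj
        t1 += k * d * yj
    denom = s0 * s2 - s1 * s1
    if denom <= 0.0:                 # degenerate local design (e.g. h far too small)
        return math.nan
    return (s2 * t0 - s1 * t1) / denom   # intercept of the weighted line fit


def loocv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for window width h."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        pred = local_linear(xi, xs, ys, h, skip=i)
        if math.isnan(pred):
            return math.inf
        total += (yi - pred) ** 2
    return total / len(xs)


def select_window_width(xs, ys, candidates):
    return min(candidates, key=lambda h: loocv_score(xs, ys, h))


def made(g_hat, g_true, M=20, lo=-2.0, hi=2.0):
    """MADE of one replication over M fixed grid points in (lo, hi)."""
    grid = [lo + (hi - lo) * (m + 0.5) / M for m in range(M)]
    return sum(abs(g_hat(x) - g_true(x)) for x in grid) / M


def g_true(x):
    return 1.5 * math.exp(-x * x)


xs = [-3.0 + 6.0 * i / 59 for i in range(60)]
ys = [g_true(x) for x in xs]
h_cv = select_window_width(xs, ys, [0.1, 0.2, 0.4, 0.8])
print(h_cv, made(lambda x: local_linear(x, xs, ys, h_cv), g_true))
```

In a replication study, `made` would be averaged over the `reps` replications to give the entries of Table 3.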

For each combination of $n$ and $T$, 100 simulations were carried out in R. In each simulation, the Metropolis–Hastings sampling algorithm was used to draw 1000 samples in the PML function. To ensure that the chain had reached a stable state and that the samples reflect the target distribution, the first 200 draws were discarded. The two values of $θ_{0}$ were used for each $n$ and $T$. The finite sample properties of the estimators are summarized in Table 1 and Table 2, which report the means, variances (Vars), root mean squared errors (RMSE), and coverage probabilities (CP).

In each case, the mean of the estimates is close to the true value. For each given $n$, the variance of the estimators decreases as $T$ increases. For each given $T$, as $n$ increases the bias stays nearly the same, but the variance decreases. When both $n$ and $T$ take their largest values, $n = 100$ and $T = 10$, the variances and RMSEs of the parameter estimators are the smallest among all cases. This indicates that the parameter estimators converge as $n$ increases, which is consistent with the large sample properties proven above. In addition, the variances and RMSEs change little between the two values of $θ_{0}$.
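Tables 1 and 2 report Mean, Vars, RMSE and CP for each parameter. A minimal Python sketch of these summaries for one parameter across the `reps` replications follows. It rests on assumptions that the text does not spell out:

- Vars is the sample variance across replications.
- CP is the fraction of replications whose 95% Wald interval $\hat{θ}_{i} \pm 1.96 \, \widehat{se}_{i}$ covers $θ_{0}$, with the standard errors $\widehat{se}_{i}$ supplied by the user.

A rule-of-thumb window width of order $n^{-1/5}$ is included for completeness. The function names are hypothetical.

```python
import math
import statistics


def summarize(estimates, true_value, std_errors=None, z=1.959964):
    """Mean, Vars, RMSE and (optionally) CP of one parameter over `reps` replications."""
    reps = len(estimates)
    out = {
        "Mean": statistics.fmean(estimates),
        "Vars": statistics.variance(estimates),        # sample variance across replications
        "RMSE": math.sqrt(sum((e - true_value) ** 2 for e in estimates) / reps),
    }
    if std_errors is not None:
        # The Wald interval e +/- z * se covers the truth iff |e - theta_0| <= z * se.
        covered = sum(abs(e - true_value) <= z * se for e, se in zip(estimates, std_errors))
        out["CP"] = covered / reps
    return out


def rule_of_thumb_window_width(x):
    """h = std(x) * n^(-1/5), which shrinks as the sample size grows."""
    return statistics.stdev(x) * len(x) ** (-0.2)
```

For example, `summarize([0.9, 1.1, 1.0, 1.0], 1.0)` returns an RMSE of $\sqrt{0.02 / 4} \approx 0.0707$.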

Table 1. Performance of spatial coefficient estimators with $θ_{a}$.

Table 2. Performance of spatial coefficient estimators with $θ_{b}$.

Figure 1, Figure 2, Figure 3 and Figure 4 show the variances and fitted curves of each parameter component of ${\hat{θ}}_{a}$ and ${\hat{θ}}_{b}$ under the various combinations of $n$ and $T$. The horizontal axis is the simulation index (0 to 100), and the vertical axis is the variance. The green points are the variances obtained in the individual simulations, and the red curve is fitted to these 100 variances.

When $n = 10$ and $n = 49$, the variances of the estimates of each parameter lie in $(0, 0.02)$. Most are below 0.01, and a minority lie between 0.01 and 0.02. When $n = 100$, all variances are below 0.015, indicating that the overall fitting error is small. In addition, the number of the 100 points exceeding 0.01 decreases markedly, which shows that the variances decrease as the sample size increases and that the fit improves.

Moreover, the shape of the fitted curve shows that the variance converges after about 70 simulations, and the limiting value becomes smaller as the number of simulations increases. The variances of the estimators therefore do not grow with the number of simulations and tend to be stable. Comparing Figure 1 with Figure 3 and Figure 2 with Figure 4 covers the variance fitting curves of ${\hat{θ}}_{a}$ and ${\hat{θ}}_{b}$ at $T = 4$ and at $T = 10$, respectively. This comparison shows that the convergence and range of the variances change little with the period $T$, which supports the large sample properties proven above.

Figure 1. Variance and fitting curve of ${\hat{θ}}_{a}$ when $T = 4$, $n = 10, 49$, and 100.

Figure 2. Variance and fitting curve of ${\hat{θ}}_{a}$ when $T = 10$, $n = 10, 49$, and 100.

Figure 3. Variance and fitting curve of ${\hat{θ}}_{b}$ when $T = 4$, $n = 10, 49$, and 100.

Figure 4. Variance and fitting curve of ${\hat{θ}}_{b}$ when $T = 10$, $n = 10, 49$, and 100.

Figure 5, Figure 6, Figure 7 and Figure 8 plot the mean and confidence interval of each parameter of ${\hat{θ}}_{a}$ and ${\hat{θ}}_{b}$ under the various combinations of $n$ and $T$. The blue area is the range covered by the 95% confidence interval, the red broken line is the mean of the parameter estimates, and the yellow line is the true value of the parameter. Because the estimated variances are small, the means fluctuate very little around the true values, and the widths of the confidence intervals are relatively stable. Only in a few cases do the confidence intervals fail to cover the true values. As $n$ and $T$ increase, the coverage improves, that is, the coverage probability (CP) gradually rises toward the nominal 95% level.

Figure 5. Plot of the mean and confidence interval of ${\hat{θ}}_{a}$ when $T = 4$, $n = 10, 49$, and 100.

Figure 6. Plot of the mean and confidence interval of ${\hat{θ}}_{a}$ when $T = 10$, $n = 10, 49$, and 100.

Figure 7. Plot of the mean and confidence interval of ${\hat{θ}}_{b}$ when $T = 4$, $n = 10, 49$, and 100.

Figure 8. Plot of the mean and confidence interval of ${\hat{θ}}_{b}$ when $T = 10$, $n = 10, 49$, and 100.

Table 3 reports the mean absolute deviation error and variance of the estimates of the unknown function $g (x) = 1.5 e^{- x^{2}}$ for the different samples. We compared the six simulation settings under the two sets of parameter values. In finite samples, for fixed $T$, the deviation between the estimated and true values of $g (x)$ decreases as the sample size $n$ increases, as reflected in the decreasing MADE values. Thus, for the same $T$, the function estimates converge as $n$ increases. For fixed $n$, the deviation between the estimates and the true values also decreases as $T$ increases. Taken together, these results indicate that the estimates converge to the true function as $n$ and $T$ increase, which is consistent with Theorem 3.

Table 3. Performance of unknown function estimators ${\hat{G}}_{n t}$.

5. Empirical Application: Spatial Spillovers in the Yangtze River Delta

We selected panel data for 16 cities in the Yangtze River Delta region from 2019 to 2021 to study the relationship between urban tourism development and economic growth in the Yangtze River Delta (YRD). The cities are Shanghai, Hangzhou, Jiaxing, Huzhou, Ningbo, Shaoxing, Zhoushan, Nanjing, Suzhou, Wuxi, Changzhou, Zhenjiang, Nantong, Yangzhou, Taizhou (Jiangsu), and Taizhou (Zhejiang). The data source is the Statistical Yearbooks of Shanghai, Zhejiang and Jiangsu, 2020–2022.

The YRD city cluster is an important intersection of the “Belt and Road” and the Yangtze River Economic Belt. It plays a pivotal strategic role in China’s modernization and opening-up, is an important platform for China’s participation in international competition, and is a leader in economic and social development. The cluster has a vast economic hinterland, modern river ports, seaports and airports, and a relatively well-developed highway network. Its road and rail densities lead the country, and it has a three-dimensional integrated transportation network. All of these provide important conditions for economic agglomeration. Economic agglomeration is a general state of economic development, representing the geographical and spatial concentration of economic activities.

Owing to objective factors such as location, ecological environment, development basis, and degree of market development, the tourism development modes of the different cities differ significantly. However, neighboring cities are closely linked economically. The development of tourism in a given region therefore not only has a direct impact on the local economy but also has a spillover effect on the economies of neighboring regions.
In addition to the contemporaneous spatial lag, there may also be a spatio-temporal lag or diffusion. Given the spatial interaction between regions, the spatial and temporal linkage between inter-regional tourism development and economic growth is further strengthened. A spatial panel analysis of the spillover effects of economic agglomeration can therefore reveal possible agglomeration between regions in both the temporal and spatial dimensions. It also allows a more objective and scientific study of the structure and process of regional economic development.

Let $y_{i t}$ denote the gross domestic product (GDP) of city $i$ in period $t$. The $n \times n$ spatial weights matrix is $W_{n}$, with $W_{n, i j} = 1$ if cities $i$ and $j$ share a border and $W_{n, i j} = 0$ otherwise. The tourism dynamic variables are constructed from the number of tourists travelling to each city in period $t$. They cover international inbound tourists and domestic tourists, denoted ${I n t}_{t}$ and ${D o m}_{t}$, respectively. GDP is likely to be influenced by a number of macroeconomic factors, which are captured by the fixed effects. The model is:

y_{i t} = ρ \sum_{j = 1}^{n} W_{n, i j} y_{j t} + γ y_{i, t - 1} + τ \sum_{j = 1}^{n} W_{n, i j} y_{j, t - 1} + g_{i t} ({D o m}_{t}, {I n t}_{t}) + α_{i} + ε_{i t}
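The structure of this model can be made concrete with a short sketch. It builds the binary border-sharing matrix $W_{n}$ from a list of neighboring city pairs and evaluates the disturbance $ε_{it}$ implied by given parameter values. Note the following:

- The city pairs in the example are hypothetical and are not the actual YRD borders.
- The values of $g_{it}(\cdot)$ and $α_{i}$ are assumed to be supplied by the user.
- The text does not say whether $W_{n}$ is row-normalized, so the matrix is kept binary here.

Function names are hypothetical.

```python
def border_weights(n, borders):
    """Binary contiguity matrix: W[i][j] = 1 if units i and j share a border, else 0."""
    W = [[0] * n for _ in range(n)]
    for i, j in borders:
        if i == j:
            raise ValueError("a unit cannot border itself")
        W[i][j] = W[j][i] = 1          # symmetric, zero diagonal
    return W


def spatial_lag(W, y):
    """(W y)_i = sum_j W_ij y_j."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]


def implied_errors(W, y_t, y_prev, g_vals, alpha, rho, gamma, tau):
    """eps_it = y_it - rho (W y_t)_i - gamma y_{i,t-1} - tau (W y_{t-1})_i - g_it - alpha_i."""
    wy_t = spatial_lag(W, y_t)
    wy_prev = spatial_lag(W, y_prev)
    return [yt - rho * a - gamma * yp - tau * b - g - al
            for yt, a, yp, b, g, al in zip(y_t, wy_t, y_prev, wy_prev, g_vals, alpha)]


# Hypothetical 4-city chain: 0-1, 1-2, 2-3 share borders
W = border_weights(4, [(0, 1), (1, 2), (2, 3)])
```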

We first fitted the data with the ‘nlme’ package in R, which gives the following form for the non-parametric part:

g_{i t} ({D o m}_{t}, {I n t}_{t}) = {D o m}_{t}^{a} \cdot {I n t}_{t}^{b}
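Because the fitted form is a product of powers, it is linear in logarithms, $\ln g_{it} = a \ln {Dom}_{t} + b \ln {Int}_{t}$, so $a$ and $b$ act as constant elasticities. The sketch below recovers $a$ and $b$ by least squares on the log scale. This linearization is swapped in for illustration and is not the nonlinear ‘nlme’ fit used in the paper. It assumes that positive values of $g$ are observed directly, and the function name is hypothetical.

```python
import math


def fit_power_form(dom, intl, g):
    """Least squares for ln g = a ln(dom) + b ln(intl) (no intercept), via the 2x2 normal equations."""
    u = [math.log(d) for d in dom]
    v = [math.log(i) for i in intl]
    w = [math.log(x) for x in g]
    suu = sum(p * p for p in u)
    svv = sum(q * q for q in v)
    suv = sum(p * q for p, q in zip(u, v))
    suw = sum(p * r for p, r in zip(u, w))
    svw = sum(q * r for q, r in zip(v, w))
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("log regressors are (nearly) collinear")
    a = (svv * suw - suv * svw) / det   # Cramer's rule
    b = (suu * svw - suv * suw) / det
    return a, b
```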

The estimates of the parameters $a$ and $b$ are shown in Table 4. Both are significant at the 1% level and reveal interesting spatial patterns. Because $g_{i t} ({D o m}_{t}, {I n t}_{t})$ takes a power form, the results indicate that an increase in tourist numbers in a region has a positive impact on local economic development, which is consistent with expectations.

The power index ($a$) of domestic tourists is negative, whereas the index ($b$) of international inbound tourists is positive. An increase in international inbound tourist arrivals therefore strongly promotes local economic development, whereas an increase in domestic tourists offsets part of the economic growth generated by tourism. This is related to the consumption habits of domestic tourists and to the local tourism reception capacity:

1. Domestic tourists usually book tickets and accommodation in advance through various discount apps, while foreign tourists generally lack the familiarity or means to do so.
2. Souvenirs are an important source of tourism profit. They are highly attractive to international tourists, whereas domestic tourists often buy them online rather than in scenic areas.
3. China is a populous country and local tourism reception capacity is limited. When there are too many domestic tourists at the same time, such as during holidays, the travel and consumption experience of foreign tourists inevitably suffers.

By contrast, the parametric SDPD model specifies the linear relationship $a \cdot {D o m}_{t} + b \cdot {I n t}_{t}$ for $g_{i t} ({D o m}_{t}, {I n t}_{t})$ in advance. This is inconsistent with the relationship in the data and may lead to completely different and biased conclusions.

Table 4. Estimation of the non-parametric function.

Table 5 reports the estimation results. First, the economic development of a city has a positive impact on neighboring cities ($ρ$). A city with a high level of economic activity stimulates economic activity in neighboring cities through knowledge spillovers, technological innovation, industrial experience, and so on. Economic development in the YRD region thus exhibits positive spatial correlation, that is, spatial agglomeration.

The positive time lag coefficient ($γ$) shows that, from 2019 to 2021, each city's economic development in the previous period had a positive impact on its own current development. This is consistent with these 16 cities being the most economically developed and most valuable urban agglomeration in China. These cities have already determined their own development directions and needs, and with the support of national policies and talent, their economic and social development has improved steadily year on year.

Note that $τ$ and $ρ$ have opposite signs, which is a very enlightening finding. It shows that, although the 16 cities form an urban cluster that develops jointly and cooperatively, their economic development structures differ. Factors that produce contemporaneous spatial effects, such as industry experience, may be learned and emulated in the short term but are not suitable for long-term application. Local governments therefore need to develop distinctive economies according to their own economic structures and regional characteristics.

Finally, the fixed effect ($α$) is positive. Regional economic development therefore has unique characteristics that do not change over time, including additional individual effects and, in special cases, time effects. It is thus necessary to use the NSDPD model with fixed effects.

Table 5. Estimation of economic development.

6. Conclusions and Future Work

In this paper, the SDPD model with fixed effects was extended to a non-parametric form. This relaxes the assumption that the explanatory variables enter through a known linear or nonlinear parametric structure, so that their influence can be any unknown function satisfying certain conditions. We then proposed a PML method to estimate the spatial correlation coefficients and the unknown function. The theoretical results showed that the parametric and non-parametric estimators obtained by the PML method are consistent under certain regularity conditions. Numerical simulations showed that the estimators have good finite sample properties and that the estimation accuracy increases with the sample size and the number of time periods.

So far, $α_{n 0}$ represents space-specific effects, which control for all space-specific, time-invariant variables in the model. It may be of interest to allow more general forms of $α_{n 0}$, such as the interactive fixed effects in [9]. Other possible extensions are time-varying regression coefficients, time-varying spatial coefficients, and time-varying spatial weight matrices, possibly when the spatial units are not fixed or are completely random. In terms of theory, our results can be extended to allow for time-varying regression coefficients but may not apply to other types of heterogeneity.

Cross-sectional heteroskedasticity (space-varying error variances) in the NSDPD model is another interesting extension. In particular, the homoskedasticity assumption could be relaxed to allow random errors that are serially and cross-sectionally dependent. These models and methods would be much more challenging than the already demanding work presented in this paper and will be the topic of our future research.

Author Contributions

Conceptualization, M.Z. and B.T.; methodology, M.Z. and B.T.; software, M.Z.; validation, M.Z. and B.T.; formal analysis, M.Z. and B.T.; investigation, M.Z. and B.T.; resources, M.Z.; data curation, M.Z.; writing—original draft preparation, M.Z.; writing—review and editing, M.Z. and B.T.; visualization, M.Z. and B.T.; supervision, B.T.; project administration, B.T.; funding acquisition, B.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 91646106.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the requirements of related projects supported by the National Natural Science Foundation of China.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Some Basic Lemmas

Proof of Lemma 3.

From $(I_{n} - S) Y_{t} = {\tilde{V}}_{n t} (ζ) = A_{n} (ρ) {\tilde{Y}}_{n t} - {\tilde{Z}}_{n t} δ - {\tilde{G}}_{n t}$, we have

{\tilde{V}}_{n t} (ζ) = {\tilde{V}}_{n t} - {\tilde{Z}}_{n t} (δ - δ_{0}) - (ρ - ρ_{0}) W_{n} {\tilde{Y}}_{n t}.

Hence:

{\tilde{V}}_{n t} (ζ)' {\tilde{V}}_{n t} (ζ) = {\tilde{V}}_{n t}' {\tilde{V}}_{n t} - 2 (δ - δ_{0})' {\tilde{Z}}_{n t}' {\tilde{V}}_{n t} - 2 (ρ - ρ_{0}) (W_{n} {\tilde{Y}}_{n t})' {\tilde{V}}_{n t} + {(ρ - ρ_{0})}^{2} (W_{n} {\tilde{Y}}_{n t})' (W_{n} {\tilde{Y}}_{n t}) + (δ - δ_{0})' {\tilde{Z}}_{n t}' {\tilde{Z}}_{n t} (δ - δ_{0}) + 2 (ρ - ρ_{0}) (W_{n} {\tilde{Y}}_{n t})' {\tilde{Z}}_{n t} (δ - δ_{0})

Here we use $W_{n} {\tilde{Y}}_{n t} = W_{n} A_{n}^{- 1} [{\tilde{Z}}_{n t} δ_{0} + {(I_{n} - S)}^{- 1} {\tilde{V}}_{n t}]$ and $F_{n} = W_{n} A_{n}^{- 1}$, which give

[W_{n} {\tilde{Y}}_{n t}]' [W_{n} {\tilde{Y}}_{n t}] = δ_{0}' {\tilde{Z}}_{n t}' F_{n}' F_{n} {\tilde{Z}}_{n t} δ_{0} + [F_{n} {(I_{n} - S)}^{- 1} {\tilde{V}}_{n t}]' [F_{n} {(I_{n} - S)}^{- 1} {\tilde{V}}_{n t}] + 2 δ_{0}' {\tilde{Z}}_{n t}' F_{n}' [F_{n} {(I_{n} - S)}^{- 1} {\tilde{V}}_{n t}]

Using Lemma 1:

\begin{array}{l} \frac{1}{n T} \sum_{t = 1}^{T} {\tilde{V}}_{n t}' {\tilde{V}}_{n t} - E (\frac{1}{n T} \sum_{t = 1}^{T} {\tilde{V}}_{n t}' {\tilde{V}}_{n t}) \overset{P}{\to} 0, \\ \frac{1}{n T} \sum_{t = 1}^{T} {\tilde{Z}}_{n t}' {\tilde{Z}}_{n t} - E (\frac{1}{n T} \sum_{t = 1}^{T} {\tilde{Z}}_{n t}' {\tilde{Z}}_{n t}) \overset{P}{\to} 0, \\ \frac{1}{n T} \sum_{t = 1}^{T} {\tilde{Z}}_{n t}' {\tilde{V}}_{n t} - E (\frac{1}{n T} \sum_{t = 1}^{T} {\tilde{Z}}_{n t}' {\tilde{V}}_{n t}) \overset{P}{\to} 0, \\ \frac{1}{n T} \sum_{t = 1}^{T} [W_{n} {\tilde{Y}}_{n t}]' {\tilde{V}}_{n t} - E (\frac{1}{n T} \sum_{t = 1}^{T} [W_{n} {\tilde{Y}}_{n t}]' {\tilde{V}}_{n t}) \overset{P}{\to} 0, \\ \frac{1}{n T} \sum_{t = 1}^{T} [W_{n} {\tilde{Y}}_{n t}]' {\tilde{Z}}_{n t} - E (\frac{1}{n T} \sum_{t = 1}^{T} [W_{n} {\tilde{Y}}_{n t}]' {\tilde{Z}}_{n t}) \overset{P}{\to} 0, \\ \frac{1}{n T} \sum_{t = 1}^{T} [W_{n} {\tilde{Y}}_{n t}]' [W_{n} {\tilde{Y}}_{n t}] - E (\frac{1}{n T} \sum_{t = 1}^{T} [W_{n} {\tilde{Y}}_{n t}]' [W_{n} {\tilde{Y}}_{n t}]) \overset{P}{\to} 0. \end{array}

As $δ$ and $ρ$ are bounded in $Θ$, we have

\frac{1}{n T} \sum_{t = 1}^{T} {\tilde{V}}_{n t} (ζ)' {\tilde{V}}_{n t} (ζ) - \frac{1}{n T} E \sum_{t = 1}^{T} {\tilde{V}}_{n t} (ζ)' {\tilde{V}}_{n t} (ζ) \overset{P}{\to} 0

uniformly in $θ$ in $Θ$. We use the fact that $σ^{2}$ is bounded away from zero in $Θ$ and the definition $Q_{n, T} (θ) = \frac{1}{n T} E [\ln L_{n, T} (θ)]$. These give

\begin{array}{l} \frac{1}{n T} \ln L_{n, T} (θ) - Q_{n, T} (θ) \\ = - \frac{1}{2 σ^{2}} (\frac{1}{n T} \sum_{t = 1}^{T} {\tilde{V}}_{n t} (ζ)' {\tilde{V}}_{n t} (ζ) - \frac{1}{n T} E \sum_{t = 1}^{T} {\tilde{V}}_{n t} (ζ)' {\tilde{V}}_{n t} (ζ)) \overset{P}{\to} 0 \end{array}

(A1)

uniformly in $θ$. □

Proof of Lemma 4.

(Similar to Ref. [2].) From ${\hat{G}}_{I N} = S Y_{t}$ and ${\tilde{V}}_{n t} (ζ) = (I_{n} - S) Y_{t}$, we have:

{\tilde{V}}_{n t} (ζ) = A_{n} (ρ) A_{n}^{- 1} ({\tilde{Z}}_{n t} δ_{0} + {\tilde{G}}_{I N} + {\tilde{V}}_{n t}) - {\tilde{Z}}_{n t} δ - {\tilde{G}}_{I N} = A_{n} (ρ) A_{n}^{- 1} {\tilde{Z}}_{n t} δ_{0} - {\tilde{Z}}_{n t} δ + A_{n} (ρ) A_{n}^{- 1} {\tilde{V}}_{n t} + A_{n} (ρ) A_{n}^{- 1} {\tilde{G}}_{I N} - {\tilde{G}}_{I N} = A_{n} (ρ) A_{n}^{- 1} {\tilde{Z}}_{n t} δ_{0} - {\tilde{Z}}_{n t} δ + (A_{n} (ρ) A_{n}^{- 1} - S) {(I_{n} - S)}^{- 1} {\tilde{V}}_{n t}

(A2)

where ${\tilde{G}}_{I N} = {\hat{G}}_{I N, t} - {\hat{G}}_{I N, T}$ and ${\hat{G}}_{I N, T} = \frac{1}{T} \sum_{t = 1}^{T} {\hat{G}}_{I N, t}$. In addition,

Q_{n, T} (θ) = \frac{1}{n T} E [\ln L_{n, T} (θ)] = - \frac{1}{2} \ln (2 π σ^{2}) + \frac{1}{n} \ln |A_{n} (ρ)| - \frac{1}{2 σ^{2}} E \frac{1}{n T} \sum_{t = 1}^{T} {\tilde{V}}_{n t} (ζ)' {\tilde{V}}_{n t} (ζ),

where:

\begin{array}{l} E \frac{1}{n T} \sum_{t = 1}^{T} {\tilde{V}}_{n t} (ζ)' {\tilde{V}}_{n t} (ζ) \\ = \frac{1}{n T} E \sum_{t = 1}^{T} [A_{n} (ρ) A_{n}^{- 1} {\tilde{Z}}_{n t} δ_{0} - {\tilde{Z}}_{n t} δ]' [A_{n} (ρ) A_{n}^{- 1} {\tilde{Z}}_{n t} δ_{0} - {\tilde{Z}}_{n t} δ] \\ + \frac{1}{n} \frac{T - 1}{T} σ_{0}^{2} t r \{[(A_{n} (ρ) A_{n}^{- 1} - S) {(I_{n} - S)}^{- 1}]' [(A_{n} (ρ) A_{n}^{- 1} - S) {(I_{n} - S)}^{- 1}]\} \\ + \frac{2}{n T} \sum_{t = 1}^{T} [A_{n} (ρ) A_{n}^{- 1} {\tilde{Z}}_{n t} δ_{0} - {\tilde{Z}}_{n t} δ]' [(A_{n} (ρ) A_{n}^{- 1} - S) {(I_{n} - S)}^{- 1}] {\tilde{V}}_{n t} \end{array}

(A3)

According to Lemma 1, the third term satisfies

\frac{2}{n T} \sum_{t = 1}^{T} [A_{n} (ρ) A_{n}^{- 1} {\tilde{Z}}_{n t} δ_{0} - {\tilde{Z}}_{n t} δ]' [(A_{n} (ρ) A_{n}^{- 1} - S) {(I_{n} - S)}^{- 1}] {\tilde{V}}_{n t} = O (\frac{1}{T}).

This bound holds uniformly in $θ$ in $Θ$ because the term is a polynomial function of $θ$ and $Θ$ is a bounded set. The second term equals $\frac{T - 1}{T} σ_{n}^{2} (ρ)$, where

σ_{n}^{2} (ρ) = \frac{1}{n} σ_{0}^{2} \mathrm{tr} \{[(A_{n} (ρ) A_{n}^{- 1} - S) {(I_{n} - S)}^{- 1}]' [(A_{n} (ρ) A_{n}^{- 1} - S) {(I_{n} - S)}^{- 1}]\},

which is a polynomial function of $θ$ and hence also uniform in $θ$. Using $A_{n} (ρ) A_{n}^{- 1} = I_{n} - (ρ - ρ_{0}) F_{n}$ in the first term, we have:

\begin{array}{l} \frac{1}{n T} E \sum_{t = 1}^{T} [A_{n} (ρ) A_{n}^{- 1} {\tilde{Z}}_{n t} δ_{0} - {\tilde{Z}}_{n t} δ]' [A_{n} (ρ) A_{n}^{- 1} {\tilde{Z}}_{n t} δ_{0} - {\tilde{Z}}_{n t} δ] \\ = \frac{1}{n T} E \sum_{t = 1}^{T} [{\tilde{Z}}_{n t} (δ - δ_{0}) + (ρ - ρ_{0}) F_{n} {\tilde{Z}}_{n t} δ_{0}]' [{\tilde{Z}}_{n t} (δ - δ_{0}) + (ρ - ρ_{0}) F_{n} {\tilde{Z}}_{n t} δ_{0}] \\ = (δ' - δ_{0}', ρ - ρ_{0}) E Λ_{n T} (δ' - δ_{0}', ρ - ρ_{0})' \end{array}

where

Λ_{n T} = [\begin{matrix} \frac{1}{n T} \sum_{t = 1}^{T} {\tilde{Z}}_{n t}' {\tilde{Z}}_{n t} & \frac{1}{n T} \sum_{t = 1}^{T} {\tilde{Z}}_{n t}' (F_{n} {\tilde{Z}}_{n t} δ_{0}) \\ \frac{1}{n T} \sum_{t = 1}^{T} (F_{n} {\tilde{Z}}_{n t} δ_{0})' {\tilde{Z}}_{n t} & \frac{1}{n T} \sum_{t = 1}^{T} (F_{n} {\tilde{Z}}_{n t} δ_{0})' (F_{n} {\tilde{Z}}_{n t} δ_{0}) \end{matrix}]

(A4)

To prove that $Q_{n, T} (θ)$ is uniformly equicontinuous for $θ \in Θ$, we verify the following four conditions.

(1) $\ln σ^{2}$ is uniformly continuous. This is immediate because $σ^{2}$ is bounded away from zero in $Θ$.

(2) $\frac{1}{n} \ln |A_{n} (ρ)|$ is uniformly equicontinuous. By the mean value theorem,

\frac{1}{n} \ln |A_{n} (ρ_{2})| - \frac{1}{n} \ln |A_{n} (ρ_{1})| = - \frac{1}{n} \mathrm{tr} (W_{n} A_{n}^{- 1} (\bar{ρ})) (ρ_{2} - ρ_{1}),

where $\bar{ρ}$ lies between $ρ_{1}$ and $ρ_{2}$. Because $A_{n}^{- 1} (ρ)$ is UB uniformly in $θ$ in $Θ$, $\frac{1}{n} \mathrm{tr} (W_{n} A_{n}^{- 1} (\bar{ρ}))$ is bounded, so (2) holds.

(3) The first term, $(δ' - δ_{0}', ρ - ρ_{0}) E Λ_{n T} (δ' - δ_{0}', ρ - ρ_{0})'$, is uniformly equicontinuous. Both $ρ$ and $δ$ are bounded and $E Λ_{n T} = O (1)$, so (3) holds.

(4) $σ_{n}^{2} (ρ)$ is uniformly equicontinuous. From the definition of $σ_{n}^{2} (ρ)$ and $A_{n} (ρ) A_{n}^{- 1} = I_{n} - (ρ - ρ_{0}) F_{n}$, we have

σ_{n}^{2} (ρ_{2}) - σ_{n}^{2} (ρ_{1}) = σ_{0}^{2} (ρ_{2} - ρ_{1}) \frac{1}{n} [(ρ_{2} + ρ_{1} - 2 ρ_{0}) \mathrm{tr} (F_{n}' F_{n}) - \mathrm{tr} (F_{n}' + F_{n})].

Because $F_{n}' F_{n}$ and $F_{n}$ are UB, (4) holds. □
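The algebra behind condition (4) can be checked numerically. The sketch below evaluates $σ_{n}^{2}(ρ) = \frac{σ_{0}^{2}}{n} \mathrm{tr}[(I_{n} - (ρ - ρ_{0}) F_{n})'(I_{n} - (ρ - ρ_{0}) F_{n})]$ and compares $σ_{n}^{2}(ρ_{2}) - σ_{n}^{2}(ρ_{1})$ with the closed form for the difference given in condition (4). Note the following:

- The definition is specialized to $S = 0$. This is an assumption made here so that only $F_{n}$ appears.
- A random matrix stands in for $F_{n}$, and the parameter values are arbitrary.
- Function names are hypothetical.

```python
import random


def frob_sq(M):
    """tr(M'M), i.e. the sum of squared entries of M."""
    return sum(v * v for row in M for v in row)


def sigma2_n(F, rho, rho0, sigma0_sq):
    """sigma_n^2(rho) with S = 0: (sigma0^2 / n) tr[(I - (rho - rho0) F)'(I - (rho - rho0) F)]."""
    n = len(F)
    d = rho - rho0
    M = [[(1.0 if i == j else 0.0) - d * F[i][j] for j in range(n)] for i in range(n)]
    return sigma0_sq * frob_sq(M) / n


def closed_form_difference(F, rho1, rho2, rho0, sigma0_sq):
    """sigma0^2 (rho2 - rho1) (1/n) [(rho2 + rho1 - 2 rho0) tr(F'F) - tr(F' + F)]."""
    n = len(F)
    tr_F_plus_Ft = 2.0 * sum(F[i][i] for i in range(n))   # tr(F' + F) = 2 tr(F)
    return sigma0_sq * (rho2 - rho1) / n * ((rho2 + rho1 - 2.0 * rho0) * frob_sq(F) - tr_F_plus_Ft)


rng = random.Random(1)
F = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(6)]
lhs = sigma2_n(F, 0.7, 0.4, 1.3) - sigma2_n(F, -0.2, 0.4, 1.3)
rhs = closed_form_difference(F, -0.2, 0.7, 0.4, 1.3)
assert abs(lhs - rhs) < 1e-10
```

The agreement follows from $\mathrm{tr}[(I - dF)'(I - dF)] = n - d\, \mathrm{tr}(F' + F) + d^{2} \mathrm{tr}(F'F)$ with $d = ρ - ρ_{0}$.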

Proof of Lemma 5.

Following Lee (2004), we prove the result as follows. Let $α = (α_{1}', α_{2}, α_{3})'$, where $α_{1}$ is a vector conformable with $δ$, and $α_{2}$ and $α_{3}$ are scalars. For $Σ_{θ_{0}} \equiv \lim_{T \to \infty} Σ_{θ_{0}, n T}$, we need to prove that $Σ_{θ_{0}} α = 0$ implies $α = 0$. If this holds, then the columns of $Σ_{θ_{0}}$ are linearly independent and $Σ_{θ_{0}}$ is nonsingular.

From (15):

Σ_{θ_{0}} = \frac{1}{σ_{0}^{2}} \times [\begin{matrix} E Φ_{Z Z} & E Φ_{Z R} & 0 \\ E Φ_{R Z} & E Φ_{R R} + \lim_{n \to \infty} \frac{σ_{0}^{2}}{n} [t r (F_{n}^{2}) + t r (F_{n s}' F_{n s})] & \lim_{n \to \infty} \frac{1}{n} t r [F_{n s}] \\ 0 & \lim_{n \to \infty} \frac{1}{n} t r [F_{n s}] & \frac{1}{2 σ_{0}^{2}} \end{matrix}]

(A5)

The blocks of (A5) are defined as follows:

- $Φ_{Z Z} = \lim_{T \to \infty} \frac{1}{n T} \sum_{t = 1}^{T} {\tilde{Z}}_{n s}' {\tilde{Z}}_{n s}$
- $Φ_{Z R} = \lim_{T \to \infty} \frac{1}{n T} \sum_{t = 1}^{T} {\tilde{Z}}_{n s}' {\tilde{R}}_{n s}$
- $Φ_{R Z} = Φ_{Z R}'$
- $Φ_{R R} = \lim_{T \to \infty} \frac{1}{n T} \sum_{t = 1}^{T} {\tilde{R}}_{n s}' {\tilde{R}}_{n s}$

Hence, $Σ_{θ_{0}} α = 0$ implies:

\begin{array}{l} \frac{1}{σ_{0}^{2}} E Φ_{Z Z} \times α_{1} + \frac{1}{σ_{0}^{2}} E Φ_{Z R} \times α_{2} = 0 \\ \frac{1}{σ_{0}^{2}} E Φ_{R Z} \times α_{1} + \{\frac{1}{σ_{0}^{2}} E Φ_{R R} + \lim_{n \to \infty} \frac{1}{n} [\mathrm{tr} (F_{n}^{2}) + \mathrm{tr} (F_{n s}' F_{n s})]\} \times α_{2} \\ + \lim_{n \to \infty} \frac{1}{n σ_{0}^{2}} \mathrm{tr} [F_{n s}] \times α_{3} = 0 \\ \lim_{n \to \infty} \frac{1}{n σ_{0}^{2}} \mathrm{tr} [F_{n s}] \times α_{2} + (\frac{1}{2 σ_{0}^{4}} + \frac{1}{n σ_{0}^{4}}) \times α_{3} = 0 \end{array}

(A6)

The first and third equations imply

α_{1} = - {(E Φ_{Z Z})}^{- 1} E Φ_{Z R} \times α_{2}

and

α_{3} = - 2 \lim_{n \to \infty} \frac{σ_{0}^{2}}{n + 2} \mathrm{tr} [F_{n s}] \times α_{2},

respectively. Eliminating $α_{1}$ and $α_{3}$, the second equation becomes

\{\frac{1}{σ_{0}^{2}} [E Φ_{R R} - E Φ_{R Z} {(E Φ_{Z Z})}^{- 1} E Φ_{Z R}] + \lim_{n \to \infty} \frac{1}{n} [\mathrm{tr} (F_{n}^{2}) + \mathrm{tr} (F_{n s}' F_{n s}) - \frac{2}{n + 2} {(\mathrm{tr} (F_{n s}))}^{2}]\} \times α_{2} = 0.

Let $Φ_{n T} = \frac{1}{σ_{0}^{2}} [\begin{matrix} Φ_{Z Z} & Φ_{Z R} \\ Φ_{R Z} & Φ_{R R} \end{matrix}]$. Under Assumption 9, $E Φ_{n T}$ is nonsingular, so the equation above holds only if $α_{2} = 0$. By the expressions for $α_{1}$ and $α_{3}$, this implies $α = 0$. Hence, the information matrix $Σ_{θ_{0}, n T}$ is nonsingular. □

Appendix B. Proof of the Theoretical Results

Proof of Theorem 1.

As $E \sum_{t = 1}^{T} {\tilde{V}}_{n t}' {\tilde{V}}_{n t} = n T σ_{0}^{2}$ at $θ_{0}$, (10) implies $E [\ln L_{n, T} (θ_{0})] = - \frac{n T}{2} \ln 2 π - \frac{n T}{2} \ln σ_{0}^{2} + T \ln |A_{n}| - \frac{n T}{2}$. Then, we have:

\begin{array}{l} \frac{1}{n T} E [\ln L_{n, T} (θ)] - \frac{1}{n T} E [\ln L_{n, T} (θ_{0})] \\ = - \frac{1}{2} (\ln σ^{2} - \ln σ_{0}^{2}) + \frac{1}{n} \ln |A_{n} (ρ)| - \frac{1}{n} \ln |A_{n}| - [\frac{1}{n T} \frac{1}{2 σ^{2}} E \sum_{t = 1}^{T} {\tilde{V}}_{n t} (ζ)' {\tilde{V}}_{n t} (ζ) - \frac{1}{2}] \\ = T_{1, n} (ρ, σ^{2}) - \frac{1}{2 σ^{2}} T_{2, n} (ρ, δ) + o (1), \end{array}

where

T_{1, n} (ρ, σ^{2}) = - \frac{1}{2} (\ln σ^{2} - \ln σ_{0}^{2}) + \frac{1}{n} \ln |A_{n} (ρ)| - \frac{1}{n} \ln |A_{n}| - \frac{1}{2 σ^{2}} [σ_{n}^{2} (ρ) - σ^{2}]

and

T_{2, n} (ρ, δ) = \frac{1}{n T} E \sum_{t = 1}^{T} [{\tilde{Z}}_{n t} (δ - δ_{0}) + (ρ - ρ_{0}) F_{n} {\tilde{Z}}_{n t} δ_{0}]' [{\tilde{Z}}_{n t} (δ - δ_{0}) + (ρ - ρ_{0}) F_{n} {\tilde{Z}}_{n t} δ_{0}].

Consider the process $Y_{n t} = ρ_{0} W_{n} Y_{n t} + {\tilde{V}}_{n t}$. For a single period $t$, its log-likelihood function is:

\ln L_{p, n} (ρ, σ^{2}) = - \frac{n}{2} \ln (2 π σ^{2}) + \ln |A_{n} (ρ)| - \frac{1}{2 σ^{2}} [A_{n} (ρ) Y_{n t}]' [A_{n} (ρ) Y_{n t}]

(A7)
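The single-period log-likelihood (A7) can be evaluated directly. The sketch below computes $\ln|A_{n}(ρ)|$ by Gaussian elimination with partial pivoting and then adds the three terms of (A7) for one cross-section $Y_{nt}$. It is a plain-Python illustration of the formula, not the authors' code. It assumes $A_{n}(ρ) = I_{n} - ρ W_{n}$ is nonsingular and uses the log of the absolute determinant. Function names are hypothetical.

```python
import math


def log_abs_det(M):
    """ln|det M| by Gaussian elimination with partial pivoting."""
    A = [list(row) for row in M]
    n = len(A)
    total = 0.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))   # pivot row
        if A[p][k] == 0.0:
            return -math.inf                                # singular matrix
        A[k], A[p] = A[p], A[k]
        total += math.log(abs(A[k][k]))
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= f * A[k][c]
    return total


def sar_loglik(y, W, rho, sigma2):
    """-(n/2) ln(2 pi sigma2) + ln|A_n(rho)| - ||A_n(rho) y||^2 / (2 sigma2), A_n(rho) = I - rho W."""
    n = len(y)
    A = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)] for i in range(n)]
    resid = [sum(a * v for a, v in zip(row, y)) for row in A]
    return (-0.5 * n * math.log(2.0 * math.pi * sigma2) + log_abs_det(A)
            - sum(r * r for r in resid) / (2.0 * sigma2))
```

Maximizing over $σ^{2}$ for fixed $ρ$ gives $\hat{σ}^{2}(ρ) = \|A_{n}(ρ) Y_{nt}\|^{2} / n$, so in practice one often profiles out $σ^{2}$ and searches over $ρ$ only.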

Let $E_{p} (\cdot)$ be the expectation operator for $Y_{n t}$ under this process. Then:

\begin{array}{l} E_{p} [\frac{1}{n} \ln L_{p, n} (ρ, σ^{2})] - E_{p} [\frac{1}{n} \ln L_{p, n} (ρ_{0}, σ_{0}^{2})] \\ = - \frac{1}{2} [l n σ^{2} - l n σ_{0}^{2}] + \frac{1}{n} l n |A_{n} (ρ)| - \frac{1}{n} l n |A_{n}| - \frac{1}{2 σ^{2}} [σ_{n}^{2} (ρ) - σ^{2}] \\ = T_{1, n} (ρ, σ^{2}) \end{array}

By the information inequality, $E_{p} [\ln L_{p, n} (ρ, σ^{2})] - E_{p} [\ln L_{p, n} (ρ_{0}, σ_{0}^{2})] \leq 0$. Thus, $T_{1, n} (ρ, σ^{2}) \leq 0$ for any $(ρ, σ^{2})$. Also, $T_{2, n} (ρ, δ)$ is a quadratic function of $ρ$ and $δ$. When $E Φ_{n T}$ is nonsingular, $T_{2, n} (ρ, δ) > 0$ whenever $(ρ, δ) \neq (ρ_{0}, δ_{0})$, so $(ρ, δ)$ is globally identified. Given $ρ_{0}$, $σ_{0}^{2}$ is the unique maximizer of $T_{1, n} (ρ_{0}, σ^{2})$. Hence, $(ρ, δ, σ^{2})$ is globally identified. Consistency then follows by combining this with the uniform convergence and equicontinuity in Lemmas 3 and 4. □

Proof of Theorem 2.

From the proof of Theorem 1,

\frac{1}{n T} E [\ln L_{n, T} (θ)] - \frac{1}{n T} E [\ln L_{n, T} (θ_{0})] = T_{1, n} (ρ, σ^{2}) - \frac{1}{2 σ^{2}} T_{2, n} (ρ, δ) + o (1).

When $E Φ_{n T}$ is singular, $δ_{0}$ and $ρ_{0}$ cannot be identified from $T_{2, n} (ρ, δ)$. Global identification then requires that the limit of $T_{1, n} (ρ, σ^{2})$ be strictly less than zero for $(ρ, σ^{2}) \neq (ρ_{0}, σ_{0}^{2})$. As $T_{1, n} (ρ, σ^{2}) \leq 0$ by the information inequality, $T_{1, n} (ρ, σ^{2}) \neq 0$ is equivalent to

\lim_{n \to \infty} (\frac{1}{n} \ln |σ_{0}^{2} A_{n}^{- 1} {(A_{n}')}^{- 1}| - \frac{1}{n} \ln |σ_{n}^{2} (ρ) A_{n}^{- 1} (ρ) {(A_{n}' (ρ))}^{- 1}|) \neq 0

(see Lee (2004)). Once $ρ_{0}$ and $σ_{0}^{2}$ are identified, $δ_{0}$ can be identified from $T_{2, n} (ρ_{0}, δ)$. Consistency then follows by combining this with the uniform convergence and equicontinuity in Lemmas 3 and 4. □

Proof of Theorem 3.

By (7), we have

{\hat{G}}_{I N, t} = ({\hat{g}}_{I N} (x_{1}), {\hat{g}}_{I N} (x_{2}), \dots, {\hat{g}}_{I N} (x_{n}))'

,

{\hat{G}}_{I N, t} - G_{n t} = ({\hat{g}}_{I N} (x_{1}) - g (x_{1}), {\hat{g}}_{I N} (x_{2}) - g (x_{2}), \dots, {\hat{g}}_{I N} (x_{n}) - g (x_{n}))'

(A8)

From Lemma 6 and Assumption 8:

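if $p$ is odd, ${\hat{g}}_{I N} (x) - g (x) = O_{p} (h^{p + 1} + {(n h)}^{- 1 / 2})$;
if $p$ is even, ${\hat{g}}_{I N} (x) - g (x) = O_{p} (h^{p + 2} + {(n h)}^{- 1 / 2})$.

Both bounds tend to zero when $h \to 0$ and $n h \to \infty$. Each component of (A8) therefore converges to zero in probability, and hence ${\hat{G}}_{I N, t} \overset{P}{\to} G_{n t}$. □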

References

Neyman, J.; Scott, E.L. Consistent estimates based on partially consistent observations. Econometrica 1948, 16, 1–32.
Yu, J.; de Jong, R.; Lee, L.F. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. J. Econom. 2008, 146, 118–134.
Lee, L.F.; Yu, J. Estimation of spatial panel model with fixed effects. J. Econom. 2010, 154, 165–185.
Lee, L.F.; Yu, J. A spatial dynamic panel data model with both time and individual fixed effects. Econom. Theory 2010, 26, 564–597.
Yu, J.; de Jong, R.; Lee, L.F. Estimation for spatial dynamic panel data with fixed effects: The case of spatial cointegration. J. Econom. 2012, 167, 16–37.
Jin, F.; Lee, L.F.; Yu, J. First difference estimation of spatial dynamic panel data models with fixed effects. Econ. Lett. 2020, 189, 109010.
Lin, X.; Carroll, R.J. Nonparametric function estimation for clustered data when the predictor is measured without/with error. J. Am. Stat. Assoc. 2000, 95, 520–534.
Henderson, D.J.; Carroll, R.J.; Li, Q. Nonparametric estimation and testing of fixed effects panel data models. J. Econom. 2008, 144, 257–275.
Shi, W.; Lee, L.F. Spatial dynamic panel data models with interactive fixed effects. J. Econom. 2017, 197, 323–347.

Figure 1. Variance and fitting curve of $\hat{θ}_{a}$ when $T = 4$ and $n = 10, 49$, and $100$.

Figure 2. Variance and fitting curve of $\hat{θ}_{a}$ when $T = 10$ and $n = 10, 49$, and $100$.

Figure 3. Variance and fitting curve of $\hat{θ}_{b}$ when $T = 4$ and $n = 10, 49$, and $100$.

Figure 4. Variance and fitting curve of $\hat{θ}_{b}$ when $T = 10$ and $n = 10, 49$, and $100$.

Figure 5. Plot of the mean and confidence interval of $\hat{θ}_{a}$ when $T = 4$ and $n = 10, 49$, and $100$.

Figure 6. Plot of the mean and confidence interval of $\hat{θ}_{a}$ when $T = 10$ and $n = 10, 49$, and $100$.

Figure 7. Plot of the mean and confidence interval of $\hat{θ}_{b}$ when $T = 4$ and $n = 10, 49$, and $100$.

Figure 8. Plot of the mean and confidence interval of $\hat{θ}_{b}$ when $T = 10$ and $n = 10, 49$, and $100$.

Table 1. Performance of spatial coefficient estimators with $θ_{a}$.

$n$		T = 4				T = 10
	Parameter	RMSE	Mean	Var	CP	RMSE	Mean	Var	CP
10	$γ$	0.0348	−0.309	0.0076	0.73	0.0307	−0.295	0.0067	0.79
	$ρ$	0.0361	0.419	0.0064	0.83	0.0224	0.412	0.0058	0.87
	$σ^{2}$	0.0287	1.014	0.0067	0.81	0.0262	1.012	0.0056	0.89
	$τ$	0.0381	−0.221	0.0069	0.77	0.0287	−0.192	0.0064	0.83
49	$γ$	0.0334	−0.306	0.0059	0.82	0.0283	−0.296	0.0051	0.85
	$ρ$	0.0323	0.411	0.0057	0.87	0.0257	0.405	0.0049	0.92
	$σ^{2}$	0.0245	1.011	0.0056	0.86	0.0214	1.008	0.0053	0.94
	$τ$	0.0319	−0.214	0.0059	0.85	0.0244	−0.207	0.0052	0.89
100	$γ$	0.0327	−0.297	0.0044	0.89	0.0221	−0.302	0.0043	0.94
	$ρ$	0.0308	0.406	0.0046	0.90	0.0218	0.403	0.0042	0.92
	$σ^{2}$	0.0206	1.005	0.0051	0.92	0.0183	0.998	0.0047	0.93
	$τ$	0.0269	−0.208	0.0052	0.89	0.0184	−0.203	0.0041	0.90

Table 2. Performance of spatial coefficient estimators with $θ_{b}$.

$n$		T = 4				T = 10
	Parameter	RMSE	Mean	Var	CP	RMSE	Mean	Var	CP
10	$γ$	0.0263	0.239	0.0071	0.78	0.0247	0.241	0.0066	0.87
	$ρ$	0.0248	−0.587	0.0069	0.75	0.0214	−0.590	0.0062	0.79
	$σ^{2}$	0.0251	0.488	0.0073	0.83	0.0226	0.511	0.0064	0.88
	$τ$	0.0229	0.121	0.0072	0.77	0.0203	0.117	0.0065	0.86
49	$γ$	0.0237	0.257	0.0063	0.86	0.0221	0.255	0.0057	0.90
	$ρ$	0.0202	−0.609	0.0058	0.84	0.0195	−0.593	0.0054	0.91
	$σ^{2}$	0.0219	0.510	0.0056	0.88	0.0208	0.509	0.0052	0.90
	$τ$	0.0196	0.087	0.0054	0.84	0.0184	0.091	0.0046	0.89
100	$γ$	0.0198	0.255	0.0051	0.92	0.0182	0.252	0.0039	0.94
	$ρ$	0.0185	−0.594	0.0046	0.89	0.0168	−0.603	0.0041	0.92
	$σ^{2}$	0.0191	0.508	0.0047	0.91	0.0173	0.505	0.0043	0.93
	$τ$	0.0177	0.108	0.0046	0.88	0.0145	0.106	0.0038	0.91
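The summary columns in Tables 1 and 2 can be computed from the Monte Carlo replications of each estimator along the following lines. This is a sketch under assumed definitions, not necessarily the paper's: RMSE is the root mean squared error about the true value, Var is the sample variance across replications, and CP is the empirical coverage of nominal 95% Wald intervals built from each replication's standard error. `mc_summary` and the simulated inputs are hypothetical.

```python
import numpy as np


def mc_summary(estimates, std_errors, true_value, z=1.959964):
    """Summary of R Monte Carlo replications of one scalar estimator (ASSUMED definitions)."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    rmse = np.sqrt(np.mean((est - true_value) ** 2))    # root mean squared error
    var = est.var(ddof=1)                               # variance across replications
    covered = np.abs(est - true_value) <= z * se        # true value inside the Wald interval?
    return {"RMSE": float(rmse), "Mean": float(est.mean()),
            "Var": float(var), "CP": float(covered.mean())}


rng = np.random.default_rng(1)
R = 1000
est = 0.4 + 0.05 * rng.standard_normal(R)   # hypothetical replications of rho-hat
se = np.full(R, 0.05)                       # hypothetical reported standard errors
print(mc_summary(est, se, true_value=0.4))
```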

Table 3. Performance of the unknown-function estimator $\hat{G}_{nt}$.

$n$	$θ_{0}^{a} = (-0.3, 0.4, 1, -0.2)'$				$θ_{0}^{b} = (0.25, -0.6, 0.5, 0.1)'$
	T = 4		T = 10		T = 4		T = 10
	MADE	Std. dev.	MADE	Std. dev.	MADE	Std. dev.	MADE	Std. dev.
10	0.0379	0.0223	0.0227	0.0103	0.0305	0.0147	0.0181	0.0082
49	0.0241	0.0113	0.0161	0.0091	0.0211	0.0098	0.0122	0.0067
100	0.0126	0.0066	0.0085	0.0045	0.0104	0.0072	0.0074	0.0055
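Table 3 summarizes the accuracy of $\hat{G}_{nt}$. Assuming MADE denotes the mean absolute deviation error, that is, the average of $|\hat{g}(x) - g(x)|$ over the evaluation points within one replication, and Std. dev. its standard deviation across replications (both definitions are assumptions), the two columns can be computed as below; the function names and simulated inputs are hypothetical.

```python
import numpy as np


def made_per_replication(g_hat, g_true):
    """ASSUMED MADE: mean over evaluation points of |g_hat - g|, one value per replication.

    g_hat has shape (R, m) for R replications and m evaluation points; g_true has shape (m,).
    """
    g_hat = np.asarray(g_hat, dtype=float)
    g_true = np.asarray(g_true, dtype=float)
    return np.abs(g_hat - g_true).mean(axis=1)


def made_summary(g_hat, g_true):
    """Average MADE across replications and its standard deviation."""
    made = made_per_replication(g_hat, g_true)
    return float(made.mean()), float(made.std(ddof=1))


rng = np.random.default_rng(2)
grid = np.linspace(-1.0, 1.0, 49)
g_true = np.sin(np.pi * grid)
g_hat = g_true + 0.02 * rng.standard_normal((500, grid.size))   # hypothetical estimates
print(made_summary(g_hat, g_true))
```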

Table 4. Estimation of the non-parametric function.

		Estimate (Std. Error)	t Value (Pr(>|t|))
2021	a	−4.7864	−6.273
		(0.7630)	(2.05 × 10⁻⁵)
	b	3.4747	9.486
		(0.3663)	(1.79 × 10⁻⁷)
2020	a	−3.7395	−2.762
		(0.8539)	(1.53 × 10⁻⁴)
	b	3.3827	6.032
		(0.5608)	(3.08 × 10⁻⁵)

Table 5. Estimation of economic development.

		Coefficient	SD
Contemporaneous spatial effect	$ρ$	0.10655 ***	0.01275
Own time lag	$γ$	0.35262 ***	0.00993
Spatial diffusion	$τ$	−0.02983 ***	0.01363
Variance of the error term	$σ^{2}$	0.00723 **	0.03159
Fixed effect	$α$	0.10887 ***	0.01107

Note: ** $p < 0.05$; *** $p < 0.01$. The sample size is $n = 16$ and $T = 3$.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
