A Survey of Spatial Unit Roots

Baltagi, Badi H.; Shu, Junjie

doi:10.3390/math12071052

Open Access Review

by Badi H. Baltagi ¹,* and Junjie Shu ²

¹ Center for Policy Research and Department of Economics, Syracuse University, 426 Eggers Hall, Syracuse, NY 13244-1020, USA

² Department of Economics, Syracuse University, Syracuse, NY 13244-1020, USA

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(7), 1052; https://doi.org/10.3390/math12071052

Submission received: 26 January 2024 / Revised: 26 March 2024 / Accepted: 29 March 2024 / Published: 31 March 2024

(This article belongs to the Special Issue Mathematical Economics and Spatial Econometrics)

Abstract

This paper conducts a brief survey of spatial unit roots within the context of spatial econometrics. We summarize important concepts and assumptions in this area and study the parameter space of the spatial autoregressive coefficient, which leads to the idea of spatial unit roots. Like the case in time series, the spatial unit roots lead to spurious regression because the system cannot achieve equilibrium. This phenomenon undermines the power of the usual Ordinary Least Squares (OLS) method, so various estimation methods such as Quasi-maximum Likelihood Estimate (QMLE), Two Stage Least Squares (2SLS), and Generalized Spatial Two Stage Least Squares (GS2SLS) are explored. This paper considers the assumptions needed to guarantee the identification and asymptotic properties of these methods. Because of the potential damage of spatial unit roots, we study some test procedures to detect them. Lastly, we offer insights into how to relax the compactness assumption to avoid spatial unit roots, as well as the relationship between spatial unit roots and other models, such as the Spatial Dynamic Panel Data (SDPD) model and Lévy–Brownian motion.

Keywords:

spatial correlation; spatial unit roots; nonstationarity; spurious spatial regression; panel data

MSC:

62-02

1. Introduction

There is an extensive literature in spatial statistics that deals with cross-sectional correlation and is popular in regional science, urban economics and geography, to mention a few fields; see Anselin [1] for a nice introduction to this literature. Unlike time series, cross-sectional data typically have no unique natural ordering. Spatial dependence models may use a metric of economic distance that gives cross-sectional data a structure similar to that provided by the time index in time series. Examples in economics usually involve spillover effects or externalities due to geographical proximity. One example is the effect of the productivity of public capital, like roads and highways, on the output of neighboring states. Another is the pricing of welfare in one state that pushes recipients to other states. In a linear regression model, this spatial correlation may be in the disturbances, in which case the model is called the spatial error model (SEM), or it may be modeled on the dependent variable itself, in which case the model is named the spatial autoregression (SAR) model or the spatial lag model. In time series, the autoregressive or lagged model relies on a natural ordering across time, so lagged values are well defined. In cross-sections, dependence is instead handled through neighbors: in the SEM, a unit's shocks or disturbances are affected by its neighbors' shocks or disturbances. For the SAR, the dependent variable is affected by neighbors, like a house price being affected by neighboring house prices. This leads to the construction of a weight matrix that defines one's neighbors by distance or contiguity; see Anselin [1]. So, rather than lagged house prices as in an autoregressive time series model, own house price is related to a weighted average of neighboring house prices. Spatial dependence has been extended from cross-section to panel data; see Chapter 13 of Baltagi [2] or Elhorst [3] for a textbook treatment of the subject.

OLS yields inconsistent estimates in the SAR model because the spatially lagged dependent variable is endogenous, i.e., correlated with the disturbances. For the SEM, OLS yields an unbiased but inefficient estimator. Because of these limitations of OLS, (Q)MLE is often used to estimate spatial models, as in Ord [4], Anselin [1] and Lee [5]. However, MLE can be computationally intensive, especially for large sample sizes, because it requires computing the Jacobian term in the likelihood function. Ord [4] proposes a simplified computational procedure that requires only the eigenvalues of the spatial weight matrix, but computing accurate eigenvalues becomes increasingly difficult for large n. Kelejian and Prucha [6] suggest a Generalized Method of Moments (GMM) estimator for the SEM with SAR structure. Alternatively, constructing instrumental variables (IVs) from the exogenous variables, Kelejian and Prucha [7] propose the GS2SLS estimator, and Lee [8] discusses the best GS2SLS estimator, which uses more efficient IVs. Lee [9] considers a GMM estimator for SAR models with exogenous variables and shows that it is more efficient than the 2SLS estimator and asymptotically as efficient as the ML estimator. Spatial panel data models (with dynamic terms) are also estimated using MLE, 2SLS or GMM, as in Yu et al. [10], Baltagi and Liu [11], and Kapoor et al. [12].

However, all of the estimation methods considered above constrain the parameter space of the spatial coefficient to limit the degree of correlation between units. This is because when the spatial correlation is too strong, the spatial echoes passing through each unit do not die out, so the system cannot achieve an equilibrium. When the spatial weight matrix is row-normalized, as is conventional, this constraint requires the spatial coefficient to be smaller than 1 in absolute value. In this survey, we summarize the developments that relax this constraint and allow the spatial coefficient to equal unity or to approach it sufficiently closely, which is known as the spatial unit root case. This is of practical relevance because there are many cases where the spatial coefficients are close to 1. For example, Keller and Shiue [13] study the inter-regional trade of Chinese rice and find that rice prices in different provinces are highly related, with spatial coefficients lying between 0.9 and 0.95.

When (near) spatial unit roots exist, the standard estimation procedures are not necessarily reliable and statistical inference may be invalid. To address this, Fingleton [14] avoids the circularity of the spatial weight matrix and conducts a Monte Carlo simulation to explore the performance of OLS estimation. Alternatively, Lee and Yu [15] let the spatial autoregressive coefficient approach unity as the sample size grows and derive the asymptotic behavior of the QMLE and 2SLS estimators. Spatial unit roots have also been generalized to the Spatial Dynamic Panel Data (SDPD) model by Yu and Lee [16].

To investigate possible spatial unit roots, several test procedures have been proposed. Fingleton [14] suggests that a “very high” value of Moran’s I statistic could be useful for testing for spatial unit roots. Lauridsen and Kosfeld [17,18] propose two-stage Lagrange Multiplier (LM) tests that distinguish spatial unit roots from stationary positive spatial correlation. Exploiting the fact that spatial impulses do not die out under spatial unit roots and lead to explosive variance, Beenstock et al. [19] numerically calculate critical values, even under an irregular spatial weight matrix. These tests have been used extensively in the literature; see Yesilyurt and Elhorst [20], Olejnik [21], Machado et al. [22], and Beenstock and Felsenstein [23].

This paper is organized as follows. Section 2 introduces some basic concepts in spatial econometrics. Next, the parameter space of the spatial autoregressive coefficient and the corresponding singular points are considered. Stationarity and spatial cointegration concepts derived from spatial unit roots are introduced. General assumptions and the corresponding implications in spatial econometrics are discussed. Section 3 introduces the potential problems with the existence of spatial unit roots: spurious and nonsense regression. Section 4 investigates estimation methods and inference under spatial unit roots. Section 5 discusses how to test for spatial unit roots. Section 6 discusses spatial unit roots in the SAR model while Section 7 concludes.

2. Basic Concepts in Spatial Econometrics

Spatial models study the spatial dependence between units. In practice, the spatial weight matrix is used to describe such dependence. Let

W_{n}^{*}

be the

n \times n

contiguity-based spatial weight matrix, i.e.,

w_{i j}^{*} = 1

if units i and j are contiguous and 0 otherwise. Also, the diagonal elements are set to 0 by convention. In practice, the spatial weight matrix is generally row-normalized by

w_{n, i j} = \frac{w_{n, i j}^{*}}{\sum_{j} w_{n, i j}^{*}}

, so the row sum of

W_{n}

will be one.
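As a minimal sketch of this construction, the following Python snippet builds a binary contiguity matrix and row-normalizes it. The 3 × 3 lattice, the rook (shared-edge) definition of contiguity, and the helper names `rook_contiguity` and `row_normalize` are illustrative assumptions, not part of the survey.

```python
import numpy as np

def rook_contiguity(rows, cols):
    """Binary contiguity matrix W* for a rows x cols lattice (shared edges only)."""
    n = rows * cols
    w_star = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if r + 1 < rows:                      # neighbour below
                j = (r + 1) * cols + c
                w_star[i, j] = w_star[j, i] = 1.0
            if c + 1 < cols:                      # neighbour to the right
                j = r * cols + c + 1
                w_star[i, j] = w_star[j, i] = 1.0
    return w_star

def row_normalize(w_star):
    """Divide each row by its row sum; rows without neighbours stay zero."""
    row_sums = w_star.sum(axis=1, keepdims=True)
    return np.divide(w_star, row_sums, out=np.zeros_like(w_star), where=row_sums > 0)

w_star = rook_contiguity(3, 3)
w = row_normalize(w_star)

print(w.sum(axis=1))          # every row of W_n sums to one
print(np.allclose(w, w.T))    # False here: row-normalization breaks symmetry
```

Note that the contiguity matrix itself is symmetric, while the row-normalized matrix generally is not, because units have different numbers of neighbors.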

Different spatial model specifications have different implications. The SEM with spatial autoregressive (SAR) structure on the

n \times 1

error vector

u_{n}

can be expressed as

u_{n} = λ_{1} W_{n} u_{n} + ϵ_{n} = {(I_{n} - λ_{1} W_{n})}^{- 1} ϵ_{n}

, where

λ_{1}

is known as the spatial coefficient and satisfies some assumptions that will be introduced later, and

ϵ_{n}

is the

n \times 1

vector of independent and identically distributed (i.i.d.) innovations with variance

σ_{ϵ}^{2}

. The error covariance matrix for the

u_{n}

with SAR structure is

Ω_{S A R} = E [u_{n} u_{n}^{'}] = σ_{ϵ}^{2} {(I_{n} - λ_{1} W_{n})}^{- 1} {(I_{n} - λ_{1} W_{n}^{'})}^{- 1} = σ_{ϵ}^{2} {(B_{n}^{'} B_{n})}^{- 1},

(1)

where

u_{n}^{'} = ϵ_{n}^{'} {(I_{n} - λ_{1} W_{n})}^{' - 1}

and

B_{n} = I_{n} - λ_{1} W_{n}

. Though

W_{n}

may be sparse,

{(B_{n}^{'} B_{n})}^{- 1}

is not necessarily so. Thus, the spatial covariance structure induced by this SEM is classified as global. Conversely, a spatial moving average (SMA) specification for the error vector

u_{n}

can be expressed as

u_{n} = λ_{2} W_{n} ϵ_{n} + ϵ_{n} = (I_{n} + λ_{2} W_{n}) ϵ_{n}

, and the corresponding covariance matrix will be

Ω_{S M A} = E [u_{n} u_{n}^{'}] = σ_{ϵ}^{2} [I_{n} + λ_{2} (W_{n} + W_{n}^{'}) + λ_{2}^{2} W_{n} W_{n}^{'}],

(2)

including only

W_{n}

and

W_{n} W_{n}^{'}

which are first and second order neighbors if

W_{n}

is defined as first-order contiguity. Hence, such a model is generally classified as local. See Baltagi et al. [24].
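The global versus local distinction can be checked numerically. The sketch below evaluates Equations (1) and (2) for a row-normalized first-order contiguity matrix on a line of eight units; the line layout, the sample size and the coefficient value 0.5 are illustrative assumptions.

```python
import numpy as np

n, lam, sigma2 = 8, 0.5, 1.0

# Row-normalized first-order contiguity on a line of n units (illustrative choice).
w_star = np.zeros((n, n))
idx = np.arange(n - 1)
w_star[idx, idx + 1] = 1.0
w_star[idx + 1, idx] = 1.0
w = w_star / w_star.sum(axis=1, keepdims=True)

eye = np.eye(n)
b = eye - lam * w                                                   # B_n = I_n - lambda W_n
omega_sar = sigma2 * np.linalg.inv(b.T @ b)                         # Equation (1)
omega_sma = sigma2 * (eye + lam * (w + w.T) + lam**2 * (w @ w.T))   # Equation (2)

# Distance between units i and j along the line.
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

print("zero entries, SAR covariance:", int(np.isclose(omega_sar, 0.0).sum()))
print("zero entries, SMA covariance:", int(np.isclose(omega_sma, 0.0).sum()))
print("largest SMA covariance beyond second-order neighbours:",
      np.abs(omega_sma[dist > 2]).max())
```

In this example every pair of units is correlated under the SAR error structure, while the SMA covariance vanishes beyond second-order neighbors.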

Kelejian and Prucha [7] consider a “cross-sectional (first-order) autoregressive spatial model with (first-order) autoregressive disturbances” (SARAR), which is also labeled the spatial autoregressive combined (SAC) model. If the right-hand side includes both the independent variables and the spatially lagged dependent variable, then the model is termed the mixed regressive, spatial autoregression (MRSAR) or mixed SAR model. The spatial Durbin model (SDM) includes both the spatially lagged dependent variable and the spatially lagged independent variables. A full model, labeled the general nesting spatial (GNS) model in Elhorst [3], is

\begin{matrix} \begin{matrix} Y_{n} & = λ_{1} W_{n} Y_{n} + α ι_{n} + X_{n} β + W_{n} X_{n} θ + u_{n}, \\ u_{n} & = λ_{2} W_{n} u_{n} + ϵ_{n}, \end{matrix} \end{matrix}

(3)

where

λ_{1}

is referred to as the spatial autoregressive coefficient (SAC) and

λ_{2}

is called the spatial autocorrelation coefficient.

2.1. Parameter Space of the Spatial Autoregressive Coefficient

Consider the pure SAR model with data generating process (DGP):

Y_{n} = λ W_{n} Y_{n} + ϵ_{n},

(4)

where

Y_{n}

is an

n \times 1

vector of observations on the dependent variable,

W_{n}

is an

n \times n

spatial weight matrix and

ϵ_{n}

is an

n \times 1

vector of disturbances which are assumed to be i.i.d.

(0, σ_{ϵ}^{2})

. The reduced form equation of

Y_{n}

can be written as

Y_{n} = S_{n}^{- 1} ϵ_{n},

(5)

where

S_{n} = I_{n} - λ W_{n}

. So to guarantee the system achieves equilibrium, a crucial assumption is that the absolute value of the spatial coefficient

λ

is strictly less than 1 (see Kelejian and Prucha [6] (Assumption 2), Kelejian and Prucha [7] (Assumption 2)) to ensure the nonsingularity of

S_{n}

. This assumption follows from a sufficient condition for matrix invertibility in Horn and Johnson [25] (Corollary 5.6.16, p. 351):

Theorem 1.

An

n \times n

matrix

A_{n}

is nonsingular if there exists a matrix norm

∥ \cdot ∥

such that

∥ I_{n} - A_{n} ∥ < 1

. If this condition is satisfied,

A_{n}^{- 1} = \sum_{k = 0}^{\infty} {(I_{n} - A_{n})}^{k}

.

Thus,

S_{n}

is invertible if there exists a matrix norm such that

∥ I_{n} - (I_{n} - λ W_{n}) ∥ = ∥ λ W_{n} ∥ < 1

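. It is also well known that any matrix norm is at least as large as the absolute value of every eigenvalue. To see this, let

ρ_{i}

,

i = 1, \dots, n

, be the eigenvalues of

W_{n}

, and let

X_{n}

be a nonzero matrix whose columns are all eigenvectors associated with

ρ_{i}

, so that

W_{n} X_{n} = ρ_{i} X_{n}

. Then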

| ρ_{i} | ∥ X_{n} ∥ = ∥ ρ_{i} X_{n} ∥ = ∥ W_{n} X_{n} ∥ \leq ∥ W_{n} ∥ ∥ X_{n} ∥ .

(6)

So it is easy to see that

| λ | < \frac{1}{∥ W_{n} ∥}

and therefore

\frac{1}{ρ_{m i n}} < λ < \frac{1}{ρ_{m a x}},

(7)

because

\sum_{i}^{n} ρ_{i} = tr (W_{n}) = 0

so that

ρ_{m i n} < 0 < ρ_{m a x}

. A useful result is given in Ord [4]:

Theorem 2.

If

W_{n}

has eigenvalues

ρ_{1}, \dots, ρ_{n}

,

| ρ I_{n} - W_{n} | = \prod_{i = 1}^{n} (ρ - ρ_{i})

. Then for

S_{n} = I_{n} - λ W_{n}

,

det (S_{n}) = | I_{n} - λ W_{n} | = \prod_{i = 1}^{n} (1 - λ ρ_{i})

.

Moreover, the log-likelihood function for

Y_{n}

, given

Y_{n} = y

in (4) is

ℓ (λ, σ^{2}) = - \frac{n}{2} ln (2 π σ^{2}) - \frac{1}{2 σ^{2}} y^{'} S_{n}^{'} S_{n} y + ln | S_{n} |,

(8)

and

| S_{n} | = \prod_{i = 1}^{n} (1 - λ ρ_{i}) > 0

, of which a sufficient condition is

λ ρ_{i} < 1

, for all i. Again, since

ρ_{m i n} < 0 < ρ_{m a x}

, we obtain the range of

λ

that is given in (7).
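To illustrate Theorem 2 and the log-likelihood in (8), the following sketch compares the determinant of the matrix in (5) computed directly with the eigenvalue product, and evaluates the log-likelihood both ways. The ring-shaped weight matrix with nine units, the coefficient value 0.6 and the function name `sar_loglik` are assumptions made for illustration.

```python
import numpy as np

n, lam, sigma2 = 9, 0.6, 1.0

# Row-normalized ring: each unit has two neighbours (an illustrative choice).
i = np.arange(n)
w = np.zeros((n, n))
w[i, (i + 1) % n] = 0.5
w[i, (i - 1) % n] = 0.5

rho = np.linalg.eigvalsh(w)              # this W_n is symmetric, so eigvalsh applies
s = np.eye(n) - lam * w                  # S_n = I_n - lambda W_n

det_direct = np.linalg.det(s)
det_eigen = np.prod(1.0 - lam * rho)     # Theorem 2

def sar_loglik(lam, sigma2, y, w, rho):
    """Log-likelihood (8), with ln|S_n| computed as sum_i ln(1 - lam * rho_i)."""
    n_obs = y.shape[0]
    s_y = y - lam * (w @ y)                         # S_n y
    log_det = np.sum(np.log(1.0 - lam * rho))       # needs lam * rho_i < 1 for all i
    return -0.5 * n_obs * np.log(2 * np.pi * sigma2) - (s_y @ s_y) / (2 * sigma2) + log_det

rng = np.random.default_rng(0)
y = np.linalg.solve(s, rng.standard_normal(n))      # Y_n = S_n^{-1} eps_n, Equation (5)

sign, logdet = np.linalg.slogdet(s)
loglik_direct = (-0.5 * n * np.log(2 * np.pi * sigma2)
                 - (s @ y) @ (s @ y) / (2 * sigma2) + logdet)
loglik_eigen = sar_loglik(lam, sigma2, y, w, rho)

print(det_direct, det_eigen)
print(loglik_direct, loglik_eigen)
```

Because the eigenvalues are computed once, the log-determinant can be re-evaluated cheaply for each trial value of the spatial coefficient, which is the computational point of Ord's result.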

However, by either Theorem 1 or Theorem 2, (7) is only a sufficient condition for the invertibility of

S_{n}

. The singular points of

S_{n}

are

\frac{1}{ρ_{1}}, \dots, \frac{1}{ρ_{n}}

by Theorem 2, and the number of these singular points is at most countably infinite as

n \to \infty

. This raises a problem, as stated in Kelejian and Robinson [26]. These singular points can in general be determined numerically as the roots of an nth-degree polynomial, and to avoid inconsistency, they should be removed from the possible values of

λ

. Griffith [27] (p. 19) states that this condition also ensures stationarity, but Kelejian and Robinson [26] give a counterexample showing that it does not when

W_{n}

is a row-normalized double queen weight matrix.

When the matrix is row-normalized, Kelejian and Prucha [6] (Note 8, p. 120) show that

ρ_{m a x} = 1

by Geršgorin’s theorem, and typically,

| ρ_{m i n} | < 1

[26]. We assume

| λ | < 1

, and this is why

λ

is generally interpreted as the spatial autocorrelation coefficient similar to its counterpart in time series.
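The following sketch checks these statements on a small example. It assumes a 4 × 4 lattice with queen contiguity (cells sharing an edge or a corner), which is an illustrative choice; the eigenvalues are obtained from a symmetric matrix that is similar to the row-normalized weight matrix, so they are real.

```python
import numpy as np

rows = cols = 4
n = rows * cols
coords = np.array([(r, c) for r in range(rows) for c in range(cols)])

# Queen contiguity: Chebyshev distance one between lattice cells.
cheb = np.abs(coords[:, None, :] - coords[None, :, :]).max(axis=2)
w_star = (cheb == 1).astype(float)
d = w_star.sum(axis=1)                     # row sums (number of neighbours)
w = w_star / d[:, None]                    # row-normalized weight matrix

# D^{-1/2} W* D^{-1/2} is symmetric and similar to W_n, so it has the same eigenvalues.
rho = np.linalg.eigvalsh(w_star / np.sqrt(np.outer(d, d)))   # sorted ascending

lower, upper = 1.0 / rho[0], 1.0 / rho[-1]                   # interval (7)
smallest_sv = np.linalg.svd(np.eye(n) - w, compute_uv=False).min()

print("rho_min, rho_max:", rho[0], rho[-1])
print("admissible interval (7):", (lower, upper))
print("smallest singular value of I - W:", smallest_sv)      # numerically zero
```

In this example the largest eigenvalue equals one, the smallest is strictly above minus one, and the matrix in (5) evaluated at a coefficient of one is singular to machine precision.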

2.2. Stationarity, Order of Integration and Cointegration

Stationarity is a key assumption in time series. Similarly, in spatial econometrics, when the stationarity assumption does not hold, spurious or nonsense regression appears, as will be shown in the next section. Stationarity is tightly connected with

S_{n}

. The formal definition of stationarity is given in Anselin [1] (p. 42):

Definition 1.

A process is strictly stationary if any finite subset

\{x_{i}, x_{j}, \dots, x_{n}\}

from the stochastic process

\{x_{i}, i \in I\}

has the same joint distribution as the subset

\{x_{i + s}, x_{j + s}, \dots, x_{n + s}\}

for any s, where s represents a uniform shift in time, space or time–space.

But we generally consider a weaker version, covariance stationarity. For the intuition of stationarity and the connection with the inverse of

S_{n}

, see Beenstock et al. [19]. For the pure SAR model in (4) and (5), the covariance matrix of

Y_{n}

is

Var (Y_{n}) = E [Y_{n} Y_{n}^{'}] = σ_{ϵ}^{2} S_{n}^{- 1} S_{n}^{' - 1} \equiv σ_{ϵ}^{2} B_{n},

(9)

where

B_{n} = S_{n}^{- 1} S_{n}^{' - 1}

and

S_{n}^{- 1}

is defined in (5). Letting

b_{k j}

be the

(k, j)

element of the matrix

B_{n}

, by normalizing

σ_{ϵ}^{2} = 1

,

Y_{k}

has variance

b_{k k}

and covariance

b_{k j}

with

Y_{j}

. By Definition 1, stationarity requires that

b_{k k}

and

b_{k j}

remain unchanged asymptotically (this implicitly assumes that both location j and k are far away from the edge). Note that by Theorem 1, we have

S_{n}^{- 1} = \sum_{k = 0}^{\infty} {(I_{n} - S_{n})}^{k} = \sum_{k = 0}^{\infty} {(λ W_{n})}^{k} = I_{n} + λ W_{n} + λ^{2} W_{n}^{2} + \dots .

(10)

So the stationarity assumption is equivalent to

λ^{m - k} W_{n}^{m - k} \to 0

, for all k as

m \to \infty

. Since

m < n

and

m \to \infty

, m represents the “remote” area, and

W_{n}^{m - k}

is the

m - k

step neighborhood of unit k. Thus, intuitively, stationarity requires that the shocks from far away locations will asymptotically not affect the epicenter area.
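A short numerical sketch of the expansion in (10): for a coefficient below one in absolute value, the partial sums converge to the inverse, whereas at a coefficient of one the powers of a row-normalized weight matrix do not shrink. The ring-shaped weight matrix, the value 0.8 and the helper name `neumann_partial_sum` are illustrative assumptions.

```python
import numpy as np

n, lam = 12, 0.8

# Row-normalized ring with two neighbours per unit (illustrative choice).
i = np.arange(n)
w = np.zeros((n, n))
w[i, (i + 1) % n] = 0.5
w[i, (i - 1) % n] = 0.5

s_inv = np.linalg.inv(np.eye(n) - lam * w)

def neumann_partial_sum(lam, w, terms):
    """Sum of (lam * W)^k for k = 0, ..., terms - 1."""
    total = np.zeros_like(w)
    power = np.eye(w.shape[0])
    for _ in range(terms):
        total += power
        power = lam * (power @ w)
    return total

errors = [np.max(np.abs(neumann_partial_sum(lam, w, terms) - s_inv))
          for terms in (5, 20, 80)]
print(errors)                                   # shrinks geometrically

# With lambda = 1 the terms W^k keep maximum row sum one, so they never die out.
norm_w50 = np.linalg.norm(np.linalg.matrix_power(w, 50), np.inf)
print(norm_w50)
```

The second printout reflects the intuition in the text: under a unit coefficient, shocks from arbitrarily remote neighborhoods still carry full weight.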

Two other concepts tightly connected with unit roots and stationarity are the order of integration and cointegration. The order of integration is originally a time series concept that describes the minimum number of differences a non-stationary process needs in order to become (covariance) stationary. Cointegration, on the other hand, describes the case in which a linear combination of two or more series with the same order of integration has a lower order of integration. The formal definitions of the order of integration and cointegration are given in Hamilton [28]:

Definition 2.

A time series is integrated of order d, denoted

I (d)

, if

{(1 - L)}^{d} X_{t}

is a stationary process, where L is the lag operator and

1 - L

is the first difference.

Definition 3.

Time series X and Y are cointegrated of order

I (d, b)

, if both of them are

I (d)

, and there exists a cointegrating vector

(a_{1}, a_{2})

such that

a_{1} X + a_{2} Y \sim I (d - b)

.

In time series, the lag operator L is defined by

L X_{t} = X_{t - 1}

because of the natural order of temporal data. In spatial econometrics, we regard

W_{n}

as the spatial lag operator and

I_{n} - W_{n}

as the first order spatial difference, see Anselin [1] (pp. 22–26). We also use

I (d)

and

I (d, b)

to refer to spatial integration and spatial cointegration respectively.

More specifically, for the pure SAR model in (4) with a row-normalized weight matrix, if

λ = 1

,

Y_{n} \sim I (1)

since

ϵ_{n}

is stationary. Also, suppose both

Y_{n}

and

X_{n}

are

I (1)

, but they have a long-term equilibrium relationship

Y_{n} = X_{n} β + ϵ_{n}

, then obviously,

(X_{n}, Y_{n})

are

I (1, 1)

with cointegrating vector

(- β, 1)

.

2.3. Some Fundamental Assumptions

Different assumptions are made in spatial econometrics for different estimation methods. The most common ones are listed here, and the implications are explained.

Assumption 1.

The disturbances

ϵ_{1}, \dots, ϵ_{n}

are i.i.d. for all n (so uniformly) with zero mean and finite variance

σ^{2}

. Additionally, fourth moments exist.

Assumption 2.

The elements of the exogenous variables

X_{n}

are uniformly bounded for all n. The limit

\lim_{n \to \infty} \frac{X_{n}^{'} X_{n}}{n}

exists and is nonsingular.

Assumption 3.

The matrix

S_{n}

is nonsingular.

The existence of moments of the disturbances up to the fourth order is needed to apply the central limit theorem for (a system of) linear-quadratic forms (see Kelejian and Prucha [29] (p. 226) and Kelejian and Prucha [30] (p. 63)). The nonsingularity of

S_{n}

makes sure the system achieves an equilibrium as well as ensuring that the mean and variance of

Y_{n}

exist.

Assumption 4.

The matrices

W_{n}

and

S_{n}^{- 1} = {(I_{n} - λ W_{n})}^{- 1}

are uniformly bounded (UB) in both row and column sums for all n. (We say a sequence of matrices is UB in row (column) sums if the maximum absolute row (column) sum is bounded by a finite constant that does not depend on n. This property is preserved under products of finitely many such matrices.)

The UB condition for

W_{n}

implicitly assumes a limited number of neighbors for all units even as

n \to \infty

, so the weight matrix

W_{n}

is sparse for large n. This assumption is relaxed in Lee et al. [31] by introducing dominant (popular) units. In practice, the spatial units have a limited number of neighbors. Though sometimes

w_{n, i j}

may be defined as the inverse of the distance between i, j physically or economically,

w_{n, i j}

tends to be 0 between far away units as n increases. So in general this assumption is satisfied. The UB of

\{S_{n}^{- 1}\}

is to ensure the covariance matrix

Var (Y_{n})

in (9) is still UB, which limits the correlation between two different units, since the UB property is preserved under matrix multiplication.

Other assumptions to ensure identification conditions or the derivation of asymptotic distributions of estimators will be mentioned when needed.

3. Spurious Regression When (Near) Unit Roots Exist

The variance of

Y_{n}

explodes when unit roots exist, and OLS estimation may perform unsatisfactorily: the estimators are inconsistent, the test statistics do not have familiar distributions, and may even converge to a constant. These phenomena have been studied extensively in time series, and similar symptoms occur in spatial econometrics.

3.1. Spurious Regression of Driftless Series and Spatial Integration

Fingleton [14] studies unit roots and spatial cointegration in spatial econometrics. Using Monte Carlo simulations, he finds that spatial unit roots lead to a spurious regression and shows that when two vectors are spatially cointegrated, even running a regression on the error-correction model yields inconsistent estimates. Beenstock et al. [19] distinguish between the terms spurious regression and nonsense regression and argue that Fingleton [14] refers to nonsense regression rather than spurious regression. When

Y_{n}

and

X_{n}

are driftless random walks, the nonsense regression occurs because of the increased variances of

Y_{n}

and

X_{n}

over time. On the other hand, the spurious regression occurs when

Y_{n}

and

X_{n}

are independent random walks with drift, which causes their means to increase over time. See also Mur and Trívez [32].

To run the simulation, two independent pure SAR processes

Y_{n}

and

X_{n}

containing spatial unit roots are generated separately as in (5). But as discussed in Section 2.1,

S_{n}^{- 1}

does not exist under a row-normalized weight matrix when

λ = 1

. To avoid this, Fingleton introduces the “unconnected central cell”, which manually sets one row of the spatial weight matrix equal to 0 to avoid circularity. This is a time-series analogy because there is always a starting point in temporal data (

t = 1

). By doing so, the singular point is slightly larger than 1, and the existence of

S_{n}^{- 1}

is ensured [32]. Regressing

X_{n}

on

Y_{n}

, the t-statistic and coefficient of determination

R^{2}

show the significance of the parameter between two unrelated variables when spatial unit roots exist. Letting e be the OLS residuals, Moran’s I, defined as

\begin{aligned} I_{Moran} & = \frac{n}{\sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{n, i j}} \frac{e^{'} W_{n} e}{e^{'} e} \\ & = \frac{e^{'} W_{n} e}{e^{'} e} \quad \text{(when } W_{n} \text{ is row-normalized)}, \end{aligned}

(11)

is the spatial version of the Durbin–Watson statistic, and thus is a measure of spatial autocorrelation. The simulation results of Moran’s I show a high level of positive spatial autocorrelation in the residuals and evidence for the presence of a spurious regression.
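A minimal sketch of Equation (11) follows. The function name `morans_i`, the ring-shaped weight matrix and the two artificial residual vectors are assumptions chosen so that the sign of the statistic is easy to interpret; they are not taken from the simulations discussed here.

```python
import numpy as np

def morans_i(e, w):
    """Moran's I of the residual vector e under the weight matrix w, Equation (11)."""
    n_obs = e.shape[0]
    return (n_obs / w.sum()) * (e @ w @ e) / (e @ e)

n = 10
i = np.arange(n)
w = np.zeros((n, n))
w[i, (i + 1) % n] = 0.5                      # row-normalized ring, two neighbours each
w[i, (i - 1) % n] = 0.5

e_smooth = np.sin(2 * np.pi * i / n)         # neighbouring residuals are similar
e_alternating = (-1.0) ** i                  # neighbouring residuals have opposite signs

i_smooth = morans_i(e_smooth, w)
i_alt = morans_i(e_alternating, w)
print(i_smooth, i_alt)                       # positive versus negative autocorrelation
```

Since this weight matrix is row-normalized, the sum of all weights equals n and the leading factor in (11) is one, so the statistic reduces to the second line of the equation.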

To remedy this situation, spatial differencing is introduced to the SAR process with unit roots:

Δ Y_{n} = γ Δ X_{n} + ϵ_{n},

(12)

where

Δ Y_{n} = Y_{n} - W_{n} Y_{n}

and

Δ X_{n} = X_{n} - W_{n} X_{n}

. When both

Y_{n}

and

X_{n}

are spatial

I (1)

processes, we have

Δ Y_{n} = ϵ_{Y}

and

Δ X_{n} = ϵ_{X}

. The regression of the first-order spatial difference variable is equivalent to regressing two independent

I (0)

processes, which should theoretically yield

\hat{γ} = 0

.
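The effect of spatial differencing can be sketched as follows. The code generates two independent spatial unit root series on a lattice with an unconnected central cell and verifies that first-order spatial differencing recovers the underlying innovations. The 5 × 5 rook lattice, the random seed and the helper `ols_slope` are illustrative assumptions, and this is not a reproduction of Fingleton's simulation design.

```python
import numpy as np

rows = cols = 5
n = rows * cols
coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
manhattan = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=2)
w_star = (manhattan == 1).astype(float)          # rook contiguity
w = w_star / w_star.sum(axis=1, keepdims=True)   # row-normalized
w[n // 2, :] = 0.0                               # unconnected central cell: zero row

s = np.eye(n) - w                                # S_n with a unit coefficient
rng = np.random.default_rng(1)
eps_y = rng.standard_normal(n)
eps_x = rng.standard_normal(n)
y = np.linalg.solve(s, eps_y)                    # independent spatial unit root series
x = np.linalg.solve(s, eps_x)

dy = y - w @ y                                   # spatial first differences
dx = x - w @ x

def ols_slope(dep, reg):
    """Slope from an OLS regression of dep on a constant and reg."""
    design = np.column_stack([np.ones_like(reg), reg])
    coef, *_ = np.linalg.lstsq(design, dep, rcond=None)
    return coef[1]

print("slope in levels:     ", ols_slope(y, x))
print("slope in differences:", ols_slope(dy, dx))
print("differences recover the innovations:", np.allclose(dy, eps_y))
```

A single draw says little about the sampling distribution of either slope; a Monte Carlo loop over many draws would be needed to reproduce the spurious regression pattern described above.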

Next, spatially cointegrated series are considered. To generate

(X_{n}, Y_{n}) \sim I (1, 1)

, the “error-correction representation” is used. The idea is adopted from Robert [33]: the existence of an error-correction representation is a necessary and sufficient condition for time series to be cointegrated. The spatial analogy is

\begin{matrix} \begin{matrix} Y_{n} = W_{n} Y_{n} + c (W_{n} X_{n} - W_{n} Y_{n}) + e_{1 n}, \\ X_{n} = W_{n} X_{n} + d (W_{n} X_{n} - W_{n} Y_{n}) + e_{2 n}, \\ e_{1 n} \sim N (0, σ_{1}^{2} I_{n}), \\ e_{2 n} \sim N (0, σ_{2}^{2} I_{n}), \end{matrix} \end{matrix}

(13)

where

W_{n} X_{n} - W_{n} Y_{n}

is the equilibrium error assumed (for simplicity) stationary and hence the name “error-correction”. The spatial unit root series

X_{n}

and

Y_{n}

have a long-term equilibrium

X_{n} = Y_{n}

. Note that (13) has two equations and two unknowns and

W_{n}

is a noncircular matrix.

Moran’s I statistic may act as a useful indicator for cointegration because the cointegration regression (regress

Y_{n}

on

X_{n}

) involves endogenous variables. Also, the regression in first-order spatial differences is inappropriate because of omitted variable bias concerning the equilibrium error

W_{n} X_{n} - W_{n} Y_{n}

. Rearranging (13) yields the appropriate specifications:

\begin{matrix} \begin{matrix} Y_{n} - W_{n} Y_{n} & = c (W_{n} X_{n} - W_{n} Y_{n}) + e_{1 n}, \\ X_{n} - W_{n} X_{n} & = d (W_{n} X_{n} - W_{n} Y_{n}) + e_{2 n} . \end{matrix} \end{matrix}

(14)

But OLS estimation for either c or d is inconsistent because of the presence of a spatially lagged dependent variable, which is different from the traditional time series counterpart.

3.2. Spurious Regression with Deterministic Trends

Fingleton [14] studies the effect of spatial unit roots by simulation while Mur and Trívez [32] show that the variance of the spatial unit roots series explodes. For the DGP in (4)

Y_{n} = λ W_{n} Y_{n} + ϵ_{n} ⟹ Y_{n} = {(I_{n} - λ W_{n})}^{- 1} ϵ_{n} = S_{n}^{- 1} ϵ_{n},

(15)

since the contiguity-based spatial weight matrix is symmetric,

W_{n}

can be decomposed as

W_{n} = Q_{n} Λ_{n} Q_{n}^{- 1}

no matter whether it is row-normalized or not. Thus

S_{n} = I_{n} - λ W_{n} = Q_{n} (I_{n} - λ Λ_{n}) Q_{n}^{- 1} = Q_{n} Δ_{n} Q_{n}^{- 1}

where

Λ_{n}

is the eigenvalue matrix of

W_{n}

and thus

Δ_{n}

is also diagonal. So, Mur and Trívez [32] derive the variance of

Y_{n}

as

\begin{aligned} Var (Y_{n}) & = σ^{2} {(B_{n}^{'} B_{n})}^{- 1} \\ & = σ^{2} Q_{n} Δ_{n}^{- 1} Q_{n}^{- 1} {(Q_{n}^{'})}^{- 1} Δ_{n}^{- 1} Q_{n}^{'} \\ & = σ^{2} Q_{n} Δ_{n}^{- 1} {(Q_{n}^{'} Q_{n})}^{- 1} Δ_{n}^{- 1} Q_{n}^{'}, \end{aligned}

(16)

with

\begin{matrix} Q_{n} Δ_{n}^{- 1} = [\begin{matrix} q_{11} & \dots & q_{1 n} \\ ⋮ & ⋱ & ⋮ \\ q_{n 1} & \dots & q_{n n} \end{matrix}] diag \{\frac{1}{1 - λ ρ_{1}}, \dots, \frac{1}{1 - λ ρ_{n}}\} = [\begin{matrix} \frac{q_{11}}{1 - λ ρ_{1}} & \dots & \frac{q_{1 n}}{1 - λ ρ_{n}} \\ ⋮ & ⋱ & ⋮ \\ \frac{q_{n 1}}{1 - λ ρ_{1}} & \dots & \frac{q_{n n}}{1 - λ ρ_{n}} \end{matrix}] . \end{matrix}

(17)

Let

m_{i j}

be the element of row i and column j of the matrix

{(Q_{n}^{'} Q_{n})}^{- 1}

, and the variance of the r-th element of

Y_{n}

is then

\begin{matrix} Var (Y_{n, r}) = σ^{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} \frac{q_{r i} q_{r j}}{(1 - λ ρ_{i}) (1 - λ ρ_{j})} m_{i j} . \end{matrix}

(18)

If

W_{n}

is row-normalized, at least one

ρ_{i} = 1

, which means that if the spatial unit roots exist and

λ = 1

,

Var (Y_{n})

explodes. If

W_{n}

is not row-normalized so that

Q_{n}

is symmetric and orthogonal with

Q_{n}^{'} Q_{n} = I_{n}

,

Var (Y_{n}) = σ^{2} Q_{n} Δ_{n}^{- 2} Q_{n}

, then Mur and Trívez [32] show the variance of the observation at r reduces to

Var (Y_{n, r}) = σ^{2} \sum_{i = 1}^{n} \frac{q_{r i}^{2}}{{(1 - λ ρ_{i})}^{2}} = \infty \quad \text{when } λ = \frac{1}{ρ_{i}} .

(19)

But when

λ

is not one of the singular points of

S_{n}

,

Var (Y_{n})

is not necessarily a function of n, i.e., the variances of

Y_{n}

do not increase as the sample size grows [32]. This is in line with the discussion of stationarity in Section 2.2 and reveals the possible source for nonsense regression concerned with the spatial unit root SAR series [34] (p. 303).
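The variance explosion can be illustrated numerically. The sketch below assumes a symmetric, row-normalized ring of nine units, so that the eigenvector matrix is orthogonal, and compares the directly computed variances with the eigenvalue expression as the coefficient approaches the singular point at one. The function names and the grid of coefficient values are illustrative assumptions.

```python
import numpy as np

n = 9
i = np.arange(n)
w = np.zeros((n, n))
w[i, (i + 1) % n] = 0.5                      # symmetric, row-normalized ring
w[i, (i - 1) % n] = 0.5

rho, q = np.linalg.eigh(w)                   # W_n = Q diag(rho) Q', with Q orthogonal

def variances_direct(lam, sigma2=1.0):
    """Diagonal of sigma^2 S_n^{-1} S_n'^{-1}, as in Equation (9)."""
    s_inv = np.linalg.inv(np.eye(n) - lam * w)
    return sigma2 * np.diag(s_inv @ s_inv.T)

def variances_eigen(lam, sigma2=1.0):
    """Eigenvalue form: sum over i of q[r, i]^2 / (1 - lam * rho_i)^2 for each unit r."""
    return sigma2 * (q**2 / (1.0 - lam * rho) ** 2).sum(axis=1)

grid = (0.9, 0.99, 0.999)
max_var = [variances_direct(lam).max() for lam in grid]
for lam, v in zip(grid, max_var):
    print(lam, v)                            # grows without bound as lam approaches 1
```

For a fixed coefficient away from the singular points the same functions return finite values, which is the contrast drawn in the text.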

Mur and Trívez [32] focus on the spurious regression when a spatial deterministic trend exists and show that under such circumstances similar symptoms related to unit roots occur. Consider the DGP

\begin{matrix} Y_{n} & = δ ι_{n} + λ W_{n} Y_{n} + ϵ_{n} . \end{matrix}

(20)

\begin{matrix} ⟹ Y_{n} & = {(I_{n} - λ W_{n})}^{- 1} (δ ι_{n} + ϵ_{n}) \end{matrix}

(21)

\begin{matrix} = δ ι_{n}^{‡} + S_{n}^{- 1} ϵ_{n}, \end{matrix}

(22)

where

ι_{n}

is an

n \times 1

vector of ones and

ι_{n}^{‡} = S_{n}^{- 1} ι_{n}

. Comparing the term

δ ι_{n}^{‡}

in (22) with the time trends in a typical time series model, “

ι_{n}^{‡}

” is similar to the time trend “t”: t is different in terms of the relative position in time and the element of

ι_{n}^{‡}

is different in terms of its relative position in space. Also, the presence of such a trend term in the SAR process leads to spurious regression. Consider the simple regression

Y_{1 n} = α + Y_{2 n} β + μ_{n},

(23)

where

Y_{1 n}

and

Y_{2 n}

are unrelated SAR processes generated by (20), respectively:

\begin{matrix} \begin{matrix} Y_{1 n} = δ_{1} ι_{n} + λ_{1} W_{n 1} Y_{1 n} + ϵ_{1 n} \Rightarrow Y_{1 n} = δ_{1} S_{1 n}^{- 1} ι_{n} + S_{1 n}^{- 1} ϵ_{1 n}, \\ Y_{2 n} = δ_{2} ι_{n} + λ_{2} W_{n 2} Y_{2 n} + ϵ_{2 n} \Rightarrow Y_{2 n} = δ_{2} S_{2 n}^{- 1} ι_{n} + S_{2 n}^{- 1} ϵ_{2 n} . \end{matrix} \end{matrix}

(24)

Assuming

ϵ_{1 n}

and

ϵ_{2 n}

are independent white noises, we expect the estimate of

β

in (23) to be 0. However, this is generally not the case which means spurious regression occurs. This can be seen from the fact that the correlation coefficient between

Y_{1 n}

and

Y_{2 n}

given in Mur and Trívez [32] satisfies

r_{Y_{1 n}, Y_{2 n}} = \frac{\sum_{r} (Y_{1 n, r} - {\bar{Y}}_{1 n}) (Y_{2 n, r} - {\bar{Y}}_{2 n})}{\sqrt{\sum_{r} {(Y_{1 n, r} - {\bar{Y}}_{1 n})}^{2} \sum_{r} {(Y_{2 n, r} - {\bar{Y}}_{2 n})}^{2}}} \to 1, n \to \infty .

(25)

Though Fingleton argues that a high value of Moran’s I statistic may be a good indicator of the existence of spatial unit roots and spatial cointegration, this statistic cannot distinguish between them, or even distinguish them from the (genuine) positive spatial autocorrelation case. Some testing methods have been developed and are summarized in Section 5. The trend SAR series proposed by Mur and Trívez [32] seems to have received less attention, which may be due to the fact that when the mixed SAR process contains only a constant exogenous variable and

W_{n}

is row-normalized, multicollinearity occurs; see Kelejian and Prucha [7] (p. 105), Lee [35] (p. 258), Lee [5] (p. 1907).
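One way to see the multicollinearity remark is that a row-normalized weight matrix maps the vector of ones into itself, so the trend term in (22) is proportional to the constant whenever the coefficient is below one in absolute value. The sketch below checks this and contrasts it with a noncircular matrix that has an unconnected central cell; the 5 × 5 rook lattice and the coefficient value 0.9 are illustrative assumptions.

```python
import numpy as np

rows = cols = 5
n = rows * cols
lam = 0.9
coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
manhattan = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=2)
w_star = (manhattan == 1).astype(float)              # rook contiguity
w = w_star / w_star.sum(axis=1, keepdims=True)       # row-normalized, circular

iota = np.ones(n)
# Spatial trend S_n^{-1} iota under the row-normalized matrix.
iota_dagger = np.linalg.solve(np.eye(n) - lam * w, iota)

# Same construction with an unconnected central cell (noncircular matrix).
w_nc = w.copy()
w_nc[n // 2, :] = 0.0
iota_dagger_nc = np.linalg.solve(np.eye(n) - lam * w_nc, iota)

print(np.ptp(iota_dagger))       # numerically zero: the trend is a constant, 1 / (1 - lam)
print(np.ptp(iota_dagger_nc))    # positive: the trend varies with position in space
```

In the first case the trend cannot be separated from an intercept, while in the second case its elements differ across locations, in line with the interpretation of the trend term given above.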

3.3. Spurious Regression under Near Unit Roots with a Row-Normalized, Circular Weight Matrix

The spurious regression considered in the previous two subsections arises under a row-normalized, noncircular weight matrix, which implicitly assumes an unconnected central unit in the spatial system and is too restrictive to be used in empirical applications. Thus, Lee and Yu [34] study spurious regression under a circular, row-normalized spatial weight matrix. The DGP for the (mixed) SAR series is

Y_{j n} = λ_{j n} W_{j n} Y_{j n} + Z_{j n} γ_{j} + ϵ_{j n}, j = 1, \dots, m .

(26)

In this case,

λ_{j n} \neq 1

, since unity is a singular point of

S_{j n} = I_{n} - λ_{j n} W_{j n}

. They study the consequence when

λ_{n}

approaches 1, namely

λ_{n 0} = 1 - \frac{1}{ψ_{n}},

(27)

where

ψ_{n} \to \infty

as

n \to \infty

.

3.3.1. Decomposition of $Y_{n}$

Though the variance of

Y_{n}

explodes as

λ_{n} \to 1

,

Y_{n}

can be decomposed into a stable part and an unstable part by the decomposition of the weight matrix

W_{n}

. This decomposition given in Lee and Yu [15] is used in Yu et al. [10,36], Yu and Lee [16] to study unit roots in a spatial dynamic panel data (SDPD) model. Because of its importance, this procedure is summarized here. See Baltagi et al. [37] and Lee and Yu [15,34] for more information.

Theorem 3.

Suppose that

W_{n}

is a row-normalized weight matrix from a symmetric matrix

C_{n}

, i.e.,

W_{n} = Λ_{n}^{- 1} C_{n}

, where

Λ_{n}

is a diagonal matrix with its diagonal elements formed by the row sums of

C_{n}

. Then (i) the eigenvalues of

W_{n}

are all real; and (ii)

W_{n}

is diagonalizable.

(i) follows because, by (28) below, $W_{n}$ is similar to a symmetric matrix, and all symmetric matrices have real eigenvalues. For (ii), note that

W_{n} = Λ_{n}^{- 1} C_{n} ⟹ Λ_{n}^{\frac{1}{2}} W_{n} Λ_{n}^{- \frac{1}{2}} = Λ_{n}^{- \frac{1}{2}} C_{n} Λ_{n}^{- \frac{1}{2}} .

(28)

Let

D_{n}^{*}

be the eigenvalue matrix of

Λ_{n}^{- \frac{1}{2}} C_{n} Λ_{n}^{- \frac{1}{2}}

, and

R_{n}^{*}

be the corresponding orthogonal eigenvector matrix, i.e.,

Λ_{n}^{- \frac{1}{2}} C_{n} Λ_{n}^{- \frac{1}{2}} = R_{n}^{*} D_{n}^{*} R_{n}^{*^{'}}

. Lee and Yu [15] show

W_{n}

can be expressed as:

\begin{aligned} W_{n} & = Λ_{n}^{- \frac{1}{2}} (R_{n}^{*} D_{n}^{*} R_{n}^{*^{'}}) Λ_{n}^{\frac{1}{2}} \\ & = (Λ_{n}^{- \frac{1}{2}} R_{n}^{*}) D_{n}^{*} (R_{n}^{* - 1} Λ_{n}^{\frac{1}{2}}) \\ & = (Λ_{n}^{- \frac{1}{2}} R_{n}^{*}) D_{n}^{*} {(Λ_{n}^{- \frac{1}{2}} R_{n}^{*})}^{- 1} . \end{aligned}

(29)

Let

R_{n} = Λ_{n}^{- \frac{1}{2}} R_{n}^{*}

,

D_{n} = D_{n}^{*}

. By definition,

R_{n}

is the eigenvector matrix of

W_{n}

and

D_{n} = D_{n}^{*}

is the corresponding eigenvalue matrix, so the eigendecomposition of

W_{n}

is

W_{n} = R_{n} D_{n} R_{n}^{- 1} .

(30)

Moreover, the largest eigenvalues of a row-normalized matrix are 1 in absolute value. Without loss of generality, Lee and Yu [15] assume there are

m_{n}

eigenvalues equal to 1 and let

D_{n} = diag \{1_{m_{n}}, d_{n, m_{n} + 1}, \dots, d_{n n}\},

(31)

where

1_{m_{n}}

is a

1 \times m_{n}

vector of ones and

|d_{n i}| \leq 1

for all i. So the eigenvalue matrix

D_{n}

can be decomposed into two parts:

D_{n} = J_{n} + {\tilde{D}}_{n},

(32)

where

J_{n} = diag \{1_{m_{n}}, 0, \dots, 0\}

and

{\tilde{D}}_{n} = diag \{0, \dots, 0, d_{n, m_{n} + 1}, \dots, d_{n n}\}

. Accordingly,

\begin{aligned} W_{n} & = R_{n} D_{n} R_{n}^{- 1} = R_{n} (J_{n} + {\tilde{D}}_{n}) R_{n}^{- 1} \\ & = R_{n} J_{n} R_{n}^{- 1} + R_{n} {\tilde{D}}_{n} R_{n}^{- 1} = W_{n}^{u} + {\tilde{W}}_{n}, \end{aligned}

(33)

where

W_{n}^{u} = R_{n} J_{n} R_{n}^{- 1}

and

{\tilde{W}}_{n} = R_{n} {\tilde{D}}_{n} R_{n}^{- 1}

.

Lee and Yu [15] note that

J_{n} J_{n} = J_{n}

and

J_{n} {\tilde{D}}_{n} = 0

, so

W_{n}^{u} W_{n}^{u} = W_{n}^{u}, W_{n}^{u} {\tilde{W}}_{n} = 0, W_{n} W_{n}^{u} = W_{n}^{u} .

(34)

Denoting

S_{n} (λ) = I_{n} - λ W_{n} = R_{n} (I_{n} - λ D_{n}) R_{n}^{- 1}

, and

S_{n} = S_{n} (λ_{n 0})

where

λ_{n 0}

is the true value of

λ

, they obtain

S_{n}^{- 1} (λ) = R_{n} {(I_{n} - λ D_{n})}^{- 1} R_{n}^{- 1}

and thus

\begin{matrix} I_{n} - λ_{n 0} D_{n} = diag \{(1 - λ_{n 0}), \dots, (1 - λ_{n 0}), (1 - λ_{n 0} d_{n, m_{n} + 1}), \dots, (1 - λ_{n 0} d_{n n})\}, \end{matrix}

(35)

\begin{matrix} ⟹ & {(I_{n} - λ_{n 0} D_{n})}^{- 1} = diag \{ψ_{n}, \dots, ψ_{n}, \frac{1}{1 - λ_{n 0} d_{n, m_{n} + 1}}, \dots, \frac{1}{1 - λ_{n 0} d_{n n}}\}, \end{matrix}

(36)

since

λ_{n 0} = 1 - \frac{1}{ψ_{n}}

. Similarly,

{(I_{n} - λ_{n 0} {\tilde{D}}_{n})}^{- 1} = diag \{1_{m_{n}}, \frac{1}{1 - λ_{n 0} d_{n, m_{n} + 1}}, \dots, \frac{1}{1 - λ_{n 0} d_{n n}}\} .

(37)

Comparing the first

m_{n}

diagonals of

{(I_{n} - λ_{n 0} D_{n})}^{- 1}

and

{(I_{n} - λ_{n 0} {\tilde{D}}_{n})}^{- 1}

, it is easy to obtain

{(I_{n} - λ_{n 0} D_{n})}^{- 1} = ψ_{n} λ_{n 0} J_{n} + {(I_{n} - λ_{n 0} {\tilde{D}}_{n})}^{- 1}

, thus

S_{n}^{- 1} (λ_{n 0}) = ψ_{n} λ_{n 0} W_{n}^{u} + {(I_{n} - λ_{n 0} {\tilde{W}}_{n})}^{- 1} .

(38)
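The decomposition can be checked numerically. The following is a minimal sketch, assuming a small ring-shaped contiguity matrix $C_{n}$ (a hypothetical example, not one used by Lee and Yu [15]); the function name `decompose_weights` and the tolerance `tol` are illustrative choices. It builds $W_{n}^{u}$ and ${\tilde{W}}_{n}$ from the eigendecomposition of the symmetric matrix in (28) and compares both sides of (38).

```python
import numpy as np

def decompose_weights(C, tol=1e-10):
    """Return W, W_u, W_tilde and m_n for W = Lambda^{-1} C with symmetric C."""
    C = np.asarray(C, dtype=float)
    row_sums = C.sum(axis=1)
    sqrt_r = np.sqrt(row_sums)
    W = C / row_sums[:, None]
    # Symmetric matrix Lambda^{-1/2} C Lambda^{-1/2}, similar to W as in (28).
    sym = C / np.outer(sqrt_r, sqrt_r)
    d, R_star = np.linalg.eigh(sym)        # real eigenvalues, orthogonal R*
    R = R_star / sqrt_r[:, None]           # R = Lambda^{-1/2} R*
    R_inv = R_star.T * sqrt_r[None, :]     # R^{-1} = R*' Lambda^{1/2}
    unit = np.abs(d - 1.0) < tol           # positions of the unit eigenvalues
    W_u = R[:, unit] @ R_inv[unit, :]      # R J R^{-1}
    W_tilde = (R * np.where(unit, 0.0, d)) @ R_inv   # R D~ R^{-1}
    return W, W_u, W_tilde, int(unit.sum())

# Hypothetical example: a ring in which each unit has two neighbours.
n = 6
C = np.zeros((n, n))
for i in range(n):
    C[i, (i - 1) % n] = C[i, (i + 1) % n] = 1.0

W, W_u, W_tilde, m_n = decompose_weights(C)
psi = 50.0
lam = 1.0 - 1.0 / psi                      # near unit root as in (27)
S_inv = np.linalg.inv(np.eye(n) - lam * W)
# Right-hand side of (38): unstable part plus stable part.
S_inv_38 = psi * lam * W_u + np.linalg.inv(np.eye(n) - lam * W_tilde)
```

Comparing `S_inv` with `S_inv_38` (for instance with `np.allclose`) shows how the factor $ψ_{n}$ enters only through $W_{n}^{u}$, which is the source of the exploding variance.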

Denote

G_{n} = W_{n} S_{n}^{- 1}

, by (38) and (34):

G_{n} = ψ_{n} λ_{n 0} W_{n}^{u} + W_{n} {(I_{n} - λ_{n 0} {\tilde{W}}_{n})}^{- 1} .

(39)

Thus, Lee and Yu [15] decompose

Y_{j n} = S_{j n}^{- 1} (λ_{j n}) (Z_{j n} γ_{j} + ϵ_{j n})

into two parts by (38):

Y_{j n} = ψ_{j n} Y_{j n}^{u} + {\tilde{Y}}_{j n},

(40)

where

Y_{j n}^{u} = λ_{j n} W_{j n}^{u} (Z_{j n} γ_{j} + ϵ_{j n})

and

{\tilde{Y}}_{j n} = {(I_{n} - λ_{j n} {\tilde{W}}_{j n})}^{- 1} (Z_{j n} γ_{j} + ϵ_{j n})

.

It can be seen from (38) that when

λ_{n 0} \to 1

,

S_{n}^{- 1} (λ_{n 0})

is ill-conditioned because

ψ_{n} \to \infty

and hence the variance of

Y_{n}

explodes. This is caused by the first unstable term in (40) (the second term is stable). Thus,

Y_{j n}

is of order

ψ_{j n}

, which may grow too fast to yield a useful asymptotic analysis. A rate-adjustment factor

\frac{1}{ψ_{j n}}

is needed to keep the rate under control. A similar idea applies to the QMLE and 2SLS estimation methods, as shown later.

3.3.2. Spurious Regression of OLS under Near Unit Root

To study spurious regression, following Fingleton [14], Lee and Yu [34] consider the DGP that is similar to (26) but without exogenous variables:

Y_{j n} = λ_{j n} W_{j n} Y_{j n} + ϵ_{j n}, j = 1, \dots, m .

(41)

Denote

Y_{- 1, n} = [Y_{2 n}, \dots, Y_{m n}]

,

X_{n} = [ι_{n}, Y_{- 1, n}]

where

ι_{n}

is the

n \times 1

vector of ones. Let

α

be a scalar,

β = {(β_{2}, \dots, β_{m})}^{'}

be an

(m - 1) \times 1

vector and

δ \equiv {(α, β^{'})}^{'}

. OLS, which may yield spurious regression, is then

Y_{1 n} = α ι_{n} + Y_{- 1, n} β + V_{n} = X_{n} δ + V_{n},

(42)

where

V_{n}

is an

n \times 1

vector of disturbances. To keep the variables of interest at a controllable order, scale

Y_{j n}

and

S_{j n}^{- 1} S_{j n}^{' - 1}

as

Y_{j n}^{*} = \frac{1}{ψ_{j n}} Y_{j n}, S_{j n} = \frac{1}{ψ_{j n}^{2}} S_{j n}^{- 1} S_{j n}^{' - 1}, j = 1, \dots, m .

(43)

Lee and Yu [34] introduce a sufficient condition for Assumption 4 that ensures the UB of

S_{n}^{- 1}

, by (38):

Assumption 5.

{(I_{n} - λ_{j n} {\tilde{W}}_{j n})}^{- 1}

and

W_{j n}^{u}

are

U B

.

Under Assumption 5,

S_{j n}

is UB. To study the properties of OLS estimation, it is sufficient to show the asymptotic behaviors of

\frac{1}{\sqrt{n}} X_{n}^{*'} Y_{1 n}^{*}

and

\frac{1}{n} Y_{i n}^{*'} Y_{j n}^{*}

, where

X_{n}^{*} = [ι_{n}, Y_{- 1, n}^{*}]

. Proofs of these properties rely on the orders of matrices and random vectors as well as on the first and second moments of quadratic forms (some useful lemmas can be found at https://www.asc.ohio-state.edu/lee.1777/wp/sar-qml-r-appen-04feb.pdf, accessed on 28 March 2024). The OLS estimates for

α

and

β

:

{({\hat{α}}_{n}, {\hat{β}}_{n}^{'})}^{'} = {(X_{n}^{'} X_{n})}^{- 1} (X_{n}^{'} Y_{1 n})

can be expressed in terms of

\frac{1}{\sqrt{n}} X_{n}^{*'} Y_{1 n}^{*}

and

\frac{1}{n} X_{n}^{*'} X_{n}^{*}

as

\sqrt{n} (\begin{matrix} {\hat{α}}_{n} \\ {\hat{β}}_{n} \end{matrix}) = ψ_{1 n} Y_{m}^{- 1} {(\frac{1}{n} X_{n}^{*'} X_{n}^{*})}^{- 1} (\frac{1}{\sqrt{n}} X_{n}^{*'} Y_{1 n}^{*}),

(44)

where

Y_{m} = (\begin{matrix} 1 & 0 \\ 0 & Y_{2 m} \end{matrix}) and Y_{2 m} = (\begin{matrix} ψ_{2 n} & 0 & \dots & 0 \\ 0 & ψ_{3 n} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & ψ_{m n} \end{matrix}) .

(45)

Lee and Yu [34] notice that the scaling factor

Y_{m}

is needed because the columns of

X_{n}

have different orders

(1, ψ_{2 n}, \dots, ψ_{m n})

and

Y_{1 n}

is of order

ψ_{1 n}

. Based on this fact, they give the following result:

\frac{1}{n} X_{n}^{*'} X_{n}^{*} = D_{n, x x}^{*} + O_{p} (\frac{1}{\sqrt{n}}),

(46)

where

D_{n, x x}^{*} \equiv diag \{1, σ_{2}^{2} \frac{1}{n} tr (S_{2 n}), \dots, σ_{m}^{2} \frac{1}{n} tr (S_{m n})\}

,

O_{p} (\frac{1}{\sqrt{n}})

means that the remaining terms of

\frac{1}{n} X_{n}^{*'} X_{n}^{*}

are at most of order

\frac{1}{\sqrt{n}}

and

S_{j n}

symbols are defined in (43), for all

j = 2, \dots, m

.

Moreover,

\frac{1}{\sqrt{n}} X_{n}^{*'} Y_{1 n}^{*}

has limiting variance matrix:

Σ_{m}^{*} = σ_{1}^{2} diag \{lim_{n \to \infty} \frac{1}{n} l_{n}^{'} S_{1 n} l_{n}, σ_{2}^{2} lim_{n \to \infty} \frac{1}{n} tr (S_{2 n} S_{1 n}), \dots, σ_{m}^{2} lim_{n \to \infty} \frac{1}{n} tr (S_{m n} S_{1 n})\} .

(47)

Lee and Yu [34] also adjust Assumption 2 to ensure

\frac{1}{n} X_{n}^{*'} X_{n}^{*}

is of full rank when

n \to \infty

and obtain the asymptotic distributions of the OLS estimators:

Assumption 6.

{lim}_{n \to \infty} \frac{1}{n} tr (S_{j n}) \neq 0 for j = 1, 2, \dots, m

.

Denoting

D_{x x}^{*} \equiv lim_{n \to \infty} D_{n, x x}^{*} = diag \{1, σ_{2}^{2} lim_{n \to \infty} \frac{1}{n} tr (S_{2 n}), \dots, σ_{m}^{2} lim_{n \to \infty} \frac{1}{n} t r (S_{m n})\},

(48)

then

\frac{\sqrt{n}}{ψ_{1 n}} Y_{m} {({\hat{α}}_{n}, {\hat{β}}_{n}^{'})}^{'} \overset{d}{\to} N (0, {[D_{x x}^{*}]}^{- 1} Σ_{m}^{*} {[D_{x x}^{*}]}^{- 1}) .

(49)

Since

Y_{- 1, n}

is independent of

Y_{1 n}

, one may expect insignificant

\hat{β}

in (42). However, whether

{\hat{β}}_{j}

converges to 0 in probability or not depends on the factor

\frac{\sqrt{n}}{ψ_{1 n}} ψ_{j n}

for

2 \leq j \leq m

(note that

Y_{m}

in (49) is diagonal): if

\frac{\sqrt{n}}{ψ_{1 n}} ψ_{j n} \to \infty

,

{\hat{β}}_{j n} \overset{p}{\to} 0

and is

\frac{\sqrt{n}}{ψ_{1 n}} ψ_{j n}

-consistent; if

\frac{\sqrt{n}}{ψ_{1 n}} ψ_{j n} \to c \in (0, \infty)

,

{\hat{β}}_{j n}

is asymptotically normal because its limiting variance does not converge to 0; if

\frac{\sqrt{n}}{ψ_{1 n}} ψ_{j n} \to 0

,

{\hat{β}}_{j n}

is not bounded in probability and diverges. Intuitively, spurious regression will not occur if

ψ_{1 n}

approaches ∞ faster than

ψ_{j n}

, or equivalently,

λ_{1 n}

approaches 1 more quickly than

λ_{j n}

[34].
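As an illustration of the three cases implied by (49), suppose the rates are polynomial, $ψ_{j n} = n^{a_{j}}$ with $a_{j} > 0$. This parametrization is an assumption made here for concreteness and is not taken from Lee and Yu [34]. The scaling factor attached to ${\hat{β}}_{j n}$ then becomes a power of n:

```latex
\frac{\sqrt{n}}{\psi_{1n}} \psi_{jn} = n^{\frac{1}{2} + a_j - a_1},
\qquad
\begin{cases}
a_1 - a_j < \tfrac{1}{2}: & \text{factor} \to \infty, \quad \hat{\beta}_{jn} \overset{p}{\to} 0, \\
a_1 - a_j = \tfrac{1}{2}: & \text{factor} \to 1, \quad \hat{\beta}_{jn} \text{ has a nondegenerate normal limit}, \\
a_1 - a_j > \tfrac{1}{2}: & \text{factor} \to 0, \quad \hat{\beta}_{jn} \text{ is not bounded in probability}.
\end{cases}
```

In particular, when all series share the same rate ($a_{1} = a_{j}$), the factor equals $\sqrt{n}$ and ${\hat{β}}_{j n}$ converges to zero at the usual rate.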

3.3.3. Other Test Statistics

It is also important to discuss the statistical properties of other test statistics based on (42), which could be potentially useful for distinguishing spurious regression. Lee and Yu [34] give the following theorem as a prerequisite:

Theorem 4.

Under Assumptions 1 and 4, for any nonstochastic UB square matrix

B_{n}

,

\frac{1}{n} Y_{1 n}^{*'} B_{n} P_{n} Y_{1 n}^{*} = O_{p} (\frac{1}{n}) .

(50)

where

P_{n} = X_{n} {(X_{n}^{'} X_{n})}^{- 1} X_{n}^{'}

is the projection matrix of

X_{n}.

Based on this theorem, Lee and Yu [34] show the order of the estimated variance of the disturbances

{\hat{σ}}_{n}^{2}

is

\frac{1}{ψ_{1 n}^{2}} {\hat{σ}}_{n}^{2} = \frac{1}{ψ_{1 n}^{2}} (\frac{e_{n}^{'} e_{n}}{n - m}) = σ_{1}^{2} \frac{1}{n} tr (S_{1 n}) + O_{p} (\frac{1}{\sqrt{n}}),

(51)

and

\begin{matrix} t_{β_{j}} \overset{d}{\to} N (0, lim_{n \to \infty} \frac{n \cdot tr (S_{j n} S_{1 n})}{tr (S_{j n}) \cdot tr (S_{1 n})}), for every 2 \leq j \leq m, \end{matrix}

(52)

F \overset{d}{=} \frac{1}{(m - 1)} U_{m}^{'} U_{m},

(53)

where

U_{m}

is an

(m - 1) \times 1

random vector with its elements

u_{j m} \sim N (0, {lim}_{n \to \infty} \frac{n tr (S_{j n} S_{1 n})}{tr (S_{j n}) tr (S_{1 n})})

. Thus

t_{β_{j}}

does not have a familiar asymptotic standard normal distribution, and the F-statistic has no familiar

χ^{2} (m - 1)

distribution either.

Even though the t- and F-statistics are not reliable, Lee and Yu [34] suggest that the combination of

R^{2}

and Moran’s I could be a good indicator for spurious regression under near unit roots. Let

M_{n}^{0} = I_{n} - \frac{1}{n} l_{n} l_{n}^{'}

, where

l_{n}

is an

n \times 1

vector of ones. Then the coefficient of determination

R^{2}

is

\begin{matrix} R^{2} : = 1 - \frac{e_{n}^{'} e_{n}}{Y_{1 n}^{'} M_{n}^{0} Y_{1 n}} = 1 - \frac{e_{n}^{'} e_{n} / (n ψ_{1 n}^{2})}{Y_{1 n}^{'} M_{n}^{0} Y_{1 n} / (n ψ_{1 n}^{2})} = O_{p} (\frac{1}{n}) \overset{p}{\to} 0 . \end{matrix}

(54)

And the Moran’s I statistic is

\begin{matrix} \begin{matrix} I_{Moran} & : = \frac{\frac{1}{n} e_{n}^{'} W_{n} e_{n}}{\frac{1}{n} e_{n}^{'} e_{n}} = \frac{tr (S_{1 n}^{*^{'} - 1} W_{n} S_{1 n}^{* - 1})}{tr (S_{1 n}^{*' - 1} S_{1 n}^{* - 1})} + O_{p} (\frac{1}{\sqrt{n}}) \\ \overset{p}{\to} 1, when W_{n} = W_{1 n} . \end{matrix} \end{matrix}

(55)
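The joint behaviour of $R^{2}$ and Moran’s I can be explored in a small simulation. The sketch below is only illustrative: the ring-shaped weight matrix, the sample size, the value of $ψ_{n}$, the seed, and the function names `ring_weights` and `spurious_check` are all assumptions made here, not settings from Lee and Yu [34]. It regresses one near unit root SAR series on a constant and an independent one, as in (42), and computes $R^{2}$ as in (54) and Moran’s I of the residuals as in (55).

```python
import numpy as np

def ring_weights(n, k=2):
    """Row-normalized circular weight matrix with k neighbours on each side."""
    W = np.zeros((n, n))
    for i in range(n):
        for s in range(1, k + 1):
            W[i, (i - s) % n] = W[i, (i + s) % n] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def spurious_check(n=400, psi=20.0, seed=0):
    """OLS of y1 on (1, y2) for two independent near unit root SAR series."""
    rng = np.random.default_rng(seed)
    W = ring_weights(n)
    lam = 1.0 - 1.0 / psi                       # lambda = 1 - 1/psi as in (27)
    S_inv = np.linalg.inv(np.eye(n) - lam * W)
    y1 = S_inv @ rng.standard_normal(n)         # DGP (41), j = 1
    y2 = S_inv @ rng.standard_normal(n)         # DGP (41), j = 2, independent
    X = np.column_stack([np.ones(n), y2])
    coef, *_ = np.linalg.lstsq(X, y1, rcond=None)
    e = y1 - X @ coef                           # OLS residuals
    y1c = y1 - y1.mean()
    r2 = 1.0 - (e @ e) / (y1c @ y1c)            # coefficient of determination
    moran = (e @ W @ e) / (e @ e)               # Moran's I of the residuals
    return coef[1], r2, moran

beta_hat, r2, moran = spurious_check()
```

Rerunning `spurious_check` with larger `n` and `psi` gives a rough feel for how the two statistics move as the series approach a unit root; a single draw is of course not evidence for the limits in (54) and (55).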

3.3.4. Constant Terms in the DGP of $Y_{j n}$ ’s

Lee and Yu [34] also study the constant term and unit roots at the same time. It could be shown that the estimation of

β

is the same as in (49) after reparameterization. Consider the DGP of series

Y_{j n}^{c}

with near unit roots and a constant term

c_{j n}

as

Y_{j n}^{c} = λ_{j n} W_{j n} Y_{j n}^{c} + c_{j n} ι_{n} + ϵ_{j n}, for every j = 1, \dots, m .

(56)

Regress

Y_{1 n}^{c}

on

ι_{n}

and

Y_{- 1, n}^{c}

:

Y_{1 n}^{c} = α^{c} ι_{n} + Y_{- 1, n}^{c} β^{c} + V_{n}^{c} .

(57)

where

α^{c}

is a constant and

Y_{- 1, n}^{c}

is similarly defined as

Y_{- 1, n}

above. Since

W_{j n}

is row-normalized,

W_{j n} ι_{n} = ι_{n}

, we have

Y_{j n}^{c} = S_{j n}^{- 1} (c_{j n} ι_{n} + ϵ_{j n}) = ψ_{j n} c_{j n} ι_{n} + S_{j n}^{- 1} ϵ_{j n} = ψ_{j n} c_{j n} ι_{n} + Y_{j n}

. So (57) could be rewritten as:

Y_{1 n} = (α^{c} - ψ_{1 n} c_{1 n} + \sum_{j = 2}^{m} ψ_{j n} c_{j n} β_{j n}^{c}) ι_{n} + Y_{- 1, n} β^{c} + V_{n}^{c} .

(58)

Comparing (58) with (42), the OLS estimate of

β^{c}

is the same as that of

β

and thus the corresponding statistics would be the same.
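The step that turns the constant term into $ψ_{j n} c_{j n} ι_{n}$ uses only row-normalization and the near unit root form $λ_{j n} = 1 - 1 / ψ_{j n}$. Written out, as a short derivation consistent with (27) and the definition of $S_{j n}$:

```latex
W_{jn} \iota_n = \iota_n
\;\Longrightarrow\;
S_{jn} \iota_n = (I_n - \lambda_{jn} W_{jn}) \iota_n
              = (1 - \lambda_{jn}) \iota_n
              = \frac{1}{\psi_{jn}} \iota_n
\;\Longrightarrow\;
S_{jn}^{-1} \iota_n = \psi_{jn} \iota_n .
```

The constant therefore shifts each series by a multiple of $ι_{n}$ only, which is absorbed by the intercept of the regression.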

3.4. “Spurious” Regression with Equal Weights

Baltagi and Liu [38] show that under the special case where the spatial weight matrix is row-normalized and with equal spatial weights, i.e.,

w_{n, i j} = \{\begin{matrix} 0 & , if i = j, \\ \frac{1}{n - 1} & , if i \neq j . \end{matrix}

(59)

spurious regression will not occur. This spatial weight matrix “is naturally suggested if all units are neighbors to each other and there is no other natural or observable measure of distance [39]”, such as interactions between students in a class or workers in a firm, etc. Without loss of generality, the DGP is assumed to be

\begin{matrix} \begin{matrix} Y_{n} = λ_{1} W_{n} Y_{n} + ϵ_{1 n}, \\ X_{n} = λ_{2} W_{n} X_{n} + ϵ_{2 n} . \end{matrix} \end{matrix}

(60)

Consider the regression

Y_{n} = α ι_{n} + X_{n} β + V_{n} .

(61)

By the Frisch–Waugh–Lovell Theorem, the OLS estimation of

β

is

\hat{β} = {(X_{n}^{'} E_{n} X_{n})}^{- 1} X_{n}^{'} E_{n} Y_{n}

, where

E_{n} = I_{n} - \frac{J_{n}}{n}

and

J_{n}

is an

n \times n

matrix of ones. Kelejian and Prucha [39] show the inverse of the matrix

S_{n k} = I_{n} - λ_{k} W_{n}

,

k = 1, 2

, is

S_{n k}^{- 1} = δ_{k 1} J_{n} + δ_{k 2} I_{n},

(62)

where

δ_{k 1} = \frac{λ_{k}}{(n - 1 + λ_{k}) (1 - λ_{k})}

and

δ_{k 2} = \frac{n - 1}{n - 1 + λ_{k}}

. Using the fact that

E_{n} J_{n} = 0

, Baltagi and Liu [38] show

\frac{1}{n} X_{n}^{'} E_{n} X_{n} \overset{p}{⟶} σ_{2}^{2}

and

\frac{1}{\sqrt{n}} X_{n}^{'} E_{n} Y_{n} \overset{d}{⟶} N (0, σ_{1}^{2} σ_{2}^{2})

, so the asymptotic distribution of

\hat{β}

is given by

\sqrt{n} \hat{β} = {(\frac{1}{n} X_{n}^{'} E_{n} X_{n})}^{- 1} (\frac{1}{\sqrt{n}} X_{n}^{'} E_{n} Y_{n}) \overset{d}{⟶} N (0, σ_{1}^{2} / σ_{2}^{2}) .

(63)

The asymptotic distribution of

\hat{β}

does not depend on

λ_{1}

and

λ_{2}

and is

\sqrt{n}

consistent (compared with (49)), which means that the spurious regression does not occur.
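The closed-form inverse for the equal-weight matrix is easy to verify numerically. In the sketch below (the values of n and λ and the function name `equal_weight_inverse` are arbitrary illustrative choices), the closed form is coded as $δ_{k 1} J_{n} + δ_{k 2} I_{n}$, the sign combination that the comparison with a direct numerical inverse supports. The last line of the block forms $E_{n} S_{n k}^{- 1}$, which should reduce to $δ_{k 2} E_{n}$ because $E_{n} J_{n} = 0$.

```python
import numpy as np

def equal_weight_inverse(n, lam):
    """Closed form of (I - lam W)^{-1} when w_ij = 1/(n-1) for i != j."""
    d1 = lam / ((n - 1 + lam) * (1.0 - lam))
    d2 = (n - 1) / (n - 1 + lam)
    return d1 * np.ones((n, n)) + d2 * np.eye(n)

n, lam = 8, 0.9
W = (np.ones((n, n)) - np.eye(n)) / (n - 1)     # equal weights as in (59)
S = np.eye(n) - lam * W
closed = equal_weight_inverse(n, lam)
direct = np.linalg.inv(S)

E = np.eye(n) - np.ones((n, n)) / n             # demeaning matrix E_n
demeaned = E @ closed                           # should equal d2 * E
```

Since $δ_{k 2} \to 1$ as n grows, demeaning removes the part of $S_{n k}^{- 1}$ that depends on $λ_{k}$, which is why the limit in (63) is free of $λ_{1}$ and $λ_{2}$.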

4. Estimation and Inference

The spurious regression studied in the previous section suggests that OLS is not a good estimator in the SAR model with spatial unit roots. QMLE and 2SLS are alternative methods of estimation. This section briefly reviews these estimators and their performance under near unit roots.

When the error term of the DGP has a spatial autoregressive structure, OLS and (feasible) generalized least squares ((F)GLS) estimators are consistent. The efficiency of the OLS estimator was considered by Krämer and Donninger [40], Tilke [41] for the symmetric spatial weight matrices, and generalized by Krämer and Baltagi [42] with a broader covariance matrix. But the symmetry of the weight matrix is too restrictive to be used in practice, so Martellosio [43] generalizes this to nonsymmetric weights matrices. The efficiency of the OLS estimator is defined as

η : = \frac{tr [var (X {\hat{β}}_{G L S})]}{tr [var (X {\hat{β}}_{O L S})]}

. But these papers generally focus on the relationship between

W_{n}

and X, for example, when the column space of

W_{n}

is contained in the column space of X. Following Lee and Yu [34], Baltagi et al. [37] derive the asymptotic properties of the OLS and (F)GLS estimators of

β

and point out important differences from conventional theory based on stationary spatial error.

A special SAR model has also been considered in regular lattices under spatial unit roots with the form

Y_{k, l} = \sum_{i = 0}^{p_{1}} \sum_{j = 0}^{p_{2}} α_{i, j} Y_{k - i, l - j} + ϵ_{k, l}, α_{0, 0} = 0 .

(64)

The simplest case of this special SAR model is the doubly geometric spatial autoregressive process:

Y_{k, l} = α Y_{k - 1, l} + β Y_{k, l - 1} - α β Y_{k - 1, l - 1} + ϵ_{k, l} .

(65)

It is called “unilateral” because only preceding units affect later ones, so the weight matrix is lower triangular. It can be considered as the combination of two autoregressive (AR) models [44]. This model has been widely used in image processing, agricultural trials, digital filtering, and other fields. Model (65) is unstable when either

| α | \geq 1

or

| β | \geq 1

because of the existence of spatial unit roots [45,46].
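The sense in which (65) combines two AR models can be made concrete: the recursion factorizes into an AR(1) filter along one index followed by an AR(1) filter along the other. The sketch below assumes zero pre-sample values on the boundary (an assumption made here for simplicity), and the function names are illustrative.

```python
import numpy as np

def doubly_geometric(eps, alpha, beta):
    """Direct recursion (65) on a K x L lattice with zero pre-sample values."""
    K, L = eps.shape
    Y = np.zeros((K + 1, L + 1))        # row 0 and column 0 hold the zero boundary
    for k in range(1, K + 1):
        for l in range(1, L + 1):
            Y[k, l] = (alpha * Y[k - 1, l] + beta * Y[k, l - 1]
                       - alpha * beta * Y[k - 1, l - 1] + eps[k - 1, l - 1])
    return Y[1:, 1:]

def ar1_filter(x, phi, axis):
    """Apply y_t = phi * y_{t-1} + x_t along one axis, starting from zero."""
    x = np.moveaxis(np.asarray(x, dtype=float), axis, 0)
    y = np.zeros_like(x)
    prev = np.zeros(x.shape[1:])
    for t in range(x.shape[0]):
        prev = phi * prev + x[t]
        y[t] = prev
    return np.moveaxis(y, 0, axis)

rng = np.random.default_rng(0)
eps = rng.standard_normal((5, 7))
alpha, beta = 0.5, 0.8
Y_direct = doubly_geometric(eps, alpha, beta)
# AR(1) with coefficient beta along l, then AR(1) with coefficient alpha along k.
Y_product = ar1_filter(ar1_filter(eps, beta, axis=1), alpha, axis=0)
```

The two constructions coincide, so each direction behaves like a time-series AR(1) process and inherits its instability when the corresponding coefficient reaches one in absolute value.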

A more complicated special case of the unilateral model is

Y_{k, l} = α Y_{k - 1, l} + β Y_{k, l - 1} + γ Y_{k - 1, l - 1} + ϵ_{k, l} .

(66)

Because of its simple form, the estimation and inference of (64) can be derived without too many assumptions, as will be seen shortly. Moreover, when spatial unit roots exist, the limit of the variance of

Y_{k, l}

is analytically obtained in Baran [47] when the parameters

(α, β, γ)

are located on the interior, on the edges, and on the vertices of the domain of stability. Paulauskas [48] explicitly shows that the growth rate of the variance of Y is different in dimensions

d = 2, 3

and

d = 4

. Though this approach studies spatial unit roots from a different angle than that of Fingleton [14], it points to some similarity as in a recent paper [49] that will be discussed in Section 6.

Another possible way to remedy the problem caused by spatial unit roots in the SAR and SEM models is to relax the compactness assumption. When some parameters approach the boundary of the parameter space, consistency of extremum estimators could be obtained with compact parameter spaces. Thus the compactness assumption is standard in spatial econometrics because of the existence of the singular point

\frac{1}{ρ_{m a x}}

, see the proofs in Kelejian and Prucha [6], Lee [5], and Gupta [50]. But such an assumption is also restrictive in the sense that if we choose an arbitrary compact subset of the open parameter space, the true global optimizer may be excluded, especially in near unit root cases. A recent paper by Liu et al. [51] generalizes Theorem 2.7 in Newey and McFadden [52] (p. 2133), which relaxes compactness when the objective function of an extremum estimator is concave, and allows the non-stochastic objective functions to depend on the sample size n. This generalization is suitable for spatial econometrics models because the sample observations are usually in triangular arrays. (A triangular array is a doubly indexed sequence of numbers or polynomials. Each row of the array is only as long as the row’s index. For example, the ith row contains only i elements.) The consistency of the QMLE of the SAR model and of the MLE of the SAR Tobit model is investigated, but a closed-form solution is not obtained. On the other hand, Gupta [53] proposes a Newton-step computational algorithm of QMLE for a large-parameter-size SAR model, which is free from the compactness assumption. Under the normality assumption, it has the same asymptotic efficiency as MLE, but has a closed-form solution and is computationally simple.

4.1. QMLE and 2SLS Methods for the (Mixed) SAR Model

4.1.1. Quasi-Maximum Likelihood Estimation Method

Lee [5] investigates the asymptotic distribution of the QMLE of the mixed SAR model, which is the starting point for further analysis when spatial unit roots exist. This analysis is based on a discussion of the singularity of the information matrix of the log-likelihood function. In particular, when the information matrix is singular, a scaling factor

\frac{1}{\sqrt{h_{n}}}

is needed, where

\frac{1}{h_{n}}

is the order of the elements of the spatial weight matrix

W_{n}

and thus

\frac{1}{\sqrt{h_{n}}}

is the order of the elements of

G_{n} = W_{n} S_{n}^{- 1}

. We have seen in (39) that the order of elements of

G_{n}

is

ψ_{n}

in the near unit roots case; thus, a similar scaling factor will be needed.

The (mixed) SAR model under consideration is

Y_{n} = X_{n} β + λ W_{n} Y_{n} + V_{n},

(67)

with its reduced form

\begin{aligned} Y_{n} & = S_{n}^{- 1} (X_{n} β_{0} + V_{n}) \\ & = X_{n} β_{0} + λ_{0} G_{n} X_{n} β_{0} + S_{n}^{- 1} V_{n}, \end{aligned}

(68)

since

I_{n} + λ_{0} G_{n} = S_{n}^{- 1}

.

Also, Lee [5] imposes a weaker assumption about the spatial weight matrix and derives the information matrix.

Assumption 7.

The elements

w_{n, i j}

of

W_{n}

are at most of order

h_{n}^{- 1}

, denoted by

O (1 / h_{n})

, uniformly in all

i, j

where the rate sequence

\{h_{n}\}

can be bounded or divergent. The ratio

h_{n} / n \to 0

as n goes to infinity.

Let

V_{n} (δ) = Y_{n} - X_{n} β - λ W_{n} Y_{n}

, where

δ = {(β^{'}, λ)}^{'}

, then the log-likelihood function of (67) is

ln L_{n} (θ) = - \frac{n}{2} ln (2 π) - \frac{n}{2} ln σ^{2} + ln |S_{n} (λ)| - \frac{1}{2 σ^{2}} V_{n}^{'} (δ) V_{n} (δ),

(69)

where

θ = {(β^{'}, λ, σ^{2})}^{'}

. The information matrix is

E (\frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (θ_{0})}{\partial θ} \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (θ_{0})}{\partial θ^{'}}) = - E (\frac{1}{n} \frac{\partial^{2} ln L_{n} (θ_{0})}{\partial θ \partial θ^{'}}) + Ω_{θ, n},

(70)

where

\begin{matrix} \begin{matrix} - E (\frac{1}{n} \frac{\partial^{2} ln L_{n} (θ_{0})}{\partial θ \partial θ^{'}}) \\ = (\begin{matrix} \frac{1}{σ_{0}^{2} n} X_{n}^{'} X_{n} & \frac{1}{σ_{0}^{2} n} X_{n}^{'} (G_{n} X_{n} β_{0}) & 0 \\ \frac{1}{σ_{0}^{2} n} {(G_{n} X_{n} β_{0})}^{'} X_{n} & \frac{1}{σ_{0}^{2} n} {(G_{n} X_{n} β_{0})}^{'} (G_{n} X_{n} β_{0}) + \frac{1}{n} tr (G_{n}^{s} G_{n}) & \frac{1}{σ_{0}^{2} n} tr (G_{n}) \\ 0 & \frac{1}{σ_{0}^{2} n} tr (G_{n}) & \frac{1}{2 σ_{0}^{4}} \end{matrix}) . \end{matrix} \end{matrix}

(71)

with

G_{n}^{s} = G_{n} + G_{n}^{'}

. The existence of the extra

Ω_{θ, n}

is because

V_{n}

is not necessarily normally distributed.
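To make the log-likelihood (69) concrete, the sketch below evaluates it and profiles out β and $σ^{2}$ for a given λ, then maximizes over a grid. It is a minimal illustration under assumptions made here (a ring-shaped weight matrix, simulated data, a grid search, and the function names `sar_loglik` and `sar_profile`); it is not the estimation routine of Lee [5].

```python
import numpy as np

def sar_loglik(theta, Y, X, W):
    """Gaussian quasi log-likelihood (69) at theta = (beta', lam, sigma2)'."""
    n, k = X.shape
    beta, lam, sigma2 = theta[:k], theta[k], theta[k + 1]
    S = np.eye(n) - lam * W
    sign, logdet = np.linalg.slogdet(S)
    if sign <= 0 or sigma2 <= 0:
        return -np.inf                    # restrict to the region with |S| > 0
    V = S @ Y - X @ beta                  # V_n(delta) = Y - X beta - lam W Y
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2)
            + logdet - 0.5 * (V @ V) / sigma2)

def sar_profile(lam, Y, X, W):
    """For fixed lam, beta and sigma2 have closed forms (OLS on S(lam) Y)."""
    SY = Y - lam * (W @ Y)
    beta, *_ = np.linalg.lstsq(X, SY, rcond=None)
    resid = SY - X @ beta
    sigma2 = resid @ resid / len(Y)
    theta = np.concatenate([beta, [lam, sigma2]])
    return sar_loglik(theta, Y, X, W), beta, sigma2

# Hypothetical data: ring weights, two regressors, lam0 = 0.5.
rng = np.random.default_rng(1)
n = 60
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta0, lam0 = np.array([1.0, 2.0]), 0.5
Y = np.linalg.solve(np.eye(n) - lam0 * W, X @ beta0 + rng.standard_normal(n))

grid = np.linspace(-0.9, 0.9, 181)
values = [sar_profile(l, Y, X, W)[0] for l in grid]
lam_hat = grid[int(np.argmax(values))]
ll_hat, beta_hat, sigma2_hat = sar_profile(lam_hat, Y, X, W)
```

The grid is kept inside (-1, 1) so that $|S_{n} (λ)|$ stays positive for a row-normalized $W_{n}$; near the boundary the log-determinant term dominates, which is the numerical counterpart of the singular point discussed above.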

To ensure the asymptotic distribution of QMLE

\hat{θ}

exists,

Σ_{θ} = - {lim}_{n \to \infty} E (\frac{1}{n} \frac{\partial^{2} ln L_{n} (θ_{0})}{\partial θ \partial θ^{'}})

must be well defined. Lee [5] proves the nonsingularity of

Σ_{θ}

can be guaranteed by the fact that there does not exist a nonzero vector

λ = {(λ_{1}^{'}, λ_{2}, λ_{3})}^{'}

such that a linear combination of columns of

Σ_{θ}

is 0. This condition could be simplified as: there does not exist a

λ_{2} \neq 0

, such that

\begin{matrix} \begin{matrix} \{lim_{n \to \infty} \frac{1}{n σ_{0}^{2}} {(G_{n} X_{n} β_{0})}^{'} & M_{n} (G_{n} X_{n} β_{0}) + \\ lim_{n \to \infty} \frac{1}{n} [tr (G_{n}^{'} G_{n}) + tr (G_{n}^{2}) - \frac{2}{n} {tr}^{2} (G_{n})]\} λ_{2} = 0 . \end{matrix} \end{matrix}

(72)

Since each term in (72) is greater than or equal to 0 (the first term is non-negative because it is a quadratic form in the projection matrix $M_{n}$; for the second term,

tr (G_{n} G_{n}^{'}) + tr (G_{n}^{2}) - \frac{2}{n} {tr}^{2} (G_{n}) = \frac{1}{2} tr [(C_{n}^{'} + C_{n}) {(C_{n}^{'} + C_{n})}^{'}] = \frac{1}{2} t r (C_{n}^{s^{'}} C_{n}^{s}) \geq 0

where

C_{n} = G_{n} - \frac{t r (G_{n})}{n} I_{n}

and

C_{n}^{s} = C_{n}^{'} + C_{n}

), Lee [5] studies the singularity of the information matrix through these two terms separately. For the first term, by the partitioned matrix formula,

{lim}_{n \to \infty} \frac{1}{n} {(X_{n}, G_{n} X_{n} β_{0})}^{'} (X_{n}, G_{n} X_{n} β_{0})

is nonsingular if and only if

{lim}_{n \to \infty} \frac{1}{n} X_{n}^{'} X_{n}

and

{lim}_{n \to \infty} \frac{1}{n} {(G_{n} X_{n} β_{0})}^{'} M_{n} (G_{n} X_{n} β_{0})

are nonsingular. Moreover, under Assumption 7, if

G_{n} X_{n} β_{0}

and

X_{n}

are linearly independent, one sufficient condition for the nonsingularity of

Σ_{θ}

could be:

Assumption 8.

{lim}_{n \to \infty} \frac{1}{n} {(X_{n}, G_{n} X_{n} β_{0})}^{'} (X_{n}, G_{n} X_{n} β_{0})

exists and is nonsingular.

However, Lee [5] states that if

G_{n} X_{n} β_{0}

and

X_{n}

are linearly dependent, for example, when

W_{n}

is row-normalized and the relevant regressor is only the constant term, Assumption 8 should be adjusted to guarantee the second term in (72) is greater than 0:

Assumption 9.

{lim}_{n \to \infty} \frac{1}{n} {(G_{n} X_{n} β_{0})}^{'} M_{n} (G_{n} X_{n} β_{0}) = 0

and the

\{h_{n}\}

is a bounded sequence and, for any

λ \neq λ_{0}

,

lim_{n \to \infty} (\frac{1}{n} ln |σ_{0}^{2} S_{n}^{- 1} S_{n}^{' - 1}| - \frac{1}{n} ln |σ_{n}^{2} (λ) S_{n}^{- 1} (λ) S_{n}^{' - 1} (λ)|) \neq 0 .

(73)

Then under Assumption 7 and either 8 or 9, the asymptotic distribution of QMLE

\hat{θ}

will be

\sqrt{n} ({\hat{θ}}_{n} - θ_{0}) \overset{D}{\to} N (0, Σ_{θ}^{- 1} + Σ_{θ}^{- 1} Ω_{θ} Σ_{θ}^{- 1}),

(74)

where

Ω_{θ} = {lim}_{n \to \infty} Ω_{θ, n}

,

Σ_{θ} = - {lim}_{n \to \infty} E (\frac{1}{n} \frac{\partial^{2} ln L_{n} (θ_{0})}{\partial θ \partial θ^{'}})

.

The above results are based on

Σ_{θ}

being invertible, for which a necessary condition is that

\{h_{n}\}

is a bounded sequence. However, when

{lim}_{n \to \infty} h_{n} = \infty, Σ_{θ}

will become singular because

\frac{1}{n} tr [(C_{n}^{'} + C_{n}) {(C_{n}^{'} + C_{n})}^{'}] = O (\frac{1}{h_{n}}) = o (1)

. Also, the singularity of the information matrix implies that the score function will be too flat to be useful, and thus an adjustment of the rate should be imposed, as in Lee [5]:

Assumption 10.

The

\{h_{n}\}

is a divergent sequence, elements of

M_{n} (G_{n} X_{n} β_{0})

have the uniform order

O (\frac{1}{\sqrt{h_{n}}})

, and

{lim}_{n \to \infty} (\frac{h_{n}}{n}) {(G_{n} X_{n} β_{0})}^{'} M_{n} (G_{n} X_{n} β_{0}) = c

with

0 \leq

c < \infty

. Under this situation, either (a)

c > 0

, or (b) if

c = 0

\begin{matrix} lim_{n \to \infty} (\frac{h_{n}}{n} ln |σ_{0}^{2} S_{n}^{- 1} S_{n}^{' - 1}| - \frac{h_{n}}{n} ln |σ_{n}^{2} (λ) S_{n}^{- 1} (λ) S_{n}^{' - 1} (λ)|) \neq 0, \end{matrix}

(75)

whenever

λ \neq λ_{0}

.

Lee [5] gives the asymptotic distributions of the QMLE under this rate-adjusted assumption:

\begin{matrix} \begin{matrix} \sqrt{\frac{n}{h_{n}}} ({\hat{λ}}_{n} - λ_{0}) \overset{D}{\to} N (0, σ_{λ}^{2}), \\ \sqrt{\frac{n}{h_{n}}} ({\hat{β}}_{n} - β_{0}) \overset{D}{\to} N (0, σ_{λ}^{2} lim_{n \to \infty} {(X_{n}^{'} X_{n})}^{- 1} X_{n}^{'} (G_{n} X_{n} β_{0}) {(G_{n} X_{n} β_{0})}^{'} X_{n} {(X_{n}^{'} X_{n})}^{- 1}), \\ \sqrt{n} ({\hat{σ}}_{n}^{2} - σ_{0}^{2}) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} (v_{i}^{2} - σ_{0}^{2}) + o_{P} (1) \overset{D}{\to} N (0, μ_{4} - σ_{0}^{4}) . \end{matrix} \end{matrix}

(76)

4.1.2. Generalized Spatial Two-Stage Least Squares (GS2SLS) Method

Kelejian and Prucha [7] proposed the GS2SLS method for the “cross-sectional (first-order) autoregressive spatial model with (first-order) autoregressive disturbances”:

\begin{matrix} \begin{matrix} Y_{n} & = X_{n} β + λ W_{n} Y_{n} + u_{n} & | λ | < 1 \\ = Z_{n} δ + u_{n}, \\ u_{n} & = ρ M_{n} u_{n} + ϵ_{n}, & | ρ | < 1, \end{matrix} \end{matrix}

(77)

where

Z_{n} = (X_{n}, W_{n} Y_{n})

and

δ = {(β^{'}, λ)}^{'}

. Since

W_{n} Y_{n}

is endogenous, it should be instrumented. Assuming

ρ

is known (a consistent estimator is given by Kelejian and Prucha [6]), a Cochrane–Orcutt (CO)-type transformation applied to (77) yields the transformed regression

\begin{aligned} Y_{n^{*}} & = Z_{n^{*}} δ + ϵ_{n} \\ & = (I_{n} - ρ M_{n}) Z_{n} δ + ϵ_{n} \\ & = (Z_{n} - ρ M_{n} Z_{n}) δ + ϵ_{n} . \end{aligned}

(78)

M_{n} Z_{n}

should be instrumented. The ideal instruments are of course

E [Z_{n}] = (X_{n}, W_{n} E [Y_{n}]) and E [M_{n} Z_{n}] = (M_{n} X_{n}, M_{n} W_{n} E [Y_{n}]) .

(79)

Then, by (10),

E [Y_{n}] = {(I_{n} - λ W_{n})}^{- 1} X_{n} β = [\sum_{i = 0}^{\infty} λ^{i} W_{n}^{i}] X_{n} β

, the ideal instruments are:

H_{n} = [X_{n}, W_{n}^{1} X_{n}, W_{n}^{2} X_{n}, \dots, M_{n} X_{n}, M_{n} W_{n}^{1} X_{n}, M_{n} W_{n}^{2} X_{n}, \dots] .

(80)

In practice,

H_{n} = [X_{n}, W_{n}^{1} X_{n}, W_{n}^{2} X_{n}, M_{n} X_{n}, M_{n} W_{n}^{1} X_{n}, M_{n} W_{n}^{2} X_{n}]

is used.

Assumptions about the instrument matrix are made to ensure that

{lim}_{n \to \infty} \frac{1}{n} H_{n}^{'} Z_{n}

exists and is of full rank. In the near unit roots case, similar to the QMLE method, scaling factors are needed to guarantee this property as we will see later. With these assumptions, the GS2SLS procedure has three steps, as in Kelejian and Prucha [7], as follows:

Step 1. Run 2SLS on $Y_{n} = Z_{n} δ + u_{n}$ with instruments $H_{n}$. This yields ${\tilde{δ}}_{n} = {({\hat{Z}}_{n}^{'} {\hat{Z}}_{n})}^{- 1} {\hat{Z}}_{n}^{'} Y_{n}$, where ${\hat{Z}}_{n} = P_{H_{n}} Z_{n} = (X_{n}, \hat{W_{n} Y_{n}})$, $P_{H_{n}}$ is the projection matrix of $H_{n}$, and ${\tilde{δ}}_{n} = δ + O_{p} (n^{- \frac{1}{2}})$.
Step 2. Estimate $ρ$ following Kelejian and Prucha [6] from the GMM system $g_{n} = G_{n} λ + V_{n}$, where $λ = {(ρ, ρ^{2}, σ^{2})}^{'}$, either by solving $\tilde{λ} = G_{n}^{- 1} g_{n}$ or by $\overset{\approx}{λ} = {argmin}_{ρ, σ^{2}} V_{n}^{'} V_{n}$. Both $\tilde{λ}$ and $\overset{\approx}{λ}$ are consistent, but $\overset{\approx}{λ}$ is more efficient.
Step 3. Taking $ρ$ as known, run 2SLS on the CO-transformed regression (78) with instruments $H_{n}$. This yields ${\hat{δ}}_{n} = {({\hat{Z}}_{n^{*}}^{'} {\hat{Z}}_{n^{*}})}^{- 1} {\hat{Z}}_{n^{*}}^{'} Y_{n^{*}}$, where

$\begin{matrix} \begin{matrix} {\hat{Z}}_{n^{*}} & = P_{H_{n}} Z_{n^{*}} = P_{H_{n}} [X_{n} - ρ M_{n} X_{n}, W_{n} Y_{n} - ρ M_{n} W_{n} Y_{n}] \\ = [X_{n} - ρ M_{n} X_{n}, \hat{W_{n} Y_{n} - ρ M_{n} W_{n} Y_{n}}] . \end{matrix} \end{matrix}$

(81)

Replacing $ρ$ by its consistent estimate ${\hat{ρ}}_{n}$ (from Step 2), the feasible 2SLS estimator is

$\begin{matrix} {\hat{δ}}_{F, n} = {[{\hat{Z}}_{n^{*}} {({\hat{ρ}}_{n})}^{'} {\hat{Z}}_{n^{*}} ({\hat{ρ}}_{n})]}^{- 1} {\hat{Z}}_{n^{*}} {({\hat{ρ}}_{n})}^{'} Y_{n^{*}} ({\hat{ρ}}_{n}) . \end{matrix}$

(82)

Obviously, this procedure is for a SARAR model. If it is a SAR model, only step 1 is needed and

H_{n} = [X_{n}, W_{n}^{1} X_{n}, W_{n}^{2} X_{n}, \dots]

.
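For the pure SAR case, where only Step 1 is needed, the 2SLS estimator with the truncated instrument set $[X_{n}, W_{n} X_{n}, W_{n}^{2} X_{n}]$ can be sketched as follows. The ring-shaped weight matrix, the simulated data, and the function names are assumptions made here for illustration; the projection onto $H_{n}$ is computed by least squares rather than by forming $P_{H_{n}}$ explicitly.

```python
import numpy as np

def ring_weights(n):
    """Row-normalized circular weights: one neighbour on each side."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    return W

def sar_2sls(Y, X, W):
    """2SLS for Y = X beta + lam W Y + u with H = [X, WX, W^2 X]."""
    WX = W @ X
    H = np.column_stack([X, WX, W @ WX])
    Z = np.column_stack([X, W @ Y])
    coef, *_ = np.linalg.lstsq(H, Z, rcond=None)
    Z_hat = H @ coef                      # Z_hat = P_H Z
    return np.linalg.solve(Z_hat.T @ Z_hat, Z_hat.T @ Y)   # (beta', lam)'

def simulate_sar(n, beta, lam, sigma, seed):
    """Draw (Y, X, W) from the SAR reduced form with N(0, sigma^2) errors."""
    rng = np.random.default_rng(seed)
    W = ring_weights(n)
    X = rng.standard_normal((n, len(beta)))
    eps = sigma * rng.standard_normal(n)
    Y = np.linalg.solve(np.eye(n) - lam * W, X @ beta + eps)
    return Y, X, W

beta0, lam0 = np.array([1.0, -2.0]), 0.4
# Sanity check: without noise Y = Z delta exactly, so 2SLS recovers delta.
Y0, X0, W0 = simulate_sar(200, beta0, lam0, sigma=0.0, seed=0)
delta_noiseless = sar_2sls(Y0, X0, W0)
# With noise the estimate is only close to delta in large samples.
Y1, X1, W1 = simulate_sar(200, beta0, lam0, sigma=1.0, seed=0)
delta_noisy = sar_2sls(Y1, X1, W1)
```

The regressors here exclude a constant on purpose: with a row-normalized $W_{n}$, $W_{n} ι_{n} = ι_{n}$, so a constant would produce duplicate columns in $H_{n}$.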

4.1.3. Best Generalized Spatial Two Stage Least Squares (BGS2SLS) Estimators

Lee [8] does not drop the higher-order terms in (80) but uses the fact that

W_{n} E [Y_{n}] = W_{n} {(I_{n} - λ W_{n})}^{- 1} X_{n} β = G_{n} X_{n} β

by the definition of

G_{n}

and proposes the best instrument:

\begin{aligned} H_{n}^{*} & = (I_{n} - ρ M_{n}) [X_{n}, W_{n} {(I_{n} - λ W_{n})}^{- 1} X_{n} β] \\ & = (I_{n} - ρ M_{n}) [X_{n}, G_{n} X_{n} β] . \end{aligned}

(83)

With the corresponding simple instrumental variable estimator, the BGS2SLS is

{\hat{δ}}_{B, n} = {[H_{n}^{*'} Z_{n}]}^{- 1} H_{n}^{*'} Y_{n} .

(84)

If there is no SAR structure in the disturbance term, the best instrument is

H_{n}^{*} = [X_{n}, G_{n} X_{n} β],

(85)

and

β

in (85) could be replaced by any consistent estimator such as the KP-GS2SLS estimator.

Compared with (80),

H_{n}^{*}

does not drop the higher-order terms and is numerically equivalent to the ideal instrument, which in turn yields asymptotically optimal instrumental variable estimators.
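For the case without SAR disturbances, the instrument (85) and the estimator (84) can be sketched as below. This is an illustrative implementation under assumptions made here: the function name `best_iv`, the ring-shaped weight matrix, and the simulated data are hypothetical, and both β and λ inside $G_{n}$ are replaced by initial values supplied by the user (for example, from a preliminary 2SLS step).

```python
import numpy as np

def best_iv(Y, X, W, beta_init, lam_init):
    """Simple IV estimator (84) with H* = [X, G X beta] built from initial values."""
    n = len(Y)
    G = W @ np.linalg.inv(np.eye(n) - lam_init * W)    # G = W (I - lam W)^{-1}
    H_star = np.column_stack([X, G @ (X @ beta_init)])
    Z = np.column_stack([X, W @ Y])
    return np.linalg.solve(H_star.T @ Z, H_star.T @ Y)  # (beta', lam)'

# Hypothetical noiseless check: with Y = Z delta exactly, IV recovers delta.
rng = np.random.default_rng(0)
n = 150
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = rng.standard_normal((n, 2))
beta0, lam0 = np.array([1.0, -2.0]), 0.4
Y = np.linalg.solve(np.eye(n) - lam0 * W, X @ beta0)
delta_best = best_iv(Y, X, W, beta0, lam0)
```

Because $H_{n}^{*}$ has exactly as many columns as $Z_{n}$, no projection step is needed, in contrast to 2SLS with the truncated instrument set.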

4.2. Near Unit Roots in the SAR Model

Lee and Yu [15] study the asymptotic distribution of the QMLE and 2SLS estimators of the SAR model by decomposing

Y_{n}

(see Section 3.3.1). The model is given as

Y_{n} = λ_{n 0} W_{n} Y_{n} + X_{n} β_{0} + ϵ_{n},

(86)

where

λ_{n 0} = 1 - \frac{1}{ψ_{n}}

. And the reduced form, again, is

Y_{n} = S_{n}^{- 1} (X_{n} β_{0} + ϵ_{n}) = X_{n} β_{0} + λ_{n 0} G_{n} X_{n} β_{0} + S_{n}^{- 1} ϵ_{n} .

(87)

4.2.1. QMLE

Obviously, the generated regressor

G_{n} X_{n} β_{0}

is explosive because of

ψ_{n}

in

G_{n}

, see (39). This is very similar to the case in Assumption 10 (a) with

{lim}_{n \to \infty} h_{n} = \infty

in Section 4.1.1. This implies that the convergence rate of estimators of

λ_{n 0}

is not the usual

\sqrt{n}

case as in (76).

Additional assumptions are made in Lee and Yu [15] to control the magnitude of the unstable part of

W_{n}

(Assumption 11), and specify the identification condition (Assumption 12), which are adjusted from Assumption 8:

Assumption 11.

(1)

{lim}_{n \to \infty} \frac{m_{n}}{n} > 0

; (2)

{lim}_{n \to \infty} n^{- 1} tr (W_{n}^{' u} W_{n}^{u}) > 0

; (3) for any finite constant

c, {lim}_{n \to \infty} n^{- 1} tr [{(I_{n} - c W_{n}^{u})}^{'} (I_{n} - c W_{n}^{u})] > 0

.

Assumption 12.

β_{0}^{'} {lim}_{n \to \infty} (n^{- 1} X_{n}^{'} W_{n}^{' u} M_{n} W_{n}^{u} X_{n}) β_{0} > 0

holds.

Assumption 11 (1) (2) guarantee that

m_{n}

is not too small compared to n and (3) implies that it is not too large. Assumption 12 is equivalent to (see Lee and Yu [15] (p. 338, Lemma 1 (7)))

{lim}_{n \to \infty} \frac{1}{n ψ_{n}^{2}} {(G_{n} X_{n} β_{0})}^{'} M_{n} (G_{n} X_{n} β_{0})

, which is a rate-adjusted version of Assumption 8, that ensures the identification uniqueness and implies nonsingularity of

Σ_{θ}

. For a detailed discussion of these two assumptions, see Baltagi et al. [37] (p. 6). Also, since the adjusted rate is

\frac{1}{ψ_{n}^{2}}

, QMLE

{\hat{λ}}_{n}

would be

\sqrt{n} ψ_{n}

-consistent, see (76) derived under Assumption 10.

The information matrix will be the same as in (70). In Section 3.3.2, when studying the spurious regression of OLS, we mentioned that the scaling factor

Y_{m}

is needed in terms of the order of

X_{n}

. Here, for the QMLE, a similar scaling factor is introduced because the elements of

Σ_{n}

and

Ω_{n}

have different orders owing to the presence of

G_{n}

. The second column and row of

Σ_{n}

and

Ω_{n}

, which are the derivatives with respect to the spatial coefficient

λ_{n}

contain

G_{n}

, thus, they have to be scaled by a factor

\frac{1}{ψ_{n}}

. Specifically, the

(2, 2)

element should be scaled by a factor

\frac{1}{ψ_{n}^{2}}

. This can be done by pre- and post-multiplying by the matrix

Y_{θ, n}^{- 1}

, where

Y_{θ, n} = \begin{pmatrix} Y_{δ, n} & 0_{(k + 1) \times 1} \\ 0_{1 \times (k + 1)} & 1 \end{pmatrix} \quad \text{and} \quad Y_{δ, n} = \begin{pmatrix} I_{k} & 0_{k \times 1} \\ 0_{1 \times k} & ψ_{n} \end{pmatrix} .

(88)

Thus, Lee and Yu [15] give

\sqrt{n} Y_{θ, n} ({\hat{θ}}_{n} - θ_{n 0}) = - {(Y_{θ, n}^{- 1} \frac{1}{n} \frac{\partial^{2} ln L_{n} ({\tilde{θ}}_{n})}{\partial θ \partial θ^{'}} Y_{θ, n}^{- 1})}^{- 1} Y_{θ, n}^{- 1} \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (θ_{n 0})}{\partial θ} .

(89)

Let

Σ = {lim}_{n \to \infty} Y_{θ, n}^{- 1} Σ_{n} Y_{θ, n}^{- 1}

and

Ω = {lim}_{n \to \infty} Y_{θ, n}^{- 1} Ω_{n} Y_{θ, n}^{- 1}

and assume that both limits exist. Then the asymptotic distribution of

{\hat{θ}}_{n}

is

\sqrt{n} Y_{θ, n} ({\hat{θ}}_{n} - θ_{n 0}) \overset{D}{\to} N (0, Σ^{- 1} + Σ^{- 1} Ω Σ^{- 1}) .

(90)

Recently, Rossi and Lieberman [54] combine near unit roots with a similarity-based weight matrix and study the consistency of the QMLE in the two cases

0 \leq λ < 1

and

λ = 1

, allowing for uncentered units. The elements of the similarity-based weight matrix are

w_{i, j} = \frac{s (x_{i}, x_{j}; w_{0})}{\sum_{k \neq i} s (x_{i}, x_{k}; w_{0})}

, where

s (x_{i}, x_{j}; w_{0})

is some function that measures the similarity between unit i and j according to some parameter

w_{0}

. The parameters they are most interested in are

θ_{2} = {(λ, w^{'})}^{'}

. They establish a connection between

λ

and the order of the maximum absolute row-sum norm of

S_{0}^{- 1}

,

{∥ S_{0}^{- 1} ∥}_{\infty} = {∥S^{- 1} (θ_{20})∥}_{\infty} = O (n^{γ})

. This means that

λ_{0} = 1 \Leftrightarrow γ = 1

[54] (p. 11, Proposition 1). Recall that

Var (Y_{n}) = σ_{ϵ}^{2} S_{n}^{- 1} S_{n}^{' - 1}

, so when

γ = 0

, the variance is independent of n, corresponding to the standard SAR setup (

λ < 1

and fixed). In the case

0 < γ < 1

, the variance increases with the sample size but with a lower speed, which is the case that we have seen when studying near unit root; but when

γ = 1

,

λ = 1

, the variance increases so fast that the non-standard limit distribution of

θ = (λ, w)

has to be established on a case-by-case basis, according to the resulting

S_{n}^{- 1}

. Their result is much more complicated than that of Lee and Yu [15] because of the similarity structure introduced in the weight matrix, but it is much more flexible since now

W_{n}

is no longer fixed but data-driven, and is potentially more useful in empirical work.
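The link between the spatial coefficient and the growth of the row-sum norm can be illustrated numerically. The sketch below uses a fixed row-normalized ring matrix rather than the similarity-based matrix of [54], so it is only an assumption-laden illustration: for a nonnegative row-normalized weight matrix and a coefficient of the near-unit-root form, one minus the reciprocal of a rate that grows like a power of n, the maximum absolute row sum of the inverse equals that rate. The names `ring_weights` and `inv_row_sum_norm` are hypothetical.

```python
import numpy as np

def ring_weights(n):
    """Row-normalized circular weight matrix with two neighbors per unit (illustrative)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def inv_row_sum_norm(W, lam):
    """Maximum absolute row sum of (I - lam*W)^{-1}."""
    S_inv = np.linalg.inv(np.eye(W.shape[0]) - lam * W)
    return np.abs(S_inv).sum(axis=1).max()

n = 100
W = ring_weights(n)
norms = {}
for gamma in (0.0, 0.5, 1.0):
    psi_n = n ** gamma                  # rate at which lambda approaches one
    lam = 1.0 - 1.0 / psi_n             # near-unit-root coefficient
    norms[gamma] = inv_row_sum_norm(W, lam)   # equals psi_n for this W
```

The result follows from the Neumann series: every power of a row-normalized nonnegative matrix has unit row sums, so the row sums of the inverse add up to the geometric series in the coefficient.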

4.2.2. GS2SLS and BGS2SLS

Lee and Yu [15] derive the 2SLS estimators and their asymptotic distributions using the procedures mentioned above. Using instruments defined in (80), the GS2SLS estimator of

δ_{n 0}

is

{\hat{δ}}_{n, 2 s l s} = {(Z_{n}^{'} P_{H_{n}} Z_{n})}^{- 1} Z_{n}^{'} P_{H_{n}} Y_{n} .

(91)

Since

Z_{n} = [X_{n}, W_{n} Y_{n}]

contains

W_{n}

, the adjustment by

Y_{δ, n}

is needed. The asymptotic properties of the GS2SLS estimators of

λ_{n 0}

and

β_{0}

are obtained as follows:

Y_{δ, n} \sqrt{n} ({\hat{δ}}_{n, 2 s l s} - δ_{n 0}) = (\begin{matrix} \sqrt{n} ({\hat{β}}_{n, 2 s l s} - β_{0}) \\ \sqrt{n} ψ_{n} ({\hat{λ}}_{n, 2 s l s} - λ_{n 0}) \end{matrix}) \overset{d}{\to} N (0, Φ_{2 s l s})

(92)

where

Φ_{2 s l s} = σ_{0}^{2} {({lim}_{n \to \infty} \frac{1}{n} Y_{δ, n}^{- 1} Z_{n}^{'} P_{H_{n}} Z_{n} Y_{δ, n}^{- 1})}^{- 1}

. So the GS2SLS estimator

{\hat{λ}}_{n, 2 s l s}

of

λ_{n 0}

is

\sqrt{n} ψ_{n}

-consistent, which is faster than the usual

\sqrt{n}

rate in Kelejian and Prucha [7] and Lee [8], but

{\hat{β}}_{n, 2 s l s}

has the usual

\sqrt{n}

rate of convergence.

Choosing the instrument

H_{n}^{*} = [X_{n}, G_{n} X_{n} β_{0}]

as in (85), the BGS2SLS estimator is

{\hat{δ}}_{n, b 2 s l s} = {(H_{n}^{*'} Z_{n})}^{- 1} H_{n}^{*'} Y_{n}

with asymptotic distribution:

Y_{δ, n} \sqrt{n} ({\hat{δ}}_{n, b 2 s l s} - δ_{n 0}) \overset{d}{\to} N (0, Φ_{b 2 s l s})

(93)

where

Φ_{b 2 s l s} = σ_{0}^{2} {({lim}_{n \to \infty} \frac{1}{n} Y_{δ, n}^{- 1} H_{n}^{*'} H_{n}^{*} Y_{δ, n}^{- 1})}^{- 1}

. Since

Φ_{b 2 s l s} - Φ_{2 s l s}

is negative semidefinite, the BGS2SLS estimator is more efficient.

The above result relies on

G_{n} X_{n} β

and

X_{n}

being linearly independent, which ensures that the instrument matrix has full rank; otherwise the 2SLS estimator will be inconsistent [9]. Liu [55] shows that even if

G_{n} X_{n} β

and

X_{n}

are linearly dependent, i.e.,

G_{n} X_{n} β = X_{n} c_{n}

, where

c_{n}

is a nonzero vector, we still have

{\hat{ρ}}_{2 S L S} - ρ = O_{p} (\frac{1}{ψ_{n}})

,

{\hat{β}}_{k, 2 S L S} - β_{k} = O_{p} (\frac{c_{k n}}{ψ_{n}}) + O_{p} (\frac{1}{\sqrt{n}})

under near unit roots case; and

{\hat{ρ}}_{2 S L S} - ρ = O_{p} (1)

,

{\hat{β}}_{k, 2 S L S} - β_{k} = O_{p} (c_{k n}) + O_{p} (\frac{1}{\sqrt{n}})

under the regular case, as long as

c_{n} = o (1)

. This is equivalent to

G_{n} X_{n} β

and

X_{n}

being asymptotically independent.

To provide guidelines for empirical studies, Lee and Yu [15] conduct simulations to compare the performance of the QMLE and 2SLS methods. The QMLEs are relatively robust whether or not the error term is normally distributed. Moreover, as n increases (and the spatial coefficient moves closer to the spatial unit root), the QMLEs perform better than the 2SLS estimators because of their smaller variances. One interesting phenomenon is that the best 2SLS estimators perform even worse than the regular 2SLS estimators in some cases, which is at odds with the theoretical result in (93). One possible reason is that the best 2SLS estimator requires an initial consistent estimator by construction (see (85)), and under spatial unit roots such an initial estimator may not be accurately calculated.

4.3. Near Unit Roots in the SEM Model

Baltagi et al. [37] extend the study of near unit roots from SAR model to SEM model by considering the OLS, GLS and FGLS estimation and properties of the corresponding statistics. The model is given as

\begin{matrix} \begin{matrix} Y_{n} = X_{n} β_{0} + u_{n}, \\ u_{n} = λ_{n 0} W_{n} u_{n} + ϵ_{n}, \end{matrix} \end{matrix}

(94)

with

λ_{n 0} = 1 - \frac{1}{ψ_{n}}

. Similar to Lee and Yu [15],

u_{n}

could be decomposed to

u_{n} = S_{n}^{- 1} ϵ_{n} = ψ_{n} W_{n}^{u} ϵ_{n} + {(I_{n} - λ_{n 0} {\tilde{W}}_{n})}^{- 1} ϵ_{n} .

(95)

One more assumption that they impose is

Assumption 13.

The elements of

X_{n}

are nonstochastic and bounded, uniformly in n,

{lim}_{n \to \infty} n^{- 1} X_{n}^{'} X_{n}

exists and is nonsingular.

{lim}_{n \to \infty} n^{- 1} X_{n}^{'} W_{n}^{u} W_{n}^{u^{'}} X_{n}

exists. Furthermore,

{lim}_{n \to \infty} n^{- 1} X_{n}^{'} S_{n}^{'} S_{n} X_{n}

exists and is nonsingular.

The OLS estimator

{\hat{β}}_{OLS} = {(X_{n}^{'} X_{n})}^{- 1} X_{n}^{'} Y_{n}

has the asymptotic distribution when

\frac{ψ_{n}}{\sqrt{n}} \to 0

\begin{matrix} \begin{matrix} \frac{\sqrt{n}}{ψ_{n}} ({\hat{β}}_{OLS} - β_{0}) \overset{d}{\to} & N (0, σ_{0}^{2} {(lim_{n \to \infty} \frac{1}{n} X_{n}^{'} X_{n})}^{- 1} \\ (lim_{n \to \infty} \frac{1}{n} X_{n}^{'} W_{n}^{u} W_{n}^{u^{'}} X_{n}) {(lim_{n \to \infty} \frac{1}{n} X_{n}^{'} X_{n})}^{- 1}), \end{matrix} \end{matrix}

(96)

and when

\frac{ψ_{n}}{\sqrt{n}} \to c < \infty

,

{\hat{β}}_{OLS} - β_{0} = O_{p} (1)

. Thus

{\hat{β}}_{OLS} = β_{0} + O_{p} (\frac{ψ_{n}}{\sqrt{n}})

, which is

\frac{\sqrt{n}}{ψ_{n}}

-consistent and is slower than the stationary error term case.

Baltagi et al. [37] also study the asymptotic properties of the GLS and FGLS estimators. If

λ_{n 0}

is known,

{\hat{β}}_{G L S} = {(X_{n}^{'} S_{n}^{'} S_{n} X_{n})}^{- 1} X_{n}^{'} S_{n}^{'} S_{n} Y_{n}

, and

\sqrt{n} ({\hat{β}}_{G L S} - β_{0}) \overset{d}{\to} N (0, σ_{0}^{2} {(lim_{n \to \infty} \frac{1}{n} X_{n}^{'} S_{n}^{'} S_{n} X_{n})}^{- 1}),

(97)

which implies that

{\hat{β}}_{G L S}

is robust for the near unit roots in the error term because it has

\sqrt{n}

rate of convergence. The feasible GLS (FGLS) could be achieved by replacing

λ

by a consistent estimator

{\hat{λ}}_{n}

, which yields

{\hat{β}}_{F G L S} = {(X_{n}^{'} {\hat{S}}_{n}^{'} {\hat{S}}_{n} X_{n})}^{- 1} X_{n}^{'} {\hat{S}}_{n}^{'} {\hat{S}}_{n} Y_{n},

(98)

where

{\hat{S}}_{n} = I_{n} - {\hat{λ}}_{n} W_{n}

. It can be seen that FGLS is identical to the QMLE: the concentrated log-likelihood function of (94) with respect to

λ

is

ln L_{n} (λ) = - \frac{n}{2} (ln 2 π + 1) - \frac{n}{2} ln {\hat{σ}}_{n}^{2} (λ) + ln |S_{n} (λ)|,

(99)

where

\begin{matrix} {\hat{σ}}_{n}^{2} (λ) = n^{- 1} u_{n}^{'} S_{n} {(λ)}^{'} {\bar{P}}_{n} (λ) S_{n} (λ) u_{n}, \end{matrix}

(100)

\begin{matrix} {\bar{P}}_{n} (λ) = I_{n} - S_{n} (λ) X_{n} {[X_{n}^{'} S_{n} {(λ)}^{'} S_{n} (λ) X_{n}]}^{- 1} X_{n}^{'} S_{n} {(λ)}^{'} . \end{matrix}

(101)

The QMLE is of order

ψ_{n} ({\hat{λ}}_{n} - λ_{n 0}) = o_{p} (1)

, which is

ψ_{n}

-consistent and

\begin{matrix} \begin{matrix} {\hat{β}}_{n} ({\hat{λ}}_{n}) = {[X_{n}^{'} S_{n} {({\hat{λ}}_{n})}^{'} S_{n} ({\hat{λ}}_{n}) X_{n}]}^{- 1} X_{n}^{'} S_{n} {({\hat{λ}}_{n})}^{'} S_{n} ({\hat{λ}}_{n}) Y_{n}, \\ {\hat{σ}}_{n}^{2} ({\hat{λ}}_{n}) = \frac{1}{n} u_{n}^{'} S_{n} {({\hat{λ}}_{n})}^{'} {\bar{P}}_{n} ({\hat{λ}}_{n}) S_{n} ({\hat{λ}}_{n}) u_{n} . \end{matrix} \end{matrix}

(102)
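Equations (99) to (102) translate directly into a short routine. The sketch below is a minimal illustration, assuming a grid search over the spatial coefficient and a ring weight matrix; the function names and the simulated data are hypothetical. Since the error vector is unobserved, the code evaluates (100) with the observed dependent variable in its place, which gives the same value because the projection in (101) annihilates the transformed regressors.

```python
import numpy as np

def ring_weights(n):
    """Row-normalized circular weight matrix with two neighbors per unit (illustrative)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def sem_concentrated_loglik(lam, Y, X, W):
    """Return (log-likelihood (99), GLS beta as in (102), sigma^2 as in (100)) at a given lam."""
    n = len(Y)
    S = np.eye(n) - lam * W
    SX, SY = S @ X, S @ Y
    beta = np.linalg.lstsq(SX, SY, rcond=None)[0]   # GLS coefficient for this lam
    resid = SY - SX @ beta                          # P_bar(lam) S(lam) Y
    sigma2 = resid @ resid / n
    _, logdet = np.linalg.slogdet(S)                # ln|S_n(lam)|
    loglik = -0.5 * n * (np.log(2 * np.pi) + 1) - 0.5 * n * np.log(sigma2) + logdet
    return loglik, beta, sigma2

def sem_qmle(Y, X, W, grid=np.linspace(-0.95, 0.99, 195)):
    """Maximize the concentrated log-likelihood over a grid (a crude stand-in for an optimizer)."""
    values = [sem_concentrated_loglik(lam, Y, X, W)[0] for lam in grid]
    lam_hat = grid[int(np.argmax(values))]
    _, beta_hat, sigma2_hat = sem_concentrated_loglik(lam_hat, Y, X, W)
    return lam_hat, beta_hat, sigma2_hat

# Hypothetical data with a spatial error coefficient close to one
rng = np.random.default_rng(0)
n, lam0, beta0 = 150, 0.9, np.array([1.0, -0.5])
W = ring_weights(n)
X = rng.normal(size=(n, 2))
u = np.linalg.solve(np.eye(n) - lam0 * W, rng.normal(size=n))
Y = X @ beta0 + u
lam_hat, beta_hat, sigma2_hat = sem_qmle(Y, X, W)
```

The coefficient returned at the maximizing value of the spatial parameter is exactly the FGLS estimator in (98) evaluated at the QMLE.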

Comparing with (98),

{\hat{β}}_{n} ({\hat{λ}}_{n})

is a FGLS of

β

using

{\hat{λ}}_{n}

. Thus, the QMLE

{\hat{β}}_{M L}

and the infeasible GLS estimator

{\hat{β}}_{G L S}

have the same asymptotic distribution as shown before. Next, Baltagi et al. [37] consider the Wald test statistic for the null hypothesis

H_{0} : R β = r

for OLS, GLS and FGLS, where R is a

q \times k

matrix of rank

q < k

and r is

q \times 1

. For OLS,

\begin{matrix} \begin{matrix} W_{OLS} & : = {(R {\hat{β}}_{O L S} - r)}^{'} {[{\hat{σ}}_{O L S}^{2} R {(X_{n}^{'} X_{n})}^{- 1} R^{'}]}^{- 1} (R {\hat{β}}_{O L S} - r) \\ \overset{d}{\to} ξ^{'} {[σ_{0}^{2} (lim_{n \to \infty} \frac{1}{n} tr (W_{n}^{u'} W_{n}^{u})) R {(lim_{n \to \infty} \frac{1}{n} X_{n}^{'} X_{n})}^{- 1} R^{'}]}^{- 1} ξ, \end{matrix} \end{matrix}

(103)

where

\begin{matrix} ξ \sim N (0, σ_{0}^{2} R {(lim_{n \to \infty} n^{- 1} X_{n}^{'} X_{n})}^{- 1} (lim_{n \to \infty} n^{- 1} X_{n}^{'} W_{n}^{u} W_{n}^{u^{'}} X_{n}) {(lim_{n \to \infty} n^{- 1} X_{n}^{'} X_{n})}^{- 1} R^{'}), \end{matrix}

(104)

so the OLS Wald statistic does not have a standard

χ^{2}

distribution, which is similar to the F-statistic shown above. However, the GLS Wald statistic

W_{G L S} = {(R {\hat{β}}_{G L S} - r)}^{'} {[σ_{0}^{2} R {(X_{n}^{'} S_{n}^{'} S_{n} X_{n})}^{- 1} R^{'}]}^{- 1} (R {\hat{β}}_{G L S} - r) \overset{d}{⟶} χ_{q}^{2},

(105)

has a chi-squared limiting distribution.
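The GLS Wald statistic in (105) is straightforward to compute once the spatial coefficient and the error variance are given. The following sketch assumes both are known (or replaced by estimates supplied by the user); the function names are hypothetical.

```python
import numpy as np

def wald_statistic(beta_hat, cov_beta, R, r):
    """Generic Wald statistic (R b - r)' [R V R']^{-1} (R b - r) for H0: R beta = r."""
    diff = R @ beta_hat - r
    return float(diff @ np.linalg.solve(R @ cov_beta @ R.T, diff))

def gls_wald(Y, X, W, lam, sigma2, R, r):
    """Wald statistic (105) based on the GLS estimator with S = I - lam*W."""
    S = np.eye(len(Y)) - lam * W
    SX, SY = S @ X, S @ Y
    XtX = SX.T @ SX                                 # X'S'SX
    beta_gls = np.linalg.solve(XtX, SX.T @ SY)      # GLS estimator
    return wald_statistic(beta_gls, sigma2 * np.linalg.inv(XtX), R, r)

# Tiny arithmetic example: one restriction on the first coefficient
w_example = wald_statistic(np.array([1.0, 2.0]), np.eye(2),
                           np.array([[1.0, 0.0]]), np.array([0.0]))
```

The resulting value is compared with chi-squared critical values; the OLS analogue in (103) uses the same quadratic form but, as noted above, does not have a standard limiting distribution under near unit roots.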

Baltagi et al. [37] conduct extensive simulations. Using the root mean squared error (RMSE) as the evaluation criterion, the QML (FGLS) estimators perform uniformly better than the OLS estimator. In particular, when the spatial coefficient is sufficiently close to 1 and the sample size n increases, the RMSE of the OLS estimator grows dramatically. Together with the fact that the Wald test statistic based on the QML method has a standard Chi-squared distribution, QMLE is recommended when near spatial unit roots exist in the spatial error model.

4.4. Doubly Geometric Spatial Autoregressive Process

The main difference between the SAR and the doubly geometric spatial autoregressive models is that the spatial dependence form of the latter is clearly specified. However, for the SAR model, such dependence relies on the specification of

W_{n}

whose explicit form varies in different situations.

For model (65), based on the observation

\{X_{k, ℓ} : 1 ⩽ k ⩽ m and 1 ⩽ ℓ ⩽ n\}

, Baran [47] shows that the estimators

({\hat{α}}_{m, n}, {\hat{β}}_{m, n})

are asymptotically normal:

\sqrt{m n} (\begin{matrix} {\hat{α}}_{m, n} - α \\ {\hat{β}}_{m, n} - β \end{matrix}) \overset{D}{⟶} N (0, Σ_{α, β})

in the stable case (

| α | < 1

and

| β | < 1

), with some covariance matrix

Σ_{α, β}

. For the unstable case (

α_{n} \to 1

,

β_{n} \to 1

), using the martingale central limit theorem, Bhattacharyya et al. [56,57] show “one step Gauss-Newton” estimators are asymptotically normal with convergence rate

n^{3 / 2}

. This is different from the classical time series

A R (1)

, where the OLS estimator converges to a fraction of functionals of the standard Brownian motion:

T (a_{O L S}^{'} - 1) \Rightarrow \frac{W {(1)}^{2} - 1}{2 \int_{0}^{1} W {(r)}^{2} d r}

[58] (p. 281).

Baran and Pap [59] consider the more complicated model as in (66). The model is stable if and only if

(α, β, γ) \in S

, where S is the open tetrahedron with vertices

V : = {(1, 1, - 1), (1, - 1, 1), (- 1, 1, 1), (- 1, - 1, - 1)}

. They also prove that the OLS estimator is asymptotically normally distributed with the convergence rate n when the model is stable, and

n^{3 / 2}

otherwise. (The simpler model

Y_{k, l} = α Y_{k - 1, l} + β Y_{k, l - 1} + ϵ_{k, l}

, with possibly

α = β

was investigated in Baran et al. [60], Baran et al. [44], Baran and Pap [61] under stable and unstable cases. Under different settings, the limiting distribution of the OLS estimator is normal but has different rates of convergence.)

Roknossadati and Zarepour [62,63] study the limiting behavior of M-estimation for the near unit roots of model (65). The M-estimator

({\hat{α}}_{n}, {\hat{β}}_{n})

of

(α_{n}, β_{n})

is defined as the minimizer of the objective function

\begin{matrix} g (α_{n}, β_{n}) = \sum_{i = 2}^{n} \sum_{j = 2}^{n} λ (Y_{i j} - α_{n} Y_{i - 1, j} - β_{n} Y_{i, j - 1} + α_{n} β_{n} Y_{i - 1, j - 1}), \end{matrix}

(106)

for some convex function

λ (\cdot)

. Roknossadati and Zarepour [62] show that the self-normalized M-estimators are asymptotically normal, and when the series is stable, the convergence rate of M-estimators is still

n^{3 / 2}

, same as in Bhattacharyya et al. [56,57]. But if it is unstable, i.e., when the model has infinite variance innovations, the M-estimates have a higher consistency rate.
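To make the objective in (106) concrete, the sketch below simulates a doubly geometric process and evaluates the objective for a user-supplied loss. It assumes the recursion implied by the residual in (106), zero initial values on the boundary, and Gaussian innovations; these choices and the function names are made for illustration only.

```python
import numpy as np

def simulate_doubly_geometric(alpha, beta, eps):
    """Y[i,j] = a*Y[i-1,j] + b*Y[i,j-1] - a*b*Y[i-1,j-1] + eps[i,j], with zero boundary values."""
    m, n = eps.shape
    Y = np.zeros((m + 1, n + 1))            # row 0 and column 0 hold the zero boundary
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            Y[i, j] = (alpha * Y[i - 1, j] + beta * Y[i, j - 1]
                       - alpha * beta * Y[i - 1, j - 1] + eps[i - 1, j - 1])
    return Y[1:, 1:]

def m_objective(alpha, beta, Y, loss=np.square):
    """Objective (106): sum of loss(residual) over the interior of the lattice."""
    resid = (Y[1:, 1:] - alpha * Y[:-1, 1:] - beta * Y[1:, :-1]
             + alpha * beta * Y[:-1, :-1])
    return float(loss(resid).sum()), resid

rng = np.random.default_rng(0)
eps = rng.normal(size=(30, 30))
Y = simulate_doubly_geometric(0.95, 0.95, eps)
obj_true, resid_true = m_objective(0.95, 0.95, Y)              # squared loss
obj_lad, _ = m_objective(0.95, 0.95, Y, loss=np.abs)           # absolute loss
```

With the squared loss the minimizer is the least squares estimator; any other convex loss, such as the absolute value, can be passed through the hypothetical `loss` argument.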

5. Tests for Spatial Unit Roots and Nonstationarity

Given the possible consequences of spatial unit roots, it is necessary to test for them. In fact, in nonstationary cases, the estimator is inconsistent and diverges [64]. If the series contains spatial unit roots, one may employ the spatial first difference as recommended by Fingleton [14]: after first-order differencing, such a series is converted to a stationary one; otherwise it is over-differenced and spatial correlation still exists. Based on this idea, Lauridsen and Kosfeld [17,18] propose two-stage LM tests to check for spatial unit roots. However, such LM tests have a high power function because of

L M > L R > W a l d

in finite samples and are not useful for spatial cointegration since they mis-specify the regression in the second stage. A Wald test is proposed by Lauridsen and Kosfeld [65] but it does not have a usual

χ^{2}

distribution so simulation has to be conducted before each test to obtain the critical values. A different approach introduced by Beenstock et al. [19] uses the fact that when spatial unit roots exist, the variance explodes and the spatial impulse does not die out as distance increases, so they iterate on the parameter space to find out the value of the unit roots (for irregular lattice) and then generate nonstationary series to conduct interval estimation.

Martellosio [66] derives the power properties of invariant tests, for example,

\frac{{\hat{u}}^{'} Q \hat{u}}{{\hat{u}}^{'} \hat{u}}

, where

\hat{u}

is the OLS residuals and Q is a fixed matrix. When

Q = W_{n}

, we obtain the Cliff–Ord test. When the regression contains only a constant, the Cliff–Ord test reduces to Moran’s test as introduced before, which is best locally invariant as shown by King [67]. It has been shown that for the SEM model, as

λ ↑ \frac{1}{ρ_{m a x}}

, the test power vanishes. For the SAR model, as

λ ↑ \frac{1}{ρ_{m a x}}

, the limiting power is either 0 or 1. Krämer [68] shows similar conclusions but focuses on the symmetric weight matrix. Martellosio [69] further shows that the power of any test vanishes as spatial correlation increases for a set of regression spaces. Heteroskedasticity-robust tests have also been studied. For example, Born and Breitung [70] and Baltagi and Yang [71] design diagnostic tests for SEM and SAR employing the outer product of gradients (OPG) variant of the LM test, which are robust against heteroskedastic (and non-normal) errors. But these tests suffer from the same deficiency as in Martellosio [66] because such tests are asymptotically equivalent to Moran’s I. Baltagi and Yang [72] have also shown that the standard LM test suffers from finite sample distortion when spatial dependence is heavy in both spatial and panel data settings. Recently, Preinerstorfer [73] suggests some modified tests to avoid this “zero-power trap” phenomenon, which work well for small spatial autocorrelation but still have limiting power smaller than 1 (only 0.619 by simulation). Thus, the invariant test of

I (0)

null hypothesis is not satisfactory when spatial unit roots exist, and methodologies to determine it (Tests of the

I (1)

null hypothesis) deserve more attention.

5.1. Two-Stage LM Test for the Sources of Spurious Spatial Regression

Lauridsen and Kosfeld [17] develop a two-stage LM test to distinguish between two possible sources for spurious regression. The first one is the existence of spatial (near) unit roots in the regressand and/or regressors as in Fingleton [14], Lee and Yu [34]; the second is that the spatial error term itself is nonstationary. So the LM tests are essentially testing if the spatial process is stable or not. The idea originates from the fact that Fingleton [14] suggests a high value of Moran’s I statistic as an indicator for both spatial nonstationarity and spurious regression, but we cannot distinguish between them or even distinguish between the nonstationarity and the positive spatial correlation among the error terms, which by definition imply a high value of Moran’s I. Specifically, we are trying to distinguish if (i) the

X_{n}

is a SAR process and we regressed

Y_{n}

on

X_{n}

or (ii) the model itself is SEM as

Y_{n} = X_{n} β + ϵ_{n}

, where

ϵ_{n} = λ_{ϵ} W_{n} ϵ_{n} + μ_{n}

, because both (i) and (ii) can cause spatial autocorrelation.

There are at least three advantages of the LM tests [17]. First is that compared with Wald or LR, LM is usually simpler to compute because it is constructed under

H_{0}

. Second is that with the LM test, it is possible to control for some omitted model features such as heterogeneity and autoregression, as in Anselin [74], which will be discussed later. The last one is that, other statistics may not have a standard asymptotic distribution, such as the OLS Wald type statistic as in Baltagi et al. [37]. The proposed two-stage LM test is based on the SEM model and all four possible results are summarized in Lauridsen and Kosfeld [17] (Table 1):

Under $H_{0} : λ_{ϵ} = 0$ , the LM error statistic (LME) developed by Anselin [74] (p. 11, Equation (35)) is

$L M E = \frac{{(e^{'} W_{n} e / σ^{2})}^{2}}{tr (W_{n}^{2} + W_{n}^{'} W_{n})} \sim χ^{2} (1) .$

(107)

Thus large values of $L M E$ reject the null hypothesis, which implies either $0 < λ_{ϵ} < 1$ or $λ_{ϵ} = 1$ .
The next step is to test $H_{0} : λ_{ϵ} = 1$ . This can be carried out using the spatial differencing introduced before. Under $H_{0}$ , $Δ ϵ_{n} = μ_{n}$ , thus the first-order difference of the regression, $Δ Y_{n} = Δ X_{n} β + μ_{n}$ , yields the i.i.d. error $μ_{n}$ , which means the value of the differenced LME (DLME) should be close to 0 under $H_{0}$ . But if $λ_{ϵ} < 1$ , $Δ$ represents overdifferencing, i.e., $Δ ϵ_{n} = (I_{n} - W_{n}) {(I_{n} - λ_{ϵ} W_{n})}^{- 1} μ_{n}$ , so spatial correlation in the error term still exists, DLME is large, and we reject $H_{0}$ .
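The two statistics can be computed with a few lines of code. The sketch below is an illustration under stated assumptions: the error variance in (107) is estimated by the mean squared OLS residual, the spatial difference operator is the identity minus the weight matrix, and the function names and simulated data are hypothetical.

```python
import numpy as np

def ring_weights(n):
    """Row-normalized circular weight matrix with two neighbors per unit (illustrative)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def ols_residuals(y, X):
    """OLS residuals; lstsq also copes with a rank-deficient X (e.g. a differenced constant)."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ coef

def lm_error(e, W):
    """LM error statistic (107), with sigma^2 estimated by e'e/n."""
    sigma2 = e @ e / len(e)
    return float((e @ W @ e / sigma2) ** 2 / np.trace(W @ W + W.T @ W))

def two_stage_lm(Y, X, W):
    """Return (LME, DLME) from the levels and the spatially differenced regressions."""
    D = np.eye(len(Y)) - W                              # spatial difference operator
    lme = lm_error(ols_residuals(Y, X), W)
    dlme = lm_error(ols_residuals(D @ Y, D @ X), W)
    return lme, dlme

# Hypothetical data with i.i.d. errors
rng = np.random.default_rng(2)
n = 100
W = ring_weights(n)
X = rng.normal(size=(n, 1))
Y = 2.0 * X[:, 0] + rng.normal(size=n)
lme, dlme = two_stage_lm(Y, X, W)
```

Both statistics are compared with critical values of the chi-squared distribution with one degree of freedom, as in (107).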

Similar procedures could be used to investigate whether

Y_{n}

or any

X_{n}

are spatially nonstationary, as the case in Lee and Yu [34]. Letting

Z_{n}

be one of

Y_{n}

,

X_{n 1}

,

X_{n 2}, \dots

, Lauridsen and Kosfeld [17] suggest using the regression

\begin{matrix} \begin{matrix} Z_{n} & = α ι_{n} + ϵ_{n}, \\ Δ Z_{n} & = α Δ ι_{n} + Δ ϵ_{n} = Δ ϵ_{n}, \end{matrix} \end{matrix}

(108)

to obtain LME and DLME, respectively.

Z_{n}

is regressed on a constant term because there is no meaningful regressor but we still need the residuals.

Spatial cointegration could also be tested using this LM test. Thus, after determining

Y_{n}

and

X_{n}

are nonstationary, regress

Y_{n}

on

X_{n}

and

Δ Y_{n}

on

Δ X_{n}

to obtain LME and DLME. Cointegration holds if LME is 0; there is no cointegration if LME is positive and DLME is 0; and the limiting case of “near cointegration” occurs if both LME and DLME are positive.

Moreover, Lauridsen and Kosfeld [18] generalize their two-stage LM test to account for unobserved heteroskedasticity. They specify the covariance matrix of

ϵ_{n}

,

Ω = \operatorname{diag} \{σ_{1}^{2}, \dots, σ_{n}^{2}\}

, to have diagonal elements

σ_{i}^{2} = f (Z_{i}, λ_{Z})

, where

Z_{i}

is

P \times 1

vector of observations of exogenous variables for region i, related to

σ_{i}^{2}

via the

P \times 1

vector of parameters

λ_{Z}

. So the statistic in (107) should be adjusted as in Anselin [74] (p. 9, Equation (29)):

L M E H = \frac{{(e^{'} W_{n} e / σ^{2})}^{2}}{tr (W_{n}^{2} + W_{n}^{'} W_{n})} + \frac{f^{'} Z {(Z^{'} Z)}^{- 1} Z^{'} f}{2} \sim χ^{2} (P + 1),

(109)

with

f_{i} = \frac{e_{i}^{2}}{σ^{2}} - 1

and Z as the

n \times P

matrix containing the Z vectors that cause heteroskedasticity.

However, the Lauridsen and Kosfeld test procedure is not without problems. Beenstock et al. [19] point out that this procedure is not suitable for testing spatial cointegration since the second stage is misspecified. To see it more clearly, regress

Δ Y_{n} = β Δ X_{n} + v_{n}

. The LM procedure asserts that if

v_{n}

is not spatially correlated, then

X_{n}

and

Y_{n}

are spatially integrated; and if

v_{n}

is spatially correlated,

X_{n}

and

Y_{n}

are not spatially integrated because of overdifferencing. Nevertheless, regressing

Δ Y_{n}

on

Δ X_{n}

is equivalent to regressing two white noise series,

ϵ_{Y}

and

ϵ_{X}

, because Y and X are both I(1). Hence, the corresponding residuals cannot be spatially correlated as long as

ϵ_{Y}

and

ϵ_{X}

are independent, regardless of whether

v_{n}

is spatially correlated, nonstationary, or not.

5.2. A Wald Test for Spatial Nonstationarity

Lauridsen and Kosfeld [65] suggest a Wald post-test statistic. Based on MLE, under

H_{0} : R θ = q

, the general form of the Wald test is

W = {(R \hat{θ} - q)}^{'} {(R V R^{'})}^{- 1} (R \hat{θ} - q)

, where

V = I^{- 1}

is the inverse of information matrix. If we specify the null hypothesis as

λ = 1

, then with

R = (0^{'}, 1, 0)

and

q = 1

, we have

W = \frac{{(\hat{λ} - 1)}^{2}}{V_{λ}}

, where

V_{λ}

is the diagonal element of V corresponding to

λ

. However, as mentioned before, Wald statistics may not have a standard distribution so simulations are conducted. Unlike Fingleton [14], to generate SAR series with spatial unit roots, Lauridsen and Kosfeld [65] do not introduce the noncircular matrix. Thus

λ = 1

is a singular point of

(I_{n} - λ W_{n})

so the inverse does not exist. To solve this issue, they use the Moore–Penrose pseudoinverse.
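The pseudoinverse device can be illustrated as follows. This is a sketch of the linear algebra only, not of the simulation design in [65]: the symmetric ring weight matrix, the sample size, and the tolerance passed to the NumPy routines are assumptions made for the example.

```python
import numpy as np

def ring_weights(n):
    """Row-normalized circular weight matrix with two neighbors per unit (illustrative)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

n = 50
W = ring_weights(n)
rng = np.random.default_rng(1)
eps = rng.normal(size=n)

S = np.eye(n) - W                                   # I - lambda*W at lambda = 1: singular
rank_S = np.linalg.matrix_rank(S, tol=1e-10)        # n - 1 for this connected ring
Y = np.linalg.pinv(S, rcond=1e-10) @ eps            # Moore-Penrose solution

# Applying S to the generated series recovers the innovations up to their mean,
# because the null space of S for this matrix is spanned by the vector of ones.
recovered = S @ Y
```

The generated series therefore satisfies the unit-root equation only on the range of the singular matrix, which is the price paid for using a generalized inverse.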

According to their Monte Carlo simulations, the critical values of the Wald test under spatial nonstationarity are higher than those of the

χ^{2} (1)

distribution, especially for the 5th and 10th percentile.

5.3. Test Unit Roots and Cointegration in the Sense of Spatial Impulses

Beenstock et al. [19] come up with an innovative method to test spatial unit roots and spatial cointegration by considering the behavior of the variance and the spatial impulse. Also, they do not assume unconnected spatial units or row normalize the spatial weight matrix either. Thus, based on the topology of the unit neighborhood, the spatial unit roots are

λ^{*} = \frac{1}{n}

in the regular lattices where n is the maximum and general number of neighbors of each unit. For example,

n = 2

for lateral space,

n = 4

for rook lattice and

n = 8

for queen lattice, with

λ^{*} = \frac{1}{2}, \frac{1}{4}, \frac{1}{8}

respectively. (“The weight matrix with first-order contiguity according to the rook criterion has the cells immediately above, below, to the right, and to the left, for a total of four neighboring cells. The weight matrix with first-order contiguity according to the queen criterion is eight cells immediately surrounding the central cell” [75] (p. 131). For the introduction of other types of the spatial weight matrices, see Kelejian and Robinson [26] (pp. 94–95).) And with spatial unit roots,

{(I_{n} - λ^{*} W_{n})}^{- 1}

is still well-defined because of the existence of the edge effect, that is, there exist some units having fewer neighbors than n. However, even though

{(I_{n} - λ^{*} W_{n})}^{- 1}

exists, the variance tends to explode even in finite sample space, which provides us with a way to determine the spatial unit roots for any arbitrary irregular lattices.

5.3.1. Spatial Impulse

The spatial impulse response is essentially the consequence of the shocks from one location to another. Intuitively, shocks should have no effect on the remote units if the spatial data are stationary. Beenstock et al. [19] first consider the simplest SAR model in lateral space:

\begin{matrix} Y_{n, j} & = λ (Y_{n, j + 1} + Y_{n, j - 1}) + u_{n, j}, j = - \infty, \dots, \infty, \\ ⟹ & (1 - λ^{- 1} L + L^{2}) Y_{n, j} = - λ^{- 1} u_{n, j - 1}, \end{matrix}

(110)

where L denote a spatial lag operator such that

L^{i} Y_{n, j} = Y_{n, j - i}

. The auxiliary equation is

x^{2} - λ^{- 1} x + 1 = 0 .

(111)

The discriminant of the above equation is positive when

0 < λ < \frac{1}{2}

, and there are two different solutions,

x_{1} < 1 < x_{2}

, by Vieta’s formula. Hence, Beenstock et al. [19] express

Y_{n, j}

as

\begin{matrix} Y_{n, j} & = \frac{λ^{- 1}}{x_{1}^{- 1} - x_{1}} [\sum_{i = 1}^{\infty} x_{1}^{i} u_{n, j - i} + \sum_{i = 0}^{\infty} x_{1}^{i} u_{n, j + i}], \\ ⟹ & \frac{\partial Y_{n, j}}{\partial u_{n, j - i}} = \frac{\partial Y_{n, j}}{\partial u_{n, j + i}} = \frac{λ^{- 1} x_{1}^{i}}{x_{1}^{- 1} - x_{1}}, \end{matrix}

(112)

where (112) is known as the Wold representation that expresses

Y_{n, j}

in terms of the shocks. The impulse from location

j - i

to location j tends to 0 as i grows because

x_{1} < 1

. Also,

x_{1}

varies with

λ

. When

λ = λ^{*} = \frac{1}{2}

,

x_{1} = x_{2} = 1

, so the impulse does not die out with distance and explodes. This fact can also be seen from

Var (Y_{n, j}) = \frac{λ^{- 2} (1 + x_{1}^{2})}{(1 - x_{1}^{2}) {(x_{1}^{- 1} - x_{1})}^{2}} σ_{u}^{2} .

(113)

If

0 < λ < \frac{1}{2}

,

Var (Y_{n, j})

is finite and independent of j; if

λ = \frac{1}{2}

,

x_{1} = 1

, and

Var (Y_{n, j})

is infinite.
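The closed forms in (112) and (113) can be checked numerically. In the sketch below, the smaller root of (111) is computed from the quadratic formula, and the impulse responses are compared with the inverse of the identity minus the coefficient times an unnormalized adjacency matrix on a long ring, which approximates the infinite line. The function names, the ring length, and the coefficient value 0.4 are assumptions made for the illustration.

```python
import numpy as np

def lateral_root(lam):
    """Smaller root x1 of x^2 - x/lam + 1 = 0, for 0 < lam < 1/2."""
    return (1 / lam - np.sqrt(1 / lam**2 - 4)) / 2

def lateral_impulse(lam, i):
    """Impulse response in (112) at distance |i|."""
    x1 = lateral_root(lam)
    return (1 / lam) * x1 ** abs(i) / (1 / x1 - x1)

def lateral_variance(lam, sigma2_u=1.0):
    """Variance in (113)."""
    x1 = lateral_root(lam)
    return (1 / lam**2) * (1 + x1**2) / ((1 - x1**2) * (1 / x1 - x1) ** 2) * sigma2_u

# Numerical counterpart: Y = (I - lam*C)^{-1} u on a ring with unnormalized adjacency C
n, lam = 201, 0.4
idx = np.arange(n)
C = np.zeros((n, n))
C[idx, (idx - 1) % n] = 1.0
C[idx, (idx + 1) % n] = 1.0
A = np.linalg.inv(np.eye(n) - lam * C)

closed_form = np.array([lateral_impulse(lam, i) for i in range(6)])
numerical = A[0, :6]                     # responses of unit 0 to shocks at distance 0..5
variance_numerical = (A[0] ** 2).sum()   # Var(Y_0) with unit innovation variance
```

As the coefficient approaches one half, the smaller root approaches one, the geometric decay in the impulse responses disappears, and the variance formula diverges.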

For the bilateral space case,

λ^{*} = \frac{1}{n}

are the spatial unit roots as shown before. Because of the edge effect, the singular point is strictly greater than, but approaches,

λ^{*}

. This fact shows a downside of the row-normalized spatial weight matrix: it overstates the true weight of the unit at the edge of the lattice. For example, in a rook lattice, the units have three neighbors with weight

\frac{1}{3}

at the edge, and four neighbors with weight

\frac{1}{4}

in the center. The row-normalization procedure assigns a higher weight to the neighbors of edge units. This weight assignment is not necessarily reasonable and makes the spatial unit roots the same as the singular points. Moreover, without row-normalization, the edge units play the role of the unconnected unit as in Fingleton [14]. The general SAR model in bilateral space is

Y_{n} = λ W_{n} Y_{n} + u_{n}

and the Wold representation is

Y_{n} = A_{n} u_{n}

, where

A_{n} = {(I_{n} - λ W_{n})}^{- 1} = I_{n} + \sum_{i = 1}^{\infty} λ^{i} W_{n}^{i}

. Let the spatial impulse response be defined as

\frac{d Y_{n, j}}{d u_{n, j}} = a_{j j}

and

\frac{d Y_{n, j}}{d u_{n, i}} = a_{j i}

. Analytical solutions of spatial impulse response in bilateral and higher dimension lattices are not obtained, but Beenstock et al. [19] expect

a_{j j}

to be positively related to the number of spatial units because of the larger spillover effect and

a_{j i}

varies inversely with the distance between i and j in the stationary case. If spatial unit roots exist, as in the lateral case, the impulse

a_{j i}

would not die out as the distance increases. This is supported by the simulation, though only the finite sample case could be simulated, see Beenstock and Felsenstein [76] (Figures 5.2 and 5.4). Compared with

λ < λ^{*}

, when

λ = λ^{*}

, it obviously shows a qualitative difference in the persistence of spatial impulses, as well as in the tendency for the explosion of the variance.

In the irregular lattices, the number of neighbors for the unit is undetermined generally, so the spatial unit roots,

λ^{*}

, cannot be calculated as the reciprocal of n. However, since the nonstationarity implies that spatial impulses do not disappear, one can find the empirical spatial unit roots by simulation.
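The edge effect on the singular point can be made concrete for a rook lattice. The sketch below does not reproduce the impulse-based simulation of Beenstock et al. [19]; it uses a different, simpler device, namely that for a symmetric unnormalized weight matrix the smallest positive coefficient at which the matrix becomes non-invertible is the reciprocal of its largest eigenvalue. The function names and lattice sizes are hypothetical.

```python
import numpy as np

def rook_adjacency(k):
    """Unnormalized rook-contiguity matrix for a k x k lattice with edges."""
    n = k * k
    W = np.zeros((n, n))
    for r in range(k):
        for c in range(k):
            i = r * k + c
            if r + 1 < k:                       # neighbor below
                W[i, i + k] = W[i + k, i] = 1.0
            if c + 1 < k:                       # neighbor to the right
                W[i, i + 1] = W[i + 1, i] = 1.0
    return W

def singular_point(W):
    """Smallest positive lam with I - lam*W singular, for symmetric W: 1 / largest eigenvalue."""
    return 1.0 / np.linalg.eigvalsh(W).max()

# Singular points for growing lattices; each exceeds 1/4 and moves toward it
points = {k: singular_point(rook_adjacency(k)) for k in (5, 10, 20)}
```

The same function can be applied to any symmetric contiguity matrix from an irregular lattice, which gives a quick reference value to compare with a simulation-based empirical unit root.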

The simulation method in Beenstock et al. [19] to calculate the critical value is pretty flexible and can be adapted to different models. For example, when both dynamic and spatial terms are included, Beenstock and Felsenstein [23] develop a similar procedure for testing cointegration in nonstationary panel data when estimating the spatial spillover effect in housing construction for Israel.

5.3.2. Spatial Unit Roots and Cointegration Tests

Knowing the spatial unit roots $λ^{*}$, Beenstock et al. [19] conduct Monte Carlo simulations that generate artificial SAR series and estimate the SAC by maximum likelihood, which yields the corresponding distributions under different topologies (different sample sizes, criteria, etc.). The results show that the empirical distribution of the SAC can be used to construct interval estimates and critical values for statistics that test for spatial unit roots. For the spatial cointegration test, a similar procedure applies, but OLS estimation is used.
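The simulation idea can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions, not the procedure of Beenstock et al. [19]: it assumes an unnormalized rook-contiguity matrix on a 10 × 10 lattice, takes $λ^{*} = 1 / 4$ (the reciprocal of the number of rook neighbors of an interior unit), uses standard normal innovations, and replaces a numerical optimizer with a grid search over the concentrated Gaussian log-likelihood. All function names are hypothetical.

```python
import numpy as np

def rook_lattice(m):
    """Unnormalized rook-contiguity weights on an m x m lattice (assumed topology)."""
    n = m * m
    W = np.zeros((n, n))
    for r in range(m):
        for c in range(m):
            k = r * m + c
            if r + 1 < m:
                W[k, k + m] = W[k + m, k] = 1.0
            if c + 1 < m:
                W[k, k + 1] = W[k + 1, k] = 1.0
    return W

def simulate_sac(W, lam_true, reps, grid_size=2001, seed=0):
    """Simulate Y = (I - lam_true W)^{-1} u and return grid-search ML estimates of lam."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    omega = np.linalg.eigvalsh(W)  # W is symmetric here
    # Search just inside (1/omega_min, 1/omega_max), where I - lam W is nonsingular
    grid = np.linspace(0.999 / omega.min(), 0.999 / omega.max(), grid_size)
    logdet = np.log(1.0 - np.outer(grid, omega)).sum(axis=1)  # ln|I - lam W| on the grid

    U = rng.standard_normal((n, reps))
    Y = np.linalg.solve(np.eye(n) - lam_true * W, U)  # one column per replication
    WY = W @ Y
    yy = (Y * Y).sum(axis=0)
    ywy = (Y * WY).sum(axis=0)
    wywy = (WY * WY).sum(axis=0)
    # sigma^2(lam) = (Y - lam WY)'(Y - lam WY) / n is quadratic in lam
    sigma2 = (yy[None, :] - 2.0 * grid[:, None] * ywy[None, :]
              + grid[:, None] ** 2 * wywy[None, :]) / n
    loglik = -0.5 * n * np.log(sigma2) + logdet[:, None]  # concentrated log-likelihood
    return grid[np.argmax(loglik, axis=0)], omega

W = rook_lattice(10)
estimates, omega = simulate_sac(W, lam_true=0.25, reps=500)
critical_values = np.quantile(estimates, [0.01, 0.05, 0.10])
print("empirical 1%, 5%, 10% quantiles of the SAC estimate:", critical_values)
```

The lower quantiles of the simulated estimates play the role of critical values for a one-sided test of $λ = λ^{*}$ against $λ < λ^{*}$. A different topology or sample size requires rerunning the simulation, which is why such tests depend so heavily on simulated distributions.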

5.4. Some Applications

Kosfeld and Lauridsen [77] apply the two-stage LM test of Section 5.1 to income and productivity convergence in the German regional labor market. They find highly significant LME and DLME statistics (see the formulas above, e.g., (107)) for all variables, which means that spatial unit roots are rejected. Yesilyurt and Elhorst [20] estimate the spatial interaction effects of inflation in Turkey. Because regional inflation rates have a strong tendency to co-move over time, they ask whether the inflation rates of different regions are stationary in space. Using the two-stage LM statistics from (108), they find that the inflation curve is stationary in space. Olejnik [21] studies the income process of the enlarged European Union (EU25) based on the augmented Solow model, taking into account the spatial autocorrelation effect. The stationarity of the error term as well as of all variables in the model is investigated, and no problem of spurious regression is found. Machado et al. [22] examine the spatial correlation of traffic accidents involving vulnerable road users (such as pedestrians and cyclists) in big cities and identify the factors that contribute to these accidents. Because their study covers several cities, the model specification may vary across locations. They therefore use the two-stage LM statistic to choose the best model; see Machado et al. [22] (Table 4). Although the Wald post-test in Lauridsen and Kosfeld [65] is asymptotically equivalent to the LM test, “It is generally recommended to choose among these alternatives on the basis of computational ease” [78] (p. 94).

6. Related Topics

Spatial panel data have been studied extensively. Spatial dependence is incorporated either in the error component [12,75,79] or through spatial lag dependence [11,80]. See Baltagi [2] for a textbook discussion. The panel data model can also include time-lagged dependent variables. If it includes both spatial and dynamic features, it is called the spatial dynamic panel data (SDPD) model by Yu et al. [10]. Yu et al. [10], Yu and Lee [16] and Yu et al. [36] study the QMLE of the SDPD model in the stable, unit root, and spatial cointegration cases, respectively. The concept of unit roots in the SDPD model combines the spatial and the dynamic ones. To see this more clearly, Yu et al. [10] specify the model as

$$Y_{n t} = λ_{0} W_{n} Y_{n t} + γ_{0} Y_{n, t - 1} + ρ_{0} W_{n} Y_{n, t - 1} + X_{n t} β_{0} + c_{n 0} + V_{n t}, \tag{114}$$

where $Y_{n t} = {(y_{1 t}, y_{2 t}, \dots, y_{n t})}^{'}$ and $V_{n t} = {(v_{1 t}, v_{2 t}, \dots, v_{n t})}^{'}$ are $n \times 1$ column vectors. Let $S_{n} (λ) = I_{n} - λ W_{n}$ and $S_{n} = S_{n} (λ_{0})$. Assuming $S_{n}$ is invertible, the reduced form is

$$Y_{n t} = A_{n} Y_{n, t - 1} + S_{n}^{- 1} X_{n t} β_{0} + S_{n}^{- 1} c_{n 0} + S_{n}^{- 1} V_{n t}, \tag{115}$$

where $A_{n} = S_{n}^{- 1} (γ_{0} I_{n} + ρ_{0} W_{n})$. If the infinite sums are well-defined, then by repeated substitution

$$Y_{n t} = \sum_{h = 0}^{\infty} A_{n}^{h} S_{n}^{- 1} (c_{n 0} + X_{n, t - h} β_{0} + V_{n, t - h}) . \tag{116}$$

So instead of focusing on the singular points of $S_{n}$, one should consider $A_{n}$, which contains $λ$, $γ$ and $ρ$: the parameters of the contemporaneous spatial effect, the time-lagged dependent variable, and the time–spatial effect, respectively. Following a process similar to that in Section 3.3.1 and letting ${\bar{w}}_{n} = diag \{{\bar{w}}_{n 1}, {\bar{w}}_{n 2}, \dots, {\bar{w}}_{n n}\}$ be the eigenvalue matrix of $W_{n}$, Yu et al. [10] show that the eigenvalue matrix of $A_{n}$ is $D_{n} = {(I_{n} - λ_{0} {\bar{w}}_{n})}^{- 1} (γ_{0} I_{n} + ρ_{0} {\bar{w}}_{n})$, which can be decomposed as $D_{n} = \frac{γ_{0} + ρ_{0}}{1 - λ_{0}} J_{n} + {\tilde{D}}_{n}$. The powers of $A_{n}$ follow as $A_{n}^{h} = {(\frac{γ_{0} + ρ_{0}}{1 - λ_{0}})}^{h} R_{n} J_{n} R_{n}^{- 1} + B_{n}^{h}$ with $B_{n}^{h} = R_{n} {\tilde{D}}_{n}^{h} R_{n}^{- 1}$, since the eigenvector matrix $R_{n}$ is orthogonal and $J_{n} {\tilde{D}}_{n} = 0$. Thus, whether $Y_{n t}$ is stable or not depends on the value of $\frac{γ_{0} + ρ_{0}}{1 - λ_{0}}$ compared with 1. Consequently, the decomposition of $Y_{n t}$, which is a generalization of (40), can be expressed as

$$Y_{n t} = Y_{n t}^{u} + Y_{n t}^{s} + Y_{n t}^{α}, \tag{117}$$

where $Y_{n t}^{s}$ is a possible stable part, $Y_{n t}^{u}$ is a possible unstable part, and $Y_{n t}^{α}$ is the time effect part, which is present when time effects $α_{t 0} ι_{n}$ are added to the model; see Lee and Yu [81] for details. They apply a data transformation to eliminate both the time effects and the possible unstable term. Based on the eigenvalues of $A_{n}$, the asymptotic properties and the bias of the QMLE are derived. When the eigenvalues of $A_{n}$ are all less than 1 ($γ_{0} + λ_{0} + ρ_{0} < 1$), or some are equal to 1 ($γ_{0} + λ_{0} + ρ_{0} = 1$ and $γ_{0} \neq 1$), or all are equal to 1 ($γ_{0} + λ_{0} + ρ_{0} = 1$ and $γ_{0} = 1$), the information matrix has different properties; see Yu and Lee [16] (Table 5).
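The three cases can be checked numerically. The Python sketch below is an illustration under assumptions rather than code from the cited papers: it uses a row-normalized ring of 20 units as $W_{n}$, hypothetical parameter values, and hypothetical function names. It computes the eigenvalues of $A_{n}$ directly and compares the largest modulus with $\frac{γ_{0} + ρ_{0}}{1 - λ_{0}}$. The labels follow the terminology in the titles of [16], [36] and [81].

```python
import numpy as np

def ring_weights(n):
    """Row-normalized circular weight matrix: two neighbors, weight 1/2 each (assumed layout)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def sdpd_eigenvalues(W, lam, gamma, rho):
    """Eigenvalues of A_n = (I_n - lam W_n)^{-1} (gamma I_n + rho W_n)."""
    n = W.shape[0]
    A = np.linalg.solve(np.eye(n) - lam * W, gamma * np.eye(n) + rho * W)
    return np.linalg.eigvals(A)

def classify(lam, gamma, rho, tol=1e-10):
    """Classify by gamma + lam + rho; assumes a row-normalized W_n and lam < 1."""
    total = lam + gamma + rho
    if total < 1 - tol:
        return "stable"
    if abs(total - 1) <= tol:
        return "unit root" if abs(gamma - 1) <= tol else "spatial cointegration"
    return "explosive"

W = ring_weights(20)
# Hypothetical (lam, gamma, rho) triples, one for each case discussed in the text
for lam, gamma, rho in [(0.4, 0.3, 0.2), (0.4, 0.4, 0.2), (0.3, 1.0, -0.3)]:
    eigs = sdpd_eigenvalues(W, lam, gamma, rho)
    largest = float(np.max(np.abs(eigs)))
    print(classify(lam, gamma, rho), round(largest, 4), round((gamma + rho) / (1 - lam), 4))
```

In the third case $A_{n}$ reduces to the identity matrix, so every eigenvalue equals one, whereas in the second case only the eigenvalue associated with the unit eigenvalue of $W_{n}$ does.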

Thus, testing for unit eigenvalues of $A_{n}$ is of great importance. Most attention has been paid to unit roots in the time dimension, i.e., $γ_{0} = 1$ or, equivalently given $γ_{0} + λ_{0} + ρ_{0} = 1$, $λ_{0} + ρ_{0} = 0$. Unit root tests in panel data under spatial dependence have been studied extensively; see Baltagi [2] (Section 12.3) for a summary. The performance of different tests has also been considered in Baltagi et al. [24]. The test for $H_{0} : γ_{0} + ρ_{0} + λ_{0} = 1$ has been investigated in Lee and Yu [81] (Section 14.3.4). Such a test works well when $λ_{0} + γ_{0} + ρ_{0} < 1$. However, when $λ_{0} + γ_{0} + ρ_{0} > 1$ and T is small, it is not reliable. Further study of unit root tests for the SDPD model is therefore needed.

Recently, another approach to describing strong spatial dependence has been proposed by Müller and Watson [49]. Since spatial units are not neatly arranged, i.e., they lie on an irregular lattice, they do not model the spatial dependence with the SAR model but “posit a continuous parameter model of spatial variation” [49]. They use the Lévy–Brownian motion to define the spatial $I (1)$ process, $L (s), s \in R^{d}$, which generalizes the Wiener process widely used in time series to d dimensions (when $d = 2$, it can be regarded as a random walk on the plane). The advantage of the Lévy–Brownian motion is that it is isotropic, which means that the relative variance between two locations is determined by the distance between them but not by the orientation (see Anselin [1] (p. 42)). A functional central limit theorem (FCLT) is established to characterize the asymptotic behavior of such a process. When one $I (1)$ process is regressed on another, independent $I (1)$ process, spurious regression also occurs, since F statistics based on classical, HAC-corrected, and clustered standard errors diverge to infinity, which is similar to the finding in Fingleton [14]. To remedy this, “difference” regression is again considered. Unlike in time series, Müller and Watson [49] introduce isotropic differences that treat all directions symmetrically. That is, they regard the weighted values of the neighborhood as the “average” value at the current location, just like $W_{n} Y_{n}$ in the SAR model. The difference transformation is defined as $y_{l}^{*} = \frac{1}{n} \sum_{ℓ \neq l} κ_{b} (λ_{n}^{- 1} |s_{ℓ} - s_{l}|) (y_{ℓ} - y_{l})$, where $κ_{b}$ is some weighting function. Their simulations show that regressions using isotropic differences do not suffer from the spurious regression problem, so valid inference can be conducted. Some test procedures for $H_{0} : I (1)$ and $H_{0} : I (0)$ are also suggested.
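The isotropic difference transformation can be written compactly in code. The following Python sketch is only an illustration: the triangular form of $κ_{b}$, the treatment of $λ_{n}$ as a user-supplied length scale, the simulated locations, and all function names are assumptions, since the text above specifies only that $κ_{b}$ is some weighting function.

```python
import numpy as np

def triangular_kernel(x, b=1.0):
    """Hypothetical weighting function kappa_b: linear decay, zero beyond distance b."""
    return np.clip(1.0 - x / b, 0.0, None)

def isotropic_difference(y, s, lam_n, kernel=triangular_kernel):
    """y*_l = (1/n) * sum_{k != l} kappa_b(|s_k - s_l| / lam_n) * (y_k - y_l)."""
    y = np.asarray(y, dtype=float)
    s = np.asarray(s, dtype=float)
    n = y.shape[0]
    # Pairwise Euclidean distances between locations
    dist = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=2)
    weights = kernel(dist / lam_n)
    np.fill_diagonal(weights, 0.0)  # exclude k = l
    # Row l: sum_k w_lk * y_k - y_l * sum_k w_lk
    return (weights @ y - weights.sum(axis=1) * y) / n

rng = np.random.default_rng(0)
s = rng.uniform(size=(200, 2))   # irregularly spaced locations in the unit square
y = rng.standard_normal(200)     # placeholder observations
y_star = isotropic_difference(y, s, lam_n=0.2)
```

Because the weights depend only on the distance between locations, the transformation treats all directions symmetrically. It also removes any constant level from the data, in the same way that first differencing does in time series.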

7. Conclusions

This paper briefly surveys spatial unit roots in spatial models. First, some fundamental concepts in spatial econometrics are introduced. Spatial unit roots in SAR and SEM models may lead to spurious regression. For estimation and inference in the presence of spatial unit roots, QMLE and 2SLS methods are generally used and have satisfactory properties after scaling. The compactness assumption has recently been relaxed in spatial econometrics, which potentially makes spatial unit roots no longer a concern, but its implications for concepts like stationarity and spatial cointegration should be further investigated. The doubly geometric spatial autoregressive process has been widely used in some scientific fields, where the main concern is the regular lattice. As in time series, exact orders of convergence for different estimators are obtained because of its simple specification. But this limits its application in economics, where irregular lattices and different types of weight matrices are used.

To detect possible spatial unit roots, as well as spatial cointegration, several test procedures have been proposed. Their applications are rather limited and depend heavily on simulations to obtain critical values. This can be explained by the fact that statistics in spatial settings generally do not have standard asymptotic distributions, especially on irregular lattices.

Lastly, some related topics were introduced. The idea of singular points is generalized in the SDPD model, which adds a time-lagged dependent variable to the traditional SAR model. However, the existing literature focuses on the temporal unit roots in the SDPD model. Recently, an innovative way to study spatial unit roots describes the underlying spatial process using Lévy–Brownian motion, which is a generalization and spatial analogue of its time series counterpart. The limitations of different approaches and directions for further research were also discussed.

Author Contributions

Conceptualization, B.H.B. and J.S.; methodology, B.H.B. and J.S.; formal analysis, B.H.B. and J.S.; writing—original draft preparation, B.H.B. and J.S.; writing—review and editing, B.H.B. and J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

We would like to thank the editors and four anonymous referees for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Anselin, L. Spatial Econometrics: Methods and Models; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1988.
2. Baltagi, B.H. Econometric Analysis of Panel Data, 6th ed.; Springer: Cham, Switzerland, 2021.
3. Elhorst, J.P. Spatial Econometrics: From Cross-Sectional Data to Spatial Panels; Springer: Heidelberg, Germany, 2014; Volume 479.
4. Ord, K. Estimation methods for models of spatial interaction. J. Am. Stat. Assoc. 1975, 70, 120–126.
5. Lee, L.F. Asymptotic distribution of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 2004, 72, 1899–1925.
6. Kelejian, H.H.; Prucha, I.R. A generalized moments estimator for the autoregressive parameter in a spatial model. Int. Econ. Rev. 1999, 40, 509–533.
7. Kelejian, H.H.; Prucha, I.R. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J. Real Estate Financ. Econ. 1998, 17, 99–121.
8. Lee, L.F. Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econom. Rev. 2003, 22, 307–335.
9. Lee, L.F. GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J. Econom. 2007, 137, 489–514.
10. Yu, J.; De Jong, R.; Lee, L.F. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. J. Econom. 2008, 146, 118–134.
11. Baltagi, B.H.; Liu, L. Instrumental variable estimation of a spatial autoregressive panel model with random effects. Econ. Lett. 2011, 111, 135–137.
12. Kapoor, M.; Kelejian, H.H.; Prucha, I.R. Panel data models with spatially correlated error components. J. Econom. 2007, 140, 97–130.
13. Keller, W.; Shiue, C.H. The origin of spatial interaction. J. Econom. 2007, 140, 304–332.
14. Fingleton, B. Spurious spatial regression: Some Monte Carlo results with a spatial unit root and spatial cointegration. J. Reg. Sci. 1999, 39, 1–19.
15. Lee, L.F.; Yu, J. Near unit root in the spatial autoregressive model. Spat. Econ. Anal. 2013, 8, 314–351.
16. Yu, J.; Lee, L.F. Estimation of unit root spatial dynamic panel data models. Econom. Theory 2010, 26, 1332–1362.
17. Lauridsen, J.; Kosfeld, R. A test strategy for spurious spatial regression, spatial nonstationarity, and spatial cointegration. Pap. Reg. Sci. 2006, 85, 363–377.
18. Lauridsen, J.; Kosfeld, R. Spatial cointegration and heteroscedasticity. J. Geogr. Syst. 2007, 9, 253–265.
19. Beenstock, M.; Feldman, D.; Felsenstein, D. Testing for unit roots and cointegration in spatial cross-section data. Spat. Econ. Anal. 2012, 7, 203–222.
20. Yesilyurt, F.; Elhorst, J.P. A regional analysis of inflation dynamics in Turkey. Ann. Reg. Sci. 2014, 52, 1–17.
21. Olejnik, A. Using the spatial autoregressively distributed lag model in assessing the regional convergence of per-capita income in the EU25. Pap. Reg. Sci. 2008, 87, 371–385.
22. Machado, C.A.S.; Giannotti, M.A.; Chiaravalloti Neto, F.; Tripodi, A.; Persia, L.; Quintanilha, J.A. Characterization of black spot zones for vulnerable road users in São Paulo (Brazil) and Rome (Italy). ISPRS Int. J. Geo-Inf. 2015, 4, 858–882.
23. Beenstock, M.; Felsenstein, D. Estimating spatial spillover in housing construction with nonstationary panel data. J. Hous. Econ. 2015, 28, 42–58.
24. Baltagi, B.H.; Bresson, G.; Pirotte, A. Panel unit root tests and spatial dependence. J. Appl. Econom. 2007, 22, 339–360.
25. Horn, R.A.; Johnson, C.R. Matrix Analysis, 2nd ed.; Cambridge University Press: New York, NY, USA, 2012.
26. Kelejian, H.H.; Robinson, D.P. Spatial correlation: A suggested alternative to the autoregressive model. In New Directions in Spatial Econometrics; Anselin, L., Florax, R.J.G.M., Eds.; Springer: Berlin/Heidelberg, Germany, 1995; pp. 75–95.
27. Griffith, D.A. Advanced Spatial Statistics: Special Topics in the Exploration of Quantitative Spatial Data Series; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 12.
28. Hamilton, J.D. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 1994.
29. Kelejian, H.H.; Prucha, I.R. On the asymptotic distribution of the Moran I test statistic with applications. J. Econom. 2001, 104, 219–257.
30. Kelejian, H.H.; Prucha, I.R. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J. Econom. 2010, 157, 53–67.
31. Lee, L.F.; Yang, C.; Yu, J. QML and efficient GMM estimation of spatial autoregressive models with dominant (popular) units. J. Bus. Econ. Stat. 2023, 41, 550–562.
32. Mur, J.; Trívez, F.J. Unit roots and deterministic trends in spatial econometric models. Int. Reg. Sci. Rev. 2003, 26, 289–312.
33. Engle, R.F.; Granger, C.W.J. Co-integration and error correction: Representation, estimation, and testing. Econometrica 1987, 55, 251–276.
34. Lee, L.F.; Yu, J. Spatial nonstationarity and spurious regression: The case with a row-normalized spatial weights matrix. Spat. Econ. Anal. 2009, 4, 301–327.
35. Lee, L.F. Consistency and efficiency of least squares estimation for mixed regressive, spatial autoregressive models. Econom. Theory 2002, 18, 252–277.
36. Yu, J.; de Jong, R.; Lee, L.F. Estimation for spatial dynamic panel data with fixed effects: The case of spatial cointegration. J. Econom. 2012, 167, 16–37.
37. Baltagi, B.H.; Kao, C.; Liu, L. The estimation and testing of a linear regression with near unit root in the spatial autoregressive error term. Spat. Econ. Anal. 2013, 8, 241–270.
38. Baltagi, B.H.; Liu, L. Spurious spatial regression with equal weights. Stat. Probab. Lett. 2010, 80, 1640–1642.
39. Kelejian, H.H.; Prucha, I.R. 2SLS and OLS in a spatial autoregressive model with equal spatial weights. Reg. Sci. Urban Econ. 2002, 32, 691–707.
40. Krämer, W.; Donninger, C. Spatial autocorrelation among errors and the relative efficiency of OLS in the linear regression model. J. Am. Stat. Assoc. 1987, 82, 577–579.
41. Tilke, C. The relative efficiency of OLS in the linear regression model with spatially autocorrelated errors. Stat. Pap. 1993, 34, 263–270.
42. Krämer, W.; Baltagi, B. A general condition for an optimal limiting efficiency of OLS in the general linear regression model. Econ. Lett. 1996, 50, 13–17.
43. Martellosio, F. Efficiency of the OLS estimator in the vicinity of a spatial unit root. Stat. Probab. Lett. 2011, 81, 1285–1291.
44. Baran, S.; Pap, G.; van Zuijlen, M.C. Asymptotic inference for unit roots in spatial triangular autoregression. Acta Appl. Math. 2007, 96, 17–42.
45. Basu, S.; Reinsel, G.C. A note on properties of spatial Yule-Walker estimators. J. Stat. Comput. Simul. 1992, 41, 243–255.
46. Basu, S.; Reinsel, G.C. Properties of the spatial unilateral first-order ARMA model. Adv. Appl. Probab. 1993, 25, 631–648.
47. Baran, S. On the variances of a spatial unit root model. Lith. Math. J. 2011, 51, 122–140.
48. Paulauskas, V. On unit roots for spatial autoregressive models. J. Multivar. Anal. 2007, 98, 209–226.
49. Müller, U.K.; Watson, M.W. Spatial Unit Roots; Princeton University: Princeton, NJ, USA, 2022.
50. Gupta, A. Estimation of spatial autoregressions with stochastic weight matrices. Econom. Theory 2019, 35, 417–463.
51. Liu, T.; Xu, X.; Lee, L.F. Consistency without compactness of the parameter space in spatial econometrics. Econ. Lett. 2022, 210, 110224.
52. Newey, W.K.; McFadden, D. Large sample estimation and hypothesis testing. In Handbook of Econometrics; Elsevier: Amsterdam, The Netherlands, 1994; Volume 4, Chapter 35; pp. 2111–2245.
53. Gupta, A. Efficient closed-form estimation of large spatial autoregressions. J. Econom. 2023, 232, 148–167.
54. Rossi, F.; Lieberman, O. Spatial autoregressions with an extended parameter space and similarity-based weights. J. Econom. 2023, 235, 1770–1798.
55. Liu, L. A note on 2SLS estimation of the mixed regressive spatial autoregressive model. Econ. Lett. 2015, 134, 49–52.
56. Bhattacharyya, B.; Richardson, G.; Franklin, L. Asymptotic inference for near unit roots in spatial autoregression. Ann. Stat. 1997, 25, 1709–1724.
57. Bhattacharyya, B.; Khalil, T.; Richardson, G. Gauss-Newton estimation of parameters for a spatial autoregression model. Stat. Probab. Lett. 1996, 28, 173–179.
58. Phillips, P.C. Time series regression with a unit root. Econometrica 1987, 55, 277–301.
59. Baran, S.; Pap, G. Parameter estimation in a spatial unilateral unit root autoregressive model. J. Multivar. Anal. 2012, 107, 282–305.
60. Baran, S.; Pap, G.; Van Zuijlen, M.C. Asymptotic inference for a nearly unstable sequence of stationary spatial AR models. Stat. Probab. Lett. 2004, 69, 53–61.
61. Baran, S.; Pap, G. On the least squares estimator in a nearly unstable sequence of stationary spatial AR models. J. Multivar. Anal. 2009, 100, 686–698.
62. Roknossadati, S.; Zarepour, M. M-estimation for a spatial unilateral autoregressive model with infinite variance innovations. Econom. Theory 2010, 26, 1663–1682.
63. Roknossadati, S.; Zarepour, M. M-estimation for near unit roots in spatial autoregression with infinite variance. Statistics 2011, 45, 337–348.
64. Ahlgren, N.; Gerkman, L. Inference in unilateral spatial econometric models. Bull. Int. Stat. Inst. 2007, 56, 1–44.
65. Lauridsen, J.; Kosfeld, R. A Wald test for spatial nonstationarity. Estud. Econ. Apl. 2004, 22, 475–486.
66. Martellosio, F. Power properties of invariant tests for spatial autocorrelation in linear regression. Econom. Theory 2010, 26, 152–186.
67. King, M.L. A small sample property of the Cliff-Ord test for spatial correlation. J. R. Stat. Soc. Ser. B (Methodological) 1981, 43, 263–264.
68. Krämer, W. Finite sample power of Cliff–Ord-type tests for spatial disturbance correlation in linear regression. J. Stat. Plan. Inference 2005, 128, 489–496.
69. Martellosio, F. Testing for spatial autocorrelation: The regressors that make the power disappear. Econom. Rev. 2012, 31, 215–240.
70. Born, B.; Breitung, J. Simple regression-based tests for spatial dependence. Econom. J. 2011, 14, 330–342.
71. Baltagi, B.H.; Yang, Z. Heteroskedasticity and non-normality robust LM tests for spatial dependence. Reg. Sci. Urban Econ. 2013, 43, 725–739.
72. Baltagi, B.H.; Yang, Z. Standardized LM tests for spatial error dependence in linear or panel regressions. Econom. J. 2013, 16, 103–134.
73. Preinerstorfer, D. How to avoid the zero-power trap in testing for correlation. Econom. Theory 2021, 39, 1292–1324.
74. Anselin, L. Lagrange multiplier test diagnostics for spatial dependence and spatial heterogeneity. Geogr. Anal. 1988, 20, 1–17.
75. Baltagi, B.H.; Song, S.H.; Koh, W. Testing panel data regression models with spatial error correlation. J. Econom. 2003, 117, 123–150.
76. Beenstock, M.; Felsenstein, D. Unit root and cointegration tests in spatial cross-section data. In The Econometric Analysis of Non-Stationary Spatial Panel Data; Advances in Spatial Science; Springer: Berlin/Heidelberg, Germany, 2019; Chapter 5; pp. 97–127.
77. Kosfeld, R.; Lauridsen, J. Dynamic spatial modelling of regional convergence processes. Empir. Econ. 2004, 29, 705–722.
78. Vaona, A. Spatial autocorrelation and the sensitivity of RESET: A simulation study. J. Geogr. Syst. 2010, 12, 89–103.
79. Fingleton, B. A generalized method of moments estimator for a spatial model with moving average errors, with application to real estate prices. Empir. Econ. 2008, 34, 35–57.
80. Baltagi, B.H.; Liu, L. Testing for random effects and spatial lag dependence in panel data models. Stat. Probab. Lett. 2008, 78, 3304–3306.
81. Lee, L.F.; Yu, J. A unified transformation approach for the estimation of spatial dynamic panel data models: Stability, spatial cointegration and explosive roots. In Handbook on Empirical Economics and Finance; Ullah, A., Giles, D., Eds.; Taylor & Francis Group: New York, NY, USA, 2011; Chapter 13; pp. 397–434.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
