Asymptotic Properties of Quasi-Maximum Likelihood Estimators for Heterogeneous Spatial Autoregressive Models

Feng Qiu; Hao Ding; Jianhua Hu

doi:10.3390/sym14091894

¹ School of Mathematics and Computer Science, Zhejiang Agriculture and Forestry University, Hangzhou 311300, China

² Samoyed Cloud Technology Group Holdings Limited, Shanghai 200124, China

³ School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China

* Author to whom correspondence should be addressed.

Symmetry 2022, 14(9), 1894; https://doi.org/10.3390/sym14091894

This article belongs to the Section Mathematics

Abstract

In this paper, we address a class of heterogeneous spatial autoregressive models with all

n (n - 1)

spatial coefficients taking m distinct true values, where m is independent of the sample size n, and we establish asymptotic properties of the maximum likelihood estimator and the quasi-maximum likelihood estimator for all parameters in the class of models, extending Lee’s work (2004). The rates of convergence of those estimators depend on the features of values taken by elements of the spatial weights matrix in this model. Under the situations where, based on the values of the weights, each individual will not only influence a few neighbors but also be influenced by only a few neighbors, the estimator can enjoy an

\sqrt{n}

-rate of convergence and be asymptotically normal. However, when each individual can influence many neighbors or can be influenced by many neighbors and their number does not exceed

o (n)

, singularity of the information matrix may occur, and various components of the estimators may have different (usually lower than

\sqrt{n}

) rates of convergence. An inconsistent estimator is provided if some important assumptions are violated. Finally, simulation studies demonstrate that the finite sample performances of maximum likelihood estimators are good.

Keywords:

asymptotic normality; heterogeneous spatial autoregression; heterogeneous spatial interaction; quasi-maximum likelihood estimator; rates of convergence; spatial coefficients

1. Introduction

Spatial econometrics consists of econometric techniques for empirical economic problems caused by spatial (cross-sectional) interaction between spatial individuals. The dependence across spatial individuals is an interaction issue in urban, real estate, regional, public, agricultural, and environmental economics, as well as in finance and industrial organization. In many economic applications, modeling the interaction is essential for understanding market competition. One typical example is the commodity market, where buyers compare commodities of all sizes, with all kinds of options, and in all locations, as well as their prices, before purchasing.

In this example, the market demand for a commodity is a function of the prices and characteristics of all commodities. Specifically, let

x_{i}

and

ϵ_{i}

denote the observed and unobserved characteristics of commodity i, respectively, and let

y_{i}

denote its price. The market demand for commodity (

x_{i}, ϵ_{i}

) is a function of

(y_{i}, x_{i}, ϵ_{i})

,

i = 1, \dots, n

, where n denotes the total number of commodities in the market. Setting the market demand for commodity

(x_{i}, ϵ_{i})

equal to its market supply and solving for

y_{i}

, we obtain the following partial equilibrium price:

y_{i} = f_{i} (x_{i}, ϵ_{i}, (y_{j}, x_{j}, ϵ_{j}) for all j \neq i), i, j = 1, \dots, n .

(1)

The marginal effect of

y_{i}

with respect to

y_{j}

measures the price competition from commodity j, while the marginal effect with respect to

x_{j}

measures the quality competition. To aid exposition, we shall call the dependence of the partial equilibrium price of commodity i on other prices

(y_{j} for all j \neq i)

as the direct spatial interaction and call dependence on other commodity characteristics (

(x_{j}, ϵ_{j})

for all

j \neq i

) as the indirect spatial interaction. The spatial interaction is the sum of the direct and indirect spatial interaction. The spatial interaction is often ignored in the existing hedonic commodity price regression literature. One argument for the omission is that researchers are interested only in predicting commodity prices, not in understanding market competition. If prediction is the goal, then we should solve (1) for all prices to obtain the general equilibrium prices:

y_{i} = h_{i} ((x_{j}, ϵ_{j}) for all j), i, j = 1, \dots, n .

(2)

Clearly, the general equilibrium prices may contain the indirect spatial interaction terms; neglecting these terms in the hedonic commodity price regression could result in omitted variable bias and poor price prediction. Bell and Bockstael [] and Banerjee et al. [] partially addressed the omitted variable problem by adding the spatial error correlation to the hedonic commodity price regression. Their models yield consistent parameter estimates and prediction if and only if the other omitted interaction terms are uncorrelated with its own characteristics.

To capture spatial interaction, the usual approach in spatial econometrics is to impose structure on a model. A well-known structure is the spatial autoregressive (SAR) model studied by Cliff and Ord [], who extended autocorrelation in time series to spatial dimensions. The SAR model with common regressors is expressed as

y_{i} = ρ \sum_{j \neq i} w_{i j} y_{j} + x_{i}^{⊤} β + ϵ_{i}, i, j = 1, \dots, n,

(3)

where

y_{i}

is the value of the equilibrium price for the ith spatial individual,

w_{i j}

is the nonstochastic spatial weight from j to i with

w_{i i} = 0

for all i,

ρ

is the single spatial coefficient,

x_{i k}

is the value of the kth regressor among p common regressors for the ith spatial individual,

β_{k}

is the regression coefficient of the kth regressor for all spatial individuals. Assume that the disturbances or errors

ϵ_{i}

across

i = 1, \dots, n

are independent and identically distributed (i.i.d.) with mean zero and variance

σ^{2}

.
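As an illustration of how (3) generates data, the following minimal Python sketch draws y from the reduced form y = (I - ρW)^{-1}(Xβ + ϵ). The ring-shaped weights, the parameter values, the normal disturbances, and the helper name simulate_sar are all assumptions made for this example; they do not come from the paper.

```python
import numpy as np

def simulate_sar(W, X, beta, rho, sigma, rng):
    """Draw y from y = rho * W y + X beta + eps via the reduced form."""
    n = W.shape[0]
    eps = rng.normal(0.0, sigma, size=n)    # i.i.d. disturbances (normality is an assumption)
    S = np.eye(n) - rho * W                 # must be nonsingular
    return np.linalg.solve(S, X @ beta + eps), eps

rng = np.random.default_rng(0)
n = 6
# Hypothetical ring of neighbours: each unit is linked to the previous and next unit.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
y, eps = simulate_sar(W, X, beta, rho=0.4, sigma=1.0, rng=rng)

# The draw satisfies the structural equation (3) row by row.
residual = y - (0.4 * W @ y + X @ beta + eps)
```

Solving the linear system rather than inverting I - ρW explicitly is a numerical convenience only; both give the same equilibrium vector when the matrix is nonsingular.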

The spatial aspect of a SAR model has the distinguishing feature of simultaneity in econometric equilibrium models. Developments in testing and estimation of SAR models have been summarized in a substantial literature, e.g., Cliff and Ord [], Anselin [], Cressie [], Anselin and Bera [], and Elhorst [], among others. Recent empirical applications of the SAR model in mainstream economics journals include Case [], Case et al. [], Besley and Case [], Brueckner [], Bell and Bockstael [], Bertrand et al. [], Topa [], Coval and Moskowitz [], Druska and Horrace [], Frazier and Kockelman [], Baltagi and Li [], Pirinsky and Wang [], Bekaert et al. [], Robinson and Rossi [], and Liu et al. [], among others.

The SAR models can be estimated by the method of maximum likelihood (ML), e.g., Ord [], Smirnov and Anselin [], and Robinson and Rossi [], methods of moments, e.g., Kelejian and Prucha [], and Lee and Liu [], as well as the method of quasi-maximum likelihood (QML), e.g., Lee [] and Yu et al. [].

In recent years, many authors have studied spatial autoregressive models. Liu et al. [] developed a penalized quasi-maximum likelihood method for simultaneous model selection and parameter estimation in SAR models with independent and identically distributed errors. Song et al. [] proposed a variable selection method based on exponential squared loss for SAR models. Ju et al. [] developed Bayesian influence analysis for skew-normal spatial autoregression models (SSARMs). Our argument will focus on a more general specification of the linear structure of the spatial interaction.

Let us look carefully at model (3). The single spatial coefficient

ρ

in (3) together with the nonnegative spatial weights implies a unidirectional effect. Specifically, if

ρ > 0

(<0), then those spatial individuals with positive spatial weights (

w_{i j} > 0

for

j \neq i

) all have positive (negative) effects on individual i.

In applications, however, it is possible that some spatial individuals have positive effects while other individuals have negative effects on individual i. For instance, in regional economics applications, region j may serve as a supply hub for region i and could have a positive effect on region i’s economy, while region k competes (e.g., an alternative competition) against region i and could have a negative effect on region i. Specification of the linear structure of spatial interaction in (3) rules out this type of bidirectional effect.

In addition, the spatial weights in (3) are often constructed to be symmetric (i.e.,

w_{i j} = w_{j i}

, implying that the effect of individual i on individual j is identical to the effect of individual j on individual i. In applications, asymmetric effects are possible. Again, in regional economics applications, an economically strong region i is likely to have a bigger impact on an economically weak region j than the weak region has on the strong one. This type of asymmetric effect is not permitted by (3). One could argue that asymmetric effects can be modeled by constructing asymmetric spatial weights. However, there is hardly any theoretical guidance on the construction of such asymmetric weights, and the construction of weights is itself not entirely undisputed. For more on the weights matrix, the interested reader is referred to Ahrens and Bhattacharjee [] and Lam and Souza [].

This type of individual-specific endogenous effect is widespread in practice. In many applications, individuals have different impacts on their neighbors’ behaviors. For example, Clark and Loheac [] designed a special (classic) spatial panel model to show that popular teenagers in a school have much stronger influences on their classmates’ smoking decisions than their less popular peers; Mas and Moretti [] applied the expected utility principle and found that the magnitude of spillovers varies dramatically among workers with different skill levels; and Banerjee et al. [] used a model of word-of-mouth diffusion to show that individuals directly connected with some village leaders are more likely to join the micro-finance program than those connected to someone else.

To better capture the spatial interaction, a spatial autoregressive model with a general linear specification that includes all spatial interaction terms should be expressed as

y_{i} = \sum_{j \neq i} w_{i j} ρ_{i j} y_{j} + x_{i}^{⊤} β + ϵ_{i},

(4)

where

ρ_{i j}

is the spatial coefficient representing the effect of individual j on individual i. If the true values of the spatial coefficients satisfy

ρ_{i j} = ρ

for all i and j, the most general SAR model (4) is reduced to the famous model (3). If

ρ_{i j} \neq ρ_{j i}

for some

i, j

, then the effect of individual j on individual i is not equal to the effect of individual i on individual j even if

w_{i j} = w_{j i}

, and if

ρ_{j i} ρ_{i j} < 0

the effects of individual i on j and individual j on i are bidirectional (in the opposite direction).

Clearly the most general model (4) is flexible enough to permit asymmetric, individual-specific endogenous, and bidirectional effects. Despite these advantages, this model is not identified since there are

n (n - 1)

spatial coefficients, increasing as n increases. Some restrictions must be placed on those coefficients.

Homogeneous classification of spatial coefficients is an available method. We classify the

n (n - 1)

spatial coefficients into m subgroups, where m is independent of n and the spatial coefficients in a subgroup take the same value. Two approaches, data-driven selection and economic geographic attributes, can be adopted to realize this type of homogeneous classification. In regional economics applications, many regions belonging to an upper administrative unit is one example of economic geographic attributes. In this example, the number of upper administrative units, m, is regarded as being independent of n if it is much smaller than the number of regions, n. In short, we are restricting the true values of the

n (n - 1)

spatial coefficients in model (4) to a set of m finite distinct values.

Consequently, the specified spatial weight matrix W of order n can be divided into m nonzero spatial weight matrices,

W_{1}, \dots, W_{m}

, of order n, satisfying the homogeneous classification condition

W = W_{1} + \dots + W_{m},

(5)

where, for any nonzero component

w_{i j}

of W, there is a unique

k_{0}

such that

w_{k_{0}, i j} = w_{i j}

and

w_{k, i j} = 0

for

k \neq k_{0}

with

w_{k, i j}

being the

(i, j)

th component of

W_{k}

, and for

w_{i j} = 0

, then all

w_{k, i j} = 0

for all k. The weights w’s may be selected based on potential specifications, such as physical distance, social networks, or “economic” quantities among variables, see Case et al. [], or based on a best combination of these specifications, see Lam and Souza [].
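The classification (5) can be sketched in a few lines of Python. The example below assumes that a label in {1, ..., m} is available for every ordered pair (i, j); the label matrix, the small 3 x 3 weights, and the helper name split_weights are hypothetical and only illustrate the bookkeeping.

```python
import numpy as np

def split_weights(W, labels, m):
    """Return [W_1, ..., W_m], where W_k keeps the entries of W labelled k."""
    return [np.where(labels == k, W, 0.0) for k in range(1, m + 1)]

W = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [1.0, 0.0, 0.0]])
# Hypothetical group labels in {1, 2} for each ordered pair (i, j).
labels = np.array([[1, 1, 2],
                   [2, 1, 1],
                   [2, 2, 1]])

W_parts = split_weights(W, labels, m=2)
```

Entries with w_{ij} = 0 stay zero in every W_k regardless of their label, and each nonzero w_{ij} appears in exactly one W_k, which is the condition stated for (5).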

With the homogeneous classification (5), a heterogeneous spatial autoregressive model with a linear specification having m distinct spatial coefficients is postulated as

y_{i} = \sum_{k = 1}^{m} ρ_{k} \sum_{j \neq i} w_{k, i j} y_{j} + x_{i}^{⊤} β + ϵ_{i} .

(6)

Dou et al. [] considered a class of pure spatio-temporal models without regressors by classifying the spatial coefficients based on the rows of the weight matrix. Their method, however, does not work for similar models with regressors. Peng [] considered a SAR model on a network by classifying the spatial coefficients based on the columns of the weight matrix. Peng’s work requires sparsity of the spatial coefficients.

In this paper, we investigate asymptotic properties of the maximum likelihood (ML) estimator and the quasi-maximum likelihood (QML) estimator for the heterogeneous SAR model (6) under the normal distributional specification, extending the results for the SAR model (3) investigated in Lee [] to the heterogeneous SAR model (6). The QML estimator is appropriate when the estimator is derived from a normal likelihood but the disturbances in the model are not truly normally distributed, e.g., see Lee [].

In the existing literature, the ML estimator of such a model is implicitly regarded as having the familiar

\sqrt{n}

-rate of convergence, as for a usual ML estimator in a parametric statistical model with sample size n, e.g., see the reviews by Anselin [] and Anselin and Bera []. Lee [] provided a broader view of the asymptotic properties of the ML and QML estimators and showed that their rates of convergence depend on some general features of the spatial weights matrix W of the model (3). This paper extends Lee’s work to the model (6), aiming to provide a similar view of the asymptotic properties and the rates of convergence of the ML and QML estimators in the model (6) under different scenarios.

The remainder of this paper is organized as follows. Section 2 provides an estimation procedure for the ML and QML estimators of the parameters in the heterogeneous spatial autoregressive model and specifies regularity conditions for the model (6). In Section 3, we show that identification of the parameters can be assured if there is no multicollinearity among the regressors and the m spatially generated regressors. The ML and QML estimators can be

\sqrt{n}

-consistent and asymptotically normal (Theorem 3) under some regularity conditions on the spatial weights matrix.

Section 4 considers the spatial scenarios in the zone between asymptotic normality and inconsistency. These scenarios occur when each individual can be influenced by many neighbors or can influence many neighbors, in which case singularity or irregularity of the information matrix may occur and various components of the QML estimators may have different rates of convergence. This includes the ML and QML estimators for the (pure) heterogeneous SAR process. This section also considers the event of multicollinearity, where the m spatially generated regressors are collinear with the original regressors. A counterexample with two distinct spatial coefficients is given in which the QML estimator is inconsistent when multicollinearity occurs.

In Section 5, we conduct finite-sample simulation studies for heterogeneous SAR models with 5 distinct spatial coefficients. Section 6 provides brief concluding remarks. The proofs of all main theorems are collected in Appendix A, and a counterexample of inconsistent QML estimators is provided in Appendix B.

2. Heterogeneous SAR Model and QML Estimators

The heterogeneous SAR model with m distinct spatial coefficients and p common regressors is written in a matrix version as

Y_{n} = \sum_{k = 1}^{m} ρ_{k} W_{k} Y_{n} + X_{n} β + ϵ_{n},

(7)

where n is the total number of spatial units,

Y_{n}

is the n-dimensional equilibrium (price) vector,

X_{n}

is an

n \times p

matrix of constant (spatially varying) regressors,

ρ = {(ρ_{1}, \dots, ρ_{m})}^{⊤}

is the spatial coefficient vector,

β = {(β_{1}, \dots, β_{p})}^{⊤}

is the vector of regression coefficients (which may include the intercept),

ϵ_{n}

is an n-dimensional vector of independently and identically distributed (i.i.d.) disturbances with zero mean and variance

σ^{2}

.

We denote

θ_{0} = {(ρ_{0}^{⊤}, β_{0}^{⊤}, σ_{0}^{2})}^{⊤}

to be the true value of

θ = {(ρ^{⊤}, β^{⊤}, σ^{2})}^{⊤}

. Let

S (ρ) = I - \sum_{k = 1}^{m} ρ_{k} W_{k}

for any spatial parametric vector

ρ

, where I is the identity matrix. The equilibrium (price) vector

Y_{n}

is expressed as

Y_{n} = S_{n}^{- 1} (X_{n} β + ϵ_{n}),

(8)

where

S_{n} = S (ρ_{0})

is assumed to be nonsingular. When there are no regressors

X_{n}

in the model (7), it becomes a pure heterogeneous SAR process:

Y_{n} = \sum_{k = 1}^{m} ρ_{k} W_{k} Y_{n} + ϵ_{n},

(9)

implying that the equilibrium (price) vector

Y_{n}

is simply derived from the disturbance vector

ϵ_{n}

.
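The reduced form (8) also gives a direct way to simulate the heterogeneous model (7). The sketch below is a minimal illustration with m = 2; the two shift matrices used as W_1 and W_2, the parameter values, the normal disturbances, and the helper name S_of are assumptions for the example only.

```python
import numpy as np

def S_of(rho, W_parts):
    """S(rho) = I - sum_k rho_k W_k."""
    n = W_parts[0].shape[0]
    return np.eye(n) - sum(r * Wk for r, Wk in zip(rho, W_parts))

rng = np.random.default_rng(1)
n = 8
# Hypothetical weights: W_1 links each unit to its successor, W_2 to its predecessor.
W1 = np.roll(np.eye(n), 1, axis=1)
W2 = np.roll(np.eye(n), -1, axis=1)

rho0 = np.array([0.5, -0.3])      # opposite signs: a bidirectional effect
beta0 = np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(size=n)

S_n = S_of(rho0, [W1, W2])
Y = np.linalg.solve(S_n, X @ beta0 + eps)   # reduced form (8)
```

Setting beta0 to zero in the same code yields a draw from the pure heterogeneous SAR process (9).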

We use

ϵ (ρ, β)

to denote

S (ρ) Y_{n} - X_{n} β

and

ϵ_{n}

to denote

ϵ (ρ_{0}, β_{0})

. The log-likelihood function of

θ

in (7) is

\log L_{n} (θ) = - \frac{n}{2} \log (2 π) - \frac{n}{2} \log (σ^{2}) + \log | S (ρ) | - \frac{1}{2 σ^{2}} ϵ {(ρ, β)}^{⊤} ϵ (ρ, β) .

(10)

The extremum estimator derived from the maximization of (10) is written as

{\hat{θ}}_{n} = \underset{θ}{argmax} \log L_{n} (θ),

(11)

where

θ

takes values in the set of admissible values. When the (i.i.d.) disturbances in the model (7) are normally distributed as

N (0, σ^{2})

, the extremum estimator is the maximum likelihood (ML) estimator. When the (i.i.d.) disturbances in the model are not truly normally distributed, the extremum estimator derived from a normal likelihood is called the quasi-maximum likelihood (QML) estimator.

The first-order partial derivatives of function (10) with respect to

ρ

,

β

and

σ^{2}

are as follows:

\frac{\partial \log L_{n} (θ)}{\partial ρ_{k}} = - tr [S {(ρ)}^{- 1} W_{k}] + \frac{1}{σ^{2}} ϵ {(ρ, β)}^{⊤} W_{k} Y_{n} for k = 1, \dots, m,

(12)

where

tr (A)

is the trace of A and the formula

d [\log (\det F)] = tr (F^{- 1} d F)

is used, see Magnus and Neudecker [],

\frac{\partial \log L_{n} (θ)}{\partial β} = \frac{1}{σ^{2}} X_{n}^{⊤} ϵ (ρ, β),

(13)

and

\frac{\partial \log L_{n} (θ)}{\partial σ^{2}} = - \frac{n}{2 σ^{2}} + \frac{1}{2 σ^{4}} ϵ {(ρ, β)}^{⊤} ϵ (ρ, β) .

(14)
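A quick numerical sanity check of the score (12) is to compare it with a central finite difference of the log-likelihood (10). The sketch below does this for m = 2; the weights, data, parameter values, and function names are hypothetical, and the data vector is arbitrary because only the derivative identity is being checked.

```python
import numpy as np

def S_of(rho, W_parts):
    n = W_parts[0].shape[0]
    return np.eye(n) - sum(r * Wk for r, Wk in zip(rho, W_parts))

def loglik(rho, beta, sigma2, Y, X, W_parts):
    """Gaussian log-likelihood (10)."""
    n = len(Y)
    S = S_of(rho, W_parts)
    e = S @ Y - X @ beta
    _, logdet = np.linalg.slogdet(S)          # log of the absolute determinant
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2)
            + logdet - e @ e / (2 * sigma2))

def score_rho(rho, beta, sigma2, Y, X, W_parts):
    """Analytic derivative (12) for each rho_k."""
    S = S_of(rho, W_parts)
    e = S @ Y - X @ beta
    S_inv = np.linalg.inv(S)
    return np.array([-np.trace(S_inv @ Wk) + e @ (Wk @ Y) / sigma2
                     for Wk in W_parts])

rng = np.random.default_rng(2)
n = 8
P = np.roll(np.eye(n), 1, axis=1)
W_parts = [0.5 * P, 0.5 * P.T]                # hypothetical two-group weights
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = rng.normal(size=n)                        # any data vector works for this check
rho, beta, sigma2 = np.array([0.3, -0.2]), np.array([0.5, 1.0]), 1.5

h = 1e-6
numeric = np.array([
    (loglik(rho + h * d, beta, sigma2, Y, X, W_parts)
     - loglik(rho - h * d, beta, sigma2, Y, X, W_parts)) / (2 * h)
    for d in np.eye(len(rho))
])
analytic = score_rho(rho, beta, sigma2, Y, X, W_parts)
```

The two vectors should agree up to finite-difference error, which reflects the identity d[log(det F)] = tr(F^{-1} dF) used in the derivation.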

In order to prove consistency and asymptotic normality, we usually adopt the following solution procedure. The concentrated log-likelihood function of

ρ

is defined as

\log L_{n} (ρ) = max_{β, σ^{2}} \log L_{n} (θ) .

Letting (13) be zero obtains the QML estimator of

β

for fixed

ρ

{\tilde{β}}_{n} (ρ) = {(X_{n}^{⊤} X_{n})}^{- 1} X_{n}^{⊤} S (ρ) Y_{n},

(15)

and letting (14) be zero gives the QML estimator of

σ^{2}

for fixed

ρ

{\tilde{σ}}_{n}^{2} (ρ) = \frac{1}{n} {[S (ρ) Y_{n} - X_{n} {\tilde{β}}_{n} (ρ)]}^{⊤} [S (ρ) Y_{n} - X_{n} {\tilde{β}}_{n} (ρ)] = \frac{1}{n} Y_{n}^{⊤} S {(ρ)}^{⊤} (I - P_{X_{n}}) S (ρ) Y_{n}

(16)

where

P_{X_{n}} = X_{n} {(X_{n}^{⊤} X_{n})}^{- 1} X_{n}^{⊤}

is the orthogonal projection to the column space of

X_{n}

. Then, the concentrated log likelihood function of

ρ

is

\log L_{n} (ρ) = - \frac{n}{2} [\log (2 π) + 1] - \frac{n}{2} \log {\tilde{σ}}_{n}^{2} (ρ) + \log | S (ρ) | .

(17)

Maximizing the concentrated likelihood (17) obtains the QML estimator

{\hat{ρ}}_{n}

of

ρ

{\hat{ρ}}_{n} = \underset{ρ}{argmax} \log L_{n} (ρ),

where

ρ

takes values in the set of admissible values. The procedure of calculating the QML estimator

{\hat{ρ}}_{n}

can be realized by the Newton–Raphson method (Ord []) and the R package rgenoud. Thus, the QML estimators of

β

and

σ^{2}

are expressed, respectively, as

{\hat{β}}_{n} = {\tilde{β}}_{n} ({\hat{ρ}}_{n}) and {\hat{σ}}_{n}^{2} = {\tilde{σ}}_{n}^{2} ({\hat{ρ}}_{n}),

finally obtaining the QML estimator

{\hat{θ}}_{n}

in (11).
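The concentration steps (15)–(17) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a coarse grid search stands in for the Newton–Raphson or rgenoud optimizers mentioned above, and the weights, true parameter values, and function names are hypothetical.

```python
import numpy as np

def S_of(rho, W_parts):
    n = W_parts[0].shape[0]
    return np.eye(n) - sum(r * Wk for r, Wk in zip(rho, W_parts))

def concentrated_loglik(rho, Y, X, W_parts):
    """Concentrated log-likelihood (17), with beta~(rho) from (15) and sigma~^2(rho) from (16)."""
    n = len(Y)
    S = S_of(rho, W_parts)
    SY = S @ Y
    beta, *_ = np.linalg.lstsq(X, SY, rcond=None)   # least squares of S(rho) Y on X
    resid = SY - X @ beta
    sigma2 = resid @ resid / n
    _, logdet = np.linalg.slogdet(S)
    value = -0.5 * n * (np.log(2 * np.pi) + 1) - 0.5 * n * np.log(sigma2) + logdet
    return value, beta, sigma2

rng = np.random.default_rng(3)
n = 100
P = np.roll(np.eye(n), 1, axis=1)
W_parts = [P, P.T]                                  # hypothetical two-group weights
rho0, beta0 = np.array([0.4, -0.2]), np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = np.linalg.solve(S_of(rho0, W_parts), X @ beta0 + rng.normal(size=n))

# Grid chosen so that |rho_1| + |rho_2| < 1, which keeps S(rho) nonsingular for these weights.
grid = np.linspace(-0.45, 0.45, 19)
candidates = [np.array([a, b]) for a in grid for b in grid]
rho_hat = max(candidates, key=lambda r: concentrated_loglik(r, Y, X, W_parts)[0])
value_hat, beta_hat, sigma2_hat = concentrated_loglik(rho_hat, Y, X, W_parts)
```

In practice the grid maximizer would only serve as a starting value for a gradient-based or global optimizer; the grid resolution limits the accuracy of rho_hat here.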

After finding the QML estimator

{\hat{θ}}_{n}

, we focus our attention on how to investigate its consistency and asymptotic normality. Similar to Lee [], we first introduce some basic regularity conditions for our heterogeneous SAR model to provide a rigorous analysis of the QML estimators. Additional regularity conditions will be subsequently added.

Assumption A1.

The disturbances,

ϵ_{1}, \dots, ϵ_{n}

, of

ϵ_{n} = {(ϵ_{1}, \dots, ϵ_{n})}^{⊤}

are i.i.d. with mean zero and variance

σ^{2} > 0

. Their moments,

E (| ϵ_{1} |^{4 + γ}), \dots, E (| ϵ_{n} |^{4 + γ})

for some

γ > 0

uniformly exist.

Assumption A2.

The elements

w_{i j}

of W are

O (d_{n}^{- 1})

, at most of order

d_{n}^{- 1}

, uniformly for all

i, j

, where the rate sequence

d_{n}

is bounded or divergent as n tends to infinity. As a normalization,

w_{i i} = 0

for all i.

Assumption A3.

The ratio

d_{n} / n \to 0

as n goes to infinity.

Assumption A4.

The matrix

S_{n}

is nonsingular.

This tells us that the parametric function

S (ρ)

is nonsingular at point

ρ_{0}

. Since

S (ρ)

is continuous,

S (ρ)

is nonsingular for

ρ

in a neighborhood of

ρ_{0}

. Alternatively,

ρ

is said to take values in the set of admissible values.

Assumption A5.

The weight matrix W and

S_{n}^{- 1}

are uniformly bounded in both row and column sums as n goes to infinity.

This is also an assumption in Horn and Johnson []. It follows from (5) that

W_{1}, \dots, W_{m}

are all uniformly bounded in both row and column sums. Since usually

w_{i j} \geq 0

, putting Assumptions A1 and A5 together implies that there do not exist

O (n)

elements of each row or each column whose number reaches to

O (d_{n}^{- 1})

.
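Uniform boundedness in row and column sums can be inspected numerically by tracking the maximum absolute row sum and column sum as n grows. The sketch below does so for a hypothetical ring design with m = 2; the weights, the coefficient values, and the helper names are assumptions for illustration.

```python
import numpy as np

def max_row_col_sums(A):
    """Return (maximum absolute row sum, maximum absolute column sum) of A."""
    absA = np.abs(A)
    return absA.sum(axis=1).max(), absA.sum(axis=0).max()

def ring_parts(n):
    """Hypothetical weights: W_1 points to the successor, W_2 to the predecessor."""
    P = np.roll(np.eye(n), 1, axis=1)
    return [0.5 * P, 0.5 * P.T]

rho0 = np.array([0.4, -0.2])      # hypothetical true spatial coefficients
bounds = {}
for n in (10, 100, 400):
    W1, W2 = ring_parts(n)
    S_inv = np.linalg.inv(np.eye(n) - rho0[0] * W1 - rho0[1] * W2)
    bounds[n] = (max_row_col_sums(W1 + W2), max_row_col_sums(S_inv))
```

For this design the row and column sums of W equal 1 for every n, and those of the inverse stay below 1 / (1 - 0.3) by the Neumann series bound, since the row and column sums of ρ_1 W_1 + ρ_2 W_2 are at most 0.3 in absolute value.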

Assumption A6.

{S_{n} (ρ)}^{- 1}

is uniformly bounded in either row or column sums, uniformly in ρ in a compact parameter space Γ with

ρ_{0}

being in the interior of Γ.

Assumption A7.

The elements of

X_{n}

are uniformly bounded constants for all n and assume that

lim_{n \to \infty} \frac{1}{n} X_{n}^{⊤} X_{n} > 0 .

3. Asymptotic Properties of QML Estimators

Let

G_{k} = W_{k} S_{n}^{- 1}

, then

S_{n}^{- 1} = I + \sum_{k = 1}^{m} ρ_{k 0} G_{k}

. From (8), the reduced form equation of

Y_{n}

can be represented as

Y_{n} = X_{n} β_{0} + \sum_{k = 1}^{m} ρ_{k 0} (G_{k} X_{n} β_{0}) + S_{n}^{- 1} ϵ_{n} .

(18)
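For completeness, the identity for S_{n}^{-1} and the reduced form (18) follow in two steps from the definitions S_{n} = S (ρ_{0}) and G_{k} = W_{k} S_{n}^{-1} given in the text; the worked steps below use nothing beyond those definitions and (8).

```latex
\begin{aligned}
I &= S_{n} S_{n}^{-1}
   = \Bigl(I - \sum_{k=1}^{m} \rho_{k0} W_{k}\Bigr) S_{n}^{-1}
   = S_{n}^{-1} - \sum_{k=1}^{m} \rho_{k0} G_{k}
   \;\Longrightarrow\;
   S_{n}^{-1} = I + \sum_{k=1}^{m} \rho_{k0} G_{k}, \\
Y_{n} &= S_{n}^{-1} (X_{n} \beta_{0} + \epsilon_{n})
   = X_{n} \beta_{0} + \sum_{k=1}^{m} \rho_{k0} G_{k} X_{n} \beta_{0} + S_{n}^{-1} \epsilon_{n}.
\end{aligned}
```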

Condition 1.

Assume that

lim_{n \to \infty} \frac{1}{n} {(G_{1} X_{n} β_{0}, \dots, G_{m} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{1} X_{n} β_{0}, \dots, G_{m} X_{n} β_{0}) > 0 .

Condition 2.

Assume that

lim_{n \to \infty} \frac{1}{n} {(G_{1} X_{n} β_{0}, \dots, G_{m} X_{n} β_{0}, X_{n})}^{⊤} (G_{1} X_{n} β_{0}, \dots, G_{m} X_{n} β_{0}, X_{n}) > 0 .

(19)

Condition 2 says that

X_{n}

,

G_{1} X_{n} β_{0}, \dots, G_{m - 1} X_{n} β_{0}

and

G_{m} X_{n} β_{0}

in (18), are not asymptotically multicollinear. It is a sufficient condition for global identification of

θ_{0}

.
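In finite samples, a rough diagnostic for Condition 2 is the smallest eigenvalue of the matrix inside the limit in (19). The following sketch computes it for a hypothetical design with m = 2 and p = 2; the weights, parameter values, and the function name condition2_gram are assumptions, and a small eigenvalue in one sample is only suggestive, not a proof of asymptotic multicollinearity.

```python
import numpy as np

def condition2_gram(X, beta0, rho0, W_parts):
    """(1/n) Z'Z with Z = (G_1 X beta0, ..., G_m X beta0, X), as in (19)."""
    n = X.shape[0]
    S_n = np.eye(n) - sum(r * Wk for r, Wk in zip(rho0, W_parts))
    S_inv = np.linalg.inv(S_n)
    Z = np.column_stack([Wk @ S_inv @ X @ beta0 for Wk in W_parts] + [X])
    return Z.T @ Z / n

rng = np.random.default_rng(4)
n = 50
P = np.roll(np.eye(n), 1, axis=1)
W_parts = [P, P.T]                        # hypothetical two-group weights
X = np.column_stack([np.ones(n), rng.normal(size=n)])
rho0 = np.array([0.4, -0.2])

gram = condition2_gram(X, np.array([1.0, 2.0]), rho0, W_parts)
min_eig = np.linalg.eigvalsh(gram).min()

# With beta0 = 0 the generated regressors G_k X beta0 vanish, so the matrix is singular.
gram_pure = condition2_gram(X, np.zeros(2), rho0, W_parts)
min_eig_pure = np.linalg.eigvalsh(gram_pure).min()
```

The second call illustrates why the pure SAR process, where there are no regressors, falls outside Condition 2.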

Lemma 1.

Under Assumptions A1–A6, Condition 2 is true if and only if Assumption A7 and Condition 1 both are true.

3.1. Consistency

Corresponding to the concentrated log likelihood function, we define its expectation as

Q_{n} (ρ) = max_{β, σ^{2}} E [\log L_{n} (θ)] .

(20)

It follows from (10) that the expectation of

\log L_{n} (θ)

is expressed as

E [\log L_{n} (θ)] = - \frac{n}{2} \log (2 π) - \frac{n}{2} \log (σ^{2}) + \log | S (ρ) | - \frac{1}{2 σ^{2}} E [ϵ {(ρ, β)}^{⊤} ϵ (ρ, β)],

with

E [ϵ {(ρ, β)}^{⊤} ϵ (ρ, β)] = E [{(S (ρ) Y_{n} - X_{n} β)}^{⊤} (S (ρ) Y_{n} - X_{n} β)] = β^{⊤} X_{n}^{⊤} X_{n} β - 2 β^{⊤} X_{n}^{⊤} S (ρ) S_{n}^{- 1} X_{n} β_{0} + n σ_{n}^{2} (ρ) + β_{0}^{⊤} X_{n}^{⊤} B_{n} X_{n} β_{0},

where

σ_{n}^{2} (ρ) = \frac{1}{n} σ_{0}^{2} tr (B_{n})

with

B_{n} = {(S_{n}^{- 1})}^{⊤} S {(ρ)}^{⊤} S (ρ) S_{n}^{- 1}

. Moreover,

σ_{n}^{2} (ρ) = σ_{0}^{2} + \frac{2 σ_{0}^{2}}{n} \sum_{k = 1}^{m} (ρ_{k 0} - ρ_{k}) tr (G_{k}) + \frac{σ_{0}^{2}}{n} \sum_{j, k = 1}^{m} (ρ_{j 0} - ρ_{j}) (ρ_{k 0} - ρ_{k}) tr (G_{j}^{⊤} G_{k}) = σ_{0}^{2} + O_{p} (d_{n}^{- 1}) .

The optimal solutions of this maximization (20) are

{\overset{˘}{β}}_{n} (ρ) = {(X_{n}^{⊤} X_{n})}^{- 1} X_{n}^{⊤} S (ρ) S_{n}^{- 1} X_{n} β_{0}

and

{\overset{˘}{σ}}_{n}^{2} (ρ) = \frac{1}{n} E \{{[S (ρ) Y_{n} - X_{n} {\overset{˘}{β}}_{n} (ρ)]}^{⊤} [S (ρ) Y_{n} - X_{n} {\overset{˘}{β}}_{n} (ρ)]\} = \frac{1}{n} [β_{0}^{⊤} X_{n}^{⊤} {(S_{n}^{- 1})}^{⊤} S {(ρ)}^{⊤} (I - P_{X_{n}}) S (ρ) S_{n}^{- 1} X_{n} β_{0} + σ_{0}^{2} tr (B_{n})] = \frac{1}{n} \{[\sum_{k = 1}^{m} (ρ_{k} - ρ_{k 0}) {(G_{k} X_{n} β_{0})}^{⊤}] (I - P_{X_{n}}) [\sum_{k = 1}^{m} (ρ_{k} - ρ_{k 0}) G_{k} X_{n} β_{0}]\} + σ_{n}^{2} (ρ) .

(21)

Hence,

Q_{n} (ρ) = - \frac{n}{2} [\log (2 π) + 1] - \frac{n}{2} \log {\overset{˘}{σ}}_{n}^{2} (ρ) + \log | S (ρ) | .

(22)

Let

f_{n} (z_{n} | ρ)

be the density function of a random vector following the multivariate normal

N_{n} (0, σ_{n}^{2} (ρ) S {(ρ)}^{- 1} S^{⊤} {(ρ)}^{- 1})

. When

ρ \neq ρ_{0}

, if the determinant of covariance

σ_{n}^{2} (ρ) S {(ρ)}^{- 1} S^{⊤} {(ρ)}^{- 1}

is not equal to the determinant of covariance

σ_{0}^{2} S_{n}^{- 1} {(S_{n}^{⊤})}^{- 1}

, then the probability of the set

\{z_{n} : f_{n} (z_{n} | ρ) \neq f_{n} (z_{n} | ρ_{0})\}

is not zero. Similarly, for any n, when

\frac{1}{n} \log | σ_{n}^{2} (ρ) S {(ρ)}^{- 1} S^{⊤} {(ρ)}^{- 1} | \neq \frac{1}{n} \log | σ_{0}^{2} S_{n}^{- 1} {(S_{n}^{⊤})}^{- 1} |

, the probability of the set

\{z_{n} : \frac{1}{n} \log f_{n} (z_{n} | ρ) \neq \frac{1}{n} \log f_{n} (z_{n} | ρ_{0})\}

is not zero.

Condition 3.

When

\{d_{n}\}

is a bounded sequence, for

ρ \neq ρ_{0}

,

lim_{n \to \infty} \frac{1}{n} \{\log | σ_{0}^{2} S_{n}^{- 1} {(S_{n}^{⊤})}^{- 1} | - \log | σ_{n}^{2} (ρ) S {(ρ)}^{- 1} S^{⊤} {(ρ)}^{- 1} |\} \neq 0 .

After all above-mentioned preparations, consistency of the QML estimator follows.

Theorem 1.

Under Assumptions A1–A7 with a bounded sequence

\{d_{n}\}

, given Condition 1 or 3, then

θ_{0}

is globally identifiable and

{\hat{θ}}_{n}

is a consistent estimator of

θ_{0}

.

Proof.

See Appendix A. □

Our argument process for consistency follows from Theorem 3.4 of White (1994).

3.2. Asymptotic Normality

In this subsection, we derive the asymptotic distribution of the QML estimator

{\hat{θ}}_{n}

. We start from the optimal condition

\frac{\partial \log L_{n} ({\hat{θ}}_{n})}{\partial θ} = 0, namely, {\frac{\partial \log L_{n} (θ)}{\partial θ}|}_{θ = {\hat{θ}}_{n}} = 0 .

Based on (12)–(14), the first-order derivatives of the log-likelihood function at

θ_{0}

are

\frac{1}{\sqrt{n}} \frac{\partial \log L_{n} (θ_{0})}{\partial ρ_{k}} = \frac{1}{\sqrt{n} σ_{0}^{2}} [ϵ_{n}^{⊤} G_{k} ϵ_{n} - σ_{0}^{2} tr (G_{k})] + \frac{1}{\sqrt{n} σ_{0}^{2}} {(G_{k} X_{n} β_{0})}^{⊤} ϵ_{n}, k = 1, \dots, m, \frac{1}{\sqrt{n}} \frac{\partial \log L_{n} (θ_{0})}{\partial β} = \frac{1}{\sqrt{n} σ_{0}^{2}} X_{n}^{⊤} ϵ_{n}, and \frac{1}{\sqrt{n}} \frac{\partial \log L_{n} (θ_{0})}{\partial σ^{2}} = \frac{1}{2 \sqrt{n} σ_{0}^{4}} (ϵ_{n}^{⊤} ϵ_{n} - n σ_{0}^{2}) .

(23)

Let us consider the likelihood score vector

U = \frac{1}{\sqrt{n}} \frac{\partial \log L_{n} (θ_{0})}{\partial θ} .

It is easy to see that

E (U) = 0

from (12)–(14). Then, the covariance or Fisher information matrix of U is

I_{n} (θ_{0}) \equiv Cov (U) = E [\frac{1}{\sqrt{n}} \frac{\partial \log L_{n} (θ_{0})}{\partial θ} \frac{1}{\sqrt{n}} \frac{\partial \log L_{n} (θ_{0})}{\partial θ^{⊤}}],

which can be decomposed into

E [\frac{1}{\sqrt{n}} \frac{\partial \log L_{n} (θ_{0})}{\partial θ} \frac{1}{\sqrt{n}} \frac{\partial \log L_{n} (θ_{0})}{\partial θ^{⊤}}] = Σ_{n} (θ_{0}) + Ω_{n} (θ_{0}),

where

Σ_{n} (θ_{0}) = - E [\frac{1}{n} \frac{\partial^{2} \log L_{n} (θ_{0})}{\partial θ \partial θ^{⊤}}]

is called the sample average Hessian matrix. The following discussion aims at calculation of

Σ_{n} (θ_{0})

and

Ω_{n} (θ_{0})

.

From (12)–(14), we obtain the following equations:

- \frac{1}{n} \frac{\partial^{2} \log L_{n} (θ)}{\partial ρ_{k} \partial ρ_{l}} = \frac{1}{n} tr [S {(ρ)}^{- 1} W_{k} S {(ρ)}^{- 1} W_{l}] + \frac{1}{n σ^{2}} Y_{n}^{⊤} W_{k}^{⊤} W_{l} Y_{n}, for k, l = 1, \dots, m,

(24)

where

d F^{- 1} = - F^{- 1} (d F) F^{- 1}

is used,

\begin{matrix} - \frac{1}{n} \frac{\partial^{2} \log L_{n} (θ)}{\partial β \partial β^{⊤}} & = \frac{1}{n σ^{2}} X_{n}^{⊤} X_{n}, \\ - \frac{1}{n} \frac{\partial^{2} \log L_{n} (θ)}{\partial β \partial ρ_{k}} & = \frac{1}{n σ^{2}} X_{n}^{⊤} W_{k} Y_{n}, for k = 1, \dots, m, \\ - \frac{1}{n} \frac{\partial^{2} \log L_{n} (θ)}{\partial β \partial σ^{2}} & = \frac{1}{n σ^{4}} X_{n}^{⊤} ϵ (ρ, β), \\ - \frac{1}{n} \frac{\partial^{2} \log L_{n} (θ)}{\partial σ^{2} \partial σ^{2}} & = - \frac{1}{2 σ^{4}} + \frac{1}{n σ^{6}} ϵ {(ρ, β)}^{⊤} ϵ (ρ, β), \\ - \frac{1}{n} \frac{\partial^{2} \log L_{n} (θ)}{\partial ρ_{k} \partial σ^{2}} & = \frac{1}{n σ^{4}} ϵ {(ρ, β)}^{⊤} W_{k} Y_{n}, for k = 1, \dots, m . \end{matrix}

(25)

By calculating the expectation of the above second derivatives (24) and (25) at

θ_{0}

, we obtain the following lemma.

Lemma 2.

The sample average Hessian matrix

Σ_{n} (θ_{0})

is given by

Σ_{n} (θ_{0}) = (\begin{matrix} Σ_{ρ ρ^{⊤}} (θ_{0}) & * & * \\ Σ_{β ρ^{⊤}} (θ_{0}) & Σ_{β β^{⊤}} (θ_{0}) & 0 \\ Σ_{σ^{2} ρ^{⊤}} (θ_{0}) & 0 & Σ_{σ^{2} σ^{2}} (θ_{0}) \end{matrix})

(26)

with

Σ_{ρ ρ^{⊤}} (θ_{0}) = (\begin{matrix} \frac{1}{n} tr (G_{k}^{s} G_{l} + G_{l}^{s} G_{k}) + \frac{1}{n σ_{0}^{2}} {(G_{k} X_{n} β_{0})}^{⊤} (G_{l} X_{n} β_{0}) \end{matrix}), Σ_{β ρ^{⊤}} (θ_{0}) = (\begin{matrix} \frac{1}{n σ_{0}^{2}} X_{n}^{⊤} (G_{1} X_{n} β_{0}) & \dots & \frac{1}{n σ_{0}^{2}} X_{n}^{⊤} (G_{m} X_{n} β_{0}) \end{matrix}), Σ_{β β^{⊤}} (θ_{0}) = \frac{1}{n σ_{0}^{2}} X_{n}^{⊤} X_{n}, Σ_{σ^{2} ρ^{⊤}} (θ_{0}) = (\begin{matrix} \frac{1}{n σ_{0}^{2}} tr (G_{1}) & \dots & \frac{1}{n σ_{0}^{2}} tr (G_{m}) \end{matrix}), and Σ_{σ^{2} σ^{2}} (θ_{0}) = \frac{1}{2 σ_{0}^{4}} .

Proof.

See Appendix A. □

Condition 4.

Assume that

lim_{n \to \infty} \frac{1}{n} {[vec (C_{1} + C_{1}^{⊤}), \dots, vec (C_{m} + C_{m}^{⊤})]}^{⊤} [vec (C_{1} + C_{1}^{⊤}), \dots, vec (C_{m} + C_{m}^{⊤})] > 0,

where

C_{k} = G_{k} - (1 / n) tr (G_{k}) I

and vec is a vectorization operator.

Condition 4 may be true only under the scenario that

\{d_{n}\}

is a bounded sequence because

tr (C_{i} C_{j}^{⊤}) = O (n / d_{n})

for any

i, j

. The following is a sufficient and necessary condition for a nonsingular average Hessian matrix

Σ_{θ} = {lim}_{n \to \infty} Σ_{n} (θ_{0})

.

Theorem 2.

Under Assumptions A1–A7, the average Hessian matrix

Σ_{θ}

is nonsingular if and only if either of Conditions 1 and 4 holds.

Proof.

See Appendix A. □

For a divergent sequence

\{d_{n}\}

, Condition 4 is violated. In this case,

Σ_{θ}

is nonsingular if and only if Condition 1 holds. Set

μ_{k} = E (ϵ^{k})

at

θ_{0}

,

k = 3, 4

and

g_{k} = {(G_{k, 11}, \dots, G_{k, n n})}^{⊤}

with

G_{k, i i}

being the

(i, i)

th component of

G_{k}

. Calculation of

Ω_{n} (θ_{0})

is summarized in the following lemma.

Lemma 3.

The difference between the information matrix and the sample average Hessian matrix is given by

Ω_{n} (θ_{0}) = I_{n} (θ_{0}) - Σ_{n} (θ_{0}) = (\begin{matrix} Ω_{ρ ρ^{⊤}} (θ_{0}) & * & * \\ Ω_{β ρ^{⊤}} (θ_{0}) & 0 & * \\ Ω_{σ^{2} ρ^{⊤}} (θ_{0}) & Ω_{σ^{2} β^{⊤}} (θ_{0}) & Ω_{σ^{2} σ^{2}} (θ_{0}) \end{matrix}),

(27)

where

Ω_{ρ ρ^{⊤}} (θ_{0}) = (\begin{matrix} \frac{1}{n σ_{0}^{4}} (μ_{4} - 3 σ_{0}^{4}) g_{k}^{⊤} g_{l} + \frac{μ_{3}}{n σ_{0}^{4}} [{(G_{k} X_{n} β_{0})}^{⊤} g_{l} + g_{k}^{⊤} (G_{l} X_{n} β_{0})] \end{matrix}), Ω_{β ρ^{⊤}} (θ_{0}) = (\begin{matrix} \frac{μ_{3}}{n σ_{0}^{4}} X_{n}^{⊤} g_{1} & \dots & \frac{μ_{3}}{n σ_{0}^{4}} X_{n}^{⊤} g_{l} & \dots & \frac{μ_{3}}{n σ_{0}^{4}} X_{n}^{⊤} g_{m} \end{matrix}), Ω_{σ^{2} ρ^{⊤}} (θ_{0}) = (\begin{matrix} \dots & \frac{1}{2 n σ_{0}^{6}} [(μ_{4} - 3 σ_{0}^{4}) tr (G_{l}) + μ_{3} 1^{⊤} G_{l} X_{n} β_{0}] & \dots \end{matrix}), Ω_{σ^{2} β^{⊤}} (θ_{0}) = \frac{μ_{3}}{2 n σ_{0}^{6}} 1^{⊤} X_{n}, and Ω_{σ^{2} σ^{2}} (θ_{0}) = \frac{1}{4 σ_{0}^{8}} (μ_{4} - 3 σ_{0}^{4}) .

Proof.

See Appendix A. □

If

ϵ_{n}

is normally distributed, then

μ_{3} = 0

and

μ_{4} = 3 σ_{0}^{4}

, implying that

Ω_{n} (θ_{0}) = 0

. Thus, the sample average Hessian matrix

Σ_{n} (θ_{0})

is the covariance of U or the information matrix

I_{n} (θ_{0})

. In this sense,

Σ_{n} (θ_{0})

can be regarded as an information matrix. With these preparations in place, the asymptotic distribution of the QML estimator

{\hat{θ}}_{n}

is summarized in the following theorem.

Theorem 3.

Under Assumptions A1–A7 and either of Conditions 1 and 4,

\sqrt{n} ({\hat{θ}}_{n} - θ_{0})

converges in distribution to the multivariate normal

N (0, Σ_{θ}^{- 1} + Σ_{θ}^{- 1} Ω_{θ} Σ_{θ}^{- 1})

, where

Ω_{θ} = lim_{n \to \infty} Ω_{n} (θ_{0}) and Σ_{θ} = - lim_{n \to \infty} E [\frac{1}{n} \frac{\partial^{2} \log L_{n} (θ_{0})}{\partial θ \partial θ^{⊤}}] .

Moreover, particularly if

ϵ_{n}

is normally distributed, then

\sqrt{n} ({\hat{θ}}_{n} - θ_{0})

converges in distribution to the multivariate normal

N (0, Σ_{θ}^{- 1})

.

Proof.

See Appendix A. □

The estimation of the asymptotic covariance of

{\hat{θ}}_{n}

is a routine issue. The

Σ_{θ}

is estimated by

- \frac{1}{n} \frac{\partial^{2} \log L_{n} ({\hat{θ}}_{n})}{\partial θ \partial θ^{⊤}} .

The

Ω_{θ}

is estimated by

I_{n} ({\hat{θ}}_{n}) - Σ_{n} ({\hat{θ}}_{n})

in (27). For the QML estimator, the extra moments

μ_{3}

and

μ_{4}

in (27) can be estimated by the third and fourth order empirical moments based on estimated residuals of the

ϵ

’s.
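As a rough illustration of this plug-in step, the following Python sketch computes empirical moments from a vector of estimated residuals and substitutes them into the (σ², σ²) entry of (27). The function names are hypothetical, the residuals are assumed to have been computed beforehand from the fitted model, and taking the moments about zero is an assumption motivated by E(ϵ) = 0.

```python
def empirical_moments(residuals):
    """Empirical second, third and fourth moments (about zero) of the residuals."""
    n = len(residuals)
    sigma2 = sum(e ** 2 for e in residuals) / n
    mu3 = sum(e ** 3 for e in residuals) / n
    mu4 = sum(e ** 4 for e in residuals) / n
    return sigma2, mu3, mu4


def omega_sigma2_sigma2(sigma2, mu4):
    """Plug-in value of (mu4 - 3 sigma^4) / (4 sigma^8), the (sigma^2, sigma^2) entry of (27)."""
    return (mu4 - 3.0 * sigma2 ** 2) / (4.0 * sigma2 ** 4)


# Toy residuals, chosen by hand purely for illustration.
resid = [-1.0, 1.0, -1.0, 1.0]
s2, m3, m4 = empirical_moments(resid)
print(s2, m3, m4, omega_sigma2_sigma2(s2, m4))
```

The remaining entries of (27) can be filled in the same way once the matrices G_k are evaluated at the estimated spatial coefficients.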

Remark 1.

The asymptotic results in Theorems 1 and 3 are valid regardless of whether

{d_{n}}

is a bounded or divergent sequence.

For the case in which

{lim}_{n \to \infty} d_{n} = \infty

, because

G_{k, i j} = O (d_{n}^{- 1})

, the matrices (26) and (27) can be simplified to

Σ_{θ} = lim_{n \to \infty} (\begin{matrix} (\begin{matrix} \frac{1}{n σ_{0}^{2}} {(G_{k} X_{n} β_{0})}^{⊤} G_{l} X_{n} β_{0} \end{matrix}) & * & 0 \\ \frac{1}{n σ_{0}^{2}} X_{n}^{⊤} (\begin{matrix} G_{1} X_{n} β_{0} & \dots & G_{m} X_{n} β_{0} \end{matrix}) & \frac{1}{n σ_{0}^{2}} X_{n}^{⊤} X_{n} & 0 \\ 0 & 0 & \frac{1}{2 σ_{0}^{4}} \end{matrix})

and

Ω_{θ} = lim_{n \to \infty} (\begin{matrix} 0 & 0 & * \\ 0 & 0 & * \\ \frac{μ_{3}}{2 n σ_{0}^{6}} 1^{⊤} (\begin{matrix} G_{1} X_{n} β_{0} & \dots & G_{m} X_{n} β_{0} \end{matrix}) & \frac{μ_{3}}{2 n σ_{0}^{6}} 1^{⊤} X_{n} & \frac{1}{4 σ_{0}^{8}} (μ_{4} - 3 σ_{0}^{4}) \end{matrix}) .

Theoretically, the presence of

X_{n}

and the linear independence of

G_{1} X_{n} β_{0}, \dots, G_{m} X_{n} β_{0}

and

X_{n}

are the crucial conditions for the asymptotic results in Theorem 3, in particular, the

\sqrt{n}

-rate of convergence of

{\hat{θ}}_{n}

. Practically,

G_{1} X_{n} β_{0}, \dots, G_{m} X_{n} β_{0}

and

X_{n}

must not be (asymptotically) multicollinear, which is required for consistency of

{\hat{θ}}_{n}

.
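As a small consistency check on the simplified matrices, and using only the block structure displayed above, note that the σ² row and column of Σ_θ are decoupled from those of (ρ, β). The (σ², σ²) entry of Σ_θ^{-1} + Σ_θ^{-1} Ω_θ Σ_θ^{-1} is therefore

2 σ_{0}^{4} + {(2 σ_{0}^{4})}^{2} \frac{1}{4 σ_{0}^{8}} (μ_{4} - 3 σ_{0}^{4}) = μ_{4} - σ_{0}^{4},

which is the usual asymptotic variance of a sample variance and reduces to 2 σ_{0}^{4} under normality.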

Remark 2.

When the disturbances ϵ are normally distributed,

{\hat{θ}}_{n}

is the ML estimator. The ML estimators,

{\hat{β}}_{n}

and

{\hat{σ}}_{n}^{2}

, remain asymptotically independent, as in linear regression analysis, regardless of whether

{d_{n}}

is a bounded or divergent sequence. However, the dependence between the ML estimators

{\hat{ρ}}_{n}

and

{\hat{σ}}_{n}^{2}

is determined by whether

{d_{n}}

is a bounded or divergent sequence. When

{d_{n}}

is bounded, there is some

k_{0}

such that the ML estimators

{\hat{ρ}}_{k_{0}}

and

{\hat{σ}}_{n}^{2}

are asymptotically dependent, see (26), because

{lim}_{n \to \infty} tr (G_{k_{0}}) / n

is finite and may not be zero. Anselin and Bera (1998) discussed the implications of this dependence for statistical inference in the case of

m = 1

. We also see that, when

{d_{n}}

is a divergent sequence,

{lim}_{n \to \infty} tr (G_{k}) / n = 0

for all k, so the QML estimators

{\hat{ρ}}_{n}

and

{\hat{σ}}_{n}^{2}

are asymptotically independent.
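To make the role of tr(G_k)/n concrete, consider a special case that is not part of the remark itself: a single coefficient (m = 1) and the equal-weight district design W_n = I_{n_1} ⊗ L_{n_2} used in Section 5. Under these assumptions L_{n_2} has eigenvalue 1 once and −1/(n_2 − 1) with multiplicity n_2 − 1, so the eigenvalues of G = W_n (I − ρ W_n)^{-1} are available in closed form and tr(G)/n does not depend on n_1. The function name below is hypothetical.

```python
def trace_G_over_n(rho, n2):
    """tr(G)/n for G = W (I - rho W)^{-1}, W = I_{n1} kron L_{n2}, with |rho| < 1.

    Each eigenvalue lam of L_{n2} maps to lam / (1 - rho * lam) for G.
    """
    lam_small = -1.0 / (n2 - 1)               # eigenvalue with multiplicity n2 - 1
    top = 1.0 / (1.0 - rho)                   # contribution of the eigenvalue 1
    rest = (n2 - 1) * lam_small / (1.0 - rho * lam_small)
    return (top + rest) / n2                  # per-district trace divided by n2


for n2 in (2, 20, 200, 2000):
    print(n2, trace_G_over_n(0.5, n2))
```

For fixed n_2 (bounded d_n) the value stays away from zero, while it decays at the order 1/n_2 as n_2 grows, in line with the two cases distinguished in the remark.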

Remark 3.

The requirements in Conditions 1 and 4 apply to all spatial coefficients. Sometimes these requirements are satisfied only for a subset of the spatial coefficients. Write

ρ = {(ρ_{1}^{⊤}, ρ_{2}^{⊤})}^{⊤}

and

ρ_{0} = {(ρ_{10}^{⊤}, ρ_{20}^{⊤})}^{⊤}

, where

ρ_{1}

and

ρ_{10}

are

m_{1}

-dimensional while

ρ_{2}

and

ρ_{20}

are

(m - m_{1})

-dimensional. Suppose that Conditions 1 and 4 hold only for the partial spatial coefficients

ρ_{10}

, i.e., without loss of generality,

Condition 1’. The

{lim}_{n \to \infty} \frac{1}{n} {(G_{1} X_{n} β_{0}, \dots, G_{m_{1}} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{1} X_{n} β_{0}, \dots, G_{m_{1}} X_{n} β_{0}) > 0

.

Condition 4’. The

{lim}_{n \to \infty} \frac{1}{n} {[vec (C_{1} + C_{1}^{⊤}), \dots, vec (C_{m_{1}} + C_{m_{1}}^{⊤})]}^{⊤} [vec (C_{1} + C_{1}^{⊤}), \dots, vec (C_{m_{1}} + C_{m_{1}}^{⊤})] > 0

.

With Conditions 1’ and 4’ in place of Conditions 1 and 4, consistency in Theorem 1 and asymptotic normality in Theorem 3 hold for

{\hat{ρ}}_{1}, {\hat{β}}_{n}, {\hat{σ}}_{n}^{2}

, at

ρ_{2} = ρ_{20}

.

4. Asymptotic Normality with Non-Square-Root Rates

Consider the case of

{lim}_{n \to \infty} d_{n} = \infty

. It follows from Theorem 2 that the average Hessian matrix is singular under Conditions 5 and 6.

Condition 5.

For any

k \in {1, \dots, m}

,

{lim}_{n \to \infty} \frac{1}{n} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{k} X_{n} β_{0}) = 0

.

Condition 6.

For any

k \in {1, \dots, m}

, the

{lim}_{n \to \infty} \frac{1}{n} vec {(C_{k} + C_{k}^{⊤})}^{⊤} vec (C_{k} + C_{k}^{⊤}) = 0

.

For example, for the pure SAR process (10) with

θ = {(ρ^{⊤}, σ^{2})}^{⊤}

, as

d_{n} \to \infty

,

Σ_{θ} = diag [0, 1 / (2 σ_{0}^{4})]

. Singularity also occurs in other cases, see Remark 4. However, there is a gap between the setting of Theorem 3 and inconsistency. In this section, we investigate new conditions that guarantee consistency and asymptotic normality with non-square-root rates under Conditions 5 and 6.

Lee [] suggested that the singularity of the average Hessian matrix or the information matrix under normal disturbances has implications on the rate of convergence of the estimators. When

{lim}_{n \to \infty} d_{n} = \infty

,

(1 / n) \log L_{n} (ρ)

is rather flat with respect to

ρ

and the convergence of

(1 / n) [\log L_{n} (ρ) - Q_{n} (ρ)]

to zero is too fast to be useful. A properly adjusted rate is made such that

\frac{d_{n}}{n} \{[\log L_{n} (ρ) - \log L_{n} (ρ_{0})] - [Q_{n} (ρ) - Q_{n} (ρ_{0})]\} \overset{P}{⟶} 0 uniformly for ρ in Λ .

We consider the following two new conditions.

Condition 7.

The

{d_{n}}

is a divergent sequence, elements of

(I - P_{X_{n}}) (G_{k} X_{n} β_{0})

for

k = 1, \dots, m

have the uniform order

O (d_{n}^{- 1})

, and

lim_{n \to \infty} \frac{d_{n}}{n} {(G_{1} X_{n} β_{0}, \dots, G_{m} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{1} X_{n} β_{0}, \dots, G_{m} X_{n} β_{0}) > 0 .

Condition 7 modifies Condition 1 with the factor

d_{n}

to account for the proper rate of convergence. It is a generalization of Condition 1.

Condition 8.

The

{d_{n}}

is a divergent sequence and

lim_{n \to \infty} [\frac{d_{n}}{n} \log |σ_{0}^{2} S_{n}^{- 1} {(S_{n}^{⊤})}^{- 1}| - \frac{d_{n}}{n} \log |σ_{n}^{2} (ρ) S {(ρ)}^{- 1} S^{- 1} {(ρ)}^{⊤}|] \neq 0 .

Condition 8 modifies Condition 3 with the factor

d_{n}

to account for the proper rate of convergence. It is a generalization of Condition 3.

Theorem 4.

Under Assumptions A1–A7 and Conditions 5 and 6, if either of Conditions 7 and 8 holds, then the QML estimator

{\hat{ρ}}_{n}

derived from the maximization of

\log L_{n} (ρ)

in (17) is a consistent estimator.

Proof.

See Appendix A. □

The central limit theorem for a linear–quadratic form implies that

(\sqrt{d_{n} / n}) \partial \log L_{n} (ρ_{0}) / \partial ρ

is asymptotically normal. The asymptotic distribution of

{\hat{ρ}}_{n}

follows from

\sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{n} - ρ_{0}) = - {(\frac{d_{n}}{n} \frac{\partial^{2} \log L_{n} (ρ_{0})}{\partial ρ \partial ρ^{⊤}})}^{- 1} \sqrt{\frac{d_{n}}{n}} \frac{\partial \log L_{n} (ρ_{0})}{\partial ρ} .

Theorem 5.

Under Assumptions A1–A7, and Condition 7 or 8,

\sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{n} - ρ_{0}) \overset{D}{⟶} N (0, Σ_{ρ}^{- 1} I_{ρ} Σ_{ρ}^{- 1}),

(28)

where

Σ_{ρ} = - lim_{n \to \infty} E [\frac{d_{n}}{n} \frac{\partial^{2} L_{n} (ρ_{0})}{\partial ρ \partial ρ^{⊤}}] = (\begin{matrix} lim_{n \to \infty} b_{k l} \end{matrix})

with

b_{k l} = \frac{1}{σ_{0}^{2}} [\frac{d_{n}}{n} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{l} X_{n} β_{0})] + \frac{d_{n}}{n} [tr (G_{k}^{⊤} G_{l}) + tr (G_{k} G_{l})]

and

I_{ρ} = lim_{n \to \infty} E [\sqrt{\frac{d_{n}}{n}} \frac{\partial L_{n} (ρ_{0})}{\partial ρ} \sqrt{\frac{d_{n}}{n}} \frac{\partial L_{n} (ρ_{0})}{\partial ρ^{⊤}}] = (\begin{matrix} lim_{n \to \infty} \frac{d_{n}}{n {\tilde{σ}}_{n}^{4} (ρ_{0})} e_{k l} \end{matrix})

with

e_{k l} = σ_{0}^{2} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{l} X_{n} β_{0}) + μ_{3} [{(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) c_{l} + {(G_{l} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) c_{k}] + (μ_{4} - 3 σ_{0}^{4}) c_{k}^{⊤} c_{l} + σ_{0}^{4} [tr ((I - P_{X_{n}}) C_{k} (I - P_{X_{n}}) C_{l}) + tr (C_{k}^{⊤} (I - P_{X_{n}}) C_{l})] .

Proof.

See Appendix A. □

After finding the limiting distribution of

{\hat{ρ}}_{n}

, the limiting distributions of

{\hat{β}}_{n}

and

{\hat{σ}}_{n}^{2}

defined in (15) and (16) follow directly.

Theorem 6.

Under Assumptions A1–A7, and Condition 7 or 8,

\sqrt{\frac{n}{d_{n}}} ({\hat{β}}_{n} - β_{0}) = \frac{1}{\sqrt{d_{n}}} {(\frac{X_{n}^{⊤} X_{n}}{n})}^{- 1} \frac{X_{n}^{⊤} ϵ_{n}}{\sqrt{n}} - {(\frac{X_{n}^{⊤} X_{n}}{n})}^{- 1} (\begin{matrix} \frac{1}{n} X_{n}^{⊤} (G_{1} X_{n} β_{0}) & \dots & \frac{1}{n} X_{n}^{⊤} (G_{m} X_{n} β_{0}) \end{matrix}) \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{n} - ρ_{0}) - \frac{1}{\sqrt{n}} \sum_{k = 1}^{m} \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{k} - ρ_{k 0}) {(\frac{X_{n}^{⊤} X_{n}}{n})}^{- 1} \frac{(X_{n}^{⊤} G_{k} ϵ_{n})}{\sqrt{n}} \overset{D}{⟶} N (0, V^{⊤} Σ_{ρ}^{- 1} I_{ρ} Σ_{ρ}^{- 1} V),

(29)

where

V = {lim}_{n \to \infty} {(\frac{X_{n}^{⊤} X_{n}}{n})}^{- 1} {lim}_{n \to \infty} (\begin{matrix} \frac{1}{n} X_{n}^{⊤} (G_{1} X_{n} β_{0}) & \dots & \frac{1}{n} X_{n}^{⊤} (G_{m} X_{n} β_{0}) \end{matrix})

, and

\sqrt{n} ({\hat{σ}}_{n}^{2} - σ_{0}^{2}) = \sum_{i = 1}^{n} \frac{ϵ_{i}^{2} - σ_{0}^{2}}{\sqrt{n}} + o_{p} (1) \overset{D}{⟶} N (0, μ_{4} - σ_{0}^{4}) .

In particular, when

β_{0} = 0

,

\sqrt{n} {\hat{β}}_{n} = {(\frac{X_{n}^{⊤} X_{n}}{n})}^{- 1} \frac{X_{n}^{⊤} ϵ_{n}}{\sqrt{n}} - \sqrt{\frac{d_{n}}{n}} \sum_{k = 1}^{m} \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{k} - ρ_{k 0}) {(\frac{X_{n}^{⊤} X_{n}}{n})}^{- 1} \frac{X_{n}^{⊤} G_{k} ϵ_{n}}{\sqrt{n}} \overset{D}{⟶} N (0, σ_{0}^{2} {(lim_{n \to \infty} \frac{1}{n} X_{n}^{⊤} X_{n})}^{- 1}) .

Proof.

See Appendix A. □

The asymptotic distribution of

{\hat{ρ}}_{n}

has the

\sqrt{n / d_{n}}

-rate of convergence in Theorem 5. As

d_{n}

is divergent, this rate of convergence is lower than

\sqrt{n}

. The asymptotic distribution of the QML estimator

{\hat{β}}_{n}

and its low rate of convergence in Theorem 6 are determined by the asymptotic distribution of

{\hat{ρ}}_{n}

that forms the leading term in the asymptotic expansion (29). When

β_{0} = 0

, this leading term vanishes and

{\hat{β}}_{n}

converges to

β_{0}

with the usual

\sqrt{n}

-rate. The asymptotic distribution of

{\hat{σ}}_{n}^{2}

keeps the usual

\sqrt{n}

-rate of convergence.
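For a sense of scale, the sketch below compares the two normalizing rates under the assumption, borrowed from the design in Section 5, that n = n_1 n_2 and d_n = n_2 − 1. The function name is hypothetical. In that design n / d_n = n_1 n_2 / (n_2 − 1), so the rate for the spatial coefficients is close to \sqrt{n_1} when n_2 is large.

```python
import math


def convergence_rates(n1, n2):
    """Return (sqrt(n), sqrt(n / d_n)) assuming n = n1 * n2 and d_n = n2 - 1."""
    n = n1 * n2
    d_n = n2 - 1
    return math.sqrt(n), math.sqrt(n / d_n)


for n1, n2 in [(30, 20), (30, 60), (80, 20), (80, 60)]:
    root_n, slow = convergence_rates(n1, n2)
    print(f"n1={n1:3d} n2={n2:3d}  sqrt(n)={root_n:7.2f}  sqrt(n/d_n)={slow:6.2f}")
```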

The rate of convergence of

{\hat{β}}_{n}

may change when all the terms

(1 / n) X_{n}^{⊤} G_{1} X_{n} β_{0}

, …,

(1 / n) X_{n}^{⊤} G_{m} X_{n} β_{0}

vanish asymptotically and simultaneously. The exact rate of convergence then depends on how fast the terms

(1 / n) X_{n}^{⊤} G_{1} X_{n} β_{0}, \dots, (1 / n) X_{n}^{⊤} G_{m} X_{n} β_{0}

vanish in the limit. When all of these terms vanish, some components of

{\hat{β}}_{n}

may have the

\sqrt{n}

-rate of convergence while others have a lower rate. The interested reader can refer to Lee [] for details in the case of

m = 1

.

Remark 4.

The set of the vectors

G_{1} X_{n} β_{0}, \dots, G_{m} X_{n} β_{0}

and the regressor matrix

X_{n}

can be linearly dependent under some circumstances.

Circumstance 1: The regression coefficient

β_{0}

is a zero vector. The heterogeneous SAR model (7) reduces to the pure heterogeneous SAR model (9). Thus, the vectors,

G_{1} X_{n} β_{0}, \dots, G_{m} X_{n} β_{0}

, all become zero vectors. Condition 2 is violated because the limiting matrix satisfies only

diag (0, {lim}_{n \to \infty} \frac{1}{n} X_{n}^{⊤} X_{n}) \geq 0

, so the vectors

G_{1} X_{n} β_{0}, \dots, G_{m} X_{n} β_{0}

and the columns of the regressor matrix

X_{n}

are linearly dependent. Condition 1 is also violated.

Circumstance 2: For a specific W and

X_{n}

,

(I - P_{X_{n}}) G_{1} X_{n} β_{0} = 0

, see the counterexample discussed in the next subsection. In this case, Conditions 1 and 4 are both violated.

Circumstance 3: When

X_{n} = 1 \otimes x^{⊤}

with a p-dimensional vector x,

P_{X_{n}} = P_{1} \otimes P_{x^{⊤}}

. Then,

(I - P_{X_{n}}) G_{k} X_{n} β_{0} = (I - P_{1}) \otimes (1 - P_{x^{⊤}}) G_{k} (I \otimes x) β_{0} = 0

for any k. Condition 1 is violated.

In the above circumstances, if Condition 4 is also violated, for example when

d_{n}

is a divergent sequence, consistency may fail, in which case asymptotic normality fails as well. The Supplementary Material provides a counterexample leading to an inconsistent QML estimator.

5. Simulation Studies

In this section, we estimate the parameters of the heterogeneous spatial autoregressive model (7) for given parameter values. To investigate the finite sample properties of the QMLE in a Monte Carlo study with an appropriate spatial matrix W that satisfies the assumptions in Section 2, we focus on the same spatial scenario as that investigated by Case [] and Lee [].

Specifically,

W_{n}

=

I_{n_{1}} \otimes L_{n_{2}}

, where

L_{n_{2}} = (1_{n_{2}} 1_{n_{2}}^{⊤} - I_{n_{2}}) / (n_{2} - 1)

and

1_{n_{2}}

is an

n_{2}

-dimensional column vector of ones. There are

n_{1}

districts and

n_{2}

members in each district with each neighbor of a member in a district being given equal weight. For this spatial scenario,

d_{n}

=

(n_{2} - 1)

,

n = n_{1} n_{2}

and hence

d_{n} / n

=

O (1 / n_{1})

. If both

n_{1}

and

n_{2}

increase to infinity, then

d_{n}

goes to infinity and

d_{n} / n

goes to zero as n tends to infinity. Let us consider the data sets generated from the following model

Y_{n} = X_{n} β + ρ_{1} W_{1} Y_{n} + ρ_{2} W_{2} Y_{n} + \dots + ρ_{5} W_{5} Y_{n} + ϵ_{n},

where

W_{1}, \dots, W_{5}

form a uniform column segmentation of the spatial matrix

W_{n}

, namely,

W_{1}

consists of the first

n / 5

columns of

W_{n}

, …, and

W_{5}

consists of the last

n / 5

columns of

W_{n}

such that

W = W_{1} + \dots + W_{5}

, the spatial coefficient vector

ρ = {(ρ_{1}, \dots, ρ_{5})}^{⊤}

is set to

{(0.2, 0.35, 0.5, 0.65, 0.8)}^{⊤}

and

ϵ_{n}

follows

N (0, I_{n})

.
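The construction of the weights can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the sizes are toy values, and each segment is assumed to keep its own block of n/m columns with zeros elsewhere so that the segments add up to W_n.

```python
def group_weight_matrix(n1, n2):
    """W_n = I_{n1} kron L_{n2}: equal weights 1/(n2 - 1) on the other members of a district."""
    n = n1 * n2
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and i // n2 == j // n2:
                W[i][j] = 1.0 / (n2 - 1)
    return W


def column_segments(W, m):
    """Split W into m matrices, each keeping one block of n/m columns and zeros elsewhere.

    Zero-filling the remaining columns is an assumption, made so that the pieces sum to W.
    """
    n = len(W)
    if n % m != 0:
        raise ValueError("m must divide n in this sketch")
    width = n // m
    segments = []
    for k in range(m):
        lo, hi = k * width, (k + 1) * width
        segments.append([[W[i][j] if lo <= j < hi else 0.0 for j in range(n)]
                         for i in range(n)])
    return segments


W = group_weight_matrix(n1=5, n2=4)      # n = 20, d_n = n2 - 1 = 3
W_parts = column_segments(W, m=5)        # W_1, ..., W_5 with 4 columns each
```

Dense nested lists are used only to keep the sketch dependency-free; at the sample sizes of the study a sparse or block representation would be the natural choice.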

We consider three different settings for models with three different types of regressors without the intercept term:

Model 1: Set $β = (1, 1)$ . The n-dimensional regressor vector $X_{1}$ consists of i.i.d. standard normal $N (0, 1)$ draws and $X_{2}$ consists of i.i.d. draws from the t-distribution $t (2)$ .
Model 2: The setting is the same as Model 1. Additionally, the correlation coefficient of the first regressor and the second regressor is $0.5$ .
Model 3: Set $β = (1, 1)$ . Let $Z_{1 l}$ , $l = 1, \dots, n_{1}$ , be generated from the standard normal $N (0, 1)$ and $Z_{2 l}$ , $l = 1, \dots, n_{1}$ , from the t-distribution $t (2)$ . The first regressor $x_{1 i l}$ of the ith member in district l is generated as $x_{1 i l} = (Z_{1 l} + Z_{1 i l}) / \sqrt{2}$ , where the $Z_{1 i l}$ are i.i.d. $N (0, 1)$ for all i and l and are independent of $Z_{1 l}$ , and the second regressor $x_{2 i l}$ is generated as $x_{2 i l} = (Z_{2 l} + Z_{2 i l}) / \sqrt{2}$ , where the $Z_{2 i l}$ are i.i.d. $t (2)$ for all i and l and are independent of $Z_{2 l}$ . The correlation coefficient of the two regressors is $0.5$ . This specification implies that the averages of $x_{1 i l}$ and $x_{2 i l}$ over district l converge in probability to $Z_{1 l} / \sqrt{2}$ and $Z_{2 l} / \sqrt{2}$ as $n_{2}$ goes to infinity.
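A minimal Python sketch of the district-plus-individual construction in Model 3 follows. It is not the code used for the study, which was written in R: the function names and the seed are hypothetical, the t(2) draws are produced from a normal and a chi-square variate, and the correlation of 0.5 between the two regressors is not reproduced because the text does not say how it is induced.

```python
import math
import random


def draw_t2(rng):
    """One t(2) draw as Z / sqrt(V / 2), with Z ~ N(0, 1) and V ~ chi-square(2)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(1.0, 2.0)   # Gamma(shape=1, scale=2) is chi-square with 2 df
    return z / math.sqrt(v / 2.0)


def model3_regressor(n1, n2, draw, rng):
    """Return (x, district, individual) with x[l][i] = (district[l] + individual[l][i]) / sqrt(2)."""
    district = [draw(rng) for _ in range(n1)]
    individual = [[draw(rng) for _ in range(n2)] for _ in range(n1)]
    x = [[(district[l] + individual[l][i]) / math.sqrt(2.0) for i in range(n2)]
         for l in range(n1)]
    return x, district, individual


rng = random.Random(2022)  # arbitrary seed, only for reproducibility of the sketch
x1, z1, z1_ind = model3_regressor(30, 20, lambda r: r.gauss(0.0, 1.0), rng)
x2, z2, z2_ind = model3_regressor(30, 20, draw_t2, rng)
```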

The simulations are carried out in R, with 400 repetitions for each model. The empirical mean, bias, empirical root mean square error (RMSE), and coverage probability (CP) of the

100 (1 - α) %

confidence interval for

θ = {(ρ^{⊤}, β^{⊤}, σ^{2})}^{⊤}

are reported, respectively, in Table 1, Table 2 and Table 3.
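The reported summaries can be computed from the Monte Carlo replicates along the following lines. This is a sketch under assumptions: the function name is hypothetical, each replicate is assumed to supply a point estimate and a standard error for one scalar parameter, and coverage is assessed with a Wald interval (estimate ± z × standard error), which the text does not spell out.

```python
import math


def summarize(estimates, std_errors, true_value, z=1.959963984540054):
    """Empirical mean, bias, RMSE and coverage probability over replicates.

    The default z is the 97.5% standard normal quantile, i.e. a 95% interval.
    """
    r = len(estimates)
    mean = sum(estimates) / r
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / r)
    covered = sum(1 for e, s in zip(estimates, std_errors)
                  if abs(e - true_value) <= z * s)
    return {"mean": mean, "bias": mean - true_value, "rmse": rmse, "cp": covered / r}


# Hand-made toy replicates, for illustration only.
print(summarize([0.9, 1.1, 1.0, 1.2], [0.1, 0.1, 0.1, 0.1], true_value=1.0))
```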

Table 1. Empirical ML estimates of

θ

and CP in Model 1 with true value

θ_{0}

and

n = n_{1} n_{2}

.

Table 2. Empirical ML estimates of

θ

and CP in Model 2 with true value

θ_{0}

and

n = n_{1} n_{2}

.

Table 3. Empirical ML estimates of

θ

and CP in Model 3 with true value

θ_{0}

and

n = n_{1} n_{2}

.

We experimented with three different values of

n_{1}

at 30, 50 and 80 and three different values of

n_{2}

at 20, 40 and 60, respectively, resulting in n taking values of 600, 1000, 1200, 1600, 1800, 2000, 3000, 3200 and 4800. For a fixed

n_{1}

, the biases and RMSEs of

ρ

,

β

and

σ^{2}

decrease as

n_{2}

increases. On the other hand, for a fixed

n_{2}

, the biases and RMSEs of

ρ

,

β

and

σ^{2}

decrease as

n_{1}

increases. The estimators of

ρ

,

β

and

σ^{2}

in Model 1 and Model 2 have similar bias and RMSE, while the estimators of

ρ

,

β

and

σ^{2}

in Model 3 have the lowest bias and RMSE among the three models.

From the three tables above, we conclude that regressor vectors whose elements are correlated within districts (see Model 3) yield parameter estimators with lower bias and RMSE than those in Model 1, and that this improvement is not attributable to correlation between the regressors, because the regressors are correlated in both Model 2 and Model 3. The coverage probability (CP) of the

95 %

confidence interval for

ρ

,

β

and

σ^{2}

gets closer to

0.95

as n increases.

6. Concluding Remarks

In this paper, we proposed a heterogeneous spatial autoregressive model with all

n (n - 1)

spatial coefficients taking m distinct true values, where m is independent of the sample size n, and explained the motivation for investigating the model. Estimation methods and their properties need to be established for this new model. We established the asymptotic properties of the maximum likelihood estimator and the quasi-maximum likelihood estimator for the parameters of the model, extending Lee’s work [] for the classic spatial autoregressive model.

In the spatial autoregressive model, the assumption of homoscedastic disturbances is often unrealistic, particularly when the sample size n is large. How to relax homoscedasticity to heteroscedasticity is an interesting and challenging problem, which we leave for future research.

Author Contributions

Conceptualization, J.H. and F.Q.; methodology, J.H. and H.D.; software, H.D. and F.Q.; validation, H.D. and F.Q.; project management, J.H. and H.D.; investigation, F.Q. and H.D.; Preparation of the original work draft, H.D. and F.Q.; revision, J.H. and F.Q.; visualization, F.Q.; supervision, funding acquisition, J.H., H.D. and F.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China [Grants 11571219] and Shanghai Research Center for Data Science and Decision Technology. This work was also supported by Startup Foundation for Talents in Zhejiang Agriculture and Forestry University (2017FR044).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Sincere thanks to everyone who suggested revisions and improved this article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of Main Theoretical Results

Proof (Proof of Lemma 2) .

Let A be a nonrandom matrix. Then

E (Y^{⊤} A Y) = μ^{⊤} A μ + tr (A Σ)

(A1)

for a random vector Y with mean

μ = E (Y)

and covariance matrix

Cov (Y) = Σ

. With the second derivatives (24), (25), (A1) and routine calculations, we obtain

- E [\frac{1}{n} \frac{\partial^{2} \log L_{n} (θ_{0})}{\partial ρ_{k} \partial ρ_{l}}] = \frac{1}{n} tr [S_{n}^{- 1} W_{k} S_{n}^{- 1} W_{l}] + \frac{1}{n σ_{0}^{2}} E (Y_{n}^{⊤} W_{k}^{⊤} W_{l} Y_{n}) = \frac{1}{n} tr (G_{k}^{s} G_{l}) + \frac{1}{n σ_{0}^{2}} {(G_{k} X_{n} β_{0})}^{⊤} (G_{l} X_{n} β_{0}) with G_{k}^{s} = G_{k}^{⊤} + G_{k}, for k, l = 1, \dots, m,

- E [\frac{1}{n} \frac{\partial^{2} \log L_{n} (θ_{0})}{\partial β \partial β^{⊤}}] = \frac{1}{n σ_{0}^{2}} X_{n}^{⊤} X_{n},

- E [\frac{1}{n} \frac{\partial^{2} \log L_{n} (θ_{0})}{\partial β \partial ρ_{k}}] = \frac{1}{n σ_{0}^{2}} X_{n}^{⊤} (G_{k} X_{n} β_{0}), for k = 1, \dots, m,

- E [\frac{1}{n} \frac{\partial^{2} \log L_{n} (θ_{0})}{\partial β \partial σ^{2}}] = 0,

- E [\frac{1}{n} \frac{\partial^{2} \log L_{n} (θ_{0})}{\partial σ^{2} \partial ρ_{k}}] = \frac{1}{n σ_{0}^{4}} E [ϵ_{n}^{⊤} W_{k} Y_{n}] = \frac{1}{n σ_{0}^{2}} tr (G_{k}), for k = 1, \dots, m,

- E [\frac{1}{n} \frac{\partial^{2} \log L_{n} (θ_{0})}{\partial σ^{2} \partial σ^{2}}] = - \frac{1}{2 σ_{0}^{4}} + \frac{1}{n σ_{0}^{6}} E [ϵ_{n}^{⊤} ϵ_{n}] = \frac{1}{2 σ_{0}^{4}} .

Therefore, Lemma 2 is proved. □
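Identity (A1) can be checked exactly on a toy example by enumerating a discrete distribution. The setup below is an illustrative assumption and not part of the proof: Y = μ + e with independent components e_i equal to +s or −s with probability 1/2 each, so that Cov(Y) = s² I. The function names and the numbers are hypothetical.

```python
from itertools import product


def quad(A, y):
    """Quadratic form y'Ay for a square matrix A given as nested lists."""
    n = len(y)
    return sum(y[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


def expected_quadratic_form(A, mu, s):
    """Exact E(Y'AY) for Y = mu + e, with independent e_i = +s or -s, each w.p. 1/2."""
    n = len(mu)
    outcomes = list(product((-s, s), repeat=n))
    total = sum(quad(A, [mu[i] + e[i] for i in range(n)]) for e in outcomes)
    return total / len(outcomes)


def formula_a1(A, mu, s):
    """mu'A mu + tr(A Sigma) with Sigma = s^2 I, the covariance of Y in this toy setup."""
    n = len(mu)
    return quad(A, mu) + s ** 2 * sum(A[i][i] for i in range(n))


A = [[1, 2, 0], [0, 3, 1], [4, 0, 5]]
mu = [1, -2, 3]
print(expected_quadratic_form(A, mu, 2), formula_a1(A, mu, 2))
```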

Proof (Proof of Lemma 3) .

Write

I_{n} (θ_{0})

in block form as

I_{n} (θ_{0}) = (\begin{matrix} I_{ρ ρ^{⊤}} (θ_{0}) & * & * \\ I_{β ρ^{⊤}} (θ_{0}) & I_{β β^{⊤}} (θ_{0}) & * \\ I_{σ^{2} ρ^{⊤}} (θ_{0}) & I_{σ^{2} β^{⊤}} (θ_{0}) & I_{σ^{2} σ^{2}} (θ_{0}) \end{matrix}) .

With routine calculations, we have

I_{ρ ρ^{⊤}} (θ_{0}) = (\begin{matrix} a_{11} & a_{12} & \dots & a_{1 m} \\ \dots & \dots & \dots & \dots \\ a_{m 1} & a_{m 2} & \dots & a_{m m} \end{matrix}),

where

a_{k l} = \frac{1}{n} E \{[\frac{1}{σ_{0}^{2}} ϵ_{n}^{⊤} W_{k} Y_{n} - tr (S^{- 1} W_{k})] [\frac{1}{σ_{0}^{2}} ϵ_{n}^{⊤} W_{l} Y_{n} - tr (S^{- 1} W_{l})]\} = \frac{1}{n} [tr (G_{k}) tr (G_{l}) - \frac{1}{σ_{0}^{2}} tr (G_{k}) E (ϵ_{n}^{⊤} G_{l} ϵ_{n}) - \frac{1}{σ_{0}^{2}} tr (G_{l}) E (ϵ_{n}^{⊤} G_{k} ϵ_{n}) + \frac{1}{σ_{0}^{4}} E (Y_{n}^{⊤} W_{k}^{⊤} ϵ_{n} ϵ_{n}^{⊤} W_{l} Y_{n})]

with

E (Y_{n}^{⊤} W_{k}^{⊤} ϵ_{n} ϵ_{n}^{⊤} W_{l} Y_{n}) = E [{(X_{n} β_{0} + ϵ_{n})}^{⊤} {(W_{k} S_{n}^{- 1})}^{⊤} ϵ_{n} ϵ_{n}^{⊤} (W_{l} S_{n}^{- 1}) (X_{n} β_{0} + ϵ_{n})] = {(G_{k} X_{n} β_{0})}^{⊤} E [ϵ_{n} ϵ_{n}^{⊤}] G_{l} X_{n} β_{0} + {(G_{k} X_{n} β_{0})}^{⊤} E [ϵ_{n} ϵ_{n}^{⊤} G_{l} ϵ_{n}] + E [ϵ_{n}^{⊤} G_{k}^{⊤} ϵ_{n} ϵ_{n}^{⊤}] G_{l} X_{n} β_{0} + E [ϵ_{n}^{⊤} G_{k}^{⊤} ϵ_{n} ϵ_{n}^{⊤} G_{l} ϵ_{n}] .

Recall that, for i.i.d.

ϵ_{1}, \dots, ϵ_{n}

,

E (ϵ_{i} ϵ_{j}) = \{\begin{matrix} σ_{0}^{2}, & for i = j \\ 0, & for i \neq j; \end{matrix} E (ϵ_{i} ϵ_{j} ϵ_{s}) = \{\begin{matrix} μ_{3}, & for i = j = s, \\ 0, & otherwise; \end{matrix}

and

E (ϵ_{i} ϵ_{j} ϵ_{s} ϵ_{t}) = \{\begin{matrix} μ_{4}, & for i = j = s = t, \\ σ_{0}^{4}, & for s = t, i = j or s = i, t = j or s = j, t = i, \\ 0, & otherwise . \end{matrix}

Then, we have

E (ϵ_{n}^{⊤} G_{k} ϵ_{n}) = E [tr (G_{k} ϵ_{n} ϵ_{n}^{⊤})] = tr [G_{k} E (ϵ_{n} ϵ_{n}^{⊤})] = σ_{0}^{2} tr (G_{k}), E (ϵ_{n}^{⊤} G_{l} ϵ_{n}) = E [tr (G_{l} ϵ_{n} ϵ_{n}^{⊤})] = tr [G_{l} E (ϵ_{n} ϵ_{n}^{⊤})] = σ_{0}^{2} tr (G_{l}), E (ϵ_{n} ϵ_{n}^{⊤} G_{k} ϵ_{n}) = E (ϵ_{1}^{3}) {(G_{k, 11}, \dots, G_{k, n n})}^{⊤} = μ_{3} g_{k}, E [ϵ_{n}^{⊤} G_{l}^{⊤} ϵ_{n} ϵ_{n}^{⊤}] = E (ϵ_{1}^{3}) (G_{l, 11}, \dots, G_{l, n n}) = μ_{3} g_{l}^{⊤}

and

E (ϵ_{n}^{⊤} G_{k}^{⊤} ϵ_{n} ϵ_{n}^{⊤} G_{l}^{⊤} ϵ_{n}) = \sum_{s = 1}^{n} \sum_{t = 1}^{n} \sum_{i = 1}^{n} \sum_{j = 1}^{n} G_{k, t s} G_{l, i j} E (ϵ_{s} ϵ_{t} ϵ_{i} ϵ_{j}) = μ_{4} \sum_{i = 1}^{n} G_{k, i i} G_{l, i i} + σ_{0}^{4} \sum_{s \neq i} [G_{k, s s} G_{l, i i} + G_{k, i s} G_{l, i s} + G_{k, i s} G_{l, s i}] = (μ_{4} - 3 σ_{0}^{4}) g_{k}^{⊤} g_{l} + σ_{0}^{4} \sum_{s, i = 1}^{n} [G_{k, s s} G_{l, i i} + G_{k, i s} G_{l, i s} + G_{k, i s} G_{l, s i}] = (μ_{4} - 3 σ_{0}^{4}) g_{k}^{⊤} g_{l} + σ_{0}^{4} [tr (G_{k}) tr (G_{l}) + tr (G_{k} G_{l}) + tr (G_{k}^{⊤} G_{l})] = (μ_{4} - 3 σ_{0}^{4}) g_{k}^{⊤} g_{l} + σ_{0}^{4} [tr (G_{k}) tr (G_{l}) + tr (G_{k}^{s} G_{l})],

implying that

a_{k l} = \frac{1}{n σ_{0}^{2}} {(G_{k} X_{n} β_{0})}^{⊤} (G_{l} X_{n} β_{0}) + \frac{1}{n} tr (G_{k}^{s} G_{l}) + \frac{1}{n σ_{0}^{4}} (μ_{4} - 3 σ_{0}^{4}) g_{k}^{⊤} g_{l} + \frac{μ_{3}}{n σ_{0}^{4}} [{(G_{k} X_{n} β_{0})}^{⊤} g_{l} + g_{k}^{⊤} (G_{l} X_{n} β_{0})] .

(A2)

Moreover, we have

I_{β ρ^{⊤}} (θ_{0}) = (\begin{matrix} \dots & \frac{1}{n} E \{[\frac{1}{σ_{0}^{2}} X_{n}^{⊤} ϵ_{n}] [\frac{1}{σ_{0}^{2}} ϵ_{n}^{⊤} W_{l} Y_{n} - tr (S_{n}^{- 1} W_{l})]\} & \dots \end{matrix}) = {(\begin{matrix} \dots & \frac{1}{n σ_{0}^{2}} X_{n}^{⊤} (G_{l} X_{n} β_{0}) + \frac{μ_{3}}{n σ_{0}^{4}} X_{n}^{⊤} g_{l} & \dots \end{matrix})}_{p \times m}

I_{β β^{⊤}} (θ_{0}) = \frac{1}{n σ_{0}^{4}} E [X_{n}^{⊤} ϵ_{n} ϵ_{n}^{⊤} X_{n}] = \frac{1}{n σ_{0}^{2}} X_{n}^{⊤} X_{n},

I_{σ^{2} ρ^{⊤}} (θ_{0}) = (\begin{matrix} \dots & \frac{1}{n} E \{[- \frac{n}{2 σ_{0}^{2}} + \frac{1}{2 σ_{0}^{4}} ϵ_{n}^{⊤} ϵ_{n}] [\frac{1}{σ_{0}^{2}} ϵ_{n}^{⊤} W_{l} Y_{n} - tr (S^{- 1} W_{l})]\} & \dots \end{matrix}) = {(\begin{matrix} \dots & \frac{1}{2 n σ_{0}^{6}} [(μ_{4} - σ_{0}^{4}) tr (G_{l}) + μ_{3} 1^{⊤} G_{l} X_{n} β_{0}] & \dots \end{matrix})}_{1 \times m}

I_{σ^{2} β^{⊤}} (θ_{0}) = \frac{1}{n} E \{[- \frac{n}{2 σ_{0}^{2}} + \frac{1}{2 σ_{0}^{4}} ϵ_{n}^{⊤} ϵ_{n}] [\frac{1}{σ_{0}^{2}} ϵ_{n}^{⊤} X_{n}]\} = \frac{μ_{3}}{2 n σ_{0}^{6}} 1^{⊤} X_{n},

and

I_{σ^{2} σ^{2}} (θ_{0}) = \frac{1}{n} E \{[- \frac{n}{2 σ_{0}^{2}} + \frac{1}{2 σ_{0}^{4}} ϵ_{n}^{⊤} ϵ_{n}] [- \frac{n}{2 σ_{0}^{2}} + \frac{1}{2 σ_{0}^{4}} ϵ_{n}^{⊤} ϵ_{n}]\} = \frac{1}{4 σ_{0}^{8}} (μ_{4} - σ_{0}^{4}) .

Finally, computing

I_{n} (θ_{0}) - Σ_{n} (θ_{0})

yields the expression for

Ω_{n} (θ_{0})

. □
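The fourth-moment identity used in this proof, E(ϵ^{⊤} G_k ϵ ϵ^{⊤} G_l ϵ) = (μ_4 − 3σ_0^4) g_k^{⊤} g_l + σ_0^4 [tr(G_k) tr(G_l) + tr(G_k G_l) + tr(G_k^{⊤} G_l)], can be checked exactly on a small example. The sketch below is only an illustration: the two-point disturbance distribution, the matrices and the function names are all hypothetical, and exact rational arithmetic is used so that no tolerance is needed.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-point disturbance: -1 w.p. 2/3 and 2 w.p. 1/3, so E(eps) = 0.
VALUES = (Fraction(-1), Fraction(2))
PROBS = (Fraction(2, 3), Fraction(1, 3))


def moment(k):
    """k-th moment of the two-point distribution."""
    return sum(p * v ** k for v, p in zip(VALUES, PROBS))


def quad(G, e):
    n = len(e)
    return sum(e[i] * G[i][j] * e[j] for i in range(n) for j in range(n))


def exact_expectation(Gk, Gl):
    """E[(eps' Gk eps)(eps' Gl eps)] by enumerating all outcomes of i.i.d. eps."""
    n = len(Gk)
    total = Fraction(0)
    for idx in product(range(2), repeat=n):
        prob = Fraction(1)
        for t in idx:
            prob *= PROBS[t]
        e = [VALUES[t] for t in idx]
        total += prob * quad(Gk, e) * quad(Gl, e)
    return total


def closed_form(Gk, Gl):
    """Right-hand side of the identity, with moments of the toy distribution."""
    n = len(Gk)
    s2, mu4 = moment(2), moment(4)
    gg = sum(Gk[i][i] * Gl[i][i] for i in range(n))
    tr_k = sum(Gk[i][i] for i in range(n))
    tr_l = sum(Gl[i][i] for i in range(n))
    tr_kl = sum(Gk[i][j] * Gl[j][i] for i in range(n) for j in range(n))    # tr(Gk Gl)
    tr_ktl = sum(Gk[i][j] * Gl[i][j] for i in range(n) for j in range(n))   # tr(Gk' Gl)
    return (mu4 - 3 * s2 ** 2) * gg + s2 ** 2 * (tr_k * tr_l + tr_kl + tr_ktl)


Gk = [[1, 2, 0], [0, 3, 1], [4, 0, 5]]
Gl = [[2, 0, 1], [1, 1, 0], [0, 3, 2]]
print(exact_expectation(Gk, Gl) == closed_form(Gk, Gl))
```

A skewed distribution is chosen on purpose so that μ_3 is nonzero and μ_4 differs from 3σ^4, which exercises the non-normal terms.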

Proof (Proof of Theorem 1) .

It follows from the estimation procedure stated in Section 2 that

{\hat{β}}_{n}

and

{\hat{σ}}_{n}^{2}

are continuous functions of

{\hat{ρ}}_{n}

. Hence, we only need to show the consistency of

{\hat{ρ}}_{n}

. By Theorem 3.4 of White [], it suffices to show that (i)

lim_{n \to \infty} \frac{1}{n} [\log L_{n} (ρ) - Q_{n} (ρ)] = 0

uniformly for

ρ

on

Γ

, and (ii) the identification uniqueness condition that, for any

ε > 0

,

\underset{n \to \infty}{lim sup} \{max_{ρ \in N_{ε} (ρ_{0})} \frac{1}{n} [Q_{n} (ρ) - Q_{n} (ρ_{0})]\} < 0,

where

N_{ε} (ρ_{0})

is the complement of an open neighborhood of

ρ_{0}

in

Γ

of radius

ε

.

To prove part (i), it follows from (17) and (22) that

\frac{1}{n} [\log L_{n} (ρ) - Q_{n} (ρ)] = - \frac{1}{2} [\log {\tilde{σ}}_{n}^{2} (ρ) - \log {\overset{˘}{σ}}_{n}^{2} (ρ)]

where

{\overset{˘}{σ}}_{n}^{2} (ρ)

is determined by (21) and

{\tilde{σ}}_{n}^{2} (ρ)

from (16) can be written as

{\tilde{σ}}_{n}^{2} (ρ) = \frac{1}{n} [\sum_{k = 1}^{m} (ρ_{k} - ρ_{k 0}) {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) \sum_{k = 1}^{m} (ρ_{k} - ρ_{k 0}) (G_{k} X_{n} β_{0})] + D_{1 n} + D_{2 n} = {\overset{˘}{σ}}_{n}^{2} (ρ) + D_{1 n} + [D_{2 n} - σ_{n}^{2} (ρ)]

with

D_{1 n} (ρ) = \frac{1}{n} \sum_{k = 1}^{m} (ρ_{k} - ρ_{k 0}) {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) S (ρ) S_{n}^{- 1} ϵ_{n}

and

D_{2 n} (ρ) = \frac{1}{n} ϵ_{n}^{⊤} {(S_{n}^{- 1})}^{⊤} S {(ρ)}^{⊤} (I - P_{X_{n}}) S (ρ) S_{n}^{- 1} ϵ_{n} .

It can be shown that

D_{1 n} (ρ) = o_{p} (1)

and

D_{2 n} (ρ) - σ_{n}^{2} (ρ) = o_{p} (1)

uniformly on

Γ

. Therefore,

\log {\tilde{σ}}_{n}^{2} (ρ) - \log {\overset{˘}{σ}}_{n}^{2} (ρ) = \log [1 + \frac{D_{1 n}}{{\overset{˘}{σ}}_{n}^{2} (ρ)} + \frac{D_{2 n} - σ_{n}^{2} (ρ)}{{\overset{˘}{σ}}_{n}^{2} (ρ)}] = o_{p} (1)

(A3)

uniformly on

Γ

. Consequently,

sup_{ρ \in Γ} \{\frac{1}{n} | \log L_{n} (ρ) - Q_{n} (ρ) |\} = o_{p} (1),

proving part (i).

To prove part (ii), the identification uniqueness condition can be established as follows. Note that

[Q_{n} (ρ) - Q_{n} (ρ_{0})] / n

can be divided into two parts

\frac{1}{n} [Q_{n} (ρ) - Q_{n} (ρ_{0})] = \frac{1}{n} \{E [\log f_{n} (z_{n} | ρ)] - E [\log f_{n} (z_{n} | ρ_{0})]\} - \frac{1}{2} [\log {\overset{˘}{σ}}_{n}^{2} (ρ) - \log σ_{n}^{2} (ρ)],

where

f_{n} (z_{n} | ρ)

is the density function of a random vector following the multivariate normal

N_{n} (0, σ_{n}^{2} (ρ) S {(ρ)}^{- 1} S^{⊤} {(ρ)}^{- 1})

, and

E

is the expectation with respect to the probability density function

f_{n} (z_{n} | ρ_{0})

. For the first part,

{E [\log f_{n} (z_{n} | ρ)] - E [\log f_{n} (z_{n} | ρ_{0})]} / n

, consider the following function

g (z_{n} | ρ) = {[\frac{f (z_{n} | ρ)}{f (z_{n} | ρ_{0})}]}^{\frac{1}{n}} .

For

ρ \neq ρ_{0}

, it follows from Jensen’s inequality that

E [- \log g (z_{n} | ρ)] \geq - \log E [g (z_{n} | ρ)] \geq - \log {[E (\frac{f (z_{n} | ρ)}{f (z_{n} | ρ_{0})})]}^{\frac{1}{n}} = - \log {[\int \frac{f (z_{n} | ρ)}{f (z_{n} | ρ_{0})} f (z_{n} | ρ_{0}) d z_{n}]}^{\frac{1}{n}} = 0,

implying that

\frac{1}{n} E [\log f_{n} (z_{n} | ρ)] \leq \frac{1}{n} E [\log f_{n} (z_{n} | ρ_{0})] for ρ \neq ρ_{0} .

(A4)

For the second part, it is clear from (21) that

\log σ_{n}^{2} (ρ) \leq \log {\overset{˘}{σ}}_{n}^{2} (ρ) .

(A5)

It follows

\frac{1}{n} [Q_{n} (ρ) - Q_{n} (ρ_{0})] \leq 0 .

Under Condition 1, inequality (A5) is strict, implying that

Q_{n} (ρ) / n < Q_{n} (ρ_{0}) / n

, so the identification uniqueness condition (ii) holds. When Condition 1 is violated, inequality (A5) may become an equality, for example when

β_{0}^{⊤} X_{n}^{⊤} G_{k}^{⊤} (I - P_{X_{n}}) G_{k} X_{n} β_{0} = 0

for all k. Then

Q_{n} (ρ) / n < Q_{n} (ρ_{0}) / n

holds only if inequality (A4) is strict. Since

- \log (x)

is strictly convex and

x^{1 / n}

is strictly concave for n > 1, a sufficient condition guaranteeing strict inequality in (A4) is

P (\{z_{n} : g_{n} (z_{n} | ρ) \neq 1 for ρ \neq ρ_{0}\}) > 0

or

P (\{z_{n} : \frac{1}{n} \log f_{n} (z_{n} | ρ) - \frac{1}{n} \log f_{n} (z_{n} | ρ_{0}) \neq 0 for ρ \neq ρ_{0}\}) > 0,

which is true under Condition 4. Therefore,

Q_{n} (ρ) / n < Q_{n} (ρ_{0}) / n

and then the identification uniqueness condition (ii) also holds.

Combining parts (i) and (ii) completes the proof of consistency. □

Proof (Proof of Theorem 2) .

Under Assumptions A1–A7, applying the elementary row transformation

E = (\begin{matrix} I & - Σ_{ρ β^{⊤}} (θ_{0}) Σ_{β β^{⊤}} {(θ_{0})}^{- 1} & - Σ_{ρ {σ^{2}}^{⊤}} (θ_{0}) Σ_{σ^{2} σ^{2}} {(θ_{0})}^{- 1} \\ 0 & I & 0 \\ 0 & 0 & I \end{matrix})

on the average Hessian matrix

Σ_{n} (θ_{0})

yields

E Σ_{n} (θ_{0}) = (\begin{matrix} Σ_{ρ ρ^{⊤}} - Σ_{ρ β^{⊤}} Σ_{β β^{⊤}}^{- 1} Σ_{β ρ^{⊤}} - Σ_{ρ {σ^{2}}^{⊤}} Σ_{σ^{2} σ^{2}}^{- 1} Σ_{σ^{2} ρ^{⊤}} & 0 & 0 \\ Σ_{β ρ^{⊤}} (θ_{0}) & Σ_{β β^{⊤}} (θ_{0}) & 0 \\ Σ_{σ^{2} ρ^{⊤}} (θ_{0}) & 0 & Σ_{σ^{2} σ^{2}} (θ_{0}) \end{matrix}) .

Thus, under Assumptions A1–A7, the average Hessian matrix

Σ_{n} (θ_{0})

is nonsingular if and only if

Σ_{ρ ρ^{⊤}} - Σ_{ρ β^{⊤}} Σ_{β β^{⊤}}^{- 1} Σ_{β ρ^{⊤}} - Σ_{ρ {σ^{2}}^{⊤}} Σ_{σ^{2} σ^{2}}^{- 1} Σ_{σ^{2} ρ^{⊤}}

is nonsingular. It can be decomposed into

Σ_{ρ ρ^{⊤}} - Σ_{ρ β^{⊤}} Σ_{β β^{⊤}}^{- 1} Σ_{β ρ^{⊤}} - Σ_{ρ {σ^{2}}^{⊤}} Σ_{σ^{2} σ^{2}}^{- 1} Σ_{σ^{2} ρ^{⊤}} = H_{1}^{⊤} H_{1} + H_{2}^{⊤} H_{2},

(A6)

where

H_{1} = (\begin{matrix} \frac{1}{\sqrt{n}} (I - P_{X_{n}}) G_{1} X_{n} β_{0} & \dots & \frac{1}{\sqrt{n}} (I - P_{X_{n}}) G_{m} X_{n} β_{0} \end{matrix})

and

H_{2} = (\begin{matrix} \frac{\sqrt{2}}{\sqrt{n}} vec (C_{1} + C_{1}^{⊤}) & \dots & \frac{\sqrt{2}}{\sqrt{n}} vec (C_{m} + C_{m}^{⊤}) \end{matrix})

with

C_{k} = G_{k} - \frac{1}{n} tr (G_{k}) I

. Therefore, it follows from (A6) that

Σ_{ρ ρ^{⊤}} - Σ_{ρ β^{⊤}} Σ_{β β^{⊤}}^{- 1} Σ_{β ρ^{⊤}} - Σ_{ρ {σ^{2}}^{⊤}} Σ_{σ^{2} σ^{2}}^{- 1} Σ_{σ^{2} ρ^{⊤}}

is nonsingular if and only if either of

H_{1}^{⊤} H_{1}

and

H_{2}^{⊤} H_{2}

is nonsingular, which is guaranteed by Condition 1 or Condition 4. This completes the proof. □
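The reduction used in this proof can be checked numerically. The sketch below is an illustration under stated assumptions: it builds a hypothetical symmetric matrix with the same block pattern as Σ_n(θ_0) (zero blocks between β and σ²), applies the transformation E (called `T` in the code to avoid a clash with the expectation symbol), and confirms that the determinant factors through the Schur-complement block.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p = 2, 3                       # hypothetical sizes: m spatial coefficients, p betas
d = m + p + 1                     # sigma^2 contributes one scalar row/column

# Hypothetical stand-in for the average Hessian: symmetric, with the
# (beta, sigma^2) cross blocks set to zero as in the proof.
A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)
Sigma[m:m + p, m + p:] = 0.0
Sigma[m + p:, m:m + p] = 0.0

S_rr = Sigma[:m, :m]
S_rb = Sigma[:m, m:m + p]
S_rs = Sigma[:m, m + p:]
S_bb = Sigma[m:m + p, m:m + p]
S_ss = Sigma[m + p:, m + p:]

# Elementary row transformation: identity plus two blocks in the first block row.
T = np.eye(d)
T[:m, m:m + p] = -S_rb @ np.linalg.inv(S_bb)
T[:m, m + p:] = -S_rs @ np.linalg.inv(S_ss)
reduced = T @ Sigma

schur = (S_rr
         - S_rb @ np.linalg.inv(S_bb) @ S_rb.T
         - S_rs @ np.linalg.inv(S_ss) @ S_rs.T)

det_full = np.linalg.det(Sigma)
det_factored = np.linalg.det(schur) * np.linalg.det(S_bb) * np.linalg.det(S_ss)
print(det_full, det_factored)
```

Because det(T) = 1 and the transformed matrix is block lower triangular, the full determinant is nonzero exactly when the Schur-complement block is nonsingular, given nonsingular β and σ² blocks.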

Proof (Proof of Theorem 3) .

By the mean value theorem, the function

\partial \log L_{n} (θ) / \partial θ

at

{\hat{θ}}_{n}

is expressed as

\frac{\partial \log L_{n} (θ)}{\partial θ} = \frac{\partial^{2} \log L_{n} ({\bar{θ}}_{n})}{\partial θ \partial θ^{⊤}} (θ - {\hat{θ}}_{n}),

(A7)

where

{\bar{θ}}_{n}

is between

θ_{0}

and

{\hat{θ}}_{n}

. First, we show that

- \frac{1}{n} \frac{\partial^{2} \log L_{n} ({\bar{θ}}_{n})}{\partial θ \partial θ^{⊤}} converges in probability to - \frac{1}{n} \frac{\partial^{2} \log L_{n} (θ_{0})}{\partial θ \partial θ^{⊤}}

(A8)

Assumption A5 implies that

G_{1} ({\bar{ρ}}_{n}), \dots, G_{m} ({\bar{ρ}}_{n})

are uniformly bounded in row and column sums uniformly in a neighborhood of

ρ_{0}

. For

k, l = 1, \dots, m

, from (24), we have

\frac{1}{n} \frac{\partial^{2} \log L_{n} ({\bar{θ}}_{n})}{\partial ρ_{k} \partial ρ_{l}} - \frac{1}{n} \frac{\partial^{2} \log L_{n} (θ_{0})}{\partial ρ_{k} \partial ρ_{l}} = - \frac{1}{n} \{tr [G_{k} ({\bar{ρ}}_{n}) G_{l} ({\bar{ρ}}_{n})] - tr [G_{k} (ρ_{0}) G_{l} (ρ_{0})]\} + \frac{1}{n} [\frac{1}{{\bar{σ}}_{n}^{2}} - \frac{1}{σ_{0}^{2}}] Y_{n}^{⊤} W_{k}^{⊤} W_{l} Y_{n} = - \frac{1}{n} h^{⊤} ({\dot{ρ}}_{n}) ({\bar{ρ}}_{n} - ρ_{0}) + \frac{1}{n} [\frac{1}{{\bar{σ}}_{n}^{2}} - \frac{1}{σ_{0}^{2}}] Y_{n}^{⊤} W_{k}^{⊤} W_{l} Y_{n} = o_{p} (1),

where the mean value theorem is used, the vector

h (ρ) = {(h_{1} (ρ), \dots, h_{m} (ρ))}^{⊤}

with

h_{i} (ρ) = tr [G_{k} (ρ) G_{i} (ρ) G_{l} (ρ)] + tr [G_{k} (ρ) G_{l} (ρ) G_{i} (ρ)]

and

{\dot{ρ}}_{n}

is between

{\bar{ρ}}_{n}

and

ρ_{0}

. The final equality holds because

h_{i} ({\dot{ρ}}_{n}) = O (n / d_{n})

and

Y_{n}^{⊤} W_{k}^{⊤} W_{l} Y_{n} = O_{p} (n / d_{n})

. As the other terms of the second-order derivatives in (3.8) can be analyzed similarly, the result (A8) holds.

Second, based on the expressions (24)–(26) and the facts that

X_{n}^{⊤} G_{k} ϵ_{n} / n = o_{p} (1)

from Assumptions A1–A7 and

(1 / n) [ϵ_{n}^{⊤} G_{k}^{⊤} G_{l} ϵ_{n} - σ_{0}^{2} tr (G_{k}^{⊤} G_{l})] = o_{p} (1)

, we obtain

- \frac{1}{n} \frac{\partial^{2} \log L_{n} (θ_{0})}{\partial θ \partial θ^{⊤}} converges in probability to - E [\frac{1}{n} \frac{\partial^{2} \log L_{n} (θ_{0})}{\partial θ \partial θ^{⊤}}] .

It follows from Theorem 2 that the average Hessian matrix

Σ_{θ}

is nonsingular for large enough n. Hence,

\partial^{2} \log L_{n} ({\bar{θ}}_{n}) / \partial θ \partial θ^{⊤}

in the neighborhood

N_{ε} (θ_{0})

is nonsingular. It follows from (A7) that

\sqrt{n} ({\hat{θ}}_{n} - θ_{0}) = - {[\frac{1}{n} \frac{\partial^{2} \log L_{n} ({\bar{θ}}_{n})}{\partial θ \partial θ^{⊤}}]}^{- 1} \frac{1}{\sqrt{n}} \frac{\partial \log L_{n} (θ_{0})}{\partial θ} .

Third, the components of

(1 / \sqrt{n}) \partial \log L_{n} (θ_{0}) / \partial θ

are linear or quadratic functions of

ϵ_{n}

. With the existence of high-order moments of

ϵ

in Assumption A1, the central limit theorem for linear quadratic forms of Kelejian and Prucha [] can be applied and

\frac{1}{\sqrt{n}} \frac{\partial \log L_{n} (θ_{0})}{\partial θ} converges in distribution to N (0, Σ_{θ} + Ω_{θ}) .

Finally, it follows from Slutsky’s theorem that

\sqrt{n} ({\hat{θ}}_{n} - θ_{0}) converges in distribution to N (0, Σ_{θ}^{- 1} + Σ_{θ}^{- 1} Ω_{θ} Σ_{θ}^{- 1}) .

If

ϵ_{n}

is normally distributed, then

Ω_{θ} = 0

, implying that

\sqrt{n} ({\hat{θ}}_{n} - θ_{0})

converges in distribution to the multivariate normal

N (0, Σ_{θ}^{- 1})

. Therefore, the proof is complete. □
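The limiting covariance obtained above has a sandwich form that is simple to evaluate once Σ_θ and Ω_θ are available. The snippet below is a small sketch with made-up 2 × 2 matrices (assumptions chosen only for illustration); the function name `qml_asymptotic_covariance` is hypothetical.

```python
import numpy as np

def qml_asymptotic_covariance(sigma_theta, omega_theta):
    """Return Sigma^{-1} + Sigma^{-1} Omega Sigma^{-1}, the limiting covariance of the QMLE."""
    sigma_inv = np.linalg.inv(sigma_theta)
    return sigma_inv + sigma_inv @ omega_theta @ sigma_inv

# Hypothetical inputs (assumption): a positive definite Sigma and a
# positive semidefinite Omega capturing the non-normality correction.
sigma_theta = np.array([[2.0, 0.3], [0.3, 1.0]])
omega_theta = np.array([[0.2, 0.0], [0.0, 0.1]])

cov_qml = qml_asymptotic_covariance(sigma_theta, omega_theta)
cov_ml = qml_asymptotic_covariance(sigma_theta, np.zeros((2, 2)))  # normal errors: Omega = 0
print(cov_qml)
print(cov_ml)
```

With Ω_θ = 0 the expression collapses to Σ_θ^{-1}, matching the normal-error case stated in the proof.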

Proof (Proof of Theorem 4) .

From (16) and (22), we find

Δ_{n} (ρ) \equiv \frac{d_{n}}{n} [\log L_{n} (ρ) - Q_{n} (ρ)] = - \frac{d_{n}}{2} [\log {\tilde{σ}}_{n}^{2} (ρ) - \log {\overset{˘}{σ}}_{n}^{2} (ρ)] .

By the mean value theorem,

Δ_{n} (ρ) - Δ_{n} (ρ_{0}) = - \frac{d_{n}}{2} \frac{\partial [\log {\tilde{σ}}_{n}^{2} (\bar{ρ}) - \log {\overset{˘}{σ}}_{n}^{2} (\bar{ρ})]}{\partial ρ^{⊤}} (ρ - ρ_{0}) = \frac{1}{{\tilde{σ}}^{2} (\bar{ρ})} \frac{d_{n}}{n} (F_{1} (\bar{ρ}), \dots, F_{m} (\bar{ρ})) (ρ - ρ_{0}),

where

\bar{ρ}

is between

ρ

and

ρ_{0}

and

F_{k} (ρ) = B_{k} (ρ) - \frac{{\tilde{σ}}^{2} (\bar{ρ}) - {\overset{˘}{σ}}^{2} (\bar{ρ})}{{\overset{˘}{σ}}^{2} (\bar{ρ})} A_{k} (ρ)

with

A_{k} (ρ) = \sum_{j = 1}^{m} (ρ_{j 0} - ρ_{j}) {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) G_{j} X_{n} β_{0} + σ_{0}^{2} tr [G_{k}^{⊤} S (ρ) S_{n}^{- 1}]

and

B_{k} (ρ) = Y_{n}^{⊤} W_{k}^{⊤} (I - P_{X_{n}}) S (ρ) Y_{n} - A_{k} (ρ) .

We must show that (i)

Δ_{n} (ρ) - Δ_{n} (ρ_{0})

converges in probability to zero uniformly on

Γ

.

Note that, by the law of large numbers for quadratic forms,

\frac{d_{n}}{n} [ϵ_{n}^{⊤} (I - P_{X_{n}}) G_{k} ϵ_{n} - σ_{0}^{2} tr (G_{k})] = o_{p} (1) and \frac{d_{n}}{n} [ϵ_{n}^{⊤} G_{k}^{⊤} (I - P_{X_{n}}) G_{l} ϵ_{n} - σ_{0}^{2} tr (G_{k}^{⊤} G_{l})] = o_{p} (1),

and, by Condition 7,

\frac{d_{n}}{n} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) G_{l} ϵ_{n} = o_{p} (1) and \frac{d_{n}}{n} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) ϵ_{n} = o_{p} (1) .

Thus, we obtain

\frac{d_{n}}{n} B_{k} = \frac{d_{n}}{n} [{(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) ϵ_{n} + \sum_{j = 1}^{m} (ρ_{j 0} - ρ_{j}) {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) G_{j} ϵ_{n} + \sum_{j = 1}^{m} (ρ_{j 0} - ρ_{j}) {(G_{j} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) G_{k} ϵ_{n} + \sum_{j = 1}^{m} (ρ_{j 0} - ρ_{j}) ϵ_{n}^{⊤} G_{k}^{⊤} (I - P_{X_{n}}) G_{j} ϵ_{n} - σ_{0}^{2} tr (G_{k}^{⊤}) - σ_{0}^{2} \sum_{j = 1}^{m} (ρ_{j 0} - ρ_{j}) tr (G_{k}^{⊤} G_{j})] = o_{p} (1),

uniformly on

Γ

.

(d_{n} / n) A_{k} (ρ)

is

O (1)

uniformly on

Γ

. From (A3) in Theorem 1,

{\tilde{σ}}_{n}^{2} (ρ) - {\overset{˘}{σ}}^{2} (ρ) = o_{p} (1)

uniformly on

Γ

. Moreover,

{\tilde{σ}}^{2} (\bar{ρ})

and

{\overset{˘}{σ}}^{2} (\bar{ρ})

are bounded (away from zero) in probability. Therefore,

\frac{d_{n}}{n} \{\log L_{n} (ρ) - Q_{n} (ρ) - [\log L_{n} (ρ_{0}) - Q_{n} (ρ_{0})]\}

converges in probability to zero uniformly on

Γ

, which proves (i).

Next, we need to show (ii), the identification uniqueness condition: for any

ε > 0

,

\underset{n \to \infty}{lim sup} \{max_{ρ \in N_{ε} (ρ_{0})} \frac{d_{n}}{n} [Q_{n} (ρ) - Q_{n} (ρ_{0})]\} < 0,

where

N_{ε} (ρ_{0})

is the complement of an open neighborhood of

ρ_{0}

in

Γ

of radius

ε

.

An argument similar to that in part (ii) of the proof of Theorem 1 is adopted. Note that

d_{n} [Q_{n} (ρ) - Q_{n} (ρ_{0})] / n

can be divided into two parts:

\frac{d_{n}}{n} [Q_{n} (ρ) - Q_{n} (ρ_{0})] = \frac{d_{n}}{n} \{E [\log f_{n} (z_{n} | ρ)] - E [\log f_{n} (z_{n} | ρ_{0})]\} - \frac{d_{n}}{2} [\log {\overset{˘}{σ}}_{n}^{2} (ρ) - \log σ_{n}^{2} (ρ)],

where

f_{n} (z_{n} | ρ)

is the density function of a random vector following the multivariate normal

N (0, σ_{n}^{2} (ρ) S {(ρ)}^{- 1} S^{⊤} {(ρ)}^{- 1})

and

E

is the expectation with respect to the probability density function

f_{n} (z_{n} | ρ_{0})

. For the first part, consider the following function

g_{1} (z_{n} | ρ) = {[\frac{f (z_{n} | ρ)}{f (z_{n} | ρ_{0})}]}^{\frac{d_{n}}{n}} .

For

ρ \neq ρ_{0}

, it follows from Jensen’s inequality that

E [- \log g_{1} (z_{n} | ρ)] \geq - \log E [g_{1} (z_{n} | ρ)] \geq - \log {[E (\frac{f (z_{n} | ρ)}{f (z_{n} | ρ_{0})})]}^{\frac{d_{n}}{n}} = - \log {[\int \frac{f (z_{n} | ρ)}{f (z_{n} | ρ_{0})} f (z_{n} | ρ_{0}) d z_{n}]}^{\frac{d_{n}}{n}} = 0,

implying that

\frac{d_{n}}{n} E [\log f_{n} (z_{n} | ρ)] \leq \frac{d_{n}}{n} E [\log f_{n} (z_{n} | ρ_{0})] for ρ \neq ρ_{0} .

(A9)

For the second part, it is clear from (21) that

\log σ_{n}^{2} (ρ) \leq \log {\overset{˘}{σ}}_{n}^{2} (ρ) .

(A10)

It follows that

\frac{d_{n}}{n} [Q_{n} (ρ) - Q_{n} (ρ_{0})] \leq 0 .

Under Condition 7, strict inequality (A10) holds, implying that

d_{n} Q_{n} (ρ) / n < d_{n} Q_{n} (ρ_{0}) / n

and hence the identification uniqueness condition (ii) holds. When Condition 7 is violated, inequality (A10) may become an equality. In that case,

d_{n} Q_{n} (ρ) / n < d_{n} Q_{n} (ρ_{0}) / n

holds only if the strict inequality in (A9) holds. Since

- \log (x)

and

x^{d_{n} / n}

(

d_{n} < n

from Assumption A3) are strictly convex, a sufficient condition guaranteeing the strict inequality in (A9) is

P (\{z_{n} : g_{1} (z_{n} | ρ) \neq 1 for ρ \neq ρ_{0}\}) > 0

or

P (\{z_{n} : \frac{d_{n}}{n} \log f_{n} (z_{n} | ρ) - \frac{d_{n}}{n} \log f_{n} (z_{n} | ρ_{0}) \neq 0 for ρ \neq ρ_{0}\}) > 0,

which is true under Condition 8. Therefore,

d_{n} Q_{n} (ρ) / n < d_{n} Q_{n} (ρ_{0}) / n

, and then the identification uniqueness condition (ii) also holds.

Combining parts (i) and (ii) proves the consistency. □

Proof (Proof of Theorem 5) .

To derive the limiting (here, asymptotically normal) distribution of

{\hat{ρ}}_{n}

with

\sqrt{n / d_{n}}

-rate of convergence, the argument is divided into four steps.

First step: The first- and second-order derivatives of the concentrated log-likelihood are derived as

\frac{\partial L_{n} (ρ)}{\partial ρ_{k}} = \frac{1}{{\tilde{σ}}_{n}^{2} (ρ)} Y_{n}^{⊤} W_{k}^{⊤} (I - P_{X_{n}}) S (ρ) Y_{n} - tr [S {(ρ)}^{- 1} W_{k}] for k = 1, \dots, m,

and

\frac{\partial^{2} L_{n} (ρ)}{\partial ρ_{k} \partial ρ_{l}} = \frac{2}{n {\tilde{σ}}_{n}^{4} (ρ)} Y_{n}^{⊤} W_{k}^{⊤} (I - P_{X_{n}}) S (ρ) Y_{n} Y_{n}^{⊤} W_{l}^{⊤} (I - P_{X_{n}}) S (ρ) Y_{n} - \frac{1}{{\tilde{σ}}_{n}^{2} (ρ)} Y_{n}^{⊤} W_{k}^{⊤} (I - P_{X_{n}}) W_{l} Y_{n} - tr [S {(ρ)}^{- 1} W_{k} S {(ρ)}^{- 1} W_{l}] for k, l = 1, \dots, m,

where

{\tilde{σ}}_{n}^{2} (ρ) = (1 / n) Y_{n}^{⊤} S {(ρ)}^{⊤} (I - P_{X_{n}}) S (ρ) Y_{n}

. For the pure SAR process,

β_{0} = 0

and

P_{X_{n}} = 0

and the corresponding derivatives are similar with

(I - P_{X_{n}})

replaced by the identity I.
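The first-order derivative in this step can be sanity-checked against finite differences. The sketch below is illustrative only and rests on assumptions: it takes the concentrated log-likelihood, up to an additive constant, to be -(n/2) log σ̃_n²(ρ) + log |S(ρ)|, and uses a hypothetical simulated data set with two weights blocks `W1` and `W2`.

```python
import numpy as np

def residual_maker(X):
    """I - P_X, the projection onto the orthogonal complement of col(X)."""
    return np.eye(X.shape[0]) - X @ np.linalg.solve(X.T @ X, X.T)

def concentrated_loglik(rho, Y, X, weights):
    """Assumed form (constant dropped): -(n/2) log sigma_tilde^2(rho) + log|S(rho)|."""
    n = len(Y)
    S = np.eye(n) - sum(r * W for r, W in zip(rho, weights))
    SY = S @ Y
    sigma2 = SY @ residual_maker(X) @ SY / n
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * n * np.log(sigma2) + logdet

def analytic_score(rho, Y, X, weights):
    """(1/sigma_tilde^2) Y' W_k' (I - P_X) S Y - tr[S^{-1} W_k], for each k."""
    n = len(Y)
    S = np.eye(n) - sum(r * W for r, W in zip(rho, weights))
    M = residual_maker(X)
    SY = S @ Y
    sigma2 = SY @ M @ SY / n
    S_inv = np.linalg.inv(S)
    return np.array([(W @ Y) @ M @ SY / sigma2 - np.trace(S_inv @ W) for W in weights])

def numeric_score(rho, Y, X, weights, h=1e-6):
    """Central finite differences of the concentrated log-likelihood."""
    grad = np.zeros(len(rho))
    for k in range(len(rho)):
        step = np.zeros(len(rho))
        step[k] = h
        grad[k] = (concentrated_loglik(rho + step, Y, X, weights)
                   - concentrated_loglik(rho - step, Y, X, weights)) / (2 * h)
    return grad

# Hypothetical simulated data (assumption).
rng = np.random.default_rng(2)
n = 12
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
mask = rng.random((n, n)) < 0.5
weights = [W * mask, W * (~mask)]
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
rho_true = np.array([0.3, 0.1])
S0 = np.eye(n) - sum(r * Wk for r, Wk in zip(rho_true, weights))
Y = np.linalg.solve(S0, X @ np.array([1.0, 2.0]) + rng.standard_normal(n))

rho_eval = np.array([0.2, 0.25])
score_exact = analytic_score(rho_eval, Y, X, weights)
score_fd = numeric_score(rho_eval, Y, X, weights)
print(score_exact, score_fd)
```

Agreement between the two vectors supports the stated gradient formula under the assumed form of the concentrated log-likelihood.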

Second step: Under Condition 7 or 8, we have

\frac{d_{n}}{n} Y_{n}^{⊤} W_{k}^{⊤} (I - P_{X_{n}}) W_{l} Y_{n} = \frac{d_{n}}{n} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{l} X_{n} β_{0}) + \frac{d_{n}}{n} ϵ_{n}^{⊤} G_{k}^{⊤} (I - P_{X_{n}}) G_{l} ϵ_{n} + o_{p} (1)

and

\frac{d_{n}}{n} Y_{n}^{⊤} W_{k}^{⊤} (I - P_{X_{n}}) S (ρ) Y_{n} = \frac{d_{n}}{n} ϵ_{n}^{⊤} G_{k}^{⊤} (I - P_{X_{n}}) ϵ_{n} + \sum_{j = 1}^{m} (ρ_{j 0} - ρ_{j}) [\frac{d_{n}}{n} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{j} X_{n} β_{0}) + \frac{d_{n}}{n} ϵ_{n}^{⊤} G_{k}^{⊤} (I - P_{X_{n}}) G_{j} ϵ_{n}] + o_{p} (1) .

When

{lim}_{n \to \infty} d_{n} = \infty

, we have

(\sqrt{d_{n}} / n) Y_{n}^{⊤} W_{k}^{⊤} (I - P_{X_{n}}) S (ρ) Y_{n} = o_{p} (1)

and

{\tilde{σ}}_{n}^{2} (ρ) = σ_{0}^{2} + o_{p} (1)

uniformly on

Γ

. Then,

\frac{d_{n}}{n} \frac{\partial^{2} L_{n} (ρ)}{\partial ρ_{k} \partial ρ_{l}} = - \frac{1}{σ_{0}^{2}} [\frac{d_{n}}{n} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{l} X_{n} β_{0}) + \frac{d_{n}}{n} ϵ_{n}^{⊤} G_{k}^{⊤} (I - P_{X_{n}}) G_{l} ϵ_{n}] - \frac{d_{n}}{n} tr [S {(ρ)}^{- 1} W_{k} S {(ρ)}^{- 1} W_{l}] + o_{p} (1) .

Under Assumption A6,

(d_{n} / n) tr [G_{k} (ρ) G_{i} (ρ) G_{l} (ρ)] = O_{p} (1)

uniformly on

Γ

. By the Taylor expansion, we have

\frac{d_{n}}{n} [\frac{\partial^{2} L_{n} (\bar{ρ})}{\partial ρ_{k} \partial ρ_{l}} - \frac{\partial^{2} L_{n} (ρ_{0})}{\partial ρ_{k} \partial ρ_{l}}] = - \frac{d_{n}}{n} \{tr [S {(ρ)}^{- 1} W_{k} S {(ρ)}^{- 1} W_{l}] - tr [G_{k} G_{l}]\} + o_{p} (1) = - \frac{d_{n}}{n} \{tr [G_{k} ({\bar{ρ}}_{n}) G_{l} ({\bar{ρ}}_{n})] - tr [G_{k} G_{l}]\} + o_{p} (1) = - \frac{d_{n}}{n} h^{⊤} ({\bar{ρ}}_{n}) ({\bar{ρ}}_{n} - ρ_{0}) + o_{p} (1) \overset{P}{⟶} 0,

where the mean value theorem is used again,

\bar{ρ}

is any consistent estimator of

ρ_{0}

and the vector

h (ρ) = {(h_{1} (ρ), \dots, h_{m} (ρ))}^{⊤}

with

h_{i} (ρ) = tr [G_{i} (ρ) G_{k} (ρ) G_{l} (ρ)] + tr [G_{k} (ρ) G_{i} (ρ) G_{l} (ρ)]

.

Define

F_{k l} (ρ_{0}) = - \frac{1}{σ_{0}^{2}} [{(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{l} X_{n} β_{0}) + ϵ_{n}^{⊤} G_{k}^{⊤} (I - P_{X_{n}}) G_{l} ϵ_{n}] - tr (G_{k} G_{l}) .

Then,

E [F_{k l} (ρ_{0})] = - \frac{1}{σ_{0}^{2}} [{(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{l} X_{n} β_{0})] - [tr (G_{k}^{⊤} G_{l}) + tr (G_{k} G_{l})] + o_{p} (1) .

Since

\frac{d_{n}}{n} [ϵ_{n}^{⊤} G_{k}^{⊤} (I - P_{X_{n}}) G_{l} ϵ_{n} - σ_{0}^{2} tr (G_{k}^{⊤} (I - P_{X_{n}}) G_{l})] = o_{p} (1),

we find

\frac{d_{n}}{n} \{F_{k l} (ρ_{0}) - E [F_{k l} (ρ_{0})]\} = - \frac{1}{σ_{0}^{2}} \frac{d_{n}}{n} [ϵ_{n}^{⊤} G_{k}^{⊤} (I - P_{X_{n}}) G_{l} ϵ_{n} - σ_{0}^{2} tr (G_{k}^{⊤} (I - P_{X_{n}}) G_{l})] + o (1) = o_{p} (1)

and then

\frac{d_{n}}{n} \frac{\partial^{2} L_{n} (ρ_{0})}{\partial ρ_{k} \partial ρ_{l}} = \frac{d_{n}}{n} F_{k l} (ρ_{0}) + o_{p} (1) = \frac{d_{n}}{n} E [F_{k l} (ρ_{0})] + o_{p} (1) = - \frac{1}{σ_{0}^{2}} [\frac{d_{n}}{n} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{l} X_{n} β_{0})] - \frac{d_{n}}{n} [tr (G_{k}^{⊤} G_{l}) + tr (G_{k} G_{l})] + o_{p} (1) \equiv - b_{k l} + o_{p} (1) .

By Condition 7 or 8, the average Hessian matrix or information matrix under normality

Σ_{ρ} = - lim_{n \to \infty} E_{ρ} (\begin{matrix} \frac{d_{n}}{n} \frac{\partial^{2} L_{n} (ρ_{0})}{\partial ρ_{k} \partial ρ_{l}} \end{matrix}) = lim_{n \to \infty} {(b_{k l})}_{m \times m} > 0

is nonsingular.

Third step: Considering

\sqrt{\frac{d_{n}}{n}} \frac{\partial L_{n} (ρ_{0})}{\partial ρ}

, we have

\sqrt{\frac{d_{n}}{n}} \frac{\partial L_{n} (ρ_{0})}{\partial ρ_{k}} = \frac{1}{{\tilde{σ}}^{2} (ρ_{0})} \sqrt{\frac{d_{n}}{n}} [{(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) ϵ_{n} + ϵ_{n}^{⊤} C_{k}^{⊤} (I - P_{X_{n}}) ϵ_{n}] .

Let

η_{k} = {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) ϵ_{n}

and

ζ_{k} = ϵ_{n}^{⊤} C_{k}^{⊤} (I - P_{X_{n}}) ϵ_{n}

. Our task is to find the mean and covariance of

{(η_{1} + ζ_{1}, \dots, η_{m} + ζ_{m})}^{⊤}

. With routine calculations, we obtain

E (η_{k}) = 0 and E (ζ_{k}) = σ_{0}^{2} tr [C_{k}^{⊤} (I - P_{X_{n}})] = O_{p} (1)

and for any

k, l = 1, \dots, m

,

Cov [(η_{k} + ζ_{k}), (η_{l} + ζ_{l})] = E [(η_{k} + ζ_{k}) (η_{l} + ζ_{l})] - E (η_{k} + ζ_{k}) E (η_{l} + ζ_{l}) = E (η_{k} η_{l} + η_{k} ζ_{l} + ζ_{k} η_{l} + ζ_{k} ζ_{l}) - E (ζ_{k}) E (ζ_{l})

with

E (η_{k} η_{l}) = E [{(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) ϵ_{n} {(G_{l} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) ϵ_{n}] = σ_{0}^{2} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{l} X_{n} β_{0}),

E (η_{k} ζ_{l}) = E [{(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) ϵ_{n} ϵ_{n}^{⊤} C_{l}^{⊤} (I - P_{X_{n}}) ϵ_{n}] = μ_{3} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) c_{l}

where

c_{l}

is the vector whose elements are diagonal elements of

C_{l}^{⊤} (I - P_{X_{n}})

, and

E (ζ_{k} ζ_{l}) = E [ϵ_{n}^{⊤} C_{k}^{⊤} (I - P_{X_{n}}) ϵ_{n} ϵ_{n}^{⊤} C_{l}^{⊤} (I - P_{X_{n}}) ϵ_{n}] = (μ_{4} - 3 σ_{0}^{4}) c_{k}^{⊤} c_{l} + σ_{0}^{4} \{tr [C_{k}^{⊤} (I - P_{X_{n}})] tr [C_{l}^{⊤} (I - P_{X_{n}})] + tr [C_{k}^{⊤} (I - P_{X_{n}}) C_{l}^{⊤} (I - P_{X_{n}})] + tr [C_{l}^{⊤} (I - P_{X_{n}}) C_{k}]\} = (μ_{4} - 3 σ_{0}^{4}) c_{k}^{⊤} c_{l} + σ_{0}^{4} tr [C_{k}^{⊤} (I - P_{X_{n}})] tr [C_{l}^{⊤} (I - P_{X_{n}})] + σ_{0}^{4} \{tr ((I - P_{X_{n}}) C_{k} (I - P_{X_{n}}) C_{l}) + tr (C_{k}^{⊤} (I - P_{X_{n}}) C_{l})\} .
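The fourth-moment identity behind this expression, E[(ϵ^⊤ A ϵ)(ϵ^⊤ B ϵ)] = (μ_4 - 3σ^4) Σ_i a_{ii} b_{ii} + σ^4 [tr(A) tr(B) + tr(AB) + tr(AB^⊤)] for i.i.d. mean-zero errors, can be verified exactly in a small case. The sketch below assumes Rademacher errors (values ±1 with equal probability, so σ² = 1 and μ_4 = 1) and arbitrary hypothetical matrices `A` and `B`; in the proof, A and B play the roles of C_k^⊤(I - P_{X_n}) and C_l^⊤(I - P_{X_n}).

```python
import itertools
import numpy as np

def quad_product_moment(A, B, sigma2, mu4):
    """Closed-form E[(e'Ae)(e'Be)] for i.i.d. mean-zero errors with variance sigma2 and fourth moment mu4."""
    diag_term = (mu4 - 3.0 * sigma2 ** 2) * np.diag(A) @ np.diag(B)
    trace_term = np.trace(A) * np.trace(B) + np.trace(A @ B) + np.trace(A @ B.T)
    return diag_term + sigma2 ** 2 * trace_term

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))   # hypothetical, not necessarily symmetric
B = rng.standard_normal((n, n))

# Exact expectation under Rademacher errors: average over all 2^n sign vectors.
sign_vectors = [np.array(s) for s in itertools.product([-1.0, 1.0], repeat=n)]
exact = np.mean([(e @ A @ e) * (e @ B @ e) for e in sign_vectors])
formula = quad_product_moment(A, B, sigma2=1.0, mu4=1.0)
print(exact, formula)
```

Enumeration avoids Monte Carlo noise, so the two numbers agree up to floating-point error.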

Therefore, we obtain

(i) The mean of

\sqrt{\frac{d_{n}}{n}} \frac{\partial L_{n} (ρ_{0})}{\partial ρ}

is

o_{p} (1)

, and

(ii) The covariance of

\sqrt{\frac{d_{n}}{n}} \frac{\partial L_{n} (ρ_{0})}{\partial ρ}

is

(\frac{d_{n}}{n {\tilde{σ}}_{n}^{4} (ρ_{0})} e_{k l})

, where

e_{k l} = σ_{0}^{2} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) (G_{l} X_{n} β_{0}) + μ_{3} [{(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) c_{l} + {(G_{l} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) c_{k}] + (μ_{4} - 3 σ_{0}^{4}) c_{k}^{⊤} c_{l} + σ_{0}^{4} [tr ((I - P_{X_{n}}) C_{k} (I - P_{X_{n}}) C_{l}) + tr (C_{k}^{⊤} (I - P_{X_{n}}) C_{l})] .

Hence,

\sqrt{\frac{d_{n}}{n}} \frac{\partial L_{n} (ρ_{0})}{\partial ρ}

converges in distribution to

N (0, I_{ρ})

, where

I_{ρ} = {lim}_{n \to \infty} (\frac{d_{n}}{n {\tilde{σ}}_{n}^{4} (ρ_{0})} e_{k l})

.

Fourth step: It follows from Slutsky’s theorem that

\sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{n} - ρ_{0}) = - {(\frac{d_{n}}{n} \frac{\partial^{2} \log L_{n} (ρ_{0})}{\partial ρ \partial ρ^{⊤}})}^{- 1} \sqrt{\frac{d_{n}}{n}} \frac{\partial \log L_{n} (ρ_{0})}{\partial ρ}

converges in distribution to

N (0, Σ_{ρ}^{- 1} I_{ρ} Σ_{ρ}^{- 1})

, which completes the proof of the desired result. □

Proof (Proof of Theorem 6) .

To derive the limiting distribution of

{\hat{β}}_{n}

with

\sqrt{n / d_{n}}

-rate of convergence, it follows from (15) that

\sqrt{\frac{n}{d_{n}}} ({\hat{β}}_{n} - β_{0}) = \sqrt{\frac{n}{d_{n}}} [{(X_{n}^{⊤} X_{n})}^{- 1} X_{n}^{⊤} S (\hat{ρ}) Y_{n} - β_{0}] = \sqrt{\frac{1}{d_{n}}} {(\frac{X_{n}^{⊤} X_{n}}{n})}^{- 1} \frac{X_{n}^{⊤} ϵ_{n}}{\sqrt{n}} - \sum_{k = 1}^{m} \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{k} - ρ_{k 0}) {(\frac{X_{n}^{⊤} X_{n}}{n})}^{- 1} \frac{X_{n}^{⊤} (G_{k} X_{n} β_{0})}{n} - \frac{1}{\sqrt{n}} \sum_{k = 1}^{m} \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{k} - ρ_{k 0}) {(\frac{X_{n}^{⊤} X_{n}}{n})}^{- 1} \frac{X_{n}^{⊤} G_{k} ϵ_{n}}{\sqrt{n}} = - {(\frac{X_{n}^{⊤} X_{n}}{n})}^{- 1} (\begin{matrix} \frac{X_{n}^{⊤} (G_{1} X_{n} β_{0})}{n} & \dots & \frac{X_{n}^{⊤} (G_{m} X_{n} β_{0})}{n} \end{matrix}) \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{n} - ρ_{0}) + O_{p} (d_{n}^{- 1 / 2}) .

This is a linear function of the random vector

\sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{n} - ρ_{0})

, and higher-order infinitesimals. The limiting distribution (29) of

{\hat{β}}_{n}

follows. When

β_{0} = 0

, we have

\sqrt{n} {\hat{β}}_{n} = \sqrt{n} {(X_{n}^{⊤} X_{n})}^{- 1} X_{n}^{⊤} S (\hat{ρ}) Y_{n} = {(\frac{X_{n}^{⊤} X_{n}}{n})}^{- 1} \frac{X_{n}^{⊤} ϵ_{n}}{\sqrt{n}} - \sqrt{\frac{d_{n}}{n}} \sum_{k = 1}^{m} \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{k} - ρ_{k 0}) {(\frac{X_{n}^{⊤} X_{n}}{n})}^{- 1} \frac{X_{n}^{⊤} G_{k} ϵ_{n}}{\sqrt{n}} = {(\frac{X_{n}^{⊤} X_{n}}{n})}^{- 1} \frac{X_{n}^{⊤} ϵ_{n}}{\sqrt{n}} + O_{p} (\sqrt{d_{n} / n}) \overset{D}{⟶} N (0, σ_{0}^{2} {(lim_{n \to \infty} \frac{1}{n} X_{n}^{⊤} X_{n})}^{- 1}) .

To derive the limiting distribution of

{\hat{σ}}_{n}

with

\sqrt{n}

-rate of convergence, it follows from (16) that

\sqrt{n} ({\tilde{σ}}_{n}^{2} - σ_{0}^{2}) = \sqrt{n} [\frac{1}{n} Y_{n}^{⊤} S {({\hat{ρ}}_{n})}^{⊤} (I - P_{X_{n}}) S ({\hat{ρ}}_{n}) Y_{n} - σ_{0}^{2}] = \frac{1}{\sqrt{n}} {(X_{n} β_{0} + ϵ_{n})}^{⊤} (I - \sum_{k = 1}^{m} ({\hat{ρ}}_{k} - ρ_{k 0}) G_{k}^{⊤}) (I - P_{X_{n}}) \times (I - \sum_{l = 1}^{m} ({\hat{ρ}}_{l} - ρ_{l 0}) G_{l}) (X_{n} β_{0} + ϵ_{n}) - \sqrt{n} σ_{0}^{2} .

The detailed decomposition is as follows:

\sqrt{n} ({\tilde{σ}}_{n}^{2} - σ_{0}^{2}) = \frac{1}{\sqrt{n}} (ϵ_{n}^{⊤} ϵ_{n} - n σ_{0}^{2}) - \frac{1}{\sqrt{n}} ϵ_{n}^{⊤} P_{X_{n}} ϵ_{n} - \frac{2}{\sqrt{d_{n}}} \sum_{k = 1}^{m} \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{k} - ρ_{k 0}) \frac{d_{n}}{n} ϵ_{n}^{⊤} G_{k}^{⊤} (I - P_{X_{n}}) ϵ_{n} - \frac{2}{\sqrt{n}} \sum_{k = 1}^{m} \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{k} - ρ_{k 0}) \sqrt{\frac{d_{n}}{n}} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) ϵ_{n} + \frac{1}{\sqrt{n}} \sum_{k, l = 1}^{m} \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{k} - ρ_{k 0}) \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{l} - ρ_{l 0}) \frac{d_{n}}{n} ϵ_{n}^{⊤} G_{k}^{⊤} (I - P_{X_{n}}) G_{l} ϵ_{n} + \frac{2}{\sqrt{n}} \sum_{k, l = 1}^{m} \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{k} - ρ_{k 0}) \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{l} - ρ_{l 0}) \frac{d_{n}}{n} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) G_{l} ϵ_{n} + \frac{1}{\sqrt{n}} \sum_{k, l = 1}^{m} \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{k} - ρ_{k 0}) \sqrt{\frac{n}{d_{n}}} ({\hat{ρ}}_{l} - ρ_{l 0}) \frac{d_{n}}{n} {(G_{k} X_{n} β_{0})}^{⊤} (I - P_{X_{n}}) G_{l} X_{n} β_{0} = \sum_{i = 1}^{n} \frac{ϵ_{i}^{2} - σ_{0}^{2}}{\sqrt{n}} + o_{p} (1) \overset{D}{⟶} N (0, μ_{4} - σ_{0}^{4}) .

Therefore, we have proven the desired results. □

Appendix B. Counterexample of Inconsistent QML Estimators

Let us consider the following example

W_{2 n} = \frac{1}{n} (1_{2 n} 1_{2 n}^{⊤} - I_{2 n}) .

In this case, each individual is influenced equally by all of its neighbors and influences all of its neighbors equally. The number of neighbors is

n - 1

. Here,

d_{2 n} = O (n)

, which violates Assumption A3. Take

W_{1} = (\begin{matrix} \frac{1}{n} (1_{n} 1_{n}^{⊤} - I_{n}) & 0 \\ \frac{1}{n} 1_{n} 1_{n}^{⊤} & 0 \end{matrix}) and W_{2} = (\begin{matrix} 0 & \frac{1}{n} 1_{n} 1_{n}^{⊤} \\ 0 & \frac{1}{n} (1_{n} 1_{n}^{⊤} - I_{n}) \end{matrix}),

which satisfy the homogeneous classification condition (5). For such spatial weight matrices,

W_{1}

and

W_{2}

, we have

S (ρ) = I_{2 n} - ρ_{1} W_{1} - ρ_{2} W_{2} = (\begin{matrix} \frac{n + ρ_{1}}{n} I - \frac{ρ_{1}}{n} 1_{n} 1_{n}^{⊤} & - \frac{ρ_{2}}{n} 1_{n} 1_{n}^{⊤} \\ - \frac{ρ_{1}}{n} 1_{n} 1_{n}^{⊤} & \frac{n + ρ_{2}}{n} I - \frac{ρ_{2}}{n} 1_{n} 1_{n}^{⊤} \end{matrix}) \equiv (\begin{matrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{matrix}) .

Thus,

S {(ρ)}^{- 1} = (\begin{matrix} B_{11}^{- 1} + B_{11}^{- 1} B_{12} B_{22.1}^{- 1} B_{21} B_{11}^{- 1} & - B_{11}^{- 1} B_{12} B_{22.1}^{- 1} \\ - B_{22.1}^{- 1} B_{21} B_{11}^{- 1} & B_{22.1}^{- 1} \end{matrix}),

where

B_{11}^{- 1} = \frac{n}{n + ρ_{1}} I_{n} + \frac{ρ_{1} n}{(n + ρ_{1}) [n (1 - ρ_{1}) + ρ_{1}]} 1 1^{⊤} \equiv a_{1} I_{n} + a_{2} 1 1^{⊤}

(A11)

with

a_{1} \to 1

and

n a_{2} \to ρ_{1} / (1 - ρ_{1})

(assuming

ρ_{1} \neq 1

),

B_{22.1} = B_{22} - B_{21} B_{11}^{- 1} B_{12} = \frac{n + ρ_{2}}{n} I_{n} - \frac{ρ_{2} (n + ρ_{1})}{n [n (1 - ρ_{1}) + ρ_{1}]} 1 1^{⊤},

and

B_{22.1}^{- 1} = \frac{n}{n + ρ_{2}} I_{n} + \frac{ρ_{2} (n^{2} + n ρ_{1})}{(n + ρ_{2}) [n^{2} (1 - ρ_{1} - ρ_{2}) + n (ρ_{1} + ρ_{2} - 2 ρ_{1} ρ_{2}) + ρ_{1} ρ_{2}]} 1 1^{⊤} \equiv b_{1} I_{n} + b_{2} 1 1^{⊤}

(A12)

with

b_{1} \to 1

and

n b_{2} \to ρ_{2} / (1 - ρ_{1} - ρ_{2})

(assuming

ρ_{1} + ρ_{2} \neq 1

). It follows that

G_{1} = W_{1} S {(ρ_{0})}^{- 1} = (\begin{matrix} \frac{1}{n} (1_{n} 1_{n}^{⊤} - I_{n}) [B_{11}^{- 1} + B_{11}^{- 1} B_{12} B_{22.1}^{- 1} B_{21} B_{11}^{- 1}] & - \frac{1}{n} (1_{n} 1_{n}^{⊤} - I_{n}) B_{11}^{- 1} B_{12} B_{22.1}^{- 1} \\ \frac{1}{n} 1_{n} 1_{n}^{⊤} [B_{11}^{- 1} + B_{11}^{- 1} B_{12} B_{22.1}^{- 1} B_{21} B_{11}^{- 1}] & - \frac{1}{n} 1_{n} 1_{n}^{⊤} B_{11}^{- 1} B_{12} B_{22.1}^{- 1} \end{matrix})

and

G_{2} = W_{2} S {(ρ_{0})}^{- 1} = (\begin{matrix} - \frac{1}{n} 1_{n} 1_{n}^{⊤} B_{22.1}^{- 1} B_{21} B_{11}^{- 1} & \frac{1}{n} 1_{n} 1_{n}^{⊤} B_{22.1}^{- 1} \\ - \frac{1}{n} (1_{n} 1_{n}^{⊤} - I_{n}) B_{22.1}^{- 1} B_{21} B_{11}^{- 1} & \frac{1}{n} (1_{n} 1_{n}^{⊤} - I_{n}) B_{22.1}^{- 1} \end{matrix}) .

Note that we have not made a distinction between

ρ_{0}

and

ρ

in B’s in our discussion.
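The closed forms (A11) and (A12) can be compared with a direct numerical inverse of S(ρ). The following sketch uses hypothetical values n = 6, ρ_1 = 0.3 and ρ_2 = 0.2 (assumptions chosen only for illustration) and builds W_1 and W_2 exactly as in the counterexample.

```python
import numpy as np

def closed_form_coefficients(n, rho1, rho2):
    """Coefficients a1, a2 of (A11) and b1, b2 of (A12)."""
    a1 = n / (n + rho1)
    a2 = rho1 * n / ((n + rho1) * (n * (1 - rho1) + rho1))
    b1 = n / (n + rho2)
    denom = (n ** 2 * (1 - rho1 - rho2)
             + n * (rho1 + rho2 - 2 * rho1 * rho2)
             + rho1 * rho2)
    b2 = rho2 * (n ** 2 + n * rho1) / ((n + rho2) * denom)
    return a1, a2, b1, b2

def build_counterexample(n, rho1, rho2):
    """Return S(rho), W1 and W2 for the 2n x 2n counterexample."""
    J, I, Z = np.ones((n, n)), np.eye(n), np.zeros((n, n))
    W1 = np.block([[(J - I) / n, Z], [J / n, Z]])
    W2 = np.block([[Z, J / n], [Z, (J - I) / n]])
    S = np.eye(2 * n) - rho1 * W1 - rho2 * W2
    return S, W1, W2

n, rho1, rho2 = 6, 0.3, 0.2          # hypothetical values
S, W1, W2 = build_counterexample(n, rho1, rho2)
a1, a2, b1, b2 = closed_form_coefficients(n, rho1, rho2)

J = np.ones((n, n))
B11_inv = a1 * np.eye(n) + a2 * J       # claimed inverse of the top-left block of S
B22_1_inv = b1 * np.eye(n) + b2 * J     # claimed inverse of the Schur complement

S_inv = np.linalg.inv(S)
print(np.abs(B11_inv @ S[:n, :n] - np.eye(n)).max())
print(np.abs(S_inv[n:, n:] - B22_1_inv).max())
```

The bottom-right block of S(ρ)^{-1} equals B_{22.1}^{-1} by the block-inverse formula, so both printed discrepancies should be at floating-point level.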

Appendix B.1. The Case of Violating Condition 1

As

X_{2 n}

includes two constant columns,

X_{0} = diag (1_{n}, 1_{n})

, i.e.,

X_{2 n} = (X_{0}, X_{1})

, we have

P_{X_{2 n}} = P_{X_{0}} + P_{M_{X_{0}} X_{1}},

where

M_{X_{0}} = (I - P_{X_{0}}) = diag (I - P_{1}, I - P_{1})

. Note that

I_{2 n} - P_{X_{2 n}} = I - P_{X_{0}} - P_{M_{X_{0}} X_{1}} = [I - M_{X_{0}} X_{1} {(X_{1}^{⊤} M_{X_{0}} X_{1})}^{- 1} X_{1}^{⊤}] M_{X_{0}} .
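The projection decomposition P_{X_{2n}} = P_{X_0} + P_{M_{X_0} X_1} is a standard partitioned-regression identity and is easy to confirm numerically. The sketch below assumes a hypothetical design with n = 5 and two random non-constant regressors in X_1; it also checks the equivalent factored form [I - M_{X_0} X_1 (X_1^⊤ M_{X_0} X_1)^{-1} X_1^⊤] M_{X_0}.

```python
import numpy as np

def projection(A):
    """Orthogonal projection onto the column space of A (A has full column rank)."""
    return A @ np.linalg.solve(A.T @ A, A.T)

rng = np.random.default_rng(4)
n = 5
X0 = np.kron(np.eye(2), np.ones((n, 1)))   # diag(1_n, 1_n), shape (2n, 2)
X1 = rng.standard_normal((2 * n, 2))       # hypothetical non-constant regressors
X = np.hstack([X0, X1])

I2n = np.eye(2 * n)
M0 = I2n - projection(X0)

lhs = I2n - projection(X)
rhs = M0 - projection(M0 @ X1)
factored = (I2n - M0 @ X1 @ np.linalg.inv(X1.T @ M0 @ X1) @ X1.T) @ M0
print(np.abs(lhs - rhs).max(), np.abs(lhs - factored).max())
```

Both discrepancies are at rounding level, and M_{X_0} here is block diagonal with blocks I - P_1, as used in the text.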

Thus, with

G_{k} = W_{k} S {(ρ_{0})}^{- 1}

,

(I - P_{1}) B_{11}^{- 1} B_{12} B_{22.1}^{- 1} = 0

and

(I - P_{1}) B_{22.1}^{- 1} B_{21} B_{11}^{- 1} = 0

, we obtain

(I - P_{X_{2 n}}) G_{1} X_{2 n} β_{0} = [I - M_{X_{0}} X_{1} {(X_{1}^{⊤} M_{X_{0}} X_{1})}^{- 1} X_{1}^{⊤}] (\begin{matrix} - \frac{a_{1}}{n} (I - P_{1}) & 0 \\ 0 & 0 \end{matrix}) X_{2 n} β_{0} = 0

and

(I - P_{X_{2 n}}) G_{2} X_{2 n} β_{0} = [I - M_{X_{0}} X_{1} {(X_{1}^{⊤} M_{X_{0}} X_{1})}^{- 1} X_{1}^{⊤}] (\begin{matrix} 0 & 0 \\ 0 & - \frac{b_{1}}{n} (I - P_{1}) \end{matrix}) X_{2 n} β_{0} = 0,

implying that Condition 1 is violated. In other words,

G X_{2 n} β_{0}

belongs to the column space of

X_{2 n}

, i.e.,

G X_{2 n} β_{0}

is multicollinear with

X_{2 n}

.

From (A11) and (A12), we find

G_{2} = (\begin{matrix} \frac{ρ_{1}}{n} (a_{1} + a_{2} n) (b_{1} + b_{2} n) 1_{n} 1_{n}^{⊤} & \frac{1}{n} (b_{1} + b_{2} n) 1_{n} 1_{n}^{⊤} \\ \frac{ρ_{1} (n - 1)}{n^{2}} (a_{1} + a_{2} n) (b_{1} + b_{2} n) 1_{n} 1_{n}^{⊤} & \frac{1}{n} [(b_{1} + (n - 1) b_{2}) 1_{n} 1_{n}^{⊤} - b_{1} I_{n}] \end{matrix})

(A13)

with

tr (G_{2}) = ρ_{1} (a_{1} + a_{2} n) (b_{1} + n b_{2}) + (n - 1) b_{2} \to \frac{ρ_{1} + ρ_{2}}{1 - ρ_{1} - ρ_{2}} .

(A14)

Furthermore, we obtain

G_{2} G_{2} = (\begin{matrix} [ρ_{1}^{2} a + \frac{ρ_{1} (n - 1)}{n}] a b^{2} \frac{1_{n} 1_{n}^{⊤}}{n} & [ρ_{1} a + \frac{(n - 1)}{n}] b^{2} \frac{1_{n} 1_{n}^{⊤}}{n} \\ [\frac{n - 1}{n} ρ_{1}^{2} a + \frac{ρ_{1} {(n - 1)}^{2}}{n^{2}}] a b^{2} \frac{1_{n} 1_{n}^{⊤}}{n} & [\frac{n - 1}{n} ρ_{1} a b^{2} + (b - b_{2}) (b - b_{2} - 2 b_{1} / n)] \frac{1_{n} 1_{n}^{⊤}}{n} + \frac{b_{1}^{2}}{n^{2}} I_{n} \end{matrix})

(A15)

where

a = a_{1} + n a_{2} \to 1 / (1 - ρ_{1})

and

b = b_{1} + n b_{2} \to (1 - ρ_{1}) / (1 - ρ_{1} - ρ_{2})

, with

tr (G_{2} G_{2}) = [ρ_{1}^{2} a + 2 ρ_{1} \frac{(n - 1)}{n}] a b^{2} + (b - b_{2}) (b - b_{2} - \frac{2 b_{1}}{n}) + \frac{b_{1}^{2}}{n} \to \frac{1}{{(1 - ρ_{1} - ρ_{2})}^{2}},

(A16)

implying that Condition 4 is violated. Singularity of the information matrix occurs.
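The limits in (A14) and (A16) can be observed numerically by computing G_2 = W_2 S(ρ_0)^{-1} directly for increasing n. The sketch below uses hypothetical values ρ_1 = 0.3 and ρ_2 = 0.2 (assumptions), for which the stated limits are 1 and 4.

```python
import numpy as np

def g2_traces(n, rho1, rho2):
    """Return tr(G2) and tr(G2 G2) for the 2n x 2n counterexample, with G2 = W2 S^{-1}."""
    J, I, Z = np.ones((n, n)), np.eye(n), np.zeros((n, n))
    W1 = np.block([[(J - I) / n, Z], [J / n, Z]])
    W2 = np.block([[Z, J / n], [Z, (J - I) / n]])
    S = np.eye(2 * n) - rho1 * W1 - rho2 * W2
    G2 = W2 @ np.linalg.inv(S)
    return np.trace(G2), np.trace(G2 @ G2)

rho1, rho2 = 0.3, 0.2                               # hypothetical values
limit_trace = (rho1 + rho2) / (1 - rho1 - rho2)     # limit in (A14)
limit_trace_sq = 1.0 / (1 - rho1 - rho2) ** 2       # limit in (A16)

results = {n: g2_traces(n, rho1, rho2) for n in (25, 100, 400)}
for n, (tr_g2, tr_g2g2) in results.items():
    print(n, tr_g2, tr_g2g2)
print(limit_trace, limit_trace_sq)
```

Both traces stay bounded and approach the stated limits as n grows, rather than growing proportionally to the sample size.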

For simplicity, assume that

σ_{0}^{2} = 1

is known. The log-likelihood function (10) is

\log L_{2 n} (ρ, β) = - \frac{n}{2} \log (2 π) + \log | S (ρ) | - \frac{1}{2} ϵ_{2 n}^{⊤} (ρ, β) ϵ_{2 n} (ρ, β) .

Given

ρ

, the QML estimator of

β

is

\hat{β} (ρ) = {(X_{2 n}^{⊤} X_{2 n})}^{- 1} X_{2 n}^{⊤} S (ρ) Y_{2 n}

and the concentrated log-likelihood function of

ρ

is

\log L_{2 n} (ρ) = - \frac{n}{2} \log (2 π) + \log | S (ρ) | - \frac{1}{2} [Y_{2 n}^{⊤} S {(ρ)}^{⊤} (I - P_{X_{2 n}}) S (ρ) Y_{2 n}] .

(A17)

The first-order derivative of the likelihood function (A17) with respect to

ρ_{k}

is

\frac{\partial \log L_{2 n} (ρ)}{\partial ρ_{k}} = - tr [S {(ρ)}^{- 1} W_{k}] + Y_{2 n}^{⊤} S {(ρ)}^{⊤} (I - P_{X_{2 n}}) W_{k} Y_{2 n} = - tr [S {(ρ)}^{- 1} W_{k}] + {(X_{2 n} β_{0} + ϵ_{2 n})}^{⊤} {[I + (ρ_{01} - ρ_{1}) G_{1} + (ρ_{02} - ρ_{2}) G_{2}]}^{⊤} (I - P_{X_{2 n}}) G_{k} ϵ_{2 n} = - tr [S {(ρ)}^{- 1} W_{k}] + ϵ_{2 n}^{⊤} {[I + (ρ_{01} - ρ_{1}) G_{1} + (ρ_{02} - ρ_{2}) G_{2}]}^{⊤} (I - P_{X_{2 n}}) G_{k} ϵ_{2 n} .

Thus,

\frac{\partial \log L_{2 n} (ρ_{0})}{\partial ρ_{k}} = - tr (G_{k}) + ϵ_{2 n}^{⊤} (I - P_{X_{2 n}}) G_{k} ϵ_{2 n} .

(A18)

The second-order derivative of (A17) is

\frac{\partial^{2} \log L_{2 n} (ρ)}{\partial^{2} ρ_{k}} = - tr [S {(ρ)}^{- 1} W_{k} S {(ρ)}^{- 1} W_{k}] - {(X_{2 n} β_{0} + ϵ_{2 n})}^{⊤} G_{k}^{⊤} (I - P_{X_{2 n}}) G_{k} (X_{2 n} β_{0} + ϵ_{2 n}) = - tr [S {(ρ)}^{- 1} W_{k} S {(ρ)}^{- 1} W_{k}] - ϵ_{2 n}^{⊤} G_{k}^{⊤} (I - P_{X_{2 n}}) G_{k} ϵ_{2 n} .

(A19)

It follows from the mean value theorem that

{\hat{ρ}}_{k} - ρ_{0 k} = - {[\frac{\partial^{2} \log L_{2 n} (\bar{ρ})}{\partial^{2} ρ_{k}}]}^{- 1} \frac{\partial \log L_{2 n} (ρ_{0})}{\partial ρ_{k}},

(A20)

where

{\bar{ρ}}_{k}

is between

{\hat{ρ}}_{k}

and

ρ_{0 k}

.

Assume, for the sake of contradiction, that

{\hat{ρ}}_{k}

is consistent, i.e.,

{\hat{ρ}}_{k} \overset{P}{⟶} ρ_{0 k}

. It follows from (A13), (A14), (A16), (A18) and (A19) that

\frac{\partial \log L_{2 n} (ρ_{0})}{\partial ρ_{2}} = - tr (G_{2}) - (_{1} ϵ_{n}^{⊤}, ϵ_{n}^{⊤}) (I - P_{X_{2 n}}) (\begin{matrix} 0 & 0 \\ 0 & \frac{b_{1}}{n} \end{matrix}) (\begin{matrix} _{1} ϵ_{n} \\ ϵ_{n} \end{matrix}) = - tr (G_{2}) - \frac{b_{1}}{n} ϵ_{n}^{⊤} ϵ_{n} + \frac{b_{1}}{n} \frac{ϵ_{2 n}^{⊤} X_{2 n}}{\sqrt{2 n}} {(\frac{1}{2 n} X_{2 n}^{⊤} X_{2 n})}^{- 1} \frac{1}{\sqrt{2}} \frac{_{2} X_{n}^{⊤} ϵ_{n}}{\sqrt{n}} ⟶ - \frac{ρ_{01} + ρ_{02}}{1 - ρ_{01} - ρ_{02}} - 1 = - \frac{1}{1 - ρ_{01} - ρ_{02}},

where

ϵ_{2 n} = {(_{1} ϵ_{n}^{⊤}, ϵ_{n}^{⊤})}^{⊤}

,

X_{2 n} = {(_{1} X_{n}^{⊤},_{2} X_{n}^{⊤})}^{⊤}

and Assumption A7 is used in dealing with the third item, and

\frac{\partial^{2} \log L_{2 n} (ρ)}{\partial^{2} ρ_{2}} = - tr [S {(ρ)}^{- 1} W_{2} S {(ρ)}^{- 1} W_{2}] - ϵ_{2 n}^{⊤} G_{2}^{⊤} (I - P_{X_{2 n}}) G_{2} ϵ_{2 n} = - tr [S {(ρ)}^{- 1} W_{2} S {(ρ)}^{- 1} W_{2}] - (_{1} ϵ_{n}^{⊤}, ϵ_{n}^{⊤}) (\begin{matrix} 0 & 0 \\ 0 & \frac{b_{1}}{n} \end{matrix}) (I - P_{X_{2 n}}) (\begin{matrix} 0 & 0 \\ 0 & \frac{b_{1}}{n} \end{matrix}) (\begin{matrix} _{1} ϵ_{n} \\ ϵ_{n} \end{matrix}) = - tr [S {(ρ)}^{- 1} W_{2} S {(ρ)}^{- 1} W_{2}] - \frac{b_{1}^{2}}{n^{2}} (0, ϵ_{n}^{⊤}) (I - P_{X_{2 n}}) (\begin{matrix} 0 \\ ϵ_{n} \end{matrix}) = - tr [S {(ρ)}^{- 1} W_{2} S {(ρ)}^{- 1} W_{2}] - \frac{b_{1}^{2}}{n^{2}} ϵ_{n}^{⊤} ϵ_{n} + \frac{2 b_{1}^{2}}{n^{2}} \frac{ϵ_{n}^{⊤}_{2} X_{n}}{\sqrt{n}} {(\frac{1}{2 n} X_{2 n}^{⊤} X_{2 n})}^{- 1} \frac{{_{2} X_{n}}^{⊤} ϵ_{n}}{\sqrt{n}} \overset{P}{⟶} - \frac{1 - ρ_{1} + ρ_{1}^{2}}{{(1 - ρ_{1} - ρ_{2})}^{2}} .

It follows that

{\hat{ρ}}_{2} - ρ_{02} \overset{P}{⟶} \frac{ρ_{1} + ρ_{2} - 1}{1 - ρ_{1} + ρ_{1}^{2}} \neq 0 .

This is a contradiction, implying that

{\hat{ρ}}_{2}

is not consistent.

Appendix B.2. The Pure Heterogeneous SAR Process

For the pure heterogeneous SAR process (9),

β_{0} = 0

, implying that Condition 1 is violated. Note that Condition 4 is also violated; see Appendix B.1. The first-order derivative at

ρ_{0}

reduces to

\frac{\partial \log L_{2 n} (ρ_{0})}{\partial ρ_{2}} = - tr (G_{2}) + ϵ_{2 n}^{⊤} G_{2} ϵ_{2 n},

(A21)

following from (A13) that

ϵ_{2 n}^{⊤} G_{2} ϵ_{2 n} = ρ_{1} (a_{1} + n a_{2}) (b_{1} + n b_{2}) {(\frac{\sum_{i = 1}^{n} ϵ_{i}}{\sqrt{n}})}^{2} + (b_{1} + n b_{2}) (\frac{\sum_{i = 1}^{n} ϵ_{i}}{\sqrt{n}}) (\frac{\sum_{i = 1}^{n} ϵ_{n + i}}{\sqrt{n}}) + \frac{ρ_{1} (n - 1)}{n} (a_{1} + n a_{2}) (b_{1} + n b_{2}) (\frac{\sum_{i = 1}^{n} ϵ_{n + i}}{\sqrt{n}}) (\frac{\sum_{i = 1}^{n} ϵ_{i}}{\sqrt{n}}) + (b_{1} + (n - 1) b_{2}) {(\frac{\sum_{i = 1}^{n} ϵ_{n + i}}{\sqrt{n}})}^{2} - \frac{b_{1}}{n} \sum_{i = 1}^{n} ϵ_{n + i}^{2} \overset{D}{⟶} \frac{ρ_{01}}{1 - ρ_{01} - ρ_{02}} ξ_{1}^{2} + \frac{1}{1 - ρ_{01} - ρ_{02}} ξ_{1} ξ_{2} + \frac{1 - ρ_{01}}{1 - ρ_{01} - ρ_{02}} ξ_{2}^{2} - 1,

where

ξ_{1}

and

ξ_{2}

are two independent standard normal variables. Thus, from (A14), we find

\frac{\partial \log L_{2 n} (ρ_{0})}{\partial ρ_{2}} \overset{D}{⟶} \frac{ρ_{01}}{1 - ρ_{01} - ρ_{02}} ξ_{1}^{2} + \frac{1}{1 - ρ_{01} - ρ_{02}} (ξ_{1} ξ_{2} - 1) + \frac{1 - ρ_{01}}{1 - ρ_{01} - ρ_{02}} ξ_{2}^{2},

where

\overset{D}{⟶}

denotes convergence in distribution. The second-order derivative reduces to

\frac{\partial^{2} \log L_{2 n} (ρ)}{\partial^{2} ρ_{2}} = - tr [S {(ρ)}^{- 1} W_{2} S {(ρ)}^{- 1} W_{2}] - ϵ_{2 n}^{⊤} G_{2}^{⊤} G_{2} ϵ_{2 n},

following from (A15) that

ϵ_{2 n}^{⊤} G_{2}^{⊤} G_{2} ϵ_{2 n} = [ρ_{1}^{2} a + \frac{ρ_{1} (n - 1)}{n}] a b^{2} {(\frac{\sum_{i = 1}^{n} ϵ_{i}}{\sqrt{n}})}^{2} + [ρ_{1} a + \frac{(n - 1)}{n}] b^{2} (\frac{\sum_{i = 1}^{n} ϵ_{i}}{\sqrt{n}}) (\frac{\sum_{i = 1}^{n} ϵ_{n + i}}{\sqrt{n}}) + [\frac{n - 1}{n} ρ_{1}^{2} a + \frac{ρ_{1} {(n - 1)}^{2}}{n^{2}}] a b^{2} (\frac{\sum_{i = 1}^{n} ϵ_{n + i}}{\sqrt{n}}) (\frac{\sum_{i = 1}^{n} ϵ_{i}}{\sqrt{n}}) + [\frac{n - 1}{n} ρ_{1} a b^{2} + (b - b_{2}) (b - b_{2} - 2 b_{1} / n)] {(\frac{\sum_{i = 1}^{n} ϵ_{n + i}}{\sqrt{n}})}^{2} + \frac{b_{1}^{2}}{n} (\frac{\sum_{i = 1}^{n} ϵ_{n + i}^{2}}{n}) \overset{D}{⟶} \frac{ρ_{01}}{{(1 - ρ_{01} - ρ_{02})}^{2}} ξ_{1}^{2} + \frac{1}{{(1 - ρ_{01} - ρ_{02})}^{2}} ξ_{1} ξ_{2} + \frac{1 - ρ_{01}}{{(1 - ρ_{01} - ρ_{02})}^{2}} ξ_{2}^{2} .

Thus, with (A16), we obtain

\frac{\partial^{2} \log L_{2 n} (ρ)}{\partial^{2} ρ_{2}} \overset{D}{⟶} - \frac{ρ_{01}}{{(1 - ρ_{01} - ρ_{02})}^{2}} ξ_{1}^{2} - \frac{1}{{(1 - ρ_{01} - ρ_{02})}^{2}} (ξ_{1} ξ_{2} + 1) - \frac{1 - ρ_{01}}{{(1 - ρ_{01} - ρ_{02})}^{2}} ξ_{2}^{2} .

It follows from (A20) that

{\hat{ρ}}_{2} - ρ_{02} = - {[\frac{\partial^{2} \log L_{2 n} (\bar{ρ})}{\partial^{2} ρ_{2}}]}^{- 1} \frac{\partial \log L_{2 n} (ρ_{0})}{\partial ρ_{2}} \overset{D}{⟶} (1 - ρ_{01} - ρ_{02}) \frac{ρ_{01} ξ_{1}^{2} + ξ_{1} ξ_{2} - 1 + (1 - ρ_{01}) ξ_{2}^{2}}{ρ_{01} ξ_{1}^{2} + ξ_{1} ξ_{2} + 1 + (1 - ρ_{01}) ξ_{2}^{2}} \neq 0,

which is a contradiction, because the limit distribution of

{\hat{ρ}}_{2} - ρ_{02}

is then not degenerate at the origin, as consistency would require. Hence the estimator

{\hat{ρ}}_{2}

cannot be a consistent estimator of

ρ_{02}

.
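The non-degeneracy of this limit can be checked numerically. The following is a minimal sketch, not part of the original proof: it draws two independent standard normal variables and evaluates the limiting ratio displayed above. The function name `limit_draws` and the values ρ_{01} = 0.2 and ρ_{02} = 0.3 are assumptions chosen only for illustration.

```python
import numpy as np


def limit_draws(rho01, rho02, size, seed=0):
    """Draw from the limit of rho2_hat - rho02 given in the display above.

    Hypothetical helper: xi1 and xi2 are independent standard normals,
    and q is the quadratic form shared by numerator and denominator.
    """
    rng = np.random.default_rng(seed)
    xi1 = rng.standard_normal(size)
    xi2 = rng.standard_normal(size)
    q = rho01 * xi1**2 + xi1 * xi2 + (1.0 - rho01) * xi2**2
    return (1.0 - rho01 - rho02) * (q - 1.0) / (q + 1.0)


# Illustrative parameter values (assumed, not taken from the paper).
draws = limit_draws(rho01=0.2, rho02=0.3, size=100_000)

# Share of draws that stay away from zero; a consistent estimator
# would require this share to vanish in the limit.
share_away = float(np.mean(np.abs(draws) > 0.05))

# Quantiles are used instead of moments because the ratio can be heavy-tailed.
quartiles = np.quantile(draws, [0.25, 0.5, 0.75])

print("share with |limit| > 0.05:", share_away)
print("quartiles of the limit:", quartiles)
```

Under these assumed values, most of the simulated mass lies away from zero, which is what the contradiction argument relies on.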

References

Bell, K.P.; Bockstael, N.E. Applying the generalized-moments estimation approach to spatial problems involving microlevel data. Rev. Econ. Stat. 2000, 82, 72–82. [Google Scholar] [CrossRef]
Banerjee, S.; Gelfand, A.E.; Knight, J.R.; Sirmans, C.F. Spatial modeling of house prices using normalized distance-weighted sums of stationary processes. J. Bus. Econ. Stat. 2004, 22, 206–213. [Google Scholar] [CrossRef]
Cliff, A.D.; Ord, J.K. Spatial Autocorrelation; Pion Ltd.: London, UK, 1973. [Google Scholar]
Anselin, L. Spatial Econometrics: Methods and Models; Kluwer: Dordrecht, The Netherlands, 1988. [Google Scholar]
Cressie, N. Statistics for Spatial Data; John Wiley: New York, NY, USA, 1993. [Google Scholar]
Anselin, L.; Bera, A.K. Spatial dependence in linear regression models with an introduction to spatial econometrics. In Handbook of Applied Economics Statistics; Ullah, A., Giles, D.E.A., Eds.; Marcel Dekker: New York, NY, USA, 1998. [Google Scholar]
Elhorst, J.P. Spatial Econometrics: From Cross Sectional Data to Spatial Panels; Springer: Heidelberg, Germany; New York, NY, USA; Dordrecht, The Netherlands; London, UK, 2014. [Google Scholar]
Case, A.C. Spatial patterns in household demand. Econometrica 1991, 59, 953–965. [Google Scholar] [CrossRef]
Case, A.C.; Rosen, H.S.; Hines, J.R. Budget spillovers and fiscal policy interdependence: Evidence from the states. J. Public Econ. 1993, 52, 285–307. [Google Scholar] [CrossRef]
Besley, T.; Case, A. Incumbent behavior: Vote-seeking, tax-setting, and yard stick competition. Am. Econ. Rev. 1995, 85, 25–45. [Google Scholar]
Brueckner, J.K. Testing for strategic interaction among local governments: The case of growth controls. J. Urban Econ. 1998, 44, 438–467. [Google Scholar] [CrossRef]
Bertrand, M.; Luttmer, E.F.R.; Mullainathan, S. Network effects and welfare cultures. Q. J. Econ. 2000, 115, 1019–1055. [Google Scholar] [CrossRef]
Topa, G. Social interactions, local spillovers and unemployment. Rev. Econ. Stud. 2001, 68, 261–295. [Google Scholar] [CrossRef]
Coval, J.D.; MosKouitz, T.J. The geography of investment: Informed trading and asset prices. J. Political Econ. 2001, 109, 811–841. [Google Scholar] [CrossRef]
Druska, V.; Horrace, W.C. Generalized moments estimation for spatial panel data: Indonesian rice farming. Am. J. Agric. Econ. 2004, 86, 185–198. [Google Scholar] [CrossRef]
Frazier, C.; Kockelman, K.M. Spatial econometric models for panel data: Incorporating spatial and temporal data. Transp. Res. Rec. J. Transp. Res. Board 2005, 1902, 80–90. [Google Scholar] [CrossRef]
Baltagi, B.; Li, D. Prediction in the panel data model with spatial correlation: The case of liquor. Spat. Econ. Anal. 2006, 1, 175–185. [Google Scholar] [CrossRef]
Pirinsky, C.; Wang, Q. Does corporate headquarters location matter for stock returns? J. Financ. 2006, 61, 1991–2015. [Google Scholar] [CrossRef]
Bekaert, G.; Hodrick, R.J.; Zhang, X.Y. International stock return comovements. J. Financ. 2009, 64, 2591–2626. [Google Scholar] [CrossRef]
Robinson, P.M.; Rossi, F. Improved Lagrange multiplier tests in spatial autoregressions. Econ. J. 2014, 17, 139–154. [Google Scholar] [CrossRef]
Liu, X.; Chen, J.; Cheng, S. A penalized quasi-maximum likelihood method for variable selection in the spatial autoregressive model. Spat. Stat. 2018, 25, 86–104. [Google Scholar] [CrossRef]
Ord, K. Estimation methods for models of spatial Interaction. J. Am. Stat. Assoc. 1975, 70, 120–126. [Google Scholar] [CrossRef]
Smirnov, O.; Anselin, L. Fast maximum likelihood estimation of very large spatial autoregressive models: A characteristic polynomial approach. Comput. Stat. Data Anal. 2001, 35, 301–319. [Google Scholar] [CrossRef]
Robinson, P.M.; Rossi, F. Refinements in maximum likelihood inference on spatial autocorrelation in panel data. J. Econom. 2015, 189, 447–456. [Google Scholar] [CrossRef]
Kelejian, H.H.; Prucha, I.R. A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model. Int. Econ. Rev. 1999, 40, 509–533. [Google Scholar] [CrossRef]
Lee, L.F.; Liu, X. Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances. Econom. Theory 2010, 26, 187–230. [Google Scholar] [CrossRef] [Green Version]
Lee, L.F. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 2004, 72, 1899–1925. [Google Scholar] [CrossRef]
Yu, J.; De Jong, R.; Lee, L.F. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. J. Econom. 2008, 146, 118–134. [Google Scholar] [CrossRef]
Song, Y.; Liang, X.; Zhu, Y.; Lin, L. Robust variable selection with exponential squared loss for the spatial autoregressive model. Comput. Stat. Data Anal. 2021, 155, 107094. [Google Scholar] [CrossRef]
Ju, Y.; Yang, Y.; Hu, M.; Dai, L.; Wu, L. Bayesian Influence Analysis of the Skew-Normal Spatial Autoregression Models. Mathematics 2022, 10, 1306. [Google Scholar] [CrossRef]
Ahrens, A.; Bhattacharjee, A. Two-step Lasso estimation of the spatial weights matrix. Econometrics 2015, 3, 128–155. [Google Scholar] [CrossRef]
Lam, C.; Souza, P.C.L. Estimation and selection of spatial weight matrix in a spatial lag model. J. Bus. Econ. Stat. 2020, 38, 693–710. [Google Scholar] [CrossRef]
Clark, A.E.; Loheac, Y. “It was not me, it was them!” Social influence in risky behavior by adolescents. J. Health Econ. 2007, 26, 763–784. [Google Scholar] [CrossRef]
Mas, A.; Moretti, E. Peers at work. Am. Econ. Rev. 2009, 99, 112–145. [Google Scholar] [CrossRef]
Banerjee, A.; Chandrasekhar, A.; Duflo, E.; Jackson, M. The diffusion of microfinance. Science 2013, 341, 1236498. [Google Scholar] [CrossRef]
Dou, B.; Parrella, M.; Yao, Q. Generalized yule-walker estimation for spatio-temporal models with unknown diagonal coefficients. J. Econom. 2016, 194, 369–382. [Google Scholar] [CrossRef] [Green Version]
Peng, S. Heterogeneous endogenous effects in networks. arXiv 2019, arXiv:1908.00663. [Google Scholar]
Magnus, J.R.; Neudecker, H. Matrix Differential Calculus with Applications in Statistics and Econometrics; John Wiley & Sons: Hoboken, NJ, USA, 1991. [Google Scholar]
Horn, R.; Johnson, C. Matrix Analysis; Cambridge University Press: New York, NY, USA, 1985. [Google Scholar]
White, H. Estimation Inference and Specification Analysis; Cambridge University Press: New York, NY, USA, 1994. [Google Scholar]
Kelejian, H.H.; Prucha, I.R. On the asymptotic distribution of the Moran I test statistic with applications. J. Econom. 2001, 104, 219–257. [Google Scholar] [CrossRef] [Green Version]

Table 1. Empirical ML estimates of

θ

and CP in Model 1 with true value

θ_{0}

and

n = n_{1} n_{2}

.

		$n_{2}$	20				40				60
$n_{1}$	$θ$	$θ_{0}$	$\hat{θ}$	BIAS	RMSE	CP	$\hat{θ}$	BIAS	RMSE	CP	$\hat{θ}$	BIAS	RMSE	CP
30	$ρ_{1}$	0.20	0.1739	−0.0261	0.0998	91%	0.1905	−0.0095	0.0646	93%	0.2055	0.0055	0.0378	93%
	$ρ_{2}$	0.35	0.3363	−0.0137	0.0547	92%	0.3556	0.0056	0.0438	91%	0.3383	−0.0117	0.0312	92%
	$ρ_{3}$	0.50	0.5239	0.0239	0.0579	92%	0.4812	−0.0188	0.0441	94%	0.4882	−0.0118	0.0306	95%
	$ρ_{4}$	0.65	0.6541	0.0041	0.0352	93%	0.6541	0.0041	0.0316	92%	0.6425	−0.0075	0.0254	94%
	$ρ_{5}$	0.80	0.7935	−0.0065	0.0240	93%	0.7979	−0.0021	0.0263	93%	0.7989	−0.0011	0.0105	94%
	$β_{1}$	1	1.0152	0.0152	0.0413	96%	1.0239	0.0239	0.0451	95%	1.0081	0.0081	0.0369	93%
	$β_{2}$	1	1.0316	0.0316	0.0436	94%	0.9956	−0.0044	0.0372	95%	1.0129	0.0129	0.0358	96%
	$σ^{2}$	1	1.0112	0.0112	0.0974	93%	0.9852	−0.0148	0.0637	94%	1.0040	0.0040	0.0518	94%
50	$ρ_{1}$	0.20	0.1858	−0.0142	0.0473	92%	0.1982	−0.0018	0.0339	92%	0.1904	−0.0096	0.0289	93%
	$ρ_{2}$	0.35	0.3334	−0.0166	0.0373	93%	0.3391	−0.0109	0.0320	95%	0.3471	−0.0029	0.0248	95%
	$ρ_{3}$	0.50	0.4946	−0.0054	0.0252	92%	0.5001	0.0001	0.0214	97%	0.4964	−0.0036	0.0125	93%
	$ρ_{4}$	0.65	0.6756	0.0256	0.0243	94%	0.6416	−0.0084	0.0176	95%	0.6468	−0.0032	0.0147	97%
	$ρ_{5}$	0.80	0.7891	−0.0109	0.0113	95%	0.7999	−0.0001	0.0097	92%	0.7995	−0.0005	0.0078	95%
	$β_{1}$	1	1.0020	0.0020	0.0391	94%	0.9986	−0.0014	0.0358	93%	1.0026	0.0026	0.0324	96%
	$β_{2}$	1	1.0296	0.0296	0.0341	95%	1.0319	0.0319	0.0321	95%	1.0224	0.0224	0.0268	91%
	$σ^{2}$	1	0.9898	−0.0102	0.0579	91%	1.0061	0.0061	0.0454	93%	1.0070	0.0070	0.0333	92%
80	$ρ_{1}$	0.20	0.2056	0.0056	0.0353	92%	0.2011	0.0011	0.0292	92%	0.1998	−0.0002	0.0258	93%
	$ρ_{2}$	0.35	0.3371	−0.0129	0.0206	94%	0.3512	0.0012	0.0264	91%	0.3489	−0.0011	0.0156	95%
	$ρ_{3}$	0.50	0.4975	−0.0025	0.0241	91%	0.5003	0.0003	0.0187	93%	0.4996	−0.0004	0.0139	94%
	$ρ_{4}$	0.65	0.6547	0.0047	0.0116	92%	0.6504	0.0004	0.0153	91%	0.6497	−0.0003	0.0094	92%
	$ρ_{5}$	0.80	0.7956	−0.0044	0.0082	94%	0.7995	−0.0005	0.0088	95%	0.7998	−0.0002	0.0064	91%
	$β_{1}$	1	1.0033	0.0033	0.0274	97%	0.9966	−0.0034	0.0239	97%	1.0021	0.0021	0.0211	93%
	$β_{2}$	1	0.9876	−0.0124	0.0216	96%	0.9867	−0.0133	0.0217	95%	1.0109	0.0109	0.0194	96%
	$σ^{2}$	1	0.9942	−0.0058	0.0327	94%	0.9951	−0.0049	0.0276	93%	0.9979	−0.0021	0.0247	95%

Table 2. Empirical ML estimates of

θ

and CP in Model 2 with true value

θ_{0}

and

n = n_{1} n_{2}

.

		$n_{2}$	20				40				60
$n_{1}$	$θ$	$θ_{0}$	$\hat{θ}$	BIAS	RMSE	CP	$\hat{θ}$	BIAS	RMSE	CP	$\hat{θ}$	BIAS	RMSE	CP
30	$ρ_{1}$	0.20	0.2259	0.0259	0.0971	92%	0.2089	0.0089	0.0477	93%	0.2042	0.0042	0.0328	95%
	$ρ_{2}$	0.35	0.3323	−0.0177	0.0587	94%	0.3495	−0.0005	0.0283	95%	0.3599	0.0099	0.0282	94%
	$ρ_{3}$	0.50	0.4829	−0.0171	0.0547	93%	0.4926	−0.0074	0.0256	91%	0.5026	0.0026	0.0246	92%
	$ρ_{4}$	0.65	0.6459	−0.0041	0.0344	91%	0.6400	−0.0100	0.0246	94%	0.6443	−0.0057	0.0194	91%
	$ρ_{5}$	0.80	0.7988	−0.0012	0.0140	95%	0.7985	−0.0015	0.0102	92%	0.7973	−0.0027	0.0095	92%
	$β_{1}$	1	0.9944	−0.0056	0.0430	96%	0.9984	−0.0016	0.0408	93%	1.0013	0.0013	0.0245	93%
	$β_{2}$	1	0.9983	−0.0017	0.0363	94%	1.0022	0.0022	0.0314	91%	1.0009	0.0009	0.0215	96%
	$σ^{2}$	1	0.9771	−0.0229	0.0925	93%	0.9818	−0.0182	0.0640	94%	1.0015	0.0015	0.0544	91%
50	$ρ_{1}$	0.20	0.2078	0.0078	0.0433	93%	0.2045	0.0045	0.0327	97%	0.1939	−0.0061	0.0266	93%
	$ρ_{2}$	0.35	0.3557	0.0057	0.0316	92%	0.3403	−0.0097	0.0320	94%	0.3467	−0.0033	0.0228	95%
	$ρ_{3}$	0.50	0.5037	0.0037	0.0252	94%	0.4981	−0.0019	0.0154	93%	0.4944	−0.0056	0.0145	96%
	$ρ_{4}$	0.65	0.6445	−0.0055	0.0171	90%	0.6530	0.0030	0.0076	94%	0.6505	0.0005	0.0147	94%
	$ρ_{5}$	0.80	0.7997	−0.0003	0.0096	92%	0.8019	0.0019	0.0107	95%	0.7996	−0.0004	0.0090	95%
	$β_{1}$	1	0.9958	−0.0042	0.0306	94%	0.9986	−0.0014	0.0248	91%	0.9988	−0.0012	0.0171	96%
	$β_{2}$	1	0.9997	−0.0003	0.0314	95%	1.0004	0.0004	0.0212	98%	0.9997	−0.0003	0.0191	92%
	$σ^{2}$	1	0.9869	−0.0131	0.0516	91%	0.9863	−0.0137	0.0433	93%	1.0005	0.0005	0.0330	95%
80	$ρ_{1}$	0.20	0.1947	−0.0053	0.0338	91%	0.1994	−0.0006	0.0284	95%	0.1927	−0.0073	0.0244	92%
	$ρ_{2}$	0.35	0.3505	0.0005	0.0168	92%	0.3501	0.0001	0.0149	91%	0.3532	0.0032	0.0155	92%
	$ρ_{3}$	0.50	0.5021	0.0021	0.0195	95%	0.5008	0.0008	0.0170	96%	0.5027	0.0027	0.0117	95%
	$ρ_{4}$	0.65	0.6453	−0.0047	0.0101	96%	0.6469	−0.0031	0.0132	97%	0.6497	−0.0003	0.0082	91%
	$ρ_{5}$	0.80	0.7996	−0.0004	0.0059	93%	0.7997	−0.0003	0.0048	93%	0.8021	0.0021	0.0056	94%
	$β_{1}$	1	1.0104	0.0104	0.0267	92%	1.0008	0.0008	0.0165	91%	1.0027	0.0027	0.0115	96%
	$β_{2}$	1	0.9996	−0.0004	0.0203	93%	0.9997	−0.0003	0.0176	90%	1.0000	≤0.0001	0.0124	94%
	$σ^{2}$	1	1.0032	0.0032	0.0331	93%	1.0007	0.0007	0.0283	93%	0.9976	−0.0024	0.0243	91%

Table 3. Empirical ML estimates of

θ

and CP in Model 3 with true value

θ_{0}

and

n = n_{1} n_{2}

.

		$n_{2}$	20				40				60
$n_{1}$	$θ$	$θ_{0}$	$\hat{θ}$	BIAS	RMSE	CP	$\hat{θ}$	BIAS	RMSE	CP	$\hat{θ}$	BIAS	RMSE	CP
30	$ρ_{1}$	0.20	0.1987	−0.0013	0.0373	93%	0.2063	0.0063	0.0306	95%	0.1978	−0.0022	0.0247	93%
	$ρ_{2}$	0.35	0.3451	−0.0049	0.0317	92%	0.3504	0.0004	0.0229	94%	0.3508	0.0008	0.0147	96%
	$ρ_{3}$	0.50	0.4992	−0.0008	0.0301	92%	0.4874	−0.0126	0.0234	91%	0.4991	−0.0009	0.0174	95%
	$ρ_{4}$	0.65	0.6491	−0.0009	0.0210	93%	0.6555	0.0055	0.0181	94%	0.6525	0.0025	0.0143	94%
	$ρ_{5}$	0.80	0.7915	−0.0085	0.0189	92%	0.7975	−0.0025	0.0136	93%	0.7996	−0.0004	0.0050	95%
	$β_{1}$	1	0.9862	−0.0138	0.0284	94%	1.0000	−0.0000	0.0187	96%	0.9989	−0.0011	0.0191	96%
	$β_{2}$	1	1.0160	0.0160	0.0263	93%	0.9988	−0.0012	0.0194	92%	1.0009	0.0009	0.0185	93%
	$σ^{2}$	1	0.9534	−0.0466	0.0777	94%	0.9853	−0.0147	0.0520	93%	1.0105	0.0105	0.0396	95%
50	$ρ_{1}$	0.20	0.1881	−0.0119	0.0291	91%	0.2009	0.0009	0.0248	94%	0.1978	−0.0022	0.0190	93%
	$ρ_{2}$	0.35	0.3535	0.0035	0.0213	94%	0.3461	−0.0039	0.0192	92%	0.3516	0.0016	0.0158	93%
	$ρ_{3}$	0.50	0.4984	−0.0016	0.0224	92%	0.4963	−0.0037	0.0169	94%	0.4968	−0.0032	0.0105	96%
	$ρ_{4}$	0.65	0.6461	−0.0039	0.0140	95%	0.6451	−0.0049	0.0149	93%	0.6509	0.0009	0.0097	92%
	$ρ_{5}$	0.80	0.7999	−0.0001	0.0078	96%	0.7980	−0.0020	0.0066	92%	0.8006	0.0006	0.0029	94%
	$β_{1}$	1	1.0039	0.0039	0.0221	93%	1.0048	0.0048	0.0198	94%	1.0006	0.0006	0.0122	93%
	$β_{2}$	1	0.9965	−0.0035	0.0222	94%	0.9957	−0.0043	0.0195	91%	0.9993	−0.0007	0.0119	93%
	$σ^{2}$	1	0.9925	−0.0075	0.0477	95%	0.9913	−0.0087	0.0354	92%	1.0079	0.0079	0.0279	93%
80	$ρ_{1}$	0.20	0.1968	−0.0032	0.0204	95%	0.1988	−0.0012	0.0195	92%	0.2019	0.0019	0.0153	92%
	$ρ_{2}$	0.35	0.3555	0.0055	0.0189	94%	0.3486	−0.0014	0.0139	95%	0.3507	0.0007	0.0063	92%
	$ρ_{3}$	0.50	0.5002	0.0002	0.0164	93%	0.4991	−0.0009	0.0115	93%	0.5008	0.0008	0.0041	95%
	$ρ_{4}$	0.65	0.6515	0.0015	0.0085	94%	0.6507	0.0007	0.0052	96%	0.6510	0.0010	0.0073	91%
	$ρ_{5}$	0.80	0.7982	−0.0018	0.0115	96%	0.8009	0.0009	0.0041	97%	0.7999	−0.0001	0.0027	94%
	$β_{1}$	1	0.9962	−0.0038	0.0174	94%	0.9972	−0.0028	0.0124	93%	0.9965	−0.0035	0.0079	96%
	$β_{2}$	1	1.0042	0.0042	0.0172	93%	1.0025	0.0025	0.0121	92%	1.0036	0.0036	0.0076	94%
	$σ^{2}$	1	0.9891	−0.0109	0.0281	92%	0.9930	−0.0070	0.0212	94%	1.0023	0.0023	0.0185	95%
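
The BIAS, RMSE, and CP columns in the tables above can be read as Monte Carlo summaries over replications. The sketch below shows one conventional way to compute them, assuming that CP denotes the empirical coverage proportion of a nominal 95% normal-approximation interval. The function name `summarize`, the toy replicates, and the constant standard error are hypothetical and are not the authors' simulation design or data.

```python
import numpy as np


def summarize(estimates, std_errors, true_value, z=1.96):
    """Return (bias, rmse, coverage) over Monte Carlo replications.

    Hypothetical helper: coverage is the share of intervals
    estimate +/- z * std_error that contain true_value.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    bias = estimates.mean() - true_value
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    covered = np.abs(estimates - true_value) <= z * std_errors
    return bias, rmse, covered.mean()


# Toy replicates (assumed): 1000 estimates scattered around 0.35
# with a known standard error of 0.03. These are not the paper's results.
rng = np.random.default_rng(1)
toy_estimates = 0.35 + 0.03 * rng.standard_normal(1000)
toy_se = np.full(1000, 0.03)

bias, rmse, cp = summarize(toy_estimates, toy_se, true_value=0.35)
print(f"bias={bias:.4f}  rmse={rmse:.4f}  cp={cp:.1%}")
```

With these definitions, the squared RMSE equals the squared bias plus the variance of the estimates across replications, so a reported RMSE should not be smaller than the absolute value of the corresponding bias.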

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
