Article

Asymptotic Properties of Quasi-Maximum Likelihood Estimators for Heterogeneous Spatial Autoregressive Models

1
School of Mathematics and Computer Science, Zhejiang Agriculture and Forestry University, Hangzhou 311300, China
2
Samoyed Cloud Technology Group Holdings Limited, Shanghai 200124, China
3
School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China
*
Author to whom correspondence should be addressed.
Symmetry 2022, 14(9), 1894; https://doi.org/10.3390/sym14091894
Submission received: 8 August 2022 / Revised: 3 September 2022 / Accepted: 5 September 2022 / Published: 9 September 2022
(This article belongs to the Section Mathematics)

Abstract

In this paper, we address a class of heterogeneous spatial autoregressive models in which the $n(n-1)$ spatial coefficients take $m$ distinct true values, where $m$ is independent of the sample size $n$, and we establish asymptotic properties of the maximum likelihood estimator and the quasi-maximum likelihood estimator for all parameters in this class of models, extending Lee's work (2004). The rates of convergence of those estimators depend on features of the values taken by the elements of the spatial weights matrix in this model. In situations where, based on the values of the weights, each individual influences only a few neighbors and is influenced by only a few neighbors, the estimators can enjoy a $\sqrt{n}$-rate of convergence and be asymptotically normal. However, when each individual can influence many neighbors or can be influenced by many neighbors, with their number not exceeding $o(n)$, singularity of the information matrix may occur, and various components of the estimators may have different (usually lower than $\sqrt{n}$) rates of convergence. An inconsistent estimator is exhibited when some important assumptions are violated. Finally, simulation studies demonstrate that the finite-sample performance of the maximum likelihood estimators is good.

1. Introduction

Spatial econometrics consists of econometric techniques dealing with empirical economic problems caused by spatial (cross-sectional) interaction between spatial individuals. The dependence across spatial individuals is an interaction issue in urban, real estate, regional, public, agricultural, and environmental economics, as well as in finance and industrial organization. In many economic applications, modeling the interaction is essential for understanding market competition. One typical example is the commodity market, where commodity buyers compare commodities of all sizes, with all kinds of options and in all locations, as well as their prices, before purchasing.
In this example, the market demand for a commodity is a function of the prices and characteristics of all commodities. Specifically, let x i and ϵ i denote the observed and unobserved characteristics of commodity i, respectively, and let y i denote its price. The market demand for commodity ( x i , ϵ i ) is a function of ( y i , x i , ϵ i ) , i = 1 , , n , where n denotes the total number of commodities in the market. Setting the market demand for commodity ( x i , ϵ i ) equal to its market supply and solving for y i , we obtain the following partial equilibrium price:
$$ y_i = f_i\big(x_i, \epsilon_i, (y_j, x_j, \epsilon_j) \text{ for all } j \neq i\big), \quad i = 1, \ldots, n. \tag{1} $$
The marginal effect of $y_i$ with respect to $y_j$ measures the price competition from commodity $j$, while the marginal effect with respect to $x_j$ measures the quality competition. To aid exposition, we shall call the dependence of the partial equilibrium price of commodity $i$ on the other prices ($y_j$ for all $j \neq i$) the direct spatial interaction, and call the dependence on the other commodity characteristics ($(x_j, \epsilon_j)$ for all $j \neq i$) the indirect spatial interaction. The spatial interaction is the sum of the direct and indirect spatial interactions. The spatial interaction is often ignored in the existing hedonic commodity price regression literature. One argument for the omission is that researchers are interested only in predicting commodity prices, not in understanding market competition. If prediction is the goal, then we should solve (1) for all prices to obtain the general equilibrium prices:
$$ y_i = h_i\big((x_j, \epsilon_j) \text{ for all } j\big), \quad i = 1, \ldots, n. \tag{2} $$
Clearly, the general equilibrium prices may contain the indirect spatial interaction terms; neglecting these terms in the hedonic commodity price regression could result in omitted variable bias and poor price prediction. Bell and Bockstael [1] and Banerjee et al. [2] partially addressed the omitted variable problem by adding spatial error correlation to the hedonic commodity price regression. Their models yield consistent parameter estimates and predictions if and only if the remaining omitted interaction terms are uncorrelated with the commodity's own characteristics.
To capture spatial interaction, the approach in spatial econometrics is to impose structure on a model. A well-known structure is the spatial autoregressive (SAR) model studied by Cliff and Ord [3], who extended autocorrelation in time series to spatial dimensions. The SAR model with common regressors is expressed as
$$ y_i = \rho \sum_{j \neq i} w_{ij} y_j + x_i'\beta + \epsilon_i, \quad i = 1, \ldots, n, \tag{3} $$
where $y_i$ is the value of the equilibrium price for the $i$th spatial individual, $w_{ij}$ is the nonstochastic spatial weight from $j$ to $i$ with $w_{ii} = 0$ for all $i$, $\rho$ is the single spatial coefficient, $x_{ik}$ is the value of the $k$th regressor among $p$ common regressors for the $i$th spatial individual, and $\beta_k$ is the regression coefficient of the $k$th regressor, common to all spatial individuals. Assume that the disturbances or errors $\epsilon_i$ across $i = 1, \ldots, n$ are independent and identically distributed (i.i.d.) with mean zero and variance $\sigma^2$.
The spatial aspect of a SAR model has the distinguishing feature of simultaneity in econometric equilibrium models. Sustained developments in testing and estimation of SAR models have been summarized in a substantial literature, e.g., Cliff and Ord [3], Anselin [4], Cressie [5], Anselin and Bera [6], and Elhorst [7], among others. Recent empirical applications of the SAR model in mainstream economics journals include Case [8], Case et al. [9], Besley and Case [10], Brueckner [11], Bell and Bockstael [1], Bertrand et al. [12], Topa [13], Coval and Moskowitz [14], Druska and Horrace [15], Frazier and Kockelman [16], Baltagi and Li [17], Pirinsky and Wang [18], Bekaert et al. [19], Robinson and Rossi [20], and Liu et al. [21], among others.
The SAR models can be estimated by the method of maximum likelihood (ML), e.g., Ord [22], Smirnov and Anselin [23], and Robinson and Rossi [24], methods of moments, e.g., Kelejian and Prucha [25], and Lee and Liu [26], as well as the method of quasi-maximum likelihood (QML), e.g., Lee [27] and Yu et al. [28].
In recent years, many authors have considered problems related to spatial autoregressive models. Liu et al. [21] developed a penalized quasi-maximum likelihood method for simultaneous model selection and parameter estimation in SAR models with independent and identically distributed errors. Song et al. [29] proposed a variable selection method based on exponential squared loss for SAR models. Ju et al. [30] developed Bayesian influence analysis for skew-normal spatial autoregression models (SSARMs). Our argument will focus on a more general specification of the linear structure of the spatial interaction.
Let us look carefully at model (3). The single spatial coefficient $\rho$ in (3), together with the nonnegative spatial weights, implies a unidirectional effect. Specifically, if $\rho > 0$ ($< 0$), then those spatial individuals with positive spatial weights ($w_{ij} > 0$ for $j \neq i$) all have positive (negative) effects on individual $i$.
In applications, however, it is possible that some spatial individuals have positive effects while other individuals have negative effects on individual i. For instance, in regional economics applications, region j may serve as a supply hub for region i and could have a positive effect on region i’s economy, while region k competes (e.g., an alternative competition) against region i and could have a negative effect on region i. Specification of the linear structure of spatial interaction in (3) rules out this type of bidirectional effect.
In addition, the spatial weights in (3) are often constructed to be symmetric (i.e., $w_{ij} = w_{ji}$), implying that the effect of individual $i$ on individual $j$ is identical to the effect of individual $j$ on individual $i$. In applications, it is possible to have asymmetric effects. Again, in regional economics applications, an economically strong region $i$ is likely to have a bigger impact on an economically weak region $j$ than the economically weak region has on the economically strong region. This type of asymmetric effect is not permitted by (3). One could argue that asymmetric effects can be modeled through the construction of asymmetric spatial weights. As far as we know, however, there is hardly any theoretical guidance on the construction of such asymmetric weights, and the construction of weights itself is not entirely undisputed. Regarding the weights matrix, the interested reader is referred to Ahrens and Bhattacharjee [31], and Lam and Souza [32].
This type of individual-specific endogenous effect is universal in real society. In many applications, individuals have different impacts on their neighbors' behaviors. For example, Clark and Loheac [33] designed a special (classic) spatial panel model to show that popular teenagers in a school have much stronger influences on their classmates' smoking decisions than their less popular peers; Mas and Moretti [34] applied the expected utility principle to find that the magnitude of spillovers varies dramatically among workers with different skill levels; and Banerjee et al. [35] used a model of word-of-mouth diffusion to show that individuals directly connected with certain village leaders are more likely to join a micro-finance program than those connected to someone else.
To better capture the spatial interaction, a spatial autoregressive model with a general linear specification that includes all spatial interaction terms should be expressed as
$$ y_i = \sum_{j \neq i} w_{ij} \rho_{ij} y_j + x_i'\beta + \epsilon_i, \tag{4} $$
where $\rho_{ij}$ is the spatial coefficient representing the effect of individual $j$ on individual $i$. If the true values of the spatial coefficients satisfy $\rho_{ij} = \rho$ for all $i$ and $j$, the most general SAR model (4) reduces to the famous model (3). If $\rho_{ij} \neq \rho_{ji}$ for some $i, j$, then the effect of individual $j$ on individual $i$ is not equal to the effect of individual $i$ on individual $j$ even if $w_{ij} = w_{ji}$, and if $\rho_{ji} \rho_{ij} < 0$, the effects of individual $i$ on $j$ and of individual $j$ on $i$ are bidirectional (in opposite directions).
Clearly, the most general model (4) is flexible enough to permit asymmetric, individual-specific endogenous, and bidirectional effects. Despite these advantages, this model is not identified, since there are $n(n-1)$ spatial coefficients, a number that increases as $n$ increases. Some restrictions must be placed on those coefficients.
Homogeneous classification of spatial coefficients is one available method. We classify the $n(n-1)$ spatial coefficients into $m$ subgroups, where $m$ is independent of $n$ and the spatial coefficients within a subgroup take the same value. Two approaches, data-driven selection and economic geographic attributes, can be adopted to realize this type of homogeneous classification. In regional economics applications, many regions belonging to one upper administrative unit is an example of economic geographic attributes. In this example, the number of upper administrative units, $m$, is regarded as being independent of $n$ if it is much smaller than the number of regions, $n$. In a word, we restrict the true values of the $n(n-1)$ spatial coefficients in model (4) to a set of $m$ distinct finite values.
Consequently, the specified spatial weight matrix W of order n can be divided into m nonzero spatial weight matrices, W 1 , , W m , of order n, satisfying the homogeneous classification condition
$$ W = W_1 + \cdots + W_m, \tag{5} $$
where, for any nonzero component $w_{ij}$ of $W$, there is a unique $k_0$ such that $w_{k_0,ij} = w_{ij}$ and $w_{k,ij} = 0$ for $k \neq k_0$, with $w_{k,ij}$ being the $(i,j)$th component of $W_k$; and if $w_{ij} = 0$, then $w_{k,ij} = 0$ for all $k$. The weights $w$'s may be selected based on potential specifications, such as physical distance, social networks, or "economic" quantities among variables, see Case et al. [9], or based on a best combination of these specifications, see Lam and Souza [32].
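The homogeneous classification (5) can be checked mechanically: given subgroup labels for the ordered pairs $(i, j)$, each $W_k$ keeps the weights of its own subgroup and zeros elsewhere, so the supports are disjoint and the $W_k$ sum back to $W$. The random weights and labels below are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 2

# A toy nonnegative weight matrix with zero diagonal (illustrative values).
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)

# Hypothetical subgroup labels: each ordered pair (i, j) is assigned to
# exactly one of the m subgroups (e.g., same vs. different administrative unit).
labels = rng.integers(0, m, size=(n, n))

# W_k keeps the weights of subgroup k and is zero elsewhere, so the
# supports are disjoint and W = W_1 + ... + W_m as in (5).
W_parts = [np.where(labels == k, W, 0.0) for k in range(m)]
```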
With the homogeneous classification (5), a heterogenous spatial autoregressive model with a linear specification having m distinct spatial coefficients is postulated as
$$ y_i = \sum_{k=1}^m \rho_k \sum_{j \neq i} w_{k,ij} y_j + x_i'\beta + \epsilon_i. \tag{6} $$
Dou et al. [36] considered a class of pure spatio-temporal models without regressors, classifying the spatial coefficients based on the rows of the weight matrix. Their method, however, does not work for similar models with regressors. Peng [37] considered a SAR model on a network, classifying the spatial coefficients based on the columns of the weight matrix. Peng's work needs sparsity of the spatial coefficients.
In this paper, we investigate asymptotic properties of the maximum likelihood (ML) estimator and the quasi-maximum likelihood (QML) estimator for the heterogeneous SAR model (6) under the normal distributional specification, extending the results for the SAR model (3) investigated in Lee [27] to the heterogeneous SAR model (6). The QML estimator is appropriate when the estimator is derived from a normal likelihood but the disturbances in the model are not truly normally distributed, e.g., see Lee [27].
In the existing literature, the ML estimator of such a model is implicitly regarded as having the familiar $\sqrt{n}$-rate of convergence, as does a usual ML estimator for a parametric statistical model with sample size $n$, e.g., see the reviews by Anselin [4] and Anselin and Bera [6]. Lee [27] provided a broader view of the asymptotic properties of the ML and QML estimators and showed that the rates of convergence of the ML and QML estimators depend on some general features of the spatial weights matrix $W$ of the model (3). This paper extends Lee's work to the model (6), aiming to provide a similar view of the asymptotic properties and the rates of convergence of the ML and QML estimators in the model (6) under different scenarios.
The remainder of this paper is organized as follows. Section 2 provides an estimation procedure to find the ML and QML estimators of the parameters in the novel heterogeneous spatial autoregressive model and specifies regularity conditions for the model (6). In Section 3, we show that identification of the parameters can be assured if there is no multicollinearity among the regressors and the $m$ spatially generated regressors. The ML and QML estimators can be $\sqrt{n}$-consistent and asymptotically normal (Theorem 3) under some regularity conditions on the spatial weights matrix.
Section 4 considers the spatial scenarios in the zone between asymptotic normality and inconsistency. These scenarios occur when each individual can be influenced by many neighbors or can influence many neighbors, in which case singularity or irregularity of the information matrix may occur and various components of the QML estimators may have different rates of convergence. This includes the ML and QML estimators for the (pure) heterogeneous SAR process. This section also considers the event of multicollinearity, where the $m$ spatially generated regressors are collinear with the original regressors. A counterexample with two distinct spatial coefficients is given to exhibit an inconsistent QML estimator when multicollinearity occurs.
In Section 5, we conduct finite-sample simulation studies for heterogeneous SAR models with 5 distinct spatial coefficients. Section 6 provides brief concluding remarks. The proofs of all main theorems are collected in Appendix A, and the counterexample of an inconsistent QML estimator is provided in Appendix B.

2. Heterogeneous SAR Model and QML Estimators

The heterogeneous SAR model with m distinct spatial coefficients and p common regressors is written in a matrix version as
$$ Y_n = \sum_{k=1}^m \rho_k W_k Y_n + X_n \beta + \epsilon_n, \tag{7} $$
where $n$ is the total number of spatial units, $Y_n$ is the $n$-dimensional equilibrium (price) vector, $X_n$ is an $n \times p$ matrix of constant (spatially varying) regressors, $\rho = (\rho_1, \ldots, \rho_m)'$ is the spatial coefficient vector, $\beta = (\beta_1, \ldots, \beta_p)'$ is the vector of regression coefficients (which may include an intercept), and $\epsilon_n$ is an $n$-dimensional vector of independently and identically distributed (i.i.d.) disturbances with mean zero and variance $\sigma^2$.
We denote by $\theta_0 = (\rho_0', \beta_0', \sigma_0^2)'$ the true value of $\theta = (\rho', \beta', \sigma^2)'$. Let $S(\rho) = I - \sum_{k=1}^m \rho_k W_k$ for any spatial parameter vector $\rho$, where $I$ is the identity matrix. The equilibrium (price) vector $Y_n$ is expressed as
$$ Y_n = S_n^{-1}(X_n \beta + \epsilon_n), \tag{8} $$
where S n = S ( ρ 0 ) is assumed to be nonsingular. When there are no regressors X n in the model (7), it becomes a pure heterogeneous SAR process:
$$ Y_n = \sum_{k=1}^m \rho_k W_k Y_n + \epsilon_n, \tag{9} $$
implying that the equilibrium (price) vector Y n is simply derived from the disturbance vector ϵ n .
We use $\epsilon(\rho, \beta)$ to denote $S(\rho) Y_n - X_n \beta$ and $\epsilon_n$ to denote $\epsilon(\rho_0, \beta_0)$. The log-likelihood function of $\theta$ in (7) is
$$ \log L_n(\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) + \log|S(\rho)| - \frac{1}{2\sigma^2}\,\epsilon(\rho,\beta)'\epsilon(\rho,\beta). \tag{10} $$
The extremum estimator derived from the maximization of (10) is written as
$$ \hat{\theta}_n = \operatorname*{argmax}_{\theta} \log L_n(\theta), \tag{11} $$
where θ takes values in the set of admissible values. When the (i.i.d.) disturbances in the model (7) are normally distributed by N ( 0 , σ 2 ) , the extremum estimator is the maximum likelihood (ML) estimator. When the (i.i.d.) disturbances in the model are not truly normally distributed, the extremum estimator derived from a normal likelihood is called the quasi-maximum likelihood (QML) estimator.
The first-order partial derivatives of the function (10) with respect to $\rho$, $\beta$ and $\sigma^2$ are as follows:
$$ \frac{\partial \log L_n(\theta)}{\partial \rho_k} = -\operatorname{tr}\big(S(\rho)^{-1} W_k\big) + \frac{1}{\sigma^2}\,\epsilon(\rho,\beta)' W_k Y_n \quad \text{for } k = 1, \ldots, m, \tag{12} $$
where $\operatorname{tr}(A)$ is the trace of $A$ and the formula $d \log(\det F) = \operatorname{tr}(F^{-1}\,dF)$ is used, see Magnus and Neudecker [38],
$$ \frac{\partial \log L_n(\theta)}{\partial \beta} = \frac{1}{\sigma^2}\, X_n' \epsilon(\rho,\beta), \tag{13} $$
and
$$ \frac{\partial \log L_n(\theta)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\,\epsilon(\rho,\beta)'\epsilon(\rho,\beta). \tag{14} $$
In order to prove consistency and asymptotic normality, we usually adopt the following solution procedure. The concentrated log-likelihood function of $\rho$ is defined as
$$ \log L_n(\rho) = \max_{\beta, \sigma^2} \log L_n(\theta). $$
Setting (13) to zero yields the QML estimator of $\beta$ for fixed $\rho$,
$$ \tilde{\beta}_n(\rho) = (X_n' X_n)^{-1} X_n' S(\rho) Y_n, \tag{15} $$
and setting (14) to zero gives the QML estimator of $\sigma^2$ for fixed $\rho$,
$$ \tilde{\sigma}_n^2(\rho) = \frac{1}{n}\,[S(\rho) Y_n - X_n \tilde{\beta}_n(\rho)]'[S(\rho) Y_n - X_n \tilde{\beta}_n(\rho)] = \frac{1}{n}\, Y_n' S(\rho)' (I - P_{X_n}) S(\rho) Y_n, \tag{16} $$
where $P_{X_n} = X_n (X_n' X_n)^{-1} X_n'$ is the orthogonal projection onto the column space of $X_n$. Then, the concentrated log-likelihood function of $\rho$ is
$$ \log L_n(\rho) = -\frac{n}{2}\,[\log(2\pi) + 1] - \frac{n}{2}\log \tilde{\sigma}_n^2(\rho) + \log|S(\rho)|. \tag{17} $$
Maximizing the concentrated likelihood (17) obtains the QML estimator ρ ^ n of ρ
$$ \hat{\rho}_n = \operatorname*{argmax}_{\rho} \log L_n(\rho), $$
where $\rho$ takes values in the set of admissible values. The calculation of the QML estimator $\hat{\rho}_n$ can be realized by the Newton–Raphson method (Ord [22]) or by the R package rgenoud. Thus, the QML estimators of $\beta$ and $\sigma^2$ are expressed, respectively, as
$$ \hat{\beta}_n = \tilde{\beta}_n(\hat{\rho}_n) \quad \text{and} \quad \hat{\sigma}_n^2 = \tilde{\sigma}_n^2(\hat{\rho}_n), $$
finally obtaining QML estimator θ ^ n in (11).
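The estimation procedure above can be sketched numerically. The snippet below simulates a small heterogeneous SAR model with $m = 2$ and maximizes the concentrated log-likelihood (17) over a coarse grid of $\rho$ values (a crude stand-in for the Newton–Raphson step); the weight matrices, sample size, and true parameter values are illustrative assumptions.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n, p = 60, 2

# Two weight matrices with disjoint supports on a ring (illustrative design):
# W1 points to the "forward" neighbor, W2 to the "backward" neighbor.
W1 = np.zeros((n, n))
W2 = np.zeros((n, n))
for i in range(n):
    W1[i, (i + 1) % n] = 1.0
    W2[i, (i - 1) % n] = 1.0
Ws = [W1, W2]

# Illustrative true parameter values.
rho0 = np.array([0.3, -0.2])
beta0 = np.array([1.0, -1.0])
X = rng.normal(size=(n, p))
S0 = np.eye(n) - sum(r * W for r, W in zip(rho0, Ws))
Y = np.linalg.solve(S0, X @ beta0 + rng.normal(size=n))

XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T  # I - P_X

def conc_loglik(rho):
    """Concentrated log-likelihood of rho, cf. equation (17)."""
    S = np.eye(n) - sum(r * W for r, W in zip(rho, Ws))
    sign, logdet = np.linalg.slogdet(S)
    if sign <= 0:  # outside the admissible set
        return -np.inf
    SY = S @ Y
    sigma2 = SY @ M @ SY / n
    return -n / 2 * (np.log(2 * np.pi) + 1) - n / 2 * np.log(sigma2) + logdet

# Coarse grid search for the QML estimator of rho; in practice one would use
# Newton-Raphson or a numerical optimizer instead of a grid.
grid = np.arange(-0.45, 0.46, 0.05)
rho_hat = np.array(max(product(grid, grid), key=conc_loglik))

# Plugging rho_hat back in gives the QML estimators of beta and sigma^2.
S_hat = np.eye(n) - sum(r * W for r, W in zip(rho_hat, Ws))
beta_hat = XtX_inv @ X.T @ (S_hat @ Y)
sigma2_hat = (S_hat @ Y) @ M @ (S_hat @ Y) / n
```

The grid is restricted so that $|\rho_1| + |\rho_2| < 1$, which keeps $S(\rho)$ nonsingular on this ring design.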
After finding the QML estimator θ ^ n , we focus our attention on how to investigate its consistency and asymptotic normality. Similar to Lee [27], we first introduce some basic regularity conditions for our heterogeneous SAR model to provide a rigorous analysis of the QML estimators. Additional regularity conditions will be subsequently added.
Assumption A1.
The disturbances $\epsilon_1, \ldots, \epsilon_n$ of $\epsilon_n = (\epsilon_1, \ldots, \epsilon_n)'$ are i.i.d. with mean zero and variance $\sigma^2 > 0$. Their moments $E(|\epsilon_1|^{4+\gamma}), \ldots, E(|\epsilon_n|^{4+\gamma})$ exist uniformly for some $\gamma > 0$.
Assumption A2.
The elements $w_{ij}$ of $W$ are $O(d_n^{-1})$, i.e., at most of order $d_n^{-1}$, uniformly for all $i, j$, where the rate sequence $\{d_n\}$ is bounded or divergent as $n$ tends to infinity. As a normalization, $w_{ii} = 0$ for all $i$.
Assumption A3.
The ratio $d_n / n \to 0$ as $n$ goes to infinity.
Assumption A4.
The matrix S n is nonsingular.
This tells us that the parametric function S ( ρ ) is nonsingular at point ρ 0 . Since S ( ρ ) is continuous, S ( ρ ) is nonsingular for ρ in a neighborhood of ρ 0 . Alternatively, ρ is said to take values in the set of admissible values.
Assumption A5.
The weight matrix W and S n 1 are uniformly bounded in both row and column sums as n goes to infinity.
A similar assumption appears in Horn and Johnson [39]. It follows from (5) that $W_1, \ldots, W_m$ are all uniformly bounded in both row and column sums. Since usually $w_{ij} \geq 0$, Assumptions A2 and A5 together imply that no row or column contains $O(n)$ elements whose magnitude reaches the order $d_n^{-1}$.
Assumption A6.
$S(\rho)^{-1}$ is uniformly bounded in either row or column sums, uniformly in $\rho$ in a compact parameter space $\Gamma$, with $\rho_0$ in the interior of $\Gamma$.
Assumption A7.
The elements of $X_n$ are uniformly bounded constants for all $n$, and
$$ \lim_{n \to \infty} \frac{1}{n} X_n' X_n > 0. $$

3. Asymptotic Properties of QML Estimators

Let $G_k = W_k S_n^{-1}$; then $S_n^{-1} = I + \sum_{k=1}^m \rho_{k0} G_k$. From (8), the reduced-form equation of $Y_n$ can be represented as
$$ Y_n = X_n \beta_0 + \sum_{k=1}^m \rho_{k0}\, (G_k X_n \beta_0) + S_n^{-1} \epsilon_n. \tag{18} $$
Condition 1.
Assume that
$$ \lim_{n \to \infty} \frac{1}{n}\, (G_1 X_n \beta_0, \ldots, G_m X_n \beta_0)' (I - P_{X_n}) (G_1 X_n \beta_0, \ldots, G_m X_n \beta_0) > 0. $$
Condition 2.
Assume that
$$ \lim_{n \to \infty} \frac{1}{n}\, (G_1 X_n \beta_0, \ldots, G_m X_n \beta_0, X_n)' (G_1 X_n \beta_0, \ldots, G_m X_n \beta_0, X_n) > 0. $$
Condition 2 says that $X_n, G_1 X_n \beta_0, \ldots, G_{m-1} X_n \beta_0$ and $G_m X_n \beta_0$ in (18) are not asymptotically multicollinear. It is a sufficient condition for global identification of $\theta_0$.
Lemma 1.
Under Assumptions A1–A6, Condition 2 is true if and only if Assumption A7 and Condition 1 both are true.

3.1. Consistency

Corresponding to the concentrated log likelihood function, we define its expectation as
$$ Q_n(\rho) = \max_{\beta, \sigma^2} E \log L_n(\theta). \tag{20} $$
It follows from (10) that the expectation of log L n ( θ ) is expressed as
$$ E \log L_n(\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) + \log|S(\rho)| - \frac{1}{2\sigma^2}\, E\,\epsilon(\rho,\beta)'\epsilon(\rho,\beta), $$
with
$$ \begin{aligned} E\,\epsilon(\rho,\beta)'\epsilon(\rho,\beta) &= E\,(S(\rho)Y_n - X_n\beta)'(S(\rho)Y_n - X_n\beta) \\ &= \beta' X_n' X_n \beta - 2\beta' X_n' S(\rho) S_n^{-1} X_n \beta_0 + n\sigma_n^2(\rho) + \beta_0' X_n' B_n X_n \beta_0, \end{aligned} $$
where $\sigma_n^2(\rho) = \frac{1}{n}\sigma_0^2 \operatorname{tr}(B_n)$ with $B_n = (S_n^{-1})' S(\rho)' S(\rho) S_n^{-1}$. Moreover,
$$ \sigma_n^2(\rho) = \sigma_0^2 + \frac{2\sigma_0^2}{n} \sum_{k=1}^m (\rho_{k0} - \rho_k) \operatorname{tr}(G_k) + \frac{\sigma_0^2}{n} \sum_{j,k=1}^m (\rho_{j0} - \rho_j)(\rho_{k0} - \rho_k) \operatorname{tr}(G_j' G_k) = \sigma_0^2 + O(d_n^{-1}). $$
The optimal solutions of this maximization (20) are
$$ \breve{\beta}_n(\rho) = (X_n' X_n)^{-1} X_n' S(\rho) S_n^{-1} X_n \beta_0 $$
and
$$ \begin{aligned} \breve{\sigma}_n^2(\rho) &= \frac{1}{n}\, E\,[S(\rho) Y_n - X_n \breve{\beta}_n(\rho)]'[S(\rho) Y_n - X_n \breve{\beta}_n(\rho)] \\ &= \frac{1}{n}\, \beta_0' X_n' (S_n^{-1})' S(\rho)' (I - P_{X_n}) S(\rho) S_n^{-1} X_n \beta_0 + \frac{\sigma_0^2}{n} \operatorname{tr}(B_n) \\ &= \frac{1}{n} \Big[\sum_{k=1}^m (\rho_k - \rho_{k0}) G_k X_n \beta_0\Big]' (I - P_{X_n}) \Big[\sum_{k=1}^m (\rho_k - \rho_{k0}) G_k X_n \beta_0\Big] + \sigma_n^2(\rho). \end{aligned} $$
Hence,
$$ Q_n(\rho) = -\frac{n}{2}\,[\log(2\pi) + 1] - \frac{n}{2} \log \breve{\sigma}_n^2(\rho) + \log|S(\rho)|. $$
Let $f_n(z_n \mid \rho)$ be the density function of a random vector following the multivariate normal $N_n\big(0, \sigma_n^2(\rho) S(\rho)^{-1} (S(\rho)')^{-1}\big)$. When $\rho \neq \rho_0$, if the determinant of the covariance $\sigma_n^2(\rho) S(\rho)^{-1} (S(\rho)')^{-1}$ is not equal to the determinant of the covariance $\sigma_0^2 S_n^{-1} (S_n')^{-1}$, then the probability of the set $\{z_n : f_n(z_n \mid \rho) \neq f_n(z_n \mid \rho_0)\}$ is not zero. Similarly, for any $n$, when $\frac{1}{n} \log|\sigma_n^2(\rho) S(\rho)^{-1} (S(\rho)')^{-1}| \neq \frac{1}{n} \log|\sigma_0^2 S_n^{-1} (S_n')^{-1}|$, the probability of the set $\{z_n : \frac{1}{n} \log f_n(z_n \mid \rho) \neq \frac{1}{n} \log f_n(z_n \mid \rho_0)\}$ is not zero.
Condition 3.
When $\{d_n\}$ is a bounded sequence, for $\rho \neq \rho_0$,
$$ \lim_{n \to \infty} \frac{1}{n} \Big( \log\big|\sigma_0^2 S_n^{-1} (S_n')^{-1}\big| - \log\big|\sigma_n^2(\rho) S(\rho)^{-1} (S(\rho)')^{-1}\big| \Big) \neq 0. $$
With all the above preparations, consistency of the QML estimator follows.
Theorem 1.
Under Assumptions A1–A7 with a bounded sequence $\{d_n\}$, given Condition 1 or 3, $\theta_0$ is globally identifiable and $\hat{\theta}_n$ is a consistent estimator of $\theta_0$.
Proof. 
See Appendix A. □
Our consistency argument follows from Theorem 3.4 of White (1994).

3.2. Asymptotic Normality

In this subsection, we derive the asymptotic distribution of the QML estimator θ ^ n . We start from the optimal condition
$$ \frac{\partial \log L_n(\hat{\theta}_n)}{\partial \theta} = 0, \quad \text{namely}, \quad \frac{\partial \log L_n(\theta)}{\partial \theta}\bigg|_{\theta = \hat{\theta}_n} = 0. $$
Based on (12)–(14), the first-order derivatives of the log-likelihood function at θ 0 are
$$ \begin{aligned} \frac{1}{\sqrt{n}} \frac{\partial \log L_n(\theta_0)}{\partial \rho_k} &= \frac{1}{\sqrt{n}\,\sigma_0^2}\,\big[\epsilon_n' G_k \epsilon_n - \sigma_0^2 \operatorname{tr}(G_k)\big] + \frac{1}{\sqrt{n}\,\sigma_0^2}\,(G_k X_n \beta_0)' \epsilon_n, \quad k = 1, \ldots, m, \\ \frac{1}{\sqrt{n}} \frac{\partial \log L_n(\theta_0)}{\partial \beta} &= \frac{1}{\sqrt{n}\,\sigma_0^2}\, X_n' \epsilon_n, \quad \text{and} \\ \frac{1}{\sqrt{n}} \frac{\partial \log L_n(\theta_0)}{\partial \sigma^2} &= \frac{1}{2\sqrt{n}\,\sigma_0^4}\,\big(\epsilon_n' \epsilon_n - n \sigma_0^2\big). \end{aligned} $$
Let us consider the likelihood score vector
$$ U = \frac{1}{\sqrt{n}} \frac{\partial \log L_n(\theta_0)}{\partial \theta}. $$
It is easy to see that E ( U ) = 0 from (12)–(14). Then, the covariance or Fisher information matrix of U is
$$ I_n(\theta_0) \equiv \operatorname{Cov}(U) = E\left[ \frac{1}{\sqrt{n}} \frac{\partial \log L_n(\theta_0)}{\partial \theta} \cdot \frac{1}{\sqrt{n}} \frac{\partial \log L_n(\theta_0)}{\partial \theta'} \right], $$
which can be decomposed into
$$ E\left[ \frac{1}{\sqrt{n}} \frac{\partial \log L_n(\theta_0)}{\partial \theta} \cdot \frac{1}{\sqrt{n}} \frac{\partial \log L_n(\theta_0)}{\partial \theta'} \right] = \Sigma_n(\theta_0) + \Omega_n(\theta_0), $$
where
$$ \Sigma_n(\theta_0) = -E\left[ \frac{1}{n} \frac{\partial^2 \log L_n(\theta_0)}{\partial \theta \partial \theta'} \right] $$
is called the sample average Hessian matrix. The following discussion aims at calculation of Σ n ( θ 0 ) and Ω n ( θ 0 ) .
From (12)–(14), we obtain the following equations:
$$ \frac{1}{n} \frac{\partial^2 \log L_n(\theta)}{\partial \rho_k \partial \rho_l} = -\frac{1}{n} \operatorname{tr}\big(S(\rho)^{-1} W_k S(\rho)^{-1} W_l\big) - \frac{1}{n\sigma^2}\, Y_n' W_k' W_l Y_n, \quad k, l = 1, \ldots, m, $$
where $dF^{-1} = -F^{-1}(dF)F^{-1}$ is used,
$$ \begin{aligned} \frac{1}{n} \frac{\partial^2 \log L_n(\theta)}{\partial \beta \partial \beta'} &= -\frac{1}{n\sigma^2}\, X_n' X_n, \\ \frac{1}{n} \frac{\partial^2 \log L_n(\theta)}{\partial \beta \partial \rho_k} &= -\frac{1}{n\sigma^2}\, X_n' W_k Y_n, \quad k = 1, \ldots, m, \\ \frac{1}{n} \frac{\partial^2 \log L_n(\theta)}{\partial \beta \partial \sigma^2} &= -\frac{1}{n\sigma^4}\, X_n' \epsilon(\rho,\beta), \\ \frac{1}{n} \frac{\partial^2 \log L_n(\theta)}{\partial \sigma^2 \partial \sigma^2} &= \frac{1}{2\sigma^4} - \frac{1}{n\sigma^6}\,\epsilon(\rho,\beta)'\epsilon(\rho,\beta), \quad \text{and} \\ \frac{1}{n} \frac{\partial^2 \log L_n(\theta)}{\partial \rho_k \partial \sigma^2} &= -\frac{1}{n\sigma^4}\,\epsilon(\rho,\beta)' W_k Y_n, \quad k = 1, \ldots, m. \end{aligned} $$
By calculating the expectation of the above second derivatives (24) and (25) at θ 0 , we obtain the following lemma.
Lemma 2.
The sample average Hessian matrix Σ n ( θ 0 ) is given by
$$ \Sigma_n(\theta_0) = \begin{pmatrix} \Sigma_{\rho\rho}(\theta_0) & * & * \\ \Sigma_{\beta\rho}(\theta_0) & \Sigma_{\beta\beta}(\theta_0) & 0 \\ \Sigma_{\sigma^2\rho}(\theta_0) & 0 & \Sigma_{\sigma^2\sigma^2}(\theta_0) \end{pmatrix} $$
with
$$ \begin{aligned} \Sigma_{\rho\rho}(\theta_0) &= \left[ \frac{1}{2n} \operatorname{tr}\big(G_k^s G_l + G_l^s G_k\big) + \frac{1}{n\sigma_0^2}\,(G_k X_n \beta_0)'(G_l X_n \beta_0) \right]_{k,l=1,\ldots,m}, \\ \Sigma_{\beta\rho}(\theta_0) &= \frac{1}{n\sigma_0^2}\, X_n' \big(G_1 X_n \beta_0, \ldots, G_m X_n \beta_0\big), \\ \Sigma_{\beta\beta}(\theta_0) &= \frac{1}{n\sigma_0^2}\, X_n' X_n, \\ \Sigma_{\sigma^2\rho}(\theta_0) &= \frac{1}{n\sigma_0^2}\, \big(\operatorname{tr}(G_1), \ldots, \operatorname{tr}(G_m)\big), \quad \text{and} \\ \Sigma_{\sigma^2\sigma^2}(\theta_0) &= \frac{1}{2\sigma_0^4}, \end{aligned} $$
where $G_k^s = G_k + G_k'$.
Proof. 
See Appendix A. □
Condition 4.
Assume that
$$ \lim_{n \to \infty} \frac{1}{n}\, \big[\operatorname{vec}(C_1 + C_1'), \ldots, \operatorname{vec}(C_m + C_m')\big]' \big[\operatorname{vec}(C_1 + C_1'), \ldots, \operatorname{vec}(C_m + C_m')\big] > 0, $$
where $C_k = G_k - \frac{1}{n} \operatorname{tr}(G_k) I$ and $\operatorname{vec}$ is the vectorization operator.
Condition 4 may be true only in the scenario where $\{d_n\}$ is a bounded sequence, because $\operatorname{tr}(C_i C_j) = O(n/d_n)$ for any $i, j$. The following is a necessary and sufficient condition for a nonsingular average Hessian matrix $\Sigma_\theta = \lim_{n \to \infty} \Sigma_n(\theta_0)$.
Theorem 2.
Under Assumptions A1–A7, the average Hessian matrix Σ θ is nonsingular if and only if either of Conditions 1 and 4 holds.
Proof. 
See Appendix A. □
For a divergent sequence $\{d_n\}$, Condition 4 is violated. In this case, $\Sigma_\theta$ is nonsingular if and only if Condition 1 holds. Set $\mu_k = E(\epsilon_1^k)$ at $\theta_0$ for $k = 3, 4$, and let $g_k = (G_{k,11}, \ldots, G_{k,nn})'$ with $G_{k,ii}$ being the $(i,i)$th component of $G_k$. The calculation of $\Omega_n(\theta_0)$ is summarized in the following lemma.
Lemma 3.
The difference between the information matrix and the sample average Hessian matrix is given by
$$ \Omega_n(\theta_0) = I_n(\theta_0) - \Sigma_n(\theta_0) = \begin{pmatrix} \Omega_{\rho\rho}(\theta_0) & * & * \\ \Omega_{\beta\rho}(\theta_0) & 0 & * \\ \Omega_{\sigma^2\rho}(\theta_0) & \Omega_{\sigma^2\beta}(\theta_0) & \Omega_{\sigma^2\sigma^2}(\theta_0) \end{pmatrix}, $$
where
$$ \begin{aligned} \Omega_{\rho\rho}(\theta_0) &= \left[ \frac{\mu_4 - 3\sigma_0^4}{n\sigma_0^4}\, g_k' g_l + \frac{\mu_3}{n\sigma_0^4}\, \big[(G_k X_n \beta_0)' g_l + g_k'(G_l X_n \beta_0)\big] \right]_{k,l=1,\ldots,m}, \\ \Omega_{\beta\rho}(\theta_0) &= \frac{\mu_3}{n\sigma_0^4}\, X_n' \big(g_1, \ldots, g_m\big), \\ \Omega_{\sigma^2\rho}(\theta_0) &= \left[ \frac{1}{2n\sigma_0^6}\, \big[(\mu_4 - 3\sigma_0^4) \operatorname{tr}(G_l) + \mu_3 \mathbf{1}' G_l X_n \beta_0\big] \right]_{l=1,\ldots,m}, \\ \Omega_{\sigma^2\beta}(\theta_0) &= \frac{\mu_3}{2n\sigma_0^6}\, \mathbf{1}' X_n, \quad \text{and} \\ \Omega_{\sigma^2\sigma^2}(\theta_0) &= \frac{\mu_4 - 3\sigma_0^4}{4\sigma_0^8}, \end{aligned} $$
where $\mathbf{1}$ is the $n$-dimensional vector of ones.
Proof. 
See Appendix A. □
If $\epsilon_n$ is normally distributed, then $\mu_3 = 0$ and $\mu_4 = 3\sigma_0^4$, implying that $\Omega_n(\theta_0) = 0$. Thus, the sample average Hessian matrix $\Sigma_n(\theta_0)$ is the covariance of $U$, i.e., the information matrix $I_n(\theta_0)$. In this sense, $\Sigma_n(\theta_0)$ can be said to be an information matrix. Finally, with the above long but necessary preparations, the asymptotic distribution of the QML estimator $\hat{\theta}_n$ is summarized in the following theorem.
Theorem 3.
Under Assumptions A1–A7 and either of Conditions 1 and 4, $\sqrt{n}(\hat{\theta}_n - \theta_0)$ converges in distribution to the multivariate normal $N(0, \Sigma_\theta^{-1} + \Sigma_\theta^{-1} \Omega_\theta \Sigma_\theta^{-1})$, where
$$ \Omega_\theta = \lim_{n \to \infty} \Omega_n(\theta_0) \quad \text{and} \quad \Sigma_\theta = -\lim_{n \to \infty} E\left[ \frac{1}{n} \frac{\partial^2 \log L_n(\theta_0)}{\partial \theta \partial \theta'} \right]. $$
Moreover, in particular, if $\epsilon_n$ is normally distributed, then $\sqrt{n}(\hat{\theta}_n - \theta_0)$ converges in distribution to the multivariate normal $N(0, \Sigma_\theta^{-1})$.
Proof. 
See Appendix A. □
The estimation of the asymptotic covariance of $\hat{\theta}_n$ is a routine issue. $\Sigma_\theta$ is estimated by
$$ -\frac{1}{n} \frac{\partial^2 \log L_n(\hat{\theta}_n)}{\partial \theta \partial \theta'}. $$
The Ω θ is estimated by I n ( θ ^ n ) Σ n ( θ ^ n ) in (27). For the QML estimator, the extra moments μ 3 and μ 4 in (27) can be estimated by the third and fourth order empirical moments based on estimated residuals of the ϵ ’s.
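For instance, with a vector of estimated residuals in hand, the extra moments can be estimated by their empirical counterparts; the centered chi-square draws below merely stand in for residuals from a fitted model and are a purely illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for the estimated residuals of a fitted heterogeneous SAR model;
# centered chi-square draws give a skewed, heavy-tailed example (purely
# illustrative -- in practice these come from the fitted model).
resid = rng.chisquare(df=4, size=5000) - 4.0

# Empirical second, third, and fourth moments, estimating sigma^2, mu_3, mu_4.
sigma2_hat = np.mean(resid**2)
mu3_hat = np.mean(resid**3)
mu4_hat = np.mean(resid**4)

# The excess term (mu_4 - 3 sigma^4) entering Omega; it vanishes under
# normal disturbances, in which case Omega_n(theta_0) = 0.
excess = mu4_hat - 3 * sigma2_hat**2
```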
Remark 1.
The asymptotic results in Theorems 1 and 3 are valid regardless of whether the series $\{d_n\}$ is a bounded or divergent sequence.
For the case in which $\lim_{n \to \infty} d_n = \infty$, because $G_{k,ij} = O(d_n^{-1})$, the limit matrices $\Sigma_\theta$ and $\Omega_\theta$ can be simplified to
$$ \Sigma_\theta = \lim_{n \to \infty} \begin{pmatrix} \left[\frac{1}{n\sigma_0^2}\,(G_k X_n \beta_0)'(G_l X_n \beta_0)\right]_{k,l} & * & 0 \\ \frac{1}{n\sigma_0^2}\, X_n' \big(G_1 X_n \beta_0, \ldots, G_m X_n \beta_0\big) & \frac{1}{n\sigma_0^2}\, X_n' X_n & 0 \\ 0 & 0 & \frac{1}{2\sigma_0^4} \end{pmatrix} $$
and
$$ \Omega_\theta = \lim_{n \to \infty} \begin{pmatrix} 0 & 0 & * \\ 0 & 0 & * \\ \frac{\mu_3}{2n\sigma_0^6}\, \mathbf{1}' \big(G_1 X_n \beta_0, \ldots, G_m X_n \beta_0\big) & \frac{\mu_3}{2n\sigma_0^6}\, \mathbf{1}' X_n & \frac{\mu_4 - 3\sigma_0^4}{4\sigma_0^8} \end{pmatrix}. $$
Theoretically, the presence of $X_n$ and the linear independence of $G_1 X_n \beta_0, \ldots, G_m X_n \beta_0$ and $X_n$ are the crucial conditions for the asymptotic results in Theorem 3, in particular, the $\sqrt{n}$-rate of convergence of $\hat{\theta}_n$. Practically, $G_1 X_n \beta_0, \ldots, G_m X_n \beta_0$ and $X_n$ must not be (asymptotically) multicollinear in order to guarantee consistency of $\hat{\theta}_n$.
Remark 2.
When the disturbances $\epsilon$'s are normally distributed, $\hat{\theta}_n$ is the ML estimator. The ML estimators $\hat{\beta}_n$ and $\hat{\sigma}_n^2$ remain asymptotically independent, just as in linear regression analysis, regardless of whether the series $\{d_n\}$ is a bounded or divergent sequence. However, the dependence between the ML estimators $\hat{\rho}_n$ and $\hat{\sigma}_n^2$ relies on whether the series $\{d_n\}$ is bounded or divergent. When the series $\{d_n\}$ is bounded, there is some $k_0$ such that the ML estimators $\hat{\rho}_{k_0}$ and $\hat{\sigma}_n^2$ will be asymptotically dependent (see Lemma 2), because $\lim_{n \to \infty} \operatorname{tr}(G_{k_0})/n$ is finite and may not be zero. Anselin and Bera (1998) discussed the implication of this dependence for statistical inference in the case of $m = 1$. We also see that, for the case in which the series $\{d_n\}$ is a divergent sequence, $\lim_{n \to \infty} \operatorname{tr}(G_k)/n = 0$ for all $k$, so the QML estimators $\hat{\rho}_n$ and $\hat{\sigma}_n^2$ are asymptotically independent.
Remark 3.
The requirements in Conditions 1 and 2 are for all spatial coefficients. Sometime, it is possible these requirements to be satisfied only by partial spatial coefficients. Write ρ = ( ρ 1 , ρ 2 ) and ρ 0 = ( ρ 10 , ρ 20 ) , where ρ 1 and ρ 10 are m 1 -dimensional while ρ 2 and ρ 20 are ( m m 1 ) -dimensional. Conditions 1 and 2 hold only for partial spatial coefficients ρ 10 , i.e., without loss of generality,
Condition 1’. The lim n 1 n ( G 1 X n β 0 , , G m 1 X n β 0 ) ( I P X n ) ( G 1 X n β 0 , , G m 1 X n β 0 ) > 0 .
Condition 4’. The lim n 1 n [ vec ( C 1 + C 1 ) , , vec ( C m 1 + C m 1 ) ] [ vec ( C 1 + C 1 ) , , vec ( C m 1 + C m 1 ) ] > 0 .
After replacing Conditions 1 and 4 with Conditions 1’ and 4’, the consistency in Theorem 1 and the asymptotic normality in Theorem 3 hold for ρ ^ 1 , β ^ n , σ ^ n 2 , at ρ 2 = ρ 20 .

4. Asymptotic Normality with Non-Square-Root Rates

Consider the case of lim n d n = . It follows from Theorem 2 that the average Hessian matrix is singular under Conditions 5 and 6.
Condition 5.
For any k { 1 , , m } , lim n 1 n ( G k X n β 0 ) ( I P X n ) ( G k X n β 0 ) = 0 .
Condition 6.
For any k { 1 , , m } , the lim n 1 n vec ( C k + C k ) vec ( C k + C k ) = 0 .
For example, for the pure SAR process (10) with θ = ( ρ , σ 2 ) , as d n , Σ θ = diag [ 0 , 1 / ( 2 σ 0 4 ) ] . There are other cases in which the singularity occurs; see Remark 4. However, there is a gap between the setting of Theorem 3 and outright inconsistency. In this section, we investigate new conditions guaranteeing consistency with non-square-root rates and asymptotic normality under Conditions 5 and 6.
Lee [27] suggested that the singularity of the average Hessian matrix or the information matrix under normal disturbances has implications for the rate of convergence of the estimators. When lim n d n = , ( 1 / n ) log L n ( ρ ) is rather flat in ρ and the convergence of ( 1 / n ) [ log L n ( ρ ) Q n ( ρ ) ] to zero is too fast to be useful. A properly adjusted rate gives
d n n [ log L n ( ρ ) log L n ( ρ 0 ) ] [ Q n ( ρ ) Q n ( ρ 0 ) ] P 0 uniformly for ρ in Γ .
We consider the following two new conditions.
Condition 7.
The { d n } is a divergent sequence, elements of ( I P X n ) ( G k X n β 0 ) for k = 1 , , m have the uniform order O ( d n 1 ) , and
lim n d n n ( G 1 X n β 0 , , G m X n β 0 ) ( I P X n ) ( G 1 X n β 0 , , G m X n β 0 ) > 0 .
Condition 7 modifies Condition 1 with the factor d n to account for the proper rate of convergence. It is a generalization of Condition 1.
Condition 8.
The { d n } is a divergent sequence and
lim n d n n log σ 0 2 S n 1 ( S n ) 1 d n n log σ n 2 ( ρ ) S ( ρ ) 1 S 1 ( ρ ) 0 .
Condition 8 modifies Condition 3 with the factor d n to account for the proper rate of convergence. It is a generalization of Condition 3.
Theorem 4.
Under Assumptions A1–A7 and Conditions 5 and 6, if either of Conditions 7 and 8 holds, then the QML estimator ρ ^ n derived from the maximization of log L n ( ρ ) in (17) is a consistent estimator.
Proof. 
See Appendix A. □
The central limit theorem for a linear–quadratic form implies that ( d n / n ) L n ( ρ ) / ρ is asymptotically normal. The asymptotic distribution of ρ ^ n follows from
d n n ( ρ ^ n ρ 0 ) = d n n 2 log L n ( ρ 0 ) ρ ρ 1 d n n log L n ( ρ 0 ) ρ .
Theorem 5.
Under Assumptions A1–A7, and Condition 7 or 8,
n d n ( ρ ^ n ρ 0 ) D N 0 , Σ ρ 1 I ρ Σ ρ 1 ,
where
Σ ρ = lim n E d n n 2 L n ( ρ 0 ) ρ ρ = lim n b k l
with
b k l = 1 σ 0 2 d n n ( G k X n β 0 ) ( I P X n ) ( G l X n β 0 ) + d n n [ tr ( G k G l ) + tr ( G k G l ) ]
and
I ρ = lim n E d n n L n ( ρ 0 ) ρ d n n L n ( ρ 0 ) ρ = lim n d n n σ ˜ n 4 ( ρ 0 ) e k l
with
e k l = σ 0 2 ( G k X n β 0 ) ( I P X n ) ( G l X n β 0 ) + μ 3 [ ( G k X n β 0 ) ( I P X n ) c l + ( G l X n β 0 ) ( I P X n ) c k ] + ( μ 4 3 σ 0 4 ) c k c l + σ 0 4 tr ( ( I P X n ) C k ( I P X n ) C l ) + tr ( C k ( I P X n ) C l ) .
Proof. 
See Appendix A. □
After finding the limiting distribution of ρ ^ n , the limiting distributions of β ^ n and σ ^ n 2 defined in (15) and (16) follow directly.
Theorem 6.
Under Assumptions A1–A7, and Condition 7 or 8,
n d n ( β ^ n β 0 ) = 1 d n X n X n n 1 X n ϵ n n X n X n n 1 1 n X n ( G 1 X n β 0 ) 1 n X n ( G 1 X n β 0 ) n d n ( ρ ^ n ρ 0 ) 1 n k = 1 m n d n ( ρ ^ k ρ k 0 ) X n X n n 1 ( X n G k ϵ n ) n D N 0 , V Σ ρ 1 I ρ Σ ρ 1 V ,
where V = lim n X n X n n 1 lim n 1 n X n ( G 1 X n β 0 ) 1 n X n ( G m X n β 0 ) , and
n ( σ ^ n 2 σ 0 2 ) = k = 1 n ϵ i 2 σ 0 2 n + o p ( 1 ) D N ( 0 , μ 4 σ 0 4 ) .
In particular, when β 0 = 0 ,
n β ^ n = X n X n n 1 X n ϵ n n d n n k = 1 m n d n ( ρ ^ k ρ k 0 ) X n X n n 1 X n G k ϵ n n D N 0 , σ 0 2 lim n 1 n X n X n .
Proof. 
See Appendix A. □
The asymptotic distribution of ρ ^ n has the n / d n -rate of convergence in Theorem 5. As d n is divergent, this rate of convergence is lower than n . The asymptotic distribution of the QML estimator β ^ n and its lower rate of convergence in Theorem 6 are determined by the asymptotic distribution of ρ ^ n , which forms the leading term in the asymptotic expansion (29). When β 0 = 0 , this leading term vanishes and β ^ n converges to β 0 at the usual n -rate. The asymptotic distribution of σ ^ n 2 keeps the usual n -rate of convergence.
The rate of convergence of β ^ n may change when the terms ( 1 / n ) X n G 1 X n β 0 , , ( 1 / n ) X n G m X n β 0 all vanish asymptotically and simultaneously. The exact rate of convergence then depends on how fast these terms vanish in the limit. This simultaneous vanishing may result in some components of β ^ n having the n -rate of convergence while others have lower rates. Interested readers can refer to Lee [27] for details in the case of m = 1 .
Remark 4.
The set of the vectors G 1 X n β 0 , , G m X n β 0 and the regressor matrix X n can be linearly dependent under some circumstances.
Circumstance 1: The regression coefficient β 0 is a zero vector. The heterogeneous SAR model (7) reduces to the pure heterogeneous SAR model (9), and the vectors G 1 X n β 0 , , G m X n β 0 all become zero vectors. Condition 2 is violated because diag ( 0 , lim n 1 n X n X n ) is not positive definite, so the set of vectors G 1 X n β 0 , , G m X n β 0 and the columns of the regressor matrix X n are linearly dependent. Condition 1 is also violated.
Circumstance 2: For a specific W and X n , ( I P X n ) G 1 X n β 0 = 0 ; see the counterexample discussed in Appendix B. In this case, Conditions 1 and 4 are both violated.
Circumstance 3: When X n = 1 x with x a p-dimensional vector, P X n = P 1 P x . Then, ( I P X n ) G k X n β 0 = ( I P 1 ) ( 1 P x ) G k ( I x ) β 0 = 0 for any k. Condition 1 is violated.
In the above circumstances, if Condition 4 is also violated, for example, when d n is a divergent sequence, consistency may fail, and hence asymptotic normality fails as well. Appendix B provides a counterexample leading to inconsistent QML estimators.

5. Simulation Studies

In this section, we estimate the parameters in the heterogeneous spatial autoregressive model (7) with given parameter values. To investigate the finite sample properties of the QMLE, we conduct a Monte Carlo study with an appropriate spatial matrix W that satisfies the assumptions in Section 2, focusing on the same spatial scenario as that investigated by Case [8] and Lee [27].
Specifically, W n = I n 1 L n 2 , where L n 2 = ( 1 n 2 1 n 2 I n 2 ) / ( n 2 1 ) and 1 n 2 is an n 2 -dimensional column vector of ones. There are n 1 districts and n 2 members in each district, with each neighbor of a member in a district given equal weight. For this spatial scenario, d n = ( n 2 1 ) , n = n 1 n 2 and hence d n / n = O ( 1 / n 1 ) . If both n 1 and n 2 increase to infinity, then d n goes to infinity and d n / n goes to zero as n tends to infinity. Let us consider data sets generated from the following model
Y n = X n β + ρ 1 W 1 Y n + ρ 2 W 2 Y n + + ρ 5 W 5 Y n + ϵ n ,
where W 1 , , W 5 form the uniform column segmentation of the spatial matrix W n , namely, W 1 consists of the first n / 5 columns of W n , , and W 5 consists of the last n / 5 columns of W n , so that W = W 1 + + W 5 ; the spatial coefficient vector ρ = ( ρ 1 , , ρ 5 ) is set to ( 0.2 , 0.35 , 0.5 , 0.65 , 0.8 ) and ϵ n follows N ( 0 , I n ) .
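As an illustration, the block-diagonal weight matrix and its uniform column segmentation can be sketched as follows (a minimal sketch of our own; the helper name build_weights is ours, and small n 1 , n 2 are used only for demonstration):

```python
import numpy as np

def build_weights(n1, n2, m=5):
    """Build W_n = I_{n1} (x) L_{n2} with L_{n2} = (1 1' - I) / (n2 - 1),
    then split W_n into m uniform column blocks W_1, ..., W_m."""
    L = (np.ones((n2, n2)) - np.eye(n2)) / (n2 - 1)  # equal within-district weights
    W = np.kron(np.eye(n1), L)                       # n x n with n = n1 * n2
    n = n1 * n2
    blocks = []
    for idx in np.array_split(np.arange(n), m):
        Wk = np.zeros_like(W)
        Wk[:, idx] = W[:, idx]                       # W_k keeps only its column block
        blocks.append(Wk)
    return W, blocks

W, blocks = build_weights(n1=6, n2=5)
# By construction, W_1 + ... + W_5 recovers W, each row of W sums to 1,
# and the diagonal of W is zero.
```

By construction the blocks are disjoint in their columns, so their sum reproduces W exactly, matching the requirement W = W 1 + + W 5 above.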
We consider three different settings for models with three different types of regressors without the intercept term:
  • Model 1: Set β = ( 1 , 1 ) . The n-dimensional regressor vector X 1 is generated by the i.i.d. standard normal N ( 0 , 1 ) and X 2 is generated by the i.i.d. t-distribution t ( 2 ) .
  • Model 2: The setting is the same as Model 1. Additionally, the correlation coefficient of the first regressor and the second regressor is 0.5 .
  • Model 3: Set β = ( 1 , 1 ) . Let Z 1 l , l = 1 , , n 1 , be generated by the standard normal N ( 0 , 1 ) and Z 2 l , l = 1 , , n 1 , be generated by the t-distribution t ( 2 ) . The first regressor X 1 i l of the ith member in district l is generated as X 1 i l = ( Z 1 l + Z 1 i l ) / 2 , where the Z 1 i l are i.i.d. N ( 0 , 1 ) for all i and l and are independent of Z 1 l , and the second regressor X 2 i l is generated as X 2 i l = ( Z 2 l + Z 2 i l ) / 2 , where the Z 2 i l are i.i.d. t ( 2 ) for all i and l and are independent of Z 2 l . The correlation coefficient of the two regressors is 0.5 . This specification implies that the average values of X 1 i l and X 2 i l within district l converge in probability to Z 1 l / 2 and Z 2 l / 2 as n 2 goes to infinity.
The statistical R language is used in simulation studies. For each model, there are 400 repetitions. The empirical mean, bias, empirical root mean square error (RMSE), and coverage probability of 100 ( 1 α ) % confidence interval (CP) for θ = ( ρ , β , σ 2 ) are reported, respectively, in Table 1, Table 2 and Table 3.
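Although the study itself was run in R, one replication of the QML procedure can be sketched in Python for the simplified case m = 1 and a much smaller design (a sketch under our own assumptions; the grid search and the name conc_loglik are ours, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simplified design: n1 districts of n2 members, a single spatial coefficient.
n1, n2 = 10, 10
n = n1 * n2
L = (np.ones((n2, n2)) - np.eye(n2)) / (n2 - 1)
W = np.kron(np.eye(n1), L)

rho0, beta0, sig0 = 0.5, np.array([1.0, -1.0]), 1.0
X = rng.standard_normal((n, 2))
eps = sig0 * rng.standard_normal(n)
Y = np.linalg.solve(np.eye(n) - rho0 * W, X @ beta0 + eps)  # Y = S_n^{-1}(X b + e)

M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # I - P_{X_n}

def conc_loglik(rho):
    """Concentrated log-likelihood of rho (constants dropped):
    beta and sigma^2 are profiled out as in (15) and (16)."""
    S = np.eye(n) - rho * W
    u = M @ (S @ Y)
    sig2 = (u @ u) / n
    sign, logdet = np.linalg.slogdet(S)
    return -0.5 * n * np.log(sig2) + logdet

grid = np.linspace(0.0, 0.9, 91)
rho_hat = grid[np.argmax([conc_loglik(r) for r in grid])]
S_hat = np.eye(n) - rho_hat * W
beta_hat = np.linalg.solve(X.T @ X, X.T @ (S_hat @ Y))
```

The grid maximization is a simple stand-in for a numerical optimizer; in a full replication one would repeat this over many Monte Carlo draws and all five coefficients.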
We experimented with three different values of n 1 at 30, 50 and 80 and three different values of n 2 at 20, 40 and 60, respectively, resulting in n taking values of 600, 1000, 1200, 1600, 1800, 2000, 3000, 3200 and 4800. For a fixed n 1 , the biases and RMSEs of ρ , β and σ 2 decrease as n 2 increases. On the other hand, for a fixed n 2 , the biases and RMSEs of ρ , β and σ 2 decrease as n 1 increases. The estimators of ρ , β and σ 2 in Model 1 and Model 2 have similar biases and RMSEs, while the estimators of ρ , β and σ 2 in Model 3 have the lowest bias and RMSE among the three models.
From the three tables above, we conclude that the district-level regressor structure of Model 3 yields parameter estimators with lower bias and RMSE than those in Model 1, and that this improvement is not driven merely by correlation between the regressors, because the regressors of both Model 2 and Model 3 are correlated. The coverage probabilities of the 95 % confidence intervals (CP) for ρ , β and σ 2 come closer to 0.95 as n increases.

6. Concluding Remarks

In this paper, we proposed a heterogeneous spatial autoregressive model with all n ( n 1 ) spatial coefficients taking m distinct true values, where m is independent of the sample size n, and comprehensively expounded the motivations for investigating the model. Establishing a corresponding estimation procedure and its properties for the novel model is both necessary and important. We established the asymptotic properties of the maximum likelihood estimator and the quasi-maximum likelihood estimator for the parameters in the novel model, extending Lee’s work [27] on the classic spatial autoregressive model.
In the spatial autoregressive model, the assumption of homoscedastic disturbances may be unrealistic, particularly when the sample size n is large. How to relax the assumption from homoscedasticity to heteroscedasticity is an interesting and challenging problem, which we leave as a topic for future research.

Author Contributions

Conceptualization, J.H. and F.Q.; methodology, J.H. and H.D.; software, H.D. and F.Q.; validation, H.D. and F.Q.; project management, J.H. and H.D.; investigation, F.Q. and H.D.; Preparation of the original work draft, H.D. and F.Q.; revision, J.H. and F.Q.; visualization, F.Q.; supervision, funding acquisition, J.H., H.D. and F.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China [Grants 11571219] and Shanghai Research Center for Data Science and Decision Technology. This work was also supported by Startup Foundation for Talents in Zhejiang Agriculture and Forestry University (2017FR044).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Sincere thanks to everyone who suggested revisions and improved this article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of Main Theoretical Results

Proof (Proof of Lemma 2) .
Let A be a nonrandom matrix, then
E ( Y A Y ) = μ A μ + tr ( A Σ ) .
for a random vector Y with mean μ = E ( Y ) and covariance matrix Cov ( Y ) = Σ . With the second derivatives (24) and (25), identity (A1), and routine calculations, we obtain
E 1 n 2 log L n ( θ 0 ) ρ k ρ l = 1 n tr S n 1 W k S n 1 W l + 1 n σ 0 2 E ( Y n W k W l Y n ) = 1 n tr ( G k s G l ) + 1 n σ 0 2 ( G k X n β 0 ) ( G l X n β 0 ) with G k s = G k + G k , for k , l = 1 , , m ,
E 1 n 2 log L n ( θ 0 ) β β = 1 n σ 0 2 X n X n ,
E 1 n 2 log L n ( θ 0 ) β ρ k = 1 n σ 0 2 X n ( G k X n β 0 ) , for k = 1 , , m ,
E 1 n 2 log L n ( θ 0 ) β σ 2 = 0 ,
E 1 n 2 log L n ( θ 0 ) σ 2 ρ k = 1 n σ 0 4 E [ ϵ n W k Y n ] = 1 n σ 0 2 tr ( G k ) , for k = 1 , , m ,
E 1 n 2 log L n ( θ ) σ 2 σ 2 = 1 2 σ 0 4 + 1 n σ 0 6 E [ ϵ n ϵ n ] = 1 2 σ 0 4 .
Therefore, Lemma 2 is proved. □
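The expectation identity (A1) used above can be verified numerically by exact enumeration over a discrete error distribution (a self-contained check of our own; the matrices are arbitrary):

```python
import itertools
import numpy as np

# Check E(Y' A Y) = mu' A mu + tr(A Sigma) with Y = mu + B e, where e is
# uniform on {-1, +1}^d, so that E(e) = 0, Cov(e) = I and Cov(Y) = B B'.
# The expectation is computed exactly by enumerating all 2^d sign vectors.
rng = np.random.default_rng(1)
d = 4
A = rng.standard_normal((d, d))
B = rng.standard_normal((d, d))
mu = rng.standard_normal(d)
Sigma = B @ B.T

lhs = np.mean([(mu + B @ np.array(e)) @ A @ (mu + B @ np.array(e))
               for e in itertools.product((-1.0, 1.0), repeat=d)])
rhs = mu @ A @ mu + np.trace(A @ Sigma)
# lhs and rhs agree up to floating-point error.
```

Because the identity only involves the first two moments of Y, any mean-zero, identity-covariance error law gives the same answer, which is why exact enumeration over sign vectors suffices.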
Proof (Proof of Lemma 3) .
Using the block matrices expresses I n ( θ 0 ) as
I n ( θ 0 ) = I ρ ρ ( θ 0 ) * * I β ρ ( θ 0 ) I β β ( θ 0 ) * I σ 2 ρ ( θ 0 ) I σ 2 β ( θ 0 ) I σ 2 σ 2 ( θ 0 ) .
With routine calculations, we have
I ρ ρ ( θ 0 ) = a 11 a 12 a 1 m a m 1 a m 2 a m m ,
where
a k l = 1 n E 1 σ 0 2 ϵ n W k Y n tr ( S 1 W k ) 1 σ 0 2 ϵ n W l Y n tr ( S 1 W l ) = 1 n tr ( G k ) tr ( G l ) 1 σ 0 2 tr ( G k ) E ( ϵ n G l ϵ n ) 1 σ 0 2 tr ( G l ) E ( ϵ n G k ϵ n ) + 1 σ 0 4 E ( Y n W k ϵ n ϵ n W l Y n )
with
E ( Y n W k ϵ n ϵ n W l Y n ) = E ( X n β 0 + ϵ n ) ( W k S n 1 ) ϵ n ϵ n ( W l S n 1 ) ( X n β 0 + ϵ n ) = ( G k X n β 0 ) E ϵ n ϵ n G l X n β 0 + ( G k X n β 0 ) E ϵ n ϵ n G l ϵ n + E ϵ n G k ϵ n ϵ n G l X n β 0 + E ϵ n G k ϵ n ϵ n G l ϵ n .
Recall that, for i.i.d. ϵ 1 , , ϵ n ,
E ( ϵ i ϵ j ) = σ 0 2 , for i = j 0 , for i j ; E ( ϵ i ϵ j ϵ s ) = μ 3 , for i = j = s , 0 , otherwise ;
and
E ( ϵ i ϵ j ϵ s ϵ t ) = μ 4 , for i = j = s = t , σ 0 4 , for s = t , i = j or s = i , t = j or s = j , t = i , 0 , otherwise .
Then, we have
E ( ϵ n G k ϵ n ) = E [ tr ( G k ϵ n ϵ n ) ] = tr [ G k E ( ϵ n ϵ n ) ] = σ 0 2 tr ( G k ) , E ( ϵ n G l ϵ n ) = E [ tr ( G l ϵ n ϵ n ) ] = tr [ G l E ( ϵ n ϵ n ) ] = σ 0 2 tr ( G l ) , E ( ϵ n ϵ n G k ϵ n ) = E ( ϵ 1 3 ) ( G k , 11 , , G k , n n ) = μ 3 g k , E ϵ n G l ϵ n ϵ n = E ( ϵ 1 3 ) ( G l , 11 , , G l , n n ) = μ 3 g l
and
E ( ϵ n G k ϵ n ϵ n G l ϵ n ) = s = 1 n t = 1 n i = 1 n j = 1 n G k , t s G l , i j E ( ϵ s ϵ t ϵ i ϵ j ) = μ 4 i = 1 n G k , i i G l , i i + σ 0 4 s i [ G k , s s G l , i i + G k , i s G l , i s + G k , i s G l , s i ] = ( μ 4 3 σ 0 4 ) g k g l + σ 0 4 s , i = 1 n [ G k , s s G l , i i + G k , i s G l , i s + G k , i s G l , s i ] = ( μ 4 3 σ 0 4 ) g k g l + σ 0 4 [ tr ( G k ) tr ( G l ) + tr ( G k G l ) + tr ( G k G l ) ] = ( μ 4 3 σ 0 4 ) g k g l + σ 0 4 [ tr ( G k ) tr ( G l ) + tr ( G k s G l + G k G l s ) ] ,
implying that
a k l = 1 n σ 0 2 ( G k X n β 0 ) ( G l X n β 0 ) + 1 n tr G k s G l + 1 n σ 0 4 ( μ 4 3 σ 0 4 ) g k g l + μ 3 n σ 0 4 ( G k X n β 0 ) g l + g k ( G l X n β 0 ) .
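The fourth-moment expansion used above can be checked by brute-force summation over all index quadruples with the moment rules for i.i.d. disturbances (a small numerical sketch of our own, with arbitrary matrices and moments):

```python
import numpy as np

rng = np.random.default_rng(2)
d, sig2, mu4 = 4, 1.3, 5.0         # arbitrary variance sigma_0^2 and fourth moment mu_4
Gk = rng.standard_normal((d, d))
Gl = rng.standard_normal((d, d))

def m4(s, t, i, j):
    """E(e_s e_t e_i e_j) for i.i.d. mean-zero errors with the stated moments."""
    if s == t == i == j:
        return mu4
    if (s == t and i == j) or (s == i and t == j) or (s == j and t == i):
        return sig2 ** 2
    return 0.0

# Direct summation of E(e' Gk e  e' Gl e) over all index quadruples ...
lhs = sum(Gk[t, s] * Gl[i, j] * m4(s, t, i, j)
          for s in range(d) for t in range(d)
          for i in range(d) for j in range(d))
# ... against the closed form in terms of diagonals g_k, g_l and traces.
gk, gl = np.diag(Gk), np.diag(Gl)
rhs = (mu4 - 3 * sig2 ** 2) * gk @ gl + sig2 ** 2 * (
    np.trace(Gk) * np.trace(Gl) + np.trace(Gk @ Gl) + np.trace(Gk.T @ Gl))
# lhs and rhs agree up to floating-point error.
```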
Moreover, we have
I β ρ ( θ 0 ) = 1 n E 1 σ 0 2 X n ϵ n 1 σ 0 2 ϵ n W l Y n tr ( S n 1 W l ) = 1 n σ 0 2 X n ( G l X n β 0 ) + μ 3 n σ 0 4 X n g l p × m
I β β ( θ 0 ) = 1 n σ 0 4 E X n ϵ n ϵ n X n = 1 n σ 0 2 X n X n ,
I σ 2 ρ ( θ 0 ) = 1 n E n 2 σ 0 2 + 1 2 σ 0 4 ϵ n ϵ n 1 σ 0 2 ϵ n W l Y n tr ( S 1 W l ) = 1 2 n σ 0 6 ( μ 4 σ 0 4 ) tr ( G l ) + μ 3 1 G l X n β 0 1 × m
I σ 2 β ( θ 0 ) = 1 n E n 2 σ 0 2 + 1 2 σ 0 4 ϵ n ϵ n 1 σ 0 2 ϵ n X n = μ 3 2 n σ 0 6 1 X n ,
and
I σ 2 σ 2 ( θ 0 ) = 1 n E n 2 σ 0 2 + 1 2 σ 0 4 ϵ n ϵ n n 2 σ 0 2 + 1 2 σ 0 4 ϵ n ϵ n = 1 4 σ 0 8 μ 4 σ 0 4 .
Finally, calculating I n ( θ 0 ) Σ n ( θ 0 ) obtains the expression of Ω n ( θ 0 ) . □
Proof (Proof of Theorem 1) .
It follows from the estimation procedure stated in Section 2 that β ^ n and σ ^ n 2 are continuous functions of ρ ^ n , so we only need to show the consistency of ρ ^ n . By Theorem 3.4 of White [40], it suffices to show that (i)
lim n 1 n [ log L n ( ρ ) Q n ( ρ ) ] = 0
uniformly for ρ on Γ , and (ii) the uniqueness identification condition that, for any ε > 0 ,
lim sup n max ρ N ε ( ρ 0 ) 1 n [ Q n ( ρ ) Q n ( ρ 0 ) ] < 0 ,
where N ε ( ρ 0 ) is the complement of an open neighborhood of ρ 0 in Γ of radius ε .
To prove part (i), it follows from (17) and (22) that
1 n [ log L n ( ρ ) Q n ( ρ ) ] = 1 2 log σ ˜ n 2 ( ρ ) log σ ˘ n 2 ( ρ )
where σ ˘ n 2 ( ρ ) is determined by (21) and σ ˜ n 2 ( ρ ) from (16) can be written as
σ ˜ n 2 ( ρ ) = 1 n k = 1 m ( ρ k ρ k 0 ) ( G k X n β 0 ) ( I P X n ) k = 1 m ( ρ k ρ k 0 ) ( G k X n β 0 ) + D 1 n + D 2 n = σ ˘ n 2 ( ρ ) + D 1 n + D 2 n σ n 2 ( ρ )
with
D 1 n ( ρ ) = 1 n k = 1 m ( ρ k ρ k 0 ) ( G k X n β 0 ) ( I P X n ) S ( ρ ) S n 1 ϵ n
and
D 2 n ( ρ ) = 1 n ϵ n ( S n 1 ) S ( ρ ) ( I P X n ) S ( ρ ) S n 1 ϵ n .
It can be shown that D 1 n ( ρ ) = o p ( 1 ) and D 2 n ( ρ ) σ n 2 ( ρ ) = o p ( 1 ) uniformly on Γ . Therefore,
log σ ˜ n 2 ( ρ ) log σ ˘ n 2 ( ρ ) = log 1 + D 1 n σ ˘ n 2 ( ρ ) + D 2 n σ n 2 ( ρ ) σ ˘ n 2 ( ρ ) = o p ( 1 )
uniformly on Γ . Consequently,
sup ρ Γ 1 n | log L n ( ρ ) Q n ( ρ ) | = o p ( 1 ) ,
proving part (i).
To prove part (ii), the identification uniqueness condition can be established as follows. Note that [ Q n ( ρ ) Q n ( ρ 0 ) ] / n can be divided into two parts
1 n [ Q n ( ρ ) Q n ( ρ 0 ) ] = 1 n E [ log f n ( z n | ρ ) ] E [ log f n ( z n | ρ 0 ) ] 1 2 log σ ˘ n 2 ( ρ ) log σ n 2 ( ρ ) ,
where f n ( z n | ρ ) is the density function of a random vector following the multivariate normal N n ( 0 , σ n 2 ( ρ ) S ( ρ ) 1 S ( ρ ) 1 ) , and E is the expectation with respect to the probability density function f n ( z n | ρ 0 ) . For the first part, { E [ log f n ( z n | ρ ) ] E [ log f n ( z n | ρ 0 ) ] } / n , consider the following function
g ( z n | ρ ) = f ( z n | ρ ) f ( z n | ρ 0 ) 1 n .
For ρ ρ 0 , it follows from Jensen’s inequality that
E [ log g ( z n | ρ ) ] log E [ g ( z n | ρ ) ] log E f ( z n | ρ ) f ( z n | ρ 0 ) 1 n = log f ( z n | ρ ) f ( z n | ρ 0 ) f ( z n | ρ 0 ) d z n 1 n = 0 ,
implying that
1 n E [ log f n ( z n | ρ ) ] 1 n E [ log f n ( z n | ρ 0 ) ] for ρ ρ 0 .
For the second part, it is clear from (21) that
log σ n 2 ( ρ ) log σ ˘ n 2 ( ρ ) .
It follows
1 n [ Q n ( ρ ) Q n ( ρ 0 ) ] 0 .
Under Condition 1, the strict inequality (A5) holds, implying that Q n ( ρ ) / n < Q n ( ρ 0 ) / n , and then the identification uniqueness condition (ii) holds. When Condition 1 is violated, inequality (A5) may become an equality, for example, when β 0 X n G k ( I P X n ) G k X n β 0 = 0 for all k. Then Q n ( ρ ) / n < Q n ( ρ 0 ) / n holds only if the strict inequality (A4) holds. Since log ( x ) and x 1 / n are strictly convex, a sufficient condition guaranteeing the strict inequality (A4) is
P z n : g n ( z n | ρ ) 1 for ρ ρ 0 > 0 .
or
P z n : 1 n log f n ( z n | ρ ) 1 n log f n ( z n | ρ 0 ) 0 for ρ ρ 0 > 0 ,
which is true under Condition 4. Therefore, Q n ( ρ ) / n < Q n ( ρ 0 ) / n and then the identification uniqueness condition (ii) also holds.
Combining part (i) and (ii) together completes the proof of consistency. □
Proof (Proof of Theorem 2) .
Under Assumptions A1–A7, taking the row elementary transformation
E = I Σ ρ β ( θ 0 ) Σ β β ( θ 0 ) 1 Σ ρ σ 2 ( θ 0 ) Σ σ 2 σ 2 ( θ 0 ) 1 0 I 0 0 0 I
on the average Hessian matrix Σ n ( θ 0 ) yields
E Σ n ( θ 0 ) = Σ ρ ρ Σ ρ β Σ β β 1 Σ β ρ Σ ρ σ 2 Σ σ 2 σ 2 1 Σ σ 2 ρ 0 0 Σ β ρ ( θ 0 ) Σ β β ( θ 0 ) 0 Σ σ 2 ρ ( θ 0 ) 0 Σ σ 2 σ 2 ( θ 0 ) .
Thus, under Assumptions A1–A7, the average Hessian matrix Σ n ( θ 0 ) is nonsingular if and only if
Σ ρ ρ Σ ρ β Σ β β 1 Σ β ρ Σ ρ σ 2 Σ σ 2 σ 2 1 Σ σ 2 ρ
is nonsingular. It can be decomposed into
Σ ρ ρ Σ ρ β Σ β β 1 Σ β ρ Σ ρ σ 2 Σ σ 2 σ 2 1 Σ σ 2 ρ = H 1 H 1 + H 2 H 2 ,
where
H 1 = 1 n ( I P X n ) G 1 X n β 0 1 n ( I P X n ) G m X n β 0
and
H 2 = 2 n vec ( C 1 + C 1 ) 2 n vec ( C m + C m )
with C k = G k 1 n tr ( G k ) I . Therefore, it follows from (A6) that Σ ρ ρ Σ ρ β Σ β β 1 Σ β ρ Σ ρ σ 2 Σ σ 2 σ 2 1 Σ σ 2 ρ is nonsingular if and only if at least one of H 1 H 1 and H 2 H 2 is nonsingular, which is guaranteed by Condition 1 or 4. This completes the proof. □
Proof (Proof of Theorem 3) .
By the mean value theorem, the function log L n ( θ ) / θ at θ ^ n is expressed as
log L n ( θ ) θ = 2 log L n ( θ ¯ n ) θ θ ( θ θ ^ n ) ,
where θ ¯ n is between θ 0 and θ ^ n . First, we show that
1 n 2 log L n ( θ ¯ n ) θ θ converges in probability to 1 n 2 log L n ( θ 0 ) θ θ
Assumption A5 implies that G 1 ( ρ ¯ n ) , , G m ( ρ ¯ n ) are uniformly bounded in row and column sums uniformly in a neighborhood of ρ 0 . For k , l = 1 , , m , from (24), we have
1 n 2 log L n ( θ ¯ n ) ρ k ρ l 1 n 2 log L n ( θ 0 ) ρ k ρ l = 1 n tr [ G k ( ρ ¯ n ) G l ( ρ ¯ n ) ] tr [ G k ( ρ 0 ) G l ( ρ 0 ) ] + 1 σ ¯ n 2 1 σ 0 2 Y n W k W l Y n , = 1 n h ( ρ ˙ n ) ( ρ ˙ n ρ 0 ) + 1 n 1 σ ¯ n 2 1 σ 0 2 Y n W k W l Y n = o p ( 1 ) ,
where the mean value theorem is used, the vector h ( ρ ) = ( h 1 ( ρ ) , , h m ( ρ ) ) with h i ( ρ ) = tr [ G k ( ρ ) G i ( ρ ) G l ( ρ ) ] + tr [ G k ( ρ ) G l ( ρ ) G i ( ρ ) ] , and ρ ˙ n is between ρ ¯ n and ρ 0 ; the last equality follows from h i ( ρ ˙ n ) = O ( n / d n ) and Y n W k W l Y n = O p ( n / d n ) . As the other terms of the second order derivatives in (24)–(26) can be analyzed similarly, the result (A8) holds.
Second, based on the expressions (24)–(26) and the facts that X n G k ϵ n / n = o p ( 1 ) from Assumptions A1–A7 and ( 1 / n ) [ ϵ n G k G l ϵ n σ 0 2 tr ( G k G l ) ] = o p ( 1 ) , we obtain
1 n 2 log L n ( θ 0 ) θ θ converges in probability to E 1 n 2 log L n ( θ 0 ) θ θ .
It follows from Theorem 2 that the average Hessian matrix Σ θ is nonsingular for large enough n. Hence, 2 log L n ( θ ¯ n ) / θ θ in the neighborhood N ε ( θ 0 ) is nonsingular. It follows from (A7) that
n ( θ ^ n θ 0 ) = 1 n 2 log L n ( θ ¯ n ) θ θ 1 1 n log L n ( θ 0 ) θ .
Third, the components of ( 1 / n ) log L n ( θ 0 ) / θ are linear or quadratic functions of ϵ n . With the existence of high-order moments of ϵ in Assumption A1, the central limit theorem for linear quadratic forms of Kelejian and Prucha [41] can be applied and
1 n log L n ( θ 0 ) θ converges in distribution to N ( 0 , Σ θ + Ω θ ) .
Finally, it follows from Slutsky’s theorem that
n ( θ ^ n θ 0 ) converges in distribution to N ( 0 , Σ θ 1 + Σ θ 1 Ω θ Σ θ 1 ) .
If ϵ n is normally distributed, then Ω θ = 0 , implying that n ( θ ^ n θ 0 ) converges in distribution to the multivariate normal N ( 0 , Σ θ 1 ) . Therefore, the proof is complete. □
Proof (Proof of Theorem 4) .
From (16) and (22), we find
Δ n ( ρ ) d n n [ log L n ( ρ ) Q n ( ρ ) ] = d n 2 [ log σ ˜ n 2 ( ρ ) log σ ˘ n 2 ( ρ ) ] .
By the mean value theorem,
Δ n ( ρ ) Δ n ( ρ 0 ) = d n 2 log σ ˜ n 2 ( ρ ¯ ) log σ ˘ n 2 ( ρ ¯ ) ρ ( ρ ρ 0 ) , = 1 σ ˜ 2 ( ρ ¯ ) d n n F 1 ( ρ ¯ ) , , F m ( ρ ¯ ) ( ρ ρ 0 ) ,
where ρ ¯ is between ρ and ρ 0 and F k ( ρ ) = B k ( ρ ) σ ˜ 2 ( ρ ¯ ) σ ˘ 2 ( ρ ¯ ) σ ˘ 2 ( ρ ¯ ) A k ( ρ ) with
A k ( ρ ) = j = 1 m ( ρ j 0 ρ j ) ( G k X n β 0 ) ( I P X n ) G k X n β 0 + σ 0 2 tr G k S ( ρ ) S n 1
and
B k ( ρ ) = Y n W k ( I P X n ) S ( ρ ) Y n A k ( ρ ) .
We must show that (i) Δ n ( ρ ) Δ n ( ρ 0 ) converges in probability to zero uniformly on Γ .
Note that, by the law of large numbers for quadratic forms,
d n n [ ϵ n ( I P X n ) G k ϵ n σ 0 2 tr ( G k ) ] = o p ( 1 ) and d n n [ ϵ n G k ( I P X n ) G l ϵ n σ 0 2 tr ( G k G l ) ] = o p ( 1 ) ,
and, by Condition 7,
d n n ( G k X n β 0 ) ( I P X n ) G l ϵ n = o p ( 1 ) and d n n ( G k X n β 0 ) ( I P X n ) ϵ n = o p ( 1 ) .
Thus, we obtain
d n n B k = d n n ( G k X n β 0 ) ( I P X n ) ϵ n + j = 1 m ( ρ j 0 ρ j ) ( G k X n β 0 ) ( I P X n ) G j ϵ n + j = 1 m ( ρ j 0 ρ j ) ( G j X n β 0 ) ( I P X n ) G k ϵ n + j = 1 m ( ρ j 0 ρ j ) ϵ n G k ( I P X n ) G j ϵ n σ 0 2 tr ( G k ) σ 0 2 j = 1 m ( ρ j 0 ρ j ) tr ( G k G j ) = o p ( 1 ) ,
uniformly on Γ . ( d n / n ) A k ( ρ ) is O ( 1 ) uniformly on Γ . From (A3) in the proof of Theorem 1, σ ˜ n 2 ( ρ ) σ ˘ 2 ( ρ ) = o p ( 1 ) uniformly on Γ . Moreover, σ ˜ 2 ( ρ ¯ ) and σ ˘ 2 ( ρ ¯ ) are bounded away from zero in probability. Therefore, d n n { log L n ( ρ ) Q n ( ρ ) [ log L n ( ρ 0 ) Q n ( ρ 0 ) ] } converges in probability to zero uniformly on Γ , which proves (i).
Next, we need to show that (ii) the uniqueness identification condition that, for any ε > 0 ,
lim sup n max ρ N ε ( ρ 0 ) d n n [ Q n ( ρ ) Q n ( ρ 0 ) ] < 0 ,
where N ε ( ρ 0 ) is the complement of an open neighborhood of ρ 0 in Γ of radius ε .
An argument similar to that in part (ii) of the proof of Theorem 1 is adopted. Note that d n [ Q n ( ρ ) Q n ( ρ 0 ) ] / n can be divided into two parts
d n n [ Q n ( ρ ) Q n ( ρ 0 ) ] = d n n E [ log f n ( z n | ρ ) ] E [ log f n ( z n | ρ 0 ) ] d n 2 log σ ˘ n 2 ( ρ ) log σ n 2 ( ρ ) ,
where f n ( z n | ρ ) is the density function of a random vector following the multivariate normal N 0 , σ n 2 ( ρ ) S ( ρ ) 1 S ( ρ ) 1 and E is the expectation with respect to the probability density function f n ( z n | ρ 0 ) . For the first part, consider the following function
g 1 ( z n | ρ ) = f ( z n | ρ ) f ( z n | ρ 0 ) d n n .
For ρ ρ 0 , it follows from Jensen’s inequality that
E [ log g 1 ( z n | ρ ) ] log E [ g 1 ( z n | ρ ) ] log E f ( z n | ρ ) f ( z n | ρ 0 ) d n n = log f ( z n | ρ ) f ( z n | ρ 0 ) f ( z n | ρ 0 ) d z n d n n = 0 ,
implying that
d n n E [ log f n ( z n | ρ ) ] d n n E [ log f n ( z n | ρ 0 ) ] for ρ ρ 0 .
For the second part, it is clear from (21) that
log σ n 2 ( ρ ) log σ ˘ n 2 ( ρ ) .
It follows
d n n [ Q n ( ρ ) Q n ( ρ 0 ) ] 0 .
Under Condition 7, the strict inequality (A10) holds, implying that d n Q n ( ρ ) / n < d n Q n ( ρ 0 ) / n , and then the identification uniqueness condition (ii) holds. When Condition 7 is violated, inequality (A10) may become an equality. Then d n Q n ( ρ ) / n < d n Q n ( ρ 0 ) / n holds only if the strict inequality (A9) holds. Since log ( x ) and x d n / n ( d n < n from Assumption A3) are strictly convex, a sufficient condition guaranteeing the strict inequality (A9) is
P z n : g n ( z n | ρ ) 1 for ρ ρ 0 > 0 .
or
P z n : d n n log f n ( z n | ρ ) d n n log f n ( z n | ρ 0 ) 0 for ρ ρ 0 > 0 ,
which is true under Condition 8. Therefore, d n Q n ( ρ ) / n < d n Q n ( ρ 0 ) / n , and then the identification uniqueness condition (ii) also holds.
Combining the part (i) and (ii) together proves the consistency. □
Proof (Proof of Theorem 5) .
To derive the limiting (asymptotically normal) distribution of ρ ^ n with the n / d n -rate of convergence, we proceed in four steps.
First Step: The first- and second-order derivatives of the concentrated log-likelihood are derived as
L n ( ρ ) ρ k = 1 σ ˜ n 2 ( ρ ) Y n W k ( I P X n ) S ( ρ ) Y n tr [ S ( ρ ) 1 W k ] for k = 1 , , m , and 2 L n ( ρ ) ρ k ρ l = 2 n σ ˜ 4 ( ρ ) Y n W k ( I P X n ) S ( ρ ) Y n Y n W l ( I P X n ) S ( ρ ) Y n 1 σ ˜ n 2 ( ρ ) Y n W k ( I P X n ) W l Y n tr [ S ( ρ ) 1 W k S ( ρ ) 1 W l ] for k , l = 1 , , m ,
where σ ˜ n 2 ( ρ ) = ( 1 / n ) Y n S ( ρ ) ( I P X n ) S ( ρ ) Y n . For the pure SAR process, β 0 = 0 and P X n = 0 and the corresponding derivatives are similar with ( I P X n ) replaced by the identity I.
Second step: Under Condition 7 or 8, we have
d n n Y n W k ( I P X n ) W l Y n = d n n ( G k X n β 0 ) ( I P X n ) ( G l X n β 0 ) + d n n ϵ n G k ( I P X n ) G l ϵ n + o p ( 1 )
and
d n n Y n W k ( I P X n ) S ( ρ ) Y n = d n n ϵ n G k ( I P X n ) ϵ n + j = 1 m ( ρ j 0 ρ j ) d n n ( G k X n β 0 ) ( I P X n ) ( G j X n β 0 ) + d n n ϵ n G k ( I P X n ) G j ϵ n + o p ( 1 ) .
When lim n d n = , we have ( d n / n ) Y n W k ( I P X n ) S ( ρ ) Y n = o p ( 1 ) and σ ˜ n 2 ( ρ ) = σ 0 2 + o p ( 1 ) uniformly on Γ . Then,
d n n 2 L n ( ρ ) ρ k ρ l = 1 σ 0 2 d n n ( G k X n β 0 ) ( I P X n ) ( G l X n β 0 ) + d n n ϵ n G k ( I P X n ) G l ϵ n d n n tr [ S ( ρ ) 1 W k S ( ρ ) 1 W l ] + o p ( 1 ) .
Under Assumption A6, ( d n / n ) tr [ G k ( ρ ) G i ( ρ ) G l ( ρ ) ] = O p ( 1 ) uniformly on Γ . By the Taylor expansion, we have
d n n 2 L n ( ρ ¯ ) ρ k ρ l 2 L n ( ρ 0 ) ρ k ρ l = d n n tr [ S ( ρ ) 1 W k S ( ρ ) 1 W l ] tr [ G k G l ] + o p ( 1 ) = d n n tr [ G k ( ρ ¯ n ) G l ( ρ ¯ n ) ] tr [ G k G l ] + o p ( 1 ) = d n n h ( ρ ¯ n ) ( ρ ¯ n ρ 0 ) + o p ( 1 ) P 0 ,
where the mean value theorem is used again, ρ ¯ is any consistent estimator of ρ 0 and the vector h ( ρ ) = ( h 1 ( ρ ) , , h m ( ρ ) ) with h i ( ρ ) = tr [ G i ( ρ ) G k ( ρ ) G l ( ρ ) ] + tr [ G k ( ρ ) G i ( ρ ) G l ( ρ ) ] .
Define
F k l ( ρ 0 ) = 1 σ 0 2 ( G k X n β 0 ) ( I P X n ) ( G l X n β 0 ) + ϵ n G k ( I P X n ) G l ϵ n tr ( G k G l ) .
Then,
E [ F k l ( ρ 0 ) ] = 1 σ 0 2 ( G k X n β 0 ) ( I P X n ) ( G l X n β 0 ) [ tr ( G k G l ) + tr ( G k G l ) ] + o ( 1 ) .
Since,
d n n ϵ n G k ( I P X n ) G l ϵ n σ 0 2 tr ( G k ( I P X n ) G l ) = o p ( 1 ) ,
we find
d n n { F k l ( ρ 0 ) E [ F k l ( ρ 0 ) ] } = 1 σ 0 2 d n n [ ϵ n G k ( I P X n ) G l ϵ n σ 0 2 tr ( G k ( I P X n ) G l ) ] + o ( 1 ) = o p ( 1 )
and then
d n n 2 L n ( ρ 0 ) ρ k ρ l = d n n F k l ( ρ 0 ) + o p ( 1 ) = d n n E [ F k l ( ρ 0 ) ] + o p ( 1 ) = 1 σ 0 2 d n n ( G k X n β 0 ) ( I P X n ) ( G l X n β 0 ) d n n [ tr ( G k G l ) + tr ( G k G l ) ] + o p ( 1 ) b k l + o p ( 1 ) .
By Condition 7 or 8, the average Hessian matrix or information matrix under normality
Σ ρ = lim n E d n n 2 L n ( ρ 0 ) ρ k ρ l = lim n ( b k l ) m × m > 0
is nonsingular.
Third step: Considering d n n L n ( ρ 0 ) ρ , we have
d n n L n ( ρ 0 ) ρ k = 1 σ ˜ 2 ( ρ 0 ) d n n ( G k X n β 0 ) ( I P X n ) ϵ n + ϵ n C k ( I P X n ) ϵ n .
Let η k = ( G k X n β 0 ) ( I P X n ) ϵ n and ζ k = ϵ n C k ( I P X n ) ϵ n . Our task is to find the mean and covariance of ( η 1 + ζ 1 , , η m + ζ m ) . With routine calculations, we obtain
E ( η k ) = 0 and E ( ζ k ) = σ 0 2 tr [ C k ( I P X n ) ] = O ( 1 )
and for any k , l = 1 , , m
Cov [ ( η k + ζ k ) ( η l + ζ l ) ] = E [ ( η k + ζ k ) ( η l + ζ l ) ] E ( η k + ζ k ) E ( η l + ζ l ) = E ( η k η l + η k ζ l + ζ k η l + ζ k ζ l ) E ( ζ k ) E ( ζ l )
with
E ( η k η l ) = E [ ( G k X n β 0 ) ( I P X n ) ϵ n ( G l X n β 0 ) ( I P X n ) ϵ n ] = σ 0 2 ( G k X n β 0 ) ( I P X n ) ( G l X n β 0 ) ,
E ( η k ζ l ) = E [ ( G k X n β 0 ) ( I P X n ) ϵ n ϵ n C l ( I P X n ) ϵ n ] = μ 3 ( G k X n β 0 ) ( I P X n ) c l
where c l is the vector whose elements are diagonal elements of C l ( I P X n ) , and
E ( ζ k ζ l ) = E [ ϵ n C k ( I P X n ) ϵ n ϵ n C l ( I P X n ) ϵ n ] = ( μ 4 3 σ 0 4 ) c k c l + σ 0 4 tr [ C k ( I P X n ) ] tr [ C l ( I P X n ) ] + tr [ C k ( I P X n ) C l ( I P X n ) ] + tr ( C l ( I P X n ) C k ] = ( μ 4 3 σ 0 4 ) c k c l + σ 0 4 tr [ C k ( I P X n ) ] tr [ C l ( I P X n ) ] + σ 0 4 tr ( ( I P X n ) C k ( I P X n ) C l ) + tr ( C k ( I P X n ) C l ) .
Therefore, we obtain
(i) The mean of d n n L n ( ρ 0 ) ρ is o p ( 1 ) , and
(ii) The covariance of d n n L n ( ρ 0 ) ρ is d n n σ ˜ n 4 ( ρ 0 ) e k l , where
e k l = σ 0 2 ( G k X n β 0 ) ( I P X n ) ( G l X n β 0 ) + μ 3 [ ( G k X n β 0 ) ( I P X n ) c l + ( G l X n β 0 ) ( I P X n ) c k ] + ( μ 4 3 σ 0 4 ) c k c l + σ 0 4 tr ( ( I P X n ) C k ( I P X n ) C l ) + tr ( C k ( I P X n ) C l ) .
Then, d n n L n ( ρ 0 ) ρ converges in distribution to N ( 0 , I ρ ) , where I ρ = lim n d n n σ ˜ n 4 ( ρ 0 ) e k l .
Fourth step: It follows from Slutsky’s theorem that
n d n ( ρ ^ n ρ 0 ) = d n n 2 log L n ( ρ 0 ) ρ ρ 1 d n n log L n ( ρ 0 ) ρ
converges in distribution to N ( 0 , Σ ρ 1 I ρ Σ ρ 1 ) , which completes the proof of the desired result. □
Proof of Theorem 6.
To derive the limiting distribution of $\hat{\beta}_n$ with $\sqrt{n/d_n}$-rate of convergence, it follows from (15) that
\[
\begin{aligned}
\sqrt{\frac{n}{d_n}}(\hat{\beta}_n-\beta_0)
&= \sqrt{\frac{n}{d_n}}\Big[(X_n'X_n)^{-1}X_n'S(\hat{\rho})Y_n-\beta_0\Big]\\
&= \frac{1}{\sqrt{d_n}}\left(\frac{X_n'X_n}{n}\right)^{-1}\frac{X_n'\epsilon_n}{\sqrt{n}}
- \sum_{k=1}^m \sqrt{\frac{n}{d_n}}(\hat{\rho}_k-\rho_{k0})\left(\frac{X_n'X_n}{n}\right)^{-1}\frac{X_n'(G_kX_n\beta_0)}{n}\\
&\quad - \frac{1}{\sqrt{n}}\sum_{k=1}^m \sqrt{\frac{n}{d_n}}(\hat{\rho}_k-\rho_{k0})\left(\frac{X_n'X_n}{n}\right)^{-1}\frac{X_n'G_k\epsilon_n}{\sqrt{n}}\\
&= -\left(\frac{X_n'X_n}{n}\right)^{-1}\left(\frac{X_n'(G_1X_n\beta_0)}{n},\ldots,\frac{X_n'(G_mX_n\beta_0)}{n}\right)\sqrt{\frac{n}{d_n}}(\hat{\rho}_n-\rho_0) + O_p(d_n^{-1/2}).
\end{aligned}
\]
It is a linear combination of the random vector $\sqrt{n/d_n}(\hat{\rho}_n-\rho_0)$ plus higher-order infinitesimals, and the limiting distribution (29) of $\hat{\beta}_n$ follows. When $\beta_0=0$, we have
\[
\begin{aligned}
\sqrt{n}\,\hat{\beta}_n &= \sqrt{n}(X_n'X_n)^{-1}X_n'S(\hat{\rho})Y_n\\
&= \left(\frac{X_n'X_n}{n}\right)^{-1}\frac{X_n'\epsilon_n}{\sqrt{n}}
- \sqrt{\frac{d_n}{n}}\sum_{k=1}^m \sqrt{\frac{n}{d_n}}(\hat{\rho}_k-\rho_{k0})\left(\frac{X_n'X_n}{n}\right)^{-1}\frac{X_n'G_k\epsilon_n}{\sqrt{n}}\\
&= \left(\frac{X_n'X_n}{n}\right)^{-1}\frac{X_n'\epsilon_n}{\sqrt{n}} + O_p\!\big(\sqrt{d_n/n}\big)
\ \xrightarrow{D}\ N\!\left(0,\ \sigma_0^2\Big(\lim_{n\to\infty}\frac{1}{n}X_n'X_n\Big)^{-1}\right).
\end{aligned}
\]
To derive the limiting distribution of $\tilde{\sigma}_n^2$ with $\sqrt{n}$-rate of convergence, it follows from (16) that
\[
\begin{aligned}
\sqrt{n}(\tilde{\sigma}_n^2-\sigma_0^2)
&= \sqrt{n}\left[\frac{1}{n}Y_n'S(\hat{\rho}_n)'(I-P_{X_n})S(\hat{\rho}_n)Y_n-\sigma_0^2\right]\\
&= \frac{1}{\sqrt{n}}\Big[(X_n\beta_0+\epsilon_n)'\Big(I-\sum_{k=1}^m(\hat{\rho}_k-\rho_{k0})G_k\Big)'(I-P_{X_n})\Big(I-\sum_{l=1}^m(\hat{\rho}_l-\rho_{l0})G_l\Big)(X_n\beta_0+\epsilon_n) - n\sigma_0^2\Big].
\end{aligned}
\]
The detailed decomposition is as follows:
\[
\begin{aligned}
\sqrt{n}(\tilde{\sigma}_n^2-\sigma_0^2)
&= \frac{1}{\sqrt{n}}(\epsilon_n'\epsilon_n - n\sigma_0^2) - \frac{1}{\sqrt{n}}\epsilon_n'P_{X_n}\epsilon_n
- \frac{2}{\sqrt{d_n}}\sum_{k=1}^m\sqrt{\frac{n}{d_n}}(\hat{\rho}_k-\rho_{k0})\,\frac{d_n}{n}\epsilon_n'G_k'(I-P_{X_n})\epsilon_n\\
&\quad - \frac{2}{\sqrt{n}}\sum_{k=1}^m\sqrt{\frac{n}{d_n}}(\hat{\rho}_k-\rho_{k0})\,\sqrt{\frac{d_n}{n}}(G_kX_n\beta_0)'(I-P_{X_n})\epsilon_n\\
&\quad + \frac{1}{\sqrt{n}}\sum_{k,l=1}^m\sqrt{\frac{n}{d_n}}(\hat{\rho}_k-\rho_{k0})\sqrt{\frac{n}{d_n}}(\hat{\rho}_l-\rho_{l0})\,\frac{d_n}{n}\epsilon_n'G_k'(I-P_{X_n})G_l\epsilon_n\\
&\quad + \frac{2}{\sqrt{n}}\sum_{k,l=1}^m\sqrt{\frac{n}{d_n}}(\hat{\rho}_k-\rho_{k0})\sqrt{\frac{n}{d_n}}(\hat{\rho}_l-\rho_{l0})\,\frac{d_n}{n}(G_kX_n\beta_0)'(I-P_{X_n})G_l\epsilon_n\\
&\quad + \frac{1}{\sqrt{n}}\sum_{k,l=1}^m\sqrt{\frac{n}{d_n}}(\hat{\rho}_k-\rho_{k0})\sqrt{\frac{n}{d_n}}(\hat{\rho}_l-\rho_{l0})\,\frac{d_n}{n}(G_kX_n\beta_0)'(I-P_{X_n})G_lX_n\beta_0\\
&= \sum_{i=1}^n\frac{\epsilon_i^2-\sigma_0^2}{\sqrt{n}} + o_p(1)\ \xrightarrow{D}\ N(0,\ \mu_4-\sigma_0^4).
\end{aligned}
\]
Therefore, we have proven the desired results. □
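The variance of the leading term above can be checked by a quick Monte Carlo sketch (the centered-exponential error distribution is an illustrative non-normal choice, not part of the proof):

```python
import numpy as np

# Monte Carlo sketch: sum_i (eps_i^2 - sigma0^2)/sqrt(n) has variance
# mu4 - sigma0^4.  Centered exponential errors give sigma0^2 = 1, mu4 = 9.
rng = np.random.default_rng(3)
n, N = 400, 10_000
eps = rng.exponential(size=(N, n)) - 1.0          # mean 0, variance 1
T = (eps ** 2 - 1.0).sum(axis=1) / np.sqrt(n)     # one value per replication
print(T.var(), 9.0 - 1.0)   # sample variance should be near mu4 - sigma0^4 = 8
```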

Appendix B. Counterexample of Inconsistent QML Estimators

Let us consider the following example:
\[
W_{2n} = \frac{1}{n}\big(1_{2n}1_{2n}' - I_{2n}\big).
\]
In this case, each individual is influenced by all other individuals with exactly the same weight (absolute equality) and influences all of them equally; each individual has $2n-1$ neighbors. Here, $d_{2n} = O(n)$, which violates Assumption A3. Take
\[
W_1 = \begin{pmatrix} \frac{1}{n}(1_n1_n'-I_n) & 0\\[2pt] \frac{1}{n}1_n1_n' & 0\end{pmatrix}
\quad\text{and}\quad
W_2 = \begin{pmatrix} 0 & \frac{1}{n}1_n1_n'\\[2pt] 0 & \frac{1}{n}(1_n1_n'-I_n)\end{pmatrix},
\]
which satisfy the homogeneous classification condition (5). For such spatial weight matrices $W_1$ and $W_2$, we have
\[
S(\rho) = I_{2n} - \rho_1W_1 - \rho_2W_2
= \begin{pmatrix} \frac{n+\rho_1}{n}I_n - \frac{\rho_1}{n}1_n1_n' & -\frac{\rho_2}{n}1_n1_n'\\[4pt] -\frac{\rho_1}{n}1_n1_n' & \frac{n+\rho_2}{n}I_n - \frac{\rho_2}{n}1_n1_n'\end{pmatrix}
\triangleq \begin{pmatrix} B_{11} & B_{12}\\ B_{21} & B_{22}\end{pmatrix}.
\]
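As a numerical sketch (with arbitrary test values of $n$ and $\rho$), one can verify that $W_1+W_2$ reproduces the fully connected weight matrix and that $S(\rho)$ has the stated blocks:

```python
import numpy as np

# Numerical sketch (illustrative n and rho): W1 + W2 equals the fully
# connected weight matrix (1/n)(1 1' - I) of order 2n, and S(rho) has the
# block form stated above.
n = 6
rho1, rho2 = 0.3, 0.4
I, J, Z = np.eye(n), np.ones((n, n)), np.zeros((n, n))

W1 = np.block([[(J - I) / n, Z], [J / n, Z]])
W2 = np.block([[Z, J / n], [Z, (J - I) / n]])
W = (np.ones((2 * n, 2 * n)) - np.eye(2 * n)) / n
assert np.allclose(W1 + W2, W)

S = np.eye(2 * n) - rho1 * W1 - rho2 * W2
assert np.allclose(S[:n, :n], (n + rho1) / n * I - rho1 / n * J)  # B11
assert np.allclose(S[:n, n:], -rho2 / n * J)                      # B12
assert np.allclose(S[n:, :n], -rho1 / n * J)                      # B21
assert np.allclose(S[n:, n:], (n + rho2) / n * I - rho2 / n * J)  # B22
print("block structure verified")
```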
Thus,
\[
S(\rho)^{-1} = \begin{pmatrix} B_{11}^{-1}+B_{11}^{-1}B_{12}B_{22.1}^{-1}B_{21}B_{11}^{-1} & -B_{11}^{-1}B_{12}B_{22.1}^{-1}\\[2pt] -B_{22.1}^{-1}B_{21}B_{11}^{-1} & B_{22.1}^{-1}\end{pmatrix},
\]
where
\[
B_{11}^{-1} = \frac{n}{n+\rho_1}I_n + \frac{\rho_1 n}{(n+\rho_1)[n(1-\rho_1)+\rho_1]}1_n1_n'
\triangleq a_1I_n + a_21_n1_n'
\]
with $a_1\to 1$ and $na_2\to \rho_1/(1-\rho_1)$ (assuming $\rho_1\neq 1$),
\[
B_{22.1} = B_{22}-B_{21}B_{11}^{-1}B_{12} = \frac{n+\rho_2}{n}I_n - \frac{\rho_2(n+\rho_1)}{n[n(1-\rho_1)+\rho_1]}1_n1_n',
\]
and
\[
B_{22.1}^{-1} = \frac{n}{n+\rho_2}I_n + \frac{\rho_2(n^2+n\rho_1)}{(n+\rho_2)[n^2(1-\rho_1-\rho_2)+n(\rho_1+\rho_2-2\rho_1\rho_2)+\rho_1\rho_2]}1_n1_n'
\triangleq b_1I_n + b_21_n1_n'
\]
with $b_1\to 1$ and $nb_2\to \rho_2/(1-\rho_1-\rho_2)$ (assuming $\rho_1+\rho_2\neq 1$). It follows that
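The closed forms of $B_{11}^{-1}$ and $B_{22.1}^{-1}$ follow from the Sherman–Morrison formula, and can be checked against direct numerical inversion (test values of $n$ and $\rho$):

```python
import numpy as np

# Check the Sherman-Morrison closed forms of B11^{-1} and B22.1^{-1}
# against direct numerical inversion (illustrative n and rho).
n = 50
rho1, rho2 = 0.3, 0.4
I, J = np.eye(n), np.ones((n, n))

B11 = (n + rho1) / n * I - rho1 / n * J
a1 = n / (n + rho1)
a2 = rho1 * n / ((n + rho1) * (n * (1 - rho1) + rho1))
assert np.allclose(np.linalg.inv(B11), a1 * I + a2 * J)

B12 = -rho2 / n * J
B21 = -rho1 / n * J
B22 = (n + rho2) / n * I - rho2 / n * J
B221 = B22 - B21 @ np.linalg.inv(B11) @ B12     # Schur complement B_{22.1}
b1 = n / (n + rho2)
b2 = (rho2 * (n**2 + n * rho1)
      / ((n + rho2) * (n**2 * (1 - rho1 - rho2)
                       + n * (rho1 + rho2 - 2 * rho1 * rho2) + rho1 * rho2)))
assert np.allclose(np.linalg.inv(B221), b1 * I + b2 * J)

# the limiting values used in the text (close for large n)
print(n * a2, rho1 / (1 - rho1))
print(n * b2, rho2 / (1 - rho1 - rho2))
```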
\[
G_1 = W_1S(\rho_0)^{-1} = \begin{pmatrix}
\frac{1}{n}(1_n1_n'-I_n)\big[B_{11}^{-1}+B_{11}^{-1}B_{12}B_{22.1}^{-1}B_{21}B_{11}^{-1}\big] & -\frac{1}{n}(1_n1_n'-I_n)B_{11}^{-1}B_{12}B_{22.1}^{-1}\\[4pt]
\frac{1}{n}1_n1_n'\big[B_{11}^{-1}+B_{11}^{-1}B_{12}B_{22.1}^{-1}B_{21}B_{11}^{-1}\big] & -\frac{1}{n}1_n1_n'B_{11}^{-1}B_{12}B_{22.1}^{-1}
\end{pmatrix}
\]
and
\[
G_2 = W_2S(\rho_0)^{-1} = \begin{pmatrix}
-\frac{1}{n}1_n1_n'B_{22.1}^{-1}B_{21}B_{11}^{-1} & \frac{1}{n}1_n1_n'B_{22.1}^{-1}\\[4pt]
-\frac{1}{n}(1_n1_n'-I_n)B_{22.1}^{-1}B_{21}B_{11}^{-1} & \frac{1}{n}(1_n1_n'-I_n)B_{22.1}^{-1}
\end{pmatrix}.
\]
Note that we have not made a distinction between $\rho_0$ and $\rho$ in the $B$'s in our discussion.

Appendix B.1. The Case of Violating Condition 1

As $X_{2n}$ includes two constant columns $X_0 = \mathrm{diag}(1_n, 1_n)$, i.e., $X_{2n} = (X_0, X_1)$, we have
\[
P_{X_{2n}} = P_{X_0} + P_{M_{X_0}X_1},
\]
where $M_{X_0} = I-P_{X_0} = \mathrm{diag}(I-P_1,\ I-P_1)$ with $P_1 = \frac{1}{n}1_n1_n'$. Note that
\[
I_{2n}-P_{X_{2n}} = I - P_{X_0} - P_{M_{X_0}X_1} = \big[I - M_{X_0}X_1(X_1'M_{X_0}X_1)^{-1}X_1'\big]M_{X_0}.
\]
Thus, with $G_k = W_kS(\rho_0)^{-1}$, $(I-P_1)B_{11}^{-1}B_{12}B_{22.1}^{-1} = 0$ and $(I-P_1)B_{22.1}^{-1}B_{21}B_{11}^{-1} = 0$, we obtain
\[
(I-P_{X_{2n}})G_1X_{2n}\beta_0 = \big[I - M_{X_0}X_1(X_1'M_{X_0}X_1)^{-1}X_1'\big]
\begin{pmatrix} -\frac{a_1}{n}(I-P_1) & 0\\ 0 & 0\end{pmatrix}X_{2n}\beta_0 = 0
\]
and
\[
(I-P_{X_{2n}})G_2X_{2n}\beta_0 = \big[I - M_{X_0}X_1(X_1'M_{X_0}X_1)^{-1}X_1'\big]
\begin{pmatrix} 0 & 0\\ 0 & -\frac{b_1}{n}(I-P_1)\end{pmatrix}X_{2n}\beta_0 = 0,
\]
implying that Condition 1 is violated: $G_kX_{2n}\beta_0$ belongs to the column space of $X_{2n}$, i.e., $G_kX_{2n}\beta_0$ is multicollinear with $X_{2n}$.
From (A11) and (A12), we find
\[
G_2 = \begin{pmatrix}
\frac{\rho_1}{n}(a_1+a_2n)(b_1+b_2n)1_n1_n' & \frac{1}{n}(b_1+b_2n)1_n1_n'\\[4pt]
\frac{\rho_1(n-1)}{n^2}(a_1+a_2n)(b_1+b_2n)1_n1_n' & \frac{1}{n}\big[(b_1+(n-1)b_2)1_n1_n'-b_1I_n\big]
\end{pmatrix}
\]
with
\[
\mathrm{tr}(G_2) = \rho_1(a_1+a_2n)(b_1+b_2n) + (n-1)b_2\ \to\ \frac{\rho_1+\rho_2}{1-\rho_1-\rho_2}.
\]
Furthermore, we obtain
\[
G_2G_2 = \begin{pmatrix}
\Big[\rho_1^2a^2b^2+\rho_1\frac{n-1}{n}ab^2\Big]\frac{1_n1_n'}{n} & \Big[\rho_1ab^2+\frac{n-1}{n}b^2\Big]\frac{1_n1_n'}{n}\\[6pt]
\frac{n-1}{n}\Big[\rho_1^2a^2b^2+\rho_1\frac{n-1}{n}ab^2\Big]\frac{1_n1_n'}{n} & \Big[\frac{n-1}{n}\rho_1ab^2+(b-b_2)\Big(b-b_2-\frac{2b_1}{n}\Big)\Big]\frac{1_n1_n'}{n}+\frac{b_1^2}{n^2}I_n
\end{pmatrix},
\]
where $a = a_1+na_2 \to 1/(1-\rho_1)$ and $b = b_1+nb_2 \to (1-\rho_1)/(1-\rho_1-\rho_2)$, with
\[
\mathrm{tr}(G_2G_2) = \rho_1^2a^2b^2 + 2\rho_1\frac{n-1}{n}ab^2 + (b-b_2)\Big(b-b_2-\frac{2b_1}{n}\Big) + \frac{b_1^2}{n}
\ \to\ \frac{1}{(1-\rho_1-\rho_2)^2},
\]
implying that Condition 4 is violated: singularity of the information matrix occurs.
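The two trace limits above can be confirmed numerically (the values of $n$, $\rho_1$ and $\rho_2$ below are illustrative):

```python
import numpy as np

# Numerical check of tr(G2) -> (rho1+rho2)/(1-rho1-rho2) and
# tr(G2 G2) -> 1/(1-rho1-rho2)^2 for the counterexample weights.
n = 400
rho1, rho2 = 0.3, 0.4
I, J, Z = np.eye(n), np.ones((n, n)), np.zeros((n, n))
W1 = np.block([[(J - I) / n, Z], [J / n, Z]])
W2 = np.block([[Z, J / n], [Z, (J - I) / n]])
S = np.eye(2 * n) - rho1 * W1 - rho2 * W2
G2 = W2 @ np.linalg.inv(S)

print(np.trace(G2), (rho1 + rho2) / (1 - rho1 - rho2))   # both near 7/3
print(np.trace(G2 @ G2), 1 / (1 - rho1 - rho2) ** 2)     # both near 100/9
```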
For simplicity, consider σ 0 2 = 1 as known. The log-likelihood function (10) is
\[
\log L_{2n}(\rho,\beta) = -n\log(2\pi) + \log|S(\rho)| - \frac{1}{2}\epsilon_{2n}(\rho,\beta)'\epsilon_{2n}(\rho,\beta).
\]
Given $\rho$, the QML estimator $\hat{\beta}(\rho)$ of $\beta$ is
\[
\hat{\beta}(\rho) = (X_{2n}'X_{2n})^{-1}X_{2n}'S(\rho)Y_{2n},
\]
and the concentrated log-likelihood function of $\rho$ is
\[
\log L_{2n}(\rho) = -n\log(2\pi) + \log|S(\rho)| - \frac{1}{2}Y_{2n}'S(\rho)'(I-P_{X_{2n}})S(\rho)Y_{2n}.
\]
The first-order derivative of the concentrated log-likelihood function (A17) with respect to $\rho_k$ is
\[
\begin{aligned}
\frac{\partial\log L_{2n}(\rho)}{\partial\rho_k}
&= -\mathrm{tr}[S(\rho)^{-1}W_k] + Y_{2n}'S(\rho)'(I-P_{X_{2n}})W_kY_{2n}\\
&= -\mathrm{tr}[S(\rho)^{-1}W_k] + (X_{2n}\beta_0+\epsilon_{2n})'\big[I+(\rho_{01}-\rho_1)G_1+(\rho_{02}-\rho_2)G_2\big]'(I-P_{X_{2n}})G_k(X_{2n}\beta_0+\epsilon_{2n})\\
&= -\mathrm{tr}[S(\rho)^{-1}W_k] + \epsilon_{2n}'\big[I+(\rho_{01}-\rho_1)G_1+(\rho_{02}-\rho_2)G_2\big]'(I-P_{X_{2n}})G_k\epsilon_{2n}.
\end{aligned}
\]
Thus,
\[
\frac{\partial\log L_{2n}(\rho_0)}{\partial\rho_k} = -\mathrm{tr}(G_k) + \epsilon_{2n}'(I-P_{X_{2n}})G_k\epsilon_{2n}.
\]
The second-order derivative of (A17) is
\[
\frac{\partial^2\log L_{2n}(\rho)}{\partial\rho_k^2}
= -\mathrm{tr}[S(\rho)^{-1}W_kS(\rho)^{-1}W_k] - (X_{2n}\beta_0+\epsilon_{2n})'G_k'(I-P_{X_{2n}})G_k(X_{2n}\beta_0+\epsilon_{2n})
= -\mathrm{tr}[S(\rho)^{-1}W_kS(\rho)^{-1}W_k] - \epsilon_{2n}'G_k'(I-P_{X_{2n}})G_k\epsilon_{2n}.
\]
It follows from the mean value theorem that
\[
\hat{\rho}_k - \rho_{0k} = -\left[\frac{\partial^2\log L_{2n}(\bar{\rho})}{\partial\rho_k^2}\right]^{-1}\frac{\partial\log L_{2n}(\rho_0)}{\partial\rho_k},
\]
where $\bar{\rho}_k$ lies between $\hat{\rho}_k$ and $\rho_{0k}$.
Suppose that $\hat{\rho}_k$ is consistent, i.e., $\hat{\rho}_k$ converges in probability to the true value $\rho_{0k}$: $\hat{\rho}_k \xrightarrow{P} \rho_{0k}$. It follows from (A13), (A14), (A16), (A18) and (A19) that
\[
\begin{aligned}
\frac{\partial\log L_{2n}(\rho_0)}{\partial\rho_2}
&= -\mathrm{tr}(G_2) + ({}_1\epsilon_n',\ {}_2\epsilon_n')(I-P_{X_{2n}})
\begin{pmatrix} 0 & 0\\ 0 & -\frac{b_1}{n}I_n\end{pmatrix}
\begin{pmatrix} {}_1\epsilon_n\\ {}_2\epsilon_n\end{pmatrix}\\
&= -\mathrm{tr}(G_2) - \frac{b_1}{n}\,{}_2\epsilon_n'{}_2\epsilon_n
+ \frac{b_1}{n}\,\frac{\epsilon_{2n}'X_{2n}}{\sqrt{2n}}\left(\frac{1}{2n}X_{2n}'X_{2n}\right)^{-1}\frac{X_{2n}'(0',\ {}_2\epsilon_n')'}{\sqrt{2n}}\\
&\ \xrightarrow{P}\ -\frac{\rho_{01}+\rho_{02}}{1-\rho_{01}-\rho_{02}} - 1 = -\frac{1}{1-\rho_{01}-\rho_{02}},
\end{aligned}
\]
where $\epsilon_{2n} = ({}_1\epsilon_n',\ {}_2\epsilon_n')'$ and $X_{2n} = ({}_1X_n',\ {}_2X_n')'$, and Assumption A7 is used to show that the third term is $o_p(1)$, and
\[
\begin{aligned}
\frac{\partial^2\log L_{2n}(\rho)}{\partial\rho_2^2}
&= -\mathrm{tr}[S(\rho)^{-1}W_2S(\rho)^{-1}W_2] - \epsilon_{2n}'G_2'(I-P_{X_{2n}})G_2\epsilon_{2n}\\
&= -\mathrm{tr}[S(\rho)^{-1}W_2S(\rho)^{-1}W_2]
- ({}_1\epsilon_n',\ {}_2\epsilon_n')\begin{pmatrix}0&0\\0&-\frac{b_1}{n}I_n\end{pmatrix}(I-P_{X_{2n}})\begin{pmatrix}0&0\\0&-\frac{b_1}{n}I_n\end{pmatrix}\begin{pmatrix}{}_1\epsilon_n\\ {}_2\epsilon_n\end{pmatrix}\\
&= -\mathrm{tr}[S(\rho)^{-1}W_2S(\rho)^{-1}W_2] - \frac{b_1^2}{n^2}(0',\ {}_2\epsilon_n')(I-P_{X_{2n}})\begin{pmatrix}0\\ {}_2\epsilon_n\end{pmatrix}\\
&= -\mathrm{tr}[S(\rho)^{-1}W_2S(\rho)^{-1}W_2] - \frac{b_1^2}{n^2}\,{}_2\epsilon_n'{}_2\epsilon_n
+ \frac{b_1^2}{n^2}\,\frac{{}_2\epsilon_n'{}_2X_n}{\sqrt{2n}}\left(\frac{1}{2n}X_{2n}'X_{2n}\right)^{-1}\frac{{}_2X_n'{}_2\epsilon_n}{\sqrt{2n}}\\
&\ \xrightarrow{P}\ -\frac{1-\rho_1+\rho_1^2}{(1-\rho_1-\rho_2)^2}.
\end{aligned}
\]
It follows that
\[
\hat{\rho}_2 - \rho_{02}\ \xrightarrow{P}\ \frac{\rho_1+\rho_2-1}{1-\rho_1+\rho_1^2} \neq 0.
\]
There is a contradiction, implying that ρ ^ 2 is not consistent.
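The probability limit of the score in (A18) can be illustrated by simulation. The sketch below assumes, for brevity, that $X_{2n}$ consists only of the two block-constant columns $X_0$ (an assumption of this illustration, not of the paper):

```python
import numpy as np

# Simulation sketch of (A18): at rho_0 the score for rho_2 tends in
# probability to -1/(1 - rho1 - rho2).  X_{2n} is taken to be only the two
# block-constant columns X0 = diag(1_n, 1_n) to keep the sketch short.
rng = np.random.default_rng(1)
n = 1000
rho1, rho2 = 0.3, 0.4
I, J, Z = np.eye(n), np.ones((n, n)), np.zeros((n, n))
W1 = np.block([[(J - I) / n, Z], [J / n, Z]])
W2 = np.block([[Z, J / n], [Z, (J - I) / n]])
S = np.eye(2 * n) - rho1 * W1 - rho2 * W2
G2 = W2 @ np.linalg.inv(S)

X = np.zeros((2 * n, 2)); X[:n, 0] = 1.0; X[n:, 1] = 1.0   # X0 = diag(1_n, 1_n)
P = X @ np.linalg.inv(X.T @ X) @ X.T
eps = rng.standard_normal(2 * n)

score = -np.trace(G2) + eps @ (np.eye(2 * n) - P) @ G2 @ eps
print(score, -1 / (1 - rho1 - rho2))   # both near -10/3
```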

Appendix B.2. The Pure Heterogeneous SAR Process

For the pure heterogeneous SAR process (9), $\beta_0 = 0$, implying that Condition 1 is violated. Note that Condition 4 has also been violated; see Appendix B.1. The first-order derivative at $\rho_0$ reduces to
\[
\frac{\partial\log L_{2n}(\rho_0)}{\partial\rho_2} = -\mathrm{tr}(G_2) + \epsilon_{2n}'G_2\epsilon_{2n},
\]
and it follows from (A13) that
\[
\begin{aligned}
\epsilon_{2n}'G_2\epsilon_{2n}
&= \rho_1(a_1+na_2)(b_1+nb_2)\left(\sum_{i=1}^n\frac{\epsilon_i}{\sqrt{n}}\right)^2
+ (b_1+nb_2)\sum_{i=1}^n\frac{\epsilon_i}{\sqrt{n}}\sum_{i=1}^n\frac{\epsilon_{n+i}}{\sqrt{n}}\\
&\quad + \rho_1\frac{n-1}{n}(a_1+na_2)(b_1+nb_2)\sum_{i=1}^n\frac{\epsilon_{n+i}}{\sqrt{n}}\sum_{i=1}^n\frac{\epsilon_i}{\sqrt{n}}
+ \big(b_1+(n-1)b_2\big)\left(\sum_{i=1}^n\frac{\epsilon_{n+i}}{\sqrt{n}}\right)^2
- \frac{b_1}{n}\sum_{i=1}^n\epsilon_{n+i}^2\\
&\ \xrightarrow{D}\ \frac{\rho_{01}}{1-\rho_{01}-\rho_{02}}\xi_1^2 + \frac{1}{1-\rho_{01}-\rho_{02}}\xi_1\xi_2 + \frac{1-\rho_{01}}{1-\rho_{01}-\rho_{02}}\xi_2^2 - 1,
\end{aligned}
\]
where ξ 1 and ξ 2 are two independent standard normal variables. Thus, from (A14), we find
\[
\frac{\partial\log L_{2n}(\rho_0)}{\partial\rho_2}\ \xrightarrow{D}\
\frac{\rho_{01}}{1-\rho_{01}-\rho_{02}}\xi_1^2 + \frac{1}{1-\rho_{01}-\rho_{02}}(\xi_1\xi_2-1) + \frac{1-\rho_{01}}{1-\rho_{01}-\rho_{02}}\xi_2^2,
\]
where D denotes convergence in distribution. The second-order derivative reduces to
\[
\frac{\partial^2\log L_{2n}(\rho)}{\partial\rho_2^2} = -\mathrm{tr}[S(\rho)^{-1}W_2S(\rho)^{-1}W_2] - \epsilon_{2n}'G_2G_2\epsilon_{2n},
\]
and it follows from (A15) that
\[
\begin{aligned}
\epsilon_{2n}'G_2G_2\epsilon_{2n}
&= \Big[\rho_1^2a^2b^2+\rho_1\frac{n-1}{n}ab^2\Big]\left(\sum_{i=1}^n\frac{\epsilon_i}{\sqrt{n}}\right)^2
+ \Big[\rho_1ab^2+\frac{n-1}{n}b^2\Big]\sum_{i=1}^n\frac{\epsilon_i}{\sqrt{n}}\sum_{i=1}^n\frac{\epsilon_{n+i}}{\sqrt{n}}\\
&\quad + \frac{n-1}{n}\Big[\rho_1^2a^2b^2+\rho_1\frac{n-1}{n}ab^2\Big]\sum_{i=1}^n\frac{\epsilon_{n+i}}{\sqrt{n}}\sum_{i=1}^n\frac{\epsilon_i}{\sqrt{n}}\\
&\quad + \Big[\frac{n-1}{n}\rho_1ab^2+(b-b_2)\Big(b-b_2-\frac{2b_1}{n}\Big)\Big]\left(\sum_{i=1}^n\frac{\epsilon_{n+i}}{\sqrt{n}}\right)^2
+ \frac{b_1^2}{n}\sum_{i=1}^n\frac{\epsilon_{n+i}^2}{n}\\
&\ \xrightarrow{D}\ \frac{\rho_{01}}{(1-\rho_{01}-\rho_{02})^2}\xi_1^2 + \frac{1}{(1-\rho_{01}-\rho_{02})^2}\xi_1\xi_2 + \frac{1-\rho_{01}}{(1-\rho_{01}-\rho_{02})^2}\xi_2^2.
\end{aligned}
\]
Thus, with (A16), we obtain
\[
\frac{\partial^2\log L_{2n}(\rho)}{\partial\rho_2^2}\ \xrightarrow{D}\
-\frac{\rho_{01}}{(1-\rho_{01}-\rho_{02})^2}\xi_1^2 - \frac{1}{(1-\rho_{01}-\rho_{02})^2}(\xi_1\xi_2+1) - \frac{1-\rho_{01}}{(1-\rho_{01}-\rho_{02})^2}\xi_2^2.
\]
It follows from (A20) that
\[
\hat{\rho}_2-\rho_{02} = -\left[\frac{\partial^2\log L_{2n}(\bar{\rho})}{\partial\rho_2^2}\right]^{-1}\frac{\partial\log L_{2n}(\rho_0)}{\partial\rho_2}
\ \xrightarrow{D}\ (1-\rho_{01}-\rho_{02})\,\frac{\rho_{01}\xi_1^2+\xi_1\xi_2-1+(1-\rho_{01})\xi_2^2}{\rho_{01}\xi_1^2+\xi_1\xi_2+1+(1-\rho_{01})\xi_2^2} \neq 0,
\]
which contradicts consistency, since $\hat{\rho}_2-\rho_{02}$ does not have a distribution degenerate at the origin. This contradiction tells us that $\hat{\rho}_2$ cannot be a consistent estimator of $\rho_{02}$.
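The distributional limit of $\epsilon_{2n}'G_2\epsilon_{2n}$ can be illustrated on a single large-$n$ draw, comparing the exact quadratic form with the $\xi$-expression of its limit (illustrative $n$ and $\rho$):

```python
import numpy as np

# Sketch for the pure heterogeneous SAR case: on one draw, eps' G2 eps is
# close to the xi-quadratic form of its limit, with
# xi1 = sum_{i<=n} eps_i / sqrt(n) and xi2 = sum_{i<=n} eps_{n+i} / sqrt(n).
rng = np.random.default_rng(2)
n = 1000
rho1, rho2 = 0.3, 0.4
I, J, Z = np.eye(n), np.ones((n, n)), np.zeros((n, n))
W1 = np.block([[(J - I) / n, Z], [J / n, Z]])
W2 = np.block([[Z, J / n], [Z, (J - I) / n]])
S = np.eye(2 * n) - rho1 * W1 - rho2 * W2
G2 = W2 @ np.linalg.inv(S)

eps = rng.standard_normal(2 * n)
xi1 = eps[:n].sum() / np.sqrt(n)
xi2 = eps[n:].sum() / np.sqrt(n)
c = 1 - rho1 - rho2
limit_form = rho1 / c * xi1**2 + xi1 * xi2 / c + (1 - rho1) / c * xi2**2 - 1
print(eps @ G2 @ eps, limit_form)   # close for large n
```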

References

  1. Bell, K.P.; Bockstael, N.E. Applying the generalized-moments estimation approach to spatial problems involving microlevel data. Rev. Econ. Stat. 2000, 82, 72–82.
  2. Banerjee, S.; Gelfand, A.E.; Knight, J.R.; Sirmans, C.F. Spatial modeling of house prices using normalized distance-weighted sums of stationary processes. J. Bus. Econ. Stat. 2004, 22, 206–213.
  3. Cliff, A.D.; Ord, J.K. Spatial Autocorrelation; Pion Ltd.: London, UK, 1973.
  4. Anselin, L. Spatial Econometrics: Methods and Models; Kluwer: Dordrecht, The Netherlands, 1988.
  5. Cressie, N. Statistics for Spatial Data; John Wiley: New York, NY, USA, 1993.
  6. Anselin, L.; Bera, A.K. Spatial dependence in linear regression models with an introduction to spatial econometrics. In Handbook of Applied Economics Statistics; Ullah, A., Giles, D.E.A., Eds.; Marcel Dekker: New York, NY, USA, 1998.
  7. Elhorst, J.P. Spatial Econometrics: From Cross Sectional Data to Spatial Panels; Springer: Heidelberg, Germany; New York, NY, USA; Dordrecht, The Netherlands; London, UK, 2014.
  8. Case, A.C. Spatial patterns in household demand. Econometrica 1991, 59, 953–965.
  9. Case, A.C.; Rosen, H.S.; Hines, J.R. Budget spillovers and fiscal policy interdependence: Evidence from the states. J. Public Econ. 1993, 52, 285–307.
  10. Besley, T.; Case, A. Incumbent behavior: Vote-seeking, tax-setting, and yardstick competition. Am. Econ. Rev. 1995, 85, 25–45.
  11. Brueckner, J.K. Testing for strategic interaction among local governments: The case of growth controls. J. Urban Econ. 1998, 44, 438–467.
  12. Bertrand, M.; Luttmer, E.F.R.; Mullainathan, S. Network effects and welfare cultures. Q. J. Econ. 2000, 115, 1019–1055.
  13. Topa, G. Social interactions, local spillovers and unemployment. Rev. Econ. Stud. 2001, 68, 261–295.
  14. Coval, J.D.; Moskowitz, T.J. The geography of investment: Informed trading and asset prices. J. Political Econ. 2001, 109, 811–841.
  15. Druska, V.; Horrace, W.C. Generalized moments estimation for spatial panel data: Indonesian rice farming. Am. J. Agric. Econ. 2004, 86, 185–198.
  16. Frazier, C.; Kockelman, K.M. Spatial econometric models for panel data: Incorporating spatial and temporal data. Transp. Res. Rec. J. Transp. Res. Board 2005, 1902, 80–90.
  17. Baltagi, B.; Li, D. Prediction in the panel data model with spatial correlation: The case of liquor. Spat. Econ. Anal. 2006, 1, 175–185.
  18. Pirinsky, C.; Wang, Q. Does corporate headquarters location matter for stock returns? J. Financ. 2006, 61, 1991–2015.
  19. Bekaert, G.; Hodrick, R.J.; Zhang, X.Y. International stock return comovements. J. Financ. 2009, 64, 2591–2626.
  20. Robinson, P.M.; Rossi, F. Improved Lagrange multiplier tests in spatial autoregressions. Econ. J. 2014, 17, 139–154.
  21. Liu, X.; Chen, J.; Cheng, S. A penalized quasi-maximum likelihood method for variable selection in the spatial autoregressive model. Spat. Stat. 2018, 25, 86–104.
  22. Ord, K. Estimation methods for models of spatial interaction. J. Am. Stat. Assoc. 1975, 70, 120–126.
  23. Smirnov, O.; Anselin, L. Fast maximum likelihood estimation of very large spatial autoregressive models: A characteristic polynomial approach. Comput. Stat. Data Anal. 2001, 35, 301–319.
  24. Robinson, P.M.; Rossi, F. Refinements in maximum likelihood inference on spatial autocorrelation in panel data. J. Econom. 2015, 189, 447–456.
  25. Kelejian, H.H.; Prucha, I.R. A generalized moments estimator for the autoregressive parameter in a spatial model. Int. Econ. Rev. 1999, 40, 509–533.
  26. Lee, L.F.; Liu, X. Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances. Econom. Theory 2010, 26, 187–230.
  27. Lee, L.F. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 2004, 72, 1899–1925.
  28. Yu, J.; De Jong, R.; Lee, L.F. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. J. Econom. 2008, 146, 118–134.
  29. Song, Y.; Liang, X.; Zhu, Y.; Lin, L. Robust variable selection with exponential squared loss for the spatial autoregressive model. Comput. Stat. Data Anal. 2021, 155, 107094.
  30. Ju, Y.; Yang, Y.; Hu, M.; Dai, L.; Wu, L. Bayesian influence analysis of the skew-normal spatial autoregression models. Mathematics 2022, 10, 1306.
  31. Ahrens, A.; Bhattacharjee, A. Two-step Lasso estimation of the spatial weights matrix. Econometrics 2015, 3, 128–155.
  32. Lam, C.; Souza, P.C.L. Estimation and selection of spatial weight matrix in a spatial lag model. J. Bus. Econ. Stat. 2020, 38, 693–710.
  33. Clark, A.E.; Loheac, Y. "It was not me, it was them!" Social influence in risky behavior by adolescents. J. Health Econ. 2007, 26, 763–784.
  34. Mas, A.; Moretti, E. Peers at work. Am. Econ. Rev. 2009, 99, 112–145.
  35. Banerjee, A.; Chandrasekhar, A.; Duflo, E.; Jackson, M. The diffusion of microfinance. Science 2013, 341, 1236498.
  36. Dou, B.; Parrella, M.; Yao, Q. Generalized Yule–Walker estimation for spatio-temporal models with unknown diagonal coefficients. J. Econom. 2016, 194, 369–382.
  37. Peng, S. Heterogeneous endogenous effects in networks. arXiv 2019, arXiv:1908.00663.
  38. Magnus, J.R.; Neudecker, H. Matrix Differential Calculus with Applications in Statistics and Econometrics; John Wiley & Sons: Hoboken, NJ, USA, 1991.
  39. Horn, R.; Johnson, C. Matrix Analysis; Cambridge University Press: New York, NY, USA, 1985.
  40. White, H. Estimation, Inference and Specification Analysis; Cambridge University Press: New York, NY, USA, 1994.
  41. Kelejian, H.H.; Prucha, I.R. On the asymptotic distribution of the Moran I test statistic with applications. J. Econom. 2001, 104, 219–257.
Table 1. Empirical ML estimates of θ and CP in Model 1 with true value θ_0 and n = n_1 n_2. The three column groups report results for n_2 = 20, 40 and 60, respectively.

| n_1 | θ | θ_0 | θ̂ | Bias | RMSE | CP | θ̂ | Bias | RMSE | CP | θ̂ | Bias | RMSE | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | ρ1 | 0.20 | 0.1739 | −0.0261 | 0.0998 | 91% | 0.1905 | −0.0095 | 0.0646 | 93% | 0.2055 | 0.0055 | 0.0378 | 93% |
| | ρ2 | 0.35 | 0.3363 | −0.0137 | 0.0547 | 92% | 0.3556 | 0.0056 | 0.0438 | 91% | 0.3383 | −0.0117 | 0.0312 | 92% |
| | ρ3 | 0.50 | 0.5239 | 0.0239 | 0.0579 | 92% | 0.4812 | −0.0188 | 0.0441 | 94% | 0.4882 | −0.0118 | 0.0306 | 95% |
| | ρ4 | 0.65 | 0.6541 | 0.0041 | 0.0352 | 93% | 0.6541 | 0.0041 | 0.0316 | 92% | 0.6425 | −0.0075 | 0.0254 | 94% |
| | ρ5 | 0.80 | 0.7935 | −0.0065 | 0.0240 | 93% | 0.7979 | −0.0021 | 0.0263 | 93% | 0.7989 | −0.0011 | 0.0105 | 94% |
| | β1 | 1 | 1.0152 | 0.0152 | 0.0413 | 96% | 1.0239 | 0.0239 | 0.0451 | 95% | 1.0081 | 0.0081 | 0.0369 | 93% |
| | β2 | 1 | 1.0316 | 0.0316 | 0.0436 | 94% | 0.9956 | −0.0044 | 0.0372 | 95% | 1.0129 | 0.0129 | 0.0358 | 96% |
| | σ² | 1 | 1.0112 | 0.0112 | 0.0974 | 93% | 0.9852 | −0.0148 | 0.0637 | 94% | 1.0040 | 0.0040 | 0.0518 | 94% |
| 50 | ρ1 | 0.20 | 0.1858 | −0.0142 | 0.0473 | 92% | 0.1982 | −0.0018 | 0.0339 | 92% | 0.1904 | −0.0096 | 0.0289 | 93% |
| | ρ2 | 0.35 | 0.3334 | −0.0166 | 0.0373 | 93% | 0.3391 | −0.0109 | 0.0320 | 95% | 0.3471 | −0.0029 | 0.0248 | 95% |
| | ρ3 | 0.50 | 0.4946 | −0.0054 | 0.0252 | 92% | 0.5001 | 0.0001 | 0.0214 | 97% | 0.4964 | −0.0036 | 0.0125 | 93% |
| | ρ4 | 0.65 | 0.6756 | 0.0256 | 0.0243 | 94% | 0.6416 | −0.0084 | 0.0176 | 95% | 0.6468 | −0.0032 | 0.0147 | 97% |
| | ρ5 | 0.80 | 0.7891 | −0.0109 | 0.0113 | 95% | 0.7999 | −0.0001 | 0.0097 | 92% | 0.7995 | −0.0005 | 0.0078 | 95% |
| | β1 | 1 | 1.0020 | 0.0020 | 0.0391 | 94% | 0.9986 | −0.0014 | 0.0358 | 93% | 1.0026 | 0.0026 | 0.0324 | 96% |
| | β2 | 1 | 1.0296 | 0.0296 | 0.0341 | 95% | 1.0319 | 0.0319 | 0.0321 | 95% | 1.0224 | 0.0224 | 0.0268 | 91% |
| | σ² | 1 | 0.9898 | −0.0102 | 0.0579 | 91% | 1.0061 | 0.0061 | 0.0454 | 93% | 1.0070 | 0.0070 | 0.0333 | 92% |
| 80 | ρ1 | 0.20 | 0.2056 | 0.0056 | 0.0353 | 92% | 0.2011 | 0.0011 | 0.0292 | 92% | 0.1998 | −0.0002 | 0.0258 | 93% |
| | ρ2 | 0.35 | 0.3371 | −0.0129 | 0.0206 | 94% | 0.3512 | 0.0012 | 0.0264 | 91% | 0.3489 | −0.0011 | 0.0156 | 95% |
| | ρ3 | 0.50 | 0.4975 | −0.0025 | 0.0241 | 91% | 0.5003 | 0.0003 | 0.0187 | 93% | 0.4996 | −0.0004 | 0.0139 | 94% |
| | ρ4 | 0.65 | 0.6547 | 0.0047 | 0.0116 | 92% | 0.6504 | 0.0004 | 0.0153 | 91% | 0.6497 | −0.0003 | 0.0094 | 92% |
| | ρ5 | 0.80 | 0.7956 | −0.0044 | 0.0082 | 94% | 0.7995 | −0.0005 | 0.0088 | 95% | 0.7998 | −0.0002 | 0.0064 | 91% |
| | β1 | 1 | 1.0033 | 0.0033 | 0.0274 | 97% | 0.9966 | −0.0034 | 0.0239 | 97% | 1.0021 | 0.0021 | 0.0211 | 93% |
| | β2 | 1 | 0.9876 | −0.0124 | 0.0216 | 96% | 0.9867 | −0.0133 | 0.0217 | 95% | 1.0109 | 0.0109 | 0.0194 | 96% |
| | σ² | 1 | 0.9942 | −0.0058 | 0.0327 | 94% | 0.9951 | −0.0049 | 0.0276 | 93% | 0.9979 | −0.0021 | 0.0247 | 95% |
Table 2. Empirical ML estimates of θ and CP in Model 2 with true value θ_0 and n = n_1 n_2. The three column groups report results for n_2 = 20, 40 and 60, respectively.

| n_1 | θ | θ_0 | θ̂ | Bias | RMSE | CP | θ̂ | Bias | RMSE | CP | θ̂ | Bias | RMSE | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | ρ1 | 0.20 | 0.2259 | 0.0259 | 0.0971 | 92% | 0.2089 | 0.0089 | 0.0477 | 93% | 0.2042 | 0.0042 | 0.0328 | 95% |
| | ρ2 | 0.35 | 0.3323 | −0.0177 | 0.0587 | 94% | 0.3495 | −0.0005 | 0.0283 | 95% | 0.3599 | 0.0099 | 0.0282 | 94% |
| | ρ3 | 0.50 | 0.4829 | −0.0171 | 0.0547 | 93% | 0.4926 | −0.0074 | 0.0256 | 91% | 0.5026 | 0.0026 | 0.0246 | 92% |
| | ρ4 | 0.65 | 0.6459 | −0.0041 | 0.0344 | 91% | 0.6400 | −0.0100 | 0.0246 | 94% | 0.6443 | −0.0057 | 0.0194 | 91% |
| | ρ5 | 0.80 | 0.7988 | −0.0012 | 0.0140 | 95% | 0.7985 | −0.0015 | 0.0102 | 92% | 0.7973 | −0.0027 | 0.0095 | 92% |
| | β1 | 1 | 0.9944 | −0.0056 | 0.0430 | 96% | 0.9984 | −0.0016 | 0.0408 | 93% | 1.0013 | 0.0013 | 0.0245 | 93% |
| | β2 | 1 | 0.9983 | −0.0017 | 0.0363 | 94% | 1.0022 | 0.0022 | 0.0314 | 91% | 1.0009 | 0.0009 | 0.0215 | 96% |
| | σ² | 1 | 0.9771 | −0.0229 | 0.0925 | 93% | 0.9818 | −0.0182 | 0.0640 | 94% | 1.0015 | 0.0015 | 0.0544 | 91% |
| 50 | ρ1 | 0.20 | 0.2078 | 0.0078 | 0.0433 | 93% | 0.2045 | 0.0045 | 0.0327 | 97% | 0.1939 | −0.0061 | 0.0266 | 93% |
| | ρ2 | 0.35 | 0.3557 | 0.0057 | 0.0316 | 92% | 0.3403 | −0.0097 | 0.0320 | 94% | 0.3467 | −0.0033 | 0.0228 | 95% |
| | ρ3 | 0.50 | 0.5037 | 0.0037 | 0.0252 | 94% | 0.4981 | −0.0019 | 0.0154 | 93% | 0.4944 | −0.0056 | 0.0145 | 96% |
| | ρ4 | 0.65 | 0.6445 | −0.0055 | 0.0171 | 90% | 0.6530 | 0.0030 | 0.0076 | 94% | 0.6505 | 0.0005 | 0.0147 | 94% |
| | ρ5 | 0.80 | 0.7997 | −0.0003 | 0.0096 | 92% | 0.8019 | 0.0019 | 0.0107 | 95% | 0.7996 | −0.0004 | 0.0090 | 95% |
| | β1 | 1 | 0.9958 | −0.0042 | 0.0306 | 94% | 0.9986 | −0.0014 | 0.0248 | 91% | 0.9988 | −0.0012 | 0.0171 | 96% |
| | β2 | 1 | 0.9997 | −0.0003 | 0.0314 | 95% | 1.0004 | 0.0004 | 0.0212 | 98% | 0.9997 | −0.0003 | 0.0191 | 92% |
| | σ² | 1 | 0.9869 | −0.0131 | 0.0516 | 91% | 0.9863 | −0.0137 | 0.0433 | 93% | 1.0005 | 0.0005 | 0.0330 | 95% |
| 80 | ρ1 | 0.20 | 0.1947 | −0.0053 | 0.0338 | 91% | 0.1994 | −0.0006 | 0.0284 | 95% | 0.1927 | −0.0073 | 0.0244 | 92% |
| | ρ2 | 0.35 | 0.3505 | 0.0005 | 0.0168 | 92% | 0.3501 | 0.0001 | 0.0149 | 91% | 0.3532 | 0.0032 | 0.0155 | 92% |
| | ρ3 | 0.50 | 0.5021 | 0.0021 | 0.0195 | 95% | 0.5008 | 0.0008 | 0.0170 | 96% | 0.5027 | 0.0027 | 0.0117 | 95% |
| | ρ4 | 0.65 | 0.6453 | −0.0047 | 0.0101 | 96% | 0.6469 | −0.0031 | 0.0132 | 97% | 0.6497 | −0.0003 | 0.0082 | 91% |
| | ρ5 | 0.80 | 0.7996 | −0.0004 | 0.0059 | 93% | 0.7997 | −0.0003 | 0.0048 | 93% | 0.8021 | 0.0021 | 0.0056 | 94% |
| | β1 | 1 | 1.0104 | 0.0104 | 0.0267 | 92% | 1.0008 | 0.0008 | 0.0165 | 91% | 1.0027 | 0.0027 | 0.0115 | 96% |
| | β2 | 1 | 0.9996 | −0.0004 | 0.0203 | 93% | 0.9997 | −0.0003 | 0.0176 | 90% | 1.0000 | ≤0.0001 | 0.0124 | 94% |
| | σ² | 1 | 1.0032 | 0.0032 | 0.0331 | 93% | 1.0007 | 0.0007 | 0.0283 | 93% | 0.9976 | −0.0024 | 0.0243 | 91% |
Table 3. Empirical ML estimates of θ and CP in Model 3 with true value θ_0 and n = n_1 n_2. The three column groups report results for n_2 = 20, 40 and 60, respectively.

| n_1 | θ | θ_0 | θ̂ | Bias | RMSE | CP | θ̂ | Bias | RMSE | CP | θ̂ | Bias | RMSE | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | ρ1 | 0.20 | 0.1987 | −0.0013 | 0.0373 | 93% | 0.2063 | 0.0063 | 0.0306 | 95% | 0.1978 | −0.0022 | 0.0247 | 93% |
| | ρ2 | 0.35 | 0.3451 | −0.0049 | 0.0317 | 92% | 0.3504 | 0.0004 | 0.0229 | 94% | 0.3508 | 0.0008 | 0.0147 | 96% |
| | ρ3 | 0.50 | 0.4992 | −0.0008 | 0.0301 | 92% | 0.4874 | −0.0126 | 0.0234 | 91% | 0.4991 | −0.0009 | 0.0174 | 95% |
| | ρ4 | 0.65 | 0.6491 | −0.0009 | 0.0210 | 93% | 0.6555 | 0.0055 | 0.0181 | 94% | 0.6525 | 0.0025 | 0.0143 | 94% |
| | ρ5 | 0.80 | 0.7915 | −0.0085 | 0.0189 | 92% | 0.7975 | −0.0025 | 0.0136 | 93% | 0.7996 | −0.0004 | 0.0050 | 95% |
| | β1 | 1 | 0.9862 | −0.0138 | 0.0284 | 94% | 1.0000 | −0.0000 | 0.0187 | 96% | 0.9989 | −0.0011 | 0.0191 | 96% |
| | β2 | 1 | 1.0160 | 0.0160 | 0.0263 | 93% | 0.9988 | −0.0012 | 0.0194 | 92% | 1.0009 | 0.0009 | 0.0185 | 93% |
| | σ² | 1 | 0.9534 | −0.0466 | 0.0777 | 94% | 0.9853 | −0.0147 | 0.0520 | 93% | 1.0105 | 0.0105 | 0.0396 | 95% |
| 50 | ρ1 | 0.20 | 0.1881 | −0.0119 | 0.0291 | 91% | 0.2009 | 0.0009 | 0.0248 | 94% | 0.1978 | −0.0022 | 0.0190 | 93% |
| | ρ2 | 0.35 | 0.3535 | 0.0035 | 0.0213 | 94% | 0.3461 | −0.0039 | 0.0192 | 92% | 0.3516 | 0.0016 | 0.0158 | 93% |
| | ρ3 | 0.50 | 0.4984 | −0.0016 | 0.0224 | 92% | 0.4963 | −0.0037 | 0.0169 | 94% | 0.4968 | −0.0032 | 0.0105 | 96% |
| | ρ4 | 0.65 | 0.6461 | −0.0039 | 0.0140 | 95% | 0.6451 | −0.0049 | 0.0149 | 93% | 0.6509 | 0.0009 | 0.0097 | 92% |
| | ρ5 | 0.80 | 0.7999 | −0.0001 | 0.0078 | 96% | 0.7980 | −0.0020 | 0.0066 | 92% | 0.8006 | 0.0006 | 0.0029 | 94% |
| | β1 | 1 | 1.0039 | 0.0039 | 0.0221 | 93% | 1.0048 | 0.0048 | 0.0198 | 94% | 1.0006 | 0.0006 | 0.0122 | 93% |
| | β2 | 1 | 0.9965 | −0.0035 | 0.0222 | 94% | 0.9957 | −0.0043 | 0.0195 | 91% | 0.9993 | −0.0007 | 0.0119 | 93% |
| | σ² | 1 | 0.9925 | −0.0075 | 0.0477 | 95% | 0.9913 | −0.0087 | 0.0354 | 92% | 1.0079 | 0.0079 | 0.0279 | 93% |
| 80 | ρ1 | 0.20 | 0.1968 | −0.0032 | 0.0204 | 95% | 0.1988 | −0.0012 | 0.0195 | 92% | 0.2019 | 0.0019 | 0.0153 | 92% |
| | ρ2 | 0.35 | 0.3555 | 0.0055 | 0.0189 | 94% | 0.3486 | −0.0014 | 0.0139 | 95% | 0.3507 | 0.0007 | 0.0063 | 92% |
| | ρ3 | 0.50 | 0.5002 | 0.0002 | 0.0164 | 93% | 0.4991 | −0.0009 | 0.0115 | 93% | 0.5008 | 0.0008 | 0.0041 | 95% |
| | ρ4 | 0.65 | 0.6515 | 0.0015 | 0.0085 | 94% | 0.6507 | 0.0007 | 0.0052 | 96% | 0.6510 | 0.0010 | 0.0073 | 91% |
| | ρ5 | 0.80 | 0.7982 | −0.0018 | 0.0115 | 96% | 0.8009 | 0.0009 | 0.0041 | 97% | 0.7999 | −0.0001 | 0.0027 | 94% |
| | β1 | 1 | 0.9962 | −0.0038 | 0.0174 | 94% | 0.9972 | −0.0028 | 0.0124 | 93% | 0.9965 | −0.0035 | 0.0079 | 96% |
| | β2 | 1 | 1.0042 | 0.0042 | 0.0172 | 93% | 1.0025 | 0.0025 | 0.0121 | 92% | 1.0036 | 0.0036 | 0.0076 | 94% |
| | σ² | 1 | 0.9891 | −0.0109 | 0.0281 | 92% | 0.9930 | −0.0070 | 0.0212 | 94% | 1.0023 | 0.0023 | 0.0185 | 95% |

Qiu, F.; Ding, H.; Hu, J. Asymptotic Properties of Quasi-Maximum Likelihood Estimators for Heterogeneous Spatial Autoregressive Models. Symmetry 2022, 14, 1894. https://doi.org/10.3390/sym14091894
