Abstract
In this paper, we address a class of heterogeneous spatial autoregressive models with all spatial coefficients taking m distinct true values, where m is independent of the sample size n, and we establish asymptotic properties of the maximum likelihood estimator and the quasi-maximum likelihood estimator for all parameters in this class of models, extending the work of Lee (2004). The rates of convergence of these estimators depend on features of the elements of the spatial weights matrix in the model. When, based on the values of the weights, each individual influences only a few neighbors and is influenced by only a few neighbors, the estimator enjoys a $\sqrt{n}$-rate of convergence and is asymptotically normal. However, when each individual can influence many neighbors or can be influenced by many neighbors and their number does not exceed $o(n)$, singularity of the information matrix may occur, and various components of the estimators may have different (usually lower than $\sqrt{n}$) rates of convergence. An example of an inconsistent estimator is provided for the case in which some important assumptions are violated. Finally, simulation studies demonstrate that the finite sample performance of the maximum likelihood estimators is good.
1. Introduction
Spatial econometrics consists of econometric techniques dealing with empirical economic problems caused by spatial (cross-sectional) interaction between spatial individuals. The dependence across spatial individuals is an interaction issue in urban, real estate, regional, public, agricultural and environmental economics, finance and industrial organization. In many economic applications, modeling the interaction is essential for understanding market competition. One typical example is the commodity market, where buyers compare commodities of all sizes, with all kinds of options and in all locations, as well as their prices, before purchasing.
In this example, the market demand for a commodity is a function of the prices and characteristics of all commodities. Specifically, let and denote the observed and unobserved characteristics of commodity i, respectively, and let denote its price. The market demand for commodity () is a function of , , where n denotes the total number of commodities in the market. Setting the market demand for commodity equal to its market supply and solving for , we obtain the following partial equilibrium price:
The marginal effect of with respect to measures the price competition from commodity j, while the marginal effect with respect to measures the quality competition. To aid exposition, we shall call the dependence of the partial equilibrium price of commodity i on other prices as the direct spatial interaction and call dependence on other commodity characteristics ( for all ) as the indirect spatial interaction. The spatial interaction is the sum of the direct and indirect spatial interaction. The spatial interaction is often ignored in the existing hedonic commodity price regression literature. One argument for the omission is that researchers are interested only in predicting commodity prices, not in understanding market competition. If prediction is the goal, then we should solve (1) for all prices to obtain the general equilibrium prices:
Clearly, the general equilibrium prices may contain the indirect spatial interaction terms; neglecting these terms in the hedonic commodity price regression could result in omitted variable bias and poor price prediction. Bell and Bockstael [] and Banerjee et al. [] partially addressed the omitted variable problem by adding spatial error correlation to the hedonic commodity price regression. Their models yield consistent parameter estimates and predictions if and only if the omitted interaction terms are uncorrelated with a commodity's own characteristics.
To capture spatial interaction, the usual approach in spatial econometrics is to impose structure on the model. A well-known structure is the spatial autoregressive (SAR) model studied by Cliff and Ord [], who extended autocorrelation in time series to spatial dimensions. The SAR model with common regressors is expressed as
where is the value of the equilibrium price for the ith spatial individual, is the nonstochastic spatial weights from j to i with for all i, is the single spatial coefficient, is the value of the kth regressor among p common regressors for the ith spatial individual, is the regression coefficient of the kth regressor for all spatial individuals. Assume that the disturbances or errors across are independent and identically distributed (i.i.d.) with mean zero and variance .
The spatial aspect of a SAR model has the distinguishing feature of simultaneity in econometric equilibrium models. Substantial developments in testing and estimation of SAR models have been summarized in a large literature, e.g., Cliff and Ord [], Anselin [], Cressie [], Anselin and Bera [], and Elhorst [], among others. Recent empirical applications of the SAR model in mainstream economics journals include Case [], Case et al. [], Besley and Case [], Brueckner [], Bell and Bockstael [], Bertrand et al. [], Topa [], Coval and Moskowitz [], Druska and Horrace [], Frazier and Kockelman [], Baltagi and Li [], Pirinsky and Wang [], Bekaert et al. [], Robinson and Rossi [], and Liu et al. [], among others.
The SAR models can be estimated by the method of maximum likelihood (ML), e.g., Ord [], Smirnov and Anselin [], and Robinson and Rossi [], methods of moments, e.g., Kelejian and Prucha [], and Lee and Liu [], as well as the method of quasi-maximum likelihood (QML), e.g., Lee [] and Yu et al. [].
In recent years, many authors have studied spatial autoregressive models. Liu et al. [] developed a penalized quasi-maximum likelihood method for simultaneous model selection and parameter estimation in SAR models with independent and identically distributed errors. Song et al. [] proposed a variable selection method based on exponential squared loss for SAR models. Ju et al. [] developed Bayesian influence analysis for skew-normal spatial autoregression models (SSARMs). Our argument will focus on a more general specification of the linear structure of the spatial interaction.
Let us look carefully at model (3). The single spatial coefficient in (3), together with the nonnegative spatial weights, implies a unidirectional effect. Specifically, if the spatial coefficient is positive (negative), then all spatial individuals with positive spatial weights have positive (negative) effects on individual i.
In applications, however, it is possible that some spatial individuals have positive effects while other individuals have negative effects on individual i. For instance, in regional economics applications, region j may serve as a supply hub for region i and could have a positive effect on region i’s economy, while region k competes (e.g., an alternative competition) against region i and could have a negative effect on region i. Specification of the linear structure of spatial interaction in (3) rules out this type of bidirectional effect.
In addition, the spatial weights in (3) are often constructed to be symmetric (i.e., $w_{ij} = w_{ji}$), implying that the effect of individual i on individual j is identical to the effect of individual j on individual i. In applications, asymmetric effects are possible. Again, in regional economics applications, an economically strong region i is likely to have a bigger impact on an economically weak region j than region j has on region i. This type of asymmetric effect is not permitted by (3). One could argue that asymmetric effects can be modeled through the construction of asymmetric spatial weights. However, there is hardly any theoretical guidance on the construction of such asymmetric weights, and the construction of weights itself is not entirely undisputed. For more on the weights matrix, the interested reader is referred to Ahrens and Bhattacharjee [] and Lam and Souza [].
Individual-specific endogenous effects of this type are common in practice. In many applications, individuals have different impacts on their neighbors’ behaviors. For example, Clark and Loheac [] designed a special (classic) spatial panel model to show that popular teenagers in a school have much stronger influences on their classmates’ smoking decisions than their less popular peers; Mas and Moretti [] applied the expected utility principle to find that the magnitude of spillovers varies dramatically among workers with different skill levels; and Banerjee et al. [] used a model of word-of-mouth diffusion to show that individuals directly connected with village leaders are more likely to join a micro-finance program than those connected to someone else.
To better capture the spatial interaction, a spatial autoregressive model with a general linear specification that includes all spatial interaction terms should be expressed as
where is the spatial coefficient representing the effect of individual j on individual i. If the true values of the spatial coefficients satisfy for all i and j, the most general SAR model (4) is reduced to the famous model (3). If for some , then the effect of individual j on individual i is not equal to the effect of individual i on individual j even if , and if the effects of individual i on j and individual j on i are bidirectional (in the opposite direction).
Clearly, the most general model (4) is flexible enough to permit asymmetric, individual-specific endogenous and bidirectional effects. Despite these advantages, the model is not identified, since the number of spatial coefficients grows with n. Some restrictions must be placed on those coefficients.
Homogeneous classification of the spatial coefficients is one available method. We classify the spatial coefficients into m subgroups, where m is independent of n and the spatial coefficients in a subgroup take the same value. Two approaches, data-driven selection and economic geographic attributes, can be adopted to realize this type of homogeneous classification. In regional economics applications, grouping regions by the upper administrative unit to which they belong is one example of economic geographic attributes. In this example, the number of upper administrative units, m, is regarded as being independent of n if it is much smaller than the number of regions, n. In short, we restrict the true values of the spatial coefficients in model (4) to a finite set of m distinct values.
Consequently, the specified spatial weight matrix W of order n can be divided into m nonzero spatial weight matrices, , of order n, satisfying the homogeneous classification condition
where, for any nonzero component of W, there is a unique such that and for with being the th component of , and for , then all for all k. The weights w’s may be selected based on potential specifications, such as physical distance, social networks, or “economic” quantities among variables, see Case et al. [], or based on a best combination of these specifications, see Lam and Souza [].
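As a small illustration of the homogeneous classification, the following Python sketch splits a toy weight matrix W into m = 2 component matrices and checks that the components sum to W and that every nonzero weight falls in exactly one component. The function name `split_weights`, the label matrix `labels`, and the column-based grouping are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def split_weights(W, labels, m):
    """Split W into m matrices; labels[i, j] in {0, ..., m-1} assigns w_ij to a group."""
    return [np.where(labels == k, W, 0.0) for k in range(m)]

# Toy 4 x 4 row-normalised weight matrix with zero diagonal.
W = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.5, 0.0],
])

# Hypothetical grouping: the group of w_ij is decided by the column index j.
labels = np.tile(np.array([0, 0, 1, 1]), (4, 1))
W_parts = split_weights(W, labels, m=2)

# The component matrices add up to W ...
parts_sum_to_W = np.allclose(sum(W_parts), W)
# ... and each nonzero w_ij appears in exactly one component.
nonzero_counts = sum((Wk != 0).astype(int) for Wk in W_parts)
unique_membership = np.array_equal(nonzero_counts, (W != 0).astype(int))
print(parts_sum_to_W, unique_membership)
```

Any other rule that assigns each nonzero weight to exactly one group (for example, by row, or by a data-driven clustering) could be passed through `labels` in the same way.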
With the homogeneous classification (5), a heterogeneous spatial autoregressive model with a linear specification having m distinct spatial coefficients is postulated as
Dou et al. [] considered a class of pure spatio-temporal models without regressors by classifying the spatial coefficients based on the rows of the weight matrix. Their method, however, does not work for similar models with regressors. Peng [] considered a SAR model on a network by classifying the spatial coefficients based on the columns of the weight matrix. Peng’s work requires sparsity of the spatial coefficients.
In this paper, we investigate asymptotic properties of the maximum likelihood (ML) estimator and the quasi-maximum likelihood (QML) estimator for the heterogeneous SAR model (6) under the normal distributional specification, extending the results for the SAR model (3) investigated in Lee [] to the heterogeneous SAR model (6). The QML estimator is appropriate when the estimator is derived from a normal likelihood but the disturbances in the model are not truly normally distributed, e.g., see Lee [].
In the existing literature, the ML estimator of such a model is implicitly regarded as having the familiar $\sqrt{n}$-rate of convergence, as for a usual ML estimator of a parametric statistical model with sample size n; e.g., see the reviews by Anselin [] and Anselin and Bera []. Lee [] provided a broader view of the asymptotic properties of the ML and QML estimators and showed that their rates of convergence depend on some general features of the spatial weights matrix W of model (3). This paper extends Lee’s work to model (6), aiming to provide a similar view of the asymptotic properties and the rates of convergence of the ML and QML estimators in model (6) under different scenarios.
The remainder of this paper is organized as follows. Section 2 provides an estimation procedure for finding the ML and QML estimators of the parameters in the heterogeneous spatial autoregressive model and specifies regularity conditions for model (6). In Section 3, we show that identification of the parameters can be assured if there is no multicollinearity among the regressors and the m spatially generated regressors. The ML and QML estimators can be $\sqrt{n}$-consistent and asymptotically normal (Theorem 3) under some regularity conditions on the spatial weights matrix.
Section 4 considers the spatial scenarios that lie between asymptotic normality and inconsistency. These scenarios occur when each individual can be influenced by many neighbors or can influence many neighbors, in which case singularity or irregularity of the information matrix may occur and various components of the QML estimators may have different rates of convergence. This includes the ML estimator and QML estimator for the (pure) heterogeneous SAR process. This section also considers the event of multicollinearity, where the m spatially generated regressors are collinear with the original regressors. A counterexample with two distinct spatial coefficients is given to show that the QML estimator can be inconsistent when multicollinearity occurs.
In Section 5, we conduct finite sample simulation studies for heterogeneous SAR models with 5 distinct spatial coefficients. Section 6 provides brief concluding remarks. The proofs of all main theorems are collected in Appendix A, and a counterexample of an inconsistent QML estimator is provided in Appendix B.
2. Heterogeneous SAR Model and QML Estimators
The heterogeneous SAR model with m distinct spatial coefficients and p common regressors is written in a matrix version as
where n is the total number of spatial units, is the n-dimensional equilibrium (price) vector, is an matrix of constant (spatial varying) regressors, is the spatial coefficient vector, is the regression coefficients (may include the intercept), is an n-dimensional vector of independently and identically distributed (i.i.d.) disturbances with zero mean and variance .
We denote to be the true value of . Let for any spatial parametric vector , where I is the identity matrix. The equilibrium (price) vector is expressed as
where is assumed to be nonsingular. When there are no regressors in the model (7), it becomes a pure heterogeneous SAR process:
implying that the equilibrium (price) vector is simply derived from the disturbance vector .
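The reduced form can be made concrete with a short simulation. The Python sketch below assumes the notation S(ρ) = I − Σ_k ρ_k W_k (our reading of the definition above, with the symbols written out), a ring-shaped toy weight matrix, and a split of W into two column blocks; the function names `spatial_filter` and `simulate_y` and all numerical values are hypothetical.

```python
import numpy as np

def spatial_filter(rho, W_parts):
    """Return S(rho) = I - sum_k rho_k W_k (assumed notation)."""
    n = W_parts[0].shape[0]
    return np.eye(n) - sum(r * Wk for r, Wk in zip(rho, W_parts))

def simulate_y(rho, W_parts, X, beta, sigma, rng):
    """Draw y from the reduced form y = S(rho)^{-1} (X beta + eps)."""
    S = spatial_filter(rho, W_parts)
    eps = rng.normal(0.0, sigma, size=X.shape[0])
    y = np.linalg.solve(S, X @ beta + eps)
    return y, eps

# Toy ring network: each unit has two neighbours with weight 1/2.
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

# Hypothetical classification into m = 2 groups by column.
first_half = np.arange(n) < n // 2
W_parts = [W * first_half, W * ~first_half]

rng = np.random.default_rng(0)
X = rng.normal(size=(n, 2))
beta = np.array([1.0, -1.0])
rho = np.array([0.3, -0.2])
y, eps = simulate_y(rho, W_parts, X, beta, sigma=1.0, rng=rng)

# The simulated y satisfies the structural equation S(rho) y = X beta + eps.
print(np.allclose(spatial_filter(rho, W_parts) @ y, X @ beta + eps))
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of S(ρ) explicitly, which is usually preferable numerically.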
We use to denote and to denote . The log-likelihood function of in (7) is
The extremum estimator derived from the maximization of (10) is written as
where takes values in the set of admissible values. When the (i.i.d.) disturbances in the model (7) are normally distributed by , the extremum estimator is the maximum likelihood (ML) estimator. When the (i.i.d.) disturbances in the model are not truly normally distributed, the extremum estimator derived from a normal likelihood is called the quasi-maximum likelihood (QML) estimator.
The first-order partial derivatives of the function (10) with respect to the parameters are as follows:
where is the trace of A and the formula is used, see Magnus and Neudecker [],
and
In order to prove consistency and asymptotic normality, we usually adopt the following solution procedure. The concentrated log-likelihood function is defined as
Letting (13) be zero obtains the QML estimator of for fixed
and letting (14) be zero gives the QML estimator of for fixed
where is the orthogonal projection to the column space of . Then, the concentrated log likelihood function of is
Maximizing the concentrated likelihood (17) obtains the QML estimator of
where $\rho$ takes values in the set of admissible values. The QML estimator can be computed using the Newton–Raphson method (Ord []) and the R package rgenoud. Thus, the QML estimators of $\beta$ and $\sigma^2$ are expressed, respectively, as
finally obtaining QML estimator in (11).
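The concentration steps above can be sketched numerically. The Python code below assumes the standard Gaussian form of the likelihood for this type of model, with S(ρ) = I − Σ_k ρ_k W_k, β profiled out by least squares on S(ρ)y, and σ² by the mean squared residual. The function names, the toy ring network, and the coarse grid search (used here in place of Newton–Raphson or rgenoud) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def concentrated_loglik(rho, y, X, W_parts):
    """Concentrated Gaussian log-likelihood in rho; also returns beta_hat(rho), sigma2_hat(rho)."""
    n = y.shape[0]
    S = np.eye(n) - sum(r * Wk for r, Wk in zip(rho, W_parts))
    sign, logdet = np.linalg.slogdet(S)
    if sign <= 0:                      # assumed admissible set: det S(rho) > 0
        return -np.inf, None, None
    Sy = S @ y
    beta_hat, *_ = np.linalg.lstsq(X, Sy, rcond=None)
    resid = Sy - X @ beta_hat
    sigma2_hat = resid @ resid / n
    value = -0.5 * n * (np.log(2.0 * np.pi) + 1.0) - 0.5 * n * np.log(sigma2_hat) + logdet
    return value, beta_hat, sigma2_hat

def full_loglik(rho, beta, sigma2, y, X, W_parts):
    """Gaussian log-likelihood in (rho, beta, sigma2), for comparison."""
    n = y.shape[0]
    S = np.eye(n) - sum(r * Wk for r, Wk in zip(rho, W_parts))
    _, logdet = np.linalg.slogdet(S)
    e = S @ y - X @ beta
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) + logdet - e @ e / (2.0 * sigma2)

# Toy data: ring network, two column groups, hypothetical true values.
n = 40
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
first_half = np.arange(n) < n // 2
W_parts = [W * first_half, W * ~first_half]
rng = np.random.default_rng(1)
X = rng.normal(size=(n, 2))
rho0, beta0 = np.array([0.4, -0.3]), np.array([1.0, -1.0])
S0 = np.eye(n) - sum(r * Wk for r, Wk in zip(rho0, W_parts))
y = np.linalg.solve(S0, X @ beta0 + rng.normal(size=n))

# Coarse grid search over rho as a stand-in for a proper optimiser.
grid = np.linspace(-0.8, 0.8, 17)
best_value, rho_hat = -np.inf, None
for r1 in grid:
    for r2 in grid:
        v, _, _ = concentrated_loglik((r1, r2), y, X, W_parts)
        if v > best_value:
            best_value, rho_hat = v, (r1, r2)
print(rho_hat, best_value)
```

By construction, the concentrated value at any ρ equals the full log-likelihood evaluated at (ρ, β̂(ρ), σ̂²(ρ)), so maximizing over ρ alone recovers the joint maximizer.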
After finding the QML estimator , we focus our attention on how to investigate its consistency and asymptotic normality. Similar to Lee [], we first introduce some basic regularity conditions for our heterogeneous SAR model to provide a rigorous analysis of the QML estimators. Additional regularity conditions will be subsequently added.
Assumption A1.
The disturbances, , of are i.i.d. with mean zero and variance . Their moments, for some uniformly exist.
Assumption A2.
The elements of W are $w_{ij} = O(1/h_n)$, at most of order $h_n^{-1}$, uniformly for all $i, j$, where the rate sequence $\{h_n\}$ is bounded or divergent as n tends to infinity. As a normalization, $w_{ii} = 0$ for all i.
Assumption A3.
The ratio $h_n/n \to 0$ as n goes to infinity.
Assumption A4.
The matrix is nonsingular.
This tells us that the parametric function is nonsingular at point . Since is continuous, is nonsingular for in a neighborhood of . Alternatively, is said to take values in the set of admissible values.
Assumption A5.
The weight matrix W and are uniformly bounded in both row and column sums as n goes to infinity.
It is also assumption in Horn and Johnson []. It follows from (5) that all are uniformly bounded in both row and column sums. Since usual , putting Assumptions A1 and A5 together implies that there do not exist elements of each row or each column whose number reaches to .
Assumption A6.
are uniformly bounded in either row or column sums, uniformly in ρ in a compact parameter space Γ with being in the interior of Γ.
Assumption A7.
The elements of are uniformly bounded constants for all n and assume that
3. Asymptotic Properties of QML Estimators
Let , then . From (8), the reduced form equation of can be represented as
Condition 1.
Assume that
Condition 2.
Assuming
Condition 2 says that , and in (18), are not asymptotically multicollinear. It is a sufficient condition for global identification of .
Lemma 1.
Under Assumptions A1–A6, Condition 2 is true if and only if Assumption A7 and Condition 1 both are true.
3.1. Consistency
Corresponding to the concentrated log likelihood function, we define its expectation as
Hence,
Let is the density function of a random vector following the multivariate normal . When , if the determinant of covariance is not equal to the determinant of covariance , then the probability of the set is not zero. Similarly, for any n, when , the probability of the set is not zero.
Condition 3.
When is a bound sequence, for ,
After all above-mentioned preparations, consistency of the QML estimator follows.
Theorem 1.
Under Assumptions A1–A7 with a bound sequence , given Condition 1 or 3, then is globally identifiable and is a consistent estimator of .
Proof.
See Appendix A. □
Our argument process for consistency follows from Theorem 3.4 of White (1994).
3.2. Asymptotic Normality
In this subsection, we derive the asymptotic distribution of the QML estimator . We start from the optimal condition
Let us consider the likelihood score vector
It is easy to see that from (12)–(14). Then, the covariance or Fisher information matrix of U is
which can be decomposed into
where
is called the sample average Hessian matrix. The following discussion aims at calculation of and .
By calculating the expectation of the above second derivatives (24) and (25) at , we obtain the following lemma.
Lemma 2.
The sample average Hessian matrix is given by
with
Proof.
See Appendix A. □
Condition 4.
Assume that
where and vec is a vectorization operator.
Condition 2 may be true only under the scenario that is a bounded sequence because for any . The following is a sufficient and necessary condition for a nonsingular average Hessian matrix .
Theorem 2.
Under Assumptions A1–A7, the average Hessian matrix is nonsingular if and only if either of Conditions 1 and 4 holds.
Proof.
See Appendix A. □
For a divergent sequence , Condition 4 is violated. In this case, is nonsingular if and only if Condition 1 holds. Set at , and with being the th component of . Calculation of is summarized in the following lemma.
Lemma 3.
The difference between the information matrix and the sample average Hessian matrix is given by
where
Proof.
See Appendix A. □
If is normally distributed, then and , implying that . Thus, the sample average Hessian matrix is the covariance of U or the information matrix . In the sense, can be said to be an information matrix. Finally, with the above long and necessary preparations, the asymptotic distribution of the QML estimator is summarized in the following theorem.
Theorem 3.
Under Assumptions A1–A7 and either of Conditions 1 and 4, converges in distribution to the multivariate normal , where
Moreover, particularly if is normally distributed, then converges in distribution to the multivariate normal .
Proof.
See Appendix A. □
The estimation of the asymptotic covariance of is a routine issue. The is estimated by
The is estimated by in (27). For the QML estimator, the extra moments and in (27) can be estimated by the third and fourth order empirical moments based on estimated residuals of the ’s.
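A minimal Python sketch of this last step follows: it forms the estimated residuals from fitted values of the spatial and regression coefficients and returns their third and fourth empirical moments. The residual formula S(ρ̂)y − Xβ̂ with S(ρ) = I − Σ_k ρ_k W_k is assumed notation, and the function name and toy inputs are hypothetical.

```python
import numpy as np

def residual_moments(y, X, W_parts, rho_hat, beta_hat):
    """Residuals e = S(rho_hat) y - X beta_hat and their 3rd and 4th empirical moments."""
    n = y.shape[0]
    S = np.eye(n) - sum(r * Wk for r, Wk in zip(rho_hat, W_parts))
    resid = S @ y - X @ beta_hat
    mu3_hat = np.mean(resid ** 3)
    mu4_hat = np.mean(resid ** 4)
    return resid, mu3_hat, mu4_hat

# Toy inputs: with a zero weight matrix the residuals are simply y - X beta_hat.
y = np.array([1.0, 2.0, 3.0, 6.0])
X = np.ones((4, 1))
W_parts = [np.zeros((4, 4))]
resid, mu3_hat, mu4_hat = residual_moments(y, X, W_parts, rho_hat=[0.5], beta_hat=np.array([3.0]))
print(resid, mu3_hat, mu4_hat)
```

Under normal disturbances the third moment is zero and the fourth equals three times the squared variance, so comparing these empirical moments with the normal values gives a quick check on how far the QML correction terms are from vanishing.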
Remark 1.
The asymptotic results in Theorems 1 and 3 are valid regardless of whether the rate sequence is bounded or divergent.
and
Theoretically, the presence of and the linear independence of and are the crucial conditions for the asymptotic results in Theorem 3, in particular, the -rate of convergence of . Practically, and are not (asymptotically) multicollinear to guarantee consistency of .
Remark 2.
When the disturbances ϵ’s are normally distributed, is the ML estimator. The ML estimators, and , are still independent, as in linear regression analysis, regardless of whether the series is a bounded or divergent sequence. However, the dependence between the ML estimators and depends on whether the series is a bounded or divergent sequence. When the series is bounded, there is some such that the ML estimators and will be asymptotically dependent, see (16), because is finite and may not be zero. Anselin and Bera (1998) discussed the implication of this dependence on statistical inference problems for the case of . We also see that, when the series is a divergent sequence, for all k, the QML estimators and are asymptotically independent.
Remark 3.
The requirements in Conditions 1 and 2 are for all spatial coefficients. Sometimes these requirements are satisfied by only some of the spatial coefficients. Write and , where and are -dimensional while and are -dimensional. Conditions 1 and 2 hold only for partial spatial coefficients , i.e., without loss of generality,
Condition 1’. The .
Condition 4’. The .
After using Condition 1’ and 4’ to replace Conditions 1 and 4, consistency in Theorem 1 and asymptotic normality in Theorem 3 hold for , at .
4. Asymptotic Normality with Non-Square-Root Rates
Consider the case of . It follows from Theorem 2 that the average Hessian matrix is singular under Condition 5 and 6.
Condition 5.
For any , .
Condition 6.
For any , the .
For example, for the pure SAR process (9) with , as , . There are other cases in which the singularity occurs; see Remark 4. However, it is easy to see that there is a zone between the setting of Theorem 3 and inconsistency. In this section, we investigate new conditions that guarantee consistency with non-square-root rates, or asymptotic normality, under Conditions 5 and 6.
Lee [] suggested that the singularity of the average Hessian matrix or the information matrix under normal disturbances has implications on the rate of convergence of the estimators. When , is rather flat with respect to and the convergence of to zero is too fast to be useful. A properly adjusted rate is made such that
We consider the following two new conditions.
Condition 7.
The is a divergent sequence, elements of for have the uniform order , and
Condition 7 modifies Condition 1 with the factor to account for the proper rate of convergence. It is a generalization of Condition 1.
Condition 8.
The is a divergent sequence and
Condition 8 modifies Condition 3 with the factor to account for the proper rate of convergence. It is a generalization of Condition 3.
Theorem 4.
Under Assumptions A1–A7 and Conditions 5 and 6, if either of Conditions 7 and 8 holds, then the QML estimator derived from the maximization of in (17) is a consistent estimator.
Proof.
See Appendix A. □
The central limit theorem for a linear–quadratic form implies that is asymptotically normal. The asymptotic distribution of follows from
Theorem 5.
Under Assumptions A1–A7, and Condition 7 or 8,
where
with
and
with
Proof.
See Appendix A. □
After finding the limiting distribution of , the limiting distributions of and defined in (15) and (16) are inevitable consequences.
Theorem 6.
Under Assumptions A1–A7, and Condition 7 or 8,
where , and
In particular, when ,
Proof.
See Appendix A. □
The asymptotic distribution of has the -rate of convergence in Theorem 5. As is divergent, this rate of convergence is lower than $\sqrt{n}$. The asymptotic distribution of the QML estimator and its low rate of convergence in Theorem 6 are determined by the asymptotic distribution of that forms the leading term in the asymptotic expansion (29). When , this leading term vanishes and converges to with the usual $\sqrt{n}$-rate. The asymptotic distribution of keeps the usual $\sqrt{n}$-rate of convergence.
The rate of convergence of may change when all items, , …, , vanish asymptotically and simultaneously. However, the exact rate of convergence will depend on how fast these items vanish in the limit. This simultaneous asymptotic vanishing of all items may result in some components of having the $\sqrt{n}$-rate of convergence while others have a lower rate of convergence. The interested reader can refer to Lee [] for the details in the case of .
Remark 4.
The set of the vectors and the regressor matrix can be linearly dependent under some circumstances.
Circumstance 1: The regression coefficient is a zero vector. The heterogeneous SAR model (7) reduces to the pure heterogeneous SAR model (9). Thus, the vectors, , all become zero vectors. Condition 2 is violated because of , resulting in that the set of the vectors and the columns of regressor matrix can be linearly dependent. Condition 1 is also violated.
Circumstance 2: For a specific W and , , see the counterexample discussed in next subsection. In this case, Conditions 1 and 4 both are violated.
Circumstance 3: When with a p-dimensional vector, . Then, for any k. Condition 1 is violated.
In the above circumstances, if Condition 4 is also violated (for example, when the rate sequence is divergent), consistency may fail, implying that asymptotic normality also fails. The Supplementary Material provides a counterexample leading to an inconsistent QML estimator.
5. Simulation Studies
In this section, we estimate the parameters of the heterogeneous spatial autoregressive model (7) with given parameter values. To investigate the finite sample properties of the QMLE in a Monte Carlo study with an appropriate spatial matrix W satisfying the assumptions in Section 2, we focus on the same spatial scenario as that investigated by Case [] and Lee [].
Specifically, = , where and is an -dimensional column vector of ones. There are districts and members in each district, with each neighbor of a member in a district being given equal weight. For this spatial scenario, = , and hence = . If both and increase to infinity, then goes to infinity and goes to zero as n tends to infinity. Let us consider the data sets generated from the following model
where are the uniform column segmentation of spatial matrix , namely, is the first column of , …, and is the last column of such that , the spatial coefficient vector is set to and follows .
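The weight structure described above can be sketched in Python as follows. The block-diagonal Kronecker form with equal weights on all other members of the same district follows the verbal description (districts of equal size, each neighbor given equal weight); the reading of "uniform column segmentation" as five contiguous, equally sized blocks of columns is our assumption, and the names `case_weights`, `column_blocks`, `n_districts` and `district_size` are hypothetical.

```python
import numpy as np

def case_weights(n_districts, district_size):
    """Block-diagonal W: each member gives equal weight to the other members of its district."""
    ones = np.ones((district_size, 1))
    B = (ones @ ones.T - np.eye(district_size)) / (district_size - 1)
    return np.kron(np.eye(n_districts), B)

def column_blocks(W, m):
    """Split W into m matrices, the k-th keeping only the k-th contiguous block of columns."""
    n = W.shape[0]
    parts = []
    for cols in np.array_split(np.arange(n), m):
        Wk = np.zeros_like(W)
        Wk[:, cols] = W[:, cols]
        parts.append(Wk)
    return parts

W = case_weights(n_districts=6, district_size=5)   # n = 30 spatial units
W_parts = column_blocks(W, m=5)

# W is row-normalised with zero diagonal, and the column blocks add up to W.
print(np.allclose(W.sum(axis=1), 1.0), np.allclose(np.diag(W), 0.0), np.allclose(sum(W_parts), W))
```

With districts of size 5, every nonzero weight equals 1/4, so the largest weight shrinks as the district size grows, which is the feature that drives the slower convergence rates discussed in Section 4.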
We consider three different settings for models with three different types of regressors without the intercept term:
- Model 1: Set . The n-dimensional regressor vector is generated by the i.i.d. standard normal and is generated by the i.i.d t-distribution .
- Model 2: The setting is the same as Model 1. Additionally, the correlation coefficient of the first regressor and the second regressor is .
- Model 3: Set , , be generated by the standard normal and be generated by the t-distribution . The first regressor of the ith member in the district l is generated as , where are i.i.d. for all i and l and are independent of , and the second regressor is generated as where are i.i.d for all i and l and are independent of . The correlation coefficient of the two regressors is . This specification implies that the average value of and of the district l will converge in probability to and as goes to infinity.
The R statistical language is used in the simulation studies. For each model, there are 400 repetitions. The empirical mean, bias, empirical root mean square error (RMSE), and coverage probability of the confidence interval (CP) for the parameter estimators are reported in Table 1, Table 2 and Table 3, respectively.
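For reference, the four summary measures can be computed from the Monte Carlo output as in the Python sketch below (Python is used here for illustration, although the study itself was run in R). The function name `summarize`, the use of Wald-type intervals with a normal critical value of 1.96 (a 95% nominal level), and the toy inputs are assumptions, since the interval construction is not spelled out in the text.

```python
import numpy as np

def summarize(estimates, std_errors, true_value, z=1.96):
    """Empirical mean, bias, RMSE and coverage probability over Monte Carlo repetitions."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    mean = estimates.mean()
    bias = mean - true_value
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    lower = estimates - z * std_errors       # assumed Wald-type interval
    upper = estimates + z * std_errors
    cp = np.mean((lower <= true_value) & (true_value <= upper))
    return {"mean": mean, "bias": bias, "rmse": rmse, "cp": cp}

# Toy example with four repetitions of an estimator whose true value is 1.0.
summary = summarize([0.9, 1.1, 1.0, 1.4], [0.1, 0.1, 0.1, 0.1], true_value=1.0)
print(summary)
```

In a full study, `estimates` and `std_errors` would each hold 400 values per parameter, one per repetition.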
Table 1.
Empirical ML estimates of and CP in Model 1 with true value and .
Table 2.
Empirical ML estimates of and CP in Model 2 with true value and .
Table 3.
Empirical ML estimates of and CP in Model 3 with true value and .
We experimented with three different values of at 30, 50 and 80 and three different values of at 20, 40 and 60, respectively, resulting in n taking the values 600, 1000, 1200, 1600, 1800, 2000, 3000, 3200 and 4800. For a fixed , the biases and RMSEs of , and decrease as increases. On the other hand, for a fixed , the biases and RMSEs of , and decrease as increases. The estimators of , and in Model 1 and Model 2 have similar bias and RMSE, while the estimators of , and in Model 3 have the lowest bias and RMSE among the three models.
From the three tables above, we conclude that regressor vectors whose elements are correlated (see Model 3) make the parameter estimators have lower bias and RMSE than those in Model 1, and that this improvement is not due to correlation between the regressors, because the regressors in both Model 2 and Model 3 are correlated. The coverage probabilities (CP) of the confidence intervals become closer to the nominal level as n increases.
6. Concluding Remarks
In this paper, we proposed a heterogeneous spatial autoregressive model with all spatial coefficients taking m distinct true values, where m is independent of the sample size n, and set out the motivation for investigating the model. It is necessary and important to establish a corresponding estimation procedure and its properties for the new model. We established the asymptotic properties of the maximum likelihood estimator and the quasi-maximum likelihood estimator for the parameters in the new model, extending Lee’s work [] for the classic spatial autoregressive model.
In the spatial autoregressive model, the assumption of homoscedastic disturbances is rather unrealistic, particularly as the sample size n grows to infinity. How to relax the assumption of homoscedasticity to allow heteroscedasticity is an interesting and challenging problem, which we leave as a topic for future research.
Author Contributions
Conceptualization, J.H. and F.Q.; methodology, J.H. and H.D.; software, H.D. and F.Q.; validation, H.D. and F.Q.; project management, J.H. and H.D.; investigation, F.Q. and H.D.; Preparation of the original work draft, H.D. and F.Q.; revision, J.H. and F.Q.; visualization, F.Q.; supervision, funding acquisition, J.H., H.D. and F.Q. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China [Grants 11571219] and Shanghai Research Center for Data Science and Decision Technology. This work was also supported by Startup Foundation for Talents in Zhejiang Agriculture and Forestry University (2017FR044).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
Sincere thanks to everyone who suggested revisions and improved this article.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Proofs of Main Theoretical Results
Proof of Lemma 2.
Let A be a nonrandom matrix, then
for a random vector Y with mean and covariance matrix . With the second derivatives (24), (25), (A1) and routine calculations, we obtain
Therefore, Lemma 2 is proved. □
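The identity invoked at the start of this proof is presumably the standard expectation of a quadratic form, E(Y'AY) = tr(AΣ) + μ'Aμ, where μ and Σ denote the mean and covariance matrix of Y (these two symbols are assumed here, since the original display is not reproduced). The sketch below checks the identity exactly by enumerating a small discrete distribution; the matrices are arbitrary illustrative values.

```python
import itertools
import numpy as np

def exact_quadratic_form_mean(A, mu, L):
    """E[Y'AY] by enumeration, where Y = mu + L z and z has independent +/-1 entries."""
    k = L.shape[1]
    total = 0.0
    for signs in itertools.product([-1.0, 1.0], repeat=k):
        y = mu + L @ np.array(signs)
        total += y @ A @ y
    return total / 2 ** k  # all 2^k sign patterns are equally likely

# Arbitrary illustrative inputs; A is deliberately non-symmetric.
A = np.array([[1.0, 2.0, 0.0], [0.5, -1.0, 3.0], [0.0, 1.0, 2.0]])
mu = np.array([1.0, -2.0, 0.5])
L = np.array([[1.0, 0.0, 0.0], [0.3, 1.2, 0.0], [-0.4, 0.5, 0.8]])
Sigma = L @ L.T  # covariance of Y, because Cov(z) is the identity

lhs = exact_quadratic_form_mean(A, mu, L)
rhs = np.trace(A @ Sigma) + mu @ A @ mu
```

The identity needs only the first two moments of Y, which is why it applies to the quasi-likelihood setting as well as to the normal case.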
Proof of Lemma 3.
Using block matrices, we express as
With routine calculations, we have
where
with
Recall that, for i.i.d. ,
and
Then, we have
and
implying that
Moreover, we have
and
Finally, calculating yields the expression of . □
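The moment identities recalled in this proof for i.i.d. disturbances are presumably of the standard kind for quadratic forms. As an illustration under that assumption, for i.i.d. errors with mean zero, variance σ² and fourth moment μ₄, one has E(ε'Aε) = σ² tr(A) and Var(ε'Aε) = (μ₄ − 3σ⁴) Σᵢ aᵢᵢ² + σ⁴[tr(AA') + tr(A²)]. The sketch below verifies both by exact enumeration of a skewed three-point distribution; the distribution and the matrix are arbitrary choices, not taken from the paper.

```python
import itertools
import numpy as np

# A mean-zero, non-normal three-point distribution (illustrative choice).
values = np.array([-1.0, 0.0, 2.0])
probs = np.array([0.4, 0.4, 0.2])
sigma2 = np.sum(probs * values ** 2)  # variance of one error
mu4 = np.sum(probs * values ** 4)     # fourth moment of one error

def exact_mean_and_variance(A):
    """Exact mean and variance of e'Ae by enumerating every outcome of e."""
    n = A.shape[0]
    m1 = 0.0
    m2 = 0.0
    for idx in itertools.product(range(len(values)), repeat=n):
        idx = list(idx)
        e = values[idx]
        p = np.prod(probs[idx])
        q = e @ A @ e
        m1 += p * q
        m2 += p * q * q
    return m1, m2 - m1 ** 2

A = np.array([[1.0, 2.0, 0.0], [0.5, -1.0, 3.0], [0.0, 1.0, 2.0]])
exact_mean, exact_var = exact_mean_and_variance(A)

formula_mean = sigma2 * np.trace(A)
formula_var = (mu4 - 3 * sigma2 ** 2) * np.sum(np.diag(A) ** 2) \
    + sigma2 ** 2 * (np.trace(A @ A.T) + np.trace(A @ A))
```

Under normality μ₄ = 3σ⁴, so the first term of the variance vanishes; that term is the source of the extra component in the non-normal case.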
Proof of Theorem 1.
It follows from the estimation procedure stated in Section 2 that and are continuous functions of . Hence, we only need to show the consistency of . By Theorem 3.4 of White (1994), it suffices to show that (i)
uniformly for on , and (ii) the uniqueness identification condition that, for any ,
where is the complement of an open neighborhood of in of radius .
To prove part (i), it follows from (17) and (22) that
where is determined by (21) and from (16) can be written as
with
and
It can be shown that and uniformly on . Therefore,
uniformly on . Consequently,
proving part (i).
To prove part (ii), the identification uniqueness condition can be established as follows. Note that can be divided into two parts
where is the density function of a random vector following the multivariate normal , and is the expectation with respect to the probability density function . For the first part, , consider the following function
For , it follows from Jensen’s inequality that
implying that
For the second part, it is clear from (21) that
It follows
Under Condition 1, the strict inequality (A5) holds, implying that and then the identification uniqueness condition (ii) holds. When Condition 1 is violated, inequality (A5) may become an equality, for example, for all k. That holds only if the strict inequality (A4) holds. Since and are strictly convex, a sufficient condition for the strict inequality (A4) is
or
which is true under Condition 4. Therefore, and then the identification uniqueness condition (ii) also holds.
Combining parts (i) and (ii) completes the proof of consistency. □
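The Jensen step in part (ii) is an instance of the information inequality: the expected log ratio of the true density to any other density is nonnegative, and it is zero only when the two densities coincide. For two multivariate normal densities this quantity has a closed form, which the sketch below evaluates. The function name `gaussian_kl` and the numerical inputs are illustrative assumptions, not objects defined in the paper.

```python
import numpy as np

def gaussian_kl(mu0, S0, mu1, S1):
    """E_0[ln f0(Y) - ln f1(Y)] for f0 = N(mu0, S0) and f1 = N(mu1, S1)."""
    k = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    # slogdet returns (sign, log|det|); the covariance matrices are positive definite.
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - k + logdet1 - logdet0)

mu_true = np.zeros(2)
S_true = np.eye(2)

same = gaussian_kl(mu_true, S_true, mu_true, S_true)
different = gaussian_kl(mu_true, S_true, np.array([1.0, 0.0]), 2.0 * np.eye(2))
```

The value is zero at the true parameters and strictly positive elsewhere, which is the separation that the identification uniqueness condition requires of the limiting objective function.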
Proof of Theorem 2.
Under Assumptions A1–A7, taking the row elementary transformation
on the average Hessian matrix yields
Thus, under Assumptions A1–A7, the average Hessian matrix is nonsingular if and only if
is nonsingular. It can be decomposed into
where
and
with . Therefore, it follows from (A6) that is nonsingular if and only if either of and exists and is nonsingular, which is guaranteed by Condition 1 or 4. This completes the proof. □
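The row transformation used in this proof is a form of block elimination. As a generic illustration (the blocks below are arbitrary numbers, not the Hessian blocks of the paper, and the leading block is assumed invertible), eliminating the lower-left block leaves the Schur complement in the lower-right corner, and the full matrix is nonsingular exactly when that complement is.

```python
import numpy as np

# Arbitrary illustrative blocks; A must be invertible for this elimination.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0], [0.0]])
C = np.array([[0.0, 1.0]])
D = np.array([[4.0]])
M = np.block([[A, B], [C, D]])

A_inv = np.linalg.inv(A)
# Elementary block row operation: second block row minus C A^{-1} times the first.
E = np.block([[np.eye(2), np.zeros((2, 1))], [-C @ A_inv, np.eye(1)]])
reduced = E @ M

schur = D - C @ A_inv @ B
det_M = np.linalg.det(M)
det_product = np.linalg.det(A) * np.linalg.det(schur)
```

Because the row operation has determinant one, `det_M` equals `det_product`, so nonsingularity of `M` reduces to nonsingularity of `schur`.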
Proof of Theorem 3.
By the mean value theorem, the function at is expressed as
where is between and . First, we show that
Assumption A5 implies that are uniformly bounded in row and column sums uniformly in a neighborhood of . For , from (24), we have
where the mean value theorem is used, the vector with and is between and , due to and . As other terms of the second order derivatives in (3.8) can be easily analyzed, the result (A8) holds.
Second, based on the expressions (24)–(26) and the facts that from Assumptions A1–A7 and , we obtain
It follows from Theorem 2 that the average Hessian matrix is nonsingular for large enough n. Hence, in the neighborhood is nonsingular. It follows from (A7) that
Third, the components of are linear or quadratic functions of . With the existence of high-order moments of in Assumption A1, the central limit theorem for linear-quadratic forms of Kelejian and Prucha (2001) can be applied and
Finally, it follows from Slutsky’s theorem that
If is normally distributed, then , implying that converges in distribution to the multivariate normal . Therefore, the proof is complete. □
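The mean value expansion that opens this proof takes the usual form for extremum estimators. As a sketch only, write θ for the full parameter vector, θ̂ₙ for the estimator, θ₀ for the true value and ln Lₙ for the log-likelihood; this notation is assumed here and may differ from the symbols used in the main text. The intermediate point may differ from row to row of the Hessian.

```latex
0 = \frac{\partial \ln L_n(\hat\theta_n)}{\partial \theta}
  = \frac{\partial \ln L_n(\theta_0)}{\partial \theta}
  + \frac{\partial^2 \ln L_n(\tilde\theta_n)}{\partial \theta \, \partial \theta'}
    (\hat\theta_n - \theta_0),
\qquad
\sqrt{n}\,(\hat\theta_n - \theta_0)
  = -\left[ \frac{1}{n}
      \frac{\partial^2 \ln L_n(\tilde\theta_n)}{\partial \theta \, \partial \theta'}
    \right]^{-1}
    \frac{1}{\sqrt{n}} \frac{\partial \ln L_n(\theta_0)}{\partial \theta}.
```

The three steps of the proof then handle, in turn, the convergence of the bracketed Hessian, its nonsingular limit, and the central limit theorem for the normalized score.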
Proof of Theorem 4.
By the mean value theorem,
where is between and and with
and
We must show that (i) converges in probability to zero uniformly on .
Note that, by the law of large numbers for quadratic forms,
and, by Condition 7,
Thus, we obtain
uniformly on . has uniformly on . From (A3) in Theorem 1, uniformly on . On and are bounded (away from zero) in probability. Therefore, converges in probability to zero uniformly on , having proved (i).
Next, we need to show that (ii) the uniqueness identification condition that, for any ,
where is the complement of an open neighborhood of in of radius .
We adopt an argument similar to that in part (ii) of the proof of Theorem 1. Note that can be divided into two parts
where is the density function of a random vector following the multivariate normal and is the expectation with respect to the probability density function . For the first part, consider the following function
For , it follows from Jensen’s inequality that
implying that
For the second part, it is clear from (21) that
It follows
Under Condition 7, the strict inequality (A10) holds, implying that and then the identification uniqueness condition (ii) holds. When Condition 7 is violated, inequality (A10) may become an equality. That holds only if the strict inequality (A9) holds. Since and ( from Assumption A3) are strictly convex, a sufficient condition for the strict inequality (A9) is
or
which is true under Condition 8. Therefore, , and then the identification uniqueness condition (ii) also holds.
Combining parts (i) and (ii) proves the consistency. □
Proof of Theorem 5.
To derive the limiting (here, asymptotically normal) distribution of with -rate of convergence, we proceed in four steps.
First step: The first- and second-order derivatives of the concentrated log-likelihood are derived as
where . For the pure SAR process, and and the corresponding derivatives are similar with replaced by the identity I.
Second step: Under Condition 7 or 8, we have
and
When , we have and uniformly on . Then,
Under Assumption A6, uniformly on . By the Taylor expansion, we have
where the mean value theorem is used again, is any consistent estimator of and the vector with .
Define
Then,
Since
we find
and then
By Condition 7 or 8, the average Hessian matrix or information matrix under normality
is nonsingular.
Third step: Considering , we have
Let and . Our task is to find the mean and covariance of . With routine calculations, we obtain
and for any
with
where is the vector whose elements are diagonal elements of , and
Therefore, we obtain
(i) The mean of is , and
(ii) The covariance of is , where
The converges in distribution to , where .
Fourth step: It follows from Slutsky’s theorem that
converges in distribution to , which completes the proof of the desired result. □
Proof of Theorem 6.
To derive the limiting distribution of with -rate of convergence, it follows from (15) that
It is a linear combination of a random vector, , and higher-order infinitesimals. The limiting distribution (29) of follows. When , we have
To derive the limiting distribution of with -rate of convergence, it follows from (16) that
The detailed decomposition is as follows:
Therefore, we have proven the desired results. □
Appendix B. Counterexample of Inconsistent QML Estimators
Let us consider the following example
In this case, each individual can be influenced with exactly the same effect (absolute equality) by all neighbors and can influence all neighbors with exactly the same effect. The number of neighbors is . Here, , which violates Assumption A3. Take
which satisfy the homogeneous classification condition (5). For such spatial weight matrices, and , we have
Thus,
where
with and (assuming ),
and
with and (assuming ). It follows that
and
Note that we have not made a distinction between or in B’s in our discussion.
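To make the equal-influence structure concrete, the sketch below builds one weights matrix with these features: every unit has all other units as neighbors and assigns each the same weight. The specific form (a matrix of ones minus the identity, divided by n − 1) and the function name `equal_weight_matrix` are assumptions made for illustration; the example matrix of this appendix is not reproduced here and may differ, for instance by group structure.

```python
import numpy as np

def equal_weight_matrix(n):
    """Row-normalized weights where each unit weights all n - 1 others equally."""
    return (np.ones((n, n)) - np.eye(n)) / (n - 1)

n = 5
W = equal_weight_matrix(n)

neighbors_per_unit = np.count_nonzero(W, axis=1)  # how many units influence each unit
influenced_per_unit = np.count_nonzero(W, axis=0)  # how many units each unit influences
row_sums = W.sum(axis=1)
off_diagonal = W[~np.eye(n, dtype=bool)]
```

In this construction the number of neighbors grows at the same rate as n while each weight shrinks like 1/(n − 1), which is the kind of design under which the sparsity-type assumptions of the main text fail.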
Appendix B.1. The Case of Violating Condition 1
As includes two constant columns, , i.e., , we have
where . Note that
Thus, with , and , we obtain
and
implying that Condition 1 is violated. belongs to the column space of or is multicollinear with .
Furthermore, we obtain
where and , with
implying that Condition 4 is violated. Singularity of the information matrix occurs.
For simplicity, consider as known. The log-likelihood function (10) is
Given , the QML estimator of is
and the concentrated log-likelihood function of is
The first-order derivative of the likelihood function (A17) with respect to is
Thus,
The second-order derivative of (A17) is
It follows from the mean value theorem that
where is between and .
Assume that is consistent. If is consistent, then converges in probability to the true value , i.e., . It follows from (A13), (A14), (A16), (A18) and (A19) that
where , and Assumption A7 is used in dealing with the third term, and
It follows that
There is a contradiction, implying that is not consistent.
Appendix B.2. The Pure Heterogeneous SAR Process
For the pure heterogeneous SAR process (9), , implying that Condition 1 is violated. Note that Condition 4 is also violated (see Appendix B.1). The first-order derivative at reduces to
following from (A13) that
where and are two independent standard normal variables. Thus, from (A14), we find
where denotes convergence in distribution. The second-order derivative reduces to
following from (A15) that
Thus, with (A16), we obtain
It follows from (A20) that
which would be a contradiction as would not have a degenerate distribution at the origin point. This contradiction tells us that the estimator could not be a consistent estimator of .
References
- Bell, K.P.; Bockstael, N.E. Applying the generalized-moments estimation approach to spatial problems involving microlevel data. Rev. Econ. Stat. 2000, 82, 72–82.
- Banerjee, S.; Gelfand, A.E.; Knight, J.R.; Sirmans, C.F. Spatial modeling of house prices using normalized distance-weighted sums of stationary processes. J. Bus. Econ. Stat. 2004, 22, 206–213.
- Cliff, A.D.; Ord, J.K. Spatial Autocorrelation; Pion Ltd.: London, UK, 1973.
- Anselin, L. Spatial Econometrics: Methods and Models; Kluwer: Dordrecht, The Netherlands, 1988.
- Cressie, N. Statistics for Spatial Data; John Wiley: New York, NY, USA, 1993.
- Anselin, L.; Bera, A.K. Spatial dependence in linear regression models with an introduction to spatial econometrics. In Handbook of Applied Economic Statistics; Ullah, A., Giles, D.E.A., Eds.; Marcel Dekker: New York, NY, USA, 1998.
- Elhorst, J.P. Spatial Econometrics: From Cross Sectional Data to Spatial Panels; Springer: Heidelberg, Germany; New York, NY, USA; Dordrecht, The Netherlands; London, UK, 2014.
- Case, A.C. Spatial patterns in household demand. Econometrica 1991, 59, 953–965.
- Case, A.C.; Rosen, H.S.; Hines, J.R. Budget spillovers and fiscal policy interdependence: Evidence from the states. J. Public Econ. 1993, 52, 285–307.
- Besley, T.; Case, A. Incumbent behavior: Vote-seeking, tax-setting, and yard stick competition. Am. Econ. Rev. 1995, 85, 25–45.
- Brueckner, J.K. Testing for strategic interaction among local governments: The case of growth controls. J. Urban Econ. 1998, 44, 438–467.
- Bertrand, M.; Luttmer, E.F.R.; Mullainathan, S. Network effects and welfare cultures. Q. J. Econ. 2000, 115, 1019–1055.
- Topa, G. Social interactions, local spillovers and unemployment. Rev. Econ. Stud. 2001, 68, 261–295.
- Coval, J.D.; Moskowitz, T.J. The geography of investment: Informed trading and asset prices. J. Political Econ. 2001, 109, 811–841.
- Druska, V.; Horrace, W.C. Generalized moments estimation for spatial panel data: Indonesian rice farming. Am. J. Agric. Econ. 2004, 86, 185–198.
- Frazier, C.; Kockelman, K.M. Spatial econometric models for panel data: Incorporating spatial and temporal data. Transp. Res. Rec. J. Transp. Res. Board 2005, 1902, 80–90.
- Baltagi, B.; Li, D. Prediction in the panel data model with spatial correlation: The case of liquor. Spat. Econ. Anal. 2006, 1, 175–185.
- Pirinsky, C.; Wang, Q. Does corporate headquarters location matter for stock returns? J. Financ. 2006, 61, 1991–2015.
- Bekaert, G.; Hodrick, R.J.; Zhang, X.Y. International stock return comovements. J. Financ. 2009, 64, 2591–2626.
- Robinson, P.M.; Rossi, F. Improved Lagrange multiplier tests in spatial autoregressions. Econ. J. 2014, 17, 139–154.
- Liu, X.; Chen, J.; Cheng, S. A penalized quasi-maximum likelihood method for variable selection in the spatial autoregressive model. Spat. Stat. 2018, 25, 86–104.
- Ord, K. Estimation methods for models of spatial interaction. J. Am. Stat. Assoc. 1975, 70, 120–126.
- Smirnov, O.; Anselin, L. Fast maximum likelihood estimation of very large spatial autoregressive models: A characteristic polynomial approach. Comput. Stat. Data Anal. 2001, 35, 301–319.
- Robinson, P.M.; Rossi, F. Refinements in maximum likelihood inference on spatial autocorrelation in panel data. J. Econom. 2015, 189, 447–456.
- Kelejian, H.H.; Prucha, I.R. A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model. Int. Econ. Rev. 1999, 40, 509–533.
- Lee, L.F.; Liu, X. Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances. Econom. Theory 2010, 26, 187–230.
- Lee, L.F. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 2004, 72, 1899–1925.
- Yu, J.; De Jong, R.; Lee, L.F. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. J. Econom. 2008, 146, 118–134.
- Song, Y.; Liang, X.; Zhu, Y.; Lin, L. Robust variable selection with exponential squared loss for the spatial autoregressive model. Comput. Stat. Data Anal. 2021, 155, 107094.
- Ju, Y.; Yang, Y.; Hu, M.; Dai, L.; Wu, L. Bayesian Influence Analysis of the Skew-Normal Spatial Autoregression Models. Mathematics 2022, 10, 1306.
- Ahrens, A.; Bhattacharjee, A. Two-step Lasso estimation of the spatial weights matrix. Econometrics 2015, 3, 128–155.
- Lam, C.; Souza, P.C.L. Estimation and selection of spatial weight matrix in a spatial lag model. J. Bus. Econ. Stat. 2020, 38, 693–710.
- Clark, A.E.; Loheac, Y. “It was not me, it was them!” Social influence in risky behavior by adolescents. J. Health Econ. 2007, 26, 763–784.
- Mas, A.; Moretti, E. Peers at work. Am. Econ. Rev. 2009, 99, 112–145.
- Banerjee, A.; Chandrasekhar, A.; Duflo, E.; Jackson, M. The diffusion of microfinance. Science 2013, 341, 1236498.
- Dou, B.; Parrella, M.; Yao, Q. Generalized Yule-Walker estimation for spatio-temporal models with unknown diagonal coefficients. J. Econom. 2016, 194, 369–382.
- Peng, S. Heterogeneous endogenous effects in networks. arXiv 2019, arXiv:1908.00663.
- Magnus, J.R.; Neudecker, H. Matrix Differential Calculus with Applications in Statistics and Econometrics; John Wiley & Sons: Hoboken, NJ, USA, 1991.
- Horn, R.; Johnson, C. Matrix Analysis; Cambridge University Press: New York, NY, USA, 1985.
- White, H. Estimation, Inference and Specification Analysis; Cambridge University Press: New York, NY, USA, 1994.
- Kelejian, H.H.; Prucha, I.R. On the asymptotic distribution of the Moran I test statistic with applications. J. Econom. 2001, 104, 219–257.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).