
Asymptotic Distribution and Finite Sample Bias Correction of QML Estimators for Spatial Error Dependence Model

School of Economics, Singapore Management University, 90 Stamford Road, Singapore 178903, Singapore
* Author to whom correspondence should be addressed.
Econometrics 2015, 3(2), 376-411; https://doi.org/10.3390/econometrics3020376
Submission received: 3 March 2015 / Revised: 12 May 2015 / Accepted: 14 May 2015 / Published: 21 May 2015
(This article belongs to the Special Issue Spatial Econometrics)

Abstract:
In studying the asymptotic and finite sample properties of quasi-maximum likelihood (QML) estimators for spatial linear regression models, much attention has been paid to the spatial lag dependence (SLD) model; little has been given to its companion, the spatial error dependence (SED) model. In particular, the effect of spatial dependence on the convergence rate of the QML estimators has not been formally studied, and methods for correcting finite sample bias of the QML estimators have not been given. This paper fills in these gaps. Of the two, bias correction is particularly important for the applications of this model, as it potentially leads to much improved inferences for the regression coefficients. Contrary to common perceptions, both the large and small sample behaviors of the QML estimators for the SED model can be different from those for the SLD model in terms of the rate of convergence and the magnitude of bias. Monte Carlo results show that the bias can be severe, and the proposed bias correction procedure is very effective.

1. Introduction

With the fast globalization of economic activities and the concept of ‘neighbor’ ceasing to be merely the person next door, economists and econometricians alike have recognized the importance of modeling the spatial interaction of economic variables. As in time series where the concern is to alleviate the estimation problems caused by the lag in time, the analogous case in cross-sectional data gives rise to a lag in space.
The conventional way to incorporate spatial autocorrelation in a regression model is to add a spatial lag of the dependent variable or a spatial lag of the error variable into the model, giving rise to a regression model with spatial lag dependence (SLD) or a regression model with spatial error dependence (SED). See, among others, Cliff and Ord [1,2], Ord [3], Burridge [4], Cliff and Ord [5], Anselin [6,7], Anselin and Bera [8] and Anselin [9]. These two models have over the years become the building blocks for spatial econometric modeling, and many more general spatial econometric models have been developed based on them. See, e.g., Anselin [10], Das et al. [11], Kelejian and Prucha [12] and Lee and Liu [13] for more general spatial regression models, Pinkse [14] and Fleming [15] for spatial discrete choice models and Lee and Yu [16] for a survey on spatial panel data models.
Of the methods available for spatial model estimation, the maximum likelihood (ML) or quasi-ML (QML) method remains attractive due to its efficiency. As the fast increase in computing power has made the manipulation of large matrices routine, the initial reluctance to use QML estimation, as opposed to other easily implementable estimation methods, has been alleviated.1 There has thus been growing interest in the theory of QML estimation for spatial models, centered on two intriguing issues: the asymptotic distribution and the finite sample bias of the ML or QML estimators (MLEs or QMLEs). Of the two models, the SLD model has been extensively studied in terms of the asymptotic distributions of the MLEs or QMLEs (Lee [25]) and finite sample bias corrections on MLEs or QMLEs (Bao and Ullah [26]; Bao [27]; Yang [28]). A particularly interesting phenomenon revealed by Lee [25] for the SLD model is that the spatial dependence may slow down the rate of convergence of the QMLEs of certain model parameters, including the spatial parameter. An equally interesting phenomenon revealed by subsequent studies is that spatial dependence may cause the QMLEs to be biased, and more so with heavier spatial dependence (Baltagi and Yang [29,30]; Yang [28]; Liu and Yang [31]).
Surprisingly, these issues have not been addressed for the SED model. In particular, the effect of the degree of spatial dependence on the convergence rate of the QMLEs has not been formally studied, and methods for correcting finite sample bias of the QMLEs for the SED model have not been given.2 Built on the works of Lee [25] and Yang [28], this paper fills in these gaps. Of the two, bias correction is particularly important for the applications of this model, as it potentially leads to much improved inferences for the regression coefficients. Contrary to common perceptions, both the large and small sample behaviors of the QML estimators for the SED model can be different from those for the SLD model in terms of the rate of convergence and the magnitude of bias. In summary, the QMLE of the spatial parameter for the SED model always has a convergence rate slower than $\sqrt{n}$ whenever the degree of spatial dependence grows with the sample size $n$, whereas the QMLEs of the regression coefficients and the error variance always have the $\sqrt{n}$-rate of convergence whether or not the degree of spatial dependence increases with $n$. In contrast, the QMLEs of all of the parameters in the SLD model have the $\sqrt{n}$-rate of convergence when the spatially-generated regressor is not asymptotically multicollinear with the original regressors (Lee [25], Assumption 8), and a slower than $\sqrt{n}$-rate of convergence occurs for some parameters in the non-regular cases where the spatially-generated regressor is asymptotically multicollinear with the original regressors and the degree of spatial dependence grows with $n$. Monte Carlo results show that the proposed bias correction procedure works very well for the SED model without compromising the efficiency of the original QMLEs.
This paper is organized as follows. Section 2 presents results for consistency and asymptotic normality of the QMLEs for the SED model. Section 3 presents methods for finite sample bias correction. Section 4 extends the study to an alternative SED model where the spatial autoregressive (SAR) error is replaced by a spatial moving average (SMA) error; an undesirable feature of this alternative model specification is revealed. Section 5 presents Monte Carlo results, and Section 6 concludes the paper.

2. Asymptotic Properties of QMLEs for SED Model

In this section, we examine the asymptotic properties of the QMLEs of the linear regression model with spatial error dependence, giving particular attention to the effect of spatial dependence on the rate of convergence of the QMLEs. We show that the QMLEs of the regression coefficients and the error variance always have the conventional $\sqrt{n}$-rate of convergence, whereas the QMLE of the spatial parameter has the conventional $\sqrt{n}$-rate of convergence only if the degree of spatial dependence does not grow with the sample size; otherwise, it has a slower rate. With an adjustment on the normalization factor for the score component of the spatial parameter, we establish the joint asymptotic normality of the QMLEs of the model parameters. All proofs are given in Appendix A.

2.1. The Model and the QML Estimation

Consider the following linear regression model with spatial error dependence (SED), where the SED is specified as a spatial autoregressive (SAR) process:
$$Y_n = X_n\beta + u_n, \quad (1)$$
$$u_n = \rho W_n u_n + \epsilon_n, \quad (2)$$
where $Y_n$ is an $n \times 1$ vector of observations on the dependent variable corresponding to $n$ spatial units, $X_n$ is an $n \times k$ matrix containing the values of $k$ exogenous regressors, $W_n$ is an $n \times n$ spatial weights matrix that summarizes the interactions among the spatial units, $\epsilon_n$ is an $n \times 1$ vector of independent and identically distributed (i.i.d.) disturbances with mean zero and variance $\sigma^2$, $\rho$ is the spatial parameter and $\beta$ denotes the $k \times 1$ vector of regression coefficients.
Let $\theta = (\beta', \sigma^2, \rho)'$ be the vector of model parameters and $\theta_0$ be its true value. Denote $A_n(\rho) = I_n - \rho W_n$ and $A_n = A_n(\rho_0)$, where $I_n$ is an $n \times n$ identity matrix. If $A_n^{-1}$ exists, then Model (1) can be written as,
$$Y_n = X_n\beta_0 + A_n^{-1}\epsilon_n, \quad (3)$$
leading to $\mathrm{Var}(u_n) = \mathrm{Var}(A_n^{-1}\epsilon_n) = \sigma_0^2(A_n'A_n)^{-1}$.
The linear regression with the spatial lag dependence (SLD) model has the form $Y_n = \rho_0 W_n Y_n + X_n\beta_0 + \epsilon_n$, which can be rewritten as $Y_n = X_n\beta_0 + \rho_0 G_n X_n\beta_0 + A_n^{-1}\epsilon_n$, where $G_n = W_n A_n^{-1}$. While in both the SED and SLD models the spatial effects generate a non-spherical structure in the disturbance term, the SLD model has an extra spatially-generated regressor, $G_n X_n\beta_0$. This spatial regressor plays an important role in the identification and estimation of the spatial parameter in the SLD model in a maximum likelihood estimation framework (Lee [25]).
The first comprehensive treatment of maximum likelihood estimation for the SLD and SED models was given by Ord [3]. More formal results can be found in Anselin [6]. In particular, Anselin [6] pointed out that ML estimation of the SED model can be carried out as an application of the general framework of Magnus [34] for non-spherical errors. See Anselin [7] and Anselin and Bera [8] for a detailed survey on the SLD and SED models.
While the SLD and SED models have been fundamental and pivotal to the development of spatial econometric models and methods, an important issue, perhaps unique to spatial econometric models, is the effect of the degree of spatial dependence on the asymptotic properties of the QMLEs, in particular on the rate of convergence. This issue was not addressed until Lee [25], who clearly identified the situations where the rate of convergence can be affected when the spatial dependence increases with the number of observations. However, it has not been addressed in the context of SED models. Furthermore, as will be seen from the following sections, the degree of spatial dependence also has a profound impact on the finite sample performance of the spatial parameter estimates.
The quasi-Gaussian log-likelihood function for the SED model is given by,
$$\ell_n(\theta) = -\frac{n}{2}\log(2\pi\sigma^2) + \log|A_n(\rho)| - \frac{1}{2\sigma^2}(Y_n - X_n\beta)'A_n'(\rho)A_n(\rho)(Y_n - X_n\beta). \quad (4)$$
Maximizing $\ell_n(\theta)$ gives the MLE $\hat\theta_n$ of $\theta$ if the errors are indeed Gaussian, and the QMLE otherwise. Given $\rho$, the log-likelihood function $\ell_n(\theta)$ is partially maximized at,
$$\hat\beta_n(\rho) = [X_n'A_n'(\rho)A_n(\rho)X_n]^{-1}X_n'A_n'(\rho)A_n(\rho)Y_n, \quad \text{and} \quad (5)$$
$$\hat\sigma_n^2(\rho) = \frac{1}{n}Y_n'A_n'(\rho)M_n(\rho)A_n(\rho)Y_n, \quad (6)$$
where $M_n(\rho) = I_n - A_n(\rho)X_n[X_n'A_n'(\rho)A_n(\rho)X_n]^{-1}X_n'A_n'(\rho)$. The concentrated log-likelihood function for $\rho$ is obtained upon substituting the constrained QMLEs $\hat\beta_n(\rho)$ and $\hat\sigma_n^2(\rho)$ into Equation (4):
$$\ell_n^c(\rho) = -\frac{n}{2}[\log(2\pi) + 1] + \log|A_n(\rho)| - \frac{n}{2}\log(\hat\sigma_n^2(\rho)). \quad (7)$$
Maximizing $\ell_n^c(\rho)$ gives the unconstrained QMLE $\hat\rho_n$ of $\rho$, which, in turn, gives the unconstrained QMLEs of $\beta$ and $\sigma^2$ as $\hat\beta_n = \hat\beta_n(\hat\rho_n)$ and $\hat\sigma_n^2 = \hat\sigma_n^2(\hat\rho_n)$.
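As the estimation is fully determined by Equations (5)–(7), a numerical implementation only needs a one-dimensional search over $\rho$. The following is a minimal sketch, assuming numpy/scipy; the function name and interface are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def qmle_sed(Y, X, W, bounds=(-0.99, 0.99)):
    """QML estimation of the SED model via the concentrated log-likelihood (7)."""
    n = len(Y)
    I = np.eye(n)

    def neg_conc_loglik(rho):
        A = I - rho * W                                 # A_n(rho) = I_n - rho*W_n
        AX, AY = A @ X, A @ Y
        beta = np.linalg.solve(AX.T @ AX, AX.T @ AY)    # constrained QMLE (5)
        e = AY - AX @ beta
        sigma2 = (e @ e) / n                            # constrained QMLE (6)
        _, logdet = np.linalg.slogdet(A)                # log|A_n(rho)|
        return 0.5 * n * np.log(sigma2) - logdet        # negative of (7), constants dropped

    rho = minimize_scalar(neg_conc_loglik, bounds=bounds, method="bounded").x
    A = I - rho * W
    AX, AY = A @ X, A @ Y
    beta = np.linalg.solve(AX.T @ AX, AX.T @ AY)
    sigma2 = ((AY - AX @ beta) @ (AY - AX @ beta)) / n
    return beta, sigma2, rho
```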

2.2. Consistency and Asymptotic Normality

The asymptotic properties of the QMLEs of the SED model are built on the following basic regularity conditions:
Assumption 1: The true $\rho_0$ is in the interior of the compact parameter set $\mathcal{P}$.
Assumption 2: $\{\epsilon_{n,i}\}$ are i.i.d. with mean zero, variance $\sigma^2$ and $E|\epsilon_{n,i}|^{4+\delta} < \infty$ for some $\delta > 0$.
Assumption 3: $X_n$ has full column rank $k$; its elements are uniformly bounded constants, and $\lim_{n\to\infty}\frac{1}{n}X_n'A_n'(\rho)A_n(\rho)X_n$ exists and is non-singular for any $\rho$ in a neighborhood of $\rho_0$.
Assumption 4: The elements $\{w_{ij}\}$ of $W_n$ are at most of order $h_n^{-1}$, uniformly for all $i$ and $j$, where $h_n$ can be bounded or divergent but subject to $\lim_{n\to\infty} h_n/n = 0$; $W_n$ is uniformly bounded in both row and column sums, and its diagonal elements are zero.
Assumption 5: $A_n$ is non-singular, and $A_n^{-1}$ is uniformly bounded in both row and column sums. Further, $A_n^{-1}(\rho)$ is uniformly bounded in either row or column sums, uniformly in $\rho \in \mathcal{P}$.
We allow for the possibility that the degree of spatial dependence, quantified by $h_n$, grows with the sample size $n$, and the possibility that the error distribution is misspecified, i.e., the true error distribution is not normal. These conditions are similar to those used by Lee [25] to ascertain the $\sqrt{n/h_n}$-consistency of the QMLEs of the SLD model. All conditions except that on $h_n$ are very general regularity conditions considered widely in the literature. Assumption 1 states that the spatial parameter $\rho$ can only take values in a compact space, such that the Jacobian term of the likelihood function, $\log|A_n(\rho)|$, is well defined.3 The full rank condition of Assumption 3 is needed to guarantee that the model does not suffer from multicollinearity. Assumption 4 is based on Lee [25], where extensive discussions can be found. Assumption 5 allows us to write the model in the reduced form (3). The uniform boundedness conditions given in Assumptions 4 and 5 are needed to limit the spatial correlation to a manageable degree. Boundedness of the regressors is not restrictive when analyzing cross-sectional units, and in the case of stochastic regressors, it can be replaced by certain finite moment conditions.
Identification of the model parameters requires that the expected log-likelihood function, $\bar\ell_n(\theta) = E[\ell_n(\theta)]$, has identifiably unique maximizers that converge to $\theta_0$ as $n \to \infty$ (White [37], Theorem 3.4; Lee [25]). The expected log-likelihood function is,
$$\bar\ell_n(\theta) = -\frac{n}{2}\log(2\pi\sigma^2) + \log|A_n(\rho)| - \frac{1}{2\sigma^2}E\big[(Y_n - X_n\beta)'A_n'(\rho)A_n(\rho)(Y_n - X_n\beta)\big],$$
which, for a given $\rho$, is partially maximized at,
$$\beta_n(\rho) = (X_n'A_n'(\rho)A_n(\rho)X_n)^{-1}X_n'A_n'(\rho)A_n(\rho)E(Y_n) = \beta_0, \quad \text{and}$$
$$\sigma_n^2(\rho) = \frac{1}{n}E\big\{[Y_n - X_n\beta_n(\rho)]'A_n'(\rho)A_n(\rho)[Y_n - X_n\beta_n(\rho)]\big\} = \frac{1}{n}E\,\mathrm{tr}\big[\epsilon_n\epsilon_n'A_n'^{-1}A_n'(\rho)A_n(\rho)A_n^{-1}\big] = \frac{\sigma_0^2}{n}\mathrm{tr}\big[A_n'^{-1}A_n'(\rho)A_n(\rho)A_n^{-1}\big].$$
The resulting concentrated expected log-likelihood function, $\bar\ell_n^c(\rho)$, takes the form,
$$\bar\ell_n^c(\rho) = \max_{\beta,\sigma^2}\bar\ell_n(\theta) = -\frac{n}{2}[\log(2\pi) + 1] + \log|A_n(\rho)| - \frac{n}{2}\log(\sigma_n^2(\rho)).$$
From Assumption 3, it is clear that $\beta$ and $\sigma^2$ are identified once $\rho$ is. The latter is guaranteed if $\bar\ell_n^c(\rho)$ has an identifiably unique maximizer in $\mathcal{P}$ that converges to $\rho_0$ as $n \to \infty$, or $\lim_{n\to\infty}\frac{h_n}{n}[\bar\ell_n^c(\rho) - \bar\ell_n^c(\rho_0)] < 0$ for any $\rho \neq \rho_0$. The global identification condition for the SED model thus simplifies to a condition on $\rho$ alone.
Assumption 6: $\lim_{n\to\infty}\frac{h_n}{n}\big[\log|\sigma_0^2A_n^{-1}A_n'^{-1}| - \log|\sigma_n^2(\rho)A_n^{-1}(\rho)A_n'^{-1}(\rho)|\big] \neq 0$, for any $\rho \neq \rho_0$.
This differentiates the SED model from the SLD model in the asymptotic behaviors of the QMLEs. The spatially-generated regressor $G_nX_n\beta_0$ of the SLD model $Y_n = X_n\beta_0 + \rho_0G_nX_n\beta_0 + A_n^{-1}\epsilon_n$ can help with identifying $\rho$ if it is not asymptotically multicollinear with the original regressors, giving the conventional $\sqrt{n}$-rate of convergence of $\hat\rho_n$ irrespective of whether $h_n$ is bounded or unbounded. When $G_nX_n\beta_0$ is asymptotically collinear with $X_n$, the convergence rate of $\hat\rho_n$ becomes $\sqrt{n/h_n}$. In contrast, $\hat\rho_n$ for the SED model always has a $\sqrt{n/h_n}$-rate of convergence. Note that the variance of $Y_n$ of Equation (1) is $\sigma_0^2A_n^{-1}A_n'^{-1}$, and hence, the global identification condition given above ensures the uniqueness of the variance matrix. With this global identification condition and the uniform convergence of $\frac{h_n}{n}[\ell_n^c(\rho) - \bar\ell_n^c(\rho)]$ to zero in $\mathcal{P}$, which is proven in the Appendix, the consistency of $\hat\rho_n$ follows.
Theorem 1: Under Assumptions 1–6, the QMLE $\hat\rho_n$ is a consistent estimator of $\rho_0$.
Theorem 1 and Assumption 3 lead immediately to the consistency of $\hat\beta_n$ and $\hat\sigma_n^2$. However, Theorem 1 reveals nothing about the rate of convergence of $\hat\rho_n$, and hence, the rates of convergence of $\hat\beta_n$ and $\hat\sigma_n^2$ remain unknown, as well. To reveal the exact convergence rates and, at the same time, to derive the asymptotic distributions of the QMLEs, consider the score function,
$$S_n(\theta) \equiv \frac{\partial\ell_n(\theta)}{\partial\theta} = \begin{pmatrix} \frac{1}{\sigma^2}X_n'A_n'(\rho)A_n(\rho)u_n(\beta) \\ \frac{1}{2\sigma^4}u_n'(\beta)A_n'(\rho)A_n(\rho)u_n(\beta) - \frac{n}{2\sigma^2} \\ \frac{1}{\sigma^2}u_n'(\beta)A_n'(\rho)W_nu_n(\beta) - \mathrm{tr}[G_n(\rho)] \end{pmatrix},$$
where $u_n(\beta) = Y_n - X_n\beta$ and $G_n(\rho) = W_nA_n^{-1}(\rho)$. It is known that for likelihood-based inferences, the normalized score $\frac{1}{\sqrt{n}}S_n(\theta_0)$ at the true parameter value would be asymptotically normal. Indeed, under Assumptions 1–5, one can easily show that this is true for the $\beta$ and $\sigma^2$ components of $\frac{1}{\sqrt{n}}S_n(\theta_0)$. However, the normalized score for $\rho$ is $O_p(1/\sqrt{h_n})$; see Lemmas A.2 and A.3 in the Appendix. This means that when $h_n$ is divergent, the likelihood function with respect to $\rho$ is too flat, so that its normalized score converges to a degenerate distribution. As a result, $\hat\rho_n$ converges to $\rho_0$ at a rate slower than the conventional $\sqrt{n}$-rate. A similar phenomenon is observed by Lee [25] for the spatial parameter, as well as the regression coefficients, in the SLD model in the 'non-regular cases' where the spatially-generated regressor $G_nX_n\beta_0$ is asymptotically collinear with the regular regressors. This motivates us to consider the following modification.
To account for the effect of spatial dependence on the asymptotic behavior of the QMLE $\hat\rho_n$ of the spatial parameter $\rho$ and to jointly study the asymptotic distribution of the QMLE $\hat\theta_n$ of the model parameter vector $\theta$, we consider the following modified score vector:
$$S_n^*(\theta) = K_nS_n(\theta),$$
where $K_n = \mathrm{diag}(I_k, 1, \sqrt{h_n})$. Hence, $\frac{1}{\sqrt{n}}S_n^*(\theta)$ has a proper asymptotic behavior whether $h_n$ is divergent or bounded. Under Assumptions 1–5, the central limit theorem (CLT) for linear-quadratic forms of Kelejian and Prucha [38] can be applied to prove the result,
$$\frac{1}{\sqrt{n}}S_n^*(\theta_0) \xrightarrow{D} N(0, \Gamma^*),$$
where $\Gamma^* = \lim_{n\to\infty}\frac{1}{n}\Gamma_n^*$, $\Gamma_n^* = \mathrm{Var}[S_n^*(\theta_0)] = K_n\Gamma_nK_n$, $\Gamma_n = \mathrm{Var}[S_n(\theta_0)]$, and:
$$\Gamma_n = \begin{pmatrix} \frac{1}{\sigma_0^2}X_n'A_n'A_nX_n & \frac{1}{2\sigma_0^3}\gamma X_n'A_n'\iota_n & \frac{1}{\sigma_0}\gamma X_n'A_n'g_n \\ \frac{1}{2\sigma_0^3}\gamma\,\iota_n'A_nX_n & \frac{n}{4\sigma_0^4}(\kappa + 2) & \frac{1}{2\sigma_0^2}(\kappa + 2)\mathrm{tr}(G_n) \\ \frac{1}{\sigma_0}\gamma\,g_n'A_nX_n & \frac{1}{2\sigma_0^2}(\kappa + 2)\mathrm{tr}(G_n) & \kappa g_n'g_n + \mathrm{tr}(G_n^sG_n) \end{pmatrix},$$
where $\iota_n$ is an $n \times 1$ vector of ones, $\gamma = \sigma_0^{-3}E(\epsilon_{n,i}^3)$ is the measure of skewness, $\kappa = \sigma_0^{-4}E(\epsilon_{n,i}^4) - 3$ is the measure of excess kurtosis, $g_n = \mathrm{diag}(G_n)$ (the vector of diagonal elements of $G_n$), $G_n = G_n(\rho_0)$ and $G_n^s = G_n + G_n'$.
It is easy to see that the information matrix, $\Sigma_n = -E\frac{\partial^2}{\partial\theta\partial\theta'}\ell_n(\theta_0)$, takes the form:
$$\Sigma_n = \begin{pmatrix} \frac{1}{\sigma_0^2}X_n'A_n'A_nX_n & 0 & 0 \\ 0 & \frac{n}{2\sigma_0^4} & \frac{1}{\sigma_0^2}\mathrm{tr}(G_n) \\ 0 & \frac{1}{\sigma_0^2}\mathrm{tr}(G_n) & \mathrm{tr}(G_n^sG_n) \end{pmatrix},$$
which leads to the modified version of the information matrix, $\Sigma_n^* = K_n\Sigma_nK_n$. One can show that $\Gamma^*$ exists with non-zero diagonal elements and that $\Sigma^* = \lim_{n\to\infty}\frac{1}{n}\Sigma_n^*$ exists and is positive-definite, irrespective of whether $h_n$ is bounded or unbounded. In contrast,
$$\lim_{n\to\infty}\frac{1}{n}\Gamma_n = \begin{pmatrix} \frac{1}{\sigma_0^2}V_1 & \frac{\gamma}{2\sigma_0^3}V_2 & 0 \\ \frac{\gamma}{2\sigma_0^3}V_2' & \frac{1}{4\sigma_0^4}(\kappa + 2) & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad \lim_{n\to\infty}\frac{1}{n}\Sigma_n = \begin{pmatrix} \frac{1}{\sigma_0^2}V_1 & 0 & 0 \\ 0 & \frac{1}{2\sigma_0^4} & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
if $h_n$ is unbounded, where $V_1 = \lim_{n\to\infty}\frac{1}{n}X_n'A_n'A_nX_n$ and $V_2 = \lim_{n\to\infty}\frac{1}{n}X_n'A_n'\iota_n$. Hence, without the adjustment factor $K_n$, we cannot derive the asymptotic normality results, due to the singularity of the matrices required to compute the asymptotic variance-covariance matrix.
To see that $\Sigma^*$ is non-singular under a general $h_n$, consider the determinant of $\Sigma_n^*$: $|\Sigma_n^*| = \frac{1}{2\sigma_0^6}\left|\frac{1}{n}X_n'A_n'A_nX_n\right|\frac{h_n}{n}\left[\mathrm{tr}(G_n^sG_n) - \frac{2}{n}\mathrm{tr}^2(G_n)\right]$. If $h_n$ is bounded, then by Assumptions 3, 4 and 5, $|\Sigma_n^*| = O(1)$. Now, suppose $h_n$ is unbounded with $\lim_{n\to\infty}h_n = \infty$ such that $h_n/n \to 0$; then $g_{n,ii}$, $\frac{1}{n}\mathrm{tr}(G_n'G_n)$, $\frac{1}{n}\mathrm{tr}(G_n^2)$ and $\frac{1}{n}\mathrm{tr}(G_n)$ are all $O(h_n^{-1})$, and hence, by Assumption 3, $|\Sigma_n^*| = O(1)$. We have the following theorem for the asymptotic normality of the QMLE $\hat\theta_n$ of $\theta_0$.
Theorem 2: Under Assumptions 1–6, we have,
$$\sqrt{n}K_n^{-1}(\hat\theta_n - \theta_0) \xrightarrow{D} N(0, \Sigma^{*-1}\Gamma^*\Sigma^{*-1}),$$
where $\Gamma^* = \lim_{n\to\infty}\frac{1}{n}\Gamma_n^*$ and $\Sigma^* = \lim_{n\to\infty}\frac{1}{n}\Sigma_n^*$. If the errors $\{\epsilon_{n,i}\}$ are normally distributed, then $\sqrt{n}K_n^{-1}(\hat\theta_n - \theta_0) \xrightarrow{D} N(0, \Sigma^{*-1})$.
Remark 1: For practical applications of the above result, it is important to note that $h_n$, the quantity characterizing the degree of spatial dependence and affecting the rate of convergence of the QMLEs, is not known in general. However, inference concerning the model parameters does not depend on it, because $\Sigma_n^{*-1}\Gamma_n^*\Sigma_n^{*-1} = (K_n\Sigma_nK_n)^{-1}(K_n\Gamma_nK_n)(K_n\Sigma_nK_n)^{-1} = K_n^{-1}\Sigma_n^{-1}\Gamma_n\Sigma_n^{-1}K_n^{-1}$. Hence, $\mathrm{AVar}(\hat\theta_n - \theta_0) = n^{-1}\Sigma_n^{-1}\Gamma_n\Sigma_n^{-1}$.
For the purpose of statistical inference, it might be useful to have the marginal asymptotic distributions of the QMLEs, in particular the marginal asymptotic distribution of $\hat\rho_n$.
Corollary 1: Under the assumptions of Theorem 2, we have,
$$\sqrt{n}(\hat\beta_n - \beta_0) \xrightarrow{D} N(0, \sigma_0^2V_1^{-1}), \quad \sqrt{n}(\hat\sigma_n^2 - \sigma_0^2) \xrightarrow{D} N\big(0, 2\sigma_0^4T_1 + \kappa\sigma_0^4(T_1 - 2T_2^2T_3)\big), \quad \sqrt{n/h_n}(\hat\rho_n - \rho_0) \xrightarrow{D} N(0, T_4 + \kappa T_5),$$
where $T_1 = \lim_{n\to\infty}\frac{\mathrm{tr}(G_n^sG_n)}{\mathrm{tr}(C_n^sC_n)}$, $T_2 = \lim_{n\to\infty}\frac{\mathrm{tr}(G_n)}{\mathrm{tr}(C_n^sC_n)}$, $T_3 = \lim_{n\to\infty}\frac{1}{n}[\mathrm{tr}(G_n^sG_n) - 2g_n'g_n]$, $T_4 = \lim_{n\to\infty}\frac{n}{h_n}\mathrm{tr}^{-1}(C_n^sC_n)$, $T_5 = \lim_{n\to\infty}\frac{n}{h_n}\frac{g_n'g_n - n^{-1}\mathrm{tr}^2(G_n)}{\mathrm{tr}^2(C_n^sC_n)}$, $C_n = G_n - \frac{\mathrm{tr}(G_n)}{n}I_n$ and $C_n^s = C_n + C_n'$.
Corollary 1 clearly reveals that only the QMLE of the spatial parameter has a slower rate of convergence, $\sqrt{n/h_n}$, when $h_n$ is unbounded; the effect of a growing spatial dependence is thus that the effective sample size for estimating $\rho$ is reduced to $n/h_n$. The estimators $\hat\beta_n$ and $\hat\sigma_n^2$ have the traditional $\sqrt{n}$-rate of convergence whether $h_n$ is bounded or unbounded. Intuitively, this is correct, since unlike in the SLD model, where there is a spatially-lagged dependent variable $W_nY_n$, in the SED model the spatial structure affects only the errors; hypothetically, if $\rho$ were known, the model in Equation (1) could be reduced to a linear regression model.
We note that due to the block-diagonal structure of $\Sigma_n$ and the fact that the skewness measure $\gamma$ appears only in the off-diagonal blocks of $\Gamma_n$, the marginal asymptotic distributions do not depend on $\gamma$. For general asymptotic inferences, $\gamma$ and $\kappa$ can be consistently estimated by $\hat\gamma_n = \frac{1}{n\hat\sigma_n^3}\sum_{i=1}^n\hat\epsilon_{n,i}^3$ and $\hat\kappa_n = \frac{1}{n\hat\sigma_n^4}\sum_{i=1}^n\hat\epsilon_{n,i}^4 - 3$, respectively, where the $\hat\epsilon_{n,i}$ are the QML residuals. The estimates of $\Sigma_n$ and $\Gamma_n$ are then obtained by plugging $\hat\theta_n$, $\hat\gamma_n$ and $\hat\kappa_n$ into $\Sigma_n$ and $\Gamma_n$. These discussions show that the asymptotic inferences for the SED model based on QML estimation are extremely simple. However, an important question remains: how do they perform in finite samples? Take a simple and very important special case, where the inference concerns the regression coefficients $\beta$. While the bias of $\hat\rho_n$ does not have much impact on the bias of $\hat\beta_n$ and $\hat\sigma_n^2$, it does translate into bias of the variance estimator of $\hat\beta_n$ through the term $\hat V_n = \frac{1}{n}X_n'A_n'(\hat\rho_n)A_n(\hat\rho_n)X_n$ (see the end of Section 4). This shows the importance of bias correction for the SED model, and perhaps for more general models with non-spherical errors.
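To illustrate the plug-in inference just described, the following sketch assembles $\Gamma_n$ and $\Sigma_n$ at the QML estimates and forms the sandwich variance of Remark 1. It assumes numpy and the illustrative qmle_sed() sketch above; the helper name and interface are our own, not from the paper.

```python
import numpy as np

def sandwich_avar(Y, X, W, beta, sigma2, rho):
    """Plug-in estimate of Sigma_n^{-1} Gamma_n Sigma_n^{-1} (see Remark 1)."""
    n, k = X.shape
    A = np.eye(n) - rho * W
    G = W @ np.linalg.inv(A)                      # G_n = W_n A_n^{-1}
    g = np.diag(G)
    s = np.sqrt(sigma2)
    eps = A @ (Y - X @ beta)                      # QML residuals
    gamma = np.mean(eps**3) / s**3                # skewness estimate
    kappa = np.mean(eps**4) / s**4 - 3.0          # excess kurtosis estimate

    AX = A @ X
    iota = np.ones(n)
    trG = np.trace(G)
    trGsG = np.trace((G + G.T) @ G)

    Gam = np.zeros((k + 2, k + 2))
    Gam[:k, :k] = AX.T @ AX / sigma2
    Gam[:k, k] = Gam[k, :k] = gamma * (AX.T @ iota) / (2 * s**3)
    Gam[:k, k + 1] = Gam[k + 1, :k] = gamma * (AX.T @ g) / s
    Gam[k, k] = n * (kappa + 2) / (4 * sigma2**2)
    Gam[k, k + 1] = Gam[k + 1, k] = (kappa + 2) * trG / (2 * sigma2)
    Gam[k + 1, k + 1] = kappa * g @ g + trGsG

    Sig = np.zeros((k + 2, k + 2))
    Sig[:k, :k] = AX.T @ AX / sigma2
    Sig[k, k] = n / (2 * sigma2**2)
    Sig[k, k + 1] = Sig[k + 1, k] = trG / sigma2
    Sig[k + 1, k + 1] = trGsG

    Si = np.linalg.inv(Sig)
    return Si @ Gam @ Si
```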

3. Finite Sample Bias Correction for the QML Estimators

With the formal asymptotic results given in the earlier section, we are ready to study the more important issue: the finite sample properties of the QMLEs of the SED model. The problem of estimation bias, arising from the estimation of non-linear parameters, has been widely recognized by econometricians (see, among others, Kiviet [39]; Hahn and Kuersteiner [40]; Hahn and Newey [41]; Bun and Carree [42]). Spatial econometricians too have recognized this issue in estimating spatial econometric models and have successfully tackled this problem for the SLD model (Bao and Ullah [26]; Bao [27]; Yang [28]). However, no work has been done for the SED model and other spatial models. In a spatial regression context, spatial parameters enter the regression model in a highly non-linear manner, and spatial dependence may be quite strong. As a result, the bias problem in estimating spatial parameter(s) may be quite severe, and hence, it is very important to perform bias corrections on spatial estimators. Among the various methods for bias corrections, the stochastic expansion method of Rilstone et al. [43] has recently gained more attention. With the introduction of the bootstrap method by Yang [28], its applicability has been greatly expanded (see Efron [44] for a general introduction to the bootstrap method).
In this section, we derive the second- and third-order biases of the QMLE of the spatial parameter in the SED model, based on the techniques of stochastic expansion (Rilstone et al. [43]) and the bootstrap (Yang [28]). As in Yang [28], the key quantities involved in the terms related to the bias of a non-linear estimator are the derivatives of the concentrated log-likelihood function and their expectations. While deriving the analytical expressions of the higher-order derivatives may only be a matter of tedious algebraic manipulation, evaluating their expectations can be very difficult, if not impossible. We follow the general method introduced in Yang [28] and propose a bootstrap procedure for implementing these bias corrections for the SED model. The validity of this procedure when applied to the SED model is established. Monte Carlo results show an excellent performance of the proposed bias correction procedure. We argue that once the spatial estimator is bias-corrected, the estimators of the other model parameters become nearly unbiased. All proofs are given in Appendix B.

3.1. The General Method for Bias Correction

In studying the finite sample properties of a parameter estimator, say $\hat\theta_n$, defined as $\hat\theta_n = \arg\{\psi_n(\theta) = 0\}$ for an estimating function $\psi_n(\theta)$ based on a sample of size $n$, Rilstone et al. [43] and Bao and Ullah [26] developed a stochastic expansion from which a bias correction on $\hat\theta_n$ can be made. The vector of parameters $\theta$ may contain a set of linear and scale parameters, say $\delta$, and a non-linear parameter, say $\rho$, in the sense that, given $\rho$, the constrained estimator $\hat\delta_n(\rho)$ of the vector $\delta$ possesses an explicit expression, while the estimation of $\rho$ has to be done through numerical optimization. In this case, Yang [28] argued that it is more effective to work with the concentrated estimating function (CEF), $\tilde\psi_n(\rho) = \psi_n(\hat\delta_n(\rho), \rho)$, to perform a stochastic expansion on this CEF and, hence, to do the bias correction only on the non-linear estimator defined by,
$$\hat\rho_n = \arg\{\tilde\psi_n(\rho) = 0\}.$$
In doing so, a multi-dimensional problem is reduced to a single-dimensional problem, and the additional variability from the estimation of the ‘nuisance’ parameters δ is taken into account in bias correcting the estimate of the non-linear parameter ρ.
Let $H_{rn}(\rho) = \frac{d^r}{d\rho^r}\tilde\psi_n(\rho)$, $r = 1, 2, 3$. Under some general smoothness conditions on $\tilde\psi_n(\rho)$, Yang [28] presented a third-order, CEF-based stochastic expansion for $\hat\rho_n$ at the true parameter value $\rho_0$ as,
$$\hat\rho_n - \rho_0 = a_{-1/2} + a_{-1} + a_{-3/2} + O_p(n^{-2}), \quad (14)$$
where $a_{-s/2}$ represents terms of order $O_p(n^{-s/2})$ for $s = 1, 2, 3$, and they are,
$$a_{-1/2} = \Omega_n\tilde\psi_n, \quad a_{-1} = \Omega_nH_{1n}^\circ a_{-1/2} + \tfrac{1}{2}\Omega_nE(H_{2n})(a_{-1/2}^2), \quad \text{and}$$
$$a_{-3/2} = \Omega_nH_{1n}^\circ a_{-1} + \tfrac{1}{2}\Omega_nH_{2n}^\circ(a_{-1/2}^2) + \Omega_nE(H_{2n})(a_{-1/2}a_{-1}) + \tfrac{1}{6}\Omega_nE(H_{3n})(a_{-1/2}^3),$$
where $\tilde\psi_n \equiv \tilde\psi_n(\rho_0)$, $H_{rn} \equiv H_{rn}(\rho_0)$, $H_{rn}^\circ = H_{rn} - E(H_{rn})$, $r = 1, 2, 3$, and $\Omega_n = -[E(H_{1n})]^{-1}$.
The above stochastic expansion leads to a second-order bias, $E(a_{-1/2} + a_{-1})$, and a third-order bias, $E(a_{-1/2} + a_{-1} + a_{-3/2})$, which may be used for performing bias corrections on $\hat\rho_n$, provided that analytical expressions for the various expected quantities in the expansion can be derived, so that they can be estimated through a plug-in method. Several applications of this plug-in method have appeared in the literature, including Bao and Ullah [26] for the pure spatial autoregressive process and Bao [27] for the SLD model. The plug-in method may run into difficulty when the analytical expectations are not available or are difficult or impossible to derive, as in the SED model that we consider. To overcome this obstacle, Yang [28] proposed a simple and yet very effective bootstrap method to estimate the relevant expected values.

3.2. Bias of the QMLE of the Spatial Parameter of the SED Model

Recall the concentrated log-likelihood function defined in Equation (7). Define the concentrated score function, or the CEF for $\rho$, as $\tilde\psi_n(\rho) = \frac{\partial}{\partial\rho}\frac{h_n}{n}\ell_n^c(\rho)$; then,
$$\tilde\psi_n(\rho) = -h_nT_{0n}(\rho) + h_nR_{1n}(\rho), \quad (15)$$
where $T_{0n}(\rho) = \frac{1}{n}\mathrm{tr}(G_n(\rho))$ and:
$$R_{1n}(\rho) = \frac{Y_n'A_n'(\rho)M_n(\rho)G_n(\rho)M_n(\rho)A_n(\rho)Y_n}{Y_n'A_n'(\rho)M_n(\rho)A_n(\rho)Y_n}, \quad (16)$$
leading to $\hat\rho_n = \arg\{\tilde\psi_n(\rho) = 0\}$. Let $H_{rn}(\rho) = \frac{d^r}{d\rho^r}\tilde\psi_n(\rho)$, $r = 1, 2, 3$; then,
$$h_n^{-1}H_{1n}(\rho) = -T_{1n}(\rho) - R_{2n}(\rho) + 2R_{1n}^2(\rho), \quad (17)$$
$$h_n^{-1}H_{2n}(\rho) = -2T_{2n}(\rho) - R_{3n}(\rho) - 6R_{1n}(\rho)R_{2n}(\rho) + 8R_{1n}^3(\rho), \quad (18)$$
$$h_n^{-1}H_{3n}(\rho) = -6T_{3n}(\rho) - R_{4n}(\rho) - 8R_{1n}(\rho)R_{3n}(\rho) + 6R_{2n}^2(\rho) - 48R_{1n}^2(\rho)R_{2n}(\rho) + 48R_{1n}^4(\rho), \quad (19)$$
where $T_{rn}(\rho) = \frac{1}{n}\mathrm{tr}(G_n^{r+1}(\rho))$, $r = 1, 2, 3$, and:
$$R_{jn}(\rho) = \frac{Y_n'A_n'(\rho)M_n(\rho)D_{jn}(\rho)M_n(\rho)A_n(\rho)Y_n}{Y_n'A_n'(\rho)M_n(\rho)A_n(\rho)Y_n}, \quad j = 2, 3, 4. \quad (20)$$
The full expressions for $D_{jn}(\rho)$, $j = 2, 3, 4$, are given in Appendix B. Clearly, $D_{1n}(\rho) = G_n(\rho)$ in $R_{1n}(\rho)$.
The above expressions show that the key quantities in the third-order stochastic expansion for $\hat\rho_n$ (the QMLE of the spatial parameter in the SED model) are the ratios of quadratic forms $R_{jn}(\rho)$, $j = 1, \ldots, 4$. Note that, in what follows, a function of $\rho$ evaluated at $\rho = \rho_0$ is denoted by dropping the function argument, e.g., $\tilde\psi_n = \tilde\psi_n(\rho_0)$, $A_n = A_n(\rho_0)$, $G_n = G_n(\rho_0)$, $R_{jn} = R_{jn}(\rho_0)$, $H_{rn} = H_{rn}(\rho_0)$, $T_{rn} = T_{rn}(\rho_0)$. Now, some case-specific conditions on the $R_{jn}$ are needed to regulate the limiting behavior of the $H_{rn}$, so that the required quantities have finite limits in expectation.
Assumption 7: $E\left[\frac{h_n}{n}\epsilon_n'M_nG_nM_n\epsilon_n\left(\frac{1}{\bar\sigma_n^4} - \frac{1}{\sigma_0^4}\right)(\hat\sigma_n^2 - \sigma_0^2)\right] = O\big((\tfrac{h_n}{n})^{\frac{1}{2}}\big)$, where $\bar\sigma_n^2$ lies between $\sigma_0^2$ and $\hat\sigma_n^2$.
Assumption 8:
(i) $h_n^sE[(R_{1n} - ER_{1n})^s] = O\big((\tfrac{h_n}{n})^{\frac{1}{2}}\big)$, $s = 2, 3, 4$;
(ii) $h_n^sE[(R_{2n} - ER_{2n})^s] = O\big((\tfrac{h_n}{n})^{\frac{1}{2}}\big)$, $s = 1, 2$;
(iii) $h_nE(R_{rn} - ER_{rn}) = O\big((\tfrac{h_n}{n})^{\frac{1}{2}}\big)$, $r = 3, 4$;
(iv) $h_n^{s+1}E[(R_{1n} - ER_{1n})^s(R_{2n} - ER_{2n})] = O\big((\tfrac{h_n}{n})^{\frac{1}{2}}\big)$, $s = 1, 2$; and
(v) $h_n^2E[(R_{1n} - ER_{1n})(R_{3n} - ER_{3n})] = O\big((\tfrac{h_n}{n})^{\frac{1}{2}}\big)$.
The following lemma shows the bounded behavior of the expectations of the quantities in the stochastic expansion.
Lemma 1: Under Assumptions 1–7: (i) $h_nR_{in} = O_p(1)$; (ii) $E(h_nR_{in}) = O(1)$; and (iii) $h_nR_{in} = E(h_nR_{in}) + O_p\big((\tfrac{h_n}{n})^{\frac{1}{2}}\big)$, $i = 1, \ldots, 4$.
Given Lemma 1 and the regularity conditions, we can prove the following propositions:
Proposition 1: Suppose the SED model specified by Equations (1) and (2) satisfies Assumptions 1–8. Then, the third-order stochastic expansion given in Equation (14) holds for the QMLE $\hat\rho_n$ of the spatial parameter in the model, with $n$ replaced by $n/h_n$ for the stochastic order:
$$\hat\rho_n - \rho_0 = c_{1n}'\zeta_n + c_{2n}'\zeta_n + c_{3n}'\zeta_n + O_p\big((\tfrac{h_n}{n})^2\big), \quad (21)$$
where $c_{sn}'\zeta_n$ is of stochastic order $O_p\big((\tfrac{h_n}{n})^{\frac{s}{2}}\big)$, $s = 1, 2, 3$, with,
$$\zeta_n = \{\tilde\psi_n,\; H_{1n}\tilde\psi_n,\; \tilde\psi_n^2,\; H_{1n}^2\tilde\psi_n,\; H_{2n}\tilde\psi_n^2,\; H_{1n}\tilde\psi_n^2,\; \tilde\psi_n^3\}',$$
$$c_{1n} = \{\Omega_n,\; 0_{6\times1}'\}', \quad \Omega_n = -[E(H_{1n})]^{-1}, \quad c_{2n} = \{\Omega_n,\; \Omega_n^2,\; \tfrac{1}{2}\Omega_n^3E(H_{2n}),\; 0_{4\times1}'\}', \quad \text{and}$$
$$c_{3n} = \{\Omega_n,\; 2\Omega_n^2,\; \Omega_n^3E(H_{2n}),\; \Omega_n^3,\; \tfrac{1}{2}\Omega_n^3,\; \tfrac{3}{2}\Omega_n^4E(H_{2n}),\; \tfrac{1}{2}\Omega_n^5E^2(H_{2n}) + \tfrac{1}{6}\Omega_n^4E(H_{3n})\}'.$$
Remark 2: Note that by letting $C_{2n} = c_{1n} + c_{2n}$ and $C_{3n} = c_{1n} + c_{2n} + c_{3n}$, the stochastic expansions can be further simplified to $c_{1n}'\zeta_n$ (asymptotic), $C_{2n}'\zeta_n$ (second-order) and $C_{3n}'\zeta_n$ (third-order), which are particularly helpful in the bootstrap work introduced later.
Proposition 2: Under Assumptions 1–8, and further assuming that a quantity bounded in probability has a finite expectation, a third-order expansion for the bias of $\hat\rho_n$ is:
$$\mathrm{Bias}(\hat\rho_n) = C_{2n}'E(\zeta_n) + c_{3n}'E(\zeta_n) + O\big((\tfrac{h_n}{n})^2\big), \quad (22)$$
and the second- and third-order bias-corrected QMLEs are:
$$\hat\rho_n^{bc2} = \hat\rho_n - \hat C_{2n}'\hat E(\zeta_n) \quad \text{and} \quad \hat\rho_n^{bc3} = \hat\rho_n - \hat C_{3n}'\hat E(\zeta_n), \quad (23)$$
where a quantity with a hat denotes the corresponding estimate of that quantity.
Practical implementation of the bias corrections given in Equation (23) depends on the availability of the estimates $\hat E(\zeta_n)$ and $\hat C_{2n}$ or $\hat C_{3n}$. Note that $\zeta_n$ is defined in terms of $\tilde\psi_n$ and the $H_{rn}$, and $C_{2n}$ and $C_{3n}$ are defined in terms of $E(H_{rn})$, $r = 1, 2, 3$. Given the complicated expressions for $\tilde\psi_n$ and the $H_{rn}$ defined in Equations (15)–(19), the conventional approach of deriving the analytical expectations of $E(\zeta_n)$ and $C_{2n}$ or $C_{3n}$ and estimating them by plug-in would be extremely difficult, if not impossible. The method of using the sample analogue would not work either, due to the fact that $\tilde\psi_n(\hat\rho_n) = 0$. These reiterate the point raised in Yang [28], and hence, the bootstrap method given therein is adopted for the estimation of the quantities in question.

3.3. Bootstrap Method for Implementing the Bias Correction

From Equation (15) and Equations (17)–(19), we see that $\tilde\psi_n$ and the $H_{rn}$ are functions of only $R_{jn}$, $j = 1, \ldots, 4$; i.e., we need to individually estimate the following terms:
$$E(R_{1n}^i),\; i = 1, \ldots, 5; \quad E(R_{2n}^j),\; j = 1, 2; \quad E(R_{3n}); \quad E(R_{4n}); \quad E(R_{1n}^iR_{2n}),\; i = 1, 2, 3; \quad E(R_{1n}R_{2n}^2); \quad E(R_{1n}^iR_{3n}),\; i = 1, 2.$$
It is easy to see that,
$$R_{jn} \equiv R_{jn}(e_n, \rho_0) = \frac{e_n'\Lambda_{jn}(\rho_0)e_n}{e_n'M_n(\rho_0)e_n},$$
where $e_n = \sigma_0^{-1}\epsilon_n$ and $\Lambda_{jn}(\rho_0) = M_n(\rho_0)D_{jn}M_n(\rho_0)$, with $D_{1n} = G_n$ and $D_{jn}$, $j = 2, 3, 4$, being defined at the beginning of Appendix B. It follows that all of the necessary quantities whose expectations are required can be expressed in terms of $e_n$ and $\rho_0$. In particular, we can write,
$$H_{rn} \equiv H_{rn}(e_n, \rho_0), \quad \text{and} \quad \zeta_n \equiv \zeta_n(e_n, \rho_0).$$
Thus, the $H_{rn}$ and $\zeta_n$ and their distributions are invariant to $\beta_0$ and $\sigma_0^2$. The bootstrap procedure for estimating the expectations of the above quantities can be described as follows:
(1) Compute the QMLEs $\hat\theta_n = (\hat\beta_n', \hat\sigma_n^2, \hat\rho_n)'$ based on the original data;
(2) Compute the standardized QML residuals, $\hat e_n = \hat\sigma_n^{-1}A_n(\hat\rho_n)(Y_n - X_n\hat\beta_n)$.4 Denote the empirical distribution function (EDF) of the centered $\hat e_n$ by $F_n$;
(3) Draw a random sample of size $n$ from $F_n$, and denote it by $e_{n,b}^*$;
(4) Compute $R_{in}(e_{n,b}^*, \hat\rho_n)$, $i = 1, \ldots, 4$, and hence $H_{in}(e_{n,b}^*, \hat\rho_n)$, $i = 1, 2, 3$, and $\zeta_n(e_{n,b}^*, \hat\rho_n)$;
(5) Repeat Steps (3) and (4) $B$ times; the bootstrap estimates of $E(H_{in})$, $i = 1, 2, 3$, and $E(\zeta_n)$ are then given by:
$$\hat E(H_{in}) = \frac{1}{B}\sum_{b=1}^B H_{in}(e_{n,b}^*, \hat\rho_n), \quad \text{and} \quad \hat E(\zeta_n) = \frac{1}{B}\sum_{b=1}^B \zeta_n(e_{n,b}^*, \hat\rho_n).$$
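To make Steps (1)–(5) concrete, here is a minimal sketch of the second-order correction $\hat\rho_n^{bc2} = \hat\rho_n - \hat C_{2n}'\hat E(\zeta_n)$, assuming the illustrative qmle_sed() sketch above and a hypothetical helper cef_quantities() that evaluates $\tilde\psi_n$, $H_{1n}$ and $H_{2n}$ of Equations (15), (17) and (18) at a given $(e_n, \rho)$; its body would require the $D_{jn}$ from Appendix B, which we do not reproduce.

```python
import numpy as np

def bc2_rho(Y, X, W, B=999, seed=0):
    """Bootstrap second-order bias correction of the QMLE of rho (Proposition 2)."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    beta, sigma2, rho = qmle_sed(Y, X, W)

    # Step (2): standardized and centered QML residuals
    A = np.eye(n) - rho * W
    e_hat = A @ (Y - X @ beta) / np.sqrt(sigma2)
    e_hat = e_hat - e_hat.mean()

    psi = np.empty(B); H1 = np.empty(B); H2 = np.empty(B)
    for b in range(B):
        e_star = rng.choice(e_hat, size=n, replace=True)          # Step (3)
        psi[b], H1[b], H2[b] = cef_quantities(e_star, rho, X, W)  # Step (4), hypothetical helper

    # Step (5): bootstrap expectations, then C_2n' E(zeta_n) with
    # C_2n = c_1n + c_2n = {2*Omega_n, Omega_n^2, (1/2)*Omega_n^3*E(H_2n), 0, ...}
    Omega = -1.0 / H1.mean()                                      # Omega_n = -[E(H_1n)]^{-1}
    bias2 = (2 * Omega * psi.mean()
             + Omega**2 * (H1 * psi).mean()
             + 0.5 * Omega**3 * H2.mean() * (psi**2).mean())
    return rho - bias2
```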
The proposed bootstrap procedure overcomes the difficulty of analytically evaluating the expectations of very complicated quantities and is very straightforward, since in every bootstrap iteration, no re-estimation of the model parameters is required. The question that remains is its validity, particularly the validity of using $\hat C_{2n}'\hat E(\zeta_n)$ in the third-order bias correction $\hat C_{3n}'\hat E(\zeta_n) = \hat C_{2n}'\hat E(\zeta_n) + \hat c_{3n}'\hat E(\zeta_n)$. We now elaborate using the quantities $R_{jn}$.
Let $F_0$ be the CDF of $e_{n,i}$. The EDF $F_n$ is thus an estimate of $F_0$. If $\rho_0$ and $F_0$ were known, then $E[R_{jn}(e_n, \rho_0)] \doteq \frac{1}{M}\sum_{m=1}^M R_{jn}(e_{n,m}, \rho_0)$, where $e_{n,m}$ is a random sample of size $n$ drawn from $F_0$ and $M$ is an arbitrarily large number. If $\rho_0$ is unknown but $F_0$ is known, $E[R_{jn}(e_n, \rho_0)]$ can be estimated by $\frac{1}{M}\sum_{m=1}^M R_{jn}(e_{n,m}, \hat\rho_n)$, giving the so-called Monte Carlo (or parametric bootstrap) estimate of an expectation. In reality, however, both $\rho_0$ and $F_0$ are unknown, so this Monte Carlo method does not work. The bootstrap analogue of Model (3) takes the form,
$$Y_{n,b}^* = X_n\hat\beta_n + \hat\sigma_nA_n^{-1}(\hat\rho_n)e_{n,b}^*,$$
where $(\hat\beta_n', \hat\sigma_n^2, \hat\rho_n)$ are now treated as bootstrap parameters. Based on the generated bootstrap data $(Y_{n,b}^*, W_n, X_n)$ and the bootstrap parameter $\hat\rho_n$, one computes the $R_{jn}$ defined by Equations (16) and (20) to give the bootstrap analogues of the $R_{jn}$, which are $R_{jn}(e_{n,b}^*, \hat\rho_n)$, $j = 1, \ldots, 4$. The bootstrap estimates of $E[R_{jn}(e_n, \rho_0)]$ are thus,
$$E^*[R_{jn}(e_n^*, \hat\rho_n)] \doteq \frac{1}{B}\sum_{b=1}^B R_{jn}(e_{n,b}^*, \hat\rho_n), \quad \text{for a large } B,$$
which takes the same form as the Monte Carlo estimate with a known $F_0$. This gives a heuristic justification for the validity of the bootstrap method.
Formally, denote the second- and third-order bias terms by $b_2(\rho_0, \gamma_0) = C_{2n}'E(\zeta_n)$ and $b_3(\rho_0, \gamma_0) = c_{3n}'E(\zeta_n)$, respectively, where $\gamma_0 = \gamma(F_0)$ denotes the higher (than second) order moments of $F_0$ on which $b_2$ and $b_3$ may depend. In our QML estimation framework, $\gamma_0$ is unknown, as $F_0$ is specified only up to the first two moments. Following the arguments above, the bootstrap estimates of $b_2$ and $b_3$ take the form $\hat b_2 = b_2(\hat\rho_n, \hat\gamma_n)$ and $\hat b_3 = b_3(\hat\rho_n, \hat\gamma_n)$, where $\hat\gamma_n = \gamma(F_n)$. The validity of the bootstrap estimates for bias correction is established below.
Proposition 3: Under the assumptions of Proposition 2, and further assuming that a quantity bounded in probability has a finite expectation, we have,
$$E[b_2(\hat\rho_n, \hat\gamma_n)] = b_2(\rho_0, \gamma_0) + O\big((\tfrac{h_n}{n})^2\big), \quad \text{and} \quad E[b_3(\hat\rho_n, \hat\gamma_n)] = b_3(\rho_0, \gamma_0) + o\big((\tfrac{h_n}{n})^2\big).$$
It follows that $E(\hat\rho_n^{bc2}) = \rho_0 + O\big((\tfrac{h_n}{n})^{\frac{3}{2}}\big)$ and $E(\hat\rho_n^{bc3}) = \rho_0 + O\big((\tfrac{h_n}{n})^2\big)$.

4. An Alternative Model Specification

As mentioned in Section 2, an alternative to the SED model with an SAR error process is the SED model with a spatial moving average (SMA) error process,
$$Y_n = X_n\beta + u_n, \quad u_n = \epsilon_n - \rho W_n\epsilon_n,$$
where all of the quantities are defined in a similar manner as in Equation (1). The model at the true parameters can be written as $Y_n = X_n\beta_0 + A_n\epsilon_n$, giving $\mathrm{Var}(u_n) = \sigma_0^2A_nA_n'$ and suggesting a similar non-spherical error structure. The quasi-Gaussian log-likelihood function for this model is,
$$\ell_n(\theta) = -\frac{n}{2}\log(2\pi\sigma^2) - \log|A_n(\rho)| - \frac{1}{2\sigma^2}(Y_n - X_n\beta)'A_n'^{-1}(\rho)A_n^{-1}(\rho)(Y_n - X_n\beta). \quad (27)$$
Given $\rho$, the constrained QMLEs are,
$$\hat\beta_n(\rho) = (X_n'A_n'^{-1}(\rho)A_n^{-1}(\rho)X_n)^{-1}X_n'A_n'^{-1}(\rho)A_n^{-1}(\rho)Y_n, \quad \text{and} \quad \hat\sigma_n^2(\rho) = \frac{1}{n}Y_n'A_n'^{-1}(\rho)M_n(\rho)A_n^{-1}(\rho)Y_n,$$
where $M_n(\rho) = I_n - A_n^{-1}(\rho)X_n[X_n'A_n'^{-1}(\rho)A_n^{-1}(\rho)X_n]^{-1}X_n'A_n'^{-1}(\rho)$. Substituting $\hat\beta_n(\rho)$ and $\hat\sigma_n^2(\rho)$ into Equation (27) gives the concentrated log-likelihood function,
$$\ell_n^c(\rho) = -\frac{n}{2}[\log(2\pi) + 1] - \log|A_n(\rho)| - \frac{n}{2}\log(\hat\sigma_n^2(\rho)). \quad (28)$$
The unconstrained QMLE $\hat\rho_n$ of $\rho$ maximizes $\ell_n^c(\rho)$, and the unconstrained QMLEs of $\beta$ and $\sigma^2$ are given as $\hat\beta_n \equiv \hat\beta_n(\hat\rho_n)$ and $\hat\sigma_n^2 \equiv \hat\sigma_n^2(\hat\rho_n)$, respectively, as in Section 2.
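Computationally, only two pieces change relative to the SAR-error sketch in Section 2.1: $A_n(\rho)$ is replaced by its inverse in the GLS step, and $\log|A_n(\rho)|$ enters with the opposite sign. A minimal sketch of the concentrated negative log-likelihood (28), assuming numpy and illustrative names:

```python
import numpy as np

def neg_conc_loglik_sma(rho, Y, X, W):
    """Negative of the SMA-error concentrated log-likelihood (28), constants dropped."""
    n = len(Y)
    A = np.eye(n) - rho * W
    AiY = np.linalg.solve(A, Y)                  # A_n^{-1}(rho) Y_n
    AiX = np.linalg.solve(A, X)                  # A_n^{-1}(rho) X_n
    beta = np.linalg.lstsq(AiX, AiY, rcond=None)[0]
    e = AiY - AiX @ beta
    sigma2 = (e @ e) / n
    _, logdet = np.linalg.slogdet(A)
    return 0.5 * n * np.log(sigma2) + logdet     # note the sign flip on log|A_n(rho)|
```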
The QMLE $\hat\rho_n$ of the SMA error model is likely to perform more poorly than that of the SAR error model: the parameter space $\mathcal{P}$ for $\rho$ stays the same, but $\hat\rho_n$ now becomes upward biased, as can be seen by comparing Equation (28) with Equation (7). Thus, when $\rho$ is positive, 0.5 say, $\hat\rho_n$ may hit the upper bound of $\mathcal{P}$ when $n$ is small, causing difficulty in estimating $\rho$.5 Monte Carlo results given in Section 5 confirm this point. See also Martellosio [33] for related discussions.
Asymptotic distribution: The consistency and asymptotic normality of $\hat\theta_n$ can be proven in a similar manner as in the SED model with SAR errors, under a similar set of regularity conditions. In particular, Assumption 3 has to be modified so that $\lim_{n\to\infty}\frac{1}{n}X_n'A_n'^{-1}(\rho)A_n^{-1}(\rho)X_n$ exists and is non-singular uniformly in $\rho$ in a neighborhood of $\rho_0$; and Assumption 6, the identification condition, is replaced by: for any $\rho \neq \rho_0$, $\lim_{n\to\infty}\frac{h_n}{n}\big[\log|\sigma_0^2A_nA_n'| - \log|\sigma_n^2(\rho)A_n(\rho)A_n'(\rho)|\big] \neq 0$, where $\sigma_n^2(\rho) = \frac{\sigma_0^2}{n}\mathrm{tr}[A_n'A_n'^{-1}(\rho)A_n^{-1}(\rho)A_n]$.
Theorem 3: Under the modified Assumptions 1–6, we have,
$$\sqrt{n}K_n^{-1}(\hat\theta_n - \theta_0) \xrightarrow{D} N(0, \Sigma^{*-1}\Gamma^*\Sigma^{*-1}),$$
where $\Gamma^* = \lim_{n\to\infty}\frac{1}{n}\Gamma_n^*$, $\Sigma^* = \lim_{n\to\infty}\frac{1}{n}\Sigma_n^*$, $\Gamma_n^* = K_n\Gamma_nK_n$, $\Sigma_n^* = K_n\Sigma_nK_n$,
$$\Gamma_n = \begin{pmatrix} \frac{1}{\sigma_0^2}X_n'A_n'^{-1}A_n^{-1}X_n & \frac{1}{2\sigma_0^3}\gamma X_n'A_n'^{-1}\iota_n & -\frac{1}{\sigma_0}\gamma X_n'A_n'^{-1}g_n \\ \frac{1}{2\sigma_0^3}\gamma\,\iota_n'A_n^{-1}X_n & \frac{n}{4\sigma_0^4}(\kappa + 2) & -\frac{1}{2\sigma_0^2}(\kappa + 2)\mathrm{tr}(G_n) \\ -\frac{1}{\sigma_0}\gamma\,g_n'A_n^{-1}X_n & -\frac{1}{2\sigma_0^2}(\kappa + 2)\mathrm{tr}(G_n) & \kappa g_n'g_n + \mathrm{tr}(G_n^sG_n) \end{pmatrix},$$
$$\Sigma_n = \begin{pmatrix} \frac{1}{\sigma_0^2}X_n'A_n'^{-1}A_n^{-1}X_n & 0 & 0 \\ 0 & \frac{n}{2\sigma_0^4} & -\frac{1}{\sigma_0^2}\mathrm{tr}(G_n) \\ 0 & -\frac{1}{\sigma_0^2}\mathrm{tr}(G_n) & \mathrm{tr}(G_n^sG_n) \end{pmatrix}, \quad \text{and} \quad G_n = A_n^{-1}W_n.$$
Note that if the errors $\{\epsilon_{n,i}\}$ are normally distributed, then $\sqrt{n}K_n^{-1}(\hat\theta_n - \theta_0) \xrightarrow{D} N(0, \Sigma^{*-1})$. A similar set of results as in Corollary 1 can be obtained as well. Since the arguments for the proof of Theorem 3 are very similar to those of Theorem 2, the explicit proof is omitted.
Finite sample bias correction: To simplify the exposition, we only present the necessary expressions for a second-order bias correction. The third-order results are available from the authors upon request. The derivatives of the averaged concentrated log-likelihood function $\frac{h_n}{n}\ell_n^c(\rho)$, up to the third order, are:
$$\tilde\psi_n(\rho) = h_nT_{0n}(\rho) - h_nR_{1n}(\rho), \quad h_n^{-1}H_{1n}(\rho) = T_{1n}(\rho) - R_{2n}(\rho) + 2R_{1n}^2(\rho), \quad h_n^{-1}H_{2n}(\rho) = 2T_{2n}(\rho) - R_{3n}(\rho) + 6R_{1n}(\rho)R_{2n}(\rho) - 8R_{1n}^3(\rho),$$
where $T_{rn}(\rho) = \frac{1}{n}\mathrm{tr}(G_n^{r+1}(\rho))$, $r = 0, 1, 2$,
$$R_{1n}(\rho) = \frac{Y_n'A_n'^{-1}(\rho)M_n(\rho)G_n(\rho)M_n(\rho)A_n^{-1}(\rho)Y_n}{Y_n'A_n'^{-1}(\rho)M_n(\rho)A_n^{-1}(\rho)Y_n}, \quad \text{and}$$
$$R_{jn}(\rho) = \frac{Y_n'A_n'^{-1}(\rho)M_n(\rho)D_{jn}(\rho)M_n(\rho)A_n^{-1}(\rho)Y_n}{Y_n'A_n'^{-1}(\rho)M_n(\rho)A_n^{-1}(\rho)Y_n}, \quad j = 2, 3,$$
where $D_{2n}(\rho)$ and $D_{3n}(\rho)$ are given in Appendix B.
Finally, with the quantities $\tilde\psi_n(\rho)$, $h_n^{-1}H_{1n}(\rho)$ and $h_n^{-1}H_{2n}(\rho)$ clearly defined, the second-order bias correction of the QMLE $\hat\rho_n$ can be carried out using a bootstrap procedure identical to that described in Section 3. The validity of the bootstrap procedure applied to this model can be proven in a similar manner. While the third-order bias correction can be carried out in the same way, we found from the Monte Carlo experiments that the second-order bias corrections are more than satisfactory in all of the cases considered.
Impact of bias correction: In connection with the discussion at the end of Section 2, we now offer some details on the impact of bias correcting $\hat\rho_n$ on the subsequent inference for $\beta$, in the form of testing $H_0: c_0'\beta = 0$. The test statistic based on Corollary 1 is $t_n = c_0'\hat\beta_n/\sqrt{\hat\sigma_n^2c_0'\hat V_n^{-1}c_0/n}$, where $\hat V_n = \frac{1}{n}X_n'A_n'(\hat\rho_n)A_n(\hat\rho_n)X_n = V_n - (\hat\rho_n - \rho_0)X_n'(W_n'A_n + A_n'W_n)X_n/n + (\hat\rho_n - \rho_0)^2X_n'W_n'W_nX_n/n$. As $\hat\rho_n$ is downward biased, $\hat V_n$ tends to overestimate $V_n$, and hence, $\hat V_n^{-1}$ tends to underestimate $V_n^{-1}$, causing $t_n$ to be more variable and, hence, size distortions (over-rejections). Our Monte Carlo results (unreported for brevity) show that simply replacing $\hat\rho_n$ in $t_n$ by $\hat\rho_n^{bc2}$ defined in Equation (23) significantly reduces the size distortion. This shows that bias correction has great potential for improving inferences for the regression coefficients. A formal study of this is interesting, but beyond the scope of this paper.
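For concreteness, a minimal sketch of this $t_n$, assuming the illustrative sketches above; passing $\hat\rho_n^{bc2}$ (e.g., from bc2_rho()) in place of $\hat\rho_n$ gives the bias-corrected version discussed in the text:

```python
import numpy as np

def t_stat(Y, X, W, c0, rho):
    """t_n = c0'beta_hat / sqrt(sigma2_hat * c0' V_hat^{-1} c0 / n), at a given rho."""
    n = len(Y)
    A = np.eye(n) - rho * W
    AX, AY = A @ X, A @ Y
    beta = np.linalg.solve(AX.T @ AX, AX.T @ AY)
    e = AY - AX @ beta
    sigma2 = (e @ e) / n
    V = (AX.T @ AX) / n                          # V_hat_n = X'A'(rho)A(rho)X / n
    return (c0 @ beta) / np.sqrt(sigma2 * (c0 @ np.linalg.solve(V, c0)) / n)
```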

5. Simulation

The objective of the Monte Carlo simulations is to investigate the finite sample behavior of $\hat\rho_n$ and the bias-corrected $\hat\rho_n$ under various spatial layouts, error distributions and model parameters. The simulations are carried out based on the following data generating process (DGP):
$$Y_n = \iota_n\beta_0 + X_{1n}\beta_1 + X_{2n}\beta_2 + u_n, \quad u_n = \rho W_nu_n + \epsilon_n,$$
where $\iota_n$ is an $n \times 1$ vector of ones for the intercept term and $X_{1n}$ and $X_{2n}$ are $n \times 1$ vectors containing the values of two fixed regressors. The parameters of the simulation are initially set as: $\beta = (5, 1, 1)'$ and $\sigma^2 = 1$; $\rho$ takes values from $\{-0.5, -0.25, 0, 0.25, 0.5\}$, and $n$ takes values from $\{50, 100, 200, 500\}$. Each set of Monte Carlo results is based on $M = 10{,}000$ Monte Carlo samples and $B = 999 + n^{0.75}$ bootstrap samples within each Monte Carlo sample.
Spatial weights matrix: We use three different methods for generating the spatial weights matrix $W_n$: (i) Rook Contiguity; (ii) Queen Contiguity; and (iii) Group Interaction. The degree of spatial dependence specified by layouts (i) and (ii) is fixed, while in (iii), it grows with the sample size. Specifically, in (iii), $W_n$ is block-diagonal, with $k$ blocks (groups) of sizes $n_1, \ldots, n_k$. The $r$-th block is an $n_r \times n_r$ matrix with off-diagonal elements $\frac{1}{n_r - 1}$ and diagonal elements zero. In our Monte Carlo experiments, $k = \mathrm{round}(n^\delta)$ with $\delta = 0.5$ or $0.65$, and $\{n_r, r = 1, \ldots, k\}$ are $k$ random draws from a discrete uniform distribution on $0.5m$ to $1.5m$ with $m = \mathrm{round}(n/k)$. Clearly, in this case, the degree of spatial dependence, indicated by the average group size $m$, increases with $n$, and it is stronger when $\delta = 0.5$ than when $\delta = 0.65$. See Yang [28] for a detailed description; a sketch of this construction is given below.
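The following is a minimal sketch of the group-interaction layout, assuming numpy/scipy; the size-balancing adjustment in the last group is our own detail, as the text leaves the rounding open.

```python
import numpy as np
from scipy.linalg import block_diag

def group_interaction_W(n, delta=0.5, rng=None):
    """Block-diagonal W_n with equal within-group weights 1/(n_r - 1)."""
    rng = rng or np.random.default_rng()
    k = round(n ** delta)                        # number of groups
    m = round(n / k)                             # average group size
    sizes = rng.integers(round(0.5 * m), round(1.5 * m) + 1, size=k)
    sizes[-1] += n - sizes.sum()                 # force sizes to sum to n (our assumption)
    blocks = []
    for nr in sizes:
        B = np.full((nr, nr), 1.0 / (nr - 1))    # everyone interacts equally within a group
        np.fill_diagonal(B, 0.0)
        blocks.append(B)
    return block_diag(*blocks)
```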
Regressors: The fixed regressors are generated by REG1: $\{x_{1i}, x_{2i}\} \overset{i.i.d.}{\sim} N(0, 1)/\sqrt{2}$ when rook or queen contiguity is followed; and according to either REG1 or REG2: $\{x_{1,ir}, x_{2,ir}\} \overset{i.i.d.}{\sim} (2z_r + z_{ir})/\sqrt{10}$, where $(z_r, z_{ir}) \overset{i.i.d.}{\sim} N(0, 1)$ for $i = 1, \ldots, n_r$ and $r = 1, \ldots, k$, when the group interaction scheme is followed. The REG2 scheme gives non-i.i.d. regressors where the group means of the regressors' values differ across groups; see Lee [25]. Note that both schemes give a signal-to-noise ratio of one when $\beta_1 = \beta_2 = \sigma = 1$.
Error distribution: To generate $\epsilon_n = \sigma e_n$, three DGPs are considered. DGP1: $\{e_{n,i}\}$ are i.i.d. standard normal; DGP2: $\{e_{n,i}\}$ are i.i.d. standardized normal mixture, with 10% of the values from $N(0, 4)$ and the remaining from $N(0, 1)$; and DGP3: $\{e_{n,i}\}$ are i.i.d. standardized log-normal with parameters zero and one. Thus, the error distribution from DGP2 is leptokurtic, and that of DGP3 is both skewed and leptokurtic.
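A minimal sketch of these three error generators, assuming numpy; the standardizations use the exact mixture and log-normal moments:

```python
import numpy as np

def draw_errors(n, dgp, rng):
    """e_n under DGP1 (normal), DGP2 (normal mixture) or DGP3 (log-normal), standardized."""
    if dgp == 1:
        return rng.standard_normal(n)
    if dgp == 2:
        scale = np.where(rng.random(n) < 0.10, 2.0, 1.0)   # 10% N(0,4), 90% N(0,1)
        return scale * rng.standard_normal(n) / np.sqrt(0.10 * 4 + 0.90 * 1)
    if dgp == 3:
        e = rng.lognormal(0.0, 1.0, n)                     # parameters zero and one
        return (e - np.exp(0.5)) / np.sqrt((np.e - 1.0) * np.e)
    raise ValueError("dgp must be 1, 2 or 3")
```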
Partial Monte Carlo results are summarized in Table 1, Table 2, Table 3 and Table 4, where in each table, the Monte Carlo means, root mean square errors (RMSE) and standard deviations (SD) of $\hat\rho_n$ and $\hat\rho_n^{bc2}$ are reported. The results for $\hat\rho_n^{bc3}$ are omitted, as $\hat\rho_n^{bc2}$ provides satisfactory bias corrections in all of the cases, and the additional gain from using $\hat\rho_n^{bc3}$, although apparent, is quite marginal. Further, the case of queen contiguity (Table 2) is replicated by changing the $\beta$ value to $(0.5, 0.1, 0.1)'$ (Table 5) and by changing the $\sigma$ value to three (Table 6). We also give some partial results (Table 7 and Table 8) for the SMA error model under the same set of parameter values set out at the beginning of this section. It is useful to note the following general characteristics of the results:
(i) $\hat\rho_n$ suffers from severe downward bias for almost all of the $\rho$ values considered. The severity of the bias varies with: (1) the sample size; (2) the spatial layout; and (3) the distribution of the errors considered.
(ii) $\hat\rho_n^{bc2}$ is almost unbiased in all cases, even at considerably small sample sizes, which ascertains the effectiveness of the proposed bias correction procedure. These corrections are attained without compromising the efficiency of the original QMLEs.
(iii) The spatial layout has a considerable impact on the finite sample performance of $\hat\rho_n$ in terms of the bias, RMSE and SD. A relatively sparse $W_n$, as in the contiguity schemes, results in lower bias, RMSE and SD, while a relatively dense $W_n$, as in the group interaction scheme, results in the opposite.
(iv) The bias of the original QMLE seems to worsen as the error distribution deviates from normality. In contrast, $\hat\rho_n^{bc2}$ attains a similar level of accuracy in all cases.
(v) The performance of $\hat\rho_n$ is not very sensitive to changes in the values of $\sigma$ and $\beta$ in terms of bias, and the bias correction works well regardless of the true values set for these parameters.
(vi) The impact of the degree of spatial dependence on the rate of convergence is clearly revealed when comparing the results in Table 3 with those in Table 4 under the group interaction scheme. When the degree of spatial dependence is stronger, as in the case where $k = n^{0.5}$, the rate of convergence is slower than in the case where $k = n^{0.65}$.
As expected, the magnitudes of the bias, RMSE and SD are larger for small sample sizes. When considering the efficiency variations in terms of standard deviations, it can be seen that the efficiency of the estimators is sensitive to the sample size and the spatial layout. However, the different error distributions do not seem to have a significant effect on the standard deviations, underscoring the robustness of the proposed bias correction method.
When the errors follow the SMA process, $u_n = (I_n - \rho W_n)\epsilon_n$, the Monte Carlo results given in Table 7 and Table 8 show that: (i) the bias becomes positive; (ii) the QMLE $\hat\rho_n$ again can be severely biased; and (iii) the bias-corrected $\hat\rho_n$ is almost unbiased. As discussed in Section 4, the Monte Carlo results indeed show that when $\rho$ is positive (e.g., 0.5) and $n$ is small (e.g., 50), $\hat\rho_n$ can be close to, or can hit, its upper bound, say $0.9999$, causing numerical instability in calculating $A_n^{-1}(\hat\rho_n) = (I_n - \hat\rho_nW_n)^{-1}$, thus resulting in poor performance of $\hat\rho_n$ and causing difficulty in bootstrapping the bias. This stands in contrast to the SED model with SAR errors, where $\hat\rho_n$ is downward biased. However, with a larger $n$ ($\geq 100$), this problem disappears, as seen from the results in Table 7 and Table 8. Nevertheless, this does signal the possible poor performance of the QMLE for an SMA error model when the sample size is not so large and the true spatial parameter value is positive and big.
Finally, compared to the Monte Carlo results presented in Yang [28] for the SLD model, we see that the bias of $\hat\rho_n$ is more severe for the SED model, but it does not spill over to $\hat\beta_n$ and $\hat\sigma_n^2$ as much.
Table 1. Empirical mean[RMSE](SD) of estimators of ρ for the spatial error dependence (SED) model with SAR errors: rook contiguity, REG-1.
ρ | n | Normal: $\hat\rho_n$ | Normal: $\hat\rho_n^{bc2}$ | Mixed Normal: $\hat\rho_n$ | Mixed Normal: $\hat\rho_n^{bc2}$ | Log-Normal: $\hat\rho_n$ | Log-Normal: $\hat\rho_n^{bc2}$
0.50 | 50 | 0.440[0.175](0.164) | 0.495[0.169](0.169) | 0.445[0.166](0.157) | 0.499[0.161](0.161) | 0.452[0.152](0.144) | 0.503[0.147](0.147)
0.50 | 100 | 0.472[0.116](0.112) | 0.501[0.114](0.114) | 0.471[0.112](0.108) | 0.499[0.110](0.110) | 0.473[0.104](0.101) | 0.500[0.102](0.102)
0.50 | 200 | 0.487[0.079](0.077) | 0.501[0.078](0.078) | 0.486[0.077](0.075) | 0.500[0.076](0.076) | 0.487[0.072](0.071) | 0.500[0.071](0.071)
0.50 | 500 | 0.495[0.049](0.049) | 0.501[0.049](0.049) | 0.495[0.049](0.048) | 0.500[0.049](0.049) | 0.495[0.046](0.046) | 0.500[0.046](0.046)
0.25 | 50 | 0.202[0.192](0.186) | 0.248[0.195](0.195) | 0.203[0.182](0.176) | 0.248[0.184](0.184) | 0.207[0.169](0.163) | 0.250[0.170](0.170)
0.25 | 100 | 0.228[0.130](0.128) | 0.252[0.131](0.131) | 0.225[0.127](0.124) | 0.248[0.127](0.127) | 0.228[0.119](0.117) | 0.251[0.120](0.120)
0.25 | 200 | 0.239[0.091](0.090) | 0.251[0.091](0.091) | 0.239[0.090](0.090) | 0.250[0.090](0.090) | 0.240[0.085](0.084) | 0.251[0.085](0.085)
0.25 | 500 | 0.246[0.057](0.057) | 0.250[0.057](0.057) | 0.246[0.057](0.057) | 0.251[0.058](0.058) | 0.246[0.055](0.055) | 0.251[0.055](0.055)
0.00 | 50 | −0.032[0.192](0.189) | 0.002[0.201](0.201) | −0.035[0.184](0.181) | −0.002[0.191](0.191) | −0.033[0.178](0.175) | −0.002[0.184](0.184)
0.00 | 100 | −0.021[0.135](0.133) | −0.004[0.137](0.137) | −0.018[0.131](0.130) | 0.000[0.133](0.133) | −0.019[0.124](0.123) | −0.003[0.126](0.126)
0.00 | 200 | −0.010[0.097](0.096) | −0.001[0.098](0.098) | −0.008[0.093](0.093) | 0.001[0.094](0.094) | −0.010[0.089](0.088) | −0.002[0.089](0.089)
0.00 | 500 | −0.005[0.060](0.060) | −0.001[0.060](0.060) | −0.005[0.059](0.059) | −0.001[0.059](0.059) | −0.004[0.058](0.058) | 0.001[0.058](0.058)
−0.25 | 50 | −0.270[0.180](0.179) | −0.252[0.191](0.191) | −0.273[0.171](0.170) | −0.255[0.181](0.181) | −0.274[0.169](0.168) | −0.257[0.178](0.178)
−0.25 | 100 | −0.262[0.127](0.126) | −0.252[0.130](0.130) | −0.261[0.124](0.123) | −0.251[0.127](0.127) | −0.262[0.120](0.119) | −0.252[0.123](0.123)
−0.25 | 200 | −0.255[0.090](0.090) | −0.250[0.091](0.091) | −0.255[0.088](0.088) | −0.250[0.089](0.089) | −0.255[0.087](0.087) | −0.250[0.088](0.088)
−0.25 | 500 | −0.253[0.057](0.057) | −0.250[0.058](0.058) | −0.252[0.057](0.057) | −0.250[0.058](0.058) | −0.253[0.056](0.056) | −0.250[0.057](0.057)
−0.50 | 50 | −0.503[0.152](0.152) | −0.502[0.163](0.163) | −0.503[0.144](0.144) | −0.500[0.153](0.153) | −0.509[0.144](0.143) | −0.507[0.153](0.153)
−0.50 | 100 | −0.504[0.107](0.107) | −0.502[0.111](0.111) | −0.503[0.104](0.104) | −0.501[0.108](0.108) | −0.504[0.103](0.103) | −0.502[0.106](0.106)
−0.50 | 200 | −0.502[0.076](0.076) | −0.501[0.077](0.077) | −0.502[0.074](0.074) | −0.501[0.076](0.076) | −0.503[0.074](0.074) | −0.502[0.075](0.075)
−0.50 | 500 | −0.501[0.048](0.048) | −0.500[0.049](0.049) | −0.501[0.047](0.047) | −0.500[0.048](0.048) | −0.501[0.046](0.046) | −0.501[0.047](0.047)
Table 2. Empirical mean[RMSE](SD) of estimators of ρ for the SED model with SAR errors: queen contiguity, REG-1.
ρ | n | Normal: $\hat\rho_n$ | Normal: $\hat\rho_n^{bc2}$ | Mixed Normal: $\hat\rho_n$ | Mixed Normal: $\hat\rho_n^{bc2}$ | Log-Normal: $\hat\rho_n$ | Log-Normal: $\hat\rho_n^{bc2}$
0.50 | 50 | 0.390[0.244](0.218) | 0.492[0.215](0.215) | 0.395[0.232](0.206) | 0.493[0.204](0.204) | 0.406[0.207](0.184) | 0.501[0.181](0.181)
0.50 | 100 | 0.445[0.153](0.143) | 0.499[0.140](0.140) | 0.449[0.145](0.135) | 0.501[0.133](0.133) | 0.451[0.133](0.124) | 0.501[0.122](0.122)
0.50 | 200 | 0.474[0.099](0.095) | 0.500[0.095](0.095) | 0.474[0.098](0.095) | 0.500[0.094](0.094) | 0.476[0.091](0.087) | 0.500[0.087](0.087)
0.50 | 500 | 0.491[0.059](0.058) | 0.501[0.058](0.058) | 0.490[0.059](0.058) | 0.500[0.058](0.058) | 0.490[0.056](0.055) | 0.500[0.055](0.055)
0.25 | 50 | 0.144[0.270](0.248) | 0.248[0.250](0.250) | 0.153[0.255](0.236) | 0.254[0.238](0.238) | 0.153[0.239](0.218) | 0.250[0.219](0.219)
0.25 | 100 | 0.196[0.179](0.171) | 0.253[0.169](0.169) | 0.194[0.177](0.168) | 0.249[0.166](0.166) | 0.197[0.165](0.156) | 0.250[0.154](0.154)
0.25 | 200 | 0.221[0.121](0.117) | 0.248[0.117](0.117) | 0.222[0.118](0.115) | 0.249[0.114](0.114) | 0.225[0.110](0.107) | 0.250[0.107](0.107)
0.25 | 500 | 0.240[0.073](0.073) | 0.250[0.073](0.073) | 0.240[0.075](0.074) | 0.250[0.074](0.074) | 0.241[0.069](0.068) | 0.251[0.068](0.068)
0.00 | 50 | −0.101[0.294](0.276) | −0.002[0.285](0.285) | −0.095[0.277](0.260) | 0.003[0.268](0.268) | −0.095[0.259](0.241) | −0.001[0.247](0.247)
0.00 | 100 | −0.059[0.200](0.192) | −0.002[0.192](0.192) | −0.059[0.197](0.188) | −0.002[0.189](0.189) | −0.055[0.181](0.172) | 0.001[0.172](0.172)
0.00 | 200 | −0.027[0.135](0.132) | 0.001[0.133](0.133) | −0.026[0.132](0.130) | 0.002[0.130](0.130) | −0.027[0.124](0.121) | −0.002[0.121](0.121)
0.00 | 500 | −0.011[0.083](0.082) | −0.001[0.082](0.082) | −0.011[0.082](0.081) | 0.000[0.081](0.081) | −0.010[0.079](0.079) | 0.001[0.079](0.079)
−0.25 | 50 | −0.339[0.299](0.285) | −0.248[0.300](0.300) | −0.338[0.284](0.270) | −0.249[0.283](0.283) | −0.337[0.265](0.250) | −0.251[0.261](0.261)
−0.25 | 100 | −0.308[0.211](0.203) | −0.252[0.206](0.206) | −0.303[0.202](0.195) | −0.248[0.198](0.198) | −0.307[0.194](0.185) | −0.254[0.188](0.188)
−0.25 | 200 | −0.277[0.142](0.140) | −0.251[0.141](0.141) | −0.274[0.140](0.138) | −0.249[0.139](0.139) | −0.275[0.132](0.129) | −0.250[0.130](0.130)
−0.25 | 500 | −0.262[0.089](0.089) | −0.252[0.089](0.089) | −0.260[0.088](0.088) | −0.250[0.088](0.088) | −0.261[0.084](0.083) | −0.251[0.084](0.084)
−0.50 | 50 | −0.576[0.291](0.281) | −0.499[0.301](0.301) | −0.577[0.283](0.272) | −0.502[0.290](0.290) | −0.584[0.268](0.255) | −0.511[0.271](0.270)
−0.50 | 100 | −0.548[0.208](0.203) | −0.498[0.209](0.209) | −0.550[0.201](0.195) | −0.501[0.201](0.201) | −0.547[0.193](0.188) | −0.499[0.193](0.193)
−0.50 | 200 | −0.524[0.144](0.142) | −0.501[0.144](0.144) | −0.524[0.141](0.139) | −0.501[0.141](0.141) | −0.521[0.136](0.134) | −0.498[0.136](0.136)
−0.50 | 500 | −0.511[0.090](0.089) | −0.502[0.090](0.089) | −0.510[0.089](0.089) | −0.501[0.089](0.089) | −0.509[0.086](0.086) | −0.500[0.086](0.086)
Table 3. Empirical mean[RMSE](SD) of estimators of ρ for the SED model with SAR errors: group interaction, $k = n^{0.5}$, REG-2.
ρ | n | Normal: $\hat\rho_n$ | Normal: $\hat\rho_n^{bc2}$ | Mixed Normal: $\hat\rho_n$ | Mixed Normal: $\hat\rho_n^{bc2}$ | Log-Normal: $\hat\rho_n$ | Log-Normal: $\hat\rho_n^{bc2}$
0.50 | 50 | 0.277[0.403](0.335) | 0.523[0.223](0.222) | 0.287[0.395](0.332) | 0.524[0.223](0.222) | 0.303[0.354](0.294) | 0.532[0.194](0.192)
0.50 | 100 | 0.375[0.233](0.197) | 0.512[0.148](0.148) | 0.377[0.233](0.198) | 0.511[0.149](0.149) | 0.384[0.214](0.180) | 0.515[0.136](0.136)
0.50 | 200 | 0.424[0.160](0.141) | 0.502[0.116](0.116) | 0.430[0.152](0.134) | 0.506[0.111](0.111) | 0.432[0.143](0.126) | 0.507[0.104](0.104)
0.50 | 500 | 0.454[0.106](0.096) | 0.502[0.085](0.085) | 0.455[0.105](0.095) | 0.502[0.085](0.085) | 0.456[0.100](0.090) | 0.502[0.080](0.080)
0.25 | 50 | −0.082[0.548](0.437) | 0.291[0.325](0.322) | −0.078[0.541](0.431) | 0.288[0.318](0.315) | −0.061[0.507](0.401) | 0.296[0.296](0.293)
0.25 | 100 | 0.051[0.345](0.281) | 0.268[0.220](0.219) | 0.052[0.342](0.278) | 0.265[0.218](0.218) | 0.068[0.309](0.249) | 0.275[0.196](0.194)
0.25 | 200 | 0.129[0.239](0.206) | 0.259[0.171](0.171) | 0.127[0.236](0.201) | 0.256[0.168](0.168) | 0.131[0.220](0.184) | 0.257[0.154](0.153)
0.25 | 500 | 0.176[0.160](0.141) | 0.254[0.126](0.126) | 0.175[0.161](0.142) | 0.253[0.127](0.127) | 0.179[0.153](0.135) | 0.255[0.120](0.120)
0.00 | 50 | −0.433[0.679](0.523) | 0.040[0.419](0.417) | −0.432[0.672](0.514) | 0.034[0.412](0.411) | −0.400[0.620](0.474) | 0.055[0.378](0.375)
0.00 | 100 | −0.270[0.448](0.357) | 0.018[0.288](0.288) | −0.260[0.435](0.347) | 0.020[0.280](0.280) | −0.251[0.409](0.324) | 0.025[0.263](0.261)
0.00 | 200 | −0.172[0.315](0.264) | 0.009[0.223](0.223) | −0.171[0.312](0.261) | 0.008[0.221](0.221) | −0.162[0.295](0.246) | 0.012[0.209](0.209)
0.00 | 500 | −0.107[0.215](0.186) | 0.002[0.167](0.167) | −0.106[0.213](0.185) | 0.002[0.166](0.166) | −0.100[0.199](0.173) | 0.006[0.156](0.155)
−0.25 | 50 | −0.758[0.767](0.575) | −0.210[0.487](0.485) | −0.746[0.753](0.567) | −0.209[0.483](0.481) | −0.723[0.708](0.527) | −0.195[0.448](0.445)
−0.25 | 100 | −0.573[0.534](0.425) | −0.227[0.354](0.353) | −0.574[0.530](0.420) | −0.233[0.350](0.350) | −0.563[0.490](0.377) | −0.228[0.314](0.313)
−0.25 | 200 | −0.467[0.394](0.329) | −0.242[0.282](0.282) | −0.466[0.382](0.315) | −0.242[0.271](0.271) | −0.455[0.356](0.291) | −0.236[0.250](0.250)
−0.25 | 500 | −0.383[0.263](0.227) | −0.240[0.205](0.204) | −0.381[0.263](0.228) | −0.246[0.206](0.206) | −0.379[0.250](0.215) | −0.245[0.194](0.194)
−0.50 | 50 | −1.057[0.828](0.614) | −0.456[0.553](0.551) | −1.059[0.828](0.611) | −0.467[0.550](0.549) | −1.040[0.782](0.566) | −0.454[0.505](0.503)
−0.50 | 100 | −0.880[0.612](0.480) | −0.481[0.409](0.409) | −0.875[0.598](0.465) | −0.482[0.397](0.396) | −0.857[0.562](0.434) | −0.472[0.369](0.368)
−0.50 | 200 | −0.753[0.451](0.374) | −0.487[0.325](0.325) | −0.751[0.445](0.369) | −0.487[0.320](0.320) | −0.746[0.422](0.344) | −0.487[0.299](0.299)
−0.50 | 500 | −0.655[0.308](0.267) | −0.493[0.242](0.242) | −0.659[0.311](0.267) | −0.497[0.243](0.243) | −0.652[0.294](0.251) | −0.492[0.228](0.228)
Table 4. Empirical mean[RMSE](SD) of estimators of ρ for the SED model with SAR errors: group interaction, k = n^{0.65}, REG-2.

              Normal Errors                               Mixed Normal Errors                         Log-Normal Errors
ρ        n    ρ̂n                    ρ̂n^bc2               ρ̂n                    ρ̂n^bc2               ρ̂n                    ρ̂n^bc2
0.50     50   0.435[0.155](0.140)  0.504[0.119](0.119)  0.440[0.147](0.134)  0.507[0.114](0.114)  0.441[0.133](0.119)  0.506[0.101](0.101)
        100   0.458[0.110](0.101)  0.502[0.091](0.091)  0.460[0.105](0.097)  0.502[0.087](0.087)  0.462[0.094](0.086)  0.503[0.077](0.077)
        200   0.477[0.077](0.073)  0.503[0.069](0.068)  0.475[0.077](0.073)  0.501[0.068](0.068)  0.478[0.069](0.065)  0.503[0.061](0.061)
        500   0.486[0.053](0.051)  0.501[0.050](0.050)  0.485[0.053](0.051)  0.500[0.049](0.049)  0.487[0.050](0.048)  0.502[0.046](0.046)
0.25     50   0.148[0.213](0.186)  0.257[0.166](0.166)  0.151[0.205](0.179)  0.257[0.160](0.160)  0.154[0.189](0.162)  0.257[0.144](0.144)
        100   0.182[0.156](0.140)  0.252[0.129](0.129)  0.183[0.151](0.135)  0.252[0.124](0.124)  0.185[0.139](0.123)  0.252[0.112](0.112)
        200   0.209[0.113](0.105)  0.252[0.099](0.099)  0.211[0.109](0.102)  0.253[0.096](0.096)  0.209[0.104](0.095)  0.250[0.090](0.090)
        500   0.228[0.076](0.073)  0.252[0.070](0.070)  0.227[0.077](0.073)  0.251[0.070](0.070)  0.227[0.072](0.068)  0.251[0.066](0.066)
0.00     50   −0.129[0.253](0.218)  0.006[0.205](0.205)  −0.127[0.244](0.208)  0.006[0.195](0.195)  −0.119[0.222](0.187)  0.011[0.175](0.174)
        100   −0.087[0.191](0.170)  0.005[0.159](0.159)  −0.088[0.187](0.165)  0.003[0.155](0.154)  −0.081[0.169](0.148)  0.007[0.138](0.138)
        200   −0.056[0.144](0.133)  0.003[0.126](0.126)  −0.056[0.140](0.128)  0.002[0.122](0.122)  −0.052[0.131](0.120)  0.005[0.114](0.114)
        500   −0.033[0.101](0.096)  −0.001[0.093](0.093)  −0.034[0.100](0.094)  −0.001[0.091](0.091)  −0.030[0.093](0.088)  0.002[0.086](0.086)
−0.25    50   −0.395[0.273](0.231)  −0.248[0.227](0.227)  −0.389[0.260](0.220)  −0.244[0.216](0.216)  −0.384[0.241](0.201)  −0.242[0.196](0.196)
        100   −0.351[0.218](0.193)  −0.244[0.184](0.184)  −0.353[0.215](0.189)  −0.247[0.180](0.180)  −0.349[0.197](0.170)  −0.246[0.162](0.162)
        200   −0.319[0.170](0.156)  −0.248[0.149](0.149)  −0.321[0.169](0.154)  −0.251[0.147](0.147)  −0.317[0.155](0.140)  −0.249[0.134](0.134)
        500   −0.290[0.122](0.115)  −0.249[0.112](0.112)  −0.291[0.122](0.115)  −0.251[0.112](0.112)  −0.289[0.114](0.107)  −0.250[0.104](0.104)
−0.50    50   −0.647[0.276](0.234)  −0.499[0.241](0.241)  −0.644[0.269](0.228)  −0.499[0.236](0.236)  −0.639[0.252](0.210)  −0.497[0.215](0.215)
        100   −0.616[0.241](0.212)  −0.497[0.205](0.205)  −0.609[0.234](0.207)  −0.492[0.200](0.200)  −0.610[0.219](0.189)  −0.495[0.183](0.183)
        200   −0.580[0.193](0.176)  −0.499[0.170](0.170)  −0.579[0.191](0.174)  −0.499[0.168](0.168)  −0.579[0.179](0.161)  −0.500[0.156](0.156)
        500   −0.547[0.141](0.133)  −0.500[0.129](0.129)  −0.545[0.139](0.131)  −0.498[0.128](0.128)  −0.544[0.131](0.124)  −0.497[0.121](0.121)
Table 5. Replication of Table 2 for β = (0.5, 0.1, 0.1)′.

              Normal Errors                               Mixed Normal Errors                         Log-Normal Errors
ρ        n    ρ̂n                    ρ̂n^bc2               ρ̂n                    ρ̂n^bc2               ρ̂n                    ρ̂n^bc2
0.50     50   0.395[0.242](0.218)  0.499[0.213](0.213)  0.396[0.230](0.205)  0.497[0.200](0.200)  0.404[0.210](0.187)  0.501[0.182](0.182)
        100   0.446[0.150](0.140)  0.500[0.138](0.138)  0.447[0.149](0.139)  0.499[0.137](0.137)  0.451[0.135](0.125)  0.501[0.123](0.123)
        200   0.474[0.100](0.096)  0.500[0.096](0.096)  0.475[0.096](0.093)  0.500[0.092](0.092)  0.476[0.091](0.087)  0.500[0.087](0.087)
        500   0.490[0.059](0.058)  0.500[0.058](0.058)  0.490[0.059](0.058)  0.500[0.058](0.058)  0.491[0.056](0.055)  0.501[0.055](0.055)
0.25     50   0.137[0.282](0.258)  0.246[0.258](0.258)  0.145[0.263](0.241)  0.251[0.240](0.240)  0.152[0.246](0.225)  0.253[0.224](0.224)
        100   0.195[0.182](0.173)  0.252[0.172](0.172)  0.196[0.173](0.165)  0.252[0.163](0.163)  0.195[0.162](0.152)  0.249[0.151](0.151)
        200   0.224[0.121](0.118)  0.250[0.118](0.118)  0.224[0.118](0.115)  0.251[0.115](0.115)  0.226[0.111](0.108)  0.251[0.108](0.108)
        500   0.241[0.072](0.071)  0.251[0.071](0.071)  0.240[0.072](0.071)  0.251[0.071](0.071)  0.241[0.070](0.070)  0.251[0.070](0.070)
0.00     50   −0.104[0.297](0.279)  0.004[0.286](0.286)  −0.106[0.285](0.264)  −0.002[0.270](0.270)  −0.098[0.269](0.250)  0.004[0.255](0.255)
        100   −0.059[0.201](0.192)  −0.002[0.193](0.193)  −0.058[0.196](0.187)  −0.001[0.188](0.188)  −0.054[0.181](0.173)  0.002[0.173](0.173)
        200   −0.027[0.134](0.131)  0.001[0.132](0.132)  −0.028[0.133](0.131)  −0.002[0.131](0.131)  −0.027[0.124](0.121)  −0.001[0.121](0.121)
        500   −0.010[0.082](0.081)  0.002[0.082](0.082)  −0.012[0.083](0.082)  −0.001[0.082](0.082)  −0.011[0.079](0.078)  −0.001[0.078](0.078)
−0.25    50   −0.352[0.305](0.288)  −0.253[0.302](0.302)  −0.351[0.294](0.276)  −0.254[0.289](0.289)  −0.346[0.279](0.262)  −0.252[0.273](0.273)
        100   −0.302[0.208](0.202)  −0.247[0.205](0.205)  −0.304[0.203](0.196)  −0.249[0.199](0.199)  −0.304[0.192](0.185)  −0.251[0.187](0.187)
        200   −0.275[0.142](0.140)  −0.250[0.141](0.141)  −0.280[0.139](0.136)  −0.255[0.137](0.137)  −0.277[0.134](0.131)  −0.252[0.132](0.132)
        500   −0.261[0.090](0.089)  −0.251[0.089](0.089)  −0.261[0.088](0.087)  −0.251[0.088](0.088)  −0.259[0.085](0.085)  −0.249[0.085](0.085)
−0.50    50   −0.591[0.300](0.286)  −0.506[0.307](0.307)  −0.592[0.290](0.276)  −0.508[0.294](0.294)  −0.588[0.280](0.265)  −0.506[0.282](0.282)
        100   −0.549[0.207](0.201)  −0.500[0.208](0.208)  −0.554[0.203](0.195)  −0.506[0.201](0.201)  −0.548[0.193](0.187)  −0.500[0.192](0.192)
        200   −0.524[0.144](0.142)  −0.501[0.144](0.144)  −0.522[0.141](0.140)  −0.499[0.142](0.142)  −0.523[0.136](0.134)  −0.501[0.136](0.136)
        500   −0.509[0.091](0.090)  −0.500[0.091](0.091)  −0.508[0.090](0.089)  −0.499[0.090](0.090)  −0.510[0.087](0.086)  −0.500[0.087](0.087)
Table 6. Replication of Table 2 for σ = 3.

              Normal Errors                               Mixed Normal Errors                         Log-Normal Errors
ρ        n    ρ̂n                    ρ̂n^bc2               ρ̂n                    ρ̂n^bc2               ρ̂n                    ρ̂n^bc2
0.50     50   0.392[0.243](0.217)  0.499[0.210](0.210)  0.396[0.234](0.209)  0.499[0.202](0.202)  0.404[0.212](0.189)  0.505[0.182](0.182)
        100   0.449[0.150](0.141)  0.501[0.139](0.139)  0.449[0.147](0.137)  0.499[0.135](0.135)  0.452[0.134](0.125)  0.501[0.123](0.123)
        200   0.474[0.098](0.095)  0.500[0.094](0.094)  0.475[0.097](0.094)  0.500[0.093](0.093)  0.474[0.091](0.087)  0.499[0.087](0.087)
        500   0.489[0.060](0.059)  0.499[0.059](0.059)  0.490[0.060](0.059)  0.500[0.058](0.058)  0.490[0.056](0.055)  0.500[0.055](0.055)
0.25     50   0.139[0.282](0.259)  0.253[0.257](0.257)  0.136[0.271](0.246)  0.247[0.243](0.243)  0.147[0.249](0.227)  0.255[0.224](0.223)
        100   0.196[0.180](0.172)  0.250[0.171](0.171)  0.195[0.174](0.165)  0.249[0.165](0.165)  0.202[0.159](0.152)  0.253[0.151](0.151)
        200   0.220[0.120](0.116)  0.247[0.116](0.116)  0.225[0.119](0.116)  0.251[0.116](0.116)  0.226[0.110](0.107)  0.251[0.107](0.107)
        500   0.240[0.074](0.073)  0.250[0.073](0.073)  0.240[0.072](0.071)  0.251[0.071](0.071)  0.240[0.070](0.070)  0.250[0.070](0.070)
0.00     50   −0.114[0.307](0.285)  0.001[0.291](0.291)  −0.111[0.297](0.275)  0.001[0.280](0.280)  −0.109[0.279](0.256)  −0.001[0.259](0.259)
        100   −0.053[0.195](0.188)  0.003[0.189](0.189)  −0.053[0.192](0.184)  0.001[0.185](0.185)  −0.051[0.177](0.170)  0.002[0.171](0.171)
        200   −0.027[0.134](0.131)  −0.001[0.132](0.132)  −0.028[0.132](0.129)  −0.002[0.129](0.129)  −0.027[0.123](0.120)  −0.002[0.121](0.121)
        500   −0.010[0.083](0.083)  0.001[0.083](0.083)  −0.011[0.082](0.082)  −0.001[0.082](0.082)  −0.011[0.079](0.078)  −0.001[0.078](0.078)
−0.25    50   −0.364[0.312](0.291)  −0.258[0.306](0.305)  −0.356[0.298](0.278)  −0.250[0.291](0.291)  −0.355[0.286](0.266)  −0.252[0.276](0.276)
        100   −0.300[0.209](0.203)  −0.248[0.207](0.207)  −0.302[0.202](0.195)  −0.252[0.199](0.199)  −0.297[0.187](0.181)  −0.248[0.183](0.183)
        200   −0.277[0.143](0.141)  −0.252[0.142](0.142)  −0.275[0.139](0.137)  −0.249[0.138](0.138)  −0.274[0.134](0.132)  −0.249[0.132](0.132)
        500   −0.259[0.088](0.087)  −0.249[0.087](0.087)  −0.262[0.088](0.087)  −0.252[0.087](0.087)  −0.260[0.085](0.085)  −0.250[0.085](0.085)
−0.50    50   −0.593[0.305](0.290)  −0.501[0.312](0.312)  −0.596[0.292](0.276)  −0.504[0.296](0.296)  −0.599[0.281](0.263)  −0.509[0.280](0.280)
        100   −0.548[0.207](0.201)  −0.503[0.208](0.208)  −0.547[0.198](0.193)  −0.502[0.199](0.199)  −0.543[0.192](0.187)  −0.499[0.192](0.192)
        200   −0.522[0.145](0.143)  −0.499[0.145](0.145)  −0.525[0.142](0.140)  −0.503[0.142](0.142)  −0.522[0.136](0.134)  −0.500[0.136](0.136)
        500   −0.509[0.091](0.091)  −0.500[0.091](0.091)  −0.511[0.089](0.088)  −0.502[0.089](0.089)  −0.510[0.086](0.086)  −0.501[0.086](0.086)
Table 7. Empirical mean[RMSE](SD) of estimators of ρ for the SED model with spatial moving average (SMA) errors: queen contiguity, REG-1.

              Normal Errors                               Mixed Normal Errors                         Log-Normal Errors
ρ        n    ρ̂n                    ρ̂n^bc2               ρ̂n                    ρ̂n^bc2               ρ̂n                    ρ̂n^bc2
0.50    100   0.554[0.154](0.145)  0.509[0.418](0.418)  0.552[0.151](0.142)  0.509[0.318](0.318)  0.553[0.149](0.139)  0.506[0.140](0.140)
        200   0.527[0.101](0.097)  0.501[0.096](0.096)  0.528[0.099](0.095)  0.502[0.095](0.095)  0.527[0.096](0.093)  0.501[0.092](0.092)
        500   0.510[0.059](0.058)  0.500[0.058](0.058)  0.510[0.059](0.058)  0.500[0.058](0.058)  0.510[0.059](0.058)  0.500[0.058](0.058)
0.25    100   0.302[0.184](0.176)  0.256[0.178](0.178)  0.301[0.180](0.173)  0.255[0.171](0.171)  0.292[0.171](0.166)  0.247[0.163](0.163)
        200   0.275[0.121](0.119)  0.251[0.117](0.117)  0.273[0.120](0.118)  0.250[0.116](0.116)  0.274[0.115](0.112)  0.251[0.111](0.111)
        500   0.259[0.074](0.073)  0.250[0.073](0.073)  0.261[0.073](0.072)  0.252[0.072](0.072)  0.260[0.071](0.071)  0.251[0.070](0.070)
0.00    100   0.041[0.204](0.200)  −0.001[0.196](0.196)  0.040[0.197](0.193)  −0.002[0.188](0.188)  0.039[0.187](0.183)  −0.001[0.179](0.179)
        200   0.019[0.136](0.134)  −0.002[0.132](0.132)  0.022[0.133](0.131)  0.002[0.129](0.129)  0.021[0.129](0.127)  0.001[0.125](0.125)
        500   0.009[0.083](0.083)  0.001[0.083](0.083)  0.009[0.082](0.082)  0.001[0.081](0.081)  0.008[0.081](0.080)  0.000[0.080](0.080)
−0.25   100   −0.214[0.217](0.214)  −0.249[0.208](0.208)  −0.217[0.210](0.208)  −0.251[0.202](0.202)  −0.222[0.197](0.195)  −0.254[0.189](0.189)
        200   −0.234[0.145](0.144)  −0.250[0.142](0.142)  −0.233[0.143](0.142)  −0.249[0.140](0.140)  −0.235[0.138](0.137)  −0.251[0.134](0.134)
        500   −0.245[0.089](0.089)  −0.251[0.089](0.089)  −0.245[0.089](0.089)  −0.251[0.089](0.089)  −0.245[0.086](0.086)  −0.251[0.086](0.086)
−0.50   100   −0.472[0.218](0.216)  −0.498[0.209](0.209)  −0.475[0.214](0.212)  −0.500[0.205](0.205)  −0.479[0.201](0.200)  −0.502[0.193](0.193)
        200   −0.489[0.149](0.149)  −0.501[0.146](0.146)  −0.492[0.146](0.146)  −0.503[0.143](0.143)  −0.490[0.139](0.138)  −0.500[0.136](0.136)
        500   −0.495[0.092](0.092)  −0.500[0.091](0.091)  −0.495[0.089](0.089)  −0.500[0.089](0.089)  −0.496[0.087](0.087)  −0.500[0.086](0.086)
Table 8. Empirical mean[RMSE](SD) of estimators of ρ for the SED model with SMA errors: group interaction, k = n^{0.5}, REG-1.

              Normal Errors                               Mixed Normal Errors                         Log-Normal Errors
ρ        n    ρ̂n                    ρ̂n^bc2               ρ̂n                    ρ̂n^bc2               ρ̂n                    ρ̂n^bc2
0.50    100   0.549[0.129](0.120)  0.508[0.128](0.127)  0.548[0.126](0.117)  0.507[0.124](0.124)  0.548[0.121](0.111)  0.507[0.118](0.118)
        200   0.534[0.106](0.100)  0.503[0.104](0.104)  0.534[0.104](0.098)  0.502[0.102](0.102)  0.533[0.099](0.094)  0.502[0.097](0.097)
        500   0.519[0.078](0.076)  0.501[0.078](0.078)  0.520[0.079](0.077)  0.502[0.079](0.079)  0.519[0.077](0.074)  0.502[0.076](0.076)
0.25    100   0.309[0.184](0.174)  0.254[0.183](0.183)  0.310[0.179](0.169)  0.256[0.177](0.177)  0.306[0.167](0.158)  0.253[0.165](0.165)
        200   0.292[0.148](0.142)  0.252[0.147](0.147)  0.292[0.147](0.141)  0.252[0.146](0.146)  0.294[0.140](0.133)  0.254[0.138](0.138)
        500   0.277[0.116](0.113)  0.252[0.116](0.116)  0.276[0.116](0.113)  0.252[0.116](0.116)  0.275[0.111](0.108)  0.251[0.111](0.111)
0.00    100   0.071[0.234](0.223)  0.005[0.234](0.234)  0.069[0.228](0.217)  0.004[0.227](0.227)  0.065[0.211](0.200)  0.002[0.209](0.209)
        200   0.051[0.197](0.190)  0.001[0.198](0.198)  0.053[0.192](0.185)  0.004[0.192](0.192)  0.052[0.180](0.172)  0.004[0.178](0.178)
        500   0.032[0.152](0.149)  −0.001[0.154](0.154)  0.032[0.150](0.146)  0.001[0.150](0.150)  0.034[0.145](0.141)  0.003[0.145](0.145)
−0.25   100   −0.168[0.281](0.269)  −0.246[0.282](0.282)  −0.174[0.269](0.258)  −0.251[0.270](0.270)  −0.172[0.254](0.242)  −0.246[0.253](0.253)
        200   −0.194[0.234](0.227)  −0.253[0.236](0.236)  −0.187[0.233](0.225)  −0.245[0.233](0.233)  −0.192[0.221](0.214)  −0.249[0.222](0.222)
        500   −0.210[0.188](0.184)  −0.248[0.189](0.189)  −0.211[0.188](0.184)  −0.249[0.189](0.189)  −0.213[0.178](0.174)  −0.251[0.179](0.179)
−0.50   100   −0.411[0.321](0.308)  −0.500[0.324](0.324)  −0.408[0.315](0.302)  −0.495[0.316](0.316)  −0.417[0.294](0.282)  −0.503[0.296](0.296)
        200   −0.427[0.276](0.266)  −0.496[0.276](0.276)  −0.427[0.272](0.262)  −0.495[0.273](0.273)  −0.436[0.256](0.247)  −0.502[0.257](0.257)
        500   −0.456[0.219](0.215)  −0.501[0.221](0.221)  −0.453[0.223](0.218)  −0.498[0.224](0.224)  −0.456[0.213](0.208)  −0.501[0.214](0.214)
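Throughout Tables 1–8, each cell reports the empirical mean of the estimator, followed by its root mean squared error in square brackets and its standard deviation in parentheses. The following is a minimal sketch of how such an entry is computed from Monte Carlo draws; the function name and the artificial draws are ours, for illustration only:

```python
import numpy as np

def summarize(est, rho0):
    """Return the empirical mean[RMSE](SD) string used in the tables."""
    est = np.asarray(est, dtype=float)
    mean = est.mean()
    rmse = np.sqrt(np.mean((est - rho0) ** 2))   # RMSE about the true value
    sd = est.std()                               # SD about the empirical mean
    return f"{mean:.3f}[{rmse:.3f}]({sd:.3f})"

# Illustrative draws only, not the paper's simulation output:
rng = np.random.default_rng(2015)
draws = 0.39 + 0.22 * rng.standard_normal(5000)
print(summarize(draws, 0.50))   # roughly "0.390[0.246](0.220)"
```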

6. Conclusions

This paper fills gaps in the literature by providing formal results on the asymptotic distribution, as well as the finite sample bias correction, of the QMLEs for the spatial error dependence model. The primary focus of the paper is the SED model with spatial autoregressive errors of order one; comparable results for spatial moving average errors of order one are presented as well.
The consistency and asymptotic normality of the QMLEs have been established, with specific attention to the effect of the degree of spatial dependence on the rate of convergence of the QMLEs of the model parameters. Specifically, when the degree of spatial dependence, $h_n$, grows with the sample size $n$, the QMLE of the spatial parameter has a lower, $\sqrt{n/h_n}$, rate of convergence, while the QMLEs of the other parameters retain the $\sqrt{n}$-rate of convergence irrespective of the behavior of $h_n$. Among the finite sample properties of spatial models, specific attention has been given to the finite sample bias of the QMLE of the spatial parameter: this parameter enters the model in a highly nonlinear manner, and its estimation therefore constitutes the main source of bias. Simulation studies indicate a pronounced bias in a single direction in the estimation of the spatial parameter, which, in turn, affects the subsequent inferences for the other model parameters. The severity of the bias increases as the spatial weights matrix becomes less sparse.
The finite sample results of this paper demonstrate again that stochastic expansions (Rilstone et al. [43]) coupled with the bootstrap (Yang [28]) provide a general and effective method for finite sample bias correction of a nonlinear estimator. The proposed theory and methodology should appeal to theorists and practitioners alike who deal with the SED model or any other regression model with a spatial dependence structure in the error process (such as SARAR, panel SARAR and spatial dynamic panel data models). It would be interesting, as future research, to address similar issues for these more complicated models. A formal study of the impact of bias correcting spatial/nonlinear estimators on the subsequent inferences for the regression coefficients, in relation to a broader model of non-spherical errors, is also on our agenda for future research.
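To fix ideas, the following is a schematic sketch of a bootstrap bias correction of the kind studied here. It is simpler than the expansion-based corrections $b_2$ and $b_3$ developed in the paper, and the function `qmle_rho` is a hypothetical user-supplied solver, not part of the paper:

```python
import numpy as np

def rho_bc(rho_hat, e_hat, qmle_rho, B=999, seed=1):
    """Plain bootstrap bias correction for the QMLE of rho: a sketch in the
    spirit of, but simpler than, the paper's expansion-based corrections.
    'qmle_rho' maps an error vector to the QMLE of rho (hypothetical)."""
    rng = np.random.default_rng(seed)
    n = e_hat.shape[0]
    reps = np.empty(B)
    for b in range(B):
        e_star = rng.choice(e_hat, size=n, replace=True)   # resample residuals
        reps[b] = qmle_rho(e_star)
    return rho_hat - (reps.mean() - rho_hat)   # subtract the estimated bias
```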

Acknowledgments

The authors wish to thank the participants of the Asian Meeting of the Econometric Society 2013 and the Singapore Economic Review Conference 2013, as well as three anonymous referees for their useful comments and suggestions. Zhenlin Yang gratefully acknowledges the research support from Singapore Management University.

Author Contributions

Zhenlin Yang suggested the problem and the methodology to use. Shew Fan Liu implemented these suggestions to solve the problem. All authors contributed to the final paper.

Appendix

A. Proofs of Asymptotic Results in Section 2

The following lemmas, which are needed in the proofs of the main results, extend some lemmas of Lee [25] and Kelejian and Prucha [38].
Lemma A.1: Suppose the matrix of independent variables $X_n$ has uniformly bounded elements, and the matrix $A_n$ is defined such that Assumptions 3 and 5 are satisfied. Then, the projection matrices $M_n(\rho) = I_n - A_n(\rho)X_n[X_n'A_n'(\rho)A_n(\rho)X_n]^{-1}X_n'A_n'(\rho)$ and $P_n(\rho) = I_n - M_n(\rho)$ are uniformly bounded in both row and column sums, uniformly in $\rho \in \mathcal{P}$.
Lemma A.2: Let $A_n$ be an $n \times n$ matrix, uniformly bounded in both row and column sums. Then, for $M_n = M_n(\rho_0)$ defined in Lemma A.1,
(i) $\mathrm{tr}(A_n^m) = O(n)$ for $m \ge 1$,
(ii) $\mathrm{tr}(A_n'A_n) = O(n)$,
(iii) $\mathrm{tr}((M_nA_n)^m) = \mathrm{tr}(A_n^m) + O(1)$ for $m \ge 1$, and
(iv) $\mathrm{tr}((A_n'M_nA_n)^m) = \mathrm{tr}((A_n'A_n)^m) + O(1)$ for $m \ge 1$.
Suppose further that $B_n$ is an $n \times n$ matrix, uniformly bounded in both row and column sums, and $C_n$ is a matrix whose elements are of uniform order $O(h_n^{-1})$; then,
(v) $A_nB_n$ is uniformly bounded in both row and column sums,
(vi) the elements of $A_nC_n$ and $C_nA_n$ are of uniform order $O(h_n^{-1})$, and
(vii) $\mathrm{tr}(A_nC_n) = \mathrm{tr}(C_nA_n) = O(n/h_n)$.
Lemma A.3 (moments and limiting distribution of quadratic forms): Suppose the innovations $\{\epsilon_{ni}\}$ satisfy Assumption 2, and let $\gamma$ and $\kappa$ be, respectively, the measures of skewness and excess kurtosis of $\epsilon_{ni}$. Further, let $A_n$ be an $n \times n$ matrix with elements denoted by $a_{n,ij}$, and let $Q_n = \epsilon_n'A_n\epsilon_n$; then,
(i) $E(Q_n) = \sigma_0^2\,\mathrm{tr}(A_n)$ and
(ii) $\mathrm{Var}(Q_n) = \sigma_0^4[\mathrm{tr}(A_nA_n' + A_n^2) + \kappa\sum_{i=1}^{n}a_{n,ii}^2]$.
Now, if $A_n$ is uniformly bounded either in row or column sums, with elements of uniform order $O(h_n^{-1})$, then,
(iii) $E(Q_n) = O(n/h_n)$,
(iv) $\mathrm{Var}(Q_n) = O(n/h_n)$,
(v) $Q_n = O_p(n/h_n)$,
(vi) $\frac{h_n}{n}Q_n - \frac{h_n}{n}E(Q_n) = O_p((h_n/n)^{1/2}) = o_p(1)$, and
(vii) $\mathrm{Var}(\frac{h_n}{n}Q_n) = O(h_n/n) = o(1)$.
Further, if the elements of $A_n$ are uniformly bounded in both row and column sums and Assumption 4 is satisfied, then,
(viii) $[Q_n - E(Q_n)]/\sqrt{\mathrm{Var}(Q_n)} \xrightarrow{D} N(0, 1)$.
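Parts (i) and (ii) of Lemma A.3 are easy to verify numerically. A minimal sketch for normal errors (so that $\kappa = 0$), with an arbitrary fixed matrix standing in for $A_n$; the setup is ours and purely illustrative:

```python
import numpy as np

# Monte Carlo check of Lemma A.3(i)-(ii) under normal errors (kappa = 0).
rng = np.random.default_rng(1)
n, R, sig2 = 30, 100_000, 1.5
A = rng.standard_normal((n, n)) / n              # an arbitrary fixed matrix
eps = np.sqrt(sig2) * rng.standard_normal((R, n))
Q = ((eps @ A) * eps).sum(axis=1)                # Q = eps' A eps, per replication
print(Q.mean(), sig2 * np.trace(A))                       # Lemma A.3(i)
print(Q.var(), sig2**2 * np.trace(A @ A.T + A @ A))       # Lemma A.3(ii)
```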
Proof of Theorem 1: Following Theorem 3.4 of White [37], it is sufficient to show: (i) the identification uniqueness condition, $\limsup_{n}\max_{\rho \in N_\epsilon^c(\rho_0)}\frac{h_n}{n}[\bar\ell_n^c(\rho) - \bar\ell_n^c(\rho_0)] < 0$ for any $\epsilon > 0$, where $N_\epsilon^c(\rho_0)$ is the complement of an open neighborhood of $\rho_0$ in $\mathcal{P}$ of radius $\epsilon$; and (ii) uniform convergence in probability, $\frac{h_n}{n}[\ell_n^c(\rho) - \bar\ell_n^c(\rho)] \xrightarrow{p} 0$ uniformly in $\rho \in \mathcal{P}$.
To show (i), first observing from Equation (10) that $\sigma_n^2(\rho_0) = \sigma_0^2$, we have,
$$\lim_n \frac{h_n}{n}[\bar\ell_n^c(\rho) - \bar\ell_n^c(\rho_0)] = \lim_n \Big\{\tfrac{h_n}{n}(\log|A_n(\rho)| - \log|A_n|) - \tfrac{h_n}{2}(\log\sigma_n^2(\rho) - \log\sigma_0^2)\Big\} = \lim_n \Big\{\tfrac{h_n}{2n}(\log|A_n'(\rho)A_n(\rho)| - \log|A_n'A_n|) - \tfrac{h_n}{2n}(\log|\sigma_n^2(\rho)I_n| - \log|\sigma_0^2 I_n|)\Big\} \neq 0 \ \text{for} \ \rho \neq \rho_0, \ \text{by Assumption 6.}$$
Next, let $p_n(\theta) = \exp[\ell_n(\theta)]$ be the quasi joint pdf of $u_n (= Y_n - X_n\beta_0)$ and $p_{n0}(\theta)$ the true joint pdf of $u_n$. Let $E_q$ denote the expectation with respect to $p_n$, to differentiate from the usual notation $E$ that corresponds to $p_{n0}$. By Jensen's inequality (see Rao [45], p. 58), we have,
$$0 = \log E_q\Big[\frac{p_n(\theta)}{p_n(\theta_0)}\Big] \ge E_q\Big[\log\frac{p_n(\theta)}{p_n(\theta_0)}\Big] = E\Big[\log\frac{p_n(\theta)}{p_n(\theta_0)}\Big]$$
where the last equality follows from the fact that $\log p_n(\theta_0)$ and $\log p_n(\theta)$ are either quadratic or linear-quadratic forms of $u_n$, and hence, their expectations w.r.t. $p_n(\theta_0)$ are the same as those w.r.t. $p_{n0}(\theta_0)$. It follows that $E[\log p_n(\theta)] \le E[\log p_n(\theta_0)]$, and that,
$$\bar\ell_n(\rho) = \max_{\beta,\sigma^2} E[\log p_n(\theta)] \le E[\log p_n(\theta_0)] = \bar\ell_n(\rho_0), \quad \text{for} \ \rho \neq \rho_0$$
The identification uniqueness condition thus follows.
To show (ii), note that $\frac{h_n}{n}[\ell_n^c(\rho) - \bar\ell_n^c(\rho)] = -\frac{h_n}{2}[\log(\hat\sigma_n^2(\rho)) - \log(\sigma_n^2(\rho))]$. By the mean value theorem, $h_n[\log(\hat\sigma_n^2(\rho)) - \log(\sigma_n^2(\rho))] = \frac{h_n}{\tilde\sigma_n^2(\rho)}[\hat\sigma_n^2(\rho) - \sigma_n^2(\rho)]$, where $\tilde\sigma_n^2(\rho)$ lies between $\hat\sigma_n^2(\rho)$ and $\sigma_n^2(\rho)$. Note that,
$$\hat\sigma_n^2(\rho) = \frac{1}{n}Y_n'A_n'(\rho)M_n(\rho)A_n(\rho)Y_n = \frac{1}{n}\epsilon_n'A_n'^{-1}A_n'(\rho)M_n(\rho)A_n(\rho)A_n^{-1}\epsilon_n = \frac{1}{n}\epsilon_n'A_n'^{-1}A_n'(\rho)A_n(\rho)A_n^{-1}\epsilon_n - \mathbb{Q}_n(\rho)$$
where $\mathbb{Q}_n(\rho) \equiv \frac{1}{n}\epsilon_n'A_n'^{-1}A_n'(\rho)P_n(\rho)A_n(\rho)A_n^{-1}\epsilon_n$.
By Assumption 3, $V_{1n}(\rho) \equiv \frac{1}{n}X_n'A_n'(\rho)A_n(\rho)X_n = O(1)$. In addition, from Lemma A.2, $\frac{1}{n}\mathrm{tr}(W_nA_n^{-1}) \equiv \frac{1}{n}\mathrm{tr}(G_n) = O(h_n^{-1})$, and using $A_n(\rho) = A_n + (\rho_0 - \rho)W_n$, we have,
$$\mathbb{Q}_n^*(\rho) = \frac{1}{\sqrt n}X_n'A_n'(\rho)A_n(\rho)A_n^{-1}\epsilon_n = \frac{1}{\sqrt n}\big[X_n'A_n'\epsilon_n + (\rho_0 - \rho)X_n'(W_n' + A_n'G_n)\epsilon_n + (\rho_0 - \rho)^2X_n'W_n'G_n\epsilon_n\big] = O_p(1)$$
Hence, $\mathbb{Q}_n(\rho) = \frac{1}{n}\mathbb{Q}_n^{*\prime}(\rho)V_{1n}^{-1}(\rho)\mathbb{Q}_n^*(\rho) = o_p(1)$, uniformly in $\rho \in \mathcal{P}$. It follows by Lemma A.3(vi) that $h_n[\hat\sigma_n^2(\rho) - \sigma_n^2(\rho)] = \frac{h_n}{n}\big\{\epsilon_n'A_n'^{-1}A_n'(\rho)A_n(\rho)A_n^{-1}\epsilon_n - \sigma_0^2\mathrm{tr}[A_n'^{-1}A_n'(\rho)A_n(\rho)A_n^{-1}]\big\} + o_p(1) = o_p(1)$, uniformly in $\rho \in \mathcal{P}$.
It is left to show that $\sigma_n^2(\rho)$ is uniformly bounded away from zero, which is done by contradiction. Suppose $\sigma_n^2(\rho)$ is not uniformly bounded away from zero on $\mathcal{P}$. Then, there exists a sequence $\rho_n \in \mathcal{P}$ such that $\sigma_n^2(\rho_n) \to 0$ as $n \to \infty$. Consider a simpler model obtained by setting $\beta$ in Equation (1) to zero. The Gaussian log-likelihood is $\ell_{t,n}(\theta) = -\frac{n}{2}\log(2\pi\sigma^2) + \log|A_n(\rho)| - \frac{1}{2\sigma^2}Y_n'A_n'(\rho)A_n(\rho)Y_n$. Then, $\bar\ell_{t,n}(\rho) = \max_{\sigma^2}E[\ell_{t,n}(\theta)] = -\frac{n}{2}[\log(2\pi) + 1] - \frac{n}{2}\log(\sigma_n^2(\rho)) + \log|A_n(\rho)|$. By Jensen's inequality, $\bar\ell_{t,n}(\rho) \le E[\ell_{t,n}(\theta_0)] = \bar\ell_{t,n}(\rho_0)$, $\forall\rho$. This implies $\frac{1}{n}[\bar\ell_{t,n}(\rho) - \bar\ell_{t,n}(\rho_0)] \le 0$ and $-\frac{1}{2}\log(\sigma_n^2(\rho)) \le -\frac{1}{2}\log(\sigma_0^2) + \frac{1}{n}(\log|A_n(\rho_0)| - \log|A_n(\rho)|) = O(1)$ using Lemma A.2; that is, $-\log(\sigma_n^2(\rho))$ is bounded from above, which is a contradiction. Hence, $\sigma_n^2(\rho)$ is bounded away from zero uniformly in $\rho \in \mathcal{P}$, and $\log(\sigma_n^2(\rho))$ is well defined $\forall\rho \in \mathcal{P}$.
Since $\sigma_n^2(\rho)$ is bounded away from zero and $h_n[\hat\sigma_n^2(\rho) - \sigma_n^2(\rho)] = o_p(1)$, $\hat\sigma_n^2(\rho)$ is bounded away from zero uniformly in probability in $\mathcal{P}$ as well. Collecting all of these results together with the mean value theorem, we have $h_n|\log(\hat\sigma_n^2(\rho)) - \log(\sigma_n^2(\rho))| = o_p(1)$ uniformly in $\rho \in \mathcal{P}$. Hence, $\sup_{\rho\in\mathcal{P}}\frac{h_n}{n}|\ell_n^c(\rho) - \bar\ell_n^c(\rho)| = o_p(1)$.
Proof of Theorem 2: Applying the mean value theorem to the modified first-order condition, we have,
$$0 = \frac{1}{\sqrt n}S_n^*(\hat\theta_n) = \frac{1}{\sqrt n}S_n^*(\theta_0) + \frac{1}{\sqrt n}\frac{\partial}{\partial\theta'}S_n^*(\tilde\theta_n)(\hat\theta_n - \theta_0) = \frac{1}{\sqrt n}S_n^*(\theta_0) - \frac{1}{n}K_nH_n(\tilde\theta_n)K_n\cdot\sqrt nK_n^{-1}(\hat\theta_n - \theta_0)$$
where $\tilde\theta_n$ lies on the line segment joining $\theta_0$ and $\hat\theta_n$; thus, $\tilde\theta_n \xrightarrow{p} \theta_0$. Here, $H_n(\theta)$ is the negative Hessian matrix, and $K_n$ is as defined in Section 2.2.
Under Assumptions 1–5, the central limit theorem for linear-quadratic forms of Kelejian and Prucha [38] is applicable, which gives $\frac{1}{\sqrt n}S_n^*(\theta_0) = K_n\frac{1}{\sqrt n}\frac{\partial}{\partial\theta}\ell_n(\theta_0) \xrightarrow{D} N(0, \Gamma^*)$, where $\Gamma^* = \lim_{n\to\infty}\frac{1}{n}\Gamma_n^*$ and $\Gamma_n^* = \mathrm{Var}[S_n^*(\theta_0)]$. The asymptotic normality of $\hat\theta_n$ thus follows from: (i) $\frac{1}{n}K_n[H_n(\tilde\theta_n) - H_n(\theta_0)]K_n = o_p(1)$ and (ii) $\frac{1}{n}K_n[H_n(\theta_0) - \Sigma_n]K_n = o_p(1)$, where $\Sigma_n = E[H_n(\theta_0)]$ is the information matrix given in Section 2.2. To show (i), note that
$$H_n(\theta) = \begin{pmatrix} \frac{1}{\sigma^2}X_n'A_n'(\rho)A_n(\rho)X_n & \frac{1}{\sigma^4}X_n'A_n'(\rho)\epsilon_n(\delta) & \frac{2}{\sigma^2}X_n'A_n'(\rho)G_n(\rho)\epsilon_n(\delta) \\ \frac{1}{\sigma^4}\epsilon_n'(\delta)A_n(\rho)X_n & \frac{1}{2\sigma^6}[2\epsilon_n'(\delta)\epsilon_n(\delta) - n\sigma^2] & \frac{1}{\sigma^4}\epsilon_n'(\delta)G_n(\rho)\epsilon_n(\delta) \\ \frac{2}{\sigma^2}\epsilon_n'(\delta)G_n'(\rho)A_n(\rho)X_n & \frac{1}{\sigma^4}\epsilon_n'(\delta)G_n(\rho)\epsilon_n(\delta) & \frac{1}{\sigma^2}[\epsilon_n'(\delta)G_n'(\rho)G_n(\rho)\epsilon_n(\delta) + \sigma^2\mathrm{tr}(G_n^2(\rho))] \end{pmatrix}$$
where $\delta = (\beta', \rho)'$. Let $\tilde A_n = A_n(\tilde\rho_n)$. Under Assumption 3 and using $\tilde\theta_n \xrightarrow{p} \theta_0$, we have,
$$\frac{1}{n}\Big[\frac{\partial^2}{\partial\beta\partial\beta'}\ell_n(\tilde\theta_n) - \frac{\partial^2}{\partial\beta\partial\beta'}\ell_n(\theta_0)\Big] = \frac{1}{n}\Big(\frac{1}{\sigma_0^2}X_n'A_n'A_nX_n - \frac{1}{\tilde\sigma_n^2}X_n'\tilde A_n'\tilde A_nX_n\Big) = \Big(\frac{1}{\sigma_0^2} - \frac{1}{\tilde\sigma_n^2}\Big)\frac{1}{n}X_n'A_n'A_nX_n + o_p(1) = o_p(1)$$
noticing that $A_n'A_n - \tilde A_n'\tilde A_n = (\tilde\rho_n - \rho_0)(W_n + W_n') - (\tilde\rho_n^2 - \rho_0^2)W_n'W_n$.
Similarly, it can be shown that, letting $\tilde\epsilon_n = \epsilon_n(\tilde\delta_n)$,
$$\frac{1}{n}\Big[\frac{\partial^2}{\partial(\sigma^2)^2}\ell_n(\tilde\theta_n) - \frac{\partial^2}{\partial(\sigma^2)^2}\ell_n(\theta_0)\Big] = \frac{1}{n\sigma_0^6}\epsilon_n'\epsilon_n - \frac{1}{n\tilde\sigma_n^6}\tilde\epsilon_n'\tilde\epsilon_n - \frac{1}{2}\Big(\frac{1}{\sigma_0^4} - \frac{1}{\tilde\sigma_n^4}\Big) = \frac{1}{n\sigma_0^6}(\epsilon_n'\epsilon_n - \tilde\epsilon_n'\tilde\epsilon_n) + o_p(1) = o_p(1)$$
since $\tilde\epsilon_n'\tilde\epsilon_n - \epsilon_n'\epsilon_n = 2(\rho_0 - \tilde\rho_n)\epsilon_n'G_n\epsilon_n + 2\epsilon_n'A_nX_n(\beta_0 - \tilde\beta_n) + (\rho_0 - \tilde\rho_n)^2\epsilon_n'G_n'G_n\epsilon_n + 2(\rho_0 - \tilde\rho_n)\epsilon_n'W_nX_n(\beta_0 - \tilde\beta_n) + 2(\rho_0 - \tilde\rho_n)\epsilon_n'G_n'A_nX_n(\beta_0 - \tilde\beta_n) + (\beta_0 - \tilde\beta_n)'X_n'A_n'A_nX_n(\beta_0 - \tilde\beta_n) + 2(\rho_0 - \tilde\rho_n)^2\epsilon_n'G_n'W_nX_n(\beta_0 - \tilde\beta_n) + 2(\rho_0 - \tilde\rho_n)(\beta_0 - \tilde\beta_n)'X_n'A_n'W_nX_n(\beta_0 - \tilde\beta_n) + (\rho_0 - \tilde\rho_n)^2(\beta_0 - \tilde\beta_n)'X_n'W_n'W_nX_n(\beta_0 - \tilde\beta_n) = o_p(1)$.
Now, by the mean value theorem, $\mathrm{tr}(G_n^2(\tilde\rho_n)) = \mathrm{tr}(G_n^2) + 2\mathrm{tr}[G_n^3(\bar\rho_n)](\tilde\rho_n - \rho_0)$, where $\bar\rho_n$ lies between $\rho_0$ and $\tilde\rho_n$. By Lemma A.2 and Assumptions 4 and 5, $\mathrm{tr}[G_n^3(\bar\rho_n)] = O(n/h_n)$. Hence, $\frac{h_n}{n}[\mathrm{tr}(G_n^2(\tilde\rho_n)) - \mathrm{tr}(G_n^2)] = o_p(1)$, since $\tilde\rho_n \xrightarrow{p} \rho_0$.
Further, $\epsilon_n'G_n'G_n\epsilon_n = Y_n'W_n'W_nY_n - 2Y_n'W_n'W_nX_n\beta_0 + \beta_0'X_n'W_n'W_nX_n\beta_0 = O_p(n/h_n)$ by Lemmas A.2(i) and A.3(v). Hence, $\frac{h_n}{n}[\tilde\epsilon_n'\tilde G_n'\tilde G_n\tilde\epsilon_n - \epsilon_n'G_n'G_n\epsilon_n] = \frac{h_n}{n}[(\beta_0 - \tilde\beta_n)'X_n'W_n'W_nX_n(\beta_0 - \tilde\beta_n) + 2\epsilon_n'G_n'W_nX_n(\beta_0 - \tilde\beta_n)] = o_p(1)$; hence,
$$\frac{h_n}{n}\Big[\frac{\partial^2}{\partial\rho^2}\ell_n(\tilde\theta_n) - \frac{\partial^2}{\partial\rho^2}\ell_n(\theta_0)\Big] = \frac{h_n}{n}\Big[\frac{1}{\sigma_0^2}\epsilon_n'G_n'G_n\epsilon_n - \frac{1}{\tilde\sigma_n^2}\tilde\epsilon_n'\tilde G_n'\tilde G_n\tilde\epsilon_n + \mathrm{tr}(G_n^2) - \mathrm{tr}(\tilde G_n^2)\Big] = \frac{h_n}{n}\Big(\frac{1}{\sigma_0^2} - \frac{1}{\tilde\sigma_n^2}\Big)\epsilon_n'G_n'G_n\epsilon_n + o_p(1) = o_p(1)$$
Using similar arguments, the remaining terms of the modified Hessian can be shown to converge in probability to zero:
$$\frac{h_n}{n}\Big[\frac{\partial^2}{\partial\beta\partial\rho}\ell_n(\tilde\theta_n) - \frac{\partial^2}{\partial\beta\partial\rho}\ell_n(\theta_0)\Big] = \frac{2h_n}{n\sigma_0^2}(X_n'W_n'\epsilon_n - X_n'W_n'\tilde\epsilon_n) + o_p(1) = o_p(1),$$
$$\frac{1}{n}\Big[\frac{\partial^2}{\partial\beta\partial\sigma^2}\ell_n(\tilde\theta_n) - \frac{\partial^2}{\partial\beta\partial\sigma^2}\ell_n(\theta_0)\Big] = \frac{1}{n\sigma_0^4}\big[(X_n'A_n'\epsilon_n) - (X_n'\tilde A_n'\tilde\epsilon_n)\big] + o_p(1) = o_p(1), \ \text{and}$$
$$\frac{h_n}{n}\Big[\frac{\partial^2}{\partial\sigma^2\partial\rho}\ell_n(\tilde\theta_n) - \frac{\partial^2}{\partial\sigma^2\partial\rho}\ell_n(\theta_0)\Big] = \frac{h_n}{n\sigma_0^4}(\epsilon_n'G_n\epsilon_n - \tilde\epsilon_n'\tilde G_n\tilde\epsilon_n) + o_p(1) = \frac{h_n}{n\sigma_0^4}\big[\epsilon_n'W_n(Y_n - X_n\beta_0) - \tilde\epsilon_n'W_n(Y_n - X_n\tilde\beta_n)\big] + o_p(1) = \frac{h_n}{n\sigma_0^4}\big[(\epsilon_n - \tilde\epsilon_n)'W_nY_n - \epsilon_n'W_nX_n\beta_0 + \tilde\epsilon_n'W_nX_n\tilde\beta_n\big] + o_p(1) = o_p(1)$$
The proof of (ii) is more straightforward, as the differences of the corresponding elements of $\frac{1}{n}K_nH_n(\theta_0)K_n$ and $\frac{1}{n}K_n\Sigma_nK_n$ are, respectively, zero, $\frac{1}{n\sigma_0^4}X_n'A_n'\epsilon_n = o_p(1)$, $\frac{1}{2n\sigma_0^6}(2\epsilon_n'\epsilon_n - n\sigma_0^2) - \frac{1}{2\sigma_0^4} = \frac{1}{n\sigma_0^6}(\epsilon_n'\epsilon_n - n\sigma_0^2) = o_p(1)$, $\frac{2h_n}{n\sigma_0^2}X_n'A_n'G_n\epsilon_n = o_p(1)$, $\frac{h_n}{n\sigma_0^4}\epsilon_n'G_n\epsilon_n - \frac{h_n}{n\sigma_0^2}\mathrm{tr}(G_n) = o_p(1)$ and $\frac{h_n}{n\sigma_0^2}[\epsilon_n'G_n'G_n\epsilon_n + \sigma_0^2\mathrm{tr}(G_n^2)] - \frac{h_n}{n}\mathrm{tr}(G_n^sG_n) = \frac{h_n}{n\sigma_0^2}[\epsilon_n'G_n'G_n\epsilon_n - \sigma_0^2\mathrm{tr}(G_n'G_n)] = o_p(1)$.
Results (i) and (ii) give $0 = \frac{1}{\sqrt n}S_n^* - \frac{1}{n}\Sigma_n^*\cdot\sqrt nK_n^{-1}(\hat\theta_n - \theta_0) + o_p(1)$, and it follows that,
$$\sqrt nK_n^{-1}(\hat\theta_n - \theta_0) = \Big(\frac{1}{n}\Sigma_n^*\Big)^{-1}\frac{1}{\sqrt n}S_n^* + o_p(1) \xrightarrow{D} N\big(0,\ \Sigma^{*-1}\Gamma^*\Sigma^{*-1}\big)$$
Proof of Corollary 1: Using the block-diagonal structure of $\Sigma_n$,
$$\Sigma_n^{-1} = \begin{pmatrix} \sigma_0^2(X_n'A_n'A_nX_n)^{-1} & 0 & 0 \\ 0 & \frac{2\sigma_0^4}{n}T_{1n} & -\frac{2\sigma_0^2}{n}T_{2n} \\ 0 & -\frac{2\sigma_0^2}{n}T_{2n} & \frac{h_n}{n}T_{4n} \end{pmatrix}$$
where $T_{1n} = \frac{\mathrm{tr}(G_n^sG_n)}{\mathrm{tr}(C_n^sC_n)}$, $T_{2n} = \frac{\mathrm{tr}(G_n)}{\mathrm{tr}(C_n^sC_n)}$ and $T_{4n} = \frac{n}{h_n}\mathrm{tr}^{-1}(C_n^sC_n)$. Deriving $\Sigma_n^{*-1}\Gamma_n^*\Sigma_n^{*-1} = K_n^{-1}\Sigma_n^{-1}\Gamma_n\Sigma_n^{-1}K_n^{-1}$ is then just a matter of matrix multiplication.

B. Proofs of Higher-Order Results in Section 3

We prove the higher-order results given in Section 3. First, we present the full expressions for $D_{jn}(\rho)$, $j = 2, 3, 4$, which are required in the expressions for $R_{jn}(\rho)$ given in Equation (20):
$$D_{2n}(\rho) = G_n'(\rho)M_n(\rho)G_n(\rho) - 2G_n(\rho)P_n(\rho)G_n(\rho) - G_n(\rho)P_n(\rho)G_n'(\rho),$$
$$D_{3n}(\rho) = \dot D_{2n}(\rho) + G_n(\rho)P_n(\rho)D_{2n}(\rho) + D_{2n}(\rho)P_n(\rho)G_n'(\rho) - G_n'(\rho)M_n(\rho)D_{2n}(\rho) - D_{2n}(\rho)M_n(\rho)G_n(\rho),$$
$$D_{4n}(\rho) = \dot D_{3n}(\rho) + G_n(\rho)P_n(\rho)D_{3n}(\rho) + D_{3n}(\rho)P_n(\rho)G_n'(\rho) - G_n'(\rho)M_n(\rho)D_{3n}(\rho) - D_{3n}(\rho)M_n(\rho)G_n(\rho)$$
where $P_n(\rho) = I_n - M_n(\rho)$ and $\dot D_{jn}(\rho) = \frac{d}{d\rho}D_{jn}(\rho)$, $j = 2, 3$. Note that a predictable pattern emerges from $D_{3n}(\rho)$ onwards. Using the fact that $\frac{d}{d\rho}G_n^i(\rho) = G_n^{i+1}(\rho)$ for $i = 1, 2, \ldots$, we have,
$$\dot D_{2n}(\rho) = G_n^{2\prime}(\rho)M_n(\rho)G_n(\rho) + G_n'(\rho)\dot M_n(\rho)G_n(\rho) + G_n'(\rho)M_n(\rho)G_n^2(\rho) - 2G_n^2(\rho)P_n(\rho)G_n(\rho) + 2G_n(\rho)\dot M_n(\rho)G_n(\rho) - 2G_n(\rho)P_n(\rho)G_n^2(\rho) - G_n^2(\rho)P_n(\rho)G_n'(\rho) + G_n(\rho)\dot M_n(\rho)G_n'(\rho) - G_n(\rho)P_n(\rho)G_n^{2\prime}(\rho),$$
$$\dot M_n(\rho) = P_n(\rho)G_n'(\rho)M_n(\rho) + M_n(\rho)G_n(\rho)P_n(\rho),$$
$$\dot D_{3n}(\rho) = G_n^{3\prime}(\rho)M_n(\rho)G_n(\rho) + 2G_n^{2\prime}(\rho)\dot M_n(\rho)G_n(\rho) + 2G_n^{2\prime}(\rho)M_n(\rho)G_n^2(\rho) + G_n'(\rho)\ddot M_n(\rho)G_n(\rho) + 2G_n'(\rho)\dot M_n(\rho)G_n^2(\rho) + G_n'(\rho)M_n(\rho)G_n^3(\rho) - 2G_n^3(\rho)P_n(\rho)G_n(\rho) + 4G_n^2(\rho)\dot M_n(\rho)G_n(\rho) - 4G_n^2(\rho)P_n(\rho)G_n^2(\rho) + 2G_n(\rho)\ddot M_n(\rho)G_n(\rho) + 4G_n(\rho)\dot M_n(\rho)G_n^2(\rho) - 2G_n(\rho)P_n(\rho)G_n^3(\rho) - G_n^3(\rho)P_n(\rho)G_n'(\rho) + 2G_n^2(\rho)\dot M_n(\rho)G_n'(\rho) - 2G_n^2(\rho)P_n(\rho)G_n^{2\prime}(\rho) + G_n(\rho)\ddot M_n(\rho)G_n'(\rho) + 2G_n(\rho)\dot M_n(\rho)G_n^{2\prime}(\rho) - G_n(\rho)P_n(\rho)G_n^{3\prime}(\rho),$$
$$\ddot M_n(\rho) = 2P_n(\rho)G_n'(\rho)P_n(\rho)G_n'(\rho)M_n(\rho) + 2P_n(\rho)G_n'(\rho)M_n(\rho)G_n(\rho)P_n(\rho) + 2M_n(\rho)G_n(\rho)P_n(\rho)G_n(\rho)P_n(\rho) - 2M_n(\rho)G_n(\rho)P_n(\rho)G_n'(\rho)M_n(\rho)$$
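The derivative $\dot M_n(\rho)$ admits a quick numerical check. The following minimal sketch verifies $\dot M_n(\rho) = P_n(\rho)G_n'(\rho)M_n(\rho) + M_n(\rho)G_n(\rho)P_n(\rho)$ by central finite differences, with $A_n(\rho) = I_n - \rho W_n$; the randomly generated row-normalized weights matrix is our own construction, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k, rho = 20, 3, 0.3
W = rng.random((n, n)); np.fill_diagonal(W, 0.0); W /= W.sum(1, keepdims=True)
X = rng.standard_normal((n, k))

def M(r):
    """Projection matrix M_n(r) = I - A(r)X [X'A(r)'A(r)X]^{-1} X'A(r)'."""
    AX = (np.eye(n) - r * W) @ X
    return np.eye(n) - AX @ np.linalg.solve(AX.T @ AX, AX.T)

G = W @ np.linalg.inv(np.eye(n) - rho * W)      # G_n(rho) = W_n A_n(rho)^{-1}
Mn = M(rho); Pn = np.eye(n) - Mn
analytic = Pn @ G.T @ Mn + Mn @ G @ Pn           # the displayed formula
numeric = (M(rho + 1e-6) - M(rho - 1e-6)) / 2e-6 # central difference
print(np.abs(analytic - numeric).max())          # small, about 1e-6 or below
```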
For the SED model with SMA errors, the additional quantities required by Equation (30) are,
$$D_{2n}(\rho) = G_n'(\rho)M_n(\rho)G_n(\rho) + 2G_n(\rho)M_n(\rho)G_n(\rho) - G_n(\rho)P_n(\rho)G_n'(\rho),$$
$$D_{3n}(\rho) = \dot D_{2n}(\rho) - G_n(\rho)P_n(\rho)D_{2n}(\rho) - D_{2n}(\rho)P_n(\rho)G_n'(\rho) + G_n'(\rho)M_n(\rho)D_{2n}(\rho) + D_{2n}(\rho)M_n(\rho)G_n(\rho),$$
$$\dot D_{2n}(\rho) = G_n^{2\prime}(\rho)M_n(\rho)G_n(\rho) + G_n'(\rho)\dot M_n(\rho)G_n(\rho) + G_n'(\rho)M_n(\rho)G_n^2(\rho) + 2G_n^2(\rho)M_n(\rho)G_n(\rho) + 2G_n(\rho)\dot M_n(\rho)G_n(\rho) + 2G_n(\rho)M_n(\rho)G_n^2(\rho) - G_n^2(\rho)P_n(\rho)G_n'(\rho) + G_n(\rho)\dot M_n(\rho)G_n'(\rho) - G_n(\rho)P_n(\rho)G_n^{2\prime}(\rho),$$
$$\dot M_n(\rho) = -P_n(\rho)G_n'(\rho)M_n(\rho) - M_n(\rho)G_n(\rho)P_n(\rho), \quad \text{and} \quad P_n(\rho) = I_n - M_n(\rho)$$
Proof of Lemma 1: Note, $\hat\sigma_n^2(\rho_0) \equiv \hat\sigma_{n0}^2 = \frac{1}{n}Y_n'A_n'M_nA_nY_n = \frac{1}{n}\epsilon_n'M_n\epsilon_n$. By the moments of quadratic forms, we have $\mathrm{Var}(\hat\sigma_{n0}^2) = \frac{1}{n^2}O(n) = O(\frac{1}{n})$. Now, by the generalized Chebyshev inequality, $P(\sqrt n|\hat\sigma_{n0}^2 - \sigma_0^2| \ge \delta) \le \frac{1}{\delta^2}n\,\mathrm{Var}(\hat\sigma_{n0}^2) = O(1)$. Hence, by the definition of the order of magnitude for stochastic components (note 6), we have $\hat\sigma_{n0}^2 = \sigma_0^2 + O_p(n^{-1/2})$.
To prove that $\hat\sigma_{n0}^{-2}$ is $\sqrt n$-consistent, by the mean value theorem, we have $\frac{1}{\hat\sigma_{n0}^2} - \frac{1}{\sigma_0^2} = -\frac{1}{\bar\sigma_{n0}^4}(\hat\sigma_{n0}^2 - \sigma_0^2)$, which can be written as $\frac{1}{\hat\sigma_{n0}^2} = \frac{1}{\sigma_0^2} - \frac{1}{\sigma_0^4}(\hat\sigma_{n0}^2 - \sigma_0^2) - \big(\frac{1}{\bar\sigma_{n0}^4} - \frac{1}{\sigma_0^4}\big)(\hat\sigma_{n0}^2 - \sigma_0^2)$, where $\bar\sigma_{n0}^2$ lies between $\hat\sigma_{n0}^2$ and $\sigma_0^2$. Hence, $\bar\sigma_{n0}^2 = \sigma_0^2 + O_p(n^{-1/2})$, $\bar\sigma_{n0}^4 = [\sigma_0^2 + O_p(n^{-1/2})]^2 = \sigma_0^4 + O_p(n^{-1/2})$, and $\bar\sigma_{n0}^{-4} = [\sigma_0^4 + O_p(n^{-1/2})]^{-1} = \sigma_0^{-4} + O_p(n^{-1/2})$. Therefore, we conclude that $\hat\sigma_{n0}^{-2} = \sigma_0^{-2} + O_p(n^{-1/2})$.
Now, consider $h_nR_{1n} = \frac{h_n}{n\hat\sigma_{n0}^2}\epsilon_n'M_nG_nM_n\epsilon_n$. By Lemma A.3(v), $\frac{h_n}{n}\epsilon_n'M_nG_nM_n\epsilon_n = O_p(1)$. Hence,
$$h_nR_{1n} = \frac{1}{\sigma_0^2}\frac{h_n}{n}\epsilon_n'M_nG_nM_n\epsilon_n + O_p(n^{-1/2}) = O_p(1) \tag{B-1}$$
Using the expression for $\hat\sigma_{n0}^{-2}$, $E(h_nR_{1n}) = \frac{1}{\sigma_0^2}E\big(\frac{h_n}{n}\epsilon_n'M_nG_nM_n\epsilon_n\big) - \frac{1}{\sigma_0^4}E\big[\frac{h_n}{n}\epsilon_n'M_nG_nM_n\epsilon_n(\hat\sigma_{n0}^2 - \sigma_0^2)\big] - E\big[\frac{h_n}{n}\epsilon_n'M_nG_nM_n\epsilon_n\big(\frac{1}{\bar\sigma_{n0}^4} - \frac{1}{\sigma_0^4}\big)(\hat\sigma_{n0}^2 - \sigma_0^2)\big]$. The first term is $\frac{h_n}{\sigma_0^2 n}\sigma_0^2\mathrm{tr}(M_nG_nM_n) = O(1)$. The third term is $O((h_n/n)^{1/2})$ by Assumption 7. For the second term, note that $E(\hat\sigma_{n0}^2) = \sigma_0^2 + O(n^{-1})$ and $E(\epsilon_n'M_nG_nM_n\epsilon_n) = \sigma_0^2\mathrm{tr}(M_nG_nM_n) = O(n/h_n)$. Then, by the Cauchy–Schwarz inequality,
$$\big|E[\epsilon_n'M_nG_nM_n\epsilon_n(\hat\sigma_{n0}^2 - \sigma_0^2)]\big| = \big|E\{[\epsilon_n'M_nG_nM_n\epsilon_n - E(\epsilon_n'M_nG_nM_n\epsilon_n) + E(\epsilon_n'M_nG_nM_n\epsilon_n)](\hat\sigma_{n0}^2 - \sigma_0^2)\}\big| \le \big|E\{[\epsilon_n'M_nG_nM_n\epsilon_n - \sigma_0^2\mathrm{tr}(M_nG_nM_n)](\hat\sigma_{n0}^2 - \sigma_0^2)\}\big| + \sigma_0^2\big|\mathrm{tr}(M_nG_nM_n)E(\hat\sigma_{n0}^2 - \sigma_0^2)\big| = \big|\mathrm{Cov}[\epsilon_n'M_nG_nM_n\epsilon_n,\ \hat\sigma_{n0}^2]\big| + O(h_n^{-1}) \le \frac{1}{n}\big[\mathrm{Var}(\epsilon_n'M_nG_nM_n\epsilon_n)\mathrm{Var}(\epsilon_n'M_n\epsilon_n)\big]^{1/2} + O(h_n^{-1}) = \frac{1}{n}\big[O(n/h_n)O(n)\big]^{1/2} + O(h_n^{-1}) = O(h_n^{-1/2})$$
where we have used the results on the moments of quadratic forms. Then, $\frac{1}{\sigma_0^4}E\big[\frac{h_n}{n}\epsilon_n'M_nG_nM_n\epsilon_n(\hat\sigma_{n0}^2 - \sigma_0^2)\big] = O(\sqrt{h_n}/n)$, which implies,
$$E(h_nR_{1n}) = \max\big\{O(1),\ O(\sqrt{h_n}/n),\ O((h_n/n)^{1/2})\big\} = O(1) \tag{B-2}$$
By Equations (B-1) and (B-2), $h_nR_{1n} - E(h_nR_{1n}) = \frac{h_n}{\sigma_0^2 n}\epsilon_n'M_nG_nM_n\epsilon_n - \frac{h_n}{\sigma_0^2 n}E(\epsilon_n'M_nG_nM_n\epsilon_n) + O_p(n^{-1/2}) + O(\sqrt{h_n}/n) + O((h_n/n)^{1/2}) = O_p((h_n/n)^{1/2})$.
The remaining parts can be proven in a similar fashion using Lemma A.2, noting that the matrices $D_{jn}$ appearing in the sandwich forms of the $R_{jn}$, $j = 2, 3, 4$, of the higher-order derivatives of the concentrated estimating equation satisfy $\mathrm{tr}(M_nD_{jn}M_n) = O(n/h_n)$.
Proof of Proposition 1: We prove the proposition using Lemma 1. To that effect, consider the Taylor series expansion of $\tilde\psi_n(\rho)$ around $\rho_0$,
$$0 = \tilde\psi_n(\hat\rho_n) = \tilde\psi_n + H_{1n}(\hat\rho_n - \rho_0) + \tfrac{1}{2}H_{2n}(\hat\rho_n - \rho_0)^2 + \tfrac{1}{6}H_{3n}(\hat\rho_n - \rho_0)^3 + \tfrac{1}{6}[H_{3n}(\bar\rho) - H_{3n}](\hat\rho_n - \rho_0)^3$$
where the last two terms together give the mean value form of the remainder, with $\bar\rho$ lying between $\rho_0$ and $\hat\rho_n$. We have already shown that $\hat\rho_n - \rho_0 = O_p((h_n/n)^{1/2})$. Next, note that $h_nT_{rn} = O(1)$ for $r = 0, 1, 2, 3$ by Assumptions 4 and 5. Now, in order to prove the result of the proposition, we need to establish the following conditions:
(i) $\tilde\psi_n = O_p((h_n/n)^{1/2})$ and $E(\tilde\psi_n) = O(h_n/n)$,
(ii) $E(H_{rn}) = O(1)$ and $H_{rn} - E(H_{rn}) = O_p((h_n/n)^{1/2})$ for $r = 1, 2, 3$,
(iii) $H_{1n}^{-1} = O_p(1)$ and $[E(H_{1n})]^{-1} = O(1)$, and
(iv) $H_{3n}(\bar\rho) - H_{3n} = O_p((h_n/n)^{1/2})$.
For (i), by Lemma A.2, $\epsilon_n'M_nG_nM_n\epsilon_n - \sigma_0^2\mathrm{tr}(M_nG_nM_n) = O_p((n/h_n)^{1/2})$ and:
$$\mathrm{tr}(M_nG_nM_n) = \mathrm{tr}(G_n) + O(1) = nT_{0n} + O(1) \tag{B-3}$$
Therefore, $\tilde\psi_n = -h_nT_{0n} + h_nR_{1n} = -h_nT_{0n} + \frac{h_n}{\sigma_0^2 n}\epsilon_n'M_nG_nM_n\epsilon_n + O_p(n^{-1/2}) = -h_nT_{0n} + \frac{h_n}{\sigma_0^2 n}\big[\sigma_0^2\mathrm{tr}(G_n) + O_p((n/h_n)^{1/2})\big] + O_p(n^{-1/2}) = O_p((h_n/n)^{1/2})$, and $E(\tilde\psi_n) = -h_nT_{0n} + \frac{h_n}{n}\mathrm{tr}(M_nG_nM_n) + O(h_n/n) = -h_nT_{0n} + \frac{h_n}{n}[\mathrm{tr}(G_n) + O(1)] + O(h_n/n) = O(h_n/n)$.
For (ii), Lemma 1 implies $(h_nR_{1n})^s = [E(h_nR_{1n})]^s + O_p((h_n/n)^{1/2})$ for $s = 2, 3, 4$; $(h_nR_{2n})^2 = [E(h_nR_{2n})]^2 + O_p((h_n/n)^{1/2})$; $(h_nR_{1n})^s h_nR_{2n} = [E(h_nR_{1n})]^s E(h_nR_{2n}) + O_p((h_n/n)^{1/2})$ for $s = 1, 2$; and $h_nR_{1n}h_nR_{3n} = E(h_nR_{1n})E(h_nR_{3n}) + O_p((h_n/n)^{1/2})$.
Therefore, Assumption 8 implies $E[(h_nR_{1n})^s] = [E(h_nR_{1n})]^s + O((h_n/n)^{1/2})$ for $s = 2, 3, 4$; $E[(h_nR_{2n})^2] = [E(h_nR_{2n})]^2 + O((h_n/n)^{1/2})$; $E[(h_nR_{1n})^s h_nR_{2n}] = [E(h_nR_{1n})]^s E(h_nR_{2n}) + O((h_n/n)^{1/2})$ for $s = 1, 2$; and $E[h_nR_{1n}h_nR_{3n}] = E(h_nR_{1n})E(h_nR_{3n}) + O((h_n/n)^{1/2})$. Combining these results with (B-3) and Lemma 1, we conclude that $H_{rn} - E(H_{rn}) = O_p((h_n/n)^{1/2})$ and $E(H_{rn}) = O(1)$ for $r = 1, 2, 3$.
For (iii), by Lemma 1 and $E[(h_nR_{1n})^2] = [E(h_nR_{1n})]^2 + O((h_n/n)^{1/2})$,
$$E(H_{1n}) = \frac{2}{h_n}E[(h_nR_{1n})^2] - h_nT_{1n} - E(h_nR_{2n}) = \frac{2}{h_n}\Big[\frac{h_n}{n}\mathrm{tr}(M_nG_nM_n) + O((h_n/n)^{1/2})\Big]^2 - h_nT_{1n} - \Big[\frac{h_n}{n}\mathrm{tr}(M_nD_{2n}M_n) + O((h_n/n)^{1/2})\Big] = \frac{2}{h_n}\Big[\frac{h_n}{n}\mathrm{tr}(G_n)\Big]^2 - \frac{h_n}{n}\mathrm{tr}(G_n^2) - \frac{h_n}{n}\mathrm{tr}(G_n'G_n) + O((h_n/n)^{1/2}) = -\frac{h_n}{n}\big[\mathrm{tr}(G_n^2) + \mathrm{tr}(G_n'G_n) - 2T_{0n}^2\mathrm{tr}(I_n)\big] + O((h_n/n)^{1/2}) = -\frac{h_n}{n}\big[\mathrm{tr}((G_n - T_{0n}I_n)^2) + \mathrm{tr}((G_n - T_{0n}I_n)'(G_n - T_{0n}I_n))\big] + O((h_n/n)^{1/2})$$
That is, $E(H_{1n})$ is negative for sufficiently large $n$, and it is finite. Therefore, $[E(H_{1n})]^{-1} = O(1)$. Furthermore, by $H_{1n} = E(H_{1n}) + O_p((h_n/n)^{1/2})$, we have $H_{1n}^{-1} = O_p(1)$.
Finally, for (iv), consider Equation (19) evaluated at $\bar\rho_n$. By the mean value theorem, $h_nT_{3n}(\bar\rho) = \frac{h_n}{n}\mathrm{tr}(G_n^4(\bar\rho)) = \frac{h_n}{n}\mathrm{tr}(G_n^4) + 4\frac{h_n}{n}\mathrm{tr}(G_n^5(\tilde\rho))(\bar\rho - \rho_0)$, where $\tilde\rho$ lies between $\bar\rho$ and $\rho_0$. By repeatedly applying the mean value theorem, we can take $\tilde\rho$ arbitrarily close to the true value $\rho_0$; for such $\tilde\rho$, $\frac{h_n}{n}\mathrm{tr}(G_n^5(\tilde\rho)) = O(1)$ by Assumptions 4 and 5. Combining this with the $(n/h_n)^{1/2}$-convergence of $\bar\rho$ to the true value, we have $h_nT_{3n}(\bar\rho) = O_p(1)$.
Now, consider $\hat\sigma_n^2(\bar\rho) = \frac{1}{n}Y_n'A_n'(\bar\rho)M_n(\bar\rho)A_n(\bar\rho)Y_n$ and $\hat\sigma_{n0}^2 = \frac{1}{n}Y_n'A_n'M_nA_nY_n$. Similarly, by the mean value theorem, we have $\hat\sigma_n^2(\bar\rho) = \hat\sigma_{n0}^2 - \frac{2}{n}(\bar\rho - \rho_0)Y_n'A_n'(\tilde\rho)M_n(\tilde\rho)G_n(\tilde\rho)M_n(\tilde\rho)A_n(\tilde\rho)Y_n = \hat\sigma_{n0}^2 - 2(\bar\rho - \rho_0)O_p(h_n^{-1}) = \hat\sigma_{n0}^2 + O_p((nh_n)^{-1/2})$. By continuity, it can be deduced that $\hat\sigma_n^{-2}(\bar\rho) = [\hat\sigma_{n0}^2 + O_p((nh_n)^{-1/2})]^{-1} = \hat\sigma_{n0}^{-2} + O_p((nh_n)^{-1/2})$. Now,
$$h_nR_{1n}(\bar\rho) = \hat\sigma_n^{-2}(\bar\rho)\frac{h_n}{n}Y_n'A_n'(\bar\rho)M_n(\bar\rho)G_n(\bar\rho)M_n(\bar\rho)A_n(\bar\rho)Y_n = \hat\sigma_n^{-2}(\bar\rho)\frac{h_n}{n}\big[Y_n'A_n'M_nG_nM_nA_nY_n - (\bar\rho - \rho_0)Y_n'A_n'(\tilde\rho)M_n(\tilde\rho)D_{2n}(\tilde\rho)M_n(\tilde\rho)A_n(\tilde\rho)Y_n\big] = h_nR_{1n} + O_p((nh_n)^{-1/2}) + O_p((h_n/n)^{1/2}) = h_nR_{1n} + O_p((h_n/n)^{1/2})$$
Using a similar set of arguments, it can be shown that $h_nR_{kn}(\bar\rho) = h_nR_{kn} + O_p((h_n/n)^{1/2})$ for $k = 2, 3, 4$. It then follows that $H_{3n}(\bar\rho) - H_{3n} = O_p((h_n/n)^{1/2})$.
Proof of Proposition 2: The arguments are similar to those of Proposition 1.
Proof of Proposition 3: Note that $b_2(\rho_0, \gamma_0) = O((n/h_n)^{-1})$ and that it is differentiable. It follows that $\frac{\partial}{\partial(\rho_0,\gamma_0)}b_2(\rho_0, \gamma_0) = O((n/h_n)^{-1})$. As $\hat\rho_n$, the QMLE of $\rho$ defined at the beginning of Section 2, is $\sqrt{n/h_n}$-consistent, it can be shown that $\hat\gamma_n = \gamma(\hat F_n)$ is also $\sqrt{n/h_n}$-consistent. We have, under the additional assumptions of Proposition 3,
$$b_2(\hat\rho_n, \hat\gamma_n) = b_2(\rho_0, \gamma_0) + \frac{\partial}{\partial\rho_0}b_2(\rho_0, \gamma_0)(\hat\rho_n - \rho_0) + \frac{\partial}{\partial\gamma_0}b_2(\rho_0, \gamma_0)(\hat\gamma_n - \gamma_0) + O_p((n/h_n)^{-2})$$
Thus, $E[b_2(\hat\rho_n, \hat\gamma_n)] = b_2(\rho_0, \gamma_0) + \frac{\partial}{\partial\rho_0}b_2(\rho_0, \gamma_0)E(\hat\rho_n - \rho_0) + \frac{\partial}{\partial\gamma_0}b_2(\rho_0, \gamma_0)E(\hat\gamma_n - \gamma_0) + O((n/h_n)^{-2})$. As $E(\hat\rho_n - \rho_0) = O(h_n/n)$, it can be shown that $E(\hat\gamma_n - \gamma_0) = O(h_n/n)$. These lead to $E[b_2(\hat\rho_n, \hat\gamma_n)] = b_2(\rho_0, \gamma_0) + O((n/h_n)^{-2})$. Similarly, we can show that $E[b_3(\hat\rho_n, \hat\gamma_n)] = b_3(\rho_0, \gamma_0) + o((n/h_n)^{-2})$, noting that $b_3(\rho_0, \gamma_0) = O((n/h_n)^{-3/2})$.
Clearly, our bootstrap estimate involves two approximation steps: the one described above, and the bootstrap approximation to the various expectations in Equation (25) given $\hat\rho_n$, e.g.,
$$\hat E(H_{1n}\tilde\psi_n) = \frac{1}{B}\sum_{b=1}^{B}H_{1n}(e_{n,b}^*, \hat\rho_n)\tilde\psi_n(e_{n,b}^*, \hat\rho_n)$$
However, these approximations can be made arbitrarily accurate, for given $\hat\rho_n$ and $\hat F_n$, by choosing an arbitrarily large $B$. The result of Proposition 3 thus follows.
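The following is a minimal sketch of this second-step bootstrap approximation; the callables `H1n` and `psi` are a hypothetical interface (not the paper's code) evaluating $H_{1n}$ and $\tilde\psi_n$ at a resampled error vector and at $\hat\rho_n$:

```python
import numpy as np

def boot_expectation(H1n, psi, e_hat, rho_hat, B=2000, seed=0):
    """Bootstrap approximation of expectations such as E(H_1n * psi_n),
    holding rho_hat fixed: average over B resamples of the standardized
    QML residuals (a sketch under the stated hypothetical interface)."""
    rng = np.random.default_rng(seed)
    n = e_hat.shape[0]
    acc = 0.0
    for _ in range(B):
        e_star = rng.choice(e_hat, size=n, replace=True)   # i.i.d. resample
        acc += H1n(e_star, rho_hat) * psi(e_star, rho_hat)
    return acc / B   # arbitrarily accurate as B grows
```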

References

1. A. Cliff, and J.K. Ord. "Testing for spatial autocorrelation among regression residuals." Geogr. Anal. 4 (1972): 267–284.
2. A.D. Cliff, and J.K. Ord. Spatial Autocorrelation. London, UK: Pion, 1973.
3. J. Ord. "Estimation methods for models of spatial interaction." J. Am. Stat. Assoc. 70 (1975): 120–126.
4. P. Burridge. "On the Cliff-Ord test for spatial autocorrelation." J. R. Stat. Soc. B 42 (1980): 107–108.
5. A.D. Cliff, and J.K. Ord. Spatial Processes: Models and Applications. London, UK: Pion, 1981.
6. L. Anselin. "Estimation Methods for Spatial Autoregressive Structures: A Study in Spatial Econometrics." PhD Thesis, Cornell University, Ithaca, NY, USA, 1980.
7. L. Anselin. Spatial Econometrics: Methods and Models. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1988.
8. L. Anselin, and A.K. Bera. "Spatial dependence in linear regression models with an introduction to spatial econometrics." In Handbook of Applied Economic Statistics. Edited by A. Ullah and D.E.A. Giles. New York, NY, USA: Marcel Dekker, 1998.
9. L. Anselin. "Spatial econometrics." In A Companion to Theoretical Econometrics. Edited by B.H. Baltagi. Hoboken, NJ, USA: Blackwell Publishing, 2001.
10. L. Anselin. "Spatial externalities, spatial multipliers, and spatial econometrics." Int. Reg. Sci. Rev. 26 (2003): 153–166.
11. D. Das, H.H. Kelejian, and I.R. Prucha. "Finite sample properties of spatial autoregressive models with autoregressive disturbances." Pap. Reg. Sci. 82 (2003): 1–26.
12. H.H. Kelejian, and I.R. Prucha. "A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances." J. Real Estate Financ. Econ. 17 (1998): 99–121.
13. L.F. Lee, and X. Liu. "Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances." Econom. Theory 26 (2010): 187–230.
14. J. Pinkse. Asymptotic Properties of the Moran and Related Tests and a Test for Spatial Correlation in Probit Models. Vancouver, BC, Canada: Department of Economics, University of British Columbia, 1998.
15. M.M. Fleming. "Techniques for estimating spatially dependent discrete choice models." In Advances in Spatial Econometrics. Edited by L. Anselin, R.J.G.M. Florax and S.J. Rey. Berlin/Heidelberg, Germany: Springer, 2004.
16. L.F. Lee, and J. Yu. "Some recent developments in spatial panel data models." Reg. Sci. Urban Econ. 40 (2010): 255–271.
17. H.H. Kelejian, and D.P. Robinson. "A suggested method of estimation for spatial interdependent models with autocorrelated errors and an application to a county expenditure model." Pap. Reg. Sci. 72 (1993): 297–312.
18. H.H. Kelejian, and I.R. Prucha. "A generalized moments estimator for the autoregressive parameter in a spatial model." Int. Econ. Rev. 40 (1999): 509–533.
19. L.F. Lee. "Generalised method of moments estimation of spatial autoregressive processes." Unpublished manuscript, The Ohio State University, Columbus, OH, USA, 2001.
20. L.F. Lee. "GMM and 2SLS estimation of mixed regressive, spatial autoregressive models." J. Econom. 137 (2007): 489–514.
21. B. Fingleton. "A generalized method of moments estimator for a spatial model with moving average errors, with application to real estate prices." Empir. Econ. 33 (2008): 35–57.
22. L.F. Lee. "Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances." Econom. Rev. 22 (2003): 307–335.
23. H.H. Kelejian, and I.R. Prucha. "Instrumental variable estimation of a spatial autoregressive model with autoregressive disturbances: Large and small sample results." In Spatial and Spatiotemporal Econometrics, Advances in Econometrics. Edited by J. LeSage and R.K. Pace. New York, NY, USA: Elsevier, 2004, Volume 18, pp. 163–198.
24. L.F. Lee. "Consistency and efficiency of least squares estimation for mixed regressive spatial autoregressive models." Econom. Theory 18 (2002): 252–277.
25. L.F. Lee. "Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models." Econometrica 72 (2004): 1899–1925.
26. Y. Bao, and A. Ullah. "Finite sample properties of maximum likelihood estimator in spatial models." J. Econom. 137 (2007): 396–413.
27. Y. Bao. "Finite sample bias of QMLE in spatial autoregressive models." Econom. Theory 29 (2013): 68–88.
28. Z.L. Yang. "A general method for third order bias and variance corrections on a non-linear estimator." J. Econom. 186 (2015): 178–200.
29. B. Baltagi, and Z.L. Yang. "Standardized LM tests for spatial error dependence in linear or panel regressions." Econom. J. 16 (2013): 103–134.
30. B. Baltagi, and Z.L. Yang. "Heteroskedasticity and non-normality robust LM tests of spatial dependence." Reg. Sci. Urban Econ. 43 (2013): 725–739.
31. S.F. Liu, and Z.L. Yang. "Modified QML estimation of spatial autoregressive models with unknown heteroskedasticity and nonnormality." Reg. Sci. Urban Econ. 52 (2015): 50–70.
32. F. Jin, and L.F. Lee. "Cox-type tests for competing spatial autoregressive models with spatial autoregressive disturbances." Reg. Sci. Urban Econ. 43 (2013): 590–616.
33. F. Martellosio. "Power properties of invariant tests for spatial autocorrelation in linear regression." Econom. Theory 26 (2010): 152–186.
34. J.R. Magnus. "Maximum likelihood estimation of the GLS model with unknown parameters in the disturbance covariance matrix." J. Econom. 7 (1978): 281–312.
35. H.H. Kelejian, and I.R. Prucha. "Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances." J. Econom. 157 (2010): 53–67.
36. J. LeSage, and R.K. Pace. Introduction to Spatial Econometrics. London, UK: CRC Press, Taylor & Francis Group, 2009.
37. H. White. Estimation, Inference and Specification Analysis. New York, NY, USA: Cambridge University Press, 1994.
38. H.H. Kelejian, and I.R. Prucha. "On the asymptotic distribution of the Moran I test statistic with applications." J. Econom. 104 (2001): 219–257.
39. J.F. Kiviet. "On bias, inconsistency, and efficiency of various estimators in dynamic panel data models." J. Econom. 68 (1995): 53–78.
40. J. Hahn, and G. Kuersteiner. "Asymptotically unbiased inference for a dynamic panel model with fixed effects when both n and T are large." Econometrica 70 (2002): 1639–1657.
41. J. Hahn, and W. Newey. "Jackknife and analytical bias reduction for nonlinear panel models." Econometrica 72 (2004): 1295–1319.
42. M.J.G. Bun, and M.A. Carree. "Bias-corrected estimation in dynamic panel data models." J. Bus. Econ. Stat. 23 (2005): 200–210.
43. P. Rilstone, V.K. Srivastava, and A. Ullah. "The second-order bias and mean squared error of nonlinear estimators." J. Econom. 75 (1996): 369–395.
44. B. Efron. "Bootstrap methods: Another look at the jackknife." Ann. Stat. 7 (1979): 1–26.
45. C.R. Rao. Linear Statistical Inference and Its Applications. New York, NY, USA: John Wiley and Sons, 1973.
Notes
1. Other estimation methods include GMM (Kelejian and Robinson [17]; Kelejian and Prucha [18]; Lee [19,20]; Fingleton [21]), 2SLS (Kelejian and Prucha [12]; Lee [22]), IV estimation (Kelejian and Prucha [23]) and OLS estimation (Lee [24]).
2. Here, the degree of spatial dependence refers to, e.g., the number of neighbors each spatial unit has or the connectivity in general. Jin and Lee [32] studied asymptotic properties of models with both SLD and SED for the purpose of constructing Cox-type tests, but did not study these issues. Further, it is important to know the differences between the SLD model and the SED model in terms of asymptotic and finite sample behaviors, as they may provide valuable guidance in the specification choice. See also Martellosio [33] for a related work.
3. For this, it is necessary that $|I_n - \rho W_n| = \prod_{i=1}^n(1 - \rho\lambda_i) > 0$, where $\{\lambda_i\}$ are the eigenvalues of $W_n$. If the eigenvalues of $W_n$ are all real, the parameter space $\mathcal{P}$ can be a closed interval contained in $(\lambda_{\min}^{-1}, \lambda_{\max}^{-1})$, where $\lambda_{\min}$ and $\lambda_{\max}$ are, respectively, the minimum and maximum eigenvalues. If $W_n$ is row-normalized, then $\lambda_{\max} = 1$ and $-1 \le \lambda_{\min} < 0$, and $\mathcal{P}$ can be a closed interval contained in $(\lambda_{\min}^{-1}, 1)$, where the lower bound can be below $-1$ (Anselin [7]). In general, the eigenvalues of $W_n$ may not all be real, and in this case, Kelejian and Prucha [35] suggested the interval $(-\tau_n^{-1}, \tau_n^{-1})$, where $\tau_n = \max_i|\lambda_i|$ is the spectral radius of the weights matrix; LeSage and Pace [36] (pp. 88–89) suggested the interval $(\lambda_s^{-1}, 1)$, where $\lambda_s$ is the most negative real eigenvalue of $W_n$, as only the real eigenvalues can affect the singularity of $I_n - \lambda W_n$ (see the sketch following these notes).
4. Whether to bootstrap the standardized QML residuals $\hat e_n$ or the original QML residuals $\hat\epsilon_n = \hat\sigma_n\hat e_n$ does not make a difference, as the $R_{jn}$ are invariant to $\sigma_0$. However, use of $\hat e_n$ makes the theoretical discussion easier.
5. A more natural parameterization for the SMA error model may be $u_n = \epsilon_n + \rho W_n\epsilon_n$, under which $\mathcal{P}$ becomes a closed interval contained in $(-1, -\lambda_{\min}^{-1})$, but the QMLE $\hat\rho_n$ is then downward biased; hence, when $\rho_0$ is negative and $n$ is small, $\hat\rho_n$ may hit the lower bound of $\mathcal{P}$, causing numerical instability of $(I_n + \hat\rho_nW_n)^{-1}$.
6. If, for every $\epsilon > 0$, there exist constants $c > 0$ and $n_0 > 0$ such that $P(|x_n| > cf_n) < \epsilon$ for all $n \ge n_0$, then $x_n = O_p(f_n)$.
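As an illustration of note 3, the parameter space implied by the eigenvalues of a given weights matrix can be computed directly. A sketch on a randomly generated row-normalized $W_n$, our own example; it assumes, as is typical for such matrices, that at least one real eigenvalue is negative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 25
W = rng.random((n, n)); np.fill_diagonal(W, 0.0); W /= W.sum(1, keepdims=True)
lam = np.linalg.eigvals(W)
real_lam = np.sort(lam[np.abs(lam.imag) < 1e-8].real)   # real eigenvalues only
# Row-normalized case: interval (1/lambda_min, 1/lambda_max) = (1/lambda_min, 1)
print("interval from real eigenvalues:", (1 / real_lam[0], 1 / real_lam[-1]))
tau = np.abs(lam).max()                                  # spectral radius of W_n
print("Kelejian-Prucha interval:", (-1 / tau, 1 / tau))
```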
