A Review of Software for Spatial Econometrics in R

The software for spatial econometrics available in the R system for statistical computing is reviewed. The methods are illustrated in a historical perspective, highlighting the main lines of development and employing historically relevant datasets in the examples. Estimators and tests for spatial cross-sectional and panel models based either on maximum likelihood or on generalized moments methods are presented. The paper is concluded reviewing some current active lines of research in spatial econometric software methods.


Introduction
The term spatial econometrics was coined by the Belgian economist Jean Paelinck in 1974 during an address to the Dutch statistical association. A few years later with his famous book "Spatial Econometrics: Methods and Models" Luc Anselin was instrumental to lay down the foundation of the field [1]. His book was (and still is after many years) a very well organized collection of tools for the analysis of spatial data. The use of spatial econometrics in empirical applications was facilitated and extended by how easily the methods and examples presented in [1] could be reproduced in SpaceStat software. In the early years, most of the attention given to spatial econometrics was from researchers in quantitative geography and regional science. Sessions of the North American Regional Science Association annual meetings were populated by research analyzing regional data from a spatial econometric perspective. This was probably due also to certain similarities between spatial econometrics and methods typically adopted in regional science as for example Input-Output analysis. At the same time SpaceStat was complemented by other tools such as the MATLAB spatial econometrics toolbox by LeSage (http://www.spatial-econometrics.com/) and the package spdep in R. In recent years, the field of spatial econometrics has experienced a rapid growth in conjunction with the interest and attention received by researchers in mainstream economics and econometrics. A multiplicity of methods and models have been developed for cross-sectional as well as panel data [2][3][4]. Currently, spatial econometrics routines to estimate spatial models are available from many commercial (and non commercial) software, as for example Stata and the PySal module spreg [5]. R [6] is with no doubt the open source environment that contains the richest variety of options.
The aim of this paper is to survey all the available spatial econometrics packages and methods in R that deal with polygon spatial data, presenting the interested researchers with an up-to-date and comprehensive review of methods in both the cross-sectional and the panel data domains. All examples are meant to be replicable with resources from the public domain, allowing the reader to immediately put the relevant method into practice.
The rest of the paper is organized as follows: in Section 2 we present four data sets that will be then used to illustrate the libraries introduced. Section 3 is entirely devoted to cross sectional models and R libraries that deal with them. We present those libraries following as close as possible the chronological order in which they appeared in R. Section 4 describes static panel data model and the library splm, which implements maximum likelihood (ML) and generalized moments (GM) estimation. Section 5 deals with further developments and alternative methods and approaches both for cross-sectional and panel models. Section 6 concludes the paper.

Used Car Prices
In the 1960s, Hanna [7] wanted to examine the effects of regional differences in state taxes and transportation charges on used car prices. The article lists data for the 48 coterminous US states and the District of Columbia. The data set was used by Hepple [8] in developing ML estimation methods for spatial series, as cross-sectional spatial data were then known. We may read a copy of the data, punched from the article and stored in a GeoPackage file with the state boundaries, into R in this way: > library(sf) > used.cars < -st_read("Data and Shapes/hanna66.gpkg", quiet=TRUE)  [7]) are lowest in Kentucky and in surrounding states northward towards Detroit. The lower panel shows the transport costs, which are lowest in Michigan for obvious reasons, because Detroit was then Motor City. State sales taxes are set by political decisions, and were considered carefully by Hanna as a driver for differences in used car prices. One might expect some leakage of demand for used cars across state boundaries; the figure also shows some variation in sales taxes between neighbouring states that might provoke leakage.  The upper panel of Figure 1 also shows the graph of contiguous states, that is states sharing at least one boundary point. Some links are counter-intuitive, such as Wisconsin-Michigan, but here it is Upper Michigan that shares a boundary with Wisconsin. It is this symmetric neigbour graph that forms the basis for measuring the relationships between values at observations (here states) and aggregate values at neighbouring observations.
The aggregate values here are taken as the average of values in neighbouring states, with edge weights summing to unity for each state, denoted by style="W" in the definition of the list of weights object. Spatial weights matrices are typically very sparse, and while formulae often show them as matrices, they are seldom used as dense matrices.
To get an impression of the level of spatial dependence in these variables, we may use the approximate profile likelihood estimator [9,10] [APLE], which may be viewed somewhat like a correlation coefficient between the values observed for each state and the average values for neighbouring states. For the 1960 used car prices, APLE is 0.7579, for transport costs unsurprisingly a little stronger: 0.8404. The coefficient for sales taxes is negative: −0.3625, and for both transport costs and sales taxes taken together it is more moderate: 0.5317.

Driving Under the Influence
One of the main advantages of GMM methods in space is that this technique is able to handle additional endogenous variables (other than the spatial lag). For this reason we choose to employ the simulated county data set US Driving Under the Influence (DUI) from in [11][12][13] for our demonstration. The data for 3109 counties (excluding Alaska, Hawaii, and US territories), was constructed simulating from variables in Powers and Wilson [14]. The counties were available from an ESRI Shapefile downloaded in 2102 from the US Census. (https://www2.census.gov/geo/tiger/TIGER2008/tl_2008_us_county00.zip). The dependent variable dui is defined as the alcohol-related arrest rate per 100,000 daily vehicle miles traveled (DVMT). The regressors include police (measured in terms of number of sworn officers per 100,000 DVMT); nondui (non-alcohol-related arrests per 100,000 DVMT); vehicles (number of registered vehicles per 1000 residents), and dry (a dummy for counties that prohibit alcohol sale within their borders, about 10% of counties). An additional dummy variable elect is 1 if a county government faces elections, 0 otherwise, and has 295 non-zero entries.
More than likely, the size of the police force is related with the alcohol-arrest rates. Therefore, police is treated as an endogenous variable. [12] also assume that the dummy variable elect make a proper instrument for police.
We first read the shape file and the list of neighbors. Then, the function nb2listw serves to transform the neighbors into an actual (row-standardized) weights matrix. The last three lines of code define the formulas. In particular, fm2 is the main formula that relates dui to the explanatory variables from which police is excluded because of endogeneity and is added separately. Finally, the last line of code below defines that the variable elect should be used as instrument.

Rice Farming
The famous Econometrica paper of Case [15] has introduced spatial panel data to a large audience and the mainstream profession. She applied a comprehensive spatial panel data framework to the empirical analysis of rice production in Indonesia, a subject panel data econometricians would come back to in more recent years. In another seminal paper [16] 171 rice farms in Indonesia are observed over six growing seasons, three wet and three dry, between 1975 and 1983. The farms are located in six different villages of the Chimanuk River basin in West Java. Following Druska and Horrace [16], the proximity matrix is constructed considering all the farms of the same village as neighbours.
Data and weights matrix are available in the splm package as, respectively, RiceFarms and riceww; the spatial weights do not change over time, and are of size N × N, while there are N × T observations. > hldata(RiceFarms, package="splm") > data(riceww, package="splm") > ricelw <-mat2listw(riceww) Druska and Horrace [16] estimated a production frontier equation where rice output depends on the inputs: seed, urea, phosphate (tsp), labour hours (lab) and land (size); all variables in logs but phosphate. Dummy variables are added for:the use of high yield varieties of seed (high); for a mix of seed varieties (mixed); and for the use of pesticides. Dummy variables are also included for each of the six villages, and for wet season (as opposed to dry): > ricefm <-log(goutput) log(seed) + log(urea) + phosphate + + log(totlabor) + log(size) + I(pesticide>0) + + I(varieties=="high") + I(varieties=="mixed") + # as.factor(region) + + I(as.numeric(time) %in% c(1,3,5)) As per Druska and Horrace [16], "[o]f the six villages included in the sample, two are on the north coast of the island in an area with average altitudes of 10-15 m above sea level. Another three villages are in an area (600-1100 m above sea level) in the central part of West Java. The last village is in the center of the island with an average altitude of 375 m. The infrastructure in the Cimanuk River Basin is fairly heterogeneous. Some of the villages (in both high and lowland areas) lack reliable transportation systems, and local roads are almost impassable in the wet (rainy) season. Other villages, located in close proximity to province capital cities, are highly accessible along paved, all-weather roads." Therefore, both village-level heterogeneity and spatial correlation between farms belonging to the same village can be expected. Of the two possible spatial effects, spatial dependence in the errors is more likely, because of the similarity in idiosyncratic factors and climate conditions between neighbouring farms; the inclusion in the model of a spatial lagged response is harder to justify, as it seems unrealistic for one farm's production to influence those of neighbours.

Crime in North Carolina
The second panel data set considered is based on a well known economic model of crime estimated by [17]. (The data are available from the website associated with Baltagi's book [18].) They use a panel data on 90 counties in North Carolina over the period 1981-1987. The empirical model relates the crime rate (lcrmrte) to a set of controls for the return to legal activities, and to a number of deterrent variables (probability of arrest, probability of conviction conditional on arrest, and probability of imprisonment conditional on conviction). The crime rate variable is the ratio between an FBI index that measures the number of crimes, and county population (i.e., crime per-capita in the county). The ratio of arrest to offenses is a proxy for the probability of arrest (lprbarr); the ratio of convictions to arrest is a proxy for the probability of conviction (lprbconv), and, finally, the proportion of total convictions resulting in prison sentences is a proxy for the probability of imprisonment (lprbpris). A measure of sanction severity (lavgsen) measured by the average prison sentence length in days is included in the model as well.
All of the other variables are either observable county characteristic, or controls for the relative return to legal activities. The relative returns to legal activities are captured by the average weekly wage in the county in various sectors, such as construction (lwcon); transportation, utilities, and communications (lwtuc); wholesale and retail trade (lwtrd); finance, insurance, and real estate (lwfir); services (lwser); manufacturing (lwmfg); and federal (lwfed), state (lwsta), and local government (lwloc). The dummy variable (urban) controls for differences in participation in the legal sector that may occur between urban and rural environment. A similar role is played by the density variable (ldensity) which measures the ratio between county population and county land area.
It is well known that crime rates tend to change with demographic characteristics of the population. The model incorporates the proportion of male population between the ages of 15-24 (lpctymle), as well as the proportion of the population that is minority (lpctmin). Finally, regional or cultural factors that may affect the crime rate are picked up with the inclusion of a central and western dummies. Ref. [17] estimates the model both by the between and the within estimators and find quite impressive differences. Since they are concerned by the heterogeneity in their sample, they reject estimators that do not condition on country effects. This decision is clearly based on the evidence of a Hausman test.

Cross Sectional Models
The general model presented in this section allows for endogeneity of (some of) the regressors. The point of departure is the Cliff-Ord spatial model: where y is an n × 1 vector of observations on the dependent variable, Y is an n × p matrix of observations on p endogenous variables, X is a n × k matrix of observations on k exogenous variables, W is an n × n observed and non-stochastic spatial weight matrix and, consequently, Wy is an n × 1 variable that is generally referred to as the spatial lag of the dependent variable; π and β are corresponding parameters; and λ is the spatial autoregressive coefficient. Given the presence of Y, the model can be viewed as a representation of a single equation of a system of equations. The error vector u follows a spatial autoregressive process of the form: where ρ is a scalar spatial autoregressive parameter, M is an n × n spatial weights matrix that may or may not be the same as W, Mu is an n × 1 vector of observation on the spatially lagged vector of residuals. An alternative, more compact way to express the same model is: where Z = [Y, X, Wy] is the set of all (endogenous and exogenous) explanatory variables, and δ = [π , β , λ] is the corresponding vector of parameters. Finally, the assumption on which the ML relies is that ε ∼ N(0, σ 2 ). The general model (Equation (1)) may be restricted in various ways. Particularly in ML applications, π is generally set to zero. The spatial error model (SEM) is generated from the general model when The spatial lag model (SLM) or spatial autoregressive model (SAR) is generated from the general model when ρ = γ = π = 0: The spatial Durbin model (SDM) is generated from the general model when ρ = π = 0: It is also possible to define the spatial error model with lags of the explanatory variables (henceforth SDEM) when λ = π = 0: If the only restriction are γ = π = 0 we have a spatial autoregressive model with autoregressive error term (SARAR): Finally, the SARAR model can also include lagged explanatory variables. In this case the only restriction on the general model is π = 0, that corresponds to the following specification: Over time, a characteristic of spatial lag models (and, by extension, of any model including the spatially lagged response) has become clear: that, unlike the spatial error model, the spatial dependence in the parameter λ feeds back. The interpretation of marginal effects should therefore not be based on the fitted parameters β, but rather on correctly formulated impact measures, as discussed in references given further on.
The reason for the feedback lies with the data generation process of the spatial lag model (and by extension in the general model). Rewriting: where I is the n × n identity matrix. This means that the expected impact of a unit change in an exogenous variable r for a single observation i on the dependent variable y i is no longer equal to β r , unless λ = 0. The awkward n × n S r (W) = ((I − λW) −1 Iβ r ) matrix term is needed to calculate impact measures for the spatial lag model, and similar forms for other models including the general model, when λ = 0.

Initial Development in R: The spdep Package
Bivand and Gebhardt [19] discusssed initial approaches to handling and analysing spatial data using R, based on a presentation at the 1998 European Reagional Science Association (ERSA) Congress in Vienna. A specialist meeting in Santa Barbara in May 2002 turned out to be very fruitful, but the contribution covering R took a little while to appear [20]; the meeting proceedings were online directly. Further discussion of spatial regression for areal/lattice data was presented at the 2002 ERSA Congress in Dortmund and published straight away [21]. The main traits of the development of spatial data handling are described by Bivand [22].

Spatial Dependence and the OLS Model
Initial concerns about the presence of spatial dependence in variables included in standard Gaussian linear models concentrated on the interpretation of tests on regression coefficients. Since positive spatial dependence may signal fewer effective degrees of freedom, could one trust standard tests assuming that no spatial dependence was present? Table 1 shows a small part of the findings of Smith and Lee [24], using the 49 coterminous states neighbour graph ( Figure 1) and 10,000 draws. If both y and x show spatial dependence, standard test assumptions should not be used. If either is free of spatial dependence, standard test assumptions may be used. Here, we impose the levels of spatial dependence, and also know the scheme used to do this. With empirical data, we do not know where the spatial dependence is coming from, including the actual footprint of the variables. While sales taxes are determined here by states, neither prices nor transport costs are; they are aggregates. Table 1. Simulation of the power of a t-test on the regression coefficient at the nominal level of 0.05 for uncorrelated y and x and spatial dependence for the response ρ y and the covariate ρ x , following Smith and Lee [24]. Hepple [8] questioned whether the relationships described by Hanna held up when spatial dependence was taken into account. He used the sum of transport costs and sales taxes as the covariate, but we can also use the two covariates directly: > lm_obj1 <-lm(av55_59˜I(transp + salesTax), used.cars) > lm_obj2 <-lm(av55_59˜transp + salesTax, used.cars) > lm_obj3 <-lm(av55_59˜offset(transp) + salesTax, used.cars) Table 2 shows that the simpler model, summing the costs and taxes, appears to perform less well than the model with two separate covariates. However, we know that transport costs might be offset rather than fitted, so the third model implicitly imposes a coefficient of unity on transport costs. McMillen [25] stresses the importance of considering whether apparent spatial dependence is in fact engendered by model mis-specification, such as the erroneous inclusion or omission of covariates, and the inappropriate functional form of included covariates. When Cliff and Ord [26,27] proposed an extension of the Moran's I spatial autocorrelation test for regression residuals, there was already some confusion associated with the choice facing analysts between the spatial error model and the spatial lag model. The test was, like the Durbin-Watson test, based on testing a linear model against an alternative of omitted spatial autocorrelation in the error term.
> moran1 <-lm.morantest(lm_obj1, lw, alternative="two.sided") > moran2 <-lm.morantest(lm_obj2, lw, alternative="two.sided") > moran3 <-lm.morantest(lm_obj3, lw, alternative="two.sided") Table 3 shows that all of the fitted models appear to show significant spatial autocorrelation in the error. Hepple [8] draws the same conclusion from a similar test, and fitted a spatial error model. This test may also be used when a weighted linear model is used; here no major differences are observed although the level of residual spatial autocorrelatiion is not so strong when counts of cars held by households from the 1960 US Census are used as weights. Because Moran's I for regression residuals gives no guidance in choosing between spatial error and spatial lag models, ref. [28] and Anselin and Bera [29] introduced Lagrange multiplier (LM) tests. The re are five tests, a test for residual autocorrelation (LMerr) with a robust version (RLMerr) for error dependence in the presence of spatial dependence in the response, conversely LMlag for an omitted spatially lagged response with RLMlag robust to the simulatneous presence of residual dependence, and the portmanteau test SARMA which is the sum of LMlag and RLMerr or LMerr and RLMlag.
> LM1 <-lm.LMtests(lm_obj1, lw, test="all") > LM2 <-lm.LMtests(lm_obj2, lw, test="all") > LM3 <-lm.LMtests(lm_obj3, lw, test="all") Table 4 shows the conventional probability values for the three models and the five tests. While SARMA, LMlag and LMerr indicate strong autocorrelation mis-specification, the contrasts between the robust RLMerr and RLMlag suggest that models (1) and (2) might be suffering from an omitted spatially lagged response, while the picture for model (3) is unclear. Until the last decade, it has been felt reasonable to use the balance between the robust LM tests as a guide, because the spatial lag and spatial error models are not nested, so cannot be tested against each other using likelihood ratio tests. The ML estimation methods for spatial lattice regression models grew from developments in Cliff and Ord [26], soon afterwards refined in Ord [30]. In these and in [8,31], short-cuts were sought but largely rejected, in favour of optimizing the appropriate likelihood function. The implementation in spdep uses line search over the single spatial coefficient, calculating the other coefficients once that is found. The development in [26] only addresses the simultaneous autoregressive (SAR) approach, but [32] and the rich literature based on his work prefers to treat spatial lattice regression in a Markov random field setting (conditional autoregressive, CAR), with spatially structured random effects included in an otherwise aspatial model. Reference [33] summarizes these developments and relates the SAR and CAR approaches.
The log-likelihood function for the spatial error model is: β may be concentrated out of the sum of squared errors term, for example as: where Q ρ is obtained by decomposing (X − ρWX) = Q ρ R ρ . The first published versions of the eigenvalue method for finding the Jacobian [30] (p. 121) is: where ζ i are the eigenvalues of W.
One specific problem addressed by [30] (p. 125) is that of the eigenvalues of the asymmetric row-standardized matrix W with underlying symmetric neighbour relations c ij = c ji . If we write w = C1, where 1 is a vector of ones, we can get: W = CD, where D = diag(1/w); by similarity, the eigenvalues of W are equal to those of: From the very beginning in spdep, sparse Cholesky alternatives were available for cases in which finding the eigenvalues of a large weights matrix would be impracticable.
The log-likelihood function for the spatial lag model is: and by extension the same framework is used for the spatial Durbin model when [X(WX)] are grouped together. The sum-of-squared errors (SSE) term in the square brackets is found using auxilliary regressions e = y − (X X)Xy and u = Wy − (X X)XWy, and SSE = e e − 2λu e + λ 2 u u. The cross-products of u and e can conveniently be calculated before line search begins. The spatial error model (SEM, Equation (4)) was fitted to Hanna's data by Hepple [8]; here we separate the two covariates, but if estimated using the sum of the two, the errorsarlm() function yields the same results at those reported in the article. All the model estimation functions from the spdep package have been split out into spatialreg [34], mostly because most users need spdep for creating spatial neighbour objects and for testing for autocorrelation. A separate model estimation package permits faster development of the model fitting functions without disturbing other work. The model fitting functions follow the structure for R functions of this kind, using a formula interface. The list of weights object is required, and when no method is specified for the computation of the log Jacobian to which we will return later, the eigenvalues of the spatial weights matrix are used [30].
> library(spatialreg) > sem_obj2 <-spatialreg::errorsarlm(av55_59˜transp + salesTax, + used.cars, listw=lw) While each iteration of the line search of SEM involves a regression, the spatial lag model (SLM, Equation (5)), and the spatial Durbin model (SDM, Equation (6)) use intermediate values from two pre-computed regressions in each iteration. Again, the implementations use the eigenvalues of the spatial weights matrix to compute the log Jacobian at each iteration. Setting the Durbin= argument to TRUE adds the spatially lagged covariates, omitting the lagged intercept when the spatial weights are row standardized.
> slm_obj2 <-spatialreg::lagsarlm(av55_59˜transp + salesTax, + used.cars, listw=lw) > sdm_obj2 <-spatialreg::lagsarlm(av55_59˜transp + salesTax, + used.cars, listw=lw, Durbin=TRUE) Table 5 shows the fitted model output as it would have been presented until about 10 years ago. All the spatial models improve the fit of the model compared with the aspatial model in the first column. In addition, a Lagrange multiplier test of the residuals of the SLM for spatial autocorrelation has a probability value of 0.1578 (SDM: 0.4471), neither of which appear to indicate any further spatial patterning. Pace and LeSage [35] propose a Hausman test assessing whether SEM coefficients are as close to the OLS coefficients as they should be in a well-specified model; the probability value here is 0.0293. Had they been more different, one could find that the model was more seriously mis-specified. All of these tests are borderline, so all open for more analysis but do not point clearly in a single direction. Finally, the examples of spatial regression in Waller and Gotway [36], using both SAR and CAR approaches, and introducing case weights to try to handle heterogeneity, led to the re-implementation of the spatial error model in the spautolm() function, which will not be presented here.

The "Advent" of The GMM
In two seminal papers [37,38] suggested a generalized method of moments (GMM) approach to the estimation of a SARAR model (Equation (8)) and established asymptotic results for the estimator. The main motivation for the success of this approach was, and in part still is, the computational simplicity even for large samples compared to ML. Additionally, at the time the GMM was proposed, one further problem was the lack of formal results concerning the asymptotic properties of ML, that were only derived later in a paper by [39]. The original approach is based on a three steps procedure: 1. In the first step, the first equation in (8) is estimated by two-stage least square (2SLS) using the matrix of instruments where, generally, q = 2. 2. In the second step, the residuals from the first step are employed to obtain an estimate of ρ by solving a non-linear system of three equations resulting from the specification of the three following quadratic moment conditions: With the estimate of ρ obtained in the second step, a transformation of the model is taken to filter out the spatial parameter and the transformed model is estimated again by 2SLS.
The estimator described by these three steps is generally referred to as the feasible generalized spatial two-stage least square (FGS2SLS) estimator. One issue to emphasize is that ρ is treated as a nuisance parameter. Basically, the idea is to filter out spatial autocorrelation in the errors that is potentially dangerous for statistical inference on the model parameters, but there is no interest to make inference on the spatial error parameter itself. (In a later unpublished document, the authors demostrated how to compute the variance for the spatial error parameter in order to make inference on it. The GMerrorsar function takes advantage of this and therefore in the demonstration below there will be a standard error for ρ.) At this point it should be also evident that the nested models (i.e., the SLM, SEM, SDM and SDEM) can be estimated easily by modifyng the three steps described above. On the one hand, in the SEM and SDEM models the first and third steps are simply OLS, since there is not any endogeneity deriving from the presence of the spatial lag of y. On the other hand, the SLM and SDM models can be estimated by 2SLS since the error term is no longer spatially autocorrelated.
There were three separate functions in spdep: stsls for the SLM, GMerrorsar for the SEM, initially contributed by Luc Anselin, and gs2sls for the SARAR model. These functions, along with many others, recently migrated into spatialreg.
The estimation of the lag model, as well as those of the error and SARAR models, of the DUI data cannot use the formula defined in Section 2, since none of the functions in spdep allow for additional endogenous variables.
> lag_gmm_sreg <-spatialreg::stsls(dui˜nondui + vehicles + dry + + police, data = shape2, + listw = listw_dui) > err_gmm_sreg <-spatialreg::GMerrorsar(dui˜nondui + vehicles + + dry + police, data = shape2, + listw = listw_dui) > sarar_gmm_sreg <-spatialreg::gstsls(dui˜nondui + vehicles + + dry + police, + data = shape2, listw = listw_dui) Table 6 reports results for the three implementations. A glance at the table shows that, out of the regressors, only non-alcohol related arrests is not statistically significant. The presence of police force is the larger deterrent to driving under the influence of alcohol. It is also noteworthy that the coefficient estimates are very stable across different models. The value of λ for the SLM is higly statistically significant (even though it is quite small in magnitude). The spatial error coefficient ρ in the SEM model is similar in magnitude, but it is not statistically significant. Finally, the estimated value and inference for λ in the SARAR model is (almost) identical to the SLM, while ρ is smaller than the SEM coefficient estimates and, as we mentioned previously, inference is not available.

An Early Version of sphet
Few years later that the GM approach was implemented in spdep, [40] developed a new package for estimating and testing spatial models with heteroskedastic innovations. The library was mainly based on GM estimators and semi-parametric methods for the estimation of the coefficient's variance-covariance matrix. sphet was complementing but not overlapping with spdep. In fact, sphet focused only on GM and instrumental variables (IV) methods, leaving aside ML, and dealt with potential heteroskedasticity in the error term, features that was only partly taken into account in stsls. From a theoretical point of view, the procedures implemented in sphet were derived in [41,42]. The point of departure of [41] was the SARAR model with potential heteroskedasticity in the innovations. A noticeable difference of [41] is that they gave results for the spatial error coefficient for both consistency and asymptotic normality. Of course, this enabled to perform statistical inference on both spatial parameters. Moreover, the moment conditions were slightly different from their earlier paper, thus leading to a different system of equations, that, in turn, resulted in a different estimates for the spatial error parameter.
The corresponding function in sphet is called gstslshet. The syntax of the function is pretty straightforward: the first argument is a description of the model to be estimated, then the optional argument containing the data set, and, finally, the (mandatory) object of class "listw".
> het_old <-sphet::gstslshet(dui˜nondui + vehicles + dry + police, data = shape2, listw = listw_dui) Table 7 reveals that, besides the different moment conditions (that influence the estimated value of ρ), the model coefficients estimates, including the coefficient of the spatially lagged dependent variable, are very stable. The spatial error coefficient can be tested although, in this case, it is not statistically significant. Reference [42] propose a semi-parametric method for the estimation of the coefficient's variance-covariance matrix that is robust against possible misspecifications of the disturbances and allows for unknown forms of heteroskedasticity and correlation across spatial units (HAC estimation). Instead of assuming a specific spatial structure for the error term, they assume a general form that nests many different spatial processes such as spatial autoregressive or spatial moving average. The rationale behind this idea is that the error term, being the unknown part of the model, should not be specified a priori in terms of a specific spatial process, but rather assume a very general flexible form. However, the spatial HAC estimator is not immune from possible criticisms related to the type of kernel and the bandwidth selection. If the decision about the kernel choice has been proven not to be very relevant in that different kernels lead to negligible differences in practical applications, the same does not apply for the bandwith.
The function to produce the 2SLS with HAC standard errors available from sphet is stslshac. The procedure is based on the choice of a distance function that along with a kernel determines the non-zero observations in the variance-covariance matrix.

Further Development in R: The spdep Package and the Improvement of sphet
When [43] was published, the treatment of spatial econometrics covered the available software implementations in spdep. LeSage and Pace [2] appeared shortly afterwards, significantly "raising the bar" as expressed by Elhorst [44] in an extended review. Reference [45] discussed in detail both how to estimate an extended range of nested models using ML, and how to handle model interpretation, pursuing topics presented in Halleck Vega and Elhorst [46] and LeSage [47].
Work began in 2019 to split model fitting functions out from the spdep package, moving these components to the new spatialreg package. At about the same time, Bayesian fitting methods were added, based on porting of MATLAB Spatial Econometrics Toolbox code carried out by Virgilio Gómez-Rubio and Abhirup Mallik.
At the same time, the theoretical development of the generalized methods of moments in spatial econometric models was flourishing and gave rise to many important contributions. This corresponded to a series of major revisions of sphet, and, in particular, the inclusion of the wrapper function spreg. We will turn to this after describing the evolution of the ML estimation in the next subsection.

Evolution of the ML Estimation
Bivand et al. [48] review the technical issues around the calculation of the log Jacobian term in ML and Bayesian model estimation. It had been established from the mid-1990s that sparse matrix decomposition (Cholesky for symmetric weights matrices and LU for intrinsically asymmetric weights matrices) was feasible [49,50]. This was extended to cover the initial decomposition step in sparse Cholesky decomposition, which does not need to be repeated to look up log Jacobian values for successive values of the spatial coefficient.
> rho.min <-err_eigen$interval [1] > args.slm <-list(rho.min = rho.min, rho.max = rho.max, + W = W, X = matrix(0, n, 0), + Q.beta = matrix(1, 0, 0)) > hyper.slm <-list(prec = list(prior = "loggamma", param = c(0.01, 0.01)), + rho = list(initial = 0, prior = "logitbeta", param = c(1, 1)) ) > err_INLA_slm <-inla(dui nondui + vehicles + dry + police + + f(idx, model="slm"), args.slm=args.slm, hyper=hyper.slm), + data=as.data.frame(shape2), family="gaussian", + control.family = list(hyper = zero.variance), + control.compute = list(dic = TRUE, cpo = TRUE, config = TRUE)) Table 9 shows clearly that the coefficients of the covariates are very much the same for all estimation methods. The values of ρ differ a little, but all of those based on the likelihood are very close. For comparison of estimation methods, the standard error of ρ is more interesting. When eigenvalues are used to compute the log Jacobian, it is assumed that the data set is small enough to compute the asymptotic variance-covariance matrix of the coefficients. The estimated value in this case matches those from MCMC and INLA very well, but when the sparse Cholesky is used for the log Jacobian, inverting (I − ρW) is not attempted, and a finite difference Hessian approach is tried instead. This may lead, as in this case, to there being marginally negative values on the diagonal of the estimated variance-covariance matrix, leading to failures when taking square roots. Problems with estimating the variance-covariance matrix can also occur in general when the scaling of the spatial coefficient and the remaining coefficients differ greatly. In this case the problem may be avoided by dividing the response by 1000, but the introduction of sensitivity tests to the poor conditioning of these matrices may be required. The standard error of ρ seems to be estimated well by MCMC and INLA. The likelihood ratio test on ρ return the same probability value in both the eigenvalue and sparse Cholesky cases, however, so resolving numerical issues in the variance-covariance matrix has not been seen as critical, although it also impacts the Hausman test as well. Table 9. Spatial error model estimates and timings for the DUI data set: GMM, ML (eigenvalue and sparse Cholesky log Jacobian), MCMC using sparse LU griddy Gibbs log Jacobian and INLA using the experimental "slm" latent model. Turning to the timings reported in Table 9, the set up times for the ML eigenvalue method, involving finding the eigenvalues of W, and for MCMC conducting many LU decompositions for a coarse grid of values of the spatial coefficient to prepare for griddy Gibbs sampling, are longer than for the other estimation methods. Fitting for INLA is much longer because the n random effects are computed as coefficients, so that the dimensionality of the problem is bigger. MCMC here took 2500 draws only, discarding the first 500; more draws would increase the run time. Completion for the ML eigenvalue method includes the calculation of the asymptotic variance-covariance matrix of the coefficients, which could be speeded up somewhat with multi-threaded linear algebra.

GMM
There are a number of loose ends in the implementations, especially where numerical issues can appear, or where approximations lead to degradations when the spatial coefficient is near the extremities of its domain.

Interpretation and Impacts Evaluation
A fuller comparative treatment of model interpretation and the calculation of impacts is given by Bivand and Piras [52]. Difficulties arise from interaction between the spatial dependence modelled in the response, parameterized as λ and the coefficients on the covariates.
As observed before, in the spatial lag mode-unlike the spatial error case-the spatial dependence in the parameter λ feeds back. These difficulties are discussed as emanating effects [53], also known as impacts [2,54], simultaneous spatial reaction function/reduced form [55] and equilibrium effects [56].
This feedback comes from the fact that, while the elements of the Hessian matrix for the ML spatial error model linking ρ and β are zero (∂ 2 /(∂β∂λ) = 0), in the spatial lag model (and by extension in the spatial Durbin model): ∂ 2 /(∂β∂λ) = 0. In the spatial error model, for exogenous variable r, ∂y i /∂x ir = β r and ∂y i /∂x jr = 0 for i = j. In the spatial lag model, ∂y i /∂x jr = ((I − λW) −1 Iβ r ) ij , where I is the N × N identity matrix, and (I − λW) −1 is known to be dense. The awkward S r (W) = ((I − λW) −1 Iβ r ) matrix term needed to calculate impact measures for the lag model, and S r (W) = ((I − λW) −1 (Iβ r − Wγ r )) for the spatial Durbin model, may be approximated using traces of powers of the spatial weights matrix as well as analytically.
The average direct impacts are represented by the sum of the diagonal elements of the matrix divided by n for each exogenous variable. The average total impacts are the sum of all matrix elements divided by n for each exogenous variable. The average indirect impacts are the differences between the direct and total impact vectors.
The development for approximation using traces of powers of the spatial weights matrix in [2] (pp. [114][115] for the lag model and q traces is as follows: where the intercept β 0 is dropped, and with a a p-vector of ones: Let us revert to the smaller used car data set, and show the important difference between predictions from the OLS model for the base data set and a new data set with the transport cost variable incremented by one: In the OLS case, the mean difference between the predictions is (of course) the value of the coefficient for the transport cost variable. In the SLM case, λ is far from zero, so the feedback is strong, and the difference between predictions is much larger than the coefficient value.
If we pick apart the model output, we can calculate the S r (W) matrix for the transport cost variable, and see that the mean difference between predictions is the average total impact: We can further check that the average direct and total impacts calculated in this way match the values returned by the impacts() method, when the spatial weights matrix is inverted inside the method: The eigenvalue and trace methods make it possible to conduct Monte Carlo tests on the impacts using draws from the fitted model coefficients and their variance-covariance matrix.

Evolution of the GMM and Recent Developments
The theoretical development of the generalized methods of moments in spatial econometric models has been flourishing over the last 15 years. Many important scholars in the field got involved and major commercial software (like, for example, Stata) started implementing codes to estimate the techniques that were under development. In this context, [52] presented a comparison of the implementations available for spatial econometric models. In the meanwhile, sphet had gone under a process of serious revisions that culminated with the inclusion of the wrapper function spreg. Specifyfing a model argument, spreg allows to estimate all of the specifications nested in the general model of Equation (1). The re are mainly two advantages of GMM compared to ML: On the one hand, GMM can deal with very large sets of data since it does not require inversion of large matrices. On the other hand, dealing with additional (other than the spatial lag) endogeneous variables is simple, provided that one has proper and valid instruments.
For the DUI data, the size of the police force is most likely related with the alcoholarrest rates. The refore, police can be treated as an endogenous variable. As we anticipated earlier, ref. [12] also assume that the dummy variable elect (where elect is 1 if a county government faces an election, 0 otherwise) make a valid instrument for police.
> err_gmm_sphet <-sphet::spreg(formula = dui˜nondui + vehicles + dry + police, data = shape2, listw = listw_dui, model = "error") > err_gmm_sphet_e <-sphet::spreg(formula = fm2, data = shape2, + listw = listw_dui, endog = endog2, instruments = instruments2, + model = "error") > lag_gmm_sphet <-sphet::spreg(formula = dui˜nondui + vehicles + dry + + police, data = shape2, listw = listw_dui, model = "lag") > lag_gmm_sphet_e <-sphet::spreg(formula = fm2, data = shape2, + listw = listw_dui, endog = endog2, instruments = instruments2, + model = "lag") > sarar_gmm_sphet <-sphet::spreg(formula = dui˜nondui + vehicles + dry + + police, data = shape2, listw = listw_dui, model = "sarar") > sarar_gmm_sphet_e <-sphet::spreg(formula = fm2, data = shape2, + listw = listw_dui, endog = endog2, instruments = instruments2, + model = "sarar") Table 10 compares the SEM, the SLM, and the SARAR models with the corresponding models assuming that police force is endogenous. Let us focus first on the three models with no additional endogeneity. While the SLM is the same as the one presented in Table 6, the SEM and the SARAR models are sligthly different since spreg uses different moment conditions. Despite this fact, results are very close to the one presented before and similar conclusions can be drawn. In particular, the spatial error parameter is not statistically significant, while the positive spatial lag coefficient is small but strongly statistically significant. This means that the DUI related arrests in neighbouring counties affects the alcohol related arrests for a given county. This result can be explained in terms of copycat policies or a certain level of coordination in police enforcement between counties. In terms of the explanatory variables in the model, nondui is the only one that is not statistically signicant. The estimated coefficient for police is large and positive in all three models. Moving to the specifications that treat police as endogenous the results are quite different particularly in terms of the magnitude of the coefficient estimates. Moreover, police turns out to be negative once endogeneity is controlled for. Two additional things have to be noted. The first relates to the SARAR model. The summary method for SARAR models automatically performs a Wald test that both ρ and λ are statistically significant. The second relates to the SLM as well as to the SARAR model. Once again for models that are specified in terms of a spatial lag of the dependent variable appropriate summary measures needs to be used to take into account for simultaneity. This is the reason why appropriate spatial effects are calculated for the SAR and SARAR models. However, when additional endogenous variables are present, the calculation of the impacts is quite complicated. Ref. [57] show how to approximate that calculation but since is very case specific, it has not been implemented (yet) in sphet.

Spatial Panel Data Models
The econometric literature has considered panel regression models with spatially autocorrelated outcomes or disturbances and random or fixed individual effects for more than three decades now.
The pioneering book of Anselin [1] and the famous Econometrica paper of Case [15] have introduced the subject to a large audience. The former reserved a minor part of a booklength treatment to the SEM model in a random effects setting, while the second applied a comprehensive spatial panel data framework to the empirical analysis of rice production in Indonesia, a subject panel data econometricians would come back to in more recent years. Nevertheless, few spatial panel data applications have followed, mainly because of the computational difficulties and the lack of ready-made, user-friendly software.
The more recent methodological contributions by [58,59] and the first comprehensive treatments of the subject in [60,61] have further helped the diffusion of spatial panel methods in applied practice, this time helped by the circulation of the first general-purpose routines, written in MATLAB by J. Paul Elhorst and kindly provided for public use by the author.
Nevertheless, the number of empirical applications has constantly trailed that of theoretical developments in this particular subject. Although clearly written, well tested and not difficult to adapt to one's problem, Elhorst's MATLAB routines were still primarily written for the author's own use; moreover, if the specific routines were provided free for general use, MATLAB was, and is, non-free. The availability of estimators and tests of production-quality usability (i.e., devoting much of the functionality to data and modelling interfaces, results' presentation and consistency checks) within an open source environment would boost the number of spatial panel applications in the empirical literature. This happened with the emergence of the dedicated R package splm described here (see [62]); and, some years later, with the Stata add-on package 'xsmle' [63] (in this latter case, while the package is provided in the open domain, the base software system is not; still, Stata is a de facto standard in econometrics and most researchers are likely to have access to a copy).

Static Spatial Panels
Spatial panel data models capture spatial interactions across spatial units observed over time. A general static panel model includes a spatial lag of the dependent variable and spatial autoregressive disturbances: where y is an nT × 1 vector of observations on the dependent variable, X is a nT × k matrix of observations on the non-stochastic exogenous regressors, I T an identity matrix of dimension T, W is the n × n spatial weights matrix of known constants whose diagonal elements are set to zero, and λ the corresponding spatial parameter. The disturbance vector is the sum of two terms u = (ι T ⊗ I n )µ + ε where ι T is a T × 1 vector of ones, I n an n × n identity matrix, µ is a vector of time-invariant individual specific effects and ε a vector of spatially autocorrelated idiosyncratic errors that follow a spatial autoregressive process of the form ε = ρ(I T ⊗ W)ε + e with ρ as the spatial autoregressive parameter, W the spatial weights matrix and e ∼ I ID(0, σ 2 e ). The spatial weights matrices in the lag and the error term can differ (see the following). I n − ρW is assumed non-singular. The spatial panel model described above draws on panels of n data points observed over T time periods. Contrary to standard panel data practice, data are stacked by time, then by cross-section (so that the individual index is the "fastest" one). The spatial weight matrix W is assumed time invariant, as customary in the literature, and enters spatial panel models as I T ⊗ W where ⊗ is the Kronecker product. In the following, the models are illustrated based on the Rice Farming data.

The Pooled Spatial Model
If one could safely assume out any individual heterogeneity, spatial panels could be estimated by simply applying cross-sectional estimation techniques to the pooled dataset, employing an extended W matrix as specified above. This hypothesis, nevertheless, is extremely unlikely to hold. Below we estimate the pooled model; results will be reported later, comparing them to those considering individual effects.

Tests
In principle, spatial correlation in the residuals of panel models can be tested through a Moran test, treating all observations as a pool and employing a panel extension of the neigbourhood matrix, where as discussed above W nT = I T ⊗ W. This approach nevertheless depends on the pooling assumption, i.e., assuming out any form of individual effect: which is inappropriate in the vast majority of cases. Specific test statistics have therefore been devised for spatial panels.

LM Tests
The Lagrange multiplier (LM), or score, test procedure on verifying whether the score of the likelihood of a restricted model is significantly different from the zero vector. If this is not the case, then the restriction is not binding w.r.t. the problem at hand and the corresponding null hypothesis is not rejected. Differently from its siblings, the asymptotically equivalent Likelihood Ratio and Wald tests, the LM test only requires to estimate the restricted model, therefore is often the procedure of choice in econometrics because of its computational parsimony, especially when estimation of the unrestricted model is complicated, costly or even problematic.
Since the seminal work of [64], LM tests have been extensively employed to test for random effects and serial or cross-sectional correlation in panel data models. For the above reasons, LM tests are particularly appealing in a spatial random effects setting because estimates for the full model are often much more difficult to compute than those for the restricted one.

Conditional and Joint Tests for Spatial or Random Effects
Building on the earlier literature, ref. [59] have extended the ML-based testing framework deriving joint, marginal and conditional tests for all combinations of random effects and spatial correlation. While the marginal tests are those already known, and the joint test is of little practical value because it will be a signal either of spatial or random effects without giving directions regarding which one is actually present, the conditional tests are particularly important because they allow to test for one of the two effects allowing for the presence of the other. The comparative disadvantage of conditional tests is that their implementation is slightly more complicated as being based on the ML residuals from the model containing the "other" effect-the one the test is conditional on-instead of on OLS residuals.
Specifically, the hypotheses under consideration are: 1. H a 0 : λ = σ 2 µ = 0 under the alternative that at least one component is not zero 2. H b 0 : σ 2 µ = 0 (assuming λ = 0), under the one-sided alternative that the variance component is greater than zero 3. H c 0 : λ = 0 assuming no random effects (σ 2 µ = 0), under the two-sided alternative that the spatial autocorrelation coefficients is different from zero 4. H d 0 : λ = 0 assuming the possible existence of random effects (σ 2 µ may or may not be zero), under the two-sided alternative that the spatial autocorrelation coefficient is different from zero 5. H e 0 : σ 2 µ = 0 assuming the possible existence of spatial autocorrelation (λ may or may not be zero)and the one-sided alternative that the variance component is greater than zero.

Local CD Test
An alternative testing procedure from the heterogeneous panel literature can be applied to homogeneous panels as well, containing either fixed or random effects. This is based on a particularization of Pesaran's ( [65]) CD test for global spatial dependence. The CD test is based on an average (across the sectional dimension) of sample estimates of the pairwise correlations of residuals of the separate (timewise) regressions for every cross sectional unit: The CD test is asymptotically standard Normal distributed under the null of no cross-sectional correlation; moreover, it does not depend on the heterogeneity assumption. In general, residuals from any appropriate model (pooled, FE, RE) can be employed.
A variant of the CD test, called CD(p) test, has been designed to test for local crosssectional dependence, i.e., dependence between neighbours only: in other words, for spatial dependence. It works by considering only the subset of "neighbouring" pairs of crosssectional units, selected by means of the familiar binary proximity matrix. Originally, a regular ordering of observations was assumed, so that the m-th cross-sectional observation was a neighbour to the (m − 1)-th and to the (m + 1)-th. Reference [66] first extended the application of the CD(p) test to an irregular lattice. The formula for the local test is an adaptation of the original CD statistic where, as observed, the binary proximity matrix is employed as a selector for discarding the correlation coefficients relative to pairs of observations that are not neighbours (corresponding to zeros in W): where w(p) ij is the (i, j)-th element of the p-th order proximity matrix, so that if h, k are not neighbours, w(p) hk = 0 andρ hk is canceled out. Both the global (i.e., non-spatial) and the local CD tests have been available since 2008 in the plm package [67]. In the following we compute the local CD test on the residuals of the random effects panel model: > library(plm) > CDp.pool <-pcdtest(plm(ricefm, data = RiceFarms, model='pooling'), Again, spatial dependence is clearly present regardless of which kind of individual effects, if any, are included.

Individual Effects: Fixed or Random
The spatial panel data literature, following the mainstream non-spatial approach, distinguishes between treating the unobserved individual effects as fixed or random. In a random effects specification, these are assumed uncorrelated with the regressors, so that they can be safely treated as components of the error term: see, e.g., Assumption RE.1.b in Wooldridge [68] (10.4). Should this hypothesis not hold, then the latter strategy would introduce endogeneity and produce inconsistent estimates; the individual effects would either have to be estimated out or, which is more often the case, eliminated by first differencing or time-demeaning the data (see [68] 10.5). The standard device for assessing the hypothesis of no correlation (i.e., for testing the appropriateness of random effects methods), is the Hausman [69] test. In a spatial setting, Mutl and Pfaffermayr [70] derived an appropriate Hausman test for spatial panels.
From another viewpoint, the random effects hypothesis is considered consistent with sampling individuals from a potentially infinite population. for this reason Elhorst [61] dismissed its practical utility in spatial econometric contexts, where sampling typically takes place over a fixed set of countries or regions.

ML Estimation
For all the popularity of either the SAR and the SEM specification, econometric practice generally focuses on one effect only. With an exception made for the pioneering work of [15], few applications in the literature allow for both. Nevertheless, the expression for the likelihood of a model combining a spatial lag with any error structure Σ, including spatial dependence ones, is easily written as a panel version of the general likelihood in Anselin [1]: As such, the spatial lag model can be estimated combining the SAR filter with any spatial or non-spatial structure, e.g., random effects: > sarre <-spml(ricefm, data = RiceFarms, listw = ricelw, + model="random", lag = TRUE, spatial.error = "none")

Individual Effects and Spatial Errors
What differentiates the panel estimators from their cross-sectional counterparts is their ability to deal with the individual effects. In the MATLAB routines due to [58], which have long been the de facto standard in the econometric analysis of spatial panel data, the partial time-demeaning technique familiar from standard panel data (see, e.g., [68] Ch. 10) is combined with Anselin's ML framework: the data are either time-demeaned (FE) or partially time-demeaned (RE) in order to eliminate the individual effects, then standard SAR or SEM estimators are applied to the transformed data (see [61]).
Computationally, the fixed effects case is simpler, being encompassed by the pooled case: fixed effects models are generally estimated by pre-demeaning the data, according to the framework described in Elhorst [58]. This procedure has been criticized by [60] because time-demeaning alters the properties of the joint distribution of errors, introducing serial dependence: ( [71] p.257) also discuss the issue; see also [62] (p.33) for Monte Carlo evidence of the magnitude of the bias. [72] (3.2) suggest to either correct the estimates ex post or to circumvent the problem using a different orthonormal transformation of the data. The current implementation in splm can perform the Lee and Yu correction.
In the random effects case, following [58], to estimate the SARRE model one can add spatial filtering on y using I T ⊗ A = I T ⊗ (I n − λW) and the determinant of the spatial filter matrix, |I T ⊗ A| = |A| T , to the likelihood of the random effects model. Concentrating the likelihood with respect to β and σ 2 e as where θ is the quasi-demeaning parameter and the residualsẽ are those of the demeaned model with a spatial filter on yẽ = (I T ⊗ A)ỹ −Xβ, and maximizing it w.r.t. λ and θ; then iterating until convergence between this maximization and the GLS step, whose first order conditions arê yields an efficient two-step estimation procedure. The transformation procedure for the SEM model (which employs a spectral decomposition of the errors covariance) is omitted here: see [58] (pp. [19][20][21]. Although not explicitly stated by the author, [58]'s methodology is also easily extended, by combination, to the SAREM specification (for an application see [73]); on the other hand, it does not lend itself as easily to extensions in the direction of serially correlated errors (see the following).
The implementation in splm works instead on untransformed data and approaches random effects together with any other feature of the error covariance, spatial dependence included [74]. This has the advantage of keeping some components of the error term (most notably, the random effects) out of the spatial dependence, which can re-main a feature of the idiosyncratic error only, as in most applications in the literature (see, e.g., [44,[58][59][60][61][70][71][72][75][76][77][78][79][80][81][82][83][84][85]) but entails some computational complications. The alternative specification where the individual effects follow the same spatial process as the idiosyncratic errors, as in [86], which is also considered below, is much easier to compute.

Fixed Effects
Consistently with the conventions of the standard panel package plm, the most robust specification-the FE-is the default choice in the estimator function: > library(splm) > semfe <-spml(ricefm, data = RiceFarms, listw = ricelw, + lag = FALSE, spatial.error = "b") Spatial error correlation is remarkably high, as expected. What about spatial lag correlation? In his seminal papers which laid the foundations for practical estimation of spatial panel data models both under the fixed and the random effects assumptions, [58,61] does not consider combinations of spatially lagged response and spatially autocorrelated error term; while the original contribution of Case [15] did. With splm it is indeed possible to estimate a model containing both effects to assess the significance of each through a Wald test. We only report estimation results for the relevant coefficients: > saremfe <-spml(ricefm, data = RiceFarms, listw = ricelw, + lag = TRUE, spatial.error = "b") > summary(saremfe) The results confirm the relevance of the spatial error process, while the spatial lag is only marginally significant.

Independent Random Effects
The Rice Farms dataset, with observations coming from a large number of small villages employing the same standard technology, is a good candidate for a random effects analysis, perhaps after controlling for the region (which itself is likely to be a source of systematic differences in soil quality and climate).
Two kinds of random effects specifications are possible in spatial error panels: one where the spatial process in the error includes the random effects, the other where the individual random effects are idiosyncratic and independent of the neighbours' ones. In the latter case, µ ∼ I ID(0, σ 2 µ ), and the error term can be rewritten as: where B n = (I n − ρW). As a consequence, the composite error term becomes The hypothesis of independent random effects is the most natural to assume in many cases, including the one at hand; the idea being that random shocks, possibly related to weather, economic or health conditions, are likely to affect farms within the same village; while the individual heterogeneity captures the persistent random differences between individual farms in terms of soil quality or ability of the farmers. > library(splm) > semre <-spml(ricefm, data = RiceFarms, listw = ricelw, + model="random", lag = FALSE, spatial.error = "b") The estimated variance of the random effect is small in proportion of that of the idiosyncratic error (about one fifth); the spatial error correlation is confirmed as very strong.

Spatially Correlated Random Effects
The specification for the disturbances of [86] assumes that spatial correlation applies to both the individual effects and the idiosyncratic errors. Although the "Baltagi" and "KKP" data generating processes look similar, they do imply different spatial spillover mechanisms. The economic meaning of the two models is also different: in the first model only the time-varying components diffuse spatially, in the second spatial spillovers too have a permanent component [76]. Reference [87] (see also 2.4) on the difference between these two RE specifications. In this latter case, commonly referred to as "KKP", the composite disturbance term u = (ι T ⊗ I n )µ + ε follows a first order spatial autoregressive process of the form: Then the variance-covariance matrix of u is: where Ω ε = [σ 2 e I T + σ 2 µ J T ] ⊗ I n is the typical variance-covariance matrix of a one-way error component model.
It is not obvious why the spatial process should carry over to the individual effects in the case of the Rice Farms data; although one plausible hypothesis is that if the random individual heterogeneity is related to the quality of soil or to the working ability of the farmers-perhaps through tradition and cultural affinity-then one might see this as leading to a spatial process in the individual effects as well.

Serial and Spatial Correlation
Serial correlation in spatial panel data has long been overlooked, if not for the very special case of persistent random effects. Nevertheless, if autocorrelation of the autoregressive type were present it would bias ML estimates, and may be a symptom of more serious misspecification: unit roots or missing dynamics. Ref. [75] further generalized the structure of the errors, introducing serial correlation in the remainder of the error term together with the spatial correlation and random effects. They derived a number of LM tests for the different effects, either marginal (i.e., assuming the other effects out) or conditional (i.e., allowing for their presence). The general model is: While the marginal tests are already established testing procedures in the literature, the main contribution lies with the three-way joint test J and the one-way conditional tests C.1-3. The hypotheses under consideration are: 1. An early application of the C.2 conditional test for spatial correlation in RE panels with serially correlated errors, based on a prototype of the R code, appeared at the same SEA conference as the [75] paper and was later published as (see 0.290 [66]). Production versions of the test resulted in the function bsjktest with J and C.1-3 appearing in the new splm package for R [62].

ML Estimation of Models with Serially Correlated Errors
In the splm package, Millo et al. [62]-also based on the early empirical work of Case [15]-realized that the estimation framework allowed for the coexistence of spatial lag and spatial error, and introduced the possibility of combining them into the so-called SAREM (or SARAR) model, so that functionality of this kind was available in the package from the outset (i.e., before 2010). In the same fashion, it is possible to combine a spatial lag with a further generalization of the errors according to the Baltagi et al. [75] model outlined above. Extending Baltagi et al. [75] to the spatial lag, Millo [74] formalizes the estimation procedure for this kind of specification; he also presents a similar extension to serial correlation of the errors a la Kapoor et al. [86].
> saremsrre <-spreml(ricefm, data = RiceFarms, w = riceww, + lag = TRUE, errors = "semsrre") > sarem2srre <-spreml(ricefm, data = RiceFarms, w = riceww, + lag = TRUE, errors = "sem2srre") The encompassing model's estimates confirm that the relevant spatial process is the error; and that random effects of modest magnitude are present, together with an even weaker form of autocorrelation in the remainder errors. Again, the practical difference between independent or spatially correlated random effects turns out to be minimal.
The estimates from all the spatial panel models in the previous paragraphs are presented in the following Table 11 :

Endogeneity in Static Panel Data Models
As we mentioned early, the initial contribution to the application of GM methods for spatial panels dates back to [86]. The y considered a panel data model involving a first order spatially autoregressive disturbance term that, in turn, allowed for an error component structure in the innovations. The proposed methodology was based on an extension of the moment conditions put forth from the same authors in the context of a cross-sectional model. A few years later while considering a spatial panel version of the Hausman test, ref. [70] extended the estimation procedure to a Cliff and Ord type model including the spatial lag of the dependent variable as well as a spatially lagged one-way error component model. The y implemented instrumental variables estimation under both the fixed and the random effects specifications. However, Piras [88] noted that they were not taking full advantage of the six moment conditions derived in [86] since they were using only a subset of those moment conditions.As a consequence, ref. [88] suggested an improvement that included all six moment conditions in [86]. The approach taken by [88] followed more closely the fixed and between effects two stage least squares estimator for spatial panel models proposed by [89]. (This was in turn an extension of the [90] error component 2SLS estimator to a spatial panel model.) The function spgm implements the procedure described in [88] with the extra feature of considering additional (other than the spatial lag) endogenous variables. Table 12 compares results from the "classical" error component two stage least square (EC2SLS) in [90] and the spatial version of the EC2SLS. The first model can be obtained by setting both lag and error arguments to FALSE and specifying endogenous variables along with instruments. To obtain the second model the user has to include both spatial lag and error parameters. The data set to produce Table 12 where presented in Section 2.4 and relates to an economic model of crime estimated by [17]. Keep in mind that [17] had a genuine concern about the endogeneity of police per-capita ad the probability of arrest. The refore, those two variables are instrumented using per-capita tax revenue and a mix of different types of offense. The spatial lag parameter at the bottom of the second column in Table 12 is positive and statistically significant and then justifies the spatial specification. The spatial connection are driven from the fact that counties with high (low) levels of crime are generally clustered. This may be due to some sort of copy-cat policies occurring within the counties.

Developments and Alternative Approaches
Before concluding, we will draw attention to work in progress, and to alternative approaches to spatial regression for area data, first for cross-sectional models, later for spatial panel models.

Developments and Alternative Approaches in Cross-Sectional Models
One of many implementations of Markov Random Field (MRF) spatially structured random effects in generalized additive models (GAM) is found in Wood [91], implemented in [92]. The neighbour objects needs to be matched to the variable expressing the random effect, here State. The MRF smooth requires manual adjustment of the number of knots, because here we are not using a multi-level approach and so approach the upper bound on the number of parameters to be estimated. In addition, the MRF approach does not row-standardize the spatial weights, using a conditional rather than a simultaneous autoregression. > used.cars$State <-as.factor(used.cars$State) > names(nb) <-as.character(used.cars$State) > N <-nrow(used.cars ) > library(mgcv ) > gam_obj2 <-gam(av55_59˜transp + salesTax + + s(State, bs="mrf", k=N-2, xt=list(nb=nb)), data = used.cars) Another approach is through hierarchical generalized linear models (HGLM), presented by [93] and implemented in [94]. Both GAM and HGLM can fit Gaussian responses, but can also fit discrete responses. This implementation can accommodate either SAR or CAR spatially structured random effects. All such approaches estimate the per-observation random effects and their standard errors, so are somewhat constrained as the number of observations increases.
> library(hglm ) > hglm_obj2 <-hglm(fixed=av55_59˜transp + salesTax, random=˜1|State, + data=used.cars, rand.family=SAR(D=as(lw, "CsparseMatrix")) Figure 2 shows the spatially structured random effects, where the HGLM SAR and GAM MRF values are very similar indeed (Table 13). In an extension of work handling missing observations of the response variable, ref. [95] began a series of articles, followed by [96,97]. Reference [98] give a complete survey of the challenges raised in predicting from models when the observations are autocorrelated (implemented in spatialreg for ML estimators); this has obvious extensions to spillover between training, validation and test data sets in machine learning contexts.

Limited Dependent Variables Models
In addition to the resolutions shown above using GAM or HGLM, a number of other approaches have been proposed for limited dependent variables, in particular discrete dependent variables. Ref. [99] follow [2] in fitting spatial probit models using MCMC, implemented in [100]. The link function used in this approach differs in character from those used in generalized additive and linear mixed models, such as those fitted using standard Bayesian techniques, which dominate the application of such approaches outside spatial econometrics. It is also possible to use GMM approaches, as shown by Klier and McMillen [101] and implemented in [23]. The same package also provides implementations od spatial quantile regression [102].

Multi-Level Models
Multi-level models involve the grouping of observations within nested containers, where the spatial processes may play out at the finest level, or at coarser spatial scales. Recent work has been undertaken by [103][104][105], and is implemented in [106] using MCMC. Many of the general packages for Bayesian model fitting, such as [107] implemented in [108] can also be used for fitting multi-level models. A comparative review is given by [109].

Spatial Filtering Methods
Spatial filtering methods as developed by Griffith [110] build on using standard linear and generalized linear models supplemented with selected eigenvectors from the spatial weights matrix. In [111][112][113], examples were given of how standard and non-standard spatial econometric problems could be approached using spatial filtering. Tiefelsdorf and Griffith [114] proposed that the eigenvectors for inclusion should be selected by their ability to reduce residual autocorrelation rather than to increase model fit. This approach was implemented by Chun and Tiefelsdorf in spdep and moved to spatialreg [34], with two steps, first to select eigenvectors taken from the spatial weights matrix doubly centred using the hat matrix of the actual regression, then using lm to fit the model, effectively removing residual autocorrelation: > SF0 <-SpatialFiltering(av55_59˜transp + salesTax, data = used.cars, + nb=nb, style="W") > SF_obj2 <-lm(av55_59˜transp + salesTax + fitted(SF0), + data = used.cars )  Figure 3 shows the products of the selected eigenvectors and their estimated regression coefficients in map form. Typically, the small subset of eigenvectors selected mops up spatial autocorrelation in the residual. References [115,116] adopt a similar approach in a generalized linear model context, implemented in spdep by Pedro Peres-Neto and moved now to spatialreg as ME analogous with SpatialFiltering, but centering the spatial weights matrix on the null model hat matrix, and using bootstrap methods in evaluating the the choice of eigenvectors. The correlations between the implied cumulated outcomes of these methods are shown in Table 13. Reference [117] describe many of the underlying motivations, including the view that Moran eigenvector spatial filtering approaches may permit both spatial autocorrelation and spatial scale tto be accommodate in a single model; a further implementation is given in [118].   [114] approach to spatial filtering.
Murakami and Griffith [119] provide a fresher version of spatial filtering implemented in [120]. This also appears to centre the spatial weights matrix on the null model hat matrix, and chooses eigenvectors not to reduce residual autocorrelation, but chooses those among the eigenvectors with positive eigenvalues that increase model fit most up to a threshold to control overfitting. The default approach uses an exponential variogram model to generate the weights matrix from planar coordinates. The meigen function subsets the full set of eigenvectors before the data are seen, then esf calls lm itself while further subsetting the eigenvectors.
> library(spmoran ) > centroids_laea <-st_centroid(st_geometry(used.cars_laea) ) > meig <-meigen(st_coordinates(centroids_laea)) > y <-model.response(model.frame(av55_59˜transp + salesTax, + data=used.cars_laea) ) > X <-model.matrix(˜transp + salesTax, data=used.cars_laea) > esf_obj2 <-esf(y, X, meig=meig ) Figure 4 shows the products of the first four eigenvectors chosen and their regression coefficients, and differs from the approaches shown above mostly in using a distance model to relate the observations to each other rather than the graph of neighbours, and in selecting to improve fit rather than reduce residual autocorrelation.   Table 13 shows the correlations between the two estimates of spatially structured random effects, three cumulated spatial filtering approaches, and the spatially structured term implied by the ML estimates of the spatial error model. As can be seen, they are very similar to each other, so the choice of approach may be fairly flexible and relate more to the needs of users and their domain usages that to a single body of theory. While the presence of spatial dependence has been widely recognized as an issue to address in econometric models, spatial heterogeneity has not received that much attention ans, therefore, is not always adequately taken into account. Possible reasons for this should be searched not only in the theoretical and practical difficulties to identify spatial heterogeneity separately from spatial dependence in empirical models, but also (and perhaps primarily) to the lack of readily available software to perform this kinds of analysis. There is a wide array of models to account for spatial heterogeneity, but only few of them are available in R. (An interesting overview can be found in [1].) One can model heterogeneity allowing for a different parameter for each observations. This is the idea embedded in the so-called geographically weighted regression [121] available in R from the package spgwr. A different approach to this extreme case would be assuming that spatial heterogeneity can be classified into a limited number of spatial regimes characterized by different values of the regression parameters. One of the future directions for spatial models in R would be geared towards the development of such regime models.

Higher Order Spatial Models
A natural extension of the techniques in sphet would be considering higher order spatial models [122,123]. The presence of additional lags (either of the dependent variable, of the regressors, or of the error term) would allow to test different types of interactions and make the model interpretation richer.

Systems of Spatial Equations
An earlier version of the package splm included codes to estimate spatial simultaneous equation and spatial seemingly unrelated regression equations. By the time splm was published the routine for those models migrated into a new package spse that, unfortunately, never saw the light. spse contained two major functions: spsegm and spseml. spsegm implemented the feasible generalized three stages least square estimator (FGS3SLS) for simultaneous systems of spatially interrelated cross sectional equations put forward by [124], while spseml implemented ML estimation of simultaneous systems of spatial seemingly unrelated regression equation following the lines in [1]. The package is now hosted on Github (https://github.com/gpiras/spse) and is (again) under development. Future plans will include the extension to simultaneous equation combined with higher order spatial interactions [125]. (In a similar context, the package spsur [126] also deals with seemingly unrelated regression equations.)

Machine Learning and Spatial Econometrics
A first attempt to insert a spatial lag model into a regression tree has been presented by Wagner and Zeileis [127], and is implemented in [128].

Developments and Alternative Approaches in Spatial Panels
Here two important developments will be mentioned briefly, referring the interested reader to the cited references for futher details.

Dynamic Spatial Panels
Many economic phenomena are inherently dynamic in nature and therefore call for estimators allowing for time lags of the included objects, most notably for a lagged dependent variable. Nevertheless, the abundant results from the time series literature do not easily carry over to the spatial panel case. The estimation of panel models containing both individual effects and a lagged dependent variable is well known to be problematic because of the serial correlation induced in the error terms by the transformation procedures employed to eliminate the individual heterogeneity (either time-demeaning or first differencing), so that e.g., the fixed effects estimator for a dynamic model would be biased [129]. Arellano and Bond [130] famously proposed a GMM procedure for consistently estimating the dynamic model with individual effects; yet their estimator assumes cross-sectionally uncorrelated errors and is hence not appropriate in a spatial context.
Research on dynamic spatial panels has been quite recent and is mostly associated with Elhorst. His first paper on the subject [131] set the stage in 2001; then he provided a first solution based on approximating the initial conditions (complete with MATLAB routines in the public domain) some years later [132]. Still, the ML estimator of [132] has a number of caveats: refs. [83,133] derive the asymptotic properties and a bias correction. The review paper of [134] discusses the general space-time specification, the different estimation approaches of ML/QML, GMM and Markov Chain Monte Carlo (MCMC). In general, despite the existence of some solutions, estimation of spatial dynamic panels can be said to be a still developing field. R software for this task is still lacking but is likely to be developed in the near future.

Heterogeneous SAR Panels
Another recent extension of the mainstream spatial panel is the heterogeneous spatially autoregressive (HSAR) estimator of [135] which, in the general framework of the SAR panel with fixed effects, relaxes the assumptions of parameter homogeneity and of error homoscedasticity, so that the model becomes y it = λ N ∑ j=1 w ij y jt + β x it + u it , i = 1, 2, . . . , N; t = 1, 2, . . . , T where y it is the response variable for unit i observed at time t, x it = (x i1,t , x i2,t , . . . , x ik,t ) is a k × 1 vector of exogenous explanatory variables, with the associated k × 1 vector of slope parameters, β = (β 1 , β 2 , . . . , β k ) , w ij is the element (i, j) of W; and the vector of individual SAR coefficients λ = (λ 1 , λ 2 , . . . , λ k ) . As in a standard linear panel data model, the idiosyncratic error is in turn the sum of two terms u it = µ i + ε it , i = 1, 2, . . . , N; t = 1, 2, . . . , T where µ i is a time-invariant unit specific effect and ε it is the idiosyncratic error. µ are allowed to be correlated with the regressors, and ε to be heteroscedastic. ML estimators are provided for the individual coefficients; then the latter can be averaged according to the mean groups (MG) method of Pesaran and Smith [136] to obtain average coefficients and their dispersion matrix. A project to produce a multilanguage implementation, including an R package, is nearing completion [137].

Conclusions
This paper was dedicated to a review of the functionality for spatial econometric methods available in the R system for statistical computing, in the light of the historical developments of methods, mostly following a chronological order and hinting when appropriate at implementations in different software environments. It addressed estimators and tests for: spatial econometric models on cross sectional data, both based on ML and on GM; spatial panel models with either correlated or independent unobserved heterogeneity; spatial panel models with possibly endogenous explanatory variables. The methods have been presented through empirical examples based on four well-known and historically relevant datasets. At the end of the paper, several active areas of development are hinted at.
Although some specific areas of spatial econometric modelling have been covered in recent books-cross-sectional methods in [3] (Appendix B), panel data in [138] (Ch. 10)this is the first comprehensive review addressing the development of spatial methods in R in a historical perspective and trying to cover all relevant functionality in both the cross-sectional and the panel domain.
R is considered the lingua franca of statistical computing. As such, most available statistical functionality is available under form of R packages, including very powerful optimization features. Moreover, the R system offers a wide range of tools for dealing with spatial data, including mapping and automated computing of spatial weights. Therefore, although there are several other viable and powerful options, R is arguably the ideal development environment for spatial regression modelling. Finally, two aspects that need to be taken into consideration are that R, unlike other software, is open-source and crossplatform.
The R infrastructure described in the paper is entirely open source and packaged into the standard, user-friendly and documented packages so that it is ready for the perusal of empirical researchers. The paper itself is entirely replicable in its computational aspects, based on open source code and data from the R project. As such, it complies with the reproducibility requirements of Peng [139], the "gold standard of full replication", as providing "a detailed log of every action taken by the computer" which can be reproduced by anybody on any system.