Non-Iterative Multiscale Estimation for Spatial Autoregressive Geographically Weighted Regression Models

Multiscale estimation for geographically weighted regression (GWR) and the related models has attracted much attention due to their superiority. This kind of estimation method will not only improve the accuracy of the coefficient estimators but also reveal the underlying spatial scale of each explanatory variable. However, most of the existing multiscale estimation approaches are backfitting-based iterative procedures that are very time-consuming. To alleviate the computation complexity, we propose in this paper a non-iterative multiscale estimation method and its simplified scenario for spatial autoregressive geographically weighted regression (SARGWR) models, a kind of important GWR-related model that simultaneously takes into account spatial autocorrelation in the response variable and spatial heterogeneity in the regression relationship. In the proposed multiscale estimation methods, the two-stage least-squares (2SLS) based GWR and the local-linear GWR estimators of the regression coefficients with a shrunk bandwidth size are respectively taken to be the initial estimators to obtain the final multiscale estimators of the coefficients without iteration. A simulation study is conducted to assess the performance of the proposed multiscale estimation methods, and the results show that the proposed methods are much more efficient than the backfitting-based estimation procedure. In addition, the proposed methods can also yield accurate coefficient estimators and such variable-specific optimal bandwidth sizes that correctly reflect the underlying spatial scales of the explanatory variables. A real-life example is further provided to demonstrate the applicability of the proposed multiscale estimation methods.


Introduction
Geographically weighted regression (GWR) models [1][2][3], which are an extension of the linear regression models by allowing the regression coefficients to vary over space, have been a popular tool for modeling spatial heterogeneity in regression relationships. A GWR model is originally calibrated by the locally weighted least-squares procedure, where the local weights at each focal spatial location are determined by a pre-specified kernel function with a single bandwidth for all of the regression coefficients, and the optimal bandwidth size is chosen by a data-driven criterion, such as the cross-validation (CV) or the corrected Akaike information criterion (AICc) [2,3]. In the GWR technique, spatial heterogeneity in regression relationships is revealed by spatial variation patterns of the estimated regression coefficients. Therefore, the accuracy of coefficient estimators is essential to validly interpret spatial heterogeneity in regression relationships.
In geographical information science, spatial scale is one of the most important concepts [4], and a spatial process inherently operates at a spatial scale [5]. In a GWR model, different spatial scales of the explanatory variables may lead to the fact that their respective coefficients have different levels of spatial heterogeneity, and a common bandwidth in the traditional GWR technique can not produce valid estimators for all of the coefficients [6,7], which has also been theoretically proven in statistically varying coefficient models [8]. In order to overcome this shortcoming, Yang [6] proposed GWR with flexible bandwidths in which the backfitting procedure [9] is employed to iteratively estimate the spatially varying coefficients with the variable-specific bandwidth sizes selected by the AICc or CV criterion. Furthermore, Fotheringham et al. [7] explicitly connected the spatial scale of each explanatory variable with its specific bandwidth and termed the model as a multiscale geographically weighted regression model. A similar calibration procedure was also suggested by Leong and Yue [10] from the perspective of improving the coefficient estimation accuracy of a GWR model. For ease of presentation, we henceforth refer to this multiscale estimation method as GWR-BF, meaning that a GWR model is calibrated by the backfitting-based iterative procedure. It has been demonstrated that GWR-BF will yield not only more accurate estimators of the regression coefficients but also provide information about the spatial scale at which each explanatory variable operates [7,10,11].
Recently, GWR-BF has been extended to some other GWR-related models. For example, Chen and Mei [12] extended the GWR-BF method and formulated a multiscale estimation procedure for the semi-parametric GWR models originally proposed by Brunsdon et al. [13]; furthermore, combining the backfitting procedure with the profile maximum likelihood estimation for spatial autoregressive geographically weighted regression (SARGWR) models [14][15][16], Chen et al. [17] formulated a multiscale estimation method for the models; Wu et al. [18] proposed a backfitting-based multiscale estimation approach for geographically and temporally weighted regression (GTWR) models [19,20]; Zhang et al. [21] suggested a unilateral temporal weighting scheme and proposed a more flexible multiscale estimation method for GTWR models. Due to its superiority, the multiscale estimation for the GWR and the related models have been applied to many areas for spatial or spatiotemporal data analysis (see, for example, [22][23][24][25][26][27][28]).
The backfitting-based multiscale estimation methods for the GWR and the related models, however, are iterative algorithms in estimating individual regression coefficients and searching for their respective optimal bandwidth sizes. Therefore, such methods are very time-consuming, especially when the number of explanatory variables is large. In particular, the multiscale estimation method for SARGWR models in Chen et al. [17] is much more time-demanding because, in addition to the computation cost of the iterative backfitting algorithm and maximum likelihood estimation, the spatial autoregressive coefficient should also be optimized in each iteration loop, where the determinant of the related matrix with its order being the sample size needs to be calculated for each candidate value of the autoregressive coefficient, although this maximum-likelihood-based multiscale estimation can yield more accurate estimators for both the autoregressive and the regression coefficients. Therefore, the development of cost-effective algorithms is essential for realworld applications of multiscale estimation methods of the GWR and its related models.
Recently, there has been a rapid development of machine-learning-based GWR methods, such as geographically weighted extreme learning machine [29], geographically weighted elastic net [30], geographically (and temporally) neural network weighted regression [31,32], geographically weighted regression with the integration of machine learning [33], and the adapted geographically LASSO [34]. One can refer to the reference [35] for a comprehensive overview of spatial machine learning methods, including GWR models. The machine-learning-based GWR methods are, in general, more efficient in estimation and accurate in prediction than the multiscale and even the traditional calibration methods. Nevertheless, how to derive the spatial scale information of each explanatory variable with machine-learning-based GWR methods remains to be investigated.
Focusing on the GWR-BF multiscale estimation procedure, some efforts have been devoted to the reduction of computation cost. For example, Yang [6] suggested to pre-specify a larger value of the convergence threshold. However, this may lead to the premature of coefficient estimators. Based on the power of modern computers, Li and Fotheringham [36] formulated a parallel implementation for GWR-BF. Very recently, Wu et al. [37] proposed a non-iterative multiscale estimation procedure for GWR models. In this estimation method, the idea of the two-step locally weighted least-squares procedure proposed by Fan and Zhang [8] for fitting the statistically varying coefficient models is employed to implement the non-iterative estimation and the local-linear-fitting method for calibrating GWR models [38] is further used in each step to reduce the boundary effect of coefficient estimators. We henceforth abbreviate this non-iterative multiscale estimation method as GWR-LL, where "LL" means the coefficients are estimated by the local-linear-fitting method in both steps. The simulation study with a real-life example in Wu et al. [37] shows that GWR-LL is not only much more cost-effective than GWR-BF but can also significantly reduce the boundary effect of coefficient estimators. Furthermore, GWR-LL also yields such variablespecific optimal bandwidth sizes that correctly characterize the respective spatial scales of the explanatory variables.
As a kind of important GWR-related model, the spatial autoregressive geographically weighted regression (SARGWR) model incorporates a spatial lag term of the response variable into the GWR model to simultaneously take into account spatial autocorrelation in the response variable and spatial heterogeneity in the regression relationships [14][15][16]. Since SARGWR models can simultaneously consider the two fundamental properties of spatial data, i.e., spatial autocorrelation and spatial heterogeneity, they have a wide application background in spatial data analysis. Therefore, it is of great importance to the development of some cost-effective multiscale estimation methods in view of the advantages of the multiscale estimation methods for calibrating the GWR and its related models. As aforementioned, however, the existing iterative multiscale estimation for SARGWR models proposed by Chen et al. [17] is extremely time-consuming.
Motivated by the simplicity of the two-stage least-squares (2SLS) method in which the spatial lag term is replaced by an instrument-variables-based estimator, and then the autoregressive coefficient is estimated by the ordinary least-squares procedure [39,40], and considering that the GWR-based 2SLS estimation for SARGWR models has been explicitly formulated in Mei and Chen [41], we first extend in this paper the GWR-based 2SLS estimation to the local-linear GWR-based 2SLS estimation of SARGWR models. Then, the idea of GWR-LL in Wu et al. [37] is employed to develop a non-iterative multiscale estimation method for SARGWR models. Specifically, the extended local-linear GWR-based 2SLS estimation, instead of the profile maximum likelihood estimation in Chen et al. [17], of SARGWR models are used to derive the initial estimators of the spatial autoregressive coefficient and the regression coefficients for the purpose of reducing the computation cost. Then, the GWR-LL procedure is applied to the estimation of the regression coefficients. Furthermore, the 2SLS estimation for spatial autoregressive models [39,40] is used to reestimate the autoregressive coefficient. This non-iterative multiscale estimation method is referred to as SARGWR-LL henceforth, meaning that the local-linear GWR estimation method [38] is used in both the initial and final steps. Moreover, considering that the locallinear estimation of GWR models is more time-consuming than their traditional estimation, and the initial estimators of the regression coefficients might have less influence on their final multiscale estimators, we further propose a simplified scenario of SARGWR-LL by replacing the initial local-linear GWR-based 2SLS estimators of the spatial autoregressive coefficient and regression coefficients with their respective GWR-based 2SLS estimators, which we refer to as SARGWR-GL in the subsequent presentation, where "G" means that the GWR-based 2SLS estimation method is employed in the initial step.
The rest of the paper is organized as follows. In Section 2, the GWR-based 2SLS estimation of SARGWR models [41] is briefly reviewed, on which its local-linear GWRbased version is derived; based on the GWR-based 2SLS estimation and its local-linear GWR-based version of SARGWR models, the SARGWR-LL and its simplified scenario SARGWR-GL are finally formulated. A simulation study and a real-life example based on Dublin voter turnout data are conducted in Section 3 to assess and compare the performance of the related multiscale estimation methods for SARGWR models. The paper is ended with a conclusion and discussion.

Spatial Autoregressive Geographically Weighted Regression (SARGWR) Model
Let Y be the response variable, and X 1 ,X 2 , · · · ,X p be the associated explanatory variables. Given their observations {(y i ; x i1 , x i2 , · · · , x ip )} n i=1 collected at n spatial locations , the SARGWR model studied in this paper is where the parameter ρ, which is assumed to be |ρ| < 1, is called the autoregressive coefficient, measuring the intensity of spatial autocorrelation in the response variable Y; {w ij } n i,j=1 are the elements of a pre-specified row-standardized spatial weights matrix W = w ij n×n of the n sampling locations with w ii = 0 (i = 1, 2, · · · , n) assumed by convention, which characterizes the neighborhood relationship between each sampling point and its neighbors; {β j (u, v)} p j=1 are p regression coefficients which are unknown functions of the spatial coordinates (u, v); and {ε i } n i=1 are independent and identically distributed errors with E(ε i ) = 0 and Var(ε i ) = σ 2 > 0. We can take X 1 = 1 (i.e., x i1 = 1 for i = 1, 2, · · · , n) to make the model include a spatially varying intercept.
Let y = (y 1 , y 2 , · · · , y n ) T ; The SARGWR model in Equation (1) can be expressed by matrix notation as Remark 1. The above SARGWR model assumes that the autoregressive coefficient ρ is constant. This means that spatial autocorrelation in the response variable between the observation of the response variable at any spatial sampling unit and those observed at its neighbors is the same over space, which may be unrealistic for many real-life spatial data sets. The more general SARGWR model should allow ρ to vary over space, i.e., ρ(u, v) instead. Considering that the SARGWR model with constant autoregressive coefficient is a kind of standard model in the application, and the 2SLS procedure can be directly employed to estimate the parameter ρ, we then mainly focus this kind of SARGWR model on deriving its non-iterative multiscale estimation method. The extension to the SARGWR model with a spatially varying autoregressive coefficient ρ(u, v) will be discussed in the final section. In this subsection, we briefly review the GWR-based 2SLS estimation of the SARGWR model, which we will use to formulate the SARGWR-GL multiscale estimation procedure. For more detailed derivation, one can refer to the supplementary materials at https://doi. org/10.1016/j.spasta.2022.100666 provided in Mei and Chen [41]. We rewrite the model in Equation (2) as

Preliminary
We treat the above model as a GWR model with the observation vector of the response variable beingỹ = y − ρWy, and obtain, according to Brunsdon et al. [2], that the GWR estimators of β(u i , v i ) (i = 1, 2, · · · , n) arê where the subscript "G" means the traditional GWR estimation method, X = (x 1 , x 2 , · · · , x n ) T is the design matrix, and is the calibration weights matrix at (u i , v i ), which is related to the bandwidth h. Here, the word "calibration" is used in order to distinguish this matrix from the spatial weights matrix W in the model. Substitutingβ G (u i , v i ) (i = 1, 2, · · · , n) into M, we obtain the estimator of M as where I n is the identity matrix of order n, and Furthermore, replacing M in Equation (3) with its estimator M G yields the following artificial spatial autoregressive model: In the above model, since Wy is endogenous, an instrumental estimator of Wy is needed to derive a consistent estimator of ρ in the framework of least-squares estimation. As suggested by Geniaux and Martinetti [16] as well as by Mei and Chen [41], the instrumental variables Q = (X, W X (−1) , W 2 X (−1) ), where X is the design matrix, and X (−1) is such a matrix that the first column of X is removed when an intercept is included in the SARGWR model, can be used to estimate Wy by formulating the following GWR model: where y wi is the i-th element of Wy and q T i = (1, q i2 , · · · , q i,3p−2 ) is the i-th row of the instrumental variables Q. Calibrating the model in Equation (7) with the traditional GWR technique [2], we obtain the fitted values of y wi (i = 1, 2, · · · , n) aŝ Therefore, the instrumental estimator of Wy, which we denoted byŷ w , iŝ where . . .
Replacing Wy in the model in Equation (6) with its instrumental estimatorŷ w in Equation (8) and using the least-squares estimation procedure, we obtain the GWR-based 2SLS estimator of the autoregressive coefficient ρ aŝ Substitutingρ G into Equation (4), we obtain the GWR-based 2SLS estimators of the regres- and the fitted vector can be expressed aŝ where the hat matrix is For ease of presentation, we henceforth refer to the GWR-based 2SLS estimatorŝ ρ G andβ G (u i , v i ) (i = 1, 2, · · · , n) in Equations (10) and (11) as their respective GWR-2SLS estimators.

Local-Linear GWR-Based 2SLS Estimation of the SARGWR Model
In what follows, we extend the GWR-based 2SLS estimation of the SARGWR model to local-linear GWR-based estimation, which will be used to formulate the multiscale estimation procedure, SARGWR-LL.
Treating once again the model in Equation (3) as a GWR model with the observation vector of the response variable beingỹ = y − ρWy. According to Wang et al. [38], the local-linear estimators of β( where the subscript "L" represents the local-linear GWR estimation, is the design matrix at (u i , v i ), 0 p×(2p) is the zero matrix of order p × (2p), and W h (u i , v i ) is the same diagonal calibration weights matrix in Equation (5). The resulting estimator of The same derivation for Equation (6) yields the following artificial spatial autoregressive model: Substitute the instrumental estimatorŷ w of Wy in Equation (8) into the above model and obtain the local-linear GWR-based 2SLS (henceforth referred to as LGWR-2SLS) estimator of ρ asρ The LGWR-2SLS estimators of the regression coefficient vectors arê The fitted vector of the response variable at the sampling locations {(u i , v i )} n i=1 can be expressed asŷ where the hat matrix H L (h) is of the same form as that in Equation (13) except that S G (h) therein is replaced by S L (h), shown in Equation (16).

Generating the Calibration Weights Matrix
As is well known in the GWR literature [2,3], the elements in the calibration weights matrix in Equation (5) are generated by a kernel function K(t), which is usually taken to be the Gaussian kernel or the bisquare kernel, and the bandwidth h can be set to be fixed or adaptive. Specifically, given a focal sampling location (u i , v i ), the weights with a fixed bandwidth at (u where {d ij } n j=1 are the distances (usually the Euclidean distance) from (u i , v i ) to all of the sampling locations (u j , v j ) n j=1 . The weights with an adaptive bandwidth at (u i , v i ) are commonly generated by the bisquare kernel, which shows where the bandwidth h ik is the distance from (u i , v i ) to its k-th nearest sampling location and is variable with (u i , v i ). As shown by Gollini et al. [42], when the sampling locations are irregularly distributed over space, and the weights with an adaptive bandwidth perform better than those with a fixed bandwidth.
Throughout this paper, the AICc criterion [3] is employed to select the optimal bandwidth size in both GWR-2SLS and LGWR-2SLS estimation approaches. Specifically, let H(h) be the hat matrix in either of the two estimation methods. The AICc score is computed by where H(h) = H G (h) in Equation (13) for GWR-2SLS and H(h) = H L (h) in Equation (19) for LGWR-2SLS. The optimal bandwidth size, which we denote by h 0 , is Remark 2. When the adaptive bandwidth is set, the optimal size of the parameter k, which is taken as a proxy of the adaptive bandwidth h ik at each (u i , v i ), is selected by the AICc criterion. To avoid causing confusion, henceforth, we call k the bandwidth.

SARGWR-LL Procedure
Based on the LGWR-2SLS estimatorsρ L in Equation (17) andβ L (u i , v i ) in Equation (18) of the autoregressive coefficient ρ and the regression coefficients β(u i , v i ) (i = 1, 2, · · · , n), the SARGWR-LL non-iterative multiscale estimation is formulated by the following steps: (i) Let h 0L be the optimal bandwidth size selected in the LGWR-2SLS estimation of the SARGWR model. Fix the instrumental estimatorŷ w of Wy in Equation (8) andρ L in Equation (17) at h 0L , which we denote byŷ w (h 0L ) andρ L (h 0L ), respectively; (ii) Leth = c h 0L where c ∈ (0, 1) is a constant called the bandwidth shrinking parameter. Compute the coefficient estimatorsβ L (u i , v i ) (i = 1, 2, · · · , n) in Equation (18) in which the bandwidth h is replaced byh andρ L is substituted byρ L (h 0L ). We denote the resulting estimators bỹ (iii) Fixing each m ∈ {1, 2, · · · , p} and substitutingρ L (h 0L ) and β into the SARGWR model in Equation (1), we formulate the following artificial GWR model with a single spatially varying coefficient: Calibrating the above model by the local-linear GWR estimation [38] with the optimal bandwidth size selected by the AICc criterion, we obtain the SARGWR-LL estimators of and h m is the optimal bandwidth size selected by the AICc criterion in which y and the hat matrix H(h) in the AICc score in Equation (22) are respectively replaced by y * (m) and (iv) Repeating Step (iii) for each of m = 1, 2, · · · , p, we finally obtain the SARGWR-LL with h 1 , h 2 , · · · , h p being the final optimal bandwidth sizes of the p coefficients β 1 (u, v), β 2 (u, v), · · · , β p (u, v) in the SARGWR model in Equation (1), respectively; (v) For each i = 1, 2, · · · , n, substituting into the SARGWR model in Equation (1) yields the following artificial spatial autoregressive model: w ij y j +ε i , i = 1, 2, · · · , n, orỹ = ρWy +ε, whereỹ = (ỹ 1 ,ỹ 2 , · · · ,ỹ n ) T andε = (ε 1 ,ε 2 , · · · ,ε n ) T . Replacing Wy in the above model with its instrumental estimatorŷ w (h 0L ) and re-estimating ρ by the least-squares method, we obtain the final SARGWR-LL estimator of ρ aŝ Remark 3. In step (ii), a bandwidth shrinking parameter c is introduced to shrink the optimal bandwidth size h 0L selected in the LGWR-2SLS estimation, and the shrunk bandwidthh is used to obtain the initial estimators of the regression coefficients. As noted by Fan and Zhang [8], a smaller bandwidth size will reduce estimation biases but increase estimation variances of the regression coefficients. The less biased initial estimators in Equation (24) are helpful in increasing the accuracy of the final estimators of the regression coefficients, while the increased variances can be expected to be smoothed out in the following step (iii).

SARGWR-GL Procedure
The above SARGWR-LL procedure takes the LGWR-2SLS estimators of the regression coefficients to be the initial estimators. As shown in Equation (18), the LGWR-2SLS estimators of the regression coefficients relate to a more complicated design matrix X(u i , v i ), shown in Equation (15), which should be re-set at each of the sampling locations . Therefore, computingρ L in Equation (17) andβ L (u i , v i ) (i = 1, 2, · · · , n) in Equation (18) is more time-demanding than computing their GWR-2SLS estimatorŝ ρ G in Equation (10) andβ G (u i , v i ) (i = 1, 2, · · · , n) in Equation (11). Furthermore, the initial estimators of ρ and β(u i , v i ) (i = 1, 2, · · · , n) might have less effect on their final estimators because β(u i , v i ) (i = 1, 2, · · · , n) will be re-estimated by the local-linear GWR procedure and ρ will be re-estimated by the 2SLS estimation method. With these considerations, we replace the initial LGWR-2SLS estimators of ρ and β(u i , v i ) (i = 1, 2, · · · , n) with their respective GWR-2SLS estimators and propose a simplified scenario of SARGWR-LL, which we refer, as mentioned in the introduction, to SARGWR-GL. The main steps of the SARGWR-GL procedure are as follows: (i) Let h 0G be the optimal bandwidth size selected in the GWR-2SLS estimation of the SARGWR model. Fix the instrumental estimatorŷ w of Wy in Equation (8) andρ G in Equation (10) at h 0G , which we denote byŷ w (h 0G ) andρ G (h 0G ), respectively.
(iii) The steps followed are totally the same as those in Section 2.3.1, except thatŷ w (h 0L ) in Equation (25) is replaced byŷ w (h 0G ).

SARGWR-BF Procedure
Moreover, for the purpose of comparison, we accordingly formulate a backfittingbased multiscale estimation procedure for the SARGWR model, in which the GWR-2SLS estimation method is used in each iteration. We refer to this procedure as SARGWR-BF and describe its detailed steps in Appendix A because this part is less related to the main theme of this paper.

Simulation Study and Real-Life Example
In this section, a simulation study is conducted to assess the performance of the proposed SARGWR-LL and SARGWR-GL multiscale estimation methods for the SARGWR model. In particular, the proposed non-iterative multiscale estimation methods and the iterative multiscale estimation method SARGWR-BF described in Appendix A are compared in both the accuracy of the coefficient estimators and the computation efficiency. Furthermore, a real-life example based on Dublin voter turnout data is given to show the applicability of the proposed multiscale estimation methods.

(i) Spatial layout
We took the unit square [0,1] × [0,1] in a Cartesian coordinate system as the spatial region. Considering that the sampling locations in many practical problems are irregularly distributed over space, the sampling points {(u i , v i )} n i=1 were designed in the way that each pair of (u i , v i ) was independently drawn from the uniform distribution U(0, 1) with n = 400. The sampling points used in the simulation are depicted in Figure 1, and they are fixed throughout the simulation.

(ii) Model for generating the experimental data
The following SARGWR model was considered: where the row-standardized spatial weights matrix W = (w ij ) n×n was determined by the l-nearest neighbor procedure. As pointed out by Boots and Tiefelsdorf [43], numerous studies have found that irregular spatial tessellations share, on average, many topological properties with a hexagonal tessellation in which a given hexagon has, in general, six neighbors when we define that one hexagon is a neighbor of another hexagon if they have a common side. Accordingly, we took l = 6 in the simulation study. Specifically, let d ij be the Euclidean distance between (u i , v i ) and (u j , v j ). Given each (u i , v i ), for each of j = 1, 2, · · · , n with j = i, if d ij ≤ d il , we set w ij = 1; if d ij > d il , we set w ij = 0. Then the elements in W are defined by w ij = w ij / n ∑ j=1 w ij for i, j = 1, 2, · · · , n with i = j, and w ii = 0 (i = 1, 2, · · · , n) by convention. The observations {x i2 } n i=1 and {x i3 } n i=1 of the explanatory variables X 2 and X 3 were independently drawn from the uniform distribution U(− √ 3, √ 3) and the standard normal distribution N(0, 1), respectively. The model errors {ε i } n i=1 were generated from the normal distribution N(0, 0.5 2 ). We designated the three regression coefficients as The true surfaces of the regression coefficients are shown in Figure 2, from which we can observe that their respective levels of spatial heterogeneity are obviously different. The autoregressive coefficient ρ was set to be from −0.9 to 0.9 with an increment of 0.3.
When the values {x i2 , x i3 } n i=1 of the explanatory variables X 2 and X 3 as well as {ε i } n i=1 of the model errors have been drawn from their respective distributions, the observation vector y of the response variable is generated according to the matrix form of the model in Equation (2), i.e., ) T , and ε = (ε 1 , ε 2 , · · · , ε n ) T .

(iii) Designs of the other experimental items
In both SARGWR-LL and SARGWR-GL, we set the shrinking parameter as c =0.6, 0.7, 0.8, 0.9, and 1. The bisquare kernel with an adaptive bandwidth was used to generate the weights in Equation (21), and the optimal size of the bandwidth k was selected by the AICc criterion, whereh in step (ii) of both SARGWR-LL and SARGWR-GL was taken to be the integer part of c h 0L or c h 0G .

(iv) Indices for measuring accuracy of the coefficient estimators
Each experimental setting was repeatedly run N times, where, in each replication, both model errors {ε i } n i=1 and the observations {x i2 , x i3 } n i=1 of the explanatory variables X 1 and X 2 were re-drawn from their respective distributions. Based on the coefficient estimators of the SARGWR model in Equation (26) in the N replications, we defined the following indices for measuring the accuracy of the coefficient estimators.
Given each of the SARGWR-LL, SARGWR-GL, and SARGWR-BF methods, letρ (r) be the estimator of the autoregressive coefficient ρ in the r-th replication. We take the mean of its estimators in the N replications, as the final estimator of ρ and use the root mean square error (RMSE) to measure the estimation accuracy of ρ.
In addition, the averaged root mean square errors (RMSEs) over the sampling points given by , are used to measure the global estimation accuracy of β j (u, v).
In the simulation, we set N = 200 to compute the above indices.

(i) Estimation accuracy of the autoregressive and regression coefficients
With the experiment design in Section 3.1.1, the values of the estimation accuracy indices of the autoregressive and regression coefficients for SARGWR-LL and SARGWR-GL are reported in Table 1. For comparison, the related results from SARGWR-BF with the convergence threshold η 0 = 0.01 are also attached. With regard to the autoregressive coefficient ρ, it is known from the fourth column of the table that its final estimators from the three multiscale estimation methods were comparable, and there was a trend where the estimators were somewhat smaller than their respective real values, especially when autocorrelation in the response variable is extremely high (i.e., the value of ρ is −0.9 and 0.9). In terms of the RMSE, the same trend is found for the estimation accuracy. However, for the small and moderate values of ρ, the three multiscale estimation methods all yield an accurate estimator of ρ. Moreover, the estimators of ρ yielded by the two non-iterative multiscale estimation models were rather robust to the variation of values of the shrinking parameter c in terms of both the mean and the RMSE indices.
For the regression coefficients β j (u, v) (j = 1, 2, 3), SARGWR-LL generally yielded more accurate estimators than SARGWR-GL except for the case of ρ = 0.9 for β 1 (u, v). The gain in the coefficient estimation accuracy for SARGWR-LL should have come from the initial estimators of regression coefficients where the local-linear GWR estimation procedure was used because, as shown by Wang et al. [38], the local-linear GWR procedure can yield more accurate estimators of the regression coefficients than the traditional GWR method especially when a regression coefficient is a linear function of spatial coordinates. However, there was no notable difference in the estimation accuracy between the two proposed noniterative multiscale methods, even in the cases of ρ = −0.9 and 0.9, which demonstrates that the initial estimators of regression coefficients, as expected previously, do not produce notable effects on the accuracy of the final multiscale estimators of regression coefficients. Moreover, comparing the values of ARMSE for c = 1 with those of c being less than 1, we know that, for both SARGWR-LL and SARGWR-GL, the improvement in accuracy of the final coefficient estimators can really be achieved by shrinking the optimal bandwidth size selected in the initial LGWR-2SLS or GWR-2SLS estimation in most cases, especially for the SARGWR-GL method. However, the values of ARMSE seemed robust for c = 0.6, 0.7, 0.8, and 0.9. This finding is useful in practice in that it provides a non-rigorous way for the analyst to choose the value of c. Among the three multiscale estimation methods, SARGWR-BF generally yielded the worst estimators of the regression coefficients in terms of RMSE, although the estimation accuracy could be improved by setting a smaller value of the convergence threshold η 0 . The computation efficiency of the three multiscale estimation methods will be discussed further later.
In order to visually show the performance of the three estimation methods in retrieving true surfaces of the regression coefficients, we depict in Figures 3-5 the estimated surfaces of the regression coefficients via their respective final estimators defined in Equation (28) for the cases of ρ = −0.9, 0, 0.9 and c = 0.7. The estimated surfaces in the other cases are all very similar, and we omitted them to save space. By comparing Figures 3-5 with Figure 1, it can be observed that SARGWR-LL and SARGWR-GL retrieved the true surfaces of the regression coefficients more accurately than SARGWR-BF.

(ii) Optimal bandwidth sizes of the regression coefficients
As mentioned in the introduction section, the optimal bandwidth size of each regression coefficient can characterize the underlying spatial scale of the corresponding explanatory variable, which was one of the main objectives for developing multiscale estimation methodologies for the GWR and the related models. Figures 6 and 7 show shows the boxplots of the optimal bandwidth sizes of the three regression coefficients selected in the 200 experiment replications for SARGWR-LL, SARGWR-GL, and SARGWR-BF, respectively. Here, we only show the boxplots in the cases of c = 0.7, 1, and ρ = −0.9, 0, 0.9 to illustrate the impact of the shrinking parameter, c, and the autoregressive coefficient, ρ, on the optimal bandwidth sizes for the three multiscale estimation methods. The boxplots for the other experiment settings were all similar, which we omitted here. It can be observed from Figures 6 and 7 that the three multiscale estimation methods all yielded such variable-specific optimal bandwidth sizes that are consistent with the heterogeneity levels of the respective coefficients in almost all experiment settings. That is, the more heterogeneous the coefficient was, the smaller the optimal bandwidth size was, which demonstrates that the three methods can all correctly reveal the respective underlying spatial scales of the explanatory variables. Compared to the iterative estimation method SARGWR-BF, both non-iterative estimation methods SARGWR-LL and SARGWR-GL could better reflect the difference in heterogeneity among the three coefficients because the optimal bandwidth sizes from SARGWR-LL and SARGWR-GL show on the whole, an evident difference and the correct order of spatial heterogeneity levels among the three coefficients. However, the uncertainty of the optimal bandwidth sizes yielded by SARGWR-LL and SARGWR-GL for the least heterogeneous coefficient β 1 (u, v) was much larger than that produced by SARGWR-BF. The reason for causing this large uncertainty in SARGWR-LL and SARGWR-GL was perhaps due to the property that, theoretically, the optimal bandwidth size for a linear function in the local-linear kernel smoothing technique tends to infinity with the increase of the sample size [44]. For the other two non-linear coefficients, uncertainty in the optimal bandwidth size was comparable among the three multiscale estimation methods. Moreover, the figures show that the variable-specific optimal bandwidth sizes for the three methods were affected by the intensity of the spatial autocorrelation in the response variable, especially for the least heterogeneous coefficient β 1 (u, v). For example, when the very high positive spatial autocorrelation exists in the response variable (i.e., ρ = 0.9), the medians of the optimal bandwidth sizes yielded by SARGWR-LL and SARGWR-GL for β 1 (u, v) become even slightly smaller than those for β 3 (u, v). However, less influence of the intensity of spatial autocorrelation on the optimal bandwidth sizes was observed for the SARGWR-BF method. In addition, comparing Figure 6 with Figure 7, we can see that the boxplots with c = 0.7 are very similar to the corresponding boxplots with c = 1 for both non-iterative multiscale estimation methods. This demonstrates that shrinking the optimal bandwidth size to obtain initial estimators of the regression coefficients had little impact on their respective final optimal bandwidth sizes.

(iii) Computation efficiency
As one of our main focuses for developing non-iterative multiscale estimation methods for the SARGWR model, computation efficiency is essential for SARGWR-LL and SARGWR-GL. According to the foregoing simulation, which was conducted by our writing the Matlab codes and carrying out the computation on our personal computer with AMD Ryzen 5 5600G @ 3.90GHz of CPU and 16GB of memory, the average time for running a replication was about 90 s for SARGWR-LL and about 80 s for SARGWR-GL. With the convergence threshold η 0 = 0.01, the SARGWR-BF method took about 210 s to run a replication, which is more than two times as much as the computing time that SARGWR-LL and SARGWR-GL took, respectively. In essence, both SARGWR-LL and SARGWR-GL are one-step SARGWR-BF by taking the initial estimators of the regression coefficients to be the LGWR-2SLS and the GWR-2SLS estimators with a shrunk optimal bandwidth size, respectively. Therefore, SARGWR-LL and SARGWR-GL should be more efficient than SARGWR-BF when the iteration times for implementing SARGWR-BF until convergence are greater than 1. Moreover, as expected, SARGWR-GL was more efficient than SARGWR-LL, and the difference in the computation time between SARGWR-LL and SARGWR-GL will become larger with the number of explanatory variables increasing. Furthermore, in view of the foregoing findings that both the estimation accuracy of the coefficients and the variable-specific optimal bandwidth sizes are all comparable between SARGWR-LL and SARGWR-GL, the SARGWR-BF provides a more efficient alternative for dealing with large data sets. Although computation time for an estimation method closely depends on the optimization of the codes written and the computation equipment used, the above comparison still makes sense in understanding the relative computation efficiency of the three multiscale estimation methods, in that the codes were written by ourselves and the computation was carried out on a same computer.  What we are interested in is exploring spatial autocorrelation in the percentage of the voting population among the n = 322 EDs and spatial heterogeneity of the impact of the eight variables on the percentage of voting population. For this purpose, we built the following SARGWR model: where the spatial weights matrix W = w ij n×n was formulated in the same way as that in the simulation study. That is, the elements in W were first determined by the binary way with the 6-nearest neighbor procedure used in the simulation study, where the Euclidean distance between the spatial coordinates of two EDs was used to determine the neighbors of each ED, and then were row-standardized. Moreover, all of the explanatory variables were standardized.

Model Calibration with the Results
Both SARGWR-LL and SARGWR-GL methods were applied to the calibration of the above model, in which the bisquare kernel with an adaptive bandwidth was used to generate the calibration weights matrix in Equation (5), and the optimal bandwidth size was selected by the AICc criterion. Furthermore, the bandwidth shrinking parameter was set to c = 0.7. For the purpose of comparison, the SARGWR-BF method was also used to calibrate the model, in which the convergence threshold was set to be η 0 = 0.01.
The estimated value of the autoregressive coefficient is 0.2547 for SARGWR-LL, 0.1810 for SARGWR-GL, and 0.1714 for SARGWR-BF, which are all positive and quantitatively similar, meaning that there exists positive spatial autocorrelation among the percentages of the voting population in the EDs. The running time on our personal computer is 56 s and 43 s for SARGWR-LL and SARGWR-GL, respectively, while SARGWR-BF took 68 s when the convergence threshold η 0 = 0.01 is reached.
The variable-specific optimal bandwidth sizes for the three multiscale estimation methods are listed in Table 2. In order to assess the impact of the shrinking parameter on the optimal bandwidth sizes of individual explanatory variables, the corresponding optimal bandwidth sizes selected in SARGWR-LL and SARGWR-GL with c = 1 (i.e., the original optimal bandwidth size was not shrunk) are also reported in Table 2. It is known from the table that although the corresponding optimal bandwidth sizes among the three estimation methods are different from each other, their relative orders are roughly consistent, which provides information about the spatial scale of each explanatory variable. In particular, the influence of LARent, SC1, and LowEduc on the percentage of voting population (GenEl) is least heterogeneous because the three estimation methods all produce extremely large bandwidth sizes for these three explanatory variables, while the influence of DiffAdd is the most heterogeneous due to its very small bandwidth sizes yielded by the three estimation methods. Comparing the optimal bandwidth sizes with c = 0.7 and the corresponding ones with c = 1, we can observe that they are all comparable, which demonstrates, as shown in the simulation study, that the final optimal bandwidth sizes of the individual explanatory variables are very robust to the variation of the shrinking parameter for the SARGWR-LL and SARGWR-GL methods. Table 2. Variable-specific optimal bandwidth sizes yielded by the three multiscale estimation methods with the shrinking parameter c = 0.7 and 1 for SARGWR-LL and SARGWR-GL. For the assessment of the goodness-of-fits and the ability to extract spatial autocorrelation of the three multiscale estimation methods, we computed the values of R 2 and Moran's I of the residuals with the p-values derived by 200 randomly permuted residual samples. The values of R 2 are 0.7936 for SARGWR-LL, 0.8993 for SARGWR-GL, and 0.6896 for SARGWR-BF, respectively. The smaller value of R 2 for SARGWR-BF may be due to the relative inaccuracy of the regression coefficient estimators, as demonstrated in the simulation study. The values of Moran's I of the residuals are 0.0654 with the p-value 0.020, 0.0555 with the p-value 0.055, and 0.0258 with the p-value 0.195 for SARGWR-LL, SARGWR-GL, and SARGWR-BF, respectively, showing that SARGWR-BF is of stronger ability for extracting spatial autocorrelation. In addition, for the comparison of goodness-of-fits of the three multiscale estimation methods with other possible models, we calibrated the corresponding SARGWR model with the GWR-2SLS method, GWR model with the traditional estimation method, and MGWR model, where the coefficients of the LARent, SC1 and LowEduc were assumed to be constant, with the two-step estimation method. The values of R 2 are 0.7060, 0.7135, and 0.7154 for SARGWR, GWR, and MGWR models, respectively, which are all smaller than the corresponding R 2 values of the multiscale estimation methods except for SARGWR-BF.
The estimators of the regression coefficients by the three estimation methods are shown via heat maps in Figure 8. It can be observed that the spatial patterns of individual coefficients are basically consistent among the three estimation methods, although some local differences exist, especially between the estimator of intercept by SARGWR-LL and those by SARGWR-GL and SARGWR-BF, which might be caused by the larger difference between the estimated values of spatial autoregressive coefficient ρ and the different multiscale estimation methods used. Relatively, however, the heat maps of the regression coefficient estimators produced by SARGWR-LL and SARGWR-GL are more similar.
Based on the heat maps of the regression coefficients, the impact of each explanatory variable on the response variable can be qualitatively interpreted. For example, the influence of Age3 on GenEl increases from north to south, showing a positive effect in the south area, especially in the southeast area, and a negative effect in the north area; Age1 has an evident negative impact on GenEl mainly in the south area, while the most positive influence of Age2 appears in this area; the influence of Unempl on GenEl is negative over the whole area with the least negative impact being on the center area. DiffAdd, whose coefficient is the most spatially heterogeneous, shows a weakly positive influence in the northern area but a strong negative influence in the center and southeast areas. Due to the very large optimal bandwidth sizes of the LARent, SC1, and LowEduc, their influence could be interpreted to be global. However, as will be discussed in the last section, a formal statistical test is still needed to verify that their corresponding coefficients are constant.

Conclusions and Discussion
Inspired by the cost-effective multiscale estimation approach proposed by Wu et al. [37] for calibrating GWR models and considering the importance of SARGWR models in the application, in this article, we proposed two non-iterative multiscale estimation methods called SARGWR-LL and SARGWR-GL, respectively, for SARGWR models based on their 2SLS estimation. The simulation study and real-life data analysis demonstrate that both SARGWR-LL and SARGWR-GL perform as well as or better than the iterative multiscale estimation method SARGWR-BF, not only in estimating the autoregressive coefficient and retrieving the underlying spatial patterns of the regression coefficients but also in revealing the spatial scales at which the explanatory variables operate. Most importantly, the proposed SARGWR-LL and SARGWR-GL methods were much more efficient than SARGWR-BF in terms of computational efficiency.
As aforementioned in Remark 1, the SARGWR model, which this paper has focused on, assumes a constant autoregressive coefficient. This assumption may be unrealistic for many real-life spatial data sets. Recently, Mei and Chen [41] have extended the SARGWR model to allow the autoregressive coefficient to vary over space and proposed a GWRbased 2SLS estimation method for the extended SARGWR model. It seems possible for both SARGWR-LL and SARGWR-GL multiscale estimation procedures to be extended to the SARGWR model with a spatially varying autoregressive coefficient, provided that the spatially varying autoregressive coefficient can be efficiently estimated by some local smoothing technique. This extension is worth investigating in view of the wide application background of the extended SARGWR model.
Moreover, SARGWR models assume that all of the regression coefficients vary over space, and their multiscale estimation can additionally provide information on the relative levels of spatial heterogeneity of the regression coefficients via variable-specific optimal bandwidth sizes. As pointed out by Yu et al. [45] and Fotheringham [46], formal statistical tests for identifying the constant coefficients and locally evaluating the significance of the influence of the explanatory variables at each spatial sampling point are also essential to regression-based local modeling. For the SARGWR model, Li et al. [47] proposed two kinds of tests to respectively identify whether spatial autocorrelation exists in the response variable and whether some of the regression coefficients are constant. Similar tests have also been proposed by Mei and Chen [41] for the extended SARGWR model. In these tests, there were too many null hypotheses for the regression coefficients to be considered, which makes it very complex to comprehensively identify the constant regression coefficients in the models. Taking into account the information of spatial heterogeneity of each regression coefficient provided by a multiscale estimation method as the prior information, i.e., an especially large bandwidth size of a regression coefficient means that this coefficient is possibly constant, we can formulate much fewer but more specific null hypotheses for testing the possible constant coefficients in a SARGWR or the extended SARGWR model. After the constant regression coefficients in a SARGWR model are well identified, a semiparametric SARGWR model, where some regression coefficients are constant and others vary over space, is then formulated. It also seems possible to extend both SARGWR-LL and SARGWR-GL's multiscale estimation methods to semi-parametric SARGWR models. These issues deserve to be studied in future research.