Article

Robust Variable Selection with Exponential Squared Loss for the Spatial Durbin Model

College of Science, China University of Petroleum, Qingdao 266580, China
* Author to whom correspondence should be addressed.
Entropy 2023, 25(2), 249; https://doi.org/10.3390/e25020249
Submission received: 29 November 2022 / Revised: 25 January 2023 / Accepted: 28 January 2023 / Published: 30 January 2023
(This article belongs to the Special Issue Statistical Methods for Modeling High-Dimensional and Complex Data)

Abstract

With the continuous application of spatially dependent data in various fields, spatial econometric models have attracted increasing attention. In this paper, a robust variable selection method based on the exponential squared loss and the adaptive lasso is proposed for the spatial Durbin model. Under mild conditions, we establish the asymptotic and "Oracle" properties of the proposed estimator. However, solving the model involves a nonconvex and nondifferentiable programming problem, which poses challenges for the solution algorithm. To solve this problem effectively, we design a BCD algorithm and give a DC decomposition of the exponential squared loss. Numerical simulation results show that the method is more robust and accurate than existing variable selection methods when noise is present. In addition, we also apply the model to the 1978 housing price dataset in the Baltimore area.

1. Introduction

In recent years, spatial cross-sectional data have been widely used in geography, politics, the environment, and other fields. Therefore, spatial econometrics, a methodology initially used in economics, has also attracted much attention. Anselin (1988) [1] divides spatial econometric models into spatial error models, spatial lag models, and spatial Durbin models (SDM). Among them, the spatial Durbin model is represented as $y = \rho W y + X\beta + WX\delta + \varepsilon$. The spatial Durbin model considers the influence on the dependent variable of both the spatially lagged independent variables and the spatially lagged dependent variable, and it can therefore more easily yield unbiased coefficient estimates. At the same time, the spatial Durbin model can also be used to calculate spatial spillover effects based on panel data. In spatial regression analysis, the influence of regional locations on the observations is expressed through the spatial weight matrix W, and an appropriate setting of the spatial weight matrix is an essential basis for spatial econometric analysis. There are two main ways to select a spatial weight matrix. The first is to select a spatial weight matrix from an optional set of spatial weight matrices: Kelejian (2008) [2] uses GMM estimation to select the true spatial weight matrix, proposing a non-nested J test to assess a set of alternative models with different spatial weight matrices against the null SAR model. The second estimates the weight matrix by averaging different spatial weight matrices: Zhang and Yu (2018) [3] propose a model averaging procedure to reduce estimation error; this approach overcomes the difficulty that the true spatial weight matrix may not be in the candidate set.
In the field of classical linear regression, much work has been devoted to variable selection. The most popular approach is to add penalty functions to the model for variable selection. These penalized regression methods have a unified theoretical framework, such as the least absolute shrinkage and selection operator (lasso, Tibshirani, 1996) [4], the smoothly clipped absolute deviation (SCAD, Fan and Li, 2001) [5], and the adaptive lasso (Zou, 2006) [6]. Since the SDM involves spatial autocorrelation, the above variable selection methods cannot be directly applied to the SDM.
Due to noise and outliers, the classical variable selection methods in regression models often face the problem of instability, so many scholars have proposed robust variable selection algorithms. The Huber loss function was widely used in early studies, but this function has some limitations in efficiency and computation. Wang et al. (2013) [7] proposed a robust parameter estimation method based on the exponential squared loss function, which is widely used in boosting algorithms (Friedman et al., 2000) [8]. When γ is small, the empirical loss incurred by a large |t| is close to 1; therefore, the loss function is robust in parameter estimation. Wang et al. (2013) [7] also point out that this method is more robust than other robust variable selection methods, such as Huber estimation, quantile regression estimation (Koenker and Bassett, 1978) [9], and composite quantile regression estimation (Zou and Yuan, 2008) [10], and they propose a selection method for the parameter γ.
Our research focuses on variable selection for spatial Durbin models. The spatial Durbin model combines the spatial interaction of the dependent and explanatory variables, but only a few researchers use and study this model. Beer and Riedl (2012) [11] used maximum likelihood estimation to estimate the parameters of the spatial Durbin model and used the Monte Carlo method to analyze the properties of the estimator. Mustaqim et al. (2018) [12] discuss the efficiency of instrumental variables in simultaneous spatial Durbin models, using 2SLS and GMM-S2SLS estimation; their analysis shows that the GMM-S2SLS method produces less bias than the 2SLS method. Zhu et al. (2020) [13] proposed parameter estimation of the spatial Durbin model based on Markov chain Monte Carlo (MCMC). Wei et al. (2021) [14] proposed a within-group spatial two-stage least squares estimator. However, the existing variable selection methods are affected by outliers in finite samples and are not robust enough. Therefore, it is imperative to study a robust variable selection method.
Considering robustness, we combine a parameter penalty with the exponential squared loss and assume that the errors of the model are independent and identically distributed. For the parameter penalty, we use the adaptive lasso. We previously applied the robust variable selection method based on the exponential squared loss to the spatial autoregressive model and achieved satisfactory results [15]. The spatial autoregressive model is a special case of the spatial Durbin model. In this paper, we aim to study the application of the robust variable selection method based on the exponential squared loss to the spatial Durbin model.
A robust variable selection method for the spatial Durbin model based on the adaptive lasso penalty and the exponential squared loss function is proposed in this paper. This method can not only estimate the regression coefficients but also performs variable selection. Next, we summarize the main contributions of this paper.
  • We build a robust variable selection method for the SDM, equipped with an exponential squared loss, that is resistant to the influence of outliers in the observed values and of errors in estimating the spatial weight matrix.
  • To solve the optimization problem for the SDM, we propose a block coordinate descent (BCD) algorithm. Then, to solve the subproblems generated by the BCD algorithm, we design a DC decomposition of the exponential squared loss and construct a CCCP procedure. Finally, we establish the convergence of the BCD algorithm and analyze its convergence rate to a stationary point under mild conditions.
  • We prove the "Oracle" property of the robust variable selection method and conduct numerical experiments to verify the robustness and effectiveness of the model. The numerical studies show that when there are outliers in the observed data, the method proposed in this paper is superior to the comparison methods in the number of zero coefficients correctly identified, the number of nonzero coefficients incorrectly shrunk to zero, and MedSE.
The structure of this paper is as follows. Section 2 introduces the spatial Durbin model and gives the exponential squared loss function with the adaptive lasso penalty. In Section 3, we propose an effective algorithm to complete the variable selection process. To examine the performance of the model with finite samples, we carry out numerical simulations in Section 4. In Section 5, we apply our model to a real-world dataset. We summarize the paper in Section 6.

2. Variable Selection and Estimation

2.1. Spatial Durbin Model

Let the observed dependent variable $Y_i \in \mathbb{R}$ and the corresponding covariate vector $X_i = (X_{i1}, \ldots, X_{ip})$, where $p$ is a fixed constant. Let the dependent variable vector $Y = (Y_1, \ldots, Y_n)^T$ and the covariate matrix $X = (X_1, \ldots, X_n)^T \in \mathbb{R}^{n \times p}$. The spatial Durbin model is as follows:
$Y = \rho W Y + X\beta + WX\delta + \varepsilon,$
where the regression coefficient vector $\beta = (\beta_1, \ldots, \beta_p)^T \in \mathbb{R}^{p}$, the spatial autocorrelation coefficient $\rho \in \mathbb{R}$, the regression coefficient vector of exogenous variables $\delta = (\delta_1, \ldots, \delta_p)^T \in \mathbb{R}^{p}$, and the error vector $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T \in \mathbb{R}^{n}$. $WX$ is a spatial lag term that reflects the interaction of the independent variables between individuals, and $WY$ embodies the interaction between the dependent variable $y$ and its neighboring values. We assume that the noises $\varepsilon_i$ are independent and identically distributed as $N(0, \sigma^2)$. $Y$ can then be expressed as:
$Y = (I_n - \rho W)^{-1}(X\beta + WX\delta + \varepsilon).$
Since the maximum eigenvalue of $W$ is 1 after normalization, to guarantee that $I_n - \rho W$ is invertible we require $|\rho| < 1$. Additionally, in this article, we ignore the endogeneity of the model.
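To make the data-generating process concrete, the following is a minimal NumPy sketch (ours, not code from the paper) that draws Y from the reduced form above for a given row-normalized W; the helper name simulate_sdm is hypothetical.

```python
import numpy as np

def simulate_sdm(X, W, beta, delta, rho, sigma=1.0, rng=None):
    """Draw Y from Y = (I_n - rho*W)^{-1}(X beta + W X delta + eps), eps ~ N(0, sigma^2 I_n)."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    eps = rng.normal(scale=sigma, size=n)
    rhs = X @ beta + W @ (X @ delta) + eps
    # Solve (I_n - rho*W) Y = rhs rather than forming the inverse explicitly.
    return np.linalg.solve(np.eye(n) - rho * W, rhs)
```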

2.2. Variable Selection Method for SDM

Rewrite model (1) as model (3) in the following form:
$\varepsilon_i(\theta) = Y_i - \left(\rho (WY)_i + X_i\beta + (WX)_i\delta\right).$
Consider variable selection for the SDM. In practical applications, the regression coefficient vector $\beta^*$ is usually sparse. At the same time, sparse solutions can find useful dimensions, reduce redundancy, and improve the accuracy and robustness of regression prediction (Fan and Li, 2001 [5]; Tibshirani, 1996 [4]). Applying a penalized method to variable selection is therefore natural, as it can select essential variables and estimate the regression coefficients. In this article, we penalize the loss function using the adaptive lasso penalty, which is described as follows:
$\sum_{j=1}^{p} P(\beta_j) = \sum_{j=1}^{p} \eta_j |\beta_j|,$
where $\eta_j = 1/|\hat{\beta}_j|$ and $\hat{\beta}_j$ is generally given by the least squares estimate. Considering that the exponential squared loss function has good robustness, we use it as the model's loss function in this paper. The exponential squared loss is
$\phi_\gamma(t) = 1 - \exp\left(-t^2/\gamma\right).$
Here, γ is a parameter that controls the robustness of the loss function: a small γ limits the effect of outliers on the model but also reduces the efficiency of the estimation. Therefore, it is essential to choose the right γ; the selection method is given in Section 2.4.
The model is constructed on the basis of the above model (3). The objective function to be solved is as follows:
$\min_{\beta\in\mathbb{R}^{p},\,\delta\in\mathbb{R}^{p},\,\rho\in[0,1]} L(\beta,\delta,\rho) = \frac{1}{n}\sum_{i=1}^{n}\phi_\gamma\left(Y_i - \rho (WY)_i - X_i\beta - (WX)_i\delta\right) + \lambda\sum_{j=1}^{p}\eta_j|\beta_j| + \lambda\sum_{j=1}^{p}\sigma_j|\delta_j|.$
Let
$\tilde Y = WY, \qquad \tilde X = [X,\, WX], \qquad \tilde\beta = \left(\beta^T, \delta^T\right)^T.$
We can then obtain a simplified expression of (6), given as (7):
$\min_{\tilde\beta\in\mathbb{R}^{2p},\,\rho\in[0,1]} L(\tilde\beta,\rho) = \sum_{i=1}^{n}\phi_\gamma\left(Y_i - \rho\tilde Y_i - \tilde X_i\tilde\beta\right) + \lambda\sum_{j=1}^{2p}\eta_j\left|\tilde\beta_j\right|,$
where $\lambda > 0$ is a regularization parameter and $\phi_\gamma(\cdot)$ is the exponential squared loss.
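As an illustration only, a direct NumPy evaluation of the penalized objective (7) could look as follows; exp_sq_loss and sdm_objective are our hypothetical names, and the adaptive weights eta are assumed to be precomputed.

```python
import numpy as np

def exp_sq_loss(t, gamma):
    """Exponential squared loss phi_gamma(t) = 1 - exp(-t^2 / gamma)."""
    return 1.0 - np.exp(-t**2 / gamma)

def sdm_objective(beta_tilde, rho, Y, X_tilde, W, gamma, lam, eta):
    """L(beta_tilde, rho) in (7), with Y_tilde = W Y and X_tilde = [X, W X]."""
    resid = Y - rho * (W @ Y) - X_tilde @ beta_tilde
    return exp_sq_loss(resid, gamma).sum() + lam * np.sum(eta * np.abs(beta_tilde))
```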

2.3. Oracle Properties and Large Sample Properties

In this section, we discuss the large sample properties and oracle properties of the proposed spatial Durbin model parameter estimation method.
First, let the true value of $\tilde\beta$ be $\tilde\beta_0 = \left(\tilde\beta_{10}, \ldots, \tilde\beta_{2p,0}\right)^T$. Because $\tilde\beta_0 = \left(\beta_0^T, \delta_0^T\right)^T$, where $\beta_0 = \left(\beta_{10}^T, \beta_{20}^T\right)^T$ and $\delta_0 = \left(\delta_{10}^T, \delta_{20}^T\right)^T$, and based on the sparsity assumed above, we assume that $\beta_{20} = 0$ and $\delta_{20} = 0$. So, $\tilde\beta_0 = \left[\beta_{10}^T, 0^T, \delta_{10}^T, 0^T\right]^T$. For convenience of expression, we permute $\tilde\beta_0$ so that $\tilde\beta_0 = \left[\beta_{10}^T, \delta_{10}^T, 0^T, 0^T\right]^T = \left[\tilde\beta_{10}^T, 0^T\right]^T$. To adapt to this transformation of $\tilde\beta_0$, the columns of $\tilde X$ are permuted in the same way; in the following, we assume that $\tilde X$ has been transformed accordingly, and for convenience we still write the transformed first block as $\tilde\beta_{10}$. Let $\hat{\tilde\beta} = \left(\hat{\tilde\beta}_1^T, \hat{\tilde\beta}_2^T\right)^T$ be the resulting estimator of (7), where $\hat{\tilde\beta}$ has also undergone the above transformation. Define
$I(\tilde\beta,\gamma) = \frac{2}{\gamma}\int ZZ^T e^{-r^2/\gamma}\left(\frac{2r^2}{\gamma} - 1\right)dF(Z,y),$
where $r = Y - \left(I_n - \rho W\right)^{-1}\tilde X\tilde\beta = Y - Z\tilde\beta$ and $Z = \left(I_n - \rho W\right)^{-1}\tilde X$; let $a_n = \max\left\{p'_{\lambda_{nj}}\left(\left|\tilde\beta_{0j}\right|\right): \tilde\beta_{0j} \ne 0\right\}$ and $b_n = \max\left\{p''_{\lambda_{nj}}\left(\left|\tilde\beta_{0j}\right|\right): \tilde\beta_{0j} \ne 0\right\}$. Let the true value of $\rho$ be $\rho_0$; thus, $\theta_0 = \left(\rho_0, \tilde\beta_0\right)$. For ease of presentation, let $\tilde\beta_{10} = \rho$ and $\tilde\beta_{1j} = \tilde\beta_{1j}$, $j = 1, 2, \ldots, s$; then denote $\tilde\beta_1 = \left(\rho, \tilde\beta_{11}, \ldots, \tilde\beta_{1s}\right)^T$ and $\tilde\beta_{01} = \left(\rho_0, \tilde\beta_{01}, \ldots, \tilde\beta_{0s}\right)^T$.
We now establish the asymptotic and oracle properties of the proposed penalized estimator. Before proving them, we need the following assumptions.
Assumption 1. 
$\Sigma = E\left(ZZ^T\right)$ is positive definite and $E\|Z\|^3 < \infty$.
Assumption 2. 
The matrix $I_n - \rho W$ is nonsingular with $|\rho| < 1$.
Assumption 3. 
The row and column sums of the matrices $W_n$ and $\left(I - \rho W_n\right)^{-1}$ are bounded uniformly in absolute value.
Assumption 4. 
For the matrix $G_n = W\left(I - \rho W\right)^{-1}$, there exists a constant $\tilde\lambda_c$ such that $\tilde\lambda_c I_n - G_n G_n^T$ is positive semidefinite for all $n$.
Assumption 5. 
$1/\min_{s+1 \le j \le p}\lambda_j = o_p(1)$. Additionally, with probability 1,
$\liminf_{n\to\infty}\,\liminf_{t\to 0^+}\,\min_{s+1\le j\le p}\frac{p'_{\lambda_j}(t)}{\lambda_j} > 0.$
Assumption 6. 
$\sqrt{n}\,a_n = o_p(1)$ and $b_n = o_p(1)$.
Assumption 7. 
$\gamma_n - \gamma_0 = o_p(1)$ for some $\gamma_0 > 0$.
Assumption 8. 
There exist constants $C_1$ and $C_2$ such that, when $\theta_1, \theta_2 > C_1\lambda_j$, $\left|p''_{\lambda_j}(\theta_1) - p''_{\lambda_j}(\theta_2)\right| \le C_2\left|\theta_1 - \theta_2\right|$ for $j = 0, 1, \ldots, p$.
For our proposed estimator, the following theorems give its consistency and "oracle" property.
Theorem 1. 
If Assumptions 1–8 hold, then there is a local maximizer $\hat\theta$ such that $\left\|\hat\theta - \theta_0\right\| = O_p\left(n^{-1/2} + a_n\right)$.
Theorem 2. 
(Oracle Property). Suppose that Assumptions 1–8 hold and that $I\left(\tilde\beta_0, \gamma_0\right)$ is negative definite. If $\gamma_n - \gamma_0 = o_p(1)$ for some $\gamma_0 > 0$, then $\hat\theta = \left(\hat\rho, \hat{\tilde\beta}_1^T, \hat{\tilde\beta}_2^T\right)^T$ must satisfy:
(1) 
sparsity, that is, $\hat{\tilde\beta}_{n2} = 0$ with probability tending to 1;
(2) 
asymptotic normality:
$\sqrt{n}\left(I_1\left(\tilde\beta_{01},\gamma_0\right) + \Sigma_1\right)\left\{\hat{\tilde\beta}_{n1} - \tilde\beta_{01} + \left(I_1\left(\tilde\beta_{01},\gamma_0\right) + \Sigma_1\right)^{-1}\Delta\right\} \rightarrow N\left(0, \Sigma_2\right),$
where $\hat{\tilde\beta}_{n1} = \left(\hat\rho, \hat{\tilde\beta}_{11}, \ldots, \hat{\tilde\beta}_{1s}\right)^T$ and $\tilde\beta_{01} = \left(\rho_0, \tilde\beta_{01}, \ldots, \tilde\beta_{0s}\right)^T$,
$\Sigma_1 = \mathrm{diag}\left\{p''_{\lambda_1}\left(\tilde\beta_{01}\right), \ldots, p''_{\lambda_s}\left(\tilde\beta_{0s}\right)\right\}, \qquad \Sigma_2 = \mathrm{cov}\left\{\exp\left(-r^2/\gamma_0\right)\frac{2r}{\gamma_0}Z_{i1}\right\},$
$\Delta = \left(p'_{\lambda_1}\left(\left|\tilde\beta_{01}\right|\right)\mathrm{sign}\left(\tilde\beta_{01}\right), \ldots, p'_{\lambda_s}\left(\left|\tilde\beta_{0s}\right|\right)\mathrm{sign}\left(\tilde\beta_{0s}\right)\right)^T,$
$I_1\left(\tilde\beta_{01},\gamma_0\right) = \frac{2}{\gamma_0}E\left[\exp\left(-r^2/\gamma_0\right)\left(\frac{2r^2}{\gamma_0} - 1\right)\right]\times E\left(Z_{i1}Z_{i1}^T\right).$
The detailed proofs of Theorem 1 and Theorem 2 are shown in the Appendix A and Appendix B.

2.4. The Selection of Parameter γ

Parameter γ can control the robustness and efficiency of the robust variable selection method. Wang et al., (2013) [7] proposed a parameter selection method based on normal regression. In this paper, we extend the selection method of parameter γ to the spatial Durbin model. The specific process is as follows:
Step 1. Initialize $\hat\rho = \rho^{(0)}$ and $\hat{\tilde\beta} = \tilde\beta^{(0)}$. Set $\rho^{(0)} = \frac{1}{2}$ and take $\tilde\beta^{(0)}$ to be a robust estimator. Rewrite the model $Y = \rho WY + X\beta + WX\delta + \epsilon$ as $Y^* = X^*\tilde\beta^* + \epsilon$, where $Y^* = Y - \rho WY$, $X^* = [X,\, WX]$, and $\tilde\beta^* = \left(\beta^T, \delta^T\right)^T$.
Step 2. Find the pseudo-outlier set of the sample: Let $D_n = \left\{\left(X_1^*, Y_1^*\right), \ldots, \left(X_n^*, Y_n^*\right)\right\}$. Calculate $r_i(\hat{\tilde\beta}^*) = Y_i^* - X_i^*\hat{\tilde\beta}^*$, $i = 1, \ldots, n$, and $S_n = 1.4826 \times \mathrm{median}_i\left|r_i(\hat{\tilde\beta}^*) - \mathrm{median}_j\, r_j(\hat{\tilde\beta}^*)\right|$. The pseudo-outlier set is then $D_m = \left\{\left(X_i, Y_i\right): \left|r_i(\hat{\tilde\beta}^*)\right| \ge 2.5 S_n\right\}$; set $m = \#\left\{1 \le i \le n: \left|r_i(\hat{\tilde\beta}^*)\right| \ge 2.5 S_n\right\}$ and $D_{n-m} = D_n - D_m$.
Step 3. Select the tuning parameter $\gamma_n$: construct $\hat V(\gamma) = \left\{\hat I\left(\hat{\tilde\beta}^*\right)\right\}^{-1}\tilde\Sigma_2\left\{\hat I\left(\hat{\tilde\beta}^*\right)\right\}^{-1}$, in which
$\hat I\left(\hat{\tilde\beta}^*\right) = \frac{2}{\gamma}\left\{\frac{1}{n}\sum_{i=1}^{n}\exp\left(-r_i^2\left(\hat{\tilde\beta}^*\right)/\gamma\right)\left(\frac{2 r_i^2\left(\hat{\tilde\beta}^*\right)}{\gamma} - 1\right)\right\}\cdot\frac{1}{n}\sum_{i=1}^{n}X_iX_i^T,$
$\tilde\Sigma_2 = \mathrm{Cov}\left\{\exp\left(-r_1^2\left(\hat{\tilde\beta}^*\right)/\gamma\right)\frac{2 r_1\left(\hat{\tilde\beta}^*\right)}{\gamma}X_1, \ldots, \exp\left(-r_n^2\left(\hat{\tilde\beta}^*\right)/\gamma\right)\frac{2 r_n\left(\hat{\tilde\beta}^*\right)}{\gamma}X_n\right\}.$
Next, let $\gamma_n$ be the minimizer of $\det\left(\hat V(\gamma)\right)$ over the set $G = \{\gamma: \zeta(\gamma) \in (0,1]\}$, where $\zeta(\cdot)$ has the same definition as in Wang et al. (2013) [7].
Step 4. Update $\hat\rho$ and $\hat{\tilde\beta}$ as the optimal solution of $\min_{\tilde\beta\in\mathbb{R}^{2p},\,\rho\in[0,1]}\frac{1}{n}\sum_{i=1}^{n}\phi_\gamma\left(Y_i - \rho\tilde Y_i - \tilde X_i\tilde\beta\right)$, where $\tilde Y = WY$, $\tilde X = [X,\, WX]$, and $\tilde\beta = \left(\beta^T, \delta^T\right)^T$. Go to Step 2 until convergence.
In the above process, the initial step requires an initial value $\tilde\beta^{(0)}$. In practice, the LAD estimate is usually used as $\tilde\beta^{(0)}$.
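A rough sketch of Steps 2 and 3 is given below. It screens pseudo-outliers with the robust scale S_n and then picks γ by minimizing det(V̂(γ)) over a user-supplied grid; replacing the set G defined through ζ(·) by a plain grid is our simplification, and select_gamma is a hypothetical name.

```python
import numpy as np

def select_gamma(Xs, Ys, beta_star, grid):
    r = Ys - Xs @ beta_star                                      # residuals r_i
    Sn = 1.4826 * np.median(np.abs(r - np.median(r)))            # robust scale estimate
    outliers = np.abs(r) >= 2.5 * Sn                             # pseudo-outlier set D_m
    best_gamma, best_det = None, np.inf
    for g in grid:                                               # stand-in for the set G
        w = np.exp(-r**2 / g)
        I_hat = (2.0 / g) * np.mean(w * (2.0 * r**2 / g - 1.0)) * (Xs.T @ Xs) / len(r)
        psi = (w * 2.0 * r / g)[:, None] * Xs                    # score contributions
        Sigma2 = np.cov(psi, rowvar=False)
        I_inv = np.linalg.inv(I_hat)
        d = np.linalg.det(I_inv @ Sigma2 @ I_inv)                # det(V_hat(gamma))
        if d < best_det:
            best_gamma, best_det = g, d
    return best_gamma, outliers
```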

2.5. The Selection of Parameter λ and η j

We set $\lambda_i = \lambda\cdot\eta_i$, in which λ and $\eta_i$ are from model (7). Usually, researchers use cross-validation, AIC, or BIC criteria to select $\lambda_i$. In this paper, considering the computational complexity and the consistency of variable selection, we adopt the method of Wang, Li, and Tsai (2007) [16] and choose the regularization parameters by minimizing a BIC-type objective function. The BIC-type objective function is as follows:
$\sum_{i=1}^{n}\left[1 - \exp\left(-\left(Y_i - \rho\tilde Y_i - \tilde X_i\tilde\beta\right)^2/\gamma\right)\right] + n\sum_{j=1}^{2p}\lambda_j\left|\tilde\beta_j\right| - \sum_{j=1}^{2p}\log\left(0.5\,n\lambda_j\right)\log(n).$
The selection method for the parameter γ was given above. We set $\lambda_i = \log(n)/(n|\theta_i|)$. In practice, let $\theta_i = \hat\theta_i$, where $\hat\theta_i$ is the exponential squared loss estimator without the penalty term. Note that this choice satisfies the conditions $\hat\lambda_i \to 0$ for $i \le d_0$ and $\hat\lambda_i \to \infty$ for $i > d_0$, with $d_0$ the number of nonzero values in $\theta_0$. Therefore, the final estimator can ensure the consistency of variable selection.
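A minimal sketch of evaluating this BIC-type criterion for one candidate fit is shown below; bic_sdm is our name, and the formula follows the display above.

```python
import numpy as np

def bic_sdm(beta_tilde, rho, Y, X_tilde, W, gamma, lam_vec):
    """BIC-type objective: loss + n*sum(lambda_j |beta_j|) - sum(log(0.5 n lambda_j)) * log(n)."""
    resid = Y - rho * (W @ Y) - X_tilde @ beta_tilde
    loss = np.sum(1.0 - np.exp(-resid**2 / gamma))
    n = len(Y)
    return (loss + n * np.sum(lam_vec * np.abs(beta_tilde))
            - np.sum(np.log(0.5 * n * lam_vec)) * np.log(n))
```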

3. Algorithm for Model Solving

In this section, we focus on designing algorithms to solve model (7). This optimization problem has two blocks of variables, $\tilde\beta\in\mathbb{R}^{2p}$ and $\rho\in[0,1]$, so the block coordinate descent (BCD) algorithm is a natural choice. However, the subproblem in $\tilde\beta$ is nonconvex and nondifferentiable, and the convergence of the BCD algorithm is difficult to guarantee. In this case, we use a DC decomposition and the CCCP algorithm to deal with it. Finally, for the penalty term in the optimization model, we use the ISTA algorithm. The details are given below.

3.1. Block Coordinate Descent Algorithm Frame

We present the framework of the block coordinate descent algorithm in Algorithm 1.    
Algorithm 1: Block coordinate descent algorithm
1. Set initial values $\tilde\beta^0 \in \mathbb{R}^{2p}$ and $\rho^0 \in (0,1)$;
2. repeat { For $k = 0, 1, 2, \ldots$ }
3.      Solve the subproblem in ρ with initial point $\rho^k$:
$\rho^{k+1} \leftarrow \arg\min_{\rho\in[0,1]} L\left(\tilde\beta^k, \rho\right)$
4.      Solve the subproblem in $\tilde\beta$ with initial value $\tilde\beta^k$,
$\tilde\beta^{k+1} \leftarrow \arg\min_{\tilde\beta\in\mathbb{R}^{2p}} L\left(\tilde\beta, \rho^{k+1}\right)$
      to obtain a solution $\tilde\beta^{k+1}$ satisfying $L\left(\tilde\beta^k, \rho^{k+1}\right) - L\left(\tilde\beta^{k+1}, \rho^{k+1}\right) \ge 0$, where $\tilde\beta^{k+1}$ is a stationary point of $L\left(\tilde\beta, \rho^{k+1}\right)$.
5. until convergence.
    Next, we need to solve subproblems (8) and (9).
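The skeleton below sketches Algorithm 1 in Python; solve_rho and solve_beta are placeholders for the solvers of subproblems (8) and (9) developed in the next two subsections.

```python
import numpy as np

def bcd(beta0, rho0, solve_rho, solve_beta, max_iter=100, tol=1e-6):
    """Block coordinate descent: alternately update rho (subproblem (8)) and beta_tilde (subproblem (9))."""
    beta, rho = beta0, rho0
    for _ in range(max_iter):
        rho_new = solve_rho(beta)             # subproblem (8), initialized at rho^k
        beta_new = solve_beta(rho_new, beta)  # subproblem (9), initialized at beta^k
        converged = max(abs(rho_new - rho), np.max(np.abs(beta_new - beta))) < tol
        beta, rho = beta_new, rho_new
        if converged:
            break
    return beta, rho
```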

3.2. Solving the Subproblem (8)

Subproblem (8) minimizes a univariate function over the interval [0, 1], so it can be solved using a golden section algorithm with parabolic interpolation. For more information about the algorithm, see Forsythe et al. (1977) [17]; it is not repeated in this article.
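In practice, any bounded one-dimensional minimizer can be used here. The sketch below, which is our illustration rather than the authors' implementation, relies on SciPy's bounded scalar minimizer (a Brent-type method combining golden section search with parabolic interpolation) over ρ ∈ [0, 1].

```python
from scipy.optimize import minimize_scalar

def solve_rho(objective, bounds=(0.0, 1.0)):
    """objective: callable rho -> L(beta_tilde_fixed, rho); returns the minimizing rho."""
    result = minimize_scalar(objective, bounds=bounds, method="bounded")
    return result.x
```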

3.3. Solving the Subproblem (9)

For subproblem (9), we observe that the penalty term of the optimization model is a convex function and that the loss function $\phi_\gamma$ can be decomposed into the difference of two convex functions, that is, it is a DC function. So, subproblem (9) is a DC program, and we can construct a corresponding algorithm to solve it.
We first perform a DC decomposition of the loss function $\phi_\gamma(t) = 1 - \exp\left(-t^2/\gamma\right)$. Suppose there are two convex functions $F(t)$ and $G(t)$ such that $F(t) - G(t) = \phi_\gamma(t)$. For $F(t) = G(t) + \phi_\gamma(t)$ to be convex, we need $F''(t) = \phi''_\gamma(t) + G''(t) \ge 0$ for all $t\in\mathbb{R}$. We may take $G''(t) = \frac{2}{\gamma}\cdot\frac{2}{\gamma}t^2$, so that $G(t) = \frac{1}{3\gamma^2}t^4$ and $F(t) = G(t) + \phi_\gamma(t) = 1 - \exp\left(-t^2/\gamma\right) + \frac{1}{3\gamma^2}t^4$. It can be verified that both $F(t)$ and $G(t)$ are convex functions.
The DC decomposition of ϕ γ ( t ) is as follows:
$F(t) = 1 - \exp\left(-t^2/\gamma\right) + \frac{1}{3\gamma^2}t^4,$
$G(t) = \frac{1}{3\gamma^2}t^4,$
$\phi_\gamma(t) = F(t) - G(t).$
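The convexity of this split can be checked numerically; the snippet below (ours) compares second differences of F and G on a grid for γ = 1.

```python
import numpy as np

def F(t, gamma):
    return 1.0 - np.exp(-t**2 / gamma) + t**4 / (3.0 * gamma**2)

def G(t, gamma):
    return t**4 / (3.0 * gamma**2)

gamma = 1.0
t = np.linspace(-10.0, 10.0, 20001)
h = t[1] - t[0]
for f in (F, G):
    second_diff = np.diff(f(t, gamma), 2) / h**2   # finite-difference second derivative
    assert second_diff.min() > -1e-6                # both curvatures stay nonnegative
```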
We can use the CCCP algorithm to solve the problem after DC decomposition. Next, define the following two functions:
$J_{\mathrm{vex}}(\tilde\beta) = \frac{1}{n}\sum_{i=1}^{n}F\left(Y_i - \rho^{k+1}\langle w_i, Y\rangle - \tilde X_i\tilde\beta\right) + \lambda\sum_{j=1}^{2p}P\left(\tilde\beta_j\right),$
$J_{\mathrm{cav}}(\tilde\beta) = -\frac{1}{n}\sum_{i=1}^{n}G\left(Y_i - \rho^{k+1}\langle w_i, Y\rangle - \tilde X_i\tilde\beta\right).$
Here $w_i$ is the $i$th row of the weight matrix $W$, and $\sum_{j=1}^{2p}P(\tilde\beta_j)$ is a convex penalty with respect to $\tilde\beta$. Then, $J_{\mathrm{vex}}(\cdot)$ and $J_{\mathrm{cav}}(\cdot)$ are a convex function and a concave function, respectively. So, the suboptimization problem (9) can be rewritten as
$\min_{\tilde\beta\in\mathbb{R}^{2p}} L\left(\tilde\beta, \rho^{k+1}\right) = J_{\mathrm{vex}}(\tilde\beta) + J_{\mathrm{cav}}(\tilde\beta).$
At this point, it can be seen that the optimization problem (15) can be solved by the CCCP (Concave–Convex Procedure) algorithm. The CCCP algorithm framework is shown below (Algorithm 2):
Algorithm 2: The Concave–Convex Procedure (CCCP)
1. Initialize β ˜ 0 . Set k = 0 .
2. repeat
3.
$\tilde\beta^{k+1} = \arg\min_{\tilde\beta}\; J_{\mathrm{vex}}(\tilde\beta) + \nabla J_{\mathrm{cav}}\left(\tilde\beta^k\right)\cdot\tilde\beta$
4. until convergence of $\tilde\beta^k$.
It is easy to see that the optimization problem (16) is a convex optimization problem. The CCCP algorithm minimizes problem (15) by iteratively solving a series of convex problems (16). Therefore, the method used to solve subproblem (16) directly affects the iterative efficiency of the CCCP algorithm.
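A compact sketch of the CCCP outer loop is given below; grad_J_cav and solve_convex stand for the gradient of the concave part and the inner solver for the convex problem (16), both of which are assumed to be supplied by the caller.

```python
import numpy as np

def cccp(beta0, grad_J_cav, solve_convex, max_iter=50, tol=1e-6):
    """Iterate beta^{k+1} = argmin J_vex(beta) + <grad J_cav(beta^k), beta>."""
    beta = beta0
    for _ in range(max_iter):
        g = grad_J_cav(beta)          # linearize the concave part at beta^k
        beta_new = solve_convex(g)    # solve the convex subproblem (16)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```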
Observe subproblem (16): $\nabla J_{\mathrm{cav}}(\tilde\beta^k)\cdot\tilde\beta$ is a linear function of $\tilde\beta$, and $J_{\mathrm{vex}}(\tilde\beta)$ contains the convex function $\frac{1}{n}\sum_{i=1}^{n}F\left(Y_i - \rho^{k+1}\langle w_i, Y\rangle - \tilde X_i\tilde\beta\right)$ and the penalty term $\lambda\sum_{j=1}^{2p}P(\tilde\beta_j)$ in $\tilde\beta$. Let
$\psi(\tilde\beta) = \frac{1}{n}\sum_{i=1}^{n}F\left(Y_i - \rho^{k+1}\langle w_i, Y\rangle - \tilde X_i\tilde\beta\right) + \nabla J_{\mathrm{cav}}\left(\tilde\beta^k\right)\cdot\tilde\beta,$
where $\psi(\tilde\beta)$ is a convex function of $\tilde\beta$. So, subproblem (16) can be represented as
$\min_{\tilde\beta\in\mathbb{R}^{2p}}\;\psi(\tilde\beta) + \lambda\sum_{i=1}^{2p}P\left(\tilde\beta_i\right).$
Optimization problem (17) is composed of a convex function and an adaptive lasso penalty term, and we can use the ISTA algorithm to solve such problems.
For any $L > 0$, ISTA approximates the objective $F(\tilde\beta) = \psi(\tilde\beta) + \lambda\sum_{i=1}^{2p}\eta_i|\tilde\beta_i|$ at $\tilde\beta = \xi$ as
$Q_L(\tilde\beta, \xi) = \psi(\xi) + \left\langle\tilde\beta - \xi, \nabla\psi(\xi)\right\rangle + \frac{L}{2}\left\|\tilde\beta - \xi\right\|^2 + \lambda\sum_{i=1}^{2p}\eta_i|\tilde\beta_i|.$
This function has the following minimizer:
$\Theta_L(\xi) = \arg\min_{\tilde\beta\in\mathbb{R}^{2p}} Q_L(\tilde\beta, \xi) = \arg\min_{\tilde\beta\in\mathbb{R}^{2p}}\left\{\lambda\sum_{i=1}^{2p}\eta_i|\tilde\beta_i| + \frac{L}{2}\left\|\tilde\beta - \left(\xi - \frac{1}{L}\nabla\psi(\xi)\right)\right\|^2\right\} = S_{\lambda\eta/L}\left(\xi - \frac{1}{L}\nabla\psi(\xi)\right).$
Here $\eta = (\eta_1, \ldots, \eta_{2p})\in\mathbb{R}^{2p}$, and for $\nu = \lambda\eta/L\in\mathbb{R}_+^{2p}$, $S_\nu: \mathbb{R}^{2p}\to\mathbb{R}^{2p}$ is the vector-form soft-thresholding operator $S_\nu(\tilde\beta) = \bar{\tilde\beta}$, $\bar{\tilde\beta}_i = \left(|\tilde\beta_i| - \nu_i\right)_+\mathrm{sgn}\left(\tilde\beta_i\right)$, $i = 1, \ldots, 2p$.
Thus, the ISTA iteration for problem (17) can be simply expressed as $\tilde\beta^k = \Theta_L\left(\tilde\beta^{k-1}\right)$.
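The weighted soft-thresholding operator and one ISTA step can be written as follows; grad_psi denotes the gradient of ψ in (17), and the function names are ours.

```python
import numpy as np

def soft_threshold(z, nu):
    """Componentwise S_nu(z)_i = (|z_i| - nu_i)_+ * sign(z_i)."""
    return np.sign(z) * np.maximum(np.abs(z) - nu, 0.0)

def ista_step(beta, grad_psi, L, lam, eta):
    """One proximal-gradient step Theta_L(beta) with step size 1/L and weights nu = lam*eta/L."""
    return soft_threshold(beta - grad_psi(beta) / L, lam * eta / L)
```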
In this article, we use the FISTA algorithm, which converges faster than ISTA (Beck and Teboulle, 2009) [18]. The FISTA algorithm with a backtracking step is given below (Algorithm 3):
Algorithm 3: FISTA with Backtracking Step for solving (17)
Require: $A$, $\xi$, $w$, $\lambda > 0$. Ensure: solution $\tilde\beta$
1: Step 0. Select $L_0 > 0$, $\eta > 1$, $\tilde\beta^0\in\mathbb{R}^{2p}$. Let $\xi^1 = \tilde\beta^0$, $t_1 = 1$.
2: Step $k$ ($k \ge 1$).
3: Determine the smallest non-negative integer $i_k$ that makes $\bar L = \eta^{i_k}L_{k-1}$ satisfy
4:
$F\left(\Theta_{\bar L}\left(\xi^k\right)\right) \le Q_{\bar L}\left(\Theta_{\bar L}\left(\xi^k\right), \xi^k\right).$
5: Let $L_k = \eta^{i_k}L_{k-1}$ and, according to (19), calculate:
6: $\tilde\beta^k = \Theta_{L_k}\left(\xi^k\right)$
7: $t_{k+1} = \frac{1}{2}\left(1 + \sqrt{1 + 4t_k^2}\right)$
8: $\xi^{k+1} = \tilde\beta^k + \frac{t_k - 1}{t_{k+1}}\left(\tilde\beta^k - \tilde\beta^{k-1}\right)$
9: Output β ˜ : = β ˜ k .
So far, we have completed the solution of subproblem (9).
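For completeness, the following is a self-contained sketch (ours) of Algorithm 3: FISTA with a backtracking search on the step-size constant L, applied to the composite objective ψ plus the weighted l1 penalty.

```python
import numpy as np

def soft_threshold(z, nu):
    return np.sign(z) * np.maximum(np.abs(z) - nu, 0.0)

def fista(psi, grad_psi, beta0, lam, eta, L0=1.0, eta_bt=2.0, max_iter=200):
    """psi/grad_psi: smooth part of (17) and its gradient; eta: adaptive-lasso weights."""
    def full_obj(b):                                # full objective: psi + weighted l1
        return psi(b) + np.sum(lam * eta * np.abs(b))
    def quad_model(b, xi, L):                       # quadratic model Q_L(b, xi)
        d = b - xi
        return psi(xi) + d @ grad_psi(xi) + 0.5 * L * (d @ d) + np.sum(lam * eta * np.abs(b))
    beta_prev, xi, t, L = beta0.copy(), beta0.copy(), 1.0, L0
    for _ in range(max_iter):
        while True:                                 # backtracking: increase L until (19) holds
            beta = soft_threshold(xi - grad_psi(xi) / L, lam * eta / L)
            if full_obj(beta) <= quad_model(beta, xi, L):
                break
            L *= eta_bt
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        xi = beta + (t - 1.0) / t_next * (beta - beta_prev)
        beta_prev, t = beta, t_next
    return beta_prev
```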

4. Numerical Examples

We designed five numerical experiments to verify the performance and accuracy of the variable selection methods under different conditions, for example, when there are abnormal values in the dependent variable Y or too many insignificant covariates.
Data are generated from model (1). We let the covariate matrix X be an $n\times(q+3)$ matrix whose rows follow a $(q+3)$-dimensional normal distribution with mean zero and covariance matrix $(\sigma_{ij})$, where $\sigma_{ij} = 0.5^{|i-j|}$. This means that the number of samples is n, the number of significant covariates is 3, and the number of insignificant covariates is q. In the following experiments, we set $n\in\{200, 360, 500\}$ and $q\in\{5, 20, 40, 60\}$. For the spatial autoregressive coefficient ρ, we set $\rho\in\{0.2, 0.5, 0.8\}$.
We define the spatial weight matrix as a k-diagonal (band) matrix, i.e., a matrix whose main diagonal and the $k-1$ diagonals around it are equal to 1 and whose other elements are 0. In the numerical experiments, we set $k = 7$.
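For illustration, a k-diagonal (band) weight matrix of this kind can be built as follows; the row normalization at the end is our assumption, consistent with the normalization mentioned in Section 2.1.

```python
import numpy as np

def band_weight_matrix(n, k=7, row_normalize=True):
    """Ones on the main diagonal and the (k-1)/2 diagonals on each side, zeros elsewhere."""
    half = (k - 1) // 2
    idx = np.arange(n)
    W = (np.abs(idx[:, None] - idx[None, :]) <= half).astype(float)
    if row_normalize:
        W = W / W.sum(axis=1, keepdims=True)
    return W
```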
The regression coefficient β is set to $\beta = (\beta_1, \beta_2, \beta_3, 0_q)$, where $(\beta_1, \beta_2, \beta_3) = (3, 2, 1.5)$. The regression coefficient vector of exogenous variables δ is set to $\delta = (\delta_1, \delta_2, \delta_3, 0_q)$, where $(\delta_1, \delta_2, \delta_3) = (1.5, 1.2, 1)$ and $0_q$ is a zero vector of dimension q; this means that β and δ each contain q zero elements. The dependent variable Y is generated by model (2).
For the error term, let $\varepsilon\sim N(0, \sigma^2 I_n)$, where $\sigma^2$ follows a uniform distribution on the interval $[\sigma_1 - 0.1, \sigma_1 + 0.1]$ with $\sigma_1\in\{1, 2\}$. Of course, in practice, the observation noise does not completely conform to the Gaussian distribution, and there may be abnormal values in the response; abnormal values in the response are discussed in Section 4.3.
To demonstrate the merits of this model, we also use the square loss and LAD loss for comparison with our exponential squared loss. To ensure the accuracy of the experiments, we repeat each experiment 100 times. The results reported below are the medians of the MSE over the 100 repeated experiments, which we denote by MedSE.

4.1. Nonregular Estimation of Normal Data

In this section, we conduct experiments with q = 5, Gaussian noise, and no penalty term in the parameter estimation model. The results are shown in Table 1, where Square, Exp, and LAD represent the square loss, exponential squared loss, and LAD loss, respectively. (1) Exp, Square, and LAD all produce estimates of β and δ that are close to the true values (the means of the estimated values of $\beta_1$, $\beta_2$, and $\beta_3$ are 3.0, 2.0, and 1.6, and the means of the estimated values of $\delta_1$, $\delta_2$, and $\delta_3$ are 2.0, 1.5, and 1.0). By comparison, the estimates obtained by the square loss model are the best. (2) For MedSE, the square loss model also performs best. (3) All three loss functions give accurate estimates of the spatial autoregressive coefficient ρ.

4.2. Nonregular Estimation for High-Dimensional Data

In this subsection, we set $q\in\{20, 40, 60\}$ and examine the parameter estimation results of the models on normal data with a large sample dimension. The results are shown in Table 2. It can be found that the estimation of β, δ, and ρ by every model is far less effective than when q = 5, and the MedSE results are also unsatisfactory. Given the insufficient number of samples relative to the dimension, such results are to be expected.

4.3. Nonregular Estimation of Data with Outliers in Dependent Variable y

In this subsection, we make the error term ε obey the mixed Gaussian distribution $(1 - \xi_1)\cdot N(0, 1) + \xi_1\cdot N(10, 6^2)$, where $\xi_1\in\{0.01, 0.05\}$. In this case, the observed y contains many outliers. Table 3 reports the estimated coefficients of β and δ when the observations of y have outliers. (1) For MedSE, unlike the results in Table 1, where y has no outliers, the exponential squared loss performs best in almost all tests in Table 3. (2) By comparison, the estimates of β and δ obtained by the exponential squared loss model are the best. (3) For the estimation of ρ, the exponential squared loss is also the best. Therefore, we can conclude that when y has outliers, the SDM based on the exponential squared loss has good robustness.

4.4. Nonregular Estimation of Data with Noise in Spatial Weight Matrix

In this section, we simulate the presence of noise in the spatial weight matrix. We add a minor disturbance term ε to each nonzero element of the spatial weight matrix W, where $\varepsilon\sim(1 - \xi_2)\cdot N(0, 0.001) + \xi_2\cdot N(0, 1)$, $\xi_2\in\{0.01, 0.03, 0.05\}$, and all simulated data are generated with $\rho = 0.5$, $\sigma = 1$. The test results are shown in Table 4. Compared with normal data (Table 1), the MedSE values increase, and for each loss function the estimation of β, δ, and ρ also worsens. When the weight matrix has noise, the exponential squared loss and LAD loss perform well: compared with the square loss, they give more accurate parameter estimates and smaller MedSE values. However, it cannot be denied that the LAD loss performs better than the exponential squared loss in this setting.

4.5. Estimation with Adaptive-l1 Regularizer

In this section, we add the adaptive L1 (adaptive lasso) regularizer to the loss functions and conduct experiments. We record the average number of zero coefficients correctly identified by the model as "Correct" and the average number of nonzero coefficients incorrectly set to zero by the model as "Incorrect".
Table 5 shows the results of adaptive lasso regularization on normal data with q = 5. The results show that, in almost all tests, the SDM with exponential squared loss and adaptive lasso not only identifies more true zero coefficients ("Correct" for the exponential squared loss model is almost twice that of the square loss and LAD loss models) with nearly zero "Incorrect" counts, but also has the best MedSE and an accurate estimate of $\hat\rho$.
Table 6 shows the results of adaptive lasso regularization on normal data with $q\in\{20, 40, 60\}$. The results show that when there are too many insignificant covariates, the accuracy of the square loss model with adaptive lasso and the LAD loss model with adaptive lasso decreases significantly. However, the model with adaptive lasso and exponential squared loss is still accurate: it identifies more true "Correct" counts with nearly zero "Incorrect" counts, and it has the best MedSE and a precise estimate of $\hat\rho$.
Table 7 shows the results of estimation with adaptive L1 regularization when the observations of y contain outliers. The results show that, in almost all tests, the exponential squared loss model with adaptive L1 identifies more true zero coefficients ("Correct") and, in most cases, has a lower MedSE. Compared with the model without a regularization term (Table 3), the model with adaptive L1 performs better. In the tests, the exponential squared loss model with adaptive L1 identified at least 8 zero coefficients and, in most cases, all 10 zero coefficients. For MedSE, the exponential squared loss model with adaptive L1 has the smallest MedSE in all cases, except that when n = 500 and 2q = 10, the MedSE in some cases is slightly larger than that of the LAD loss model with adaptive L1. This shows that the SDM using exponential squared loss and adaptive lasso has excellent variable selection ability and strong robustness when the observations of Y contain outliers.
Table 8 shows the results of adaptive lasso regularization for data with q = 5, ρ = 0.5, and noise in the spatial weight matrix. For all tests, the exponential squared loss with adaptive L1 identifies more zero coefficients ("Correct") than the other models. Compared with the results of the model without a regularization term (Table 4), the model with adaptive L1 performs better. For MedSE, when n = 200 and 2q = 10, the exponential squared loss with adaptive L1 is the best; when n = 500 and 2q = 10, the LAD loss with adaptive L1 is the best; and when n = 360 and 2q = 10, the LAD loss with adaptive L1 and the exponential squared loss with adaptive L1 differ little. However, since the exponential squared loss with adaptive L1 identifies more zero coefficients, we believe it is better than the LAD loss with adaptive L1. The results show that when the spatial weight matrix has estimation errors, the SDM with exponential squared loss and adaptive lasso has excellent variable selection ability and robustness.

5. Application of Practical Examples

In this part, we apply the model to actual data to verify the accuracy and efficiency of variable selection and parameter estimation.
We selected a dataset with 211 observations. The dataset describes house sales in the Baltimore area in 1978 and contains home prices and other relevant features. The original data were made available by Robin Dubin [19], Weatherhead School of Management, Case Western Reserve University, Cleveland, OH. The characteristics of the data are described in Table 9. We mainly study the relationship between price and several other variables. We let the dependent variable be log(PRICE), and the independent variables are NROOM, DWELL, NBATH, PATIO, FIREPL, AC, BMENT, NSTOR, GAR, AGE, CITCOU, LOTSZ, and SQFT.
We construct the spatial weight matrix W from the geographic location relationship. The geographic location is determined by the features X and Y. The expression for $w_{ij}$ is as follows:
$w_{ij} = \frac{1}{\left(X_i - X_j\right)^2 + \left(Y_i - Y_j\right)^2}.$
In addition, we normalize the spatial weights matrix.
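A short sketch of this construction (ours) is shown below; it applies the formula above to the coordinate features and then row-normalizes, with zeros placed on the diagonal as a conventional choice.

```python
import numpy as np

def inverse_distance_weights(coords):
    """coords: (n, 2) array of the X and Y location features."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist_sq = np.sum(diff**2, axis=-1)              # (X_i - X_j)^2 + (Y_i - Y_j)^2
    with np.errstate(divide="ignore"):
        W = 1.0 / dist_sq
    np.fill_diagonal(W, 0.0)                        # no self-neighbours (our convention)
    return W / W.sum(axis=1, keepdims=True)         # row-normalize as in the text
```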
Table 10 shows the variable selection results of the SDM for the square loss, exponential squared loss, and LAD loss, each with the adaptive lasso and with no penalty. To make the variable selection results more intuitive, we designed Table 11. In Table 11, if a model finds that the independent variable has a positive effect on the dependent variable, we mark it as "+"; if a model finds that the independent variable is negatively correlated with the dependent variable, we mark it as "−"; and if a model considers the independent variable not to affect the dependent variable (the absolute value of the parameter estimate is less than 0.001), we do not label it. Additionally, we let the total number of "+" features be count "+", the total number of "−" features be count "−", and the total number of independent variables judged to be related to the dependent variable be count. We find that, with or without regularization, the exponential squared loss attains the lowest BIC index. As seen from Table 10 and Table 11, our variable selection method has a smaller BIC index than the other variable selection methods and selects fewer independent variables, making the model more accurate and more straightforward. This fully illustrates the excellence of the variable selection method proposed in this paper.
Next, we analyze the regression results. For the variable NROOM, all six models consider it positively correlated with the house price, so the more rooms, the higher the price. For the variable DWELL, the EXP+adaptive-l1, Square+adaptive-l1, and LAD+adaptive-l1 models do not consider it to affect the house price, while EXP+null, Square+null, and LAD+null find a certain positive correlation with the house price: these three models suggest that a detached unit has a higher price. For the variable NBATH, all models find a positive correlation with the house price; therefore, the more bathrooms, the higher the price. For the variables PATIO and FIREPL, the models with a regularization term consider them independent of the house price; the models without a regularization term consider them related, but the regression coefficients are very small and their signs differ across models, so we believe these two characteristics have little impact on the house price. For the variable AC, the models without a regularization term consider it positively related to the house price; that is, houses with air conditioning are more expensive than those without. For the variable BMENT, all models except EXP+adaptive-l1 consider it positively related to the house price; that is, houses with basements tend to have higher prices. For the variable CITCOU, the nonregularized models consider it positively correlated with the house price; in Baltimore, houses in the city will be more expensive. For the spatial autocorrelation coefficient, the six models' estimates are all close to 0.5, so a rise in a house's price will lead to an increase in the surrounding house prices. Additionally, NROOM_W, BMENT_W, and CITCOU_W have negative regression coefficients under all six models. Therefore, the spatial regression coefficients of NROOM, BMENT, and CITCOU are negative; as a result, houses with many rooms, houses with basements, and houses in urban areas have a negative impact on the prices of the houses around them. This is also reasonable: if a house itself is well configured, people will naturally expect more from the houses around it.

6. Conclusions

This paper constructs a robust method for SDM variable selection based on the adaptive lasso and the exponential squared loss. We established the "oracle" property of the proposed estimators. For the nondifferentiable and nonconvex problems arising when the model is solved, we design a BCD algorithm, a DC decomposition, and a CCCP algorithm to solve them. Numerical simulations show that our method has good robustness and accuracy when there is noise in the observed data. Additionally, when the spatial weight matrix estimation is inaccurate, our method also has some robustness. In variable selection, our method is significantly better than the square loss and LAD loss, and almost all zero coefficients can be identified in the numerical simulations. Taking the housing price dataset of the Baltimore region in 1978 as an example, the excellence and accuracy of the variable selection method for the SDM proposed in this paper are verified. Our analysis highlights the difference between our robust variable selection approach and other penalized regression methods and demonstrates the importance of developing robust variable selection methods.

Author Contributions

Conceptualization, Y.S. and Z.L.; methodology, Z.L.; software, Z.L.; validation, Y.S.; formal analysis, Z.L.; investigation, Y.C.; resources, Z.L.; writing-original draft preparation, Z.L.; writing-review and editing, Z.L., Y.S. and Y.C.; supervision, Y.S.; project administration, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key Research and Development Program of China (2021YFA1000102).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1

Let $\xi_n = n^{-1/2} + a_n$ and set $\|u\| = C$, where $u$ is a $d$-dimensional vector and $C$ is a large enough constant. Similar to Fan and Li (2001), we first show that $\hat{\tilde\beta} - \tilde\beta_0 = O_p(\xi_n)$. It suffices to show that, for any given $\epsilon > 0$, there is a large constant $C$ such that, for large $n$,
$P\left\{\sup_{\|u\| = C}\ell_n\left(\theta_0 + \xi_n u\right) < \ell_n\left(\theta_0\right)\right\} \ge 1 - \epsilon.$
Define $Z = \left((I - \rho W)^{-1}\tilde X\right)^T$ and $\epsilon^* = (I - \rho W)^{-1}\varepsilon$; then we can represent model (1) as
$Y = (I - \rho W)^{-1}\tilde X\tilde\beta + (I - \rho W)^{-1}\varepsilon = Z^T\tilde\beta + \varepsilon^*.$
For the optimization model (7),
$\min_{\tilde\beta\in\mathbb{R}^{2p},\,\rho\in[0,1]} L(\tilde\beta, \rho),$
we know that this is equivalent to
$\max_{\tilde\beta\in\mathbb{R}^{2p},\,\rho\in[0,1]} \ell_n(\theta),$
which can be expressed as
$\ell_n(\theta) = \sum_{i=1}^{n}\exp\left(-\left(Y_i - Z_i\tilde\beta\right)^2/\gamma_n\right) - n\sum_{j=1}^{2p}p_{\lambda_j}\left(\left|\tilde\beta_j\right|\right).$
Let $D_n(\theta,\gamma) = \sum_{i=1}^{n}\exp\left(-\left(Y_i - Z_i\tilde\beta\right)^2/\gamma\right)\frac{2\left(Y_i - Z_i\tilde\beta\right)}{\gamma}Z_i$. Since $p_{\lambda_j}(0) = 0$ for $j = 1, 2, \ldots, p$, we have
$\ell_n\left(\theta_0 + \xi_n u\right) - \ell_n\left(\theta_0\right) = \left[\sum_{i=1}^{n}\exp\left(-\frac{\left(Y_i - Z_i\left(\tilde\beta_0 + \xi_n u\right)\right)^2}{\gamma_n}\right) - \sum_{i=1}^{n}\exp\left(-\frac{\left(Y_i - Z_i\tilde\beta_0\right)^2}{\gamma_n}\right)\right] - n\sum_{j=1}^{2p}\left[p_{\lambda_j}\left(\left|\tilde\beta_{j0} + \xi_n u_j\right|\right) - p_{\lambda_j}\left(\left|\tilde\beta_{j0}\right|\right)\right] \le \left[\sum_{i=1}^{n}\exp\left(-\frac{\left(Y_i - Z_i\left(\tilde\beta_0 + \xi_n u\right)\right)^2}{\gamma_n}\right) - \sum_{i=1}^{n}\exp\left(-\frac{\left(Y_i - Z_i\tilde\beta_0\right)^2}{\gamma_n}\right)\right] - n\sum_{j=1}^{s}\left[p_{\lambda_j}\left(\left|\tilde\beta_{j0} + \xi_n u_j\right|\right) - p_{\lambda_j}\left(\left|\tilde\beta_{j0}\right|\right)\right] = S_n(u) + K_n(u).$
Note that
$S_n(u) = \sum_{i=1}^{n}\exp\left(-\frac{\left(Y_i - Z_i\left(\tilde\beta_0 + \xi_n u\right)\right)^2}{\gamma_n}\right) - \sum_{i=1}^{n}\exp\left(-\frac{\left(Y_i - Z_i\tilde\beta_0\right)^2}{\gamma_n}\right) = \xi_n\left[\sum_{i=1}^{n}\exp\left(-\frac{\left(Y_i - Z_i\tilde\beta_0\right)^2}{\gamma_n}\right)\frac{2\left(Y_i - Z_i\tilde\beta_0\right)}{\gamma_n}Z_i\right]^T u - \frac{1}{2}u^T\left[\frac{2}{\gamma_n}\int ZZ^T e^{-\left(Y - Z\tilde\beta_0\right)^2/\gamma_n}\left(\frac{2\left(Y - Z\tilde\beta_0\right)^2}{\gamma_n} - 1\right)dF(Z,y)\right]u\, n\xi_n^2\left(1 + o_p(1)\right) = \xi_n D_n\left(\tilde\beta_0, \gamma_n\right)^T u - \frac{1}{2}u^T I\left(\tilde\beta_0, \gamma_n\right)u\, n\xi_n^2\left(1 + o_p(1)\right).$
Additionally,
$K_n(u) = -n\sum_{j=0}^{s}\left[p_{\lambda_j}\left(\left|\tilde\beta_{j0} + \xi_n u_j\right|\right) - p_{\lambda_j}\left(\left|\tilde\beta_{j0}\right|\right)\right] = -n\xi_n\sum_{j=0}^{s}p'_{\lambda_j}\left(\left|\tilde\beta_{j0}\right|\right)\mathrm{sign}\left(\tilde\beta_{j0}\right)u_j - n\xi_n^2\sum_{j=0}^{s}p''_{\lambda_j}\left(\left|\tilde\beta_{j0}\right|\right)u_j^2\{1 + o(1)\},$
and hence
$|K_n(u)| \le a_n n\xi_n\sum_{j=0}^{s}|u_j| + b_n n\xi_n^2\sum_{j=0}^{s}u_j^2\{1 + o(1)\} \le \sqrt{s}\,a_n n\xi_n\|u\| + 2 b_n n\xi_n^2\|u\|^2.$
Since $\gamma_n - \gamma_0 = o_p(1)$, by Taylor's expansion, we have
$\ell_n\left(\theta_0 + \xi_n u\right) - \ell_n\left(\theta_0\right) \le \xi_n D_n\left(\theta_0, \gamma_n\right)^T u - \frac{1}{2}u^T I\left(\theta_0, \gamma_n\right)u\, n\xi_n^2\left(1 + o_p(1)\right) + \sqrt{s}\,a_n n\xi_n\|u\| + 2 b_n n\xi_n^2\|u\|^2.$
Note that $n^{-1/2}D_n\left(\theta_0, \gamma_0\right) = O_P(1)$, so the first term in the last expression of (A.7) is of order $O_p\left(n^{1/2}\xi_n\right) = O_p\left(n\xi_n^2\right)$. By choosing a sufficiently large $C$, the second term dominates the first term uniformly in $\|u\| = C$. Since $b_n = o_p(1)$, the third term is also dominated by the second term of (A.7). Therefore, (A.1) holds by choosing a sufficiently large $C$.

Appendix B. Proof of Theorem 2

Appendix B.1. Proof of Theorem 2(i)

Here, we show the proof of the first part of Theorem 2. For this, we need only prove that, as $n\to\infty$, for any $\tilde\beta_1$ satisfying $\tilde\beta_1 - \tilde\beta_{01} = O_p\left(n^{-1/2}\right)$, and for some small $\epsilon_n = Cn^{-1/2}$ and $j = s+1, \ldots, p$, we have
$\frac{\partial\ell_n(\tilde\beta)}{\partial\tilde\beta_j}\begin{cases} < 0, & \text{for } 0 < \tilde\beta_j < \epsilon_n, \\ > 0, & \text{for } -\epsilon_n < \tilde\beta_j < 0. \end{cases}$
First, let us make
$Q_n(\tilde\beta, \gamma) = \sum_{i=1}^{n}\exp\left(-\left(Y_i - Z_i^T\tilde\beta\right)^2/\gamma\right).$
Then,
$\frac{\partial\ell_n(\tilde\beta)}{\partial\tilde\beta_j} = \frac{\partial Q_n\left(\tilde\beta, \gamma_n\right)}{\partial\tilde\beta_j} - n\,p'_{\lambda_j}\left(\left|\tilde\beta_j\right|\right)\mathrm{sign}\left(\tilde\beta_j\right).$
By Taylor expansion, we can obtain
$\frac{\partial\ell_n(\tilde\beta)}{\partial\tilde\beta_j} = \frac{\partial Q_n\left(\tilde\beta_0, \gamma_n\right)}{\partial\tilde\beta_j} + \sum_{l=1}^{p}\frac{\partial^2 Q_n\left(\tilde\beta_0, \gamma_n\right)}{\partial\tilde\beta_j\partial\tilde\beta_l}\left(\tilde\beta_l - \tilde\beta_{l0}\right) + \sum_{l=1}^{p}\sum_{k=1}^{p}\frac{\partial^3 Q_n\left(\tilde\beta^*, \gamma_n\right)}{\partial\tilde\beta_j\partial\tilde\beta_l\partial\tilde\beta_k}\left(\tilde\beta_l - \tilde\beta_{l0}\right)\left(\tilde\beta_k - \tilde\beta_{k0}\right) - n\,p'_{\lambda_j}\left(\left|\tilde\beta_j\right|\right)\mathrm{sign}\left(\tilde\beta_j\right) = R_{11} + R_{12} + R_{13} - n\,p'_{\lambda_j}\left(\left|\tilde\beta_j\right|\right)\mathrm{sign}\left(\tilde\beta_j\right),$
where β ˜ * lies between β ˜ and β ˜ 0 . Moreover, because
$n^{-1}\frac{\partial^2 Q_n\left(\tilde\beta_0, \gamma_0\right)}{\partial\tilde\beta_j\partial\tilde\beta_l} = E\left\{\frac{\partial^2 Q_n\left(\tilde\beta_0\right)}{\partial\tilde\beta_j\partial\tilde\beta_l}\right\} + o_p(1),$
$n^{-1}\frac{\partial Q_n\left(\tilde\beta_0, \gamma_0\right)}{\partial\tilde\beta_j} = O_p\left(n^{-1/2}\right).$
So $R_{11} = O_p(\sqrt{n})$, $R_{12} = O_p(\sqrt{n})$, and $R_{13} = O_p(\sqrt{n})$. Additionally, because $b_n = o_p(1)$ and $\sqrt{n}\,a_n = o_p(1)$, we can take $\tilde\beta - \tilde\beta_0 = O_p\left(n^{-1/2}\right)$.
Since $1/\min_{s+1\le j\le d}\lambda_j = o_p(1)$ and $\liminf_{n\to\infty}\liminf_{t\to 0^+}\min_{s+1\le j\le d} p'_{\lambda_j}(|t|)/\lambda_j > 0$ with probability 1, the sign of the derivative is completely determined by that of $\tilde\beta_j$. This completes the proof of Theorem 2(i).

Appendix B.2. Proof of Theorem 2(ii)

Here, we show the proof of the second part of Theorem 2. For brevity, let $\tilde\beta_{10}^* = \rho$ and $\tilde\beta_{1j}^* = \tilde\beta_{1j}$, $j = 1, \ldots, s$; then denote $\tilde\beta_1^* = \left(\rho, \tilde\beta_{11}, \ldots, \tilde\beta_{1s}\right)^T$ and $\tilde\beta_0^* = \left(\rho_0, \tilde\beta_{10}, \ldots, \tilde\beta_{0s}\right)^T$. We know that $\hat\theta$ maximizes $\ell_n(\theta)$. We showed above that there exists a $\sqrt{n}$-consistent local maximizer $\hat{\tilde\beta}_1$ of $\ell_n\left(\left(\tilde\beta_1^T, 0^T\right)^T\right)$ satisfying
$\frac{\partial\ell_n\left(\left(\hat{\tilde\beta}_1, 0\right)\right)}{\partial\tilde\beta_j} = 0, \qquad \text{for } j = 1, \ldots, s.$
Since β ˜ ^ 1 is a consistent estimator, we have
$\frac{\partial Q_n\left(\left(\hat{\tilde\beta}_1, 0\right), \gamma_n\right)}{\partial\tilde\beta_j} - n\,p'_{\lambda_j}\left(\left|\hat{\tilde\beta}_j\right|\right)\mathrm{sign}\left(\hat{\tilde\beta}_j\right) = \frac{\partial Q_n\left(\tilde\beta_0, \gamma_n\right)}{\partial\tilde\beta_j} + \sum_{l=1}^{s}\left[\frac{\partial^2 Q_n\left(\tilde\beta_0, \gamma_n\right)}{\partial\tilde\beta_j\partial\tilde\beta_l} + o_p(1)\right]\left(\hat{\tilde\beta}_l - \tilde\beta_{0l}\right) - n\left[p'_{\lambda_j}\left(\left|\tilde\beta_{0j}\right|\right)\mathrm{sign}\left(\tilde\beta_{0j}\right) + \left(p''_{\lambda_j}\left(\left|\tilde\beta_{0j}\right|\right) + o_p(1)\right)\left(\hat{\tilde\beta}_j - \tilde\beta_{0j}\right)\right] = 0.$
The above equation can be rewritten as follows:
$\frac{\partial Q_n\left(\tilde\beta_0, \gamma_n\right)}{\partial\tilde\beta_j} = \sum_{l=1}^{s}\left[E\left\{\frac{\partial^2 Q_n\left(\tilde\beta_0, \gamma_n\right)}{\partial\tilde\beta_j\partial\tilde\beta_l}\right\} + o_p(1)\right]\left(\hat{\tilde\beta}_l - \tilde\beta_{0l}\right) + n\Delta + n\left(\Sigma_1 + O_p(1)\right)\left(\hat{\tilde\beta}_{n1} - \tilde\beta_{01}\right),$
$n I_1\left(\tilde\beta_{01}, \gamma_0\right)\left(\hat{\tilde\beta}_{n1} - \tilde\beta_{01}\right) + n\Delta + n\left(\Sigma_1 + O_p(1)\right)\left(\hat{\tilde\beta}_{n1} - \tilde\beta_{01}\right) = n\left(I_1\left(\tilde\beta_{01}, \gamma_0\right) + \Sigma_1\right)\left(\hat{\tilde\beta}_{n1} - \tilde\beta_{01}\right) + n\Delta = n\left(I_1\left(\tilde\beta_{01}, \gamma_0\right) + \Sigma_1\right)\left(\hat{\tilde\beta}_{n1} - \tilde\beta_{01}\right) + n\left(I_1\left(\tilde\beta_{01}, \gamma_0\right) + \Sigma_1\right)\left(I_1\left(\tilde\beta_{01}, \gamma_0\right) + \Sigma_1\right)^{-1}\Delta = n\left(I_1\left(\tilde\beta_{01}, \gamma_0\right) + \Sigma_1\right)\left[\left(\hat{\tilde\beta}_{n1} - \tilde\beta_{01}\right) + \left(I_1\left(\tilde\beta_{01}, \gamma_0\right) + \Sigma_1\right)^{-1}\Delta\right] = \frac{\partial Q_n\left(\tilde\beta_0, \gamma_n\right)}{\partial\tilde\beta_j} + o_p(1).$
Since $\sqrt{n}\left(\gamma_n - \gamma_0\right) = o_p(1)$, invoking Slutsky's lemma and the Lindeberg–Feller central limit theorem, we have
$\sqrt{n}\left(I_1\left(\tilde\beta_{01}, \gamma_0\right) + \Sigma_1\right)\left\{\hat{\tilde\beta}_{n1} - \tilde\beta_{01} + \left(I_1\left(\tilde\beta_{01}, \gamma_0\right) + \Sigma_1\right)^{-1}\Delta\right\} \rightarrow N\left(0, \Sigma_2\right),$
where $\hat{\tilde\beta}_{n1} = \left(\hat\rho, \hat{\tilde\beta}_{11}, \ldots, \hat{\tilde\beta}_{1s}\right)^T$ and $\tilde\beta_{01} = \left(\rho_0, \tilde\beta_{01}, \ldots, \tilde\beta_{0s}\right)^T$,
$\Sigma_1 = \mathrm{diag}\left\{p''_{\lambda_1}\left(\tilde\beta_{01}\right), \ldots, p''_{\lambda_s}\left(\tilde\beta_{0s}\right)\right\}, \qquad \Sigma_2 = \mathrm{cov}\left\{\exp\left(-r^2/\gamma_0\right)\frac{2r}{\gamma_0}Z_{i1}\right\},$
$\Delta = \left(p'_{\lambda_1}\left(\left|\tilde\beta_{01}\right|\right)\mathrm{sign}\left(\tilde\beta_{01}\right), \ldots, p'_{\lambda_s}\left(\left|\tilde\beta_{0s}\right|\right)\mathrm{sign}\left(\tilde\beta_{0s}\right)\right)^T,$
$I_1\left(\tilde\beta_{01}, \gamma_0\right) = \frac{2}{\gamma_0}E\left[\exp\left(-r^2/\gamma_0\right)\left(\frac{2r^2}{\gamma_0} - 1\right)\right]\times E\left(Z_{i1}Z_{i1}^T\right).$
Then, the proof of part (ii) is completed.

References

  1. Anselin, L. Spatial Econometrics: Methods and Models; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1988. [Google Scholar]
  2. Kelejian, H.H. A spatial J-test for model specification against a single or a set of non-nested alternatives. Lett. Spat. Resour. Sci. 2008, 1, 3–11. [Google Scholar] [CrossRef] [Green Version]
  3. Zhang, X.; Yu, J. Spatial weights matrix selection and model averaging for spatial autoregressive models. J. Econom. 2018, 203, 1–18. [Google Scholar] [CrossRef]
  4. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  5. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  6. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef] [Green Version]
  7. Wang, X.; Jiang, Y.; Huang, M.; Zhang, H. Robust variable selection with exponential squared loss. J. Am. Stat. Assoc. 2013, 108, 632–643. [Google Scholar] [CrossRef] [PubMed]
  8. Friedman, J.; Hastie, T.; Tibshirani, R. Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors). Ann. Stat. 2000, 28, 337–407. [Google Scholar] [CrossRef]
  9. Koenker, R.; Bassett, G. Regression quantiles. Econometrica 1978, 46, 33–50. [Google Scholar] [CrossRef]
  10. Zou, H.; Yuan, M. Composite quantile regression and the oracle model selection theory. Ann. Stat. 2008, 36, 1108–1126. [Google Scholar] [CrossRef]
  11. Beer, C.; Riedl, A. Modelling spatial externalities in panel data: The Spatial Durbin model revisited. Pap. Reg. Sci. 2012, 91, 299–318. [Google Scholar] [CrossRef]
  12. Mustaqim; Setiawan; Suhartono; Ulama, B.S.S. Efficient estimation of simultaneous equations of spatial Durbin panel data model. In Proceedings of the AIP Conference Proceedings; AIP Publishing LLC: Melville, NY, USA, 2018; Volume 2021, p. 060024. [Google Scholar]
  13. Zhu, Y.; Han, X.; Chen, Y. Bayesian estimation and model selection of threshold spatial Durbin model. Econom. Lett. 2020, 188, 108956. [Google Scholar] [CrossRef]
  14. Wei, L.; Zhang, C.; Su, J.J.; Yang, L. Panel threshold spatial Durbin models with individual fixed effects. Econom. Lett. 2021, 201, 109778. [Google Scholar] [CrossRef]
  15. Song, Y.; Liang, X.; Zhu, Y.; Lin, L. Robust variable selection with exponential squared loss for the spatial autoregressive model. Comput. Stat. Data Anal. 2021, 155, 107094. [Google Scholar] [CrossRef]
  16. Wang, H.; Li, G.; Tsai, C.L. Regression coefficient and autoregressive order shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2007, 69, 63–78. [Google Scholar] [CrossRef]
  17. Forsythe, G.E.; Moler, C.B.; Malcolm, M.A. Computer Methods for Mathematical Computations; Prentice Hall: Hoboken, NJ, USA, 1977. [Google Scholar]
  18. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef] [Green Version]
  19. Dubin, R.A. Spatial autocorrelation and neighborhood quality. Reg. Sci. Urban Econ. 1992, 22, 433–452. [Google Scholar] [CrossRef]
Table 1. Nonregular estimation of normal data (q = 5).
n = 200, 2q = 10 (Exp | Square | LAD)   n = 360, 2q = 10 (Exp | Square | LAD)   n = 500, 2q = 10 (Exp | Square | LAD)
ρ = 0.8 , σ = 1
β 1 3.09042.68663.2503 3.13352.84873.0801 2.80842.79472.9486
β 2 2.03031.94491.8899 1.95941.94982.0897 2.09492.13542.0207
β 3 1.64221.46891.8394 1.57251.54091.3781 1.61741.73941.6394
δ 1 1.2421.33821.3069 1.5041.61171.3924 1.46041.21561.3616
δ 2 1.51091.39631.3582 1.12451.1321.3174 1.17861.11391.111
δ 3 0.81551.07111.0625 1.11011.05750.9693 1.09031.00921.0871
ρ ^ 0.80010.80110.7999 0.79990.80060.7997 0.80020.79790.7981
MedSE0.59940.41580.4693 0.25180.28270.3432 0.2480.2340.3086
ρ = 0.5 , σ = 1
β 1 3.08543.03493.0617 3.12542.80393.0451 2.80583.18993.2542
β 2 2.00582.15321.927 1.95562.18232.2277 2.09861.99752.0256
β 3 1.67991.37881.6744 1.57021.42681.7227 1.62081.68131.303
δ 1 1.23381.2191.7939 1.48481.18141.3734 1.4581.41451.6612
δ 2 1.49431.42331.5202 1.13221.32661.3411 1.1860.98841.3373
δ 3 0.88491.07660.5961 1.10360.96140.8644 1.09660.96710.9961
ρ ^ 0.50210.49990.5 0.50030.50.5 0.49980.49990.4999
MedSE0.60070.3880.4564 0.28080.28290.3287 0.24520.22620.2939
ρ = 0.2 , σ = 1
β 1 3.00722.70082.7579 3.02832.95722.9077 2.6352.88363.0321
β 2 1.89031.70811.93 1.81522.0811.9718 1.86032.07941.4453
β 3 1.53861.42971.571 1.46461.27881.3697 1.45121.54861.5858
δ 1 0.86221.22410.9667 1.22971.21841.015 1.09631.1841.1689
δ 2 1.44271.05840.7845 0.82790.81041.2721 0.76840.8891.0247
δ 3 0.66090.57151.1202 1.02650.93510.4027 0.75510.88190.9224
ρ ^ 0.24170.24190.2519 0.22710.23650.2437 0.250.22160.2317
MedSE0.90370.81340.9757 0.54920.62860.7235 0.99210.52870.6407
ρ = 0.8 , σ = 2
β 1 3.07272.88823.0548 3.26342.77953.423 2.68083.19023.0531
β 2 2.11641.81412.0596 2.03871.67572.1636 2.10041.98171.9528
β 3 1.67471.39961.4515 1.74571.75971.7635 1.53951.35491.2795
δ 1 0.98071.57731.6723 1.41121.34931.5103 1.48311.60881.5378
δ 2 1.78140.86040.9949 1.09791.12571.0441 1.14391.00581.3651
δ 3 0.73010.93580.8807 1.07070.60321.0806 1.04231.01720.8286
ρ ^ 0.8010.80450.7943 0.79960.80420.7978 0.79890.79890.7897
MedSE1.20580.77310.9502 0.51310.54930.6914 0.55360.47190.5597
ρ = 0.5 , σ = 2
β 1 3.07623.19163.1159 3.23253.05282.8093 2.67313.01783.0432
β 2 2.08392.24081.5295 1.88261.94222.1974 2.12072.11751.9125
β 3 1.71691.3841.8689 1.60751.72061.5534 1.55651.3551.2292
δ 1 0.97061.36181.483 1.4371.52691.7395 1.48621.36111.1453
δ 2 1.78021.13371.2179 1.05091.08851.4445 1.18251.29471.3262
δ 3 0.80671.28631.0691 1.21730.86350.901 1.07770.94280.9601
ρ ^ 0.50440.50350.5 0.50070.49960.4986 0.49940.49730.4998
MedSE1.24590.80650.9201 0.60330.54340.6822 0.53870.46990.5622
ρ = 0.2 , σ = 2
β 1 2.98383.08113.0512 3.12532.9652.7197 2.53822.82192.7438
β 2 1.95251.83712.3438 1.72151.99632.2379 1.88931.8861.8351
β 3 1.54771.51981.2998 1.47411.64481.0015 1.40751.79821.7257
δ 1 0.55111.55011.1878 1.16621.18160.8059 1.16231.05451.1654
δ 2 1.66140.79111.9069 0.68630.98681.2406 0.80821.02111.3427
δ 3 0.57390.38870.1885 1.10210.82040.8706 0.78750.53430.8185
ρ ^ 0.26240.23410.2342 0.2350.23070.2313 0.24720.23490.238
MedSE1.51381.14171.5123 0.81430.8160.907 1.09210.74220.8705
Table 2. Nonregular estimation for high-dimensional data.
n = 200, 2q = 40 (Exp | Square | LAD)   n = 360, 2q = 80 (Exp | Square | LAD)   n = 500, 2q = 120 (Exp | Square | LAD)
ρ = 0.8 , σ = 1
β 1 2.99912.93552.8898 33.15793.273 2.10762.81973.0519
β 2 1.871.85342.148 2.24712.0971.7728 1.49592.13262.4031
β 3 1.66411.72861.3997 1.79331.47821.3566 1.26951.56881.1838
δ 1 1.54491.41231.2135 1.17430.97471.6914 0.84351.44141.6309
δ 2 1.11331.29981.4747 1.1931.0060.6772 1.55241.01021.3473
δ 3 1.1041.01650.8516 1.56470.69180.8134 0.98710.80720.3043
ρ ^ 0.78410.79760.7812 0.78140.76260.7698 0.77850.79210.7578
MedSE1.03891.19752.4913 3.90241.35193.216 3.57191.49833.4753
ρ = 0.5 , σ = 1
β 1 2.96743.04613.0203 2.89813.34332.8355 3.06143.05612.7106
β 2 1.95941.9562.2012 2.22221.87162.041 2.12712.00352.1597
β 3 1.70141.55281.4707 1.6981.54861.533 1.40041.83451.5845
δ 1 1.4011.41711.8907 1.45371.32441.4937 1.41511.65721.516
δ 2 1.31441.30971.1209 1.34711.05621.4014 1.25551.12350.9392
δ 3 1.11861.09070.9799 0.94390.9771.1517 1.30211.21481.0304
ρ ^ 0.49840.4990.5 0.50080.50040.5 0.49970.50070.5
MedSE0.72680.83611.0131 0.8460.87351.0286 1.1520.90111.1143
ρ = 0.2 , σ = 1
β 1 2.67742.73372.2269 2.72222.63722.5639 2.76792.6252.3669
β 2 1.87581.53912.0274 1.97261.44351.4924 1.75171.83281.7334
β 3 1.60731.16911.5513 1.55151.45171.9718 1.35881.41071.5919
δ 1 0.66480.1578−0.522 1.34220.59940.0136 0.47780.55410.5164
δ 2 1.34010.53490.2749 1.01150.13720.3757 −0.186−0.095−0.533
δ 3 0.85270.22991.1821 0.47950.6847−0.254 0.40570.9201−0.205
ρ ^ 0.27310.3680.424 0.3090.37210.4449 0.4290.42030.4601
MedSE1.65453.31944.2853 2.34813.25674.9894 5.03423.93224.9488
ρ = 0.8 , σ = 2
β 1 3.04122.92793.2539 3.00012.94253.3778 3.06612.91542.9285
β 2 1.81981.7311.3838 2.20042.0091.6173 2.09611.79231.9155
β 3 1.56951.67931.9783 1.8181.75792.1066 1.34091.82221.9951
δ 1 1.37621.81330.6013 1.00671.29251.0236 1.23581.34861.1922
δ 2 1.43441.00011.841 1.32861.32431.0053 1.06311.01361.148
δ 3 0.96481.25881.7202 1.47151.1231.021 1.6290.96060.9799
ρ ^ 0.77750.78860.7847 0.78080.79770.7196 0.7870.79410.7652
MedSE1.97471.91763.2383 4.25312.10493.9547 2.6412.1394.1654
ρ = 0.5 , σ = 2
β 1 2.99243.1273.3996 2.90712.95563.2707 3.12242.91922.7059
β 2 1.9371.73252.2317 2.16742.19391.9265 2.10342.07632.0959
β 3 1.63661.92841.3111 1.75161.53561.6939 1.3291.23351.5695
δ 1 1.18141.31551.5799 1.30221.34351.781 1.32291.24761.5013
δ 2 1.67921.43351.1793 1.50961.49140.8587 1.11171.37110.9883
δ 3 1.04170.98911.1182 0.86490.82261.0424 1.61050.96171.3937
ρ ^ 0.49820.50050.5 0.50130.50150.5 0.49920.49960.5
MedSE1.77611.69922.0438 1.94051.74392.1404 2.36911.82272.2753
ρ = 0.2 , σ = 2
β 1 2.71262.4633.0218 2.73652.67662.3112 2.8192.74982.8593
β 2 1.83891.77031.1025 1.91651.6251.8649 1.73351.76381.7436
β 3 1.55751.3021.5166 1.59311.18681.3592 1.2991.40371.2125
δ 1 0.50040.3751.4855 1.17630.9432−0.186 0.46550.4060.846
δ 2 1.64850.5337−0.864 1.08690.1892−0.044 −0.2270.1407−1.145
δ 3 0.81290.676−0.697 0.4247−0.6410.3875 0.6011−0.2890.6946
ρ ^ 0.27160.35980.5 0.3110.3550.4821 0.42610.41470.4469
MedSE2.19553.54155.0709 2.89973.86625.2612 5.35884.2565.583
Table 3. Nonregular estimation of data with outliers in dependent variable y.
n = 200, 2q = 10 (Exp | Square | LAD)   n = 360, 2q = 10 (Exp | Square | LAD)   n = 500, 2q = 10 (Exp | Square | LAD)
ρ = 0.8 , σ = 1 , ξ = 0.01
β 1 3.0533.3332.873 3.023.2372.754 2.8823.1022.827
β 2 2.2131.481.958 2.1252.0481.836 2.1261.9481.754
β 3 1.5771.5792.046 1.571.8571.908 1.6041.6771.757
δ 1 1.3411.9830.811 1.4440.9661.7 1.4641.1651.344
δ 2 1.3110.9591.736 1.1271.1131.199 0.9620.9061.722
δ 3 0.8761.2240.644 1.1821.2840.752 1.1190.3820.954
ρ ^ 0.8010.7910.798 0.80.80.786 0.7940.7560.799
MedSE0.6091.8661.009 0.3981.3920.755 0.4051.2680.623
ρ = 0.5 , σ = 1 , ξ = 0.01
β 1 3.0353.3713.049 3.0363.1233.179 2.8743.0312.972
β 2 2.2172.192.259 2.0941.6721.981 2.141.6872.249
β 3 1.5611.8751.646 1.5552.0081.582 1.6991.4741.436
δ 1 1.1431.371.623 1.4481.4321.227 1.4921.4641.264
δ 2 1.3950.6211.193 1.1361.6420.954 1.0751.4721.079
δ 3 0.9290.9180.768 1.1190.7421.213 1.1861.1611.138
ρ ^ 0.50.4970.5 0.50.4990.499 0.5010.50.5
MedSE0.7381.3410.84 0.3471.0970.627 0.3410.9590.511
ρ = 0.2 , σ = 1 , ξ = 0.01
β 1 3.0322.8172.936 2.9613.0773.17 2.73.3332.752
β 2 2.0852.4971.757 1.9461.6461.802 1.8691.91.853
β 3 1.521.2241.654 1.4411.8171.329 1.5061.6141.479
δ 1 0.8821.3811.323 1.2160.8831.478 1.1511.6541.018
δ 2 1.6671.1170.84 0.8790.7850.688 0.6311.3231.034
δ 3 0.7230.491.022 0.9961.2070.72 0.8090.6181.096
ρ ^ 0.2130.2070.222 0.2250.230.223 0.2560.1880.232
MedSE1.0911.5091.065 0.5561.0760.807 1.0611.0520.625
ρ = 0.8 , σ = 1 , ξ = 0.05
β 1 2.8452.0643.28 2.9143.4933.116 2.9353.8573.369
β 2 2.2283.4731.405 2.1482.5011.688 2.1092.3361.564
β 3 1.851.0762.271 1.7030.181.86 1.6752.191.487
δ 1 1.412.9570.344 1.3610.5220.856 1.7911.1811.062
δ 2 0.8630.2582.166 1.05-0.32.015 0.8990.7822.052
δ 3 0.8693.3230.661 1.2552.5460.566 0.951.5681.107
ρ ^ 0.7960.7880.782 0.7990.7940.793 0.7880.770.789
MedSE0.9844.7782.45 0.7163.7071.366 0.8353.0771.048
ρ = 0.5 , σ = 1 , ξ = 0.05
β 1 2.8943.6362.978 2.83.6552.922 2.8823.0273.152
β 2 2.1691.2932.131 2.1772.572.54 2.1491.9712.333
β 3 1.7660.0531.419 1.5721.9191.607 1.7052.1841.673
δ 1 1.215-0.050.866 1.4361.791.238 1.7290.580.811
δ 2 1.2831.5932.262 1.020.4491.381 1.011.3811.316
δ 3 0.8330.4890.595 1.068-0.270.837 1.042-0.141.032
ρ ^ 0.4970.4730.5 0.4990.4940.501 0.4990.4990.5
MedSE0.6273.8781.536 0.5062.9890.996 0.7992.4670.803
ρ = 0.2 , σ = 1 , ξ = 0.05
β 1 3.1112.3733.24 2.5973.5162.797 2.7843.6172.885
β 2 2.2943.9552.314 2.1952.0321.871 2.0131.8442.128
β 3 1.7910.1541.293 1.3351.0971.344 1.5891.2961.467
δ 1 1.6872.8491.613 1.5422.571.161 1.4181.0351.627
δ 2 1.5361.5030.8 0.981.21.125 0.80.8721.222
δ 3 0.8051.7321.143 0.8840.3440.76 0.9341.1471.145
ρ ^ 0.1330.0340.183 0.1890.0880.226 0.2210.1740.189
MedSE0.9753.4611.326 1.0742.4280.822 0.812.0470.67
Table 4. Non-regularized estimation for data with noise in the spatial weight matrix.
n = 200, 2q = 10            n = 360, 2q = 10            n = 500, 2q = 10
Exp / Square / LAD          Exp / Square / LAD          Exp / Square / LAD
ρ = 0.5 , σ = 1 , ξ = 0.01
β 1 3.1253.1432.909 3.2862.6143.142 2.822.1382.895
β 2 1.6921.891.934 1.32.392.025 2.072.3671.826
β 3 1.9191.6331.597 1.680.7161.622 1.6512.7611.473
δ 1 1.1670.7371.612 0.9971.41.418 1.3650.861.452
δ 2 1.4221.2351.059 −0.280.5841.318 1.2470.4851.191
δ 3 0.8980.0381.076 2.0831.161.273 0.9780.7991.164
ρ ^ 0.5010.4920.496 0.5010.4770.5 0.50.4860.5
MedSE0.6362.5960.562 2.6572.6230.411 0.2752.5910.341
ρ = 0.5 , σ = 1 , ξ = 0.03
β 1 2.9411.7282.955 1.382.1933.15 2.832.2952.963
β 2 1.1432.5752.278 0.2982.3221.952 1.882.2922.002
β 3 1.7712.2991.386 1.1950.3831.267 1.8671.8091.619
δ 1 1.008−0.631.607 0.0190.2111.156 0.3962.3481.218
δ 2 0.6052.4751.326 0.2550.2830.961 0.9741.3170.994
δ 3 0.5790.1460.922 0.580.8260.921 0.8810.6991.136
ρ ^ 0.5030.4680.495 0.5030.4490.499 0.4940.450.498
MedSE1.5613.9250.819 3.6453.9220.547 1.2274.9720.454
ρ = 0.5 , σ = 1 , ξ = 0.05
β 1 3.021.9813.046 3.1832.0542.849 2.8932.5073.187
β 2 1.4790.8572.072 1.2590.6362.253 1.9110.8652.265
β 3 1.9780.7530.897 1.7211.6491.299 1.8272.0391.362
δ 1 0.8370.6451.349 0.7850.671.362 0.610.9340.962
δ 2 1.557−0.561.26 0.551−0.681.275 0.8131.3841.135
δ 3 −0.23−0.130.379 1.1050.5090.536 1.067−1.561.24
ρ ^ 0.5020.4310.489 0.5040.4310.493 0.4930.4590.491
MedSE1.9225.0791.207 2.1914.5880.805 1.0345.2320.69
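For context, the contaminated settings in Tables 3 and 4 can be mimicked in simulation. The sketch below is only an illustration under the assumption that ξ denotes the contamination fraction, i.e., a randomly chosen proportion ξ of the responses y is perturbed by noise with a much larger variance than the model error (contamination of the weight matrix W can be set up analogously); the exact mechanism used in the paper may differ.

```python
import numpy as np

def contaminate_response(y, xi, scale=10.0, rng=None):
    """Perturb a random fraction xi of the responses with large-variance noise.

    Illustrative only: the injected noise has a much larger standard
    deviation than the model error, producing outliers in y.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float).copy()
    n_out = int(np.floor(xi * y.size))
    idx = rng.choice(y.size, size=n_out, replace=False)
    y[idx] += rng.normal(0.0, scale, size=n_out)
    return y
```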
Table 5. Estimation with adaptive-l1 regularizer on normal data (q = 5).
n = 200, 2q = 10            n = 360, 2q = 10            n = 500, 2q = 10
Exp / Square / LAD          Exp / Square / LAD          Exp / Square / LAD
ρ = 0.8 , σ = 1
Correct105.235.78 105.535.61 105.615.64
Incorrect000 000 000
ρ ^ 0.80080.80350.8011 0.79990.80140.7982 0.79970.79950.801
MedSE0.37470.38870.4697 0.14680.28430.3259 0.13160.23740.2944
ρ = 0.5 , σ = 1
Correct105.275.61 105.475.74 105.695.75
Incorrect000 000 000
ρ ^ 0.50130.50080.502 0.50010.50030.5005 0.49990.49970.5005
MedSE0.35750.35140.4354 0.13420.27510.3161 0.12070.22170.2699
ρ = 0.2 , σ = 1
Correct105.425.52 9.985.425.36 95.35.46
Incorrect00.050.14 00.010.03 000
ρ ^ 0.23510.23750.2508 0.22570.2310.2426 0.2450.22650.2407
MedSE0.79050.86371.0335 0.47580.63280.7898 0.83720.55650.6443
ρ = 0.8 , σ = 2
Correct105.345.12 105.15.42 95.175.18
Incorrect000 600 000
ρ ^ 0.80170.79920.8087 0.50330.79880.8018 0.80020.80360.7986
MedSE0.82020.78260.9687 4.52190.55240.6503 0.27530.47290.5452
ρ = 0.5 , σ = 2
Correct105.345.21 105.45.27 95.275.23
Incorrect000 000 000
ρ ^ 0.50340.50140.4998 0.50050.50010.5001 0.49970.49880.4998
MedSE0.8130.750.9107 0.30390.55540.6583 0.27230.44260.5261
ρ = 0.2 , σ = 2
Correct85.175.11 95.515.29 95.315.27
Incorrect00.030.23 00.010.02 000
ρ ^ 0.26010.23820.2301 0.23590.22410.2535 0.24320.2460.246
MedSE1.39031.09421.3318 0.69050.75081.0301 0.85080.70310.8261
Table 6. Estimation with adaptive-l1 regularizer on high-dimensional normal data.
n = 200, 2q = 40            n = 360, 2q = 80            n = 500, 2q = 120
Exp / Square / LAD          Exp / Square / LAD          Exp / Square / LAD
ρ = 0.8 , σ = 1
Correct4021.3221.19 8042.4242.63 119.0165.1164.21
Incorrect00.020.04 000.03 00.020.07
ρ ^ 0.79910.80110.769 0.80.7880.775 0.79950.7730.773
MedSE0.18181.0911.746 0.15531.3481.969 0.26721.4781.992
ρ = 0.5 , σ = 1
Correct4021.6122.4 8043.5245.49 119.9966.7869.73
Incorrect000 000 000
ρ ^ 0.49840.50180.5 0.50030.50.5 0.50050.50.5
MedSE0.18260.84580.767 0.14890.8670.788 0.22890.9150.809
ρ = 0.2 , σ = 1
Correct39.9920.7421.06 73.9941.9842.53 109.9962.6963.91
Incorrect00.650.89 00.721.15 0.990.721.14
ρ ^ 0.22060.35540.375 0.24760.36440.437 0.34240.3710.431
MedSE0.40322.93963.381 0.82373.44793.853 2.69213.6913.975
ρ = 0.8 , σ = 2
Correct3820.3120.9 77.9841.0642.05 117.9962.5862.78
Incorrect00.020.05 00.010.01 000.16
ρ ^ 0.79620.79440.785 0.80020.79370.778 0.79820.7930.753
MedSE0.46851.77951.959 0.39632.01062.263 0.72392.1232.766
ρ = 0.5 , σ = 2
Correct38.0220.5121.13 76.9941.7643.31 118.0163.3765.18
Incorrect000 000 000
ρ ^ 0.49630.50090.5 0.50060.49870.5 0.50080.4990.5
MedSE0.44161.61281.464 0.39871.71931.522 0.66231.7761.571
ρ = 0.2 , σ = 2
Correct38.9920.7320.87 7541.2242.04 11562.4363.03
Incorrect00.811.22 00.591.05 00.81.1
ρ ^ 0.2190.35830.461 0.23830.35230.434 0.29620.4130.462
MedSE0.57593.6563.804 0.75933.60173.887 1.87114.3924.039
Table 7. Estimation with adaptive-l1 regularization when the observations of y have outliers.
n = 200, 2q = 10            n = 360, 2q = 10            n = 500, 2q = 10
Exp / Square / LAD          Exp / Square / LAD          Exp / Square / LAD
ρ = 0.8 , σ = 1 , ξ = 0.01
Correct105.35.47 105.055.45 9.855.5
Incorrect00.230.01 00.070 00.090
ρ ^ 0.80160.77590.781 0.79970.79780.7949 0.79570.76060.7991
MedSE0.40011.89770.5016 0.22291.62610.3415 0.25471.42350.287
ρ = 0.5 , σ = 1 , ξ = 0.01
Correct105.235.53 105.115.68 105.035.74
Incorrect00.10 00.020 00.010
ρ ^ 0.49990.50030.5024 0.49970.49670.4987 0.50040.49810.4999
MedSE0.53841.40930.4443 0.15541.22470.3282 0.19621.15210.2811
ρ = 0.2 , σ = 1 , ξ = 0.01
Correct9.95.155.43 105.115.44 9.15.335.62
Incorrect00.170.14 00.020.03 00.010
ρ ^ 0.20960.25690.2543 0.22360.20910.2362 0.25130.23440.2358
MedSE0.72341.57111.0847 0.42011.14470.8001 0.8570.95370.6616
ρ = 0.8 , σ = 1 , ξ = 0.05
Correct95.560.7973 105.330.7993 8.25.230.7991
Incorrect0.20.735.34 00.55.43 00.425.73
ρ ^ 0.7970.78920 0.79980.79540 0.78570.78920
MedSE0.92654.66490.4994 0.39013.74790.3404 0.58732.94290.2881
ρ = 0.5 , σ = 1 , ξ = 0.05
Correct105.310.5 9.85.240.4999 8.25.130.5002
Incorrect0.10.455.34 00.235.87 00.185.76
ρ ^ 0.49690.49990 0.49910.49740 0.49940.49610
MedSE0.4753.80280.4602 0.23392.91760.332 0.44732.46820.2825
ρ = 0.2 , σ = 1 , ξ = 0.05
Correct105.150.2815 8.25.060.2357 9.15.040.2364
Incorrect00.255.34 00.095.47 00.035.34
ρ ^ 0.13660.16480.25 0.17620.11590.02 0.21520.15380
MedSE0.48583.18581.0803 0.28332.44970.7324 0.51342.10760.6375
Table 8. Estimation with adaptive-l1 regularization with a noisy spatial weight matrix W.
n = 200, 2q = 10            n = 360, 2q = 10            n = 500, 2q = 10
Exp / Square / LAD          Exp / Square / LAD          Exp / Square / LAD
ρ = 0.5 , σ = 1 , ξ = 0.01
Correct8.15.245.4 105.045.45 105.115.54
Incorrect00.340 00.410 00.240
ρ ^ 0.49910.49740.5 0.50050.4890.4989 0.50.48350.5001
MedSE1.19252.45030.54 0.18972.55520.3621 0.15572.25790.2916
ρ = 0.5 , σ = 1 , ξ = 0.03
Correct9.85.545.4 6.35.615.3 6.15.145.76
Incorrect00.870 1.10.840 1.90.720
ρ ^ 0.50120.46440.4996 0.49820.46650.4998 0.49740.4760.4976
MedSE0.70484.68580.6433 1.7243.52770.4495 3.06563.71780.3789
ρ = 0.5 , σ = 1 , ξ = 0.05
Correct9.965.25.43 7.025.375.38 6.025.295.77
Incorrect0.041.210 01.20 11.040
ρ ^ 0.50190.46530.4993 0.49930.47580.4953 0.49740.44910.495
MedSE0.8324.93960.806 1.38224.39620.5848 2.21254.09980.4734
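As a reading aid for Tables 5–8, the reported summaries can be reproduced from repeated fits. The sketch below assumes the usual definitions, namely that "Correct" is the average number of truly zero coefficients estimated as zero, "Incorrect" the average number of truly nonzero coefficients wrongly shrunk to zero, and MedSE the median squared estimation error over replications; these are our reading of the tables rather than definitions quoted from the text.

```python
import numpy as np

def selection_metrics(beta_hat_runs, beta_true, tol=1e-8):
    """Average 'Correct'/'Incorrect' counts and MedSE over replications.

    beta_hat_runs : (R, p) array, one estimated coefficient vector per run
    beta_true     : (p,) true coefficient vector
    """
    beta_hat_runs = np.asarray(beta_hat_runs, dtype=float)
    beta_true = np.asarray(beta_true, dtype=float)

    truly_zero = np.abs(beta_true) < tol              # coefficients that are really zero
    est_zero = np.abs(beta_hat_runs) < tol            # coefficients estimated as zero

    correct = est_zero[:, truly_zero].sum(axis=1)     # zeros correctly identified
    incorrect = est_zero[:, ~truly_zero].sum(axis=1)  # nonzeros wrongly set to zero

    sq_err = ((beta_hat_runs - beta_true) ** 2).sum(axis=1)
    return correct.mean(), incorrect.mean(), np.median(sq_err)
```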
Table 9. Variable description.
Variable / Description
STATION: ID variable
PRICE: sales price of house i in $1000 (MLS)
NROOM: the number of rooms
DWELL: 1 if detached unit, 0 otherwise
NBATH: the number of bathrooms
PATIO: 1 if patio, 0 otherwise
FIREPL: 1 if fireplace, 0 otherwise
AC: 1 if air conditioning, 0 otherwise
BMENT: 1 if basement, 0 otherwise
NSTOR: number of stories
GAR: number of car spaces in garage (0 = no garage)
AGE: age of dwelling in years
CITCOU: 1 if dwelling is in Baltimore County, 0 otherwise
LOTSZ: lot size in hundreds of square feet
SQFT: interior living space in hundreds of square feet
X: x coordinate on the Maryland grid
Y: y coordinate on the Maryland grid
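Each covariate in Table 9 also enters the model through its spatial lag, which is what the *_W rows in Tables 10 and 11 refer to. A minimal sketch of building the lagged covariates from a row-normalized spatial weight matrix is given below; the helper names and the weight normalization are illustrative assumptions, not the code used in the paper.

```python
import numpy as np

def row_normalize(W):
    """Row-normalize a nonnegative spatial weight matrix (zero diagonal)."""
    W = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(W, 0.0)
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0        # leave isolated observations untouched
    return W / row_sums

def durbin_design(X, W):
    """Column-stack the covariates and their spatial lags: [X, W X]."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, W @ X])

# The columns of W @ X correspond to the *_W variables in Table 10
# (NROOM_W, DWELL_W, ..., SQFT_W).
```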
Table 10. Variable selection on the real data.
EXP                         Square                      LAD
Adaptive-l1 / Null          Adaptive-l1 / Null          Adaptive-l1 / Null
NROOM0.496740020.20409727 0.010518810.1929546 0.00371230.2362159
DWELL−1.3922 × 10−170.45980677 0.000758310.4703206 0.000291620.5097926
NBATH0.0300635780.36030577 0.003854690.3514846 0.00125520.4254525
PATIO4.91478 × 10−180.01072777 0.000973570.017285 0.00014135−0.092123
FIREPL−9.5477 × 10−18−0.01059 0.00029720.0013726 0.00012271−0.077913
AC−1.2919 × 10−170.3021609 0.0010.311554 0.000202670.3138447
BMENT−2.2645 × 10−170.1187834 0.00441110.1235361 0.001164740.1317025
NSTOR−9.9947 × 10−170.47809045 0.003592860.503778 0.001290790.4164672
GAR−2.2383 × 10−17−0.1040092 0.00038606−0.099652 0.00039553−0.0844
AGE4.72988 × 10−170.01105389 0.033191360.0113282 0.020914040.0100436
CITCOU−6.0274 × 10−170.68202393 0.000935990.6868509 0.000254280.4701451
LOTSZ1.04997 × 10−170.0011463 0.0021950.0011849 0.004439124.606×10−5
SQFT−7.0764 × 10−17−0.0362982 0.0316769−0.037256 0.01075005−0.034887
NROOM_W−0.28507527−0.1092328 −0.012775−0.098775 −0.0059391−0.17384
DWELL_W1.61355 × 10−33−0.1051101 −0.0010685−0.102124 −0.0005411−0.107508
NBATH_W−7.3257 × 10−18−0.1575366 −0.0033218−0.159341 −0.0014154−0.095605
PATIO_W−8.7577 × 10−180.0642994 −8.43 × 10−50.0601125 1.1236 × 10−50.1341514
FIREPL_W−4.6005 × 10−34−0.0387668 0.00068287−0.036572 −2.587 × 10−5−0.042569
AC_W−3.3933 × 10−17−0.2098427 −0.001−0.223255 −0.0004486−0.187274
BMENT_W−0.02780421−0.111448 −0.0074127−0.117315 −0.0032221−0.1317
NSTOR_W−2.6009 × 10−32−0.1648658 −0.0047893−0.159493 −0.0023109−0.178134
GAR_W1.94949 × 10−170.15495116 0.0010.1566788 0.000171730.1186948
AGE_W5.87444 × 10−33−0.0069188 −0.0228056−0.007876 −0.0204497−0.001951
CITCOU_W−0.06178084−0.4116914 −0.0030172−0.426973 −0.001−0.240381
LOTSZ_W−2.2196 × 10−32−0.0005826 −0.0034412−0.000541 −0.0057558−0.000449
SQFT_W2.75575 × 10−170.01072435 −0.02071670.0098465 −0.01128530.0143746
ρ 0.4986137190.49970041 0.495712370.4997043 0.497805290.499992
MSE0.1219117270.11475312 0.137926060.1149259 0.144673430.114829
BIC−304.892336−317.66083 −278.85061−317.3434 −268.77299−317.5214
Table 11. Visual representation of variable selection on real data.
EXP                         Square                      LAD
Adaptive-l1 / Null          Adaptive-l1 / Null          Adaptive-l1 / Null
NROOM++ ++ ++
DWELL + + +
NBATH++ ++ ++
PATIO + +
FIREPL +
AC + + +
BMENT + ++ ++
NSTOR + ++ ++
GAR
AGE + ++ ++
CITCOU + + +
LOTSZ + ++ +
SQFT + +
NROOM_W
DWELL_W
NBATH_W
PATIO_W + + +
FIREPL_W
AC_W
BMENT_W
NSTOR_W
GAR_W + + +
AGE_W
CITCOU_W
LOTSZ_W
SQFT_W + + +
count "+": 2 13    7 14    7 11
count "−": 3 12    9 11    7 13
count (total): 5 25    16 25    14 24
BIC−304.892336−317.66083 −278.85061−317.3434 −268.77299−317.5214
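The BIC values above compare the adaptive-l1 fits against the unpenalized ("Null") fits. As a rough illustration, a Gaussian-likelihood BIC of the common form n log(RSS/n) + df log(n), with df the number of nonzero coefficients, could be computed as below; this generic form is an assumption on our part, and the criterion actually used in the paper may differ.

```python
import numpy as np

def gaussian_bic(residuals, n_nonzero):
    """Generic Gaussian-likelihood BIC: n * log(RSS / n) + df * log(n)."""
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    rss = float(residuals @ residuals)
    return n * np.log(rss / n) + n_nonzero * np.log(n)
```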