Article

High-Dimensional Variable Selection for Quantile Regression Based on Variational Bayesian Method

Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, Kunming 650091, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(10), 2232; https://doi.org/10.3390/math11102232
Submission received: 12 April 2023 / Revised: 1 May 2023 / Accepted: 8 May 2023 / Published: 10 May 2023
(This article belongs to the Special Issue Big Data Mining and Analytics with Applications)

Abstract

The quantile regression model is widely used to study relationships between variables in moderately sized data, because of its strong robustness and its more comprehensive description of the characteristics of the response variable. As data sizes and dimensions grow, high-dimensional quantile regression has been studied within the classical statistical framework, either from a computationally efficient frequentist perspective, which sacrifices uncertainty quantification, or with lower-efficiency Bayesian methods based on MCMC sampling. To overcome these problems, we propose high-dimensional quantile regression with a spike-and-slab lasso penalty based on variational Bayesian inference (VBSSLQR), which not only improves computational efficiency but also measures uncertainty through the variational distributions. Simulation studies and a real data analysis illustrate that the proposed VBSSLQR method is superior or comparable to other quantile and nonquantile regression methods (both Bayesian and non-Bayesian), while being more efficient than all of them.

1. Introduction

Quantile regression, as introduced by Koenker and Bassett (1978) [1], is an important statistical tool for inference about the relationship between quantiles of the response distribution and available covariates, and it offers a practically significant alternative to traditional mean regression, because it provides a more comprehensive description of the response distribution than the mean. Moreover, quantile regression can capture the heterogeneous impact of regressors on different parts of the distribution [2], has excellent computational properties [3], exhibits robustness to outliers, and has wide applicability [4]. For these reasons, quantile regression has attracted extensive attention in the literature. For example, see [5] for a Bayesian quantile regression with an asymmetric Laplace distribution used to specify the likelihood, [6] for a Bayesian nonparametric approach to inference for quantile regression, ref. [7] for a mechanism of Bayesian inference for quantile regression models, and [8] for model selection in quantile regression, among others.
Although there is a growing literature on quantile regression, to the best of our knowledge, few of the existing quantile regression models have focused on high-dimensional data, where the number of variables is large relative to the sample size. In practice, a large number of variables may be collected, and some of these will be insignificant and should be excluded from the final model. In the past two decades, there has been active methodological research on penalized methods for selecting significant variables in linear parametric models. For example, see [9] for ridge regression, ref. [10] for the least absolute shrinkage and selection operator (Lasso), ref. [11] for the smoothly clipped absolute deviation penalty (SCAD), ref. [12] for the elastic net penalty, and ref. [13] for adaptive lasso methods. These methods have been extended to quantile regression; for example, see [14] for an $L_1$-regularization method for quantile regression, and ref. [15] for variable selection in quantile regression using SCAD and adaptive-lasso penalties. Nevertheless, the aforementioned regularization methods are computationally complex and unstable. Additionally, they fail to account for prior information about the parameters, which can lead to unsatisfactory parametric estimation accuracy. In recent decades, Bayesian approaches to variable selection and parameter estimation have garnered significant attention. This is because they can substantially enhance the accuracy and efficiency of parametric estimation by imposing various priors on the model parameters, consistently select crucial variables, and provide more information for variable selection than penalization methods for highly nonconvex optimization problems. For example, see [16] for a Bayesian Lasso where the $L_1$ penalty is induced by a Laplace prior, [17] for a Bayesian form of the adaptive Lasso, ref. [18] for Bayesian Lasso quantile regression (BLQR), and [19] for Bayesian adaptive Lasso quantile regression (BALQR). The literature mentioned above implements the standard Gibbs sampler for posterior computation, which does not scale easily to high-dimensional data, where the number of variables is large compared with the sample size [20].
To address this issue, the Bayesian variable selection method with a spike-and-slab prior [20] has been favored by researchers, and it can be applied to high-dimensional data, albeit at the cost of a heavy computational burden. As a computationally efficient alternative to Markov chain Monte Carlo (MCMC) simulation, variational Bayes (VB) methods are gaining traction in machine learning and statistics for approximating posterior distributions in Bayesian inference. High-efficiency variational Bayesian spike-and-slab lasso (VBSSL) methods have been explored for certain high-dimensional models. Ray and Szabo (2022) [21] used a VBSSL method in a high-dimensional linear model, with the prior of the regression coefficients specified as a mixture of a Laplace distribution and a Dirac mass. Yi and Tang (2022) [22] used VBSSL technology in high-dimensional linear mixed models, with the priors of the parameters of interest specified as mixtures of two Laplace distributions. However, to the best of our knowledge, there has been little work on a VBSSL method for quantile regression. Xi et al. (2016) [23] considered Bayesian variable selection for nonparametric quantile regression with a small variable dimension, in which the spike-and-slab prior was chosen as a mixture of a point mass at zero and a normal distribution. In this paper, to reduce the computational burden and quantify the parametric uncertainty, we propose quantile regression with a spike-and-slab lasso penalty based on variational Bayes (VBSSLQR), in which the prior is a mixture of two Laplace distributions with a smaller or larger variance, respectively.
The main contributions of this paper are as follows: First, our proposed VBSSLQR method can perform variable selection for high-dimensional quantile regression at a relatively low computational cost, without the need for nonconvex optimization, while also avoiding the curse of dimensionality. Second, in contrast to mean regression, our proposed quantile approach offers a more systematic strategy for analyzing how covariates impact the various quantiles of the response distribution. Third, in ultra-high-dimensional regression, the mean regression errors are frequently presumed to be sub-Gaussian, which is not required in our setting.
The rest of the paper is organized as follows: In Section 2, for high-dimensional data, we propose an efficient quantile regression with a spike-and-slab lasso penalty based on variational Bayes (VBSSLQR). In Section 3, we randomly generate high-dimensional data with n = 200 and p = 500 (excluding intercept items), and perform 500 simulation experiments, to explore the performance of our algorithm and compare it with other quantile regression methods (Bayesian and non-Bayesian) and nonquantile regression methods. The results show that our method is superior to other approaches in the case of high-dimensional data. We applied VBSSLQR to a real dataset that contained information about crime in various cities in the United States, and compared it with other quantile regression methods. The results showed that our method also had a good performance and excellent efficiency with real data, and the relevant results are shown in Section 4. Some concluding remarks are given in Section 5. Technical details are presented in the Appendix A, Appendix B, Appendix C and Appendix D.

2. Models and Methods

2.1. Quantile Regression

Consider a dataset of n independent subjects. For the ith subject, let $y_i$ be the response and $x_i = (1, x_{i1}, \ldots, x_{ir})^\top$ be an $(r+1)\times 1$ predictor vector. A simple linear regression model is defined as follows:
$$y_i = x_i^\top \beta + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$
where $\beta = (\beta_0, \beta_1, \ldots, \beta_r)^\top$ is the regression coefficient vector, with $\beta_0$ the intercept term, and $\varepsilon_i$ is an error term with unknown distribution. It is usual to assume that the $\tau$th quantile of the random error is zero; that is, $Q_\tau(\varepsilon_i) = 0$ for $0 < \tau < 1$. Under this assumption, the $\tau$th quantile regression form of model (1) is specified as follows:
$$Q_{y_i}(\tau \mid x_i) = x_i^\top \beta,$$
where $Q_{y_i}(\tau \mid x_i)$ is the inverse cumulative distribution function of $y_i$ given $x_i$, evaluated at $\tau$. The estimate of the regression coefficient vector $\beta$ in Equation (2) is
$$\hat{\beta} = \operatorname*{argmin}_{\beta \in \mathbb{R}^{r+1}} \left[ \sum_{i:\, y_i \ge x_i^\top\beta} \tau\, | y_i - x_i^\top\beta | + \sum_{i:\, y_i < x_i^\top\beta} (1-\tau)\, | y_i - x_i^\top\beta | \right] = \operatorname*{argmin}_{\beta \in \mathbb{R}^{r+1}} \sum_{i=1}^{n} \rho_\tau\big( y_i - x_i^\top\beta \big),$$
where the loss function is $\rho_\tau(u) = u\,\big(\tau - I(u < 0)\big)$, with $I(\cdot)$ the indicator function.
In light of [5,24], minimizing Equation (3) is equivalent to maximizing the likelihood of n independent individuals, where the ith is distributed according to an asymmetric Laplace distribution (ALD) specified as
$$p(y_i \mid x_i, \beta, \sigma, \tau) = \frac{\tau(1-\tau)}{\sigma} \exp\left\{ -\rho_\tau\!\left( \frac{y_i - \mu_i}{\sigma} \right) \right\},$$
where the location parameter is $\mu_i = x_i^\top\beta$, the scale parameter is $\sigma > 0$, and the skewness parameter $\tau$ lies between 0 and 1; clearly, the ALD reduces to a Laplace distribution when $\tau = 0.5$ and $\mu_i = 0$. However, it is computationally infeasible to carry out statistical inference directly on Equation (4), which is nondifferentiable at the point $\mu_i$. Following [25], Equation (4) can be rewritten in the following hierarchical fashion:
$$y_i = x_i^\top \beta + k_1 z_i + \sqrt{k_2 \sigma z_i}\, \xi_i, \qquad z_i \mid \sigma \overset{\text{i.i.d.}}{\sim} \operatorname{Exp}\!\left(\tfrac{1}{\sigma}\right), \qquad \xi_i \overset{\text{i.i.d.}}{\sim} N(0,1), \qquad z_i \text{ independent of } \xi_i,$$
where $k_1 = \frac{1-2\tau}{\tau(1-\tau)}$, $k_2 = \frac{2}{\tau(1-\tau)}$, and $\operatorname{Exp}\!\left(\tfrac{1}{\sigma}\right)$ denotes the exponential distribution with mean $\sigma$, whose density is $p(z_i \mid \sigma) = \frac{1}{\sigma}\exp\!\left(-\frac{z_i}{\sigma}\right) I(z_i > 0)$. Equation (5) illustrates that an asymmetric Laplace distribution can also be represented as a mixture of exponential and standard normal distributions, which allows us to express the quantile regression model as a normal regression model in which the response has the following conditional distribution:
$$y_i \mid x_i, \beta, z_i, \sigma \overset{\text{ind}}{\sim} N\!\left( x_i^\top \beta + k_1 z_i,\; k_2 \sigma z_i \right).$$
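To make the location–scale mixture representation concrete, the following minimal numeric sketch (assuming Python with numpy; the sample size and seed are illustrative) simulates the error term $k_1 z_i + \sqrt{k_2 \sigma z_i}\,\xi_i$ from Equation (5) and checks that its $\tau$th quantile is approximately zero, as required by $Q_\tau(\varepsilon_i) = 0$:

```python
import numpy as np

# Minimal check of the normal-exponential mixture (5): the tau-th quantile of the
# simulated ALD error k1*z + sqrt(k2*sigma*z)*xi should be approximately zero.
rng = np.random.default_rng(1)
tau, sigma, n = 0.3, 1.0, 200_000
k1 = (1 - 2 * tau) / (tau * (1 - tau))
k2 = 2 / (tau * (1 - tau))

z = rng.exponential(scale=sigma, size=n)       # z_i | sigma ~ Exp with mean sigma
xi = rng.standard_normal(n)                    # xi_i ~ N(0, 1), independent of z_i
err = k1 * z + np.sqrt(k2 * sigma * z) * xi    # mixture representation of the ALD error

print(np.quantile(err, tau))                   # close to 0, i.e., Q_tau(err) ~ 0
```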
For the above-defined quantile regression model with high-dimensional covariate vector (r is large enough), it is of interest to estimate the parameter vector β and to identify the critical covariates. To this end, we considered Bayesian quantile regression based on spike-and-slab lasso, as follows:

2.2. Bayesian Quantile Regression Based on a Spike-and-Slab Lasso

As early as 2016, Xi et al. [23] applied a spike-and-slab prior to Bayesian quantile regression, but their proposed prior was a mixture of a point mass at zero and a normal distribution with large variance, and the estimate of the posterior density was obtained using a Gibbs sampler. To offer theoretical insights into a class of continuous spike-and-slab priors, Rockova (2018) [26] introduced a novel family of spike-and-slab priors, each a mixture of two density functions with spike or slab probability. In this paper, we adopt a spike-and-slab lasso prior that is a mixture of two Laplace distributions with large and small variance, respectively [26], which facilitates the variational Bayesian technique for approximating the posterior density of the parameters and improves the efficiency of the algorithm. In light of ref. [26], given the indicator $\gamma_j = 0$ or $1$, the prior of $\beta$ in the Bayesian quantile regression model (5) can be written as
$$\pi(\beta \mid \gamma) = \prod_{j=0}^{r} \pi(\beta_j \mid \gamma_j) = \prod_{j=0}^{r} \left[ \gamma_j \Psi_0(\beta_j \mid \lambda_0) + (1 - \gamma_j) \Psi_1(\beta_j \mid \lambda_1) \right],$$
where the Laplace densities are $\Psi_0(\beta_j \mid \lambda_0) = \frac{\lambda_0}{2}\exp(-\lambda_0 |\beta_j|)$ and $\Psi_1(\beta_j \mid \lambda_1) = \frac{\lambda_1}{2}\exp(-\lambda_1 |\beta_j|)$, with precision parameters $\lambda_0$ and $\lambda_1$ satisfying $\lambda_0 \gg \lambda_1$, and $\gamma = \{\gamma_j \mid j = 0, 1, \ldots, r\}$ is the set of indicator variables; the jth variable is active when $\gamma_j = 0$ and inactive otherwise. Similarly to [27], the Laplace distribution for the regression coefficient $\beta_j$ can be represented as a mixture of a normal distribution and an exponential distribution; specifically, the distribution of $\beta_j$ can be expressed in the following hierarchical structure:
$$\beta_j \mid h_{0j}^2, h_{1j}^2, \gamma_j \overset{\text{ind}}{\sim} \gamma_j\, N(0, h_{0j}^2) + (1-\gamma_j)\, N(0, h_{1j}^2), \qquad h_{0j}^2 \mid \lambda_0^2 \overset{\text{i.i.d.}}{\sim} \operatorname{Exp}\!\left(\tfrac{\lambda_0^2}{2}\right), \qquad h_{1j}^2 \mid \lambda_1^2 \overset{\text{i.i.d.}}{\sim} \operatorname{Exp}\!\left(\tfrac{\lambda_1^2}{2}\right), \qquad \gamma_j \overset{\text{i.i.d.}}{\sim} \mathrm{B}(\pi_\gamma),$$
where $\mathrm{B}(\pi_\gamma)$ denotes the Bernoulli distribution, with $\pi_\gamma$ the probability that the indicator variable $\gamma_j$ equals one for $j = 0, 1, \ldots, r$; the prior of $\pi_\gamma$ is specified as a Beta distribution $\mathrm{Be}(a_{\pi_\gamma}, b_{\pi_\gamma})$ with hyperparameters $a_{\pi_\gamma}$ and $b_{\pi_\gamma}$. The parameters $\lambda_0^2$ and $\lambda_1^2$ are regularization parameters used to identify important variables, for which we consider the following conjugate priors:
$$\lambda_0^2 \sim \mathrm{Ga}(\nu_{\lambda_0}, 1), \qquad \lambda_1^2 \sim \mathrm{Ga}(\nu_{\lambda_1}, 1),$$
where $\mathrm{Ga}(a, b)$ denotes the gamma distribution with shape parameter $a$ and rate parameter $b$. As mentioned above, $\lambda_0$ and $\lambda_1$ should satisfy $\lambda_0 \gg \lambda_1$; to this end, we select hyperparameters $\nu_{\lambda_0}$ and $\nu_{\lambda_1}$ satisfying $\nu_{\lambda_0} \gg \nu_{\lambda_1}$. The prior of the scale parameter $\sigma$ in (5) is an inverse gamma distribution $\mathrm{IG}(a_\sigma, b_\sigma)$; the hyperparameters $a_\sigma = 1$ and $b_\sigma = 0.01$ used in this paper lead to an almost non-informative prior.
Under the Bayesian statistical paradigm, based on the above priors and the likelihood of the quantile regression, we need to derive the posterior distribution $\pi(\theta \mid D) \propto p(\theta, D)$, where $\theta = \{\beta, z, \sigma, \gamma, h_0^2, h_1^2, \lambda_0^2, \lambda_1^2, \pi_\gamma\}$, the latent variable set is $z = \{z_i \mid i = 1, \ldots, n\}$, $h_0^2 = \{h_{0j}^2 \mid j = 0, 1, \ldots, r\}$, $h_1^2 = \{h_{1j}^2 \mid j = 0, 1, \ldots, r\}$, and the observed data are $D = \{y, x\}$, with response set $y = \{y_i \mid i = 1, \ldots, n\}$ and covariate set $x = \{x_i \mid i = 1, \ldots, n\}$. Based on the hierarchical structure (5) of the quantile regression likelihood $p(y_i \mid x_i, \beta)$ and the hierarchical structure (7) of the spike-and-slab prior on the regression coefficient vector $\beta$, we derive the joint density
$$\begin{aligned}
p(\theta, D) ={}& \prod_{i=1}^{n} N(y_i \mid x_i^\top\beta + k_1 z_i, k_2\sigma z_i)\,\operatorname{Exp}(z_i \mid \sigma^{-1})\;\mathrm{Ga}(\lambda_0^2 \mid \nu_{\lambda_0}, 1)\,\mathrm{Ga}(\lambda_1^2 \mid \nu_{\lambda_1}, 1)\\
&\times \prod_{j=0}^{r} \left[ N(\beta_j \mid 0, h_{0j}^2) \right]^{\gamma_j} \left[ N(\beta_j \mid 0, h_{1j}^2) \right]^{1-\gamma_j} \operatorname{Exp}\!\left(h_{0j}^2 \,\Big|\, \tfrac{\lambda_0^2}{2}\right) \operatorname{Exp}\!\left(h_{1j}^2 \,\Big|\, \tfrac{\lambda_1^2}{2}\right)\\
&\times \prod_{j=0}^{r} \mathrm{B}(\gamma_j \mid 1, \pi_\gamma)\;\mathrm{Be}(\pi_\gamma \mid a_{\pi_\gamma}, b_{\pi_\gamma})\,\mathrm{IG}(\sigma \mid a_\sigma, b_\sigma)\\
\propto{}& \prod_{i=1}^{n} (k_2\sigma z_i)^{-\frac12} \exp\!\left\{ -\frac{(y_i - x_i^\top\beta - k_1 z_i)^2}{2 k_2 \sigma z_i} \right\} \frac{1}{\sigma}\exp\!\left\{-\frac{z_i}{\sigma}\right\}\; (\lambda_0^2)^{\nu_{\lambda_0}-1}\exp(-\lambda_0^2)\,(\lambda_1^2)^{\nu_{\lambda_1}-1}\exp(-\lambda_1^2)\\
&\times \prod_{j=0}^{r} \left[ \frac{1}{\sqrt{h_{0j}^2}} \exp\!\left\{-\frac{\beta_j^2}{2 h_{0j}^2}\right\} \right]^{\gamma_j} \left[ \frac{1}{\sqrt{h_{1j}^2}} \exp\!\left\{-\frac{\beta_j^2}{2 h_{1j}^2}\right\} \right]^{1-\gamma_j} \frac{\lambda_0^2}{2}\exp\!\left\{-\frac{\lambda_0^2}{2} h_{0j}^2\right\} \frac{\lambda_1^2}{2}\exp\!\left\{-\frac{\lambda_1^2}{2} h_{1j}^2\right\}\\
&\times \prod_{j=0}^{r} \pi_\gamma^{\gamma_j}(1-\pi_\gamma)^{1-\gamma_j}\; \pi_\gamma^{a_{\pi_\gamma}-1}(1-\pi_\gamma)^{b_{\pi_\gamma}-1}\; \frac{b_\sigma^{a_\sigma}}{\Gamma(a_\sigma)}\sigma^{-a_\sigma-1}\exp\!\left\{-\frac{b_\sigma}{\sigma}\right\}.
\end{aligned}$$
Although sampling from the aforementioned posterior is straightforward, it becomes increasingly time-consuming for higher-dimensional quantile models. To tackle this issue, we develop a faster and more efficient alternative method based on variational Bayes.
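To visualize what the spike-and-slab lasso prior (7) implies for a single coefficient, the following sketch (assuming numpy/scipy; the values of $\lambda_0$, $\lambda_1$, and $\pi_\gamma$ are illustrative only, not the hyperparameter defaults used later) evaluates the two Laplace components and their mixture on a grid:

```python
import numpy as np
from scipy.stats import laplace

# Spike-and-slab lasso prior for one coefficient: a Laplace "spike" with large precision
# lambda0 and a Laplace "slab" with small precision lambda1, mixed with weight pi_gamma.
lambda0, lambda1, pi_gamma = 100.0, 1.0, 0.5   # illustrative values only
beta_grid = np.linspace(-5, 5, 1001)

spike = laplace.pdf(beta_grid, loc=0.0, scale=1.0 / lambda0)   # Psi_0(beta | lambda0)
slab = laplace.pdf(beta_grid, loc=0.0, scale=1.0 / lambda1)    # Psi_1(beta | lambda1)
prior = pi_gamma * spike + (1.0 - pi_gamma) * slab             # marginal prior over gamma_j

# Near zero the spike dominates (shrinking noise coefficients towards zero), while in the
# tails the slab dominates (leaving large signals only lightly penalized).
print(prior[np.argmin(np.abs(beta_grid))], prior[np.argmin(np.abs(beta_grid - 3.0))])
```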

2.3. Quantile Regression with a Spike-and-Slab Lasso Penalty Based on Variational Bayes

At present, the most commonly used variational Bayesian methods for approximating posterior distributions rely on mean-field approximation theory [28], which offers the highest efficiency among variational methods, especially for parameters or parameter blocks with conjugate priors. Bayesian quantile regression needs to take into account that the variance of each observation is different, and each $y_i$ corresponds to a latent variable $z_i$, which makes the algorithm less efficient than that of a general mean regression. Therefore, in this paper, we use the mean-field variational Bayesian algorithm, the most efficient of these, to derive the quantile regression model with the spike-and-slab lasso penalty.
Based on variational theory, we choose densities for the random variables $\theta$ from a variational family $\mathcal{F}$ having the same support $\Theta$ as the posterior density $\pi(\theta \mid D)$. We approximate the posterior density $\pi(\theta \mid D)$ by a variational density $q(\theta) \in \mathcal{F}$. The variational Bayesian method seeks the optimal approximation to $\pi(\theta \mid D)$ by minimizing the Kullback–Leibler divergence between $q(\theta)$ and $\pi(\theta \mid D)$, an optimization problem that can be expressed as
$$q^*(\theta) = \operatorname*{argmin}_{q(\theta) \in \mathcal{F}} \mathrm{KL}\big(q(\theta)\,\|\,\pi(\theta \mid D)\big),$$
where $\mathrm{KL}\big(q(\theta)\,\|\,\pi(\theta \mid D)\big) = \int_\Theta q(\theta)\log\frac{q(\theta)}{\pi(\theta \mid D)}\,d\theta$, which is nonnegative and equals zero if, and only if, $q(\theta) \equiv \pi(\theta \mid D)$. The posterior density is $\pi(\theta \mid D) = \frac{p(\theta, D)}{p(D)}$, where $p(\theta, D)$ is the joint distribution of the parameters $\theta$ and the data $D$, and $p(D)$ is the marginal distribution of $D$. Since $p(D) = \int_\Theta p(\theta, D)\,d\theta$ has no analytic expression for the model considered here, it is rather difficult to solve the above optimization problem directly. It is easy to show that
$$\log p(D) = \mathrm{KL}\big(q(\theta)\,\|\,\pi(\theta \mid D)\big) + \mathcal{L}\{q(\theta)\} \ge \mathcal{L}\{q(\theta)\},$$
in which the evidence lower bound (ELBO) is $\mathcal{L}\{q(\theta)\} = E_{q(\theta)}\big[\log p(\theta, D)\big] - E_{q(\theta)}\big[\log q(\theta)\big]$, with $E_{q(\theta)}[\cdot]$ denoting the expectation taken with respect to the variational density $q(\theta)$. Thus, minimizing $\mathrm{KL}\big(q(\theta)\,\|\,\pi(\theta \mid D)\big)$ is equivalent to maximizing $\mathcal{L}\{q(\theta)\}$, because $\log p(D)$ does not depend on $q(\theta)$. That is,
$$q^*(\theta) = \operatorname*{argmin}_{q(\theta) \in \mathcal{F}} \mathrm{KL}\big(q(\theta)\,\|\,\pi(\theta \mid D)\big) = \operatorname*{argmax}_{q(\theta) \in \mathcal{F}} \mathcal{L}\{q(\theta)\},$$
which indicates that seeking the optimal approximation to $\pi(\theta \mid D)$ becomes maximizing $\mathcal{L}\{q(\theta)\}$ over the variational family $\mathcal{F}$. The complexity of this approximation problem is heavily related to the variational family $\mathcal{F}$. Therefore, it is appealing to choose a comparatively simple variational family $\mathcal{F}$ and to optimize the objective function $\mathcal{L}\{q(\theta)\}$ with respect to $q(\theta)$.
Following the common approach to choosing a tractable variational family $\mathcal{F}$ in variational studies, we adopt the frequently used mean-field assumption, under which the blocks of $\theta$ are mutually independent and each is governed by its own variational density. Accordingly, the variational density $q(\theta)$ is assumed to factorize across the blocks of $\theta$:
$$q(\theta) = \prod_{j=0}^{r} q_1(\beta_j, \gamma_j)\, q_2(h_{0j}^2)\, q_3(h_{1j}^2)\, \prod_{i=1}^{n} q_4(z_i)\; q_5(\lambda_0^2)\, q_6(\lambda_1^2)\, q_7(\pi_\gamma)\, q_8(\sigma) \equiv \prod_{s=1}^{8} q_s(\theta_s),$$
in which the form of each variational density $q_s(\theta_s)$ is unknown, but the factorization across components assumed above is predetermined. Moreover, the best solutions for the $q_s(\theta_s)$ are obtained by maximizing $\mathcal{L}\{q(\theta_1, \ldots, \theta_8)\}$ with respect to the variational densities $q_1(\theta_1), \ldots, q_8(\theta_8)$ via the coordinate ascent method, where $\theta = \{\theta_1, \ldots, \theta_8\}$ and each $\theta_s$ can be either a scalar or a vector. This means that when the correlation between several unknown parameters or latent variables cannot be ignored, they should be placed in the same block and merged into one $\theta_s$. Following the coordinate ascent idea of ref. [29], when the other variational factors $q_k(\theta_k)$, $k \ne s$, are fixed, the optimal density $q_s^*(\theta_s)$ that maximizes $\mathcal{L}\{q(\theta)\}$ with respect to $q_s(\theta_s)$ takes the form
$$q_s^*(\theta_s) \propto \exp\big\{ E_{-\theta_s}\big[\log p(\theta, D)\big] \big\},$$
where $\log p(\theta, D)$ is the logarithm of the joint density function and $E_{-\theta_s}[\cdot]$ denotes the expectation taken with respect to the density $\prod_{k \ne s} q_k(\theta_k)$, for $s = 1, \ldots, 8$.
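To illustrate the coordinate ascent scheme implied by Equation (9) before specializing it to our model, the following toy sketch (assuming numpy) runs mean-field CAVI on a conjugate normal model with unknown mean and precision; it is not the VBSSLQR model itself, only the pattern of cycling through the optimal factors:

```python
import numpy as np

# Toy CAVI: y_i ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0*tau)), tau ~ Ga(a0, b0),
# with the mean-field factorization q(mu, tau) = q(mu) q(tau).
rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.5, size=100)
n, ybar = y.size, y.mean()
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

E_tau = a0 / b0                                  # initial guess for E_q(tau)
for _ in range(50):
    # optimal q(mu) = N(m, s2), obtained from exp{E_{-mu}[log p(theta, D)]}
    m = (lam0 * mu0 + n * ybar) / (lam0 + n)
    s2 = 1.0 / ((lam0 + n) * E_tau)
    # optimal q(tau) = Ga(a_n, b_n), obtained from exp{E_{-tau}[log p(theta, D)]}
    a_n = a0 + (n + 1) / 2
    b_n = b0 + 0.5 * (np.sum((y - m) ** 2) + n * s2 + lam0 * ((m - mu0) ** 2 + s2))
    E_tau = a_n / b_n                            # updated expectation fed back into q(mu)

print(f"E_q(mu) = {m:.3f}, E_q(tau) = {E_tau:.3f}")
```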
According to Equations (8) and (9), we can derive the variational posterior for each parameter as follows (see Appendix A for the details):
$$\begin{aligned}
\beta_j &\overset{\mathrm{ind}}{\sim} \mu_{\gamma_j}\, N(\mu_{0j}, \sigma_{0j}^2) + (1-\mu_{\gamma_j})\, N(\mu_{1j}, \sigma_{1j}^2), \quad j = 0, 1, \ldots, r,\\
h_{0j}^2 &\overset{\mathrm{ind}}{\sim} \mu_{\gamma_j}\, \mathrm{GIG}\!\left(\tfrac12,\, E_{\lambda_0^2}(\lambda_0^2),\, \sigma_{0j}^2 + \mu_{0j}^2\right) + (1-\mu_{\gamma_j})\, \operatorname{Exp}\!\left(\tfrac12 E_{\lambda_0^2}(\lambda_0^2)\right),\\
h_{1j}^2 &\overset{\mathrm{ind}}{\sim} (1-\mu_{\gamma_j})\, \mathrm{GIG}\!\left(\tfrac12,\, E_{\lambda_1^2}(\lambda_1^2),\, \sigma_{1j}^2 + \mu_{1j}^2\right) + \mu_{\gamma_j}\, \operatorname{Exp}\!\left(\tfrac12 E_{\lambda_1^2}(\lambda_1^2)\right),\\
\lambda_0^2 &\sim \mathrm{Ga}\!\left(r + 1 + \nu_{\lambda_0},\; 1 + \tfrac12 \sum_{j=0}^{r} E_{h_{0j}^2}(h_{0j}^2)\right), \qquad
\lambda_1^2 \sim \mathrm{Ga}\!\left(r + 1 + \nu_{\lambda_1},\; 1 + \tfrac12 \sum_{j=0}^{r} E_{h_{1j}^2}(h_{1j}^2)\right),\\
\gamma_j &\overset{\mathrm{ind}}{\sim} \mathrm{B}\!\left(1,\, (1 + e^{\zeta_j})^{-1}\right), \quad j = 0, 1, \ldots, r, \qquad
\pi_\gamma \sim \mathrm{Be}\!\left(a + \sum_{j=0}^{r}\mu_{\gamma_j},\; r + 1 + b - \sum_{j=0}^{r}\mu_{\gamma_j}\right),\\
z_i &\overset{\mathrm{ind}}{\sim} \mathrm{GIG}\!\left(\tfrac12,\, a_{z_i},\, b_{z_i}\right), \quad i = 1, 2, \ldots, n, \qquad
\sigma \sim \mathrm{IG}\!\left(\tfrac32 n + a_\sigma,\, c_\sigma\right),
\end{aligned}$$
where $\mathrm{GIG}(\cdot,\cdot,\cdot)$ denotes the generalized inverse Gaussian distribution,
$$\begin{aligned}
\mu_{0j} &= \sigma_{0j}^2\, \frac{E_\sigma(\sigma^{-1})}{k_2} \sum_{i=1}^{n} \left[ E_{z_i}(z_i^{-1})\big(y_i - x_{i,(-j)}^\top \mu_{\beta_{(-j)}}\big) - k_1 \right] x_{ij}, \qquad
\sigma_{0j}^2 = \left[ E_{h_{0j}^2 \mid \gamma_j = 1}\!\left(\tfrac{1}{h_{0j}^2}\right) + \frac{E_\sigma(\sigma^{-1})}{k_2} \sum_{i=1}^{n} x_{ij}^2 E_{z_i}(z_i^{-1}) \right]^{-1},\\
\mu_{1j} &= \sigma_{1j}^2\, \frac{E_\sigma(\sigma^{-1})}{k_2} \sum_{i=1}^{n} \left[ E_{z_i}(z_i^{-1})\big(y_i - x_{i,(-j)}^\top \mu_{\beta_{(-j)}}\big) - k_1 \right] x_{ij}, \qquad
\sigma_{1j}^2 = \left[ E_{h_{1j}^2 \mid \gamma_j = 0}\!\left(\tfrac{1}{h_{1j}^2}\right) + \frac{E_\sigma(\sigma^{-1})}{k_2} \sum_{i=1}^{n} x_{ij}^2 E_{z_i}(z_i^{-1}) \right]^{-1},
\end{aligned}$$
and
$$\begin{aligned}
\zeta_j &= E_{\pi_\gamma}\big(\log(1-\pi_\gamma)\big) - E_{\pi_\gamma}(\log \pi_\gamma) + \tfrac12 E_{h_{0j}^2 \mid \gamma_j = 1}(\log h_{0j}^2) - \tfrac12 E_{h_{1j}^2 \mid \gamma_j = 0}(\log h_{1j}^2)\\
&\quad + \tfrac12\big(\log \sigma_{1j}^2 - \log \sigma_{0j}^2\big) + \tfrac12\big(\mu_{1j}^2 (\sigma_{1j}^2)^{-1} - \mu_{0j}^2 (\sigma_{0j}^2)^{-1}\big),\\
a_{z_i} &= \frac{E_\sigma(\sigma^{-1})}{k_2}\big(k_1^2 + 2 k_2\big), \qquad
b_{z_i} = \frac{E_\sigma(\sigma^{-1})}{k_2}\left[ \big(y_i - x_i^\top \mu_\beta\big)^2 + x_i^\top \Sigma_\beta x_i \right],\\
c_\sigma &= \frac{1}{2 k_2} \sum_{i=1}^{n} \left\{ k_1^2 \mu_{z_i} + E_{z_i}(z_i^{-1}) \left[ \big(y_i - x_i^\top \mu_\beta\big)^2 + x_i^\top \Sigma_\beta x_i \right] - 2 k_1 \big(y_i - x_i^\top \mu_\beta\big) \right\} + \sum_{i=1}^{n} \mu_{z_i} + b_\sigma.
\end{aligned}$$
In the above equations, $\mu_{\lambda_0^2} = E_{\lambda_0^2}(\lambda_0^2)$, and $\mu_{\lambda_1^2}$, $\mu_{\beta_j}$, $\mu_{\gamma_j}$, $\mu_{z_i}$ are defined analogously; $\mu_\beta = (\mu_{\beta_0}, \mu_{\beta_1}, \ldots, \mu_{\beta_r})^\top$ with $\mu_{\beta_j} = \mu_{\gamma_j}\mu_{0j} + (1-\mu_{\gamma_j})\mu_{1j}$, and $\mu_{\beta_{(-j)}}$ is the vector $\mu_\beta$ with its jth component deleted; $\Sigma_\beta = \mathrm{diag}\big(\sigma_{\beta_0}^2, \sigma_{\beta_1}^2, \ldots, \sigma_{\beta_r}^2\big)$ with $\sigma_{\beta_j}^2 = \mu_{\gamma_j}\sigma_{0j}^2 + (1-\mu_{\gamma_j})\sigma_{1j}^2 + \mu_{\gamma_j}(1-\mu_{\gamma_j})(\mu_{0j} - \mu_{1j})^2$.
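As an illustration of how Equation (10) translates into computation, the following sketch (assuming numpy; the function and variable names are ours, not the authors' implementation) performs the coordinate update for a single $\beta_j$ given the current variational expectations:

```python
import numpy as np

def update_beta_j(j, X, y, mu_beta, E_inv_z, E_inv_sigma, E_inv_h0j, E_inv_h1j, k1, k2):
    """One coordinate update of (mu_0j, sigma_0j^2, mu_1j, sigma_1j^2) following Equation (10)."""
    xj = X[:, j]
    # partial residual y_i - x_{i,(-j)}^T mu_beta_(-j)
    resid = y - X @ mu_beta + xj * mu_beta[j]
    score = (E_inv_sigma / k2) * np.sum((E_inv_z * resid - k1) * xj)
    prec_lik = (E_inv_sigma / k2) * np.sum(xj ** 2 * E_inv_z)
    sigma0j2 = 1.0 / (E_inv_h0j + prec_lik)   # precision from spike component plus likelihood
    sigma1j2 = 1.0 / (E_inv_h1j + prec_lik)   # precision from slab component plus likelihood
    mu0j, mu1j = sigma0j2 * score, sigma1j2 * score
    return mu0j, sigma0j2, mu1j, sigma1j2
```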
In the section above, we derived the variational posterior of each parameter. Using the idea of coordinate ascent, we can update each variational distribution iteratively until convergence.
For this reason, we list the variational Bayesian spike-and-slab lasso quantile regression (VBSSLQR) algorithm as shown in Algorithm 1:
Algorithm 1 Variational Bayesian spike-and-slab lasso quantile regression (VBSSLQR).
Input:
 Data $y$, predictors $x$, prior parameters $\nu_{\lambda_0} = 10^4$, $\nu_{\lambda_1} = 1$, $a_\sigma = 1$, $b_\sigma = 0.01$, $a = b = 1$,
 precision $\epsilon = 0.01$, and quantile $\tau$;
Output:
 Optimized variational parameters $E_{\beta_j}(\beta_j)$, $\mathrm{Var}_{\beta_j}(\beta_j)$ for $j = 0, 1, \ldots, r$ and the corresponding Bayesian confidence intervals.
 Initialize: $\mu_\beta^{(0)}$; $E^{(0)}_{h_{0j}^2 \mid \gamma_j = 1}(1/h_{0j}^2) = 0.01$; $E^{(0)}_{h_{1j}^2 \mid \gamma_j = 0}(1/h_{1j}^2) = 1$; $\mu_\gamma^{(0)} = 0.5$;
       $E^{(0)}_{\lambda_0^2}(\lambda_0^2) = 100$; $E^{(0)}_{\lambda_1^2}(\lambda_1^2) = 1$; $E^{(0)}_{\pi_\gamma}(\log \pi_\gamma) = 0$; $E^{(0)}_\sigma(\sigma^{-1}) = 1$;
       $E^{(0)}_{\pi_\gamma}(\log(1-\pi_\gamma)) = -1$; $\gamma^{(0)} = 0$; $E^{(0)}_{z_i}(z_i) = E^{(0)}_{z_i}(z_i^{-1})$;
while $|d^{(t)}| > \epsilon$ do
  for $j = 0$ to $r$ do
   Update $\sigma_{0j}^{2\,(t+1)}$, $\mu_{0j}^{(t+1)}$, $\sigma_{1j}^{2\,(t+1)}$, and $\mu_{1j}^{(t+1)}$ according to Equation (10).
   Update $E^{(t+1)}_{h_{0j}^2 \mid \gamma_j = 1}(h_{0j}^2)$, $E^{(t+1)}_{h_{0j}^2}(h_{0j}^2)$, $E^{(t+1)}_{h_{0j}^2 \mid \gamma_j = 1}(1/h_{0j}^2)$, $E^{(t+1)}_{h_{0j}^2 \mid \gamma_j = 1}(\log h_{0j}^2)$ according to $q(h_{0j}^2)$,
   Update $E^{(t+1)}_{h_{1j}^2 \mid \gamma_j = 0}(h_{1j}^2)$, $E^{(t+1)}_{h_{1j}^2}(h_{1j}^2)$, $E^{(t+1)}_{h_{1j}^2 \mid \gamma_j = 0}(1/h_{1j}^2)$, $E^{(t+1)}_{h_{1j}^2 \mid \gamma_j = 0}(\log h_{1j}^2)$ according to $q(h_{1j}^2)$,
   Update $E^{(t+1)}_{\gamma_j}(\gamma_j)$ according to the variational posterior $q(\gamma_j)$,
   Update $E^{(t+1)}_{\beta_j}(\beta_j)$ and $\mathrm{Var}^{(t+1)}_{\beta_j}(\beta_j)$,
  end for
  Update $E^{(t+1)}_{\lambda_0^2}(\lambda_0^2)$ according to the variational posterior $q(\lambda_0^2)$,
  Update $E^{(t+1)}_{\lambda_1^2}(\lambda_1^2)$ according to the variational posterior $q(\lambda_1^2)$,
  Update $E^{(t+1)}_{\pi_\gamma}(\pi_\gamma)$, $E^{(t+1)}_{\pi_\gamma}(\log \pi_\gamma)$, and $E^{(t+1)}_{\pi_\gamma}(\log(1-\pi_\gamma))$ according to $q(\pi_\gamma)$,
  for $i = 1$ to $n$ do
   Update $E^{(t+1)}_{z_i}(z_i)$ and $E^{(t+1)}_{z_i}(z_i^{-1})$ according to the variational posterior $q(z_i)$,
  end for
  Update $E^{(t+1)}_\sigma(\sigma)$ and $E^{(t+1)}_\sigma(\sigma^{-1})$ according to the variational posterior $q(\sigma)$,
   $|d^{(t+1)}| = \max\big\{ |\theta_{q_1}^{(t+1)} - \theta_{q_1}^{(t)}|, \ldots, |\theta_{q_m}^{(t+1)} - \theta_{q_m}^{(t)}| \big\}$,
end while
In Algorithm 1 above, $\Psi(\cdot)$ is the digamma function, and the expectation $E^{(t+1)}_{h_{0j}^2 \mid \gamma_j = 1}(\log h_{0j}^2)$ of $\log h_{0j}^2$ is taken with respect to a generalized inverse Gaussian distribution. Suppose that $x \sim \mathrm{GIG}(p, a, b)$; then
$$E_x(\log x) = \frac{\partial K_p(\sqrt{ab})/\partial p}{K_p(\sqrt{ab})} + \frac12 \ln\frac{b}{a},$$
where $K_p(\cdot)$ represents the modified Bessel function of the second kind. Note that there is no analytic expression for the derivative of the modified Bessel function with respect to its order. Therefore, we approximate $E_x(\log x)$ using a second-order Taylor expansion of $\log x$. This paper lists the expectations of some parameter functions under the variational posteriors involved in Algorithm 1; see Appendix B for details. Based on our proposed VBSSLQR algorithm, in the next section we randomly generate high-dimensional data, conduct simulation studies, and compare the performance with other methods. Notably, the asymptotic variance of the quantile regression estimator is inversely proportional to the density of the errors at the quantile of interest. When n is small and we estimate extreme quantiles, the corresponding asymptotic variance will be large, resulting in less precise estimates [23]. Therefore, the regression coefficients are difficult to estimate at extreme quantiles, and in such cases it is advisable to increase the sample size appropriately.
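The following sketch (assuming scipy) collects the GIG moments needed in Algorithm 1 and one way to implement the second-order Taylor approximation of $E_x(\log x)$ mentioned above; expanding $\log x$ around $E(x)$ is our assumption for the expansion point:

```python
import numpy as np
from scipy.special import kv

def gig_moment(p, a, b, alpha=1.0):
    """E(x^alpha) for x ~ GIG(p, a, b) with density proportional to x^(p-1) exp{-(a*x + b/x)/2}."""
    s = np.sqrt(a * b)
    return (b / a) ** (alpha / 2) * kv(p + alpha, s) / kv(p, s)

def gig_log_moment(p, a, b):
    """Approximate E(log x) by a second-order Taylor expansion of log x around E(x)."""
    m1 = gig_moment(p, a, b, 1.0)
    m2 = gig_moment(p, a, b, 2.0)
    return np.log(m1) - (m2 - m1 ** 2) / (2 * m1 ** 2)

# Example: the variational posterior of z_i is GIG(1/2, a_zi, b_zi); its moments E(z_i),
# E(1/z_i), and the approximate E(log z_i) feed the updates in Algorithm 1.
print(gig_moment(0.5, 2.0, 3.0), gig_moment(0.5, 2.0, 3.0, -1.0), gig_log_moment(0.5, 2.0, 3.0))
```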

3. Simulation Studies

In this section, we used simulated high-dimensional data with a sample size $n = 200$ and $r = 500$ variables, in order to study the performance of VBSSLQR and compare it with existing methods, including linear regression with a lasso penalty (Lasso), linear regression with an adaptive lasso penalty (ALasso), quantile regression with a lasso penalty (QRL), quantile regression with an adaptive lasso penalty (QRAL), Bayesian regularized quantile regression with a lasso penalty (BLQR), and Bayesian regularized quantile regression with an adaptive lasso penalty (BALQR). The data in the simulation studies were generated using Equation (1), in which the covariate vector $x_i$ was randomly generated from the multivariate normal distribution $N(0, \Sigma)$ with the $(k, l)$th element of $\Sigma$ being $0.5^{|k-l|}$. Among these covariates, we considered only ten important explanatory variables that have a significant impact on the dependent variable. We set the 1st, 51st, 101st, 151st, 201st, 251st, 301st, 351st, 401st, and 451st predictors to be active, with regression coefficients $-3$, $-2.5$, $-2$, $-1.5$, $-1$, $1$, $1.5$, $2$, $2.5$, and $3$, and the rest equal to zero. In addition, we discuss the performance of the various approaches under two types of random errors; namely, independent and identically distributed (i.i.d.) random errors and heterogeneous random errors.
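For reference, the following sketch (assuming numpy/scipy) generates one replication of the simulated data described above; shifting the error by its own $\tau$th quantile is one way to enforce $Q_\tau(\varepsilon_i) = 0$ and is shown here for the standard normal case only:

```python
import numpy as np
from scipy.stats import norm

# One replication of the simulation design: n = 200, r = 500 covariates with covariance
# Sigma_{kl} = 0.5^{|k-l|}, ten active predictors, and i.i.d. normal errors shifted so that
# the tau-th quantile of the error is zero.
rng = np.random.default_rng(2023)
n, r, tau = 200, 500, 0.5

idx = np.arange(r)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
X = rng.multivariate_normal(np.zeros(r), Sigma, size=n)

beta = np.zeros(r)
active = np.arange(0, r, 50)                      # predictors 1, 51, ..., 451 (0-based indices)
beta[active] = [-3, -2.5, -2, -1.5, -1, 1, 1.5, 2, 2.5, 3]

eps = rng.standard_normal(n) - norm.ppf(tau)      # N(0, 1) error centered at its tau-th quantile
y = X @ beta + eps
```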

3.1. Independent and Identically Distributed Random Errors

In this subsection, with reference to [19,23], we set the random errors $\varepsilon_i$ in Equation (1) to be independently and identically distributed, and we consider the following five different distributions whose $\tau$th quantile is zero:
  • The error $\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(\mu, 1)$, with $\mu$ being the $\tau$th quantile of $N(0, 1)$, for $i = 1, \ldots, n$;
  • The error $\varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathrm{Laplace}(\mu, 1)$, with $\mu$ being the $\tau$th quantile of $\mathrm{Laplace}(0, 1)$, where $\mathrm{Laplace}(a, b)$ denotes the Laplace distribution with location parameter $a$ and scale parameter $b$;
  • The error $\varepsilon_i \overset{\text{i.i.d.}}{\sim} 0.1\, N(\mu_1, 9) + 0.9\, N(\mu_2, 1)$, with $\mu_1$ and $\mu_2$ being the $\tau$th quantiles of $N(0, 9)$ and $N(0, 1)$, respectively;
  • The error $\varepsilon_i \overset{\text{i.i.d.}}{\sim} 0.1\, \mathrm{Laplace}(\mu_1, 9) + 0.9\, \mathrm{Laplace}(\mu_2, 1)$, with $\mu_1$ and $\mu_2$ being the $\tau$th quantiles of $\mathrm{Laplace}(0, 9)$ and $\mathrm{Laplace}(0, 1)$, respectively;
  • The error $\varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathrm{Cauchy}(\mu, 0.2)$, with $\mu$ being the $\tau$th quantile of $\mathrm{Cauchy}(0, 0.2)$, where $\mathrm{Cauchy}(a, b)$ denotes the Cauchy distribution with location parameter $a$ and scale parameter $b$.
For all of the above error distributions and each $\tau \in \{0.3, 0.5, 0.7\}$, we ran 500 replications for each method and evaluated the performance using two criteria. The first criterion was the median of the mean absolute deviations (MMAD), which quantifies the overall distance between the estimated conditional quantile and the true conditional quantile. Specifically, the mean absolute deviation (MAD) in any replication is defined as $\frac{1}{n}\sum_{i=1}^{n} |x_i^\top\beta - x_i^\top\hat{\beta}_\tau|$, where $\hat{\beta}_\tau$ is the estimate of the regression coefficient $\beta$ given $\tau$. The second criterion was the mean number of true positives (TP) and false positives (FP) selected by each method.
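The MAD criterion for a single replication can be computed as in the short sketch below (assuming numpy); the MMAD and its SD reported in the tables are then the median and standard deviation of these values over the 500 replications:

```python
import numpy as np

def mad(X, beta_true, beta_hat):
    """Mean absolute deviation between the true and estimated conditional quantiles."""
    return np.mean(np.abs(X @ beta_true - X @ beta_hat))

# Over 500 replications, collect mads = [mad(X_rep, beta, beta_hat_rep), ...] and report
# the MMAD and its standard deviation as np.median(mads) and np.std(mads).
```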
Table 1 shows the median and standard deviation (SD) of the MADs estimated using each method for the simulations with homogeneous errors. It is clear that our method was optimal (bold) in all cases and, especially at quantiles τ = 0.3 and 0.7, was significantly superior to the other six methods. When the error distribution was normal, Laplace, or a normal mixture, the MMAD of the BALQR method was second best. When the error distribution was Cauchy, the MMAD of the QRL approach was second best, but Table 2 shows that this came at the cost of model complexity. When the error distribution was a Laplace mixture, the MMAD of the Lasso approach was second best; however, it selected an overfitting quantile regression model with about 30 false positive variables, as can be seen from Table 2. It is particularly important to note that, in the case of high-dimensional data, the MMAD of the quantile regression models with a lasso or adaptive lasso penalty is not smaller than the MMAD of a general linear model with a lasso penalty. Therefore, it is inappropriate to use the lasso or adaptive lasso penalty directly in this case.
In order to show the results of variable selection more intuitively, we introduced TP (true positives) and FP (false positives), to calculate the mean of TP and FP of 500 repeated simulations, respectively. Detailed results are shown in Table 2 below:
From the results in Table 2, we can conclude the following. The Lasso method can generally select all true active variables, but it also fits many false active variables, and when the random error follows a Cauchy distribution it cannot select all true active variables. The ALasso approach can identify the true active variables, but it also makes some misjudgments, especially when the random error follows a Cauchy distribution. Although the QRL approach can identify the real active variables, it still includes many false active variables, and the model it identifies has a high complexity. The QRAL method cannot identify all true active variables and incorrectly identifies some inactive variables. In the case of high-dimensional data, the BLQR approach cannot select all the true active variables, although it also does not incorrectly select inactive variables. The BALQR method selects the true active variables better than the BLQR method, but some inactive variables are incorrectly selected, especially when the random error follows a Cauchy distribution. Our VBSSLQR method not only had the smallest MMAD, but also selected the true active variables and eliminated most false active variables. On the whole, our method was superior to the other six methods for variable selection, especially at quantiles τ = 0.3 and 0.7, performing significantly better than BLQR, BALQR, and QRL.

3.2. Heterogeneous Random Errors

Now we consider the case of heterogeneous random errors, to demonstrate the performance of our method. In this subsection, the data were generated from the following model:
$$y_i = x_i^\top\beta + (1 + u_i)\,\varepsilon_i,$$
where $u_i \overset{\text{i.i.d.}}{\sim} U(0, 1)$, in which $U(a, b)$ denotes the uniform distribution on $(a, b)$. The design matrix $x_i$ was generated in the same way as above, and the regression coefficient $\beta$ was set as before. Furthermore, in the simulation study, we combined $x_i$ and $u_i$; that is, $u_i$ was also treated as a covariate. Finally, the random error $\varepsilon_i$ was also generated from the five different distributions defined in Section 3.1.
We also studied the performance at quantiles $\tau \in \{0.3, 0.5, 0.7\}$ under the different methods, running 500 simulations to calculate the MMAD and the means of TP and FP. We list the experimental results in Table 3 and Table 4.
For heterogeneous random errors, our approach was still the best, as in the case of i.i.d. random errors. It is noteworthy that the VBSSLQR method was more robust than the other methods: our method showed the smallest change in MMAD relative to the i.i.d. case. We can see that the MMAD of our method remained essentially unchanged, while the MMAD of the other six methods differed by more than 0.5 from the i.i.d. case in some settings.
We also investigated the means of TP and FP for heterogeneous random errors under the different quantiles τ. The results are listed in Table 4, which shows that variable selection was slightly less effective under heterogeneous random errors than under i.i.d. random errors for the same sample size, but our method still provided the best selection results.
We also calculated the mean execution times of the various Bayesian quantile regressions under the different quantiles τ and the different random error distributions, and list the results in Table A1 of Appendix C, which illustrates that our proposed VBSSLQR approach was far more efficient than BLQR and BALQR, for which we ran the MCMC sampler for 1000 iterations, discarded the first 500 as burn-in, and based statistical inference on the remaining 500 samples (experimental studies showed that the algorithm converged within 500 iterations). In order to illustrate the feasibility of applying our proposed VBSSLQR to cases with smaller effect sizes, we changed the coefficients of the above active predictors to 1, while the other settings remained unchanged. The performance of our proposed method is shown in Table A2, which illustrates that the results were not significantly different from those in Table 1 and Table 2 when the random errors were independent and identically distributed, and slightly worse than those in Table 3 and Table 4 when the random errors were heterogeneous.

4. Examples

In this section, we analyzed a real dataset containing information about crime in various cities across the United States. This dataset is accessible from the University of California, Irvine machine learning repository (http://archive.ics.uci.edu/ml/datasets/communities+and+crime, accessed on 1 May 2023). We calculated the per capita rate of violent crimes by dividing the total number of violent crimes by the population of each city. The violent crimes considered in our analysis were those classified as murder, rape, robbery, and assault, as per United States law. The observed individuals were communities. The dataset has 116 variables, where the first four columns are the response, the name of the community, the code of the county, and the code of the community; the middle features are demographic information about each community, such as population, age, race, and income; and the final columns are regions. According to the source: “the data combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR”. This dataset has been used for quantile regression [30].
Our dependent variable of interest was the murder rate of each city, denoted as $y_i$ for the ith city. We chose this variable because murder is the most dangerous violent crime. Studying the factors correlated with this response variable is of significant importance for the public and law enforcement agencies.
To adapt the data to our model, we preprocessed the data as follows:
  • Delete columns from the dataset that contain missing data.
  • Delete the records for which the response variable $y_i$ equals 0, because these are not of interest to us.
  • Transform $y_i$: $y_i = \log\frac{y_i}{1 - y_i}$, and let the transformed $y_i$ be the new response variable.
  • Convert some qualitative variables into quantitative variables.
  • Standardize the covariates.
After the above data preprocessing, we obtained 1060 observations and 95 covariates in the training set, and 122 observations and 95 covariates in the testing set. We implemented a quantile regression model between the 96 predictors (including the intercept) and the response $y_i$ at different quantiles.
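As an illustration, the following sketch (assuming pandas/numpy) mirrors the preprocessing steps listed above; the file name and the response column name are hypothetical placeholders, not the repository's exact schema:

```python
import numpy as np
import pandas as pd

# Sketch of the preprocessing pipeline; "communities.csv" and the response column name
# "murdPerPop" are hypothetical placeholders for the UCI communities-and-crime data.
df = pd.read_csv("communities.csv")

df = df.dropna(axis=1)                                   # 1. drop columns containing missing data
df = df[df["murdPerPop"] > 0]                            # 2. drop records whose response equals 0
y = np.log(df["murdPerPop"] / (1 - df["murdPerPop"]))    # 3. logit transform of the response
X = pd.get_dummies(df.drop(columns=["murdPerPop"]))      # 4. qualitative -> quantitative variables
X = (X - X.mean()) / X.std()                             # 5. standardize the covariates
```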
In this section, we compare QRL, QRAL, BLQR, and BALQR with our VBSSLQR method on the real dataset; all of these are penalized quantile regression methods. We compared the performance of the different approaches under the quantiles $\tau \in \{0.1, 0.3, 0.5, 0.7, 0.9\}$. We computed the root mean squared error (RMSE) of each method under each quantile $\tau$ and the number of selected active variables, to evaluate the performance of each approach on the test set, where the RMSE was evaluated as $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_{\tau i})^2}$, with $\hat{y}_{\tau i}$ the fitted value of the response $y_i$ under quantile $\tau$. Finally, the results are listed in Table 5 below.
To visually highlight the results listed in Table 5, the best results under each quantile τ are in bold. Clearly, our method performed better than the other methods for all quantiles, and the active variables selected by our method were suitable, which means that our method was very competitive compared with the other methods. The BLQR and BALQR approaches could only identify the intercept, and they could not identify the variables that really affected the response quantile. The QRL and QRAL methods were prone to overfitting, because they recognized too many active variables. Finally, the efficiency of our method on the real data was also significantly higher than that of the other approaches. Although BLQR, QRL, and our proposed method had the same RMSE performance at τ = 0.1 and 0.3, BLQR showed underfitting and QRL showed overfitting. Thus, we believe there is sufficient evidence that our method was very competitive with the other approaches.
Similarly to [30], we list the active variables selected under each quantile in Table 6. Thus, Table 6 shows the variable selection of our proposed method with real data.
In the above table, our method selected only a small number of predictors at each quantile level. Notably, only the variable “NumInShelters” had an impact on the response at all quantiles; the variables “FemalePctDiv” and “TotalPctDiv” had an impact on the response at τ = 0.1 , 0.3 , 0.5 , 0.7 ; the variable “PctVacantBoarded” had an impact on the response at τ = 0.5 , 0.7 , 0.9 ; the variable “racePctWhite” had an impact on the response at τ = 0.5 , 0.7 , 0.9 ; and the other variables affected a few quantiles of response. Therefore, the five variables selected from the quantile regression model were
  • PctVacantBoarded: percentage of households that are vacant and boarded up to prevent vandalism.
  • NumInShelters: number of shelters in the community.
  • FemalePctDiv: percentage of females who are divorced.
  • TotalPctDiv: percentage of people who are divorced.
  • racePctWhite: percentage of people of white race.
In order to obtain a better understanding of these five common variables, we plot their correlation to the response.
Figure 1 depicts the relationship between the five common variables and the murder rate. Only the correlation between NumInShelters and the murder rate is not obvious, with MalePctDivorce, RentLowQ, racePctWhite, and PctVacantBoarded significantly affecting the murder rate. The variable racePctWhite and the murder rate are negatively correlated, while FemalePctDiv, TotalPctDiv, PctVacantBoarded, and the murder rate are positively correlated. This result is in accordance with the practical situation, and it can be seen that our results were basically consistent with those of [30]. Our method could more comprehensively select important variables under the same quantile. Therefore, the percentage of females who are divorced, the percentage of people who are divorced, the percentage of households that are vacant and boarded up to prevent vandalism, and the percentage of white residents affect the murder rate.

5. Conclusions

In this paper, we propose variational Bayesian spike-and-slab lasso quantile regression for variable selection and estimation. This method applies a spike-and-slab lasso prior to each regression coefficient $\beta_j$, thus penalizing each regression coefficient. The spike prior we choose is a Laplace distribution with small variance, while the slab prior is a Laplace distribution with large variance [26]. Precisely because it penalizes each regression coefficient, the method has a powerful variable selection capability, but this also brings a problem of low efficiency, especially when introducing a spike-and-slab prior into quantile regression (note that the algorithmic efficiency of quantile regression is inherently lower than that of general regression approaches). In order to solve this inefficiency (caused by both the quantile regression and the spike-and-slab lasso prior) and to make the algorithm feasible, we introduced a variational Bayesian method to approximate the posterior distribution of each parameter. The simulation studies and real data analyses illustrated that quantile regression with the spike-and-slab lasso penalty based on the variational Bayesian method performed effectively and exhibited robust competitiveness compared with other approaches (Bayesian or non-Bayesian, quantile or nonquantile regression), especially in the case of high-dimensional data. In future research, it would be worthwhile to improve the interpretability and computational efficiency of ultra-high-dimensional quantile regression based on VB and dimension reduction techniques.

Author Contributions

Conceptualization, D.D. and A.T.; methodology, A.T. and D.D.; software, D.D. and J.Y.; validation, A.T., D.D. and J.Y.; formal analysis, D.D. and A.T.; investigation, D.D., J.Y. and A.T.; preparation of the original work draft, A.T. and D.D.; visualization, D.D. and J.Y.; supervision, funding acquisition, D.D., A.T. and J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 11961079), the Yunnan University Multidisciplinary Team Construction Projects in 2021, and the Yunnan University Quality Improvement Plan for Graduate Course Textbook Construction (CZ22622202).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The research data are available on the website http://archive.ics.uci.edu/ml/datasets/communities+and+crime, accessed on 1 May 2023.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Deduction

Based on the mean-field variational Formula (9) and the joint density (8), we need to derive the following variational distributions.
$$\begin{aligned}
q_1(\beta_j, \gamma_j) &\propto \exp\Big\{ E_{-(\beta_j,\gamma_j)}\Big[ \sum_{i=1}^{n}\log N(y_i \mid x_i^\top\beta + k_1 z_i, k_2\sigma z_i) + \gamma_j \log N(\beta_j \mid 0, h_{0j}^2) + (1-\gamma_j)\log N(\beta_j \mid 0, h_{1j}^2) + \log \mathrm{B}(\gamma_j \mid 1, \pi_\gamma) \Big] \Big\}\\
&\propto \exp\Big\{ -\frac{E_\sigma(\sigma^{-1})}{2 k_2}\Big[ \sum_{i=1}^{n} E_{z_i}(z_i^{-1}) x_{ij}^2 \beta_j^2 - 2\sum_{i=1}^{n} x_{ij}\big( E_{z_i}(z_i^{-1})(y_i - x_{i,(-j)}^\top\mu_{\beta_{(-j)}}) - k_1 \big)\beta_j \Big]\\
&\qquad\quad - \frac{\gamma_j}{2} E_{h_{0j}^2}(\log 2\pi h_{0j}^2) - \frac{\gamma_j}{2} E_{h_{0j}^2}\Big(\frac{1}{h_{0j}^2}\Big)\beta_j^2 - \frac{1-\gamma_j}{2} E_{h_{1j}^2}(\log 2\pi h_{1j}^2) - \frac{1-\gamma_j}{2} E_{h_{1j}^2}\Big(\frac{1}{h_{1j}^2}\Big)\beta_j^2\\
&\qquad\quad + \gamma_j E_{\pi_\gamma}(\log\pi_\gamma) + (1-\gamma_j) E_{\pi_\gamma}(\log(1-\pi_\gamma)) \Big\}\\
&\propto \exp\Big\{ -\Big[ \frac{E_\sigma(\sigma^{-1})}{2 k_2}\sum_{i=1}^{n} E_{z_i}(z_i^{-1}) x_{ij}^2 + \frac{\gamma_j}{2} E_{h_{0j}^2}\Big(\frac{1}{h_{0j}^2}\Big) + \frac{1-\gamma_j}{2} E_{h_{1j}^2}\Big(\frac{1}{h_{1j}^2}\Big) \Big]\beta_j^2 + \frac{E_\sigma(\sigma^{-1})}{k_2}\sum_{i=1}^{n} x_{ij}\big( E_{z_i}(z_i^{-1})(y_i - x_{i,(-j)}^\top\mu_{\beta_{(-j)}}) - k_1 \big)\beta_j\\
&\qquad\quad + \gamma_j E_{\pi_\gamma}(\log\pi_\gamma) + (1-\gamma_j) E_{\pi_\gamma}(\log(1-\pi_\gamma)) - \frac{\gamma_j}{2} E_{h_{0j}^2}(\log 2\pi h_{0j}^2) - \frac{1-\gamma_j}{2} E_{h_{1j}^2}(\log 2\pi h_{1j}^2) \Big\},
\end{aligned}$$
where $\mu_{\beta_{(-j)}} = E_{\beta_{(-j)}}(\beta_{(-j)})$. Therefore, given $\gamma_j = k$ for $k = 0$ and $1$,
$$q_1(\beta_j \mid \gamma_j = k) \propto \exp\Big\{ E_{-\beta_j \mid \gamma_j = k}\Big[ \sum_{i=1}^{n}\log N(y_i \mid x_i^\top\beta + k_1 z_i, k_2\sigma z_i) + k \log N(\beta_j \mid 0, h_{0j}^2) + (1-k)\log N(\beta_j \mid 0, h_{1j}^2) \Big] \Big\}.$$
If $\gamma_j = 0$, then
$$q_1(\beta_j \mid \gamma_j = 0) \propto \exp\Big\{ -\Big[ \frac{E_\sigma(\sigma^{-1})}{2 k_2}\sum_{i=1}^{n} E_{z_i}(z_i^{-1}) x_{ij}^2 + \frac12 E_{h_{1j}^2 \mid \gamma_j = 0}\Big(\frac{1}{h_{1j}^2}\Big) \Big]\beta_j^2 + \frac{E_\sigma(\sigma^{-1})}{k_2}\sum_{i=1}^{n} x_{ij}\big( E_{z_i}(z_i^{-1})(y_i - x_{i,(-j)}^\top\mu_{\beta_{(-j)}}) - k_1 \big)\beta_j \Big\}, \quad \text{i.e.,}\quad \beta_j \mid \gamma_j = 0 \sim N(\mu_{1j}, \sigma_{1j}^2);$$
if $\gamma_j = 1$, then
$$q_1(\beta_j \mid \gamma_j = 1) \propto \exp\Big\{ -\Big[ \frac{E_\sigma(\sigma^{-1})}{2 k_2}\sum_{i=1}^{n} E_{z_i}(z_i^{-1}) x_{ij}^2 + \frac12 E_{h_{0j}^2 \mid \gamma_j = 1}\Big(\frac{1}{h_{0j}^2}\Big) \Big]\beta_j^2 + \frac{E_\sigma(\sigma^{-1})}{k_2}\sum_{i=1}^{n} x_{ij}\big( E_{z_i}(z_i^{-1})(y_i - x_{i,(-j)}^\top\mu_{\beta_{(-j)}}) - k_1 \big)\beta_j \Big\}, \quad \text{i.e.,}\quad \beta_j \mid \gamma_j = 1 \sim N(\mu_{0j}, \sigma_{0j}^2).$$
Therefore, $\beta_j \sim p(\gamma_j = 1)\, N(\mu_{0j}, \sigma_{0j}^2) + p(\gamma_j = 0)\, N(\mu_{1j}, \sigma_{1j}^2)$ independently for $j = 0, 1, \ldots, r$, where
$$\mu_{kj} = \sigma_{kj}^2\, \frac{E_\sigma(\sigma^{-1})}{k_2}\sum_{i=1}^{n}\big[ E_{z_i}(z_i^{-1})(y_i - x_{i,(-j)}^\top\mu_{\beta_{(-j)}}) - k_1 \big] x_{ij}, \qquad
\sigma_{kj}^2 = \Big[ E_{h_{kj}^2 \mid \gamma_j = 1-k}\Big(\frac{1}{h_{kj}^2}\Big) + \frac{E_\sigma(\sigma^{-1})}{k_2}\sum_{i=1}^{n} x_{ij}^2 E_{z_i}(z_i^{-1}) \Big]^{-1},$$
for $k = 0$ and $1$. Similarly, if $\gamma_j = 1$, then
$$\begin{aligned}
q_2(h_{0j}^2 \mid \gamma_j = 1) &\propto \exp\Big\{ E_{-h_{0j}^2 \mid \gamma_j = 1}\Big[ \log N(\beta_j \mid 0, h_{0j}^2) + \log \operatorname{Exp}\Big(h_{0j}^2 \,\Big|\, \frac{\lambda_0^2}{2}\Big) \Big] \Big\}\\
&\propto (h_{0j}^2)^{-\frac12}\exp\Big\{ -\frac{E_{\beta_j \mid \gamma_j = 1}(\beta_j^2)}{2 h_{0j}^2} - \frac{E_{\lambda_0^2}(\lambda_0^2)}{2}\, h_{0j}^2 \Big\}
= (h_{0j}^2)^{-\frac12}\exp\Big\{ -\frac12\Big( \frac{\mu_{0j}^2 + \sigma_{0j}^2}{h_{0j}^2} + E_{\lambda_0^2}(\lambda_0^2)\, h_{0j}^2 \Big) \Big\},
\end{aligned}$$
i.e., $h_{0j}^2 \mid \gamma_j = 1 \sim \mathrm{GIG}\big(\tfrac12, E_{\lambda_0^2}(\lambda_0^2), \mu_{0j}^2 + \sigma_{0j}^2\big)$; if $\gamma_j = 0$, then
$$q_2(h_{0j}^2 \mid \gamma_j = 0) \propto \exp\Big\{ -\frac12 E_{\lambda_0^2}(\lambda_0^2)\, h_{0j}^2 \Big\}, \qquad \text{i.e.,}\quad h_{0j}^2 \mid \gamma_j = 0 \sim \operatorname{Exp}\Big(\frac12 E_{\lambda_0^2}(\lambda_0^2)\Big).$$
Therefore,
$$q_2(h_{0j}^2) = p(\gamma_j = 1)\, \mathrm{GIG}\Big(\tfrac12, E_{\lambda_0^2}(\lambda_0^2), \sigma_{0j}^2 + \mu_{0j}^2\Big) + p(\gamma_j = 0)\, \operatorname{Exp}\Big(\frac12 E_{\lambda_0^2}(\lambda_0^2)\Big).$$
Similarly,
$$q_3(h_{1j}^2) = p(\gamma_j = 0)\, \mathrm{GIG}\Big(\tfrac12, E_{\lambda_1^2}(\lambda_1^2), \sigma_{1j}^2 + \mu_{1j}^2\Big) + p(\gamma_j = 1)\, \operatorname{Exp}\Big(\frac12 E_{\lambda_1^2}(\lambda_1^2)\Big),$$
where $\mu_{kj}$ and $\sigma_{kj}^2$ are as above for $k = 0$ and $1$. From $q_1(\beta_j, \gamma_j)$,
$$q_1(\beta_j, \gamma_j = 1) = C\exp\Big\{ -\frac{\beta_j^2}{2\sigma_{0j}^2} + \frac{\mu_{0j}}{\sigma_{0j}^2}\beta_j + E_{\pi_\gamma}(\log\pi_\gamma) - \frac12 E_{h_{0j}^2 \mid \gamma_j = 1}(\log h_{0j}^2) \Big\};$$
integrating out $\beta_j$ gives
$$p(\gamma_j = 1) = \int q_1(\beta_j, \gamma_j = 1)\, d\beta_j = C\sqrt{\sigma_{0j}^2}\exp\Big\{ \mu_{0j}^2 (2\sigma_{0j}^2)^{-1} + E_{\pi_\gamma}(\log\pi_\gamma) - \frac12 E_{h_{0j}^2 \mid \gamma_j = 1}(\log h_{0j}^2) \Big\}.$$
Similarly,
$$p(\gamma_j = 0) = \int q_1(\beta_j, \gamma_j = 0)\, d\beta_j = C\sqrt{\sigma_{1j}^2}\exp\Big\{ \mu_{1j}^2 (2\sigma_{1j}^2)^{-1} + E_{\pi_\gamma}(\log(1-\pi_\gamma)) - \frac12 E_{h_{1j}^2 \mid \gamma_j = 0}(\log h_{1j}^2) \Big\}.$$
Letting
$$\begin{aligned}
\zeta_j = \log p(\gamma_j = 0) - \log p(\gamma_j = 1) &= E_{\pi_\gamma}(\log(1-\pi_\gamma)) - E_{\pi_\gamma}(\log\pi_\gamma) + \frac12 E_{h_{0j}^2 \mid \gamma_j = 1}(\log h_{0j}^2) - \frac12 E_{h_{1j}^2 \mid \gamma_j = 0}(\log h_{1j}^2)\\
&\quad + \frac12\big(\log\sigma_{1j}^2 - \log\sigma_{0j}^2\big) + \frac12\big(\mu_{1j}^2(\sigma_{1j}^2)^{-1} - \mu_{0j}^2(\sigma_{0j}^2)^{-1}\big),
\end{aligned}$$
we obtain
$$p(\gamma_j = 1) = \frac{1}{1 + e^{\zeta_j}}, \qquad p(\gamma_j = 0) = \frac{e^{\zeta_j}}{1 + e^{\zeta_j}};$$
therefore $\gamma_j \sim \mathrm{B}\big(1, (1 + e^{\zeta_j})^{-1}\big)$ independently for $j = 0, 1, \ldots, r$, where $\mu_{kj}$ and $\sigma_{kj}^2$ are as above for $k = 0$ and $1$.
$$q_5(\lambda_0^2) \propto \exp\Big\{ E_{-\lambda_0^2}\Big[ \sum_{j=0}^{r}\log\operatorname{Exp}\Big(h_{0j}^2 \,\Big|\, \frac{\lambda_0^2}{2}\Big) + \log\pi(\lambda_0^2) \Big] \Big\} \propto (\lambda_0^2)^{\nu_{\lambda_0} + r + 1 - 1}\exp\Big\{ -\Big( \sum_{j=0}^{r}\frac{E_{h_{0j}^2}(h_{0j}^2)}{2} + 1 \Big)\lambda_0^2 \Big\},$$
i.e., $\lambda_0^2 \sim \mathrm{Ga}\Big(\nu_{\lambda_0} + r + 1,\; \sum_{j=0}^{r}\frac{E_{h_{0j}^2}(h_{0j}^2)}{2} + 1\Big)$; similarly,
$$\lambda_1^2 \sim \mathrm{Ga}\Big(\nu_{\lambda_1} + r + 1,\; \sum_{j=0}^{r}\frac{E_{h_{1j}^2}(h_{1j}^2)}{2} + 1\Big),$$
where $\nu_{\lambda_0}$ and $\nu_{\lambda_1}$ are hyperparameters.
$$\begin{aligned}
q_7(\pi_\gamma) &\propto \exp\Big\{ E_{-\pi_\gamma}\Big[ \sum_{j=0}^{r}\log\mathrm{B}(\gamma_j \mid 1, \pi_\gamma) + \log\pi(\pi_\gamma) \Big] \Big\}\\
&= \exp\Big\{ \sum_{j=0}^{r} E_{\gamma_j}(\gamma_j)\log\pi_\gamma + \Big(r + 1 - \sum_{j=0}^{r} E_{\gamma_j}(\gamma_j)\Big)\log(1-\pi_\gamma) + (a-1)\log\pi_\gamma + (b-1)\log(1-\pi_\gamma) \Big\}\\
&= (\pi_\gamma)^{a - 1 + \sum_{j=0}^{r} E_{\gamma_j}(\gamma_j)}\,(1-\pi_\gamma)^{r + b + 1 - 1 - \sum_{j=0}^{r} E_{\gamma_j}(\gamma_j)};
\end{aligned}$$
therefore $\pi_\gamma \sim \mathrm{Be}\Big(a + \sum_{j=0}^{r} E_{\gamma_j}(\gamma_j),\; r + b + 1 - \sum_{j=0}^{r} E_{\gamma_j}(\gamma_j)\Big)$, where $a$ and $b$ are hyperparameters.
$$q_4(z_i) \propto \exp\Big\{ E_{-z_i}\Big[ \log N(y_i \mid x_i^\top\beta + k_1 z_i, k_2\sigma z_i) + \log\operatorname{Exp}(z_i \mid \sigma^{-1}) \Big] \Big\} \propto z_i^{-\frac12}\exp\Big\{ -\frac12\,\frac{E_\sigma(\sigma^{-1})}{k_2}\Big( (k_1^2 + 2 k_2)\, z_i + \frac{(y_i - x_i^\top\mu_\beta)^2 + x_i^\top\Sigma_\beta x_i}{z_i} \Big) \Big\};$$
therefore $z_i \sim \mathrm{GIG}\big(\tfrac12, a_{z_i}, b_{z_i}\big)$ independently for $i = 1, \ldots, n$, where
$$a_{z_i} = \frac{E_\sigma(\sigma^{-1})}{k_2}\big(k_1^2 + 2 k_2\big), \qquad b_{z_i} = \frac{E_\sigma(\sigma^{-1})}{k_2}\Big[ (y_i - x_i^\top\mu_\beta)^2 + x_i^\top\Sigma_\beta x_i \Big],$$
with $k_1$ and $k_2$ constants and $\Sigma_\beta$ an $(r+1)\times(r+1)$ diagonal matrix whose $j$th entry is $\mathrm{Var}(\beta_j)$.
$$\begin{aligned}
q_8(\sigma) &\propto \exp\Big\{ E_{-\sigma}\Big[ \sum_{i=1}^{n}\log N(y_i \mid x_i^\top\beta + k_1 z_i, k_2\sigma z_i) + \sum_{i=1}^{n}\log\operatorname{Exp}(z_i \mid \sigma^{-1}) + \log\pi(\sigma) \Big] \Big\}\\
&\propto \Big(\frac{1}{\sigma}\Big)^{\frac{n}{2} + n + a_\sigma + 1}\exp\Big\{ -\Big[ \frac{1}{2 k_2}\sum_{i=1}^{n}\Big( k_1^2 E_{z_i}(z_i) + E_{z_i}(z_i^{-1})\big((y_i - x_i^\top\mu_\beta)^2 + x_i^\top\Sigma_\beta x_i\big) - 2 k_1 (y_i - x_i^\top\mu_\beta) \Big) + \sum_{i=1}^{n} E_{z_i}(z_i) + b_\sigma \Big]\frac{1}{\sigma} \Big\};
\end{aligned}$$
therefore $\sigma \sim \mathrm{IG}\big(\tfrac32 n + a_\sigma, c_\sigma\big)$, where $a_\sigma$ and $b_\sigma$ are hyperparameters and
$$c_\sigma = \frac{1}{2 k_2}\sum_{i=1}^{n}\Big\{ k_1^2 E_{z_i}(z_i) + E_{z_i}(z_i^{-1})\Big[ (y_i - x_i^\top\mu_\beta)^2 + x_i^\top\Sigma_\beta x_i \Big] - 2 k_1 (y_i - x_i^\top\mu_\beta) \Big\} + \sum_{i=1}^{n} E_{z_i}(z_i) + b_\sigma.$$

Appendix B. Expectation

The expectations of some parameter functions with respect to the variational posteriors are as follows.
Since $\beta_j \sim p(\gamma_j = 1)\, N(\mu_{0j}, \sigma_{0j}^2) + p(\gamma_j = 0)\, N(\mu_{1j}, \sigma_{1j}^2)$,
$$E_{\beta_j}(\beta_j) = p(\gamma_j = 1)\mu_{0j} + p(\gamma_j = 0)\mu_{1j},$$
$$\mathrm{Var}_{\beta_j}(\beta_j) = p(\gamma_j = 1)\sigma_{0j}^2 + p(\gamma_j = 0)\sigma_{1j}^2 + p(\gamma_j = 1)\,p(\gamma_j = 0)\,(\mu_{0j} - \mu_{1j})^2,$$
$$E_{\beta_j \mid \gamma_j = 1}(\beta_j^2) = \mu_{0j}^2 + \sigma_{0j}^2, \qquad E_{\beta_j \mid \gamma_j = 0}(\beta_j^2) = \mu_{1j}^2 + \sigma_{1j}^2.$$
Since $h_{0j}^2 \sim p(\gamma_j = 1)\,\mathrm{GIG}\big(\tfrac12, E_{\lambda_0^2}(\lambda_0^2), \sigma_{0j}^2 + \mu_{0j}^2\big) + p(\gamma_j = 0)\operatorname{Exp}\big(\tfrac12 E_{\lambda_0^2}(\lambda_0^2)\big)$,
$$E_{h_{0j}^2}\Big(\frac{1}{h_{0j}^2}\Big) = \sum_{k=0}^{1} p(\gamma_j = k)\, E_{h_{0j}^2 \mid \gamma_j = k}\Big(\frac{1}{h_{0j}^2}\Big),$$
where $E_{h_{0j}^2 \mid \gamma_j = 1}\big(\tfrac{1}{h_{0j}^2}\big) = \sqrt{E_{\lambda_0^2}(\lambda_0^2)}\,\big(\sqrt{E_{\beta_j \mid \gamma_j = 1}(\beta_j^2)}\big)^{-1}$ and $E_{h_{0j}^2 \mid \gamma_j = 0}\big(\tfrac{1}{h_{0j}^2}\big) = 2\big[E_{\lambda_0^2}(\lambda_0^2)\big]^{-1}$,
$$E_{h_{0j}^2 \mid \gamma_j = 1}(h_{0j}^2) = \sqrt{E_{\beta_j \mid \gamma_j = 1}(\beta_j^2)}\,\big(\sqrt{E_{\lambda_0^2}(\lambda_0^2)}\big)^{-1} + \big(E_{\lambda_0^2}(\lambda_0^2)\big)^{-1}.$$
Since $h_{1j}^2 \sim p(\gamma_j = 0)\,\mathrm{GIG}\big(\tfrac12, E_{\lambda_1^2}(\lambda_1^2), \sigma_{1j}^2 + \mu_{1j}^2\big) + p(\gamma_j = 1)\operatorname{Exp}\big(\tfrac12 E_{\lambda_1^2}(\lambda_1^2)\big)$,
$$E_{h_{1j}^2}\Big(\frac{1}{h_{1j}^2}\Big) = \sum_{k=0}^{1} p(\gamma_j = k)\, E_{h_{1j}^2 \mid \gamma_j = k}\Big(\frac{1}{h_{1j}^2}\Big),$$
where $E_{h_{1j}^2 \mid \gamma_j = 1}\big(\tfrac{1}{h_{1j}^2}\big) = 2\big[E_{\lambda_1^2}(\lambda_1^2)\big]^{-1}$ and $E_{h_{1j}^2 \mid \gamma_j = 0}\big(\tfrac{1}{h_{1j}^2}\big) = \sqrt{E_{\lambda_1^2}(\lambda_1^2)}\,\big(\sqrt{E_{\beta_j \mid \gamma_j = 0}(\beta_j^2)}\big)^{-1}$,
$$E_{h_{1j}^2 \mid \gamma_j = 0}(h_{1j}^2) = \sqrt{E_{\beta_j \mid \gamma_j = 0}(\beta_j^2)}\,\big(\sqrt{E_{\lambda_1^2}(\lambda_1^2)}\big)^{-1} + \big(E_{\lambda_1^2}(\lambda_1^2)\big)^{-1}.$$
Since $\sigma \sim \mathrm{IG}\big(\tfrac32 n + a_\sigma, c_\sigma\big)$,
$$E_\sigma(\sigma^{-1}) = \Big(\tfrac32 n + a_\sigma\Big)\, c_\sigma^{-1}, \qquad E_\sigma(\sigma) = \Big(\tfrac32 n + a_\sigma - 1\Big)^{-1} c_\sigma.$$
Since $z_i \sim \mathrm{GIG}\big(\tfrac12, a_{z_i}, b_{z_i}\big)$,
$$E_{z_i}(z_i^{-1}) = \sqrt{a_{z_i}}\,\big(\sqrt{b_{z_i}}\big)^{-1}, \qquad E_{z_i}(z_i) = \sqrt{b_{z_i}}\,\big(\sqrt{a_{z_i}}\big)^{-1} + (a_{z_i})^{-1}.$$
Since $\lambda_k^2 \sim \mathrm{Ga}\Big(\nu_{\lambda_k} + r + 1,\; \sum_{j=0}^{r}\frac{E_{h_{kj}^2}(h_{kj}^2)}{2} + 1\Big)$ for $k = 0$ and $1$,
$$E_{\lambda_k^2}(\lambda_k^2) = (r + 1 + \nu_{\lambda_k})\Big( 1 + \tfrac12\sum_{j=0}^{r} E_{h_{kj}^2}(h_{kj}^2) \Big)^{-1}.$$
Since $\pi_\gamma \sim \mathrm{Be}\Big(a + \sum_{j=0}^{r} E_{\gamma_j}(\gamma_j),\; r + b + 1 - \sum_{j=0}^{r} E_{\gamma_j}(\gamma_j)\Big)$,
$$E_{\pi_\gamma}(\log\pi_\gamma) = \Psi\Big(a + \sum_{j=0}^{r} E_{\gamma_j}(\gamma_j)\Big) - \Psi(a + b + r + 1),$$
$$E_{\pi_\gamma}(\log(1-\pi_\gamma)) = \Psi\Big(r + b + 1 - \sum_{j=0}^{r} E_{\gamma_j}(\gamma_j)\Big) - \Psi(a + b + r + 1),$$
$$E_{\pi_\gamma}(\pi_\gamma) = \Big(a + \sum_{j=0}^{r} E_{\gamma_j}(\gamma_j)\Big)(a + r + b + 1)^{-1}.$$
Since $\gamma_j \sim \mathrm{B}\big(1, (1 + e^{\zeta_j})^{-1}\big)$,
$$E_{\gamma_j}(\gamma_j) = p(\gamma_j = 1) = (1 + e^{\zeta_j})^{-1}.$$

Appendix C. Efficiency Comparison between Bayesian Quantile Regression Methods

Table A1. Mean execution times of the various Bayesian quantile regression methods in the simulation.

| Quantile | Error distribution | BLQR (i.i.d.) | BALQR (i.i.d.) | VBSSLQR (i.i.d.) | BLQR (heterogeneous) | BALQR (heterogeneous) | VBSSLQR (heterogeneous) |
|---|---|---|---|---|---|---|---|
| τ = 0.3 | normal | 146.45 s | 144.56 s | 18.06 s | 146.48 s | 174.88 s | 28.77 s |
| | Laplace | 145.26 s | 144.62 s | 18.82 s | 145.86 s | 173.44 s | 30.08 s |
| | normal mixture | 144.52 s | 145.27 s | 18.11 s | 143.48 s | 147.63 s | 28.99 s |
| | Laplace mixture | 145.86 s | 146.28 s | 19.07 s | 145.82 s | 171.69 s | 30.38 s |
| | Cauchy | 145.24 s | 144.63 s | 16.06 s | 145.89 s | 174.68 s | 25.64 s |
| τ = 0.5 | normal | 144.62 s | 144.52 s | 15.30 s | 144.67 s | 174.39 s | 25.23 s |
| | Laplace | 144.54 s | 145.26 s | 15.00 s | 144.57 s | 172.84 s | 24.51 s |
| | normal mixture | 144.62 s | 145.26 s | 15.39 s | 144.63 s | 173.62 s | 25.21 s |
| | Laplace mixture | 147.68 s | 145.83 s | 15.14 s | 147.62 s | 172.25 s | 25.13 s |
| | Cauchy | 145.21 s | 144.53 s | 12.38 s | 145.86 s | 173.42 s | 20.72 s |
| τ = 0.7 | normal | 147.29 s | 144.37 s | 18.74 s | 146.48 s | 173.46 s | 31.05 s |
| | Laplace | 144.62 s | 145.27 s | 19.89 s | 144.64 s | 174.61 s | 33.86 s |
| | normal mixture | 145.84 s | 145.24 s | 19.00 s | 145.27 s | 172.84 s | 31.60 s |
| | Laplace mixture | 147.57 s | 147.98 s | 19.88 s | 147.68 s | 173.42 s | 34.31 s |
| | Cauchy | 145.87 s | 142.86 s | 16.87 s | 146.28 s | 173.45 s | 28.12 s |

The bold represents the optimal result in each scenario and “s” denotes seconds.

Appendix D. Simulation Studies for Cases with Smaller Effect Sizes

Table A2. The performance of our proposed method for cases with smaller effect sizes.

| Quantile | Error distribution | MMAD (SD), i.i.d. | TP, i.i.d. | FP, i.i.d. | MMAD (SD), heterogeneous | TP, heterogeneous | FP, heterogeneous |
|---|---|---|---|---|---|---|---|
| τ = 0.3 | normal | 0.21 (0.05) | 10.00 | 0.13 | 0.33 (0.10) | 10.00 | 0.25 |
| | Laplace | 0.25 (0.08) | 10.00 | 0.15 | 0.42 (0.34) | 9.56 | 0.37 |
| | normal mixture | 0.23 (0.06) | 10.00 | 0.07 | 0.36 (0.16) | 9.93 | 0.23 |
| | Laplace mixture | 0.29 (0.10) | 9.99 | 0.16 | 0.98 (0.50) | 8.32 | 0.38 |
| | Cauchy | 0.12 (0.55) | 9.43 | 0.00 | 0.16 (0.76) | 8.73 | 0.01 |
| τ = 0.5 | normal | 0.20 (0.06) | 10.00 | 0.16 | 0.32 (0.09) | 10.00 | 0.29 |
| | Laplace | 0.20 (0.06) | 10.00 | 0.06 | 0.32 (0.11) | 9.99 | 0.20 |
| | normal mixture | 0.22 (0.06) | 10.00 | 0.10 | 0.34 (0.10) | 10.00 | 0.21 |
| | Laplace mixture | 0.22 (0.07) | 10.00 | 0.06 | 0.33 (0.26) | 9.76 | 0.17 |
| | Cauchy | 0.07 (0.50) | 9.55 | 0.00 | 0.10 (0.69) | 9.00 | 0.00 |
| τ = 0.7 | normal | 0.21 (0.06) | 10.00 | 0.13 | 0.34 (0.10) | 10.00 | 0.37 |
| | Laplace | 0.25 (0.08) | 10.00 | 0.15 | 0.40 (0.18) | 9.91 | 0.38 |
| | normal mixture | 0.23 (0.06) | 10.00 | 0.16 | 0.36 (0.12) | 9.99 | 0.31 |
| | Laplace mixture | 0.27 (0.09) | 10.00 | 0.17 | 0.43 (0.30) | 9.66 | 0.33 |
| | Cauchy | 0.11 (0.61) | 9.38 | 0.00 | 0.15 (0.67) | 9.13 | 0.00 |

References

  1. Koenker, R.; Bassett, G. Regression quantiles. Econometrica 1978, 46, 33–50.
  2. Buchinsky, M. Changes in the United States wage structure 1963–1987: Application of quantile regression. Econometrica 1994, 62, 405–458.
  3. Thisted, R.; Osborne, M.; Portnoy, S.; Koenker, R. The Gaussian hare and the Laplacian tortoise: Computability of squared-error versus absolute-error estimators—Comments and rejoinders. Stat. Sci. 1997, 12, 296–300.
  4. Koenker, R.; Hallock, K. Quantile regression. J. Econ. Perspect. 2001, 15, 143–156.
  5. Yu, K.; Moyeed, R. Bayesian quantile regression. Stat. Probab. Lett. 2001, 54, 437–447.
  6. Taddy, M.A.; Kottas, A. A Bayesian nonparametric approach to inference for quantile regression. J. Bus. Econ. Stat. 2010, 28, 357–369.
  7. Hu, Y.; Gramacy, R.B.; Lian, H. Bayesian quantile regression for single-index models. Stat. Comput. 2013, 23, 437–454.
  8. Lee, E.R.; Noh, H.; Park, B.U. Model selection via Bayesian information criterion for quantile regression models. J. Am. Stat. Assoc. 2014, 109, 216–229.
  9. Frank, I.; Friedman, J. A statistical view of some chemometrics regression tools. Technometrics 1993, 35, 109–135.
  10. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
  11. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
  12. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
  13. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
  14. Koenker, R. Quantile regression for longitudinal data. J. Multivar. Anal. 2004, 91, 74–89.
  15. Wu, Y.; Liu, Y. Variable selection in quantile regression. Stat. Sin. 2009, 19, 801–817.
  16. Park, T.; Casella, G. The Bayesian lasso. J. Am. Stat. Assoc. 2008, 103, 681–686.
  17. Leng, C.; Tran, M.N.; Nott, D. Bayesian adaptive Lasso. Ann. Inst. Stat. Math. 2014, 66, 221–244.
  18. Li, Q.; Xi, R.; Lin, N. Bayesian regularized quantile regression. Bayesian Anal. 2010, 5, 533–556.
  19. Alhamzawi, R.; Yu, K.; Benoit, D.F. Bayesian adaptive Lasso quantile regression. Stat. Model. 2012, 12, 279–297.
  20. Ishwaran, H.; Rao, J. Spike and slab variable selection: Frequentist and Bayesian strategies. Ann. Stat. 2005, 33, 730–773.
  21. Ray, K.; Szabo, B. Variational Bayes for high-dimensional linear regression with sparse priors. J. Am. Stat. Assoc. 2022, 117, 1270–1281.
  22. Yi, J.; Tang, N. Variational Bayesian inference in high-dimensional linear mixed models. Mathematics 2022, 10, 463.
  23. Xi, R.; Li, Y.; Hu, Y. Bayesian quantile regression based on the empirical likelihood with spike and slab priors. Bayesian Anal. 2016, 11, 821–855.
  24. Koenker, R.; Machado, J. Goodness of fit and related inference processes for quantile regression. J. Am. Stat. Assoc. 1999, 94, 1296–1310.
  25. Tsionas, E. Bayesian quantile inference. J. Stat. Comput. Simul. 2003, 73, 659–674.
  26. Rockova, V. Bayesian estimation of sparse signals with a continuous spike-and-slab prior. Ann. Stat. 2018, 46, 401–437.
  27. Alhamzawi, R.; Ali, H.T.M. The Bayesian adaptive lasso regression. Math. Biosci. 2018, 303, 75–82.
  28. Parisi, G.; Shankar, R. Statistical field theory. Phys. Today 1988, 41, 110.
  29. Beal, M.J. Variational Algorithms for Approximate Bayesian Inference. Ph.D. Thesis, University of London, London, UK, 2003.
  30. Kuruwita, C.N. Variable selection in the single-index quantile regression model with high-dimensional covariates. Commun. Stat. Simul. Comput. 2021, 52, 1120–1132.
Figure 1. Correlation between murder rate and common variables.
Table 1. The median and standard deviation of 500 MADs estimated using various methods in simulations with i.i.d. errors.

Quantile | Error Distribution | Lasso | ALasso | QRL | QRAL | BLQR | BALQR | VBSSLQR
τ = 0.3 | normal | 0.60 (0.06) | 0.89 (0.14) | 0.88 (0.07) | 3.90 (0.20) | 1.28 (0.17) | 0.55 (0.09) | 0.21 (0.05)
τ = 0.3 | Laplace | 0.72 (0.08) | 0.97 (0.15) | 1.06 (0.15) | 3.95 (0.20) | 1.44 (0.21) | 0.68 (0.20) | 0.24 (0.06)
τ = 0.3 | normal mixture | 0.76 (0.09) | 1.04 (0.14) | 1.02 (0.15) | 3.95 (0.21) | 1.42 (0.19) | 0.71 (0.16) | 0.23 (0.05)
τ = 0.3 | Laplace mixture | 0.89 (0.13) | 1.13 (0.18) | 1.24 (0.33) | 4.06 (0.23) | 1.69 (0.32) | 1.07 (0.31) | 0.26 (0.07)
τ = 0.3 | Cauchy | 1.23 (4.59) | 1.46 (12.07) | 0.59 (0.62) | 4.24 (12.40) | 1.92 (1.07) | 1.98 (4.95) | 0.11 (0.60)
τ = 0.5 | normal | 0.40 (0.05) | 0.81 (0.16) | 0.80 (0.07) | 3.90 (0.21) | 1.21 (0.19) | 0.29 (0.07) | 0.20 (0.05)
τ = 0.5 | Laplace | 0.56 (0.08) | 0.90 (0.17) | 1.01 (0.16) | 3.96 (0.21) | 1.35 (0.22) | 0.46 (0.23) | 0.20 (0.05)
τ = 0.5 | normal mixture | 0.53 (0.07) | 0.89 (0.17) | 0.96 (0.13) | 3.95 (0.20) | 1.36 (0.25) | 0.41 (0.21) | 0.21 (0.05)
τ = 0.5 | Laplace mixture | 0.74 (0.13) | 1.02 (0.20) | 1.20 (0.22) | 4.01 (0.23) | 1.72 (0.32) | 0.83 (0.32) | 0.21 (0.06)
τ = 0.5 | Cauchy | 1.31 (28.87) | 1.44 (10.15) | 0.48 (0.16) | 4.40 (32.20) | 2.026 (1.16) | 4.28 (4.47) | 0.07 (0.71)
τ = 0.7 | normal | 0.61 (0.06) | 0.93 (0.16) | 0.88 (0.07) | 3.88 (0.21) | 1.22 (0.19) | 0.56 (0.11) | 0.20 (0.05)
τ = 0.7 | Laplace | 0.71 (0.08) | 1.02 (0.16) | 1.06 (0.16) | 3.97 (0.20) | 1.36 (0.27) | 0.67 (0.20) | 0.25 (0.07)
τ = 0.7 | normal mixture | 0.75 (0.09) | 1.06 (0.17) | 1.01 (0.14) | 3.94 (0.21) | 1.32 (0.23) | 0.75 (0.19) | 0.22 (0.05)
τ = 0.7 | Laplace mixture | 0.89 (0.13) | 1.20 (0.21) | 1.24 (0.32) | 4.01 (0.22) | 1.65 (0.33) | 0.99 (0.36) | 0.26 (0.09)
τ = 0.7 | Cauchy | 1.22 (8.78) | 1.60 (8.43) | 0.60 (0.60) | 4.27 (13.65) | 2.28 (1.25) | 3.02 (2.91) | 0.12 (1.14)
The bold represents the optimal result in each scenario.
Table 2. Mean TP/FP of various methods for simulations with i.i.d. errors.

Quantile | Error Distribution | Lasso | ALasso | QRL | QRAL | BLQR | BALQR | VBSSLQR
Entries are mean TP/FP.
τ = 0.3 | normal | 10.00/38.24 | 9.80/0.20 | 10.00/175.01 | 5.00/95.33 | 8.48/0.00 | 10.00/0.02 | 10.00/0.02
τ = 0.3 | Laplace | 10.00/40.43 | 9.75/0.34 | 9.99/169.28 | 5.00/94.88 | 8.06/0.00 | 9.86/0.13 | 10.00/0.01
τ = 0.3 | normal mixture | 10.00/38.84 | 9.73/0.31 | 9.99/163.96 | 5.00/95.39 | 8.19/0.00 | 9.90/0.10 | 10.00/0.00
τ = 0.3 | Laplace mixture | 10.00/37.99 | 9.62/0.71 | 9.87/146.31 | 5.00/95.25 | 7.43/0.00 | 9.50/0.70 | 10.00/0.01
τ = 0.3 | Cauchy | 8.21/29.05 | 7.95/7.09 | 9.87/97.34 | 4.66/94.49 | 6.11/0.00 | 7.35/13.24 | 9.83/0.00
τ = 0.5 | normal | 10.00/39.00 | 9.75/0.29 | 10.00/156.04 | 5.00/94.73 | 8.37/0.00 | 10.00/0.01 | 10.00/0.01
τ = 0.5 | Laplace | 10.00/38.06 | 9.69/0.47 | 9.99/107.53 | 5.00/95.30 | 7.97/0.00 | 9.81/0.09 | 10.00/0.00
τ = 0.5 | normal mixture | 10.00/39.03 | 9.69/0.57 | 10.00/112.74 | 5.00/95.33 | 8.10/0.00 | 9.86/0.06 | 10.00/0.00
τ = 0.5 | Laplace mixture | 10.00/39.71 | 9.55/0.83 | 9.94/71.54 | 5.00/94.77 | 7.40/0.00 | 9.53/0.47 | 10.00/0.00
τ = 0.5 | Cauchy | 8.19/29.31 | 8.04/7.06 | 10.00/8.50 | 4.55/95.67 | 5.53/0.00 | 6.03/22.94 | 9.77/0.00
τ = 0.7 | normal | 10.00/38.85 | 9.70/0.41 | 10.00/174.87 | 5.00/95.40 | 8.41/0.00 | 9.99/0.03 | 10.00/0.01
τ = 0.7 | Laplace | 10.00/37.68 | 9.64/0.67 | 9.99/168.15 | 5.00/95.05 | 7.91/0.00 | 9.77/0.13 | 10.00/0.01
τ = 0.7 | normal mixture | 10.00/38.38 | 9.63/0.63 | 9.99/165.65 | 5.00/95.76 | 8.11/0.00 | 9.88/0.08 | 10.00/0.01
τ = 0.7 | Laplace mixture | 10.00/39.64 | 9.41/1.22 | 9.88/150.33 | 5.00/95.56 | 7.32/0.00 | 9.50/1.03 | 9.99/0.01
τ = 0.7 | Cauchy | 8.35/30.24 | 7.54/8.17 | 9.89/97.85 | 4.65/95.19 | 5.73/0.00 | 6.81/17.95 | 9.36/0.00
The bold represents the optimal result in each scenario.
Table 3. The median and standard deviation of 500 MADs estimated using various methods in simulations with heterogeneous random errors.

Quantile | Error Distribution | Lasso | ALasso | QRL | QRAL | BLQR | BALQR | VBSSLQR
τ = 0.3 | normal | 0.91 (0.10) | 1.13 (0.15) | 1.31 (0.14) | 3.96 (0.22) | 1.57 (0.22) | 0.90 (0.20) | 0.30 (0.07)
τ = 0.3 | Laplace | 1.08 (0.13) | 1.24 (0.17) | 1.61 (0.39) | 4.09 (0.23) | 1.93 (0.29) | 1.23 (0.35) | 0.36 (0.13)
τ = 0.3 | normal mixture | 1.14 (0.14) | 1.32 (0.17) | 1.55 (0.35) | 4.06 (0.24) | 1.92 (0.31) | 1.31 (0.35) | 0.33 (0.11)
τ = 0.3 | Laplace mixture | 1.36 (0.21) | 1.51 (0.23) | 2.34 (0.60) | 4.27 (0.26) | 2.30 (0.42) | 2.05 (0.52) | 0.39 (0.24)
τ = 0.3 | Cauchy | 1.91 (102.51) | 1.94 (33.45) | 1.17 (0.80) | 4.80 (19.32) | 2.49 (1.11) | 3.90 (3.39) | 0.15 (0.98)
τ = 0.5 | normal | 0.61 (0.08) | 0.93 (0.17) | 1.21 (0.12) | 3.96 (0.21) | 1.39 (0.27) | 0.50 (0.22) | 0.30 (0.07)
τ = 0.5 | Laplace | 0.87 (0.13) | 1.12 (0.20) | 1.53 (0.25) | 4.09 (0.23) | 1.84 (0.33) | 1.06 (0.35) | 0.29 (0.08)
τ = 0.5 | normal mixture | 0.81 (0.12) | 1.08 (0.19) | 1.44 (0.21) | 4.05 (0.23) | 1.77 (0.31) | 1.08 (0.37) | 0.31 (0.08)
τ = 0.5 | Laplace mixture | 1.12 (0.21) | 1.31 (0.25) | 1.84 (0.32) | 4.28 (0.29) | 2.25 (0.41) | 1.65 (0.56) | 0.32 (0.12)
τ = 0.5 | Cauchy | 1.93 (2.23) | 1.93 (107.59) | 0.74 (0.27) | 4.93 (15.90) | 2.88 (1.23) | 3.72 (4.80) | 0.09 (0.88)
τ = 0.7 | normal | 0.92 (0.09) | 1.20 (0.17) | 1.32 (0.14) | 3.97 (0.20) | 1.38 (0.26) | 0.86 (0.21) | 0.31 (0.09)
τ = 0.7 | Laplace | 1.07 (0.13) | 1.31 (0.18) | 1.61 (0.37) | 4.09 (0.22) | 1.95 (0.33) | 1.29 (0.31) | 0.37 (0.15)
τ = 0.7 | normal mixture | 1.14 (0.14) | 1.40 (0.19) | 1.53 (0.33) | 4.08 (0.23) | 1.85 (0.36) | 1.31 (0.29) | 0.34 (0.10)
τ = 0.7 | Laplace mixture | 1.38 (0.20) | 1.58 (0.23) | 2.16 (0.61) | 4.32 (0.28) | 2.37 (0.52) | 1.86 (0.50) | 0.40 (0.25)
τ = 0.7 | Cauchy | 1.76 (3.76) | 2.01 (17.31) | 1.24 (0.76) | 4.65 (31.22) | 4.01 (1.30) | 4.31 (18.21) | 0.16 (1.04)
The bold represents the optimal result in each scenario.
Table 4. Mean TP/FP of the various methods for simulations with heterogeneous random errors.

Quantile | Error Distribution | Lasso | ALasso | QRL | QRAL | BLQR | BALQR | VBSSLQR
Entries are mean TP/FP.
τ = 0.3 | normal | 10.00/40.76 | 9.72/0.35 | 9.98/170.29 | 5.00/95.23 | 7.97/0.00 | 9.75/0.13 | 10.00/0.03
τ = 0.3 | Laplace | 10.00/38.80 | 9.54/0.83 | 9.73/140.09 | 4.99/94.59 | 7.23/0.00 | 9.31/1.20 | 9.97/0.05
τ = 0.3 | normal mixture | 10.00/38.79 | 9.59/0.74 | 9.74/136.63 | 5.00/95.82 | 7.29/0.00 | 9.35/1.23 | 9.98/0.04
τ = 0.3 | Laplace mixture | 9.95/37.96 | 9.17/2.07 | 8.82/76.98 | 4.96/95.42 | 6.53/0.00 | 8.51/4.48 | 9.83/0.05
τ = 0.3 | Cauchy | 7.46/26.86 | 7.15/10.23 | 9.56/66.04 | 4.43/95.56 | 4.86/0.00 | 5.73/24.57 | 9.52/0.00
τ = 0.5 | normal | 10.00/39.21 | 9.67/0.44 | 9.99/142.56 | 5.00/95.56 | 7.93/0.00 | 9.86/0.16 | 10.00/0.05
τ = 0.5 | Laplace | 10.00/39.03 | 9.49/0.93 | 9.81/89.76 | 5.00/95.33 | 7.07/0.00 | 9.32/1.09 | 9.99/0.01
τ = 0.5 | normal mixture | 10.00/38.44 | 9.51/0.86 | 9.83/85.73 | 4.99/95.27 | 7.41/0.00 | 9.28/0.96 | 9.99/0.02
τ = 0.5 | Laplace mixture | 9.96/38.91 | 9.26/2.08 | 9.33/31.13 | 4.95/95.76 | 6.23/0.00 | 8.75/4.13 | 9.97/0.02
τ = 0.5 | Cauchy | 7.24/25.47 | 7.23/9.37 | 9.99/4.30 | 4.26/95.95 | 4.54/0.00 | 5.92/24.88 | 9.60/0.00
τ = 0.7 | normal | 10.00/40.58 | 9.59/0.52 | 9.98/171.43 | 5.00/95.53 | 7.88/0.00 | 9.76/0.16 | 10.00/0.07
τ = 0.7 | Laplace | 10.00/39.20 | 9.44/1.14 | 9.74/144.66 | 4.99/95.20 | 6.87/0.00 | 9.26/0.97 | 9.95/0.06
τ = 0.7 | normal mixture | 10.00/38.29 | 9.38/1.14 | 9.76/138.14 | 5.00/95.49 | 7.04/0.00 | 9.30/1.30 | 9.99/0.02
τ = 0.7 | Laplace mixture | 9.96/38.88 | 9.08/2.37 | 8.93/85.78 | 4.96/95.02 | 5.95/0.00 | 8.69/3.50 | 9.81/0.05
τ = 0.7 | Cauchy | 7.64/27.34 | 7.16/10.19 | 9.61/68.75 | 4.52/96.36 | 3.59/0.00 | 5.79/24.92 | 9.46/0.00
The bold represents the optimal result in each scenario.
Table 5. RMSE on the test dataset and the number of active variables for the various methods on the real dataset.

Method | τ = 0.1 | τ = 0.3 | τ = 0.5 | τ = 0.7 | τ = 0.9
Entries are RMSE / number of active variables.
QRL | 1.36/64 | 0.94/65 | 1.09/66 | 0.93/57 | 1.21/53
QRAL | 3.11/37 | 3.05/40 | 3.08/37 | 2.84/36 | 3.10/32
BLQR | 1.34/1 | 1.08/1 | 1.03/1 | 1.08/1 | 1.30/1
BALQR | 1.67/2 | 1.08/1 | 1.02/1 | 1.08/1 | 1.30/1
VBSSLQR | 1.34/12 | 0.94/9 | 0.78/12 | 0.83/7 | 1.14/9
The bold represents the optimal result in each scenario.
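
As a point of reference for how entries of this kind can be computed, the short Python sketch below evaluates the prediction error of a fitted quantile model on a held-out test set and counts the retained covariates. It assumes the fit is summarized by an estimated coefficient vector `beta_hat` (plus an intercept) and uses a small zero tolerance to define "active"; these are assumptions of the illustration, not a description of the paper's exact pipeline.

```python
import numpy as np

def test_rmse(X_test, y_test, beta_hat, intercept=0.0):
    # Root mean squared error of the fitted tau-th conditional quantile
    # evaluated on the held-out test responses.
    resid = y_test - (intercept + X_test @ beta_hat)
    return float(np.sqrt(np.mean(resid ** 2)))

def n_active(beta_hat, tol=1e-8):
    # Number of covariates kept by the fit, i.e., numerically nonzero coefficients.
    return int(np.sum(np.abs(beta_hat) > tol))
```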
Table 6. Crime data analysis: variable selection.

Quantile Level (τ) | Quantile-Specific Variables
0.1 | racepctblack, pctUrban, PctLess9thGrade, TotalPctDiv, PctNotHSGrad, PctOccupMgmtProf, FemalePctDiv, PctPersDenseHous, NumInShelters, PctHousOccup, PctBornSameState
0.3 | racepctblack, NumInShelters, FemalePctDiv, PctPersDenseHous, TotalPctDiv, PctHousOwnOcc, RentLowQ, MedRent
0.5 | racePctWhite, racePctHisp, FemalePctDiv, TotalPctDiv, PctWorkMom, PctSpeakEnglOnly, HousVacant, PctVacantBoarded, RentLowQ, MedRent, NumInShelters
0.7 | racePctWhite, racePctAsian, FemalePctDiv, TotalPctDiv, PctVacantBoarded, NumInShelters
0.9 | racePctWhite, racePctAsian, indianPerCap, PctOccupManu, MalePctDivorce, PctHousOccup, PctVacantBoarded, NumInShelters
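
The selected sets in Table 6 change noticeably across quantile levels, which is exactly the heterogeneity quantile regression is designed to reveal. As an illustration only, and not a transcript of the authors' pipeline, a spike-and-slab variational fit typically yields an approximate posterior inclusion probability for each covariate, so a per-quantile variable list can be read off by thresholding those probabilities; the helper below, the 0.5 cut-off, and the argument names (`post_inclusion_tau`, `covariate_names`) are hypothetical.

```python
import numpy as np

def select_variables(inclusion_prob, names, threshold=0.5):
    # Keep covariates whose approximate posterior inclusion probability exceeds
    # the threshold (0.5 corresponds to the familiar median-probability rule).
    keep = np.asarray(inclusion_prob) > threshold
    return [name for name, kept in zip(names, keep) if kept]

# Hypothetical usage for one quantile level tau:
# selected = select_variables(post_inclusion_tau, covariate_names)
```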