Robust Variable Selection and Regularization in Quantile Regression Based on Adaptive-LASSO and Adaptive E-NET

Although variable selection and regularization procedures have been extensively considered in the literature for the quantile regression (QR) scenario via penalization, many such procedures fail to deal simultaneously with data aberrations in the design space, namely, high leverage points (X-space outliers) and collinearity challenges. Some high leverage points, referred to as collinearity influential observations, tend to adversely alter the eigenstructure of the design matrix by inducing or masking collinearity. Therefore, the literature recommends that the problems of collinearity and high leverage points be dealt with simultaneously. In this article, we suggest, as remedies, adaptive LASSO and adaptive E-NET penalized QR (QR-ALASSO and QR-AE-NET) procedures in which the weights are based on a QR estimator.


Introduction
Variable selection is at the heart of the model-building process. However, it is fraught with challenges stemming from data aberrations in both the X-space and the Y-space. X-space aberrations comprise extreme observations referred to as high leverage points, while extreme observations in the Y-space are referred to as outliers. High leverage points can be collinearity hiding or collinearity inducing points, referred to as collinearity influential observations [1]. The least squares (LS) method is susceptible to both X-space and Y-space data aberrations (non-Gaussian error terms), hence the need for robust procedures. In the literature, least absolute deviation (LAD)-based [2] procedures have been suggested as alternatives to the LS as they are robust in the Y-space. One such LAD-based procedure is quantile regression (QR) [3]. However, LAD-based procedures are vulnerable to high leverage points. QR, which generalizes the LAD estimator to all quantile levels, is a very attractive procedure, as it provides more information about the conditional distribution of Y given X. As a result, it has generated more research interest in recent years, hence the need for robust variable selection procedures in the QR framework.
Subset selection has been a topical issue since the 19th century. However, subset selection procedures tend to be unstable and are especially unsuitable for variable selection when the dimension is high [4]. As a result, towards the end of the 20th century, the literature witnessed a fair proliferation of penalization (shrinkage) procedures as alternative variable/model selection and regularization tools. Penalization procedures tend to be fairly stable and produce lower prediction errors than subset selection procedures, so they have been suggested as solutions to the shortcomings of subset selection methods, in both the LS and QR scenarios, with varying degrees of success. These methods include RIDGE [5], the LASSO [6], the elastic net (E-NET) [7] and the nonnegative garrote [4,8], as well as their extended versions, just to mention a few.
Although the LASSO penalty has the edge over the RIDGE penalty against overfitting (as the RIDGE penalty does not shrink the coefficient estimates to zero) and has attractive prediction properties, its model selection criterion is only consistent under some restrictive assumptions [9]. The LASSO has other drawbacks. In the p > n case, where p and n are the numbers of predictors and observations, respectively, the LASSO selects at most n variables before it saturates. When the pairwise correlations within a group of variables are very high, the LASSO tends to select only one variable from the group, at random. For the usual n > p case with highly correlated predictors, it has been empirically observed that the prediction performance of the LASSO is dominated by RIDGE regression [7]. Zou and Hastie [7] suggested the E-NET procedure, which has a similar sparsity of representation to the LASSO while often outperforming it. The competitive edge of the E-NET over the LASSO is that it encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together; it is particularly useful when p is much bigger than n.
The LASSO penalty performs poorly in the presence of high leverage points (especially collinearity influential points), and procedures such as smoothly clipped absolute deviation (SCAD) [10] and minimax concave penalty (MCP) [11] are better options. Weights-based regression methods to remedy the influence of both high leverage points and outliers have been suggested in the literature. In the LS scenario, weighted regression has been used in variable selection, for example, doubly adaptive penalized regression, which satisfies both sparse and robustness properties [12]. In the QR scenario, weighted QR (WQR) has been suggested as a remedy to high leverage points [13]. Building upon this idea, Ranganai and Mudhombo [14] suggested a weighted QR LASSO (WQR-LASSO) variable selection and regularization parameter estimation procedure, which is robust to both the X and Y-spaces data aberrations.
Since LASSO penalizes the parameter coefficient estimates equally and is only consistent under some restrictive assumptions, a dynamic variable selection and regularization approach has been suggested via the adaptive LASSO (ALASSO) penalty. Compared to LASSO, ALASSO enjoys the oracle properties which guarantee optimal performance in large samples, with high dimensionality [15] as well as the computational advantage of the LASSO owing to the efficient path algorithm [16]. The ALASSO procedure relies on suitably chosen weights. Frommlet and Nuel [17] suggested the adaptive RIDGE (ARIDGE) procedure for variable selection with interesting formulations of the adaptive weight. In the QR scenario, variable selection has been suggested via ALASSO QR (QR-ALASSO) [18]. Zou and Zhang [19] suggested the adaptive E-NET (AE-NET) procedure that inherits some good properties from both ALASSO and ARIDGE penalty-based procedures.
This article suggests adaptive LASSO and adaptive E-NET penalized weighted QR (WQR-ALASSO and WQR-AE-NET) procedures in which the initial coefficient estimator used to compute the adaptive weights need not be consistent. This coefficient estimate is computed from WQR, in which the weights used to downweight high leverage points are based on the minimum covariance determinant (MCD) estimator of Rousseeuw [20]. The WQR-ALASSO and WQR-AE-NET procedures are extensions of Ranganai and Mudhombo [14]'s WQR-LASSO and WQR-E-NET procedures as mitigation against both high leverage and collinearity influential observations.
In summary, the motivations of this study are premised on the following:
• Rather than carrying out an "omnibus" study of adaptive penalized QR, we carry out a detailed study by distinguishing different types of high leverage points under different distribution scenarios, viz.
- Collinearity influential points that comprise collinearity inducing and collinearity masking ones.
- A "mixture" of collinearity and high leverage points that are not collinearity influential.
• Unlike the conditional mean regression (LS) estimator, which is a global one, the regression quantile (RQ) estimator is a local one. Therefore, we suggest a QR-based estimator instead of the ridge regression (RR) parameter-based estimator suggested in the literature to derive adaptive weights in extending the QR-LASSO and QR-E-NET procedures to the QR-ALASSO and QR-AE-NET procedures, respectively.
• We further extend the QR-ALASSO and QR-AE-NET procedures to WQR-ALASSO and WQR-AE-NET procedures using the same methodology.
• We carry out a comparative study of these models using simulation studies and well-known data sets from the literature.
The rest of the paper is organized as follows: In Section 2.1, a brief overview of QR, the adaptive weights and the QR-ALASSO and QR-AE-NET procedures are given. The QR-ALASSO and QR-AE-NET procedures are extended to their WQR-ALASSO and WQR-AE-NET counterparts, respectively, in Section 2.2. Simulation results and examples are presented in Section 3. The simulations consider collinearity influential observations in the design matrix as well as normal and t-distributed error term scenarios with varying degrees of tail heaviness. The examples consider data sets from the literature with collinearity influential observations. Finally, concluding remarks are provided in Section 4.

Quantile Regression
Consider the linear regression model given by
$$y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (1)$$
where $y_i$ is the $i$th response random variable, $\mathbf{x}_i$ the $i$th row of the design matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$, $\boldsymbol{\beta}$ the vector of parameters and $\varepsilon_i \sim F$ the $i$th error term. The QR estimator [3] for the coefficient vector $\boldsymbol{\beta} \in \mathbb{R}^{p}$ in Equation (1) is based on an optimization problem solved via linear programming techniques. The QR minimization problem for estimating the parameter vector is given by
$$\hat{\boldsymbol{\beta}}(\tau) = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \rho_{\tau}(e_i), \qquad (2)$$
with the check function $\rho_{\tau}(e_i) = e_i\left(\tau - I(e_i < 0)\right)$ denoting the re-weighting function of the residuals $e_i$ for $0 < \tau < 1$, where $e_i = y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}$ denotes the residuals and $\hat{\boldsymbol{\beta}}(\tau)$ the $\tau$th RQ.
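The minimization in Equation (2) can be cast as a linear program. The sketch below is illustrative only (the study itself uses the hqreg R package); it fits a regression quantile with scipy by splitting residuals into nonnegative parts u and v:

```python
import numpy as np
from scipy.optimize import linprog

def check_loss(e, tau):
    """Koenker-Bassett check function rho_tau(e) = e * (tau - I(e < 0))."""
    return e * (tau - (e < 0))

def fit_qr(X, y, tau=0.5):
    """Fit the tau-th regression quantile via the standard LP formulation:
    minimize sum(tau*u + (1-tau)*v) s.t. X beta + u - v = y, u, v >= 0."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

# Median regression on an exact linear relationship recovers the coefficients.
rng = np.random.default_rng(0)
x = rng.normal(size=30)
X = np.column_stack([np.ones(30), x])
y = 1.0 + 2.0 * x
beta = fit_qr(X, y, tau=0.5)
```

At other values of τ the same LP re-weights positive and negative residuals asymmetrically, giving the conditional quantile fit.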

Variable Selection in Quantile Regression
In this section, we give an overview of QR variable selection using penalized procedures. We discuss QR-penalized methods with LASSO [6] and E-NET [7] based penalties as well as the ALASSO [16] and AE-NET [19] based penalties.
The LASSO-penalized QR procedure, denoted by QR-LASSO, is a QR variable selection procedure that uses a LASSO ($\ell_1$) penalty. This penalized QR procedure is given by the minimization problem
$$\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \rho_{\tau}(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}) + \lambda \sum_{j=1}^{p} |\beta_j|, \qquad (3)$$
where the tuning parameter λ shrinks the β coefficients towards zero equally. The second term is the penalty term; the other terms are explained in Equations (1) and (2). The continuous shrinkage of the QR-LASSO procedure frequently enhances prediction precision via the bias-variance trade-off.
The QR variable selection procedure that uses an ALASSO penalty is denoted by QR-ALASSO. The adaptive weights penalize different coefficients differently in the QR setting, thereby outperforming the LASSO penalty scenario. The QR-ALASSO procedure is an extension of the LASSO penalty-based method. The construction of adaptive weights is discussed in Section 2.1.1. The QR-ALASSO procedure selects variables by solving the minimization problem
$$\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \rho_{\tau}(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}) + \lambda \sum_{j=1}^{p} \hat{\omega}_j |\beta_j|, \qquad (4)$$
where $\hat{\omega}_j$ is the adaptive weight and the tuning parameter $\lambda_j = \hat{\omega}_j \lambda$ shrinks the predictor coefficients towards zero differently. Other terms are as defined in Equations (1) and (2). The QR-ALASSO tuning parameter is no longer a constant λ, but a varying $\lambda_j$ for $j = 1, 2, \ldots, p$.
We now present the QR procedure penalized by [7]'s E-NET penalty (QR-E-NET). This penalized QR procedure combines the LASSO and RIDGE penalties and is particularly applicable when groups of correlated predictors are not known a priori. The E-NET-penalized QR is given by the minimization problem
$$\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \rho_{\tau}(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}) + \lambda \left( \alpha \sum_{j=1}^{p} |\beta_j| + (1 - \alpha) \sum_{j=1}^{p} \beta_j^2 \right), \qquad (5)$$
where α ∈ [0, 1] is the mixing parameter between RIDGE (α = 0) and LASSO (α = 1), and λ is the tuning parameter for the second and third terms. This procedure outperforms its RIDGE and LASSO counterparts [7]. In a similar fashion to the extension of QR-LASSO (Equation (3)) to QR-ALASSO (Equation (4)), QR-E-NET (Equation (5)) can be extended to QR-AE-NET (Equation (6)) as in [19]. Suppose the adaptive weight $\hat{\omega}_j$ is constructed as in Equation (8). We define the QR-AE-NET estimator $\hat{\boldsymbol{\beta}}$ as
$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \rho_{\tau}(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}) + \lambda \left( \alpha \sum_{j=1}^{p} \hat{\omega}_j |\beta_j| + (1 - \alpha) \sum_{j=1}^{p} \beta_j^2 \right), \qquad (6)$$
where the terms are defined as in Equations (4) and (5). QR-AE-NET reduces to QR-ALASSO for α = 1 and to QR-ARIDGE for α = 0. QR-AE-NET inherits the desired optimal minimax bound from ALASSO [16]. It is further anticipated that the procedure deals better with collinearity challenges due to the presence of the $\ell_2$-penalty.
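Assuming the penalty mixes the ℓ1 and ℓ2 terms as written above, a minimal sketch of the penalized objective makes the special cases concrete: α = 1 recovers the QR-ALASSO objective and α = 0 the QR-ARIDGE objective.

```python
import numpy as np

def check_loss(e, tau):
    """Check function rho_tau(e) = e * (tau - I(e < 0))."""
    return e * (tau - (e < 0))

def qr_aenet_objective(beta, X, y, tau, lam, alpha, w):
    """Penalized QR objective: check loss plus an adaptive E-NET penalty
    lam * (alpha * sum(w_j |beta_j|) + (1 - alpha) * sum(beta_j^2))."""
    fit = check_loss(y - X @ beta, tau).sum()
    penalty = lam * (alpha * np.sum(w * np.abs(beta))
                     + (1 - alpha) * np.sum(beta ** 2))
    return fit + penalty

# Tiny worked example: residuals are [1, 2], so the check loss at tau = 0.5
# is 0.5 + 1.0 = 1.5; the penalty is then added on top.
X = np.eye(2)
y = np.array([2.0, 0.0])
beta = np.array([1.0, -2.0])
w = np.array([1.0, 0.5])
obj_alasso = qr_aenet_objective(beta, X, y, 0.5, 1.0, 1.0, w)  # 1.5 + 2.0 = 3.5
obj_aridge = qr_aenet_objective(beta, X, y, 0.5, 1.0, 0.0, w)  # 1.5 + 5.0 = 6.5
```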

Choice of Adaptive Weights for ALASSO
Consider the penalty $\sum_{j=1}^{p} |\beta_j|^q$ as a general penalty term in most penalization problems. The adaptive penalty can be expressed as $\sum_{j=1}^{p} \omega_j |\beta_j|^q$, where the $\omega_j$s are the adaptive weights. When q = 1 we obtain an ALASSO penalty, and for q = 2 we obtain an adaptive RIDGE (ARIDGE) penalty. Ref. [16] suggested using the LS $\hat{\boldsymbol{\beta}}$ in determining the adaptive weights $\omega_j$. The LS $\hat{\boldsymbol{\beta}}$ is suitable in the absence of collinearity. In the presence of collinearity, Ref. [16] suggested the RIDGE $\hat{\boldsymbol{\beta}}$ as a suitable replacement because it is more stable than its LS counterpart. In constructing the adaptive weights $\omega_j$ in the LS case, we suggest using the MCD-based weighted RIDGE regression (WRR) estimator
$$\hat{\boldsymbol{\beta}}_{WRR} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \hat{\omega}_i (y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta})^2 + \lambda \sum_{j=1}^{p} \beta_j^2, \qquad (7)$$
where λ is the tuning parameter, β is the vector of parameters and $\hat{\omega}_i$ is a robust MCD-based weight [14]. Thus, in the LS case the weights are given by
$$\omega_j = \frac{1}{|\hat{\beta}_{WRR_j}| + 1/n}, \qquad (8)$$
where $\hat{\beta}_{WRR_j}$ is the jth parameter estimator from a WRR penalized solution and 1/n is added to avoid division by zero. This adaptive weight is a special case of the weight $\omega = (|\hat{\beta}_{RR_j}|^{\gamma} + \delta^{\gamma})^{(\theta-2)/\gamma}$ proposed by [17], with θ = 1, δ = 1/n and γ = 1.
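A minimal numpy sketch of Equation (8): compute an initial RIDGE estimate in closed form (an unweighted ridge is used here for simplicity; the paper's proposal uses the MCD-weighted WRR) and form the weights ω_j = 1/(|β̂_j| + 1/n):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form RIDGE estimate (X'X + lam I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def adaptive_weights(beta_init, n):
    """omega_j = 1 / (|beta_j| + 1/n); the 1/n term avoids division by zero."""
    return 1.0 / (np.abs(beta_init) + 1.0 / n)

# With an orthonormal design and lam = 1, ridge halves the coefficients.
beta_hat = ridge_fit(np.eye(3), np.array([1.0, 2.0, 3.0]), 1.0)  # -> [0.5, 1.0, 1.5]

# A coefficient near zero receives a large weight, so it is penalized heavily.
w = adaptive_weights(np.array([2.0, 0.0]), n=50)  # -> [~0.495, 50.0]
```

The second line of output illustrates the adaptive mechanism: the zero coefficient gets weight n = 50, while the large coefficient is barely penalized.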
Although the use of $\omega_j$ may be applicable to the $\ell_1$ estimator (the RQ at τ = 0.5) for symmetrical distributions, it may not be applicable at extreme quantile levels in the presence of high leverage (and collinearity influential) points. This is due to the fact that the presence of these atypical observations results in RQ planes often crossing (unequal slope parameter estimates) (see [21,22]), although they are theoretically parallel. Therefore, instead of using $\hat{\beta}_{WRR_j}$, we suggest using the one based on WQR, $\hat{\beta}(\tau)_{WQR_j}$, which is the solution to the minimization problem
$$\hat{\boldsymbol{\beta}}(\tau)_{WQR} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \hat{\omega}_i \rho_{\tau}(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}) + \lambda \sum_{j=1}^{p} \beta_j^2, \qquad (9)$$
where $\hat{\boldsymbol{\beta}}(\tau)_{WQR}$ is the τth WQR-RIDGE estimator, and λ and $\hat{\omega}_i$ are as defined in Equations (2) and (7). The analogue of Equation (8) ($\omega_j$) based on the WQR-RIDGE parameter estimates $\hat{\boldsymbol{\beta}}(\tau)_{WQR}$ becomes
$$\hat{\omega}_j = \frac{1}{|\hat{\beta}(\tau)_{WQR_j}| + 1/n}, \qquad (10)$$
where $\hat{\beta}(\tau)_{WQR_j}$ is the jth parameter estimator from a WQR-RIDGE penalized solution at a specified quantile level τ, and the other terms are defined in Equation (8). Therefore, for all variable selection in the simulations and applications, we use the weights $\hat{\omega}_j$ for WQR instead of $\omega_j$ for WRR ($\hat{\omega}_j$ coincides with $\omega_j$ at the τ = 0.50 RQ level); $\hat{\beta}(\tau)_{WQR_j}$ is more representative at all quantile levels τ than $\hat{\beta}_{WRR_j}$. To our knowledge, no such adaptive weight has been applied in a QR variable selection scenario. The adaptive weight has the advantage of not being influenced by extreme points, is adaptable to particular distributions (for example, the $t_1$ and $t_2$ distributions, et cetera) and is applicable at all quantile levels τ.
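The observation weights ω̂_i that downweight high leverage rows are MCD-based in the paper. As a hypothetical stand-in (not the paper's MCD computation), the sketch below standardizes each column by its median and MAD and downweights rows with large robust distance; the cutoff constant c is an illustrative choice:

```python
import numpy as np

def robust_leverage_weights(X, c=2.5):
    """Downweighting in the spirit of the paper's MCD-based weights.
    Stand-in scheme: median/MAD-standardize each column, compute a robust
    distance per row, and cap weights at 1 for non-outlying rows."""
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    Z = (X - med) / mad
    d = np.sqrt((Z ** 2).sum(axis=1))      # robust distance per observation
    cutoff = c * np.sqrt(X.shape[1])
    return np.minimum(1.0, cutoff / d)     # omega_i in (0, 1]

# One gross high leverage row gets a small weight; the rest are near 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[0] = 100.0
w = robust_leverage_weights(X)
```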

Adaptive Penalized Weighted Quantile Regression
Using the adaptive weights $\hat{\omega}_j$ and the MCD-based weights $\hat{\omega}_i$, we propose adaptive penalized WQR variable selection procedures. The WQR penalization problems arise from the ALASSO and AE-NET penalties, namely the WQR adaptive LASSO (WQR-ALASSO) and the WQR adaptive E-NET (WQR-AE-NET) estimators, respectively. The approach is robust in the presence of high leverage points (and collinearity influential observations) due to the robustly chosen MCD-based weights, as in WQR (see [14,23,24]).
We first write the proposed WQR-ALASSO, given by the minimization problem
$$\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \hat{\omega}_i \rho_{\tau}(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}) + \lambda \sum_{j=1}^{p} \hat{\omega}_j |\beta_j|, \qquad (11)$$
where the $\hat{\omega}_j$s are WQR-RIDGE parameter estimator-based adaptive weights, the $\hat{\omega}_i$s are MCD-based weights, and the other terms are as defined in Equations (2) and (3). The proposed adaptive penalized WQR procedure, WQR-AE-NET, is the minimization problem
$$\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \hat{\omega}_i \rho_{\tau}(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}) + \lambda \left( \alpha \sum_{j=1}^{p} \hat{\omega}_j |\beta_j| + (1 - \alpha) \sum_{j=1}^{p} \beta_j^2 \right), \qquad (12)$$
where the terms are defined in Equations (2), (9) and (11). Special cases of this construction are WQR-ALASSO if α = 1 and WQR-ARIDGE if α = 0. Just like its unweighted QR counterpart, WQR-AE-NET inherits the desired optimal minimax bound from ALASSO [16]. The next section discusses the asymptotic properties of our proposed procedures.
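Because both the weighted check loss and the weighted ℓ1 penalty are piecewise linear, the WQR-ALASSO problem is itself a linear program. A sketch (illustrative only; the study uses the hqreg package), splitting β into positive and negative parts:

```python
import numpy as np
from scipy.optimize import linprog

def fit_wqr_alasso(X, y, tau, lam, w_obs, w_adapt):
    """WQR-ALASSO as an LP. Variables: beta = b+ - b-, residual parts u, v.
    Objective: sum_i w_obs_i*(tau*u_i + (1-tau)*v_i)
               + lam * sum_j w_adapt_j*(b+_j + b-_j)."""
    n, p = X.shape
    c = np.concatenate([lam * w_adapt, lam * w_adapt,
                        tau * w_obs, (1 - tau) * w_obs])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    bounds = [(0, None)] * (2 * p + 2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p] - res.x[p:2 * p]

# With lam = 0 and unit weights this is plain median regression;
# with a huge lam all coefficients are shrunk to zero.
x = np.linspace(-1.0, 1.0, 20)
X = np.column_stack([np.ones(20), x])
y = 1.0 + 2.0 * x
beta_unpen = fit_wqr_alasso(X, y, 0.5, 0.0, np.ones(20), np.ones(2))
beta_shrunk = fit_wqr_alasso(X, y, 0.5, 100.0, np.ones(20), np.ones(2))
```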
Let Equation (6) have $\hat{\boldsymbol{\beta}}_{ALASSO}$ as its solution. We now state a theorem giving an asymptotic oracle property for i.i.d. random error terms (Theorem 1). Theorem 1. Consider a sample {(x_i, y_i), i = 1, . . . , n} from Equation (6) satisfying Conditions (i) and (iii) with Ω = I_n (constant weight of 1). If $\sqrt{n}\lambda_n \to 0$ and $n^{(\gamma+1)/2}\lambda_n \to \infty$, then $\sqrt{n}(\hat{\boldsymbol{\beta}}_{ALASSO} - \boldsymbol{\beta})$ converges in distribution to a normal limit $N(0, \Sigma)$. Now, consider the extension of the oracle results to non-i.i.d. random error scenarios. The following assumptions are considered.
(v) The random errors $\varepsilon_i$ are independent, with $F_i(t) = P(\varepsilon_i \leq t)$ the distribution function of $\varepsilon_i$. We assume that each density $f_i(\cdot)$ is locally linear near zero (with a positive slope), so that the associated integrated check loss is a convex function for each n and i. (vi) Assume that, for each u, the limiting objective is a strictly convex function taking values in [0, ∞).

Remark 1.
The QR model, a non-i.i.d. random error model, is catered for by (v) [27]. See the proofs online [18].

Simulation Study
In this section, we discuss the simulation results of four adaptive variable selection and estimation procedures, namely QR-ALASSO, QR-AE-NET (for α = 0.5), WQR-ALASSO and WQR-AE-NET (for α = 0.5), against their non-adaptive versions. We evaluate and illustrate these four procedures' variable selection and prediction performance under normality and t-distributions (heavy-tailed error distributions) in the presence of collinearity influential and high leverage observations in the design space. The pairwise comparisons are two-pronged, viz., (i) comparison of adaptive (QR-ALASSO and QR-AE-NET) versus non-adaptive (QR-LASSO and QR-E-NET) procedures and (ii) comparison of weighted adaptive (WQR-ALASSO and WQR-AE-NET) versus weighted non-adaptive (WQR-LASSO and WQR-E-NET) procedures. Cross comparisons are explored between scenarios (i) and (ii). All simulations are applied at two QR levels, τ ∈ (0.25; 0.50), and two sample sizes, n ∈ (50; 100), although the n = 100 results are left out of the presentation for brevity.

Simulation Design Scenarios
In the design space, we consider four collinearity influential point scenarios in addition to an orthogonal design matrix and a correlated design matrix with high leverage points, while for the error term distribution we consider the normal and heavy-tailed (t-distribution cases at different degrees of freedom) distribution scenarios. The design matrix choices are simulated as in Jongh et al. [14,28,29], viz.
(i) D1-the well-behaved orthogonalized n × p design matrix X (where the initial unorthogonalized matrix's p = 8 columns are generated from N(0, 1)) satisfies the condition X⊤X = nI. We first generate the n × p data matrix W, where w_ij ∼ N(0, 1) with i = 1, 2, . . . , n and j = 1, 2, . . . , p. We then find the singular value decomposition (SVD) of W, given by W = UDV⊤, where U and V are orthogonal and the diagonal entries of D are the singular values of W. Finally, the design matrix X is given by X = √n U, such that X⊤X = nI since the columns of U are orthonormal. We use the design matrix D1 as a baseline when comparing with scenarios D2-D5.
(ii) D2/D4-the design matrix D1 with the most extreme point by Euclidean distance moved 10 units in the X direction (D2) and 100 units in the X direction (D4). The resultant extreme points are collinearity inducing points for scenarios D2 and D4 (see Figure 1). (iii) D3/D5-the design matrix D1 with the most and second most extreme points by Euclidean distance moved 10 and 100 units, respectively, in the X direction. The two extreme points have a masking effect on collinearity (see scenarios D3 and D5 in Figure 1). (iv) D6-a correlated design matrix case with high leverage points [14,30]. In D6, we partition the n × p design matrix X = (X₁, X₂)⊤, where the uncontaminated part X₁ ∼ N(0, V) and the contaminated m × p sub-matrix X₂ ∼ N(1, I) (1 is the mean vector of ones and I is an identity matrix). The exponential decay 0.5^|j−i|, for i, j = 1, 2, . . . , 8, generates the (ij)th entry of the covariance matrix V, and 0 is the mean vector of zeros. The design matrix D6 has m = (5; 10) contamination points (using a contamination rate of 10%) from n = (50; 100) observations.
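The D1 orthogonalization and the D6 contamination scheme can be sketched as follows (seed and n = 50 case are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 8

# D1: orthogonalize W so that X'X = nI, via the SVD W = U D V'.
W = rng.normal(size=(n, p))
U, _, _ = np.linalg.svd(W, full_matrices=False)
X_d1 = np.sqrt(n) * U

# D6: correlated clean part with covariance entries 0.5^{|i-j|},
# plus m = 10% high leverage rows drawn from N(1, I).
m = 5
idx = np.arange(p)
V = 0.5 ** np.abs(idx[:, None] - idx[None, :])
X_clean = rng.multivariate_normal(np.zeros(p), V, size=n - m)
X_contam = rng.multivariate_normal(np.ones(p), np.eye(p), size=m)
X_d6 = np.vstack([X_clean, X_contam])
```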
The number of simulation runs in each of the D1-D6 scenarios is 200, and 10-fold cross-validation (CV) is applied to select the best tuning parameter (optimal λ) in the test data.
Different scenarios of the ALASSO- or AE-NET-penalized QRs (weighted and unweighted) are explored using the hqreg package, a readily available R add-on package [31]. The hqreg package implements a semismooth Newton coordinate descent algorithm and chooses the optimal λ as in Ranganai and Mudhombo [14].

Results
To study the robustness and performance of the different ALASSO- and AE-NET-penalized QR procedures in the presence of collinearity influential points, high leverage points and influential points under different distributions, we present the simulated results. These results are based on the following metrics: the median of the test errors, $\mathrm{median}_{1 \leq i \leq n}\{e_i\}$, and its respective measure of dispersion, the MAD of the test errors; the percentage (%) of correctly fitted models; and the average number of correct/incorrect zero coefficients. The MAD of the test errors for each penalized estimator is estimated by
$$\mathrm{MAD} = 1.4826 \times \mathrm{median}_{i} \left| e_i - \mathrm{median}_{j}\{e_j\} \right|.$$
We compare the performance of the non-adaptive procedures (QR-LASSO and QR-E-NET) and the adaptive ones we suggest here based on a QR adaptive weight, viz., the QR-ALASSO, QR-AE-NET, WQR-ALASSO and WQR-AE-NET procedures. Without loss of generality, we consider all procedures at the τ = 0.25 and τ = 0.50 QR levels only.
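Assuming the usual consistency constant 1.4826 (which makes the MAD unbiased for the standard deviation under normality), the dispersion metric can be computed as:

```python
import numpy as np

def mad_test_errors(e):
    """Scaled median absolute deviation: 1.4826 * median|e_i - median(e)|.
    Robust to the single gross error in the example below."""
    return 1.4826 * np.median(np.abs(e - np.median(e)))

# Median is 3; absolute deviations are [2, 1, 0, 1, 97]; their median is 1.
val = mad_test_errors(np.array([1.0, 2.0, 3.0, 4.0, 100.0]))  # -> 1.4826
```

Unlike the standard deviation, this value is unaffected by how extreme the outlying error is, which is why it is preferred for comparing test errors under heavy-tailed scenarios.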

Remark 2.
The five zero coefficients correspond to the set {β₃, β₄, β₆, β₇, β₈}; hence, the maximum average of correctly/incorrectly selected (shrunk) coefficients is 5, while the set of correctly selected models is given as a proportion, i.e., a percentage.

D1 under the t-Distribution
Results at the design matrix D1 (the orthogonal design scenario or baseline case) under normality and heavy-tailed t-distributed error scenarios with degrees of freedom d = 1 are shown in Table 1. In all 16 pairwise cases except one, the QR-ALASSO and QR-AE-NET procedures outperform their respective non-adaptive penalized procedures at the τ ∈ (0.25; 0.50) QR levels with respect to all measures. The exception is the LASSO-penalized procedure, which outperforms all adaptive penalized procedures in the heavy-tailed distribution scenario (t-distribution case) when d = 1, σ = 1 and τ = 0.50. The QR-ALASSO outperforms the QR-AE-NET procedure in both the Gaussian and t-distribution cases in this baseline scenario with respect to the MAD of the test errors. Generally, the adaptive penalized procedures QR-ALASSO and QR-AE-NET correctly shrink the covariate parameters (β_j : j = 3, 4, 6, 7, 8) to zero with high precision (in all cases) compared to their respective non-adaptive penalized procedures. However, the performance with respect to all measures decreases as the error term distribution becomes heavier. Results at the design matrices with collinearity inducing high leverage points (D2 and D4) are shown in Tables 2 and A1. In the unweighted penalized QR case, the adaptive penalized procedures outperform the non-adaptive procedures in all scenarios in all 16 (100%) respective pairwise comparisons with respect to correctly selected (shrunk) zero coefficients. However, with respect to the percentage of correctly fitted models, 14 (88%) adaptive penalized procedures outperform their respective non-adaptive procedures, and 13 (81%) do so with respect to prediction.
In the weighted penalized WQR scenarios (WQR-ALASSO and WQR-AE-NET), 10 (63%) adaptive penalized procedures outperform their respective non-adaptive procedures with respect to prediction and 12 (75%) with respect to both the percentage of correctly fitted models and correctly shrunk zero coefficients. The results at the τ = 0.50 and τ = 0.25 RQ levels are similar in D4 (see Table A1 for results at τ = 0.50). Again, the performance with respect to all measures decreases as the error term distribution becomes heavier; i.e., as σ increases, the effectiveness of shrinking zero coefficients correctly tends to be compromised. The pairwise comparisons between the weighted versions of the penalized procedures (WQR-ALASSO and WQR-AE-NET) and their respective non-weighted versions (QR-ALASSO and QR-AE-NET) result in the former outperforming the latter in all cases with respect to prediction, while doing so in the majority of cases with respect to the other two measures. Overall, the WQR-ALASSO generally outperforms all the other models with respect to all measures. Tables 3 and A2 show variable/model selection and prediction performance in the presence of collinearity hiding points (D3 and D5) under the Gaussian distribution. In the unweighted scenarios, the 16 pairwise comparisons demonstrate that the adaptive versions of the penalized procedures (QR-ALASSO and QR-AE-NET) dominate their non-adaptive versions as follows: in prediction, they perform better 100% and 88% of the time, respectively; with respect to correctly fitted models, they perform better 100% and 50% of the time, respectively (the latter performing equally 50% of the time); and with respect to correctly shrinking zero coefficients, they both perform better 100% of the time.
The performance picture in the weighted scenario is as follows: with respect to prediction, both WQR-LASSO and WQR-E-NET outperform their respective adaptive versions 63% of the time; with respect to correctly fitted models, the adaptive versions WQR-ALASSO and WQR-AE-NET outperform their non-adaptive counterparts 88% of the time, while with respect to correctly shrinking zero coefficients they do so 100% of the time. Thus, the adaptive weights improve the performance of the models with respect to all measures in the unweighted scenario, while doing so with respect to correctly fitted models and correctly shrinking zero coefficients in the weighted scenario. However, in the weighted scenario, the adaptive weights hamper the performance of the models with respect to prediction. Although the τ = 0.25 at D5 scenario is not included in the tables, we obtain results similar to those at τ = 0.50 (see Table A2).

D2 and D3 under the t-Distribution
The heavy-tailed t-distribution scenario results for D2 and D3, with collinearity inducing and collinearity hiding observations, respectively, are shown in Tables 4, 5, A3 and A4. For brevity, the results of the heavy-tailed D4 and D5 scenarios are not reported since they are similar to those of D2 and D3.
At the predictor matrix with collinearity-inducing points (D2), under the t-distribution, the adaptive versions generally outperformed their non-adaptive versions with respect to all metrics. In the unweighted scenarios, the 16 pairwise comparisons clearly demonstrate that the adaptive versions of the penalized procedures (QR-ALASSO and QR-AE-NET) dominate their non-adaptive versions as follows: in prediction, they each perform better 88% of the time; with respect to correctly fitted models, they perform better 100% and 38% of the time, respectively (the latter performing equally 62% of the time); and with respect to correctly shrinking zero coefficients, they outperform their non-adaptive counterparts 100% and 88% of the time, respectively.
In the weighted scenarios, the 16 pairwise comparisons clearly demonstrate that the adaptive versions of the penalized procedures (WQR-ALASSO and WQR-AE-NET) dominate their non-adaptive versions as follows: in prediction, they perform better 100% and 75% of the time, respectively; with respect to correctly fitted models, they each perform better 88% of the time; and with respect to correctly shrinking zero coefficients, they outperform their non-adaptive counterparts 88% and 100% of the time, respectively. The pairwise comparisons between the weighted versions of the penalized procedures (WQR-ALASSO and WQR-AE-NET) and their unweighted versions (QR-ALASSO and QR-AE-NET) result in the former outperforming the latter in all respective cases with respect to prediction, while there is no clear "winner" with respect to the other two measures. Overall, the WQR-ALASSO generally outperforms all other models with respect to all measures.
At the predictor matrix with collinearity-masking points (D3) under the t-distribution, the adaptive versions generally outperformed their non-adaptive versions with respect to all metrics. In the unweighted scenarios, the 16 pairwise comparisons clearly demonstrate that the adaptive versions of the penalized procedures (QR-ALASSO and QR-AE-NET) dominate their non-adaptive versions as follows: in prediction, they perform better 63% and 75% of the time, respectively; with respect to correctly fitted models, they perform better 100% and 38% of the time, respectively (the latter performing equally 62% of the time); and with respect to correctly shrinking zero coefficients, they both outperform their non-adaptive counterparts 100% of the time.
In the weighted scenarios, the 16 pairwise comparisons of the adaptive versions of penalized procedures (WQR-ALASSO and WQR-AE-NET) against their non-adaptive versions show that the dominance of the former class of models is somewhat reduced, though they generally outperform the latter. In the prediction scenario, the adaptive versions are outperformed by their non-adaptive versions 62% of the time in both respective cases, while it is the opposite with respect to correctly fitted models and with respect to correctly shrinking zero coefficients. With respect to these two metrics, the adaptive versions of penalized procedures (WQR-ALASSO and WQR-AE-NET) dominate their non-adaptive versions as follows; with respect to correctly fitted models, they perform better 62% of the time, while with respect to correctly shrinking zero coefficients they outperform their non-adaptive ones 100% of the time in both cases over their respective non-adaptive versions.
The pairwise comparisons between the weighted versions of penalized procedures (WQR-ALASSO and WQR-AE-NET) and their unweighted versions (QR-ALASSO and QR-AE-NET) result in the former outperforming the latter in all respective cases with respect to prediction while doing so marginally better with respect to the other two measures. Overall, the WQR-ALASSO generally outperforms all other models with respect to all measures.

D6 under the t-Distribution
The results of variable/model selection as well as prediction performance at the design matrix with collinearity and high leverage points (D6) under heavy-tailed distribution (t-distributions with d ∈ (1; 6) degrees of freedom) scenarios are shown in Tables 6 and A5. In the unweighted scenarios, the 16 pairwise comparisons clearly demonstrate that the adaptive versions of the penalized procedures (QR-ALASSO and QR-AE-NET) dominate their non-adaptive versions as follows: in prediction, they perform better 75% and 62% of the time, respectively; with respect to correctly fitted models, they perform better 100% and 88% of the time, respectively; and with respect to correctly shrinking zero coefficients, they both outperform their non-adaptive counterparts 100% of the time.
In the weighted scenarios, the 16 pairwise comparisons clearly demonstrate that the adaptive versions of the penalized procedures (WQR-ALASSO and WQR-AE-NET) dominate their non-adaptive versions as follows: with respect to prediction, they both outperform their non-adaptive counterparts 100% of the time, and with respect to both correctly fitted models and correctly shrinking zero coefficients they both perform better 88% of the time. The pairwise comparisons between the weighted versions of the penalized procedures (WQR-ALASSO and WQR-AE-NET) and their non-weighted versions (QR-ALASSO and QR-AE-NET) result in the former outperforming the latter in all cases with respect to prediction, while there is no clear "winner" with respect to the other two measures. Overall, the WQR-ALASSO generally outperforms all other models with respect to all measures.

Examples
In this section, we illustrate the efficacy of the WQR-RIDGE-based adaptive weights in penalized QR using two data sets from the literature that are often used to assess robust methodologies for mitigating collinearity influential observations as well as high leverage points in general, viz., the Jet-Turbine Engine [32] and Gunst and Mason [33] data sets. The Jet-Turbine Engine data set, which has high leverage collinearity reducing points, is very popular with engineers (see [32,34]). In contrast, the Gunst and Mason data set, which has collinearity-inducing points, is very popular with statisticians (see also [33]).

The Jet-Turbine Engine Data
The Jet-Turbine Engine data set [32] consists of 40 observations, with the thrust of a jet-turbine engine as the response variable (Y) and predictor variables: primary speed of rotation (X1), secondary speed of rotation (X2), fuel flow rate (X3), pressure (X4), exhaust temperature (X5), and ambient temperature at time of test (X6). According to [34], observations 6 and 20 are high leverage collinearity reducing points. The real data is standardized to correlation form and the response variable is generated as Y1 = X1β1 + ε1, ε1 ∼ t6, for the 38 observations excluding the collinearity reducing ones, with X2 and Y2 comprising cases (observations) 6 and 20, such that X = (X1, X2) and Y = (Y1, Y2), where β1 = (50, 0, 0, 10, 15, 0). Table 7 summarizes the estimated QR βs and their biases from the true coefficients (β1 = (50, 0, 0, 10, 15, 0)). In the unweighted scenario, the zero coefficients are shrunk to zero 33% of the time, with no clear "winner" between the adaptive and non-adaptive penalization. In the weighted scenario, the proportion of correctly shrunk coefficients increases to 54%, again with no clear "winner" between the adaptive and non-adaptive penalization. However, the adaptive penalization seems to guard against incorrectly shrinking non-zero coefficients to zero (see β4 under WQR-LASSO at τ = 0.50). As expected, the LASSO (and ALASSO) penalties outperform the E-NET (and AE-NET) penalties. From Figure 2a, it is clear that the weighting increases the variability of the residuals as more outliers are flagged by the weighting.
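The data-generation scheme above (standardize the clean predictors to correlation form, then generate the response with heavy-tailed t6 errors) can be sketched as follows. This is a minimal illustration under stated assumptions: a random placeholder matrix stands in for the actual 38 clean Jet-Turbine Engine observations, which are not reproduced here.

```python
import numpy as np

def to_correlation_form(X):
    """Center each column and scale it to unit length (correlation form)."""
    Xc = X - X.mean(axis=0)
    return Xc / np.sqrt((Xc ** 2).sum(axis=0))

rng = np.random.default_rng(42)

# Placeholder for the 38 "clean" observations on 6 predictors;
# the study uses the measured engine data instead.
X1 = to_correlation_form(rng.normal(size=(38, 6)))

beta1 = np.array([50.0, 0.0, 0.0, 10.0, 15.0, 0.0])  # true coefficients
eps = rng.standard_t(df=6, size=38)                   # heavy-tailed t6 errors
Y1 = X1 @ beta1 + eps                                 # generated response
```

The high leverage cases (X2, Y2) would then be stacked below (X1, Y1) to form the full design.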

The Gunst and Mason Data
The performance of the proposed adaptive procedures is assessed in this section using the [33] data set. The data set has 49 observations (countries) with gross national product (GNP) as the response variable and predictor variables: infant death rate (INFD), physician to population ratio (PHYS), population density (DENS), density of agricultural land (AGDS), measure of literacy (LIT), and higher education index (HIED), corresponding to the response variable Y and predictor variables X1, X2, X3, X4, X5, and X6, respectively. From the literature, strong collinearity exists between DENS (X3) and AGDS (X4), and a few observations are high leverage points. The following observations are considered outlying and influential: 7 (Canada), 13 (El Salvador), 17 (Hong Kong), 20 (India), 39 (Singapore), and 46 (United States of America), with observations 17 and 39 being high leverage collinearity inducing points. We standardize the data to correlation form and generate the response variable as Y1 = X1β1 + ε1, ε1 ∼ t6, for the first 43 observations (without influential observations 7, 13, 17, 20, 39 and 46), with X2 and Y2 comprising observations 7, 13, 17, 20, 39 and 46, such that X = (X1, X2) and Y = (Y1, Y2), where β1 = (0, 8, −13, 0, 0, 6). Note that, for these models with no constant term, the intercept F^{-1}(τ) + β0 translates to 0 and −0.72 under the t6 error term distribution at quantile levels τ = 0.50 and τ = 0.25, respectively. Table 8 summarizes the estimated QR βs and their biases from the true coefficients (β1 = (0, 8, −13, 0, 0, 6)). In the unweighted scenario, the zero coefficients are shrunk to zero 38% of the time, with 25% shrunk to zero among the non-adaptive penalization procedures and 50% among the adaptive penalization procedures. Thus, adaptive penalization clearly outperforms non-adaptive penalization.
In the weighted scenario, the proportion of correctly shrunk coefficients increases to 83%, with 67% shrunk to zero among the non-adaptive penalization procedures and 100% among the adaptive penalization procedures. Thus, adaptive penalization again clearly outperforms non-adaptive penalization. Additionally, in the weighted scenario the ALASSO and AE-NET perform equally well. As expected, the LASSO (and ALASSO) penalties outperform the E-NET (and AE-NET) penalties. From Figure 2b, it is clear that the weighting increases the variability of the residuals as more outliers are flagged by the weighting.
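The quoted intercepts can be verified directly from the t6 quantile function: in a no-constant model with β0 = 0, the implied intercept at quantile level τ is F^{-1}(τ), where F is the CDF of the t6 error term.

```python
from scipy.stats import t

# Quantiles of the t distribution with 6 degrees of freedom
print(round(t.ppf(0.50, df=6), 2))  # 0.0  (median of a symmetric distribution)
print(round(t.ppf(0.25, df=6), 2))  # -0.72
```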

Discussion
In this article, we proposed ALASSO and AE-NET penalized QR variable selection and parameter estimation procedures (both weighted and unweighted). In our proposed adaptive penalized QR procedures, we used the proposed adaptive weights, ω̂j, based on β̂WQR(τ), a WQR-RIDGE parameter estimate. Our proposed QR-ALASSO, QR-AE-NET, WQR-ALASSO and WQR-AE-NET variable selection and parameter estimation procedures were subjected to a simulation study. We now discuss the results of this simulation study and the applications to well-known data sets from the literature.
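To make the role of the adaptive weights concrete, the adaptive-LASSO penalized QR estimate minimizes the pinball (check) loss plus an L1 penalty whose per-coefficient weights are inversely proportional to a pilot RIDGE-type estimate, so that coefficients with small pilot estimates are penalized heavily. The snippet below is a minimal sketch, not the authors' implementation: it uses an ordinary ridge pilot in place of the WQR-RIDGE estimator of the article, solves the resulting linear program with scipy, and the name `qr_alasso` is ours.

```python
import numpy as np
from scipy.optimize import linprog

def qr_alasso(X, y, tau=0.5, lam=0.1, ridge_k=1.0, eps=1e-6):
    """Adaptive-LASSO penalized quantile regression via linear programming.

    Minimizes  sum_i rho_tau(y_i - x_i'b) + lam * sum_j w_j |b_j|,
    with adaptive weights w_j = 1 / (|b_ridge_j| + eps).
    """
    n, p = X.shape
    # Pilot ridge estimate -> adaptive weights (an illustrative stand-in
    # for the WQR-RIDGE pilot used in the article).
    b_ridge = np.linalg.solve(X.T @ X + ridge_k * np.eye(p), X.T @ y)
    w = 1.0 / (np.abs(b_ridge) + eps)

    # LP variables: [b+ (p), b- (p), u (n), v (n)], all >= 0, where
    # b = b+ - b- and the residual y - Xb splits as u - v.
    c = np.concatenate([lam * w, lam * w,
                        tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y,
                  bounds=[(0, None)] * (2 * p + 2 * n), method="highs")
    z = res.x
    return z[:p] - z[p:2 * p]
```

With lam = 0 this reduces to ordinary QR at level tau; as lam grows, coefficients with small pilot estimates (large w_j) are shrunk to exactly zero first, which is the oracle-type behavior the adaptive penalty is designed to deliver.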
The simulation study results under the Gaussian distribution at D1-D6 (predictor matrices with collinearity influential points, collinearity and high leverage points) show that QR-ALASSO, QR-AE-NET, WQR-ALASSO and WQR-AE-NET outperform their non-adaptive counterparts at least 63% of the time in each respective scenario when low standard errors of the error term are present, as in [30]. This is expected, as the adaptive penalized QR methods satisfy sparsity, are asymptotically normally distributed, and provide a balance in the bias-variance trade-off. However, performance with respect to all measures becomes compromised as the error term distribution becomes heavier-tailed. The same pattern of results is also evident in the heavy-tailed t-distribution scenario. In the unweighted scenario, the QR-ALASSO is superior to the other penalized procedures in the presence of larger errors, agreeing with [16]'s results, where the ALASSO is superior to the LASSO. A similar pattern is observed in the weighted scenarios. However, a comparison between the weighted and unweighted scenarios shows that the MCD weights enhance the efficacy of the penalized procedures at predictor matrices with collinearity influential observations, collinearity and high leverage points (as demonstrated in [14]), with the WQR-ALASSO performing best.
The applications of the suggested procedures to real-life data sets are more or less in line with the simulation studies, with the procedures exhibiting higher efficacy at the predictor matrix with collinearity inducing points, i.e., the [33] data set, than at the one with collinearity reducing points, i.e., the [32] data set. However, it must be noted that the latter data set has the most severe collinearity inherent in it. The introduction of collinearity reducing points to the "clean" data only managed to lower the condition number from 52.09 to 47.78, indicating that the data remain highly collinear (see [34]). Again, the WQR-ALASSO performs best, as in the simulation studies. Thus, although variable selection and regularization are adversely affected by heavy-tailed error term distributions (outliers), the adaptive weights generally improve the efficacy of penalized QR procedures, with the MCD-based weights tending to increase their efficacy further. We recommend the use of our ALASSO penalty for variable selection, with an option to use the AE-NET penalty if the objective of the study is prediction. Parameter estimation results from popular data sets in the literature support the applicability of our proposed methods.

Conflicts of Interest:
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: