Abstract
This paper investigates robust feature screening for ultra-high dimensional data in the presence of outliers and heterogeneity. Considering the susceptibility of likelihood methods to outliers, we propose a Sparse Robust Weighted Expectile Regression (SRoWER) method that combines the L2E criterion with expectile regression. By utilizing an iterative hard thresholding (IHT) algorithm, our method effectively incorporates correlations among covariates and enables joint feature screening. The proposed approach is robust against heavy-tailed errors and outliers in the data. Simulation studies and a real data analysis demonstrate the superior performance of the SRoWER method when dealing with outlier-contaminated explanatory variables and/or heavy-tailed error distributions.
Keywords:
asymmetric least squares; feature screening; heteroscedasticity; robust regression; ultra-high dimensional data
MSC:
62J05; 62J07; 62F12
1. Introduction
With the exponential growth of data sets in various fields over the past two decades, numerous methods have been proposed to address coefficient sparsity in high-dimensional statistical models, such as bridge regression [1], the LASSO [2], the SCAD and other folded-concave penalties [3], and the Dantzig selector [4]. While these methods have demonstrated their effectiveness both theoretically and practically, real-world scenarios present new challenges, such as identifying disease-causing genes among millions of other genes or pinpointing the key factors driving stock price fluctuations in vast amounts of business data. To tackle ultra-high dimensional data, a range of techniques has emerged. One notable technique is Sure Independence Screening (SIS), initially developed by Fan and Lv [5] to screen out irrelevant factors before conducting variable selection in ultra-high dimensional linear models. Numerous further developments build on SIS [6,7,8,9]. However, although computationally efficient, these methods overlook the correlations among covariates. Consequently, additional procedures have been proposed to address this limitation, including iterative SIS (ISIS) [6], forward regression (FR) [10], and the sparse MLE (SMLE) [11].
The aforementioned approaches, which are all based on the maximum likelihood function or Pearson’s correlation, become invalid in the presence of outliers. Therefore, robust methods have been extensively studied in the literature. Although quantile regression [12] is effective in handling heterogeneous data, its significantly higher computational cost compared with least squares motivates the investigation of asymmetric least squares (ALS) regression (i.e., expectile regression [13,14,15,16]). ALS regression provides a more comprehensive description of the conditional distribution than the ordinary least squares (OLS) method by assigning different squared error losses to positive and negative residuals. Moreover, its smooth differentiability greatly reduces computational cost and facilitates theoretical analysis. Building upon ALS and quantile regression, numerous methods have been proposed to address heterogeneous, high-dimensional data, such as [17,18] for variable selection and [19,20,21,22,23,24] for feature screening. The study of [25] proposed an expectile partial correlation screening (EPCS) procedure that sequentially identifies important variables for expectile regressions in ultra-high dimensions, and proved that this procedure leads to a sure screening set. Another robust parametric technique, DPD-SIS [26,27], has been developed for ultra-high dimensional linear regression models and generalized linear models. This approach is based on the robust minimum density power divergence estimator [28], but it is still limited to marginal screening and does not account for the correlations between features. In addition, the DPD-SIS cannot handle heterogeneity, which is often a feature of ultra-high dimensional data.
In the context of heterogeneity and outliers in the data, we propose a new method called Robust Weighted Expectile Regression (RoWER), which combines the L2E criterion with expectile regression to achieve robustness and address heterogeneity. Furthermore, we develop a sparsity-restricted RoWER (SRoWER) approach for feature screening. Under general assumptions, we show that the SRoWER enjoys the sure screening property. Numerical studies validate the robustness and efficacy of the SRoWER. Our SRoWER method has three advantages: (1) it provides more reliable screening results, particularly in the presence of outliers in both the covariates and the response; (2) in the case of heteroscedasticity, it yields superior performance in estimation and feature screening, as demonstrated in the simulation studies; (3) it can be efficiently solved by an iterative hard-thresholding-based algorithm.
The remaining sections of this article are organized as follows. Section 2 introduces the model and the RoWER method. In Section 3, we present the SRoWER method for feature screening and establish its sure screening property. Section 4 describes simulation studies and a real data analysis that evaluate the finite sample performance of the SRoWER method. Concluding remarks are provided in Section 5. The proofs of the main results can be found in Appendix A.
2. Model and Method
2.1. The L2E Criterion for the Asymmetric Normal Distribution
To address the problem that likelihood methods are sensitive to outliers, Scott [29] proposed the L2E method, whose objective function is
where is a parametric probability density function (pdf) of a random variable V and is a given random sample.
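For concreteness, Scott’s L2E criterion for a generic parametric density f(·|θ) and an i.i.d. sample v_1, …, v_n takes the following well-known form; the notation here is ours and is introduced only for illustration.

```latex
\hat{\theta}_{L_2E} \;=\; \arg\min_{\theta}\left\{ \int f^{2}(v\mid\theta)\,\mathrm{d}v \;-\; \frac{2}{n}\sum_{i=1}^{n} f(v_i\mid\theta) \right\}.
```

This criterion estimates, up to a constant, the integrated squared distance between f(·|θ) and the true data-generating density, which is what gives the method its resistance to outliers.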
Here, we assume that V follows the asymmetric normal distribution, i.e., . The corresponding pdf is
where is the asymmetric squared error loss [13], with being the indicator function. Moreover, , and are the location, scale, and asymmetry parameters, respectively. The following proposition gives the criterion of the asymmetric normal distribution.
Proposition 1.
Suppose , then the criterion is
2.2. RoWER
We consider -mean [13] of the random variable ,
In fact, the -mean corresponds to Efron’s w-mean [30], where . In economics, the -mean is also called the -expectile. Let be the n-dimensional response vector, be the design matrix with . The ALS regression is carried out using the following
which degenerates to the OLS regression when .
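Because the asymmetric squared error loss is simply a squared loss with sign-dependent weights, the ALS (expectile) estimator can be computed by iteratively reweighted least squares. The following sketch is only illustrative (it assumes n > p and a full-rank design) and is not the authors’ implementation; all names are ours.

```python
import numpy as np

def expectile_regression(X, y, tau=0.5, max_iter=100, tol=1e-8):
    """ALS (expectile) regression via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]         # OLS starting value
    for _ in range(max_iter):
        r = y - X @ beta
        w = np.where(r < 0, 1.0 - tau, tau)             # asymmetric weights |tau - I(r < 0)|
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ y)  # weighted least squares step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Setting tau = 0.5 makes all weights equal, so the procedure reduces to OLS in that case.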
Consider the following linear model
where is a p-dimensional parameter vector and is a vector of n independent errors satisfying for some . We adopt the sparsity assumption on , that is, the regression coefficient vector has many zero components. In model (2), it is important to note that varying allows the coefficient vector to vary, so that different locations of the conditional distribution can be modeled. For convenience, the superscript of and is omitted in what follows when no confusion arises.
By substituting and into (1), we can obtain the following loss function by disregarding the terms that are independent of ,
However, (3) may not be strictly convex, so we propose a new loss function in the following Proposition 2 via Taylor expansion and a logarithmic transformation.
Proposition 2.
Given a consistent estimator of β, minimizing (3) is transformed into minimizing the following loss
where
which is abbreviated as .
Here, the ’s can be treated as the weights of the asymmetric least squares loss, and the loss (4) is referred to as the RoWER. When , the RoWER degenerates to weighted least squares regression. This paper chooses the consistent estimator as based on Lemma A5. We assume that the ’s are bounded from below.
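To illustrate how the weights enter the fit, the sketch below performs a single weighted asymmetric least squares update given a preliminary estimate. The exponential downweighting of large asymmetric residuals used here is only an illustrative stand-in for the weight formula of Proposition 2, and sigma as well as all other names are our own placeholders.

```python
import numpy as np

def rower_step(X, y, beta_init, tau=0.5, sigma=1.0):
    """One RoWER-style update: weighted asymmetric least squares with robust weights.

    The robust weights below (exponential downweighting of large asymmetric
    residuals) are an illustrative stand-in; the actual weights are those
    defined in Proposition 2 / Equation (4) of the paper.
    """
    r = y - X @ beta_init
    asym = np.where(r < 0, 1.0 - tau, tau)                 # expectile sign weights
    robust = np.exp(-asym * r ** 2 / (2.0 * sigma ** 2))   # downweight large residuals (assumed form)
    w = asym * robust
    WX = X * w[:, None]
    return np.linalg.solve(X.T @ WX, WX.T @ y)
```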
3. The SRoWER and Sure Screening Property
Let be any subset of , which corresponds to a submodel with the relevant regression coefficient vector and the design matrix , . In addition, let be the -norm, and be the -norm, which denotes the number of non-zero components of a vector. The size of model is denoted as . The true model is represented by , with being the true regression coefficient vector, and .
3.1. The IHT Algorithm
For the objective function , assuming that is sparse with for some known k, the RoWER method with sparsity restriction (SRoWER) yields an estimator of defined as
and stands for the set of subscripts of the non-zero components of .
For feature screening, the goal is to retain a relatively small number of features from among the p candidate features. Many studies have proposed methods to solve such problems. For example, Mallat and Zhang [31] proposed the matching pursuit algorithm. Moreover, the hard thresholding method proposed by Blumensath and Davies [32] is particularly effective for linear models. We now follow the idea of the iterative hard thresholding (IHT) algorithm to compute the SRoWER estimate. For within a neighborhood of a given , the IHT uses the following approximation of ,
where
, and is a scale parameter. Denote .
By (6), the approximate solution of (5) can be obtained by the following iterative procedure
The optimization of (7) is equivalent to
If there is no constraint , the analytic solution of (8) is . However, due to the sparsity restriction, can be obtained by retaining the k components of with the largest absolute values, i.e.,
where r is the k-th largest component of in absolute value, and is a hard thresholding function. Given the sparse solution obtained at the t-th iteration, iterating (8) is equivalent to iterating the following expression
The ultra-high dimensional case typically involves a huge computational burden, including large matrix operations. The use of the thresholding function greatly alleviates this issue. Moreover, it naturally incorporates information on the correlations between predictors. Theorem 1 shows that the value of decreases as the number of iterations increases.
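The iterative hard thresholding scheme is straightforward to implement. The sketch below is a generic template under our own naming: it takes the gradient of the working loss as a callback (for the SRoWER this would be the gradient of the RoWER loss (4) with the weights held fixed), performs a gradient step scaled by u, and then keeps only the k largest components in absolute value.

```python
import numpy as np

def hard_threshold_topk(beta, k):
    """Keep the k entries of beta with the largest absolute values; zero the rest."""
    out = np.zeros_like(beta)
    keep = np.argsort(np.abs(beta))[-k:]
    out[keep] = beta[keep]
    return out

def iht_screening(X, y, k, grad_fn, u=None, beta0=None, max_iter=200, tol=1e-6):
    """Generic IHT iteration: gradient step on the working loss, then top-k thresholding.

    grad_fn(beta) returns the gradient of the loss at beta; u is the scale
    parameter of the surrogate (e.g., slightly above the largest eigenvalue
    of X'X / n), and beta0 is the initial value (e.g., a LASSO estimate).
    """
    n, p = X.shape
    if u is None:
        u = 1.1 * np.linalg.eigvalsh(X.T @ X / n).max()
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    for _ in range(max_iter):
        beta_new = hard_threshold_topk(beta - grad_fn(beta) / u, k)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, np.flatnonzero(beta)
```

For a weighted ALS loss of the form (1/n) Σ_i w_i ρ_τ(y_i − x_i'β), grad_fn(beta) would return −(2/n) X'[w ⊙ |τ − I(r < 0)| ⊙ r] with r = y − Xβ, up to the scaling convention adopted in (6).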
Theorem 1.
Let be the sequence obtained by (7), and let be the maximum eigenvalue of . If with , then the value of decreases as the number of iterations increases, i.e., .
3.2. Sure Screening Property
This subsection will prove the sure screening property of feature screening based on the SRoWER method. Define
as the collections of over-fitted models and under-fitted models, respectively. When p, , k, and vary with the sample size n, we provide the asymptotic property of . Additionally, we make the following assumptions, some of which are purely technical and serve only to facilitate the theoretical analysis of the SRoWER method.
- (A1)
- for some .
- (A2)
- There exist and some non-negative constants , such that and
- (A3)
- There exists a constant , such that .
- (A4)
- Suppose that the random errors are i.i.d. sub-Gaussian random variables satisfying .
- (A5)
- Let and . There exists a constant , such that, for sufficiently large n, for , with being the complement of .
Condition (A1) states that p may diverge exponentially with n, which is a common setting in the ultra-high dimensional literature. The two requirements in Condition (A2) are crucial for establishing the sure screening property. The former implies that the signals of the true model are stronger than the random errors, so they are detectable. The latter implies that the sparsity of makes sure screening possible with . Condition (A3) is a regularity condition for the theoretical derivation. Condition (A4) is the same as the assumption of [17]. Condition (A5) is similar to that of [11].
Theorem 2.
Suppose that Conditions (A1)–(A5) are satisfied with . Let be the estimated model obtained by the SRoWER with size k; then, we have
By using feature screening, important features that are highly correlated with the response variable can be kept in . However, it should be noted that there is no explicit choice of k, because the appropriate value depends on the dimensionality of the problem. Note that the IHT algorithm needs an initial estimate . To further enhance computational efficiency, the LASSO estimate is chosen as the initial value of the iterations. The following theorem shows that, with the initial value obtained using the LASSO, the IHT-implemented SRoWER achieves the sure screening property within a finite number of iterations.
Theorem 3.
Let be the t-th update of the IHT procedure. The scale parameter for some , and let be the screening features. The initial value of iteration is
where λ satisfies and . Then, under Conditions (A1)–(A5), for any finite , we have
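In practice, the LASSO initializer described above can be obtained from any standard solver; the following minimal sketch uses scikit-learn with an illustrative penalty level rather than the λ prescribed in Theorem 3.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_initializer(X, y, lam=0.1):
    """LASSO fit used only to initialize the IHT iterations; lam is illustrative."""
    fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)  # no intercept, for illustration
    fit.fit(X, y)
    return fit.coef_

# Example wiring (names as in the earlier IHT sketch):
# beta0 = lasso_initializer(X, y)
# beta_hat, screened = iht_screening(X, y, k, grad_fn, beta0=beta0)
```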
3.3. The Choice of k
For the SRoWER method, we need a prespecified k, as in [4,6,11]. Here, we treat k as a tuning parameter controlling the model complexity, and determine k by minimizing the following EBIC score:
where . Chen and Chen [33] proposed the EBIC for model selection with large model spaces. Here, we use it to determine k so that the SRoWER can be compared with the EPCS proposed by [25], which also uses the EBIC for model selection.
Note that the EBIC selector for determining k requires searching over . To balance computational cost and model selection accuracy in practice, we minimize for .
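A minimal sketch of this selection step is given below. It uses the Gaussian linear-model form of the EBIC from Chen and Chen [33], n log(RSS/n) + |s| log n + 2γ|s| log p, with an illustrative γ; the score actually minimized in the paper is based on the RoWER objective rather than the plain residual sum of squares, so treat this only as a template.

```python
import numpy as np

def ebic_score(rss, n, model_size, p, gamma=0.5):
    """EBIC-type score (Gaussian linear-model form); gamma = 0.5 is illustrative."""
    return n * np.log(rss / n) + model_size * np.log(n) + 2.0 * gamma * model_size * np.log(p)

def choose_k(X, y, k_grid, fit_fn, gamma=0.5):
    """Run the screening procedure for each candidate k and return the EBIC minimizer."""
    n, p = X.shape
    best_k, best_score = None, np.inf
    for k in k_grid:
        beta = fit_fn(X, y, k)                       # e.g., the SRoWER/IHT fit of size k
        rss = float(np.sum((y - X @ beta) ** 2))
        score = ebic_score(rss, n, int(np.count_nonzero(beta)), p, gamma)
        if score < best_score:
            best_k, best_score = k, score
    return best_k
```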
4. Numerical Studies
4.1. Simulation Studies
In this subsection, the finite sample performance of SRoWER is evaluated using simulation studies and compared with EPCS [25] and SMLE [11] based on expectile regression, i.e., in the SRoWER. The IHT algorithm is used to carry out feature screening based on SRoWER, and the iteration is stopped when .
We take , , and expectile levels , which correspond to mean regression and an extreme expectile regression, respectively. All simulation results are based on 200 replications (with standard deviations in parentheses). To evaluate the performance of the screening approaches, we use three criteria: the number of true positive variables (TP), the percentage of correctly fitted models (CF), and the root mean squared error (RMSE)
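For clarity, the three evaluation criteria can be computed as follows. The RMSE here is taken as the root mean squared error of the coefficient estimates, which is one common convention; the paper's displayed definition is the authoritative one, and all names below are ours.

```python
import numpy as np

def screening_metrics(selected, true_set, beta_hat, beta_true):
    """TP: number of truly important variables retained;
    CF: indicator that the selected set exactly matches the true model;
    RMSE: root mean squared error of the coefficient estimate."""
    selected, true_set = set(selected), set(true_set)
    tp = len(selected & true_set)
    cf = float(selected == true_set)
    rmse = float(np.sqrt(np.mean((beta_hat - beta_true) ** 2)))
    return tp, cf, rmse
```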
Example 1.
Consider the linear model
where the candidate feature vectors are i.i.d. and generated from a multivariate normal distribution with , and . We set and . The true model is . The error is generated from the standard Gumbel distribution (Gumbel), the standard normal distribution (Normal), and the t distribution with three degrees of freedom (T), respectively.
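A data generator in the spirit of Example 1 is sketched below. The AR(1)-type correlation ρ^|i−j|, the dimension, the true model, the signal strength, and the error scale are all placeholders, since the exact values are those specified in Example 1.

```python
import numpy as np

def generate_example1(n=100, p=1000, rho=0.5, error="normal", seed=0):
    """Illustrative data generator in the spirit of Example 1 (all constants are placeholders)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])       # assumed AR(1)-type correlation
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = np.zeros(p)
    beta[:5] = 3.0                                           # placeholder true model and signals
    if error == "gumbel":
        eps = rng.gumbel(loc=0.0, scale=1.0, size=n)
    elif error == "t":
        eps = rng.standard_t(df=3, size=n)
    else:
        eps = rng.standard_normal(n)
    y = X @ beta + eps
    return X, y, beta
```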
Example 1 considers a relatively simple case. The simulation results are given in Table 1 and Table 2. For , we can see that all three considered methods retain almost all important features for the three error distributions. No single method dominates the other two in all cases, but the SRoWER performs better than the SMLE and the EPCS in most instances. At the extreme expectile level , however, the SRoWER performs much better than the SMLE and the EPCS in terms of RMSE, except for the Gumbel case with . In addition, all results improve when the sample size increases from 100 to 200.
Table 1.
Simulation results of Example 1 for .
Table 2.
Simulation results of Example 1 for .
Example 2.
For the linear model in Example 1, to examine the robustness of the SRoWER, we consider the case where there are outliers in the covariates. We first generate data as in Example 1. Next, we artificially add outliers from to 50 randomly chosen covariates for 10% of the observations. The other settings are the same as those in Example 1.
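The contamination step can be sketched as follows. The distribution used to generate the outliers is a placeholder (the actual one is specified in Example 2); only the mechanism of perturbing 50 randomly chosen covariates for 10% of the observations is reproduced.

```python
import numpy as np

def contaminate_covariates(X, n_cov=50, frac_obs=0.10, outlier_scale=10.0, seed=1):
    """Add outliers to n_cov randomly chosen covariates for a fraction of observations."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    n, p = X.shape
    rows = rng.choice(n, size=int(np.ceil(frac_obs * n)), replace=False)
    cols = rng.choice(p, size=n_cov, replace=False)
    # placeholder outlier distribution: large-scale normal shifts
    X[np.ix_(rows, cols)] += outlier_scale * rng.standard_normal((len(rows), n_cov))
    return X
```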
Example 2 considers the case where both the covariates and the response have outliers. The simulation results are shown in Table 3 and Table 4. We can see that the SRoWER has the smallest RMSE compared with the SMLE and the EPCS. The three considered methods have similar performance in variable selection, except for the case of T, where both the SRoWER and the SMLE perform better than the EPCS in terms of CF.
Table 3.
Simulation results of Example 2 for .
Table 4.
Simulation results of Example 2 for .
Example 3.
Here, we consider a heterogeneous model. We first generate from a multivariate normal distribution with , and . We set and . Let and , where is the cumulative distribution function of the standard normal distribution. The response is then simulated from the following normal linear heteroscedastic model
where . Meanwhile, the other settings are the same as those in Example 2.
From Table 5 and Table 6, we can see that the conclusions are similar to those of Examples 1 and 2. Hence, the SRoWER performs well even when there are outliers in the heterogeneous model.
Table 5.
Simulation results of Example 3 for .
Table 6.
Simulation results of Example 3 for .
4.2. Real Data Example
This subsection applies the SRoWER method for feature screening to the Mid-Atlantic wage data from [34], which contain 3000 observations and eight predictors and are available in the ‘ISLR’ package in R. Two predictors are continuous and six are categorical. The continuous variables are the year in which the wage information was recorded (year) and the age of the worker (age). The categorical factors are as follows: marital status (marital), with levels 1. Never Married, 2. Married, 3. Widowed, 4. Divorced, and 5. Separated; race (race), with levels 1. White, 2. Black, 3. Asian, and 4. Other; education (education), with levels 1. <HS Grad, 2. HS Grad, 3. Some College, 4. College Grad, and 5. Advanced Degree; type of job (jobclass), with levels 1. Industrial and 2. Information; health level of the worker (health), with levels 1. ≤Good and 2. ≥Very Good; and whether the worker has health insurance (health_ins), with levels 1. Yes and 2. No. We use dummy variables to represent the six categorical variables, so there are 16 covariates, and the response is the logarithm of wage. Following the setup of [35], to demonstrate the application in high dimensions, we extend the data by introducing the following artificial covariates:
where denotes standard normal random variables and W follows the standard uniform distribution.
To test the prediction performance of the SRoWER, EPCS, and SMLE, we randomly generated 100 partitions of the full data, dividing the data into two parts: samples are treated as training data and the remaining samples as testing data. We report the average model size (Size), the number of selected noise variables (SNV), and the expectile prediction error (EPE) at and , where the EPE is computed on the test data as
where is the expectile estimate of the i-th observation, with calculated based on the training data set, and is the i-th observation in the test set.
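Reading the EPE as the average asymmetric squared error between the observed test responses and the fitted expectiles, it can be computed as below; the displayed formula in the paper is the authoritative definition, and the names here are ours.

```python
import numpy as np

def expectile_prediction_error(y_test, y_pred, tau):
    """Average asymmetric squared error between test responses and fitted expectiles."""
    r = y_test - y_pred
    w = np.where(r < 0, 1.0 - tau, tau)
    return float(np.mean(w * r ** 2))
```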
The results are reported in Table 7. For , the SRoWER and the SMLE perform similarly in terms of EPE and SNV, while the SRoWER includes about one more variable than the SMLE. Although the model sizes of the SRoWER and the EPCS are similar, the EPE of the EPCS is the largest among the three methods. For , the SRoWER performs best, while the EPCS performs worst. The selected model sizes vary across different , which indicates heteroscedasticity of the model. This conclusion agrees with the results of [36].
Table 7.
Expectile prediction error (EPE), model size (Size), and selected noise variables (SNV) over 100 repetitions and their standard errors (in parentheses) for wage data.
5. Conclusions
To deal with heterogeneity and outliers in the covariates and/or the response, this paper proposes the RoWER method, which is further applied to screen features in ultra-high dimensional data. We have also proposed an iterative hard-thresholding algorithm to implement the feature screening procedure, and established the sure screening property of the SRoWER method. Simulation studies and a real data analysis verify that the SRoWER method not only reduces the huge computational effort required for ultra-high dimensional data, but also shows excellent robustness for heterogeneous data. Compared with ISIS [6], the SRoWER naturally accounts for the joint effects between features, while inheriting the computational efficiency of the SMLE. Based on the proposed method, robust feature screening for classification data is a promising direction for future research.
Author Contributions
Conceptualization, M.W.; methodology, X.W., P.H. and M.W.; software, X.W.; formal analysis, X.W. and P.H.; writing—original draft preparation, X.W.; writing—review and editing, P.H. and M.W.; supervision, M.W. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by the National Natural Science Foundation of China (12271294), and the Natural Science Foundation of Shandong Province (ZR2024MA089).
Data Availability Statement
Data sets were provided in the ‘ISLR’ package in R.
Acknowledgments
The authors are grateful to the editor and reviewers for their valuable comments and suggestions. We also sincerely thank Yundong Tu for providing their code.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
Proof of Proposition 1.
It is seen that
Note that
Then, we have
Hence, the criterion for asymmetric normal distribution is
The proposition is proved. □
Proof of Proposition 2.
By Taylor’s expansion, we have
for , where
Therefore, the minimization of is transformed into the minimization of the following objective function
The proposition is then proved. □
Before proving Theorem 1, we give the following lemmas.
Lemma A1
([17]). The asymmetric least squares loss is continuously differentiable, but is not twice differentiable at zero when . Moreover, for any and , we have
where , , which confirms that is strongly convex.
Lemma A2
([17]). For any and , we have
The lemma implies that is Lipschitz continuous.
Lemma A3
([17]). Let be i.i.d. sub-Gaussian random variables, and be sub-Gaussian norm, , where . Then for any and , we have
where is a constant.
Lemma A4
([17]). Let Z be a sub-Gaussian random variable, , ; then, the random variables and are also sub-Gaussian. For any , is sub-Gaussian.
Proof of Theorem 1.
The definition of the IHT algorithm based on is
By Lemma A1 and the assumption , we have
This proves that decreases after each iteration. □
Lemma A5.
For some , denote . Under Conditions of Theorem 3, we have
where v is the same as defined in Condition (A5).
Proof of Lemma A5.
Based on the definition of , we have
that is,
Denote
and , it follows that
We now derive a bound on . Define
By Lemma A3 and Condition (A3), for each , we have
where , which is sub-Gaussian by Condition (A4) and Lemma A4. Hence, for any , and some generic positive constants ,
as . This implies . Therefore, under the event , we have
This further implies that
Since by Lemma A1, we have , which leads to . By Conditions (A5) and (A2) and the Cauchy–Schwarz inequality, it follows that
which gives rise to . Thus, under the event , we have
The lemma is proved. □
Proof of Theorem 2.
Let be the estimator of obtained by the SRoWER based on model . If , the theorem is proved. Thus, it suffices to show that
as .
For any , define . Consider close to such that , for some . When n is sufficiently large, falls into a small neighborhood of . By Lemma A1, we have
where , is the smallest eigenvalue of , and is the lower bound of . By Lemma A3, we have
which leads to
Thus, by the Bonferroni inequality and Condition (A1),
where b is some generic positive constant. Due to the convexity of in , the above conclusion holds for any , such that .
For any , let be augmented with zero corresponding to the elements in . By Condition (A2), it is seen that
Consequently,
The theorem is proved. □
Proof of Theorem 3.
Let . By Condition (A2), , it suffices to prove for any
It is clearly implied by
We use mathematical induction to prove (A3).
Step 1.
When , the initial value for the iteration is defined as the LASSO estimator of , that is . By Lemma A5, we have
Under Condition (A2), we have , . By , . Hence, when , (A3) holds.
References
- Frank, L.E.; Friedman, J.H. A statistical view of some chemometrics regression tools. Technometrics 1993, 35, 109–135.
- Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
- Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
- Candes, E.; Tao, T. The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Stat. 2007, 35, 2313–2351.
- Fan, J.; Lv, J. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 2008, 70, 849–911.
- Fan, J.; Samworth, R.; Wu, Y. Ultrahigh dimensional feature selection: Beyond the linear model. J. Mach. Learn. Res. 2009, 10, 2013–2038.
- Zhu, L.P.; Li, L.; Li, R.; Zhu, L.X. Model-free feature screening for ultrahigh-dimensional data. J. Am. Stat. Assoc. 2011, 106, 1464–1475.
- Li, R.; Zhong, W.; Zhu, L. Feature screening via distance correlation learning. J. Am. Stat. Assoc. 2012, 107, 1129–1139.
- Fan, J.; Song, R. Sure independence screening in generalized linear models with NP-dimensionality. Ann. Stat. 2010, 38, 3567–3604.
- Wang, H. Forward regression for ultra-high dimensional variable screening. J. Am. Stat. Assoc. 2009, 104, 1512–1524.
- Xu, C.; Chen, J. The sparse MLE for ultrahigh-dimensional feature screening. J. Am. Stat. Assoc. 2014, 109, 1257–1269.
- Koenker, R. Quantile Regression; Cambridge University Press: New York, NY, USA, 2005.
- Newey, W.K.; Powell, J.L. Asymmetric least squares estimation and testing. Econom. J. Econom. Soc. 1987, 55, 819–847.
- Zhao, J.; Chen, Y.; Zhang, Y. Expectile regression for analyzing heteroscedasticity in high dimension. Stat. Probab. Lett. 2018, 137, 304–311.
- Ciuperca, G. Variable selection in high-dimensional linear model with possibly asymmetric errors. Comput. Stat. Data Anal. 2021, 155, 107112.
- Song, S.; Lin, Y.; Zhou, Y. Linear expectile regression under massive data. Fundam. Res. 2021, 1, 574–585.
- Gu, Y.; Zou, H. High-dimensional generalizations of asymmetric least squares regression and their applications. Ann. Stat. 2016, 44, 2661–2694.
- Wang, L.; Wu, Y.; Li, R. Quantile regression for analyzing heterogeneity in ultra-high dimension. J. Am. Stat. Assoc. 2012, 107, 214–222.
- He, X.; Wang, L.; Hong, H.G. Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann. Stat. 2013, 41, 342–369.
- Wu, Y.; Yin, G. Conditional quantile screening in ultrahigh-dimensional heterogeneous data. Biometrika 2015, 102, 65–76.
- Zhong, W.; Zhu, L.; Li, R.; Cui, H. Regularized quantile regression and robust feature screening for single index models. Stat. Sin. 2016, 26, 69–95.
- Ma, Y.; Li, Y.; Lin, H. Concordance measure-based feature screening and variable selection. Stat. Sin. 2017, 27, 1967–1985.
- Chen, L.P. A note of feature screening via a rank-based coefficient of correlation. Biom. J. 2023, 65, 2100373.
- Chen, L.P. Feature screening via concordance indices for left-truncated and right-censored survival data. J. Stat. Plan. Inference 2024, 232, 106153.
- Tu, Y.; Wang, S. Variable screening and model averaging for expectile regressions. Oxf. Bull. Econ. Stat. 2023, 85, 574–598.
- Ghosh, A.; Ponzi, E.; Sandanger, T.; Thoresen, M. Robust sure independence screening for nonpolynomial dimensional generalized linear models. Scand. J. Stat. 2023, 50, 1232–1262.
- Ghosh, A.; Thoresen, M. A robust variable screening procedure for ultra-high dimensional data. Stat. Methods Med. Res. 2021, 30, 1816–1832.
- Basu, A.; Harris, I.R.; Hjort, N.L.; Jones, M. Robust and efficient estimation by minimising a density power divergence. Biometrika 1998, 85, 549–559.
- Scott, D.W. Parametric statistical modeling by minimum integrated square error. Technometrics 2001, 43, 274–285.
- Efron, B. Regression percentiles using asymmetric squared error loss. Stat. Sin. 1991, 1, 93–125.
- Mallat, S.G.; Zhang, Z. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 1993, 41, 3397–3415.
- Blumensath, T.; Davies, M.E. Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 2009, 27, 265–274.
- Chen, J.; Chen, Z. Extended Bayesian information criteria for model selection with large model spaces. Biometrika 2008, 95, 759–771.
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer-Verlag: New York, NY, USA, 2013.
- Fan, J.; Ma, Y.; Dai, W. Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. J. Am. Stat. Assoc. 2014, 109, 1270–1284.
- Wang, M.; Kang, X.; Liang, J.; Wang, K.; Wu, Y. Heteroscedasticity identification and variable selection via multiple quantile regression. J. Stat. Comput. Simul. 2024, 94, 297–314.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).