Abstract
This paper investigates robust feature screening for ultra-high dimensional data in the presence of outliers and heterogeneity. Considering the susceptibility of likelihood methods to outliers, we propose a Sparse Robust Weighted Expectile Regression (SRoWER) method that combines the L2E criterion with expectile regression. By utilizing an iterative hard thresholding (IHT) algorithm, our method effectively incorporates correlations among covariates and enables joint feature screening. The proposed approach is robust against heavy-tailed errors and outliers in the data. Simulation studies and a real data analysis demonstrate the superior performance of the SRoWER method when dealing with outlier-contaminated explanatory variables and/or heavy-tailed error distributions.
Keywords:
asymmetric least squares; feature screening; heteroscedasticity; robust regression; ultra-high dimensional data
MSC:
62J05; 62J07; 62F12
1. Introduction
With the exponential growth of data sets in various fields over the past two decades, numerous methods have been proposed to address coefficient sparsity in high-dimensional statistical models, such as bridge regression [1], the LASSO [2], the SCAD and other folded-concave penalties [3], and the Dantzig selector [4]. While these methods have demonstrated their effectiveness both theoretically and practically, real-world scenarios present new challenges, such as identifying disease-causing genes among millions of other genes or pinpointing the key factors driving stock price fluctuations in vast amounts of business data. To tackle ultra-high dimensional data, a range of techniques has emerged. One notable technique is Sure Independence Screening (SIS), initially developed by Fan and Lv [5] to screen out irrelevant factors before conducting variable selection in ultra-high dimensional linear models. Numerous further developments build on SIS [6,7,8,9]. However, although computationally efficient, these methods overlook the correlations among covariates. Consequently, additional procedures have been proposed to address this limitation, including iterative SIS (ISIS) [6], forward regression (FR) [10], and the sparse MLE (SMLE) [11].
The aforementioned approaches, which are all based on the maximum likelihood function or Pearson’s correlation, become invalid in the presence of outliers. Therefore, robust methods have been extensively studied in the literature. Although quantile regression [12] is effective in handling heterogeneous data, its significantly higher computational cost compared with least squares motivates the investigation of asymmetric least squares (ALS) regression (i.e., expectile regression [13,14,15,16]). ALS regression provides a more comprehensive description of the conditional distribution than the ordinary least squares (OLS) method by assigning different squared error losses to positive and negative residuals. Moreover, its smooth differentiability greatly reduces computational cost and facilitates theoretical analysis. Building upon ALS and quantile regression, numerous methods have been proposed to address heterogeneous, high-dimensional data, such as [17,18] for variable selection and [19,20,21,22,23,24] for feature screening. The study of [25] proposed an expectile partial correlation screening (EPCS) procedure that sequentially identifies important variables for expectile regressions in ultra-high dimensions, and proved that this procedure leads to a sure screening set. Another robust parametric technique, DPD-SIS [26,27], has been developed for ultra-high dimensional linear regression models and generalized linear models. This approach is based on the robust minimum density power divergence estimator [28], but it is still limited to marginal screening and does not account for the correlations between features. In addition, the DPD-SIS cannot handle heterogeneity, which is often a feature of ultra-high dimensional data.
In the context of heterogeneity and outliers in the data, we propose a new method called Robust Weighted Expectile Regression (RoWER), which combines the L2E criterion with expectile regression to achieve robustness and address heterogeneity. Furthermore, we develop a sparsity-restricted RoWER (SRoWER) approach for feature screening. Under general assumptions, we show that the SRoWER enjoys the sure screening property. Numerical studies validate the robustness and efficacy of the SRoWER. Our SRoWER method has three advantages: (1) it provides more reliable screening results, particularly in the presence of outliers in both the covariates and the response; (2) in the case of heteroscedasticity, it yields superior performance in estimation and feature screening, as demonstrated in the simulation studies; (3) it can be efficiently solved by an iterative hard-thresholding-based algorithm.
The remaining sections of this article are organized as follows. Section 2 introduces the model and the RoWER method. In Section 3, we present the SRoWER method for feature screening and establish its sure screening property. Section 4 describes simulation studies and a real data analysis that evaluate the finite sample performance of the SRoWER method. Concluding remarks are provided in Section 5. The proofs of the main results can be found in Appendix A.
2. Model and Method
2.1. The L2E Criterion for the Asymmetric Normal Distribution
To address the problem that likelihood methods are sensitive to outliers, Scott [29] proposed the L2E method, whose objective function is
where is a parametric probability density function (pdf) of a random variable V and is a given random sample.
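For concreteness, Scott’s L2E criterion for a generic parametric density f(·|θ) and an i.i.d. sample v_1, …, v_n takes the following well-known form; the notation here is ours and is introduced only for illustration.

```latex
\hat{\theta}_{L_2E} \;=\; \arg\min_{\theta}\left\{ \int f^{2}(v\mid\theta)\,\mathrm{d}v \;-\; \frac{2}{n}\sum_{i=1}^{n} f(v_i\mid\theta) \right\}.
```

This criterion estimates, up to a constant, the integrated squared distance between f(·|θ) and the true data-generating density, which is what gives the method its resistance to outliers.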
Here, we assume that V follows the asymmetric normal distribution, i.e., . The corresponding pdf is
where is the asymmetric squared error loss [13], with being the indicator function. Moreover, , and are the location, scale, and asymmetry parameters, respectively. The following proposition gives the criterion of the asymmetric normal distribution.
Proposition 1.
Suppose , then the criterion is
2.2. RoWER
We consider -mean [13] of the random variable ,
In fact, the -mean corresponds to Efron’s w-mean [30], where . In economics, the -mean is also called the -expectile. Let be the n-dimensional response vector, be the design matrix with . The ALS regression is carried out using the following
which degenerates to the OLS regression when .
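Because the asymmetric squared error loss is simply a squared loss with sign-dependent weights, the ALS (expectile) estimator can be computed by iteratively reweighted least squares. The following sketch is only illustrative (it assumes n > p and a full-rank design) and is not the authors’ implementation; all names are ours.

```python
import numpy as np

def expectile_regression(X, y, tau=0.5, max_iter=100, tol=1e-8):
    """ALS (expectile) regression via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]         # OLS starting value
    for _ in range(max_iter):
        r = y - X @ beta
        w = np.where(r < 0, 1.0 - tau, tau)             # asymmetric weights |tau - I(r < 0)|
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ y)  # weighted least squares step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Setting tau = 0.5 makes all weights equal, so the procedure reduces to OLS in that case.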
Consider the following linear model
where is a p-dimensional parameter vector and is a vector of n independent errors satisfying for some . We adopt the sparsity assumption on , that is, the regression coefficient vector has many zero components. In model (2), it is important to note that varying allows the coefficient vector to vary, so that different locations of the conditional distribution can be modeled. For convenience, the superscript of and is omitted in what follows when no confusion arises.
By substituting and into (1), we can obtain the following loss function by disregarding the terms that are independent of ,
However, (3) may not be strictly convex, so we propose a new loss function in the following Proposition 2 via Taylor expansion and a logarithmic transformation.
Proposition 2.
Given a consistent estimator of β, minimizing (3) is transformed into minimizing the following loss
where
which is abbreviated as .
Here, the ’s can be treated as the weights of the asymmetric least squares loss, and the loss (4) is referred to as the RoWER. When , the RoWER degenerates to weighted least squares regression. This paper chooses the consistent estimator as based on Lemma A5. We assume that the ’s are bounded from below.
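To illustrate how the weights enter the fit, the sketch below performs a single weighted asymmetric least squares update given a preliminary estimate. The exponential downweighting of large asymmetric residuals used here is only an illustrative stand-in for the weight formula of Proposition 2, and sigma as well as all other names are our own placeholders.

```python
import numpy as np

def rower_step(X, y, beta_init, tau=0.5, sigma=1.0):
    """One RoWER-style update: weighted asymmetric least squares with robust weights.

    The robust weights below (exponential downweighting of large asymmetric
    residuals) are an illustrative stand-in; the actual weights are those
    defined in Proposition 2 / Equation (4) of the paper.
    """
    r = y - X @ beta_init
    asym = np.where(r < 0, 1.0 - tau, tau)                 # expectile sign weights
    robust = np.exp(-asym * r ** 2 / (2.0 * sigma ** 2))   # downweight large residuals (assumed form)
    w = asym * robust
    WX = X * w[:, None]
    return np.linalg.solve(X.T @ WX, WX.T @ y)
```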
3. The SRoWER and Sure Screening Property
Let be any subset of , which corresponds to a submodel with the relevant regression coefficient vector and the design matrix , . In addition, let be the -norm, and be the -norm, which denotes the number of non-zero components of a vector. The size of model is denoted as . The true model is represented by , with being the true regression coefficient vector, and .
3.1. The IHT Algorithm
For the objective function , assuming that is sparse with for some known k, the RoWER method with sparsity restriction (SRoWER) yields an estimator of defined as
and stands for the set of subscripts of the non-zero components of .
For feature screening, the goal is to retain a relatively small number of features from among the p candidate features. Many studies have proposed methods to solve such problems. For example, Mallat and Zhang [31] proposed the matching pursuit algorithm. Moreover, the hard thresholding method proposed by Blumensath and Davies [32] is particularly effective for linear models. We now follow the idea of the iterative hard thresholding (IHT) algorithm to compute the SRoWER estimate. For within a neighborhood of a given , the IHT uses the following approximation of ,
where
, and is a scale parameter. Denote .
By (6), the approximate solution of (5) can be obtained by the following iterative procedure
The optimization of (7) is equivalent to
If there is no constraint , the analytic solution of (8) is . However, due to the sparsity restriction, can be obtained by retaining the k components of with the largest absolute values, i.e.,
where r is the k-th largest component of in absolute value, and is a hard thresholding function. Given the sparse solution obtained at the t-th iteration, iterating (8) is equivalent to iterating the following expression
The ultra-high dimensional case typically involves a huge computational burden, including large matrix operations. The use of the thresholding function greatly alleviates this issue. Moreover, it naturally incorporates information on the correlations between predictors. Theorem 1 shows that the value of decreases as the number of iterations increases.
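The iterative hard thresholding scheme is straightforward to implement. The sketch below is a generic template under our own naming: it takes the gradient of the working loss as a callback (for the SRoWER this would be the gradient of the RoWER loss (4) with the weights held fixed), performs a gradient step scaled by u, and then keeps only the k largest components in absolute value.

```python
import numpy as np

def hard_threshold_topk(beta, k):
    """Keep the k entries of beta with the largest absolute values; zero the rest."""
    out = np.zeros_like(beta)
    keep = np.argsort(np.abs(beta))[-k:]
    out[keep] = beta[keep]
    return out

def iht_screening(X, y, k, grad_fn, u=None, beta0=None, max_iter=200, tol=1e-6):
    """Generic IHT iteration: gradient step on the working loss, then top-k thresholding.

    grad_fn(beta) returns the gradient of the loss at beta; u is the scale
    parameter of the surrogate (e.g., slightly above the largest eigenvalue
    of X'X / n), and beta0 is the initial value (e.g., a LASSO estimate).
    """
    n, p = X.shape
    if u is None:
        u = 1.1 * np.linalg.eigvalsh(X.T @ X / n).max()
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    for _ in range(max_iter):
        beta_new = hard_threshold_topk(beta - grad_fn(beta) / u, k)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, np.flatnonzero(beta)
```

For a weighted ALS loss of the form (1/n) Σ_i w_i ρ_τ(y_i − x_i'β), grad_fn(beta) would return −(2/n) X'[w ⊙ |τ − I(r < 0)| ⊙ r] with r = y − Xβ, up to the scaling convention adopted in (6).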
Theorem 1.
Let be the sequence obtained by (7), and let be the maximum eigenvalue of . If with , then the value of decreases as the number of iterations increases, i.e., .
3.2. Sure Screening Property
This subsection will prove the sure screening property of feature screening based on the SRoWER method. Define
as the collections of over-fitted models and under-fitted models, respectively. When p, , k, and vary with the sample size n, we provide the asymptotic property of . Additionally, we make the following assumptions, some of which are purely technical and serve only to facilitate the theoretical analysis of the SRoWER method.
- (A1)
- for some .
- (A2)
- There exist and some non-negative constants , such that and
- (A3)
- There exists a constant , such that .
- (A4)
- Suppose that the random errors are i.i.d. sub-Gaussian random variables satisfying .
- (A5)
- Let and . There exists a constant , such that, for sufficiently large n, for , with being the complement of .
Condition (A1) states that p may diverge exponentially with n, which is a common setting in the ultra-high dimensional literature. The two requirements in Condition (A2) are crucial for establishing the sure screening property. The former implies that the signals of the true model are stronger than the random errors, so they are detectable. The latter implies that the sparsity of makes sure screening possible with . Condition (A3) is a regularity condition for the theoretical derivation. Condition (A4) is the same as the assumption of [17]. Condition (A5) is similar to that of [11].
Theorem 2.
Suppose that Conditions (A1)–(A5) are satisfied with . Let be the estimated model obtained by the SRoWER with size k; then, we have
By using feature screening, important features that are highly correlated with the response variable can be kept in . However, it should be noted that there is no explicit choice of k, because the appropriate value depends on the dimensionality of the problem. Note that the IHT algorithm needs an initial estimate . To further enhance computational efficiency, the LASSO estimate is chosen as the initial value of the iterations. The following theorem shows that, with the initial value obtained using the LASSO, the IHT-implemented SRoWER achieves the sure screening property within a finite number of iterations.
Theorem 3.
Let be the t-th update of the IHT procedure. The scale parameter for some , and let be the screening features. The initial value of iteration is
where λ satisfies and . Then, under Conditions (A1)–(A5), for any finite , we have
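In practice, the LASSO initializer described above can be obtained from any standard solver; the following minimal sketch uses scikit-learn with an illustrative penalty level rather than the λ prescribed in Theorem 3.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_initializer(X, y, lam=0.1):
    """LASSO fit used only to initialize the IHT iterations; lam is illustrative."""
    fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)  # no intercept, for illustration
    fit.fit(X, y)
    return fit.coef_

# Example wiring (names as in the earlier IHT sketch):
# beta0 = lasso_initializer(X, y)
# beta_hat, screened = iht_screening(X, y, k, grad_fn, beta0=beta0)
```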
3.3. The Choice of k
For the SRoWER method, we need a prespecified k, as in [4,6,11]. Here, we treat k as a tuning parameter controlling the model complexity, and determine k by minimizing the following EBIC score:
where . Chen and Chen [33] proposed the EBIC for model selection with large model spaces. Here, we use it to determine k so that the SRoWER can be compared with the EPCS proposed by [25], which also uses the EBIC for model selection.
Note that the EBIC selector for determining k requires searching over . To balance computational cost and model selection accuracy in practice, we minimize for .
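A minimal sketch of this selection step is given below. It uses the Gaussian linear-model form of the EBIC from Chen and Chen [33], n log(RSS/n) + |s| log n + 2γ|s| log p, with an illustrative γ; the score actually minimized in the paper is based on the RoWER objective rather than the plain residual sum of squares, so treat this only as a template.

```python
import numpy as np

def ebic_score(rss, n, model_size, p, gamma=0.5):
    """EBIC-type score (Gaussian linear-model form); gamma = 0.5 is illustrative."""
    return n * np.log(rss / n) + model_size * np.log(n) + 2.0 * gamma * model_size * np.log(p)

def choose_k(X, y, k_grid, fit_fn, gamma=0.5):
    """Run the screening procedure for each candidate k and return the EBIC minimizer."""
    n, p = X.shape
    best_k, best_score = None, np.inf
    for k in k_grid:
        beta = fit_fn(X, y, k)                       # e.g., the SRoWER/IHT fit of size k
        rss = float(np.sum((y - X @ beta) ** 2))
        score = ebic_score(rss, n, int(np.count_nonzero(beta)), p, gamma)
        if score < best_score:
            best_k, best_score = k, score
    return best_k
```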
4. Numerical Studies
4.1. Simulation Studies
In this subsection, the finite sample performance of SRoWER is evaluated using simulation studies and compared with EPCS [25] and SMLE [11] based on expectile regression, i.e., in the SRoWER. The IHT algorithm is used to carry out feature screening based on SRoWER, and the iteration is stopped when .
We take , , and expectile levels , which correspond to mean regression and an extreme expectile regression, respectively. All simulation results are based on 200 replications (with standard deviations in parentheses). To evaluate the performance of the screening approaches, we use three criteria: the number of true positive variables (TP), the percentage of correctly fitted models (CF), and the root mean squared error (RMSE)
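For clarity, the three evaluation criteria can be computed as follows. The RMSE here is taken as the root mean squared error of the coefficient estimates, which is one common convention; the paper's displayed definition is the authoritative one, and all names below are ours.

```python
import numpy as np

def screening_metrics(selected, true_set, beta_hat, beta_true):
    """TP: number of truly important variables retained;
    CF: indicator that the selected set exactly matches the true model;
    RMSE: root mean squared error of the coefficient estimate."""
    selected, true_set = set(selected), set(true_set)
    tp = len(selected & true_set)
    cf = float(selected == true_set)
    rmse = float(np.sqrt(np.mean((beta_hat - beta_true) ** 2)))
    return tp, cf, rmse
```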
Example 1.
Consider the linear model
where the candidate feature vectors are i.i.d. and generated from a multivariate normal distribution with , and . We set and . The true model is . The error is generated from the standard Gumbel distribution (Gumbel), the standard normal distribution (Normal), and the t distribution with three degrees of freedom (T), respectively.
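A data generator in the spirit of Example 1 is sketched below. The AR(1)-type correlation ρ^|i−j|, the dimension, the true model, the signal strength, and the error scale are all placeholders, since the exact values are those specified in Example 1.

```python
import numpy as np

def generate_example1(n=100, p=1000, rho=0.5, error="normal", seed=0):
    """Illustrative data generator in the spirit of Example 1 (all constants are placeholders)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])       # assumed AR(1)-type correlation
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = np.zeros(p)
    beta[:5] = 3.0                                           # placeholder true model and signals
    if error == "gumbel":
        eps = rng.gumbel(loc=0.0, scale=1.0, size=n)
    elif error == "t":
        eps = rng.standard_t(df=3, size=n)
    else:
        eps = rng.standard_normal(n)
    y = X @ beta + eps
    return X, y, beta
```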
Example 1 considers a relatively simple case. The simulation results are given in Table 1 and Table 2. For , we can see that all three considered methods retain almost all important features for the three error distributions. No single method dominates the other two in all cases, but the SRoWER performs better than the SMLE and the EPCS in most instances. At the extreme expectile level , however, the SRoWER performs much better than the SMLE and the EPCS in terms of RMSE, except for the Gumbel case with . In addition, all results improve when the sample size increases from 100 to 200.
Table 1.
Simulation results of Example 1 for .
Table 2.
Simulation results of Example 1 for .
Example 2.
For the linear model in Example 1, to examine the robustness of the SRoWER, we consider the case where there are outliers in the covariates. We first generate data as in Example 1. Next, we artificially add outliers from to 50 randomly chosen covariates for 10% of the observations. The other settings are the same as those in Example 1.
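The contamination step can be sketched as follows. The distribution used to generate the outliers is a placeholder (the actual one is specified in Example 2); only the mechanism of perturbing 50 randomly chosen covariates for 10% of the observations is reproduced.

```python
import numpy as np

def contaminate_covariates(X, n_cov=50, frac_obs=0.10, outlier_scale=10.0, seed=1):
    """Add outliers to n_cov randomly chosen covariates for a fraction of observations."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    n, p = X.shape
    rows = rng.choice(n, size=int(np.ceil(frac_obs * n)), replace=False)
    cols = rng.choice(p, size=n_cov, replace=False)
    # placeholder outlier distribution: large-scale normal shifts
    X[np.ix_(rows, cols)] += outlier_scale * rng.standard_normal((len(rows), n_cov))
    return X
```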
Example 2 considers the case where both the covariates and the response have outliers. The simulation results are shown in Table 3 and Table 4. We can see that the SRoWER has the smallest RMSE compared with the SMLE and the EPCS. The three considered methods have similar performance in variable selection, except for the case of T, where both the SRoWER and the SMLE perform better than the EPCS in terms of CF.
Table 3.
Simulation results of Example 2 for .
Table 4.
Simulation results of Example 2 for .
Example 3.
Here, we consider a heterogeneous model. We first generate from a multivariate normal distribution with , and . We set and . Let and , where is the cumulative distribution function of the standard normal distribution. The response is then simulated from the following normal linear heteroscedastic model
where . Meanwhile, the other settings are the same as those in Example 2.
From Table 5 and Table 6, we can see that the conclusions are similar to those of Examples 1 and 2. Hence, the SRoWER performs well even when there are outliers in the heterogeneous model.
Table 5.
Simulation results of Example 3 for .
Table 6.
Simulation results of Example 3 for .
4.2. Real Data Example
This subsection applies the SRoWER method for feature screening to the Mid-Atlantic wage data from [34], which contain 3000 observations and eight predictors and are available in the ‘ISLR’ package in R. Two predictors are continuous and six are categorical. The continuous variables are the year in which the wage information was recorded (year) and the age of the worker (age). The categorical factors are as follows: marital status (marital), with levels 1. Never Married, 2. Married, 3. Widowed, 4. Divorced, and 5. Separated; race (race), with levels 1. White, 2. Black, 3. Asian, and 4. Other; education (education), with levels 1. <HS Grad, 2. HS Grad, 3. Some College, 4. College Grad, and 5. Advanced Degree; type of job (jobclass), with levels 1. Industrial and 2. Information; health level of the worker (health), with levels 1. ≤Good and 2. ≥Very Good; and whether the worker has health insurance (health_ins), with levels 1. Yes and 2. No. We use dummy variables to represent the six categorical variables, so there are 16 covariates, and the response is the logarithm of wage. Following the setup of [35], to demonstrate the application in high dimensions, we extend the data by introducing the following artificial covariates:
where denotes standard normal random variables and W follows the standard uniform distribution.
To test the prediction performance of the SRoWER, EPCS, and SMLE, we randomly generated 100 partitions of the full data, dividing the data into two parts: samples are treated as training data and the remaining samples as testing data. We report the average model size (Size), the number of selected noise variables (SNV), and the expectile prediction error (EPE) at and , where the EPE is computed on the test data as
where is the expectile estimate of the i-th observation, with calculated based on the training data set, and is the i-th observation in the test set.
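Reading the EPE as the average asymmetric squared error between the observed test responses and the fitted expectiles, it can be computed as below; the displayed formula in the paper is the authoritative definition, and the names here are ours.

```python
import numpy as np

def expectile_prediction_error(y_test, y_pred, tau):
    """Average asymmetric squared error between test responses and fitted expectiles."""
    r = y_test - y_pred
    w = np.where(r < 0, 1.0 - tau, tau)
    return float(np.mean(w * r ** 2))
```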
The results are reported in Table 7. For , the SRoWER and the SMLE perform similarly in terms of EPE and SNV, while the SRoWER includes about one more variable than the SMLE. Although the model sizes of the SRoWER and the EPCS are similar, the EPE of the EPCS is the largest among the three methods. For , the SRoWER performs best, while the EPCS performs worst. The selected model sizes vary across different , which indicates heteroscedasticity of the model. This conclusion agrees with the results of [36].
Table 7.
Expectile prediction error (EPE), model size (Size), and selected noise variables (SNV) over 100 repetitions and their standard errors (in parentheses) for wage data.
5. Conclusions
To deal with heterogeneity and outliers in the covariates and/or the response, this paper proposes the RoWER method, which is further applied to screen features in ultra-high dimensional data. We have also proposed an iterative hard-thresholding algorithm to implement the feature screening procedure, and established the sure screening property of the SRoWER method. Simulation studies and a real data analysis verify that the SRoWER method not only reduces the huge computational effort required for ultra-high dimensional data, but also shows excellent robustness for heterogeneous data. Compared with ISIS [6], the SRoWER naturally accounts for the joint effects between features, while inheriting the computational efficiency of the SMLE. Based on the proposed method, robust feature screening for classification data is a promising direction for future research.
Author Contributions
Conceptualization, M.W.; methodology, X.W., P.H. and M.W.; software, X.W.; formal analysis, X.W. and P.H.; writing—original draft preparation, X.W.; writing—review and editing, P.H. and M.W.; supervision, M.W. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by the National Natural Science Foundation of China (12271294), and the Natural Science Foundation of Shandong Province (ZR2024MA089).
Data Availability Statement
Data sets were provided in the ‘ISLR’ package in R.
Acknowledgments
The authors are grateful to the editor and reviewers for their valuable comments and suggestions. We also sincerely thank Yundong Tu for providing their code.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
Proof of Proposition 1.
It is seen that
Note that
Then, we have
Hence, the criterion for asymmetric normal distribution is
The proposition is proved. □
Proof of Proposition 2.
By Taylor’s expansion, we have
for , where
Therefore, the minimization of is transformed into the minimization of the following objective function
The proposition is then proved. □
Before proving Theorem 1, we give the following lemmas.
Lemma A1
([17]). The asymmetric least squares loss is continuously differentiable, but is not twice differentiable at zero when . Moreover, for any and , we have
where , , which confirms that is strongly convex.
Lemma A2
([17]). For any and , we have
The lemma implies that is Lipschitz continuous.
Lemma A3
([17]). Let be i.i.d. sub-Gaussian random variables, and be sub-Gaussian norm, , where . Then for any and , we have
where is a constant.
Lemma A4
([17]). Let Z be a sub-Gaussian random variable, , ; then, the random variables and are also sub-Gaussian. For any , is sub-Gaussian.
Proof of Theorem 1.
The definition of the IHT algorithm based on is
By Lemma A1 and the assumption , we have
This proves that decreases after each iteration. □
Lemma A5.
For some , denote . Under Conditions of Theorem 3, we have
where v is the same as defined in Condition (A5).
Proof of Lemma A5.
Based on the definition of , we have
that is,
Denote
and , it follows that
We now derive a bound on . Define
By Lemma A3 and Condition (A3), for each , we have
where , which is sub-Gaussian by Condition (A4) and Lemma A4. Hence, for any , and some generic positive constants ,
as . This implies . Therefore, under the event , we have
This further implies that
Since by Lemma A1, we have , which leads to . By Conditions (A5) and (A2) and the Cauchy–Schwarz inequality, it follows that
which gives rise to . Thus, under the event , we have
The lemma is proved. □
Proof of Theorem 2.
Let be the estimator of obtained by the SRoWER based on model . If , the theorem is proved. Thus, it suffices to show that
as .
For any , define . Consider close to such that , for some . When n is sufficiently large, falls into a small neighborhood of . By Lemma A1, we have
where , is the smallest eigenvalue of , and is the lower bound of . By Lemma A3, we have
which leads to
Thus, by the Bonferroni inequality and Condition (A1),
where b is some generic positive constant. Due to the convexity of in , the above conclusion holds for any , such that .
For any , let be augmented with zero corresponding to the elements in . By Condition (A2), it is seen that
Consequently,
The theorem is proved. □
Proof of Theorem 3.
Let . By Condition (A2), , it suffices to prove for any
It is clearly implied by
We use mathematical induction to prove (A3).
Step 1.
When , the initial value for the iteration is defined as the LASSO estimator of , that is . By Lemma A5, we have
Under Condition (A2), we have , . By , . Hence, when , (A3) holds.
References
- Frank, L.E.; Friedman, J.H. A statistical view of some chemometrics regression tools. Technometrics 1993, 35, 109–135.
- Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
- Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
- Candes, E.; Tao, T. The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Stat. 2007, 35, 2313–2351.
- Fan, J.; Lv, J. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 2008, 70, 849–911.
- Fan, J.; Samworth, R.; Wu, Y. Ultrahigh dimensional feature selection: Beyond the linear model. J. Mach. Learn. Res. 2009, 10, 2013–2038.
- Zhu, L.P.; Li, L.; Li, R.; Zhu, L.X. Model-free feature screening for ultrahigh-dimensional data. J. Am. Stat. Assoc. 2011, 106, 1464–1475.
- Li, R.; Zhong, W.; Zhu, L. Feature screening via distance correlation learning. J. Am. Stat. Assoc. 2012, 107, 1129–1139.
- Fan, J.; Song, R. Sure independence screening in generalized linear models with NP-dimensionality. Ann. Stat. 2010, 38, 3567–3604.
- Wang, H. Forward regression for ultra-high dimensional variable screening. J. Am. Stat. Assoc. 2009, 104, 1512–1524.
- Xu, C.; Chen, J. The sparse MLE for ultrahigh-dimensional feature screening. J. Am. Stat. Assoc. 2014, 109, 1257–1269.
- Koenker, R. Quantile Regression; Cambridge University Press: New York, NY, USA, 2005.
- Newey, W.K.; Powell, J.L. Asymmetric least squares estimation and testing. Econom. J. Econom. Soc. 1987, 55, 819–847.
- Zhao, J.; Chen, Y.; Zhang, Y. Expectile regression for analyzing heteroscedasticity in high dimension. Stat. Probab. Lett. 2018, 137, 304–311.
- Ciuperca, G. Variable selection in high-dimensional linear model with possibly asymmetric errors. Comput. Stat. Data Anal. 2021, 155, 107112.
- Song, S.; Lin, Y.; Zhou, Y. Linear expectile regression under massive data. Fundam. Res. 2021, 1, 574–585.
- Gu, Y.; Zou, H. High-dimensional generalizations of asymmetric least squares regression and their applications. Ann. Stat. 2016, 44, 2661–2694.
- Wang, L.; Wu, Y.; Li, R. Quantile regression for analyzing heterogeneity in ultra-high dimension. J. Am. Stat. Assoc. 2012, 107, 214–222.
- He, X.; Wang, L.; Hong, H.G. Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann. Stat. 2013, 41, 342–369.
- Wu, Y.; Yin, G. Conditional quantile screening in ultrahigh-dimensional heterogeneous data. Biometrika 2015, 102, 65–76.
- Zhong, W.; Zhu, L.; Li, R.; Cui, H. Regularized quantile regression and robust feature screening for single index models. Stat. Sin. 2016, 26, 69–95.
- Ma, Y.; Li, Y.; Lin, H. Concordance measure-based feature screening and variable selection. Stat. Sin. 2017, 27, 1967–1985.
- Chen, L.P. A note of feature screening via a rank-based coefficient of correlation. Biom. J. 2023, 65, 2100373.
- Chen, L.P. Feature screening via concordance indices for left-truncated and right-censored survival data. J. Stat. Plan. Inference 2024, 232, 106153.
- Tu, Y.; Wang, S. Variable screening and model averaging for expectile regressions. Oxf. Bull. Econ. Stat. 2023, 85, 574–598.
- Ghosh, A.; Ponzi, E.; Sandanger, T.; Thoresen, M. Robust sure independence screening for nonpolynomial dimensional generalized linear models. Scand. J. Stat. 2023, 50, 1232–1262.
- Ghosh, A.; Thoresen, M. A robust variable screening procedure for ultra-high dimensional data. Stat. Methods Med. Res. 2021, 30, 1816–1832.
- Basu, A.; Harris, I.R.; Hjort, N.L.; Jones, M. Robust and efficient estimation by minimising a density power divergence. Biometrika 1998, 85, 549–559.
- Scott, D.W. Parametric statistical modeling by minimum integrated square error. Technometrics 2001, 43, 274–285.
- Efron, B. Regression percentiles using asymmetric squared error loss. Stat. Sin. 1991, 1, 93–125.
- Mallat, S.G.; Zhang, Z. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 1993, 41, 3397–3415.
- Blumensath, T.; Davies, M.E. Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 2009, 27, 265–274.
- Chen, J.; Chen, Z. Extended Bayesian information criteria for model selection with large model spaces. Biometrika 2008, 95, 759–771.
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer-Verlag: New York, NY, USA, 2013.
- Fan, J.; Ma, Y.; Dai, W. Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. J. Am. Stat. Assoc. 2014, 109, 1270–1284.
- Wang, M.; Kang, X.; Liang, J.; Wang, K.; Wu, Y. Heteroscedasticity identification and variable selection via multiple quantile regression. J. Stat. Comput. Simul. 2024, 94, 297–314.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).