Abstract
A significant portion of theoretical and empirical work on high-dimensional regression has concentrated on clean datasets. In many practical scenarios, however, data are corrupted by missing values and measurement errors that cannot be ignored. Despite substantial progress in high-dimensional regression with contaminated covariates, methods that achieve an effective trade-off among prediction accuracy, feature selection, and computational efficiency remain underexplored. We introduce the adaptive convex conditioned Lasso (Adaptive CoCoLasso), a new approach for high-dimensional linear models with error-prone measurements. The estimator combines a projection onto the nearest positive semi-definite matrix with an adaptively weighted $\ell_1$ penalty. Theoretical guarantees are provided by establishing error bounds for the estimator. Results from the synthetic data analysis indicate that the Adaptive CoCoLasso achieves strong prediction accuracy and low mean squared error, particularly in scenarios involving both additive and multiplicative measurement noise. While the Adaptive CoCoLasso is comparable to, or slightly outperformed by, certain methods such as Hard in reducing the number of incorrectly identified covariates, its strength lies in offering a more favorable trade-off between prediction accuracy and sparse modeling.
1. Introduction
High-dimensional statistical learning has found extensive applications across diverse fields, including artificial intelligence, genomics, molecular biology, and economics. Numerous effective methods leveraging sparse learning through regularization have been developed to facilitate statistical inference in high-dimensional settings. These methods are well-documented in various studies, such as [1,2,3,4,5,6,7,8,9,10,11,12,13], among others. However, most of the previous research has focused on error-free data. In practice, measurement errors are prevalent in applications such as surveys with missing or inaccurate data due to non-responses, voting systems affected by imprecise instruments or systematic biases, and sensor networks corrupted by communication failures or environmental interference. Challenges involving noisy, incomplete, or corrupted data are frequently encountered. Naively applying methods designed for clean datasets to those affected by measurement errors can lead to inconsistent and imprecise estimates, which in turn result in inaccurate conclusions, particularly in high-dimensional settings. Therefore, developing robust methods for model selection and estimation that explicitly account for measurement errors in high-dimensional problems is of paramount importance.
In recent years, sparse modeling in high-dimensional models with measurement errors has garnered widespread attention. For instance, Ref. [14] proposed minimizing regularized least squares while accounting for additive measurement errors in the covariate matrices of partially linear models. In high-dimensional linear sparse regression, Ref. [15] developed a Lasso-type estimator that utilizes an unbiased approximation to replace the corrupted Gram matrix. However, incorporating measurement error information often leads to non-convex likelihood functions, complicating the solution of the associated optimization problems. To address this challenge, Ref. [16] proposed the nearest positive semi-definite projection matrix as an approximation for the unbiased Gram matrix estimate. Using this matrix as a foundation, they introduced the convex conditioned Lasso (CoCoLasso), which reformulates the objective function as a convex optimization problem to facilitate efficient sparse learning in error-prone high-dimensional linear models.
Although CoCoLasso demonstrates superior computational efficiency due to its convex optimization framework, the $\ell_1$ penalty imposes the same level of shrinkage on all coefficients, which can introduce bias. This often results in overfitting, with an overly complex model selected to minimize prediction error [1,12,17]. To address the bias and overfitting introduced by the $\ell_1$ penalty, Ref. [18] proposed balanced estimation, which is based on the nearest positive semi-definite matrix and incorporates combined $\ell_1$ and concave regularization. Although balanced estimation achieves an appealing trade-off between prediction accuracy and variable selection, it suffers from certain limitations. First, the non-convex nature of the concave regularization increases computational complexity, making its application in high-dimensional settings difficult. Second, the selection of tuning parameters for the concave penalty is often difficult and may lead to suboptimal performance in practice.
To address these issues, we propose the Adaptive CoCoLasso, which combines the nearest positive semi-definite projection with an adaptively weighted $\ell_1$ penalty. The Adaptive CoCoLasso estimator preserves the computational efficiency of convex optimization while achieving precise estimation and feature selection in the presence of both additive and multiplicative measurement errors. By imposing heavier penalties on coefficients that are truly zero and lighter penalties on nonzero coefficients, the Adaptive CoCoLasso reduces estimation bias and enhances variable selection accuracy. Furthermore, error bounds for the Adaptive CoCoLasso estimator are established, together with a theorem guaranteeing consistency of support recovery.
This paper makes two primary contributions. First, we propose the Adaptive CoCoLasso estimator for high-dimensional linear regression models in which the design matrix is affected by measurement errors, aiming to ensure precise estimation and accurate variable selection. By applying stronger penalties to zero coefficients and weaker penalties to nonzero coefficients, the method effectively mitigates overfitting under both additive and multiplicative measurement errors. Second, we establish theoretical guarantees for the proposed method by deriving oracle inequalities for the prediction and estimation errors and proving consistency of support recovery. Extensive simulation studies demonstrate the effectiveness of our approach.
The structure of this paper is as follows. Section 2 outlines the model setup and introduces the proposed Adaptive CoCoLasso estimator. Section 3 presents the theoretical properties, including oracle bounds on the estimation errors. Section 4 evaluates the finite-sample performance of the proposed method through simulation studies. All proofs are provided in Appendix A.
Notation 1.
For a vector $v = (v_1, \ldots, v_p)^\top$, the $\ell_q$ norm is defined as $\|v\|_q = (\sum_{j=1}^{p} |v_j|^q)^{1/q}$ for $1 \le q < \infty$, and the $\ell_\infty$ norm is given by $\|v\|_\infty = \max_{1 \le j \le p} |v_j|$. For a matrix $A = (a_{ij})$, the following matrix norms are used: the element-wise maximum norm $\|A\|_{\max} = \max_{i,j} |a_{ij}|$, the operator norms $\|A\|_1 = \max_j \sum_i |a_{ij}|$ and $\|A\|_\infty = \max_i \sum_j |a_{ij}|$, and the spectral norm $\|A\|_2 = \{\lambda_{\max}(A^\top A)\}^{1/2}$, where $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote the smallest and largest eigenvalues of $A$, respectively.
2. Adaptive CoCoLasso for Error-Prone Models
2.1. Model Setting
Consider the high-dimensional linear regression model
$y = X\beta^* + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n), \qquad (1)$
where $y$ represents the $n$-dimensional response vector, $X \in \mathbb{R}^{n \times p}$ denotes the fixed design matrix, $\beta^*$ is the unknown $p$-dimensional regression coefficient vector, $\varepsilon$ is an $n$-dimensional error vector independent of $X$, and $I_n$ is the $n \times n$ identity matrix (the Gaussian distribution is assumed for simplicity of analysis; similar theoretical results hold under a sub-Gaussian assumption provided that the tail probability of $\varepsilon$ decays exponentially). Measurement errors in the design matrix are common in various applications, so that a corrupted covariate matrix $W$ is observed rather than the true matrix $X$.
Two classical cases are associated with measurement errors in the design matrix $X$. In the additive error case, the observed covariates are represented as $W = X + A$, where the rows of the additive error matrix $A$ are independently and identically distributed (i.i.d.) with mean vector $0$ and covariance matrix $\Sigma_A$. In the multiplicative error case, the observed covariates follow $W = X \odot M$, where $\odot$ denotes the Hadamard product and the rows of the multiplicative error matrix $M$ are i.i.d. with mean vector $\mu_M$ and covariance matrix $\Sigma_M$. Missing data can be treated as a special case of this model, where the entries of $M$ are Bernoulli random variables with success probability $1 - \pi_j$, representing the probability of observing the $j$-th covariate, and $\pi_j$ denotes the missingness rate for the $j$-th covariate. To ensure model identifiability, the covariance matrix $\Sigma_A$ (for additive errors) or the pair $(\mu_M, \Sigma_M)$ (for multiplicative errors) is assumed to be known, as in [16,18].
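For illustration, the two corruption mechanisms (with missing data as a special case of the multiplicative model) can be simulated as in the following R sketch; the dimensions, the noise level sigma_A, the log-normal scale, and the missingness rate are hypothetical values chosen only for this illustration, not the settings used in our experiments.

set.seed(1)
n <- 100; p <- 200                              # illustrative dimensions
X <- matrix(rnorm(n * p), n, p)                 # clean (unobserved) design

## Additive errors: W = X + A, rows of A i.i.d. with mean 0 and covariance Sigma_A
sigma_A <- 0.3                                  # hypothetical error level
A <- matrix(rnorm(n * p, sd = sigma_A), n, p)   # here Sigma_A = sigma_A^2 * I_p
W_add <- X + A

## Multiplicative errors: W = X * M element-wise (Hadamard product)
M <- matrix(rlnorm(n * p, meanlog = 0, sdlog = 0.2), n, p)  # log-normal factors
W_mult <- X * M

## Missing data as a special case: entries of M are Bernoulli(1 - pi_j)
pi_miss <- 0.1                                  # hypothetical missingness rate
M_obs <- matrix(rbinom(n * p, 1, 1 - pi_miss), n, p)
W_miss <- X * M_obs                             # unobserved entries appear as zero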
2.2. Adaptive CoCoLasso
In high-dimensional settings where the dimensionality $p$ exceeds the sample size $n$, the true coefficient vector $\beta^*$ is often assumed to be sparse. Specifically, the support set $S = \{j : \beta_j^* \ne 0\}$, representing the indices of the truly relevant predictors, has size $s = |S|$ satisfying $s \ll n$. This sparsity assumption ensures model identifiability by requiring that only a small subset of predictors be nonzero. Let $S^c$ denote the complementary set of $S$. In the context of clean data, penalized least squares methods are widely employed for sparse estimation of the true coefficient vector in high-dimensional linear models. The least squares loss depends on the data only through $\Sigma = X^\top X / n$ and $\rho = X^\top y / n$, where $\Sigma$ represents the Gram matrix and $\rho$ denotes the marginal correlation vector between the covariates and the response, respectively. When the covariate matrix is affected by errors, Ref. [15] proposed unbiased estimators $\hat{\Sigma}$ and $\hat{\rho}$ to approximate the unobservable quantities $\Sigma$ and $\rho$. Specifically, these estimators can be expressed as
$\hat{\Sigma}_{\mathrm{add}} = \frac{W^\top W}{n} - \Sigma_A, \qquad \hat{\rho}_{\mathrm{add}} = \frac{W^\top y}{n}, \qquad (2)$
for the additive error cases and
$\hat{\Sigma}_{\mathrm{mult}} = \frac{W^\top W}{n} \oslash \left(\Sigma_M + \mu_M \mu_M^\top\right), \qquad \hat{\rho}_{\mathrm{mult}} = \frac{W^\top y}{n} \oslash \mu_M, \qquad (3)$
for the multiplicative error cases, where $\oslash$ denotes element-wise division.
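Assuming the surrogate forms written in (2) and (3), the unbiased estimators can be computed directly from the observed data, as in the following R sketch; the arguments Sigma_A, mu_M, and Sigma_M are the (assumed known) measurement error parameters.

## Unbiased surrogates for the Gram matrix and the marginal correlation vector.
## Additive errors, Equation (2): Sigma_hat = W'W/n - Sigma_A, rho_hat = W'y/n.
surrogate_additive <- function(W, y, Sigma_A) {
  n <- nrow(W)
  list(Sigma = crossprod(W) / n - Sigma_A,
       rho   = drop(crossprod(W, y)) / n)
}

## Multiplicative errors, Equation (3):
## Sigma_hat = (W'W/n) ./ (Sigma_M + mu_M mu_M'), rho_hat = (W'y/n) ./ mu_M.
surrogate_multiplicative <- function(W, y, mu_M, Sigma_M) {
  n <- nrow(W)
  list(Sigma = (crossprod(W) / n) / (Sigma_M + tcrossprod(mu_M)),
       rho   = drop(crossprod(W, y)) / (n * mu_M))
}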
However, the unbiased surrogate $\hat{\Sigma}$ is generally not positive semi-definite in high-dimensional scenarios. Consequently, $\hat{\Sigma}$ may possess negative eigenvalues, so the quadratic term $\beta^\top \hat{\Sigma} \beta$ lacks a lower bound and the loss function loses convexity. To resolve this problem, the unbiased surrogate is replaced by its nearest positive semi-definite projection matrix, defined as
$\tilde{\Sigma} = \arg\min_{\Sigma \succeq 0} \|\Sigma - \hat{\Sigma}\|_{\max},$
which can be solved efficiently using the alternating direction method of multipliers (ADMM). By this definition and the triangle inequality, it follows that
$\|\tilde{\Sigma} - \Sigma\|_{\max} \le \|\tilde{\Sigma} - \hat{\Sigma}\|_{\max} + \|\hat{\Sigma} - \Sigma\|_{\max} \le 2\|\hat{\Sigma} - \Sigma\|_{\max}, \qquad (4)$
indicating that $\tilde{\Sigma}$ serves as an approximation to $\Sigma$ with element-wise accuracy comparable to that of the unbiased estimate $\hat{\Sigma}$.
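The element-wise maximum-norm projection above is computed with ADMM in [16]; as a minimal stand-in, the following R sketch uses the simpler Frobenius-norm projection obtained by clipping negative eigenvalues at a small floor. This already restores convexity of the loss, but it is only an illustrative surrogate for the projection actually studied in the paper.

## Frobenius-norm projection onto the positive semi-definite cone (illustrative).
nearest_psd <- function(Sigma_hat, eps = 1e-4) {
  Sigma_sym <- (Sigma_hat + t(Sigma_hat)) / 2   # symmetrize first
  ed <- eigen(Sigma_sym, symmetric = TRUE)
  lam <- pmax(ed$values, eps)                   # clip negative eigenvalues
  ed$vectors %*% (lam * t(ed$vectors))          # reassemble V diag(lam) V'
}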
Following the construction of $\tilde{\Sigma}$ and $\hat{\rho}$, Ref. [16] introduced the CoCoLasso method, which employs the $\ell_1$ penalty for regularization. However, the $\ell_1$ penalty often selects overly large models to minimize prediction risk. Motivated by [19], we adopt a weighted $\ell_1$ penalty to develop a convex objective function, where the weights are determined by an initial estimator. Suppose $\hat{\beta}^{\mathrm{ini}}$ is an initial estimator of $\beta^*$, which provides preliminary estimates of the regression coefficients. Based on this initial estimator, we define the weight vector $w = (w_1, \ldots, w_p)^\top$, where $w_j = 1 / |\hat{\beta}_j^{\mathrm{ini}}|$ for $j = 1, \ldots, p$. These weights enable the penalty to adapt to the relative importance of each variable, assigning larger penalties to coefficients with smaller initial estimates and smaller penalties to coefficients with larger initial estimates. Specifically, the proposed Adaptive CoCoLasso estimator is defined as the optimal solution to the following optimization problem, computed after obtaining $\hat{\Sigma}$ and $\hat{\rho}$ as provided in (2) and (3) and projecting $\hat{\Sigma}$ onto $\tilde{\Sigma}$:
$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{2}\beta^\top \tilde{\Sigma} \beta - \hat{\rho}^\top \beta + \lambda \sum_{j=1}^{p} w_j |\beta_j| \right\}, \qquad (5)$
where $\lambda = c_0 \sqrt{\log p / n}$ for some positive constant $c_0$. Let $\tilde{\Sigma}^{1/2}$ be the Cholesky factor of $\tilde{\Sigma}$, such that $\tilde{\Sigma} = (\tilde{\Sigma}^{1/2})^\top \tilde{\Sigma}^{1/2}$, and define $\tilde{y}$ to satisfy $(\tilde{\Sigma}^{1/2})^\top \tilde{y} = \hat{\rho}$. Then, the proposed Adaptive CoCoLasso estimator defined in (5) can be reformulated equivalently as the global minimizer of the following optimization problem
$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{2}\left\|\tilde{y} - \tilde{\Sigma}^{1/2}\beta\right\|_2^2 + \lambda \sum_{j=1}^{p} w_j |\beta_j| \right\}. \qquad (6)$
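To see why (5) and (6) are equivalent in the forms written above, expand the quadratic term in (6):
$\frac{1}{2}\left\|\tilde{y} - \tilde{\Sigma}^{1/2}\beta\right\|_2^2 = \frac{1}{2}\beta^\top \tilde{\Sigma} \beta - \tilde{y}^\top \tilde{\Sigma}^{1/2} \beta + \frac{1}{2}\|\tilde{y}\|_2^2 = \frac{1}{2}\beta^\top \tilde{\Sigma} \beta - \hat{\rho}^\top \beta + \frac{1}{2}\|\tilde{y}\|_2^2,$
where the second equality uses $(\tilde{\Sigma}^{1/2})^\top \tilde{y} = \hat{\rho}$ and $\tilde{\Sigma} = (\tilde{\Sigma}^{1/2})^\top \tilde{\Sigma}^{1/2}$; the constant $\frac{1}{2}\|\tilde{y}\|_2^2$ does not depend on $\beta$ and therefore does not affect the minimizer.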
For comparison, CoCoLasso solves the following optimization problem:
$\hat{\beta}_{\mathrm{CoCo}} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{2}\beta^\top \tilde{\Sigma} \beta - \hat{\rho}^\top \beta + \lambda \|\beta\|_1 \right\}.$
Here, the $\ell_1$ penalty term introduces sparsity by shrinking some coefficients to exactly zero, effectively performing variable selection. However, since the same penalty is applied to all coefficients, it tends to introduce bias, particularly for larger coefficients. Unlike CoCoLasso, the Adaptive CoCoLasso incorporates the data-driven weights $w_j = 1/|\hat{\beta}_j^{\mathrm{ini}}|$ into the penalty term, adjusting the penalty to reflect the relative importance of each variable. This weighting scheme enables the Adaptive CoCoLasso to handle cases where variables have vastly different scales or signal strengths. Assigning smaller penalties to variables with larger estimated coefficients avoids over-penalizing important predictors, improving both variable selection accuracy and coefficient estimation. In addition, the Adaptive CoCoLasso enhances the recovery of weak signals and reduces the bias introduced by the uniform penalty in standard CoCoLasso.
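To make the optimization concrete, the following R sketch implements coordinate descent for the weighted problem in (5), operating directly on the projected surrogate pair; the weight rule 1/|beta_init_j| (floored to avoid division by zero), the tolerance, and the iteration cap are illustrative choices rather than the settings used in our experiments, where the LARS algorithm is employed instead.

## Coordinate descent for 0.5 * b' Sigma b - rho' b + lambda * sum(w_j * |b_j|),
## with Sigma positive semi-definite so that the objective is convex.
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

adaptive_cocolasso <- function(Sigma, rho, lambda, beta_init,
                               max_iter = 500, tol = 1e-6) {
  p <- length(rho)
  w <- 1 / pmax(abs(beta_init), 1e-6)   # adaptive weights from the initial estimator
  beta <- numeric(p)
  for (iter in seq_len(max_iter)) {
    beta_old <- beta
    for (j in seq_len(p)) {
      z_j <- rho[j] - sum(Sigma[j, -j] * beta[-j])   # partial residual for coordinate j
      beta[j] <- soft_threshold(z_j, lambda * w[j]) / Sigma[j, j]
    }
    if (max(abs(beta - beta_old)) < tol) break
  }
  beta
}

With the sketches above, one would pass the output of nearest_psd() as Sigma, the surrogate rho from (2) or (3), and a CoCoLasso fit as beta_init.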
3. Theoretical Properties
In this section, we rigorously derive statistical error bounds for the Adaptive CoCoLasso estimator under the $\ell_2$ and $\ell_1$ norms and establish theoretical guarantees for exact support recovery with high probability. Before presenting the theoretical results, we outline four technical conditions.
Condition 1.
The distributions of $\hat{\Sigma}$ and $\hat{\rho}$ are identified by a set of parameters $\theta$. Then, there exist generic constants $C$ and $c$ and positive functions $\xi$ and $\zeta$ depending on $\theta$, $\Sigma$, and $\beta^*$ such that, for every $\varepsilon > 0$, $\hat{\Sigma}$ and $\hat{\rho}$ satisfy the following probability statements:
$\max_{j,k} P\left( |\hat{\Sigma}_{jk} - \Sigma_{jk}| \ge \varepsilon \right) \le C \exp\left(-\frac{c\, n\, \varepsilon^2}{\xi^2}\right), \qquad \max_{j} P\left( |\hat{\rho}_{j} - \rho_{j}| \ge \varepsilon \right) \le C \exp\left(-\frac{c\, n\, \varepsilon^2}{\zeta^2}\right).$
Condition 2.
For some positive constant κ, assume
$\delta^\top \Sigma\, \delta \ge \kappa \|\delta\|_2^2 \quad \text{for all } \delta \in \mathcal{C},$
where $\mathcal{C} = \{\delta \in \mathbb{R}^p \setminus \{0\} : \|\delta_{S^c}\|_1 \le 3\|\delta_S\|_1\}$, with $\delta_S$ and $\delta_{S^c}$ representing the subvectors corresponding to the support set $S$ and its complement $S^c$, respectively.
Condition 3.
The minimum signal strength satisfies $\min_{j \in S} |\beta_j^*| \ge c_1 \sqrt{s}\,\lambda$, where $c_1$ is a positive constant.
Condition 4.
The initial estimator satisfies $\|\hat{\beta}^{\mathrm{ini}} - \beta^*\|_2 = O_p(r_n)$ with $r_n \to 0$.
Condition 1, known as the closeness condition proposed by [16], requires that the surrogates $\hat{\Sigma}$ (and consequently $\tilde{\Sigma}$) and $\hat{\rho}$ achieve sufficient element-wise closeness to $\Sigma$ and $\rho$, respectively. This condition has already been verified in [16] for typical additive and multiplicative measurement error cases, with $\hat{\Sigma}$ and $\hat{\rho}$ defined in Equations (2) and (3). Condition 2, the restricted eigenvalue (RE) condition, ensures the stability and non-degeneracy of the design on sparse predictor subsets. A similar RE condition was used in [20] to derive statistical error bounds for the clean-data Lasso estimator. Condition 3 specifies the minimum signal strength, ensuring that the true signal is large enough relative to the regularization parameter to distinguish significant predictors from noise. This condition is commonly assumed in high-dimensional regression settings to guarantee consistent variable selection and accurate estimation [21,22].
Condition 4 ensures that the initial estimator approximates the true parameter with an error rate of $r_n$, providing the accuracy needed to construct adaptive weights that capture the underlying sparsity and improve the efficiency of the Adaptive CoCoLasso estimator. For clean covariates, commonly used initial estimators include the Lasso [10], which leverages sparsity in high-dimensional settings, and ridge regression [23], which addresses multicollinearity effectively. Another widely used approach is the marginal regression estimator, which achieves zero-consistency under a partial orthogonality condition and is obtained by fitting a univariate regression for each predictor separately [19]. When the design matrix contains measurement errors, CoCoLasso can serve as the initial estimator, as it is specifically designed for measurement error models. Alternatively, the estimator introduced in [15], which provides theoretical guarantees for high-dimensional regression with noisy or missing data, can act as the initial estimator.
Theorem 1.
Under Conditions 1–4, the Adaptive CoCoLasso estimator $\hat{\beta}$ satisfies the following oracle inequalities with probability at least $1 - O(p^{-c_2})$ for some positive constant $c_2$:
$\|\hat{\beta} - \beta^*\|_2 \lesssim \sqrt{\frac{s \log p}{n}}, \qquad \|\hat{\beta} - \beta^*\|_1 \lesssim s\sqrt{\frac{\log p}{n}}.$
Here, $\lambda \asymp \sqrt{\log p / n}$, and the constants hidden in the $\lesssim$ notation depend on the restricted eigenvalue constant κ, the noise variance $\sigma^2$, and the probabilistic bounds specified in Condition 1.
Moreover, with high probability, it holds that
$\mathrm{supp}(\hat{\beta}) = \mathrm{supp}(\beta^*),$
where $\mathrm{supp}(\beta^*) = S$ denotes the support set of $\beta^*$, representing the indices of its nonzero components. Similarly, $\mathrm{supp}(\hat{\beta})$ represents the support set of the Adaptive CoCoLasso estimator $\hat{\beta}$.
Theorem 1 establishes the theoretical properties of the Adaptive CoCoLasso estimator under high-dimensional linear models with measurement errors. Specifically, it provides oracle inequalities for the estimation errors in both the $\ell_2$ and $\ell_1$ norms, showing that the errors scale with $\sqrt{s \log p / n}$ and $s\sqrt{\log p / n}$, respectively. The constants in these bounds depend on key quantities such as the restricted eigenvalue constant, the noise variance, and the probabilistic bounds in Condition 1. The tail probability is also influenced by the measurement errors, as captured by the functions $\xi$ and $\zeta$ in Condition 1. Additionally, the theorem guarantees consistent support recovery, meaning that the true set of relevant predictors is identified with high probability as the sample size $n$ and dimensionality $p$ grow. All proofs are provided in Appendix A.
4. Numerical Studies
In this section, we use synthetic datasets to evaluate the finite-sample performance of the Adaptive CoCoLasso (A-CoCoLasso) estimator. The comparison includes several alternative estimators: CoCoLasso [16], balanced estimation with combined $\ell_1$ and smoothly clipped absolute deviation regularization (B-SCAD), balanced estimation with combined $\ell_1$ and hard-thresholding regularization (B-Hard) [18], and the standalone hard-thresholding method (Hard). The Adaptive CoCoLasso weights were computed from the CoCoLasso regression coefficients. The CoCoLasso and Adaptive CoCoLasso estimators were implemented using the LARS algorithm. All simulation studies were performed in R, covering both additive and multiplicative measurement errors. In all numerical experiments, the penalty parameter was selected through 10-fold cross-validation.
To evaluate the aforementioned estimators, we employed the performance metrics introduced in [18]. The first two metrics are the number of correctly selected covariates (C) and the number of incorrectly selected covariates (IC), defined as $C = |\hat{S} \cap S|$ and $IC = |\hat{S} \cap S^c|$, respectively, where $\hat{S}$ denotes the support of the fitted estimator. The third and fourth metrics are the prediction error (PE) and the mean squared error (MSE) of the coefficient estimates. These metrics collectively assess feature selection accuracy through C and IC while evaluating predictive and estimation performance via PE and MSE, thus providing a comprehensive comparison framework.
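The following R sketch computes the four metrics for a fitted coefficient vector; since the precise formulas of [18] are not reproduced here, it adopts one common convention as an assumption, namely PE evaluated through the covariance of the true covariates and MSE taken as the squared $\ell_2$ estimation error.

## Selection and accuracy metrics for a fitted coefficient vector beta_hat.
selection_metrics <- function(beta_hat, beta_true, Sigma_X) {
  S_hat  <- which(beta_hat  != 0)
  S_true <- which(beta_true != 0)
  diff   <- beta_hat - beta_true
  c(C   = length(intersect(S_hat, S_true)),         # correctly selected covariates
    IC  = length(setdiff(S_hat, S_true)),           # incorrectly selected covariates
    PE  = as.numeric(t(diff) %*% Sigma_X %*% diff), # assumed prediction-error form
    MSE = sum(diff^2))                              # assumed squared estimation error
}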
4.1. Additive Error Cases
Example 1.
We followed the simulation setup in [18] and generated 100 datasets, each containing observations from the linear model , with , , and . The components of and were independently sampled from a multivariate normal distribution . We considered two covariance structures for : the autoregressive structure, where , and the compound symmetry structure, where . The contaminated covariates were obtained as , where the rows of were independently drawn from with and , respectively. The results for the five estimators are summarized in Table 1 and Table 2.
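For concreteness, the two covariance structures and the corresponding data draw can be set up as in the R sketch below; the correlation value 0.5, the dimensions, the three nonzero coefficients, and the unit noise level are placeholders, since the exact values of Example 1 are not reproduced here.

## Autoregressive and compound-symmetry covariance structures (placeholder rho = 0.5).
ar1_cov <- function(p, rho = 0.5) rho^abs(outer(1:p, 1:p, "-"))
cs_cov  <- function(p, rho = 0.5) { S <- matrix(rho, p, p); diag(S) <- 1; S }

n <- 100; p <- 500                        # placeholder dimensions
beta_true <- c(3, 1.5, 2, rep(0, p - 3))  # placeholder sparse coefficient vector
Sigma_X <- ar1_cov(p)                     # or cs_cov(p) for compound symmetry
R <- chol(Sigma_X)                        # Sigma_X = t(R) %*% R
X <- matrix(rnorm(n * p), n, p) %*% R     # rows of X ~ N(0, Sigma_X)
y <- drop(X %*% beta_true + rnorm(n))     # unit error standard deviation (placeholder)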
Table 1.
Means and standard errors (in parentheses) of four performance metrics for five methods under additive error cases over 100 replications in the autoregressive structure.
Table 2.
Means and standard errors (in parentheses) of four performance metrics for five methods under additive error cases over 100 replications in the compound symmetric structure.
The results in Table 1 and Table 2 highlight the comparative performance of the five methods under additive measurement errors in both the autoregressive and compound symmetric structures. Compared with CoCoLasso, A-CoCoLasso retained a similar number of correctly selected covariates (C) while markedly reducing the number of incorrectly selected covariates (IC). For example, in the autoregressive structure at one noise level, A-CoCoLasso attained C = 2.93, close to the ideal value of 3 and essentially matching CoCoLasso (C = 2.94), while significantly reducing IC from 12.54 (CoCoLasso) to 3.89. Furthermore, A-CoCoLasso achieved a better prediction error (PE = 1.68) and mean squared error (MSE = 2.01) than CoCoLasso (PE = 3.65; MSE = 3.64). Similarly, compared with Hard, A-CoCoLasso achieved a higher C (2.93 vs. 2.27) and better overall estimation and prediction performance. These results indicate that A-CoCoLasso provides more accurate variable selection and estimation than both CoCoLasso and Hard.
In comparison to the balanced estimation methods (B-SCAD and B-Hard), A-CoCoLasso also demonstrated superior performance, particularly in reducing IC while maintaining C closer to the ideal value. For example, in the compound symmetric structure with , A-CoCoLasso achieved C = 2.46, outperforming both B-SCAD (C = 2.07) and B-Hard (C = 1.64). Additionally, A-CoCoLasso maintained a competitive IC of 12.88, which is lower than that of B-SCAD (IC = 13.59), while achieving better prediction and estimation accuracy (PE = 6.80; MSE = 8.39) compared to both B-SCAD (PE = 8.01; MSE = 9.91) and B-Hard (PE = 8.60; MSE = 11.04). These results demonstrate that A-CoCoLasso not only balances the trade-off between correctly identifying covariates and excluding noise variables but also provides robust estimation and prediction accuracy under various settings with additive measurement errors.
Example 2.
To investigate the performance of the methods under ultra-high-dimensional settings with additive measurement errors, we adopted a similar setting to that in [7]. The coefficient vector was specified as . The sample size, dimensionality, and noise level were set as , , and , respectively, reflecting an ultra-high-dimensional setting. The variability in the additive errors was characterized by standard deviation values of and . Table 3 summarizes the performance of the five methods under this setting.
Table 3.
Means and standard errors (in parentheses) of four performance metrics for five methods under additive error cases over 100 replications in the ultra-high-dimensional autoregressive structure.
Table 3 presents the performance of the five methods under ultra-high-dimensional settings with additive measurement errors for and . In terms of the number of correctly identified covariates (C), A-CoCoLasso achieved a competitive performance compared to CoCoLasso and B-SCAD, with values close to the ideal benchmark of 4 under (C = 3.95), and slightly reduced performance under (C = 3.56). In contrast, Hard demonstrated considerably lower values for C (3.14 for and 2.36 for ), indicating weaker variable selection ability.
When examining the number of incorrectly identified covariates (IC), A-CoCoLasso substantially outperformed CoCoLasso and B-SCAD, maintaining much lower IC values (12.7 for and 11.05 for ) compared to CoCoLasso (31.32 and 24.66, respectively) and B-SCAD (21.69 and 21.84, respectively). Hard achieved the smallest IC but at the cost of reduced C, highlighting its conservative nature. In terms of PE and MSE, A-CoCoLasso remained competitive, with PE = 0.90 and MSE = 1.55 for , and PE = 1.21 and MSE = 2.01 for , showing comparable or better results than Hard and B-Hard while maintaining a balanced variable selection performance.
4.2. Multiplicative Error Cases
Example 3.
We evaluated the performance of Adaptive CoCoLasso and other competing methods, including CoCoLasso, Hard, and balanced estimation, under multiplicative measurement errors. The true model remained the same as in the additive error setup, as described in Example 1. To simulate the multiplicative errors, we generated , where the components of followed a log-normal distribution. Specifically, independently followed the same distribution as , with and . Table 4 and Table 5 present the outcomes for the multiplicative error scenarios.
Table 4.
Means and standard errors (in parentheses) of four performance metrics for five methods under multiplicative error cases over 100 replications in the autoregressive structure.
Table 5.
Means and standard errors (in parentheses) of four performance metrics for five methods under multiplicative error cases over 100 replications in the compound symmetric structure.
The results in Table 4 and Table 5 demonstrate that A-CoCoLasso exhibits strong performance under multiplicative measurement errors across both autoregressive and compound symmetric structures. Compared to the other methods, A-CoCoLasso achieved a desirable balance between correctly identifying covariates and maintaining low false discovery rates while also demonstrating competitive prediction and estimation accuracy. Its robustness under varying levels of multiplicative error ( and ) further highlights its adaptability and effectiveness in handling challenging high-dimensional scenarios with multiplicative measurement errors.
Example 4.
We examined the performance of Adaptive CoCoLasso in ultra-high-dimensional settings with multiplicative measurement errors. To maintain comparability with the additive error scenarios, the simulation setup remained largely consistent with that in Example 2, except that the standard deviation values of the multiplicative errors were specified as and , ensuring a comparable signal-to-noise ratio. The performance of the five methods is summarized in Table 6.
Table 6.
Means and standard errors (in parentheses) of four performance metrics for five methods under multiplicative error cases over 100 replications in the ultra-high-dimensional autoregressive structure.
Table 6 presents the performance of five methods under ultra-high-dimensional settings with multiplicative measurement errors for and . The results demonstrate that A-CoCoLasso achieved a desirable balance between correctly identifying covariates (C) and maintaining a low number of incorrectly identified covariates (IC) while delivering competitive prediction and estimation accuracy. For , A-CoCoLasso shows robust performance, with C = 4.18 and IC = 14.55, outperforming CoCoLasso in terms of IC (34.79) while maintaining similar predictive performance (PE = 0.78 vs. PE = 1.21 for CoCoLasso). As increased to 0.2, A-CoCoLasso remained effective with C = 3.57 and IC = 4.26, again demonstrating a significant reduction in IC compared to CoCoLasso (26.81). Additionally, A-CoCoLasso achieved comparable PE and MSE values to the best-performing methods, such as B-SCAD, while demonstrating better variable selection than Hard. Overall, these results highlight A-CoCoLasso’s ability to effectively balance variable selection and prediction accuracy under challenging ultra-high-dimensional multiplicative error settings.
5. Discussion
This paper introduces the Adaptive CoCoLasso estimator, designed to balance prediction accuracy and feature selection in high-dimensional linear regression with measurement errors, effectively addressing both additive and multiplicative cases. The proposed method combines two key techniques: the nearest positive semi-definite projection matrix, which corrects for measurement errors in the surrogate Gram matrix, and an adaptively weighted $\ell_1$ penalty, which enhances sparsity and variable selection by assigning data-driven weights to the coefficients. Unlike combined $\ell_1$ and concave regularization, which introduces computational challenges due to its non-convex nature and difficulties in tuning-parameter selection, the Adaptive CoCoLasso retains the computational efficiency of convex optimization while providing robust estimation performance. The methodology leverages the LARS algorithm to solve the weighted $\ell_1$-penalized optimization problem, ensuring scalability to high-dimensional settings. The theoretical analysis and simulation results show that the Adaptive CoCoLasso achieves robust prediction and estimation performance, effectively addressing overfitting and the challenges posed by contaminated data.
Future work could focus on extending the Adaptive CoCoLasso estimator to address statistical inference challenges, such as constructing confidence intervals and performing hypothesis testing. A major difficulty in these extensions arises from the unknown true covariate matrix, which not only makes predicting the response vector challenging but also hinders accurate noise level estimation, even with a reliable coefficient estimator. These issues fall outside the scope of this paper and represent intriguing directions for future research.
Funding
This work was supported by the National Key R&D Program of China (Grant 2022YFA1008000), Natural Science Foundation of China (Grants 72071187, 11671374, 71731010, and 71921001), and Fundamental Research Funds for the Central Universities (Grants WK3470000017 and WK2040000027).
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the author.
Conflicts of Interest
The author declares that they have no conflict of interest.
Appendix A
Appendix A.1. Proof of Theorem 1
Proof.
Denote the estimation error by , where is the global minimizer defined in the Adaptive CoCoLasso method. For simplicity of notation, we let denote , where represents the weighted norm. Based on the definition of in (6), the following inequality holds
By simple calculations and the definition of the norm, we have
For better clarity, the proof will be divided into six steps, deriving bounds on the prediction based on the inequality above.
Step 1. We derive a bound for the term . By applying the triangle inequality, we obtain
For the first part of the inequality (A2) on the right-hand side, , Condition 1 implies that
Applying the union bound,
Thus, we have Next, we derive a bound for the second term . Note that
Under the assumption that , invoking Lemma A1, we have the probability bound
for some constants and . Therefore, with high probability, The third component can be bounded as follows. Using Condition 1, for all , we have
where and are constants. Combining Lemma A2, we have
For the sparse vector with support size , we additionally obtain
Using and assuming , it follows that
Combining the three parts above, we have
Since the sparsity s dominates in high-dimensional settings, the final bound is
Step 2. Decompose the error term into components on the support set and its complement , so The weighted norms can be written as
Substituting these into inequality (A1) yields
For the first term on the left-hand side of (A4), we have
where and . By applying Condition 2 for any satisfying , we have
where is the restricted eigenvalue constant. For the error term , Condition 1 ensures that
We can bound as follows: From the sparsity assumption, . Applying the Cauchy–Schwarz inequality where , the size of the support set, we obtain
Combining the bounds for and , we have
When , the second term is asymptotically negligible. Redefining the restricted eigenvalue constant as , we ensure that the first term on the left-hand side of (A4) satisfies
Bound the first term on the right-hand side of (A4) and we have
Using the bound in (A3) we have
For the sparsity structure of , . Under the sparsity constraint , we obtain Further, by the Cauchy–Schwarz inequality we have
Simplifying, this yields
Substituting bounds (A5) and
into inequality (A4), we obtain
Step 3. To bound on the left-hand side of inequality (A6), we observe that where the weights are defined based on the initial estimator , with for . Under Condition 4, the initial estimator satisfies , and the sparsity assumption implies that for . By the consistency of , we have for , leading to . Thus, the penalty term in the objective function (6) becomes Under Condition 3, as becomes large for , the penalty ensures as . To bound on the right-hand side of (A6), we note that the weighted norm satisfies where , and denotes the initial estimator. Under Condition 4, the weights satisfy for since is consistent and . Thus, the weighted norm satisfies for some constant . Substituting, we obtain
Step 4. We derive the norm bound for the estimation error . Using the Cauchy–Schwarz inequality, the norm and norm satisfy , where is the sparsity level. Substituting this, inequality (A7) can be rewritten as
Factoring out , we have
For , the term in parentheses must be non-positive. We have
Substituting , we obtain Simplifying, the norm of the error satisfies where is a constant depending on . Hence, we obtain
where and is some positive constant.
Step 5. We derive the norm bound for the estimation error . The norm of decomposes as . By the sparsity assumption, , we have By the Cauchy–Schwarz inequality, the norm on the support set S satisfies , where is the sparsity level. Substituting the norm bound from inequality (A8), , where depends on , n, and p. Substituting into the inequality for , we have . Combining the bounds for and , the norm satisfies . Defining , we conclude
where , depends on , s, n, and p.
Step 6. We aim to show that correctly identifies the support set of the true regression coefficients , such that For a vector , we have From inequality (A8), the norm of the error satisfies Then, we conclude that For each , the minimum signal strength, Condition 3 implies With high probability, the estimation error satisfies For , combining this with the minimum signal strength, Condition 3 provides For sufficiently large n, where , this ensures . For , the estimation error simplifies to . The norm error bound satisfies , which implies . In addition, by Condition 3, there is . This implies . This leads to a contradiction. Therefore, for . Combining the cases for and , we conclude that □
Appendix A.2. Proof of Lemmas
Lemma A1.
Let denote a fixed design matrix and represent an n-dimensional error vector, where is the identity matrix. For any , we define . Under these settings, the following inequality holds
Proof.
Given that , for and , we have
where is the j-th column of . Using the union bound,
Since ,
Thus,
Simplifying,
The proof of Lemma A1 is now complete.□
Lemma A2.
For any ,
Proof.
From the inequality
it follows that The result is obtained by applying the union bound over for all . □
References
- Bickel, P.J.; Ritov, Y.; Tsybakov, A.B. Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 2009, 37, 1705–1732. [Google Scholar] [CrossRef]
- Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar] [CrossRef]
- Candes, E.; Tao, T. The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Stat. 2007, 35, 2313–2351. [Google Scholar]
- Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat. 2004, 32, 407–499. [Google Scholar] [CrossRef]
- Fan, J.; Feng, Y.; Wu, Y. Network exploration via the adaptive Lasso and SCAD penalties. Ann. Appl. Stat. 2009, 3, 521. [Google Scholar] [CrossRef]
- Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
- Fan, Y.; Lv, J. Asymptotic properties for combined L1 and concave regularization. Biometrika 2014, 101, 57–70. [Google Scholar] [CrossRef]
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
- Kong, Y.; Zheng, Z.; Lv, J. The constrained Dantzig selector with enhanced consistency. J. Mach. Learn. Res. 2016, 17, 4205–4226. [Google Scholar]
- Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288. [Google Scholar] [CrossRef]
- Wright, S.J. Coordinate descent algorithms. Math. Program. 2015, 151, 3–34. [Google Scholar] [CrossRef]
- Zou, H. The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef]
- Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2005, 67, 301–320. [Google Scholar] [CrossRef]
- Liang, H.; Li, R. Variable selection for partially linear models with measurement errors. J. Am. Stat. Assoc. 2009, 104, 234–248. [Google Scholar] [CrossRef] [PubMed]
- Loh, P.L.; Wainwright, M.J. High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity. Ann. Stat. 2012, 40, 1637–1664. [Google Scholar] [CrossRef]
- Datta, A.; Zou, H. CoCoLasso for high-dimensional error-in-variables regression. Ann. Stat. 2017, 45, 2400–2426. [Google Scholar] [CrossRef]
- Zhao, P.; Yu, B. On model selection consistency of Lasso. J. Mach. Learn. Res. 2006, 7, 2541–2563. [Google Scholar]
- Zheng, Z.; Li, Y.; Yu, C.; Li, G. Balanced estimation for high-dimensional measurement error models. Comput. Stat. Data Anal. 2018, 126, 78–91. [Google Scholar] [CrossRef]
- Huang, J.; Ma, S.; Zhang, C.H. Adaptive Lasso for sparse high-dimensional regression models. Stat. Sin. 2008, 18, 1603–1618. [Google Scholar]
- van de Geer, S.A.; Bühlmann, P. On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 2009, 3, 1360–1392. [Google Scholar] [CrossRef]
- Zhang, C.H.; Huang, J. The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Stat. 2008, 36, 1567–1594. [Google Scholar] [CrossRef]
- Bühlmann, P.; van de Geer, S. Statistics for High-Dimensional Data: Methods, Theory, and Applications; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
- Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).