Abstract
A significant portion of theoretical and empirical work on high-dimensional regression has concentrated on clean datasets. In many practical scenarios, however, data are corrupted by missing values and measurement errors that cannot be ignored. Despite substantial progress in high-dimensional regression with contaminated covariates, methods that achieve an effective trade-off among prediction accuracy, feature selection, and computational efficiency remain underexplored. We introduce the adaptive convex conditioned Lasso (Adaptive CoCoLasso), a new approach for high-dimensional linear models with error-prone measurements. The estimator combines a projection onto the nearest positive semi-definite matrix with an adaptively weighted $\ell_1$ penalty. Theoretical guarantees are provided by establishing error bounds for the estimator. Results from the synthetic data analysis indicate that the Adaptive CoCoLasso achieves strong prediction accuracy and low mean squared error, particularly in scenarios involving both additive and multiplicative measurement noise. While the Adaptive CoCoLasso is comparable to, or slightly outperformed by, certain methods such as Hard in reducing the number of incorrectly identified covariates, its strength lies in offering a more favorable trade-off between prediction accuracy and sparse modeling.
1. Introduction
High-dimensional statistical learning has found extensive applications across diverse fields, including artificial intelligence, genomics, molecular biology, and economics. Numerous effective methods leveraging sparse learning through regularization have been developed to facilitate statistical inference in high-dimensional settings. These methods are well-documented in various studies, such as [1,2,3,4,5,6,7,8,9,10,11,12,13], among others. However, most of the previous research has focused on error-free data. In practice, measurement errors are prevalent in applications such as surveys with missing or inaccurate data due to non-responses, voting systems affected by imprecise instruments or systematic biases, and sensor networks corrupted by communication failures or environmental interference. Challenges involving noisy, incomplete, or corrupted data are frequently encountered. Naively applying methods designed for clean datasets to those affected by measurement errors can lead to inconsistent and imprecise estimates, which in turn result in inaccurate conclusions, particularly in high-dimensional settings. Therefore, developing robust methods for model selection and estimation that explicitly account for measurement errors in high-dimensional problems is of paramount importance.
In recent years, sparse modeling in high-dimensional models with measurement errors has garnered widespread attention. For instance, Ref. [14] proposed minimizing regularized least squares while accounting for additive measurement errors in the covariate matrices of partially linear models. In high-dimensional linear sparse regression, Ref. [15] developed a Lasso-type estimator that utilizes an unbiased approximation to replace the corrupted Gram matrix. However, incorporating measurement error information often leads to non-convex likelihood functions, complicating the solution of the associated optimization problems. To address this challenge, Ref. [16] proposed the nearest positive semi-definite projection matrix as an approximation for the unbiased Gram matrix estimate. Using this matrix as a foundation, they introduced the convex conditioned Lasso (CoCoLasso), which reformulates the objective function as a convex optimization problem to facilitate efficient sparse learning in error-prone high-dimensional linear models.
Although CoCoLasso demonstrates superior computational efficiency due to its convex optimization framework, the $\ell_1$ penalty imposes the same level of shrinkage on all coefficients, which can introduce bias. This often results in overfitting, with an overly complex model selected to minimize prediction error [1,12,17]. To address the bias and overfitting introduced by the $\ell_1$ penalty, Ref. [18] proposed balanced estimation, which is based on the nearest positive semi-definite matrix and incorporates combined $\ell_1$ and concave regularization. Although balanced estimation achieves an appealing trade-off between prediction accuracy and variable selection, it suffers from certain limitations. First, the non-convex nature of the concave regularization increases computational complexity, making its application in high-dimensional settings difficult. Second, the selection of tuning parameters for the concave penalty is often difficult and may lead to suboptimal performance in practice.
To address these issues, we propose the Adaptive CoCoLasso, which combines the nearest positive semi-definite projection with an adaptively weighted $\ell_1$ penalty. The Adaptive CoCoLasso estimator preserves the computational efficiency of convex optimization while achieving precise estimation and feature selection in the presence of both additive and multiplicative measurement errors. By imposing heavier penalties on coefficients that are truly zero and lighter penalties on nonzero coefficients, the Adaptive CoCoLasso reduces estimation bias and enhances variable selection accuracy. Furthermore, error bounds for the Adaptive CoCoLasso estimator are established, together with a theorem guaranteeing consistency of support recovery.
This paper makes two primary contributions. First, we propose the Adaptive CoCoLasso estimator for high-dimensional linear regression models in which the design matrix is affected by measurement errors, aiming to ensure precise estimation and accurate variable selection. By applying stronger penalties to zero coefficients and weaker penalties to nonzero coefficients, the method effectively mitigates overfitting under both additive and multiplicative measurement errors. Second, we establish theoretical guarantees for the proposed method by deriving oracle inequalities for the prediction and estimation errors and proving consistency of support recovery. Extensive simulation studies demonstrate the effectiveness of our approach.
The structure of this paper is as follows. Section 2 outlines the model setup and introduces the proposed Adaptive CoCoLasso estimator. Section 3 presents the theoretical properties, including oracle bounds on the estimation errors. Section 4 evaluates the finite-sample performance of the proposed method through simulation studies. All proofs are provided in Appendix A.
Notation 1.
For a vector $v = (v_1, \ldots, v_p)^\top$, the $\ell_q$ norm is defined as $\|v\|_q = (\sum_{j=1}^{p} |v_j|^q)^{1/q}$ for $1 \le q < \infty$, and the $\ell_\infty$ norm is given by $\|v\|_\infty = \max_{1 \le j \le p} |v_j|$. For a matrix $A = (a_{ij})$, the following matrix norms are used: the element-wise maximum norm $\|A\|_{\max} = \max_{i,j} |a_{ij}|$, the operator norms $\|A\|_1 = \max_j \sum_i |a_{ij}|$ and $\|A\|_\infty = \max_i \sum_j |a_{ij}|$, and the spectral norm $\|A\|_2 = \{\lambda_{\max}(A^\top A)\}^{1/2}$, where $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote the smallest and largest eigenvalues of $A$, respectively.
2. Adaptive CoCoLasso for Error-Prone Models
2.1. Model Setting
Consider the high-dimensional linear regression model
$y = X\beta^* + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n), \qquad (1)$
where $y$ represents the $n$-dimensional response vector, $X \in \mathbb{R}^{n \times p}$ denotes the fixed design matrix, $\beta^*$ is the unknown $p$-dimensional regression coefficient vector, $\varepsilon$ is an $n$-dimensional error vector independent of $X$, and $I_n$ is the $n \times n$ identity matrix (the Gaussian distribution is assumed for simplicity of analysis; similar theoretical results hold under a sub-Gaussian assumption provided that the tail probability of $\varepsilon$ decays exponentially). Measurement errors in the design matrix are common in various applications, so that a corrupted covariate matrix $W$ is observed rather than the true matrix $X$.
Two classical cases are associated with measurement errors in the design matrix $X$. In the additive error case, the observed covariates are represented as $W = X + A$, where the rows of the additive error matrix $A$ are independently and identically distributed (i.i.d.) with mean vector $0$ and covariance matrix $\Sigma_A$. In the multiplicative error case, the observed covariates follow $W = X \odot M$, where $\odot$ denotes the Hadamard product and the rows of the multiplicative error matrix $M$ are i.i.d. with mean vector $\mu_M$ and covariance matrix $\Sigma_M$. Missing data can be treated as a special case of this model, where the entries of $M$ are Bernoulli random variables with success probability $1 - \pi_j$, representing the probability of observing the $j$-th covariate, and $\pi_j$ denotes the missingness rate for the $j$-th covariate. To ensure model identifiability, the covariance matrix $\Sigma_A$ (for additive errors) or the pair $(\mu_M, \Sigma_M)$ (for multiplicative errors) is assumed to be known, as in [16,18].
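For illustration, the two corruption mechanisms (with missing data as a special case of the multiplicative model) can be simulated as in the following R sketch; the dimensions, the noise level sigma_A, the log-normal scale, and the missingness rate are hypothetical values chosen only for this illustration, not the settings used in our experiments.

set.seed(1)
n <- 100; p <- 200                              # illustrative dimensions
X <- matrix(rnorm(n * p), n, p)                 # clean (unobserved) design

## Additive errors: W = X + A, rows of A i.i.d. with mean 0 and covariance Sigma_A
sigma_A <- 0.3                                  # hypothetical error level
A <- matrix(rnorm(n * p, sd = sigma_A), n, p)   # here Sigma_A = sigma_A^2 * I_p
W_add <- X + A

## Multiplicative errors: W = X * M element-wise (Hadamard product)
M <- matrix(rlnorm(n * p, meanlog = 0, sdlog = 0.2), n, p)  # log-normal factors
W_mult <- X * M

## Missing data as a special case: entries of M are Bernoulli(1 - pi_j)
pi_miss <- 0.1                                  # hypothetical missingness rate
M_obs <- matrix(rbinom(n * p, 1, 1 - pi_miss), n, p)
W_miss <- X * M_obs                             # unobserved entries appear as zero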
2.2. Adaptive CoCoLasso
In high-dimensional settings where the dimensionality $p$ exceeds the sample size $n$, the true coefficient vector $\beta^*$ is often assumed to be sparse. Specifically, the support set $S = \{j : \beta_j^* \ne 0\}$, representing the indices of the truly relevant predictors, has size $s = |S|$ satisfying $s \ll n$. This sparsity assumption ensures model identifiability by requiring that only a small subset of predictors be nonzero. Let $S^c$ denote the complementary set of $S$. In the context of clean data, penalized least squares methods are widely employed for sparse estimation of the true coefficient vector in high-dimensional linear models. The least squares loss depends on the data only through $\Sigma = X^\top X / n$ and $\rho = X^\top y / n$, where $\Sigma$ represents the Gram matrix and $\rho$ denotes the marginal correlation vector between the covariates and the response, respectively. When the covariate matrix is affected by errors, Ref. [15] proposed unbiased estimators $\hat{\Sigma}$ and $\hat{\rho}$ to approximate the unobservable quantities $\Sigma$ and $\rho$. Specifically, these estimators can be expressed as
$\hat{\Sigma}_{\mathrm{add}} = \frac{W^\top W}{n} - \Sigma_A, \qquad \hat{\rho}_{\mathrm{add}} = \frac{W^\top y}{n}, \qquad (2)$
for the additive error cases and
$\hat{\Sigma}_{\mathrm{mult}} = \frac{W^\top W}{n} \oslash \left(\Sigma_M + \mu_M \mu_M^\top\right), \qquad \hat{\rho}_{\mathrm{mult}} = \frac{W^\top y}{n} \oslash \mu_M, \qquad (3)$
for the multiplicative error cases, where $\oslash$ denotes element-wise division.
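Assuming the surrogate forms written in (2) and (3), the unbiased estimators can be computed directly from the observed data, as in the following R sketch; the arguments Sigma_A, mu_M, and Sigma_M are the (assumed known) measurement error parameters.

## Unbiased surrogates for the Gram matrix and the marginal correlation vector.
## Additive errors, Equation (2): Sigma_hat = W'W/n - Sigma_A, rho_hat = W'y/n.
surrogate_additive <- function(W, y, Sigma_A) {
  n <- nrow(W)
  list(Sigma = crossprod(W) / n - Sigma_A,
       rho   = drop(crossprod(W, y)) / n)
}

## Multiplicative errors, Equation (3):
## Sigma_hat = (W'W/n) ./ (Sigma_M + mu_M mu_M'), rho_hat = (W'y/n) ./ mu_M.
surrogate_multiplicative <- function(W, y, mu_M, Sigma_M) {
  n <- nrow(W)
  list(Sigma = (crossprod(W) / n) / (Sigma_M + tcrossprod(mu_M)),
       rho   = drop(crossprod(W, y)) / (n * mu_M))
}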
However, the unbiased surrogate $\hat{\Sigma}$ is generally not positive semi-definite in high-dimensional scenarios. Consequently, $\hat{\Sigma}$ may possess negative eigenvalues, so the quadratic term $\beta^\top \hat{\Sigma} \beta$ lacks a lower bound and the loss function loses convexity. To resolve this problem, the unbiased surrogate is replaced by its nearest positive semi-definite projection matrix, defined as
$\tilde{\Sigma} = \arg\min_{\Sigma \succeq 0} \|\Sigma - \hat{\Sigma}\|_{\max},$
which can be solved efficiently using the alternating direction method of multipliers (ADMM). By this definition and the triangle inequality, it follows that
$\|\tilde{\Sigma} - \Sigma\|_{\max} \le \|\tilde{\Sigma} - \hat{\Sigma}\|_{\max} + \|\hat{\Sigma} - \Sigma\|_{\max} \le 2\|\hat{\Sigma} - \Sigma\|_{\max}, \qquad (4)$
indicating that $\tilde{\Sigma}$ serves as an approximation to $\Sigma$ with element-wise accuracy comparable to that of the unbiased estimate $\hat{\Sigma}$.
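The element-wise maximum-norm projection above is computed with ADMM in [16]; as a minimal stand-in, the following R sketch uses the simpler Frobenius-norm projection obtained by clipping negative eigenvalues at a small floor. This already restores convexity of the loss, but it is only an illustrative surrogate for the projection actually studied in the paper.

## Frobenius-norm projection onto the positive semi-definite cone (illustrative).
nearest_psd <- function(Sigma_hat, eps = 1e-4) {
  Sigma_sym <- (Sigma_hat + t(Sigma_hat)) / 2   # symmetrize first
  ed <- eigen(Sigma_sym, symmetric = TRUE)
  lam <- pmax(ed$values, eps)                   # clip negative eigenvalues
  ed$vectors %*% (lam * t(ed$vectors))          # reassemble V diag(lam) V'
}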
Following the construction of $\tilde{\Sigma}$ and $\hat{\rho}$, Ref. [16] introduced the CoCoLasso method, which employs the $\ell_1$ penalty for regularization. However, the $\ell_1$ penalty often selects overly large models to minimize prediction risk. Motivated by [19], we adopt a weighted $\ell_1$ penalty to develop a convex objective function, where the weights are determined by an initial estimator. Suppose $\hat{\beta}^{\mathrm{ini}}$ is an initial estimator of $\beta^*$, which provides preliminary estimates of the regression coefficients. Based on this initial estimator, we define the weight vector $w = (w_1, \ldots, w_p)^\top$, where $w_j = 1 / |\hat{\beta}_j^{\mathrm{ini}}|$ for $j = 1, \ldots, p$. These weights enable the penalty to adapt to the relative importance of each variable, assigning larger penalties to coefficients with smaller initial estimates and smaller penalties to coefficients with larger initial estimates. Specifically, the proposed Adaptive CoCoLasso estimator is defined as the optimal solution to the following optimization problem, computed after obtaining $\hat{\Sigma}$ and $\hat{\rho}$ as provided in (2) and (3) and projecting $\hat{\Sigma}$ onto $\tilde{\Sigma}$:
$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{2}\beta^\top \tilde{\Sigma} \beta - \hat{\rho}^\top \beta + \lambda \sum_{j=1}^{p} w_j |\beta_j| \right\}, \qquad (5)$
where $\lambda = c_0 \sqrt{\log p / n}$ for some positive constant $c_0$. Let $\tilde{\Sigma}^{1/2}$ be the Cholesky factor of $\tilde{\Sigma}$, such that $\tilde{\Sigma} = (\tilde{\Sigma}^{1/2})^\top \tilde{\Sigma}^{1/2}$, and define $\tilde{y}$ to satisfy $(\tilde{\Sigma}^{1/2})^\top \tilde{y} = \hat{\rho}$. Then, the proposed Adaptive CoCoLasso estimator defined in (5) can be reformulated equivalently as the global minimizer of the following optimization problem
$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{2}\left\|\tilde{y} - \tilde{\Sigma}^{1/2}\beta\right\|_2^2 + \lambda \sum_{j=1}^{p} w_j |\beta_j| \right\}. \qquad (6)$
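To see why (5) and (6) are equivalent in the forms written above, expand the quadratic term in (6):
$\frac{1}{2}\left\|\tilde{y} - \tilde{\Sigma}^{1/2}\beta\right\|_2^2 = \frac{1}{2}\beta^\top \tilde{\Sigma} \beta - \tilde{y}^\top \tilde{\Sigma}^{1/2} \beta + \frac{1}{2}\|\tilde{y}\|_2^2 = \frac{1}{2}\beta^\top \tilde{\Sigma} \beta - \hat{\rho}^\top \beta + \frac{1}{2}\|\tilde{y}\|_2^2,$
where the second equality uses $(\tilde{\Sigma}^{1/2})^\top \tilde{y} = \hat{\rho}$ and $\tilde{\Sigma} = (\tilde{\Sigma}^{1/2})^\top \tilde{\Sigma}^{1/2}$; the constant $\frac{1}{2}\|\tilde{y}\|_2^2$ does not depend on $\beta$ and therefore does not affect the minimizer.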
For comparison, CoCoLasso solves the following optimization problem:
$\hat{\beta}_{\mathrm{CoCo}} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{2}\beta^\top \tilde{\Sigma} \beta - \hat{\rho}^\top \beta + \lambda \|\beta\|_1 \right\}.$
Here, the $\ell_1$ penalty term introduces sparsity by shrinking some coefficients to exactly zero, effectively performing variable selection. However, since the same penalty is applied to all coefficients, it tends to introduce bias, particularly for larger coefficients. Unlike CoCoLasso, the Adaptive CoCoLasso incorporates the data-driven weights $w_j = 1/|\hat{\beta}_j^{\mathrm{ini}}|$ into the penalty term, adjusting the penalty to reflect the relative importance of each variable. This weighting scheme enables the Adaptive CoCoLasso to handle cases where variables have vastly different scales or signal strengths. Assigning smaller penalties to variables with larger estimated coefficients avoids over-penalizing important predictors, improving both variable selection accuracy and coefficient estimation. In addition, the Adaptive CoCoLasso enhances the recovery of weak signals and reduces the bias introduced by the uniform penalty in standard CoCoLasso.
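To make the optimization concrete, the following R sketch implements coordinate descent for the weighted problem in (5), operating directly on the projected surrogate pair; the weight rule 1/|beta_init_j| (floored to avoid division by zero), the tolerance, and the iteration cap are illustrative choices rather than the settings used in our experiments, where the LARS algorithm is employed instead.

## Coordinate descent for 0.5 * b' Sigma b - rho' b + lambda * sum(w_j * |b_j|),
## with Sigma positive semi-definite so that the objective is convex.
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

adaptive_cocolasso <- function(Sigma, rho, lambda, beta_init,
                               max_iter = 500, tol = 1e-6) {
  p <- length(rho)
  w <- 1 / pmax(abs(beta_init), 1e-6)   # adaptive weights from the initial estimator
  beta <- numeric(p)
  for (iter in seq_len(max_iter)) {
    beta_old <- beta
    for (j in seq_len(p)) {
      z_j <- rho[j] - sum(Sigma[j, -j] * beta[-j])   # partial residual for coordinate j
      beta[j] <- soft_threshold(z_j, lambda * w[j]) / Sigma[j, j]
    }
    if (max(abs(beta - beta_old)) < tol) break
  }
  beta
}

With the sketches above, one would pass the output of nearest_psd() as Sigma, the surrogate rho from (2) or (3), and a CoCoLasso fit as beta_init.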
3. Theoretical Properties
In this section, we rigorously derive statistical error bounds for the Adaptive CoCoLasso estimator under the $\ell_2$ and $\ell_1$ norms and establish theoretical guarantees for exact support recovery with high probability. Before presenting the theoretical results, we outline four technical conditions.
Condition 1.
The distributions of $\hat{\Sigma}$ and $\hat{\rho}$ are identified by a set of parameters $\theta$. Then, there exist generic constants $C$ and $c$ and positive functions $\xi$ and $\zeta$ depending on $\theta$, $\Sigma$, and $\beta^*$ such that, for every $\varepsilon > 0$, $\hat{\Sigma}$ and $\hat{\rho}$ satisfy the following probability statements:
$\max_{j,k} P\left( |\hat{\Sigma}_{jk} - \Sigma_{jk}| \ge \varepsilon \right) \le C \exp\left(-\frac{c\, n\, \varepsilon^2}{\xi^2}\right), \qquad \max_{j} P\left( |\hat{\rho}_{j} - \rho_{j}| \ge \varepsilon \right) \le C \exp\left(-\frac{c\, n\, \varepsilon^2}{\zeta^2}\right).$
Condition 2.
For some positive constant κ, assume
$\delta^\top \Sigma\, \delta \ge \kappa \|\delta\|_2^2 \quad \text{for all } \delta \in \mathcal{C},$
where $\mathcal{C} = \{\delta \in \mathbb{R}^p \setminus \{0\} : \|\delta_{S^c}\|_1 \le 3\|\delta_S\|_1\}$, with $\delta_S$ and $\delta_{S^c}$ representing the subvectors corresponding to the support set $S$ and its complement $S^c$, respectively.
Condition 3.
The minimum signal strength satisfies $\min_{j \in S} |\beta_j^*| \ge c_1 \sqrt{s}\,\lambda$, where $c_1$ is a positive constant.
Condition 4.
The initial estimator satisfies $\|\hat{\beta}^{\mathrm{ini}} - \beta^*\|_2 = O_p(r_n)$ with $r_n \to 0$.
Condition 1, known as the closeness condition proposed by [16], requires that the surrogates $\hat{\Sigma}$ (and consequently $\tilde{\Sigma}$) and $\hat{\rho}$ achieve sufficient element-wise closeness to $\Sigma$ and $\rho$, respectively. This condition has already been verified in [16] for typical additive and multiplicative measurement error cases, with $\hat{\Sigma}$ and $\hat{\rho}$ defined in Equations (2) and (3). Condition 2, the restricted eigenvalue (RE) condition, ensures the stability and non-degeneracy of the design on sparse predictor subsets. A similar RE condition was used in [20] to derive statistical error bounds for the clean-data Lasso estimator. Condition 3 specifies the minimum signal strength, ensuring that the true signal is large enough relative to the regularization parameter to distinguish significant predictors from noise. This condition is commonly assumed in high-dimensional regression settings to guarantee consistent variable selection and accurate estimation [21,22].
Condition 4 ensures that the initial estimator approximates the true parameter with an error rate of $r_n$, providing the accuracy needed to construct adaptive weights that capture the underlying sparsity and improve the efficiency of the Adaptive CoCoLasso estimator. For clean covariates, commonly used initial estimators include the Lasso [10], which leverages sparsity in high-dimensional settings, and ridge regression [23], which addresses multicollinearity effectively. Another widely used approach is the marginal regression estimator, which achieves zero-consistency under a partial orthogonality condition and is obtained by fitting a univariate regression for each predictor separately [19]. When the design matrix contains measurement errors, CoCoLasso can serve as the initial estimator, as it is specifically designed for measurement error models. Alternatively, the estimator introduced in [15], which provides theoretical guarantees for high-dimensional regression with noisy or missing data, can act as the initial estimator.
Theorem 1.
Under Conditions 1–4, the Adaptive CoCoLasso estimator $\hat{\beta}$ satisfies the following oracle inequalities with probability at least $1 - O(p^{-c_2})$ for some positive constant $c_2$:
$\|\hat{\beta} - \beta^*\|_2 \lesssim \sqrt{\frac{s \log p}{n}}, \qquad \|\hat{\beta} - \beta^*\|_1 \lesssim s\sqrt{\frac{\log p}{n}}.$
Here, $\lambda \asymp \sqrt{\log p / n}$, and the constants hidden in the $\lesssim$ notation depend on the restricted eigenvalue constant κ, the noise variance $\sigma^2$, and the probabilistic bounds specified in Condition 1.
Moreover, with high probability, it holds that
$\mathrm{supp}(\hat{\beta}) = \mathrm{supp}(\beta^*),$
where $\mathrm{supp}(\beta^*) = S$ denotes the support set of $\beta^*$, representing the indices of its nonzero components. Similarly, $\mathrm{supp}(\hat{\beta})$ represents the support set of the Adaptive CoCoLasso estimator $\hat{\beta}$.
Theorem 1 establishes the theoretical properties of the Adaptive CoCoLasso estimator under high-dimensional linear models with measurement errors. Specifically, it provides oracle inequalities for the estimation errors in both the $\ell_2$ and $\ell_1$ norms, showing that the errors scale with $\sqrt{s \log p / n}$ and $s\sqrt{\log p / n}$, respectively. The constants in these bounds depend on key quantities such as the restricted eigenvalue constant, the noise variance, and the probabilistic bounds in Condition 1. The tail probability is also influenced by the measurement errors, as captured by the functions $\xi$ and $\zeta$ in Condition 1. Additionally, the theorem guarantees consistent support recovery, meaning that the true set of relevant predictors is identified with high probability as the sample size $n$ and dimensionality $p$ grow. All proofs are provided in Appendix A.
4. Numerical Studies
In this section, we use synthetic datasets to evaluate the finite-sample performance of the Adaptive CoCoLasso (A-CoCoLasso) estimator. The comparison includes several alternative estimators: CoCoLasso [16], balanced estimation with combined $\ell_1$ and smoothly clipped absolute deviation regularization (B-SCAD), balanced estimation with combined $\ell_1$ and hard-thresholding regularization (B-Hard) [18], and the standalone hard-thresholding method (Hard). The Adaptive CoCoLasso weights were computed from the CoCoLasso regression coefficients. The CoCoLasso and Adaptive CoCoLasso estimators were implemented using the LARS algorithm. All simulation studies were performed in R, covering both additive and multiplicative measurement errors. In all numerical experiments, the penalty parameter was selected through 10-fold cross-validation.
To evaluate the aforementioned estimators, we employed the performance metrics introduced in [18]. The first two metrics are the number of correctly selected covariates (C) and the number of incorrectly selected covariates (IC), defined as $C = |\hat{S} \cap S|$ and $IC = |\hat{S} \cap S^c|$, respectively, where $\hat{S}$ denotes the support of the fitted estimator. The third and fourth metrics are the prediction error (PE) and the mean squared error (MSE) of the coefficient estimates. These metrics collectively assess feature selection accuracy through C and IC while evaluating predictive and estimation performance via PE and MSE, thus providing a comprehensive comparison framework.
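The following R sketch computes the four metrics for a fitted coefficient vector; since the precise formulas of [18] are not reproduced here, it adopts one common convention as an assumption, namely PE evaluated through the covariance of the true covariates and MSE taken as the squared $\ell_2$ estimation error.

## Selection and accuracy metrics for a fitted coefficient vector beta_hat.
selection_metrics <- function(beta_hat, beta_true, Sigma_X) {
  S_hat  <- which(beta_hat  != 0)
  S_true <- which(beta_true != 0)
  diff   <- beta_hat - beta_true
  c(C   = length(intersect(S_hat, S_true)),         # correctly selected covariates
    IC  = length(setdiff(S_hat, S_true)),           # incorrectly selected covariates
    PE  = as.numeric(t(diff) %*% Sigma_X %*% diff), # assumed prediction-error form
    MSE = sum(diff^2))                              # assumed squared estimation error
}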
4.1. Additive Error Cases
Example 1.
We followed the simulation setup in [18] and generated 100 datasets, each containing observations from the linear model , with , , and . The components of and were independently sampled from a multivariate normal distribution . We considered two covariance structures for : the autoregressive structure, where , and the compound symmetry structure, where . The contaminated covariates were obtained as , where the rows of were independently drawn from with and , respectively. The results for the five estimators are summarized in Table 1 and Table 2.
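For concreteness, the two covariance structures and the corresponding data draw can be set up as in the R sketch below; the correlation value 0.5, the dimensions, the three nonzero coefficients, and the unit noise level are placeholders, since the exact values of Example 1 are not reproduced here.

## Autoregressive and compound-symmetry covariance structures (placeholder rho = 0.5).
ar1_cov <- function(p, rho = 0.5) rho^abs(outer(1:p, 1:p, "-"))
cs_cov  <- function(p, rho = 0.5) { S <- matrix(rho, p, p); diag(S) <- 1; S }

n <- 100; p <- 500                        # placeholder dimensions
beta_true <- c(3, 1.5, 2, rep(0, p - 3))  # placeholder sparse coefficient vector
Sigma_X <- ar1_cov(p)                     # or cs_cov(p) for compound symmetry
R <- chol(Sigma_X)                        # Sigma_X = t(R) %*% R
X <- matrix(rnorm(n * p), n, p) %*% R     # rows of X ~ N(0, Sigma_X)
y <- drop(X %*% beta_true + rnorm(n))     # unit error standard deviation (placeholder)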
Table 1.
Means and standard errors (in parentheses) of four performance metrics for five methods under additive error cases over 100 replications in the autoregressive structure.
Table 2.
Means and standard errors (in parentheses) of four performance metrics for five methods under additive error cases over 100 replications in the compound symmetric structure.
The results in Table 1 and Table 2 highlight the comparative performance of the five methods under additive measurement errors in both the autoregressive and compound symmetric structures. Compared with CoCoLasso, A-CoCoLasso retained a similar number of correctly selected covariates (C) while markedly reducing the number of incorrectly selected covariates (IC). For example, in the autoregressive structure at one noise level, A-CoCoLasso attained C = 2.93, close to the ideal value of 3 and essentially matching CoCoLasso (C = 2.94), while significantly reducing IC from 12.54 (CoCoLasso) to 3.89. Furthermore, A-CoCoLasso achieved a better prediction error (PE = 1.68) and mean squared error (MSE = 2.01) than CoCoLasso (PE = 3.65; MSE = 3.64). Similarly, compared with Hard, A-CoCoLasso achieved a higher C (2.93 vs. 2.27) and better overall estimation and prediction performance. These results indicate that A-CoCoLasso provides more accurate variable selection and estimation than both CoCoLasso and Hard.
In comparison to the balanced estimation methods (B-SCAD and B-Hard), A-CoCoLasso also demonstrated superior performance, particularly in reducing IC while maintaining C closer to the ideal value. For example, in the compound symmetric structure with , A-CoCoLasso achieved C = 2.46, outperforming both B-SCAD (C = 2.07) and B-Hard (C = 1.64). Additionally, A-CoCoLasso maintained a competitive IC of 12.88, which is lower than that of B-SCAD (IC = 13.59), while achieving better prediction and estimation accuracy (PE = 6.80; MSE = 8.39) compared to both B-SCAD (PE = 8.01; MSE = 9.91) and B-Hard (PE = 8.60; MSE = 11.04). These results demonstrate that A-CoCoLasso not only balances the trade-off between correctly identifying covariates and excluding noise variables but also provides robust estimation and prediction accuracy under various settings with additive measurement errors.
Example 2.
To investigate the performance of the methods under ultra-high-dimensional settings with additive measurement errors, we adopted a similar setting to that in [7]. The coefficient vector was specified as . The sample size, dimensionality, and noise level were set as , , and , respectively, reflecting an ultra-high-dimensional setting. The variability in the additive errors was characterized by standard deviation values of and . Table 3 summarizes the performance of the five methods under this setting.
Table 3.
Means and standard errors (in parentheses) of four performance metrics for five methods under additive error cases over 100 replications in the ultra-high-dimensional autoregressive structure.
Table 3 presents the performance of the five methods under ultra-high-dimensional settings with additive measurement errors for and . In terms of the number of correctly identified covariates (C), A-CoCoLasso achieved a competitive performance compared to CoCoLasso and B-SCAD, with values close to the ideal benchmark of 4 under (C = 3.95), and slightly reduced performance under (C = 3.56). In contrast, Hard demonstrated considerably lower values for C (3.14 for and 2.36 for ), indicating weaker variable selection ability.
When examining the number of incorrectly identified covariates (IC), A-CoCoLasso substantially outperformed CoCoLasso and B-SCAD, maintaining much lower IC values (12.7 for and 11.05 for ) compared to CoCoLasso (31.32 and 24.66, respectively) and B-SCAD (21.69 and 21.84, respectively). Hard achieved the smallest IC but at the cost of reduced C, highlighting its conservative nature. In terms of PE and MSE, A-CoCoLasso remained competitive, with PE = 0.90 and MSE = 1.55 for , and PE = 1.21 and MSE = 2.01 for , showing comparable or better results than Hard and B-Hard while maintaining a balanced variable selection performance.
4.2. Multiplicative Error Cases
Example 3.
We evaluated the performance of Adaptive CoCoLasso and other competing methods, including CoCoLasso, Hard, and balanced estimation, under multiplicative measurement errors. The true model remained the same as in the additive error setup, as described in Example 1. To simulate the multiplicative errors, we generated , where the components of followed a log-normal distribution. Specifically, independently followed the same distribution as , with and . Table 4 and Table 5 present the outcomes for the multiplicative error scenarios.
Table 4.
Means and standard errors (in parentheses) of four performance metrics for five methods under multiplicative error cases over 100 replications in the autoregressive structure.
Table 5.
Means and standard errors (in parentheses) of four performance metrics for five methods under multiplicative error cases over 100 replications in the compound symmetric structure.
The results in Table 4 and Table 5 demonstrate that A-CoCoLasso exhibits strong performance under multiplicative measurement errors across both autoregressive and compound symmetric structures. Compared to the other methods, A-CoCoLasso achieved a desirable balance between correctly identifying covariates and maintaining low false discovery rates while also demonstrating competitive prediction and estimation accuracy. Its robustness under varying levels of multiplicative error ( and ) further highlights its adaptability and effectiveness in handling challenging high-dimensional scenarios with multiplicative measurement errors.
Example 4.
We examined the performance of Adaptive CoCoLasso in ultra-high-dimensional settings with multiplicative measurement errors. To maintain comparability with the additive error scenarios, the simulation setup remained largely consistent with that in Example 2, except that the standard deviation values of the multiplicative errors were specified as and , ensuring a comparable signal-to-noise ratio. The performance of the five methods is summarized in Table 6.
Table 6.
Means and standard errors (in parentheses) of four performance metrics for five methods under multiplicative error cases over 100 replications in the ultra-high-dimensional autoregressive structure.
Table 6 presents the performance of five methods under ultra-high-dimensional settings with multiplicative measurement errors for and . The results demonstrate that A-CoCoLasso achieved a desirable balance between correctly identifying covariates (C) and maintaining a low number of incorrectly identified covariates (IC) while delivering competitive prediction and estimation accuracy. For , A-CoCoLasso shows robust performance, with C = 4.18 and IC = 14.55, outperforming CoCoLasso in terms of IC (34.79) while maintaining similar predictive performance (PE = 0.78 vs. PE = 1.21 for CoCoLasso). As increased to 0.2, A-CoCoLasso remained effective with C = 3.57 and IC = 4.26, again demonstrating a significant reduction in IC compared to CoCoLasso (26.81). Additionally, A-CoCoLasso achieved comparable PE and MSE values to the best-performing methods, such as B-SCAD, while demonstrating better variable selection than Hard. Overall, these results highlight A-CoCoLasso’s ability to effectively balance variable selection and prediction accuracy under challenging ultra-high-dimensional multiplicative error settings.
5. Discussion
This paper introduces the Adaptive CoCoLasso estimator, designed to balance prediction accuracy and feature selection in high-dimensional linear regression with measurement errors, effectively addressing both additive and multiplicative cases. The proposed method combines two key techniques: the nearest positive semi-definite projection matrix, which corrects for measurement errors in the surrogate Gram matrix, and an adaptively weighted $\ell_1$ penalty, which enhances sparsity and variable selection by assigning data-driven weights to the coefficients. Unlike combined $\ell_1$ and concave regularization, which introduces computational challenges due to its non-convex nature and difficulties in tuning-parameter selection, the Adaptive CoCoLasso retains the computational efficiency of convex optimization while providing robust estimation performance. The methodology leverages the LARS algorithm to solve the weighted $\ell_1$-penalized optimization problem, ensuring scalability to high-dimensional settings. The theoretical analysis and simulation results show that the Adaptive CoCoLasso achieves robust prediction and estimation performance, effectively addressing overfitting and the challenges posed by contaminated data.
Future work could focus on extending the Adaptive CoCoLasso estimator to address statistical inference challenges, such as constructing confidence intervals and performing hypothesis testing. A major difficulty in these extensions arises from the unknown true covariate matrix, which not only makes predicting the response vector challenging but also hinders accurate noise level estimation, even with a reliable coefficient estimator. These issues fall outside the scope of this paper and represent intriguing directions for future research.
Funding
This work was supported by the National Key R&D Program of China (Grant 2022YFA1008000), Natural Science Foundation of China (Grants 72071187, 11671374, 71731010, and 71921001), and Fundamental Research Funds for the Central Universities (Grants WK3470000017 and WK2040000027).
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the author.
Conflicts of Interest
The author declares that they have no conflict of interest.
Appendix A
Appendix A.1. Proof of Theorem 1
Proof.
Denote the estimation error by , where is the global minimizer defined in the Adaptive CoCoLasso method. For simplicity of notation, we let denote , where represents the weighted norm. Based on the definition of in (6), the following inequality holds
By simple calculations and the definition of the norm, we have
For better clarity, the proof will be divided into six steps, deriving bounds on the prediction based on the inequality above.
Step 1. We derive a bound for the term . By applying the triangle inequality, we obtain
For the first part of the inequality (A2) on the right-hand side, , Condition 1 implies that
Applying the union bound,
Thus, we have Next, we derive a bound for the second term . Note that
Under the assumption that , invoking Lemma A1, we have the probability bound
for some constants and . Therefore, with high probability, The third component can be bounded as follows. Using Condition 1, for all , we have
where and are constants. Combining Lemma A2, we have
For the sparse vector with support size , we additionally obtain
Using and assuming , it follows that
Combining the three parts above, we have
Since the sparsity s dominates in high-dimensional settings, the final bound is
Step 2. Decompose the error term into components on the support set and its complement , so The weighted norms can be written as
Substituting these into inequality (A1) yields
For the first term on the left-hand side of (A4), we have
where and . By applying Condition 2 for any satisfying , we have
where is the restricted eigenvalue constant. For the error term , Condition 1 ensures that
We can bound as follows: From the sparsity assumption, . Applying the Cauchy–Schwarz inequality where , the size of the support set, we obtain
Combining the bounds for and , we have
When , the second term is asymptotically negligible. Redefining the restricted eigenvalue constant as , we ensure that the first term on the left-hand side of (A4) satisfies
Bound the first term on the right-hand side of (A4) and we have
Using the bound in (A3) we have
For the sparsity structure of , . Under the sparsity constraint , we obtain Further, by the Cauchy–Schwarz inequality we have
Simplifying, this yields
Substituting bounds (A5) and
into inequality (A4), we obtain
Step 3. To bound on the left-hand side of inequality (A6), we observe that where the weights are defined based on the initial estimator , with for . Under Condition 4, the initial estimator satisfies , and the sparsity assumption implies that for . By the consistency of , we have for , leading to . Thus, the penalty term in the objective function (6) becomes Under Condition 3, as becomes large for , the penalty ensures as . To bound on the right-hand side of (A6), we note that the weighted norm satisfies where , and denotes the initial estimator. Under Condition 4, the weights satisfy for since is consistent and . Thus, the weighted norm satisfies for some constant . Substituting, we obtain
Step 4. We derive the norm bound for the estimation error . Using the Cauchy–Schwarz inequality, the norm and norm satisfy , where is the sparsity level. Substituting this, inequality (A7) can be rewritten as
Factoring out , we have
For , the term in parentheses must be non-positive. We have
Substituting , we obtain Simplifying, the norm of the error satisfies where is a constant depending on . Hence, we obtain
where and is some positive constant.
Step 5. We derive the norm bound for the estimation error . The norm of decomposes as . By the sparsity assumption, , we have By the Cauchy–Schwarz inequality, the norm on the support set S satisfies , where is the sparsity level. Substituting the norm bound from inequality (A8), , where depends on , n, and p. Substituting into the inequality for , we have . Combining the bounds for and , the norm satisfies . Defining , we conclude
where , depends on , s, n, and p.
Step 6. We aim to show that correctly identifies the support set of the true regression coefficients , such that For a vector , we have From inequality (A8), the norm of the error satisfies Then, we conclude that For each , the minimum signal strength, Condition 3 implies With high probability, the estimation error satisfies For , combining this with the minimum signal strength, Condition 3 provides For sufficiently large n, where , this ensures . For , the estimation error simplifies to . The norm error bound satisfies , which implies . In addition, by Condition 3, there is . This implies . This leads to a contradiction. Therefore, for . Combining the cases for and , we conclude that □
Appendix A.2. Proof of Lemmas
Lemma A1.
Let denote a fixed design matrix and represent an n-dimensional error vector, where is the identity matrix. For any , we define . Under these settings, the following inequality holds
Proof.
Given that , for and , we have
where is the j-th column of . Using the union bound,
Since ,
Thus,
Simplifying,
The proof of Lemma A1 is now complete.□
Lemma A2.
For any ,
Proof.
From the inequality
it follows that The result is obtained by applying the union bound over for all . □
References
- Bickel, P.J.; Ritov, Y.; Tsybakov, A.B. Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 2009, 37, 1705–1732. [Google Scholar] [CrossRef]
- Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar] [CrossRef]
- Candes, E.; Tao, T. The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Stat. 2007, 35, 2313–2351. [Google Scholar]
- Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat. 2004, 32, 407–499. [Google Scholar] [CrossRef]
- Fan, J.; Feng, Y.; Wu, Y. Network exploration via the adaptive Lasso and SCAD penalties. Ann. Appl. Stat. 2009, 3, 521. [Google Scholar] [CrossRef]
- Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
- Fan, Y.; Lv, J. Asymptotic properties for combined L1 and concave regularization. Biometrika 2014, 101, 57–70. [Google Scholar] [CrossRef]
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
- Kong, Y.; Zheng, Z.; Lv, J. The constrained Dantzig selector with enhanced consistency. J. Mach. Learn. Res. 2016, 17, 4205–4226. [Google Scholar]
- Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288. [Google Scholar] [CrossRef]
- Wright, S.J. Coordinate descent algorithms. Math. Program. 2015, 151, 3–34. [Google Scholar] [CrossRef]
- Zou, H. The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef]
- Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2005, 67, 301–320. [Google Scholar] [CrossRef]
- Liang, H.; Li, R. Variable selection for partially linear models with measurement errors. J. Am. Stat. Assoc. 2009, 104, 234–248. [Google Scholar] [CrossRef] [PubMed]
- Loh, P.L.; Wainwright, M.J. High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity. Ann. Stat. 2012, 40, 1637–1664. [Google Scholar] [CrossRef]
- Datta, A.; Zou, H. CoCoLasso for high-dimensional error-in-variables regression. Ann. Stat. 2017, 45, 2400–2426. [Google Scholar] [CrossRef]
- Zhao, P.; Yu, B. On model selection consistency of Lasso. J. Mach. Learn. Res. 2006, 7, 2541–2563. [Google Scholar]
- Zheng, Z.; Li, Y.; Yu, C.; Li, G. Balanced estimation for high-dimensional measurement error models. Comput. Stat. Data Anal. 2018, 126, 78–91. [Google Scholar] [CrossRef]
- Huang, J.; Ma, S.; Zhang, C.H. Adaptive Lasso for sparse high-dimensional regression models. Stat. Sin. 2008, 18, 1603–1618. [Google Scholar]
- van de Geer, S.A.; Bühlmann, P. On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 2009, 3, 1360–1392. [Google Scholar] [CrossRef]
- Zhang, C.H.; Huang, J. The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Stat. 2008, 36, 1567–1594. [Google Scholar] [CrossRef]
- Bühlmann, P.; van de Geer, S. Statistics for High-Dimensional Data: Methods, Theory, and Applications; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
- Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).