Article

High-Dimensional Regression Adjustment Estimation for Average Treatment Effect with Highly Correlated Covariates

1 School of Statistics, Beijing Normal University, Beijing 100875, China
2 School of Statistics and Data Science, Nanjing Audit University, Nanjing 211815, China
3 School of Mathematical Science, Shanxi University, Taiyuan 030006, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(24), 4715; https://doi.org/10.3390/math10244715
Submission received: 8 November 2022 / Revised: 5 December 2022 / Accepted: 10 December 2022 / Published: 12 December 2022
(This article belongs to the Special Issue Computational Statistics and Data Analysis)

Abstract:
Regression adjustment is often used to estimate the average treatment effect (ATE) in randomized experiments. Recently, several penalty-based regression adjustment methods have been proposed to handle high-dimensional settings. However, these existing methods may fail to achieve satisfactory performance when the covariates are highly correlated. In this paper, we propose a novel adjustment estimation method for the ATE by combining the semi-standard partial covariance (SPAC) and regression adjustment methods. Under some regularity conditions, the asymptotic normality of the proposed SPAC adjustment ATE estimator is established. Simulation studies and an analysis of HER2 breast cancer data illustrate the advantage of the proposed SPAC adjustment method in addressing highly correlated covariates in the Rubin causal model.

1. Introduction

With the rapid development of information technology, massive amounts of data are now collected in many fields, such as genomics, biomedicine and aerography, where the dimension of the covariates p often far exceeds the sample size n. Despite the promising application prospects, high-dimensional data pose many problems and challenges for statistical inference. For instance, the sample covariance matrix is huge and noninvertible when p > n, and unimportant covariates can be highly correlated with the response variable simply because they are associated with the important covariates ([1]). To deal with these challenges, many penalty-based approaches have been proposed to select important covariates and estimate the unknown parameters simultaneously, including the Lasso ([2]), SCAD ([3]) and Elastic-net ([4]) penalties. This literature mainly considers regression models and marginal correlations between the covariates and the response variable.
In some cases, marginal correlations cannot fully depict the influence mechanisms among variables. Researchers have therefore studied causal relations among variables and developed the Rubin causal (Neyman–Rubin) model (see [5,6]); details can be found in Refs. [7,8]. For high-dimensional data, Refs. [9,10] suggested that standard high-dimensional penalty-based methods can be used to estimate the average treatment effect (ATE). Ref. [11] developed a risk-consistent regression adjustment approach for the ATE using the Lasso penalty of [2]. Ref. [12] proposed a Lasso-adjusted ATE estimator by combining the Lasso penalty with the regression adjustment method, and showed that it can reduce the variance of the unadjusted ATE estimator of [6]. Ref. [13] further considered the multicollinearity problem in high dimensions and proposed an Elastic-net adjustment method for the ATE.
However, in high-dimensional settings the correlations between important and unimportant covariates are often higher than those among the important covariates themselves (see [14,15]). In this case, the irrepresentable condition ([16]) can fail, so that Lasso-type penalty methods may fail to correctly estimate the signs of the coefficients and to distinguish the important covariates from the unimportant ones; the corresponding adjustment ATE estimator may then perform poorly. Many researchers have considered the highly correlated problem in high dimensions and provided effective variable selection methods. For example, Ref. [17] proposed the Peter–Clark-simple (PC-simple) algorithm, which selects important covariates using partial correlations, and Ref. [18] developed the factor-adjusted regularized model selection (Farm-Select) method. Ref. [19] proposed the semi-standard partial covariance (SPAC) method, which effectively selects covariates that have a direct effect on the response variable, and showed that SPAC outperforms the PC-simple and Farm-Select methods when the original irrepresentable condition of Ref. [16] fails. Nevertheless, these variable selection methods have not yet been applied in the field of causal inference.
In this paper, we consider the estimation of the ATE in the Rubin causal model with highly correlated covariates. The main contributions of this paper are four-fold. Firstly, the SPAC adjustment estimator is developed by a novel combination of the SPAC variable selection and regression adjustment methods. Secondly, our framework extends that of [19] to causal inference and that of [12] to the highly correlated setting. Thirdly, the asymptotic normality of the proposed estimator is established under some regularity conditions. Fourthly, the proposed SPAC adjustment method performs satisfactorily, as demonstrated by the numerical results of a real data analysis and several simulation studies.
The rest of this article is organized as follows. In Section 2, the SPAC adjustment method for the ATE is proposed for the Rubin causal model with highly correlated covariates in high dimensions, and the asymptotic property of the proposed SPAC-Lasso adjustment estimator is developed under some regularity conditions. In Section 3, simulation studies are conducted to assess the effectiveness of the proposed SPAC adjustment method. In Section 4, the proposed estimation approach is applied to an HER2 breast cancer dataset. Concluding remarks are provided in Section 5. Appendix A collects the lemmas needed for the proof of the theorem.
Notation 1.
For ease of presentation, we introduce the following notation. For any column vector $u = (u_1, \dots, u_p)^T$ and a subset $S \subseteq \{1, \dots, p\}$, let $\|u\|_1 = \sum_{j=1}^p |u_j|$, $\|u\|_2^2 = \sum_{j=1}^p u_j^2$ and $\|u\|_\infty = \max_{j = 1, \dots, p} |u_j|$; $u_S = \{u_j : j \in S\}$; $S^C$ denotes the complement of $S$, and $|S|$ denotes the cardinality of $S$. For a matrix $D$, $D^T$ and $D^{-1}$ denote its transpose and inverse, respectively. The notation "$\stackrel{L}{\longrightarrow}$" denotes convergence in distribution.

2. Methodology and Theoretical Property

2.1. SPAC Adjustment Method for ATE

We frame our analysis in terms of the Rubin causal model. Let $i = 1, \dots, n$ index the units in a population of size $n$, $Y_i$ be the potential outcome variable, and $x_i = (x_{i1}, \dots, x_{ip})^T \in \mathbb{R}^p$ be the $p$-dimensional covariate vector, where $p$ far exceeds the sample size $n$. The full design matrix of the experiment is $X = (x_1, \dots, x_n)^T$, and each covariate $X_j = (x_{1j}, \dots, x_{nj})^T$ ($j = 1, \dots, p$) is standardized so that $X_j^T X_j = n$ and $\sum_{i=1}^n x_{ij} = 0$. The observed data $x_i$ ($i = 1, \dots, n$) can be viewed as independent and identically distributed (i.i.d.) draws from a distribution with mean $0$ and positive definite covariance matrix $\Sigma_{p \times p}$, all of whose diagonal elements equal $1$. Each unit is randomly assigned to the treatment group or the control group, and the treatment indicator is denoted by $T_i$, with $T_i = 1$ for a treated individual and $T_i = 0$ otherwise. Then, the observed potential outcome for individual $i$ is
$$Y_i^{\mathrm{obs}} = T_i Y_i(1) + (1 - T_i) Y_i(0), \qquad (1)$$
where $Y_i(1)$ and $Y_i(0)$ are the potential outcomes under treatment and control, respectively; that is, $Y_i^{\mathrm{obs}} = Y_i(1)$ if $T_i = 1$ and $Y_i^{\mathrm{obs}} = Y_i(0)$ if $T_i = 0$. The numbers of treated and control units are $n_A = |A|$ and $n_B = |B|$, respectively, where $A = \{i \in \{1, \dots, n\} : T_i = 1\}$, $B = \{i \in \{1, \dots, n\} : T_i = 0\}$, and $n_A + n_B = n$.
In randomized experiments, the sample is often not randomly drawn from the population (superpopulation) of interest (see [12,13,20]). In this paper, we focus on the finite-sample ATE, defined as
$$\tau = \bar{Y}_1 - \bar{Y}_0,$$
where $\bar{Y}_1 = n^{-1} \sum_{i=1}^n Y_i(1)$ and $\bar{Y}_0 = n^{-1} \sum_{i=1}^n Y_i(0)$ are the average responses if all individuals receive treatment or control, respectively. Clearly, the whole-population averages $\bar{Y}_1$ and $\bar{Y}_0$ are fixed. Replacing the population averages $\bar{Y}_s$ ($s = 0, 1$) with the corresponding sample averages yields the natural unadjusted ATE estimator
$$\hat{\tau}_{\mathrm{unadj}} = \frac{1}{n_A} \sum_{i \in A} Y_i(1) - \frac{1}{n_B} \sum_{i \in B} Y_i(0). \qquad (2)$$
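As a concrete illustration, the difference-in-means estimator in (2) is a one-liner in R (the language of the packages used in Section 3); the names `Yobs` and `Tind` below are ours, not from the paper.

```r
## A minimal sketch of the unadjusted estimator (2).
## `Yobs`: observed outcomes; `Tind`: 0/1 treatment indicator (names are ours).
ate_unadj <- function(Yobs, Tind) {
  mean(Yobs[Tind == 1]) - mean(Yobs[Tind == 0])
}
```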
As pointed out by [12,21,22], the covariate information $x_i$ can often be used to adjust the estimator in (2) in the hope of improving estimation precision. For high-dimensional data, Ref. [12] proposed the following Lasso-adjusted ATE estimator:
$$\hat{\tau}_{\mathrm{Lasso}} = \left[ \bar{Y}_A - (\bar{x}_A - \bar{x})^T \hat{\beta}^A_{\mathrm{Lasso}} \right] - \left[ \bar{Y}_B - (\bar{x}_B - \bar{x})^T \hat{\beta}^B_{\mathrm{Lasso}} \right], \qquad (3)$$
where $\bar{Y}_A = n_A^{-1} \sum_{i \in A} Y_i(1)$, $\bar{Y}_B = n_B^{-1} \sum_{i \in B} Y_i(0)$, $\bar{x}_A = n_A^{-1} \sum_{i \in A} x_i$, $\bar{x}_B = n_B^{-1} \sum_{i \in B} x_i$, $\bar{x} = n^{-1} \sum_{i=1}^n x_i$, and the terms $\bar{x}_w - \bar{x}$ for $w = A, B$ capture the fluctuations of the subsample covariate means around the full-sample mean. The adjustment vectors $\hat{\beta}^w_{\mathrm{Lasso}}$ are obtained from the Lasso-penalized problems
$$\hat{\beta}^w_{\mathrm{Lasso}} = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \frac{1}{2 n_w} \sum_{i \in w} \left[ Y_i^{\mathrm{obs}} - \bar{Y}_w - (x_i - \bar{x}_w)^T \beta \right]^2 + \lambda_w \sum_{j=1}^p |\beta_j|, \quad w = A, B, \qquad (4)$$
where $\lambda_w > 0$ are the Lasso regularization parameters.
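For concreteness, the following R sketch computes (3)-(4) with the "glmnet" package used in Section 3, choosing each $\lambda_w$ by 10-fold cross-validation as in the simulations. It is one possible implementation under our own naming, not the authors' code.

```r
## A hedged sketch of the Lasso-adjusted estimator (3)-(4); names are ours.
library(glmnet)

ate_lasso_adjust <- function(Yobs, Tind, X) {
  xbar <- colMeans(X)                         # full-sample covariate means
  adjusted_mean <- function(arm) {
    Xw <- X[Tind == arm, , drop = FALSE]
    Yw <- Yobs[Tind == arm]
    # Solve (4): Lasso on centered within-arm data, lambda_w by 10-fold CV
    fit  <- cv.glmnet(scale(Xw, center = TRUE, scale = FALSE),
                      Yw - mean(Yw), alpha = 1, nfolds = 10)
    beta <- as.numeric(coef(fit, s = "lambda.min"))[-1]    # drop intercept
    # One arm's term in (3): arm mean minus covariate-fluctuation correction
    mean(Yw) - sum((colMeans(Xw) - xbar) * beta)
  }
  adjusted_mean(1) - adjusted_mean(0)
}
```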
However, traditional penalty-based methods fail to estimate the signs and select the important covariates effectively when the important and unimportant covariates are highly correlated (see [19] for details); this is especially critical in high-dimensional settings. To overcome this problem, the SPAC method was proposed to capture the signal strengths of the important covariates while eliminating the effects of covariates that are not directly related to the potential outcome variable $Y$ but are highly correlated with the important covariates. The SPAC between $Y^{\mathrm{obs}}$ and the $j$-th covariate $X_j$ is defined as
$$\gamma_j = \beta_j / d_{jj}^{1/2}, \quad j = 1, \dots, p, \qquad (5)$$
where $d_{jj}$ is the $j$-th diagonal element of the precision matrix $\Sigma^{-1}$, and $1 / d_{jj}^{1/2} = \{\mathrm{Var}(X_j \mid X_{-j})\}^{1/2} = (1 - R_j^2)^{1/2}$ (see Refs. [23,24]), where $X_{-j} = \{X_k : k = 1, \dots, j-1, j+1, \dots, p\}$ and $R_j$ denotes the multiple correlation between the $j$-th covariate $X_j$ and all the other covariates. In particular, $\gamma_j$ coincides with $\beta_j$ if $X_j$ is independent of the other covariates; otherwise, the SPAC $\gamma_j$ mitigates the effect of correlations among the covariates by multiplying $\beta_j$ by $(1 - R_j^2)^{1/2}$. Obviously, $\beta_j = 0$ if and only if $\gamma_j = 0$ for $j = 1, \dots, p$. Hence, the SPAC estimator of the adjustment vector can be obtained by replacing $\beta_j$ in (4) with $\gamma_j$:
$$\hat{\gamma}^w_{\mathrm{SPAC\text{-}Lasso}} = \operatorname*{argmin}_{\gamma \in \mathbb{R}^p} \frac{1}{2 n_w} \sum_{i \in w} \left[ Y_i^{\mathrm{obs}} - \bar{Y}_w - (x_i - \bar{x}_w)^T \hat{D} \gamma \right]^2 + \lambda_w \sum_{j=1}^p \hat{d}_{jj} |\gamma_j|, \qquad (6)$$
where $w = A, B$, $\hat{D} = \mathrm{diag}\{\hat{d}_{11}^{1/2}, \dots, \hat{d}_{pp}^{1/2}\}$, and $\hat{d}_{jj}$ is a consistent estimator of the $j$-th diagonal element of the precision matrix. In practice, $\hat{d}_{jj}$ can be obtained from the constrained $L_1$-minimization estimator (CLIME, [25]), the residual variance estimator ([26]), or the robust matrix estimator ([27]). Consequently, by (5), the adjustment vectors $\hat{\beta}^w_{\mathrm{SPAC\text{-}Lasso}}$ are given by
$$\hat{\beta}^w_{\mathrm{SPAC\text{-}Lasso}} = \hat{D} \hat{\gamma}^w_{\mathrm{SPAC\text{-}Lasso}}, \quad w = A, B. \qquad (7)$$
Then, the SPAC-Lasso adjustment estimator of the ATE is defined as
$$\hat{\tau}_{\mathrm{SPAC\text{-}Lasso}} = \left[ \bar{Y}_A - (\bar{x}_A - \bar{x})^T \hat{\beta}^A_{\mathrm{SPAC\text{-}Lasso}} \right] - \left[ \bar{Y}_B - (\bar{x}_B - \bar{x})^T \hat{\beta}^B_{\mathrm{SPAC\text{-}Lasso}} \right]. \qquad (8)$$
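The estimator (8) can be assembled in a few lines of R. In the sketch below, $\hat{d}_{jj}$ is computed by a nodewise-Lasso residual-variance estimate, one of the options listed after (6) (the paper's own experiments use CLIME via "fastclime"), and the weighted penalty in (6) is passed to glmnet through `penalty.factor`. All function and variable names are ours.

```r
## A sketch of the SPAC-Lasso adjustment (5)-(8); illustrative names throughout.
library(glmnet)

## d_jj = 1/Var(X_j | X_-j), estimated by nodewise-Lasso residual variances
## (a substitute for the CLIME estimator the paper actually uses).
diag_precision <- function(X) {
  sapply(seq_len(ncol(X)), function(j) {
    fit <- cv.glmnet(X[, -j], X[, j], alpha = 1)
    res <- X[, j] - predict(fit, newx = X[, -j], s = "lambda.min")
    1 / mean(res^2)
  })
}

ate_spac_lasso <- function(Yobs, Tind, X, d = diag_precision(X)) {
  xbar <- colMeans(X)
  Dhat <- sqrt(d)                          # D-hat = diag(d_jj^{1/2})
  adjusted_mean <- function(arm) {
    Xw <- X[Tind == arm, , drop = FALSE]
    Yw <- Yobs[Tind == arm]
    Xc <- scale(Xw, center = TRUE, scale = FALSE)
    # Solve (6): Lasso in gamma with design (x_i - xbar_w)^T D-hat and
    # covariate-specific penalty weights d_jj
    fit   <- cv.glmnet(sweep(Xc, 2, Dhat, `*`), Yw - mean(Yw),
                       alpha = 1, penalty.factor = d, standardize = FALSE)
    gamma <- as.numeric(coef(fit, s = "lambda.min"))[-1]
    beta  <- Dhat * gamma                  # back-transform (7)
    mean(Yw) - sum((colMeans(Xw) - xbar) * beta)   # one arm's term in (8)
  }
  adjusted_mean(1) - adjusted_mean(0)
}
```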
Similarly, the SPAC-SCAD estimator of the ATE can be obtained by using the SCAD penalty in (6). In the simulation studies below, the performance of our proposed SPAC adjustment methods (SPAC-Lasso and SPAC-SCAD) is compared with that of the existing ATE estimation methods (unadjusted, Lasso-adjusted, SCAD-adjusted, and Elastic-net adjusted); the theoretical property of the SPAC-Lasso adjustment estimator is established in the next subsection.

2.2. Regularity Conditions and Theoretical Property

For the Rubin causal model in randomized experiments, no assumptions are imposed on the relationship between the potential outcome variable $Y$ and the covariates $x$. To study the theoretical property of the proposed estimator $\hat{\tau}_{\mathrm{SPAC\text{-}Lasso}}$, we introduce the following linear decomposition and the notion of approximate sparsity, similar to those in [12].
Decomposition of the potential outcomes. Each potential outcome can be decomposed into a linear term in the covariates plus an error term:
$$Y_i(1) = \bar{Y}_1 + (x_i - \bar{x})^T \beta^A + e_i^A, \quad Y_i(0) = \bar{Y}_0 + (x_i - \bar{x})^T \beta^B + e_i^B, \quad i = 1, \dots, n, \qquad (9)$$
where $\bar{x} = n^{-1} \sum_{i=1}^n x_i$, and $\beta^A$ and $\beta^B$ are $p$-dimensional coefficient vectors. In the decomposition (9), all quantities are fixed, deterministic numbers, and $\bar{e^A} = \bar{e^B} = 0$, where $\bar{e^A} = n^{-1} \sum_{i=1}^n e_i^A$ and $\bar{e^B} = n^{-1} \sum_{i=1}^n e_i^B$.
Definition 1.
Similarly to [12,13], we define the approximate sparsity measures $s_{\lambda_A}$ and $s_{\lambda_B}$ for the treatment and control groups as
$$s_{\lambda_A} = \sum_{j=1}^p \min\left\{ \frac{|\beta_j^A|}{\lambda_A},\, 1 \right\}, \quad s_{\lambda_B} = \sum_{j=1}^p \min\left\{ \frac{|\beta_j^B|}{\lambda_B},\, 1 \right\}, \qquad (10)$$
which are more flexible than the strict sparsity $s_w = |\{ j : \beta_j^w \neq 0 \}|$, $w = A, B$. Both $s_{\lambda_A}$ and $s_{\lambda_B}$ are allowed to grow with $n$, and we write $s_\lambda = \max\{ s_{\lambda_A}, s_{\lambda_B} \}$.
In addition, the following regularity conditions are needed to establish the asymptotic normality of the proposed SPAC-Lasso adjustment estimator.
(C1) $\tilde{p}_A = n_A/n \to p_A$ and $\tilde{p}_B = n_B/n \to p_B$ as $n \to \infty$, where $p_A \in (0, 1)$ and $p_B \in (0, 1)$.
(C2) For $j = 1, \dots, p$, there is a fixed constant $L > 0$ such that $n^{-1} \sum_{i=1}^n (x_{ij} - (\bar{x})_j)^4 \le L$, $n^{-1} \sum_{i=1}^n (e_i^A)^4 \le L$ and $n^{-1} \sum_{i=1}^n (e_i^B)^4 \le L$.
(C3) The eigenvalues of the sample covariance matrix $n^{-1} X^T X$ are bounded away from zero and infinity.
(C4) There exists a constant $B > 0$ such that $\|\beta^A\|_1 \le B$ and $\|\beta^B\|_1 \le B$.
(C5) Let $\delta_n$ be the maximum covariance between the error terms and the covariates,
$$\delta_n = \max_{\omega = A, B} \max_{j} \left| \frac{1}{n} \sum_{i=1}^n \left( x_{ij} - (\bar{x})_j \right) \left( e_i^\omega - \bar{e^\omega} \right) \right|.$$
Assume that $\delta_n = o\big( 1/(s_\lambda \sqrt{\log p}) \big)$ and $(s_\lambda \log p)/\sqrt{n} = o(1)$.
(C6) Let $\Sigma^* = n^{-1} \sum_{i=1}^n \hat{D}^{-1} (x_i - \bar{x})(x_i - \bar{x})^T \hat{D}^{-1}$. There exist constants $C_0 > 0$ and $\xi > 1$ such that
$$\| h_{\gamma^*, S} \|_1 \le C_0 s_\lambda \| \Sigma^* h_{\gamma^*} \|_\infty \quad \text{for all } h_{\gamma^*} \in \mathcal{C},$$
where $\mathcal{C} = \{ h_{\gamma^*} : \| h_{\gamma^*, S^c} \|_1 \le \xi \| h_{\gamma^*, S} \|_1 \}$ and $S = \{ j : |\beta_j^A| > \lambda_A \text{ or } |\beta_j^B| > \lambda_B \}$.
(C7) Let $\nu = \min\{ 1/70, (3\tilde{p}_A)^2/70, (3 - 3\tilde{p}_A)^2/70 \}$. For some constants $c > 0$, $L_0 > 0$, $0 < \eta < (\xi - 1)/(\xi + 1)$ and $1/\eta < M < \infty$, the regularization parameters of the SPAC-Lasso satisfy
$$\lambda_A \in \left( \frac{1}{\eta}, M \right] \times \left\{ \frac{2 c (1 + \nu) L^{1/2}}{\tilde{p}_A L_0} \sqrt{\frac{2 \log p}{n}} + \frac{\delta_n}{L_0} \right\}, \quad \lambda_B \in \left( \frac{1}{\eta}, M \right] \times \left\{ \frac{2 c (1 + \nu) L^{1/2}}{\tilde{p}_B L_0} \sqrt{\frac{2 \log p}{n}} + \frac{\delta_n}{L_0} \right\}.$$
Condition (C1) is a basic assumption on the probabilities of receiving treatment or control. Condition (C2) is a moment condition on $x_{ij}$ and the error terms $e_i^w$ ($w = A, B$), similar to the conditions in [12,21,22]. Conditions (C3) and (C4) are standard regularity conditions in high-dimensional statistical inference (see [12,13,28,29]). Conditions (C5)–(C7) are needed to derive the convergence rate of $\hat{\beta}_{\mathrm{SPAC\text{-}Lasso}}$ and are formulated in terms of the approximate sparsity defined above. These assumptions are similar to those in [12,13] and are weaker than strict sparsity assumptions.
Theorem 1.
Suppose that regularity conditions (C1)–(C7) hold. Then, as $n \to \infty$,
$$\sqrt{n}\left( \hat{\tau}_{\mathrm{SPAC\text{-}Lasso}} - \tau \right) \stackrel{L}{\longrightarrow} N(0, \sigma^2), \qquad (11)$$
where
$$\sigma^2 = \lim_{n \to \infty} \left\{ \frac{1 - p_A}{p_A} \sigma_{e^A}^2 + \frac{p_A}{1 - p_A} \sigma_{e^B}^2 + 2 \sigma_{e^A e^B} \right\}, \qquad (12)$$
and $\sigma_{e^A}^2 = n^{-1} \sum_{i=1}^n (e_i^A)^2$, $\sigma_{e^B}^2 = n^{-1} \sum_{i=1}^n (e_i^B)^2$, and $\sigma_{e^A e^B} = n^{-1} \sum_{i=1}^n e_i^A e_i^B$.
Theorem 1 establishes the asymptotic normality of the proposed SPAC-Lasso adjustment estimator $\hat{\tau}_{\mathrm{SPAC\text{-}Lasso}}$ under highly correlated covariates, based on the approximate sparsity measures and appropriate tuning parameters $\lambda_A$ and $\lambda_B$. Without loss of generality, we assume throughout the proof that $\bar{Y}_1 = 0$, $\bar{Y}_0 = 0$ and $\bar{x} = 0$. The assumptions and the results of Theorem 1 are similar to those in [12,13].
Proof. 
According to the decomposition of $Y_i(1)$ and $Y_i(0)$ in (9), we have
$$\begin{aligned} \sqrt{n}\left( \hat{\tau}_{\mathrm{SPAC\text{-}Lasso}} - \tau \right) &= \sqrt{n}\left( \bar{Y}_A - \bar{x}_A^T \hat{\beta}^A_{\mathrm{SPAC\text{-}Lasso}} \right) - \sqrt{n}\left( \bar{Y}_B - \bar{x}_B^T \hat{\beta}^B_{\mathrm{SPAC\text{-}Lasso}} \right) \\ &= \sqrt{n}\left( \bar{x}_A^T \beta^A + \bar{e}_A - \bar{x}_A^T \hat{\beta}^A_{\mathrm{SPAC\text{-}Lasso}} \right) - \sqrt{n}\left( \bar{x}_B^T \beta^B + \bar{e}_B - \bar{x}_B^T \hat{\beta}^B_{\mathrm{SPAC\text{-}Lasso}} \right) \\ &= \underbrace{\sqrt{n}\left( \bar{e}_A - \bar{e}_B \right)}_{I_1} - \underbrace{\sqrt{n}\left( \bar{x}_A^T h^A - \bar{x}_B^T h^B \right)}_{I_2}, \end{aligned}$$
where $h^A = \hat{\beta}^A_{\mathrm{SPAC\text{-}Lasso}} - \beta^A$, $h^B = \hat{\beta}^B_{\mathrm{SPAC\text{-}Lasso}} - \beta^B$, $\bar{e}_A = n_A^{-1} \sum_{i \in A} e_i^A$ and $\bar{e}_B = n_B^{-1} \sum_{i \in B} e_i^B$. Combining Theorem 1 in [21] and replacing $a$ and $b$ there with $e^A$ and $e^B$, we have $I_1 \stackrel{L}{\longrightarrow} N(0, \sigma^2)$, where $\sigma^2$ is defined in Theorem 1.
By Hölder's inequality, we have
$$\left| \bar{x}_A^T h^A \right| \le \| \bar{x}_A \|_\infty \| h^A \|_1.$$
Invoking Lemma 1 in [13] and conditions (C1)–(C2), we have
$$\| \bar{x}_A \|_\infty = O_p\left( \sqrt{\frac{\log p}{n}} \right).$$
According to (5), we obtain
$$h^A = \hat{\beta}^A_{\mathrm{SPAC\text{-}Lasso}} - \beta^A = \hat{D} \hat{\gamma}^A_{\mathrm{SPAC\text{-}Lasso}} - D \gamma^A = \hat{D} \left( \hat{\gamma}^A_{\mathrm{SPAC\text{-}Lasso}} - \gamma^A \right) + \left( \hat{D} - D \right) \gamma^A = \hat{D} h_\gamma^A + \left( \hat{D} - D \right) \gamma^A,$$
where $h_\gamma^A = \hat{\gamma}^A_{\mathrm{SPAC\text{-}Lasso}} - \gamma^A$ and $D = \mathrm{diag}\{ d_{11}^{1/2}, \dots, d_{pp}^{1/2} \}$.
Using Lemma A3 in Appendix A, we have
$$\| h_\gamma^A \|_1 = o_p\left( \frac{1}{\sqrt{\log p}} \right).$$
Together with the above decomposition of $h^A$ and conditions (C3)–(C4), we have $\| h^A \|_1 = o_p\left( 1/\sqrt{\log p} \right)$. Then,
$$\sqrt{n}\, \bar{x}_A^T h^A = \sqrt{n} \cdot O_p\left( \sqrt{\frac{\log p}{n}} \right) \cdot o_p\left( \frac{1}{\sqrt{\log p}} \right) = o_p(1).$$
Similarly, we can obtain $\sqrt{n}\, \bar{x}_B^T h^B = o_p(1)$. Hence, $I_2 = o_p(1)$. This completes the proof of Theorem 1. □

3. Simulation Studies

In this section, the performance of the proposed SPAC-Lasso and SPAC-SCAD adjustment estimators is evaluated and compared with that of the unadjusted estimator (unadj) and the penalty-based regression adjustment estimators (Lasso, SCAD, Enet). The R package "glmnet" is used to fit the Elastic-net and Lasso. To implement the SCAD and SPAC-SCAD methods, the SCAD parameter $a = 3.7$ is used with the R package "ncvreg" ([3]). The precision matrix is estimated with the R package "fastclime" ([30]). For each regression adjustment method, the tuning parameter is selected by 10-fold cross-validation. All results are based on 2000 simulation replications.
The potential outcomes are generated as follows:
$$Y_i(1) = \sum_{j=1}^p x_{ij} \beta_j + z + e_i^A, \quad Y_i(0) = \sum_{j=1}^p x_{ij} \beta_j + e_i^B, \quad i = 1, \dots, n, \qquad (13)$$
where $n = 250$; $p = 500, 1000$ and $2000$; $z \sim U(0, 2)$; $\beta = (\beta_1, \dots, \beta_p)^T$ is the coefficient vector; and the error terms $e_i^A$ and $e_i^B$ are i.i.d. $N(0, 1)$. The covariate vector $x_i = (x_{i1}, \dots, x_{ip})^T$ is drawn from a multivariate normal distribution $N(0_{p \times 1}, \Sigma_{p \times p})$, where the covariance matrix $\Sigma_{p \times p}$ has the block-exchangeable structure
$$\Sigma_{p \times p} = \begin{pmatrix} \Sigma^{11}_{q \times q} & \Sigma^{12}_{q \times (p-q)} \\ \left( \Sigma^{12}_{q \times (p-q)} \right)^T & \Sigma^{22}_{(p-q) \times (p-q)} \end{pmatrix}, \qquad (14)$$
with $q$ the number of nonzero coefficients and
$$\left( \Sigma^{11} \right)_{s,j} = 1_{\{s = j\}} + \alpha_1 1_{\{s \neq j\}}, \quad \left( \Sigma^{12} \right)_{s,j} = \alpha_2, \quad \left( \Sigma^{22} \right)_{s,j} = 1_{\{s = j\}} + \alpha_3 1_{\{s \neq j\}}. \qquad (15)$$
Here, the parameter vector $\alpha = (\alpha_1, \alpha_2, \alpha_3)$ governs the correlations among the covariates. To explore the effect of these correlations, we consider three choices of $\alpha$: $(0.1, 0.3, 0.8)$, $(0.2, 0.5, 0.9)$ and $(0.5, 0.7, 0.9)$. For the coefficient vector $\beta$, the first $q$ coefficients take nonzero values and the remaining $p - q$ elements are set to zero. In this simulation, we set $q = 9$ and
$$\beta = (1, 1, 1, 1.5, 1.5, 1.5, 2, 2, 2, 0, \dots, 0)^T.$$
From the generated data, we randomly assign $n_A = 125$ units to the treatment group $A$ and the remaining $n_B = n - n_A = 125$ units to the control group $B$.
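One way to code this data-generating process is sketched below, using `MASS::mvrnorm` for the Gaussian covariates; we read $z \sim U(0, 2)$ as a single draw per replication (a unit-level $z_i$ is the other possible reading), and all names are ours.

```r
## A sketch of the simulation design (13)-(15); names and the scalar reading
## of z ~ U(0, 2) are our assumptions.
library(MASS)

gen_data <- function(n = 250, p = 500, q = 9, alpha = c(0.5, 0.7, 0.9)) {
  beta <- c(rep(c(1, 1.5, 2), each = 3), rep(0, p - q))
  # Block-exchangeable covariance (14)-(15): alpha1 within the signal block,
  # alpha2 across blocks, alpha3 within the noise block, unit diagonal
  Sigma <- matrix(alpha[2], p, p)
  Sigma[1:q, 1:q] <- alpha[1]
  Sigma[(q + 1):p, (q + 1):p] <- alpha[3]
  diag(Sigma) <- 1
  X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
  z <- runif(1, 0, 2)                           # treatment shift
  Y1 <- drop(X %*% beta) + z + rnorm(n)         # potential outcome, treated
  Y0 <- drop(X %*% beta) + rnorm(n)             # potential outcome, control
  Tind <- sample(rep(c(1L, 0L), each = n / 2))  # n_A = n_B = 125 for n = 250
  list(X = X, Yobs = ifelse(Tind == 1, Y1, Y0), Tind = Tind,
       tau = mean(Y1 - Y0))
}
```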
To assess the finite-sample performance of the proposed SPAC adjustment methods (SPAC-Lasso, SPAC-SCAD), we compute the absolute bias (|Bias|), the standard deviation (SD) and the root-mean-square error (RMSE) of each estimator, where |Bias| is the absolute difference between the estimated and the true ATE. The numerical results are shown in Table 1.
From the results in Table 1, we observe the following.
(1) When $\alpha = (0.1, 0.3, 0.8)$, our proposed SPAC methods (SPAC-Lasso, SPAC-SCAD) outperform the unadj method in terms of SD and RMSE, and perform similarly to Lasso, SCAD and Enet. Specifically, the SPAC adjustment methods reduce the RMSE of the unadjusted estimator (unadj) by 86–88%.
(2) As the correlations among the covariates increase, the superiority of the proposed SPAC adjustment methods becomes more pronounced. For example, when $p = 2000$ and $\alpha = (0.5, 0.7, 0.9)$, the RMSEs of SPAC-Lasso and SPAC-SCAD are 39% and 45% smaller than those of Lasso and SCAD, respectively.
The variable selection performance is assessed by the mean number of selected nonzero coefficients (S), the false negative rate (FNR) and the false positive rate (FPR), defined as
$$\mathrm{FNR} = \frac{\sum_{j=1}^p I(\hat{\beta}_j = 0, \beta_j \neq 0)}{\sum_{j=1}^p I(\beta_j \neq 0)}, \quad \mathrm{FPR} = \frac{\sum_{j=1}^p I(\hat{\beta}_j \neq 0, \beta_j = 0)}{\sum_{j=1}^p I(\beta_j = 0)}, \qquad (16)$$
where $I(\cdot)$ is the indicator function. The FNR and FPR measure the proportion of important covariates that are not selected and the proportion of unimportant covariates that are selected, respectively; smaller values indicate better variable selection. The variable selection results are listed in Table 2.
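These two rates translate directly into code; a short helper (ours) suffices:

```r
## Direct transcription of the FNR/FPR definitions in (16); `beta_hat` is an
## estimated coefficient vector and `beta` the true one.
fnr <- function(beta_hat, beta) sum(beta_hat == 0 & beta != 0) / sum(beta != 0)
fpr <- function(beta_hat, beta) sum(beta_hat != 0 & beta == 0) / sum(beta == 0)
```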
From Table 2, we obtain the following results.
(1) When the important and unimportant covariates are weakly correlated ($\alpha_2 = 0.3$ in $\alpha = (0.1, 0.3, 0.8)$), the SCAD and our proposed SPAC-Lasso and SPAC-SCAD adjustment methods perform well in terms of S, FNR and FPR, with false rates (FNR and FPR) close to 0. In comparison, the proportions of selected unimportant variables (FPR) of Lasso and Enet are relatively large, which is also reflected in the mean number of selected nonzero coefficients (S).
(2) When the correlations among the covariates increase, the proposed SPAC adjustment methods (SPAC-Lasso and SPAC-SCAD) retain satisfactory performance, while the existing penalty-based regression adjustment methods (Lasso, SCAD, Enet) perform badly. The mean numbers of selected nonzero coefficients (S) of our proposed methods remain close to the true number of nonzero elements, 9, whereas the existing adjustment methods fail to correctly identify the nonzero and zero coefficients (relatively large FNRs and FPRs). For example, when $\alpha = (0.5, 0.7, 0.9)$, the FNR of SCAD exceeds 0.819, while the largest FNR of SPAC-SCAD is only 0.007.
To further assess the performance of our proposed SPAC adjustment method, we calculate the mean of the variance estimates (MVE) of $\sigma^2$ in Theorem 1, and the mean coverage probability (MCP) and mean interval length (MIL) of the 95% confidence intervals $[\hat{\tau} - Z_{0.975} \cdot \hat{\sigma}/\sqrt{n},\, \hat{\tau} + Z_{0.975} \cdot \hat{\sigma}/\sqrt{n}]$, where $Z_\alpha$ is the $\alpha$-quantile of the standard normal distribution. We compare the results of the proposed method with those of the unadjusted (unadj) and penalty-based regression adjustment methods (Lasso, SCAD, Enet) in Table 3.
For the unadjusted method, the variance estimator is defined as
$$\hat{\sigma}^2_{\mathrm{unadj}} = \frac{n}{n_A} \cdot \frac{1}{n_A - 1} \sum_{i \in A} \left( Y_i(1) - \bar{Y}_A \right)^2 + \frac{n}{n_B} \cdot \frac{1}{n_B - 1} \sum_{i \in B} \left( Y_i(0) - \bar{Y}_B \right)^2. \qquad (17)$$
For the adjustment methods (SPAC-Lasso, SPAC-SCAD, Lasso, SCAD, Enet), we use the following Neyman-type conservative estimate of the variance $\sigma^2$, similar to that in [12,13]:
$$\hat{\sigma}^2 = \frac{n}{n_A} \hat{\sigma}^2_{e^A} + \frac{n}{n_B} \hat{\sigma}^2_{e^B}, \qquad (18)$$
where
$$\hat{\sigma}^2_{e^A} = \frac{1}{n_A - df_A} \sum_{i \in A} \left( Y_i(1) - \bar{Y}_A - (x_i - \bar{x}_A)^T \hat{\beta}^A \right)^2, \quad \hat{\sigma}^2_{e^B} = \frac{1}{n_B - df_B} \sum_{i \in B} \left( Y_i(0) - \bar{Y}_B - (x_i - \bar{x}_B)^T \hat{\beta}^B \right)^2, \qquad (19)$$
and $df_A = \|\hat{\beta}^A\|_0 + 1$ and $df_B = \|\hat{\beta}^B\|_0 + 1$ are the degrees of freedom for the treatment and control groups, respectively. The estimated adjustment vectors $\hat{\beta}^A$ and $\hat{\beta}^B$ are obtained from (4) and (7) with the corresponding penalties (Lasso, SCAD, Enet) and SPAC methods (SPAC-Lasso, SPAC-SCAD).
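The following R sketch turns (18)-(19) into the 95% interval reported in Table 3; `beta_A` and `beta_B` are the fitted arm-specific adjustment vectors from any of the five methods, and all names are ours.

```r
## A sketch of the Neyman-type variance (18)-(19) and the resulting 95%
## confidence interval; names are illustrative.
ate_confint <- function(Yobs, Tind, X, beta_A, beta_B, tau_hat, level = 0.95) {
  n <- length(Yobs)
  arm_term <- function(arm, beta) {
    Xw  <- X[Tind == arm, , drop = FALSE]
    Yw  <- Yobs[Tind == arm]
    res <- Yw - mean(Yw) -
           drop(scale(Xw, center = TRUE, scale = FALSE) %*% beta)
    df  <- sum(beta != 0) + 1                       # df_w = ||beta_w||_0 + 1
    (n / nrow(Xw)) * sum(res^2) / (nrow(Xw) - df)   # (n/n_w) * sigma_hat^2_e
  }
  sigma2 <- arm_term(1, beta_A) + arm_term(0, beta_B)   # sigma_hat^2 in (18)
  half   <- qnorm(1 - (1 - level) / 2) * sqrt(sigma2 / n)
  c(lower = tau_hat - half, upper = tau_hat + half)
}
```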
From Table 3, we observe that:
(1) When $\alpha = (0.1, 0.3, 0.8)$, the proposed SPAC adjustment methods (SPAC-Lasso, SPAC-SCAD) perform better than the unadjusted (unadj) method, and perform similarly to the penalty-based adjustment methods (Lasso, SCAD, Enet) in terms of MVE, MCP and MIL.
(2) When the important and unimportant covariates are highly correlated, the coverage probabilities of the proposed SPAC adjustment methods (SPAC-Lasso, SPAC-SCAD) are higher than those of the unadj, Lasso, SCAD and Enet methods, and the MVEs of SPAC-Lasso and SPAC-SCAD are smaller than those of the other methods. For example, when $\alpha = (0.5, 0.7, 0.9)$, the MCPs of the Lasso, SCAD and Enet methods are uniformly below 0.950, while the MCPs of SPAC-Lasso and SPAC-SCAD are around 0.980.
(3) The mean interval lengths (MILs) of SPAC-Lasso and SPAC-SCAD are shorter than those of unadj, Lasso, SCAD and Enet. In particular, when $\alpha = (0.5, 0.7, 0.9)$, the MILs of SPAC-Lasso and SPAC-SCAD are 10–14% and 38–46% shorter than those of Lasso and SCAD, respectively.

4. A Real Data Analysis

In the clinic, human epidermal growth factor receptor type 2 (HER2) is an important marker in the classification of breast cancer. Overexpression or amplification of HER2 (HER2+) may account for around 20% of early breast cancers. As a monoclonal antibody, trastuzumab (also known as Herceptin) has been shown to improve the event-free survival rate and the results of chemotherapy in patients with HER2+ breast cancer ([31]).
In this section, we consider the estimation of the average effect of the treatment (trastuzumab) and apply the proposed SPAC adjustment method to a dataset from the NeoAdjuvant Herceptin (NOAH) randomized clinical trial. The dataset was originally reported in [31], is collected in the Gene Expression Omnibus (accession GSE50948), and has been further studied by Refs. [13,32,33].
There were $n = 156$ patients in the trial: 63 patients received trastuzumab together with neoadjuvant chemotherapy (treatment group, $T_i = 1$) and 93 patients received neoadjuvant chemotherapy alone (control group, $T_i = 0$), $i = 1, \dots, n$. The pathological complete response (pCR), measured by the absence of residual invasive breast cancer, is viewed as the potential outcome variable $Y_i$. For each patient, 54,675 gene probes were observed and regarded as the covariates.
Since the dimension of the covariates, $p = 54{,}675$, is much larger than the sample size $n = 156$, we first apply the sure independence screening (SIS) method proposed in [1] to exclude insignificant variables and reduce the dimension to a suitable size. Following the suggestions of [13,34], genes with little variation in intensity (i.e., those whose $j$-th probe satisfies $\max(X_j) - \min(X_j) \le k$ for a given threshold $k$) are also removed. After screening, $p^* = 2573$ gene probes are retained. Based on this dataset, we apply six methods (unadj, Lasso, SCAD, Enet, and our proposed SPAC-Lasso and SPAC-SCAD) to estimate the ATE. The tuning parameters of the five regression adjustment methods (Lasso, SCAD, Enet, SPAC-Lasso and SPAC-SCAD) are chosen by 10-fold cross-validation. For each method, we report the ATE estimate ($\hat{\tau}$), the number of selected nonzero coefficients (S), the asymptotic standard deviation estimate ($\hat{\sigma}$), and the length of the 95% confidence interval (L). The numerical results are presented in Table 4.
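A sketch of this pre-processing step is given below: low-variation probes are dropped first, and the remaining probes are then ranked by absolute marginal correlation with the outcome (the SIS ranking of [1]). The cut-off `k` and the retained dimension are illustrative placeholders; the paper's choices yield $p^* = 2573$.

```r
## A hedged sketch of the probe screening for the NOAH data; `k` and `n_keep`
## are illustrative, not the paper's exact cut-offs.
screen_probes <- function(X, Yobs, k = 2, n_keep = 2573) {
  keep <- apply(X, 2, function(x) max(x) - min(x)) > k   # drop flat probes
  X <- X[, keep, drop = FALSE]
  score <- abs(drop(cor(X, Yobs)))                       # SIS: marginal corr.
  X[, order(score, decreasing = TRUE)[seq_len(min(n_keep, ncol(X)))],
    drop = FALSE]
}
```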
The results in Table 4 show that all the ATE estimates are around 0.250. Combining this with the findings in [13,31,32], trastuzumab indeed alleviates patients' conditions and improves prognosis. In addition, the numbers of covariates selected by the SPAC adjustment methods (SPAC-Lasso, SPAC-SCAD) are smaller than those selected by Lasso, SCAD and Enet, which is consistent with the findings of the simulation studies. The estimated asymptotic standard deviations ($\hat{\sigma}$) and 95% confidence interval lengths (L) of SPAC-Lasso and SPAC-SCAD are the smallest. Specifically, the $\hat{\sigma}$ values of SPAC-Lasso and SPAC-SCAD are 11% and 14% smaller than those of Lasso and SCAD, respectively, which implies that our proposed SPAC adjustment method can improve upon the existing unadjusted and penalty-based regression adjustment methods.

5. Conclusions

In this paper, we studied the estimation of the ATE in the Rubin causal model when the covariates are highly correlated. We proposed the SPAC adjustment methods (SPAC-Lasso, SPAC-SCAD) for the ATE by combining the SPAC variable selection method, the Lasso and SCAD penalty functions, and the regression adjustment technique, thereby extending the SPAC method from high-dimensional regression models to causal inference. In theory, we established the asymptotic normality of the proposed SPAC-Lasso adjustment estimator under some regularity conditions. Through simulation studies and a real data analysis, we demonstrated the advantages of the proposed method in estimating the average treatment effect and selecting the important covariates. In summary, the proposed SPAC adjustment method improves estimation accuracy for the Rubin causal model with highly correlated covariates.

Author Contributions

Conceptualization, G.L.; Methodology, L.Y.; Validation, F.Z.; Formal analysis, F.Z.; Investigation, Z.D.; Data curation, Z.D.; Writing—original draft preparation, Z.D.; Writing—review and editing, L.Y.; Supervision, L.Y.; Project administration, G.L.; Funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Nos. 12001277, 12271046, 12131006 and 11971001), the National Social Science Foundation of China (No. 21BTJ030), the Tianjin Natural Science Foundation (No. 19JCZDJC32300).

Data Availability Statement

Not applicable.

Acknowledgments

The authors are grateful to the editor, the associate editor and the three anonymous referees for the constructive comments and suggestions that led to significant improvement of an early manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Some Lemmas and Their Proofs

This appendix provides three lemmas needed for the proof of Theorem 1. We drop the superscripts on $h_\gamma$, $e$, $\gamma$ and $\hat{\gamma}$ and focus on the treatment group $A$; the same analysis applies to the control group $B$.
Lemma A1.
Let
$$\mathcal{M}_1 = \left\{ \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) + \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty \le \eta \lambda_A \right\},$$
where $\bar{e}_A = n_A^{-1} \sum_{i \in A} e_i^A$. Suppose that regularity conditions (C1)–(C7) hold. Then
$$P(\mathcal{M}_1) \ge 1 - \frac{2}{p}.$$
Proof. 
Recalling that $\tilde{X}_i = \hat{D}^{-1}(x_i - \bar{x}_A)$, we have
$$\frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) = \frac{1}{n_A} \sum_{i \in A} \hat{D}^{-1} (x_i - \bar{x}_A)(e_i - \bar{e}_A) = \hat{D}^{-1} \left( \frac{1}{n_A} \sum_{i \in A} x_i e_i - \bar{x}_A \bar{e}_A \right).$$
By condition (C3) and the sufficient accuracy of the CLIME estimators $\hat{d}_{jj}$, there exist constants $L_0$ and $L_1$ such that, for sufficiently large $n$,
$$L_0 \le d_{11}, \dots, d_{pp}, \hat{d}_{11}, \dots, \hat{d}_{pp} \le L_1.$$
Combined with the triangle inequality, this yields
$$\left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) \right\|_\infty = \max_{1 \le j \le p} \hat{d}_{jj}^{-1/2} \left| \frac{1}{n_A} \sum_{i \in A} x_{ij} e_i - (\bar{x}_A)_j \bar{e}_A \right| \le L_0^{-1/2} \underbrace{\left\| \frac{1}{n_A} \sum_{i \in A} x_i e_i \right\|_\infty}_{J_1} + L_0^{-1/2} \underbrace{\left\| \bar{x}_A \bar{e}_A \right\|_\infty}_{J_2}, \qquad \mathrm{(A1)}$$
where $(\bar{x}_A)_j$ is the $j$-th element of $\bar{x}_A$.
For the first term $J_1$ in (A1), we have
$$J_1 \le \left\| \frac{1}{n_A} \sum_{i \in A} x_i e_i - \frac{1}{n} \sum_{i=1}^n x_i e_i \right\|_\infty + \left\| \frac{1}{n} \sum_{i=1}^n x_i e_i \right\|_\infty \le \left\| \frac{1}{n_A} \sum_{i \in A} x_i e_i - \frac{1}{n} \sum_{i=1}^n x_i e_i \right\|_\infty + \delta_n,$$
where $\delta_n$ is defined in condition (C5). By condition (C2) and the Cauchy–Schwarz inequality, we have
$$\frac{1}{n} \sum_{i=1}^n x_{ij}^2 e_i^2 \le \left( \frac{1}{n} \sum_{i=1}^n x_{ij}^4 \right)^{1/2} \left( \frac{1}{n} \sum_{i=1}^n e_i^4 \right)^{1/2} \le L.$$
Using Lemma S1 in [12], we can show that
$$P\left( \left\| \frac{1}{n_A} \sum_{i \in A} x_i e_i - \frac{1}{n} \sum_{i=1}^n x_i e_i \right\|_\infty > t_n \right) \le 2 \exp\left\{ \log p - \frac{n_A \tilde{p}_A t_n^2}{(1 + \nu)^2 L} \right\} = 2 \exp\{ -\log p \} = \frac{2}{p},$$
where $t_n = (1 + \nu) L^{1/2} \tilde{p}_A^{-1} \sqrt{2 \log p / n}$. Hence,
$$P(J_1 \le t_n + \delta_n) \ge 1 - \frac{2}{p}. \qquad \mathrm{(A2)}$$
For the second term $J_2$ in (A1), using condition (C2) and Lemma 1 in [13], we have
$$P\left( \left\| \bar{x}_A \bar{e}_A \right\|_\infty \le (1 + \nu) L^{1/2} \tilde{p}_A^{-2} \cdot \frac{2 \log p}{n} \right) \ge 1 - \frac{2}{p}. \qquad \mathrm{(A3)}$$
Combining (A2) and (A3), it is easy to see that
$$P\left( \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) \right\|_\infty \le L_0^{-1/2} (2 t_n + \delta_n) \right) \ge 1 - \frac{2}{p}. \qquad \mathrm{(A4)}$$
Next, since $\tilde{X}_i = \hat{D}^{-1} (x_i - \bar{x}_A)$, we have
$$\begin{aligned} & \left\| \frac{1}{n_A} \sum_{i \in A} \hat{D}^{-1} (x_i - \bar{x}_A)(x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty \\ & \quad \le \frac{1}{L_0}\, O_p\left( M_1 \sqrt{\frac{\log p}{n}} \right) \left\| \frac{1}{n_A} \sum_{i \in A} (x_i - \bar{x}_A)(x_i - \bar{x}_A)^T \gamma \right\|_\infty = \frac{1}{L_0}\, O_p\left( M_1 \sqrt{\frac{\log p}{n}} \right) \left\| \frac{1}{n_A} \sum_{i \in A} x_i x_i^T \gamma - \bar{x}_A \bar{x}_A^T \gamma \right\|_\infty \\ & \quad \le \frac{1}{L_0}\, O_p\left( M_1 \sqrt{\frac{\log p}{n}} \right) \left( \left\| \frac{1}{n_A} \sum_{i \in A} x_i x_i^T - \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right\|_\infty \| \gamma \|_1 + \left\| \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right\|_\infty \| \gamma \|_1 + \left\| \bar{x}_A \bar{x}_A^T \right\|_\infty \| \gamma \|_1 \right), \end{aligned}$$
where $M_1 > 0$ is a constant. By the Cauchy–Schwarz inequality and condition (C2), we have
$$\frac{1}{n} \sum_{i=1}^n x_{ij}^2 x_{ik}^2 \le \left( \frac{1}{n} \sum_{i=1}^n x_{ij}^4 \right)^{1/2} \left( \frac{1}{n} \sum_{i=1}^n x_{ik}^4 \right)^{1/2} \le L.$$
Combined with Lemma S1 in [12], we have
$$P\left( \left\| \frac{1}{n_A} \sum_{i \in A} x_i x_i^T - \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right\|_\infty \ge (1 + \nu) L^{1/2} \tilde{p}_A^{-1} \sqrt{\frac{3 \log p}{n}} \right) \le \frac{2}{p}. \qquad \mathrm{(A5)}$$
By Lemma 1 in [13], we have
$$\left\| \bar{x}_A \bar{x}_A^T \right\|_\infty \le \| \bar{x}_A \|_\infty^2 = o_p\left( \sqrt{\frac{\log p}{n}} \right).$$
Recalling the definition of the SPAC and condition (C4), we have
$$\| \gamma \|_1 = \sum_{j=1}^p \frac{|\beta_j|}{d_{jj}^{1/2}} \le \frac{1}{\sqrt{L_0}} \sum_{j=1}^p |\beta_j| \le \frac{B}{\sqrt{L_0}}.$$
Putting the above results together, we obtain
$$I_2 = \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty = O_p\left( \sqrt{\frac{\log p}{n}} \right). \qquad \mathrm{(A6)}$$
By (A4), (A6) and condition (C7), we have
$$P\left( \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) + \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty \le \eta \lambda_A \right) \ge 1 - \frac{2}{p}.$$
This completes the proof. □
Lemma A2.
Let
$$\mathcal{M}_2 = \left\{ \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T - \frac{1}{n} \sum_{i=1}^n \hat{D}^{-1} x_i x_i^T \hat{D}^{-1} \right\|_\infty \le C_1 \sqrt{\frac{\log p}{n}} \right\}, \quad C_1 = \frac{2 (1 + \nu) L^{1/2}}{\tilde{p}_A L_0}.$$
Suppose that regularity conditions (C1)–(C3) hold. Then
$$P(\mathcal{M}_2) \ge 1 - \frac{2}{p}.$$
Proof. 
From the definition of $\tilde{X}_i$ in (A9), we have
$$\left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T - \frac{1}{n} \sum_{i=1}^n \hat{D}^{-1} x_i x_i^T \hat{D}^{-1} \right\|_\infty \le \frac{1}{L_0} \left\| \frac{1}{n_A} \sum_{i \in A} x_i x_i^T - \frac{1}{n} \sum_{i=1}^n x_i x_i^T - \bar{x}_A \bar{x}_A^T \right\|_\infty \le \frac{1}{L_0} \underbrace{\left\| \frac{1}{n_A} \sum_{i \in A} x_i x_i^T - \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right\|_\infty}_{(*)} + \frac{1}{L_0} \underbrace{\left\| \bar{x}_A \bar{x}_A^T \right\|_\infty}_{(**)}, \qquad \mathrm{(A7)}$$
where the last inequality follows from the triangle inequality.
By the Cauchy–Schwarz inequality and condition (C2), we have
$$\frac{1}{n} \sum_{i=1}^n x_{ij}^2 x_{ik}^2 \le \left( \frac{1}{n} \sum_{i=1}^n x_{ij}^4 \right)^{1/2} \left( \frac{1}{n} \sum_{i=1}^n x_{ik}^4 \right)^{1/2} \le L.$$
Invoking Lemma S1 in [12] and $n_A/n = \tilde{p}_A$ from condition (C1), we can bound the first term (∗) in (A7) as follows:
$$P\left( \left\| \frac{1}{n_A} \sum_{i \in A} x_i x_i^T - \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right\|_\infty \ge (1 + \nu) L^{1/2} \tilde{p}_A^{-1} \sqrt{\frac{3 \log p}{n}} \right) \le 2 \exp\{ -\log p \} = \frac{2}{p}. \qquad \mathrm{(A8)}$$
For the second term (∗∗) in (A7), we have
$$\left\| \bar{x}_A \bar{x}_A^T \right\|_\infty \le \| \bar{x}_A \|_\infty^2 = o_p\left( \sqrt{\frac{\log p}{n}} \right).$$
Together with (A8), we have
$$P\left( \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T - \frac{1}{n} \sum_{i=1}^n \hat{D}^{-1} x_i x_i^T \hat{D}^{-1} \right\|_\infty \le C_1 \sqrt{\frac{\log p}{n}} \right) \ge 1 - \frac{2}{p}.$$
This completes the proof. □
Lemma A3.
Suppose that regularity conditions (C1)–(C7) hold. Then
$$\| h_\gamma \|_1 = o_p\left( \frac{1}{\sqrt{\log p}} \right),$$
where $h_\gamma = \hat{\gamma}_{\mathrm{SPAC\text{-}Lasso}} - \gamma$.
Proof. 
Note that the SPAC-Lasso estimator $\hat{\gamma}_{\mathrm{SPAC\text{-}Lasso}}$ is defined by
$$\hat{\gamma}_{\mathrm{SPAC\text{-}Lasso}} = \operatorname*{argmin}_{\gamma} \frac{1}{2 n_A} \sum_{i \in A} \left[ Y_i(1) - \bar{Y}_A - (x_i - \bar{x}_A)^T \hat{D} \gamma \right]^2 + \lambda_A \sum_{j=1}^p \hat{d}_{jj} |\gamma_j|,$$
which can be rewritten as
$$\hat{\gamma}^* = \operatorname*{argmin}_{\gamma^*} \frac{1}{2 n_A} \sum_{i \in A} \left[ Y_i(1) - \bar{Y}_A - \tilde{X}_i^T \gamma^* \right]^2 + \lambda_A \sum_{j=1}^p |\gamma^*_j|, \qquad \mathrm{(A9)}$$
where $\tilde{X}_i = \hat{D}^{-1} (x_i - \bar{x}_A)$ and $\gamma^* = \hat{D}^2 \gamma$.
The Karush–Kuhn–Tucker (KKT) condition for $\hat{\gamma}^*$ is
$$\frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \left[ Y_i(1) - \bar{Y}_A - \tilde{X}_i^T \hat{\gamma}^* \right] = \lambda_A \kappa, \qquad \mathrm{(A10)}$$
where $\kappa$ is a subgradient of $\| \gamma^* \|_1$ at $\gamma^* = \hat{\gamma}^*$, that is,
$$\kappa \in \partial \| \gamma^* \|_1 \big|_{\gamma^* = \hat{\gamma}^*}, \quad \text{with } \kappa_j \in [-1, 1] \text{ if } \hat{\gamma}^*_j = 0 \text{ and } \kappa_j = \mathrm{sign}(\hat{\gamma}^*_j) \text{ otherwise}.$$
By the decomposition of $Y_i(1)$ in (9), we have
$$Y_i(1) - \bar{Y}_A = (x_i - \bar{x}_A)^T \beta + e_i - \bar{e}_A = (x_i - \bar{x}_A)^T D \gamma + e_i - \bar{e}_A = \tilde{X}_i^T \gamma^* + (e_i - \bar{e}_A) + (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma.$$
Hence, (A10) can be expressed as
$$\frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T \left( \gamma^* - \hat{\gamma}^* \right) + \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) + \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma = \lambda_A \kappa, \qquad \mathrm{(A11)}$$
where $D = \mathrm{diag}\{ d_{11}^{1/2}, \dots, d_{pp}^{1/2} \}$. Premultiplying (A11) by $(\gamma^* - \hat{\gamma}^*)^T$ and writing $h_{\gamma^*} = \hat{\gamma}^* - \gamma^*$, we have
$$\lambda_A (\gamma^* - \hat{\gamma}^*)^T \kappa = \frac{1}{n_A} \sum_{i \in A} (\gamma^* - \hat{\gamma}^*)^T \tilde{X}_i \tilde{X}_i^T (\gamma^* - \hat{\gamma}^*) - \frac{1}{n_A} \sum_{i \in A} h_{\gamma^*}^T \tilde{X}_i (e_i - \bar{e}_A) - \frac{1}{n_A} \sum_{i \in A} h_{\gamma^*}^T \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma.$$
Then, we have
$$\frac{1}{n_A} \sum_{i \in A} \left( \tilde{X}_i^T h_{\gamma^*} \right)^2 \le \lambda_A \left( \| \gamma^* \|_1 - \| \hat{\gamma}^* \|_1 \right) + \left| \frac{1}{n_A} \sum_{i \in A} h_{\gamma^*}^T \tilde{X}_i (e_i - \bar{e}_A) \right| + \left| \frac{1}{n_A} \sum_{i \in A} h_{\gamma^*}^T \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right|.$$
By Hölder's inequality, the above inequality can be written as
$$\frac{1}{n_A} \sum_{i \in A} \left( \tilde{X}_i^T h_{\gamma^*} \right)^2 \le \lambda_A \left( \| \gamma^* \|_1 - \| \hat{\gamma}^* \|_1 \right) + \| h_{\gamma^*} \|_1 \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) \right\|_\infty + \| h_{\gamma^*} \|_1 \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty.$$
On the event $\mathcal{M}_1$ of Lemma A1, this gives
$$\frac{1}{n_A} \sum_{i \in A} \left( \tilde{X}_i^T h_{\gamma^*} \right)^2 \le \lambda_A \left( \| \gamma^* \|_1 - \| \hat{\gamma}^* \|_1 \right) + \eta \lambda_A \| h_{\gamma^*} \|_1.$$
By the triangle inequality and the definition of $h_{\gamma^*} = \hat{\gamma}^* - \gamma^*$, we have
$$\| \gamma^* \|_1 - \| \hat{\gamma}^* \|_1 \le 2 \| \gamma^*_{S^c} \|_1 + \| \hat{\gamma}^*_S - \gamma^*_S \|_1 - \| \hat{\gamma}^*_{S^c} - \gamma^*_{S^c} \|_1 = \| h_{\gamma^*, S} \|_1 - \| h_{\gamma^*, S^c} \|_1 + 2 \| \gamma^*_{S^c} \|_1.$$
Hence,
$$\frac{1}{n_A} \sum_{i \in A} \left( \tilde{X}_i^T h_{\gamma^*} \right)^2 \le \lambda_A \left( \| h_{\gamma^*, S} \|_1 - \| h_{\gamma^*, S^c} \|_1 + 2 \| \gamma^*_{S^c} \|_1 \right) + \eta \lambda_A \| h_{\gamma^*} \|_1 = \lambda_A \left[ (\eta - 1) \| h_{\gamma^*, S^c} \|_1 + (1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 \| \gamma^*_{S^c} \|_1 \right].$$
Noting that $n_A^{-1} \sum_{i \in A} ( \tilde{X}_i^T h_{\gamma^*} )^2 \ge 0$ and using the definition of $s_\lambda$ in Definition 1, we have
$$(1 - \eta) \| h_{\gamma^*, S^c} \|_1 \le (1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 \| \gamma^*_{S^c} \|_1 \le (1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 L_1 L_0^{-1/2} s_\lambda \lambda_A. \qquad \mathrm{(A12)}$$
We now consider two cases for the quantity $(1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 L_1 L_0^{-1/2} s_\lambda \lambda_A$.
(i) Suppose $(1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 L_1 L_0^{-1/2} s_\lambda \lambda_A \ge (1 - \eta) \xi \| h_{\gamma^*, S} \|_1$. Then, by (A12),
$$\| h_{\gamma^*} \|_1 = \| h_{\gamma^*, S} \|_1 + \| h_{\gamma^*, S^c} \|_1 \le \| h_{\gamma^*, S} \|_1 + \frac{(1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 L_1 L_0^{-1/2} s_\lambda \lambda_A}{1 - \eta} = \left( \frac{1 + \eta}{1 - \eta} + 1 \right) \| h_{\gamma^*, S} \|_1 + \frac{2 L_1 s_\lambda \lambda_A}{L_0^{1/2} (1 - \eta)} \le \frac{2 L_1 s_\lambda \lambda_A}{L_0^{1/2} (1 - \eta)} \left[ \frac{2}{(1 - \eta) \xi - (1 + \eta)} + 1 \right].$$
Combining this with conditions (C5) and (C7), we can show that $s_\lambda \lambda_A = o\left( 1/\sqrt{\log p} \right)$, and hence $\| h_{\gamma^*} \|_1 = o_p\left( 1/\sqrt{\log p} \right)$ in this case.
(ii) Suppose $(1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 L_1 L_0^{-1/2} s_\lambda \lambda_A < (1 - \eta) \xi \| h_{\gamma^*, S} \|_1$. Then, by (A12), we obtain
$$\| h_{\gamma^*, S^c} \|_1 \le \xi \| h_{\gamma^*, S} \|_1. \qquad \mathrm{(A13)}$$
By condition (C6), we have
$$\| h_{\gamma^*} \|_1 = \| h_{\gamma^*, S} \|_1 + \| h_{\gamma^*, S^c} \|_1 \le (1 + \xi) \| h_{\gamma^*, S} \|_1 \le (1 + \xi) C_0 s_\lambda \| \Sigma^* h_{\gamma^*} \|_\infty. \qquad \mathrm{(A14)}$$
Using (A11) and Lemma A1, together with the triangle inequality, we can show that
$$\left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T h_{\gamma^*} \right\|_\infty \le \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T h_{\gamma^*} - \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) - \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty + \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) + \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty \le (1 + \eta) \lambda_A,$$
where the last inequality holds on the event $\mathcal{M}_1$ of Lemma A1. When both events $\mathcal{M}_1$ and $\mathcal{M}_2$ of Lemma A2 hold, we have
$$\left\| \frac{1}{n} \sum_{i=1}^n \hat{D}^{-1} x_i x_i^T \hat{D}^{-1} h_{\gamma^*} \right\|_\infty \le \left\| \left( \frac{1}{n} \sum_{i=1}^n \hat{D}^{-1} x_i x_i^T \hat{D}^{-1} - \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T \right) h_{\gamma^*} \right\|_\infty + \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T h_{\gamma^*} \right\|_\infty \le C_1 \sqrt{\frac{\log p}{n}} \| h_{\gamma^*} \|_1 + \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T h_{\gamma^*} \right\|_\infty.$$
By condition (C5) and (A14), we can show that
$$\| h_{\gamma^*} \|_1 \le (1 + \xi) C_0 \left[ C_1 s_\lambda \sqrt{\frac{\log p}{n}} \| h_{\gamma^*} \|_1 + (1 + \eta) s_\lambda \lambda_A \right] \le (1 + \xi) C_0 \left[ o(1) \| h_{\gamma^*} \|_1 + (1 + \eta) s_\lambda \lambda_A \right].$$
Hence, $\| h_{\gamma^*} \|_1 = o_p\left( 1/\sqrt{\log p} \right)$ in this case as well, by conditions (C5) and (C7).
Combining cases (i) and (ii), we conclude that $\| h_{\gamma^*} \|_1 = o_p\left( 1/\sqrt{\log p} \right)$. By the definitions of $h_\gamma$, $h_{\gamma^*}$ and $\hat{\gamma}^*$, together with the boundedness of $\hat{d}_{jj}$, we have
$$\| h_\gamma \|_1 = \| \hat{\gamma}_{\mathrm{SPAC\text{-}Lasso}} - \gamma \|_1 = o_p\left( \frac{1}{\sqrt{\log p}} \right).$$
This completes the proof. □

References

  1. Fan, J.; Lv, J. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 2008, 70, 849–911.
  2. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
  3. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
  4. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
  5. Rubin, D.B. Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 1974, 66, 688–701.
  6. Neyman, J. On the application of probability theory to agricultural experiments. Essay on principles, Section 9. Translation of the original 1923 paper, which appeared in Roczniki Nauk Rolniczych. Stat. Sci. 1990, 5, 465–472.
  7. Rubin, D.B. Matched Sampling for Causal Effects; Cambridge University Press: New York, NY, USA, 2006.
  8. Imbens, G.W.; Rubin, D.B. Causal Inference in Statistics, Social, and Biomedical Sciences: An Introduction; Cambridge University Press: New York, NY, USA, 2015.
  9. Belloni, A.; Chernozhukov, V.; Hansen, C. Inference on treatment effects after selection among high-dimensional controls. Rev. Econ. Stud. 2014, 81, 608–650.
  10. Belloni, A.; Chernozhukov, V.; Fernández-Val, I.; Hansen, C. Program evaluation and causal inference with high-dimensional data. Econometrica 2017, 85, 233–298.
  11. Wager, S.; Du, W.F.; Taylor, J.; Tibshirani, R. High-dimensional regression adjustments in randomized experiments. Proc. Natl. Acad. Sci. USA 2016, 113, 12673–12678.
  12. Bloniarz, A.; Liu, H.Z.; Zhang, C.H.; Sekhon, J.S.; Yu, B. Lasso adjustments of treatment effect estimates in randomized experiments. Proc. Natl. Acad. Sci. USA 2016, 113, 7383–7390.
  13. Yue, L.L.; Li, G.R.; Lian, H.; Wan, X. Regression adjustment for treatment effect with multicollinearity in high dimensions. Comput. Stat. Data Anal. 2019, 134, 17–35.
  14. Wang, H.; Lengerich, B.J.; Aragam, B.; Xing, E.P. Precision Lasso: Accounting for correlations and linear dependencies in high-dimensional genomic data. Bioinformatics 2019, 35, 1181–1187.
  15. Zhu, W.; Lévy-Leduc, C.; Ternès, N. A variable selection approach for highly correlated predictors in high-dimensional genomic data. Bioinformatics 2021, 37, 2238–2244.
  16. Zhao, P.; Yu, B. On model selection consistency of Lasso. J. Mach. Learn. Res. 2006, 7, 2541–2563.
  17. Bühlmann, P.; Kalisch, M.; Maathuis, M.H. Variable selection in high-dimensional linear models: Partially faithful distributions and the PC-simple algorithm. Biometrika 2010, 97, 261–278.
  18. Fan, J.; Shao, Q.M.; Zhou, W.X. Are discoveries spurious? Distributions of maximum spurious correlations and their applications. Ann. Stat. 2018, 46, 989–1017.
  19. Xue, F.; Qu, A. Semi-standard partial covariance variable selection when irrepresentable conditions fail. Stat. Sin. 2022, 32, 1881–1909.
  20. Imbens, G.W. Nonparametric estimation of average treatment effects under exogeneity: A review. Rev. Econ. Stat. 2004, 86, 4–29.
  21. Freedman, D.A. On regression adjustments in experiments with several treatments. Ann. Appl. Stat. 2008, 2, 176–196.
  22. Lin, W. Agnostic notes on regression adjustments to experimental data: Reexamining Freedman's critique. Ann. Appl. Stat. 2013, 7, 295–318.
  23. Lauritzen, S.L. Graphical Models; Clarendon Press: Oxford, UK, 1996.
  24. Raveh, A. On the use of the inverse of the correlation matrix in multivariate data analysis. Am. Stat. 1985, 39, 39–42.
  25. Cai, T.; Liu, W.; Luo, X. A constrained L1 minimization approach to sparse precision matrix estimation. J. Am. Stat. Assoc. 2011, 106, 594–607.
  26. Balmand, S.; Dalalyan, A.S. On estimation of the diagonal elements of a sparse precision matrix. Electron. J. Stat. 2016, 10, 1551–1579.
  27. Avella-Medina, M.; Battey, H.S.; Fan, J.; Li, Q. Robust estimation of high-dimensional covariance and precision matrices. Biometrika 2018, 105, 271–284.
  28. Fan, J.; Peng, H. Nonconcave penalized likelihood with a diverging number of parameters. Ann. Stat. 2004, 32, 928–961.
  29. Blazère, M.; Loubes, J.M.; Gamboa, F. Oracle inequalities for a group Lasso procedure applied to generalized linear models in high dimension. IEEE Trans. Inf. Theory 2014, 60, 2303–2318.
  30. Pang, H.; Liu, H.; Vanderbei, R.J. The fastclime package for linear programming and large-scale precision matrix estimation in R. J. Mach. Learn. Res. 2014, 15, 489–493.
  31. Gianni, L.; Eiermann, W.; Semiglazov, V.; Manikhas, A.; Lluch, A.; Tjulandin, S.; Zambetti, M.; Vazquez, F.; Byakhow, M.; Lichinitser, M.; et al. Neoadjuvant chemotherapy with trastuzumab followed by adjuvant trastuzumab versus neoadjuvant chemotherapy alone, in patients with HER2-positive locally advanced breast cancer (the NOAH trial): A randomised controlled superiority trial with a parallel HER2-negative cohort. Lancet 2010, 375, 377–384.
  32. Prat, A.; Bianchini, G.; Thomas, M.; Belousov, A.; Cheang, M.C.; Koehler, A.; Gómez, P.; Semiglazov, V.; Eiermann, W.; Tjulandin, S.; et al. Research-based PAM50 subtype predictor identifies higher responses and improved survival outcomes in HER2-positive breast cancer in the NOAH study. Clin. Cancer Res. 2014, 20, 511–521.
  33. Roth, J.; Simon, N. A framework for estimating and testing qualitative interactions with applications to predictive biomarkers. Biostatistics 2018, 19, 263–280.
  34. Dudoit, S.; Fridlyand, J.; Speed, T.P. Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc. 2002, 97, 77–87.
Table 1. Finite sample performance of the ATE estimators.

                       α = (0.1, 0.3, 0.8)        α = (0.2, 0.5, 0.9)        α = (0.5, 0.7, 0.9)
  p     Method         |Bias|  SD      RMSE       |Bias|  SD      RMSE       |Bias|  SD      RMSE
  500   unadj          0.0040  0.7855  0.7855     0.0058  0.8778  0.8778     0.0035  1.1844  1.1844
        Lasso          0.0021  0.1155  0.1155     0.0085  0.1512  0.1515     0.0028  0.1643  0.1643
        SCAD           0.0010  0.0948  0.0948     0.0159  0.2358  0.2364     0.0019  0.2839  0.2839
        Enet           0.0023  0.1098  0.1098     0.0086  0.1533  0.1535     0.0032  0.1645  0.1646
        SPAC-Lasso     0.0014  0.1068  0.1068     0.0020  0.1050  0.1050     0.0008  0.1099  0.1099
        SPAC-SCAD      0.0011  0.0946  0.0946     0.0016  0.0977  0.0978     0.0013  0.1284  0.1284
  1000  unadj          0.0052  0.7796  0.7796     0.0162  0.9152  0.9153     0.0429  1.1905  1.1913
        Lasso          0.0009  0.1199  0.1199     0.0027  0.1515  0.1515     0.0093  0.1494  0.1497
        SCAD           0.0003  0.0918  0.0918     0.0043  0.2403  0.2404     0.0132  0.2665  0.2669
        Enet           0.0013  0.1136  0.1136     0.0025  0.1524  0.1524     0.0087  0.1523  0.1526
        SPAC-Lasso     0.0001  0.1051  0.1051     0.0007  0.1075  0.1075     0.0036  0.0955  0.0956
        SPAC-SCAD      0.0002  0.0916  0.0916     0.0000  0.0971  0.0971     0.0044  0.1157  0.1158
  2000  unadj          0.0104  0.7765  0.7766     0.0556  0.9237  0.9254     0.0696  1.2492  1.2512
        Lasso          0.0013  0.1200  0.1201     0.0010  0.1658  0.1658     0.0092  0.1801  0.1804
        SCAD           0.0004  0.0997  0.0997     0.0037  0.2521  0.2521     0.0134  0.2620  0.2623
        Enet           0.0020  0.1158  0.1158     0.0027  0.1659  0.1659     0.0063  0.1879  0.1880
        SPAC-Lasso     0.0005  0.1074  0.1074     0.0004  0.1045  0.1045     0.0040  0.1104  0.1105
        SPAC-SCAD      0.0002  0.0991  0.0991     0.0013  0.0969  0.0969     0.0034  0.1459  0.1459
Table 2. Variable selection results for treatment and control groups.

                       α = (0.1, 0.3, 0.8)        α = (0.2, 0.5, 0.9)        α = (0.5, 0.7, 0.9)
  p     Method         S       FNR    FPR         S       FNR    FPR         S       FNR    FPR
  500   Lasso          17.650  0.000  0.018       34.636  0.140  0.054       43.091  0.170  0.073
        SCAD           9.009   0.000  0.000       12.937  0.552  0.018       32.333  0.819  0.063
        Enet           18.271  0.000  0.019       36.085  0.154  0.058       45.158  0.182  0.077
        SPAC-Lasso     9.007   0.000  0.000       9.000   0.000  0.000       9.000   0.000  0.000
        SPAC-SCAD      9.000   0.000  0.000       9.000   0.000  0.000       8.937   0.007  0.000
  1000  Lasso          20.116  0.000  0.011       40.922  0.260  0.035       47.313  0.103  0.040
        SCAD           9.325   0.000  0.000       21.004  0.755  0.019       38.149  0.886  0.037
        Enet           20.841  0.000  0.012       42.743  0.266  0.036       49.456  0.114  0.042
        SPAC-Lasso     9.193   0.000  0.000       9.005   0.000  0.000       9.000   0.000  0.000
        SPAC-SCAD      9.000   0.000  0.000       9.000   0.000  0.000       8.984   0.000  0.000
  2000  Lasso          20.083  0.000  0.006       42.091  0.277  0.018       54.969  0.281  0.024
        SCAD           9.082   0.000  0.000       20.077  0.716  0.009       42.502  0.958  0.021
        Enet           20.748  0.000  0.006       44.270  0.286  0.019       58.556  0.293  0.026
        SPAC-Lasso     9.218   0.000  0.000       9.002   0.000  0.000       9.000   0.000  0.000
        SPAC-SCAD      9.000   0.000  0.000       9.000   0.000  0.000       8.974   0.003  0.000
Table 3. The performance of the variance estimates and confidence intervals.

                       α = (0.1, 0.3, 0.8)        α = (0.2, 0.5, 0.9)        α = (0.5, 0.7, 0.9)
  p     Method         MVE     MCP    MIL         MVE     MCP    MIL         MVE     MCP    MIL
  500   unadj          12.638  0.956  3.133       14.293  0.952  3.543       18.832  0.949  4.669
        Lasso          2.257   0.984  0.560       2.415   0.953  0.599       2.464   0.937  0.611
        SCAD           2.066   0.994  0.512       3.645   0.942  0.904       4.322   0.945  1.071
        Enet           2.179   0.985  0.540       2.430   0.951  0.603       2.511   0.938  0.622
        SPAC-Lasso     2.204   0.988  0.546       2.107   0.988  0.522       2.207   0.986  0.547
        SPAC-SCAD      2.065   0.994  0.512       1.993   0.990  0.494       2.427   0.979  0.602
  1000  unadj          12.217  0.940  3.029       14.500  0.945  3.595       18.441  0.947  4.572
        Lasso          2.297   0.982  0.569       2.433   0.949  0.603       2.283   0.943  0.566
        SCAD           2.092   0.995  0.519       3.801   0.952  0.942       4.157   0.948  1.031
        Enet           2.209   0.982  0.548       2.443   0.950  0.606       2.330   0.944  0.578
        SPAC-Lasso     2.233   0.991  0.554       2.204   0.986  0.546       1.980   0.987  0.491
        SPAC-SCAD      2.094   0.995  0.519       2.076   0.992  0.515       2.243   0.983  0.556
  2000  unadj          12.646  0.961  3.135       14.984  0.953  3.715       20.061  0.954  4.974
        Lasso          2.224   0.982  0.551       2.542   0.942  0.630       2.534   0.918  0.628
        SCAD           2.051   0.987  0.509       3.856   0.941  0.956       4.161   0.949  1.032
        Enet           2.137   0.980  0.530       2.561   0.946  0.635       2.650   0.916  0.657
        SPAC-Lasso     2.171   0.991  0.538       2.147   0.989  0.532       2.165   0.986  0.537
        SPAC-SCAD      2.046   0.989  0.507       2.042   0.993  0.506       2.591   0.973  0.642
Table 4. The performance of different methods for the treatment effect estimation of trastuzumab.

           unadj    Lasso    SCAD     Enet     SPAC-Lasso   SPAC-SCAD
  τ̂        0.2555   0.2491   0.2488   0.2473   0.2454       0.2435
  S        —        16.500   17.000   20.000   15.000       8.500
  σ̂        0.9670   0.8031   0.8519   0.7566   0.7136       0.7317
  L        0.3035   0.2521   0.2674   0.2375   0.2240       0.2296
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
