Article

High-Dimensional Regression Adjustment Estimation for Average Treatment Effect with Highly Correlated Covariates

1 School of Statistics, Beijing Normal University, Beijing 100875, China
2 School of Statistics and Data Science, Nanjing Audit University, Nanjing 211815, China
3 School of Mathematical Science, Shanxi University, Taiyuan 030006, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(24), 4715; https://doi.org/10.3390/math10244715
Submission received: 8 November 2022 / Revised: 5 December 2022 / Accepted: 10 December 2022 / Published: 12 December 2022
(This article belongs to the Special Issue Computational Statistics and Data Analysis)

Abstract:
Regression adjustment is often used to estimate the average treatment effect (ATE) in randomized experiments. Recently, several penalty-based regression adjustment methods have been proposed to handle high-dimensional settings. However, these existing methods may fail to achieve satisfactory performance when the covariates are highly correlated. In this paper, we propose a novel adjustment estimation method for the ATE by combining the semi-standard partial covariance (SPAC) and regression adjustment methods. Under some regularity conditions, the asymptotic normality of the proposed SPAC adjustment ATE estimator is established. Simulation studies and an analysis of HER2 breast cancer data illustrate the advantage of the proposed SPAC adjustment method in addressing highly correlated covariates in the Rubin causal model.

1. Introduction

With the rapid development of information technology, massive amounts of data are now collected in many fields, such as genomics, biomedicine and aerography, where the dimension of the covariates p often far exceeds the sample size n. Despite the promising application prospects, high-dimensional data pose many problems and challenges for statistical inference. For instance, the sample covariance matrix is huge and noninvertible when p > n, and unimportant covariates can be highly correlated with the response variable simply because they are associated with the important covariates ([1]). To deal with these challenges, many penalty-based approaches have been proposed to select important covariates and estimate the unknown parameters simultaneously, including the Lasso ([2]), SCAD ([3]) and Elastic-net ([4]) penalties. This literature mainly considers regression models and marginal correlations between the covariates and the response variable.
In some cases, marginal correlations cannot fully depict the influence mechanisms among variables. Researchers have therefore studied causal relations among variables and developed the Rubin causal (Neyman–Rubin) model (see [5,6]); details can be found in Refs. [7,8]. For high-dimensional data, Refs. [9,10] suggested that standard high-dimensional penalty-based methods can be used to estimate the average treatment effect (ATE). Ref. [11] developed a risk-consistent regression adjustment approach for the ATE using the Lasso penalty of [2]. Ref. [12] proposed a Lasso-adjusted ATE estimator by combining the Lasso penalty with the regression adjustment method, and showed that it can reduce the variance of the unadjusted ATE estimator of [6]. Ref. [13] further considered the multicollinearity problem in high dimensions and proposed an Elastic-net adjustment method for the ATE.
However, in high-dimensional settings the correlations between important and unimportant covariates are often higher than those among the important covariates themselves (see [14,15]). In this case, the irrepresentable condition ([16]) can fail, so that Lasso-type penalty methods may fail to correctly estimate the signs of the coefficients and to distinguish the important covariates from the unimportant ones; the corresponding adjustment ATE estimator may then perform poorly. Many researchers have considered the highly correlated problem in high dimensions and provided effective variable selection methods. For example, Ref. [17] proposed the Peter–Clark-simple (PC-simple) algorithm, which selects important covariates using partial correlations, and Ref. [18] developed the factor-adjusted regularized model selection (Farm-Select) method. Ref. [19] proposed the semi-standard partial covariance (SPAC) method, which effectively selects covariates that have a direct effect on the response variable, and showed that SPAC outperforms the PC-simple and Farm-Select methods when the original irrepresentable condition of Ref. [16] fails. Nevertheless, these variable selection methods have not yet been applied in the field of causal inference.
In this paper, we consider the estimation of the ATE in the Rubin causal model with highly correlated covariates. The main contributions of this paper are four-fold. Firstly, the SPAC adjustment estimator is developed by a novel combination of the SPAC variable selection and regression adjustment methods. Secondly, our framework extends that of [19] to causal inference and that of [12] to the highly correlated setting. Thirdly, the asymptotic normality of the proposed estimator is established under some regularity conditions. Fourthly, the proposed SPAC adjustment method performs satisfactorily, as demonstrated by the numerical results of a real data analysis and several simulation studies.
The rest of this article is organized as follows. In Section 2, the SPAC adjustment method for the ATE is proposed for the Rubin causal model with highly correlated covariates in high dimensions, and the asymptotic property of the proposed SPAC-Lasso adjustment estimator is developed under some regularity conditions. In Section 3, simulation studies are conducted to assess the effectiveness of the proposed SPAC adjustment method. In Section 4, the proposed estimation approach is applied to an HER2 breast cancer dataset. Concluding remarks are provided in Section 5. Appendix A collects the lemmas needed for the proof of the theorem.
Notation 1.
For ease of presentation, we introduce the following notation. For any column vector $u = (u_1, \dots, u_p)^T$ and a subset $S \subseteq \{1, \dots, p\}$, let $\|u\|_1 = \sum_{j=1}^p |u_j|$, $\|u\|_2^2 = \sum_{j=1}^p u_j^2$ and $\|u\|_\infty = \max_{j = 1, \dots, p} |u_j|$; $u_S = \{u_j : j \in S\}$; $S^C$ denotes the complement of $S$, and $|S|$ denotes the cardinality of $S$. For a matrix $D$, $D^T$ and $D^{-1}$ denote its transpose and inverse, respectively. The notation "$\stackrel{L}{\longrightarrow}$" denotes convergence in distribution.

2. Methodology and Theoretical Property

2.1. SPAC Adjustment Method for ATE

We frame our analysis in terms of the Rubin causal model. Let $i = 1, \dots, n$ index the units in a population of size $n$, $Y_i$ be the potential outcome variable, and $x_i = (x_{i1}, \dots, x_{ip})^T \in \mathbb{R}^p$ be the $p$-dimensional covariate vector, where $p$ far exceeds the sample size $n$. The full design matrix of the experiment is $X = (x_1, \dots, x_n)^T$, and each covariate $X_j = (x_{1j}, \dots, x_{nj})^T$ ($j = 1, \dots, p$) is standardized so that $X_j^T X_j = n$ and $\sum_{i=1}^n x_{ij} = 0$. The observed data $x_i$ ($i = 1, \dots, n$) can be viewed as independent and identically distributed (i.i.d.) draws from a distribution with mean $0$ and positive definite covariance matrix $\Sigma_{p \times p}$, all of whose diagonal elements equal $1$. Each unit is randomly assigned to the treatment group or the control group, and the treatment indicator is denoted by $T_i$, with $T_i = 1$ for a treated individual and $T_i = 0$ otherwise. Then, the observed potential outcome for individual $i$ is
$$Y_i^{\mathrm{obs}} = T_i Y_i(1) + (1 - T_i) Y_i(0), \qquad (1)$$
where $Y_i(1)$ and $Y_i(0)$ are the potential outcomes under treatment and control, respectively; that is, $Y_i^{\mathrm{obs}} = Y_i(1)$ if $T_i = 1$ and $Y_i^{\mathrm{obs}} = Y_i(0)$ if $T_i = 0$. The numbers of treated and control units are $n_A = |A|$ and $n_B = |B|$, respectively, where $A = \{i \in \{1, \dots, n\} : T_i = 1\}$, $B = \{i \in \{1, \dots, n\} : T_i = 0\}$, and $n_A + n_B = n$.
In randomized experiments, the sample is often not randomly drawn from the population (superpopulation) of interest (see [12,13,20]). In this paper, we focus on the finite-sample ATE, defined as
$$\tau = \bar{Y}_1 - \bar{Y}_0,$$
where $\bar{Y}_1 = n^{-1} \sum_{i=1}^n Y_i(1)$ and $\bar{Y}_0 = n^{-1} \sum_{i=1}^n Y_i(0)$ are the average responses if all individuals receive treatment or control, respectively. Clearly, the whole-population averages $\bar{Y}_1$ and $\bar{Y}_0$ are fixed. Replacing the population averages $\bar{Y}_s$ ($s = 0, 1$) with the corresponding sample averages yields the natural unadjusted ATE estimator
$$\hat{\tau}_{\mathrm{unadj}} = \frac{1}{n_A} \sum_{i \in A} Y_i(1) - \frac{1}{n_B} \sum_{i \in B} Y_i(0). \qquad (2)$$
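As a concrete illustration, the difference-in-means estimator in (2) is a one-liner in R (the language of the packages used in Section 3); the names `Yobs` and `Tind` below are ours, not from the paper.

```r
## A minimal sketch of the unadjusted estimator (2).
## `Yobs`: observed outcomes; `Tind`: 0/1 treatment indicator (names are ours).
ate_unadj <- function(Yobs, Tind) {
  mean(Yobs[Tind == 1]) - mean(Yobs[Tind == 0])
}
```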
As pointed out by [12,21,22], the covariate information $x_i$ can often be used to adjust the estimator in (2) in the hope of improving estimation precision. For high-dimensional data, Ref. [12] proposed the following Lasso-adjusted ATE estimator:
$$\hat{\tau}_{\mathrm{Lasso}} = \left[ \bar{Y}_A - (\bar{x}_A - \bar{x})^T \hat{\beta}^A_{\mathrm{Lasso}} \right] - \left[ \bar{Y}_B - (\bar{x}_B - \bar{x})^T \hat{\beta}^B_{\mathrm{Lasso}} \right], \qquad (3)$$
where $\bar{Y}_A = n_A^{-1} \sum_{i \in A} Y_i(1)$, $\bar{Y}_B = n_B^{-1} \sum_{i \in B} Y_i(0)$, $\bar{x}_A = n_A^{-1} \sum_{i \in A} x_i$, $\bar{x}_B = n_B^{-1} \sum_{i \in B} x_i$, $\bar{x} = n^{-1} \sum_{i=1}^n x_i$, and the terms $\bar{x}_w - \bar{x}$ for $w = A, B$ capture the fluctuations of the subsample covariate means around the full-sample mean. The adjustment vectors $\hat{\beta}^w_{\mathrm{Lasso}}$ are obtained from the Lasso-penalized problems
$$\hat{\beta}^w_{\mathrm{Lasso}} = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \frac{1}{2 n_w} \sum_{i \in w} \left[ Y_i^{\mathrm{obs}} - \bar{Y}_w - (x_i - \bar{x}_w)^T \beta \right]^2 + \lambda_w \sum_{j=1}^p |\beta_j|, \quad w = A, B, \qquad (4)$$
where $\lambda_w > 0$ are the Lasso regularization parameters.
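For concreteness, the following R sketch computes (3)-(4) with the "glmnet" package used in Section 3, choosing each $\lambda_w$ by 10-fold cross-validation as in the simulations. It is one possible implementation under our own naming, not the authors' code.

```r
## A hedged sketch of the Lasso-adjusted estimator (3)-(4); names are ours.
library(glmnet)

ate_lasso_adjust <- function(Yobs, Tind, X) {
  xbar <- colMeans(X)                         # full-sample covariate means
  adjusted_mean <- function(arm) {
    Xw <- X[Tind == arm, , drop = FALSE]
    Yw <- Yobs[Tind == arm]
    # Solve (4): Lasso on centered within-arm data, lambda_w by 10-fold CV
    fit  <- cv.glmnet(scale(Xw, center = TRUE, scale = FALSE),
                      Yw - mean(Yw), alpha = 1, nfolds = 10)
    beta <- as.numeric(coef(fit, s = "lambda.min"))[-1]    # drop intercept
    # One arm's term in (3): arm mean minus covariate-fluctuation correction
    mean(Yw) - sum((colMeans(Xw) - xbar) * beta)
  }
  adjusted_mean(1) - adjusted_mean(0)
}
```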
However, traditional penalty-based methods fail to estimate the signs and select the important covariates effectively when the important and unimportant covariates are highly correlated (see [19] for details); this is especially critical in high-dimensional settings. To overcome this problem, the SPAC method was proposed to capture the signal strengths of the important covariates while eliminating the effects of covariates that are not directly related to the potential outcome variable $Y$ but are highly correlated with the important covariates. The SPAC between $Y^{\mathrm{obs}}$ and the $j$-th covariate $X_j$ is defined as
$$\gamma_j = \beta_j / d_{jj}^{1/2}, \quad j = 1, \dots, p, \qquad (5)$$
where $d_{jj}$ is the $j$-th diagonal element of the precision matrix $\Sigma^{-1}$, and $1 / d_{jj}^{1/2} = \{\mathrm{Var}(X_j \mid X_{-j})\}^{1/2} = (1 - R_j^2)^{1/2}$ (see Refs. [23,24]), where $X_{-j} = \{X_k : k = 1, \dots, j-1, j+1, \dots, p\}$ and $R_j$ denotes the multiple correlation between the $j$-th covariate $X_j$ and all the other covariates. In particular, $\gamma_j$ coincides with $\beta_j$ if $X_j$ is independent of the other covariates; otherwise, the SPAC $\gamma_j$ mitigates the effect of correlations among the covariates by multiplying $\beta_j$ by $(1 - R_j^2)^{1/2}$. Obviously, $\beta_j = 0$ if and only if $\gamma_j = 0$ for $j = 1, \dots, p$. Hence, the SPAC estimator of the adjustment vector can be obtained by replacing $\beta_j$ in (4) with $\gamma_j$:
$$\hat{\gamma}^w_{\mathrm{SPAC\text{-}Lasso}} = \operatorname*{argmin}_{\gamma \in \mathbb{R}^p} \frac{1}{2 n_w} \sum_{i \in w} \left[ Y_i^{\mathrm{obs}} - \bar{Y}_w - (x_i - \bar{x}_w)^T \hat{D} \gamma \right]^2 + \lambda_w \sum_{j=1}^p \hat{d}_{jj} |\gamma_j|, \qquad (6)$$
where $w = A, B$, $\hat{D} = \mathrm{diag}\{\hat{d}_{11}^{1/2}, \dots, \hat{d}_{pp}^{1/2}\}$, and $\hat{d}_{jj}$ is a consistent estimator of the $j$-th diagonal element of the precision matrix. In practice, $\hat{d}_{jj}$ can be obtained from the constrained $L_1$-minimization estimator (CLIME, [25]), the residual variance estimator ([26]), or the robust matrix estimator ([27]). Consequently, by (5), the adjustment vectors $\hat{\beta}^w_{\mathrm{SPAC\text{-}Lasso}}$ are given by
$$\hat{\beta}^w_{\mathrm{SPAC\text{-}Lasso}} = \hat{D} \hat{\gamma}^w_{\mathrm{SPAC\text{-}Lasso}}, \quad w = A, B. \qquad (7)$$
Then, the SPAC-Lasso adjustment estimator of the ATE is defined as
$$\hat{\tau}_{\mathrm{SPAC\text{-}Lasso}} = \left[ \bar{Y}_A - (\bar{x}_A - \bar{x})^T \hat{\beta}^A_{\mathrm{SPAC\text{-}Lasso}} \right] - \left[ \bar{Y}_B - (\bar{x}_B - \bar{x})^T \hat{\beta}^B_{\mathrm{SPAC\text{-}Lasso}} \right]. \qquad (8)$$
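The estimator (8) can be assembled in a few lines of R. In the sketch below, $\hat{d}_{jj}$ is computed by a nodewise-Lasso residual-variance estimate, one of the options listed after (6) (the paper's own experiments use CLIME via "fastclime"), and the weighted penalty in (6) is passed to glmnet through `penalty.factor`. All function and variable names are ours.

```r
## A sketch of the SPAC-Lasso adjustment (5)-(8); illustrative names throughout.
library(glmnet)

## d_jj = 1/Var(X_j | X_-j), estimated by nodewise-Lasso residual variances
## (a substitute for the CLIME estimator the paper actually uses).
diag_precision <- function(X) {
  sapply(seq_len(ncol(X)), function(j) {
    fit <- cv.glmnet(X[, -j], X[, j], alpha = 1)
    res <- X[, j] - predict(fit, newx = X[, -j], s = "lambda.min")
    1 / mean(res^2)
  })
}

ate_spac_lasso <- function(Yobs, Tind, X, d = diag_precision(X)) {
  xbar <- colMeans(X)
  Dhat <- sqrt(d)                          # D-hat = diag(d_jj^{1/2})
  adjusted_mean <- function(arm) {
    Xw <- X[Tind == arm, , drop = FALSE]
    Yw <- Yobs[Tind == arm]
    Xc <- scale(Xw, center = TRUE, scale = FALSE)
    # Solve (6): Lasso in gamma with design (x_i - xbar_w)^T D-hat and
    # covariate-specific penalty weights d_jj
    fit   <- cv.glmnet(sweep(Xc, 2, Dhat, `*`), Yw - mean(Yw),
                       alpha = 1, penalty.factor = d, standardize = FALSE)
    gamma <- as.numeric(coef(fit, s = "lambda.min"))[-1]
    beta  <- Dhat * gamma                  # back-transform (7)
    mean(Yw) - sum((colMeans(Xw) - xbar) * beta)   # one arm's term in (8)
  }
  adjusted_mean(1) - adjusted_mean(0)
}
```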
Similarly, the SPAC-SCAD estimator of the ATE can be obtained by using the SCAD penalty in (6). In the simulation studies below, the performance of our proposed SPAC adjustment methods (SPAC-Lasso and SPAC-SCAD) is compared with that of the existing ATE estimation methods (unadjusted, Lasso-adjusted, SCAD-adjusted, and Elastic-net adjusted); the theoretical property of the SPAC-Lasso adjustment estimator is established in the next subsection.

2.2. Regularity Conditions and Theoretical Property

For the Rubin causal model in randomized experiments, no assumptions are imposed on the relationship between the potential outcome variable $Y$ and the covariates $x$. To study the theoretical property of the proposed estimator $\hat{\tau}_{\mathrm{SPAC\text{-}Lasso}}$, we introduce the following linear decomposition and the notion of approximate sparsity, similar to those in [12].
Decomposition of the potential outcomes. Each potential outcome can be decomposed into a linear term in the covariates plus an error term:
$$Y_i(1) = \bar{Y}_1 + (x_i - \bar{x})^T \beta^A + e_i^A, \quad Y_i(0) = \bar{Y}_0 + (x_i - \bar{x})^T \beta^B + e_i^B, \quad i = 1, \dots, n, \qquad (9)$$
where $\bar{x} = n^{-1} \sum_{i=1}^n x_i$, and $\beta^A$ and $\beta^B$ are $p$-dimensional coefficient vectors. In the decomposition (9), all quantities are fixed, deterministic numbers, and $\bar{e^A} = \bar{e^B} = 0$, where $\bar{e^A} = n^{-1} \sum_{i=1}^n e_i^A$ and $\bar{e^B} = n^{-1} \sum_{i=1}^n e_i^B$.
Definition 1.
Similarly to [12,13], we define the approximate sparsity measures $s_{\lambda_A}$ and $s_{\lambda_B}$ for the treatment and control groups as
$$s_{\lambda_A} = \sum_{j=1}^p \min\left\{ \frac{|\beta_j^A|}{\lambda_A},\, 1 \right\}, \quad s_{\lambda_B} = \sum_{j=1}^p \min\left\{ \frac{|\beta_j^B|}{\lambda_B},\, 1 \right\}, \qquad (10)$$
which are more flexible than the strict sparsity $s_w = |\{ j : \beta_j^w \neq 0 \}|$, $w = A, B$. Both $s_{\lambda_A}$ and $s_{\lambda_B}$ are allowed to grow with $n$, and we write $s_\lambda = \max\{ s_{\lambda_A}, s_{\lambda_B} \}$.
In addition, the following regularity conditions are needed to establish the asymptotic normality of the proposed SPAC-Lasso adjustment estimator.
(C1) $\tilde{p}_A = n_A/n \to p_A$ and $\tilde{p}_B = n_B/n \to p_B$ as $n \to \infty$, where $p_A \in (0, 1)$ and $p_B \in (0, 1)$.
(C2) For $j = 1, \dots, p$, there is a fixed constant $L > 0$ such that $n^{-1} \sum_{i=1}^n (x_{ij} - (\bar{x})_j)^4 \le L$, $n^{-1} \sum_{i=1}^n (e_i^A)^4 \le L$ and $n^{-1} \sum_{i=1}^n (e_i^B)^4 \le L$.
(C3) The eigenvalues of the sample covariance matrix $n^{-1} X^T X$ are bounded away from zero and infinity.
(C4) There exists a constant $B > 0$ such that $\|\beta^A\|_1 \le B$ and $\|\beta^B\|_1 \le B$.
(C5) Let $\delta_n$ be the maximum covariance between the error terms and the covariates,
$$\delta_n = \max_{\omega = A, B} \max_{j} \left| \frac{1}{n} \sum_{i=1}^n \left( x_{ij} - (\bar{x})_j \right) \left( e_i^\omega - \bar{e^\omega} \right) \right|.$$
Assume that $\delta_n = o\big( 1/(s_\lambda \sqrt{\log p}) \big)$ and $(s_\lambda \log p)/\sqrt{n} = o(1)$.
(C6) Let $\Sigma^* = n^{-1} \sum_{i=1}^n \hat{D}^{-1} (x_i - \bar{x})(x_i - \bar{x})^T \hat{D}^{-1}$. There exist constants $C_0 > 0$ and $\xi > 1$ such that
$$\| h_{\gamma^*, S} \|_1 \le C_0 s_\lambda \| \Sigma^* h_{\gamma^*} \|_\infty \quad \text{for all } h_{\gamma^*} \in \mathcal{C},$$
where $\mathcal{C} = \{ h_{\gamma^*} : \| h_{\gamma^*, S^c} \|_1 \le \xi \| h_{\gamma^*, S} \|_1 \}$ and $S = \{ j : |\beta_j^A| > \lambda_A \text{ or } |\beta_j^B| > \lambda_B \}$.
(C7) Let $\nu = \min\{ 1/70, (3\tilde{p}_A)^2/70, (3 - 3\tilde{p}_A)^2/70 \}$. For some constants $c > 0$, $L_0 > 0$, $0 < \eta < (\xi - 1)/(\xi + 1)$ and $1/\eta < M < \infty$, the regularization parameters of the SPAC-Lasso satisfy
$$\lambda_A \in \left( \frac{1}{\eta}, M \right] \times \left\{ \frac{2 c (1 + \nu) L^{1/2}}{\tilde{p}_A L_0} \sqrt{\frac{2 \log p}{n}} + \frac{\delta_n}{L_0} \right\}, \quad \lambda_B \in \left( \frac{1}{\eta}, M \right] \times \left\{ \frac{2 c (1 + \nu) L^{1/2}}{\tilde{p}_B L_0} \sqrt{\frac{2 \log p}{n}} + \frac{\delta_n}{L_0} \right\}.$$
Condition (C1) is a basic assumption on the probabilities of receiving treatment or control. Condition (C2) is a moment condition on $x_{ij}$ and the error terms $e_i^w$ ($w = A, B$), similar to the conditions in [12,21,22]. Conditions (C3) and (C4) are standard regularity conditions in high-dimensional statistical inference (see [12,13,28,29]). Conditions (C5)–(C7) are needed to derive the convergence rate of $\hat{\beta}_{\mathrm{SPAC\text{-}Lasso}}$ and are formulated in terms of the approximate sparsity defined above. These assumptions are similar to those in [12,13] and are weaker than strict sparsity assumptions.
Theorem 1.
Suppose that regularity conditions (C1)–(C7) hold. Then, as $n \to \infty$,
$$\sqrt{n}\left( \hat{\tau}_{\mathrm{SPAC\text{-}Lasso}} - \tau \right) \stackrel{L}{\longrightarrow} N(0, \sigma^2), \qquad (11)$$
where
$$\sigma^2 = \lim_{n \to \infty} \left\{ \frac{1 - p_A}{p_A} \sigma_{e^A}^2 + \frac{p_A}{1 - p_A} \sigma_{e^B}^2 + 2 \sigma_{e^A e^B} \right\}, \qquad (12)$$
and $\sigma_{e^A}^2 = n^{-1} \sum_{i=1}^n (e_i^A)^2$, $\sigma_{e^B}^2 = n^{-1} \sum_{i=1}^n (e_i^B)^2$, and $\sigma_{e^A e^B} = n^{-1} \sum_{i=1}^n e_i^A e_i^B$.
Theorem 1 establishes the asymptotic normality of the proposed SPAC-Lasso adjustment estimator $\hat{\tau}_{\mathrm{SPAC\text{-}Lasso}}$ under highly correlated covariates, based on the approximate sparsity measures and appropriate tuning parameters $\lambda_A$ and $\lambda_B$. Without loss of generality, we assume throughout the proof that $\bar{Y}_1 = 0$, $\bar{Y}_0 = 0$ and $\bar{x} = 0$. The assumptions and the results of Theorem 1 are similar to those in [12,13].
Proof. 
According to the decomposition of $Y_i(1)$ and $Y_i(0)$ in (9), we have
$$\begin{aligned} \sqrt{n}\left( \hat{\tau}_{\mathrm{SPAC\text{-}Lasso}} - \tau \right) &= \sqrt{n}\left( \bar{Y}_A - \bar{x}_A^T \hat{\beta}^A_{\mathrm{SPAC\text{-}Lasso}} \right) - \sqrt{n}\left( \bar{Y}_B - \bar{x}_B^T \hat{\beta}^B_{\mathrm{SPAC\text{-}Lasso}} \right) \\ &= \sqrt{n}\left( \bar{x}_A^T \beta^A + \bar{e}_A - \bar{x}_A^T \hat{\beta}^A_{\mathrm{SPAC\text{-}Lasso}} \right) - \sqrt{n}\left( \bar{x}_B^T \beta^B + \bar{e}_B - \bar{x}_B^T \hat{\beta}^B_{\mathrm{SPAC\text{-}Lasso}} \right) \\ &= \underbrace{\sqrt{n}\left( \bar{e}_A - \bar{e}_B \right)}_{I_1} - \underbrace{\sqrt{n}\left( \bar{x}_A^T h^A - \bar{x}_B^T h^B \right)}_{I_2}, \end{aligned}$$
where $h^A = \hat{\beta}^A_{\mathrm{SPAC\text{-}Lasso}} - \beta^A$, $h^B = \hat{\beta}^B_{\mathrm{SPAC\text{-}Lasso}} - \beta^B$, $\bar{e}_A = n_A^{-1} \sum_{i \in A} e_i^A$ and $\bar{e}_B = n_B^{-1} \sum_{i \in B} e_i^B$. Combining Theorem 1 in [21] and replacing $a$ and $b$ there with $e^A$ and $e^B$, we have $I_1 \stackrel{L}{\longrightarrow} N(0, \sigma^2)$, where $\sigma^2$ is defined in Theorem 1.
By Hölder's inequality, we have
$$\left| \bar{x}_A^T h^A \right| \le \| \bar{x}_A \|_\infty \| h^A \|_1.$$
Invoking Lemma 1 in [13] and conditions (C1)–(C2), we have
$$\| \bar{x}_A \|_\infty = O_p\left( \sqrt{\frac{\log p}{n}} \right).$$
According to (5), we obtain
$$h^A = \hat{\beta}^A_{\mathrm{SPAC\text{-}Lasso}} - \beta^A = \hat{D} \hat{\gamma}^A_{\mathrm{SPAC\text{-}Lasso}} - D \gamma^A = \hat{D} \left( \hat{\gamma}^A_{\mathrm{SPAC\text{-}Lasso}} - \gamma^A \right) + \left( \hat{D} - D \right) \gamma^A = \hat{D} h_\gamma^A + \left( \hat{D} - D \right) \gamma^A,$$
where $h_\gamma^A = \hat{\gamma}^A_{\mathrm{SPAC\text{-}Lasso}} - \gamma^A$ and $D = \mathrm{diag}\{ d_{11}^{1/2}, \dots, d_{pp}^{1/2} \}$.
Using Lemma A3 in Appendix A, we have
$$\| h_\gamma^A \|_1 = o_p\left( \frac{1}{\sqrt{\log p}} \right).$$
Together with the above decomposition of $h^A$ and conditions (C3)–(C4), we have $\| h^A \|_1 = o_p\left( 1/\sqrt{\log p} \right)$. Then,
$$\sqrt{n}\, \bar{x}_A^T h^A = \sqrt{n} \cdot O_p\left( \sqrt{\frac{\log p}{n}} \right) \cdot o_p\left( \frac{1}{\sqrt{\log p}} \right) = o_p(1).$$
Similarly, we can obtain $\sqrt{n}\, \bar{x}_B^T h^B = o_p(1)$. Hence, $I_2 = o_p(1)$. This completes the proof of Theorem 1. □

3. Simulation Studies

In this section, the performance of the proposed SPAC-Lasso and SPAC-SCAD adjustment estimators is evaluated and compared with that of the unadjusted estimator (unadj) and the penalty-based regression adjustment estimators (Lasso, SCAD, Enet). The R package "glmnet" is used to fit the Elastic-net and Lasso. To implement the SCAD and SPAC-SCAD methods, the SCAD parameter $a = 3.7$ is used with the R package "ncvreg" ([3]). The precision matrix is estimated with the R package "fastclime" ([30]). For each regression adjustment method, the tuning parameter is selected by 10-fold cross-validation. All results are based on 2000 simulation replications.
The potential outcomes are generated as follows:
$$Y_i(1) = \sum_{j=1}^p x_{ij} \beta_j + z + e_i^A, \quad Y_i(0) = \sum_{j=1}^p x_{ij} \beta_j + e_i^B, \quad i = 1, \dots, n, \qquad (13)$$
where $n = 250$; $p = 500, 1000$ and $2000$; $z \sim U(0, 2)$; $\beta = (\beta_1, \dots, \beta_p)^T$ is the coefficient vector; and the error terms $e_i^A$ and $e_i^B$ are i.i.d. $N(0, 1)$. The covariate vector $x_i = (x_{i1}, \dots, x_{ip})^T$ is drawn from a multivariate normal distribution $N(0_{p \times 1}, \Sigma_{p \times p})$, where the covariance matrix $\Sigma_{p \times p}$ has the block-exchangeable structure
$$\Sigma_{p \times p} = \begin{pmatrix} \Sigma^{11}_{q \times q} & \Sigma^{12}_{q \times (p-q)} \\ \left( \Sigma^{12}_{q \times (p-q)} \right)^T & \Sigma^{22}_{(p-q) \times (p-q)} \end{pmatrix}, \qquad (14)$$
with $q$ the number of nonzero coefficients and
$$\left( \Sigma^{11} \right)_{s,j} = 1_{\{s = j\}} + \alpha_1 1_{\{s \neq j\}}, \quad \left( \Sigma^{12} \right)_{s,j} = \alpha_2, \quad \left( \Sigma^{22} \right)_{s,j} = 1_{\{s = j\}} + \alpha_3 1_{\{s \neq j\}}. \qquad (15)$$
Here, the parameter vector $\alpha = (\alpha_1, \alpha_2, \alpha_3)$ governs the correlations among the covariates. To explore the effect of these correlations, we consider three choices of $\alpha$: $(0.1, 0.3, 0.8)$, $(0.2, 0.5, 0.9)$ and $(0.5, 0.7, 0.9)$. For the coefficient vector $\beta$, the first $q$ coefficients take nonzero values and the remaining $p - q$ elements are set to zero. In this simulation, we set $q = 9$ and
$$\beta = (1, 1, 1, 1.5, 1.5, 1.5, 2, 2, 2, 0, \dots, 0)^T.$$
From the generated data, we randomly assign $n_A = 125$ units to the treatment group $A$ and the remaining $n_B = n - n_A = 125$ units to the control group $B$.
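One way to code this data-generating process is sketched below, using `MASS::mvrnorm` for the Gaussian covariates; we read $z \sim U(0, 2)$ as a single draw per replication (a unit-level $z_i$ is the other possible reading), and all names are ours.

```r
## A sketch of the simulation design (13)-(15); names and the scalar reading
## of z ~ U(0, 2) are our assumptions.
library(MASS)

gen_data <- function(n = 250, p = 500, q = 9, alpha = c(0.5, 0.7, 0.9)) {
  beta <- c(rep(c(1, 1.5, 2), each = 3), rep(0, p - q))
  # Block-exchangeable covariance (14)-(15): alpha1 within the signal block,
  # alpha2 across blocks, alpha3 within the noise block, unit diagonal
  Sigma <- matrix(alpha[2], p, p)
  Sigma[1:q, 1:q] <- alpha[1]
  Sigma[(q + 1):p, (q + 1):p] <- alpha[3]
  diag(Sigma) <- 1
  X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
  z <- runif(1, 0, 2)                           # treatment shift
  Y1 <- drop(X %*% beta) + z + rnorm(n)         # potential outcome, treated
  Y0 <- drop(X %*% beta) + rnorm(n)             # potential outcome, control
  Tind <- sample(rep(c(1L, 0L), each = n / 2))  # n_A = n_B = 125 for n = 250
  list(X = X, Yobs = ifelse(Tind == 1, Y1, Y0), Tind = Tind,
       tau = mean(Y1 - Y0))
}
```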
To assess the finite-sample performance of the proposed SPAC adjustment methods (SPAC-Lasso, SPAC-SCAD), we compute the absolute bias (|Bias|), the standard deviation (SD) and the root-mean-square error (RMSE) of each estimator, where |Bias| is the absolute difference between the estimated and the true ATE. The numerical results are shown in Table 1.
From the results in Table 1, we observe the following.
(1) When $\alpha = (0.1, 0.3, 0.8)$, our proposed SPAC methods (SPAC-Lasso, SPAC-SCAD) outperform the unadj method in terms of SD and RMSE, and perform similarly to Lasso, SCAD and Enet. Specifically, the SPAC adjustment methods reduce the RMSE of the unadjusted estimator (unadj) by 86–88%.
(2) As the correlations among the covariates increase, the superiority of the proposed SPAC adjustment methods becomes more pronounced. For example, when $p = 2000$ and $\alpha = (0.5, 0.7, 0.9)$, the RMSEs of SPAC-Lasso and SPAC-SCAD are 39% and 45% smaller than those of Lasso and SCAD, respectively.
The variable selection performance is assessed by the mean number of selected nonzero coefficients (S), the false negative rate (FNR) and the false positive rate (FPR), defined as
$$\mathrm{FNR} = \frac{\sum_{j=1}^p I(\hat{\beta}_j = 0, \beta_j \neq 0)}{\sum_{j=1}^p I(\beta_j \neq 0)}, \quad \mathrm{FPR} = \frac{\sum_{j=1}^p I(\hat{\beta}_j \neq 0, \beta_j = 0)}{\sum_{j=1}^p I(\beta_j = 0)}, \qquad (16)$$
where $I(\cdot)$ is the indicator function. The FNR and FPR measure the proportion of important covariates that are not selected and the proportion of unimportant covariates that are selected, respectively; smaller values indicate better variable selection. The variable selection results are listed in Table 2.
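These two rates translate directly into code; a short helper (ours) suffices:

```r
## Direct transcription of the FNR/FPR definitions in (16); `beta_hat` is an
## estimated coefficient vector and `beta` the true one.
fnr <- function(beta_hat, beta) sum(beta_hat == 0 & beta != 0) / sum(beta != 0)
fpr <- function(beta_hat, beta) sum(beta_hat != 0 & beta == 0) / sum(beta == 0)
```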
From Table 2, we obtain the following results.
(1) When the important and unimportant covariates are weakly correlated ($\alpha_2 = 0.3$ in $\alpha = (0.1, 0.3, 0.8)$), the SCAD and our proposed SPAC-Lasso and SPAC-SCAD adjustment methods perform well in terms of S, FNR and FPR, with false rates (FNR and FPR) close to 0. In comparison, the proportions of selected unimportant variables (FPR) of Lasso and Enet are relatively large, which is also reflected in the mean number of selected nonzero coefficients (S).
(2) When the correlations among the covariates increase, the proposed SPAC adjustment methods (SPAC-Lasso and SPAC-SCAD) retain satisfactory performance, while the existing penalty-based regression adjustment methods (Lasso, SCAD, Enet) perform badly. The mean numbers of selected nonzero coefficients (S) of our proposed methods remain close to the true number of nonzero elements, 9, whereas the existing adjustment methods fail to correctly identify the nonzero and zero coefficients (relatively large FNRs and FPRs). For example, when $\alpha = (0.5, 0.7, 0.9)$, the FNR of SCAD exceeds 0.819, while the largest FNR of SPAC-SCAD is only 0.007.
To further assess the performance of our proposed SPAC adjustment method, we calculate the mean of the variance estimates (MVE) of $\sigma^2$ in Theorem 1, and the mean coverage probability (MCP) and mean interval length (MIL) of the 95% confidence intervals $[\hat{\tau} - Z_{0.975} \cdot \hat{\sigma}/\sqrt{n},\, \hat{\tau} + Z_{0.975} \cdot \hat{\sigma}/\sqrt{n}]$, where $Z_\alpha$ is the $\alpha$-quantile of the standard normal distribution. We compare the results of the proposed method with those of the unadjusted (unadj) and penalty-based regression adjustment methods (Lasso, SCAD, Enet) in Table 3.
For the unadjusted method, the variance estimator is defined as
$$\hat{\sigma}^2_{\mathrm{unadj}} = \frac{n}{n_A} \cdot \frac{1}{n_A - 1} \sum_{i \in A} \left( Y_i(1) - \bar{Y}_A \right)^2 + \frac{n}{n_B} \cdot \frac{1}{n_B - 1} \sum_{i \in B} \left( Y_i(0) - \bar{Y}_B \right)^2. \qquad (17)$$
For the adjustment methods (SPAC-Lasso, SPAC-SCAD, Lasso, SCAD, Enet), we use the following Neyman-type conservative estimate of the variance $\sigma^2$, similar to that in [12,13]:
$$\hat{\sigma}^2 = \frac{n}{n_A} \hat{\sigma}^2_{e^A} + \frac{n}{n_B} \hat{\sigma}^2_{e^B}, \qquad (18)$$
where
$$\hat{\sigma}^2_{e^A} = \frac{1}{n_A - df_A} \sum_{i \in A} \left( Y_i(1) - \bar{Y}_A - (x_i - \bar{x}_A)^T \hat{\beta}^A \right)^2, \quad \hat{\sigma}^2_{e^B} = \frac{1}{n_B - df_B} \sum_{i \in B} \left( Y_i(0) - \bar{Y}_B - (x_i - \bar{x}_B)^T \hat{\beta}^B \right)^2, \qquad (19)$$
and $df_A = \|\hat{\beta}^A\|_0 + 1$ and $df_B = \|\hat{\beta}^B\|_0 + 1$ are the degrees of freedom for the treatment and control groups, respectively. The estimated adjustment vectors $\hat{\beta}^A$ and $\hat{\beta}^B$ are obtained from (4) and (7) with the corresponding penalties (Lasso, SCAD, Enet) and SPAC methods (SPAC-Lasso, SPAC-SCAD).
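The following R sketch turns (18)-(19) into the 95% interval reported in Table 3; `beta_A` and `beta_B` are the fitted arm-specific adjustment vectors from any of the five methods, and all names are ours.

```r
## A sketch of the Neyman-type variance (18)-(19) and the resulting 95%
## confidence interval; names are illustrative.
ate_confint <- function(Yobs, Tind, X, beta_A, beta_B, tau_hat, level = 0.95) {
  n <- length(Yobs)
  arm_term <- function(arm, beta) {
    Xw  <- X[Tind == arm, , drop = FALSE]
    Yw  <- Yobs[Tind == arm]
    res <- Yw - mean(Yw) -
           drop(scale(Xw, center = TRUE, scale = FALSE) %*% beta)
    df  <- sum(beta != 0) + 1                       # df_w = ||beta_w||_0 + 1
    (n / nrow(Xw)) * sum(res^2) / (nrow(Xw) - df)   # (n/n_w) * sigma_hat^2_e
  }
  sigma2 <- arm_term(1, beta_A) + arm_term(0, beta_B)   # sigma_hat^2 in (18)
  half   <- qnorm(1 - (1 - level) / 2) * sqrt(sigma2 / n)
  c(lower = tau_hat - half, upper = tau_hat + half)
}
```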
From Table 3, we observe that:
(1) When $\alpha = (0.1, 0.3, 0.8)$, the proposed SPAC adjustment methods (SPAC-Lasso, SPAC-SCAD) perform better than the unadjusted (unadj) method, and perform similarly to the penalty-based adjustment methods (Lasso, SCAD, Enet) in terms of MVE, MCP and MIL.
(2) When the important and unimportant covariates are highly correlated, the coverage probabilities of the proposed SPAC adjustment methods (SPAC-Lasso, SPAC-SCAD) are higher than those of the unadj, Lasso, SCAD and Enet methods, and the MVEs of SPAC-Lasso and SPAC-SCAD are smaller than those of the other methods. For example, when $\alpha = (0.5, 0.7, 0.9)$, the MCPs of the Lasso, SCAD and Enet methods are uniformly below 0.950, while the MCPs of SPAC-Lasso and SPAC-SCAD are around 0.980.
(3) The mean interval lengths (MILs) of SPAC-Lasso and SPAC-SCAD are shorter than those of unadj, Lasso, SCAD and Enet. In particular, when $\alpha = (0.5, 0.7, 0.9)$, the MILs of SPAC-Lasso and SPAC-SCAD are 10–14% and 38–46% shorter than those of Lasso and SCAD, respectively.

4. A Real Data Analysis

In the clinic, human epidermal growth factor receptor type 2 (HER2) is an important marker in the classification of breast cancer. Overexpression or amplification of HER2 (HER2+) may account for around 20% of early breast cancers. As a monoclonal antibody, trastuzumab (also known as Herceptin) has been shown to improve the event-free survival rate and the results of chemotherapy in patients with HER2+ breast cancer ([31]).
In this section, we consider the estimation of the average effect of the treatment (trastuzumab) and apply the proposed SPAC adjustment method to a dataset from the NeoAdjuvant Herceptin (NOAH) randomized clinical trial. The dataset was originally reported in [31], is collected in the Gene Expression Omnibus (accession GSE50948), and has been further studied by Refs. [13,32,33].
There were $n = 156$ patients in the trial: 63 patients received trastuzumab together with neoadjuvant chemotherapy (treatment group, $T_i = 1$) and 93 patients received neoadjuvant chemotherapy alone (control group, $T_i = 0$), $i = 1, \dots, n$. The pathological complete response (pCR), measured by the absence of residual invasive breast cancer, is viewed as the potential outcome variable $Y_i$. For each patient, 54,675 gene probes were observed and regarded as the covariates.
Since the dimension of the covariates, $p = 54{,}675$, is much larger than the sample size $n = 156$, we first apply the sure independence screening (SIS) method proposed in [1] to exclude insignificant variables and reduce the dimension to a suitable size. Following the suggestions of [13,34], genes with little variation in intensity (i.e., those whose $j$-th probe satisfies $\max(X_j) - \min(X_j) \le k$ for a given threshold $k$) are also removed. After screening, $p^* = 2573$ gene probes are retained. Based on this dataset, we apply six methods (unadj, Lasso, SCAD, Enet, and our proposed SPAC-Lasso and SPAC-SCAD) to estimate the ATE. The tuning parameters of the five regression adjustment methods (Lasso, SCAD, Enet, SPAC-Lasso and SPAC-SCAD) are chosen by 10-fold cross-validation. For each method, we report the ATE estimate ($\hat{\tau}$), the number of selected nonzero coefficients (S), the asymptotic standard deviation estimate ($\hat{\sigma}$), and the length of the 95% confidence interval (L). The numerical results are presented in Table 4.
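A sketch of this pre-processing step is given below: low-variation probes are dropped first, and the remaining probes are then ranked by absolute marginal correlation with the outcome (the SIS ranking of [1]). The cut-off `k` and the retained dimension are illustrative placeholders; the paper's choices yield $p^* = 2573$.

```r
## A hedged sketch of the probe screening for the NOAH data; `k` and `n_keep`
## are illustrative, not the paper's exact cut-offs.
screen_probes <- function(X, Yobs, k = 2, n_keep = 2573) {
  keep <- apply(X, 2, function(x) max(x) - min(x)) > k   # drop flat probes
  X <- X[, keep, drop = FALSE]
  score <- abs(drop(cor(X, Yobs)))                       # SIS: marginal corr.
  X[, order(score, decreasing = TRUE)[seq_len(min(n_keep, ncol(X)))],
    drop = FALSE]
}
```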
The results in Table 4 show that all the ATE estimates are around 0.250. Combining this with the findings in [13,31,32], trastuzumab indeed alleviates patients' conditions and improves prognosis. In addition, the numbers of covariates selected by the SPAC adjustment methods (SPAC-Lasso, SPAC-SCAD) are smaller than those selected by Lasso, SCAD and Enet, which is consistent with the findings of the simulation studies. The estimated asymptotic standard deviations ($\hat{\sigma}$) and 95% confidence interval lengths (L) of SPAC-Lasso and SPAC-SCAD are the smallest. Specifically, the $\hat{\sigma}$ values of SPAC-Lasso and SPAC-SCAD are 11% and 14% smaller than those of Lasso and SCAD, respectively, which implies that our proposed SPAC adjustment method can improve upon the existing unadjusted and penalty-based regression adjustment methods.

5. Conclusions

In this paper, we studied the estimation of the ATE in the Rubin causal model when the covariates are highly correlated. We proposed the SPAC adjustment methods (SPAC-Lasso, SPAC-SCAD) for the ATE by combining the SPAC variable selection method, the Lasso and SCAD penalty functions, and the regression adjustment technique, thereby extending the SPAC method from high-dimensional regression models to causal inference. In theory, we established the asymptotic normality of the proposed SPAC-Lasso adjustment estimator under some regularity conditions. Through simulation studies and a real data analysis, we demonstrated the advantages of the proposed method in estimating the average treatment effect and selecting the important covariates. In summary, the proposed SPAC adjustment method improves estimation accuracy for the Rubin causal model with highly correlated covariates.

Author Contributions

Conceptualization, G.L.; Methodology, L.Y.; Validation, F.Z.; Formal analysis, F.Z.; Investigation, Z.D.; Data curation, Z.D.; Writing—original draft preparation, Z.D.; Writing—review and editing, L.Y.; Supervision, L.Y.; Project administration, G.L.; Funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Nos. 12001277, 12271046, 12131006 and 11971001), the National Social Science Foundation of China (No. 21BTJ030), the Tianjin Natural Science Foundation (No. 19JCZDJC32300).

Data Availability Statement

Not applicable.

Acknowledgments

The authors are grateful to the editor, the associate editor and the three anonymous referees for the constructive comments and suggestions that led to significant improvement of an early manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Some Lemmas and Their Proofs

This appendix provides three lemmas needed for the proof of Theorem 1. We drop the superscripts on $h_\gamma$, $e$, $\gamma$ and $\hat{\gamma}$ and focus on the treatment group $A$; the same analysis applies to the control group $B$.
Lemma A1.
Let
$$\mathcal{M}_1 = \left\{ \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) + \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty \le \eta \lambda_A \right\},$$
where $\bar{e}_A = n_A^{-1} \sum_{i \in A} e_i^A$. Suppose that regularity conditions (C1)–(C7) hold. Then
$$P(\mathcal{M}_1) \ge 1 - \frac{2}{p}.$$
Proof. 
Recalling that $\tilde{X}_i = \hat{D}^{-1}(x_i - \bar{x}_A)$, we have
$$\frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) = \frac{1}{n_A} \sum_{i \in A} \hat{D}^{-1} (x_i - \bar{x}_A)(e_i - \bar{e}_A) = \hat{D}^{-1} \left( \frac{1}{n_A} \sum_{i \in A} x_i e_i - \bar{x}_A \bar{e}_A \right).$$
By condition (C3) and the sufficient accuracy of the CLIME estimators $\hat{d}_{jj}$, there exist constants $L_0$ and $L_1$ such that, for sufficiently large $n$,
$$L_0 \le d_{11}, \dots, d_{pp}, \hat{d}_{11}, \dots, \hat{d}_{pp} \le L_1.$$
Combined with the triangle inequality, this yields
$$\left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) \right\|_\infty = \max_{1 \le j \le p} \hat{d}_{jj}^{-1/2} \left| \frac{1}{n_A} \sum_{i \in A} x_{ij} e_i - (\bar{x}_A)_j \bar{e}_A \right| \le L_0^{-1/2} \underbrace{\left\| \frac{1}{n_A} \sum_{i \in A} x_i e_i \right\|_\infty}_{J_1} + L_0^{-1/2} \underbrace{\left\| \bar{x}_A \bar{e}_A \right\|_\infty}_{J_2}, \qquad \mathrm{(A1)}$$
where $(\bar{x}_A)_j$ is the $j$-th element of $\bar{x}_A$.
For the first term $J_1$ in (A1), we have
$$J_1 \le \left\| \frac{1}{n_A} \sum_{i \in A} x_i e_i - \frac{1}{n} \sum_{i=1}^n x_i e_i \right\|_\infty + \left\| \frac{1}{n} \sum_{i=1}^n x_i e_i \right\|_\infty \le \left\| \frac{1}{n_A} \sum_{i \in A} x_i e_i - \frac{1}{n} \sum_{i=1}^n x_i e_i \right\|_\infty + \delta_n,$$
where $\delta_n$ is defined in condition (C5). By condition (C2) and the Cauchy–Schwarz inequality, we have
$$\frac{1}{n} \sum_{i=1}^n x_{ij}^2 e_i^2 \le \left( \frac{1}{n} \sum_{i=1}^n x_{ij}^4 \right)^{1/2} \left( \frac{1}{n} \sum_{i=1}^n e_i^4 \right)^{1/2} \le L.$$
Using Lemma S1 in [12], we can show that
$$P\left( \left\| \frac{1}{n_A} \sum_{i \in A} x_i e_i - \frac{1}{n} \sum_{i=1}^n x_i e_i \right\|_\infty > t_n \right) \le 2 \exp\left\{ \log p - \frac{n_A \tilde{p}_A t_n^2}{(1 + \nu)^2 L} \right\} = 2 \exp\{ -\log p \} = \frac{2}{p},$$
where $t_n = (1 + \nu) L^{1/2} \tilde{p}_A^{-1} \sqrt{2 \log p / n}$. Hence,
$$P(J_1 \le t_n + \delta_n) \ge 1 - \frac{2}{p}. \qquad \mathrm{(A2)}$$
For the second term $J_2$ in (A1), using condition (C2) and Lemma 1 in [13], we have
$$P\left( \left\| \bar{x}_A \bar{e}_A \right\|_\infty \le (1 + \nu) L^{1/2} \tilde{p}_A^{-2} \cdot \frac{2 \log p}{n} \right) \ge 1 - \frac{2}{p}. \qquad \mathrm{(A3)}$$
Combining (A2) and (A3), it is easy to see that
$$P\left( \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) \right\|_\infty \le L_0^{-1/2} (2 t_n + \delta_n) \right) \ge 1 - \frac{2}{p}. \qquad \mathrm{(A4)}$$
Next, since $\tilde{X}_i = \hat{D}^{-1} (x_i - \bar{x}_A)$, we have
$$\begin{aligned} & \left\| \frac{1}{n_A} \sum_{i \in A} \hat{D}^{-1} (x_i - \bar{x}_A)(x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty \\ & \quad \le \frac{1}{L_0}\, O_p\left( M_1 \sqrt{\frac{\log p}{n}} \right) \left\| \frac{1}{n_A} \sum_{i \in A} (x_i - \bar{x}_A)(x_i - \bar{x}_A)^T \gamma \right\|_\infty = \frac{1}{L_0}\, O_p\left( M_1 \sqrt{\frac{\log p}{n}} \right) \left\| \frac{1}{n_A} \sum_{i \in A} x_i x_i^T \gamma - \bar{x}_A \bar{x}_A^T \gamma \right\|_\infty \\ & \quad \le \frac{1}{L_0}\, O_p\left( M_1 \sqrt{\frac{\log p}{n}} \right) \left( \left\| \frac{1}{n_A} \sum_{i \in A} x_i x_i^T - \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right\|_\infty \| \gamma \|_1 + \left\| \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right\|_\infty \| \gamma \|_1 + \left\| \bar{x}_A \bar{x}_A^T \right\|_\infty \| \gamma \|_1 \right), \end{aligned}$$
where $M_1 > 0$ is a constant. By the Cauchy–Schwarz inequality and condition (C2), we have
$$\frac{1}{n} \sum_{i=1}^n x_{ij}^2 x_{ik}^2 \le \left( \frac{1}{n} \sum_{i=1}^n x_{ij}^4 \right)^{1/2} \left( \frac{1}{n} \sum_{i=1}^n x_{ik}^4 \right)^{1/2} \le L.$$
Combined with Lemma S1 in [12], we have
$$P\left( \left\| \frac{1}{n_A} \sum_{i \in A} x_i x_i^T - \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right\|_\infty \ge (1 + \nu) L^{1/2} \tilde{p}_A^{-1} \sqrt{\frac{3 \log p}{n}} \right) \le \frac{2}{p}. \qquad \mathrm{(A5)}$$
By Lemma 1 in [13], we have
$$\left\| \bar{x}_A \bar{x}_A^T \right\|_\infty \le \| \bar{x}_A \|_\infty^2 = o_p\left( \sqrt{\frac{\log p}{n}} \right).$$
Recalling the definition of the SPAC and condition (C4), we have
$$\| \gamma \|_1 = \sum_{j=1}^p \frac{|\beta_j|}{d_{jj}^{1/2}} \le \frac{1}{\sqrt{L_0}} \sum_{j=1}^p |\beta_j| \le \frac{B}{\sqrt{L_0}}.$$
Putting the above results together, we obtain
$$I_2 = \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty = O_p\left( \sqrt{\frac{\log p}{n}} \right). \qquad \mathrm{(A6)}$$
By (A4), (A6) and condition (C7), we have
$$P\left( \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) + \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty \le \eta \lambda_A \right) \ge 1 - \frac{2}{p}.$$
This completes the proof. □
Lemma A2.
Let
$$\mathcal{M}_2 = \left\{ \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T - \frac{1}{n} \sum_{i=1}^n \hat{D}^{-1} x_i x_i^T \hat{D}^{-1} \right\|_\infty \le C_1 \sqrt{\frac{\log p}{n}} \right\}, \quad C_1 = \frac{2 (1 + \nu) L^{1/2}}{\tilde{p}_A L_0}.$$
Suppose that regularity conditions (C1)–(C3) hold. Then
$$P(\mathcal{M}_2) \ge 1 - \frac{2}{p}.$$
Proof. 
From the definition of $\tilde{X}_i$ in (A9), we have
$$\left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T - \frac{1}{n} \sum_{i=1}^n \hat{D}^{-1} x_i x_i^T \hat{D}^{-1} \right\|_\infty \le \frac{1}{L_0} \left\| \frac{1}{n_A} \sum_{i \in A} x_i x_i^T - \frac{1}{n} \sum_{i=1}^n x_i x_i^T - \bar{x}_A \bar{x}_A^T \right\|_\infty \le \frac{1}{L_0} \underbrace{\left\| \frac{1}{n_A} \sum_{i \in A} x_i x_i^T - \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right\|_\infty}_{(*)} + \frac{1}{L_0} \underbrace{\left\| \bar{x}_A \bar{x}_A^T \right\|_\infty}_{(**)}, \qquad \mathrm{(A7)}$$
where the last inequality follows from the triangle inequality.
By the Cauchy–Schwarz inequality and condition (C2), we have
$$\frac{1}{n} \sum_{i=1}^n x_{ij}^2 x_{ik}^2 \le \left( \frac{1}{n} \sum_{i=1}^n x_{ij}^4 \right)^{1/2} \left( \frac{1}{n} \sum_{i=1}^n x_{ik}^4 \right)^{1/2} \le L.$$
Invoking Lemma S1 in [12] and $n_A/n = \tilde{p}_A$ from condition (C1), we can bound the first term (∗) in (A7) as follows:
$$P\left( \left\| \frac{1}{n_A} \sum_{i \in A} x_i x_i^T - \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right\|_\infty \ge (1 + \nu) L^{1/2} \tilde{p}_A^{-1} \sqrt{\frac{3 \log p}{n}} \right) \le 2 \exp\{ -\log p \} = \frac{2}{p}. \qquad \mathrm{(A8)}$$
For the second term (∗∗) in (A7), we have
$$\left\| \bar{x}_A \bar{x}_A^T \right\|_\infty \le \| \bar{x}_A \|_\infty^2 = o_p\left( \sqrt{\frac{\log p}{n}} \right).$$
Together with (A8), we have
$$P\left( \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T - \frac{1}{n} \sum_{i=1}^n \hat{D}^{-1} x_i x_i^T \hat{D}^{-1} \right\|_\infty \le C_1 \sqrt{\frac{\log p}{n}} \right) \ge 1 - \frac{2}{p}.$$
This completes the proof. □
Lemma A3.
Suppose that regularity conditions (C1)–(C7) hold. Then
$$\| h_\gamma \|_1 = o_p\left( \frac{1}{\sqrt{\log p}} \right),$$
where $h_\gamma = \hat{\gamma}_{\mathrm{SPAC\text{-}Lasso}} - \gamma$.
Proof. 
Note that the SPAC-Lasso estimator $\hat{\gamma}_{\mathrm{SPAC\text{-}Lasso}}$ is defined by
$$\hat{\gamma}_{\mathrm{SPAC\text{-}Lasso}} = \operatorname*{argmin}_{\gamma} \frac{1}{2 n_A} \sum_{i \in A} \left[ Y_i(1) - \bar{Y}_A - (x_i - \bar{x}_A)^T \hat{D} \gamma \right]^2 + \lambda_A \sum_{j=1}^p \hat{d}_{jj} |\gamma_j|,$$
which can be rewritten as
$$\hat{\gamma}^* = \operatorname*{argmin}_{\gamma^*} \frac{1}{2 n_A} \sum_{i \in A} \left[ Y_i(1) - \bar{Y}_A - \tilde{X}_i^T \gamma^* \right]^2 + \lambda_A \sum_{j=1}^p |\gamma^*_j|, \qquad \mathrm{(A9)}$$
where $\tilde{X}_i = \hat{D}^{-1} (x_i - \bar{x}_A)$ and $\gamma^* = \hat{D}^2 \gamma$.
The Karush–Kuhn–Tucker (KKT) condition for $\hat{\gamma}^*$ is
$$\frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \left[ Y_i(1) - \bar{Y}_A - \tilde{X}_i^T \hat{\gamma}^* \right] = \lambda_A \kappa, \qquad \mathrm{(A10)}$$
where $\kappa$ is a subgradient of $\| \gamma^* \|_1$ at $\gamma^* = \hat{\gamma}^*$, that is,
$$\kappa \in \partial \| \gamma^* \|_1 \big|_{\gamma^* = \hat{\gamma}^*}, \quad \text{with } \kappa_j \in [-1, 1] \text{ if } \hat{\gamma}^*_j = 0 \text{ and } \kappa_j = \mathrm{sign}(\hat{\gamma}^*_j) \text{ otherwise}.$$
By the decomposition of $Y_i(1)$ in (9), we have
$$Y_i(1) - \bar{Y}_A = (x_i - \bar{x}_A)^T \beta + e_i - \bar{e}_A = (x_i - \bar{x}_A)^T D \gamma + e_i - \bar{e}_A = \tilde{X}_i^T \gamma^* + (e_i - \bar{e}_A) + (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma.$$
Hence, (A10) can be expressed as
$$\frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T \left( \gamma^* - \hat{\gamma}^* \right) + \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) + \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma = \lambda_A \kappa, \qquad \mathrm{(A11)}$$
where $D = \mathrm{diag}\{ d_{11}^{1/2}, \dots, d_{pp}^{1/2} \}$. Premultiplying (A11) by $(\gamma^* - \hat{\gamma}^*)^T$ and writing $h_{\gamma^*} = \hat{\gamma}^* - \gamma^*$, we have
$$\lambda_A (\gamma^* - \hat{\gamma}^*)^T \kappa = \frac{1}{n_A} \sum_{i \in A} (\gamma^* - \hat{\gamma}^*)^T \tilde{X}_i \tilde{X}_i^T (\gamma^* - \hat{\gamma}^*) - \frac{1}{n_A} \sum_{i \in A} h_{\gamma^*}^T \tilde{X}_i (e_i - \bar{e}_A) - \frac{1}{n_A} \sum_{i \in A} h_{\gamma^*}^T \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma.$$
Then, we have
$$\frac{1}{n_A} \sum_{i \in A} \left( \tilde{X}_i^T h_{\gamma^*} \right)^2 \le \lambda_A \left( \| \gamma^* \|_1 - \| \hat{\gamma}^* \|_1 \right) + \left| \frac{1}{n_A} \sum_{i \in A} h_{\gamma^*}^T \tilde{X}_i (e_i - \bar{e}_A) \right| + \left| \frac{1}{n_A} \sum_{i \in A} h_{\gamma^*}^T \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right|.$$
By Hölder's inequality, the above inequality can be written as
$$\frac{1}{n_A} \sum_{i \in A} \left( \tilde{X}_i^T h_{\gamma^*} \right)^2 \le \lambda_A \left( \| \gamma^* \|_1 - \| \hat{\gamma}^* \|_1 \right) + \| h_{\gamma^*} \|_1 \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) \right\|_\infty + \| h_{\gamma^*} \|_1 \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty.$$
On the event $\mathcal{M}_1$ of Lemma A1, this gives
$$\frac{1}{n_A} \sum_{i \in A} \left( \tilde{X}_i^T h_{\gamma^*} \right)^2 \le \lambda_A \left( \| \gamma^* \|_1 - \| \hat{\gamma}^* \|_1 \right) + \eta \lambda_A \| h_{\gamma^*} \|_1.$$
By the triangle inequality and the definition of $h_{\gamma^*} = \hat{\gamma}^* - \gamma^*$, we have
$$\| \gamma^* \|_1 - \| \hat{\gamma}^* \|_1 \le 2 \| \gamma^*_{S^c} \|_1 + \| \hat{\gamma}^*_S - \gamma^*_S \|_1 - \| \hat{\gamma}^*_{S^c} - \gamma^*_{S^c} \|_1 = \| h_{\gamma^*, S} \|_1 - \| h_{\gamma^*, S^c} \|_1 + 2 \| \gamma^*_{S^c} \|_1.$$
Hence,
$$\frac{1}{n_A} \sum_{i \in A} \left( \tilde{X}_i^T h_{\gamma^*} \right)^2 \le \lambda_A \left( \| h_{\gamma^*, S} \|_1 - \| h_{\gamma^*, S^c} \|_1 + 2 \| \gamma^*_{S^c} \|_1 \right) + \eta \lambda_A \| h_{\gamma^*} \|_1 = \lambda_A \left[ (\eta - 1) \| h_{\gamma^*, S^c} \|_1 + (1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 \| \gamma^*_{S^c} \|_1 \right].$$
Noting that $n_A^{-1} \sum_{i \in A} ( \tilde{X}_i^T h_{\gamma^*} )^2 \ge 0$ and using the definition of $s_\lambda$ in Definition 1, we have
$$(1 - \eta) \| h_{\gamma^*, S^c} \|_1 \le (1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 \| \gamma^*_{S^c} \|_1 \le (1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 L_1 L_0^{-1/2} s_\lambda \lambda_A. \qquad \mathrm{(A12)}$$
We now consider two cases for the quantity $(1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 L_1 L_0^{-1/2} s_\lambda \lambda_A$.
(i) Suppose $(1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 L_1 L_0^{-1/2} s_\lambda \lambda_A \ge (1 - \eta) \xi \| h_{\gamma^*, S} \|_1$. Then, by (A12),
$$\| h_{\gamma^*} \|_1 = \| h_{\gamma^*, S} \|_1 + \| h_{\gamma^*, S^c} \|_1 \le \| h_{\gamma^*, S} \|_1 + \frac{(1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 L_1 L_0^{-1/2} s_\lambda \lambda_A}{1 - \eta} = \left( \frac{1 + \eta}{1 - \eta} + 1 \right) \| h_{\gamma^*, S} \|_1 + \frac{2 L_1 s_\lambda \lambda_A}{L_0^{1/2} (1 - \eta)} \le \frac{2 L_1 s_\lambda \lambda_A}{L_0^{1/2} (1 - \eta)} \left[ \frac{2}{(1 - \eta) \xi - (1 + \eta)} + 1 \right].$$
Combining this with conditions (C5) and (C7), we can show that $s_\lambda \lambda_A = o\left( 1/\sqrt{\log p} \right)$, and hence $\| h_{\gamma^*} \|_1 = o_p\left( 1/\sqrt{\log p} \right)$ in this case.
(ii) Suppose $(1 + \eta) \| h_{\gamma^*, S} \|_1 + 2 L_1 L_0^{-1/2} s_\lambda \lambda_A < (1 - \eta) \xi \| h_{\gamma^*, S} \|_1$. Then, by (A12), we obtain
$$\| h_{\gamma^*, S^c} \|_1 \le \xi \| h_{\gamma^*, S} \|_1. \qquad \mathrm{(A13)}$$
By condition (C6), we have
$$\| h_{\gamma^*} \|_1 = \| h_{\gamma^*, S} \|_1 + \| h_{\gamma^*, S^c} \|_1 \le (1 + \xi) \| h_{\gamma^*, S} \|_1 \le (1 + \xi) C_0 s_\lambda \| \Sigma^* h_{\gamma^*} \|_\infty. \qquad \mathrm{(A14)}$$
Using (A11) and Lemma A1, together with the triangle inequality, we can show that
$$\left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T h_{\gamma^*} \right\|_\infty \le \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T h_{\gamma^*} - \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) - \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty + \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (e_i - \bar{e}_A) + \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i (x_i - \bar{x}_A)^T (D - \hat{D}) \gamma \right\|_\infty \le (1 + \eta) \lambda_A,$$
where the last inequality holds on the event $\mathcal{M}_1$ of Lemma A1. When both events $\mathcal{M}_1$ and $\mathcal{M}_2$ of Lemma A2 hold, we have
$$\left\| \frac{1}{n} \sum_{i=1}^n \hat{D}^{-1} x_i x_i^T \hat{D}^{-1} h_{\gamma^*} \right\|_\infty \le \left\| \left( \frac{1}{n} \sum_{i=1}^n \hat{D}^{-1} x_i x_i^T \hat{D}^{-1} - \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T \right) h_{\gamma^*} \right\|_\infty + \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T h_{\gamma^*} \right\|_\infty \le C_1 \sqrt{\frac{\log p}{n}} \| h_{\gamma^*} \|_1 + \left\| \frac{1}{n_A} \sum_{i \in A} \tilde{X}_i \tilde{X}_i^T h_{\gamma^*} \right\|_\infty.$$
By condition (C5) and (A14), we can show that
$$\| h_{\gamma^*} \|_1 \le (1 + \xi) C_0 \left[ C_1 s_\lambda \sqrt{\frac{\log p}{n}} \| h_{\gamma^*} \|_1 + (1 + \eta) s_\lambda \lambda_A \right] \le (1 + \xi) C_0 \left[ o(1) \| h_{\gamma^*} \|_1 + (1 + \eta) s_\lambda \lambda_A \right].$$
Hence, $\| h_{\gamma^*} \|_1 = o_p\left( 1/\sqrt{\log p} \right)$ in this case as well, by conditions (C5) and (C7).
Combining cases (i) and (ii), we conclude that $\| h_{\gamma^*} \|_1 = o_p\left( 1/\sqrt{\log p} \right)$. By the definitions of $h_\gamma$, $h_{\gamma^*}$ and $\hat{\gamma}^*$, together with the boundedness of $\hat{d}_{jj}$, we have
$$\| h_\gamma \|_1 = \| \hat{\gamma}_{\mathrm{SPAC\text{-}Lasso}} - \gamma \|_1 = o_p\left( \frac{1}{\sqrt{\log p}} \right).$$
This completes the proof. □

References

  1. Fan, J.; Lv, J. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 2008, 70, 849–911.
  2. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
  3. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
  4. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
  5. Rubin, D.B. Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 1974, 66, 688–701.
  6. Neyman, J. On the application of probability theory to agricultural experiments. Essay on principles, Section 9. Translation of the original 1923 paper, which appeared in Roczniki Nauk Rolniczych. Stat. Sci. 1990, 5, 465–472.
  7. Rubin, D.B. Matched Sampling for Causal Effects; Cambridge University Press: New York, NY, USA, 2006.
  8. Imbens, G.W.; Rubin, D.B. Causal Inference in Statistics, Social, and Biomedical Sciences: An Introduction; Cambridge University Press: New York, NY, USA, 2015.
  9. Belloni, A.; Chernozhukov, V.; Hansen, C. Inference on treatment effects after selection among high-dimensional controls. Rev. Econ. Stud. 2014, 81, 608–650.
  10. Belloni, A.; Chernozhukov, V.; Fernández-Val, I.; Hansen, C. Program evaluation and causal inference with high-dimensional data. Econometrica 2017, 85, 233–298.
  11. Wager, S.; Du, W.F.; Taylor, J.; Tibshirani, R. High-dimensional regression adjustments in randomized experiments. Proc. Natl. Acad. Sci. USA 2016, 113, 12673–12678.
  12. Bloniarz, A.; Liu, H.Z.; Zhang, C.H.; Sekhon, J.S.; Yu, B. Lasso adjustments of treatment effect estimates in randomized experiments. Proc. Natl. Acad. Sci. USA 2016, 113, 7383–7390.
  13. Yue, L.L.; Li, G.R.; Lian, H.; Wan, X. Regression adjustment for treatment effect with multicollinearity in high dimensions. Comput. Stat. Data Anal. 2019, 134, 17–35.
  14. Wang, H.; Lengerich, B.J.; Aragam, B.; Xing, E.P. Precision Lasso: Accounting for correlations and linear dependencies in high-dimensional genomic data. Bioinformatics 2019, 35, 1181–1187.
  15. Zhu, W.; Lévy-Leduc, C.; Ternès, N. A variable selection approach for highly correlated predictors in high-dimensional genomic data. Bioinformatics 2021, 37, 2238–2244.
  16. Zhao, P.; Yu, B. On model selection consistency of Lasso. J. Mach. Learn. Res. 2006, 7, 2541–2563.
  17. Bühlmann, P.; Kalisch, M.; Maathuis, M.H. Variable selection in high-dimensional linear models: Partially faithful distributions and the PC-simple algorithm. Biometrika 2010, 97, 261–278.
  18. Fan, J.; Shao, Q.M.; Zhou, W.X. Are discoveries spurious? Distributions of maximum spurious correlations and their applications. Ann. Stat. 2018, 46, 989–1017.
  19. Xue, F.; Qu, A. Semi-standard partial covariance variable selection when irrepresentable conditions fail. Stat. Sin. 2022, 32, 1881–1909.
  20. Imbens, G.W. Nonparametric estimation of average treatment effects under exogeneity: A review. Rev. Econ. Stat. 2004, 86, 4–29.
  21. Freedman, D.A. On regression adjustments in experiments with several treatments. Ann. Appl. Stat. 2008, 2, 176–196.
  22. Lin, W. Agnostic notes on regression adjustments to experimental data: Reexamining Freedman's critique. Ann. Appl. Stat. 2013, 7, 295–318.
  23. Lauritzen, S.L. Graphical Models; Clarendon Press: Oxford, UK, 1996.
  24. Raveh, A. On the use of the inverse of the correlation matrix in multivariate data analysis. Am. Stat. 1985, 39, 39–42.
  25. Cai, T.; Liu, W.; Luo, X. A constrained L1 minimization approach to sparse precision matrix estimation. J. Am. Stat. Assoc. 2011, 106, 594–607.
  26. Balmand, S.; Dalalyan, A.S. On estimation of the diagonal elements of a sparse precision matrix. Electron. J. Stat. 2016, 10, 1551–1579.
  27. Avella-Medina, M.; Battey, H.S.; Fan, J.; Li, Q. Robust estimation of high-dimensional covariance and precision matrices. Biometrika 2018, 105, 271–284.
  28. Fan, J.; Peng, H. Nonconcave penalized likelihood with a diverging number of parameters. Ann. Stat. 2004, 32, 928–961.
  29. Blazère, M.; Loubes, J.M.; Gamboa, F. Oracle inequalities for a group Lasso procedure applied to generalized linear models in high dimension. IEEE Trans. Inf. Theory 2014, 60, 2303–2318.
  30. Pang, H.; Liu, H.; Vanderbei, R.J. The fastclime package for linear programming and large-scale precision matrix estimation in R. J. Mach. Learn. Res. 2014, 15, 489–493.
  31. Gianni, L.; Eiermann, W.; Semiglazov, V.; Manikhas, A.; Lluch, A.; Tjulandin, S.; Zambetti, M.; Vazquez, F.; Byakhow, M.; Lichinitser, M.; et al. Neoadjuvant chemotherapy with trastuzumab followed by adjuvant trastuzumab versus neoadjuvant chemotherapy alone, in patients with HER2-positive locally advanced breast cancer (the NOAH trial): A randomised controlled superiority trial with a parallel HER2-negative cohort. Lancet 2010, 375, 377–384.
  32. Prat, A.; Bianchini, G.; Thomas, M.; Belousov, A.; Cheang, M.C.; Koehler, A.; Gómez, P.; Semiglazov, V.; Eiermann, W.; Tjulandin, S.; et al. Research-based PAM50 subtype predictor identifies higher responses and improved survival outcomes in HER2-positive breast cancer in the NOAH study. Clin. Cancer Res. 2014, 20, 511–521.
  33. Roth, J.; Simon, N. A framework for estimating and testing qualitative interactions with applications to predictive biomarkers. Biostatistics 2018, 19, 263–280.
  34. Dudoit, S.; Fridlyand, J.; Speed, T.P. Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc. 2002, 97, 77–87.
Table 1. Finite sample performance of the ATE estimators.

                       α = (0.1, 0.3, 0.8)        α = (0.2, 0.5, 0.9)        α = (0.5, 0.7, 0.9)
  p     Method         |Bias|  SD      RMSE       |Bias|  SD      RMSE       |Bias|  SD      RMSE
  500   unadj          0.0040  0.7855  0.7855     0.0058  0.8778  0.8778     0.0035  1.1844  1.1844
        Lasso          0.0021  0.1155  0.1155     0.0085  0.1512  0.1515     0.0028  0.1643  0.1643
        SCAD           0.0010  0.0948  0.0948     0.0159  0.2358  0.2364     0.0019  0.2839  0.2839
        Enet           0.0023  0.1098  0.1098     0.0086  0.1533  0.1535     0.0032  0.1645  0.1646
        SPAC-Lasso     0.0014  0.1068  0.1068     0.0020  0.1050  0.1050     0.0008  0.1099  0.1099
        SPAC-SCAD      0.0011  0.0946  0.0946     0.0016  0.0977  0.0978     0.0013  0.1284  0.1284
  1000  unadj          0.0052  0.7796  0.7796     0.0162  0.9152  0.9153     0.0429  1.1905  1.1913
        Lasso          0.0009  0.1199  0.1199     0.0027  0.1515  0.1515     0.0093  0.1494  0.1497
        SCAD           0.0003  0.0918  0.0918     0.0043  0.2403  0.2404     0.0132  0.2665  0.2669
        Enet           0.0013  0.1136  0.1136     0.0025  0.1524  0.1524     0.0087  0.1523  0.1526
        SPAC-Lasso     0.0001  0.1051  0.1051     0.0007  0.1075  0.1075     0.0036  0.0955  0.0956
        SPAC-SCAD      0.0002  0.0916  0.0916     0.0000  0.0971  0.0971     0.0044  0.1157  0.1158
  2000  unadj          0.0104  0.7765  0.7766     0.0556  0.9237  0.9254     0.0696  1.2492  1.2512
        Lasso          0.0013  0.1200  0.1201     0.0010  0.1658  0.1658     0.0092  0.1801  0.1804
        SCAD           0.0004  0.0997  0.0997     0.0037  0.2521  0.2521     0.0134  0.2620  0.2623
        Enet           0.0020  0.1158  0.1158     0.0027  0.1659  0.1659     0.0063  0.1879  0.1880
        SPAC-Lasso     0.0005  0.1074  0.1074     0.0004  0.1045  0.1045     0.0040  0.1104  0.1105
        SPAC-SCAD      0.0002  0.0991  0.0991     0.0013  0.0969  0.0969     0.0034  0.1459  0.1459
Table 2. Variable selection results for treatment and control groups.

                       α = (0.1, 0.3, 0.8)        α = (0.2, 0.5, 0.9)        α = (0.5, 0.7, 0.9)
  p     Method         S       FNR    FPR         S       FNR    FPR         S       FNR    FPR
  500   Lasso          17.650  0.000  0.018       34.636  0.140  0.054       43.091  0.170  0.073
        SCAD           9.009   0.000  0.000       12.937  0.552  0.018       32.333  0.819  0.063
        Enet           18.271  0.000  0.019       36.085  0.154  0.058       45.158  0.182  0.077
        SPAC-Lasso     9.007   0.000  0.000       9.000   0.000  0.000       9.000   0.000  0.000
        SPAC-SCAD      9.000   0.000  0.000       9.000   0.000  0.000       8.937   0.007  0.000
  1000  Lasso          20.116  0.000  0.011       40.922  0.260  0.035       47.313  0.103  0.040
        SCAD           9.325   0.000  0.000       21.004  0.755  0.019       38.149  0.886  0.037
        Enet           20.841  0.000  0.012       42.743  0.266  0.036       49.456  0.114  0.042
        SPAC-Lasso     9.193   0.000  0.000       9.005   0.000  0.000       9.000   0.000  0.000
        SPAC-SCAD      9.000   0.000  0.000       9.000   0.000  0.000       8.984   0.000  0.000
  2000  Lasso          20.083  0.000  0.006       42.091  0.277  0.018       54.969  0.281  0.024
        SCAD           9.082   0.000  0.000       20.077  0.716  0.009       42.502  0.958  0.021
        Enet           20.748  0.000  0.006       44.270  0.286  0.019       58.556  0.293  0.026
        SPAC-Lasso     9.218   0.000  0.000       9.002   0.000  0.000       9.000   0.000  0.000
        SPAC-SCAD      9.000   0.000  0.000       9.000   0.000  0.000       8.974   0.003  0.000
Table 3. The performance of the variance estimates and confidence intervals.

                       α = (0.1, 0.3, 0.8)        α = (0.2, 0.5, 0.9)        α = (0.5, 0.7, 0.9)
  p     Method         MVE     MCP    MIL         MVE     MCP    MIL         MVE     MCP    MIL
  500   unadj          12.638  0.956  3.133       14.293  0.952  3.543       18.832  0.949  4.669
        Lasso          2.257   0.984  0.560       2.415   0.953  0.599       2.464   0.937  0.611
        SCAD           2.066   0.994  0.512       3.645   0.942  0.904       4.322   0.945  1.071
        Enet           2.179   0.985  0.540       2.430   0.951  0.603       2.511   0.938  0.622
        SPAC-Lasso     2.204   0.988  0.546       2.107   0.988  0.522       2.207   0.986  0.547
        SPAC-SCAD      2.065   0.994  0.512       1.993   0.990  0.494       2.427   0.979  0.602
  1000  unadj          12.217  0.940  3.029       14.500  0.945  3.595       18.441  0.947  4.572
        Lasso          2.297   0.982  0.569       2.433   0.949  0.603       2.283   0.943  0.566
        SCAD           2.092   0.995  0.519       3.801   0.952  0.942       4.157   0.948  1.031
        Enet           2.209   0.982  0.548       2.443   0.950  0.606       2.330   0.944  0.578
        SPAC-Lasso     2.233   0.991  0.554       2.204   0.986  0.546       1.980   0.987  0.491
        SPAC-SCAD      2.094   0.995  0.519       2.076   0.992  0.515       2.243   0.983  0.556
  2000  unadj          12.646  0.961  3.135       14.984  0.953  3.715       20.061  0.954  4.974
        Lasso          2.224   0.982  0.551       2.542   0.942  0.630       2.534   0.918  0.628
        SCAD           2.051   0.987  0.509       3.856   0.941  0.956       4.161   0.949  1.032
        Enet           2.137   0.980  0.530       2.561   0.946  0.635       2.650   0.916  0.657
        SPAC-Lasso     2.171   0.991  0.538       2.147   0.989  0.532       2.165   0.986  0.537
        SPAC-SCAD      2.046   0.989  0.507       2.042   0.993  0.506       2.591   0.973  0.642
Table 4. The performance of different methods for the treatment effect estimation of trastuzumab.

           unadj    Lasso    SCAD     Enet     SPAC-Lasso   SPAC-SCAD
  τ̂        0.2555   0.2491   0.2488   0.2473   0.2454       0.2435
  S        —        16.500   17.000   20.000   15.000       8.500
  σ̂        0.9670   0.8031   0.8519   0.7566   0.7136       0.7317
  L        0.3035   0.2521   0.2674   0.2375   0.2240       0.2296
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
