High-Dimensional Regression Adjustment Estimation for Average Treatment Effect with Highly Correlated Covariates
Abstract
1. Introduction
2. Methodology and Theoretical Property
2.1. SPAC Adjustment Method for ATE
2.2. Regularity Conditions and Theoretical Property
- (C1) and as , and .
- (C2) For , there is a fixed constant such that , and .
- (C3) The eigenvalues of the sample covariance matrix are bounded away from zero and infinity.
- (C4) There exists a constant such that , .
- (C5) Let be the maximum covariance between the error terms and the covariates. Assume that and .
- (C6) Let . There exist constants and , such that
- (C7) Let . For some constants , , and , the regularization parameters of the SPAC-Lasso satisfy that
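To make the adjustment idea concrete before the simulation results, the following sketch illustrates a generic high-dimensional regression-adjusted ATE estimator in the potential-outcomes framework. It uses an ordinary Lasso outcome model rather than the SPAC adjustment itself, and the data-generating design and function names are illustrative placeholders, not the paper's setting.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Minimal sketch of a generic Lasso-adjusted ATE estimator in a randomized
# experiment.  This is NOT the paper's SPAC adjustment; it only illustrates
# the regression-adjustment idea: fit an outcome model within each arm and
# average the imputed potential-outcome difference over all units.

def lasso_adjusted_ate(X, y, t):
    """X: (n, p) covariates, y: (n,) outcomes, t: (n,) binary treatment."""
    fit1 = LassoCV(cv=5).fit(X[t == 1], y[t == 1])  # treated-arm model
    fit0 = LassoCV(cv=5).fit(X[t == 0], y[t == 0])  # control-arm model
    return np.mean(fit1.predict(X) - fit0.predict(X))

# Toy usage with a placeholder design (not the paper's simulation setting).
rng = np.random.default_rng(0)
n, p, tau = 200, 500, 2.0
X = rng.standard_normal((n, p))
t = rng.binomial(1, 0.5, n)
y = X[:, :5].sum(axis=1) + tau * t + rng.standard_normal(n)
print(lasso_adjusted_ate(X, y, t))  # should be close to tau
```

The SPAC-based estimators compared in Section 3 replace the plain Lasso/SCAD outcome models with their SPAC-penalized counterparts, which are intended to retain selection consistency when the covariates are highly correlated.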
3. Simulation Studies
4. A Real Data Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Some Lemmas and Their Proofs
References
| p | Methods | \|Bias\| | SD | RMSE | \|Bias\| | SD | RMSE | \|Bias\| | SD | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
|  | unadj | 0.0040 | 0.7855 | 0.7855 | 0.0058 | 0.8778 | 0.8778 | 0.0035 | 1.1844 | 1.1844 |
|  | Lasso | 0.0021 | 0.1155 | 0.1155 | 0.0085 | 0.1512 | 0.1515 | 0.0028 | 0.1643 | 0.1643 |
|  | SCAD | 0.0010 | 0.0948 | 0.0948 | 0.0159 | 0.2358 | 0.2364 | 0.0019 | 0.2839 | 0.2839 |
|  | Enet | 0.0023 | 0.1098 | 0.1098 | 0.0086 | 0.1533 | 0.1535 | 0.0032 | 0.1645 | 0.1646 |
|  | SPAC-Lasso | 0.0014 | 0.1068 | 0.1068 | 0.0020 | 0.1050 | 0.1050 | 0.0008 | 0.1099 | 0.1099 |
|  | SPAC-SCAD | 0.0011 | 0.0946 | 0.0946 | 0.0016 | 0.0977 | 0.0978 | 0.0013 | 0.1284 | 0.1284 |
|  | unadj | 0.0052 | 0.7796 | 0.7796 | 0.0162 | 0.9152 | 0.9153 | 0.0429 | 1.1905 | 1.1913 |
|  | Lasso | 0.0009 | 0.1199 | 0.1199 | 0.0027 | 0.1515 | 0.1515 | 0.0093 | 0.1494 | 0.1497 |
|  | SCAD | 0.0003 | 0.0918 | 0.0918 | 0.0043 | 0.2403 | 0.2404 | 0.0132 | 0.2665 | 0.2669 |
|  | Enet | 0.0013 | 0.1136 | 0.1136 | 0.0025 | 0.1524 | 0.1524 | 0.0087 | 0.1523 | 0.1526 |
|  | SPAC-Lasso | 0.0001 | 0.1051 | 0.1051 | 0.0007 | 0.1075 | 0.1075 | 0.0036 | 0.0955 | 0.0956 |
|  | SPAC-SCAD | 0.0002 | 0.0916 | 0.0916 | 0.0000 | 0.0971 | 0.0971 | 0.0044 | 0.1157 | 0.1158 |
|  | unadj | 0.0104 | 0.7765 | 0.7766 | 0.0556 | 0.9237 | 0.9254 | 0.0696 | 1.2492 | 1.2512 |
|  | Lasso | 0.0013 | 0.1200 | 0.1201 | 0.0010 | 0.1658 | 0.1658 | 0.0092 | 0.1801 | 0.1804 |
|  | SCAD | 0.0004 | 0.0997 | 0.0997 | 0.0037 | 0.2521 | 0.2521 | 0.0134 | 0.2620 | 0.2623 |
|  | Enet | 0.0020 | 0.1158 | 0.1158 | 0.0027 | 0.1659 | 0.1659 | 0.0063 | 0.1879 | 0.1880 |
|  | SPAC-Lasso | 0.0005 | 0.1074 | 0.1074 | 0.0004 | 0.1045 | 0.1045 | 0.0040 | 0.1104 | 0.1105 |
|  | SPAC-SCAD | 0.0002 | 0.0991 | 0.0991 | 0.0013 | 0.0969 | 0.0969 | 0.0034 | 0.1459 | 0.1459 |
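The columns above are the usual Monte Carlo summaries of estimation accuracy. A minimal sketch of how |Bias|, SD, and RMSE would typically be computed from repeated ATE estimates is given below; the replication scheme and the true ATE value are placeholders, not taken from the paper.

```python
import numpy as np

# Sketch of the Monte Carlo summaries |Bias|, SD, and RMSE for an ATE
# estimator; the replication scheme and the true ATE below are placeholders.

def summarize_ate(estimates, true_ate):
    estimates = np.asarray(estimates)
    bias = abs(estimates.mean() - true_ate)                # |Bias|
    sd = estimates.std(ddof=1)                             # SD over replications
    rmse = np.sqrt(np.mean((estimates - true_ate) ** 2))   # RMSE
    return bias, sd, rmse

# Example: 500 hypothetical replications of an estimator with true ATE = 2.
est = 2.0 + 0.1 * np.random.default_rng(1).standard_normal(500)
print(summarize_ate(est, true_ate=2.0))
```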
| p | Methods | S | FNR | FPR | S | FNR | FPR | S | FNR | FPR |
|---|---|---|---|---|---|---|---|---|---|---|
|  | Lasso | 17.650 | 0.000 | 0.018 | 34.636 | 0.140 | 0.054 | 43.091 | 0.170 | 0.073 |
|  | SCAD | 9.009 | 0.000 | 0.000 | 12.937 | 0.552 | 0.018 | 32.333 | 0.819 | 0.063 |
|  | Enet | 18.271 | 0.000 | 0.019 | 36.085 | 0.154 | 0.058 | 45.158 | 0.182 | 0.077 |
|  | SPAC-Lasso | 9.007 | 0.000 | 0.000 | 9.000 | 0.000 | 0.000 | 9.000 | 0.000 | 0.000 |
|  | SPAC-SCAD | 9.000 | 0.000 | 0.000 | 9.000 | 0.000 | 0.000 | 8.937 | 0.007 | 0.000 |
|  | Lasso | 20.116 | 0.000 | 0.011 | 40.922 | 0.260 | 0.035 | 47.313 | 0.103 | 0.040 |
|  | SCAD | 9.325 | 0.000 | 0.000 | 21.004 | 0.755 | 0.019 | 38.149 | 0.886 | 0.037 |
|  | Enet | 20.841 | 0.000 | 0.012 | 42.743 | 0.266 | 0.036 | 49.456 | 0.114 | 0.042 |
|  | SPAC-Lasso | 9.193 | 0.000 | 0.000 | 9.005 | 0.000 | 0.000 | 9.000 | 0.000 | 0.000 |
|  | SPAC-SCAD | 9.000 | 0.000 | 0.000 | 9.000 | 0.000 | 0.000 | 8.984 | 0.000 | 0.000 |
|  | Lasso | 20.083 | 0.000 | 0.006 | 42.091 | 0.277 | 0.018 | 54.969 | 0.281 | 0.024 |
|  | SCAD | 9.082 | 0.000 | 0.000 | 20.077 | 0.716 | 0.009 | 42.502 | 0.958 | 0.021 |
|  | Enet | 20.748 | 0.000 | 0.006 | 44.270 | 0.286 | 0.019 | 58.556 | 0.293 | 0.026 |
|  | SPAC-Lasso | 9.218 | 0.000 | 0.000 | 9.002 | 0.000 | 0.000 | 9.000 | 0.000 | 0.000 |
|  | SPAC-SCAD | 9.000 | 0.000 | 0.000 | 9.000 | 0.000 | 0.000 | 8.974 | 0.003 | 0.000 |
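Here S, FNR, and FPR summarize variable-selection performance. The sketch below computes them under the usual definitions (number of selected covariates, false negative rate over truly active covariates, false positive rate over inactive covariates); these definitions are assumed rather than quoted from the paper.

```python
# Sketch of the selection summaries, under the usual (assumed) definitions:
# S = number of selected covariates, FNR = fraction of truly active covariates
# missed, FPR = fraction of inactive covariates wrongly selected.

def selection_metrics(selected, active, p):
    """selected, active: iterables of covariate indices; p: total covariates."""
    selected, active = set(selected), set(active)
    S = len(selected)
    fnr = len(active - selected) / len(active)
    fpr = len(selected - active) / (p - len(active))
    return S, fnr, fpr

# Toy example: 9 active covariates out of p = 500, two missed, three spurious.
print(selection_metrics(list(range(2, 9)) + [100, 250, 400], range(9), p=500))
```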
| p | Methods | MVE | MCP | MIL | MVE | MCP | MIL | MVE | MCP | MIL |
|---|---|---|---|---|---|---|---|---|---|---|
|  | unadj | 12.638 | 0.956 | 3.133 | 14.293 | 0.952 | 3.543 | 18.832 | 0.949 | 4.669 |
|  | Lasso | 2.257 | 0.984 | 0.560 | 2.415 | 0.953 | 0.599 | 2.464 | 0.937 | 0.611 |
|  | SCAD | 2.066 | 0.994 | 0.512 | 3.645 | 0.942 | 0.904 | 4.322 | 0.945 | 1.071 |
|  | Enet | 2.179 | 0.985 | 0.540 | 2.430 | 0.951 | 0.603 | 2.511 | 0.938 | 0.622 |
|  | SPAC-Lasso | 2.204 | 0.988 | 0.546 | 2.107 | 0.988 | 0.522 | 2.207 | 0.986 | 0.547 |
|  | SPAC-SCAD | 2.065 | 0.994 | 0.512 | 1.993 | 0.990 | 0.494 | 2.427 | 0.979 | 0.602 |
|  | unadj | 12.217 | 0.940 | 3.029 | 14.500 | 0.945 | 3.595 | 18.441 | 0.947 | 4.572 |
|  | Lasso | 2.297 | 0.982 | 0.569 | 2.433 | 0.949 | 0.603 | 2.283 | 0.943 | 0.566 |
|  | SCAD | 2.092 | 0.995 | 0.519 | 3.801 | 0.952 | 0.942 | 4.157 | 0.948 | 1.031 |
|  | Enet | 2.209 | 0.982 | 0.548 | 2.443 | 0.950 | 0.606 | 2.330 | 0.944 | 0.578 |
|  | SPAC-Lasso | 2.233 | 0.991 | 0.554 | 2.204 | 0.986 | 0.546 | 1.980 | 0.987 | 0.491 |
|  | SPAC-SCAD | 2.094 | 0.995 | 0.519 | 2.076 | 0.992 | 0.515 | 2.243 | 0.983 | 0.556 |
|  | unadj | 12.646 | 0.961 | 3.135 | 14.984 | 0.953 | 3.715 | 20.061 | 0.954 | 4.974 |
|  | Lasso | 2.224 | 0.982 | 0.551 | 2.542 | 0.942 | 0.630 | 2.534 | 0.918 | 0.628 |
|  | SCAD | 2.051 | 0.987 | 0.509 | 3.856 | 0.941 | 0.956 | 4.161 | 0.949 | 1.032 |
|  | Enet | 2.137 | 0.980 | 0.530 | 2.561 | 0.946 | 0.635 | 2.650 | 0.916 | 0.657 |
|  | SPAC-Lasso | 2.171 | 0.991 | 0.538 | 2.147 | 0.989 | 0.532 | 2.165 | 0.986 | 0.537 |
|  | SPAC-SCAD | 2.046 | 0.989 | 0.507 | 2.042 | 0.993 | 0.506 | 2.591 | 0.973 | 0.642 |
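The interval-based summaries can be reproduced along the following lines, assuming that MVE, MCP, and MIL denote the mean of the variance estimates, the empirical coverage probability, and the mean length of nominal 95% confidence intervals over replications. These expansions of the column labels are assumptions, as is the normal-approximation interval used in the sketch.

```python
import numpy as np
from scipy.stats import norm

# Sketch of interval-based Monte Carlo summaries, assuming (not quoting the
# paper) that MVE is the mean of the variance estimates, MCP the empirical
# coverage probability, and MIL the mean length of nominal 95% intervals
# built from a normal approximation.

def interval_metrics(estimates, var_estimates, true_ate, level=0.95):
    est = np.asarray(estimates)
    se = np.sqrt(np.asarray(var_estimates))
    z = norm.ppf(0.5 + level / 2)
    lower, upper = est - z * se, est + z * se
    mve = float(np.mean(var_estimates))
    mcp = float(np.mean((lower <= true_ate) & (true_ate <= upper)))
    mil = float(np.mean(upper - lower))
    return mve, mcp, mil

# Toy example with 500 hypothetical replications.
rng = np.random.default_rng(2)
est = 2.0 + 0.1 * rng.standard_normal(500)
print(interval_metrics(est, np.full(500, 0.01), true_ate=2.0))
```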
|  | unadj | Lasso | SCAD | Enet | SPAC-Lasso | SPAC-SCAD |
|---|---|---|---|---|---|---|
|  | 0.2555 | 0.2491 | 0.2488 | 0.2473 | 0.2454 | 0.2435 |
| S | − | 16.500 | 17.000 | 20.000 | 15.000 | 8.500 |
|  | 0.9670 | 0.8031 | 0.8519 | 0.7566 | 0.7136 | 0.7317 |
| L | 0.3035 | 0.2521 | 0.2674 | 0.2375 | 0.2240 | 0.2296 |