Model Averaging for Heterogeneous Treatment Effects via Proximity Matching
Abstract
1. Introduction
- We introduce a novel HTE estimation framework that combines proximity matching with model averaging, addressing both model uncertainty and matching-induced bias.
- The resulting estimator, constructed under the combo matching framework, is shown to achieve asymptotic optimality through a derived optimal weighting criterion.
- We validate the proposed method through simulation studies, demonstrating superior performance over existing approaches.
2. Methodology
2.1. Model Framework and Assumptions
- (i)
- for ;
- (ii)
- ;
- (iii)
- The error terms and are independent for .
2.2. Candidate Models and Model Averaging Formulation
3. Weight Choice and Asymptotic Optimality
3.1. Proximity Matching and Pseudo-Observations
- representing the bias due to imperfect covariate matching;
- representing the composite noise from both units.
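The decomposition above can be illustrated with a minimal sketch. It assumes plain nearest-neighbour matching with replacement under Euclidean distance, which is only a stand-in for the combo matching used in the paper. The function name `match_pseudo_observations` and the simulated CATE and baseline functions are hypothetical. Each pseudo-observation is the outcome difference within a matched pair, so it carries the treatment effect plus a bias term that grows with the covariate gap and a noise term contributed by both units.

```python
import numpy as np

def match_pseudo_observations(x_treated, y_treated, x_control, y_control):
    """Pair each treated unit with its nearest control unit (Euclidean
    distance, with replacement) and return the matched outcome differences."""
    # Pairwise squared distances, shape (n_treated, n_control).
    diffs = x_treated[:, None, :] - x_control[None, :, :]
    dist2 = np.sum(diffs ** 2, axis=2)
    nearest = np.argmin(dist2, axis=1)
    # Pseudo-observation: treated outcome minus matched control outcome.
    pseudo = y_treated - y_control[nearest]
    # Covariate gap within each pair; a larger gap means more matching bias.
    gap = np.sqrt(dist2[np.arange(len(x_treated)), nearest])
    return pseudo, nearest, gap

# Simulated data (all functional forms here are illustrative assumptions).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, size=(n, 2))
treated = rng.random(n) < 0.5
tau = 1.0 + x[:, 0]   # hypothetical CATE
mu0 = x[:, 1]         # hypothetical baseline outcome
y = mu0 + treated * tau + 0.1 * rng.standard_normal(n)

pseudo, nearest, gap = match_pseudo_observations(
    x[treated], y[treated], x[~treated], y[~treated]
)
print(pseudo[:5], gap.mean())
```

Matching with replacement keeps the sketch short. A one-to-one or caliper-based scheme would change which pairs are formed but not the structure of the pseudo-observations.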
3.2. Weight Choice Criterion
3.3. Asymptotic Optimality
4. Numerical Experiments
4.1. Simulation Design
4.2. Results
- Nonlinear treatment effects: PM-OPT is most effective when treatment effects are strongly heterogeneous or nonlinear. In the simulation results, it outperforms AIC- and BIC-based selection in these settings.
- Model uncertainty: When the correct model specification or the set of relevant variables is unknown, PM-OPT gives more robust estimates than classical criteria, which may underfit or overfit the data.
- Moderate sample sizes: With moderate sample sizes, PM-OPT is more accurate and more stable than traditional model selection methods.
- Observational studies with covariate imbalance: PM-OPT is designed to handle covariate imbalance, which is common in observational studies, and is therefore better suited to such data than traditional model selection methods.
5. An Empirical Example
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1. Proof of Theorem 1
Appendix A.2. Proof of Theorem 2
Appendix B
Appendix B.1. Abbreviations
- HTE—Heterogeneous Treatment Effect
- CATE—Conditional Average Treatment Effect
- ATE—Average Treatment Effect
- PSM—Propensity Score Matching
- BIC—Bayesian Information Criterion
- AIC—Akaike Information Criterion
- JMA—Jackknife Model Averaging
- MMA—Mallows Model Averaging
- PM-OPT—Proximity Matching-based Optimal Averaging
- TECV—Treatment Effect Cross-Validation
- TEEM—Treatment Effect Estimation by Mixing
- SNR—Signal-to-Noise Ratio
- SUTVA—Stable Unit Treatment Value Assumption
Appendix B.2. Summary of Key Notation
Symbol | Description |
---|---|
K | Number of candidate models |
L | Number of matched pseudo-observation pairs (via combo matching) |
Matched covariates in treated and control groups | |
Matched outcomes in treated and control groups | |
Predicted outcomes under model for treated/control units | |
Projection matrices from full sample design matrices | |
Projection matrices from matched sample design matrices | |
Zero-padded projection matrices (aligned to length L) | |
Weighted projection matrices: , . | |
Model-averaged predicted outcomes for treated/control units | |
Final model-averaged CATE estimator: | |
Weight space: the unit simplex in | |
Bias due to covariate imbalance: | |
Composite noise term: |
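To make the notation concrete, the following sketch shows how weights restricted to the unit simplex combine K candidate CATE predictions evaluated at L matched pairs. It is an illustration under stated assumptions, not the paper's criterion: the weights are chosen by plain least squares against the pseudo-observations, whereas the weight choice criterion of Section 3.2 is not reproduced in this text. The names `project_to_simplex`, `simplex_weights`, `cand`, and `target` are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def simplex_weights(cand, target, n_iter=5000):
    """Minimise ||target - cand @ w||^2 over the unit simplex by projected
    gradient descent. cand has shape (L, K): one column per candidate model
    (assumed not identically zero)."""
    K = cand.shape[1]
    w = np.full(K, 1.0 / K)
    # Step size 1 / Lipschitz constant of the gradient.
    lipschitz = 2.0 * np.linalg.norm(cand, 2) ** 2
    for _ in range(n_iter):
        grad = 2.0 * cand.T @ (cand @ w - target)
        w = project_to_simplex(w - grad / lipschitz)
    return w

# Illustrative data: three candidate CATE predictions at L = 50 matched pairs.
rng = np.random.default_rng(1)
t = np.linspace(-1.0, 1.0, 50)
cand = np.column_stack([1.0 + t, np.ones_like(t), 1.0 + t ** 2])
target = 1.0 + t + 0.1 * rng.standard_normal(t.size)  # noisy pseudo-observations

w = simplex_weights(cand, target)
cate_hat = cand @ w  # model-averaged CATE estimate at the matched pairs
print(w)
```

Projected gradient descent is used only to keep the example dependency-free. Any quadratic programming solver that enforces non-negativity and the sum-to-one constraint would serve the same purpose.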
Appendix B.3. The Algorithm of PM-OPT
Design | CATE Form | Propensity e(X) | Grid |
---|---|---|---|
A | Linear | ||
B | Linear | same | |
C | Nonlinear | same | |
D | Nonlinear | same |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, Z.; Zhao, L.; Wang, Y. Model Averaging for Heterogeneous Treatment Effects via Proximity Matching. Symmetry 2025, 17, 1304. https://doi.org/10.3390/sym17081304