Comparing Meta-Learners for Estimating Heterogeneous Treatment Effects and Conducting Sensitivity Analyses
Abstract
1. Introduction
2. Framework
2.1. Background
2.2. Sensitivity Analysis Method
3. Estimating CATE
3.1. Causal Forest
3.2. S-Learner
3.3. T-Learner
3.4. X-Learner
3.5. DR-Learner
3.6. R-Learner
3.7. Cross-Fitting
4. Simulation Study
4.1. Simulation Study of HTE
4.1.1. Performance Measures
4.1.2. Simulation Design
4.1.3. Simulation Results Analysis
5. Application to Real Data
5.1. Data Description
5.2. Inference on HTE
5.3. Sensitivity Analysis
6. Discussion
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References






| | | |
|---|---|---|
| Estimate | 0.818 | |
| Confidence Interval | | |
| p-value | 0.284 | 0.000 |

| Covariate | Mean in Subgroup 1 (CI) | Mean in Subgroup 5 (CI) | Difference (CI) | Hedges' g |
|---|---|---|---|---|
| Expect to go to college | 0.69 (0.58, 0.80) | 0.88 (0.80, 0.96) | −0.19 (−0.33, −0.06) | −0.50 |
| Often absent | 0.21 (0.11, 0.30) | 0.03 (−0.01, 0.07) | 0.18 (0.06, 0.28) | 0.53 |
| Parents went to college | 0.37 (0.25, 0.48) | 0.76 (0.65, 0.86) | −0.40 (−0.56, −0.25) | −0.89 |
| Number of home educational resources | 10.08 (9.71, 10.41) | 11.71 (11.39, 12.02) | −1.68 (−2.15, −1.20) | −1.18 |
| Sense of school belonging | 9.39 (8.94, 9.84) | 10.07 (9.67, 10.47) | −0.71 (−1.34, −0.09) | −0.39 |
| Confidence in science | 10.50 (10.02, 10.99) | 10.98 (10.53, 11.42) | −0.43 (−1.05, 0.21) | −0.23 |
| Has study desk | 0.78 (0.68, 0.88) | 0.91 (0.84, 0.98) | −0.15 (−0.27, −0.02) | −0.40 |
| Home TV has premium TV channels | 0.85 (0.76, 0.93) | 0.93 (0.86, 0.99) | −0.06 (−0.18, 0.04) | −0.21 |
| Using the internet to access assignments | 0.76 (0.65, 0.86) | 0.94 (0.88, 1.00) | −0.18 (−0.30, −0.07) | −0.54 |
| Number of years teaching | 13.22 (11.38, 15.10) | 17.76 (14.93, 20.48) | −4.51 (−7.81, −0.96) | −0.43 |
| Teacher job satisfaction | 9.60 (9.12, 10.09) | 10.66 (10.28, 11.05) | −1.08 (−1.69, −0.49) | −0.60 |
| Safe and orderly school | 9.05 (8.58, 9.51) | 11.51 (11.08, 11.94) | −2.25 (−2.89, −1.61) | −1.21 |
| Teaching unaffected by unprepared students | 9.60 (9.18, 10.08) | 10.89 (10.39, 11.41) | −1.23 (−1.93, −0.51) | −0.59 |
| Teacher emphasis on academic success | 9.78 (9.31, 10.29) | 11.12 (10.56, 11.64) | −1.33 (−2.07, −0.58) | −0.60 |
| Type of degree | 0.59 (0.47, 0.71) | 0.80 (0.71, 0.90) | −0.22 (−0.37, −0.07) | −0.49 |
| School discipline | 9.93 (9.62, 10.25) | 10.89 (10.64, 11.14) | −0.97 (−1.39, −0.57) | −0.79 |
| School emphasis on academic success | 9.65 (9.16, 10.15) | 10.95 (10.44, 11.42) | −1.33 (−1.99, −0.65) | −0.66 |
| Science resources | 10.57 (10.17, 10.98) | 11.17 (10.76, 11.56) | −0.56 (−1.14, 0.02) | −0.32 |
| Socioeconomic background of the school | 0.06 (0.00, 0.12) | 0.49 (0.37, 0.61) | −0.44 (−0.57, −0.31) | −1.10 |
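
The standardized effect size in the last column of the table above can be illustrated with a minimal sketch. It assumes the common small-sample-corrected definition of Hedges' g (pooled-standard-deviation Cohen's d multiplied by the factor 1 − 3/(4(n₁ + n₂) − 9)); whether the authors used exactly this variant is an assumption. The function name `hedges_g` and the two toy samples are hypothetical and are not data from the study.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Small-sample-corrected standardized mean difference (group_a minus group_b)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    # Cohen's d: raw mean difference in pooled-standard-deviation units.
    d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    # Approximate bias-correction factor that shrinks d for small samples.
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return correction * d


# Hypothetical toy samples standing in for one covariate in two subgroups.
subgroup_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
subgroup_5 = [2.0, 3.0, 4.0, 5.0, 6.0]
g = hedges_g(subgroup_1, subgroup_5)
print(round(g, 3))
```

With this sign convention, g is negative whenever the first group's mean is lower than the second's, which matches the sign pattern of the Difference and Hedges' g columns in the table.
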

| | R0 = 0.97 | R0 = 0.98 | R0 = 0.99 | R0 = 1.00 | R0 = 1.01 | R0 = 1.02 | R0 = 1.03 |
|---|---|---|---|---|---|---|---|
| 0.97 | 15.78 (3.24, 28.33) | 12.06 (−1.02, 25.15) | 8.41 (−4.46, 21.28) | 3.80 (−9.17, 16.77) | −0.18 (−13.34, 12.97) | −5.11 (−17.97, 7.76) | −8.05 (−20.73, 4.63) |
| 0.98 | 14.67 (1.67, 27.67) | 11.76 (−1.31, 24.83) | 6.58 (−6.83, 20.00) | 2.31 (−10.52, 15.14) | −1.05 (−14.45, 12.35) | −5.31 (−18.28, 7.65) | −9.91 (−21.78, 1.95) |
| 0.99 | 13.01 (−0.17, 26.19) | 9.54 (−3.46, 22.55) | 6.15 (−6.90, 19.19) | 0.80 (−12.94, 14.55) | −2.59 (−15.44, 10.26) | −7.20 (−20.90, 6.49) | −10.82 (−23.89, 2.26) |
| 1.00 | 11.90 (−0.34, 24.14) | 7.61 (−5.40, 20.62) | 4.57 (−8.37, 17.52) | 0.47 (−12.41, 13.36) | −3.42 (−15.90, 9.07) | −8.92 (−21.73, 3.89) | −12.95 (−26.19, 0.30) |
| 1.01 | 11.55 (−1.70, 24.80) | 7.46 (−5.06, 19.98) | 2.61 (−10.34, 15.56) | −1.68 (−15.50, 12.14) | −4.82 (−17.46, 7.82) | −10.58 (−23.58, 2.41) | −12.65 (−26.94, 1.64) |
| 1.02 | 10.03 (−2.99, 23.06) | 5.74 (−7.01, 18.50) | 1.70 (−11.23, 14.64) | −1.58 (−15.76, 12.60) | −6.97 (−20.08, 6.15) | −9.38 (−22.86, 4.11) | −13.69 (−27.10, −0.28) |
| 1.03 | 9.66 (−3.16, 22.48) | 4.90 (−8.25, 18.04) | 1.13 (−11.44, 13.71) | −2.80 (−16.51, 10.90) | −6.99 (−20.96, 6.97) | −12.67 (−26.06, 0.71) | −14.79 (−28.28, −1.30) |
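
One way to read the sensitivity grid above is to ask which cells have an interval that excludes zero. The sketch below shows that check on a handful of cells copied verbatim from the grid, keyed by (row value, R0 column value). The dictionary name `cells` and the helper `excludes_zero` are hypothetical, and the name of the row parameter is not recoverable from this extract, so it is left unlabeled.

```python
# Selected cells copied from the grid above:
# (row value, R0 column value) -> (estimate, ci_lower, ci_upper).
cells = {
    (0.97, 0.97): (15.78, 3.24, 28.33),
    (0.98, 0.97): (14.67, 1.67, 27.67),
    (1.00, 1.00): (0.47, -12.41, 13.36),
    (1.02, 1.03): (-13.69, -27.10, -0.28),
    (1.03, 1.03): (-14.79, -28.28, -1.30),
}


def excludes_zero(ci_lower, ci_upper):
    """True when the interval lies entirely above or entirely below zero."""
    return ci_lower > 0 or ci_upper < 0


# Keys of the cells whose interval does not cover zero, in sorted order.
flagged = sorted(
    key for key, (_, lower, upper) in cells.items() if excludes_zero(lower, upper)
)
print(flagged)
```

Applied to every cell of the grid as extracted here, the same check flags only the four corner-region cells included above; all other intervals cover zero.
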
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, J.; Jin, Y.; Wang, X. Comparing Meta-Learners for Estimating Heterogeneous Treatment Effects and Conducting Sensitivity Analyses. Math. Comput. Appl. 2025, 30, 139. https://doi.org/10.3390/mca30060139

