Learning Optimal Dynamic Treatment Regime from Observational Clinical Data through Reinforcement Learning
Abstract
1. Introduction
2. Dynamic Treatment Regimes
Concept and Notation
- The first assumption, known as consistency, posits that a patient’s observed outcome equals the potential outcome they would have had under the treatment they actually received. In other words, the potential outcome under the administered treatment is the one observed in the data.
- The second assumption, the stable unit treatment value assumption, asserts that an individual’s outcome is unaffected by the treatments administered to other patients.
- The third assumption, sequential exchangeability, states that the treatment assignment at each time point is independent of future potential outcomes given past treatment and the covariate history.
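
In potential-outcome notation, the three assumptions above can be written compactly. The symbols are assumptions introduced here for illustration and are not taken from the original text: stages $t = 1, \dots, T$, treatment $A_t$ at stage $t$, treatment history $\bar{A}_t = (A_1, \dots, A_t)$, covariate history $\bar{X}_t$, available history $H_t = (\bar{X}_t, \bar{A}_{t-1})$, observed outcome $Y_i$ for patient $i$, and potential outcome $Y_i^{*}(\bar{a}_T)$ under the treatment sequence $\bar{a}_T$.

```latex
\begin{align*}
\text{Consistency:}\quad
  & Y_i = Y_i^{*}(\bar{a}_T) \ \text{ whenever } \ \bar{A}_{T,i} = \bar{a}_T, \\
\text{Stable unit treatment value:}\quad
  & Y_i^{*}(\bar{a}_{T,1}, \dots, \bar{a}_{T,n}) = Y_i^{*}(\bar{a}_{T,i}), \\
\text{Sequential exchangeability:}\quad
  & A_t \perp\!\!\!\perp Y^{*}(\bar{a}_T) \mid H_t, \qquad t = 1, \dots, T .
\end{align*}
```

Read this way, the second line says that the potential outcome of patient $i$ depends only on that patient's own treatment sequence, and the third line says that, within levels of the history $H_t$, treatment at stage $t$ behaves as if it were randomly assigned.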
3. Machine Learning Models for DTRs
3.1. Tree-Based Reinforcement Learning
3.2. Stochastic Tree-Based Reinforcement Learning
3.3. Causal Tree-Based Method
4. The Experimental Study: DTRs for Diabetic Kidney Disease
4.1. The Dataset
- Renin-Angiotensin-System-inhibitor (RASi)-only treatment;
- A combination of the Sodium-Glucose Transporter 2 inhibitor (SGLT2i) and the RASi treatment;
- A combination of the Glucagon-Like Peptide 1 receptor agonist (GLP1a) and the RASi treatment;
- A combination of the MineraloCorticoid Receptor Antagonist (MCRa) and the RASi treatment.
4.2. Variable Selection
4.3. Experimental Setup
Measurement Metrics
1. Expected mean and standard deviation of eGFR: calculated from the counterfactual eGFR under the estimated optimal treatment regime selected at a given stage. The mean is the average eGFR we would expect if patients received the estimated optimal treatment regime. A higher expected mean eGFR suggests that the regime is more effective at preserving kidney function; if it is substantially higher than the expected mean eGFR under the current or other regimes, the estimated optimal treatment is better at maintaining eGFR levels.
2. Optimal classification rate (optimality percentage): the percentage of subjects whom the model assigns to their optimal treatment category, as judged against the corresponding ground-truth treatments. It provides a quantitative measure of the model’s accuracy and predictive validity.
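
The two metrics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function name `regime_metrics`, the table of counterfactual eGFR values per treatment, and the toy numbers are all hypothetical, and the sample standard deviation is one possible choice.

```python
from statistics import mean, stdev


def regime_metrics(counterfactual_egfr, estimated_opt, true_opt):
    """Summarise an estimated regime (hypothetical helper, for illustration).

    counterfactual_egfr: one row per subject, one counterfactual eGFR value
        per candidate treatment (assumed to be available, e.g. simulated).
    estimated_opt: treatment index chosen by the model for each subject.
    true_opt: ground-truth optimal treatment index for each subject.
    """
    # eGFR each subject would have under the treatment the model selected
    egfr_under_regime = [row[a] for row, a in zip(counterfactual_egfr, estimated_opt)]
    # Count subjects whose selected treatment matches the ground truth
    n_correct = sum(e == t for e, t in zip(estimated_opt, true_opt))
    return {
        "mean_egfr": mean(egfr_under_regime),
        "sd_egfr": stdev(egfr_under_regime),  # sample standard deviation
        "optimality_pct": 100.0 * n_correct / len(true_opt),
    }


# Toy data: 4 subjects, 3 candidate treatments (made-up values)
toy_egfr = [
    [60.0, 70.0, 65.0],
    [80.0, 75.0, 78.0],
    [50.0, 55.0, 58.0],
    [90.0, 85.0, 88.0],
]
toy_estimated = [1, 0, 2, 1]
toy_true = [1, 0, 2, 0]

result = regime_metrics(toy_egfr, toy_estimated, toy_true)
print(result)
```

In the toy data, three of the four subjects receive their ground-truth optimal treatment, so the optimality percentage is 75, while the expected eGFR is averaged over the values each subject would have under the model's choice.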
4.4. Experimental Results
4.4.1. Single Stage
4.4.2. Two Stage
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Covariates in the PROVALID Study
Variable ACRONYM | Variable DESCRIPTION |
---|---|
GE | Gender |
HEIGHT | Height |
ADMD | Age at DM2 diagnosis |
AHDT | Age at HT diagnosis |
SDMAV | Severity of DM2 at first visit in PROVALID |
DDMAV | DM2 duration at the first visit in PROVALID (first for the patient sequence) |
DDMT | Duration of DM2 pharmacological treatment at first visit in PROVALID (first for the patient sequence) |
HTDAV | HT duration at first visit in PROVALID (first for the patient sequence) |
SHTAV | Severity of HT at visit in PROVALID (+1 for each HT drug) |
DHTT | Duration of HT pharmacological treatment at first visit in PROVALID (first for the patient sequence) |
PHDRB | Personal history of diabetic retinopathy at baseline |
PHRDB | Personal history of renal disease at baseline |
PHHFB | Personal history of heart failure stage III or IV at baseline |
PHCADB | Personal history of coronary artery disease (any angina, myocardial infarction, coronary intervention) at baseline |
PHPADB | Personal history of peripheral artery disease (Claudicatio, amputation, etc) at baseline |
PHCVDB | Personal history of cerebrovascular disease (stroke, TIA, PRIND) |
SMOK | Smoking |
FHRD | Family history of renal disease |
FHHT | Family history of hypertension |
FHDM | Family history of type 2 diabetes |
FHCVD | Family history of cardiovascular disease |
FHM | Family history of malignancy |
BW | Body weight [kg] |
SBP | Systolic BP |
DBP | Diastolic BP |
AGEV | Age at visit |
BMI | Body Mass Index |
MABP | Mean arterial blood pressure |
PP | Pulse pressure |
BG | Blood glucose |
HBA1C | HbA1C |
SCR | Serum creatinine |
TOTCHOL | Serum cholesterol (total) |
LDLCHOL | Serum cholesterol (LDL) |
HDLCHOL | Serum cholesterol (HDL) |
STRIG | Serum triglycerides |
SPOT | Serum potassium |
HB | Hemoglobin |
SALB | Serum albumin |
CRP | CRP |
EGFR | eGFR |
UACR | mean UACR |
LDLHDLR | LDL/HDL cholesterol ratio |
EVLDLCHOL | (new) estimated VLDL based on the Friedewald equation—only for STRIG < 400 |
ELDLCHOL | (new) estimated LDL based on the Friedewald equation—only for STRIG < 400 |
ELDLHDLR | (new) LDL/HDL cholesterol ratio based on data—when available—or estimation |
UCREA | Urinary creatinine |
CA_CL_num | Calcium concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
PHOS_CL_num | Phosphate concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
CST3_num | Cystatin C concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
CPEP_CL_num | C-peptide concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
FFA_CL_num | Free Fatty Acids concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
UA_CL_num | Uric Acid concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
SO_CL_num | Sodium concentration in urine; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
POT_CL_num | Potassium concentration in urine; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
CHL_CL_num | Chloride concentration in urine; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
UNA24H | 24 h urinary sodium excretion |
NIDR | New incidence of diabetic retinopathy (DR) |
NIMI | New incidence of non-fatal myocardial infarction (NFMI) |
NIS | New incidence of non-fatal stroke (NFS) |
NIHF | New incidence of heart failure (stage III/IV) |
NICAD | New incidence of coronary artery disease (CAD) |
NICVD | New incidence of cerebrovascular disease (CD) |
NIPAD | New incidence of peripheral artery disease (PAD) |
AHBB | Beta-receptor blockers |
AHCA | Calcium antagonists |
AHCAAH | Centrally acting antihypertensives |
AHARB | Alpha-receptor blockers |
AHDV | Direct vasodilators |
ADSU | Sulfonylureas |
ADPPI | Meglitinides (glinides) |
ADGL | DPPIV inhibitors or GLP1 analogs |
ADGLIT | Thiazolidinediones (glitazones) |
ADMET | Biguanides (metformin) |
ADAGI | Alpha-Glucosidase inhibitors |
ADI | Insulins |
LLCFA | Clofibric acid derivative |
LLSTAT | Statins |
LLOTHER | Other lipid-lowering drugs (ezetimibe, omega 3 acid) |
APASA | ASA |
APTPD | Thienopyridine derivatives |
APDIP | Dipyridamole |
APGPI | GPIIb/IIIa inhibitors |
APOTHER | Other platelet aggregation inhibitors (ticagrelor) |
VDAC | Alfacalcidol |
VDCCF | Colecalciferol |
EPODA | Darbepoetin alfa |
EPOEA | Epoetin alfa |
EPOEB | Epoetin beta |
IO | Oral iron |
PBCB | Calcium-based |
DLOOP | Loop diuretics |
DTH | Thiazides |
DPS | Potassium-saving diuretics |
AC | Analgesics combinations |
ASC | Single-component analgesics |
TAH | Group “TAH” |
TAD | Group “TAD” |
TADI | TADI |
TLL | Group “TLL” |
TEPO | Group “TEPO” |
TDIU | Group “TDIU” |
MMP7_LUM_num | Matrix Metallopeptidase 7 concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
VEGFA_LUM_num | Vascular Endothelial Growth Factor A concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
AGER_LUM_num | Advanced Glycosylation End-Product Specific Receptor concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
LEP_LUM_num | Leptin concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
ICAM1_LUM_num | Intercellular Adhesion Molecule 1 concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
TNFRSF1A_LUM_num | TNF Receptor Superfamily Member 1A concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
IL18_LUM_num | Interleukin 18 concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
DPP4_LUM_num | Dipeptidyl Peptidase 4 concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
LGALS3_LUM_num | Galectin 3 concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
SERPINE1_LUM_num | Serpin Family E Member 1 concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
ADIPOQ_LUM_num | Adiponectin, C1Q And Collagen Domain Containing concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
EGF_MESO_num_norm | Epidermal growth factor concentration in urine normalized by UCREA; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
FGF21_MESO_num_norm | Fibroblast growth factor 21 concentration in urine normalized by UCREA; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
IL6_MESO_num_norm | Interleukin 6 concentration in urine normalized by UCREA; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
HAVCR1_MESO_num_norm | Hepatitis A virus cellular receptor 1 concentration in urine normalized by UCREA; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
CCL2_MESO_num_norm | C-C motif chemokine ligand 2 concentration in urine normalized by UCREA; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
MMP2_MESO_num_norm | Matrix metallopeptidase 2 concentration in urine normalized by UCREA; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
MMP9_MESO_num_norm | Matrix metallopeptidase 9 concentration in urine normalized by UCREA; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
LCN2_MESO_num_norm | Lipocalin-2 concentration in urine normalized by UCREA; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
NPHS1_MESO_num_norm | NPHS1 adhesion molecule, nephrin concentration in urine normalized by UCREA; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
THBS1_MESO_num_norm | Thrombospondin 1 concentration in urine normalized by UCREA; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
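
The EVLDLCHOL, ELDLCHOL, and ELDLHDLR rows in the table above refer to the Friedewald equation, which estimates VLDL cholesterol as triglycerides divided by 5 and LDL cholesterol as total cholesterol minus HDL minus that VLDL estimate. The sketch below shows one way these derived covariates could be computed. It assumes all lipid values are in mg/dL (the divisor 5 and the 400 threshold apply to that unit); the function names are hypothetical and the study's actual preprocessing code is not shown in the text.

```python
from typing import Optional, Tuple

# The table restricts the estimate to STRIG < 400 (assumed to be mg/dL)
TRIG_LIMIT = 400.0


def friedewald(totchol: float, hdlchol: float, strig: float) -> Optional[Tuple[float, float]]:
    """Return (estimated VLDL, estimated LDL), or None when STRIG >= 400."""
    if strig >= TRIG_LIMIT:
        return None
    evldl = strig / 5.0                 # Friedewald VLDL estimate
    eldl = totchol - hdlchol - evldl    # Friedewald LDL estimate
    return evldl, eldl


def ldl_hdl_ratio(hdlchol: float, totchol: float, strig: float,
                  ldl_measured: Optional[float] = None) -> Optional[float]:
    """LDL/HDL ratio from measured LDL when available, else from the estimate."""
    if ldl_measured is not None:
        return ldl_measured / hdlchol
    estimate = friedewald(totchol, hdlchol, strig)
    if estimate is None:
        return None  # neither a measurement nor a valid estimate is available
    return estimate[1] / hdlchol


# Made-up values: TOTCHOL = 200, HDLCHOL = 50, STRIG = 150
print(friedewald(200.0, 50.0, 150.0))
print(ldl_hdl_ratio(50.0, 200.0, 150.0))
```

Returning `None` above the triglyceride threshold keeps the "only for STRIG < 400" restriction explicit, so downstream code has to handle the missing estimate rather than silently use an invalid one.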
Appendix B. Covariates Selected by Bayesian Network
Variable ACRONYM | Variable DESCRIPTION |
---|---|
GE | Gender |
DLOOP | Loop diuretics |
SCR | Serum creatinine |
CST3_num | Cystatin C concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
PHRDB | Personal history of renal disease at baseline |
EGF_MESO_num_norm | Epidermal growth factor concentration in urine normalized by UCREA; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
FGF21_MESO_num_norm | Fibroblast growth factor 21 concentration in urine normalized by UCREA; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
HB | Hemoglobin |
HDLCHOL | Serum cholesterol (HDL) |
ICAM1_LUM_num | Intercellular Adhesion Molecule 1 concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
LEP_LUM_num | Leptin concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
MMP7_LUM_num | Matrix Metallopeptidase 7 concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
SPOT | Serum potassium |
TNFRSF1A_LUM_num | TNF Receptor Superfamily Member 1A concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
UACR | mean UACR |
CRP | CRP |
LDLCHOL | Serum cholesterol (LDL) |
HBA1C | HbA1C |
BG | Blood glucose |
ADMET | Biguanides (metformin) |
UCREA | Urinary creatinine |
DBP | Diastolic BP |
LGALS3_LUM_num | Galectin 3 concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
ADMD | Age at DM2 diagnosis |
STRIG | Serum triglycerides |
TOTCHOL | Serum cholesterol (total) |
ADIPOQ_LUM_num | Adiponectin, C1Q And Collagen Domain Containing concentration in serum; NA below min and above max replaced by 0.5 ∗ min and 1.5 ∗ max |
AGEV | Age at visit |
BMI | Body Mass Index |
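
Several biomarker rows in the appendix tables note that values reported as NA because they fell below the minimum or above the maximum were replaced by 0.5 ∗ min and 1.5 ∗ max. The sketch below illustrates that rule under explicit assumptions: out-of-range readings are marked by a hypothetical flag (`"below"` or `"above"`), and min and max are taken as the smallest and largest observed values of the variable. The text does not say whether the study used observed extremes or assay detection limits, so this is an illustration rather than the actual preprocessing.

```python
from typing import List, Optional


def impute_out_of_range(values: List[Optional[float]],
                        flags: List[Optional[str]]) -> List[Optional[float]]:
    """Replace flagged out-of-range NAs by 0.5 * min or 1.5 * max.

    values: measurements, with None where no number was reported.
    flags: "below", "above", or None for each entry (hypothetical encoding).
    Entries that are None without a flag are left missing.
    """
    observed = [v for v in values if v is not None]
    lowest, highest = min(observed), max(observed)  # assumed definition of min/max
    imputed = []
    for value, flag in zip(values, flags):
        if value is not None:
            imputed.append(value)            # keep reported measurements as-is
        elif flag == "below":
            imputed.append(0.5 * lowest)     # NA below min
        elif flag == "above":
            imputed.append(1.5 * highest)    # NA above max
        else:
            imputed.append(None)             # genuinely missing, not imputed here
    return imputed


# Made-up example with one value below range, one above, and one truly missing
toy_values = [2.0, None, 8.0, None, None]
toy_flags = [None, "below", None, "above", None]
print(impute_out_of_range(toy_values, toy_flags))
```

Keeping unflagged missing values as `None` separates this range-based substitution from any general missing-data handling, which would need its own method.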
Method | Optimality (Opt) % | Expected eGFR, mean (SD) |
---|---|---|
Q-learning with Random Forest (Q-RF) | 51 | 60.6 (18.5) |
DTR-Causal Tree (DTR-CT) | 76.5 | 64 (16.4) |
DTR-Causal Forest (DTR-CF) | 78 | 65.8 (16.04) |
Stochastic tree-based reinforcement learning (SL-RL) | 73 | 63.8 (16.1) |
Tree-based reinforcement learning (T-RL) | 61 | 61.3 (17.7) |
Algorithm | Optimality (Opt) % | Expected eGFR, mean (SD) |
---|---|---|
Q-learning with Random Forest (Q-RF) | 57 | 68.34 (16.5) |
DTR-Causal Tree (DTR-CT) | 82.3 | 74 (14.2) |
DTR-Causal Forest (DTR-CF) | 85 | 78.03 (13.9) |
Stochastic tree-based reinforcement learning (SL-RL) | 73 | 72.43 (15.81) |
Tree-based reinforcement learning (T-RL) | 71.5 | 69.7 (16.2) |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abebe, S.; Poli, I.; Jones, R.D.; Slanzi, D. Learning Optimal Dynamic Treatment Regime from Observational Clinical Data through Reinforcement Learning. Mach. Learn. Knowl. Extr. 2024, 6, 1798-1817. https://doi.org/10.3390/make6030088