Predicting 30-Day Readmission Risks in Breast Cancer Patients: An Explainable Machine Learning Approach
Abstract
1. Introduction
- (i) To identify key risk factors associated with 30-day readmissions of breast cancer patients.
- (ii) To develop an ML model to predict 30-day readmissions among individuals diagnosed with breast cancer.
- (iii) To explore the use of explainable artificial intelligence (XAI) methods to interpret and explain the predictions generated by the models.
2. Related Work
3. Materials and Methods
3.1. Data Source
3.2. Data Preprocessing
3.3. Feature Engineering and Selection
3.4. Machine Learning Algorithms
3.5. Experimental Setup
3.6. External Validation
3.7. Model Evaluation Metrics
3.7.1. Accuracy
3.7.2. Precision
3.7.3. Recall
3.7.4. Specificity
3.7.5. F1-Score
3.7.6. AUC-ROC
3.7.7. PR-AUC
3.7.8. Baseline Clinical Comparator
3.8. Explainable Artificial Intelligence
4. Ethical Considerations
5. Results
5.1. Descriptive Statistics
5.2. Model Performance Using Repeated Stratified Cross-Validation
5.2.1. Performance Without SMOTE (Baseline Evaluation)
5.2.2. Performance with SMOTE (Fold-Level Resampling)
5.2.3. Confusion Matrix Interpretation
5.3. Comparative Analysis with Existing Literature
5.4. Explainability Results
5.4.1. SHAP Insights
5.4.2. Partial Dependence Plots (PDP)
5.4.3. LIME Explanations
6. Discussion
6.1. Implications for Clinical Practice and Health Policy
6.2. Limitations and Future Directions
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| AUC | Area under the curve |
| DT | Decision tree |
| ECOG | Eastern Cooperative Oncology Group |
| KNN | K-nearest neighbours |
| LIME | Local interpretable model-agnostic explanations |
| LR | Logistic regression |
| ML | Machine learning |
| NB | Naive Bayes |
| RF | Random forest |
| ROC | Receiver operating characteristic |
| SD | Standard deviation |
| SHAP | SHapley additive exPlanations |
| SVM | Support vector machine |
| WHO | World Health Organization |
| XGB | Extreme gradient boosting |
| USA | United States of America |
| CA | Canada |
References
1. World Health Organization. Breast Cancer. 2024. Available online: https://www.who.int/news-room/fact-sheets/detail/breast-cancer (accessed on 1 January 2025).
2. Stabellini, N.; Nazha, A.; Agrawal, N.; Huhn, M.; Shanahan, J.; Hamerschlak, N.; Waite, K.; Barnholtz-Sloan, J.S.; Montero, A.J. Thirty-Day Unplanned Hospital Readmissions in Patients With Cancer and the Impact of Social Determinants of Health: A Machine Learning Approach. JCO Clin. Cancer Inform. 2023, 7, e2200143.
3. Daly, B.; Olopade, O.I. A perfect storm: How tumor biology, genomics, and health care delivery patterns collide to create a racial survival disparity in breast cancer and proposed interventions for change. CA Cancer J. Clin. 2015, 65, 221–238.
4. Rahman, M.A.; Khan, M.S.H.; Watanobe, Y.; Prioty, J.T.; Annita, T.T.; Rahman, S.; Hossain, M.S.; Aitijjo, S.A.; Taskin, R.I.; Dhrubo, V.; et al. Advancements in Breast Cancer Detection: A Review of Global Trends, Risk Factors, Imaging Modalities, Machine Learning, and Deep Learning Approaches. BioMedInformatics 2025, 5, 46.
5. Jencks, S.F.; Williams, M.V.; Coleman, E.A. Rehospitalizations among patients in the Medicare fee-for-service program. N. Engl. J. Med. 2009, 360, 1418–1428.
6. Centers for Medicare & Medicaid Services. Hospital Readmissions Reduction Program. 2025. Available online: https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp (accessed on 12 January 2025).
7. Pal Choudhury, P.; Wilcox, A.N.; Brook, M.N.; Zhang, Y.; Ahearn, T.; Orr, N.; Coulson, P.; Schoemaker, M.J.; Jones, M.E.; Gail, M.H.; et al. Comparative validation of breast cancer risk prediction models and projections for future risk stratification. JNCI J. Natl. Cancer Inst. 2020, 112, 278–285.
8. Riba, L.A.; Gruner, R.A.; Fleishman, A.; James, T.A. Surgical Risk Factors for the Delayed Initiation of Adjuvant Chemotherapy in Breast Cancer. Ann. Surg. Oncol. 2018, 25, 1904–1911.
9. Miret, C.; Domingo, L.; Louro, J.; Barata, T.; Baré, M.; Ferrer, J.; Carmona-García, M.C.; Castells, X.; Sala, M. Factors associated with readmissions in women participating in screening programs and treated for breast cancer: A retrospective cohort study. BMC Health Serv. Res. 2019, 19, 940.
10. Chen, T.; Madanian, S.; Airehrour, D.; Cherrington, M. Machine learning methods for hospital readmission prediction: Systematic analysis of literature. J. Reliab. Intell. Environ. 2022, 8, 49–66.
11. Dalwai, E.; Buccimazza, I. System delays in breast cancer. S. Afr. J. Surg. 2015, 53, 40.
12. Green, V.L. Breast cancer risk assessment and management of the high-risk patient. Obstet. Gynecol. Clin. 2022, 49, 87–116.
13. Brankovic, A.; Rolls, D.; Boyle, J.; Niven, P.; Khanna, S. Identifying patients at risk of unplanned re-hospitalisation using statewide electronic health records. Sci. Rep. 2022, 12, 16592.
14. Khavanin, N.; Bethke, K.P.; Lovecchio, F.C.; Jeruss, J.S.; Hansen, N.M.; Kim, J.Y. Risk factors for unplanned readmissions following excisional breast surgery. Breast J. 2014, 20, 288–294.
15. Mohanty, S.D.; Lekan, D.; McCoy, T.P.; Jenkins, M.; Manda, P. Machine learning for predicting readmission risk among the frail: Explainable AI for healthcare. Patterns 2022, 3, 100395.
16. Hwang, S.; Urbanowicz, R.; Lynch, S.; Vernon, T.; Bresz, K.; Giraldo, C.; Kennedy, E.; Leabhart, M.; Bleacher, T.; Ripchinski, M.R.; et al. Toward Predicting 30-Day Readmission Among Oncology Patients: Identifying Timely and Actionable Risk Factors. JCO Clin. Cancer Inform. 2023, 7, e2200097.
17. Kim, C.; Gadgil, S.U.; Lee, S.I. Transparency of medical artificial intelligence systems. Nat. Rev. Bioeng. 2026, 4, 11–29.
18. Rasool, A.; Bunterngchit, C.; Tiejian, L.; Islam, M.R.; Qu, Q.; Jiang, Q. Improved Machine Learning-Based Predictive Models for Breast Cancer Diagnosis. Int. J. Environ. Res. Public Health 2022, 19, 3211.
19. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
20. Soliman, A.; Agvall, B.; Etminani, K.; Hamed, O.; Lingman, M. The Price of Explainability in Machine Learning Models for 100-Day Readmission Prediction in Heart Failure: Retrospective, Comparative, Machine Learning Study. JMIR Med. Inform. 2023, 25, e46934.
21. Raj, S. Developing AI Models for Predicting Hospital Readmission Rates. J. Publ. Int. Res. Eng. Manag. 2025, 5, 1–8.
22. Alelyani, T.; Alshammari, M.M.; Almuhanna, A.; Asan, O. Explainable Artificial Intelligence in Quantifying Breast Cancer Factors: Saudi Arabia Context. Healthcare 2024, 12, 1025.
23. Bibi, H.; Khan, S.; Shabir, M. A Critique Of Research Paradigms And Their Implications For Qualitative, Quantitative And Mixed Research Methods. Webology 2022, 19, 7321–7335.
24. Andersen, R.M. Revisiting the behavioral model and access to medical care: Does it matter? J. Health Soc. Behav. 1995, 36, 1–10.
25. Moore, L.; Lavoie, A.; Bourgeois, G.; Lapointe, J. Donabedian’s structure-process-outcome quality of care model: Validation in an integrated trauma system. J. Trauma Acute Care Surg. 2015, 78, 1168–1175.
26. Cosentino, V.; Luis, J.; Cabot, J. Findings from GitHub: Methods, datasets and limitations. In Proceedings of the 13th International Conference on Mining Software Repositories; Association for Computing Machinery: New York, NY, USA, 2016; pp. 137–141.
27. He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
28. Zhang, H. The Optimality of Naive Bayes. In Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004); The AAAI Press: Menlo Park, CA, USA, 2004.
29. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
30. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
31. Xu, Q.; Xie, W.; Liao, B.; Hu, C.; Qin, L.; Yang, Z.; Xiong, H.; Lyu, Y.; Zhou, Y.; Luo, A. Interpretability of Clinical Decision Support Systems Based on Artificial Intelligence from Technological and Medical Perspective: A Systematic Review. J. Healthc. Eng. 2023, 2023, 9919269.
32. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
33. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 1135–1144.
34. Hosmer, J.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression; Wiley: Hoboken, NJ, USA, 2013.
35. Gonzalez-Castro, L.; Chávez, M.; Duflot, P.; Bleret, V.; Martin, A.G.; Zobel, M.; Nateqi, J.; Lin, S.; Pazos-Arias, J.J.; Del Fiol, G.; et al. Machine Learning Algorithms to Predict Breast Cancer Recurrence Using Structured and Unstructured Sources from Electronic Health Records. Cancers 2023, 15, 2741.
36. Park, S.W.; Park, Y.L.; Lee, E.G.; Chae, H.; Park, P.; Choi, D.W.; Choi, Y.H.; Hwang, J.; Ahn, S.; Kim, K.; et al. Mortality Prediction Modeling for Patients with Breast Cancer Based on Explainable Machine Learning. Cancers 2024, 16, 3799.
37. Labilloy, G.; Jasra, B.; Widrich, J.; Edgar, L.; Smotherman, C.; Neumayer, L.; Celso, B.G. Machine learning determined risk factors associated with non-adherence to timely surgery for breast cancer patients. Ann. Breast Surg. 2024, 8, 3.
38. Lou, S.J.; Hou, M.F.; Chang, H.T.; Chiu, C.C.; Lee, H.H.; Yeh, S.C.J.; Shi, H.Y. Machine Learning Algorithms to Predict Recurrence within 10 Years after Breast Cancer Surgery: A Prospective Cohort Study. Cancers 2020, 12, 3817.
39. Du, K.L.; Jiang, B.; Lu, J.; Hua, J.; Swamy, M.N.S. Exploring Kernel Machines and Support Vector Machines: Principles, Techniques, and Future Directions. Mathematics 2024, 12, 3935.
40. Magboo, M.S.A.; Magboo, V.P.C. Feature Importance Measures as an Explanation for Classification Applied to Hospital Readmission Prediction. Procedia Comput. Sci. 2022, 207, 1388–1397.
41. Park, C.; Lee, H.; Jensen, B.C.; Schonberg, M.A. Hospital readmission after a breast cancer-related admission among breast cancer patients with and without heart failure. J. Clin. Oncol. 2022, 40, E18717.
42. Tokac, U.; Chipps, J.; Brysiewicz, P.; Bruce, J.; Clarke, D. Using Machine Learning to Improve Readmission Risk in Surgical Patients in South Africa. Int. J. Environ. Res. Public Health 2025, 22, 345.
43. Hossin, M.; Sulaiman, M.N. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process. 2015, 5, 1.
44. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
45. Brown, J. Classifiers and their metrics quantified. Mol. Inform. 2018, 37, 1700127.
46. Cullerne Bown, W. Sensitivity and Specificity versus Precision and Recall, and Related Dilemmas. J. Classif. 2024, 41, 402–426.
47. Pandey, S.R.; Tile, J.D.; Oghaz, M.M.D. Predicting 30-day hospital readmissions using ClinicalT5 with structured and unstructured electronic health records. PLoS ONE 2025, 20, e0328848.
48. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
49. Adadi, A.; Berrada, M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 2018, 6, 52138–52160.
50. Belle, V.; Papantonis, I. Principles and Practice of Explainable Machine Learning. Front. Big Data 2021, 4, 688969.
51. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017.








| Feature Name | Feature Description | Data Type |
|---|---|---|
| age | Patient’s age in years | Integer |
| comorbidities | Number of comorbid conditions | Integer |
| ecog_score | ECOG performance status score | Integer |
| dose_reduction_cycle1 | Whether dose reduction occurred in cycle 1 (0/1) | Integer |
| baseline_neutrophil | Baseline neutrophil count | Float |
| readmitted_30d | Readmission within 30 days (binary outcome) | Integer |
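
The feature table above can be read as a record schema. The sketch below shows one way such a record might be typed and sanity-checked before modelling. The class name `PatientRecord`, the helper `to_feature_row`, and the validation bounds are illustrative assumptions (the ECOG bound uses the standard 0 to 5 scale); none of this is taken from the authors' code.

```python
from dataclasses import dataclass

# Predictor columns from the feature table; readmitted_30d is the outcome.
PREDICTORS = [
    "age",
    "comorbidities",
    "ecog_score",
    "dose_reduction_cycle1",
    "baseline_neutrophil",
]


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical typed record mirroring the feature table."""

    age: int
    comorbidities: int
    ecog_score: int
    dose_reduction_cycle1: int
    baseline_neutrophil: float
    readmitted_30d: int

    def __post_init__(self):
        # Basic range checks; the bounds are assumptions for illustration.
        if self.age < 0:
            raise ValueError("age must be non-negative")
        if self.comorbidities < 0:
            raise ValueError("comorbidities must be non-negative")
        if not 0 <= self.ecog_score <= 5:
            raise ValueError("ecog_score must be between 0 and 5")
        if self.dose_reduction_cycle1 not in (0, 1):
            raise ValueError("dose_reduction_cycle1 must be 0 or 1")
        if self.baseline_neutrophil < 0:
            raise ValueError("baseline_neutrophil must be non-negative")
        if self.readmitted_30d not in (0, 1):
            raise ValueError("readmitted_30d must be 0 or 1")


def to_feature_row(record):
    """Return the predictor values as floats, in table order."""
    return [float(getattr(record, name)) for name in PREDICTORS]


example = PatientRecord(62, 2, 1, 0, 3.3, 1)  # made-up patient
print(to_feature_row(example))
```

Keeping the outcome column out of `PREDICTORS` makes it harder to leak the label into the feature matrix by accident.
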
| Model | Key Parameter(s) | Setting | Rationale |
|---|---|---|---|
| LR | max_iter, solver | 2000, liblinear | Stable convergence on small datasets; regularised linear baseline. |
| NB | None | Default | Parameter-free baseline suitable for small samples. |
| KNN | n_neighbors | Default | Included as an instance-based comparator; scaling applied within folds. |
| DT | max_depth, random_state | Restricted depth | Controls complexity to reduce overfitting in N = 100. |
| RF | n_estimators | 500 trees | Stabilises variance via bagging; robust non-linear baseline. |
| SVM | probability, kernel | True; default | Non-linear classifier included; scaling applied within folds. |
| XGB | n_estimators, learning_rate, max_depth | Conservative | Reduces overfitting risk; robust ensemble for structured data. |
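
As a rough illustration of how the settings in the table above could be expressed in code, the following sketch instantiates the classifiers with scikit-learn. The table only fixes `max_iter` = 2000 and `solver` = liblinear for LR, 500 trees for RF, and `probability` = True for SVM. The Gaussian NB variant, the DT depth limit, the "conservative" XGB values, and the random seed are placeholder assumptions, not the authors' settings. Scaling for KNN and SVM is wrapped in a pipeline so that, under cross-validation, the scaler is fitted on the training folds only.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

SEED = 42  # assumption: the source does not report a seed


def build_models():
    """Return a name -> estimator mapping following the hyperparameter table."""
    models = {
        # Settings stated in the table.
        "LR": LogisticRegression(max_iter=2000, solver="liblinear"),
        # Assumption: Gaussian variant; the table only says "Default".
        "NB": GaussianNB(),
        # Scaler lives inside the pipeline, so it is refitted per fold.
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        # Assumption: max_depth=3 stands in for "Restricted depth".
        "DT": DecisionTreeClassifier(max_depth=3, random_state=SEED),
        "RF": RandomForestClassifier(n_estimators=500, random_state=SEED),
        "SVM": make_pipeline(
            StandardScaler(), SVC(probability=True, random_state=SEED)
        ),
    }
    try:
        # XGBoost is optional here; the values stand in for "Conservative".
        from xgboost import XGBClassifier

        models["XGB"] = XGBClassifier(
            n_estimators=100, learning_rate=0.05, max_depth=2, random_state=SEED
        )
    except ImportError:
        pass
    return models


models = build_models()
print(sorted(models))
```

Each entry can be passed directly to a cross-validation routine such as `sklearn.model_selection.cross_validate`, which clones and refits the whole pipeline on every training fold.
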
| Variable | Mean | SD | Min | Max |
|---|---|---|---|---|
| Age (years) | 61.6 | 13.3 | 40 | 84 |
| Comorbidities | 1.7 | 1.34 | 0 | 6 |
| ECOG score | 1.28 | 0.92 | 0 | 3 |
| Dose reduction in cycle 1 (1 = Yes) | 0.27 | 0.45 | 0 | 1 |
| Baseline neutrophil (×10⁹/L) | 3.33 | 1.25 | 0.6 | 7.2 |
| Readmitted within 30 days (1 = Yes) | 0.87 | 0.34 | 0 | 1 |

| Model | Accuracy | Precision | Recall | Specificity | F1-Score | AUC |
|---|---|---|---|---|---|---|
| LR | 0.87 | 0.87 | 1.00 | 0.00 | 0.93 | 0.68 |
| NB | 0.84 | 0.86 | 0.95 | 0.00 | 0.91 | 0.65 |
| SVM | 0.87 | 0.87 | 1.00 | 0.00 | 0.93 | 0.43 |
| KNN | 0.85 | 0.86 | 0.97 | 0.00 | 0.91 | 0.65 |
| DT | 0.79 | 0.88 | 0.87 | 0.23 | 0.87 | 0.52 |
| RF | 0.83 | 0.87 | 0.94 | 0.07 | 0.90 | 0.61 |
| XGBoost | 0.86 | 0.88 | 0.96 | 0.15 | 0.92 | 0.65 |
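
The LR and SVM rows above (accuracy 0.87, recall 1.00, specificity 0.00) show the pattern one would expect from a model that labels every patient as readmitted, given that 87% of the cohort is in the positive class. The sketch below checks that arithmetic from a single confusion matrix. It assumes N = 100 with 87 readmitted patients, as implied by the descriptive statistics. The table reports cross-validated averages, so this is a consistency check rather than a reproduction, and the function name is illustrative.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Assumed cohort: 100 patients, 87 readmitted (positive), 13 not readmitted.
n_patients = 100
n_positive = 87
n_negative = n_patients - n_positive

# An "always predict readmission" rule: every positive is a true positive,
# every negative is a false positive, and nothing is predicted negative.
always_positive = confusion_metrics(tp=n_positive, fp=n_negative, fn=0, tn=0)
rounded = {name: round(value, 2) for name, value in always_positive.items()}
print(rounded)
```

Because accuracy and F1 stay high under this trivial rule, specificity and the threshold-free AUC columns are the more informative ones for this cohort.
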
| Model | Accuracy | Precision | Recall | Specificity | F1-Score | ROC-AUC | PR-AUC | Brier |
|---|---|---|---|---|---|---|---|---|
| LR | | | | | | | | |
| NB | | | | | | | | |
| SVM | | | | | | | | |
| KNN | | | | | | | | |
| DT | | | | | | | | |
| RF | | | | | | | | |
| XGBoost | | | | | | | | |
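
Fold-level resampling means that synthetic minority samples are generated from the training portion of each fold only, so the held-out fold is scored on real patients. The following is a minimal, dependency-free sketch of that idea using the basic SMOTE interpolation step, in which a minority sample is moved a random fraction of the way toward one of its nearest minority-class neighbours. The function names, toy data, k = 3, and the seed are illustrative assumptions. It is not the authors' implementation, and a maintained library would normally be used instead.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Neighbours are searched within the minority class only.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def resample_training_fold(X_train, y_train, seed=0):
    """Oversample the minority class of ONE training fold until classes balance."""
    by_class = {0: [], 1: []}
    for x, y in zip(X_train, y_train):
        by_class[y].append(x)
    minority_label = min(by_class, key=lambda c: len(by_class[c]))
    majority_label = 1 - minority_label
    n_new = len(by_class[majority_label]) - len(by_class[minority_label])
    new_points = smote_like(by_class[minority_label], n_new, seed=seed)
    # New lists are returned; the inputs are left unchanged.
    return X_train + new_points, y_train + [minority_label] * n_new


# Toy fold with (age, comorbidities) features; label 1 = readmitted.
# In this cohort the minority class is the non-readmitted group (label 0).
X_train = [
    (60.0, 1.0), (72.0, 2.0), (55.0, 0.0), (68.0, 3.0),
    (49.0, 1.0), (80.0, 2.0), (63.0, 1.0), (58.0, 2.0),
    (45.0, 0.0), (52.0, 1.0), (47.0, 0.0),
]
y_train = [1] * 8 + [0] * 3
X_test, y_test = [(66.0, 2.0), (50.0, 0.0)], [1, 0]  # never resampled

X_bal, y_bal = resample_training_fold(X_train, y_train)
print(y_bal.count(1), y_bal.count(0))
```

Resampling before splitting would let synthetic points derived from a test patient leak into training, which is why the held-out fold is kept out of `resample_training_fold` entirely.
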
| Study Reference | Model | AUC | Context | Comparison to Current Study |
|---|---|---|---|---|
| [40] | RF | 0.99 | USA (Large Dataset) | Near-perfect metrics suggest potential overfitting or lack of external validation. |
| [36] | XGBoost | 0.87 | South Korea (Oncology) | Superior AUC; however, RF (0.72) is closer to the current study’s range. |
| [16] | RF/NB | 0.71 | USA (Oncology) | Most comparable to current study; shows oncology AUCs often hover around 0.70. |
| [37] | AdaBoost | 0.82 | USA (30–120 days) | Higher AUC but lacks SHAP/LIME for individual local interpretation. |
| [42] | RF | 0.63 | South Africa | Highly similar performance; suggests 0.60–0.70 is a regional benchmark for oncology. |
| This Study (2025) | RF | 0.68 | Breast Cancer (N = 100) | Focuses on preliminary XAI feasibility. |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Mqadi, M.; Mbunge, E.; Makaba, T. Predicting 30-Day Readmission Risks in Breast Cancer Patients: An Explainable Machine Learning Approach. Appl. Sci. 2026, 16, 2467. https://doi.org/10.3390/app16052467

