Article

Predicting 30-Day Readmission Risks in Breast Cancer Patients: An Explainable Machine Learning Approach

1 Centre of Applied Data Science, University of Johannesburg, Johannesburg 2006, South Africa
2 Department of Applied Information Systems, University of Johannesburg, Johannesburg 2006, South Africa
* Authors to whom correspondence should be addressed.
Appl. Sci. 2026, 16(5), 2467; https://doi.org/10.3390/app16052467
Submission received: 19 January 2026 / Revised: 24 February 2026 / Accepted: 27 February 2026 / Published: 4 March 2026
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Hospital readmission within 30 days remains a significant challenge in oncology practice, contributing to higher healthcare costs, treatment delays, and poorer patient outcomes. Existing predictive models for breast cancer readmission are often limited by inadequate interpretability and generalisability. This study develops and evaluates an explainable machine learning (ML) framework to predict 30-day hospital readmissions among breast cancer patients, with specific emphasis on methodological transparency and avoidance of information leakage. A retrospective dataset including demographic, clinical, and treatment-related variables such as age, comorbidity burden, ECOG performance status, baseline neutrophil count, and dosage adjustments was analysed. Multiple ML classifiers were evaluated—including Logistic Regression, Support Vector Machine, Naïve Bayes, K-Nearest Neighbours, Decision Tree, Random Forest, and XGBoost—using repeated stratified cross-validation (5 folds × 10 repeats). Class imbalance was addressed using SMOTE applied strictly within the training folds to prevent data leakage. Out-of-fold performance metrics included ROC-AUC, PR-AUC, calibration curves, and Brier scores. Random Forest demonstrated the strongest discrimination, with a specificity of 0.57 ± 0.33, the highest among all models, and a ROC-AUC of 0.68 ± 0.17, values appropriate for the small, imbalanced dataset. For interpretability, each model was refit on the full dataset and analysed using Shapley Additive Explanations (SHAP), Partial Dependence Plots (PDP), and LIME. Comorbidity burden and ECOG performance status consistently emerged as the most influential predictors across all explainability techniques, aligning with established clinical evidence. The findings highlight the feasibility of applying explainable ML methods to small, imbalanced oncology datasets and demonstrate their potential to support early clinical risk identification in breast cancer care.

1. Introduction

Breast cancer remains the most prevalent cancer among females, affecting approximately one in eight women during their lifetime. In 2022 alone, about 2.3 million new cases and 670,000 deaths were reported globally by the World Health Organisation (WHO) [1]. In response to the growing burden, the WHO launched the Global Breast Cancer Initiative in 2024, aiming to achieve a 2.5% annual reduction in mortality by 2040, which is expected to prevent approximately 2.5 million premature deaths [1]. The initiative prioritises early detection, timely diagnosis, and equitable access to integrated care. However, progress is hindered by issues such as a lack of digital infrastructure, inadequate public awareness, insufficient funding, and generally weak healthcare systems. These systemic and patient-level challenges often result in late-stage diagnoses, poor prognoses, and higher rates of hospital readmission [2,3]. Recent global reviews have also highlighted rising incidence trends and the rapid expansion of AI-driven diagnostic tools in breast cancer detection [4].
Hospital readmission within 30 days is widely recognised as a key indicator of healthcare quality and efficiency [5]. In the United States of America (USA), the Hospital Readmissions Reduction Program (HRRP) has imposed financial penalties on hospitals that fail to meet national benchmarks for unplanned readmissions, as per the Centres for Medicare & Medicaid Services [6]. Unplanned readmissions strain hospital capacity and staffing levels, undermining the efficient distribution of resources and worsening patient outcomes. In breast cancer care, unplanned readmission is associated with surgical complications, positive margins, delayed adjuvant therapy, and sociodemographic disparities [7,8,9]. Most healthcare systems, especially those with limited resources, continue to report increased rates of unplanned readmissions among patients treated for breast cancer, resulting in heightened financial burdens, ineffective use of resources, and poor patient outcomes [10]. Delays in diagnostic and treatment processes further exacerbate this issue; for example, Dalwai and Buccimazza [11] found that patients required approximately ten weeks to complete all diagnostic stages, far exceeding the six-week international benchmark, thereby intensifying systemic pressure on health services. Risk Assessment and Management Programs have been proposed to identify high-risk patients and reduce preventable readmissions [12]. However, both predictive performance and clinical interpretability remain limited in traditional statistical models and in tree-based models.
Recent advances in machine learning offer promising avenues for predicting hospital readmissions, thus allowing for proactive interventions and optimised use of healthcare resources [13]. Previous studies using logistic regression yielded somewhat limited predictive power [14], whereas gradient-boosting algorithms, such as XGBoost and CatBoost, showed higher accuracy [7,15]. Comparison studies on multiple ML classifiers such as LightGBM, Random Forest, and SVM also underlined the potential of ML-based models in oncology readmission prediction [16]. Despite these advances, one major obstacle remains: the lack of explainability in ML models limits clinical adoption and trust. As AI-driven tools are increasingly used in healthcare, there is a growing need to ensure that predictive models are transparent and interpretable [17]. Recent reviews also demonstrate substantial progress in ML- and deep learning (DL)-based breast cancer detection, highlighting significant performance gains and the ongoing need for interpretable, clinically trustworthy models [18].
This study thus develops explainable machine learning (xML) models for predicting 30-day readmissions among patients with breast cancer, striking a balance between predictive accuracy and model interpretability to support data-driven clinical decisions. The main objective is to apply explainable machine learning techniques to predict the risk of hospital readmission within 30 days in patients with breast cancer. The specific research objectives are:
(i)
To identify key risk factors associated with 30-day readmissions of breast cancer patients.
(ii)
To develop an ML model to predict 30-day readmissions among individuals diagnosed with breast cancer.
(iii)
To explore the use of explainable artificial intelligence (XAI) methods to interpret and explain the predictions generated by the models.

2. Related Work

ML has been increasingly used in oncology for the prediction of hospital readmission, treatment outcomes, and survival rates. Until recently, traditional statistical methods such as logistic regression (LR) were used to identify predictors of readmission. However, these methods are linear in nature and cannot capture complex, nonlinear relationships among the clinical variables involved [14]. Ensemble and gradient-boosting algorithms, such as Random Forest (RF), XGBoost, and CatBoost, have emerged as significantly better predictors because their nonlinearity enables them to model interactions between variables and handle heterogeneous datasets [7,15]. Recent comprehensive reviews further highlight major advances in ML- and DL-based breast cancer detection, summarising global progress in predictive modelling, multimodal diagnostics, and interpretability requirements for clinical translation [4,18].
Khavanin et al. [14] employed logistic regression to identify factors associated with 30-day readmission following breast reconstruction surgery, but achieved only moderate predictive power. Later, other studies, such as that of Pal Choudhury et al. [7], introduced XGBoost for predicting the risk of readmission after surgery for breast cancer, achieving better accuracy and AUC when compared to LR. Likewise, Mohanty et al. [15] utilised CatBoost for predicting 30-day readmissions among elderly oncology patients. They reported an AUC-ROC of 0.79, demonstrating the algorithm’s strong performance on imbalanced data in clinical scenarios. In this respect, Hwang et al. [16] took it a step further by comparing the results of several classifiers, including LightGBM, Random Forest, SVM, ANN, and Naïve Bayes. They observed the best performance of LightGBM in obtaining an AUC-ROC of 0.711 after training on comprehensive clinical feature sets.
Despite these developments, most predictive models still suffer from a lack of explainability. Their “black-box” nature, which obscures how input features relate to model predictions, reduces trust and adoption in real-world healthcare settings. Explainable Artificial Intelligence (XAI) techniques have recently emerged to overcome this limitation; as demonstrated by Arrieta et al. [19], methods such as SHAP, LIME, and PDP visualise feature influence and thus provide model interpretability. Recent studies by Soliman et al. [20] and Selva and Selva [21] demonstrated the clinical utility of XAI methods by identifying high-impact predictors, such as comorbidity burden and performance status, thereby aligning model insights with established medical knowledge.
However, the current literature still lacks a comprehensive and interpretable framework for predicting breast cancer readmission that combines robust predictive performance with model transparency. Existing studies often rely on single-institution datasets, narrow feature sets, and non-standardised evaluation metrics, which limit their external validity and clinical generalisability [22]. This study addresses these gaps by implementing a methodology that includes a broad set of clinical and treatment-related variables (age, comorbidity burden, ECOG performance status) and utilises the Synthetic Minority Over-sampling Technique (SMOTE) to ensure robust and standardised evaluation across multiple models. Furthermore, we employed comprehensive XAI methods (SHAP, LIME, and PDP) to rigorously validate the feature importance and ensure the clinical transparency necessary for integration into oncology workflows.

3. Materials and Methods

The quantitative, positivist design entailed developing and analysing Explainable Machine Learning models for predicting 30-day hospital readmission among patients diagnosed with breast cancer. The approach is grounded in the positivist paradigm, which emphasises objectivity, empirical validation, and reproducibility [23]. A deductive framework was employed, in which hypotheses were derived from existing models, including Andersen’s Behavioural Model [24] and Donabedian’s Structure-Process-Outcome framework [25], which links the characteristics of patients and healthcare structures to their outcomes. The quantitative approach was selected to enable statistical modelling and an objective assessment of predictive performance.

3.1. Data Source

A secondary dataset of 100 breast cancer patient records was retrieved from GitHub, a publicly available repository often used in academic research [26]. It included demographic and clinical variables such as age, number of comorbidities, ECOG performance status, baseline neutrophil count, dose reduction in the first cycle of chemotherapy, and readmission status within 30 days. Patients aged 18 years and above who underwent an initial hospitalisation followed by a potential readmission were considered for inclusion. The specific features extracted, along with their descriptions and data types, are detailed in Table 1.

3.2. Data Preprocessing

Initial exploratory data analysis (EDA) was performed to assess the dataset’s structure and overall quality. Comprehensive data preprocessing was applied to enhance data reliability and consistency. The dataset was systematically inspected for missing values across all features, but none were detected. Similarly, a check for duplicate records was performed to prevent redundancy and potential bias, and no duplicates were identified. This confirmed the high integrity of the source data prior to subsequent transformation steps.
A preliminary inspection of the target variable (readmitted_30d) revealed a severe class imbalance: only 13% of patients experienced 30-day readmission (the minority class). Since this imbalance poses a substantial risk of biasing models towards predicting the majority (non-readmission) class, the Synthetic Minority Over-sampling Technique (SMOTE) was employed. SMOTE was selected over simple oversampling or undersampling: simple oversampling was avoided to minimise the risk of overfitting, while undersampling was rejected to preserve the limited data available from the majority class. SMOTE creates synthetic minority-class instances by interpolating between existing minority samples and their nearest neighbours in feature space, providing a robust way to balance the dataset without introducing duplication bias or information loss, which is crucial for ensuring the reliability of the trained models.
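The interpolation idea behind SMOTE can be sketched in a few lines of NumPy. This is an illustrative simplification (random neighbour choice, no borderline handling), not the imbalanced-learn implementation used in this study, and the minority samples below are random stand-ins:

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, rng=None):
    """Minimal sketch of SMOTE's core step: each synthetic sample lies on
    the line segment between a minority instance and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise distances among minority samples only
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neigh = np.argsort(d, axis=1)[:, :k]               # k nearest neighbours
    base = rng.integers(0, n, size=n_new)              # pick a minority sample
    nb = neigh[base, rng.integers(0, k, size=n_new)]   # pick one of its neighbours
    lam = rng.random((n_new, 1))                       # interpolation weight in [0, 1)
    return X_min[base] + lam * (X_min[nb] - X_min[base])

# Example: 13 minority samples in 2-D (matching the 13% minority count),
# upsampled with 74 synthetic points towards balance
X_min = np.random.default_rng(0).normal(size=(13, 2))
X_syn = smote_sketch(X_min, n_new=74, rng=1)
print(X_syn.shape)  # (74, 2)
```

Because every synthetic point is a convex combination of two existing minority samples, no exact duplicates are introduced and no majority-class records are discarded.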
Correlation analysis was performed to assess the linear relationships between all input features, as shown in Figure 1. As expected in the clinical datasets, the correlation strengths ranged from low to moderate, indicating minimal redundancy among the predictors.

3.3. Feature Engineering and Selection

Feature engineering was conducted to enhance data quality and clinical relevance by transforming raw variables into formats more suitable for predictive modelling [27]. Since risk_tier and readmission_risk_prob constitute model-derived leakage, they were fully removed from the modelling dataset.
Although the dataset sample size (N = 100) did not require dimensionality reduction for computational efficiency, Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbour Embedding (t-SNE) were employed for data visualisation purposes only. This step was critical for investigating the underlying structure of the data and justifying the choice of modelling algorithms. Specifically, we aimed to determine if the readmission classes were linearly separable.
Figure 2 presents the Principal Component Analysis (PCA) projection of the dataset using the first two principal components (PC1 and PC2). PCA applies a linear transformation and is used exclusively for visualisation and for summarising the dominant sources of variance within the data. As illustrated in the scatter plot, substantial overlap is observed between the readmitted (red) and non-readmitted (blue) classes in this reduced-dimensionality space. It should be explicitly noted that PCA was used for visualisation and variance explanation, not for assessing class separability.
Figure 3 shows the t-SNE visualisation, which revealed non-linear clustering of patient data. These non-linear patterns supported the choice of flexible learners, but t-SNE was used solely for descriptive purposes, not for model selection.
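As a rough illustration of the visualisation step, a two-component PCA projection of the kind shown in Figure 2 can be produced with scikit-learn; the feature matrix below is a random stand-in for the actual 100-patient dataset:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the 100-patient feature matrix (5 predictors)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))

# Linear projection onto the first two principal components (PC1, PC2),
# used only for visualisation and variance summary, not for modelling
pca = PCA(n_components=2)
Z = pca.fit_transform(X)
print(Z.shape)                               # (100, 2) -- coordinates for the scatter plot
print(pca.explained_variance_ratio_.sum())   # variance captured by PC1 + PC2
```

Colouring the rows of `Z` by readmission status reproduces the style of scatter plot in Figure 2; an analogous call to `sklearn.manifold.TSNE` yields the non-linear embedding of Figure 3.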

3.4. Machine Learning Algorithms

Seven different machine learning algorithms were applied to classify patients into readmission and non-readmission classes. The models were rigorously selected to cover the main algorithmic paradigms: linear, probabilistic, instance-based, and ensemble methods, ensuring careful consideration of the relative trade-offs between predictive accuracy and clinical interpretability. Logistic Regression (LR) is a core classification method frequently used in healthcare predictive efforts. It estimates the likelihood of a binary event, like a 30-day hospital readmission, using a set of predictor variables. LR was selected as the linear baseline for this study. Its primary value lies in its transparency.
Naïve Bayes (NB) is a classification method founded on Bayes’ theorem, which assumes conditional independence between different features [28]. NB was chosen specifically for its robustness on small datasets (N = 100), where complex models might overfit.
The Support Vector Machine (SVM) aims to identify the optimal separators that divide various classes with a maximum margin [29]. SVM was implemented specifically to address the non-linear separability of the readmission classes observed in our t-SNE visualisation (Figure 3).
The K-Nearest Neighbours (KNN) method is one of the simplest forms of non-parametric classification. It assigns a class to a new case based on the majority class of the ‘k’ most similar training cases. KNN was included to represent instance-based learning. In a clinical context, it mimics the heuristic of “patient matching,” where a physician estimates risk based on historical cases with similar profiles.
Decision Tree (DT) model partitions data recursively by feature values, creating a hierarchy of decisions [30]. The DT model was prioritised for its “white-box” transparency. Unlike black-box algorithms, the DT generates explicit IF-THEN rules (e.g., IF Comorbidities > 2 THEN High Risk) that directly align with clinical reasoning [31].
Random Forest (RF) is an ensemble learning technique that builds several decision trees on bootstrapped samples and averages their predictions to improve generalisation [32]. RF was selected to mitigate the variance and overfitting issues inherent in single Decision Trees. Given our small dataset (N = 100), RF’s bagging (bootstrap aggregating) technique is essential for stabilising predictions.
Extreme Gradient Boosting (XGBoost) is an advanced implementation of the gradient boosting framework, which builds an ensemble of weak prediction models sequentially to minimise residual errors [33]. XGBoost was selected primarily for its robustness against class imbalance. Unlike Random Forest, which builds trees independently, XGBoost corrects the errors of previous trees, allowing it to focus specifically on the hard-to-classify minority cases (readmitted patients).
As noted above, LR estimates the probability of a binary event, such as a 30-day hospital readmission, from a set of predictor variables [34]. LR is one of the least obscure methods in clinical scholarship thanks to its ability to communicate associations in terms of odds ratios, making it especially favourable when model clarity is critical. The advantages of this approach include simplicity, computational speed, and interpretability; nevertheless, its main limitation is the assumption of linearity in the log-odds, which restricts its ability to capture non-linear associations. Several studies have employed LR as a baseline model for predicting hospital readmissions. González-Castro et al. [35] and Park et al. [36] utilised LR on clinical databases, achieving accuracy rates ranging from 62% to 67%, accompanied by AUC values ranging from 0.64 to 0.67. Alelyani et al. [22] and Labilloy et al. [37] further incorporated LR in comparative analyses with stronger models, such as RF and GBM, whereby LR consistently acted as an interpretable benchmark despite its relatively lower predictive ability.
NB is computationally efficient, particularly suited to smaller sample sizes, and can handle both discrete and continuous variables. However, its independence assumption is often violated in real-world medical datasets, degrading predictive performance when features are correlated. In hospital readmission forecasting, Naïve Bayes has been used as a lightweight baseline model. Lou et al. [38] contrasted NB with SVM, KNN, and ANN for 30-day surgery readmissions in Taiwan, where NB showed weak discriminatory power (AUC = 50%). Alelyani et al. [22] reported similar performance (AUC = 60%) in a Saudi Arabian population. Despite its modest accuracy, NB remains useful as a reference model and as a constituent of ensemble schemes in health applications.
SVMs perform well on datasets with many features and can capture complex, non-linear patterns through the use of kernel functions [29,39]. Nevertheless, the method is computationally costly when applied to large samples and is less interpretable than linear models. Several studies have utilised SVM for predicting hospital readmissions. Lou et al. [38] applied SVM to clinical and demographic variables, achieving an AUC-ROC of 88.9%. In broader comparative analyses, Park et al. [36], Alelyani et al. [22], and Labilloy et al. [37] all included SVM, with performance typically ranging between 59% and 66% AUC, lower than that of tree-based and ensemble models. Despite this, SVM remains a strong non-linear baseline for structured healthcare datasets.
KNN is straightforward to implement and can model non-linear decision boundaries without assumptions about the distribution of the data. Nevertheless, the method can be adversely affected by noise and irrelevant dimensions, and its computation time increases with the size of the data. KNN has been used as a comparative model in multiple studies of hospital readmissions. For instance, Lou et al. [38] reported an AUC of 85.0% and an overall accuracy of 90.9% using KNN alongside SVM and ANN. Magboo [40] also utilised KNN for 30-day oncology readmissions, achieving AUC scores of 60% and 92.9%. Labilloy et al. [37] employed KNN in a large ensemble comparison study, achieving an AUC of 72%. These instances demonstrate that while KNN can be competitive on certain structured datasets, its performance is highly contingent on the relevance of features and the scaling of the data.
DTs are easy to comprehend, work well with both categorical and continuous data, and are versatile. However, they are prone to overfitting and are sensitive to small fluctuations in the dataset. Many studies have utilised DT models to predict hospital readmissions. González-Castro et al. [35] reported an AUC-ROC of 64.3% for DTs, while Park et al. [41] and Alelyani et al. [22] reported accuracies ranging from 65% to 72%. Other studies, such as Magboo [40] and Labilloy et al. [37], have utilised DTs as base learners for ensemble models, further emphasising their role as an interpretable classifier. Despite their modest predictive ability, DTs form the foundation of more sophisticated ensemble methods, such as Random Forest and Gradient Boosting.
The versatility and predictive performance of RF, along with its capacity to handle nonlinear relations across varied datasets, have made it one of the most widely used algorithms for predicting 30-day hospital readmissions. Its principal disadvantages are a loss of interpretability relative to individual decision trees and increased computational cost. Magboo [40] achieved an outstanding result using RF, reporting an AUC of 99.3%. Park et al. [36] and Alelyani et al. [22] also reported strong performances, with AUCs of approximately 72%. Similarly, the studies by Hwang et al. [16] and Tokac et al. [42] reported AUCs of 63% to 70%. Across these studies, RF was one of the most dependable algorithms for predicting readmissions, exhibiting a combination of efficiency and reliability across diverse patient populations.

3.5. Experimental Setup

The experimental workflow was designed as a systematic end-to-end pipeline ensuring methodological rigour, reproducibility, and alignment with best practices for modelling imbalanced clinical datasets. Figure 4 illustrates the updated workflow, consisting of data acquisition, preprocessing, model development, cross-validated evaluation, and explainability analysis.
Given the small sample size (N = 100) and severe class imbalance, a single train–test split was avoided because it can result in unstable and non-representative estimates of model performance. Instead, all models were evaluated using repeated stratified cross-validation (5 folds × 10 repeats). This approach provides more robust performance estimates by averaging results over multiple balanced partitions while maintaining the original minority-class proportion in every fold.
To prevent information leakage, the Synthetic Minority Over-sampling Technique (SMOTE) was applied strictly within the training folds of each cross-validation iteration using an imbalanced-learn pipeline. Scaling was also performed within the folds for algorithms sensitive to feature magnitude (e.g., Logistic Regression, SVM, KNN). This ensured that the validation portion of each fold remained untouched and simulated unseen data.
Hyperparameter optimisation was intentionally limited in order to avoid optimistic bias in this small dataset. Instead of external grid search applied before evaluation, models were configured using conservative and widely recommended parameter settings (e.g., restricted tree depth, moderate learning rates). Performance was assessed using out-of-fold predictions generated across all CV iterations. Only after CV evaluation were models re-trained on the full dataset using identical preprocessing and SMOTE settings for the purpose of explainability analyses (SHAP, PDP, and LIME), ensuring that interpretability results reflect the final fitted models without influencing CV estimates.
This updated workflow ensures robust estimation of model generalisability, eliminates sources of leakage, and aligns with reviewer recommendations for modelling small, imbalanced clinical datasets. The conservative hyperparameter configurations used for each algorithm are summarised in Table 2, which reports the key settings and their methodological rationale.
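The leakage-safe evaluation loop described above can be sketched as follows. For portability, this sketch substitutes simple in-fold random oversampling for the imbalanced-learn SMOTE step used in the study, uses Logistic Regression as a representative classifier, and runs on synthetic stand-in data with the study's 13% positive rate:

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical stand-in data: 100 patients, 5 features, 13% positives
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([1] * 13 + [0] * 87)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
aucs = []
for tr, te in cv.split(X, y):
    Xtr, ytr = X[tr], y[tr]
    # Resample the minority class INSIDE the training fold only
    # (random duplication here; the study applied SMOTE at this point)
    pos = np.flatnonzero(ytr == 1)
    n_extra = (ytr == 0).sum() - len(pos)
    extra = rng.choice(pos, size=n_extra, replace=True)
    Xtr = np.vstack([Xtr, Xtr[extra]])
    ytr = np.concatenate([ytr, ytr[extra]])
    # Scale using training-fold statistics only, so the held-out fold
    # simulates genuinely unseen data
    sc = StandardScaler().fit(Xtr)
    clf = LogisticRegression(max_iter=1000).fit(sc.transform(Xtr), ytr)
    proba = clf.predict_proba(sc.transform(X[te]))[:, 1]
    aucs.append(roc_auc_score(y[te], proba))

print(len(aucs), round(float(np.mean(aucs)), 3))  # 50 out-of-fold AUC estimates
```

The equivalent production setup wraps SMOTE, the scaler, and the classifier in an `imblearn.pipeline.Pipeline`, which applies resampling to training folds automatically during cross-validation.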

3.6. External Validation

External validation using an independent clinical dataset was not performed due to data availability constraints. This study is therefore positioned as a proof-of-concept and feasibility analysis, and future work will prioritise validation on a larger multi-institutional cohort.

3.7. Model Evaluation Metrics

To provide a comprehensive assessment of model performance, a suite of diverse metrics was employed. This multi-metric approach ensures that the evaluation captures not only overall correctness but also the model’s ability to handle class imbalance [43]. The performance metrics comprised accuracy, precision, recall, specificity, F1-score, the area under the receiver operating characteristic curve (ROC-AUC), the area under the precision–recall curve (PR-AUC), and the Brier score [44].

3.7.1. Accuracy

Accuracy is a basic evaluation metric that reflects the ratio of correctly predicted instances to the total number of observations [45]. It is calculated as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)

3.7.2. Precision

Precision, or Positive Predictive Value, measures the model’s ability to correctly identify positive cases while minimising false positive predictions [46]. It is calculated as follows:
Precision = TP / (TP + FP)

3.7.3. Recall

Recall, also referred to as sensitivity or the True Positive Rate, assesses the model’s effectiveness in detecting all relevant positive cases.
Recall = TP / (TP + FN)

3.7.4. Specificity

Specificity, also known as the True Negative Rate, serves as a complementary measure to recall by evaluating the model’s ability to correctly identify negative instances [47]. It measures the proportion of actual negative instances that were correctly identified.
Specificity = TN / (TN + FP)

3.7.5. F1-Score

The F1-score represents the harmonic mean of precision and recall, offering a balanced assessment of model performance. This metric is particularly useful in situations with imbalanced classes, as it penalises models that disproportionately prioritise either precision or recall [44,47].
F1-score = 2 × (Precision × Recall) / (Precision + Recall)
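The five confusion-matrix metrics above translate directly into code; the confusion counts in the example (2 TP, 15 TN, 2 FP, 1 FN) are hypothetical values for a single validation fold:

```python
def accuracy(tp, tn, fp, fn):
    # Proportion of all predictions that are correct
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    # Of the predicted positives, how many were truly positive
    return tp / (tp + fp)

def recall(tp, fn):
    # Of the true positives, how many were detected (sensitivity)
    return tp / (tp + fn)

def specificity(tn, fp):
    # Of the true negatives, how many were correctly identified
    return tn / (tn + fp)

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Hypothetical fold: 2 TP, 15 TN, 2 FP, 1 FN (20 patients)
print(accuracy(2, 15, 2, 1))            # 0.85
print(round(specificity(15, 2), 3))     # 0.882
print(round(f1_score(2, 2, 1), 3))      # 0.571
```

Note how a fold with only three actual positives can report high accuracy while precision and F1 remain modest, which motivates the multi-metric evaluation used here.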

3.7.6. AUC-ROC

AUC-ROC summarises overall discriminatory ability independent of the choice of classification threshold. Together with the metrics above, it supports an assessment of model reliability that balances precision and recall and remains robust in the presence of class imbalance.

3.7.7. PR-AUC

PR-AUC (average precision) summarises the precision–recall trade-off and is particularly informative when the positive class is rare.

3.7.8. Baseline Clinical Comparator

To further contextualise the performance of the developed machine learning models, a persistence baseline was established. This baseline, dubbed the “Always No Readmission” model, assigns each patient to the majority class (Non-Readmitted), in accordance with conventions in readmission prediction studies [44,48]. It reflects the clinical reality that most patients in an oncology setting will not experience a readmission within 30 days, and represents the default scenario in which risk-stratification tools are not employed. To ensure a rigorous comparison, the baseline was subjected to the same 5 × 10 stratified, repeated cross-validation scheme as the predictive models, providing a reference that any machine learning solution must exceed.
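A minimal sketch of this baseline on a cohort with the study's 13% readmission rate shows why it must be beaten rather than merely matched: it attains high accuracy while detecting none of the readmitted patients.

```python
# "Always No Readmission" baseline on a 13%-positive cohort of 100 patients
y_true = [1] * 13 + [0] * 87   # 1 = readmitted within 30 days
y_pred = [0] * 100             # baseline predicts "no readmission" for everyone

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
acc = (tp + tn) / len(y_true)
sens = tp / 13                 # recall for the readmitted class

print(acc)   # 0.87 -- high accuracy from the class imbalance alone...
print(sens)  # 0.0  -- ...while missing every patient who was readmitted
```

This is the accuracy paradox in miniature: any clinically useful model must exceed this 0.87 accuracy while also achieving non-zero sensitivity for the readmitted class.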

3.8. Explainable Artificial Intelligence

To enhance interpretability and clinical trust, three complementary explainability methods were used: SHAP, Partial Dependence Plots (PDP), and LIME [49,50]. Model evaluation was performed using repeated cross-validation; thereafter, each model was refit on the full dataset using the same preprocessing and resampling pipeline solely for explainability analysis.
SHAP assigns feature-attribution values grounded in cooperative game theory, enabling both global and local explanations of model predictions [51]. PDPs visualise the marginal effect of individual predictors on the predicted readmission probability, supporting clinical interpretation of non-linear patterns. LIME provides instance-level explanations by locally approximating the model near a specific patient record using an interpretable surrogate model. LIME explanations were generated for representative cases to illustrate feature contributions at the individual level.
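The PDP computation described above is simple enough to sketch by hand: fix the feature of interest at each grid value for every row, then average the predicted positive-class probabilities. The sketch below uses a synthetic stand-in dataset (column 0 playing the role of a predictor such as comorbidity count) and a generic scikit-learn Random Forest, not the study's fitted models:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data: outcome driven mainly by feature 0
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 1).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def partial_dependence(model, X, feature, grid):
    """One-feature PDP: fix `feature` at each grid value for all rows and
    average the predicted positive-class probability."""
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v       # marginalise over all other features
        out.append(model.predict_proba(Xv)[:, 1].mean())
    return np.array(out)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 5)
pd_vals = partial_dependence(model, X, 0, grid)
print(pd_vals.round(2))  # should rise with the feature value
```

In practice the same curve is available from `sklearn.inspection.partial_dependence`, while SHAP and LIME explanations come from their respective libraries; the manual version above simply makes the averaging step explicit.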

4. Ethical Considerations

Approval of the research was obtained from the University of Johannesburg’s Ethics Committee (Protocol No. 2025SCiiS035). Only anonymised, secondary data were used. Data storage adhered to institutional standards regarding security, confidentiality, access, and a retention period of five years. The research was conducted in conformance with international guidelines for research ethics.

5. Results

5.1. Descriptive Statistics

The dataset consisted of 100 patients aged 40–84 years, with a mean age of 61.6 ± 13.3 years. The mean number of comorbidities was 1.7 (SD = 1.34), and the average ECOG performance status was 1.28 (SD = 0.92), indicating that the majority were functionally independent. A reduction in chemotherapy doses during the first cycle occurred in 27% of cases. The average baseline neutrophil count was 3.33 × 10⁹/L (SD = 1.25). The outcome variable was highly imbalanced, with only 13% of patients in the minority (non-readmitted) class. Table 3 summarises the main variables.
Comorbidity burden and ECOG score showed clear associations with readmission outcome, aligning with established clinical literature.

5.2. Model Performance Using Repeated Stratified Cross-Validation

Seven machine learning algorithms were evaluated using repeated stratified cross-validation (5 folds × 10 repeats). All models were trained within an imbalanced-learn pipeline applying SMOTE only to each training fold, and—where appropriate—feature scaling was applied inside the fold to prevent leakage.
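A minimal sketch of this fold-level resampling discipline is shown below. To stay self-contained it uses a hand-rolled SMOTE-style interpolator (`smote_like`, a hypothetical helper) on synthetic data; the study itself used the imbalanced-learn pipeline, which encapsulates the same fit-resample-on-the-training-fold-only logic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import RepeatedStratifiedKFold

def smote_like(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE-style oversampler: each synthetic point interpolates
    between a minority sample and one of its k nearest minority neighbours."""
    rng = rng or np.random.default_rng(0)
    k = min(k, len(X_min) - 1)
    dist = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    nn = np.argsort(dist, axis=1)[:, 1:k + 1]           # drop self-match
    base = rng.integers(0, len(X_min), n_new)
    nbr = nn[base, rng.integers(0, k, n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
risk = X[:, 0] + rng.normal(size=100)
y = (risk > np.quantile(risk, 0.87)).astype(int)        # ~13% minority class

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)
recalls = []
for tr, te in cv.split(X, y):
    X_tr, y_tr = X[tr], y[tr]
    n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())  # balance the classes
    X_syn = smote_like(X_tr[y_tr == 1], n_new, rng=rng) # train-fold data ONLY
    X_bal = np.vstack([X_tr, X_syn])
    y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=int)])
    clf = LogisticRegression(max_iter=2000).fit(X_bal, y_bal)
    recalls.append(recall_score(y[te], clf.predict(X[te])))
```

The key point is that `smote_like` only ever sees the training indices of each fold, so no synthetic point is derived from a test-fold patient; this is what prevents the optimistic bias that arises when resampling is applied before the split.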

5.2.1. Performance Without SMOTE (Baseline Evaluation)

As shown in Table 4, performance on the imbalanced data combined high accuracy with extremely poor specificity, with most classifiers predicting virtually every patient as “readmitted.” This reproduced the classical accuracy paradox: accuracy values ranged from 0.79 to 0.87 even though specificity was 0% for four of the seven models. The confusion matrices showed that Logistic Regression and SVM classified every patient as positive (Recall = 1.00), while KNN and Naïve Bayes did so for nearly all patients; none of the four produced a single True Negative. Accuracy on the imbalanced dataset therefore did not reflect useful clinical performance.

5.2.2. Performance with SMOTE (Fold-Level Resampling)

Applying SMOTE within each CV fold improved the models’ ability to detect the minority class; the central difficulty, however, lies in balancing sensitivity against specificity. Table 5 reports the average out-of-fold performance. RF was the best-performing algorithm overall, with the highest Accuracy ( 0.74 ± 0.09 ) and F1-score ( 0.84 ± 0.06 ). RF and XGBoost also achieved the highest Recall ( 0.80 ) of all models, which is crucial when models are used to screen patients. Although LR had the best Specificity ( 0.57 ± 0.33 ) and PR-AUC ( 0.94 ± 0.04 ), its Accuracy and Recall were poor compared with the other models. Overall discriminative power was moderate: LR achieved the best ROC-AUC ( 0.68 ± 0.17 ), followed by RF and XGBoost with scores of 0.60 . RF showed the lowest Brier score ( 0.19 ± 0.05 ), indicating the best-calibrated probability estimates among the models.

5.2.3. Confusion Matrix Interpretation

Logistic Regression (LR): In contrast to the ensemble models, LR demonstrated the highest degree of discriminatory balance. As shown in the accumulated confusion matrix (Figure 5), LR correctly identified 74 minority class instances (True Negatives) while yielding only 56 false positives. This resulted in a Specificity of 0.57 ± 0.33 , the highest among all models, and a superior ROC-AUC of 0.68 ± 0.17 . XGBoost: XGBoost achieved a Recall of 0.80 ± 0.09 , identical to that of the RF model. However, it demonstrated slightly lower overall Accuracy ( 0.73 ± 0.08 ) and a higher Brier score ( 0.21 ± 0.06 ). This suggests that while its sensitivity is high, its probability estimates are less precisely calibrated than those of the Random Forest.

5.3. Comparative Analysis with Existing Literature

Although this study achieved an AUC of 0.68, lower than some of the more successful models documented in the literature, the findings remain significant, particularly within oncology. As illustrated in Table 6, some studies, such as [40], achieved a near-perfect AUC of 0.99, which often signals overfitting or the need for external validation on larger datasets. In contrast, this study’s performance of 0.68 is consistent with the values reported by the cancer-specific model of [16], indicating that predicting readmission in cancer patients is indeed complex and potentially non-linear, with values hovering around the 0.70 threshold. Importantly, this research addresses a gap identified in the literature by [37,38] regarding the lack of a clearly defined Explainable AI framework, achieved here by integrating SHAP, LIME, and PDP. Additionally, the close similarity between this study’s AUC (0.68) and that of the South African study by [42] (0.63) suggests that these findings provide a consistent, preliminary benchmark for oncology readmission modelling in a regional, resource-limited setting.

5.4. Explainability Results

5.4.1. SHAP Insights

SHAP values revealed that comorbidity burden and ECOG performance status were the most influential predictors of readmission risk (Figure 6). Higher comorbidity counts and poorer functional status consistently increased estimated readmission probability across all models. Age showed a non-linear effect, contributing significantly to individual risk in combination with other features.

5.4.2. Partial Dependence Plots (PDP)

PDPs (Figure 7) demonstrated clear non-linear relationships. Comorbidity burden and ECOG score showed steep increases in readmission probability as values rose. Higher neutrophil counts were associated with reduced risk, aligning with clinical expectations. Age displayed a non-monotonic curve, suggesting interactions with other clinical factors.

5.4.3. LIME Explanations

LIME provided patient-level explanations of predictions (Figure 8). For a representative high-risk patient, ECOG score and comorbidity burden were the primary drivers of increased readmission risk. A low ECOG score moderated this risk but was insufficient to offset the contribution of age and comorbidities. These insights demonstrate the capacity of XAI methods to support nuanced clinical assessment.
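The idea behind LIME’s local surrogate can be sketched without the `lime` package itself: perturb the record of interest, query the black-box model for probabilities, and fit a proximity-weighted linear model whose coefficients approximate each feature’s local contribution. Everything below (the synthetic data, kernel width, and ridge surrogate) is an illustrative assumption rather than the study’s configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)       # features 2-3 are irrelevant
model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

x0 = np.zeros(4)                                    # record to explain
Z = x0 + rng.normal(scale=0.5, size=(1000, 4))      # perturbations around x0
p = model.predict_proba(Z)[:, 1]                    # black-box risk estimates
w = np.exp(-np.sum((Z - x0) ** 2, axis=1))          # proximity kernel weights

# Interpretable local surrogate: coefficients ~ per-feature contributions at x0.
surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=w)
```

In this sketch the surrogate assigns its largest weight to the genuinely predictive feature, mirroring how a LIME explanation highlights, for a single patient, which inputs drove the risk estimate; the real LIME implementation adds feature discretisation and sampling refinements on top of this core idea.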

6. Discussion

This study provides a comprehensive evaluation of explainable machine learning models for predicting 30-day readmissions among breast cancer patients. The results highlight three core findings: (1) the necessity of correcting class imbalance to avoid misleading performance estimates, (2) the superior discrimination achieved by ensemble models using repeated stratified cross-validation, and (3) the clinically coherent feature importance patterns demonstrated across multiple XAI techniques.
A key insight of this study is the critical role of SMOTE applied within each cross-validation fold, which directly addressed the “accuracy paradox.” Before resampling, several models, including Logistic Regression and SVM, achieved superficially high accuracy yet exhibited 0% specificity, reflecting a complete failure to identify non-readmitted patients. This phenomenon is widely documented in clinical research on ML [48]. With SMOTE integrated into each training fold of the pipeline, Logistic Regression achieved the most balanced trade-off (Recall = 0.65, Specificity = 0.57; Table 5), demonstrating clinical utility rather than statistical artefact. These findings underscore that resampling is not optional but fundamental when dealing with oncology datasets characterised by low event prevalence.
Despite these improvements, the moderate AUC values obtained (Random Forest AUC = 0.60 ± 0.17 ; Logistic Regression AUC = 0.68) indicate that predictive performance remains constrained by dataset size and heterogeneity. These values are consistent with prior oncology readmission research [16,42], where AUCs commonly fall between 0.60 and 0.70. Therefore, the present work should be interpreted as a proof-of-concept demonstrating how explainable ML techniques can be applied meaningfully in small clinical datasets, rather than as a clinically deployable solution. While ensemble models offered stronger discriminative ability than linear baselines, their performance remains insufficient for real-world decision-making without external validation.
The convergent evidence from SHAP, PDP, and LIME further suggests the clinical relevance of the models. Comorbidity burden and ECOG performance status emerged as dominant predictors, confirming established relationships between functional impairment, comorbidity load, and post-treatment vulnerability. SHAP revealed strong non-linear effects, while PDPs demonstrated clinically plausible thresholds—for example, risk escalation at ECOG 2. LIME provided patient-specific interpretability, showing how high-risk predictions arise from interactions among signals rather than from isolated variables. The consistency of insights across three explainability modalities strengthens confidence in the safety and interpretability of the proposed framework, a key criterion for clinical AI integration.
Overall, this study contributes an early-stage, interpretable framework rather than a ready-to-use clinical tool. Its primary value lies in demonstrating how explainability techniques can be integrated into small-data oncology modelling to surface clinically meaningful signals, laying the groundwork for future research using larger datasets and more advanced resampling or cost-sensitive learning approaches.

6.1. Implications for Clinical Practice and Health Policy

The findings of this study carry several practical implications. First, the Random Forest + XAI framework can be integrated into oncology workflows to support early risk identification. The model’s interpretability allows clinicians to understand why specific patients, particularly those with high ECOG scores or a heavy comorbidity burden, are flagged as high risk. This supports targeted interventions such as enhanced follow-up, tailored discharge planning, and close monitoring of immunological markers such as neutrophil count.
Second, the improved specificity achieved through balanced modelling (57% vs. 0%) directly affects operational efficiency. High false-positive rates burden clinical teams and reduce trust in automated systems. By enabling more accurate triage, the resampled models help ensure that limited resources are allocated to patients most in need, aligning with health system priorities and reducing avoidable readmission costs.
Finally, the adoption of XAI strengthens the ethical and transparent deployment of AI. LIME and SHAP explanations provide actionable insights that clinicians can communicate to patients, enhancing shared decision-making. This aligns with global initiatives such as the WHO Global Breast Cancer Initiative and the Sustainable Development Goals, particularly SDG 3, which emphasises reducing morbidity through data-driven precision care.

6.2. Limitations and Future Directions

This study has several limitations. The most significant is the small size of the secondary dataset (N = 100), which limits the diversity of patient characteristics represented, and the results should be interpreted as proof-of-concept rather than definitive clinical evidence.
A key methodological limitation involves the application of SMOTE to a dataset of only 100 samples. Although SMOTE was implemented within a strict cross-validation framework to reduce bias and prevent data leakage, synthetic oversampling cannot introduce genuine clinical variability. As a result, there remains a risk that the model may learn overly specific patterns derived from the synthetic minority samples, leading to overfitting. This constraint limits the generalisability of the findings, as performance metrics obtained under controlled resampling conditions may not fully translate to external patient populations.
Furthermore, this study did not include external validation; therefore, generalisability to other populations or institutions cannot be assured.
Future work should include multi-institutional datasets with larger sample sizes to enhance robustness and improve clinical generalisability. Incorporating additional predictors such as treatment adherence, socioeconomic factors, tumour staging, and biomarker trends could improve model discrimination. Prospective validation in real-world oncology settings is essential, as is exploring hybrid models (e.g., Random Forest + Bayesian calibration) or interpretable deep learning methods. Finally, future research should evaluate the cost-effectiveness and workflow impact of embedding XAI-driven prediction tools in clinical practice.
This study used SMOTE as a standard baseline oversampling technique. Alternative resampling approaches such as ADASYN, Borderline-SMOTE, or hybrid ensembles were not evaluated due to the limited dataset size and the exploratory nature of the analysis. Future work will compare multiple resampling strategies to determine their relative effects on model stability and explainability.

7. Conclusions

This study suggests that machine learning models combined with XAI techniques can provide clinically meaningful predictions of 30-day readmissions among breast cancer patients. By applying SMOTE within repeated stratified cross-validation, the study overcame the accuracy paradox, with LR improving specificity from 0% to 57% and achieving balanced predictive performance (Accuracy = 64%, F1 = 0.76).
Explainability analyses consistently identified ECOG performance status and comorbidity burden as the most influential predictors, reflecting established oncological risk profiles. LIME explanations further highlighted the model’s ability to generate transparent, patient-level insights.
Overall, this work proposes an internally validated, interpretable machine learning framework that aligns predictive performance with clinical transparency. Such models hold significant potential to improve early risk identification, personalise post-discharge care, and reduce preventable hospitalisations in oncology.

Author Contributions

Conceptualisation, M.M. and E.M.; methodology, M.M., E.M. and T.M.; software, M.M.; validation, E.M. and T.M.; formal analysis, M.M.; investigation, M.M.; resources, E.M. and T.M.; data curation, M.M.; writing—original draft preparation, M.M.; writing—review and editing, E.M. and T.M.; visualisation, M.M.; supervision, E.M. and T.M.; project administration, E.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the University of Johannesburg’s College of Business and Economics (CBE) Research Ethics Committee (ethical clearance code: 2025SCiiS035 and date of approval: 12 June 2025).

Informed Consent Statement

Patient consent was waived due to the use of anonymised, secondary data from a public repository.

Data Availability Statement

The data presented in this study are available in a publicly accessible repository at https://github.com/SaimaPar/Breast-cancer-patients/blob/main/sample_breast_cancer_readmission_data.csv, accessed on 10 June 2025.

Acknowledgments

The authors would like to express their sincere gratitude to the University of Johannesburg for their support throughout this research. We also thank the contributors of the open-source libraries and public datasets used in this study, which were instrumental in the development and evaluation of the machine learning models.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AUC: Area under the curve
DT: Decision tree
ECOG: Eastern Cooperative Oncology Group
KNN: K-nearest neighbours
LIME: Local interpretable model-agnostic explanations
LR: Logistic regression
ML: Machine learning
NB: Naive Bayes
RF: Random forest
ROC: Receiver operating characteristic
SD: Standard deviation
SHAP: SHapley additive exPlanations
SVM: Support vector machine
WHO: World Health Organization
XGB: Extreme gradient boosting
USA: United States of America
CA: Canada

References

  1. World Health Organization. Breast Cancer. 2024. Available online: https://www.who.int/news-room/fact-sheets/detail/breast-cancer (accessed on 1 January 2025).
  2. Stabellini, N.; Nazha, A.; Agrawal, N.; Huhn, M.; Shanahan, J.; Hamerschlak, N.; Waite, K.; Barnholtz-Sloan, J.S.; Montero, A.J. Thirty-Day Unplanned Hospital Readmissions in Patients With Cancer and the Impact of Social Determinants of Health: A Machine Learning Approach. JCO Clin. Cancer Inform. 2023, 7, e2200143. [Google Scholar] [CrossRef]
  3. Daly, B.; Olopade, O.I. A perfect storm: How tumor biology, genomics, and health care delivery patterns collide to create a racial survival disparity in breast cancer and proposed interventions for change. CA Cancer J. Clin. 2015, 65, 221–238. [Google Scholar] [CrossRef]
  4. Rahman, M.A.; Khan, M.S.H.; Watanobe, Y.; Prioty, J.T.; Annita, T.T.; Rahman, S.; Hossain, M.S.; Aitijjo, S.A.; Taskin, R.I.; Dhrubo, V.; et al. Advancements in Breast Cancer Detection: A Review of Global Trends, Risk Factors, Imaging Modalities, Machine Learning, and Deep Learning Approaches. BioMedInformatics 2025, 5, 46. [Google Scholar] [CrossRef]
  5. Jencks, S.F.; Williams, M.V.; Coleman, E.A. Rehospitalizations among patients in the Medicare fee-for-service program. N. Engl. J. Med. 2009, 360, 1418–1428. [Google Scholar] [CrossRef]
  6. Centers for Medicare & Medicaid Services. Hospital Readmissions Reduction Program. 2025. Available online: https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/hospital-readmissions-reduction-program-hrrp (accessed on 12 January 2025).
  7. Pal Choudhury, P.; Wilcox, A.N.; Brook, M.N.; Zhang, Y.; Ahearn, T.; Orr, N.; Coulson, P.; Schoemaker, M.J.; Jones, M.E.; Gail, M.H.; et al. Comparative validation of breast cancer risk prediction models and projections for future risk stratification. JNCI J. Natl. Cancer Inst. 2020, 112, 278–285. [Google Scholar] [CrossRef]
  8. Riba, L.A.; Gruner, R.A.; Fleishman, A.; James, T.A. Surgical Risk Factors for the Delayed Initiation of Adjuvant Chemotherapy in Breast Cancer. Ann. Surg. Oncol. 2018, 25, 1904–1911. [Google Scholar] [CrossRef]
  9. Miret, C.; Domingo, L.; Louro, J.; Barata, T.; Baré, M.; Ferrer, J.; Carmona-García, M.C.; Castells, X.; Sala, M. Factors associated with readmissions in women participating in screening programs and treated for breast cancer: A retrospective cohort study. BMC Health Serv. Res. 2019, 19, 940. [Google Scholar] [CrossRef]
  10. Chen, T.; Madanian, S.; Airehrour, D.; Cherrington, M. Machine learning methods for hospital readmission prediction: Systematic analysis of literature. J. Reliab. Intell. Environ. 2022, 8, 49–66. [Google Scholar] [CrossRef]
  11. Dalwai, E.; Buccimazza, I. System delays in breast cancer. S. Afr. J. Surg. 2015, 53, 40. [Google Scholar] [CrossRef]
  12. Green, V.L. Breast cancer risk assessment and management of the high-risk patient. Obstet. Gynecol. Clin. 2022, 49, 87–116. [Google Scholar] [CrossRef] [PubMed]
  13. Brankovic, A.; Rolls, D.; Boyle, J.; Niven, P.; Khanna, S. Identifying patients at risk of unplanned re-hospitalisation using statewide electronic health records. Sci. Rep. 2022, 12, 16592. [Google Scholar] [CrossRef] [PubMed]
  14. Khavanin, N.; Bethke, K.P.; Lovecchio, F.C.; Jeruss, J.S.; Hansen, N.M.; Kim, J.Y. Risk factors for unplanned readmissions following excisional breast surgery. Breast J. 2014, 20, 288–294. [Google Scholar] [CrossRef] [PubMed]
  15. Mohanty, S.D.; Lekan, D.; McCoy, T.P.; Jenkins, M.; Manda, P. Machine learning for predicting readmission risk among the frail: Explainable AI for healthcare. Patterns 2022, 3, 100395. [Google Scholar] [CrossRef]
  16. Hwang, S.; Urbanowicz, R.; Lynch, S.; Vernon, T.; Bresz, K.; Giraldo, C.; Kennedy, E.; Leabhart, M.; Bleacher, T.; Ripchinski, M.R.; et al. Toward Predicting 30-Day Readmission Among Oncology Patients: Identifying Timely and Actionable Risk Factors. JCO Clin. Cancer Inform. 2023, 7, e2200097. [Google Scholar] [CrossRef]
  17. Kim, C.; Gadgil, S.U.; Lee, S.I. Transparency of medical artificial intelligence systems. Nat. Rev. Bioeng. 2026, 4, 11–29. [Google Scholar] [CrossRef]
  18. Rasool, A.; Bunterngchit, C.; Tiejian, L.; Islam, M.R.; Qu, Q.; Jiang, Q. Improved Machine Learning-Based Predictive Models for Breast Cancer Diagnosis. Int. J. Environ. Res. Public Health 2022, 19, 3211. [Google Scholar] [CrossRef]
  19. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
  20. Soliman, A.; Agvall, B.; Etminani, K.; Hamed, O.; Lingman, M. The Price of Explainability in Machine Learning Models for 100-Day Readmission Prediction in Heart Failure: Retrospective, Comparative, Machine Learning Study. JMIR Med. Inform. 2023, 25, e46934. [Google Scholar] [CrossRef]
  21. Raj, S. Developing AI Models for Predicting Hospital Readmission Rates. J. Publ. Int. Res. Eng. Manag. 2025, 5, 1–8. [Google Scholar]
  22. Alelyani, T.; Alshammari, M.M.; Almuhanna, A.; Asan, O. Explainable Artificial Intelligence in Quantifying Breast Cancer Factors: Saudi Arabia Context. Healthcare 2024, 12, 1025. [Google Scholar] [CrossRef] [PubMed]
  23. Bibi, H.; Khan, S.; Shabir, M. A Critique Of Research Paradigms And Their Implications For Qualitative, Quantitative And Mixed Research Methods. Webology 2022, 19, 7321–7335. [Google Scholar]
  24. Andersen, R.M. Revisiting the behavioral model and access to medical care: Does it matter? J. Health Soc. Behav. 1995, 36, 1–10. [Google Scholar] [CrossRef]
  25. Moore, L.; Lavoie, A.; Bourgeois, G.; Lapointe, J. Donabedian’s structure-process-outcome quality of care model: Validation in an integrated trauma system. J. Trauma Acute Care Surg. 2015, 78, 1168–1175. [Google Scholar] [CrossRef]
  26. Cosentino, V.; Luis, J.; Cabot, J. Findings from GitHub: Methods, datasets and limitations. In Proceedings of the 13th International Conference on Mining Software Repositories; Association for Computing Machinery: New York, NY, USA, 2016; pp. 137–141. [Google Scholar]
  27. He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284. [Google Scholar] [CrossRef]
  28. Zhang, H. The Optimality of Naive Bayes. In Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004); The AAAI Press: Menlo Park, CA, USA, 2004. [Google Scholar]
  29. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  30. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106. [Google Scholar] [CrossRef]
  31. Xu, Q.; Xie, W.; Liao, B.; Hu, C.; Qin, L.; Yang, Z.; Xiong, H.; Lyu, Y.; Zhou, Y.; Luo, A. Interpretability of Clinical Decision Support Systems Based on Artificial Intelligence from Technological and Medical Perspective: A Systematic Review. J. Healthc. Eng. 2023, 2023, 9919269. [Google Scholar] [CrossRef]
  32. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  33. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 1135–1144. [Google Scholar]
  34. Hosmer, J.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression; Wiley: Hoboken, NJ, USA, 2013. [Google Scholar]
  35. Gonzalez-Castro, L.; Chávez, M.; Duflot, P.; Bleret, V.; Martin, A.G.; Zobel, M.; Nateqi, J.; Lin, S.; Pazos-Arias, J.J.; Del Fiol, G.; et al. Machine Learning Algorithms to Predict Breast Cancer Recurrence Using Structured and Unstructured Sources from Electronic Health Records. Cancers 2023, 15, 2741. [Google Scholar] [CrossRef] [PubMed]
  36. Park, S.W.; Park, Y.L.; Lee, E.G.; Chae, H.; Park, P.; Choi, D.W.; Choi, Y.H.; Hwang, J.; Ahn, S.; Kim, K.; et al. Mortality Prediction Modeling for Patients with Breast Cancer Based on Explainable Machine Learning. Cancers 2024, 16, 3799. [Google Scholar] [CrossRef] [PubMed]
  37. Labilloy, G.; Jasra, B.; Widrich, J.; Edgar, L.; Smotherman, C.; Neumayer, L.; Celso, B.G. Machine learning determined risk factors associated with non-adherence to timely surgery for breast cancer patients. Ann. Breast Surg. 2024, 8, 3. [Google Scholar] [CrossRef]
  38. Lou, S.J.; Hou, M.F.; Chang, H.T.; Chiu, C.C.; Lee, H.H.; Yeh, S.C.J.; Shi, H.Y. Machine Learning Algorithms to Predict Recurrence within 10 Years after Breast Cancer Surgery: A Prospective Cohort Study. Cancers 2020, 12, 3817. [Google Scholar] [CrossRef]
  39. Du, K.L.; Jiang, B.; Lu, J.; Hua, J.; Swamy, M.N.S. Exploring Kernel Machines and Support Vector Machines: Principles, Techniques, and Future Directions. Mathematics 2024, 12, 3935. [Google Scholar] [CrossRef]
  40. Magboo, M.S.A.; Magboo, V.P.C. Feature Importance Measures as an Explanation for Classification Applied to Hospital Readmission Prediction. Procedia Comput. Sci. 2022, 207, 1388–1397. [Google Scholar] [CrossRef]
  41. Park, C.; Lee, H.; Jensen, B.C.; Schonberg, M.A. Hospital readmission after a breast cancer-related admission among breast cancer patients with and without heart failure. J. Clin. Oncol. 2022, 40, E18717. [Google Scholar] [CrossRef]
  42. Tokac, U.; Chipps, J.; Brysiewicz, P.; Bruce, J.; Clarke, D. Using Machine Learning to Improve Readmission Risk in Surgical Patients in South Africa. Int. J. Environ. Res. Public Health 2025, 22, 345. [Google Scholar] [CrossRef]
  43. Hossin, M.; Sulaiman, M.N. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process. 2015, 5, 1. [Google Scholar] [CrossRef]
  44. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437. [Google Scholar] [CrossRef]
  45. Brown, J. Classifiers and their metrics quantified. Mol. Inform. 2018, 37, 1700127. [Google Scholar] [CrossRef] [PubMed]
  46. Cullerne Bown, W. Sensitivity and Specificity versus Precision and Recall, and Related Dilemmas. J. Classif. 2024, 41, 402–426. [Google Scholar] [CrossRef]
  47. Pandey, S.R.; Tile, J.D.; Oghaz, M.M.D. Predicting 30-day hospital readmissions using ClinicalT5 with structured and unstructured electronic health records. PLoS ONE 2025, 20, e0328848. [Google Scholar] [CrossRef] [PubMed]
  48. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  49. Adadi, A.; Berrada, M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 2018, 6, 52138–52160. [Google Scholar] [CrossRef]
  50. Belle, V.; Papantonis, I. Principles and Practice of Explainable Machine Learning. Front. Big Data 2021, 4, 688969. [Google Scholar] [CrossRef] [PubMed]
  51. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017. [Google Scholar]
Figure 1. Correlation heatmap.
Figure 2. PCA visualisation.
Figure 3. t-SNE visualisation.
Figure 4. Overall workflow.
Figure 5. Confusion matrix for the SMOTE-balanced Logistic Regression model.
Figure 6. SHAP feature importance for the Random Forest model.
Figure 7. Partial dependence plot for the Random Forest model.
Figure 8. LIME explanation for Random Forest prediction.
Table 1. Dataset overview.

| Feature Name | Feature Description | Data Type |
|---|---|---|
| age | Patient’s age in years | Integer |
| comorbidities | Number of comorbid conditions | Integer |
| ecog_score | ECOG performance status score | Integer |
| dose_reduction_cycle1 | Whether dose reduction occurred in cycle 1 (0/1) | Integer |
| baseline_neutrophil | Baseline neutrophil count | Float |
| readmitted_30d | Readmission within 30 days (binary outcome) | Integer |
Table 2. Model configuration and conservative parameter settings used in evaluation.

| Model | Key Parameter(s) | Setting | Rationale |
|---|---|---|---|
| LR | max_iter, solver | 2000, liblinear | Stable convergence on small datasets; regularised linear baseline. |
| NB | None | Default | Parameter-free baseline suitable for small samples. |
| KNN | n_neighbors | Default | Included as an instance-based comparator; scaling applied within folds. |
| DT | max_depth, random_state | Restricted depth | Controls complexity to reduce overfitting in N = 100. |
| RF | n_estimators | 500 trees | Stabilises variance via bagging; robust non-linear baseline. |
| SVM | probability, kernel | True; default | Non-linear classifier included; scaling applied within folds. |
| XGB | n_estimators, learning_rate, max_depth | Conservative | Reduces overfitting risk; robust ensemble for structured data. |
Table 3. Summary of key demographic and clinical variables.

| Variable | Mean | SD | Min | Max |
|---|---|---|---|---|
| Age (years) | 61.6 | 13.3 | 40 | 84 |
| Comorbidities | 1.7 | 1.34 | 0 | 6 |
| ECOG score | 1.28 | 0.92 | 0 | 3 |
| Dose reduction in cycle 1 (1 = Yes) | 0.27 | 0.45 | 0 | 1 |
| Baseline neutrophil (× 10⁹/L) | 3.33 | 1.25 | 0.6 | 7.2 |
| Readmitted within 30 days (1 = Yes) | 0.87 | 0.34 | 0 | 1 |
Table 4. Model performance without SMOTE.

| Model | Accuracy | Precision | Recall | Specificity | F1-Score | AUC |
|---|---|---|---|---|---|---|
| LR | 0.87 | 0.87 | 1.00 | 0.00 | 0.93 | 0.68 |
| NB | 0.84 | 0.86 | 0.95 | 0.00 | 0.91 | 0.65 |
| SVM | 0.87 | 0.87 | 1.00 | 0.00 | 0.93 | 0.43 |
| KNN | 0.85 | 0.86 | 0.97 | 0.00 | 0.91 | 0.65 |
| DT | 0.79 | 0.88 | 0.87 | 0.23 | 0.87 | 0.52 |
| RF | 0.83 | 0.87 | 0.94 | 0.07 | 0.90 | 0.61 |
| XGBoost | 0.86 | 0.88 | 0.96 | 0.15 | 0.92 | 0.65 |
Table 5. Performance comparison of predictive models (Mean ± SD).

| Model | Accuracy | Precision | Recall | Spec. | F1-Score | ROC-AUC | PR-AUC | Brier |
|---|---|---|---|---|---|---|---|---|
| LR | 0.64 ± 0.10 | 0.91 ± 0.06 | 0.65 ± 0.11 | 0.57 ± 0.33 | 0.76 ± 0.08 | 0.68 ± 0.17 | 0.94 ± 0.04 | 0.22 ± 0.05 |
| NB | 0.66 ± 0.08 | 0.87 ± 0.04 | 0.72 ± 0.09 | 0.23 ± 0.25 | 0.78 ± 0.06 | 0.61 ± 0.15 | 0.93 ± 0.04 | 0.24 ± 0.06 |
| SVM | 0.69 ± 0.10 | 0.87 ± 0.04 | 0.76 ± 0.13 | 0.26 ± 0.27 | 0.81 ± 0.08 | 0.52 ± 0.19 | 0.89 ± 0.06 | 0.21 ± 0.05 |
| KNN | 0.66 ± 0.10 | 0.89 ± 0.06 | 0.70 ± 0.11 | 0.42 ± 0.31 | 0.78 ± 0.08 | 0.57 ± 0.18 | 0.90 ± 0.05 | 0.26 ± 0.06 |
| DT | 0.71 ± 0.08 | 0.87 ± 0.05 | 0.79 ± 0.09 | 0.22 ± 0.26 | 0.82 ± 0.05 | 0.50 ± 0.14 | 0.87 ± 0.04 | 0.29 ± 0.08 |
| RF | 0.74 ± 0.09 | 0.88 ± 0.05 | 0.80 ± 0.10 | 0.28 ± 0.27 | 0.84 ± 0.06 | 0.60 ± 0.17 | 0.91 ± 0.05 | 0.19 ± 0.05 |
| XGBoost | 0.73 ± 0.08 | 0.88 ± 0.04 | 0.80 ± 0.09 | 0.26 ± 0.25 | 0.83 ± 0.06 | 0.60 ± 0.16 | 0.92 ± 0.04 | 0.21 ± 0.06 |
Table 6. Comparative analysis of model performance in relevant studies.

| Study Reference | Model | AUC | Context | Comparison to Current Study |
| --- | --- | --- | --- | --- |
| [40] | RF | 0.99 | USA (large dataset) | Near-perfect metrics suggest potential overfitting or lack of external validation. |
| [36] | XGBoost | 0.87 | South Korea (oncology) | Superior AUC; however, RF (0.72) is closer to the current study's range. |
| [16] | RF/NB | 0.71 | USA (oncology) | Most comparable to the current study; shows oncology AUCs often hover around 0.70. |
| [37] | AdaBoost | 0.82 | USA (30–120 days) | Higher AUC but lacks SHAP/LIME for individual local interpretation. |
| [42] | RF | 0.63 | South Africa | Highly similar performance; suggests 0.60–0.70 is a regional benchmark for oncology. |
| This Study (2025) | RF | 0.68 | Breast cancer (N = 100) | Focuses on preliminary XAI feasibility. |
Citation:
Mqadi, M.; Mbunge, E.; Makaba, T. Predicting 30-Day Readmission Risks in Breast Cancer Patients: An Explainable Machine Learning Approach. Appl. Sci. 2026, 16, 2467. https://doi.org/10.3390/app16052467