Article

A TabNet-Based Multidimensional Deep Learning Model for Predicting Doxorubicin-Induced Cardiotoxicity in Breast Cancer Patients

1 Department of Cardiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin 150001, China
2 Department of Medical Oncology, Cancer Hospital of Dalian University of Technology, Cancer Hospital of China Medical University, Liaoning Cancer Hospital & Institute, No. 44 Xiaoheyan Road, Dadong District, Shenyang 110042, China
* Authors to whom correspondence should be addressed.
Cancers 2026, 18(1), 117; https://doi.org/10.3390/cancers18010117 (registering DOI)
Submission received: 4 November 2025 / Revised: 26 December 2025 / Accepted: 29 December 2025 / Published: 30 December 2025
(This article belongs to the Section Methods and Technologies Development)

Simple Summary

Doxorubicin is a widely used chemotherapy drug for breast cancer, but it can cause heart damage in some patients, which may limit treatment and affect long-term outcomes. Identifying patients at high risk of cardiotoxicity before or during treatment remains challenging. In this study, we developed an interpretable deep learning model based on the TabNet architecture using routinely collected clinical, laboratory, electrocardiographic, and echocardiographic data. The model accurately predicted doxorubicin-induced cardiotoxicity and identified key risk factors related to cardiac function, electrical activity, and metabolic status. This approach may help clinicians recognize high-risk patients earlier and support personalized monitoring and preventive strategies during chemotherapy.

Abstract

Objective: To develop and validate an interpretable deep learning model based on the TabNet architecture for predicting doxorubicin-induced cardiotoxicity (DIC) in patients with breast cancer through integration of multidimensional clinical data. Methods: This retrospective study included 2034 patients who received doxorubicin-based chemotherapy at The Fourth Affiliated Hospital of Harbin Medical University between January 2021 and December 2023. Clinical, biochemical, electrocardiographic, and echocardiographic parameters were incorporated into six predictive algorithms: logistic regression, decision tree, random forest, gradient boosting machine, XGBoost, and TabNet. Model discrimination, calibration, and clinical utility were assessed using AUC, C-index, calibration plots, and decision curve analysis. Model interpretability was evaluated through attention-based feature importance and SHAP analysis. Results: TabNet achieved the best overall predictive performance, with an AUC of 0.86 and a C-index of 0.80 in the validation cohort, demonstrating superior discrimination, calibration, and generalization compared with all baseline models. Decision curve analysis confirmed its higher net clinical benefit across threshold probabilities. The model identified eight dominant predictors—cumulative anthracycline dose, LVEF, QTc interval, lactate dehydrogenase, creatinine, glucose, hypertension, and platelet count—that collectively reflected myocardial contractility, electrophysiological stability, and systemic metabolic stress. Correlation and clustering analyses revealed that high-risk patients exhibited concurrent QTc prolongation, metabolic disturbance, and LVEF decline, defining a distinct cardiometabolic injury phenotype. These findings highlight TabNet’s ability to uncover complex feature interactions while maintaining transparent and clinically interpretable outputs. 
Conclusions: The TabNet-based multidimensional model provides an accurate, stable, and interpretable tool for individualized prediction of doxorubicin-induced cardiotoxicity, supporting early intervention and precision management in breast cancer patients receiving anthracycline therapy.

1. Introduction

Breast cancer remains the most prevalent malignancy and the leading cause of cancer-related death among women worldwide, accounting for approximately 2.3 million new cases and 670,000 deaths annually, according to GLOBOCAN 2022 estimates [1,2]. Advances in early detection, surgical techniques, and systemic therapies such as chemotherapy, endocrine therapy, and targeted therapy have markedly improved survival outcomes over the past two decades [3,4]. Among these, anthracycline-based regimens, particularly doxorubicin, remain a cornerstone of adjuvant and neoadjuvant chemotherapy for breast cancer, owing to their potent cytotoxic efficacy [5,6,7]. However, the clinical benefit of doxorubicin is counterbalanced by a well-recognized risk of dose-dependent cardiotoxicity, which can lead to irreversible left ventricular dysfunction and heart failure [8,9,10].
Doxorubicin-induced cardiotoxicity (DIC) represents a major dose-limiting complication that significantly impacts long-term prognosis and quality of life in cancer survivors [10]. Despite substantial progress in cardio-oncology, early identification of patients at high risk for DIC remains challenging due to the multifactorial nature of the injury, involving cumulative anthracycline exposure, pre-existing cardiovascular comorbidities, and individual metabolic and genetic susceptibilities [11,12]. Conventional risk stratification tools, which rely on linear regression or single-parameter thresholds such as left ventricular ejection fraction (LVEF) decline, are limited in their ability to capture complex, nonlinear interactions among diverse clinical and biochemical variables [13,14]. Consequently, a more accurate, individualized, and interpretable prediction framework, built on data-driven methods capable of capturing the multidimensional relationships underlying DIC, is urgently needed to guide preventive and monitoring strategies.
In recent years, machine learning (ML)-based models have shown substantial promise in cardiovascular risk prediction by leveraging high-dimensional clinical data to uncover nonlinear patterns beyond traditional statistical methods [15]. Algorithms such as random forest, gradient boosting machines (GBMs), and extreme gradient boosting (XGBoost) have achieved notable performance improvements in various oncologic and cardiotoxicity-related applications [16,17,18]. Nevertheless, most conventional ML models remain limited by their “black box” nature and lack of interpretability, hindering clinical trust and real-world adoption [19].
Building on these advances, deep learning enables end-to-end feature learning from complex and high-dimensional data, achieving significant progress in predictive modeling [20]. Traditional fully connected and convolutional architectures often lose interpretability and tend to overfit small or moderate clinical datasets [21]. TabNet, developed in 2020 by Google AI, uses an attention-based neural architecture for tabular data and achieves a strong balance between accuracy, interpretability, and computational efficiency [22]. It applies sequential attention and sparse feature selection to focus on the most informative variables during decision making, providing both high predictive power and a clear explanation of feature contributions at global and individual levels [23]. TabNet also demonstrates a strong capability to integrate data from multiple dimensions, which enhances its predictive accuracy [24]. Therefore, constructing a multidimensional feature model that includes clinical, biochemical, electrocardiographic, and echocardiographic data can improve predictive precision by reflecting the complex pathophysiological mechanisms of cardiotoxicity. Based on this rationale, this study developed and validated a TabNet-based framework that integrates multidimensional patient data for individualized prediction of doxorubicin-induced cardiotoxicity, providing both high accuracy and biological interpretability.

2. Materials and Methods

2.1. Patients

This retrospective study was conducted at The Fourth Affiliated Hospital of Harbin Medical University and included patients who received doxorubicin-based chemotherapy between January 2021 and December 2023 (Figure 1). A total of 2034 patients with histologically confirmed breast cancer were enrolled after screening, according to predefined inclusion and exclusion criteria. The inclusion criteria were as follows: (1) histologically confirmed diagnosis of breast cancer; (2) completion of at least one full course of doxorubicin-containing chemotherapy; (3) availability of comprehensive clinical, biochemical, electrocardiographic, and echocardiographic data; and (4) documented cardiac follow-up before and after chemotherapy. The exclusion criteria included (1) pre-existing left ventricular dysfunction or symptomatic heart failure prior to chemotherapy; (2) severe hepatic or renal impairment; (3) incomplete or missing clinical, imaging, or follow-up data; and (4) concurrent administration of non-anthracycline cardiotoxic drugs. Based on established diagnostic definitions, all eligible patients were classified into DIC and non-DIC groups for comparative analysis. The study was conducted in accordance with the principles of the Declaration of Helsinki and approved by the Ethics Committee of The Fourth Affiliated Hospital of Harbin Medical University (Approval No. 2024-DWSYLLCZ-15).

2.2. Diagnostic Criteria for DIC

To date, there is no universally accepted gold standard for defining DIC. In this study, DIC was identified according to the Chinese Expert Consensus on the Diagnosis and Treatment of Breast Cancer-Related Cardiovascular Diseases (2022 Edition) and the 2022 ESC Cardio-Oncology Guidelines, which provide the most widely recognized framework for clinical and research applications. DIC was defined as new-onset or progressive cardiac dysfunction occurring after doxorubicin exposure that could not be fully explained by other cardiac conditions. Patients were diagnosed with DIC if any of the criteria listed in Table 1 were met.

2.3. Data Collection and Preprocessing

Comprehensive multidimensional data were collected from the institutional electronic medical record system of The Fourth Affiliated Hospital of Harbin Medical University. The dataset encompassed four major categories of variables: (1) clinical characteristics, including demographic information, comorbidities, and treatment-related factors; (2) biochemical indicators reflecting hepatic, renal, and metabolic function; (3) electrocardiographic variables describing cardiac electrical activity; and (4) echocardiographic measurements assessing cardiac structure and systolic function. Only patients with complete clinical, laboratory, electrocardiographic, and echocardiographic data were included in the final analysis, according to the predefined inclusion criteria. Therefore, no variable exclusion based on missingness thresholds and no imputation using mean, median, or mode values were performed during the modeling process. Standard preprocessing procedures required for model implementation were applied uniformly across the dataset. Continuous variables were normalized using z-score transformation, and categorical variables were converted into dummy indicators through one-hot encoding.
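The two preprocessing steps above can be illustrated with a minimal pure-Python sketch. The helpers `zscore` and `one_hot` are hypothetical names, and the study does not state which software routines performed these transformations. The optional `mean` and `sd` arguments let statistics estimated on one cohort be reapplied to another, which is one common way to keep validation data from influencing the scaling.

```python
import statistics


def zscore(values, mean=None, sd=None):
    """Standardize values to zero mean and unit variance (hypothetical helper).

    If `mean` and `sd` are supplied (e.g., estimated on a training cohort),
    they are reused; otherwise they are estimated from `values` itself.
    """
    if mean is None:
        mean = statistics.fmean(values)
    if sd is None:
        sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / sd for v in values]


def one_hot(labels, levels=None):
    """Convert category labels into 0/1 dummy indicator rows (hypothetical helper).

    Returns the indicator rows and the ordered list of levels (columns).
    """
    if levels is None:
        levels = sorted(set(labels))
    rows = [[1 if label == level else 0 for level in levels] for label in labels]
    return rows, levels


# Hypothetical example values, not patient data
ages = [50.0, 60.0, 70.0]
print(zscore(ages))  # approximately [-1.2247, 0.0, 1.2247]
stages, stage_levels = one_hot(["I", "III", "II", "I"])
print(stage_levels, stages)
```

Established libraries offer equivalent transformers, for example scikit-learn's `StandardScaler` and `OneHotEncoder`.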
After preprocessing, the final dataset was randomly divided into a training cohort (n = 1627) and an independent validation cohort (n = 407) in a 4:1 ratio using stratified random sampling to maintain the proportional distribution of DIC and non-DIC cases across cohorts.
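A stratified 4:1 split can be sketched by shuffling each outcome class separately and assigning one fifth of every class to the validation cohort. The function `stratified_split` and its fixed seed are hypothetical illustrations, not the authors' code; the class totals in the example are taken from Section 3.1 (305 DIC and 1729 non-DIC patients).

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split sample indices into (train, test), preserving class proportions.

    Hypothetical helper: each outcome class is shuffled independently and
    `test_fraction` of it (rounded to the nearest integer) goes to the test set.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = list(by_class[label])  # copy before shuffling
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class totals from Section 3.1: 305 DIC (1) and 1729 non-DIC (0) patients
labels = [1] * 305 + [0] * 1729
train, test = stratified_split(labels)
print(len(train), len(test))  # 1627 407
```

Under these assumptions, per-class rounding places 61 DIC and 346 non-DIC patients in the validation cohort, which sums to 407; with other class counts, rounding can shift the total by one.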

2.4. TabNet Architecture

TabNet is a deep learning framework specifically developed for tabular data that integrates sequential attention and sparse feature selection to learn informative feature representations while preserving interpretability. Unlike conventional neural networks that process all input features simultaneously, TabNet employs a feature selection mask at each decision step i, defined as
$$M_i = \mathrm{Sparsemax}(P_i \cdot W_i)$$
where $P_i$ represents the prior scale parameter derived from previous attention steps and $W_i$ denotes the learnable weight matrix. The Sparsemax activation constrains the mask values between 0 and 1 and enforces sparsity, allowing the model to focus selectively on the most informative variables at each step. Each decision step transforms the selected features into a latent representation through a nonlinear decision block
$$d_i = f_i(M_i \odot x)$$
where $x$ is the input feature vector and $\odot$ denotes element-wise multiplication. The final prediction output is obtained by aggregating the representations across all decision steps
$$y = \sum_{i=1}^{N} \mathrm{DecisionStep}(d_i)$$
This sequential attention mechanism enables TabNet to dynamically allocate feature importance for each individual sample, achieving interpretable and patient-specific decision pathways. The incorporation of sparse feature selection further minimizes redundancy and mitigates overfitting, thereby enhancing generalization on moderate-sized clinical datasets. Through this design, TabNet not only achieves end-to-end feature selection and interpretability, but also aligns well with the heterogeneity and nonlinear interactions inherent in clinical multidimensional data.
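The mask, masking, and aggregation steps of the TabNet formulation can be traced numerically with a small pure-Python sketch. Here `sparsemax` follows the sorting-based algorithm of Martins and Astudillo (2016); the prior update `prior * (gamma - mask)` follows the original TabNet formulation, with γ as the relaxation factor; and each decision block f_i is replaced by a hypothetical linear function followed by ReLU. The `logits` argument stands in for the learned transformation W_i, and all numeric values are illustrative assumptions rather than parameters from this study.

```python
def sparsemax(z):
    """Project vector z onto the probability simplex (Martins & Astudillo, 2016).

    The output is non-negative, sums to 1, and can contain exact zeros,
    which is what makes the resulting feature masks sparse.
    """
    z_sorted = sorted(z, reverse=True)
    cumsum, support_size, support_sum = 0.0, 0, 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumsum += z_k
        if 1 + k * z_k > cumsum:  # z_k is still inside the support
            support_size, support_sum = k, cumsum
    tau = (support_sum - 1) / support_size  # threshold subtracted from every entry
    return [max(v - tau, 0.0) for v in z]


def attentive_step(x, logits, prior, gamma):
    """One simplified step: M = sparsemax(P * logits), masked input M * x,
    and the TabNet prior update P <- P * (gamma - M)."""
    mask = sparsemax([p * l for p, l in zip(prior, logits)])
    masked_x = [m * xi for m, xi in zip(mask, x)]
    new_prior = [p * (gamma - m) for p, m in zip(prior, mask)]
    return mask, masked_x, new_prior


def toy_forward(x, step_logits, step_weights, gamma=1.5):
    """Aggregate N decision steps: y = sum_i ReLU(f_i(M_i * x)).

    f_i is a hypothetical linear block (dot product with `weights`);
    in TabNet it is a learned feature transformer.
    """
    prior = [1.0] * len(x)  # P_1 = 1: every feature is initially available
    y, masks = 0.0, []
    for logits, weights in zip(step_logits, step_weights):
        mask, masked_x, prior = attentive_step(x, logits, prior, gamma)
        d_i = sum(w * v for w, v in zip(weights, masked_x))
        y += max(d_i, 0.0)  # ReLU before aggregation
        masks.append(mask)
    return y, masks


print(sparsemax([1.0, 0.5, -1.0]))  # [0.75, 0.25, 0.0]
```

Because sparsemax can return exact zeros, low-attention features are dropped from a step entirely, and values of γ close to 1 discourage reusing the same feature in later steps.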

2.5. Model Development

To construct a comprehensive predictive framework for DIC, six supervised learning models were developed and systematically compared: logistic regression (LR), decision tree (DT), random forest (RF), GBM, XGBoost, and TabNet. These models were chosen to represent a spectrum of machine learning paradigms, ranging from conventional linear classifiers to ensemble-based and deep learning approaches, thereby enabling a robust comparative evaluation of predictive performance and interpretability. All models were trained using the same dataset, split into a training cohort (80%) and an independent validation cohort (20%) to ensure methodological consistency. Prior to model fitting, all continuous variables were standardized to zero mean and unit variance, and categorical variables were one-hot-encoded. Missing values were imputed using the k-nearest neighbors (KNN) approach. To address potential class imbalance between DIC and non-DIC groups, the synthetic minority oversampling technique (SMOTE) was applied within the training set.
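The oversampling step can be sketched with a basic SMOTE implementation in pure Python. The sketch assumes the features have already been standardized and uses Euclidean distance between minority-class (DIC) samples; the function and all example data are hypothetical, and the study does not report its SMOTE settings, such as the number of neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic synthetic minority oversampling (hypothetical helper).

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical standardized minority-class (DIC) feature vectors
dic_cases = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_cases = smote(dic_cases, n_synthetic=6, k=2)
print(len(new_cases))  # 6
```

Consistent with the text, such oversampling is applied only within the training set, so synthetic samples never enter the validation cohort. With one-hot-encoded columns, interpolation produces fractional dummy values; the imbalanced-learn library's `SMOTENC` variant is designed for such mixed numeric and categorical features.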
For baseline models (LR, DT, RF, GBM, and XGBoost), hyperparameters were optimized by grid search with five-fold cross-validation, minimizing the mean squared error for regression-based evaluation and maximizing the area under the receiver operating characteristic curve (AUC) for classification performance. For the TabNet model, optimization was performed through Bayesian hyperparameter tuning, focusing on the learning rate, number of decision steps, relaxation factor (γ), and sparsity regularization coefficient (λ_sparsity). Model training employed the Adam optimizer with an initial learning rate of 0.02, a batch size of 512, and early stopping after 30 epochs without improvement in validation loss to prevent overfitting. Model outputs are expressed as predicted probabilities of DIC occurrence. Performance metrics including mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R²), AUC, concordance index (C-index), and stability index (σ) were calculated to assess the predictive accuracy, discrimination, calibration, and robustness of the models. This multidimensional evaluation ensured that both global performance and generalization ability were comprehensively characterized.
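Because the models output probabilities, one plausible reading is that MAE, RMSE, and R² were computed between predicted probabilities and the 0/1 outcome, and that the AUC is the probability that a DIC case is ranked above a non-DIC case. The pure-Python sketch below assumes that interpretation, which the text implies but does not state explicitly; the stability index (σ) is not implemented because its definition is not given, and the function names and data are hypothetical.

```python
import math


def regression_metrics(y_true, y_prob):
    """MAE, RMSE and R^2 between predicted probabilities and 0/1 outcomes."""
    n = len(y_true)
    errors = [p - y for y, p in zip(y_true, y_prob)]
    sse = sum(e * e for e in errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sse / n)
    mean_y = sum(y_true) / n
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    r2 = 1 - sse / ss_tot  # undefined if all outcomes are identical
    return mae, rmse, r2


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted probabilities
y = [0, 0, 1, 1]
p = [0.10, 0.40, 0.35, 0.80]
print(regression_metrics(y, p))
print(roc_auc(y, p))  # 0.75
```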

2.6. Statistical Analysis

Statistical analyses were performed using the R software (version 4.3.2) and Python (version 3.10). The statistical analysis framework was designed to describe baseline characteristics, compare group differences, develop and evaluate predictive models, and assess model performance, stability, interpretability, and generalizability. Continuous variables were assessed for normality using the Shapiro–Wilk test and are presented as mean ± standard deviation or median (interquartile range), as appropriate, while categorical variables are expressed as frequencies and percentages. Differences between groups were analyzed using the independent Student’s t-test or Mann–Whitney U test for continuous variables and the chi-square or Fisher’s exact test for categorical variables. Multiple predictive models were subsequently developed, and their performance was assessed using complementary metrics reflecting different aspects of predictive performance. Hyperparameter optimization for all models was performed within the training cohort to avoid information leakage. Model discrimination was evaluated using receiver operating characteristic curves, the area under the curve, and the concordance index, while regression performance was quantified using the mean absolute error, root mean square error, and coefficient of determination. Model calibration and clinical utility were assessed using calibration plots, the Hosmer–Lemeshow goodness-of-fit test, and decision curve analysis. Precision–recall curves were obtained to evaluate model robustness under class imbalance. To assess model robustness and reduce the risk of overfitting, model stability and generalizability were systematically examined throughout the modeling process using five-fold cross-validation and 1000 bootstrap resampling. Comparative performance between models was assessed using DeLong’s test for paired AUC comparisons. 
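The 1000-replicate bootstrap can be illustrated as a percentile confidence interval for the AUC, resampling patients with replacement. This is a minimal sketch assuming the bootstrap was used for interval estimation; `bootstrap_auc_ci` and the example data are hypothetical, and DeLong's test is not reimplemented here. In R, the pROC package provides `ci.auc()` and `roc.test(..., method = "delong")` for these tasks, although the paper does not name the package it used.

```python
import random


def roc_auc(y_true, y_score):
    """Pairwise (Mann-Whitney) AUC; ties count one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # AUC is undefined if a resample contains one class only
            stats.append(roc_auc(yb, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return roc_auc(y_true, y_score), lower, upper


# Hypothetical labels and scores (40 patients)
y = [0, 0, 0, 0, 1, 1, 1, 1] * 5
s = [0.1, 0.3, 0.5, 0.7, 0.4, 0.6, 0.8, 0.9] * 5
print(bootstrap_auc_ci(y, s, n_boot=200))
```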
In addition, subgroup and stratified analyses were conducted according to age, cumulative anthracycline dose, radiotherapy status, and sex to evaluate the consistency of model performance across clinically relevant patient subgroups. All statistical tests were two-sided, and a p value < 0.05 was considered statistically significant.

3. Results

3.1. Patient Characteristics

A total of 2034 patients receiving doxorubicin-based chemotherapy were included in the present study, of whom 305 (15.0%) developed DIC. The mean age of the entire cohort was 54.8 ± 10.5 years, and the mean BMI was 24.9 ± 3.4 kg/m². Compared with patients without DIC, those who developed DIC were significantly older (59.1 ± 9.7 vs. 53.9 ± 10.4 years, p < 0.001) and had a higher prevalence of hypertension (59.0% vs. 38.3%, p < 0.001), diabetes (30.8% vs. 18.7%, p < 0.001), and coronary artery disease (14.8% vs. 8.6%, p = 0.004). The distribution of TNM stages differed significantly between the two groups (p = 0.030), with stage III and IV disease more common among patients with DIC. Patients who developed DIC also received higher cumulative doses of doxorubicin (311 ± 49 mg/m² vs. 272 ± 53 mg/m², p < 0.001) and were more likely to undergo chest radiotherapy (29.6% vs. 21.4%, p = 0.007) or HER2-targeted therapy (18.0% vs. 11.4%, p = 0.009). No significant difference was found in BMI between the two groups (p = 0.068). Collectively, patients with DIC were characterized by older age, more cardiovascular comorbidities, and higher cumulative anthracycline exposure compared with those without DIC (Table 2).

3.2. Model Parameters

A total of 2034 patients were randomly divided into a training cohort (n = 1627) and a validation cohort (n = 407). Baseline demographic, clinical, echocardiographic, and laboratory parameters were comparable between the two cohorts, and no significant differences were observed in any variable (all p > 0.05). The mean age was 55.0 ± 10.3 years in the training set and 54.6 ± 10.8 years in the validation set, while the mean BMIs were 24.9 ± 3.4 and 25.0 ± 3.5 kg/m², respectively. The prevalence of hypertension, diabetes, and coronary artery disease was similar between cohorts. The mean cumulative dose of doxorubicin was 279 ± 54 mg/m² in the training set and 277 ± 56 mg/m² in the validation set. Cardiac functional indices, including left ventricular ejection fraction, ventricular dimensions, and electrocardiographic parameters, as well as routine biochemical and hematological indicators, showed no significant intergroup differences (all p > 0.05). These findings indicate that the training and validation sets were well balanced, providing a consistent foundation for subsequent model construction and validation (Table 3).

3.3. Model Development and Performance Evaluation

Six machine learning models—logistic regression, decision tree, random forest, GBM, XGBoost, and TabNet—were developed to predict doxorubicin-induced cardiotoxicity. Their comparative performance is illustrated in Figure 2A–D. In the 3D absolute prediction error analysis (Figure 2A), TabNet exhibited the lowest overall prediction deviation, indicating better model calibration than other algorithms.
The regression performance metrics of all models are summarized in Table 4. TabNet achieved the lowest mean absolute error (MAE = 0.175) and root mean square error (RMSE = 0.231), together with the highest coefficient of determination (R² = 0.83) and stability index (σ = 0.047). These results demonstrate that TabNet provided superior fitting accuracy and robustness compared with traditional ensemble and linear models.
Receiver operating characteristic (ROC) curve analysis (Figure 2B) further confirmed that TabNet had the strongest classification performance, yielding the largest area under the curve (AUC = 0.86, 95% CI 0.82–0.90), followed by XGBoost (AUC = 0.79) and GBM (AUC = 0.75). The corresponding sensitivity, specificity, and Youden index values for each model are listed in Table 5. TabNet also achieved the best balance between true-positive and true-negative predictions (sensitivity 0.85, specificity 0.78, Youden index 0.63).
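Youden's index combines sensitivity and specificity into a single cut-off criterion, J = sensitivity + specificity - 1; with the reported TabNet values, J = 0.85 + 0.78 - 1 = 0.63, consistent with the value given above. The sketch below computes J and scans candidate probability cut-offs for the largest value; the function names and the rule "predict DIC if p >= t" are assumptions for illustration, since the study does not state how its operating threshold was chosen.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def youden_optimal_threshold(y_true, y_prob):
    """Scan observed probabilities as cut-offs (predict DIC if p >= t) and
    return (threshold, J) for the cut-off with the largest J (hypothetical helper)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, float("-inf")
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        j = youden_index(tp / n_pos, tn / n_neg)
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Sensitivity and specificity reported for TabNet in the text
print(round(youden_index(0.85, 0.78), 2))  # 0.63
```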
The concordance index (C-index) results presented in Figure 2C,D showed consistent trends. TabNet achieved the highest discrimination in both the training cohort (C-index = 0.831) and the validation cohort (C-index = 0.796), indicating good generalization with little evidence of overfitting. Taken together, TabNet consistently outperformed all other algorithms across regression, discrimination, and calibration metrics, demonstrating the best accuracy, stability, and clinical applicability among the evaluated models for cardiotoxicity prediction.
To provide an integrated evaluation of predictive performance, a radar chart (Supplementary Figure S1) was constructed to visualize the five key performance dimensions of each model, including AUC, C-index for training and validation cohorts, stability, and generalization ability. As illustrated, TabNet consistently occupied the outermost area across all dimensions, indicating its superior and well-balanced performance profile. Specifically, TabNet demonstrated the highest discrimination ability in both the training and validation cohorts, along with the greatest model stability and generalization capacity. In contrast, traditional models such as logistic regression and decision tree exhibited smaller enclosed areas, reflecting relatively limited discrimination and robustness. These results further confirm that the TabNet framework achieved the optimal trade-off between accuracy, stability, and generalization among all evaluated algorithms.

3.4. Model Interpretation

To enhance the interpretability of the TabNet-based prediction model, both global and local feature attribution analyses were conducted. Global feature importance analysis (Figure 3A) revealed that cumulative anthracycline dose, LVEF, and QTc interval were the top three contributors to model output, followed by LDH, creatinine, glucose, hypertension, and platelet count. These features together represent a comprehensive profile encompassing cardiac function, hemodynamic status, and systemic metabolism. The feature mask heatmap (Figure 3B) demonstrated heterogeneous feature weights across individual samples, suggesting that TabNet dynamically adjusted feature attention to achieve personalized risk prediction.
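Attention-based global importance can be sketched by summing each patient's feature-mask weights and normalizing the totals to sum to 1. This is a simplified assumption: the original TabNet paper additionally weights each step's mask by that step's contribution to the decision, and the pytorch-tabnet library, for example, exposes a `feature_importances_` attribute and an `explain()` method for this purpose. The function, feature names, and mask values below are hypothetical.

```python
def global_importance(sample_masks, feature_names):
    """Aggregate per-sample feature masks into normalized global importances.

    Hypothetical helper: `sample_masks[b][j]` is the mask weight of feature j
    for sample b, already summed over decision steps. Totals are rescaled to
    sum to 1 and returned in descending order.
    """
    totals = [0.0] * len(feature_names)
    for mask in sample_masks:
        for j, weight in enumerate(mask):
            totals[j] += weight
    grand_total = sum(totals)
    ranked = [(name, t / grand_total) for name, t in zip(feature_names, totals)]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical masks for two patients over three features
names = ["cumulative_dose", "LVEF", "QTc"]
masks = [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]]
for name, share in global_importance(masks, names):
    print(f"{name}: {share:.2f}")
```

The per-sample rows of such a mask matrix are what a heatmap like Figure 3B visualizes, while the normalized column totals give a global ranking.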
To further validate model interpretability, SHAP analysis was performed to quantify the contribution and directionality of each variable (Figure 4). Consistent with the attention-based feature ranking, higher cumulative dose, prolonged QTc, and elevated LDH were positively associated with increased cardiotoxicity risk, whereas higher LVEF exerted a strong protective effect. These findings confirmed the biological plausibility of the TabNet-derived decision process.
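SHAP values are Shapley values of a prediction: they attribute the difference between a patient's predicted risk and a reference prediction to individual features. For a handful of features they can be computed exactly by enumerating feature coalitions, as in the pure-Python sketch below, which replaces absent features with a baseline value (one of several possible definitions). The scoring function `toy_risk` and its coefficients are hypothetical and unrelated to the fitted TabNet model; the `shap` library approximates these quantities for realistic models, and the paper does not state which explainer was used.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their `baseline` value. The
    cost is 2^n model evaluations per feature, so this is only practical for a
    handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk score: two main effects plus one interaction term."""
    dose, qtc, ldh = z
    return 0.5 * dose + 0.2 * qtc + 0.3 * dose * ldh


phi = shapley_values(toy_risk, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(v, 3) for v in phi])  # [0.65, 0.2, 0.15]
```

The attributions satisfy additivity: they sum to the difference between the prediction for the patient and the prediction at the baseline, and the interaction term is split equally between the two features involved.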
Feature correlation and interaction analyses (Figure 5A,B) provided additional insight into inter-variable relationships. Strong positive correlations were observed between cumulative dose, LDH, and LVEF reduction, whereas QTc prolongation appeared to be moderately linked to metabolic indicators (glucose and creatinine). Pairwise interaction maps derived from TabNet masks revealed that cumulative dose and LVEF formed the most synergistic pair influencing risk prediction, highlighting the central role of anthracycline exposure and cardiac reserve in doxorubicin-induced injury.
Finally, hierarchical clustering of high-risk patients (Figure 6) revealed distinct co-activation patterns among cardiometabolic variables. High-risk individuals exhibited a characteristic cluster defined by prolonged QTc, elevated LDH and creatinine levels, and reduced LVEF, indicating concurrent myocardial stress, metabolic disturbance, and contractile dysfunction. Cumulative anthracycline dose and hypertension were positioned within the same cluster, reinforcing their additive contribution to cardiac vulnerability. Collectively, these clustered patterns delineate a cardiotoxic phenotype characterized by metabolic stress, electrophysiological instability, and impaired cardiac function, aligning well with known mechanisms of anthracycline-induced cardiotoxicity.

3.5. Model Performance and Clinical Utility Evaluation

To comprehensively evaluate the predictive reliability and clinical applicability of the TabNet model, post hoc performance analyses were conducted using the independent validation cohort. Decision curve analysis (Supplementary Figure S2A) demonstrated that the TabNet model yielded a consistently higher net clinical benefit across a wide range of threshold probabilities compared with the “treat-all” and “treat-none” strategies, indicating favorable clinical usefulness in risk-based decision making. Calibration analysis (Supplementary Figure S2B) revealed close agreement between predicted and observed probabilities, with the calibration curve lying close to the ideal 45-degree line, suggesting well-calibrated probability estimation without systematic deviation. The precision–recall curve (Supplementary Figure S2C) yielded an average precision (AP) of 0.54, confirming robust sensitivity and precision despite moderate class imbalance. The predicted probability distribution (Supplementary Figure S2D) further illustrated clear separation between DIC and non-DIC groups, underscoring the model’s strong discriminative capacity. Collectively, these results indicate that the TabNet model maintained stable predictive performance and high clinical interpretability in the independent validation cohort, supporting its potential application for individualized cardiotoxicity risk prediction in anthracycline-treated patients.
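Decision curve analysis compares the net benefit of acting on model predictions with the treat-all and treat-none strategies across threshold probabilities p_t. A minimal sketch of the standard net-benefit formula (Vickers and Elkin, 2006), NB(p_t) = TP/n - FP/n × p_t/(1 - p_t), follows; the function names and example data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening in patients with predicted risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy in which every patient is treated; the treat-none
    strategy has a net benefit of 0 at every threshold."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes and predicted probabilities
y = [0, 0, 1, 1]
p = [0.10, 0.40, 0.35, 0.80]
for t in (0.1, 0.2, 0.3):
    print(t, round(net_benefit(y, p, t), 3), round(treat_all_net_benefit(y, t), 3))
```

A model is clinically useful at a given threshold when its curve lies above both the treat-all curve and the zero line of treat-none.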

3.6. Subgroup and Stratified Validation Analysis

To further verify the robustness and clinical generalizability of the TabNet model, stratified and subgroup validation analyses were performed in both the training and validation cohorts. As shown in Supplementary Figure S3A,B, patients were stratified into high-risk and low-risk groups based on the median predicted probability derived from the TabNet model. In both the training and validation cohorts, the high-risk group exhibited a markedly higher cumulative predicted risk of DIC compared with the low-risk group (training cohort p < 0.001; validation cohort p = 0.022). The clear separation of the two curves highlights the model’s strong discriminative power in distinguishing patients with differing susceptibility to anthracycline-related cardiac injury.
Subgroup analysis across key clinical characteristics (Figure 7) further confirmed the stability of TabNet’s predictive performance. Consistent AUC values were observed across sex, age, cumulative anthracycline dose, and radiotherapy subgroups, with all AUCs remaining within the range of 0.75–0.83 in both the training and validation sets. Importantly, no single subgroup or individual variable, including age, disproportionately influenced the model’s discriminative performance. These findings indicate that the model maintained reliable discrimination independent of demographic or treatment-related heterogeneity. Collectively, the stratified and subgroup validation results demonstrate that the TabNet model achieved robust and stable predictive capability across different patient subpopulations, reinforcing its clinical applicability for individualized cardiotoxicity risk assessment in real-world settings.

4. Discussion

DIC remains one of the most critical barriers to optimizing anthracycline therapy in breast cancer, directly affecting treatment continuity, long-term survival, and quality of life [11]. Despite extensive efforts to identify high-risk patients, early prediction of DIC remains a major unmet need in clinical oncology [25,26]. Conventional risk assessment tools rely primarily on post-treatment monitoring of LVEF or single-parameter thresholds, which fail to detect subclinical injury and underestimate the multifactorial nature of cardiotoxicity [13]. Therefore, developing an accurate and interpretable model capable of integrating heterogeneous patient information is crucial for guiding preventive monitoring and individualized treatment adjustment. In this context, advanced machine learning and deep learning techniques offer new opportunities to improve predictive precision and enable truly personalized risk management.
At present, numerous research groups have focused on elucidating the mechanisms underlying doxorubicin-induced cardiotoxicity, with studies spanning mitochondrial dysfunction, ferroptosis, gut microbiota regulation, metabolic reprogramming, and traditional medicine-based interventions, aiming to attenuate or reverse myocardial injury from multiple perspectives [27,28,29]. Although these studies have yielded important mechanistic insights and proposed several potential therapeutic targets, most remain at the basic or early translational stage, and their applicability in routine clinical practice remains limited. In 2025, Li and colleagues compared single-dose and cumulative-dose doxorubicin administration strategies in murine models to induce acute and chronic cardiotoxicity and evaluated associated survival outcomes. Their findings demonstrated that cumulative exposure rather than single-dose intensity is the primary determinant of cardiac injury severity, and they proposed more standardized and reproducible dosing recommendations for experimental modeling [30]. Beyond mechanistic investigations, the development of predictive models has emerged as another important research direction. In 2025, Singh et al. conducted a systematic review of metabolomics studies and reported consistent metabolic disturbances associated with doxorubicin-induced cardiotoxicity across in vitro, in vivo, and clinical settings, involving amino acid metabolism, energy metabolism, and purine/pyrimidine pathways. These findings highlight the potential value of metabolomics in identifying early biomarkers and enabling risk prediction for DIC [31]. With advances in computational methodologies, machine learning approaches have increasingly been applied in this field. 
In 2024, Huang and colleagues combined bioinformatics analyses, machine learning algorithms, and weighted gene co-expression network analysis to identify and validate key genes and immune cell infiltration patterns associated with DIC, and subsequently constructed a model with favorable diagnostic and predictive performance, underscoring the potential contribution of multidimensional molecular features to early DIC identification [32]. In 2025, another study developed a machine learning model based on routinely available clinical and hematological parameters from breast cancer patients, combined with synthetic data augmentation, to predict doxorubicin-related cardiotoxicity [33]. By contrast, deep learning approaches in DIC research have so far been applied predominantly to target discovery and drug screening. In 2023, Liu et al. established a deep learning-assisted, high-content phenotypic screening platform based on zebrafish cardiac function, identifying candidate compounds with cardioprotective effects and revealing a potential mechanism involving modulation of the Keap1–Nrf2 pathway to mitigate doxorubicin-induced myocardial injury. This work demonstrated the promise of artificial intelligence-based methods for drug discovery and functional evaluation in DIC [34]. In the same year, Chen and colleagues developed a deep learning-based high-content screening system to quantify doxorubicin-induced DNA double-strand breaks and screen candidate compounds with cardioprotective properties, further supporting the feasibility of artificial intelligence techniques for early identification and intervention in doxorubicin-related cardiotoxicity [21].
In the present study, we developed and validated a multidimensional, interpretable deep learning framework based on the TabNet architecture to predict DIC in patients with breast cancer. By integrating clinical, biochemical, electrocardiographic, and echocardiographic data, the model outperformed five conventional algorithms (logistic regression, decision tree, random forest, GBM, and XGBoost). TabNet achieved the best overall performance across discrimination, calibration, generalization, and stability metrics, with an AUC of 0.86 and a C-index of 0.80 in the validation cohort. Its probability estimates were well calibrated, and discrimination remained robust across subgroups defined by age, cumulative anthracycline dose, and radiotherapy exposure. Moreover, TabNet provided interpretability through attention-based feature attribution, allowing the identification of both globally important predictors and individual-level risk contributors. Together, these results suggest that the TabNet-based model is a promising tool for early risk identification in anthracycline-treated patients, pending external validation.
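The discrimination and calibration measures quoted above can be made concrete with a small sketch. It assumes a binary DIC label (1 = event) and one predicted probability per patient; the section does not state how the C-index was computed, and for an uncensored binary endpoint the pairwise concordance statistic shown here coincides with the ROC AUC. The Brier score and equal-width calibration bins are common calibration summaries used here as assumptions, not as the authors' procedure, and all function names are hypothetical.

```python
from itertools import product


def concordance_binary(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event case receives the
    higher predicted risk; tied predictions count as half-concordant.
    For an uncensored binary outcome this equals the ROC AUC."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_bins(y_true, y_prob, n_bins=10):
    """Group predictions into equal-width probability bins and return
    (mean predicted risk, observed event rate, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes into the last bin
        bins[idx].append((y, p))
    table = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            table.append((mean_pred, observed, len(members)))
    return table


if __name__ == "__main__":
    y = [1, 0, 1, 0, 1]            # toy labels, not study data
    p = [0.9, 0.3, 0.6, 0.7, 0.8]  # toy predicted risks
    print(concordance_binary(y, p))  # 5 of 6 pairs are concordant
    print(brier_score(y, p))
    print(calibration_bins(y, p, n_bins=2))
```

A well-calibrated model yields bins whose observed event rate tracks the mean predicted risk; discrimination (concordance) and calibration can diverge, which is why both are reported.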
In this study, eight features emerged as the most influential predictors of DIC in the TabNet model: cumulative anthracycline dose, LVEF, QTc interval, LDH, creatinine, glucose, hypertension, and platelet count. Together, these indicators form an integrated profile of cardiac structure, function, and systemic metabolic status [36,37,38]. Cumulative anthracycline dose was the strongest predictor, consistent with the well-established dose-dependent mechanism of doxorubicin cardiotoxicity [39]. Anthracyclines generate reactive oxygen species (ROS) through redox cycling and iron-mediated Fenton reactions, leading to mitochondrial damage, lipid peroxidation, and myofibrillar disarray [40]. Higher cumulative exposure exacerbates oxidative stress and disrupts sarcomeric integrity, resulting in a progressive decline in contractile function. LVEF, a measure of global systolic performance, reflects myocardial contractility and reserve capacity [41]. In our model, lower baseline LVEF was strongly associated with higher DIC risk, in line with prior evidence that patients with reduced myocardial strain or borderline ejection fraction are more vulnerable to anthracycline-induced dysfunction [42]. The QTc interval, another key feature, serves as a surrogate marker of electrical instability and heterogeneity of ventricular repolarization [43]. Anthracycline exposure has been shown to prolong QTc by modulating ion channel kinetics and disrupting mitochondrial energy homeostasis in cardiomyocytes. Prolonged QTc not only reflects electrophysiological disturbance but also predisposes patients to malignant arrhythmias, amplifying the clinical impact of subclinical myocardial injury [44]. Metabolic indicators, particularly LDH, creatinine, and glucose, further enhanced the model’s predictive performance by capturing systemic metabolic stress and tissue injury.
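The section does not specify which heart-rate correction produced the QTc values. As a hedged illustration, the two most widely used corrections are Bazett (QT divided by the square root of the RR interval in seconds) and Fridericia (QT divided by the cube root of RR). The choice matters because Bazett tends to over-correct at faster heart rates, so the same ECG can be classified differently. The helper names below are hypothetical.

```python
import math


def rr_interval_s(heart_rate_bpm):
    """RR interval in seconds from heart rate in beats per minute."""
    if heart_rate_bpm <= 0:
        raise ValueError("heart rate must be positive")
    return 60.0 / heart_rate_bpm


def qtc_bazett(qt_ms, heart_rate_bpm):
    """Bazett: QTc = QT / RR**(1/2), with QT in ms and RR in seconds."""
    return qt_ms / math.sqrt(rr_interval_s(heart_rate_bpm))


def qtc_fridericia(qt_ms, heart_rate_bpm):
    """Fridericia: QTc = QT / RR**(1/3), with QT in ms and RR in seconds."""
    return qt_ms / rr_interval_s(heart_rate_bpm) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical ECG: QT 400 ms at 75 bpm (RR = 0.8 s)
    print(round(qtc_bazett(400, 75), 1))      # 447.2
    print(round(qtc_fridericia(400, 75), 1))  # 430.9
```

At 60 bpm (RR = 1 s) both formulas return the measured QT unchanged; the gap between them widens as heart rate rises.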
Elevated LDH reflects increased anaerobic metabolism and cellular turnover, which are often observed in myocardial ischemia or under oxidative stress [45]. Elevated creatinine indicates impaired renal clearance, which has been associated with reduced cardiac output and altered drug pharmacokinetics [46]; both may increase anthracycline accumulation and toxicity. Hyperglycemia and insulin resistance contribute to endothelial dysfunction [47], mitochondrial impairment, and inflammation [48], which may act synergistically with anthracycline-induced oxidative damage. These metabolic factors may therefore act as amplifiers of cardiac vulnerability.
Hypertension and platelet count, although traditionally regarded as secondary factors, emerged as significant contributors within the TabNet framework. Chronic hypertension accelerates myocardial remodeling and increases left ventricular wall stress, thereby lowering the threshold for anthracycline-induced damage [49]. Platelet count reflects vascular and inflammatory homeostasis, and mild thrombocytosis or platelet activation can indicate subclinical inflammation and endothelial dysfunction [50], both of which may potentiate cardiotoxicity through microvascular injury. The clustering of cumulative dose, hypertension, and metabolic markers observed in the feature interaction analysis suggests an interconnected network of hemodynamic, metabolic, and oxidative stress pathways underlying DIC development.
Collectively, these eight predictors form a biologically coherent and clinically interpretable model that spans myocardial, vascular, and metabolic domains. This multidimensional pattern suggests that DIC is not a single-organ phenomenon but a manifestation of systemic cardiometabolic stress, reinforcing the need for multi-parameter risk assessment.
From a clinical perspective, the proposed TabNet framework holds considerable potential for enhancing precision cardio-oncology. By identifying patients with high predicted DIC risk, clinicians could implement proactive management strategies, such as limiting cumulative anthracycline exposure, substituting with less cardiotoxic analogs, or initiating cardioprotective therapy (e.g., beta-blockers, ACE inhibitors, or dexrazoxane). The model’s interpretability also facilitates clinical adoption, as it enables visualization of individual-level feature contributions and transparent risk explanation, bridging the gap between complex deep learning algorithms and routine clinical decision making. Moreover, its integration of multimodal data—combining biochemical, electrocardiographic, and echocardiographic variables—aligns with the emerging paradigm of holistic patient profiling, which aims to move beyond single-parameter monitoring toward systems-based risk assessment. The application of such interpretable artificial intelligence in real-world practice could ultimately enable personalized chemotherapy planning and improve long-term cardiovascular outcomes in breast cancer survivors.
Despite these strengths, several limitations should be acknowledged. Although the sample size was relatively large, the analysis used a single-center retrospective design, which may limit the generalizability of the findings. Doxorubicin-induced cardiotoxicity was defined according to established national and ESC guidelines, but variability in echocardiographic measurements (such as LVEF and GLS) and differences in biomarker interpretation may have introduced outcome misclassification. Follow-up information was derived from routine clinical records, and incomplete post-chemotherapy follow-up cannot be fully excluded, which may increase the risk of information bias. External validation in multicenter, prospective cohorts is therefore warranted to confirm the robustness and clinical applicability of the proposed model. Because this study represents an early stage of model development, the framework is not intended for direct bedside application and requires prospective validation and further translational development; at present, it serves primarily as a foundation for risk stratification and hypothesis generation rather than immediate clinical deployment. In addition, the model was developed using routinely collected clinical and imaging variables without incorporating molecular data (such as genomic or proteomic profiles) or longitudinal data, which could further enhance biological interpretability and predictive precision. Finally, although TabNet offers more built-in interpretability than conventional neural networks, its decision pathways still require prospective validation and integration into clinical workflows to evaluate feasibility and safety in real-world practice.
Looking ahead, future research should focus on developing dynamic modeling strategies that incorporate temporal variations in biomarkers or imaging indices to better characterize the progressive trajectory of anthracycline-induced cardiac injury.

5. Conclusions

This study established an interpretable, multidimensional deep learning model based on the TabNet architecture for individualized prediction of doxorubicin-induced cardiotoxicity in breast cancer. By integrating heterogeneous patient data and leveraging attention-based feature learning, the model achieved robust predictive accuracy and biological transparency. This framework demonstrates the feasibility of interpretable deep learning for complex clinical prediction and provides a practical foundation for precision risk assessment and proactive cardioprotection in oncology practice.
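The attention-based feature learning mentioned above refers to TabNet's attentive transformer, which selects features at each decision step through a sparsemax mask. The NumPy sketch below follows the formulation in the original TabNet paper by Arik and Pfister, in which the mask at step i is sparsemax applied to the prior-scaled output of a learned layer, and the prior is updated as P[i] = P[i−1] · (γ − M[i]). The per-step logits stand in for that learned layer and are hypothetical, and γ = 1.3 is an illustrative relaxation value rather than a setting reported in this study.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of a score vector onto the probability simplex.
    Unlike softmax, it can assign exactly zero weight to low-scoring entries."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    in_support = 1.0 + k * z_sorted > cumsum  # True for the top-k_z entries
    k_z = k[in_support][-1]
    tau = (cumsum[k_z - 1] - 1.0) / k_z       # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)


def attentive_masks(step_logits, gamma=1.3):
    """Produce one feature mask per decision step. The prior scale shrinks the
    weight of features already used, so later steps tend to look elsewhere;
    gamma > 1 relaxes this and allows a feature to be reused."""
    prior = np.ones(len(step_logits[0]))
    masks = []
    for logits in step_logits:
        mask = sparsemax(prior * np.asarray(logits, dtype=float))
        masks.append(mask)
        prior = prior * (gamma - mask)
    return masks


if __name__ == "__main__":
    # Hypothetical identical scores at two steps for three features
    for m in attentive_masks([[2.0, 1.5, 0.0], [2.0, 1.5, 0.0]]):
        print(np.round(m, 4), m.sum())
```

With identical scores at both steps, the first mask favors feature 0 while the prior update shifts the second mask toward feature 1, which is how TabNet spreads attention across steps.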
In clinical practice, such a predictive framework may facilitate early risk stratification for anthracycline-induced cardiotoxicity and support individualized surveillance and cardioprotective strategies during cancer treatment. More broadly, these findings highlight the growing importance of early and interpretable risk prediction models in cardio-oncology, reflecting how the integration of multimodal clinical data with emerging deep learning approaches can advance individualized risk stratification. By enabling timely identification of patients at heightened risk of anthracycline-induced cardiotoxicity before the onset of overt cardiac dysfunction, this approach supports earlier risk awareness and more informed clinical decision making.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers18010117/s1, Figure S1: Overall Performance Comparison of Predictive Models; Figure S2: Comprehensive performance and clinical utility evaluation of the TabNet model; Figure S3: Cumulative predicted risk curves in training and validation cohorts.

Author Contributions

Writing—original draft and writing—review and editing: J.C., X.H. and L.D.; data curation and investigation: X.H. and L.D.; methodology and supervision: W.J. and W.Y.; resources, funding acquisition, and project administration: W.J. and W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

Heilongjiang Provincial Key Research and Development Program (2022ZX06C25). Postgraduate Scientific Research and Practice Innovation Project of Harbin Medical University 2024 (YJSCX2024-71HYD). Natural Science Foundation of Liaoning Province (2025-MS-337).

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and was approved by the ethics committee of the Fourth Affiliated Hospital of Harbin Medical University (2024-DWSYLLCZ-15, 7 June 2024).

Informed Consent Statement

All patients signed an “Informed Consent Form for the Secondary Use of Medical History Data/Biological Specimens”. Informed consent was obtained from all individual participants included in the study.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

All the authors certify that they have no affiliations with or involvement in any organization or entity with any financial or nonfinancial interest in the subject matter or materials discussed in this manuscript.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Full term |
|---|---|
| ACE | Angiotensin-converting enzyme |
| Adam | Adaptive moment estimation |
| A/G | Albumin-to-globulin ratio |
| ALB | Albumin |
| ALP | Alkaline phosphatase |
| ALT | Alanine aminotransferase |
| AP | Average precision |
| AST | Aspartate aminotransferase |
| AUC | Area under the receiver operating characteristic curve |
| BASO | Basophils |
| BMI | Body mass index |
| BP | Blood pressure |
| BUN | Blood urea nitrogen |
| CI | Confidence interval |
| CREA | Creatinine |
| C-index | Concordance index |
| DBIL | Direct bilirubin |
| DCA | Decision curve analysis |
| DIC | Doxorubicin-induced cardiotoxicity |
| DT | Decision tree |
| EOS | Eosinophils |
| ESC | European Society of Cardiology |
| GBM | Gradient boosting machines |
| GGT | Gamma-glutamyl transferase |
| GLB | Globulin |
| GLS | Global longitudinal strain |
| GLU | Glucose |
| Hb | Hemoglobin |
| Hct | Hematocrit |
| hs-cTnI/T | High-sensitivity cardiac troponin I/T |
| IDBIL | Indirect bilirubin |
| IQR | Interquartile range |
| KNN | k-nearest neighbor |
| LA | Left atrial |
| LDH | Lactate dehydrogenase |
| LR | Logistic regression |
| LVEDD | Left ventricular end-diastolic diameter |
| LVEF | Left ventricular ejection fraction |
| LYM | Lymphocytes |
| MAE | Mean absolute error |
| ML | Machine learning |
| MON | Monocytes |
| NEU | Neutrophils |
| NT-proBNP | N-terminal pro–B-type natriuretic peptide |
| PALB | Prealbumin |
| PLT | Platelet count |
| PT | Prothrombin time |
| QTc | Corrected QT interval |
| QRS | QRS complex duration |
| RBC | Red blood cell count |
| RF | Random forest |
| RMSE | Root mean square error |
| ROC | Receiver operating characteristic |
| ROS | Reactive oxygen species |
| SHAP | SHapley Additive exPlanations |
| SMOTE | Synthetic minority oversampling technique |
| TabNet | Tabular Neural Network |
| TBIL | Total bilirubin |
| TNM | Tumor–Node–Metastasis staging system |
| TP | Total protein |
| UA | Uric acid |
| WBC | White blood cell count |
| XGBoost | Extreme gradient boosting |

References

  1. Bray, F.; Laversanne, M.; Sung, H.; Ferlay, J.; Siegel, R.L.; Soerjomataram, I.; Jemal, A. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2024, 74, 229–263.
  2. Kim, J.; Harper, A.; McCormack, V.; Sung, H.; Houssami, N.; Morgan, E.; Mutebi, M.; Garvey, G.; Soerjomataram, I.; Fidler-Benaoudia, M.M. Global patterns and trends in breast cancer incidence and mortality across 185 countries. Nat. Med. 2025, 31, 1154–1162.
  3. Heater, N.K.; Warrior, S.; Lu, J. Current and future immunotherapy for breast cancer. J. Hematol. Oncol. 2024, 17, 131.
  4. Xiong, X.; Zheng, L.W.; Ding, Y.; Chen, Y.F.; Cai, Y.W.; Wang, L.P.; Huang, L.; Liu, C.C.; Shao, Z.M.; Yu, K.D. Breast cancer: Pathogenesis and treatments. Signal Transduct. Target. Ther. 2025, 10, 49.
  5. Kciuk, M.; Gielecińska, A.; Mujwar, S.; Kołat, D.; Kałuzińska-Kołat, Ż.; Celik, I.; Kontek, R. Doxorubicin-An Agent with Multiple Mechanisms of Anticancer Activity. Cells 2023, 12, 659.
  6. Mattioli, R.; Ilari, A.; Colotti, B.; Mosca, L.; Fazi, F.; Colotti, G. Doxorubicin and other anthracyclines in cancers: Activity, chemoresistance and its overcoming. Mol. Aspects Med. 2023, 93, 101205.
  7. Bisht, A.; Avinash, D.; Sahu, K.K.; Patel, P.; Gupta, G.D.; Kurmi, B.D. A comprehensive review on doxorubicin: Mechanisms, toxicity, clinical trials, combination therapies and nanoformulations in breast cancer. Drug Deliv. Transl. Res. 2025, 15, 102–133.
  8. Sattler, S.; Ljubojevic-Holzer, S. CD8+ T cells as the missing link between doxorubicin cancer therapy and heart failure risk. Nat. Cardiovasc. Res. 2024, 3, 890–892.
  9. Luo, W.; Zou, X.; Wang, Y.; Dong, Z.; Weng, X.; Pei, Z.; Song, S.; Zhao, Y.; Wei, Z.; Gao, R.; et al. Critical Role of the cGAS-STING Pathway in Doxorubicin-Induced Cardiotoxicity. Circ. Res. 2023, 132, e223–e242.
  10. Kong, C.Y.; Guo, Z.; Song, P.; Zhang, X.; Yuan, Y.P.; Teng, T.; Yan, L.; Tang, Q.Z. Underlying the Mechanisms of Doxorubicin-Induced Acute Cardiotoxicity: Oxidative Stress and Cell Death. Int. J. Biol. Sci. 2022, 18, 760–770.
  11. Camilli, M.; Cipolla, C.M.; Dent, S.; Minotti, G.; Cardinale, D.M. Anthracycline Cardiotoxicity in Adult Cancer Patients: JACC: CardioOncology State-of-the-Art Review. JACC CardioOncol. 2024, 6, 655–677.
  12. Bloom, M.W.; Vo, J.B.; Rodgers, J.E.; Ferrari, A.M.; Nohria, A.; Deswal, A.; Cheng, R.K.; Kittleson, M.M.; Upshaw, J.N.; Palaskas, N.; et al. Cardio-Oncology and Heart Failure: A Scientific Statement From the Heart Failure Society of America. J. Card. Fail. 2025, 31, 415–455.
  13. Vancheri, F.; Longo, G.; Henein, M.Y. Left ventricular ejection fraction: Clinical, pathophysiological, and technical limitations. Front. Cardiovasc. Med. 2024, 11, 1340708.
  14. Travers, S.; Alexandre, J.; Baldassarre, L.A.; Salem, J.E.; Mirabel, M. Diagnosis of cancer therapy-related cardiovascular toxicities: A multimodality integrative approach and future developments. Arch. Cardiovasc. Dis. 2025, 118, 185–198.
  15. Wang, Z.; Gu, Y.; Huang, L.; Liu, S.; Chen, Q.; Yang, Y.; Hong, G.; Ning, W. Construction of machine learning diagnostic models for cardiovascular pan-disease based on blood routine and biochemical detection data. Cardiovasc. Diabetol. 2024, 23, 351.
  16. Dincer, S.; Akmansu, M.; Akyol, O. Machine learning modeling of cancer treatment-related cardiac events in breast cancer: Utilizing dosiomics and radiomics. Front. Oncol. 2025, 15, 1557382.
  17. An, D.; Ibrahim, E.S. Elucidating Early Radiation-Induced Cardiotoxicity Markers in Preclinical Genetic Models Through Advanced Machine Learning and Cardiac MRI. J. Imaging 2024, 10, 308.
  18. Li, C.; Chen, L.; Chou, C.; Ngorsuraches, S.; Qian, J. Using Machine Learning Approaches to Predict Short-Term Risk of Cardiotoxicity Among Patients with Colorectal Cancer After Starting Fluoropyrimidine-Based Chemotherapy. Cardiovasc. Toxicol. 2022, 22, 130–140.
  19. Rudin, C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nat. Mach. Intell. 2019, 1, 206–215.
  20. Ravera, F.; Gilardi, N.; Ballestrero, A.; Zoppoli, G. Applications, challenges and future directions of artificial intelligence in cardio-oncology. Eur. J. Clin. Investig. 2025, 55, e14370.
  21. Chen, X.; Liu, C.; Zhao, H.; Zhong, Y.; Xu, Y.; Wang, Y. Deep learning-assisted high-content screening identifies isoliquiritigenin as an inhibitor of DNA double-strand breaks for preventing doxorubicin-induced cardiotoxicity. Biol. Direct 2023, 18, 63.
  22. Kita, K.; Fujimori, T.; Suzuki, Y.; Kanie, Y.; Takenaka, S.; Kaito, T.; Taki, T.; Ukon, Y.; Furuya, M.; Saiwai, H.; et al. Bimodal artificial intelligence using TabNet for differentiating spinal cord tumors-Integration of patient background information and images. iScience 2023, 26, 107900.
  23. Qi, H.; Hu, Y.; Fan, R.; Deng, L. Tab-Cox: An Interpretable Deep Survival Analysis Model for Patients With Nasopharyngeal Carcinoma Based on TabNet. IEEE J. Biomed. Health Inform. 2024, 28, 4937–4950.
  24. Chowdhury, M.N.H.; Reaz, M.B.I.; Ali, S.H.M.; Crespo, M.L.; Ahmad, S.; Salim, G.M.; Haque, F.; Ordóñez, L.G.G.; Islam, M.J.; Mahdee, T.M.; et al. Deep learning for early detection of chronic kidney disease stages in diabetes patients: A TabNet approach. Artif. Intell. Med. 2025, 166, 103153.
  25. Ito-Hagiwara, K.; Hagiwara, J.; Endo, Y.; Becker, L.B.; Hayashida, K. Cardioprotective strategies against doxorubicin-induced cardiotoxicity: A review from standard therapies to emerging mitochondrial transplantation. Biomed. Pharmacother. 2025, 189, 118315.
  26. Chen, Y.; Shi, S.; Dai, Y. Research progress of therapeutic drugs for doxorubicin-induced cardiomyopathy. Biomed. Pharmacother. 2022, 156, 113903.
  27. Wu, L.; Wang, L.; Du, Y.; Zhang, Y.; Ren, J. Mitochondrial quality control mechanisms as therapeutic targets in doxorubicin-induced cardiotoxicity. Trends Pharmacol. Sci. 2023, 44, 34–49.
  28. Rawat, P.S.; Jaiswal, A.; Khurana, A.; Bhatti, J.S.; Navik, U. Doxorubicin-induced cardiotoxicity: An update on the molecular mechanism and novel therapeutic strategies for effective management. Biomed. Pharmacother. 2021, 139, 111708.
  29. Shi, Y.; Cai, J.; Chen, L.; Cheng, H.; Song, X.; Xue, J.; Xu, R.; Ma, J.; Ge, J. AIG1 protects against doxorubicin-induced cardiomyocyte ferroptosis and cardiotoxicity by promoting ubiquitination-mediated p53 degradation. Theranostics 2025, 15, 4931–4954.
  30. Li, M.; Zhang, Y.; Wu, B.; Qiu, R.; Zhao, C.; Chen, B.; Shang, H. Optimizing dose selection for doxorubicin-induced cardiotoxicity in mice: A comprehensive analysis of single and multiple-dose regimens. Eur. J. Pharmacol. 2025, 1003, 177883.
  31. Singh, A.; Bakhtyar, M.; Jun, S.R.; Boerma, M.; Lan, R.S.; Su, L.J.; Makhoul, S.; Hsu, P.C. A narrative review of metabolomics approaches in identifying biomarkers of doxorubicin-induced cardiotoxicity. Metabolomics 2025, 21, 68.
  32. Huang, C.; Pei, J.; Li, D.; Liu, T.; Li, Z.; Zhang, G.; Chen, R.; Xu, X.; Li, B.; Lian, Z.; et al. Analysis and Validation of Critical Signatures and Immune Cell Infiltration Characteristics in Doxorubicin-Induced Cardiotoxicity by Integrating Bioinformatics and Machine Learning. J. Inflamm. Res. 2024, 17, 669–685.
  33. Araújo, D.C.; Simões, R.; Sabino, A.P.; Oliveira, A.N.; Oliveira, C.M.; Veloso, A.A.; Gomes, K.B. Predicting doxorubicin-induced cardiotoxicity in breast cancer: Leveraging machine learning with synthetic data. Med. Biol. Eng. Comput. 2025, 63, 1535–1550.
  34. Liu, C.; Wang, Y.; Zeng, Y.; Kang, Z.; Zhao, H.; Qi, K.; Wu, H.; Zhao, L.; Wang, Y. Use of Deep-Learning Assisted Assessment of Cardiac Parameters in Zebrafish to Discover Cyanidin Chloride as a Novel Keap1 Inhibitor Against Doxorubicin-Induced Cardiotoxicity. Adv. Sci. 2023, 10, e2301136.
  35. Grafton, F.; Ho, J.; Ranjbarvaziri, S.; Farshidfar, F.; Budan, A.; Steltzer, S.; Maddah, M.; Loewke, K.E.; Green, K.; Patel, S.; et al. Deep learning detects cardiotoxicity in a high-content screen with induced pluripotent stem cell-derived cardiomyocytes. eLife 2021, 10, e68714.
  36. Saleh, Y.; Abdelkarim, O.; Herzallah, K.; Abela, G.S. Anthracycline-induced cardiotoxicity: Mechanisms of action, incidence, risk factors, prevention, and treatment. Heart Fail. Rev. 2021, 26, 1159–1173.
  37. Puppe, J.; van Ooyen, D.; Neise, J.; Thangarajah, F.; Eichler, C.; Krämer, S.; Pfister, R.; Mallmann, P.; Wirtz, M.; Michels, G. Evaluation of QTc Interval Prolongation in Breast Cancer Patients after Treatment with Epirubicin, Cyclophosphamide, and Docetaxel and the Influence of Interobserver Variation. Breast Care 2017, 12, 40–44.
  38. Zhou, X.; Weng, Y.; Jiang, T.; Ou, W.; Zhang, N.; Dong, Q.; Tang, X. Influencing factors of anthracycline-induced subclinical cardiotoxicity in acute leukemia patients. BMC Cancer 2023, 23, 976.
  39. Lyon, A.R.; López-Fernández, T.; Couch, L.S.; Asteggiano, R.; Aznar, M.C.; Bergler-Klein, J.; Boriani, G.; Cardinale, D.; Cordoba, R.; Cosyns, B.; et al. 2022 ESC Guidelines on cardio-oncology developed in collaboration with the European Hematology Association (EHA), the European Society for Therapeutic Radiology and Oncology (ESTRO) and the International Cardio-Oncology Society (IC-OS). Eur. Heart J. 2022, 43, 4229–4361, Erratum in Eur. Heart J. 2023, 44, 1621. https://doi.org/10.1093/eurheartj/ehad196. PMID: 36017568.
  40. Li, Y.; Yan, J.; Yang, P. The mechanism and therapeutic strategies in doxorubicin-induced cardiotoxicity: Role of programmed cell death. Cell Stress Chaperones 2024, 29, 666–680, Erratum in Cell Stress Chaperones 2024, 29, 720. https://doi.org/10.1016/j.cstres.2024.10.005. PMID: 39343295; PMCID: PMC11490929.
  41. Rosano, G.M.C.; Teerlink, J.R.; Kinugawa, K.; Bayes-Genis, A.; Chioncel, O.; Fang, J.; Greenberg, B.; Ibrahim, N.E.; Imamura, T.; Inomata, T.; et al. The use of left ventricular ejection fraction in the diagnosis and management of heart failure. A clinical consensus statement of the Heart Failure Association (HFA) of the ESC, the Heart Failure Society of America (HFSA), and the Japanese Heart Failure Society (JHFS). Eur. J. Heart Fail. 2025, 27, 1174–1187.
  42. Cardinale, D.; Iacopo, F.; Cipolla, C.M. Cardiotoxicity of Anthracyclines. Front. Cardiovasc. Med. 2020, 7, 26.
  43. Giraud, E.L.; Ferrier, K.R.M.; Lankheet, N.A.G.; Desar, I.M.E.; Steeghs, N.; Beukema, R.J.; van Erp, N.P.; Smolders, E.J. The QT interval prolongation potential of anticancer and supportive drugs: A comprehensive overview. Lancet Oncol. 2022, 23, e406–e415.
  44. Richardson, D.R.; Parish, P.C.; Tan, X.; Fabricio, J.; Andreini, C.L.; Hicks, C.H.; Jensen, B.C.; Muluneh, B.; Zeidner, J.F. Association of QTc Formula With the Clinical Management of Patients With Cancer. JAMA Oncol. 2022, 8, 1616–1623.
  45. She, H.; Hu, Y.; Zhao, G.; Du, Y.; Wu, Y.; Chen, W.; Li, Y.; Wang, Y.; Tan, L.; Zhou, Y.; et al. Dexmedetomidine Ameliorates Myocardial Ischemia-Reperfusion Injury by Inhibiting MDH2 Lactylation via Regulating Metabolic Reprogramming. Adv. Sci. 2024, 11, e2409499.
  46. Kalkhoran, S.B.; Basalay, M.; He, Z.; Golforoush, P.; Roper, T.; Caplin, B.; Salama, A.D.; Davidson, S.M.; Yellon, D.M. Investigating the cause of cardiovascular dysfunction in chronic kidney disease: Capillary rarefaction and inflammation may contribute to detrimental cardiovascular outcomes. Basic Res. Cardiol. 2024, 119, 937–955.
  47. An, Y.; Geng, K.; Wang, H.Y.; Wan, S.R.; Ma, X.M.; Long, Y.; Xu, Y.; Jiang, Z.Z. Hyperglycemia-induced STING signaling activation leads to aortic endothelial injury in diabetes. Cell Commun. Signal. 2023, 21, 365.
  48. Szukiewicz, D. Molecular Mechanisms for the Vicious Cycle between Insulin Resistance and the Inflammatory Response in Obesity. Int. J. Mol. Sci. 2023, 24, 9818.
  49. Gallo, G.; Savoia, C. Hypertension and Heart Failure: From Pathophysiology to Treatment. Int. J. Mol. Sci. 2024, 25, 6661.
  50. Tefferi, A.; Gangat, N.; Loscocco, G.G.; Guglielmelli, P.; Szuber, N.; Pardanani, A.; Orazi, A.; Barbui, T.; Vannucchi, A.M. Essential Thrombocythemia: A Review. JAMA 2025, 333, 701–714.
Figure 1. Flow diagram of patient selection and study design.
Figure 2. Comparative performance of six predictive models. (A) 3D absolute prediction error comparison. (B) ROC curves. (C) Model concordance indices (C-index heatmap). (D) Generalization performance in training and validation cohorts.
Figure 3. Global and local feature importance of the TabNet model. (A) Global attention-based feature importance. (B) Feature mask heatmap showing local explainability across samples.
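Figure 3 separates global importance from per-sample masks. In the original TabNet formulation, each step's mask is weighted by that step's contribution to the decision (the sum of its ReLU-activated outputs), summed over steps to give a per-sample attribution, and normalized; global importance then sums these attributions over samples. The NumPy sketch below assumes that aggregation and uses small hypothetical arrays; whether the study used exactly this procedure (for example through the pytorch-tabnet library) is not stated in this section, and `tabnet_importance` is a hypothetical name.

```python
import numpy as np


def tabnet_importance(masks, step_outputs):
    """Aggregate TabNet step masks into local and global feature importance.

    masks:        (n_steps, n_samples, n_features) sparsemax masks M[i]
    step_outputs: (n_steps, n_samples, n_d) decision-step outputs d[i]
    Each step is weighted per sample by eta = sum(ReLU(d[i])), so steps that
    contribute little to the prediction contribute little importance.
    """
    masks = np.asarray(masks, dtype=float)
    eta = np.maximum(np.asarray(step_outputs, dtype=float), 0.0).sum(axis=-1)
    agg = np.einsum("is,isf->sf", eta, masks)  # (n_samples, n_features)

    # Local importance: normalize each sample's row to sum to 1
    row_totals = agg.sum(axis=1, keepdims=True)
    local = np.divide(agg, row_totals, out=np.zeros_like(agg), where=row_totals > 0)

    # Global importance: sum over samples, then normalize across features
    col_totals = agg.sum(axis=0)
    total = col_totals.sum()
    global_importance = col_totals / total if total > 0 else np.zeros_like(col_totals)
    return local, global_importance


if __name__ == "__main__":
    # Hypothetical: 2 decision steps, 2 patients, 3 features, n_d = 2
    masks = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
             [[0.0, 0.5, 0.5], [0.0, 1.0, 0.0]]]
    outputs = [[[1.0, 1.0], [0.0, 0.0]],
               [[-1.0, 2.0], [3.0, 0.0]]]
    local, global_imp = tabnet_importance(masks, outputs)
    print(local)       # per-patient rows, each summing to 1
    print(global_imp)  # [2/7, 4/7, 1/7]
```

The local matrix corresponds to the kind of per-sample heatmap in panel (B), and the global vector to the ranking in panel (A).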
Figure 4. SHAP summary plot of feature contributions in the TabNet model.
Figure 5. Correlation and interaction analysis of multi-modal features. (A) Pearson correlation matrix. (B) Pairwise interaction map.
Figure 6. Feature clustering heatmap for high-risk samples.
Figure 7. Subgroup analysis of model performance across clinical characteristics.
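Figure 7 reports performance within clinical subgroups, and the Discussion names age, cumulative anthracycline dose, and radiotherapy exposure as the stratifying variables. A minimal sketch of per-subgroup discrimination using the rank-based (Mann–Whitney) form of the AUC follows; subgroup labels and cut-offs are hypothetical, and a subgroup containing only one outcome class has no defined AUC.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_auc(y_true, y_score):
    """AUC via the Mann-Whitney U statistic: U / (n_pos * n_neg)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None  # undefined when only one outcome class is present
    ranks = average_ranks(y_score)
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def subgroup_auc(y_true, y_score, groups):
    """AUC computed separately within each subgroup label."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = rank_auc([y_true[i] for i in idx], [y_score[i] for i in idx])
    return result


if __name__ == "__main__":
    # Toy data with a hypothetical age split; not study data
    y = [1, 0, 1, 0, 1, 1, 0, 0]
    p = [0.9, 0.2, 0.7, 0.4, 0.6, 0.8, 0.5, 0.3]
    age_group = ["<60", "<60", "<60", "<60", ">=60", ">=60", ">=60", ">=60"]
    print(subgroup_auc(y, p, age_group))
```

Small subgroups give unstable AUC estimates, so in practice each subgroup value would usually be reported with a confidence interval.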
Table 1. Diagnostic criteria for DIC.
| Parameter | Diagnostic Criteria |
|---|---|
| Left ventricular ejection fraction (LVEF) | ≥10% absolute reduction from baseline to <53%, or ≥5% reduction to <53% with symptoms or signs of heart failure |
| Global longitudinal strain (GLS) | >15% relative reduction from baseline, indicating subclinical myocardial injury |
| Cardiac biomarkers | Persistent elevation of hs-cTnI/T or NT-proBNP above the upper reference limit, or a 1.5- to 2-fold increase from baseline |
| Clinical manifestations | New or worsening symptoms of heart failure, such as dyspnea, fatigue, or peripheral edema |
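Table 1 lists four criteria but does not state how they are combined or how "persistent" elevation is operationalized. The sketch below encodes the criteria as a rule-based check under explicitly labeled assumptions (listed in the docstring); `Assessment` and `dic_criteria` are hypothetical names, and this is not the authors' adjudication procedure. GLS is compared by magnitude because it is conventionally reported as a negative percentage.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Hypothetical per-patient measurements used to apply Table 1."""
    lvef_baseline: float          # %
    lvef_followup: float          # %
    gls_baseline: float           # %, negative by convention (e.g., -20.0)
    gls_followup: float           # %
    biomarker_baseline: float     # hs-cTnI/T or NT-proBNP, any consistent unit
    biomarker_followups: list     # serial follow-up values, same unit
    upper_reference_limit: float  # assay-specific URL
    hf_signs_or_symptoms: bool    # new or worsening heart failure features


def dic_criteria(a, fold_threshold=1.5):
    """Return which Table 1 criteria are met. Assumptions (not stated in the
    table): LVEF reductions are absolute percentage points, 'persistent' means
    at least two follow-up values all above the URL, the fold-change threshold
    is the lower end (1.5x) of the 1.5- to 2-fold range, and meeting any single
    criterion is sufficient."""
    drop = a.lvef_baseline - a.lvef_followup
    lvef = a.lvef_followup < 53 and (
        drop >= 10 or (drop >= 5 and a.hf_signs_or_symptoms)
    )
    # Relative GLS reduction computed on magnitudes (GLS values are negative)
    gls = abs(a.gls_baseline) > 0 and (
        (abs(a.gls_baseline) - abs(a.gls_followup)) / abs(a.gls_baseline) > 0.15
    )
    values = a.biomarker_followups
    persistent = len(values) >= 2 and all(v > a.upper_reference_limit for v in values)
    fold_rise = a.biomarker_baseline > 0 and any(
        v >= fold_threshold * a.biomarker_baseline for v in values
    )
    flags = {
        "lvef": lvef,
        "gls": gls,
        "biomarker": persistent or fold_rise,
        "clinical": a.hf_signs_or_symptoms,
    }
    flags["any"] = any(flags.values())
    return flags


if __name__ == "__main__":
    patient = Assessment(
        lvef_baseline=62.0, lvef_followup=50.0,
        gls_baseline=-20.0, gls_followup=-16.0,
        biomarker_baseline=10.0, biomarker_followups=[12.0, 13.0],
        upper_reference_limit=40.0, hf_signs_or_symptoms=False,
    )
    print(dic_criteria(patient))
    # {'lvef': True, 'gls': True, 'biomarker': False, 'clinical': False, 'any': True}
```

Returning the individual flags, rather than a single label, keeps each criterion auditable when outcome definitions are compared across cohorts.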
Table 2. Patient characteristics.
| Variable | Total (n = 2034) | DIC (n = 305) | Non-DIC (n = 1729) | p |
|---|---|---|---|---|
| Age (years) | 54.8 ± 10.5 | 59.1 ± 9.7 | 53.9 ± 10.4 | <0.001 |
| BMI (kg/m²) | 24.9 ± 3.4 | 25.4 ± 3.5 | 24.8 ± 3.4 | 0.068 |
| Hypertension | | | | <0.001 |
| Yes | 41.6% | 59.0% | 38.3% | |
| No | 58.4% | 41.0% | 61.7% | |
| Diabetes | | | | <0.001 |
| Yes | 20.5% | 30.8% | 18.7% | |
| No | 79.5% | 69.2% | 81.3% | |
| Coronary artery disease | | | | 0.004 |
| Yes | 9.6% | 14.8% | 8.6% | |
| No | 90.4% | 85.2% | 91.4% | |
| TNM stage | | | | 0.030 |
| Stage I | 18.5% | 12.0% | 19.7% | |
| Stage II | 37.3% | 38.5% | 37.0% | |
| Stage III | 31.0% | 34.0% | 30.4% | |
| Stage IV | 13.2% | 15.5% | 12.9% | |
| Cumulative dose (mg/m²) | 278 ± 55 | 311 ± 49 | 272 ± 53 | <0.001 |
| Chest radiotherapy | | | | 0.007 |
| Yes | 22.7% | 29.6% | 21.4% | |
| No | 77.3% | 70.4% | 78.6% | |
| HER2-targeted therapy | | | | 0.009 |
| Yes | 12.5% | 18.0% | 11.4% | |
| No | 87.5% | 82.0% | 88.6% | |
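The p-values in Table 2 can be roughly reproduced from the summary statistics alone. This section does not state which tests were used. The sketch below assumes a two-sample comparison of means for continuous variables and a Pearson chi-square test for 2×2 categorical comparisons, both common choices but not confirmed here. To stay within the standard library, it uses a normal approximation with the Welch standard error in place of the t distribution. That substitution is reasonable at these sample sizes. The hypertension counts are reconstructed from rounded percentages and are therefore approximate.

```python
import math

def two_sample_z_pvalue(m1, s1, n1, m2, s2, n2):
    """Two-sided p for a difference in means from summary statistics, using the
    Welch standard error with a normal approximation (fine for n in the hundreds)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    z = (m1 - m2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].
    With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(chi2 / 2))

# Age, DIC (n = 305) vs non-DIC (n = 1729), mean ± SD from Table 2.
p_age = two_sample_z_pvalue(59.1, 9.7, 305, 53.9, 10.4, 1729)

# Hypertension counts reconstructed from rounded percentages (approximate):
# 59.0% of 305 is about 180; 38.3% of 1729 is about 662.
dic_yes, dic_no = 180, 305 - 180
non_yes, non_no = 662, 1729 - 662
p_htn = chi2_2x2_pvalue(dic_yes, dic_no, non_yes, non_no)
print(f"age p = {p_age:.2e}, hypertension p = {p_htn:.2e}")
```

Both approximations fall well below 0.001, in line with the values reported for age and hypertension.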
Table 3. Comparison between the training and validation cohorts.
| Variable | Training set (n = 1627) | Validation set (n = 407) | p |
|---|---|---|---|
| Age (years) | 55.0 ± 10.3 | 54.6 ± 10.8 | 0.472 |
| BMI (kg/m²) | 24.9 ± 3.4 | 25.0 ± 3.5 | 0.682 |
| Hypertension | | | 0.613 |
| Yes | 41.8% | 40.7% | |
| No | 58.2% | 59.3% | |
| Diabetes | | | 0.745 |
| Yes | 20.3% | 21.0% | |
| No | 79.7% | 79.0% | |
| Coronary artery disease | | | 0.699 |
| Yes | 9.5% | 10.1% | |
| No | 90.5% | 89.9% | |
| TNM stage | | | 0.511 |
| Stage I | 18.3% | 19.2% | |
| Stage II | 37.5% | 36.8% | |
| Stage III | 31.2% | 30.4% | |
| Stage IV | 13.0% | 13.6% | |
| Cumulative dose (mg/m²) | 279 ± 54 | 277 ± 56 | 0.594 |
| Chest radiotherapy | | | 0.808 |
| Yes | 22.5% | 23.1% | |
| No | 77.5% | 76.9% | |
| HER2-targeted therapy | | | 0.671 |
| Yes | 12.6% | 11.8% | |
| No | 87.4% | 88.2% | |
| Systolic BP (mmHg) | 126 ± 15 | 127 ± 14 | 0.513 |
| Diastolic BP (mmHg) | 78 ± 9 | 77 ± 9 | 0.42 |
| Heart rate (bpm) | 76 ± 11 | 75 ± 10 | 0.286 |
| QTc (ms) | 423 ± 26 | 424 ± 25 | 0.661 |
| QRS duration (ms) | 93 ± 12 | 92 ± 13 | 0.338 |
| LVEF (%) | 62.7 ± 6.6 | 62.9 ± 6.8 | 0.764 |
| LVEDD (mm) | 48.4 ± 4.8 | 48.3 ± 4.9 | 0.837 |
| LA diameter (mm) | 36.9 ± 4.2 | 36.7 ± 4.3 | 0.541 |
| E/A ratio | 1.03 ± 0.28 | 1.04 ± 0.27 | 0.678 |
| ALT (U/L) | 25 [19–32] | 25 [18–31] | 0.729 |
| AST (U/L) | 24 [19–30] | 24 [19–29] | 0.842 |
| GGT (U/L) | 28 [20–39] | 27 [20–38] | 0.504 |
| LDH (U/L) | 211 ± 61 | 210 ± 60 | 0.812 |
| TBIL (μmol/L) | 12.0 [9.0–15.0] | 12.1 [9.2–15.3] | 0.905 |
| DBIL (μmol/L) | 3.6 [2.8–4.6] | 3.6 [2.9–4.7] | 0.772 |
| IDBIL (μmol/L) | 8.3 [6.3–10.5] | 8.3 [6.4–10.6] | 0.931 |
| TP (g/L) | 69.8 ± 5.1 | 69.9 ± 5.2 | 0.876 |
| ALB (g/L) | 41.6 ± 3.8 | 41.8 ± 3.7 | 0.521 |
| GLB (g/L) | 28.2 ± 3.1 | 28.3 ± 3.0 | 0.729 |
| A/G | 1.47 ± 0.22 | 1.48 ± 0.22 | 0.463 |
| PALB (mg/L) | 230 [190–270] | 231 [192–272] | 0.833 |
| BUN (mmol/L) | 5.8 ± 1.6 | 5.8 ± 1.6 | 0.952 |
| CREA (μmol/L) | 73.6 ± 14.1 | 73.2 ± 13.9 | 0.74 |
| UA (μmol/L) | 341 ± 89 | 339 ± 90 | 0.678 |
| ALP (U/L) | 85 ± 28 | 84 ± 27 | 0.563 |
| GLU (mmol/L) | 5.64 ± 1.09 | 5.61 ± 1.08 | 0.744 |
| WBC (×10⁹/L) | 6.4 ± 1.9 | 6.3 ± 1.8 | 0.498 |
| NEU (%) | 61 ± 9 | 61 ± 9 | 0.882 |
| LYM (%) | 28 ± 8 | 28 ± 8 | 0.97 |
| MON (%) | 7.1 ± 2.0 | 7.0 ± 2.0 | 0.512 |
| EOS (%) | 2.2 [1.4–3.1] | 2.2 [1.5–3.0] | 0.789 |
| BASO (%) | 0.5 [0.4–0.6] | 0.5 [0.4–0.6] | 0.946 |
| Hb (g/L) | 129 ± 14 | 129 ± 13 | 0.944 |
| RBC (×10¹²/L) | 4.32 ± 0.48 | 4.33 ± 0.47 | 0.807 |
| Hct (L/L) | 0.390 ± 0.040 | 0.391 ± 0.041 | 0.835 |
| PLT (×10⁹/L) | 238 ± 62 | 239 ± 63 | 0.884 |
| PT (s) | 12.8 ± 2.2 | 12.9 ± 2.2 | 0.714 |
| D-dimer (mg/L FEU) | 0.32 [0.21–0.51] | 0.32 [0.20–0.50] | 0.921 |
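The cohort sizes in Table 3 (1627 and 407 of 2034) correspond to an 80:20 split. This section does not describe the splitting procedure. The sketch below assumes a stratified random split on DIC status, using the 305/1729 counts from Table 2; the seed and function name are hypothetical. Stratification keeps the DIC proportion nearly identical in both cohorts, which is one way to obtain the balance the table reports.

```python
import random

def stratified_split(labels, train_frac=0.8, seed=2025):
    """Split sample indices into training/validation sets, preserving class ratios.
    train_frac, seed and stratification on the outcome are assumptions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, val = [], []
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)                     # randomize within each class
        n_train = round(train_frac * len(idxs))
        train.extend(idxs[:n_train])
        val.extend(idxs[n_train:])
    return sorted(train), sorted(val)

labels = [1] * 305 + [0] * 1729   # DIC / non-DIC counts from Table 2
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

With these counts the split yields 1627 training and 407 validation indices (1383 + 244 and 346 + 61 by class), the same totals as in Table 3.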
Table 4. Regression performance metrics of six machine learning models.
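| Model | MAE | RMSE | R² | Stability (σ) |
|---|---|---|---|---|
| Logistic Regression | 0.386 | 0.472 | 0.58 | 0.091 |
| Decision Tree | 0.334 | 0.419 | 0.63 | 0.085 |
| Random Forest | 0.326 | 0.401 | 0.68 | 0.076 |
| GBM | 0.24 | 0.315 | 0.74 | 0.062 |
| XGBoost | 0.246 | 0.308 | 0.75 | 0.059 |
| TabNet | 0.175 | 0.231 | 0.83 | 0.047 |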
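Table 4 reports regression-style metrics for what is a binary outcome. A plausible reading is that MAE, RMSE and R² were computed on predicted probabilities against 0/1 labels, but this section does not confirm that. It also does not define "Stability (σ)", so the sketch below leaves that metric out. The function name and toy data are hypothetical.

```python
import math

def regression_metrics(y_true, y_prob):
    """MAE, RMSE and R^2 of predicted probabilities against 0/1 outcomes."""
    n = len(y_true)
    errors = [p - y for y, p in zip(y_true, y_prob)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_y = sum(y_true) / n
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)   # variance of the labels
    ss_res = sum(e * e for e in errors)               # squared prediction error
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

y_true = [1, 0, 1, 0]
y_prob = [0.8, 0.2, 0.6, 0.4]
mae, rmse, r2 = regression_metrics(y_true, y_prob)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.2f}")
```

RMSE can never be smaller than MAE for the same predictions, and every row of Table 4 satisfies this. A larger gap between the two indicates that errors are concentrated in a few samples.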
Table 5. Classification performance of six predictive models.
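| Model | AUC | 95% CI | Sensitivity | Specificity | Youden index |
|---|---|---|---|---|---|
| Logistic Regression | 0.66 | 0.62–0.70 | 0.68 | 0.64 | 0.32 |
| Decision Tree | 0.66 | 0.61–0.71 | 0.67 | 0.63 | 0.30 |
| Random Forest | 0.75 | 0.71–0.79 | 0.78 | 0.72 | 0.50 |
| GBM | 0.75 | 0.70–0.79 | 0.80 | 0.71 | 0.51 |
| XGBoost | 0.79 | 0.75–0.83 | 0.82 | 0.74 | 0.56 |
| TabNet | 0.86 | 0.82–0.90 | 0.85 | 0.78 | 0.63 |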
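The Youden index in Table 5 is J = sensitivity + specificity − 1, and each reported value follows from the sensitivity and specificity in the same row. The sketch below checks that relation. It also shows one common way a cut-off is chosen, by maximizing J over candidate thresholds. The section does not state how the authors selected their operating point, so `best_threshold` and its toy data are hypothetical.

```python
def youden_index(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0

def best_threshold(scores, labels):
    """Cut-off t (predict positive if score >= t) that maximizes Youden's J."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = youden_index(tp / pos, 1.0 - fp / neg)
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# (sensitivity, specificity, reported Youden index) from Table 5
table5 = {
    "Logistic Regression": (0.68, 0.64, 0.32),
    "Decision Tree": (0.67, 0.63, 0.30),
    "Random Forest": (0.78, 0.72, 0.50),
    "GBM": (0.80, 0.71, 0.51),
    "XGBoost": (0.82, 0.74, 0.56),
    "TabNet": (0.85, 0.78, 0.63),
}
for model, (sens, spec, j) in table5.items():
    print(model, round(youden_index(sens, spec), 2), j)

# Toy scores and labels (hypothetical) for threshold selection.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
print(best_threshold(scores, labels))
```

Because J weights sensitivity and specificity equally, maximizing it suits a balanced screening objective. A monitoring programme that prioritizes missing fewer cardiotoxic cases might deliberately choose a lower cut-off.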
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cao, J.; Hong, X.; Dong, L.; Jiang, W.; Yang, W. A TabNet-Based Multidimensional Deep Learning Model for Predicting Doxorubicin-Induced Cardiotoxicity in Breast Cancer Patients. Cancers 2026, 18, 117. https://doi.org/10.3390/cancers18010117

AMA Style

Cao J, Hong X, Dong L, Jiang W, Yang W. A TabNet-Based Multidimensional Deep Learning Model for Predicting Doxorubicin-Induced Cardiotoxicity in Breast Cancer Patients. Cancers. 2026; 18(1):117. https://doi.org/10.3390/cancers18010117

Chicago/Turabian Style

Cao, Juanwen, Xiaojian Hong, Li Dong, Wei Jiang, and Wei Yang. 2026. "A TabNet-Based Multidimensional Deep Learning Model for Predicting Doxorubicin-Induced Cardiotoxicity in Breast Cancer Patients" Cancers 18, no. 1: 117. https://doi.org/10.3390/cancers18010117

APA Style

Cao, J., Hong, X., Dong, L., Jiang, W., & Yang, W. (2026). A TabNet-Based Multidimensional Deep Learning Model for Predicting Doxorubicin-Induced Cardiotoxicity in Breast Cancer Patients. Cancers, 18(1), 117. https://doi.org/10.3390/cancers18010117
