Demystifying Deep Learning Decisions in Leukemia Diagnostics Using Explainable AI
Abstract
1. Introduction
2. Related Work
2.1. Deep Learning for Leukemia Diagnosis
2.1.1. Single-Type Pipeline Classifier
- Single-Type Pipeline: Performs binary classification, distinguishing one leukemia type (e.g., ALL) from healthy or benign samples. It typically involves fewer image categories and a simpler network head (two output neurons), and it is trained on datasets that include only one disease type.
- Multiple-Type Pipeline: Extends this to multi-class classification across several leukemia types (e.g., ALL, AML, CLL, CML, and Healthy). This pipeline uses one-hot label encoding (five output neurons in our case) and is trained on a consolidated dataset in which each sample is annotated according to its specific subtype.
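The difference between the two pipelines comes down to the size of the output layer and the label vector. The following is a minimal sketch, assuming each image carries exactly one label and that the head ends in a softmax; the class orderings and helper names (`softmax`, `one_hot`, `predict`) are illustrative and not taken from the paper.

```python
import math

# Assumed class orderings, for illustration only.
BINARY_CLASSES = ["Healthy", "ALL"]                      # single-type pipeline
MULTI_CLASSES = ["ALL", "AML", "CLL", "CML", "Healthy"]  # multiple-type pipeline


def softmax(logits):
    """Convert raw output-neuron scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label, classes):
    """Encode a single class label as a 0/1 target vector."""
    return [1.0 if c == label else 0.0 for c in classes]


def predict(logits, classes):
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best], probs[best]
```

The same `predict` function serves both pipelines; only the length of `logits` (two or five) and the class list change.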
2.1.2. Multi-Type Pipeline Classifier
2.2. XAI for Leukemia Diagnosis
3. Methodology
3.1. Proposed Deep Learning Models
- Preprocessing: Input images, provided in BMP, JPG, or TIFF format, were normalized, resized to 224 × 224 pixels for CNN compatibility, and augmented to increase sample diversity and mitigate overfitting.
- Model training: The processed corpus was partitioned into training (80%) and validation/testing (20%, i.e., 10% each) splits. We fine-tuned pretrained CNN backbones (DenseNet-121, VGG-16, ResNet50) by replacing their terminal classification layers to adapt them to leukemia subtype prediction. Deep learning was selected as the core analytical approach because microscopic leukemia diagnosis involves complex, high-dimensional visual patterns that are difficult to capture with conventional feature engineering or shallow classifiers.
- Explainability: Post hoc XAI methods, specifically LIME and Grad-CAM (Figure 4), were applied to generate heatmaps highlighting the image regions most influential to the classifier, making model behavior more transparent and clinically interpretable.
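The preprocessing step can be illustrated with a small, dependency-free sketch. It assumes 8-bit grayscale input stored as nested lists and uses nearest-neighbor resizing; the paper does not state the interpolation method or the normalization constants, and a real pipeline would operate on RGB arrays with an imaging library. The function names are hypothetical.

```python
def normalize(image, max_value=255.0):
    """Scale pixel intensities to the [0, 1] range (assumes 8-bit input)."""
    return [[pixel / max_value for pixel in row] for row in image]


def resize_nearest(image, out_h=224, out_w=224):
    """Resize a 2-D image with nearest-neighbor sampling (an assumed choice)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def preprocess(image):
    """Normalize, then resize to the 224 x 224 input size used by the CNNs."""
    return resize_nearest(normalize(image))
```

Augmentation is applied after this step during training only, so validation and test images pass through `preprocess` unchanged.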
3.2. Proposed XAI Techniques
- Model-agnostic (LIME): provides local, quantitative explanations by perturbing regions (super-pixels) of the input image and observing how the model's predictions change;
- Model-specific (Grad-CAM): provides visual, qualitative explanations by using the gradients of the class score with respect to the feature maps of the final convolutional layer.
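As a rough illustration of the model-specific branch, the sketch below computes a Grad-CAM heatmap from feature maps and gradients that are assumed to have been extracted already from the final convolutional layer. It follows the standard Grad-CAM recipe (global-average-pooled gradients as channel weights, weighted sum, ReLU); the nested-list layout and the function name are assumptions made for readability, not the authors' implementation.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap for one image.

    activations, gradients: nested lists shaped [K][H][W] holding the K
    feature maps of the last conv layer and the gradients of the target
    class score with respect to those maps.
    """
    k = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights: global-average-pool each gradient map.
    alphas = [sum(sum(row) for row in gradients[c]) / (h * w) for c in range(k)]
    # Weighted sum of feature maps followed by ReLU.
    heatmap = [
        [max(0.0, sum(alphas[c] * activations[c][i][j] for c in range(k)))
         for j in range(w)]
        for i in range(h)
    ]
    # Scale to [0, 1] for overlay; an all-zero map is returned unchanged.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap
```

In practice the resulting map is upsampled to the input resolution and blended with the original smear image.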
3.2.1. LIME
3.2.2. Grad-CAM
4. Experimental Setup
4.1. Datasets
4.2. Data Splitting
4.3. Benchmark Baselines
4.4. Computational Resources and Evaluation Metrics
4.5. Dataset Exploration
4.5.1. Samples and Visualizations
4.5.2. Training Controls to Mitigate Over-/Underfitting
- Data splitting: We used a stratified 80/10/10 split (train/val/test) and selected all hyperparameters on the validation split only; the test split is used once, at the end, to report final metrics.
- Early stopping: We monitored validation loss and F1 and stopped training when validation loss did not improve for 10 epochs (patience = 10).
- Regularization: We applied weight decay (L2 = 1 × 10⁻⁴) and dropout (0.3–0.5) in the classification head, together with label smoothing (0.1) to reduce over-confidence.
- Augmentation: We evaluated four families (MixUp, AugMix, CutMix, and RandAug). MixUp (α = 0.2) and AugMix (severity = 3, width = 3) consistently minimized the train–val generalization gap and yielded the most clinically coherent explanations; results reported in the paper use those settings. (We retain RandAug/CutMix results for completeness but do not base model selection on them.)
- Acceptance criteria: We tracked training and validation curves for loss and F1. Models are accepted only if (i) the validation F1 improves alongside the training F1 (no divergence) and (ii) the generalization gap at the chosen epoch is small (typically ≤ 2–3% for the best models). We report test metrics once, for the single model checkpoint with the best validation F1.
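Three of the controls listed above (label smoothing = 0.1, MixUp with α = 0.2, and early stopping with patience = 10) can be sketched without a deep learning framework. This is an illustrative sketch only: the helper names are hypothetical, samples are flat lists of floats, and the smoothing formula is the common uniform variant, which the text does not spell out.

```python
import random


def smooth_labels(one_hot, epsilon=0.1):
    """Label smoothing: move `epsilon` of the mass to a uniform distribution."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """MixUp: blend two samples and their labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

A training loop would call `stopper.step(val_loss)` once per epoch and break when it returns `True`, keeping the checkpoint with the best validation F1 for the final test evaluation.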
4.5.3. Preprocessing and Augmentation
4.6. Fine-Tuning Procedure
5. Experimental Results
5.1. Experimental Results from CNNs and Pretrained Models
5.1.1. Experiment (A)—Binary ALL vs. Healthy (Table 7)
5.1.2. Experiment (B)—Multiclass (CLL, FL, CML) (Table 7)
5.1.3. Experiment (C)—Multiclass ALL Subtypes (Benign, Early, Pre, Pro) Across Two Datasets (Table 8)
5.1.4. Experiment (D)—Five-Class: (ALL, AML, CLL, CML, Healthy) (Table 10)
| Model | Augmentation | Train Acc. | Train Loss | Train F1 | Val Acc. | Val Loss | Val F1 | Test Acc. | Test Loss | Test F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Xception | AugMix | 0.9045 | 0.2603 | 0.9044 | 0.9637 | 0.1056 | 0.9633 | 0.9628 | 0.1183 | 0.9630 |
| Xception | MixUp | 0.9199 | 0.4214 | 0.8737 | 0.9688 | 0.1003 | 0.9687 | 0.9622 | 0.1234 | 0.9626 |
| Xception | RandAug | 0.8280 | 0.4444 | 0.8275 | 0.9304 | 0.1898 | 0.9296 | 0.9310 | 0.1929 | 0.9316 |
| Xception | CutMix | 0.7079 | 0.8873 | 0.6560 | 0.9481 | 0.2301 | 0.9480 | 0.9523 | 0.2355 | 0.9525 |
| VGG16 | AugMix | 0.8821 | 0.3157 | 0.8824 | 0.9516 | 0.1487 | 0.9510 | 0.9575 | 0.1288 | 0.9579 |
| VGG16 | MixUp | 0.9168 | 0.4464 | 0.8692 | 0.9743 | 0.1065 | 0.9741 | 0.9725 | 0.1099 | 0.9729 |
| VGG16 | RandAug | 0.8277 | 0.4404 | 0.8268 | 0.9269 | 0.1995 | 0.9257 | 0.9432 | 0.1892 | 0.9430 |
| VGG16 | CutMix | 0.6885 | 0.9281 | 0.6404 | 0.9466 | 0.2502 | 0.9460 | 0.9438 | 0.2429 | 0.9440 |
| ResNet50 | AugMix | 0.6714 | 0.8285 | 0.6706 | 0.8029 | 0.5207 | 0.8002 | 0.7855 | 0.5172 | 0.7848 |
| ResNet50 | MixUp | 0.7198 | 0.8360 | 0.6892 | 0.8271 | 0.4846 | 0.8264 | 0.8215 | 0.4820 | 0.8233 |
| ResNet50 | RandAug | 0.5595 | 1.0700 | 0.5579 | 0.7324 | 0.7375 | 0.7329 | 0.7105 | 0.7652 | 0.7138 |
| ResNet50 | CutMix | 0.5553 | 1.2279 | 0.5149 | 0.8054 | 0.6240 | 0.8061 | 0.8146 | 0.6363 | 0.8177 |
| MobileNetV2 | AugMix | 0.9052 | 0.2588 | 0.9056 | 0.9582 | 0.1133 | 0.9576 | 0.9656 | 0.1080 | 0.9652 |
| MobileNetV2 | MixUp | 0.9228 | 0.4107 | 0.8771 | 0.9783 | 0.0823 | 0.9782 | 0.9790 | 0.0775 | 0.9788 |
| MobileNetV2 | RandAug | 0.8555 | 0.3893 | 0.8553 | 0.9355 | 0.1722 | 0.9352 | 0.9562 | 0.1459 | 0.9564 |
| MobileNetV2 | CutMix | 0.7187 | 0.8620 | 0.6613 | 0.9607 | 0.1843 | 0.9604 | 0.9590 | 0.1868 | 0.9595 |
| InceptionV3 | AugMix | 0.8477 | 0.4069 | 0.8481 | 0.9340 | 0.1965 | 0.9337 | 0.9328 | 0.2166 | 0.9335 |
| InceptionV3 | MixUp | 0.8992 | 0.4942 | 0.8531 | 0.9541 | 0.1495 | 0.9541 | 0.9525 | 0.1668 | 0.9524 |
| InceptionV3 | RandAug | 0.7856 | 0.5631 | 0.7844 | 0.8977 | 0.2902 | 0.8976 | 0.8999 | 0.3001 | 0.8999 |
| InceptionV3 | CutMix | 0.6845 | 0.9488 | 0.6349 | 0.9360 | 0.2649 | 0.9358 | 0.9302 | 0.2815 | 0.9301 |
| DenseNet121 | AugMix | 0.9086 | 0.2477 | 0.9080 | 0.9703 | 0.0940 | 0.9702 | 0.9662 | 0.0934 | 0.9665 |
| DenseNet121 | MixUp | 0.9221 | 0.4169 | 0.8761 | 0.9793 | 0.0743 | 0.9792 | 0.9763 | 0.0806 | 0.9766 |
| DenseNet121 | RandAug | 0.8766 | 0.3261 | 0.8755 | 0.9597 | 0.1178 | 0.9594 | 0.9624 | 0.1183 | 0.9628 |
| DenseNet121 | CutMix | 0.7027 | 0.8896 | 0.6519 | 0.9551 | 0.2096 | 0.9549 | 0.9563 | 0.2092 | 0.9565 |
| CNN | AugMix | 0.8451 | 0.4290 | 0.8431 | 0.8745 | 0.3266 | 0.8690 | 0.8761 | 0.3147 | 0.8772 |
| CNN | MixUp | 0.9072 | 0.4809 | 0.8544 | 0.8846 | 0.3279 | 0.8833 | 0.9251 | 0.2502 | 0.9238 |
| CNN | RandAug | 0.7689 | 0.6108 | 0.7675 | 0.7883 | 0.5145 | 0.7816 | 0.7984 | 0.4808 | 0.7979 |
| CNN | CutMix | 0.7038 | 0.9079 | 0.6564 | 0.8241 | 0.4683 | 0.8215 | 0.8699 | 0.3926 | 0.8720 |
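For readers less familiar with the Acc. and F1 columns in the table above, the following sketch shows how such scores are typically computed from predicted and true labels. It assumes macro-averaged F1 (the unweighted mean of per-class F1 scores); the averaging scheme is not stated here, so treat that choice, and the function names, as assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption)."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that is never present and never predicted scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(classes)
```

With imbalanced classes, macro F1 can fall well below accuracy, which is one reason both metrics are reported side by side.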
5.2. Comparative Analysis of Model Results for Diagnosis Problems
5.3. Experimental Results from Explainable AI (XAI)
5.3.1. Interpretability of Binary Classifiers Using LIME
- Green-highlighted super-pixels represent areas that positively contribute to the predicted class (supporting evidence).
- Red-highlighted regions denote areas that contradict the prediction or carry lower diagnostic relevance.
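The green/red coding above corresponds to the sign of each super-pixel's weight in LIME's local linear surrogate. The sketch below is a simplified, dependency-free illustration of that idea, not the `lime` package itself: it samples on/off super-pixel masks, queries a black-box `predict` callable (hypothetical; in practice it would render the masked image and return the classifier's probability for the explained class), and fits a proximity-weighted least-squares model. The proximity kernel and the absence of regularization are simplifying assumptions.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_weights(predict, n_superpixels, n_samples=200, kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate over on/off super-pixel masks."""
    rng = random.Random(seed)
    masks = [[1] * n_superpixels]  # the unperturbed image
    masks += [[rng.randint(0, 1) for _ in range(n_superpixels)]
              for _ in range(n_samples)]
    rows, targets, weights = [], [], []
    for z in masks:
        removed = 1.0 - sum(z) / n_superpixels  # fraction of super-pixels hidden
        weights.append(math.exp(-(removed ** 2) / kernel_width ** 2))
        rows.append([1.0] + [float(v) for v in z])  # leading 1.0 is the intercept
        targets.append(predict(z))
    d = n_superpixels + 1
    # Weighted normal equations: (X^T W X) beta = X^T W y.
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(d)]
            for i in range(d)]
    xtwy = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets))
            for i in range(d)]
    beta = solve(xtwx, xtwy)
    return beta[1:]  # drop the intercept; one weight per super-pixel


def color(weight):
    """Map a surrogate weight to the overlay color used in the figures."""
    return "green" if weight > 0 else "red"
```

Super-pixels with positive weights are drawn in green (supporting the predicted class) and the rest in red; the magnitude of the weight indicates how strongly the region matters.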
5.3.2. Interpretability of Multi-Class Classifiers Using LIME
5.3.3. Interpretability of Binary Classifiers Using Grad-CAM
5.3.4. Interpretability of Multi-Class Classifiers Using Grad-CAM
5.4. Comparison of Baseline Methods and the Proposed Model
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Ansari, S.; Navin, A.H.; Babazadeh Sangar, A.; Vaez Gharamaleki, J.; Danishvar, S. Acute Leukemia Diagnosis Based on Images of Lymphocytes and Monocytes Using Type-II Fuzzy Deep Network. Electronics 2023, 12, 1116.
2. Deshpande, N.-M.; Gite, S.; Pradhan, B.; Alamri, A.; Lee, C.-W. A New Method for Diagnosis of Leukemia Utilizing a Hybrid DL-ML Approach for Binary and Multi-Class Classification on a Limited-Sized Database. Comput. Model. Eng. Sci. 2024, 139, 593–631.
3. Alzahrani, A.K.; Alsheikhy, A.A.; Shawly, T.; Azzahrani, A.; Said, Y. A Novel Deep Learning Segmentation and Classification Framework for Leukemia Diagnosis. Algorithms 2023, 16, 556.
4. Elsayed, B.; Elhadary, M.; Elshoeibi, R.M.; Elshoeibi, A.M.; Badr, A.; Metwally, O.; ElSherif, R.A.; Salem, M.E.; Khadadah, F.; Alshurafa, A.; et al. Deep learning enhances acute lymphoblastic leukemia diagnosis and classification using bone marrow images. Front. Oncol. 2023, 13, 1330977.
5. Kaur, M.; AlZubi, A.A.; Jain, A.; Singh, D.; Yadav, V.; Alkhayyat, A. DSCNet: Deep Skip Connections-Based Dense Network for ALL Diagnosis Using Peripheral Blood Smear Images. Diagnostics 2023, 13, 2752.
6. Alkhalaf, S.; Alturise, F.; Bahaddad, A.A.; Elnaim, B.M.E.; Shabana, S.; Abdel-Khalek, S.; Mansour, R.F. Adaptive Aquila Optimizer with Explainable Artificial Intelligence-Enabled Cancer Diagnosis on Medical Imaging. Cancers 2023, 15, 1492.
7. Abhishek, A.; Jha, R.K.; Sinha, R.; Jha, K. Automated detection and classification of leukemia on a subject-independent test dataset using deep transfer learning supported by Grad-CAM visualization. Biomed. Signal Process. Control 2023, 83, 104722.
8. Deshpande, N.M.; Gite, S.; Pradhan, B. Explainable AI for binary and multi-class classification of leukemia using a modified transfer learning ensemble model. Int. J. Smart Sens. Intell. Syst. 2024, 17, 1–20.
9. Döhner, H.; Estey, E.; Grimwade, D.; Amadori, S.; Appelbaum, F.R.; Büchner, T.; Dombret, H.; Ebert, B.L.; Fenaux, P.; Larson, R.A.; et al. Diagnosis and management of AML in adults: 2017 ELN recommendations from an international expert panel. Blood 2017, 129, 424–447.
10. Mohammed, K.K.; Hassanien, A.E.; Afify, H.M. Refinement of ensemble strategy for acute lymphoblastic leukemia microscopic images using hybrid CNN-GRU-BiLSTM and MSVM classifier. Neural Comput. Appl. 2023, 35, 17415–17427.
11. Hosseini, A.; Eshraghi, M.A.; Taami, T.; Sadeghsalehi, H.; Hoseinzadeh, Z.; Ghaderzadeh, M.; Rafiee, M. A mobile application based on efficient lightweight CNN model for classification of B-ALL cancer from non-cancerous cells: A design and implementation study. Inform. Med. Unlocked 2023, 39, 101244.
12. Sulaiman, A.; Kaur, S.; Gupta, S.; Alshahrani, H.; Reshan, M.S.A.; Alyami, S.; Shaikh, A. ResRandSVM: Hybrid Approach for Acute Lymphocytic Leukemia Classification in Blood Smear Images. Diagnostics 2023, 13, 2121.
13. Mohamed, H.; Elsheref, F.K.; Kamal, S.R. A New Model for Blood Cancer Classification Based on Deep Learning Techniques. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 422–429.
14. Diaz Resendiz, J.L.; Ponomaryov, V.; Reyes Reyes, R.; Sadovnychiy, S. Explainable CAD System for Classification of Acute Lymphoblastic Leukemia Based on a Robust White Blood Cell Segmentation. Cancers 2023, 15, 3376.
15. Velázquez-Arreola, J.D.; Sánchez-Medel, N.I.; Zarraga-Vargas, O.A.; Díaz-Hernández, R.; Altamirano-Robles, L. Evaluation of Heat Map Methods Using Cell Morphology for Classifying Acute Lymphoblastic Leukemia Cells. Comput. Sist. 2024, 28, 221–231.
16. Genovese, A.; Piuri, V.; Scotti, F. ALL-IDB Patches: Whole Slide Imaging for Acute Lymphoblastic Leukemia Detection Using Deep Learning. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW, Rhodes Island, Greece, 4–10 June 2023.
17. Himel, M.; Hasan, M.A.; Suzuki, T.; Shin, J. Feature Fusion Based Ensemble of Deep Networks for Acute Leukemia Diagnosis Using Microscopic Smear Images. IEEE Access 2024, 12, 54758–54771.
18. Raza, M.U.; Nawaz, A.; Liu, X.T.; Chen, Z.Z.; Leung, V.C.M.; Chen, J.; Li, J.Q. Visual Explanations: Activation-based Acute Lymphoblastic Leukemia Cell Classification. In Proceedings of the 2023 IEEE International Conference on Development and Learning, ICDL, Macau, China, 9–11 November 2023; pp. 61–66.
19. Ghnemat, R.; Alodibat, S.; Abu Al-Haija, Q. Explainable Artificial Intelligence (XAI) for Deep Learning Based Medical Imaging Classification. J. Imaging 2023, 9, 177.
20. Muhammad, D.; Bendechache, M. Unveiling the black box: A systematic review of Explainable Artificial Intelligence in medical image analysis. Comput. Struct. Biotechnol. J. 2024, 24, 542–560.
21. Mohamed, E.; Sirlantzis, K.; Howells, G. A review of visualisation-as-explanation techniques for convolutional neural networks and their evaluation. Displays 2022, 73, 102239.
22. Haque, R.; Al Sakib, A.; Hossain, M.F.; Islam, F.; Ibne Aziz, F.; Ahmed, M.R.; Kannan, S.; Rohan, A.; Hasan, M.J. Advancing Early Leukemia Diagnostics: A Comprehensive Study Incorporating Image Processing and Transfer Learning. BioMedInformatics 2024, 4, 966–991.
23. Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. RandAugment: Practical automated data augmentation with a reduced search space. arXiv 2019, arXiv:1909.13719.
24. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization. arXiv 2018, arXiv:1710.09412.
25. Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. arXiv 2019, arXiv:1905.04899.
26. Hendrycks, D.; Mu, N.; Cubuk, E.D.; Zoph, B.; Gilmer, J.; Lakshminarayanan, B. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty. arXiv 2020, arXiv:1912.02781.

| Category | Cell Type Affected | Disease Speed | Typical Age Group | Key Features |
|---|---|---|---|---|
| ALL | Lymphoid (immature B/T cells) | Acute | Children | Rapid onset; immature lymphoblasts |
| AML | Myeloid (precursor cells) | Acute | Adults | Myeloblast accumulation; cytogenetic variants |
| CLL | Lymphoid (mature B cells) | Chronic | Older adults | Slow progression; high lymphocyte count |
| CML | Myeloid (mature myeloid cells) | Chronic | Adults (50+) | Philadelphia chromosome; triphasic course |

| Type | Images | Type of Data | Method | Pre-Trained Models | Acc. | Limitations | Ref. |
|---|---|---|---|---|---|---|---|
| Single-Type Classifiers | - | bone marrow smear | CNN | AlexNet, DenseNet121, MobileViTv2, ResNet-50, ResNext101_32x8d, ResNext50_32x4d | 88.5% | Limited datasets. Lack of external validation. Computational complexity. | [4] |
| Single-Type Classifiers | 3242 | PBS | CNNs | MobileNetV2 | 100% | Limited datasets. | [11] |
| Single-Type Classifiers | 10,661 | PBS | ResRandSVM | ResNet50 | 90% | Complex model architecture complicates interpretation. | [12] |
| Single-Type Classifiers | 12,528 | Microscopic | CNN-GRU-BiLSTM-MSVM | DenseNet-201 | 96.29% | - | [10] |
| Multi-Type Classifiers | 889 | Microscopic | Hybrid DL-ML | VGG, Xception, InceptionResNet, DenseNet, ResNet | 100% (ALL); 97.08% (AML, CLL, CML); 92.2% (AML vs. ALL cases) | Limited datasets. | [2] |
| Multi-Type Classifiers | 3679 | Microscopic | CNNs | VGG16; DenseNet-121 | 98.2% (VGG16); 98.1% (DenseNet-121) | Limited datasets. Difficulty in interpreting and explaining DL predictions. | [13] |

| XAI Type | Local/Global | Intrinsic/Post Hoc | Specific/Agnostic | Model Type | Images | Model | Pre-Trained Models | XAI Model | Acc. | Limitation | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Visual: Backpropagation-based | Local | Post hoc | Model-Specific | ALL | 2823 | CNN | ResNet-50 | Grad-CAM | 99.9% | Time-consuming. Limited datasets. | [14] |
| Visual: Backpropagation-based | Local | Post hoc | Model-Specific | ALL | 260 | CNN | VGG-19, ResNet18, ResNet50, GoogleNet | Grad-CAM | - | Limited datasets. | [15] |
| Visual: Backpropagation-based | Local | Post hoc | Model-Specific | ALL | 108 | OrthoALLNet | ResNet18, ResNet34 | Grad-CAM | 95.91% to 96.06% | Limited datasets. | [16] |
| Visual: Backpropagation-based | Local | Post hoc | Model-Specific | ALL, AML | 498 | CNN | EfficientNetB7, MobileNetV3 | Grad-CAM | 99.3% | Limited datasets. | [17] |
| Visual: Backpropagation-based | Local | Post hoc | Model-Specific | ALL | 15,114 | CNN | VGG, ResNet, EfficientNet, DenseNet121, ViT, InceptionV3 | CAM, Grad-CAM, and Grad-CAM++ | 68% to 94% | - | [18] |
| Visual: Backpropagation-based | Local | Post hoc | Model-Specific | ALL, AML, CLL, CML | 1250 | CNN | VGG16 | Grad-CAM | 84% | Limited datasets. | [7] |
| Visual: Perturbation-based | Local | Post hoc | Model-Agnostic | ALL, AML, CLL, CML | 889 | CNN | VGG-16 and InceptionV3 | LIME | 83.33% (ALL); 100% (AML, CLL, CML) | Limited datasets cause 100% | [8] |

| No. | Dataset | Type | Images | Description | Magnification / Image Size | Source |
|---|---|---|---|---|---|---|
| 1. | ALL | Microscopic | 15,135 | Cells segmented from 15,135 microscopic images, divided into healthy and ALL classes. | N/A | Kaggle |
| 2. | ALL | PBS | 3256 | Cells segmented from 3256 PBS images, divided into two classes: benign and malignant. The ALL group comprises three subtypes of malignant lymphoblasts: Early Pre-B, Pre-B, and Pro-B ALL. | 100× | Kaggle |
| 3. | ALL | N/A | 20,000 | Contains 130,000 images of 8 types of cancer, and 20,000 images for ALL, divided into two classes: benign and malignant. The ALL group comprises three subtypes of malignant lymphoblasts: Early_ALL, Pre_ALL, and Pro_ALL. | 512 × 512 pixels | Kaggle |
| 4. | AML | PBS | 10,000 | Cells segmented from 10,000 PBS images from patients diagnosed with AML. | 64 × 64 pixels | Kaggle |
| 5. | CLL | N/A | 113 | Contains 5400 images of 3 types of malignant lymphomas, and 113 images for CLL. | N/A | Kaggle |
| 6. | CML | Microscopic | 623 | Contains 623 microscopic images of CML, taken with a smartphone camera. | N/A | Raabindata |
| 7. | ALL, AML, CLL, CML, H | Microscopic | 20,000 | Contains 20,000 microscopic images of ALL, AML, CLL, CML, and H (healthy). | N/A | [22] |

| Leukemia Type | Images | Training (80%) | Validation (10%) | Testing (10%) | Image Size | Image Format |
|---|---|---|---|---|---|---|
| ALL | 35,814 | 28,651 | 3581 | 3582 | (450, 450), (600, 600), (224, 224), (512, 512), (640, 640) | BMP, JPEG |
| AML | 10,000 | 8000 | 1000 | 1000 | (64, 64) | TIFF |
| CLL | 113 | 90 | 11 | 12 | (1388, 1040), (512, 512) | TIFF, JPEG |
| CML | 623 | 498 | 62 | 63 | (4160, 3120) | JPEG |
| ALL, AML, CLL, CML, H | 20,000 | 16,000 | 2000 | 2000 | (1024, 768) | JPEG |
| Total: | 66,550 | 53,239 | 6654 | 6657 | | |
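The per-row counts in the split table above are consistent with a simple rounding rule: take 80% and 10% of the images rounded down for training and validation, and assign the remainder to testing. The sketch below encodes that rule; it is an inference from the published counts rather than a documented procedure, and it ignores the per-class stratification that would be applied within each dataset. The `counts` keys are illustrative labels.

```python
def split_sizes(n_images):
    """Return (train, val, test) counts for an 80/10/10 split.

    Assumption: train and validation sizes are rounded down and the
    remainder goes to the test split.
    """
    train = n_images * 8 // 10  # integer arithmetic avoids float rounding
    val = n_images // 10
    test = n_images - train - val
    return train, val, test


# Image totals taken from the table; keys are illustrative labels.
counts = {"ALL": 35814, "AML": 10000, "CLL": 113, "CML": 623, "Five-class": 20000}
sizes = {name: split_sizes(n) for name, n in counts.items()}
```

For example, `sizes["CLL"]` gives `(90, 11, 12)`, matching the CLL row.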

| Ref. | Images | Dataset/Repository | Type of Leukemia | Method | XAI | Accuracy |
|---|---|---|---|---|---|---|
| [2] | 889 | ALL-IDB; private dataset (AML, CLL, CML) | ALL, AML, CLL, CML | VGG, Xception, InceptionResV2, DenseNet, ResNet with RF and XGBoost | - | 100% (ALL); 97.08% (AML, CLL, CML) |
| [7] | 1250 | AIIMS Patna | ALL, AML, CLL, CML | VGG16 with SVM | Grad-CAM | 84% |
| [8] | 889 | ALL-IDB; private dataset (AML, CLL, CML) | ALL, AML, CLL, CML | VGG-16 and InceptionV3 | LIME | 83.33% (ALL); 100% (AML, CLL, CML) |

| Model | Augmentation | Train Acc. | Train Loss | Train F1 | Val Acc. | Val Loss | Val F1 | Test Acc. | Test Loss | Test F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Xception | AugMix | 0.7372 | 0.5607 | 0.6667 | 0.8085 | 0.4230 | 0.7591 | 0.8134 | 0.4275 | 0.7698 |
| Xception | MixUp | 0.8194 | 0.4244 | 0.7648 | 0.8574 | 0.3367 | 0.8268 | 0.8610 | 0.3613 | 0.8332 |
| Xception | RandAug | 0.7079 | 0.5842 | 0.6272 | 0.7965 | 0.4595 | 0.7568 | 0.7706 | 0.4616 | 0.7290 |
| Xception | CutMix | 0.7191 | 0.5779 | 0.6305 | 0.8301 | 0.4060 | 0.7808 | 0.8185 | 0.4112 | 0.7596 |
| VGG16 | AugMix | 0.7242 | 0.5872 | 0.6532 | 0.7853 | 0.4834 | 0.7012 | 0.7789 | 0.4773 | 0.6958 |
| VGG16 | MixUp | 0.7511 | 0.5600 | 0.6800 | 0.8045 | 0.4514 | 0.7548 | 0.7919 | 0.4577 | 0.7361 |
| VGG16 | RandAug | 0.7237 | 0.5692 | 0.6353 | 0.7869 | 0.4875 | 0.7395 | 0.7711 | 0.5162 | 0.7204 |
| VGG16 | CutMix | 0.7146 | 0.5904 | 0.6104 | 0.7965 | 0.4671 | 0.7433 | 0.7881 | 0.4755 | 0.7296 |
| ResNet50 | AugMix | 0.7134 | 0.5978 | 0.6375 | 0.7732 | 0.5041 | 0.7059 | 0.7648 | 0.4989 | 0.6896 |
| ResNet50 | MixUp | 0.7393 | 0.5647 | 0.6677 | 0.7837 | 0.4792 | 0.7230 | 0.7854 | 0.4805 | 0.7342 |
| ResNet50 | RandAug | 0.6757 | 0.6309 | 0.5189 | 0.6755 | 0.5990 | 0.4055 | 0.6728 | 0.5969 | 0.4041 |
| ResNet50 | CutMix | 0.6943 | 0.6220 | 0.5978 | 0.7804 | 0.4910 | 0.7210 | 0.7812 | 0.4982 | 0.7230 |
| MobileNetV2 | AugMix | 0.7386 | 0.5485 | 0.6742 | 0.7997 | 0.4480 | 0.7405 | 0.8109 | 0.4353 | 0.7518 |
| MobileNetV2 | MixUp | 0.8038 | 0.4635 | 0.7421 | 0.8165 | 0.4086 | 0.7740 | 0.8324 | 0.3965 | 0.7937 |
| MobileNetV2 | RandAug | 0.7080 | 0.5946 | 0.6358 | 0.7788 | 0.4785 | 0.7058 | 0.7837 | 0.4703 | 0.7105 |
| MobileNetV2 | CutMix | 0.7096 | 0.5929 | 0.6121 | 0.7989 | 0.4534 | 0.7422 | 0.8162 | 0.4414 | 0.7693 |
| InceptionV3 | AugMix | 0.7388 | 0.5447 | 0.6708 | 0.7917 | 0.4658 | 0.7377 | 0.7874 | 0.4744 | 0.7371 |
| InceptionV3 | MixUp | 0.7876 | 0.4839 | 0.7354 | 0.8213 | 0.4148 | 0.7857 | 0.8080 | 0.4256 | 0.7705 |
| InceptionV3 | RandAug | 0.7164 | 0.5732 | 0.6314 | 0.7612 | 0.4938 | 0.7072 | 0.7556 | 0.5236 | 0.7023 |
| InceptionV3 | CutMix | 0.6987 | 0.6060 | 0.6043 | 0.8061 | 0.4448 | 0.7478 | 0.8006 | 0.4548 | 0.7397 |
| DenseNet121 | AugMix | 0.7597 | 0.5124 | 0.7031 | 0.8237 | 0.4050 | 0.7827 | 0.8303 | 0.4066 | 0.7882 |
| DenseNet121 | MixUp | 0.8245 | 0.4343 | 0.7702 | 0.8518 | 0.3435 | 0.8222 | 0.8494 | 0.3437 | 0.8159 |
| DenseNet121 | RandAug | 0.7546 | 0.5230 | 0.6898 | 0.8221 | 0.4041 | 0.7808 | 0.7903 | 0.4670 | 0.7460 |
| DenseNet121 | CutMix | 0.7415 | 0.5527 | 0.6576 | 0.8462 | 0.3748 | 0.8063 | 0.8420 | 0.3870 | 0.8017 |
| CNN | AugMix | 0.7763 | 0.4865 | 0.7155 | 0.7652 | 0.4873 | 0.7521 | 0.7910 | 0.4775 | 0.7718 |
| CNN | MixUp | 0.8530 | 0.3904 | 0.7997 | 0.8293 | 0.3722 | 0.8179 | 0.8178 | 0.3858 | 0.8064 |
| CNN | RandAug | 0.7685 | 0.5081 | 0.6852 | 0.7139 | 0.5307 | 0.7078 | 0.7825 | 0.4952 | 0.7643 |
| CNN | CutMix | 0.7984 | 0.4842 | 0.7259 | 0.8838 | 0.3104 | 0.8698 | 0.8895 | 0.3168 | 0.8775 |

| Model | Augmentation | Train Acc. | Train Loss | Train F1 | Val Acc. | Val Loss | Val F1 | Test Acc. | Test Loss | Test F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Xception | AugMix | 0.3798 | 1.2851 | 0.3778 | 0.4375 | 1.2137 | 0.2897 | 0.3998 | 1.1012 | 0.1939 |
| Xception | MixUp | 0.7115 | 0.7518 | 0.6867 | 0.5625 | 1.1341 | 0.5369 | 0.5526 | 1.0876 | 0.5463 |
| Xception | RandAug | 0.3796 | 1.3432 | 0.3756 | 0.3438 | 1.2161 | 0.1746 | 0.4420 | 1.0829 | 0.2776 |
| Xception | CutMix | 0.4618 | 1.2466 | 0.4459 | 0.5312 | 1.0242 | 0.5106 | 0.4107 | 1.1147 | 0.4070 |
| VGG16 | AugMix | 0.4637 | 1.2660 | 0.4558 | 0.4062 | 1.3110 | 0.1926 | 0.4261 | 1.0649 | 0.2479 |
| VGG16 | MixUp | 0.4735 | 1.1503 | 0.4562 | 0.5000 | 1.0240 | 0.4362 | 0.4633 | 1.0887 | 0.4439 |
| VGG16 | RandAug | 0.3570 | 1.4211 | 0.3555 | 0.2812 | 1.2176 | 0.2046 | 0.3368 | 1.1460 | 0.1916 |
| VGG16 | CutMix | 0.4771 | 1.2294 | 0.4618 | 0.6250 | 0.9506 | 0.5842 | 0.4945 | 0.9722 | 0.4908 |
| ResNet50 | AugMix | 0.4849 | 1.1454 | 0.4789 | 0.5000 | 1.0868 | 0.3810 | 0.3318 | 1.1590 | 0.1657 |
| ResNet50 | MixUp | 0.4794 | 1.2549 | 0.4717 | 0.4688 | 1.0313 | 0.3810 | 0.4896 | 1.0116 | 0.4273 |
| ResNet50 | RandAug | 0.3991 | 1.2831 | 0.3961 | 0.4375 | 1.0661 | 0.3206 | 0.3001 | 1.1364 | 0.1538 |
| ResNet50 | CutMix | 0.3689 | 1.5108 | 0.3515 | 0.4062 | 1.1547 | 0.3632 | 0.4633 | 0.9963 | 0.4008 |
| MobileNetV2 | AugMix | 0.4538 | 1.1880 | 0.4525 | 0.5000 | 1.0580 | 0.5030 | 0.5947 | 0.9422 | 0.5838 |
| MobileNetV2 | MixUp | 0.6267 | 0.8571 | 0.6284 | 0.4375 | 1.1723 | 0.4419 | 0.5317 | 0.9664 | 0.5324 |
| MobileNetV2 | RandAug | 0.4386 | 1.1665 | 0.4179 | 0.5312 | 0.9977 | 0.4590 | 0.4370 | 0.9773 | 0.4176 |
| MobileNetV2 | CutMix | 0.5260 | 1.1346 | 0.5129 | 0.3750 | 1.0690 | 0.3604 | 0.4896 | 0.9406 | 0.4860 |
| InceptionV3 | AugMix | 0.4191 | 1.3365 | 0.4171 | 0.3125 | 1.2287 | 0.2874 | 0.3844 | 1.0467 | 0.3771 |
| InceptionV3 | MixUp | 0.6946 | 0.8153 | 0.6542 | 0.5625 | 0.9258 | 0.5160 | 0.4425 | 1.0506 | 0.4145 |
| InceptionV3 | RandAug | 0.4676 | 1.3531 | 0.4532 | 0.4062 | 1.1455 | 0.4040 | 0.5422 | 1.0297 | 0.5453 |
| InceptionV3 | CutMix | 0.4571 | 1.2502 | 0.4319 | 0.4375 | 1.1371 | 0.4307 | 0.5000 | 1.0810 | 0.4939 |
| DenseNet121 | AugMix | 0.5025 | 1.1260 | 0.4964 | 0.4688 | 1.1029 | 0.3702 | 0.5000 | 1.0766 | 0.4053 |
| DenseNet121 | MixUp | 0.6982 | 0.7466 | 0.6713 | 0.6250 | 0.7675 | 0.6056 | 0.6210 | 0.7763 | 0.6220 |
| DenseNet121 | RandAug | 0.4066 | 1.3358 | 0.4048 | 0.3750 | 1.1619 | 0.3075 | 0.4841 | 1.0708 | 0.4172 |
| DenseNet121 | CutMix | 0.4382 | 1.3478 | 0.4228 | 0.4375 | 0.9331 | 0.4035 | 0.4841 | 0.9894 | 0.4774 |
| CNN | AugMix | 0.5855 | 0.8824 | 0.5557 | 0.4375 | 1.0676 | 0.3454 | 0.4524 | 1.1120 | 0.4241 |
| CNN | MixUp | 0.5909 | 0.9263 | 0.5576 | 0.3750 | 1.1183 | 0.1818 | 0.2684 | 1.3455 | 0.1410 |
| CNN | RandAug | 0.3999 | 1.0676 | 0.3737 | 0.2188 | 1.3920 | 0.1197 | 0.4420 | 1.0863 | 0.2043 |
| CNN | CutMix | 0.5893 | 0.9412 | 0.5405 | 0.2188 | 2.1327 | 0.1197 | 0.2842 | 1.2836 | 0.1475 |

Columns marked (1) and (2) refer to the two ALL (Benign, Early, Pre, Pro) datasets.

| Model | Augmentation | (1) Val Acc. | (1) Val Loss | (1) Val F1 | (1) Test Acc. | (1) Test Loss | (1) Test F1 | (2) Val Acc. | (2) Val Loss | (2) Val F1 | (2) Test Acc. | (2) Test Loss | (2) Test F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Xception | AugMix | 0.9531 | 0.1286 | 0.9483 | 0.9295 | 0.1449 | 0.9082 | 0.9869 | 0.0389 | 0.9868 | 0.9839 | 0.0410 | 0.9838 |
| Xception | MixUp | 0.9719 | 0.0898 | 0.9687 | 0.9815 | 0.0845 | 0.9756 | 0.9945 | 0.0402 | 0.9944 | 0.9917 | 0.0497 | 0.9919 |
| Xception | RandAug | 0.9563 | 0.1567 | 0.9544 | 0.9485 | 0.1611 | 0.9437 | 0.9713 | 0.0854 | 0.9712 | 0.9655 | 0.0987 | 0.9647 |
| Xception | CutMix | 0.9719 | 0.1503 | 0.9697 | 0.9891 | 0.1494 | 0.9884 | 0.9869 | 0.1515 | 0.9869 | 0.9850 | 0.1514 | 0.9852 |
| VGG16 | AugMix | 0.9531 | 0.1220 | 0.9467 | 0.9611 | 0.1071 | 0.9505 | 0.9909 | 0.0348 | 0.9908 | 0.9928 | 0.0285 | 0.9928 |
| VGG16 | MixUp | 0.9781 | 0.0729 | 0.9748 | 0.9788 | 0.0700 | 0.9737 | 1.0000 | 0.0224 | 1.0000 | 0.9995 | 0.0222 | 0.9995 |
| VGG16 | RandAug | 0.9406 | 0.1617 | 0.9335 | 0.9659 | 0.1112 | 0.9615 | 0.9945 | 0.0234 | 0.9944 | 0.9924 | 0.0247 | 0.9925 |
| VGG16 | CutMix | 0.9719 | 0.1758 | 0.9689 | 0.9770 | 0.1618 | 0.9732 | 0.9955 | 0.1324 | 0.9955 | 0.9938 | 0.1318 | 0.9938 |
| ResNet50 | AugMix | 0.6969 | 0.8004 | 0.6715 | 0.7301 | 0.7485 | 0.6706 | 0.8286 | 0.4756 | 0.8287 | 0.8297 | 0.4786 | 0.8281 |
| ResNet50 | MixUp | 0.8594 | 0.4534 | 0.8543 | 0.8555 | 0.4603 | 0.8448 | 0.8831 | 0.3624 | 0.8830 | 0.8870 | 0.3615 | 0.8856 |
| ResNet50 | RandAug | 0.7000 | 0.8688 | 0.6122 | 0.6827 | 0.8680 | 0.5903 | 0.7888 | 0.6783 | 0.7889 | 0.7760 | 0.6681 | 0.7735 |
| ResNet50 | CutMix | 0.7656 | 0.6030 | 0.7280 | 0.8009 | 0.5254 | 0.7582 | 0.8760 | 0.4458 | 0.8757 | 0.8761 | 0.4411 | 0.8753 |
| MobileNetV2 | AugMix | 0.9750 | 0.0658 | 0.9723 | 0.9863 | 0.0587 | 0.9821 | 0.9904 | 0.0283 | 0.9903 | 0.9965 | 0.0200 | 0.9965 |
| MobileNetV2 | MixUp | 0.9656 | 0.1126 | 0.9615 | 0.9743 | 0.0912 | 0.9679 | 0.9975 | 0.0271 | 0.9975 | 0.9975 | 0.0282 | 0.9975 |
| MobileNetV2 | RandAug | 0.9719 | 0.1126 | 0.9709 | 0.9812 | 0.1015 | 0.9815 | 0.9834 | 0.0590 | 0.9832 | 0.9808 | 0.0517 | 0.9807 |
| MobileNetV2 | CutMix | 0.9594 | 0.2006 | 0.9574 | 0.9651 | 0.1928 | 0.9582 | 0.9884 | 0.1326 | 0.9884 | 0.9871 | 0.1274 | 0.9869 |
| InceptionV3 | AugMix | 0.9250 | 0.2000 | 0.9206 | 0.9273 | 0.2384 | 0.9152 | 0.9693 | 0.0791 | 0.9688 | 0.9656 | 0.0929 | 0.9649 |
| InceptionV3 | MixUp | 0.9438 | 0.1892 | 0.9430 | 0.9598 | 0.1454 | 0.9554 | 0.9864 | 0.0607 | 0.9862 | 0.9898 | 0.0588 | 0.9898 |
| InceptionV3 | RandAug | 0.8813 | 0.2649 | 0.8688 | 0.9512 | 0.1685 | 0.9474 | 0.9622 | 0.1122 | 0.9617 | 0.9581 | 0.1128 | 0.9581 |
| InceptionV3 | CutMix | 0.9500 | 0.2257 | 0.9489 | 0.9388 | 0.2275 | 0.9364 | 0.9738 | 0.1627 | 0.9737 | 0.9726 | 0.1613 | 0.9724 |
| DenseNet121 | AugMix | 0.9812 | 0.0691 | 0.9779 | 0.9794 | 0.0639 | 0.9745 | 0.9980 | 0.0106 | 0.9980 | 0.9961 | 0.0144 | 0.9962 |
| DenseNet121 | MixUp | 1.0000 | 0.0419 | 1.0000 | 1.0000 | 0.0372 | 1.0000 | 1.0000 | 0.0157 | 1.0000 | 0.9982 | 0.0195 | 0.9982 |
| DenseNet121 | RandAug | 0.9812 | 0.0506 | 0.9798 | 0.9835 | 0.0491 | 0.9819 | 0.9919 | 0.0274 | 0.9919 | 0.9940 | 0.0220 | 0.9940 |
| DenseNet121 | CutMix | 0.9906 | 0.1336 | 0.9886 | 0.9881 | 0.1219 | 0.9862 | 0.9960 | 0.1089 | 0.9960 | 0.9946 | 0.1112 | 0.9946 |
| CNN | AugMix | 0.9750 | 0.1244 | 0.9717 | 0.9842 | 0.1109 | 0.9801 | 0.9904 | 0.0449 | 0.9904 | 0.9890 | 0.0385 | 0.9888 |
| CNN | MixUp | 0.9719 | 0.1221 | 0.9668 | 0.9617 | 0.1282 | 0.9479 | 0.9955 | 0.0607 | 0.9954 | 0.9942 | 0.0502 | 0.9943 |
| CNN | RandAug | 0.7563 | 0.7743 | 0.6189 | 0.8205 | 0.6452 | 0.6713 | 0.9693 | 0.1151 | 0.9692 | 0.9800 | 0.0828 | 0.9799 |
| CNN | CutMix | 0.8625 | 0.4241 | 0.8457 | 0.8994 | 0.3721 | 0.8954 | 0.9763 | 0.1995 | 0.9761 | 0.9729 | 0.1687 | 0.9734 |

| Model | Images | Dataset/Repository | Type of Medical Diagnostics | Method | XAI Technology Used | Accuracy |
|---|---|---|---|---|---|---|
| [2] | 889 | ALL-IDB; private dataset (AML, CLL, CML) | ALL, AML, CLL, CML | VGG, Xception, InceptionResV2, DenseNet, ResNet with RF and XGBoost | - | 100% (ALL); 97.08% (AML, CLL, CML) |
| [7] | 1250 | AIIMS Patna | ALL, AML, CLL, CML | VGG16 with SVM | Grad-CAM | 84% |
| [8] | 889 | ALL-IDB; private dataset (AML, CLL, CML) | ALL, AML, CLL, CML | VGG-16 and InceptionV3 | LIME | 83.33% (ALL); 100% (AML, CLL, CML) |
| Our model | 20,000 | As described in Section 4.1 | ALL, AML, CLL, CML, H | MobileNetV2 with MixUp Augmentation | LIME, Grad-CAM | 97.9% |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Altalhi, S.H.; Alzahrani, S.M. Demystifying Deep Learning Decisions in Leukemia Diagnostics Using Explainable AI. Diagnostics 2026, 16, 212. https://doi.org/10.3390/diagnostics16020212

