Strategies to Improve the Robustness and Generalizability of Deep Learning Segmentation and Classification in Neuroimaging
Abstract
:1. Introduction
2. Methods
- The current state and challenges in model robustness and generalizability.
- Strategies for enhancing and monitoring these attributes.
- Barriers in transitioning models from research into clinical practice.
2.1. Search Strategy
2.2. Selection Criteria
2.3. Data Extraction
3. Strategies for Improving Robustness and Generalizability
3.1. Shared Approaches Improving Both Robustness and Generalizability
3.1.1. Optimization Techniques
3.1.2. Data Augmentation
3.1.3. Ensemble Learning Approaches
3.1.4. Model Architecture
3.2. Robustness Improvement Methods
3.2.1. Adversarial Training
3.2.2. Other Methods
3.3. Generalizability Improvement Methods
3.3.1. Domain Adaptation and Invariant Learning
3.3.2. Model Training Strategies
3.4. Evaluation and Monitoring
3.4.1. Key Performance Metrics and Statistical Results
3.4.2. Computational Complexity Analysis
- Low-complexity models, such as Multilayer Perceptron and basic CNNs, are suitable for small datasets and simple classification tasks.
3.4.3. Cross-Validation Strategies
3.4.4. Validation Framework
3.5. Pros and Cons of Different Robustness and Generalizability Improvement Methods
4. Challenges in Translating Robust and Generalizable Models to Clinical Settings
4.1. Data Quality and Standardization
4.2. Population Variability and Cross-Site Generalization
4.3. Task-Specific Reliability in Segmentation and Classification
5. Ablation Study: Robustness and Generalizability of Intracranial Hemorrhage Segmentation and Classification from Non-Contrast Head CT
6. Discussion
7. Conclusions
Supplementary Materials
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
AI | Artificial intelligence |
ViT | Vision transformers |
CNN | Convolutional neural network |
FGSM | Fast gradient sign method |
PGD | Projected gradient descent |
CW | Carlini and Wagner |
CT | Computed tomography |
MRI | Magnetic resonance imaging |
fMRI | Functional magnetic resonance imaging |
DWI | Diffusion-weighted imaging |
IoU | Intersection over Union |
DSC | Dice–Sørensen coefficient |
HD | Hausdorff distance |
AUC-ROC | Area under the curve of receiver operating characteristic |
ICH | Intracerebral hemorrhage |
IVH | Intraventricular hemorrhage |
PHE | Perihematomal edema |
Sen | Sensitivity |
Acc | Accuracy |
Spec | Specificity |
BraTS | International brain tumor segmentation |
SwinUNETR | Swin UNEt TRansformers |
CV | Cross-validation |
References
- Berson, E.R.; Aboian, M.S.; Malhotra, A.; Payabvash, S. Artificial Intelligence for Neuroimaging and Musculoskeletal Radiology: Overview of Current Commercial Algorithms. Semin. Roentgenol. 2023, 58, 178–183. [Google Scholar] [CrossRef] [PubMed]
- Williams, K.S. Evaluations of artificial intelligence and machine learning algorithms in neurodiagnostics. J. Neurophysiol. 2024, 131, 825–831. [Google Scholar] [CrossRef]
- Fernandez, J.-C.; Mounier, L.; Pachon, C.A. A Model-Based Approach for Robustness Testing. In Proceedings of the IFIP International Conference on Testing of Communicating Systems, Montreal, QC, Canada, 31 May–2 June 2005; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2005; pp. 333–348. [Google Scholar]
- Drenkow, N.; Sani, N.; Shpitser, I.; Unberath, M. A Systematic Review of Robustness in Deep Learning for Computer Vision: Mind thegap? arXiv 2021, arXiv:2112.00639. [Google Scholar]
- Zhu, Z.; Liu, F.; Chrysos, G.; Cevher, V. Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization). In Proceedings of the Advances in Neural Information Processing Systems 35 (NeurIPS 2022), New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
- Freiesleben, T.; Grote, T. Beyond generalization: A theory of robustness in machine learning. Synthese 2023, 202, 109. [Google Scholar] [CrossRef]
- Hendrycks, D.; Gimpel, K. A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Kurakin, A.; Goodfellow, I.J.; Bengio, S. Adversarial examples in the physical world. arXiv 2016, arXiv:1607.02533. [Google Scholar] [CrossRef]
- Kawaguchi, K.; Kaelbling, L.P.; Bengio, Y. Generalization in Deep Learning; Cambridge University Press: Cambridge, UK, 2022. [Google Scholar]
- Neyshabur, B.; Bhojanapalli, S.; McAllester, D.; Srebro, N. Exploring Generalization in Deep Learning. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Nagarajan, V. Explaining generalization in deep learning: Progress and fundamental limits. arXiv 2021. [Google Scholar] [CrossRef]
- Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding deep learning requires rethinking generalization. arXiv 2016. [Google Scholar] [CrossRef]
- Ying, X. An Overview of Overfitting and its Solutions. J. Phys. Conf. Ser. 2019, 1168, 022022. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- He, Y.; Guo, Y.; Lyu, J.; Ma, L.; Tan, H.; Zhang, W.; Ding, G.; Liang, H.; He, J.; Lou, X.; et al. Disorder-Free Data Are All You Need—Inverse Supervised Learning for Broad-Spectrum Head Disorder Detection. NEJM AI 2024, 1, AIoa2300137. [Google Scholar] [CrossRef]
- Ghassemi, M.; Oakden-Rayner, L.; Beam, A.L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 2021, 1, e745–e750. [Google Scholar] [CrossRef] [PubMed]
- Kelly, C.J.; Karthikesalingam, A.; Suleyman, M.; Corrado, G.; King, D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019, 17, 195. [Google Scholar] [CrossRef]
- Recht, B.; Roelofs, R.; Schmidt, L.; Shankar, V. Do ImageNet Classifiers Generalize to ImageNet? In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 5389–5400. [Google Scholar]
- Hendrycks, D.; Dietterich, T. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. arXiv 2019, arXiv:1807.01697. [Google Scholar]
- Barzamini, H.; Rahimi, M.; Shahzad, M.; Alhoori, H. Improving generalizability of ML-enabled software through domain specification. In Proceedings of the 1st International Conference on AI Engineering: Software Engineering for AI, Pittsburgh, PA, USA, 16–17 May 2022; pp. 181–192. [Google Scholar]
- Degtiar, I.; Rose, S. A Review of Generalizability and Transportability. Annu. Rev. Stat. Its Appl. 2023, 10, 501–524. [Google Scholar] [CrossRef]
- Fassia, M.K.; Balasubramanian, A.; Woo, S.; Vargas, H.A.; Hricak, H.; Konukoglu, E.; Becker, A.S. Deep Learning Prostate MRI Segmentation Accuracy and Robustness: A Systematic Review. Radiol. Artif. Intell. 2024, 6, e230138. [Google Scholar] [CrossRef]
- Wang, S.; Veldhuis, R.; Brune, C.; Strisciuglio, N. A Survey on the Robustness of Computer Vision Models against Common Corruptions. arXiv 2023, arXiv:2305.06024. [Google Scholar] [CrossRef]
- Keele, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; School of Computer Science and Mathematics, Keele University: Keele, UK, 2007; pp. 1–2. [Google Scholar]
- Rethlefsen, M.L.; Kirtley, S.; Waffenschmidt, S.; Ayala, A.P.; Moher, D.; Page, M.J.; Koffel, J.B.; Group, P.-S. PRISMA-S: An extension to the PRISMA Statement for Reporting Literature Searches in Systematic Reviews. Syst. Rev. 2021, 10, 39. [Google Scholar] [CrossRef] [PubMed]
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
- Dice, L.R. Measures of the Amount of Ecologic Association Between Species. Ecology 1945, 26, 297–302. [Google Scholar] [CrossRef]
- Ho, Y.; Wookey, S. The Real-World-Weight Cross-Entropy Loss Function: Modeling the Costs of Mislabeling. IEEE Access 2020, 8, 4806–4813. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2015, arXiv:1412.698. [Google Scholar]
- Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288. [Google Scholar] [CrossRef]
- Cortes, C.; Mohri, M.; Rostamizadeh, A. L2 regularization for learning kernels. In Proceedings of the UAI ’09: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–21 June 2009; pp. 109–116. [Google Scholar]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Sergey Ioffe, C.S. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the ICML’15: Proceedings of the 32nd International Conference on International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
- Yao, Y.; Rosasco, L.; Caponnetto, A. On Early Stopping in Gradient Descent Learning. Constr. Approx. 2007, 26, 289–315. [Google Scholar] [CrossRef]
- Birgani, M.T.; Chegeni, N.; Birgani, F.F.; Fatehi, D.; Akbarizadeh, G.; Shams, A. Optimization of Brain Tumor MR Image Classification Accuracy Using Optimal Threshold, PCA and Training ANFIS with Different Repetitions. J. Biomed. Phys. Eng. 2019, 9, 189–198. [Google Scholar]
- Nath, M.K.; Sahambi, J.S. Independent component analysis of functional MRI data. In Proceedings of the TENCON 2008—2008 IEEE Region 10 Conference, Hyderabad, India, 19–21 November 2008; pp. 1–6. [Google Scholar]
- Abdumalikov, S.; Kim, J.; Yoon, Y. Performance Analysis and Improvement of Machine Learning with Various Feature Selection Methods for EEG-Based Emotion Classification. Appl. Sci. 2024, 14, 10511. [Google Scholar] [CrossRef]
- Sadegh-Zadeh, S.A.; Sadeghzadeh, N.; Soleimani, O.; Ghidary, S.S.; Movahedi, S.; Mousavi, S.Y. Comparative analysis of dimensionality reduction techniques for EEG-based emotional state classification. Am. J. Neurodegener. Dis. 2024, 13, 23–33. [Google Scholar] [CrossRef]
- Wang, J.; Perez, L. The Effectiveness of Data Augmentation in Image Classification using Deep Learning. arXiv 2017. [Google Scholar] [CrossRef]
- Hossain, T.; Zhang, M. MGAug: Multimodal Geometric Augmentation in Latent Spaces of Image Deformations. arXiv 2023. [Google Scholar] [CrossRef]
- Ramesh, J.; Dinsdale, N.; Yeung, P.H.; Namburete, A.I. Geometric Transformation Uncertainty for Improving 3D Fetal Brain Pose Prediction from Freehand 2D Ultrasound Videos. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2024, Marrakesh, Morocco, 6–10 October 2024. [Google Scholar]
- Xiao, Y.; Decenciere, E.; Velasco-Forero, S.; Burdin, H.; Bornschlogl, T.; Bernerd, F.; Warrick, E.; Baldeweck, T. A New Color Augmentation Method for Deep Learning Segmentation of Histological Images. In Proceedings of the International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 8–11 April 2019; pp. 886–890. [Google Scholar]
- Akbiyik, M.E. Data Augmentation in Training CNNs: Injecting Noise to Images. arXiv 2023. [Google Scholar] [CrossRef]
- Dai, Y.; Qian, Y.; Lu, F.; Wang, B.; Gu, Z.; Wang, W.; Wan, J.; Zhang, Y. Improving adversarial robustness of medical imaging systems via adding global attention noise. Comput. Biol. Med. 2023, 164, 107251. [Google Scholar] [CrossRef]
- Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random Erasing Data Augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 13001–13008. [Google Scholar]
- Zhang, X.; Liu, C.; Ou, N.; Zeng, X.; Zhuo, Z.; Duan, Y.; Xiong, X.; Yu, Y.; Liu, Z.; Liu, Y.; et al. CarveMix: A Simple Data Augmentation Method for Brain Lesion Segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September–1 October 2021; pp. 196–205. [Google Scholar]
- Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
- Logan, R.; Williams, B.G.; da Silva, M.F.; Indani, A.; Schcolnicov, N.; Ganguly, A.; Miller, S.J. Deep Convolutional Neural Networks with Ensemble Learning and Generative Adversarial Networks for Alzheimer’s Disease Image Data Classification. Front. Aging Neurosci. 2021, 13, 720226. [Google Scholar] [CrossRef] [PubMed]
- Breiman, L. Bias, Variance, and Arcing Classifiers; Statistics Department, University of California at Berkeley: Berkeley, CA, USA, 1996. [Google Scholar]
- Nguyen, D.; Nguyen, H.; Ong, H.; Le, H.; Ha, H.; Duc, N.T.; Ngo, H.T. Ensemble learning using traditional machine learning and deep neural network for diagnosis of Alzheimer’s disease. IBRO Neurosci. Rep. 2022, 13, 255–263. [Google Scholar] [CrossRef]
- Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259. [Google Scholar] [CrossRef]
- Rumala, D.J.; van Ooijen, P.; Rachmadi, R.F.; Sensusiati, A.D.; Purnama, I.K.E. Deep-Stacked Convolutional Neural Networks for Brain Abnormality Classification Based on MRI Images. J. Digit. Imaging. 2023, 36, 1460–1479. [Google Scholar] [CrossRef] [PubMed]
- Hosny, K.M.; Mohammed, M.A.; Salama, R.A.; Elshewey, A.M. Explainable ensemble deep learning-based model for brain tumor detection and classification. Neural Comput. Appl. 2024, 37, 1289–1306. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015. [Google Scholar]
- Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. [Google Scholar]
- Zhao, L.; Wu, Z.; Dai, H.; Liu, Z.; Zhang, T.; Zhu, D.; Liu, T. Embedding Human Brain Function via Transformer. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2022, Singapore, 18–22 September 2022; pp. 366–375. [Google Scholar]
- Zeineldin, R.A.; Karar, M.E.; Elshaer, Z.; Coburger, J.; Wirtz, C.R.; Burgert, O.; Mathis-Ullrich, F. Explainable hybrid vision transformers and convolutional network for multimodal glioma segmentation in brain MRI. Sci. Rep. 2024, 14, 3713. [Google Scholar] [CrossRef]
- Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv 2014, arXiv:1412.6572. [Google Scholar] [CrossRef]
- Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial Machine Learning at Scale. arXiv 2016. [Google Scholar] [CrossRef]
- Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Joel, M.Z.; Umrao, S.; Chang, E.; Choi, R.; Yang, D.X.; Duncan, J.S.; Omuro, A.; Herbst, R.; Krumholz, H.M.; Aneja, S. Using Adversarial Images to Assess the Robustness of Deep Learning Models Trained on Diagnostic Images in Oncology. JCO Clin. Cancer Inform. 2022, 6, e2100170. [Google Scholar] [CrossRef]
- Liu, Z.; Zhang, J.; Jog, V.; Loh, P.-L.; McMillan, A.B. Robustifying Deep Networks for Medical Image Segmentation. J. Digit. Imaging 2021, 34, 1279–1293. [Google Scholar] [CrossRef] [PubMed]
- Villegas-Ch, W.; Jaramillo-Alcázar, A.; Luján-Mora, S. Evaluating the Robustness of Deep Learning Models against Adversarial Attacks: An Analysis with FGSM, PGD and CW. Big Data Cogn. Comput. 2024, 8, 8. [Google Scholar] [CrossRef]
- Carlini, N.; Wagner, D. Towards Evaluating the Robustness of Neural Networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 39–57. [Google Scholar]
- Athalye, A.; Carlini, N.; Wagner, D. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. In Proceedings of the International Conference on Machine Learning Conference, Stockholm, Sweden, 10–15 July 2018. [Google Scholar]
- Tramèr, F.; Kurakin, A.; Papernot, N.; Goodfellow, I.; Boneh, D.; McDaniel, P. Ensemble Adversarial Training: Attacks and Defenses. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Buchheim, C.; Kurtz, J. Min-max-min robustness: A new approach to combinatorial optimization under uncertainty based on multiple solutions. Electron. Notes Discret. Math. 2016, 52, 45–52. [Google Scholar] [CrossRef]
- Esfahani, P.M.; Kuhn, D. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Program. 2018, 171, 115–166. [Google Scholar] [CrossRef]
- Ganin, Y.; Lempitsky, V. Unsupervised Domain Adaptation by Backpropagation. In Proceedings of the 32nd International Conference on Machine Learning, PMLR, Lille, France, 7–9 July 2015; pp. 1180–1189. [Google Scholar]
- Al Khalil, Y.; Ayaz, A.; Lorenz, C.; Weese, J.; Pluim, J.; Breeuwer, M. Multi-modal brain tumor segmentation via conditional synthesis with Fourier domain adaptation. Comput. Med. Imaging Graph. 2024, 112, 102332. [Google Scholar] [CrossRef]
- Gopinath, K.; Hoopes, A.; Alexander, D.C.; Arnold, S.E.; Balbastre, Y.; Billot, B.; Casamitjana, A.; Cheng, Y.; Chua, R.Y.Z.; Edlow, B.L.; et al. Synthetic data in generalizable, learning-based neuroimaging. Imaging Neurosci. 2024, 2, 1–22. [Google Scholar] [CrossRef]
- Adragna, R.; Creager, E.; Madras, D.; Zemel, R. Fairness and Robustness in Invariant Learning: A Case Study in Toxicity Classification. arXiv 2020. [Google Scholar] [CrossRef]
- Yu, W.; Huang, Z.; Zhang, J.; Shan, H. SAN-Net: Learning generalization to unseen sites for stroke lesion segmentation with self-adaptive normalization. Comput. Biol. Med. 2023, 156, 106717. [Google Scholar] [CrossRef]
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? In Proceedings of the NIPS’14: Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QV, USA, 8–13 December 2014; pp. 3320–3328. [Google Scholar]
- Long, M.; Cao, Y.; Wang, J.; Jordan, M. Learning transferable features with deep adaptation networks. In Proceedings of the ICML’15: Proceedings of the 32nd International Conference on International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 97–105. [Google Scholar]
- Taşcı, B. Attention Deep Feature Extraction from Brain MRIs in Explainable Mode: DGXAINet. Diagnostics 2023, 13, 895. [Google Scholar] [CrossRef] [PubMed]
- Krishnapriya, S.; Karuna, Y. Pre-trained deep learning models for brain MRI image classification. Front. Hum. Neurosci. 2023, 17, 1150120. [Google Scholar] [CrossRef] [PubMed]
- Vimala, B.B.; Srinivasan, S.; Mathivanan, S.K.; Mahalakshmi; Jayagopal, P.; Dalu, G.T. Detection and classification of brain tumor using hybrid deep learning models. Heliyon 2023, 13, 23029. [Google Scholar] [CrossRef]
- Seetha, J.; Raja, S.S. Brain Tumor Classification Using Convolutional Neural Networks. Biomed. Pharmacol. J. 2018, 11, 1457. [Google Scholar] [CrossRef]
- Hu, S.Y.; Beers, A.; Chang, K.; Höbel, K.; Campbell, J.P.; Erdogumus, D.; Ioannidis, S.; Dy, J.; Chiang, M.F.; Kalpathy-Cramer, J.; et al. Deep feature transfer between localization and segmentation tasks. arXiv 2018. [Google Scholar] [CrossRef]
- Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710. [Google Scholar]
- Santoro, A.; Bartunov, S.; Botvinick, M.; Wierstra, D.; Lillicrap, T. Meta-learning with memory-augmented neural networks. In Proceedings of the ICML’16: Proceedings of the 33rd International Conference on International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1842–1850. [Google Scholar]
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
- Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated Machine Learning. ACM Trans. Intell. Syst. Technol. 2019, 10, 1–19. [Google Scholar] [CrossRef]
- Sadilek, A.; Liu, L.; Nguyen, D.; Kamruzzaman, M.; Serghiou, S.; Rader, B.; Ingerman, A.; Mellem, S.; Kairouz, P.; Nsoesie, E.O.; et al. Privacy-first health research with federated learning. Npj Digit. Med. 2021, 4, 132. [Google Scholar] [CrossRef]
- Liu, Y.; Lian, L.; Zhang, E.; Xu, L.; Xiao, C.; Zhong, X.; Li, F.; Jiang, B.; Dong, Y.; Ma, L.; et al. Mixed-UNet: Refined class activation mapping for weakly-supervised semantic segmentation with multi-scale inference. Front. Comput. Sci. 2022, 4, 1036934. [Google Scholar] [CrossRef]
- Taha, A.A.; Hanbury, A. Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool. BMC Med. Imaging 2015, 15, 29. [Google Scholar] [CrossRef]
- Wang, Y.; Katsaggelos, A.K.; Wang, X.; Parrish, T.B. A deep symmetry convnet for stroke lesion segmentation. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 111–115. [Google Scholar]
- Rockafellar, R.T.; Wets, R.J.B. Variational Analysis; Springer: Berlin/Heidelberg, Germany, 1998. [Google Scholar]
- Fawcett, T. An introduction to ROC analysis. Pattern Recognit Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
- Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. Int. J. Mach. Learn. Technol. 2011, 2, 37–63. [Google Scholar] [CrossRef]
- Hicks, S.A.; Strümke, I.; Thambawita, V.; Hammou, M.; Riegler, M.A.; Halvorsen, P.; Parasa, S. On evaluation metrics for medical applications of artificial intelligence. Sci. Rep. 2022, 12, 5979. [Google Scholar] [CrossRef] [PubMed]
- Demsar, J. Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
- Wilcoxon, F. Individual Comparisons by Ranking Methods. In Breakthroughs in Statistics; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 1992. [Google Scholar]
- Tibshirani, R.J.; Efron, B. An Introduction to the Bootstrap; Chapman & Hall/CRC: Boca Raton, FL, USA, 1993. [Google Scholar]
- You, S.; Wiest, R.; Reyes, M. SaRF: Saliency regularized feature learning improves MRI sequence classification. Comput. Methods Programs Biomed. 2024, 243, 107867. [Google Scholar] [CrossRef]
- Younis, E.M.; Mahmoud, M.N.; Albarrak, A.M.; Ibrahim, I.A. A Hybrid Deep Learning Model with Data Augmentation to Improve Tumor Classification Using MRI Images. Diagnostics 2024, 14, 2710. [Google Scholar] [CrossRef] [PubMed]
- Zhao, X.; Chen, K.; Wu, G.; Zhang, G.; Zhou, X.; Lv, C.; Wu, S.; Chen, Y.; Xie, G.; Yao, Z. Deep learning shows good reliability for automatic segmentation and volume measurement of brain hemorrhage, intraventricular extension, and peripheral edema. Eur. Radiol. 2020, 31, 5012–5020. [Google Scholar] [CrossRef]
- Kok, Y.E.; Pszczolkowski, S.; Law, Z.K.; Ali, A.; Krishnan, K.; Bath, P.M.; Sprigg, N.; Dineen, R.A.; French, A.P. Semantic Segmentation of Spontaneous Intracerebral Hemorrhage, Intraventricular Hemorrhage, and Associated Edema on CT Images Using Deep Learning. Radiol. Artif. Intell. 2022, 4, e220096. [Google Scholar] [CrossRef]
- Grøvik, E.; Yi, D.; Iv, M.; Tong, E.; Nilsen, L.B.; Latysheva, A.; Saxhaug, C.; Jacobsen, K.D.; Helland, Å.; Emblem, K.E.; et al. Handling missing MRI sequences in deep learning segmentation of brain metastases: A multicenter study. NPJ Digit. Med. 2021, 4, 33. [Google Scholar] [CrossRef]
- Amin, J.; Sharif, M.; Anjum, M.A.; Raza, M.; Bukhari, S.A.C. Convolutional neural network with batch normalization for glioma and stroke lesion detection using MRI. Cogn. Syst. Res. 2020, 59, 304–311. [Google Scholar] [CrossRef]
- Ali, R.R.; Yaacob, N.M.; Alqaryouti, M.H.; Sadeq, A.E.; Doheir, M.; Iqtait, M.; Rachmawanto, E.H.; Sari, C.A.; Yaacob, S.S. Learning Architecture for Brain Tumor Classification Based on Deep Convolutional Neural Network: Classic and ResNet50. Diagnostics 2025, 15, 624. [Google Scholar] [CrossRef]
- Yurtsever, M.; Atay, Y.; Arslan, B.; Sagiroglu, S. Development of brain tumor radiogenomic classification using GAN-based augmentation of MRI slices in the newly released gazi brains dataset. BMC Med. Inform. Decis. Mak. 2024, 24, 285. [Google Scholar] [CrossRef]
- Celik, F.; Celik, K.; Celik, A. Enhancing brain tumor classification through ensemble attention mechanism. Sci. Rep. 2024, 14, 22260. [Google Scholar] [CrossRef]
- Rajput, S.; Kapdi, R.; Roy, M.; Raval, M.S. A triplanar ensemble model for brain tumor segmentation with volumetric multiparametric magnetic resonance images. Healthc. Anal. 2024, 5, 100307. [Google Scholar] [CrossRef]
- Saeed, T.; Khan, M.A.; Hamza, A.; Shabaz, M.; Khan, W.Z.; Alhayan, F.; Jamel, L.; Baili, J. Neuro-XAI: Explainable deep learning framework based on deeplabV3+ and bayesian optimization for segmentation and classification of brain tumor in MRI scans. J. Neurosci. Methods 2024, 410, 110247. [Google Scholar] [CrossRef]
- Hatamizadeh, A.; Nath, V.; Tang, Y.; Yang, D.; Roth, H.R.; Xu, D. Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images. arXiv 2022. [Google Scholar] [CrossRef]
- Van, M.H.; Carey, A.N.; Wu, X. Robust Influence-Based Training Methods for Noisy Brain MRI. In Proceedings of the Advances in Knowledge Discovery and Data Mining, PAKDD 2024, Taipei, Taiwan, 7–10 May 2024; pp. 246–257. [Google Scholar]
- Tran, A.T.; Karam, G.A.; Zeevi, D.; Qureshi, A.I.; Malhotra, A.; Majidi, S.; Murthy, S.B.; Park, S.; Kontos, D.; Falcone, G.J.; et al. Improving the Robustness of Deep-Learning Models in Predicting Hematoma Expansion from Admission Head CT. Am. J. Neuroradiol. 2025, ajnr.A8650. [Google Scholar] [CrossRef]
- Zhou, S.; Cox, C.R.; Lu, H. Improving whole-brain neural decoding of fMRI with domain adaptation. In Proceedings of the International Workshop on Machine Learning in Medical Imaging (MLMI), Shenzhen, China, 13 October 2019; pp. 265–273. [Google Scholar]
- Dong, D.; Fu, G.; Li, J.; Pei, Y.; Chen, Y. An unsupervised domain adaptation brain CT segmentation method across image modalities and diseases. Expert Syst. Appl. 2022, 207, 118016. [Google Scholar] [CrossRef]
- Awang, M.K.; Rashid, J.; Ali, G.; Hamid, M.; Mahmoud, S.F.; Saleh, D.I.; Ahmad, H.I. Classification of Alzheimer disease using DenseNet-201 based on deep transfer learning technique. PLoS ONE 2024, 19, 0304995. [Google Scholar] [CrossRef]
- Albalawi, E.; TR, M.; Thakur, A.; Kumar, V.V.; Gupta, M.; Khan, S.B.; Almusharraf, A. Integrated approach of federated learning with transfer learning for classification and diagnosis of brain tumor. BMC Med. Imaging 2024, 24, 110. [Google Scholar] [CrossRef]
- Nimeshika, G.N.; Subitha, D. Enhancing Alzheimer’s disease classification through split federated learning and GANs for imbalanced datasets. PeerJ Comput. Sci. 2024, 10, e2459. [Google Scholar] [CrossRef]
- Shi, C.; Wang, Y.; Wu, Y.; Chen, S.; Hu, R.; Zhang, M.; Qiu, B.; Wang, X. Self-supervised pretraining improves the performance of classification of task functional magnetic resonance imaging. Front. Neurosci. 2023, 17, 1199312. [Google Scholar] [CrossRef]
- Gryshchuk, V.; Singh, D.; Teipel, S.; Dyrba, M.; ADNI; AIBL; FTLDNI Study Groups. Contrastive Self-supervised Learning for Neurodegenerative Disorder Classification. medRxiv 2024. [Google Scholar] [CrossRef]
- Correia de Verdier, M.; Saluja, R.; Gagnon, L.; LaBella, D.; Baid, U.; Hoda Tahon, N.; Foltyn-Dumitru, M.; Zhang, J.; Alafif, M.; Baig, S.; et al. The 2024 Brain Tumor Segmentation (BraTS) Challenge: Glioma Segmentation on Post-treatment MRI. arXiv 2024, arXiv:2405.18368. [Google Scholar] [CrossRef]
- Jack, C.R., Jr.; Bernstein, M.A.; Fox, N.C.; Thompson, P.; Alexander, G.; Harvey, D.; Borowski, B.; Britson, P.J.; J, L.W.; Ward, C.; et al. The Alzheimer’s Disease Neuroimaging Initiative (ADNI): MRI methods. J. Magn. Reson. Imaging 2008, 27, 685–691. [Google Scholar] [CrossRef]
- Hooper, S.M.; Dunnmon, J.A.; Lungren, M.P.; Mastrodicasa, D.; Rubin, D.L.; Re, C.; Wang, A.; Patel, B.N. Impact of Upstream Medical Image Processing on Downstream Performance of a Head CT Triage Neural Network. Radiol. Artif. Intell. 2021, 3, e200229. [Google Scholar] [CrossRef]
- Hernandez Petzsche, M.R.; de la Rosa, E.; Hanning, U.; Wiest, R.; Valenzuela, W.; Reyes, M.; Meyer, M.; Liew, S.L.; Kofler, F.; Ezhov, I.; et al. ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset. Sci. Data 2022, 9, 762. [Google Scholar] [CrossRef]
- Nickparvar, M. Brain Tumor MRI Dataset. Available online: https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset (accessed on 6 April 2025).
- Chilamkurthy, S.; Ghosh, R.; Tanamala, S.; Biviji, M.; Campeau, N.G.; Venugopal, V.K.; Mahajan, V.; Rao, P.; Warier, P. Deep learning algorithms for detection of critical findings in head CT scans: A retrospective study. Lancet 2018, 392, 2388–2396. [Google Scholar] [CrossRef]
- Sprigg, N.; Flaherty, K.; Appleton, J.P.; Al-Shahi Salman, R.; Bereczki, D.; Beridze, M.; Christensen, H.; Ciccone, A.; Collins, R.; Czlonkowska, A.; et al. Tranexamic acid for hyperacute primary IntraCerebral Haemorrhage (TICH-2): An international randomised, placebo-controlled, phase 3 superiority trial. Lancet 2018, 391, 2107–2115. [Google Scholar] [CrossRef]
- Qureshi, A.I.; Palesch, Y.Y.; Barsan, W.G.; Hanley, D.F.; Hsu, C.Y.; Martin, R.L.; Moy, C.S.; Silbergleit, R.; Steiner, T.; Suarez, J.I.; et al. Intensive Blood-Pressure Lowering in Patients with Acute Cerebral Hemorrhage. N. Engl. J. Med. 2016, 375, 1033–1043. [Google Scholar] [CrossRef]
- Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR), Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, USA, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
- Bibi, N.; Wahid, F.; Ma, Y.; Ali, S.; Abbasi, I.A.; Alkhayyat, A. A Transfer Learning-Based Approach for Brain Tumor Classification. IEEE Access 2024, 12, 111218–111238. [Google Scholar] [CrossRef]
- Qin, C.; Li, B.; Han, B. Fast brain tumor detection using adaptive stochastic gradient descent on shared-memory parallel environment. Eng. Appl. Artif. Intell. 2023, 120, 105816. [Google Scholar] [CrossRef]
- Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the IJCAI’95: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 20–25 August 1995; pp. 1137–1143. [Google Scholar]
- Badža, M.M.; Barjaktarović, M. Classification of Brain Tumors from MRI Images Using a Convolutional Neural Network. Appl. Sci. 2020, 10, 1999. [Google Scholar] [CrossRef]
- Rastogi, D.; Johri, P.; Tiwari, V.; Elngar, A.A. Multi-class classification of brain tumour magnetic resonance images using multi-branch network with inception block and five-fold cross validation deep learning framework. Biomed. Signal Process. Control 2024, 88, 105602. [Google Scholar] [CrossRef]
- Liu, J.; Deng, F.; Yuan, G.; Yang, C.; Song, H.; Luo, L. An Efficient CNN for Radiogenomic Classification of Low-Grade Gliomas on MRI in a Small Dataset. Wirel. Commun. Mob. Comput. 2022, 2022, 8856789. [Google Scholar] [CrossRef]
- Taher, F.; Shoaib, M.R.; Emara, H.M.; Abdelwahab, K.M.; El-Samie, F.E.A.; Haweel, M.T. Efficient framework for brain tumor detection using different deep learning techniques. Front. Public Health 2022, 10, 959667. [Google Scholar] [CrossRef]
- Usman, K.; Rajpoot, K. Brain tumor classification from multi-modality MRI using wavelets and machine learning. Pattern Anal. Appl. 2017, 20, 871–881. [Google Scholar] [CrossRef]
- Allgaier, J.; Pryss, R. Cross-Validation Visualized: A Narrative Guide to Advanced Methods. Mach. Learn. Knowl. Extr. 2024, 6, 1378–1388. [Google Scholar] [CrossRef]
- Pati, S.; Thakur, S.P.; Hamamcı, İ.E.; Baid, U.; Baheti, B.; Bhalerao, M.; Güley, O.; Mouchtaris, S.; Lang, D.; Thermos, S.; et al. GaNDLF: The generally nuanced deep learning framework for scalable end-to-end clinical workflows. Commun. Eng. 2023, 2, 23. [Google Scholar] [CrossRef]
- Marklund, H.; Xie, S.M.; Zhang, M.; Balsubramani, A.; Hu, W.; Yasunaga, M.; Phillips, R.L.; Beery, S.; Leskovec, J.; Kundaje, A.; et al. WILDS: A Benchmark of in-the-Wild Distribution Shifts. arXiv 2020. [Google Scholar] [CrossRef]
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs trained by a two time-scale update rule converge to a local nash equilibrium. In Proceedings of the NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6629–6640. [Google Scholar]
- Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training GANs. In Proceedings of the NIPS’16: Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 2234–2242. [Google Scholar]
- Barati, B.; Erfaninejad, M.; Khanbabaei, H. Evaluation of effect of optimizers and loss functions on prediction accuracy of brain tumor type using a Light neural network. Biomed. Signal Process. Control 2025, 103, 107409. [Google Scholar] [CrossRef]
- Isensee, F.; Jaeger, P.F.; Kohl, S.A.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2021, 18, 203–211. [Google Scholar] [CrossRef]
- Kamnitsas, K.; Ledig, C.; Newcombe, V.F.; Simpson, J.P.; Kane, A.D.; Menon, D.K.; Rueckert, D.; Glocker, B. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 2017, 36, 61–78. [Google Scholar] [CrossRef]
- Liu, S.; Liu, S.; Cai, W.; Che, H.; Pujol, S.; Kikinis, R.; Feng, D.; Fulham, M.J. ADNI. Multimodal neuroimaging feature learning for multiclass diagnosis of Alzheimer’s disease. IEEE Trans. Biomed. Eng. 2015, 62, 1132–1140. [Google Scholar] [CrossRef]
- Balaji, N.S.; Hemachandran, M.; Jansi, R. Precision Brain Tumor Detection Using Integrated Batch Normalization. In Proceedings of the 10th International Conference on Communication and Signal Processing (ICCSP), Melmaruvathur, India, 12–14 April 2024; pp. 438–444. [Google Scholar]
- Alnowami, M.; Taha, E.; Alsebaeai, S.; Anwar, S.M.; Alhawsawi, A. MR image normalization dilemma and the accuracy of brain tumor classification model. J. Radiat. Res. Appl. Sci. 2022, 15, 33–39. [Google Scholar] [CrossRef]
- Mok, T.C.W.; Chung, A.C.S. Learning Data Augmentation for Brain Tumor Segmentation with Coarse-to-Fine Generative Adversarial Networks. In Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Granada, Spain, 16 September 2018; pp. 70–80. [Google Scholar]
- Alsaif, H.; Guesmi, R.; Alshammari, B.M.; Hamrouni, T.; Guesmi, T.; Alzamil, A.; Belguesmi, L. A Novel Data Augmentation-Based Brain Tumor Detection Using Convolutional Neural Network. Appl. Sci. 2022, 12, 3773. [Google Scholar] [CrossRef]
- Aurna, N.F.; Abu Yousuf, M.; Abu Taher, K.; Azad, A.; Moni, M.A. A classification of MRI brain tumor based on two stage feature level ensemble of deep CNN models. Comput. Biol. Med. 2022, 146, 105539. [Google Scholar] [CrossRef]
- Cheng, G.; Ji, H. Adversarial Perturbation on MRI Modalities in Brain Tumor Segmentation. IEEE Access 2020, 8, 206009–206015. [Google Scholar] [CrossRef]
- Joel, M.Z.; Avesta, A.; Yang, D.X.; Zhou, J.-G.; Omuro, A.; Herbst, R.S.; Krumholz, H.M.; Aneja, S. Comparing Detection Schemes for Adversarial Images against Deep Learning Models for Cancer Imaging. Cancers 2023, 15, 1548. [Google Scholar] [CrossRef]
- Han, Y.; Yoo, J.; Kim, H.H.; Shin, H.J.; Sung, K.; Ye, J.C. Deep learning with domain adaptation for accelerated projection-reconstruction MR. Magn. Reson. Med. 2018, 80, 1189–1205. [Google Scholar] [CrossRef]
- Dou, Q.; Ouyang, C.; Chen, C.; Chen, H.; Heng, P.-A. Unsupervised Cross-Modality Domain Adaptation of ConvNets for Biomedical Image Segmentations with Adversarial Loss. In Proceedings of the IJCAI’18: Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 691–697. [Google Scholar]
- Deepak, S.; Ameer, P. Brain tumor classification using deep CNN features via transfer learning. Comput. Biol. Med. 2019, 111, 103345. [Google Scholar] [CrossRef]
- Li, H.; Parikh, N.A.; He, L. A Novel Transfer Learning Approach to Enhance Deep Neural Network Classification of Brain Functional Connectomes. Front. Neurosci. 2018, 12, 491. [Google Scholar] [CrossRef]
- Power, J.D.; Barnes, K.A.; Snyder, A.Z.; Schlaggar, B.L.; Petersen, S.E. Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. Neuroimage 2012, 59, 2142–2154. [Google Scholar] [CrossRef]
- Reuter, M.; Rosas, H.D.; Fischl, B. Highly accurate inverse consistent registration: A robust approach. Neuroimage 2010, 53, 1181–1196. [Google Scholar] [CrossRef]
- Song, Y.-H.; Yi, J.-Y.; Noh, Y.; Jang, H.; Seo, S.W.; Na, D.L.; Seong, J.-K. On the reliability of deep learning-based classification for Alzheimer’s disease: Multi-cohorts, multi-vendors, multi-protocols, and head-to-head validation. Front. Neurosci. 2022, 16, 851871. [Google Scholar] [CrossRef]
- Varzandian, A.; Razo, M.A.S.; Sanders, M.R.; Atmakuru, A.; Di Fatta, G.; Biomarkers, T.A.I. Classification-Biased Apparent Brain Age for the Prediction of Alzheimer’s Disease. Front. Neurosci. 2021, 15, 673120. [Google Scholar] [CrossRef]
- Angkurawaranon, S.; Sanorsieng, N.; Unsrisong, K.; Inkeaw, P.; Sripan, P.; Khumrin, P.; Angkurawaranon, C.; Vaniyapong, T.; Chitapanarux, I. A comparison of performance between a deep learning model with residents for localization and classification of intracranial hemorrhage. Sci. Rep. 2023, 12, 9975. [Google Scholar] [CrossRef]
- Do, L.-N.; Baek, B.H.; Kim, S.K.; Yang, H.-J.; Park, I.; Yoon, W. Automatic Assessment of ASPECTS Using Diffusion-Weighted Imaging in Acute Ischemic Stroke Using Recurrent Residual Convolutional Neural Network. Diagnostics 2020, 10, 803. [Google Scholar] [CrossRef]
- Sharma, A.; Singh, P.K.; Chandra, R. SMOTified-GAN for Class Imbalanced Pattern Classification Problems. IEEE Access 2022, 10, 30655–30665. [Google Scholar] [CrossRef]
- Wang, S.; Chen, Z.; You, S.; Wang, B.; Shen, Y.; Lei, B. Brain stroke lesion segmentation using consistent perception generative adversarial network. Neural Comput. Appl. 2022, 34, 8657–8669. [Google Scholar] [CrossRef]
- Wang, C.; Li, Y.; Tsuboshita, Y.; Sakurai, T.; Goto, T.; Yamaguchi, H.; Yamashita, Y.; Sekiguchi, A.; Tachimori, H.; Hisateru Tachimori for the Alzheimer’s Disease Neuroimaging Initiative. A high-generalizability machine learning framework for predicting the progression of Alzheimer’s disease using limited data. NPJ Digit. Med. 2022, 5, 43. [Google Scholar] [CrossRef]
- Lu, B.; Li, H.-X.; Chang, Z.-K.; Li, L.; Chen, N.-X.; Zhu, Z.-C.; Zhou, H.-X.; Li, X.-Y.; Wang, Y.-W.; Cui, S.-X.; et al. A practical Alzheimer’s disease classifier via brain imaging-based deep learning on 85,721 samples. J. Big Data 2022, 1, 101. [Google Scholar] [CrossRef]
- de la Rosa, E.; Reyes, M.; Liew, S.L.; Hutton, A.; Wiest, R.; Kaesmacher, J.; Hanning, U.; Hakim, A.; Zubal, R.; Valenzuela, W.; et al. A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES Challenge. arXiv 2024, arXiv:2403.19425. [Google Scholar] [CrossRef]
- Sheller, M.J.; Edwards, B.; Reina, G.A.; Martin, J.; Pati, S.; Kotrotsou, A.; Milchenko, M.; Xu, W.; Marcus, D.; Colen, R.R.; et al. Federated learning in medicine: Facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 2020, 10, 12598. [Google Scholar] [CrossRef]
- Boudi, A.; He, J.; El Kader, I.A. Enhancing Alzheimer’s Disease Classification with Transfer Learning: Finetuning a Pre-trained Algorithm. Curr. Med. Imaging 2024, 20, e15734056305633. [Google Scholar] [CrossRef]
- Kim, H.J.; Roh, H.G. Imaging in Acute Anterior Circulation Ischemic Stroke: Current and Future. Neurointervention 2022, 17, 2–17. [Google Scholar] [CrossRef]
- Qiu, S.; Joshi, P.S.; Miller, M.I.; Xue, C.; Zhou, X.; Karjadi, C.; Chang, G.H.; Joshi, A.S.; Dwyer, B.; Zhu, S.; et al. Development and validation of an interpretable deep learning framework for Alzheimer’s disease classification. Brain 2020, 143, 1920–1933. [Google Scholar] [CrossRef]
- Balzano, R.F.; Mannatrizio, D.; Castorani, G.; Perri, M.; Pennelli, A.M.; Izzo, R.; Popolizio, T.; Guglielmi, G. Imaging of Cerebral Microbleeds: Primary Patterns and Differential Diagnosis. Curr. Radiol. Rep. 2021, 9, 15. [Google Scholar] [CrossRef]
- Sharrock, M.F.; Mould, W.A.; Hildreth, M.; Ryu, E.P.; Walborn, N.; Awad, I.A.; Hanley, D.F.; Muschelli, J. Bayesian Deep Learning Outperforms Clinical Trial Estimators of Intracerebral and Intraventricular Hemorrhage Volume. J. Neuroimaging 2023, 32, 968–976. [Google Scholar] [CrossRef]
- Pan, D.; Zeng, A.; Jia, L.; Huang, Y.; Frizzell, T.; Song, X. Early Detection of Alzheimer’s Disease Using Magnetic Resonance Imaging: A Novel Approach Combining Convolutional Neural Networks and Ensemble Learning. Front. Neurosci. 2020, 14, 259. [Google Scholar] [CrossRef]
- Yüce, M.; Öztürk, S.; Pamuk, G.G.; Varlık, C.; Cimilli, A.T. Automatic segmentation and volumetric analysis of intracranial hemorrhages in brain CT images. Eur. J. Radiol. 2025, 184, 111952. [Google Scholar] [CrossRef]
- Piao, Z.; Gu, Y.H.; Jin, H.; Yoo, S.J. Intracerebral hemorrhage CT scan image segmentation with HarDNet based transformer. Sci. Rep. 2023, 13, 7208. [Google Scholar] [CrossRef] [PubMed]
- Chang, C.S.; Chang, T.S.; Yan, J.L.; Ko, L. All Attention U-NET for Semantic Segmentation of Intracranial Hemorrhages In Head CT Images. In Proceedings of the IEEE Biomedical Circuits and Systems Conference (BioCAS), Taipei, Taiwan, 13–15 October 2022; pp. 600–604. [Google Scholar]
- Nijiati, M.; Tuersun, A.; Zhang, Y.; Yuan, Q.; Gong, P.; Abulizi, A.; Tuoheti, A.; Abulaiti, A.; Zou, X. A symmetric prior knowledge based deep learning model for intracerebral hemorrhage lesion segmentation. Front. Physiol. 2022, 13, 977427. [Google Scholar] [CrossRef]
- Kiewitz, J.; Aydin, O.U.; Hilbert, A.; Gultom, M.; Nouri, A.; Khalil, A.A.; Vajkoczy, P.; Tanioka, S.; Ishida, F.; Dengler, N.F.; et al. Deep Learning-based Multiclass Segmentation in Aneurysmal Subarachnoid Hemorrhage. Front. Neurol. 2024, 15, 1490216. [Google Scholar] [CrossRef]
- Wu, B.; Xie, Y.; Zhang, Z.; Ge, J.; Yaxley, K.; Bahadir, S.; Wu, Q.; Liu, Y.; To, M.S. BHSD: A 3D Multi-class Brain Hemorrhage Segmentation Dataset. In Proceedings of the Machine Learning in Medical Imaging: 14th International Workshop, MLMI, MICCAI, Vancouver, BC, Canada, 8 October 2023. Proceedings, Part I. [Google Scholar]
- Asif, M.; Shah, M.A.; Khattak, H.A.; Mussadiq, S.; Ahmed, E.; Nasr, E.A.; Rauf, H.T. Intracranial Hemorrhage Detection Using Parallel Deep Convolutional Models and Boosting Mechanism. Diagnostics 2023, 13, 652. [Google Scholar] [CrossRef] [PubMed]
- Umapathy, S.; Murugappan, M.; Bharathi, D.; Thakur, M. Automated Computer-Aided Detection and Classification of Intracranial Hemorrhage Using Ensemble Deep Learning Techniques. Diagnostics 2023, 13, 2987. [Google Scholar] [CrossRef] [PubMed]
- Nizarudeen, S.; Shanmughavel, G.R. Comparative analysis of ResNet, ResNet-SE, and attention-based RaNet for hemorrhage classification in CT images using deep learning. Biomed. Signal Process. Control 2024, 8, 105672. [Google Scholar] [CrossRef]
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.T.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proceedings of the NIPS’20: Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; pp. 9459–9474. [Google Scholar]
- Ayubcha, C.; Sajed, S.; Omara, C.; Veldman, A.B.; Singh, S.B.; Lokesha, Y.U.; Liu, A.; Aziz-Sultan, M.A.; Smith, T.R.; Beam, A. Improved Generalizability in Medical Computer Vision: Hyperbolic Deep Learning in Multi-Modality Neuroimaging. J. Imaging 2024, 10, 319. [Google Scholar] [CrossRef]
- Bao, Q.; Mi, S.; Gang, B.; Yang, W.; Chen, J.; Liao, Q. MDAN: Mirror Difference Aware Network for Brain Stroke Lesion Segmentation. IEEE J. Biomed. Health Inform. 2022, 26, 1628–1639. [Google Scholar] [CrossRef]
- Wu, H.; Chen, X.; Li, P.; Wen, Z. Automatic Symmetry Detection from Brain MRI Based on a 2-Channel Convolutional Neural Network. IEEE Trans. Cybern. 2021, 51, 4464–4475. [Google Scholar] [CrossRef]
Techniques | Studies | Dataset | Performance | Conclusion |
---|---|---|---|---|
Loss Function [27] | Brain hemorrhage (ICH), intraventricular extension (IVH), and peripheral edema (PHE) segmentation [100]. | Huashan Hospital, Fudan University | DSC = 0.92, 0.79, 0.71 and Sen = 0.93, 0.88, 0.81 for ICH, IVH, PHE in segmentation tasks. | DSC loss is essential for segmentation. |
ICH, IVH, PHE segmentation from non-contrast CT [101]. | TICH-2 | Improved average DSC by 0.02 | Focal loss is valuable for class imbalance. | |
Regularization (L1/L2, Dropout) [31,32,33] | Regularized feature learning improves MRI sequence classification [98]. | Swiss-First study | Improvement in mean accuracy by 4.4% (from 0.935 to 0.976), mean AUC by 1.2% (from 0.9851 to 0.9968), and mean F1-score by 20.5% (from 0.767 to 0.924). | Regularization is critical for training and improves robustness. |
Input-level dropout model for brain metastases segmentation [102]. | Oslo University Hospital and Stanford | Improve DSC (0.795 ± 0.104 vs. 0.774 ± 0.104, p = 0.017), and IoU (0.561 ± 0.225 vs. 0.492 ± 0.186, p < 0.001). Tested on 6 datasets. | ||
Batch Normalization [34] | Convolutional neural network with batch normalization for glioma and stroke lesion detection using MRI [103]. | BRATS 2013, 2014, 2015, 2016, 2017 and ISLES 2015. | Improves model convergence and boosts 0.9778 Acc, 0.9754 DSC, 0.9770 Spec, 0.9789 Sen on BRATS dataset 2017 | Dependence on batch size Can increase computational cost but help models achieve higher accuracy and generalization. |
the combination of convolution, batch normalization and ReLU activation enhances the network’s ability to discriminate and capture relevant information [104] | Kaggle (Brain Tumor MRI Dataset) | Improves with an accuracy of 99.88% | ||
Data Augmentation [40] | Data Augmentation improve Tumor Classification Using MRI Images [99]. | Tianjin Medical University General, Nan Fang Hospital, BR35H | Improvement in precision = 0.9951, recall = 0.9947, F1-score = 0.9944, spec = 0.9977. | Essential for improving robustness, especially in limited datasets. |
StyleGANv2-ADA is proposed for augmenting brain MRI slices [105] | Gazi University Faculty of Medicine, BR35H | BraTS 2021 = 75.18%, and Gazi Brains 2020 datasets = 99.36%, BR35H dataset= 98.99% | ||
Ensemble Methods [48,49] | Enhancing brain tumor classification through ensemble attention mechanism [106]. | BraTS 2019 | Improves acc = 0.9894, rrecision = 0.9891, recall = 0.9893, F1-Score = 0.9891, AUC = 0.984 | Effective in improving model reliability for classification and segmentation tasks. |
An optimized triplanar (2.5D) model ensemble to generate accurate segmentation with fewer parameters [107] | BraTS 2020 | Improving Dice with enhancing tumor = 0.713, whole tumor = 0.873, and tumor core = 0.778 | ||
Model Architecture Improvements | DeeplabV3 + Bayesian optimization for segmentation and classification of brain tumor in MRI scans [108]. | Brats 2021 | Improves acc = 97.0%, recall = 0.966, spec = 0.988, F1-Score = 0.96, precision = 0.966 | Advanced architectures such as SwinUNETR and GNNs can improve performance but have a high computational demand. |
Swin transformers for semantic segmentation of brain tumors [109]. | BRATS 2021 | DSC and HD in this approach are better than nnU-Net, SegResNet, TransBTS. | ||
Adversarial Training [60,61,62] | Robust influence-based training methods for noisy brain MRI [110] | BRATS 2017 | Increases robustness, ACC = 89.52 ± 2.61 | Effective for improving robustness but computationally intensive. |
Improving robustness in predicting hematoma expansion [111] | ATACH-2, YALE | AUC = 0.8 is the same but increases robustness | ||
Domain Adaptation [71] | Improving the whole-brain neural decoding of fMRI with domain adaptation [112] | OpenfMRI | The best Acc improvement is 10.47% (from 77.26% to 87.73%) | Highly recommended for multi-site datasets with distribution shifts. |
An unsupervised domain adaptation segmentation model is trained across modalities and diseases [113] | Decathlon medical segmentation challenge, RSNA | +11.55% DSC | ||
Transfer Learning [76,77] | Transfer learning for accurate brain tumor detection [80] | Brain tumor dataset. Figshare | Highest acc of 99.75% | Worth implementing for tasks with limited labeled data, especially in classification. |
Classification of Alzheimer’s disease using DenseNet-201 based on deep transfer learning techniques [114] | AD5C dataset | Acc = 98.24 | ||
Federated Learning [85] | Integrated approach of federated learning with transfer learning for the classification and diagnosis of brain tumors on MRI [115] | Figshare, Br35H, SARTAJ | High precision (0.99 for glioma, 0.95 for meningioma, 1.00 for no tumor, and 0.98 for pituitary), recall, and F1-scores in classification, outperforming existing methods. | Promising multi-institutional collaborations, balancing performance and privacy. |
Enhancing Alzheimer’s disease classification through split federated learning [116] | Kaggle | Acc = 84.53% | ||
Self-Supervised Learning [88] | Improves the performance of classification in task-based functional MRI [117]. | Human Connectome Project | Acc improves to 80.2 ± 4.7% | Reliable but heavily reliant on large, labeled datasets. |
Contrastive self-supervised learning for neurodegenerative disorder classification [118] | Alzheimer’s Disease Neuroimaging Initiative (ADNI), Australian Imaging, Biomarker and Lifestyle Flagship Study of Aging (AIBL), Frontotemporal Lobar Degeneration Neuroimaging Initiative (FTLDNI) | For AD vs. CN, acc= 82% test subset and acc = 80% independent holdout dataset |
Technique | Strengths | Limitations | Implementation Considerations | Examples |
---|---|---|---|---|
Loss function (for example, Dice loss) | Often used for segmentation tasks by directly optimizing the overlap (e.g., the Dice coefficient) between the predicted mask and the ground-truth. | Less sensitive to small structures | Used in conjunction with other losses such as cross entropy for better performance on imbalanced datasets. | [142,143] |
Regularization (L1/L2/Dropout) | Controls model complexity Reduces overfitting Computationally efficient | Uniform penalty across features May oversimplify important patterns Hyperparameter sensitivity | Balance with domain-specific constraints Considers anatomical priors | [144,145] |
Batch Normalization | Stabilizes training Reduces internal covariate shifts Enables higher learning rates | Batch size dependency Memory requirements Inference stability issues | Consider batch size constraints Address multi-site variations | [146,147] |
Data Augmentation | Increases effective dataset size Improves generalization Addresses class imbalance | May introduce unrealistic variations Risk of violating anatomical constraints Computational overhead during training | Ensures clinically plausible transformations Validates augmented samples with experts | [148,149] |
Ensemble Methods | Robust predictions Uncertainty quantification Handles different aspects of data | Increased computational cost Storage requirements Inference time overhead | Balances diversity and accuracy Considers clinical time constraints | [143,150] |
Model architecture improvements | Improved feature extraction: advanced architectures combining CNNs and transformer-based models capture complex patterns in neuroimaging data. Scalability: Modularly designed architectures (e.g., nnU-Net) adapt to different neuroimaging modalities (e.g., MRI, fMRI, PET) Multimodal processing: Models such as multimodal CNNs integrate different types of neuroimaging data, improving robustness Better temporal modeling: attention-based or periodic components efficiently process temporal neuroimaging data such as fMRI and EEG | Increased computational demands, especially for architectures such as transformers and deep CNNs. Potential for overfitting when dealing with small datasets, as seen in neuroimaging. Complex hyperparameter tuning is required for architectures such as attention mechanisms | For segmentation tasks, architectures such as U-Net and its variants (3D U-Net, nnU-Net) are specifically designed for volumetric neuroimaging data Considers Graph Neural Networks (GNNs) for connectivity studies, as they model relationships between brain regions. Uses self-supervised pretraining with architectures like Vision Transformers (ViT) to improve performance on limited labeled data Uses model ensembling or dropout models to reduce overfitting and improve generalization | [109,143] |
Adversarial Training | Improves robustness to perturbations Handles image artifacts Better generalization | Computationally intensive May reduce standard accuracy Complex hyperparameter tuning | Use clinically relevant perturbations Balance robustness and accuracy | [151,152] |
Domain Adaptation | Addresses scanner variations Handles protocol differences Improves cross-site generalization | Requires data from target domain May not capture all domain shifts Complex implementation | Validates on multiple scanner types Considers temporal domain shifts | [153,154] |
Transfer Learning | Leverages knowledge from larger datasets Reduces required training data Accelerates convergence | Source-target domain mismatch can degrade performance May preserve unwanted biases from source domain Requires careful layer-specific fine-tuning | Validates anatomical consistency Adjusts learning rates per layer based on domain similarity | [155,156] |
Authors | Dataset | Results | Augmentation | Optimization | Cross- Validation | Ensemble Learning | Model Architectures |
---|---|---|---|---|---|---|---|
Segmentation (Dice as the main accuracy metric) | |||||||
Murat Yüce [175] | 1508 CTs (QURE500+ RSNA 2019) | IPH = 0.59; IVH = 0.47; EDH = 0.35; SAH = 0.24; SDH = 0.34 | ✔ | ✔ | ✔ | nnUNet | |
Zhegao Piao [176] | 82.636 CTs, test 20% | IPH = 0.809; IVH = 0.742; EDH = 0.777; SAH = 0.545; SDH = 0.709 | ✔ | ✔ | HarDNet based transformer | ||
Chia Shuo Chang [177] | 51 CTs, test 14.5% | IPH = 0.924; IVH = 0.858; EDH = 0.816; SAH = 0.567; SDH = 0.82 | ✔ | ✔ | All Attention U-NET | ||
Mayidili Nijiati [178] | 1157 CTs, test 200 CTs | IPH = 0.784; IVH = 0.680; EDH = 0.359; SAH = 0.337; SDH = 0.534 | ✔ | ✔ | Sym-TransNet | ||
Julia Kiewitz [179] | 73 CTs, test 20 CTs | IPH = 0.743; IVH = 0.750; SAH = 0.686; SDH = 0.758 | ✔ | ✔ | ✔ | nnUnet | |
Biao Wu [180] | 192 CTs BHSD | IPH = 0.54; IVH = 0.51; EDH = 0.48; SAH = 0.215; SDH = 0.1523 | ✔ | ✔ | ✔ | nnUnet | |
Classification (AUC as main outcome accuracy metric) | |||||||
Muhammad Asif [181] | 13,334 CTs (CQ500 + RSNA), test 30% | IPH = 0.979; IVH = 0.977; EDH = 0.980; SAH = 0.976; SDH = 0.974 | ✔ | ✔ | Res-Inc-LGBM | ||
Snekhalatha Umapathy [182] | 133,709 slices (CQ500 + RSNA), test 14,600 slices | IPH = 0.99; IVH = 0.98; EDH = 0.99; SAH = 0.99; SDH = 0.99 | ✔ | ✔ | ✔ | SE-ResNeXT, LSTM | |
Shanu Nizarudeen [183] | CQ500, 10% | IPH = 0.98; IVH = 0.98; EDH = 0.96; SAH = 0.98; SDH = 0.98 | ✔ | ✔ | Attention-based RaNet |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Tran, A.T.; Zeevi, T.; Payabvash, S. Strategies to Improve the Robustness and Generalizability of Deep Learning Segmentation and Classification in Neuroimaging. BioMedInformatics 2025, 5, 20. https://doi.org/10.3390/biomedinformatics5020020
Tran AT, Zeevi T, Payabvash S. Strategies to Improve the Robustness and Generalizability of Deep Learning Segmentation and Classification in Neuroimaging. BioMedInformatics. 2025; 5(2):20. https://doi.org/10.3390/biomedinformatics5020020
Chicago/Turabian StyleTran, Anh T., Tal Zeevi, and Seyedmehdi Payabvash. 2025. "Strategies to Improve the Robustness and Generalizability of Deep Learning Segmentation and Classification in Neuroimaging" BioMedInformatics 5, no. 2: 20. https://doi.org/10.3390/biomedinformatics5020020
APA StyleTran, A. T., Zeevi, T., & Payabvash, S. (2025). Strategies to Improve the Robustness and Generalizability of Deep Learning Segmentation and Classification in Neuroimaging. BioMedInformatics, 5(2), 20. https://doi.org/10.3390/biomedinformatics5020020