Data Leakage in Deep Learning for Alzheimer’s Disease Diagnosis: A Scoping Review of Methodological Rigor and Performance Inflation
Abstract
1. Introduction
1.1. The Promise of Deep Learning
1.2. The Problem of Data Leakage
1.3. Statement of Purpose
2. Materials and Methods
2.1. Registration and Reporting
2.2. Review Framework and Theoretical Model
2.3. Search Strategy
2.4. Eligibility Criteria
2.5. Study Selection and Data Extraction
2.6. Risk Stratification Framework
- Low risk: Studies were categorized as low risk if they provided explicit confirmation of subject-wise data splitting (i.e., ensuring that data from the same participant did not appear in both training and test sets), offered a clear description of their validation methodology, and showed no major indicators of additional methodological concerns. These studies typically employed either independent hold-out test sets or cross-validation procedures that were appropriately structured to avoid leakage.
- Moderate risk: Studies were classified as moderate risk when methodological descriptions were ambiguous or incomplete, precluding a definitive judgment regarding data leakage. Although there was no direct evidence that subject-level contamination had occurred, the lack of transparency, coupled with the presence of one or more methodological concerns (e.g., unclear handling of confounders, incomplete reporting of validation details), limited confidence in the robustness of the results.
- High risk: Studies were deemed high risk when there was clear evidence or strong probability of data leakage. This included the use of slice-wise or region-wise data splitting (where multiple samples from the same individual could be present across training and test sets), the absence of a hold-out or validation set, or the presence of multiple significant methodological deficiencies that would be expected to inflate performance estimates.
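The three categories above can be read as a small decision rule. The following is a minimal sketch of that rule, not the review's actual extraction tool: the `StudyReport` fields, the function name `stratify_risk`, and the choice to treat "multiple" deficiencies as two or more are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyReport:
    """Hypothetical extraction record for one reviewed study."""
    # True = subject-wise split confirmed, False = slice-/region-wise split,
    # None = the paper does not say clearly enough to judge.
    subject_wise_split: Optional[bool]
    validation_described: bool            # validation methodology clearly described
    has_holdout_or_validation_set: bool   # some hold-out or validation set exists
    n_major_concerns: int = 0             # count of other significant deficiencies


def stratify_risk(study: StudyReport) -> str:
    """Map a study onto 'low', 'moderate', or 'high' leakage risk."""
    # High risk: evident leakage, no hold-out/validation set, or multiple
    # significant deficiencies (assumed here to mean two or more).
    if (study.subject_wise_split is False
            or not study.has_holdout_or_validation_set
            or study.n_major_concerns >= 2):
        return "high"
    # Low risk: explicit subject-wise split, clear validation, no major concerns.
    if (study.subject_wise_split is True
            and study.validation_described
            and study.n_major_concerns == 0):
        return "low"
    # Everything else is ambiguous or incomplete reporting.
    return "moderate"


examples = {
    "confirmed subject-wise": StudyReport(True, True, True, 0),
    "unclear reporting": StudyReport(None, False, True, 1),
    "slice-wise split": StudyReport(False, True, True, 0),
}
labels = {name: stratify_risk(s) for name, s in examples.items()}
```

Checking the high-risk conditions first mirrors the framework's logic that clear evidence of leakage overrides otherwise good reporting.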
2.7. Methodological Quality Assessment
- Low risk of data leakage: Studies were examined for the use of appropriate subject-wise splitting across training, validation, and test sets. This procedure prevents inadvertent sharing of images or longitudinal data from the same participant across different partitions, thereby providing a more accurate estimate of generalization performance.
- External validation on independent datasets: We assessed whether studies confirmed model performance on a fully independent cohort (e.g., models trained on ADNI and validated on AIBL or NACC). Such external validation offers a more stringent test of generalizability than internal cross-validation alone.
- Robust confounder control: We evaluated the extent to which analyses accounted for potential sources of bias, including demographic variables (age, sex, education, APOE4 status) and technical factors (scanner type, imaging protocol, and site effects). These considerations are particularly important in multi-center datasets where heterogeneity can spuriously drive classification accuracy.
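To make the first criterion concrete, the sketch below contrasts a subject-wise split with a naive sample-wise split on toy data. It is an illustration under stated assumptions rather than code from any reviewed study: the participant IDs, the function names `subject_wise_split` and `shared_subjects`, and the 20% test fraction are all hypothetical. In practice, scikit-learn's `GroupShuffleSplit` or `GroupKFold` provides the same grouping behaviour.

```python
import random


def subject_wise_split(subject_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) such that no subject spans both sets."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    # Assign whole subjects, not individual samples, to the test set.
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


def shared_subjects(subject_ids, train_idx, test_idx):
    """Subjects that contribute samples to both partitions (leakage check)."""
    train_subjects = {subject_ids[i] for i in train_idx}
    test_subjects = {subject_ids[i] for i in test_idx}
    return train_subjects & test_subjects


# Toy data: 5 hypothetical participants with 4 image slices each.
subject_ids = [f"sub-{p:02d}" for p in range(5) for _ in range(4)]

# Subject-wise split: one whole participant is held out.
train_idx, test_idx = subject_wise_split(subject_ids, test_fraction=0.2, seed=0)
leak_free = shared_subjects(subject_ids, train_idx, test_idx)

# Naive sample-wise split for contrast: every fifth slice goes to the test set,
# so most participants end up on both sides of the split.
naive_test = [i for i in range(len(subject_ids)) if i % 5 == 0]
naive_train = [i for i in range(len(subject_ids)) if i % 5 != 0]
leaky = shared_subjects(subject_ids, naive_train, naive_test)

print("subject-wise overlap:", sorted(leak_free))
print("sample-wise overlap: ", sorted(leaky))
```

Running an overlap check like `shared_subjects` on the final partitions is a cheap safeguard, and reporting its result is one way a study could provide the explicit confirmation that the low-risk category requires.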
2.8. Scope of Methodological Assessment
3. Results
3.1. Study Selection and Characteristics
3.2. The Evidence for Data Leakage
3.3. Data Modalities and Architectural Approaches
3.4. Systematic Methodological Failures
3.5. Interpretability Methods and the Validation Gap
3.6. Temporal Trends and Improvement
4. Discussion
Supplementary Materials
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

Risk Category | N (%) | Accuracy Range | Accuracy Mean ± SD | AUC Range | External Validation |
---|---|---|---|---|---|
Low Risk | 27 (61.4%) | 66–90% | 78.5% ± 7.2% | 0.75–0.93 | 5/27 (18.5%) |
Moderate Risk | 11 (25.0%) | 85–96% | 91.3% ± 4.1% | 0.89–0.97 | 2/11 (18.2%) |
High Risk | 6 (13.6%) | 95–99% | 97.1% ± 1.8% | 0.96–0.99 | 0/6 (0%) |
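
The figures in the risk-category table above can be checked with a few lines of arithmetic. The sketch below only recomputes values from the table's own counts and means; the variable names are illustrative, and the accuracy gap is a descriptive difference between category means, not an estimate of how much leakage inflates any individual study.

```python
# (number of studies, mean accuracy in %) per risk category, copied from the table.
table = {
    "Low Risk": (27, 78.5),
    "Moderate Risk": (11, 91.3),
    "High Risk": (6, 97.1),
}

# Total number of studies across the three categories.
total = sum(n for n, _ in table.values())

# Share of studies per category, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, (n, _) in table.items()}

# Difference in mean reported accuracy between high- and low-risk studies,
# in percentage points.
gap = round(table["High Risk"][1] - table["Low Risk"][1], 1)

print(total)   # 44
print(shares)  # {'Low Risk': 61.4, 'Moderate Risk': 25.0, 'High Risk': 13.6}
print(gap)     # 18.6
```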

Modality | N (%) | Common Architectures | Mean Accuracy |
---|---|---|---|
sMRI only | 31 (70.5%) | 3D-CNN, ResNet | 84.2% |
Multimodal | 11 (25.0%) | Fusion networks | 87.6% |
EEG | 4 (9.1%) | CNN, LSTM | 91.3% |
PET | 4 (9.1%) | 3D-CNN | 85.5% |
Novel * | 4 (9.1%) | Transformers | 88.9% |

Method | N (%) | Technical Implementation | Clinical Validation |
---|---|---|---|
Grad-CAM | 8 (18.2%) | Heatmap generation | 1/8 (12.5%) |
Attention | 8 (18.2%) | Weight visualization | 0/8 (0%) |
SHAP | 2 (4.5%) | Feature importance | 1/2 (50%) |
LRP | 3 (6.8%) | Relevance propagation | 0/3 (0%) |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Young, V.M.; Gates, S.; Garcia, L.Y.; Salardini, A. Data Leakage in Deep Learning for Alzheimer’s Disease Diagnosis: A Scoping Review of Methodological Rigor and Performance Inflation. Diagnostics 2025, 15, 2348. https://doi.org/10.3390/diagnostics15182348