Vision Transformer-Based Identification for Early Alzheimer’s Disease and Mild Cognitive Impairment
Abstract
1. Introduction
- (1) We present Vi-ADiM, a framework that mitigates data scarcity by integrating cross-domain feature adaptation with task-driven augmentation. This approach enables robust generalization on small-scale MRI datasets and establishes a data-efficient paradigm for automated diagnosis.
- (2) We devise a Two-Stage Encoding Optimization Strategy that reconfigures the Transformer architecture to mitigate parameter redundancy. By reducing encoder depth and adopting a dual-optimizer schedule that transitions from SGDM to AdamW (a minimal training sketch follows this list), the method balances model complexity against the limited available data and prevents overfitting.
- (3) We introduce a global-local interpretability mechanism that combines Grad-CAM++ and SHAP (an illustrative sketch also follows this list), validating that the model’s decision logic aligns with distinct pathological biomarkers and thereby fostering clinical trust.
- (4) Our structural optimizations strike a favorable balance between efficiency and performance. Compared with the ViT-Base baseline, the proposed method reduces parameters by 49.0% and FLOPs by 49.7% while improving diagnostic accuracy, making Vi-ADiM well suited to resource-constrained clinical deployment.
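To make the dual-optimizer schedule in contribution (2) concrete, the following is a minimal PyTorch sketch of a two-stage run that switches from SGDM to AdamW. The stand-in model, learning rates, and epoch counts are illustrative assumptions; the paper's actual configuration is given in Section 4.1.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Stand-in classifier and random data; the truncated-depth ViT and the ADNI
# slice dataset from the paper would take their places in the real pipeline.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 3))
loader = DataLoader(
    TensorDataset(torch.randn(16, 3, 224, 224), torch.randint(0, 3, (16,))),
    batch_size=8,
)
criterion = nn.CrossEntropyLoss()

def run_stage(optimizer, epochs):
    """Train for one stage under the given optimizer."""
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            criterion(model(images), labels).backward()
            optimizer.step()

# Stage 1: SGD with momentum (SGDM) for coarse adaptation.
run_stage(optim.SGD(model.parameters(), lr=1e-2, momentum=0.9), epochs=2)
# Stage 2: AdamW (decoupled weight decay) for fine-grained convergence.
run_stage(optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2), epochs=2)
```

Handing the same parameters to a fresh optimizer at the stage boundary is the simplest way to realize such a schedule; optimizer state (momentum buffers, Adam moments) starts cold in stage two.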
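Likewise, for contribution (3), the sketch below shows one plausible way to combine Grad-CAM++ (global view) and SHAP (local view) on a ViT classifier. It assumes the third-party `grad-cam` and `shap` packages plus a `timm` backbone; the target layer, class index, and random tensors are placeholders, not the configuration of Section 3.3.

```python
import timm
import torch
import shap
from pytorch_grad_cam import GradCAMPlusPlus
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

# Stand-ins: a ViT classifier with three outputs (AD / MCI / NC) and random
# tensors in place of preprocessed MRI slices.
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=3)
model.eval()
images = torch.randn(12, 3, 224, 224)

def vit_reshape(tensor, height=14, width=14):
    # Drop the class token and fold the 196-patch sequence back into a
    # 14 x 14 grid so the CAM can be overlaid on the input slice.
    result = tensor[:, 1:, :].reshape(tensor.size(0), height, width, tensor.size(2))
    return result.permute(0, 3, 1, 2)

# Global view: Grad-CAM++ heatmaps taken from the last encoder block.
cam = GradCAMPlusPlus(model=model,
                      target_layers=[model.blocks[-1].norm1],
                      reshape_transform=vit_reshape)
heatmaps = cam(input_tensor=images[:4], targets=[ClassifierOutputTarget(0)] * 4)

# Local view: SHAP attributions estimated against a small background sample.
explainer = shap.GradientExplainer(model, images[:8])
shap_values = explainer.shap_values(images[8:12])
```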
2. Related Work
2.1. Medical Imaging-Based Auxiliary Diagnostic Methods
2.2. Challenges in Training Models with Small Sample Medical Data
2.3. Transformer-Based Diagnosis of Alzheimer’s Disease and Mild Cognitive Impairment
2.4. Application of Interpretability Research in Medical Diagnosis
3. Proposed Method
3.1. Cross-Domain Feature Adaptation and Task-Driven Data Augmentation Strategy
3.1.1. Cross-Domain Feature Adaptation
3.1.2. Task-Driven Data Augmentation Strategy
3.2. Design of the AD and MCI Auxiliary Diagnosis Model
3.2.1. Preliminary Adaptation of the One-Stage Encoding Module
3.2.2. Two-Stage Fine-Tuning and Optimization
3.3. Integrated Interpretation
4. Analysis of Experimental Results
4.1. Experimental Platform and Parameter Configuration
4.2. Dataset Establishment
4.3. Evaluation Metrics
4.4. Experimental Results
4.4.1. Task-Driven Data Augmentation Comparative Experiment
4.4.2. Comparative Experiment on Fine-Tuning and Optimization of the Encoding Module
- (1) Comparative test of the preliminary adaptation of the one-stage encoding module
4.4.3. Interpretability Analysis
4.4.4. Comparative Experiments of Different Models
5. Discussion
5.1. Synergizing Global-Local Features for High-Precision Diagnosis
5.2. Bridging the Trust Gap via Dual-Perspective Interpretability
5.3. Proposed Clinical Integration Workflow
5.4. Limitations and Critical Analysis
- (1) Data Partitioning and Leakage Risks: Strictly speaking, the current evaluation employed a slice-level random split because of the finite sample size. We recognize that this approach carries an intrinsic risk of data leakage arising from intra-subject anatomical correlation, since adjacent slices from the same subject may land in opposing data splits. Consequently, the reported metrics (e.g., accuracy > 99%) should be interpreted as an upper bound on the model’s ability to characterize morphological features within this specific distribution, rather than a definitive measure of generalization to unseen subjects. Future work will strictly enforce subject-level separation (see the data-splitting sketch following this list) to assess generalization rigorously.
- (2) Dataset Diversity and External Validation: The model’s validation is currently confined to the ADNI cohort. The absence of external multi-center validation leaves its resilience to heterogeneous acquisition protocols (e.g., variations in magnetic field strength and vendor-specific artifacts) unverified, so the model may be overfitted to the domain characteristics of the ADNI dataset.
- (3) Clinical Validation: The proposed "Human-in-the-Loop" workflow remains a theoretical construct. As comparative studies with human clinicians (e.g., inter-rater variability analysis) have not yet been conducted, the practical utility of the model in a prospective decision-support scenario remains a hypothesis that necessitates empirical verification in future clinical trials.
- (4) Two-Dimensional Slice-Based vs. Three-Dimensional Volumetric Analysis: This study relies on 2D slice-based analysis. While this paradigm is computationally efficient for extracting fine-grained texture anomalies, it inherently sacrifices volumetric spatial coherence along the z-axis. This limitation may impede the detection of pathological patterns that depend on inter-slice continuity, a gap that future 3D Vision Transformers could address.
- (5) Parameter Efficiency: Although the Two-Stage Encoding Optimization achieved a 49% reduction in parameters relative to the ViT-Base baseline, the model’s complexity remains higher than that of lightweight CNNs. Further investigation into model quantization and pruning (see the quantization sketch following this list) is needed to enable deployment on resource-constrained edge devices.
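To make the subject-level separation proposed in limitation (1) concrete, here is a minimal sketch using scikit-learn's GroupShuffleSplit; the file names, labels, and subject IDs are hypothetical placeholders, not ADNI data.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical slice table: one row per 2D slice, tagged with the subject it
# came from. Grouping by subject keeps every slice of a subject on the same
# side of the split, closing the leakage path described in (1).
slice_paths = np.array(["s001_a.png", "s001_b.png", "s002_a.png", "s002_b.png"])
labels = np.array([0, 0, 1, 1])
subjects = np.array(["s001", "s001", "s002", "s002"])

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(slice_paths, labels, groups=subjects))

# Sanity check: no subject appears in both splits.
assert set(subjects[train_idx]).isdisjoint(subjects[test_idx])
```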
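And for limitation (5), post-training dynamic quantization is one readily available starting point. The sketch below applies PyTorch's built-in dynamic quantization to a stand-in MLP, not the actual Vi-ADiM weights; Linear layers dominate a ViT's parameter count, so they are the natural target.

```python
import torch
from torch import nn

# Stand-in MLP; in a ViT the Linear layers hold most of the parameters,
# so dynamic quantization naturally targets them.
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))

# Post-training dynamic quantization: weights stored as int8, activations
# quantized on the fly at inference; no retraining or calibration set needed.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 768)
assert quantized(x).shape == (1, 768)  # same interface, ~4x smaller weights
```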
5.5. Future Perspectives
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Jack, C.R.; Andrews, S.J.; Beach, T.G.; Buracchio, T.; Dunn, B.; Graf, A.; Hansson, O.; Ho, C.; Jagust, W.; McDade, E.; et al. Revised criteria for the diagnosis and staging of Alzheimer’s disease. Nat. Med. 2024, 30, 2121–2124.
2. Wilcock, D.M.; Lamb, B.T. The importance of continuing development of novel animal models of Alzheimer’s disease and Alzheimer’s disease and related dementias. Alzheimers Dement. 2024, 20, 5078–5079.
3. Han, K.; Sheng, V.S.; Song, Y.Q.; Liu, Y.; Qiu, C.J.; Ma, S.Q.; Liu, Z. Deep semi-supervised learning for medical image segmentation: A review. Expert Syst. Appl. 2024, 245, 123052.
4. Li, Z.H.; Li, Y.X.; Li, Q.D.; Wang, P.Y.; Guo, D.Z.; Lu, L.; Jin, D.K.; Zhang, Y.; Hong, Q.Q. LViT: Language Meets Vision Transformer in Medical Image Segmentation. IEEE Trans. Med. Imaging 2024, 43, 96–107.
5. Xu, B.; Liu, X.; Gu, W.; Liu, J.; Wang, H. A fine segmentation model of flue-cured tobacco’s main veins based on multi-level-scale features of hybrid fusion. Soft Comput. 2024, 28, 10537–10555.
6. Behera, T.K.; Khan, M.A.; Bakshi, S. Brain MR Image Classification Using Superpixel-Based Deep Transfer Learning. IEEE J. Biomed. Health Inform. 2024, 28, 1218–1227.
7. Alsahafi, Y.S.; Kassem, M.A.; Hosny, K.M. Skin-Net: A novel deep residual network for skin lesions classification using multilevel feature extraction and cross-channel correlation with detection of outlier. J. Big Data 2023, 10, 105.
8. Haq, A.U.; Li, J.P.; Khan, I.; Agbley, B.L.Y.; Ahmad, S.; Uddin, M.I.; Zhou, W.; Khan, S.; Alam, I. DEBCM: Deep Learning-Based Enhanced Breast Invasive Ductal Carcinoma Classification Model in IoMT Healthcare Systems. IEEE J. Biomed. Health Inform. 2024, 28, 1207–1217.
9. Remigio, A.S. IncARMAG: A convolutional neural network with multi-level autoregressive moving average graph convolutional processing framework for medical image classification. Neurocomputing 2025, 617, 129038.
10. Wu, J.; Ma, J.Q.; Xi, H.R.; Li, J.B.; Zhu, J.H. Multi-scale graph harmonies: Unleashing U-Net’s potential for medical image segmentation through contrastive learning. Neural Netw. 2025, 182, 106914.
11. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
12. Xu, B.; Yang, G. Interpretability research of deep learning: A literature survey. Inf. Fusion 2025, 115, 102721.
13. Hsieh, P.J. Determinants of physicians’ intention to use AI-assisted diagnosis: An integrated readiness perspective. Comput. Hum. Behav. 2023, 147, 107868.
14. Kara, O.C.; Xue, J.Q.; Venkatayogi, N.; Mohanraj, T.G.; Hirata, Y.; Ikoma, N.; Atashzar, S.F.; Alambeigi, F. A Smart Handheld Edge Device for On-Site Diagnosis and Classification of Texture and Stiffness of Excised Colorectal Cancer Polyps. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023.
15. Liu, Z.; Yuan, Y.; Zhang, C.; Zhu, Q.; Xu, X.F.; Yuan, M.; Tan, W.J. Hierarchical classification of early microscopic lung nodule based on cascade network. Health Inf. Sci. Syst. 2024, 12, 13.
16. He, Q.Q.; Yang, Q.J.; Xie, M.H. HCTNet: A hybrid CNN-transformer network for breast ultrasound image segmentation. Comput. Biol. Med. 2023, 155, 106629.
17. Al-Fahdawi, S.; Al-Waisy, A.S.; Zeebaree, D.Q.; Qahwaji, R.; Natiq, H.; Mohammed, M.A.; Nedoma, J.; Martinek, R.; Deveci, M. Fundus-DeepNet: Multi-label deep learning classification system for enhanced detection of multiple ocular diseases through data fusion of fundus images. Inf. Fusion 2024, 102, 102059.
18. Odimayo, S.; Olisah, C.C.; Mohammed, K. Structure focused neurodegeneration convolutional neural network for modelling and classification of Alzheimer’s disease. Sci. Rep. 2024, 14, 15270.
19. Erdas, Ç.; Sümer, E.; Kibaroglu, S. Neurodegenerative disease detection and severity prediction using deep learning approaches. Biomed. Signal Process. Control 2021, 70, 103069.
20. Cheriet, M.; Dentamaro, V.; Hamdan, M.; Impedovo, D.; Pirlo, G. Multi-speed transformer network for neurodegenerative disease assessment and activity recognition. Comput. Methods Programs Biomed. 2023, 230, 107344.
21. Özdemir, E.Y.; Özyurt, F. Elasticnet-Based Vision Transformers for early detection of Parkinson’s disease. Biomed. Signal Process. Control 2025, 101, 107198.
22. Rashid, A.H.; Gupta, A.; Gupta, J.; Tanveer, M. Biceph-Net: A Robust and Lightweight Framework for the Diagnosis of Alzheimer’s Disease Using 2D-MRI Scans and Deep Similarity Learning. IEEE J. Biomed. Health Inform. 2023, 27, 1205–1213.
23. He, W.; Zhang, C.; Dai, J.; Liu, L.; Wang, T.; Liu, X.; Jiang, Y.; Li, N.; Xiong, J.; Wang, L.; et al. A statistical deformation model-based data augmentation method for volumetric medical image segmentation. Med. Image Anal. 2024, 91, 102984.
24. Xu, Z.; Tang, J.; Qi, C.; Yao, D.; Liu, C.; Zhan, Y.; Lukasiewicz, T. Cross-domain attention-guided generative data augmentation for medical image analysis with limited data. Comput. Biol. Med. 2024, 168, 107744.
25. Kim, H.E.; Cosa-Linan, A.; Santhanam, N.; Jannesari, M.; Maros, M.E.; Ganslandt, T. Transfer learning for medical image classification: A literature review. BMC Med. Imaging 2022, 22, 69.
26. Kora, P.; Ooi, C.P.; Faust, O.; Raghavendra, U.; Gudigar, A.; Chan, W.Y.; Meenakshi, K.; Swaraja, K.; Plawiak, P.; Rajendra Acharya, U. Transfer learning techniques for medical image analysis: A review. Biocybern. Biomed. Eng. 2022, 42, 79–107.
27. Alzubaidi, L.; Al-Amidie, M.; Al-Asadi, A.; Humaidi, A.J.; Al-Shamma, O.; Fadhel, M.A.; Zhang, J.; Santamaría, J.; Duan, Y. Novel Transfer Learning Approach for Medical Imaging with Limited Labeled Data. Cancers 2021, 13, 1590.
28. Lai, Y.; Cao, A.; Gao, Y.; Shang, J.; Li, Z.; Guo, J. Advancing Efficient Brain Tumor Multi-Class Classification—New Insights from the Vision Mamba Model in Transfer Learning. arXiv 2024, arXiv:2410.21872.
29. Ieracitano, C.; Mammone, N.; Hussain, A.; Morabito, F.C. A Convolutional Neural Network based self-learning approach for classifying neurodegenerative states from EEG signals in dementia. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020.
30. Luo, M.; He, Z.; Cui, H.; Ward, P.; Chen, Y.-P.P. Dual attention based fusion network for MCI Conversion Prediction. Comput. Biol. Med. 2024, 182, 109039.
31. Liu, L.F.; Lyu, J.Y.; Liu, S.Y.; Tang, X.Y.; Chandra, S.S.; Nasrallah, F.A. TriFormer: A Multimodal Transformer Framework for Mild Cognitive Impairment Conversion Prediction. In Proceedings of the 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), Cartagena, Colombia, 18–21 April 2023.
32. Hu, Z.T.; Wang, Z.; Jin, Y.; Hou, W. VGG-TSwinformer: Transformer-based deep learning model for early Alzheimer’s disease prediction. Comput. Methods Programs Biomed. 2023, 229, 107291.
33. Khatri, U.; Kwon, G.R. Diagnosis of Alzheimer’s disease via optimized lightweight convolution-attention and structural MRI. Comput. Biol. Med. 2024, 171, 108116.
34. Chen, J.D.; Wang, Y.; Zeb, A.; Suzauddola, M.D.; Wen, Y.X. Multimodal mixing convolutional neural network and Transformer for Alzheimer’s disease recognition. Expert Syst. Appl. 2025, 259, 125321.
35. Kun, Y.; Chunqing, G.; Yuehui, G. An Optimized LIME Scheme for Medical Low Light Level Image Enhancement. Comput. Intell. Neurosci. 2022, 2022, 9613936.
36. Kamal, M.S.; Dey, N.; Chowdhury, L.; Hasan, S.I.; Santosh, K.C. Explainable AI for Glaucoma Prediction Analysis to Understand Risk Factors in Treatment Planning. IEEE Trans. Instrum. Meas. 2022, 71, 2509209.
37. Deshmukh, S.; Behera, B.K.; Mulay, P.; Ahmed, E.A.; Al-Kuwari, S.; Tiwari, P.; Farouk, A. Explainable quantum clustering method to model medical data. Knowl.-Based Syst. 2023, 267, 110413.
38. Teneggi, J.; Luster, A.; Sulam, J. Fast Hierarchical Games for Image Explanations. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 4494–4503.
39. Tanone, R.; Li, L.H.; Saifullah, S. ViT-CB: Integrating hybrid Vision Transformer and CatBoost to enhanced brain tumor detection with SHAP. Biomed. Signal Process. Control 2025, 100, 107027.
40. Gelir, F.; Akan, T.; Alp, S.; Gecili, E.; Bhuiyan, M.S.; Disbrow, E.A.; Conrad, S.A.; Vanchiere, J.A.; Kevil, C.G.; Bhuiyan, M.A.N.; et al. Machine Learning Approaches for Predicting Progression to Alzheimer’s Disease in Patients with Mild Cognitive Impairment. J. Med. Biol. Eng. 2024, 45, 63–83.
41. Yi, F.L.; Yang, H.; Chen, D.R.; Qin, Y.; Han, H.J.; Cui, J.; Bai, W.L.; Ma, Y.F.; Zhang, R.; Yu, H.M. XGBoost-SHAP-based interpretable diagnostic framework for Alzheimer’s disease. BMC Med. Inform. Decis. Mak. 2023, 23, 137.
42. Zhu, Y.H.; Ma, J.B.; Yuan, C.A.; Zhu, X.F. Interpretable learning based Dynamic Graph Convolutional Networks for Alzheimer’s Disease analysis. Inf. Fusion 2022, 77, 53–61.
43. Li, H.X.; Shi, X.S.; Zhu, X.F.; Wang, S.H.; Zhang, Z. FSNet: Dual Interpretable Graph Convolutional Network for Alzheimer’s Disease Analysis. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 7, 15–25.
44. Zhu, Q.; Xu, B.L.; Huang, J.S.; Wang, H.Y.; Xu, R.T.; Shao, W.; Zhang, D.Q. Deep Multimodal Discriminative and Interpretability Network for Alzheimer’s Disease Diagnosis. IEEE Trans. Med. Imaging 2023, 42, 1472–1483.
45. Liu, Y.; Gao, Y.; Yin, W. An Improved Analysis of Stochastic Gradient Descent with Momentum. arXiv 2020, arXiv:2007.07989.
46. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. arXiv 2017, arXiv:1711.05101.
47. Patro, S.; NishaV, M. Early Detection of Alzheimer’s Disease using Image Processing. Int. J. Eng. Res. Technol. 2019, 8, 468–471.
48. Suk, H.-I.; Lee, S.-W.; Shen, D. Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. NeuroImage 2014, 101, 569–582.
49. Saraiva, C.; Praça, C.; Ferreira, R.; Santos, T.; Ferreira, L.; Bernardino, L. Nanoparticle-mediated brain drug delivery: Overcoming blood–brain barrier to treat neurodegenerative diseases. J. Control. Release 2016, 235, 34–47.
50. Helaly, H.A.; Badawy, M.; Haikal, A.Y. Toward deep MRI segmentation for Alzheimer’s disease detection. Neural Comput. Appl. 2022, 34, 1047–1063.
51. Rathore, S.; Habes, M.; Iftikhar, M.A.; Shacklett, A.; Davatzikos, C. A review on neuroimaging-based classification studies and associated feature extraction methods for Alzheimer’s disease and its prodromal stages. NeuroImage 2017, 155, 530–548.
52. Moradi, E.; Pepe, A.; Gaser, C.; Huttunen, H.; Tohka, J. Machine learning framework for early MRI-based Alzheimer’s conversion prediction in MCI subjects. NeuroImage 2015, 104, 398–412.
53. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
54. Szegedy, C.; Wei, L.; Yangqing, J.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015.
55. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
56. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
57. Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019.
58. Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018.
59. Huang, G.; Liu, Z.; Maaten, L.V.D.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
60. Tan, M.X.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019.
61. Tan, M.X.; Le, Q.V. EfficientNetV2: Smaller Models and Faster Training. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021.
62. Xu, J.; Pan, Y.; Pan, X.; Hoi, S.; Yi, Z.; Xu, Z. RegNet: Self-Regulated Network for Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 9562–9567.
63. Mehta, S.; Rastegari, M. MobileViT: Lightweight, General-purpose, and Mobile-friendly Vision Transformer. arXiv 2021, arXiv:2110.02178.
64. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021.
65. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
66. Emily Esther Rani, K.; Baulkani, S. Alzheimer disease classification using optimal clustering based pre-trained SqueezeNet model. Biomed. Signal Process. Control 2025, 100, 107032.
67. Hassan, N.; Miah, A.S.M.; Suzuki, K.; Okuyama, Y.; Shin, J. Stacked CNN-based multichannel attention networks for Alzheimer disease detection. Sci. Rep. 2025, 15, 5815.
68. Huang, H.; Pedrycz, W.; Hirota, K.; Yan, F. A multiview-slice feature fusion network for early diagnosis of Alzheimer’s disease with structural MRI images. Inf. Fusion 2025, 119, 103010.
69. Ul Haq, E.; Yong, Q.; Yuan, Z.; Xu, H.R.; Ul Haq, R. Multimodal fusion diagnosis of the Alzheimer’s disease via lightweight CNN-LSTM model using magnetic resonance imaging (MRI). Biomed. Signal Process. Control 2025, 104, 107545.
70. Bai, T.; Du, M.; Zhang, L.; Ren, L.; Ruan, L.; Yang, Y.; Qian, G.; Meng, Z.; Zhao, L.; Deen, M.J. A novel Alzheimer’s disease detection approach using GAN-based brain slice image enhancement. Neurocomputing 2022, 492, 353–369.

Task-driven data augmentation settings (a torchvision sketch of this pipeline follows the table):

| Perspective | Method | Parameter Settings | Clinical Rationale |
|---|---|---|---|
| Image characteristics | Rotation | ±5° | Preserves anatomical orientation; prevents structural distortion of key brain regions. |
| | Translation | Shift ≤ 0.05 × image size | Simulates minor patient head movement during scanning to enhance positional invariance. |
| | Random cropping | Pad 4 px, then crop to 224 × 224 | Maintains consistent input dimensions while preserving edge semantic integrity. |
| | Gaussian noise | σ = 0.05 | Simulates sensor thermal noise without masking subtle gray/white matter atrophy. |
| Imaging equipment | Random scaling | Scale ∈ [0.9, 1.1] | Adapts the model to varying voxel resolutions across different scanner vendors. |
| | Color jittering | Brightness/contrast/saturation = 0.1 | Mimics signal-intensity variations caused by magnetic-field inhomogeneities. |
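As a non-authoritative rendering of the settings above, the following torchvision sketch folds translation and scaling into a single RandomAffine and adds a custom noise transform (torchvision has no stock additive-Gaussian-noise transform); the transform ordering is an assumption, since the paper's exact pipeline is defined in Section 3.1.2.

```python
import torch
from PIL import Image
from torchvision import transforms

class AddGaussianNoise:
    """Additive Gaussian noise applied to a tensor image."""
    def __init__(self, sigma=0.05):
        self.sigma = sigma

    def __call__(self, tensor):
        return tensor + torch.randn_like(tensor) * self.sigma

train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=5),                      # Rotation: ±5°
    transforms.RandomAffine(degrees=0,
                            translate=(0.05, 0.05),            # Shift ≤ 0.05 × size
                            scale=(0.9, 1.1)),                 # Scale ∈ [0.9, 1.1]
    transforms.RandomCrop(224, padding=4),                     # Pad 4 px, crop 224 × 224
    transforms.ColorJitter(brightness=0.1, contrast=0.1,
                           saturation=0.1),                    # Jitter strength 0.1
    transforms.ToTensor(),
    AddGaussianNoise(sigma=0.05),                              # σ = 0.05
])

slice_img = Image.new("RGB", (224, 224))  # placeholder for a preprocessed MRI slice
augmented = train_transform(slice_img)    # -> tensor of shape (3, 224, 224)
```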
Validation results with and without the task-driven augmentation (Section 4.4.1):

| Augmentation | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) |
|---|---|---|---|---|
| Without | 99.196 ± 0.321 | 99.144 ± 0.447 | 99.122 ± 0.253 | 99.128 ± 0.325 |
| With | 99.224 ± 0.299 | 99.130 ± 0.325 | 99.196 ± 0.194 | 99.160 ± 0.254 |
Preliminary adaptation of the one-stage encoding module: results for different encoder depths (12 layers corresponds to ViT-Base):

| Encoder Layers | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|---|
| 8 | 98.890 ± 0.639 | 98.812 ± 0.778 | 98.780 ± 0.531 | 98.790 ± 0.650 | 57.888 | 11.281 |
| 9 | 99.086 ± 0.442 | 99.022 ± 0.505 | 99.010 ± 0.336 | 99.010 ± 0.417 | 64.976 | 12.677 |
| 10 | 99.162 ± 0.524 | 99.246 ± 0.584 | 99.334 ± 0.312 | 99.288 ± 0.444 | 72.064 | 14.072 |
| 11 | 99.364 ± 0.311 | 99.332 ± 0.389 | 99.300 ± 0.302 | 99.314 ± 0.344 | 79.152 | 15.468 |
| 12 | 99.224 ± 0.299 | 99.130 ± 0.325 | 99.196 ± 0.194 | 99.160 ± 0.254 | 85.649 | 16.863 |
Two-stage fine-tuning and optimization: results for different encoder depths (the 6-layer configuration is the final Vi-ADiM):

| Encoder Layers | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|---|
| 4 | 98.118 ± 0.543 | 98.138 ± 0.590 | 97.840 ± 0.763 | 97.978 ± 0.653 | 29.537 | 5.699 |
| 5 | 99.310 ± 0.177 | 99.300 ± 0.214 | 99.258 ± 0.128 | 99.278 ± 0.151 | 36.624 | 7.095 |
| 6 | 99.640 ± 0.163 | 99.630 ± 0.160 | 99.598 ± 0.188 | 99.610 ± 0.161 | 43.712 | 8.490 |
Per-class precision, recall, and specificity for each cross-validation fold:

| Fold | Class | Precision (%) | Recall (%) | Specificity (%) |
|---|---|---|---|---|
| Fold 1 | AD | 99.703 | 99.703 | 99.917 |
| | NC | 99.537 | 99.537 | 99.820 |
| | MCI | 99.743 | 99.743 | 99.740 |
| Fold 2 | AD | 98.824 | 99.703 | 99.669 |
| | NC | 99.070 | 98.611 | 99.641 |
| | MCI | 99.742 | 99.614 | 99.740 |
| Fold 3 | AD | 100.0 | 99.407 | 100.0 |
| | NC | 98.843 | 98.843 | 99.551 |
| | MCI | 99.358 | 99.614 | 99.350 |
| Fold 4 | AD | 99.703 | 99.703 | 99.917 |
| | NC | 99.768 | 99.537 | 99.910 |
| | MCI | 99.743 | 99.871 | 99.740 |
| Fold 5 | AD | 99.704 | 100.0 | 99.917 |
| | NC | 99.769 | 99.769 | 99.910 |
| | MCI | 100.0 | 99.871 | 100.0 |
Performance on the validation and test sets:

| Dataset | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) |
|---|---|---|---|---|
| Validation set | 99.223 | 99.287 | 98.978 | 99.125 |
| Test set | 99.225 | 99.099 | 99.319 | 99.201 |
| Validation + test set | 99.224 | 99.191 | 99.150 | 99.164 |
Comparison of Vi-ADiM with classical backbones and recently published methods:

| Model | Version | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|---|---|
| VGG | 16 | 94.262 ± 1.435 | 94.206 ± 1.114 | 93.234 ± 1.779 | 93.648 ± 1.472 | 134.273 | 15.466 |
| VGG | 19 | 95.896 ± 1.982 | 95.674 ± 2.391 | 95.224 ± 2.071 | 95.418 ± 2.225 | 139.583 | 19.628 |
| GoogLeNet | - | 91.296 ± 1.276 | 90.504 ± 1.219 | 90.804 ± 1.691 | 90.560 ± 1.217 | 5.977 | 1.582 |
| ResNet | 34 | 99.444 ± 0.289 | 99.422 ± 0.293 | 99.416 ± 0.356 | 99.418 ± 0.287 | 21.286 | 3.678 |
| ResNet | 50 | 99.612 ± 0.159 | 99.672 ± 0.132 | 99.522 ± 0.206 | 99.596 ± 0.158 | 23.514 | 4.132 |
| MobileNetV2 | - | 99.030 ± 0.518 | 98.938 ± 0.607 | 98.972 ± 0.602 | 98.952 ± 0.595 | 2.228 | 0.326 |
| MobileNetV3 | Small | 95.816 ± 0.444 | 95.814 ± 0.742 | 94.798 ± 0.630 | 95.264 ± 0.557 | 1.521 | 0.061 |
| MobileNetV3 | Large | 99.364 ± 0.311 | 99.374 ± 0.254 | 99.234 ± 0.427 | 99.302 ± 0.339 | 4.206 | 0.233 |
| ShuffleNetV2 | 1.0 | 89.272 ± 1.337 | 88.372 ± 1.556 | 87.106 ± 1.285 | 87.536 ± 1.325 | 1.257 | 0.152 |
| ShuffleNetV2 | 2.0 | 98.778 ± 0.206 | 98.706 ± 0.186 | 98.612 ± 0.345 | 98.652 ± 0.187 | 5.351 | 0.596 |
| DenseNet | 121 | 99.308 ± 0.195 | 99.178 ± 0.243 | 99.284 ± 0.134 | 99.232 ± 0.188 | 6.957 | 2.896 |
| EfficientNet | B0 | 98.698 ± 0.255 | 98.440 ± 0.435 | 98.740 ± 0.150 | 98.778 ± 0.557 | 4.011 | 0.412 |
| EfficientNetV2 | Small | 99.306 ± 0.261 | 99.278 ± 0.308 | 99.246 ± 0.331 | 99.260 ± 0.309 | 20.181 | 2.897 |
| RegNet | - | 98.836 ± 0.408 | 98.704 ± 0.483 | 98.734 ± 0.393 | 98.716 ± 0.423 | 2.317 | 0.207 |
| ConvNeXt | Tiny | 98.918 ± 0.432 | 98.848 ± 0.601 | 98.806 ± 0.409 | 98.826 ± 0.489 | 27.801 | 4.455 |
| ConvNeXt | Base | 98.946 ± 0.361 | 99.066 ± 0.204 | 98.750 ± 0.569 | 98.920 ± 0.362 | 87.513 | 15.354 |
| MobileViT | Small | 98.450 ± 0.687 | 98.342 ± 0.904 | 98.166 ± 0.727 | 98.236 ± 0.820 | 4.940 | 1.464 |
| Swin Transformer | Tiny | 99.168 ± 0.501 | 99.208 ± 0.472 | 99.012 ± 0.639 | 99.104 ± 0.529 | 27.498 | 4.371 |
| Swin Transformer | Small | 98.928 ± 0.778 | 98.984 ± 0.778 | 98.796 ± 0.895 | 98.902 ± 0.785 | 48.792 | 8.544 |
| ViT | Base | 99.196 ± 0.321 | 99.144 ± 0.447 | 99.122 ± 0.253 | 99.128 ± 0.325 | 85.649 | 16.863 |
| Pre-trained SqueezeNet [66] | - | 98.3 ± 1.05 | 98.9 ± 1.28 | 98.1 ± 1.45 | 98.6 ± 1.3 | - | - |
| SCCAN [67] | - | 99.58 | 99.58 | 99.58 | 99.66 | - | - |
| HDFE+FEA+MFF [68] | - | 96.14 ± 0.31 | 96.33 ± 0.13 | 95.39 ± 0.42 | 95.74 ± 0.27 | - | - |
| Lightweight CNN-LSTM [69] | - | 92.30 | 92.10 | 92.20 | 92.25 | - | - |
| BSGAN-ADD [70] | - | 98.60 | 98.20 | 99.70 | 99.10 | - | - |
| Vi-ADiM | - | 99.640 ± 0.163 | 99.630 ± 0.160 | 99.598 ± 0.188 | 99.610 ± 0.161 | 43.712 | 8.490 |