Article

Patient Diagnosis Alzheimer’s Disease with Multi-Stage Features Fusion Network and Structural MRI

by Thi My Tien Nguyen 1 and Ngoc Thang Bui 2,3,*
1 Department of Pediatrics of Hospital for Tropical Diseases, Ho Chi Minh City 700000, Vietnam
2 Department of Radiology of Mayo Clinic, Rochester, MN 55901, USA
3 Institute of Engineering of HUTECH University, Ho Chi Minh City 700000, Vietnam
* Author to whom correspondence should be addressed.
J. Dement. Alzheimer's Dis. 2025, 2(4), 35; https://doi.org/10.3390/jdad2040035
Submission received: 13 June 2025 / Revised: 18 August 2025 / Accepted: 5 September 2025 / Published: 1 October 2025

Abstract

Background: Timely intervention and effective management of Alzheimer’s disease (AD) have been shown to limit memory loss and preserve cognitive function and the ability to perform simple activities in older adults. Magnetic resonance imaging (MRI) is one of the most common and effective methods for early detection of AD, and with the rapid development of deep learning (DL) algorithms, DL-based AD detection is now widely applied. Methods: In this research, we developed an AD detection method based on three-dimensional (3D) convolutional neural networks (CNNs) for 3D MRI images, which achieves strong accuracy compared with traditional 3D CNN models. The proposed model has four main blocks, and multi-layer fusion of the features from each block was used to improve the efficiency of the model. The performance of the proposed model was compared with three pre-trained 3D CNN architectures (i.e., 3D ResNet-18, 3D InceptionResNet-v2, and 3D EfficientNet-b2) on both multi-class and binary classification of AD. Results: Our model achieved classification accuracies of 91.4% for binary and 80.6% for multi-class classification on the Open Access Series of Imaging Studies (OASIS) database. Conclusions: These results demonstrate that multi-stage feature fusion in a 3D CNN is an effective way to improve the accuracy of AD diagnosis from 3D MRI, thus enabling earlier and more accurate diagnosis.

1. Introduction

The World Alzheimer Report 2018 estimated that, globally, one person develops dementia every three seconds. In 2018, approximately 50 million individuals were living with dementia, a figure projected to more than triple to 152 million by 2050. The most common causes of dementia include Alzheimer’s disease (AD), vascular dementia (VaD), and dementia with Lewy bodies (DLB) [1]. While both the prevalence and incidence of dementia increase with advancing age, approximately 5% of cases have an onset before the age of 65 [2]. AD is the leading cause of dementia, accounting for 60% to 80% of all cases, and represents one of the primary contributors to physical and mental health decline among the elderly worldwide [3]. Importantly, AD is an irreversible neurodegenerative disorder, and current therapeutic options can only delay disease progression rather than reverse it [4]. Early diagnosis of AD is therefore critical—not only to allow individuals and their families to plan and adapt but also because accurate and timely diagnosis is essential for the development and implementation of future disease-modifying treatments [5,6].
Alzheimer’s disease (AD), a neurodegenerative disorder with an unclear pathological etiology, remains challenging to diagnose definitively through noninvasive clinical methods [7,8,9]. Several established imaging techniques, most notably magnetic resonance imaging (MRI) and positron emission tomography (PET), have been employed in recent years to support the early diagnosis of AD [10,11]. However, PET imaging presents several limitations: while it offers functional insights, its spatial resolution is limited in certain contexts, making precise localization of pathological changes difficult [1]. Additionally, the high cost and limited availability of PET scanners pose significant challenges for widespread clinical and research use. In contrast, structural MRI offers high spatial resolution, widespread availability, and the ability to visualize and quantify gray matter atrophy, making it a powerful tool for tracking AD progression [1,9,11]. As a result, MRI-based approaches are gaining increasing attention in AD research, with significant potential for advancing diagnosis and monitoring strategies on a global scale.
In recent years, the trend has been to use deep learning (DL) and machine learning (ML) techniques, which have produced breakthroughs in medical image analysis [12,13,14]. The application of ML and DL algorithms to AD detection has achieved many impressive results [15]. In previous studies, ML algorithms, including support vector machines (SVMs), K-nearest neighbors (KNN), and decision trees (DTs), achieved promising results in AD diagnosis using structural MRI images [14,16]. In the past decade, convolutional neural networks (CNNs) have shown significant potential in AD diagnosis. In addition, for image classification tasks based on ML/DL, combining different types of features also significantly increases model accuracy [17,18]. The application of CNNs to AD detection from MRI images falls into two main approaches. First, the 3D CNN [19] can automatically learn discriminative features for region of interest (ROI)-based AD diagnosis without manual feature extraction [20]; for example, a 3D CNN can automatically segment the hippocampus and classify AD from structural MRI images. Second, the 2D CNN [1,12,21] classifies AD based on three views of the 3D MRI image. The advantage of the 3D CNN is that it analyzes the 3D MRI volume directly, whereas the 2D CNN must analyze the 3D MRI image from three different views (i.e., axial, coronal, and sagittal) and then synthesize the results [22]. Synthesizing results from three separate views is likely to lead to inaccurate predictions. However, the 2D CNN has a lighter structure than the 3D CNN [9], saving considerable hardware resources and implementation time. Therefore, a lightweight 3D CNN that combines the advantages of both 3D and 2D CNNs for AD detection with 3D MRI images shows great potential. We summarize and compare the most recent studies on 2D CNN- and 3D CNN-based AD classification models in Table 1 below.
Our main motivation for this research was to develop an approach that combines the strengths of both 2D and 3D CNNs. By leveraging the advantages of each, we aim to create a more robust and accurate framework for early diagnosis and classification of Alzheimer’s disease using structural MRI data. This paper describes the following work:
  • We propose a multi-stage feature-fusion 3D CNN model that fuses features extracted across multiple blocks of the CNN to avoid information loss and improve classification accuracy [32].
  • We optimize the model parameters to achieve high computational efficiency at low cost.
  • We compare our proposed approach with other 3D CNN models applied to AD classification [33].
  • We design a preprocessing stage for structural MRI data that removes noise and reconstructs each image into four feature images (i.e., white matter, grey matter, cerebrospinal fluid, and bias correction) to improve image quality and reduce computational complexity [34].
Our paper is structured as follows. Section 2 provides materials and methods. The results obtained from the proposed method and comparison with state-of-the-art methods are presented in Section 3. Section 4 provides a discussion and future research directions. In Section 5, we conclude the paper.

2. Materials and Methods

2.1. The Open-Access Series of Imaging Studies (OASIS) Dataset

In this study, we used the Open Access Series of Imaging Studies (OASIS) database [35,36] for the experiments with the proposed models. The OASIS database includes more than 400 subjects across four patient groups defined by Clinical Dementia Rating (CDR): Non-Demented (CDR = 0), Very Mild Demented (CDR = 0.5), Mild Demented (CDR = 1.0), and Moderate Demented (CDR = 2.0). However, the Moderate Demented (CDR = 2.0) group included only 2 patients, so we excluded it from the experiments. Details of the OASIS database are given in Table 2 below.
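As a minimal illustration of this grouping, the CDR scores might be mapped to class labels and the two CDR = 2.0 subjects dropped as below; the metadata file name and column names are assumptions about the CSV layout, not part of the published pipeline.

```python
# Hypothetical sketch: map OASIS CDR scores to class labels and drop CDR = 2.0.
import pandas as pd

df = pd.read_csv("oasis_cross-sectional.csv")   # assumed metadata file name
df = df[df["CDR"] != 2.0]                       # exclude Moderate Demented (n = 2)
df["label"] = df["CDR"].map({0.0: "normal", 0.5: "very_mild", 1.0: "mild"})
```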

2.2. Data Preprocessing

The OASIS database was processed using SPM12 (Statistical Parametric Mapping v.12) software [34], including realignment, co-registration, and spatial and intensity normalization steps. In addition, the images were segmented into Bias Correction, Gray Matter (GM), White Matter (WM), and Cerebrospinal Fluid (CSF) maps. In total, 464 3D MRI images (of size 91 × 109 × 91 voxels) were preprocessed and normalized to the intensity range (0, 1). Finally, we synthesized the images into a database for evaluating the models. The details of the analysis and synthesis process are given in Figure 1a. Figure 1b–d show examples of MRI images from the 3 patient groups after preprocessing, in 3 different view modes (i.e., axial, coronal, and sagittal).
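For illustration, the final intensity normalization step might look like the following sketch; the file name is hypothetical, and nibabel is assumed as the I/O library (the paper does not specify one).

```python
# Sketch: load one preprocessed 3D MRI volume and min-max scale it to (0, 1).
import numpy as np
import nibabel as nib  # assumed I/O library

def load_volume(path: str) -> np.ndarray:
    vol = nib.load(path).get_fdata().astype(np.float32)  # shape (91, 109, 91)
    vmin, vmax = vol.min(), vol.max()
    vol = (vol - vmin) / (vmax - vmin + 1e-8)            # intensity range (0, 1)
    return vol[..., np.newaxis]                          # add channel -> (91, 109, 91, 1)

x = load_volume("sub-001_preproc.nii.gz")                # hypothetical file name
```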

2.3. Proposed Architecture of Classification Model Based on 3D CNN

Our proposed model processes 3D MRI images of size 91 × 109 × 91 voxels. Since MRI images have a single channel, the input size is (91 × 109 × 91 × 1). The input images are 3D volumes containing the axial, sagittal, and coronal views of the brain. In detail, the proposed model consists of 4 blocks (Blocks 1 to 4). Each block consists of a 3D convolutional layer (3 × 3 × 3 kernel), a 3D max-pooling layer (stride 2 × 2 × 2), and a normalization layer, with a rectified linear unit (ReLU) activation function applied to all convolutional layers. The outputs of the blocks are connected to each other by a fusion block. Several fully connected (FC) layers are then added after the last pooling layer to process the features: each block’s 3D feature map is flattened, and the resulting vectors are concatenated into a single 1D vector that feeds the FC layers. Finally, a softmax layer is attached to the last FC layer to predict the class probabilities (i.e., normal, very mild, and mild), and the whole network is fine-tuned with backpropagation. The model details are depicted in Figure 2 below.
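A minimal Keras sketch of this multi-stage fusion idea is shown below: each block’s output is pooled to a vector, and the four vectors are concatenated before the FC head. The filter counts, the use of global average pooling as the per-block fusion step, and the FC width are illustrative assumptions; Figure 2 shows the exact design.

```python
# Minimal sketch of a multi-stage feature-fusion 3D CNN (assumed hyperparameters).
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Conv3D (3x3x3, ReLU) -> MaxPool3D (stride 2x2x2) -> normalization.
    x = layers.Conv3D(filters, kernel_size=3, padding="same", activation="relu")(x)
    x = layers.MaxPooling3D(pool_size=2, strides=2, padding="same")(x)
    return layers.BatchNormalization()(x)

inputs = tf.keras.Input(shape=(91, 109, 91, 1))                # single-channel 3D MRI
x, stage_features = inputs, []
for filters in (16, 32, 64, 128):                              # assumed filter progression
    x = conv_block(x, filters)
    stage_features.append(layers.GlobalAveragePooling3D()(x))  # per-block feature vector

fused = layers.Concatenate()(stage_features)                   # multi-stage feature fusion
h = layers.Dense(256, activation="relu")(fused)                # FC head (assumed width)
outputs = layers.Dense(3, activation="softmax")(h)             # normal / very mild / mild
model = tf.keras.Model(inputs, outputs)
```

Swapping the final 3-way softmax for a 2-way output gives the binary variant.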

2.4. Evaluation Metrics

This study used a stratified 5-fold cross-validation (CV) technique to evaluate the model. In addition, several evaluation metrics were used to demonstrate the quality of the proposed model: the F1-score, accuracy, specificity, sensitivity, and precision, as used in [1,7,10,19]. We adopted these metrics because they are the standard measures for DL models in the literature, which makes our results directly comparable with those of other studies.
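A sketch of the stratified 5-fold loop with these metrics follows; X and y are assumed to be the preprocessed volumes and labels from Section 2.2, and build_model() stands in for the architecture sketched above. Specificity has no direct scikit-learn scorer and is typically derived from the confusion matrix.

```python
# Sketch: stratified 5-fold cross-validation with the reported metrics.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (tr, va) in enumerate(skf.split(X, y), start=1):
    model = build_model()                                # e.g., the fusion CNN above
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.fit(X[tr], y[tr], epochs=30, batch_size=32, verbose=0)
    pred = np.argmax(model.predict(X[va]), axis=1)
    print(f"fold {fold}: acc={accuracy_score(y[va], pred):.4f} "
          f"f1={f1_score(y[va], pred, average='macro'):.4f} "
          f"precision={precision_score(y[va], pred, average='macro'):.4f} "
          f"sensitivity={recall_score(y[va], pred, average='macro'):.4f}")
```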

3. Results

Based on the research results described in [33,37], we selected 3D ResNet-18, InceptionResNet-v2, and EfficientNet-b2 for comparison with our proposed model (Figure 3). Our model has a much lighter structure, the result of the optimization process applied to the proposed model compared to the traditional models. In addition, we observed during optimization that as the number of layers increases, the model can easily overfit the training data. This observation allowed our proposed model to be much smaller in size while still achieving the same performance as models with large structures. All networks were implemented in Python v3.12.5 with the TensorFlow framework v2.18.0 and trained on a PC with an Intel Xeon E5-2650 processor paired with an NVIDIA Quadro P2000 GPU.

3.1. Preparation of Dataset for Training Proposed Classification Model

After preprocessing, the OASIS dataset was used to train and test the four DL models. We divided the dataset into multi-class and binary-class versions for the two classification tasks, and all model evaluations were performed on these datasets. The details of the OASIS dataset for each classification task are presented in Table 3 and Table 4 below.

3.2. Evaluation Performance of Proposed Model

To evaluate the performance of our model against the other three models on the OASIS dataset, we set up hyperparameters to train all four models in both multi-class and binary classification modes; the details are presented in Table 5. We chose the ResNet-18 model to train simultaneously with our model and conducted a t-test to compare the performance of the two models.
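A sketch of this training configuration, following Table 5, is given below; it assumes the model and data arrays from the earlier sketches, and the scipy-based rotation is a simple stand-in for the augmentation the authors describe, not their exact implementation.

```python
# Training setup per Table 5: Adam (lr = 1e-4), cross-entropy, 30 epochs, batch 32.
import numpy as np
import tensorflow as tf
from scipy.ndimage import rotate

def augment(vol: np.ndarray) -> np.ndarray:
    # Rotate by a random angle from Table 5's set (here in the first image plane).
    angle = np.random.choice([-20, -10, -5, 5, 10, 20])
    return rotate(vol, angle, axes=(0, 1), reshape=False, order=1)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=30, batch_size=32)
```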
Figure 4 and Table 6 below detail the process of training and testing our model with 5-fold cross-validation and the five evaluation metrics for binary classification. As Figure 4 and Table 6 show, from the fourth and fifth folds the model enters a steady state, with stable accuracy and loss values and only small fluctuations in the other evaluation metrics.
Similarly, Figure 5 and Table 7 below detail the process of training and testing our model with 5-fold cross-validation and the five evaluation metrics for the multi-class classification task. As Figure 5 and Table 7 show, from the fourth and fifth folds the model again enters a steady state, with stable accuracy and loss values and only small fluctuations in the other evaluation metrics. The t-test results show that our model has higher accuracy than the ResNet-18 model at each fold, although at k = 3 and k = 4 the difference between the two models is not significant.
In Table 8 below, we compare our model’s accuracy with ResNet-18 after performing k-fold cross-validation. We trained our model and ResNet-18 on k − 1 folds, evaluated them on the remaining fold, and recorded the accuracy. Finally, we performed a t-test on the accuracies of the two models. The t and p values show that our model has significantly higher accuracy (p < 0.05) than the ResNet-18 model across the k folds.
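Assuming a paired t-test over the per-fold accuracies (which reproduces the Table 8 statistics), the comparison reduces to a few lines of scipy:

```python
# Paired t-test over per-fold accuracies (binary-classification values from Table 8).
from scipy import stats

acc_resnet18 = [0.7621, 0.8022, 0.8629, 0.8782, 0.8817]
acc_proposed = [0.8721, 0.9021, 0.9384, 0.9432, 0.9496]
t_stat, p_value = stats.ttest_rel(acc_proposed, acc_resnet18)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")   # t ≈ 9.2936, p ≈ 0.0007
```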

3.3. Comparison of Performance of Proposed Model with Three Different Classification 3D CNN Models Based on Transfer Learning

The comparison of our model with the three other transfer learning models based on 3D CNN proceeds in four steps: first, we compare the five evaluation metrics after training; second, we compare the performance of all four models on the test set in both multi-class and binary-class tasks; third, we compare the activation maps using the Grad-CAM method; and fourth, we compare the sizes of the four models.

3.3.1. Comparison of Evaluation Metrics After Training

In Table 9 below, we compare the training/validation metrics of the four models. The results show that all models have a training accuracy greater than 93%; our model achieves a training accuracy of 98.6%, while ResNet-18’s is the lowest, reaching only 93.64% for multi-class and 94.47% for binary classification. The InceptionResNet-v2 model achieves the lowest validation accuracy in both the multi-class and binary tasks. Our model achieves a validation accuracy greater than 93% for both tasks on the 3D MRI images of the OASIS dataset.

3.3.2. Comparison of Performance of Multi-Class and Binary Classification with Test Set

We present the classification performance of the selected models used in this study in Figure 6 (multi-class classification) and Figure 7 (binary classification). From the results, it is evident that our model has the best performance among all the reference models. The performances of ResNet-18, InceptionResNet-v2, and EfficientNet-b2 are outstanding for general image classification tasks, but in this study, they achieve low performance. This is because of the lack of a large enough training dataset due to the limited number of patients and the imbalanced AD-to-normal ratio in OASIS.
For multi-class classification, we also analyzed the incorrectly predicted subjects thoroughly, and we found that many of them had a Clinical Dementia Rating (CDR) of 0.5. When classifying subjects according to the CDR, 0.5 is associated with the very mild class, which is an intermediate transitional stage between normal and AD. Patients in this stage may transition to AD within a short time after the initial stage. This shows that the model achieves high accuracy with the AD/normal group but low accuracy for the very mild group, so we will focus on developing new methods for classifying the very mild group in our future work.
For binary classification, due to the more balanced amount of data in each class, the classification performance of the models is significantly improved. However, the ResNet-18 and InceptionResNet-v2 models still have a high false prediction rate for subjects with a CDR = 0.5. Our model has superior accuracy: with four features extracted from four different CNN layers, it avoids information loss and overcomes the disadvantage of data imbalance between classes of the OASIS dataset.

3.3.3. Comparison of Activation Maps with Grad-CAM Method

In this part, we present several CAM-based visualizations obtained from 3D MRI and the four different DL models, which show the effectiveness of the feature fusion module. We compare the visualization of our proposed model with the three traditional DL models for the three patient classes (i.e., very mild, mild, and normal) in three view modes. The proposed model uses an attention mechanism-based multi-stage feature fusion module to fuse local regional features of the 3D MRI images. Figure 8 shows the visualization results, which verify the effectiveness of the attention mechanism-based multi-stage fusion module. As can be seen from Figure 8, our proposed multi-feature fusion learning method focuses on the areas strongly related to AD, which confirms that the model has been well trained and that the fusion information has been learned well enough to classify multi-class data.
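A hedged sketch of Grad-CAM for a 3D CNN in TensorFlow is shown below; the layer name is a placeholder for the model’s last convolutional layer, and the exact CAM variant the authors used may differ.

```python
# Sketch: Grad-CAM for a 3D CNN (layer name is a placeholder).
import numpy as np
import tensorflow as tf

def grad_cam_3d(model, volume, class_idx, layer_name="conv3d_3"):
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(volume[np.newaxis, ...])
        score = preds[:, class_idx]
    grads = tape.gradient(score, conv_out)                 # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2, 3))        # per-channel importance
    cam = tf.einsum("bxyzc,bc->bxyz", conv_out, weights)   # weighted sum of maps
    cam = tf.nn.relu(cam)[0]
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()     # 3D heatmap in [0, 1]
```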

3.3.4. Comparison of Model Sizes for Proposed Model and Three Transfer Learning Models Based on 3D CNN

In Table 10 below, we compare the sizes of the four 3D CNN-based classification models in this study. The results show that InceptionResNet-v2 is the largest model (258.08 MB), 48.87 times larger than our model (5.28 MB), while the classification performance of our model is still better. This shows that focusing on optimizing a model for a specific task can give outstanding results. ResNet-18 and InceptionResNet-v2 have been studied and applied in many different studies; however, a large model is often less effective for a specific task and gives only average accuracy, although it has wider applicability.
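The parameter counts in Table 10 can be reproduced directly in Keras; the MB figures are closely consistent with 4-byte float32 parameters, which the sketch below assumes (it also assumes the model object from the earlier sketch).

```python
# Sketch: reproduce the size figures of Table 10 for a Keras model.
import tensorflow as tf

trainable = sum(int(tf.size(w)) for w in model.trainable_weights)
print(f"trainable: {trainable:,}, total: {model.count_params():,}, "
      f"size ≈ {trainable * 4 / 2**20:.2f} MB (float32)")
```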

4. Discussion

The main objective of this study was to develop a lightweight AD classification model based on 3D MRI images and to improve on the accuracy already achieved by AD-versus-normal-control classification models. Our approach merges features extracted from the different CNN layers used in most CNN-based classification models. The fused features are then fed into the classifier block, which outperforms other CNN approaches in accuracy. Fusing features from MRI images proves extremely effective in improving the accuracy and reliability of deep learning models given the data and computational constraints, and the results confirm the effectiveness of the proposed approach. The proposed solution is a single, less complex model that achieves comparable or better performance than complex models.
In this study, the selection of ResNet-18, InceptionResNet-v2, and EfficientNet-b2 as benchmarks against the proposed CNN is justified for several key reasons. (1) ResNet-18 is a relatively shallow residual network that mitigates the vanishing gradient problem and allows deeper networks to be trained; it serves as a lightweight and fast baseline. InceptionResNet-v2 combines inception modules (multi-scale processing) with residual connections, allowing efficient extraction of complex features. EfficientNet-b2 uses a compound scaling method across depth, width, and resolution, providing a state-of-the-art balance of accuracy and efficiency with fewer parameters and lower computational cost. Together, the three cover a wide range of model complexity, depth, and efficiency. (2) All three networks are pre-trained on ImageNet and are therefore well suited to transfer learning, especially in medical imaging tasks where labeled data is often limited, so fine-tuning yields good results even with small datasets. Comparing the proposed CNN against them validates its relative strengths in terms of accuracy, efficiency, and generalization.
Our proposed model performs better in both binary and multi-class classification than previous studies, with an accuracy of 91.40% for binary classification and 80.6% for multi-class classification on a 3D MRI database at the patient level. Unlike other models, which evaluate at the slice level and then aggregate to the patient level, our study evaluates directly at the patient level, thereby improving the stability of the model.
As shown in Figure 3, our model includes only four 3D CNN blocks, which is a small architecture. To increase the accuracy of the proposed model, we could therefore increase the number of CNN blocks, and with it the number of features extracted per block. Although this would also increase the model size, it illustrates the high customizability and applicability of the model. Moreover, because 3D MRI data is input to the model directly, it is possible to evaluate the volumetric ROIs affecting the diagnosis of AD (i.e., based on the activation maps of Figure 8), which is a prominent advantage over 2D CNN-based models.
Figure 6 shows the experimental results of the four models in this study. We found that the classification performance of the very mild class is relatively low for all four models. We believe that the current dataset has a large imbalance between classes, and there is no significant difference in features between very mild and normal. Therefore, we plan to apply methods to solve this problem, including the synthetic minority oversampling technique (SMOTE) [38], two-dimensional Fourier Bessel series expansion-based empirical wavelet transform [39], radiomics feature [12], and class-weighted losses [40].
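As one example among the remedies listed, class-weighted training (cf. [40]) can be set up directly from the Table 4 counts; the sketch below is illustrative, not the method used in this paper.

```python
# Sketch: class weights inversely proportional to frequency (Table 4: 269/80/16).
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

classes = np.array([0, 1, 2])                    # normal, very mild, mild
y_counts = np.repeat(classes, [269, 80, 16])     # stand-in label vector
weights = compute_class_weight("balanced", classes=classes, y=y_counts)
class_weight = dict(zip(classes.tolist(), weights))  # rare "mild" gets ~7.6x weight
# model.fit(X_train, y_train, epochs=30, batch_size=32, class_weight=class_weight)
```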
We have developed a classification model based on 3D CNN, which differs from 2D CNN models. Three-dimensional CNN models can extract spatial features, thereby significantly reducing the amount of training data required. In addition, our model is small, easy to train, and converges quickly. This is the big advantage of 3D CNNs over 2D CNNs, which require training sets of tens of thousands of images [41,42]. However, 3D CNNs also require a more complex data structure. Many other studies likewise use relatively small datasets, as summarized in Table 1, with the ADNI dataset including around 500 scans.
In the future, we will develop the research in the following directions: (1) testing the model on different datasets (OASIS-3/4 [35,36] and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu)) and aiming for multi-center research; (2) analyzing GM volume to see whether structural changes in these regions can serve as biomarkers of different types of AD; and (3) integrating more clinical data into the model to assess the progression of AD through its different stages. Addressing these limitations requires continuous research, robust validation, and careful consideration of ethical and practical factors to improve the effectiveness and reliability of AI models in clinical practice.

5. Conclusions

In this paper, we have proposed a 3D CNN-based AD classification model that combines feature extraction from different 3D CNN layers to improve the accuracy of the model. We rigorously evaluated the proposed technique using a commonly used training and validation protocol, and its performance ranks highly compared to other 3D CNN-based AD classification models. Most importantly, we further show that the model is small enough for deployment on low-end hardware while still ensuring good accuracy. Our analysis covers both multi-class and binary classification of AD in detail.

Author Contributions

Conceptualization, T.M.T.N. and N.T.B.; methodology, T.M.T.N.; software, T.M.T.N.; validation, N.T.B.; formal analysis, T.M.T.N.; investigation, T.M.T.N.; resources, T.M.T.N.; data curation, T.M.T.N.; writing—original draft preparation, T.M.T.N.; writing—review and editing, N.T.B.; visualization, N.T.B.; supervision, N.T.B.; project administration, N.T.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data were provided by OASIS-1: Cross-Sectional: Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Huang, H.; Pedrycz, W.; Hirota, K.; Yan, F. A multiview-slice feature fusion network for early diagnosis of Alzheimer’s disease with structural MRI images. Inf. Fusion 2025, 119, 103010. [Google Scholar] [CrossRef]
  2. Rahim, N.; El-Sappagh, S.; Ali, S.; Muhammad, K.; Del Ser, J.; Abuhmed, T. Prediction of Alzheimer’s progression based on multimodal Deep-Learning-based fusion and visual Explainability of time-series data. Inf. Fusion 2023, 92, 363–388. [Google Scholar] [CrossRef]
  3. Fu, J.; Ferreira, D.; Smedby, Ö.; Moreno, R. Decomposing the effect of normal aging and Alzheimer’s disease in brain morphological changes via learned aging templates. Sci. Rep. 2025, 15, 11813. [Google Scholar] [CrossRef]
  4. Cheng, J.; Wang, H.; Wei, S.; Mei, J.; Liu, F.; Zhang, G. Alzheimer’s disease prediction algorithm based on de-correlation constraint and multi-modal feature interaction. Comput. Biol. Med. 2024, 170, 108000. [Google Scholar] [CrossRef]
  5. Raza, H.A.; Ansari, S.U.; Javed, K.; Hanif, M.; Mian Qaisar, S.; Haider, U.; Pławiak, P.; Maab, I. A proficient approach for the classification of Alzheimer’s disease using a hybridization of machine learning and deep learning. Sci. Rep. 2024, 14, 30925. [Google Scholar] [CrossRef]
  6. Liu, S.; Masurkar, A.V.; Rusinek, H.; Chen, J.; Zhang, B.; Zhu, W.; Razavian, N. Generalizable deep learning model for early Alzheimer’s disease detection from structural MRIs. Sci. Rep. 2022, 12, 17106. [Google Scholar] [CrossRef]
  7. Golovanevsky, M.; Eickhoff, C.; Singh, R. Multimodal attention-based deep learning for Alzheimer’s disease diagnosis. J. Am. Med. Inform. Assoc. 2022, 29, 2014–2022. [Google Scholar] [CrossRef]
  8. Kim, J.S.; Han, J.W.; Bae, J.B.; Moon, D.G.; Shin, J.; Kong, J.E.; Lee, H.; Yang, H.W.; Lim, E.; Kim, J.Y.; et al. Deep learning-based diagnosis of Alzheimer’s disease using brain magnetic resonance images: An empirical study. Sci. Rep. 2022, 12, 18007. [Google Scholar] [CrossRef] [PubMed]
  9. Kang, W.; Lin, L.; Sun, S.; Wu, S. Three-round learning strategy based on 3D deep convolutional GANs for Alzheimer’s disease staging. Sci. Rep. 2023, 13, 5750. [Google Scholar] [CrossRef] [PubMed]
  10. Alinsaif, S.; Lang, J. Alzheimer’s Disease Neuroimaging Initiative, 3D shearlet-based descriptors combined with deep features for the classification of Alzheimer’s disease based on MRI data. Comput. Biol. Med. 2021, 138, 104879. [Google Scholar] [CrossRef]
  11. Guan, H.; Wang, C.; Tao, D. MRI-based Alzheimer’s disease prediction via distilling the knowledge in multi-modal data. Neuroimage 2021, 244, 118586. [Google Scholar] [CrossRef]
  12. Jytzler, J.A.; Lysdahlgaard, S. Radiomics evaluation for the early detection of Alzheimer’s dementia using T1-weighted MRI. Radiography 2024, 30, 1427–1433. [Google Scholar] [CrossRef] [PubMed]
  13. Ur Rahman, J.; Hanif, M.; Ur Rehman, O.; Haider, U.; Mian Qaisar, S.; Pławiak, P. Stages prediction of Alzheimer’s disease with shallow 2D and 3D CNNs from intelligently selected neuroimaging data. Sci. Rep. 2025, 15, 9238. [Google Scholar] [CrossRef]
  14. Chen, K.; Weng, Y.; Hosseini, A.A.; Dening, T.; Zuo, G.; Zhang, Y. A comparative study of GNN and MLP based machine learning for the diagnosis of Alzheimer’s Disease involving data synthesis. Neural Netw. 2024, 169, 442–452. [Google Scholar] [CrossRef] [PubMed]
  15. Zhang, X.; Li, Z.; Zhang, Q.; Yin, Z.; Lu, Z.; Li, Y. A new weakly supervised deep neural network for recognizing Alzheimer’s disease. Comput. Biol. Med. 2023, 163, 107079. [Google Scholar] [CrossRef] [PubMed]
  16. Bloch, L.; Friedrich, C.M. Alzheimer’s Disease Neuroimaging Initiative, Systematic comparison of 3D Deep learning and classical machine learning explanations for Alzheimer’s Disease detection. Comput. Biol. Med. 2024, 170, 108029. [Google Scholar] [CrossRef]
  17. Park, C.; Jung, W.; Suk, H.I. Deep joint learning of pathological region localization and Alzheimer’s disease diagnosis. Sci. Rep. 2023, 13, 11664. [Google Scholar] [CrossRef]
  18. Jenber Belay, A.; Walle, Y.M.; Haile, M.B. Deep Ensemble learning and quantum machine learning approach for Alzheimer’s disease detection. Sci. Rep. 2024, 14, 14196. [Google Scholar] [CrossRef]
  19. Ebrahimi, A.; Luo, S.; Chiong, R. Introducing Transfer Learning to 3D ResNet-18 for Alzheimer’s Disease Detection on MRI Images. In Proceedings of the 35th International Conference on Image and Vision Computing New Zealand (IVCNZ), Wellington, New Zealand, 25–27 November 2020; pp. 1–6. [Google Scholar]
  20. Cao, G.; Zhang, M.; Wang, Y.; Zhang, J.; Han, Y.; Xu, X.; Huang, J.; Kang, G. End-to-end automatic pathology localization for Alzheimer’s disease diagnosis using structural MRI. Comput. Biol. Med. 2023, 163, 107110. [Google Scholar]
  21. Hussain, M.Z.; Shahzad, T.; Mehmood, S.; Akram, K.; Khan, M.A.; Tariq, M.U.; Ahmed, A. A fine-tuned convolutional neural network model for accurate Alzheimer’s disease classification. Sci. Rep. 2025, 15, 11616. [Google Scholar] [CrossRef]
  22. Zhang, Y.; Peng, S.; Xue, Z.; Zhao, G.; Li, Q.; Zhu, Z.; Gao, Y.; Kong, L. Alzheimer’s Disease Neuroimaging Initiative, AMSF: Attention-based multi-view slice fusion for early diagnosis of Alzheimer’s disease. PeerJ Comput. Sci. 2023, 9, e1706. [Google Scholar] [CrossRef]
  23. Priyadharshini, S.; Ramkumar, K.; Vairavasundaram, S.; Narasimhan, K.; Venkatesh, S.; Madhavasarma, P.; Kotecha, K. Bio-inspired feature selection for early diagnosis of Parkinson’s disease through optimization of deep 3D nested learning. Sci. Rep. 2024, 14, 23394. [Google Scholar] [CrossRef]
  24. Parmar, H.; Nutter, B.; Long, R.; Antani, S.; Mitra, S. Spatiotemporal feature extraction and classification of Alzheimer’s disease using deep learning 3D-CNN for fMRI data. J. Med. Imaging 2020, 7, 056001. [Google Scholar] [CrossRef]
  25. Pan, D.; Zeng, A.; Jia, L.; Huang, Y.; Frizzell, T.; Song, X.; for the Alzheimer’s Disease Neuroimaging Initiative. Early Detection of Alzheimer’s Disease Using Magnetic Resonance Imaging: A Novel Approach Combining Convolutional Neural Networks and Ensemble Learning. Front. Neurosci. 2020, 14, 259. [Google Scholar] [CrossRef] [PubMed]
  26. Oh, K.; Chung, Y.-C.; Kim, K.W.; Kim, W.-S.; Oh, I.-S. Classification and Visualization of Alzheimer’s Disease using Volumetric Convolutional Neural Network and Transfer Learning. Sci. Rep. 2019, 9, 18150. [Google Scholar] [CrossRef] [PubMed]
  27. Balboni, E.; Nocetti, L.; Carbone, C.; Dinsdale, N.; Genovese, M.; Guidi, G.; Malagoli, M.; Chiari, A.; Namburete, A.I.L.; Jenkinson, M.; et al. The impact of transfer learning on 3D deep learning convolutional neural network segmentation of the hippocampus in mild cognitive impairment and Alzheimer disease subjects. Hum. Brain Mapp. 2022, 43, 3427–3438. [Google Scholar] [CrossRef] [PubMed]
  28. Rogeau, A.; Hives, F.; Bordier, C.; Lahousse, H.; Roca, V.; Lebouvier, T.; Pasquier, F.; Huglo, D.; Semah, F.; Lopes, R. A 3D convolutional neural network to classify subjects as Alzheimer’s disease, frontotemporal dementia or healthy controls using brain 18F-FDG PET. Neuroimage 2024, 288, 120530. [Google Scholar] [CrossRef]
  29. Alp, S.; Akan, T.; Bhuiyan, M.S.; Disbrow, E.A.; Conrad, S.A.; Vanchiere, J.A.; Kevil, C.G.; Bhuiyan, M.A.N. Joint transformer architecture in brain 3D MRI classification: Its application in Alzheimer’s disease classification. Sci. Rep. 2024, 14, 8996. [Google Scholar] [CrossRef]
  30. Zhang, J.; Zheng, B.; Gao, A.; Feng, X.; Liang, D.; Long, X. A 3D densely connected convolution neural network with connection-wise attention mechanism for Alzheimer’s disease classification. Magn. Reson. Imaging 2021, 78, 119–126. [Google Scholar] [CrossRef]
  31. Zhang, X.; Han, L.; Zhu, W.; Sun, L.; Zhang, D. An Explainable 3D Residual Self-Attention Deep Neural Network for Joint Atrophy Localization and Alzheimer’s Disease Diagnosis Using Structural MRI. IEEE J. Biomed. Health Inform. 2022, 26, 5289–5297. [Google Scholar] [CrossRef]
  32. Muhammad, G.; Shamim Hossain, M. COVID-19 and Non-COVID-19 Classification using Multi-layers Fusion From Lung Ultrasound Images. Inf. Fusion 2021, 72, 80–88. [Google Scholar] [CrossRef]
  33. Solovyev, R.; Kalinin, A.A.; Gabruseva, T. 3D convolutional neural networks for stalled brain capillary detection. Comput. Biol. Med. 2022, 141, 105089. [Google Scholar] [CrossRef]
  34. Friston, K.J.; Holmes, A.P.; Worsley, K.J.; Poline, J.-P.; Frith, C.D.; Frackowiak, R.S.J. Statistical Parametric Maps in Functional Imaging: A General Linear Approach. Hum. Brain Mapp. 1994, 2, 189–210. [Google Scholar] [CrossRef]
  35. Marcus, D.; Wang, T.H.; Parker, J.; Csernansky, J.G.; Morris, J.C.; Buckner, R.L. Open Access Series of Imaging Studies (OASIS): Cross-Sectional MRI Data in Young, Middle Aged, Nondemented, and Demented Older Adults. J. Cogn. Neurosci. 2007, 19, 1498–1507. [Google Scholar] [CrossRef]
  36. Salami, F.; Bozorgi-Amiri, A.; Hassan, G.M.; Tavakkoli-Moghaddam, R.; Datta, A. Designing a clinical decision support system for Alzheimer’s diagnosis on OASIS-3 data set. Biomed. Signal Process. Control 2022, 74, 103527. [Google Scholar] [CrossRef]
  37. Mishra, S.K.; Kumar, D.; Kumar, G.; Kumar, S. Multi-Classification of Brain MRI Using Efficientnet. In Proceedings of the 2022 International Conference for Advancement in Technology (ICONAT), Goa, India, 21–22 January 2022; pp. 1–6. [Google Scholar]
  38. Matharaarachchi, S.; Domaratzki, M.; Muthukumarana, S. Enhancing SMOTE for imbalanced data with abnormal minority instances. Mach. Learn. Appl. 2024, 18, 100597. [Google Scholar] [CrossRef]
  39. Chaudhary, P.K.; Pachori, R.B. Automatic diagnosis of glaucoma using two-dimensional Fourier-Bessel series expansion based empirical wavelet transform. Biomed. Signal Process. Control 2021, 64, 102237. [Google Scholar] [CrossRef]
  40. Fernando, K.R.M.; Tsokos, C.P. Dynamically Weighted Balanced Loss: Class Imbalanced Learning and Confidence Calibration of Deep Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 2940–2951. [Google Scholar] [CrossRef] [PubMed]
  41. Bouguerra, O.; Attallah, B.; Brik, Y. MRI-based brain tumor ensemble classification using two stage score level fusion and CNN models. Egypt. Inform. J. 2024, 28, 100565. [Google Scholar] [CrossRef]
  42. Wang, W.R.; Pan, B.; Ai, Y.; Li, G.H.; Fu, Y.L.; Liu, Y.J. ParaCM-PNet: A CNN-tokenized MLP combined parallel dual pyramid network for prostate and prostate cancer segmentation in MRI. Comput. Biol. Med. 2024, 170, 107999. [Google Scholar] [CrossRef]
Figure 1. (a) Preprocessing scheme of MRI with SPM12; examples of (b) Normal, (c) Very Mild, and (d) Mild with three views: axial, coronal, and sagittal. Note: Bias Correction (i1) denotes the i1 image after applying the bias correction method.
Figure 2. Proposed 3D fusion CNN for classification of Alzheimer’s disease with 3D MRI image.
Figure 3. Procedure of testing with 3D MRI and 3D CNN models.
Figure 4. Acc and loss of training and validation during training with binary classification of proposed 3D fusion CNN model.
Figure 5. Acc and loss of training and validation during training with multi-class classification of proposed fusion 3D CNN model.
Figure 6. Comparison of confusion matrices for multi-class classification: (a) 3D ResNet-18, (b) 3D InceptionResNet-v2, (c) 3D Efficientnet-b2, (d) proposed model.
Figure 7. Comparison of confusion matrices for binary classification: (a) 3D ResNet-18, (b) 3D InceptionResNet-v2, (c) 3D Efficientnet-b2, (d) proposed model.
Figure 8. Examples of activation maps with Grad-CAM of three different subjects: (a) 3D ResNet-18, (b) 3D EfficientNet-b2, (c) 3D InceptionResNet-v2, (d) proposed model.
Table 1. AD classification research using 2D/3D CNN with 3D MRI imaging.
Reference | Year | Dataset | Image Type | Method | Classification Type | Accuracy (%) | No. of Layers/No. of Parameters
Priyadharshini et al. [23] | 2024 | PPMI database | MRI | 3D CNN | Multi-class | 97 | 23 layers
Parmar et al. [24] | 2020 | ADNI | fMRI | 3D CNN | Multi-class | 93 | 8 layers
Pan et al. [25] | 2020 | ADNI | MRI | 2D CNN | Multi-class | 84 | 8 layers
Oh et al. [26] | 2020 | ADNI | MRI | Convolutional autoencoder (CAE)-based | Multi-class | 86.6 | 371 K
Balboni et al. [27] | 2022 | ADNI | MRI | Spatial warping network segmentation (SWANS) 3D CNN | Multi-class | 90 | 10 layers
Rogeau et al. [28] | 2024 | ADNI | MRI | 3D CNN | Multi-class | 89.8 | 3D VGG16
Alp et al. [29] | 2024 | ADNI | MRI | Vision Transformer (ViT) | Multi-class | 99 | Not reported
Zhang et al. [30] | 2021 | ADNI | MRI | 2D CNN | Multi-class | 78.79 | Transfer learning with ResNet and DenseNet
Zhang et al. [31] | 2021 | ADNI | MRI | 3D CNN | Multi-class | 95.6 | Transfer learning with 3D-ResAttNet34
Kang et al. [9] | 2023 | ADNI | MRI | 3D Deep Convolutional Generative Adversarial Networks (DCGANs) | Multi-class | 92.8 | Transfer learning with 3D ResNet
Table 2. Demographic and CDR details of OASIS subjects.
Categories | CDR = 0 | CDR = 0.5 | CDR = 1.0 | CDR = 2.0
Age (year) | 43.8 ± 23.72 | 76.21 ± 7.14 | 77.75 ± 6.68 | 82.0 ± 4.0
Male | 127 | 48 | 9 | 1
Female | 209 | 52 | 19 | 1
Total | 336 | 100 | 28 | 2
Table 3. OASIS dataset for binary classification.
Split | Normal | AD
Train/Validation | 269 | 102
Test | 67 | 26
Table 4. OASIS dataset for multi-class classification.
Split | CDR = 0 (Normal) | CDR = 0.5 (Very Mild) | CDR = 1.0 (Mild)
Train/Validation | 269 | 80 | 16
Test | 67 | 20 | 10
Table 5. Hyper-parameters of proposed 3D fusion CNN for training.
Hyper-Parameter | Value
Epochs | 30
k-folds | 5
Batch size | 32
Learning rate | 0.0001
Data augmentation | Rotation angles = (−20, −10, −5, 5, 10, 20)
Optimizer | Adam
Loss function | Cross-entropy
Metrics | Train_accuracy, train_loss, val_accuracy, val_loss
Table 6. Five-fold results with 5 evaluation metrics during training with binary classification.
Parameter | k = 1 | k = 2 | k = 3 | k = 4 | k = 5 | Average (Mean ± SD) (95% CI)
F1-score | 0.9278 | 0.9255 | 0.9407 | 0.9742 | 0.9927 | 0.9522 ± 0.0267 (0.9151–0.9892)
Accuracy | 0.9265 | 0.9355 | 0.9523 | 0.9886 | 0.9901 | 0.9586 ± 0.0264 (0.9219–0.9953)
Specificity | 0.9321 | 0.9269 | 0.9412 | 0.9623 | 0.9923 | 0.9510 ± 0.0239 (0.9177–0.9842)
Sensitivity | 0.9123 | 0.9117 | 0.9320 | 0.9615 | 0.9957 | 0.9426 ± 0.0321 (0.8980–0.9872)
Precision | 0.9256 | 0.9360 | 0.9307 | 0.9709 | 0.9961 | 0.9515 ± 0.0272 (0.9141–0.9897)
Table 7. Five-fold results with 5 evaluation metrics during training of model with multi-class classification.
Parameter | k = 1 | k = 2 | k = 3 | k = 4 | k = 5 | Average (Mean ± SD) (95% CI)
F1-score | 0.9155 | 0.9156 | 0.8904 | 0.9604 | 0.9521 | 0.9268 ± 0.0259 (0.8909–0.9627)
Accuracy | 0.9236 | 0.9298 | 0.8896 | 0.9769 | 0.9601 | 0.9360 ± 0.0303 (0.8939–0.9781)
Specificity | 0.9228 | 0.9305 | 0.8936 | 0.9556 | 0.9678 | 0.9341 ± 0.0260 (0.8980–0.9701)
Sensitivity | 0.9286 | 0.9288 | 0.8821 | 0.9774 | 0.9551 | 0.9344 ± 0.0319 (0.8902–0.9786)
Precision | 0.9144 | 0.9159 | 0.8845 | 0.9654 | 0.9668 | 0.9294 ± 0.0320 (0.8850–0.9738)
Table 8. t-test for comparison of accuracy between our model and ResNet-18 model with k-fold cross-validation.
Task | Model | k = 1 | k = 2 | k = 3 | k = 4 | k = 5 | t-Test
Acc of binary classification | ResNet-18 | 0.7621 | 0.8022 | 0.8629 | 0.8782 | 0.8817 | t-statistic: 9.2936; p-value: 0.0007
Acc of binary classification | Our | 0.8721 | 0.9021 | 0.9384 | 0.9432 | 0.9496 |
Acc of multi-class classification | ResNet-18 | 0.7522 | 0.7715 | 0.8099 | 0.8267 | 0.8398 | t-statistic: 8.5168; p-value: 0.001
Acc of multi-class classification | Our | 0.8012 | 0.8823 | 0.9015 | 0.9218 | 0.9312 |
Table 9. Comparison of training metrics of four models with multi-class classification of OASIS dataset.
Task | Evaluation Metric | 3D ResNet-18 | 3D InceptionResNet-v2 | 3D Efficientnet-b2 | Proposed
Multi-class classification | Train Acc | 0.9364 | 0.9921 | 0.9736 | 0.9860
Multi-class classification | Validation Acc | 0.8416 | 0.8098 | 0.9057 | 0.9360
Multi-class classification | Train Loss | 0.0261 | 0.0056 | 0.0098 | 0.0174
Multi-class classification | Validation Loss | 0.1651 | 0.1827 | 0.1398 | 0.1152
Binary classification | Train Acc | 0.9447 | 0.9949 | 0.9952 | 0.9886
Binary classification | Validation Acc | 0.8851 | 0.8604 | 0.9155 | 0.9586
Binary classification | Train Loss | 0.0056 | 0.0016 | 0.0029 | 0.0101
Binary classification | Validation Loss | 0.1125 | 0.1230 | 0.0879 | 0.0771
Table 10. Comparison of sizes of models.
Model | Trainable Params | Total Params
3D ResNet-18 | 33,236,548 (126.79 MB) | 33,244,486 (126.82 MB)
3D InceptionResNet-v2 | 67,654,115 (258.08 MB) | 67,714,659 (258.31 MB)
3D Efficientnet-b2 | 9,898,577 (37.76 MB) | 9,980,865 (38.07 MB)
Proposed | 1,387,929 (5.28 MB) | 1,386,489