Article

Beyond Accuracy: Explainable Deep Learning for Alzheimer’s Disease Detection Using Structural MRI Data

by
Tamal Chakroborty
,
Adam Colafranceschi
,
Yang Liu
* and
for the Alzheimer’s Disease Neuroimaging Initiative
Department of Physics and Computer Science, Wilfrid Laurier University, Waterloo, ON N2L 3C5, Canada
*
Author to whom correspondence should be addressed.
Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu, accessed on 7 August 2025). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf, accessed on 7 August 2025.
Information 2025, 16(12), 1058; https://doi.org/10.3390/info16121058
Submission received: 14 October 2025 / Revised: 8 November 2025 / Accepted: 28 November 2025 / Published: 2 December 2025

Abstract

Alzheimer’s disease (AD) is a neurodegenerative condition that gradually deteriorates memory and cognitive abilities, posing a significant global health challenge. While convolutional neural networks (CNNs) applied to structural magnetic resonance imaging (MRI) have achieved high diagnostic accuracy, their decision-making processes often lack transparency, which can limit clinical trust. This study presents a structured evaluation framework by applying multiple gradient-based and model-agnostic interpretability methods, namely Grad-CAM, Grad-CAM++, HiResCAM, Backpropagation, Guided Backpropagation, Kernel SHAP, LIME, and RISE, to pre-trained and custom CNN architectures for AD classification. We utilized the ADNI MRI dataset and assessed models based on accuracy, sensitivity, specificity, and visual alignment of highlighted brain regions with established biomarkers. By analyzing both predictive performance and explanation validity, this study aims to assist clinicians in making informed diagnoses, ultimately strengthening trust in AI-assisted tools.

Graphical Abstract

1. Introduction

AD is a neurodegenerative disorder that severely impairs memory and cognitive function, affecting an estimated 50 million people worldwide, with estimates potentially exceeding 150 million by 2050 [1,2]. In structural magnetic resonance imaging (MRI), AD is marked by neurodegeneration that begins in the temporal lobe and gradually spreads, making it difficult for professionals to differentiate between normal age-related atrophy and that caused by AD [3].
Deep learning models, particularly convolutional neural networks (CNNs), have achieved high accuracy in AD classification using MRI scans [4,5], yet their decision-making processes remain unexplained. In clinical settings, such “black-box” predictions hinder trust and adoption, as clinicians require interpretable outputs to validate diagnostic relevance [6].
Several explanation techniques have been developed to highlight the regions in an image that most influence the classification decision of deep learning models [6,7]. However, relying on a single explanation method can be risky, as outputs vary across model architectures, hyperparameters, and datasets [8]. Additionally, heat maps produced by different methods rest on distinct assumptions and can show varying or even contradictory patterns. Therefore, a comprehensive study is required to properly define an evaluation workflow that compares multiple Explainable AI (XAI) methods with sanity checks and quantifies region-level alignment with known biomarkers of AD.
This study addresses these gaps by proposing an integrated, end-to-end pipeline for AD detection, which provides a complete workflow from data processing and model evaluation to extensive post hoc analysis with saliency maps. This is our primary contribution. To construct and validate this pipeline, our objectives are to
  • Provide an optimized reference model for 3D AD classification using ADNI 3D structural MRI images.
  • Conduct a systematic comparison among a list of interpretation methods using consistent implementation and parameterization.
  • Establish a quantitative and qualitative evaluation protocol that incorporates sanity checks and comparisons of brain regions against established AD biomarkers.

2. Related Work

2.1. AD Diagnosis and MRI Biomarkers

Structural MRI is widely used for AD diagnosis due to its wide availability, non-invasiveness, and ability to indicate AD progression [9,10]. Classical approaches often use region-of-interest (ROI)-based volumetric analyses and statistical comparisons of brain structures using MRI scans. With the development of feature extraction tools like FreeSurfer [11], it has become more straightforward to extract various statistical data on brain regions for ROI-based analysis using machine learning algorithms [12,13]. However, this approach provides limited information on the areas of the brain involved in AD, as it relies on derived statistics instead of original images, potentially compromising spatial information. Furthermore, the data extraction process from MRI images is resource-intensive and time-intensive [4,13]. The alternative approach involves the use of deep learning algorithms.

2.2. Deep Learning Models for AD Classification

There are two deep learning approaches for Alzheimer’s disease detection: 2D and 3D models. In the 2D approach, 3D MRI scans are divided into 2D slices along axial, coronal, or sagittal planes, and models are trained on the selected slices [5,14,15]. While this method reduces computational costs and can deliver strong performance when the slices are carefully chosen, it risks losing inter-slice spatial context and may lead to biased results if slice selection is not optimal [4]. In contrast, the 3D approach processes the entire volume, thereby preserving anatomical relationships and enabling richer feature learning [3,13,15]. However, this method requires more memory and longer training times due to its high feature dimensionality [4].

2.3. Interpretability of Medical Imaging: Intrinsic vs. Post Hoc

Interpretability methods can be broadly categorized into intrinsic and post hoc approaches. Intrinsic models are interpretable by design, such as decision trees or linear models with sparse features. However, in deep learning for neuroimaging, intrinsic models are seldom used due to their limited ability to capture the extremely high-dimensional and complex patterns present in the data [16]. In contrast, post hoc interpretability methods are applied after training to explain predictions of black-box models, such as CNNs, recurrent neural networks, and transformers. Saliency-based techniques fall squarely within the post hoc category. In the context of medical imaging, saliency maps have become a leading form of post hoc interpretability, especially for CNNs, as they can validate whether the highlighted regions correspond to known disease biomarkers [3,17]. Post hoc approaches are further divided into gradient-based and perturbation-based (model-agnostic) methods.

2.3.1. Gradient-Based Methods

Gradient-based methods compute gradients or derivatives of the model’s output prediction with respect to the input features. Methods like Backpropagation [7], Guided Backpropagation [18], Grad-CAM [19], Grad-CAM++ [20], and HiResCAM [21] provide visual explanations by highlighting the regions that most influence predictions. While these methods are computationally efficient, they are model-specific and can be sensitive to aspects such as network architecture, layer selection, and gradient noise. As a result, they are best suited to differentiable architectures such as CNNs.

2.3.2. Model-Agnostic Perturbation Methods

Perturbation-based/Model-agnostic methods treat the model as a black box, generating explanations by perturbing the input and observing changes in output. Techniques like LIME [22], SHAP (especially Kernel SHAP) [23], and RISE [24] fall into this category. These methods do not require access to model internals and can be applied to any predictive model directly by approximating or perturbing inputs, making them ideal for cross-model comparisons in AD pipelines. However, they can be computationally expensive for 3D MRI, as many perturbed samples are required to generate stable explanations.

2.3.3. Limitations and Pitfalls of Saliency Maps

Saliency-based explanations can vary dramatically across methods, models, and parameter settings [8]. Thus, there is no one-size-fits-all solution when it comes to interpreting deep learning models, especially in the context of medical imaging. Different models and datasets demand tailored explanation techniques, as a single method often fails to produce reliable or meaningful insights across all scenarios [25]. This has led researchers to explore a variety of saliency-based approaches and to evaluate their performance under different conditions. While prior work [26,27] has primarily focused on general frameworks or broad disease categories, tailored quantitative evaluation remains rare for deep learning models in AD classification. This motivates our framework, which combines gradient-based and model-agnostic approaches with both qualitative and quantitative assessments, including sanity checks and region-level biomarker alignment, tailored for AD detection.

3. Datasets

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu, accessed on 7 August 2025). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD) [28].
In this work, we utilized structural MRI scans from the ADNI 1 cohort, focusing exclusively on AD and CN participants. We specifically used the “FreeSurfer Cross-Sectional Processing brainmask” series, which provides post-processed brain masks generated via Talairach image registration in FreeSurfer [11]. We selected this series because it offers standardized skull stripping, cortical/subcortical segmentation, and consistent alignment across the cohort. This approach reduces preprocessing variability and ensures anatomical comparability for interpretability analysis.
A total of 1300 brain masks were used, comprising 600 AD and 700 CN scans. The dataset was randomly split at the scan level into training (n = 1000), validation (n = 150), and test (n = 150) sets. Participants’ ages ranged from 55 to 93 years (mean = 76.58 ± 6.35 years). The gender distribution was 51.92% male (n = 675) and 48.08% female (n = 625).
In this study, we adopt the ADNI 1 cohort as our data source. This dataset is publicly available and has been widely used in Alzheimer’s disease research, which ensures comparability with prior studies and facilitates reproducibility. While later phases of ADNI include scans from additional sites and scanner types, focusing on ADNI 1 provides a more consistent imaging protocol, reducing heterogeneity in acquisition. We used scan-level splits, as described above. Finally, although the use of preprocessed volumes can introduce segmentation artifacts, it also standardizes the inputs, which is essential for a fair and reliable assessment of model performance. Overall, the adoption of ADNI 1 represents a pragmatic and well-established choice that strikes a balance between data availability, consistency, and comparability.

4. Framework

Figure 1 illustrates the overall workflow of this study. The process begins with assembling datasets of Alzheimer’s disease and normal controls, followed by image pre-processing steps such as determining maximum length, cropping, and padding. While these steps establish a clean input for modeling, the central motivation of the pipeline is twofold: first, to ensure the models achieve strong predictive accuracy, and second, to build upon that accuracy to enable meaningful interpretation. Model exploration is therefore carried out with different architectures and configurations to identify the most reliable baseline performance. On top of this, visualization methods are applied to analyze the model’s predictions, with the ultimate aim of providing explanations that clinicians and researchers can interpret. In short, the structure of the pipeline reflects the need to balance predictive accuracy with interpretability, so that explanations derived from the models can genuinely support diagnosis validation.

4.1. Image Preprocessing

All MRI scans used in this study were obtained from the ADNI 1 cohort, which was processed using FreeSurfer’s cross-sectional pipeline [29]. This processing included skull stripping, orientation standardization and resampling, as well as intensity normalization. Additionally, a quality check for the Talairach quality control (QC) transformation was performed to ensure proper spatial alignment. No further learning-based enhancement or manual threshold tuning was applied beyond the initially processed data.
In this study, 3D MRI brainmask images containing large surrounding black regions were preprocessed to improve their suitability for analysis. The goal was to reduce excess background while preserving the spatial integrity of brain structures, and to standardize all images to a uniform size through padding. This preprocessing consisted of three main steps: determining the target size, segmenting the brain region, and applying uniform padding, as outlined below.
1.
Finding Target Size: All the segmented brainmasks had the same shape of 256 × 256 × 256. Therefore, the first step involved determining the minimum dimensions needed to ensure that every brain region in the dataset fits without any spatial transformations, such as zooming or interpolation. For each MRI scan, we calculated the maximum length required along each axis to accommodate the largest brain region present in the entire dataset. Each axis was assessed individually, since the extent of the brain region differs across the three planes. The final target shape was found to be 174 × 174 × 190. In our experience, no FreeSurfer post-processed brain mask MRI scan available on ADNI exceeds these dimensions.
2.
Segmenting Brain Region: In the second step, we removed the black areas, or zero-valued voxels, to isolate only the brain region. First, we extracted the non-zero voxel coordinates from the image. Then, we cropped the brain region based on the minimum and maximum non-zero coordinates along each axis. Additionally, we updated the header and affine details of the raw image to align with the segmented image. A sample output from this step is depicted in Figure 2.
3.
Padding for Uniformity: The MRI shapes after applying Step 2 were not uniform across the dataset. To make them uniform, we utilized the target shape computed in Step 1. If an MRI shape was smaller than the target shape along any axis, we added zero-valued padding symmetrically on both sides of the brain region along that axis. Additionally, before saving, we updated the header and affine details of the images. These adjustments ensured that all images in the dataset have a uniform shape, making them suitable for further use. The procedures for Step 2 and Step 3 are summarized in Algorithm 1 and sketched in code below.
Algorithm 1: Process 3D MRI Images
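A minimal Python sketch of Steps 2 and 3, assuming nibabel for NIfTI I/O and numpy for array handling; the function interface and the affine update are illustrative rather than the exact pipeline code.

```python
import numpy as np
import nibabel as nib

TARGET_SHAPE = (174, 174, 190)  # maximum brain extent found in Step 1

def crop_and_pad(in_path, out_path, target=TARGET_SHAPE):
    """Step 2: crop to the non-zero brain region; Step 3: pad to a uniform shape."""
    img = nib.load(in_path)
    vol = img.get_fdata()

    # Step 2: bounding box of non-zero voxels along each axis.
    nz = np.nonzero(vol)
    mins = np.array([idx.min() for idx in nz])
    maxs = np.array([idx.max() + 1 for idx in nz])
    brain = vol[mins[0]:maxs[0], mins[1]:maxs[1], mins[2]:maxs[2]]

    # Step 3: symmetric zero-padding up to the target shape on each axis
    # (Step 1 guarantees the brain region never exceeds the target).
    pads = [((t - d) // 2, (t - d) - (t - d) // 2)
            for d, t in zip(brain.shape, target)]
    padded = np.pad(brain, pads, mode="constant", constant_values=0)

    # Update the affine so world coordinates still match the original scan,
    # approximating the header/affine update described in the text.
    offset = mins - np.array([p[0] for p in pads])
    affine = img.affine.copy()
    affine[:3, 3] += img.affine[:3, :3] @ offset
    nib.save(nib.Nifti1Image(padded.astype(np.float32), affine), out_path)
```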

4.2. Reference Classifiers

The objective at this stage is to build a classification model that is as accurate as possible, since this model serves as the foundation for generating explanations in the next step. By ensuring high predictive performance, we aim to capture the underlying anatomical patterns reliably, so that subsequent interpretation is both meaningful and trustworthy. To this end, we constructed reference classifiers using three representative categories of models, reflecting both established baselines and more recent advances in deep learning. We did not extend this study to 2D approaches, as prior research has consistently demonstrated that 3D models capture volumetric anatomical structure more effectively and generally yield superior predictive accuracy, despite requiring longer training times and greater computational resources [15,30,31].

4.2.1. 3D CNN and DenseNet Variants

As a starting point, we implemented the classic 3D CNN architecture, which has been widely applied to volumetric medical imaging tasks. To further enhance feature propagation and mitigate vanishing gradients, we also included DenseNet variants [32], which employ dense connectivity across layers. This category provides a strong conventional baseline for 3D image classification.
  • Baseline 3D CNN: The baseline model illustrated in Figure 3 consists of four convolutional blocks and three fully connected layers [3]. Each block includes a convolution layer with a kernel size of 3 × 3 × 3, followed by batch normalization, a ReLU activation function, and max pooling. The progression of convolutional channels is as follows: 1 to 8, 8 to 16, 16 to 32, and 32 to 64, with pooling factors of 2, 3, 2, and 3, respectively.
    After the convolutional blocks, the learned feature representations are flattened and passed through three fully connected layers (128, 64, and 2 neurons, respectively) with ReLU activations. Dropout regularization (rate = 0.8) is applied after the first dense layer. The final output layer produces logits for binary classification, which are optimized using cross-entropy loss. A PyTorch sketch of this baseline follows the list below.
  • Custom DenseNet: We experimented with a custom three-dimensional variant of DenseNet to train volumetric MRI inputs from scratch with a growth rate of 8 and a compression factor of 0.5 [33]. The architecture Figure 4 begins with an initial convolutional block that includes a 3 × 3 × 3 convolution with a stride of 1, followed by max pooling, batch normalization, and ReLU activation.
    The feature extractor consists of two Dense Blocks, each containing three composite layers arranged in the following pattern: Batch Normalization (BN) → ReLU → 1 × 1 × 1 convolution → BN → ReLU → 3 × 3 × 3 convolution. After the first Dense Block, a transition layer (BN → ReLU → 1 × 1 × 1 convolution → average pooling) reduces both the channel dimension (using a compression factor of 0.5) and the spatial resolution. The second Dense Block, with the same pattern as the first block, is then applied, followed by a max pooling transition layer. The classifier head consists of a global flattening operation, dropout regularization (p = 0.5), and a fully connected linear layer mapping to the final output. ReLU activation is utilized throughout, except for the final layer, which is trained using cross-entropy loss, where softmax is applied implicitly.
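Referring back to the baseline model above, a minimal PyTorch sketch is given below; the padding of 1 per convolution and the dynamically inferred flattened dimension are implementation assumptions not stated in the text.

```python
import torch
import torch.nn as nn

class Baseline3DCNN(nn.Module):
    """Four conv blocks (1->8->16->32->64, pooling factors 2, 3, 2, 3)
    followed by fully connected layers of 128, 64, and 2 neurons."""
    def __init__(self, in_shape=(224, 224, 224)):
        super().__init__()
        chans, pools = [1, 8, 16, 32, 64], [2, 3, 2, 3]
        layers = []
        for c_in, c_out, p in zip(chans[:-1], chans[1:], pools):
            layers += [nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
                       nn.BatchNorm3d(c_out),
                       nn.ReLU(inplace=True),
                       nn.MaxPool3d(p)]
        self.features = nn.Sequential(*layers)
        with torch.no_grad():  # infer flattened size from a dummy volume
            n_flat = self.features(torch.zeros(1, 1, *in_shape)).numel()
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_flat, 128), nn.ReLU(inplace=True), nn.Dropout(0.8),
            nn.Linear(128, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 2))  # logits for CN vs. AD (cross-entropy loss)

    def forward(self, x):
        return self.classifier(self.features(x))
```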

4.2.2. ResNet-18 Pretrained on ImageNet

To leverage transfer learning, we adopted a ResNet-18 model pre-trained on the large-scale ImageNet dataset, which consists of 1000 classes of various objects [34]. Although ImageNet consists of 2D natural images, pretrained weights can serve as an effective initialization strategy [35], enabling faster convergence and potentially improved generalization compared to training from scratch. To transfer the learnable parameters to the 3D model, the 2D convolutional kernels were extended into the third dimension by replicating them along the depth dimension [36]. This design choice reflects the rationale of adapting well-established 2D feature representations to 3D medical imaging tasks.
Specifically, the 3D variant of ResNet-18 was developed by replacing all 2D convolutions, batch normalizations, and pooling layers with their 3D counterparts. The network shown in Figure 5 begins with a 7 × 7 × 7 convolution (stride 2, padding 3) producing 64 feature maps, followed by batch normalization, ReLU activation, and a 3 × 3 × 3 max pooling layer (stride 2). The residual backbone is arranged into four stages with [2, 2, 2, 2] basic blocks. Within each basic block, there are two 3 × 3 × 3 convolutions with batch normalization and ReLU activation. The first residual block of each stage performs downsampling, except in the first stage (L1 in Figure 5). Identity shortcuts are used when the input and output dimensions are the same, whereas projection shortcuts (1 × 1 × 1 convolutions with stride 2) are introduced in the first block of stages 2–4 to manage changes in spatial resolution. After the final residual stage, global adaptive average pooling reduces the feature maps to a 512-dimensional vector, which is passed through a fully connected layer for binary classification. We then fine-tuned the fully connected layer on our training dataset.
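As an illustration of the weight-transfer step, the sketch below inflates a 2D kernel into 3D by replicating it along the depth axis and rescaling by the depth, a common convention; whether [36] rescales this way, and how single-channel MRI input is handled, are assumptions of this sketch.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

def inflate_conv2d_to_3d(conv2d: nn.Conv2d, depth: int) -> nn.Conv3d:
    """Replicate a pretrained 2D kernel along a new depth axis and rescale
    by the depth so activation magnitudes stay comparable (I3D-style)."""
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       kernel_size=(depth, *conv2d.kernel_size),
                       stride=(1, *conv2d.stride),
                       padding=(depth // 2, *conv2d.padding),
                       bias=conv2d.bias is not None)
    with torch.no_grad():
        weight3d = conv2d.weight.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth
        conv3d.weight.copy_(weight3d)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d

# e.g., inflate the stem of an ImageNet-pretrained ResNet-18 (7x7 -> 7x7x7);
# for single-channel MRI, the three RGB input filters would additionally be
# averaged into one (an assumption, not stated in the text).
stem3d = inflate_conv2d_to_3d(resnet18(weights=ResNet18_Weights.DEFAULT).conv1,
                              depth=7)
```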

4.2.3. CNN with ResNet and Swin Transformer Pre-Training

Finally, to explore the benefit of combining convolutional and transformer-based architectures, we employed a CNN pretrained with ResNet and Swin Transformer backbones. Transformers have shown strong capability in modeling long-range dependencies, and when integrated with CNN features, they offer a complementary perspective for capturing both local and global anatomical patterns. Including this hybrid approach allowed us to investigate whether recent advances in vision transformers can further enhance Alzheimer’s disease classification performance. The architecture is illustrated in Figure 6.
In this architecture, the convolutional stem consisted of the first three residual stages of a 3D ResNet-18 backbone, which was pre-trained and fine-tuned on our dataset. The architecture of the stem ResNet-18 has been explained in Section 4.2.2. These convolutional stages transformed the raw volumetric input into mid-level feature maps (256 channels), capturing robust local structural details. A 1 × 1 × 1 convolution was applied to reduce channel dimensionality before passing the feature maps to the transformer.
For the transformer stage, we employed a hierarchical 3D Swin Transformer encoder (SwinUNETR-style), configured with an embedding dimension of 12. The encoder comprised four successive stages, each containing two Swin Transformer blocks with windowed multi-head self-attention, MLP blocks, and residual connections. Patch merging layers progressively increased the channel dimensionality (12 → 24 → 48 → 96 → 192) while reducing spatial resolution. The final encoder output was globally aggregated using adaptive average pooling.
Then, a classification head, consisting of two fully connected layers (192 → 128 → 2) with ReLU activation and dropout (p = 0.3), was used to produce the final predictions. All convolutional and transformer weights were fine-tuned jointly during training, rather than freezing the CNN stem. The decision to truncate ResNet-18 after the third residual stage, rather than using the full backbone, was motivated by the need to balance local feature richness with sufficient spatial resolution for the Swin encoder. Empirically, using only two residual stages produced suboptimal results, as the transformer received overly coarse features. The three-layer design provided the best trade-off between feature granularity and compatibility with the transformer.
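A structural sketch of this hybrid is shown below; the torchvision-style attribute names on the 3D ResNet and the injected Swin-style encoder (expected to emit 192-channel feature maps, e.g., a SwinUNETR-style encoder) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class HybridSwinClassifier(nn.Module):
    """Truncated 3D ResNet-18 stem (stages 1-3) -> 1x1x1 channel reduction ->
    Swin-style 3D encoder -> global average pooling -> 192->128->2 head."""
    def __init__(self, resnet3d: nn.Module, swin_encoder: nn.Module,
                 swin_in_channels: int = 12, swin_out_channels: int = 192):
        super().__init__()
        # keep the stem and first three residual stages (256-channel maps)
        self.stem = nn.Sequential(resnet3d.conv1, resnet3d.bn1, resnet3d.relu,
                                  resnet3d.maxpool, resnet3d.layer1,
                                  resnet3d.layer2, resnet3d.layer3)
        self.reduce = nn.Conv3d(256, swin_in_channels, kernel_size=1)
        self.encoder = swin_encoder   # injected Swin-style 3D encoder
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(swin_out_channels, 128), nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(128, 2))

    def forward(self, x):
        feats = self.reduce(self.stem(x))     # (N, 12, D', H', W')
        feats = self.encoder(feats)           # expected (N, 192, D'', H'', W'')
        return self.head(self.pool(feats))
```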

4.3. Interpretation Methods

Since the ultimate goal of this study is not only to achieve accurate classification but also to enable meaningful interpretation, we adopted a set of established explanation techniques to analyze and validate the model’s decisions. Interpretation is critical for assessing whether the classifier has captured disease-relevant anatomical patterns, rather than relying on spurious correlations. We investigated five gradient-based techniques (Grad-CAM, Grad-CAM++, HiResCAM, Backpropagation, and Guided Backpropagation) alongside three model-agnostic methods (SHAP, LIME, and RISE). These methods were applied to assess model predictions and to highlight the brain regions most influential to the classification results. For consistency, we ensured that the input data for each method were identical in size and pre-processing, thereby avoiding potential sources of bias.

4.3.1. Gradient-Based Techniques

Gradient-based methods exploit the differentiable nature of neural networks to trace how changes in input features affect the model’s predictions. These techniques are computationally efficient and particularly suited for CNN-based architectures. We applied five such methods, each providing saliency maps or visual heatmaps that highlight the regions most influential to the model’s decisions; a minimal Grad-CAM sketch follows the list below. For CNN+Swin, although our final architecture includes a transformer block, gradient-based methods remain applicable because the entire network is differentiable, and activations can be selected at either the convolutional or transformer stage.
  • Grad-CAM (Gradient-weighted Class Activation Mapping) is a gradient-based visualization method that uses the gradients of a target class flowing into the final convolutional layers to create coarse localization heatmaps [19]. It offers visual explanations particular to a class by highlighting the areas of the input image that have the most influence on a model’s judgment. Because of this, it is especially useful in medical imaging, where it is crucial to detect anatomical regions that are discriminative. It is computationally light and easily integrable into current CNN designs because of its efficiency, which comes from using gradient information from a single forward and backward pass.
  • Grad-CAM++ is an improved gradient-based visualization method that uses higher-order gradients to handle scenarios with fine-grained features or numerous object instances [20]. It is appropriate for tasks requiring fine detail, such as medical image analysis, because it produces localization maps that are sharper and more precise. Its efficiency comes from using both first- and second-order gradients to capture more subtle spatial importance.
  • HiResCAM (High-Resolution Class Activation Mapping) replaces Grad-CAM’s globally averaged gradient weights with an element-wise product of gradients and activations, so each spatial location is weighted by its own gradient [21]. This avoids the blurring introduced by gradient averaging and ensures that highlighted locations actually contributed to the class score, making it less noisy, more stable, and effective for high-resolution explanations, especially in medical imaging.
  • Backpropagation is a gradient-based visualization technique that calculates the gradient of the output class score with respect to the input image [7]. By highlighting each input pixel’s sensitivity to the prediction, it shows which areas of the image have the greatest effect on the model’s output. Because it is straightforward and uses the gradient directly, it is efficient; yet, it frequently generates noisy attribution maps that lack spatial context.
  • Guided Backpropagation is a gradient-based visualization method that modifies the backward pass through ReLU layers to improve on standard Backpropagation [18]. By limiting the flow of gradients to neurons with positive activations in the forward pass, it produces sharper and more focused saliency maps. Although it lacks class-discriminative localization, it is effective at bringing out finer details in the input image.
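To make the mechanics concrete, a minimal 3D Grad-CAM sketch follows; it implements the standard formulation (channel weights from globally averaged gradients), with PyTorch hooks as an implementation choice of this sketch rather than the paper's exact code.

```python
import torch
import torch.nn.functional as F

def grad_cam_3d(model, volume, target_layer, class_idx):
    """Minimal 3D Grad-CAM: weight the target layer's activation maps by
    their globally averaged gradients, ReLU, and upsample to input size."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    try:
        model.eval()
        logits = model(volume)                 # volume: (1, 1, D, H, W)
        model.zero_grad()
        logits[0, class_idx].backward()
        weights = grads["g"].mean(dim=(2, 3, 4), keepdim=True)
        cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True)).detach()
        cam = F.interpolate(cam, size=volume.shape[2:], mode="trilinear",
                            align_corners=False)
        cam -= cam.min()
        return (cam / cam.max().clamp(min=1e-8)).squeeze()  # normalized map
    finally:
        h1.remove()
        h2.remove()
```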

4.3.2. Model-Agnostic Techniques

Model-agnostic methods do not rely on the internal structure of the model, making them applicable to a wide range of architectures. They typically assess the effect of perturbations or surrogate models to estimate feature importance. To complement the gradient-based methods, we adopted three widely used model-agnostic approaches, which provide an alternative perspective by approximating the contribution of different brain regions without requiring gradient information; a minimal RISE sketch follows the list below.
  • SHAP (SHapley Additive exPlanations) is a unified framework for interpreting machine learning models by computing the contribution of each feature to the model’s prediction [23]. Kernel SHAP is a model-agnostic variant that estimates SHAP values using a weighted linear regression approach based on Shapley values from cooperative game theory. It is particularly useful when model internals are inaccessible or opaque.
  • LIME (Local Interpretable Model-agnostic Explanations) explains individual predictions by fitting a local surrogate model around the prediction of interest [22]. It perturbs the input data and observes changes in model outputs to learn the importance of input features. This method is model-agnostic and widely used for interpreting complex black-box models.
  • RISE (Randomized Input Sampling for Explanation) is a saliency mapping technique designed to interpret deep neural networks, particularly in vision tasks [24]. It generates heatmaps by applying random masks to the input image and observing the resulting changes in prediction scores. The final explanation is computed as a weighted sum of these masks, weighted by the model’s output confidence for the masked input.
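A minimal 3D RISE sketch under the original formulation (randomly upsampled binary masks, scores averaged and normalized by the keep probability); the mask-grid resolution, mask count, and keep probability below are illustrative values.

```python
import torch
import torch.nn.functional as F

def rise_3d(model, volume, class_idx, n_masks=500, grid=7, p_keep=0.5):
    """Minimal 3D RISE: average randomly upsampled binary masks, weighted
    by the model's confidence on each masked input."""
    model.eval()
    _, _, D, H, W = volume.shape
    saliency = torch.zeros(D, H, W)
    with torch.no_grad():
        for _ in range(n_masks):
            # coarse binary grid, smoothly upsampled to the volume resolution
            m = (torch.rand(1, 1, grid, grid, grid) < p_keep).float()
            m = F.interpolate(m, size=(D, H, W), mode="trilinear",
                              align_corners=False)
            score = torch.softmax(model(volume * m), dim=1)[0, class_idx]
            saliency += score * m[0, 0]
    return saliency / (n_masks * p_keep)   # RISE normalization
```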
In summary, gradient-based methods leverage the differentiable structure to provide efficient, fine-grained saliency maps. In contrast, model-agnostic methods operate independently of the underlying architecture, offering broader applicability through perturbation-based analysis. By employing both categories, we capture complementary perspectives on model behavior, ensuring that the explanations are not only faithful to the network’s internal representations but also robust across methodological assumptions. This dual strategy strengthens confidence in the interpretability of the classification results.

5. Implementation

This section brings together all procedures related to implementation, setup, and the formal protocols used to assess the models and interpretations. For reproducibility, the implementation of the methodology was made available in a public repository on GitHub (Version v1.0, developed by T. Chakroborty, incorporating code from A. Colafranceschi, Waterloo, ON, Canada) [37].

5.1. Experimental Setup

Deep learning models are computationally intensive and require significantly more resources than typical machine learning models, especially when working with 3D MRI data. To address these resource constraints, we conducted our experiments on SHARCNET (Shared Hierarchical Academic Research Computing Network) [38]. For the ResNet-18, DenseNet, and baseline CNN models, all experiments were conducted using 1 to 2 NVIDIA A5000 GPUs, 16 GB of RAM, and 12 CPU cores. In contrast, for the transformer model, we used 4 NVIDIA A100 GPUs, 40 GB of system RAM, and 4 CPU cores.
To leverage the pre-trained weights and biases of ResNet-18 trained on the ImageNet dataset for transfer learning, the input images needed to have a compatible dimension of 224 × 224 × 224. To ensure a fair comparison across models and explainability analyses, we uniformly applied zero-padding (black voxels) to resize the preprocessed MRI volumes from 174 × 174 × 190 to 224 × 224 × 224.
All models were trained using the Adam optimizer with hyperparameters specific to their architectures. The baseline 3D CNN and DenseNet models were trained with a learning rate of 1 × 10⁻⁵ and a batch size of 5. For ResNet-18, we found the best performance using two different learning rates: 0.003 for the convolutional layers and 0.0003 for the fully connected layer. We also applied L2 regularization with a value of 5 × 10⁻⁴ and trained with a batch size of 8. In contrast, the transformer-based Swin hybrid model was trained with a learning rate of 0.0003, a weight decay of 1 × 10⁻⁵, a dropout rate of 0.3, a feature size of 12, and a patch size of 16 × 16 × 16, along with a learning rate scheduler.
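The two ResNet-18 learning rates map naturally onto Adam parameter groups; a sketch, using a torchvision 2D ResNet-18 as a stand-in for the 3D variant (the parameter partition by name prefix is an assumption of this sketch):

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=2)  # stand-in; the paper uses a 3D variant
conv_params = [p for n, p in model.named_parameters() if not n.startswith("fc")]
fc_params = [p for n, p in model.named_parameters() if n.startswith("fc")]
optimizer = torch.optim.Adam(
    [{"params": conv_params, "lr": 3e-3},   # convolutional layers: 0.003
     {"params": fc_params, "lr": 3e-4}],    # fully connected layer: 0.0003
    weight_decay=5e-4)                      # L2 regularization: 5e-4
```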
Due to the substantial computational needs and long training times associated with these sophisticated 3D CNNs, cross-validation was not employed in this study. Instead, the dataset was divided into training, validation, and independent test sets. The validation set was used to monitor training progress and apply early stopping, while the independent held-out test set was utilized exclusively for final model evaluation.

5.2. Performance Metrics on Classification

Performance metrics are essential for evaluating the predictive performance of classification models. In this work, the performance of all classification models was evaluated using metrics derived from the components of the confusion matrix: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). To gain a comprehensive understanding of model performance, we utilized accuracy, precision, recall, and the F1-score. Additionally, we provide a detailed analysis by reporting class-wise recall and class-wise F1-scores, along with the weighted averages for both the CN and AD classes. The relevant equations are detailed in Equations (1)–(4).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$
$$\text{Precision} = \frac{TP}{TP + FP} \quad (2)$$
$$\text{Recall} = \frac{TP}{TP + FN} \quad (3)$$
$$F_1\text{-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4)$$
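Equations (1)–(4) translate directly into code; a minimal sketch:

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int):
    """Accuracy, precision, recall, and F1-score from confusion-matrix
    counts, following Equations (1)-(4)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```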

5.3. Evaluation Protocol for Interpretation Methods

Understanding the output generated by interpretation methods for 3D deep learning models can be quite challenging. Visual observation alone is often insufficient for analyzing these models, as it can be risky and misleading [39]. Additionally, relying solely on 2D plotting of 3D interpretations limits understanding, as considerable portions of the volume remain unexamined. Therefore, the explanations generated must be evaluated against criteria such as faithfulness, robustness, and usefulness.
To assess the reliability of the explanation methods, we adopted three complementary evaluation protocols:
  • Faithfulness Tests—To verify that our interpretation methods were learning meaningful features rather than detecting model artifacts, we employed a model reinitialization sanity check [39]. For this check, we calculated the Pearson [40] and Spearman’s rank [41] correlation coefficients between the saliency map from the fully trained model and a map generated from the same model after its parameters were randomly reinitialized (a minimal correlation sketch follows this list). A faithful saliency method is expected to yield a low correlation coefficient (closer to 0.0) within the range of −1.0 to +1.0, confirming that the map’s structural patterns were derived from the model’s learned weights rather than from input artifacts or universal biases. Conversely, a score close to +1.0 or −1.0 would indicate a strong positive or negative correlation, suggesting that the saliency map is heavily influenced by the model’s architecture or initialization instead of its learned weights.
  • Robustness Checks—To evaluate the robustness of the interpretation methods, we perturbed the input MRI samples by injecting additive Gaussian noise and applying a low-pass Gaussian filter.
    The process for adding noise to each voxel $I(x, y, z)$ in the input volume is described by
    $$I_{\text{noisy}}(x, y, z) = I(x, y, z) + N$$
    where $N$ is a random value sampled from a Gaussian distribution with a mean of zero and a standard deviation of $\sigma_{\text{noise}}$. The probability density function of this distribution is
    $$p(N) = \frac{1}{\sigma_{\text{noise}}\sqrt{2\pi}} \, e^{-\frac{N^2}{2\sigma_{\text{noise}}^2}}$$
    For the Gaussian filter, the MRI sample is convolved with a 3D Gaussian kernel, defined as
    $$G(x, y, z) = \frac{1}{\left(\sqrt{2\pi}\,\sigma\right)^3} \, e^{-\frac{x^2 + y^2 + z^2}{2\sigma^2}}$$
    where $(x, y, z)$ are the spatial coordinates relative to the center of the kernel and $\sigma$ is the standard deviation, which controls the amount of blurring. Finally, we compared the saliency maps from the perturbed inputs with the baseline (unperturbed) maps using Spearman’s rank correlation coefficient [41]. Robust interpretation methods are expected to produce saliency maps that maintain high similarity between the baseline and perturbed inputs.
  • Usefulness Verification—To provide anatomical interpretability, we mapped saliency maps to standard brain regions using an anatomical labeled atlas [42]. Region-wise hit rates were computed for structures commonly associated with Alzheimer’s disease (e.g., hippocampus, entorhinal cortex, temporal lobe), as suggested in prior neuroimaging studies [43].
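Both the faithfulness and robustness protocols reduce to correlating a pair of saliency maps, where the second map comes either from a reinitialized model or from a perturbed input; a minimal SciPy sketch (the helper names are ours):

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import pearsonr, spearmanr

def saliency_correlations(map_a: np.ndarray, map_b: np.ndarray):
    """Pearson and Spearman correlation between two flattened saliency maps."""
    a, b = map_a.ravel(), map_b.ravel()
    return pearsonr(a, b)[0], spearmanr(a, b)[0]

def perturb(volume: np.ndarray, sigma_noise: float = 0.0,
            sigma_blur: float = 0.0, seed: int = 0) -> np.ndarray:
    """Additive Gaussian noise and/or low-pass Gaussian filtering,
    as used in the robustness check."""
    rng = np.random.default_rng(seed)
    out = volume + rng.normal(0.0, sigma_noise, volume.shape) \
        if sigma_noise > 0 else volume
    return gaussian_filter(out, sigma=sigma_blur) if sigma_blur > 0 else out
```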

6. Results

This section presents the empirical findings derived from the protocols defined in the Implementation section. The analysis is divided into a performance analysis of the classifiers, a qualitative analysis of the interpretations, and a quantitative evaluation of the interpretations.

6.1. Performance Analysis of the Classifiers

Performance metrics assess how well and how accurately the models perform. In this work, we evaluated five models on the test set: a baseline 3D CNN, DenseNet, ResNet-18, ResNet-18 with pre-trained weights, and a Swin Transformer-based model. Table 1 summarizes the test accuracy, precision, recall, and F1-score for each model.
Among the models trained from scratch, DenseNet achieved the best results, attaining an accuracy of 92% and a weighted F1-score of 0.92. It outperformed the baseline CNN, which achieved an accuracy of 81.33%, as well as ResNet-18, which achieved an accuracy of 80.67% when trained from scratch. When applying transfer learning, ResNet-18 with pre-trained weights showed a significant boost in performance, achieving an accuracy of 95.33% and a weighted F1-score of 0.95. In contrast, the Hybrid Swin Transformer failed to perform exceptionally well, even with the pre-trained stem from ResNet-18, reaching an overall accuracy of 92.67% and a weighted F1-score of 0.93.
Figure 7 illustrates the class-wise recall, also known as sensitivity or true positive rate. Among the evaluated models, the Hybrid Swin Transformer demonstrated balanced sensitivity across the classes, achieving a recall of 0.92 for CN and 0.94 for AD. In contrast, the Baseline CNN attained the highest recall for AD (0.97) but struggled with CN cases, scoring only 0.68. This suggests that the model misclassifies a significant proportion of CN samples.
On the other hand, ResNet-18 trained from scratch displayed better recall for CN at 0.88, but its performance for AD was comparatively lower at 0.74, revealing an imbalance in sensitivity. The ResNet-18 (Pre-trained) model achieved the highest recall for Class CN at 0.98 and a competitive recall of 0.92 for Class AD. This indicates that transfer learning significantly enhances sensitivity to subtle structural variations in CN while maintaining strong performance for AD. The DenseNet model demonstrated a similar pattern, achieving a recall of 0.94 for Class CN and 0.89 for Class AD.
Overall, these findings indicate that DenseNet, transfer learning-based ResNet-18, and transformer-based architectures offer balanced performance, making them ideal candidates for clinical diagnostic classification.

6.2. Qualitative Analysis of the Interpretations

We applied five gradient-based interpretation methods and three model-agnostic techniques to evaluate the decision-making processes of three high-performing models: DenseNet, pre-trained ResNet-18, and the Hybrid Swin Transformer. To demonstrate and compare the results, we assessed the interpretation methods across the models using one CN and one AD MRI image. The assessment results are presented in this section and complemented by the quantitative analysis in Section 6.3.
Among the gradient-based interpretation methods considered in this study, Grad-CAM, Grad-CAM++, and HiResCAM are layer-dependent techniques, whereas Backpropagation and Guided Backpropagation provide pixel-level attributions with respect to the input space of models. To ensure fair visualization, the attribution maps from Backpropagation and Guided Backpropagation were intensity-scaled by clipping at the 99th percentile.
For DenseNet, we extracted feature maps from the last convolutional layer of the final dense block for the three layer-dependent methods, and the results of all interpretation techniques are presented in Section 6.2.1. For ResNet-18, we used the last convolutional layer of the final residual block for the layer-dependent methods, with corresponding results shown in Section 6.2.2. Finally, for the Hybrid Swin Transformer model, the normalization layer was used for the layer-dependent methods, and the results for all interpretation techniques are illustrated in Section 6.2.3.
In this paper, we present three slices of the overlaid interpretation results for each interpretation method, with attribution values highlighted in red, in the sagittal, axial, and coronal planes. In this context, x, y, and z denote the slice numbers for the sagittal, axial, and coronal planes, respectively, with x = 105, y = 105, and z = 97 representing the randomly selected slices used for all visualizations in this section.

6.2.1. Interpretation of DenseNet

Figure 8 illustrates the saliency maps produced by various gradient-based interpretation methods. The model’s predictions primarily focused on brain regions across all imaging planes for both AD and CN samples, with no significant qualitative differences observed between the two groups. Grad-CAM++ generated more intense outputs compared to Grad-CAM and HiResCAM, while Backpropagation and Guided Backpropagation resulted in identical localization and intensity patterns. These findings indicate that Grad-CAM++ is effective in highlighting specific regions, while Backpropagation-based methods demonstrate consistency in their outputs for the DenseNet architecture and the dataset used in this study. Therefore, further quantitative analysis is required to corroborate these observations and assess their generalizability.
The results of the model-agnostic methods for DenseNet, depicted in Figure 9, were consistently noisy across all techniques: RISE, LIME, and SHAP. While LIME highlighted very thin brain areas, the other methods struggled to generalize effectively. This can be attributed to the high dimensionality of 3D MRI brain mask samples and the fact that AD effects are often subtle and spatially distributed (e.g., small localized changes in the hippocampus and temporal regions). Specifically, LIME’s use of thin supervoxel segmentation, combined with its reliance on linear local approximation, resulted in fragmented and unstable highlighted regions for different slices [44]. RISE, which relies on random binary masking to approximate importance, produced highly fragmented patterns in 3D that appeared as noise rather than coherent regions. Kernel SHAP, on the other hand, requires a very large number of perturbations to accurately estimate feature importance [45]. This process is computationally expensive, and given the complexity of DenseNet and the high feature dimensionality of data, it failed to yield stable and trustworthy explanations.

6.2.2. Interpretation of ResNet-18 (Pre-Trained)

As depicted in Figure 10, unlike the interpretation results obtained for DenseNet, the outputs from Grad-CAM, Grad-CAM++, and HiResCAM for ResNet-18 (Pre-trained) are less granular, with activations spreading across both brain and non-brain regions. This behavior is more noticeable in the interpretation of the CN sample. In contrast, Backpropagation and Guided Backpropagation highlighted distinct regions confined to the brain for both AD and CN samples.
The probable reason for this difference compared to DenseNet lies in the architectural design of the ResNet-18 model we used. Since ResNet-18 consists of four residual blocks, the feature map resolution after the final block is significantly smaller than that of DenseNet, leading to coarser Grad-CAM activations. In contrast, backpropagation-based methods calculate the gradients of the output with respect to the input voxel intensities directly, making them independent of feature map resolution and allowing for more localized activations. Between Backpropagation and Guided Backpropagation, there are slight differences observed in specific regions and slices for this model.
Figure 11 shows a similar pattern to DenseNet in terms of noisy and fragmented outputs generated by model-agnostic methods. However, for the CN sample, the SHAP output in the sagittal brain slice indicates that there is a more intense area within the brain region compared to the same sample analyzed by DenseNet.

6.2.3. Interpretation of Hybrid Swin Transformer

The gradient-based interpretation results of our transformer model are presented in Figure 12. Unlike DenseNet and ResNet-18, the transformer model struggled to produce coherent patterns with layer-dependent gradient-based methods, such as Grad-CAM, GradCAM++, and HiResCAM. This is understandable, as these methods depend on spatially consistent activation maps, whereas the Swin model divides the feature map into non-overlapping windows or patches. However, Backpropagation and Guided Backpropagation results indicate that the predictions are not random and are based on actual brain regions.
To verify whether the stem section of the network contains meaningful activations for layer-dependent gradient-based methods, we analyzed Grad-CAM, Grad-CAM++, and HiResCAM outputs for the final convolutional layer of the last sequential residual block. The results, illustrated in Figure 13, show that the stem passes feature maps with activations originating from brain regions. For AD predictions, Grad-CAM and Grad-CAM++ highlighted the same brain regions. In contrast, for CN predictions, the highlighted regions differed between the two methods. This variability likely arises because Grad-CAM++ weights gradients differently than Grad-CAM, amplifying minor differences in the areas of influence.
On the contrary, the results of the model-agnostic methods shown in Figure 14 are similar to those observed for ResNet-18. However, for the CN sample, SHAP highlighted more extensive areas of the brain compared to both DenseNet and ResNet-18. The more extensive SHAP highlights for CN samples may reflect that the model has learned generalizable features, which could contribute to the highest CN recall observed for the Hybrid Swin Transformer.

6.3. Quantitative Analysis of the Interpretations

According to the results of our models and the qualitative analysis of their interpretations, both DenseNet and the pre-trained ResNet-18 demonstrated consistent predictive performance (accuracy, F1-score, and recall) and interpretation patterns across gradient-based methods, with activations predominantly localized to brain regions. In contrast, although the Hybrid Swin Transformer model achieved accuracy comparable to DenseNet, even with pre-trained ResNet-18 weights in its stem, it did not generalize as effectively as the ResNet-18 (Pre-trained) version. Furthermore, the outcomes from model-agnostic interpretation methods (SHAP, LIME, RISE) were largely noisy and failed to deliver coherent insights for all the models, underscoring the discrepancies between gradient-based and model-agnostic approaches.
To build upon these qualitative findings, this section presents a quantitative evaluation of interpretation methods. The goal is to systematically measure how reliable and meaningful the explanations are, beyond visual inspection. To ensure a fair and informative comparison, the analysis focuses on gradient-based methods applied to DenseNet and pre-trained ResNet-18. Specifically, we employed tests for Faithfulness, Robustness, and Usefulness to systematically assess which interpretation methods offer the most reliable and biologically meaningful explanations. The results of each test are presented in Sections 6.3.1–6.3.3.

6.3.1. Faithfulness Tests

To verify the faithfulness of the generated interpretations, we employed a sanity check by reinitializing the model parameters, as described in Section 5.3. This experiment was conducted for both the AD and CN classes, across all five gradient-based interpretation methods. For a faithful interpretation, the correlation should drop toward zero after reinitialization.
Figure 15 and Figure 16 illustrate the outcomes of the sanity check based on the Pearson correlation coefficient and Spearman’s rank correlation coefficient. Among all interpretation methods, Grad-CAM and Grad-CAM++ performed poorly on both measurements when applied to DenseNet. Meanwhile, the interpretation methods for the pre-trained ResNet-18 scored better on the Pearson correlation coefficient, with Grad-CAM and HiResCAM even producing negative correlations. However, for ResNet-18, Backpropagation and Guided Backpropagation produced high Spearman correlations (≥0.60) for the CN and AD samples. This discrepancy between Pearson and Spearman occurred because the Spearman coefficient measures the correlation of ranks rather than raw magnitudes. The high Spearman score reveals that both the trained and untrained models rank the importance of voxels in a consistently similar order, exposing the method’s architectural bias. This behavior of Backpropagation and Guided Backpropagation is a known phenomenon, and our finding aligns with other research [39], which suggests these methods are often more dependent on the model’s architecture.

6.3.2. Robustness Checks

We assessed the robustness of various saliency methods under increasing levels of noise and blurring. To quantify the results, we used Spearman’s rank correlation coefficient (ρ), as detailed in Section 5.3. The perturbation levels were varied from 0.1 to 0.5 for noise and from 0.5 to 4 for blurring. Blur levels can be categorized as follows: anything at or below 1.0 is considered mild, anything up to 3.0 moderate, and anything above that a strong blur [46]. Additionally, to address the stochastic nature of noise injection, we repeated the procedure five times at each level and recorded the mean ρ across these trials.
The results of the noise perturbation analysis for the DenseNet and pre-trained ResNet-18 models are presented in Figure 17 and Figure 18. For the pre-trained ResNet-18 model, the interpretation methods demonstrated strong robustness, all achieving scores ≥ 0.99. In contrast, for the DenseNet model, only Grad-CAM and HiResCAM displayed a similar trend, while Grad-CAM++ exhibited instability and fluctuated wildly for both the AD and CN samples. This observation aligns with the visual results shown in Figure 8, where Grad-CAM++ coarsely highlighted a larger area of the brain, so noise interfered with the intensity ordering.
The outcomes of the blur perturbation for the DenseNet and pre-trained ResNet-18 models are presented in Figure 19 and Figure 20. Up to blur level 1, all interpretation methods for the pre-trained ResNet-18 model outperformed their DenseNet counterparts by a large margin for both the AD and CN samples. However, at moderate to strong levels (2 to 4), the interpretation methods were more robust for DenseNet than for ResNet-18, with the exception of Backpropagation and Guided Backpropagation. This is analogous to the visual findings plotted for the pre-trained ResNet-18 in Figure 10: because its layer-dependent gradient-based methods produced dense, consolidated attributions, mild blurring did not affect their outcome, whereas stronger blurring disturbed the coarser attribution values. In contrast, Backpropagation and Guided Backpropagation were found to be extremely robust for the pre-trained ResNet-18 model; the probable reason for this behavior aligns with the faithfulness test outcome described in Section 6.3.1.

6.3.3. Usefulness Verification

To evaluate the usefulness of the generated saliency maps, we analyzed the brain regions identified by each interpretation method for classifying the AD and CN samples using DenseNet and pre-trained ResNet-18 models. The voxel-wise attribution scores from the saliency maps were mapped to standard anatomical regions using the AAL atlas [42]. For each region, we computed the proportion of relevance attributed to it and ranked the regions accordingly. Among 62 distinct brain regions from the AAL atlas, the top six brain regions found for each class are reported in Table 2 and Table 3, along with the percentage of total relevance attributed to each region (e.g., Frontal_Mid (5.97) means the Frontal_Mid region contributed 5.97% of the total voxel-wise attribution score).
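The region-wise relevance aggregation can be sketched as follows, assuming a voxel-aligned AAL label volume and a label-to-name mapping as inputs (the function and its interface are illustrative):

```python
import numpy as np

def top_regions(saliency: np.ndarray, atlas: np.ndarray,
                names: dict[int, str], k: int = 6):
    """Share of total (non-negative) attribution per atlas region, ranked;
    mirrors the percentages reported in Tables 2 and 3."""
    sal = np.clip(saliency, 0, None)          # keep positive relevance only
    total = sal.sum()
    shares = {name: 100.0 * sal[atlas == lab].sum() / total
              for lab, name in names.items()}
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:k]
```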
For AD classification, DenseNet consistently highlighted the frontal and temporal cortices, such as the Frontal_Mid, Temporal_Mid, and Temporal_Inf regions. In contrast, ResNet-18 primarily emphasized the medial and superior frontal regions, including Frontal_Sup_Medial, Frontal_Mid, and Frontal_Sup. These findings align closely with the established medical literature on the progression of AD. Neuropathological studies and structural MRI analyses consistently report that atrophy in the temporal lobe, particularly in the hippocampus and the surrounding medial temporal areas (e.g., Temporal_Mid, Temporal_Inf), is one of the most significant features of AD [47,48]. Involvement of the frontal lobe (e.g., Frontal_Mid, Frontal_Sup) is also documented in the later stages of the disease, which correlates with a decline in cognitive functioning [49].
In terms of CN classification, the DenseNet maps predominantly focused on the frontal and cerebellar regions, such as Frontal_Mid, Frontal_Sup, and Cerebellum_Crus1/2 across all the interpretations. Meanwhile, ResNet-18 highlighted sensorimotor and frontal areas, including Postcentral, Precentral, and Frontal_Mid. These findings are consistent with established neuroanatomical features of normal cognitive function. The frontal regions highlight their critical roles in executive functions, working memory, and cognitive control [50]. Similarly, the Postcentral and Precentral gyri signify the importance of the sensorimotor cortex in cognitive processing and action [51]. The involvement of the Cerebellum (Crus 1/2) has a role in higher-order cognitive and emotional processing function in motor coordination [52].
Although the hippocampal region is the most studied area of the brain for the early diagnosis of AD, it was not among the top six regions identified in our findings. This discrepancy may not only relate to the demographic of our specific subject (85 years old) [53] but also reflect the characteristics of the AAL atlas parcellation and the features we examined. The hippocampus is a small structure located deep within the brain and is anatomically surrounded by the larger temporal lobe, which consistently appears in the top six regions listed in Table 2. In AD, pathological changes, such as tau spread, often begin in the entorhinal cortex and parahippocampal gyrus before affecting the hippocampus itself. These structures comprise the broader medial temporal lobe [54] and are either included within or adjacent to the larger AAL regions identified by our model. Therefore, our models likely capture the spatial extent of the surrounding perirhinal and temporal cortex atrophy rather than the finer, deep changes within the hippocampus.
Overall, the saliency maps effectively captured relevant brain regions across both models and classifications, demonstrating their value in interpreting model predictions within a neuroanatomically meaningful context.

7. Conclusions

In this study, we showed how interpretability can be systematically integrated into model development through a structured pipeline involving preprocessing, model building, generation of explanations, and rigorous evaluation of these explanations. Our findings highlight the importance of making thoughtful design choices, showing that architectural modifications can significantly influence both the quality and reliability of saliency-based interpretations. Moreover, the outcomes emphasize that relying on a single interpretability technique can provide an incomplete understanding of model behavior; instead, combining multiple complementary explanation methods provides deeper and more trustworthy insights. Ultimately, this work underscores the need to shift the focus from mere predictive accuracy to explainability, particularly when deploying AI in sensitive clinical applications such as Alzheimer’s disease diagnosis.
Future research will broaden the scope of the diagnostic models explored by incorporating more advanced architectures that balance predictive performance with interpretability. Building on the insights gained here, we plan a comprehensive comparison of saliency map techniques to identify those that most effectively capture both model behavior and clinically significant brain regions. Extending the analysis to cases of Mild Cognitive Impairment (MCI) and a wider demographic range will also be critical for achieving better generalization across different stages of cognitive decline. Furthermore, we will validate the proposed pipeline on additional datasets and analyze how different preprocessing choices affect the resulting explanations.
The final step will involve integrating clinical expertise into the interpretability pipeline, ensuring that the generated explanations not only align with visual and statistical findings but also hold practical relevance in diagnostic contexts.

Author Contributions

Conceptualization, T.C., A.C. and Y.L.; methodology, T.C. and Y.L.; software, T.C. and A.C.; validation, T.C., A.C. and Y.L.; formal analysis, T.C. and A.C.; investigation, T.C. and A.C.; resources, T.C. and Y.L.; data collection, ADNI; data curation, T.C.; writing—original draft preparation, T.C.; writing—review and editing, T.C. and Y.L.; visualization, T.C.; supervision, Y.L.; project administration, Y.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Individual Discovery Grants.

Institutional Review Board Statement

As per ADNI protocols, all procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committees and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. More details can be found at adni.loni.usc.edu, accessed on 7 August 2025.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the ADNI study. Consent for publication has been granted by ADNI administrators.

Data Availability Statement

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu, accessed on 7 August 2025).

Acknowledgments

Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd. and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org, accessed on 7 August 2025). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. This research was enabled in part by computational resources and services provided by SHARCNET and the Digital Research Alliance of Canada (alliancecan.ca, accessed on 3 October 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CNN: Convolutional Neural Network
XAI: Explainable Artificial Intelligence
MRI: Magnetic Resonance Imaging
AD: Alzheimer’s Disease
CN: Cognitive Normal
MCI: Mild Cognitive Impairment
ROI: Region-of-Interest
ReLU: Rectified Linear Unit
BN: Batch Normalization
QC: Quality Control
AAL: Automated Anatomical Labeling
Grad-CAM: Gradient-weighted Class Activation Mapping
HiResCAM: High-Resolution Class Activation Mapping
LIME: Local Interpretable Model-agnostic Explanations
RISE: Randomized Input Sampling for Explanation
SHAP: SHapley Additive exPlanations
SHARCNET: Shared Hierarchical Academic Research Computing Network
CC: Correlation Coefficient

References

  1. Steiner, A.B.Q.; Jacinto, A.F.; Mayoral, V.F.S.; Brucki, S.M.D.; Citero, V.A. Mild cognitive impairment and progression to dementia of Alzheimer’s disease. Rev. Assoc. Méd. Bras. 2017, 63, 651–655. [Google Scholar] [CrossRef]
  2. Amoroso, N.; Quarto, S.; La Rocca, M.; Tangaro, S.; Monaco, A.; Bellotti, R. An eXplainability artificial intelligence approach to brain connectivity in Alzheimer’s disease. Front. Aging Neurosci. 2023, 15, 1238065. [Google Scholar] [CrossRef] [PubMed]
  3. Rieke, J.; Eitel, F.; Weygandt, M.; Haynes, J.D.; Ritter, K. Visualizing convolutional networks for MRI-based diagnosis of Alzheimer’s disease. In Proceedings of the Understanding and Interpreting Machine Learning in Medical Image Computing Applications, Cham, Switzerland, 16–20 September 2018; pp. 24–31. [Google Scholar] [CrossRef]
  4. Ebrahimighahnavieh, A.; Luo, S.; Chiong, R. Deep learning to detect Alzheimer’s disease from neuroimaging: A systematic literature review. Comput. Methods Programs Biomed. 2020, 187, 105242. [Google Scholar] [CrossRef] [PubMed]
  5. Hassan, N.; Miah, A.S.M.; Suzuki, T.; Shin, J. Gradual variation-based dual-stream deep learning for spatial feature enhancement with dimensionality reduction in early Alzheimer’s disease detection. IEEE Access 2025, 13, 31701–31717. [Google Scholar] [CrossRef]
  6. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 818–833. [Google Scholar] [CrossRef]
  7. Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep inside Convolutional Networks: Visualising image classification models and saliency maps. arXiv 2013, arXiv:1312.6034. [Google Scholar] [CrossRef]
  8. Dandl, S.; Binder, M.; Auer, A.; Bischl, B. Multi-Objective Counterfactual Explanations. In Proceedings of the Parallel Problem Solving from Nature—PPSN XVI, Granada, Spain, 5–9 September 2020; pp. 448–469. [Google Scholar] [CrossRef]
  9. Jack, C.R.; Knopman, D.S.; Jagust, W.J.; Petersen, R.C.; Weiner, M.W.; Aisen, P.S.; Shaw, L.M.; Vemuri, P.; Wiste, H.J.; Weigand, S.D.; et al. Tracking pathophysiological processes in Alzheimer’s disease: An updated hypothetical model of dynamic biomarkers. Lancet Neurol. 2013, 12, 207–216. [Google Scholar] [CrossRef]
  10. Jack, C.R.; Wiste, H.J.; Vemuri, P.; Weigand, S.D.; Senjem, M.L.; Zeng, G.; Bernstein, M.A.; Gunter, J.L.; Pankratz, V.S.; Aisen, P.S.; et al. Brain beta-amyloid measures and magnetic resonance imaging atrophy both predict time-to-progression from mild cognitive impairment to Alzheimer’s disease. Brain 2010, 133, 3336–3348. [Google Scholar] [CrossRef]
  11. Fischl, B. FreeSurfer. NeuroImage 2012, 62, 774–781. [Google Scholar] [CrossRef]
  12. Schwarz, C.G.; Gunter, J.L.; Wiste, H.J.; Przybelski, S.A.; Weigand, S.D.; Ward, C.P.; Senjem, M.L.; Vemuri, P.; Murray, M.E.; Dickson, D.W.; et al. A large-scale comparison of cortical thickness and volume methods for measuring Alzheimer’s disease severity. NeuroImage Clin. 2016, 11, 802–812. [Google Scholar] [CrossRef]
  13. Liu, S.; Masurkar, A.V.; Rusinek, H.; Chen, J.; Zhang, B.; Zhu, W.; Fernandez-Granda, C.; Razavian, N. Generalizable deep learning model for early Alzheimer’s disease detection from structural MRIs. Sci. Rep. 2022, 12, 17106. [Google Scholar] [CrossRef]
  14. Alahmed, H.; Al-Suhail, G. AlzONet: A deep learning optimized framework for multiclass Alzheimer’s disease diagnosis using MRI brain imaging. J. Supercomput. 2025, 81, 423. [Google Scholar] [CrossRef]
  15. Ebrahimi, A.; Luo, S.; Alzheimer’s Disease Neuroimaging Initiative. Convolutional neural networks for Alzheimer’s disease detection on MRI images. J. Med. Imaging 2021, 8, 024503. [Google Scholar] [CrossRef]
  16. Farahani, F.V.; Fiok, K.; Lahijanian, B.; Karwowski, W.; Douglas, P.K. Explainable AI: A review of applications to neuroimaging data. Front. Neurosci. 2022, 16, 906290. [Google Scholar] [CrossRef]
  17. Wang, S.H.; Han, X.; Du, J.; Wang, Z.; Yuan, C.; Chen, Y.; Zhu, Y.; Dou, X.; Xu, X.; Xu, H.; et al. Saliency-based 3D convolutional neural network for categorising common focal liver lesions on multisequence MRI. Insights Imaging 2021, 12, 173. [Google Scholar] [CrossRef]
  18. Springenberg, J.T.; Dosovitskiy, A.; Brox, T.; Riedmiller, M. Striving for simplicity: The all convolutional net. arXiv 2014, arXiv:1412.6806. [Google Scholar] [CrossRef]
  19. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 2020, 128, 336–359. [Google Scholar] [CrossRef]
  20. Chattopadhay, A.; Sarkar, A.; Howlader, P.; Balasubramanian, V.N. Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 839–847. [Google Scholar] [CrossRef]
  21. Draelos, R.L.; Carin, L. Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks. arXiv 2020, arXiv:2011.08891. [Google Scholar] [CrossRef]
  22. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar] [CrossRef]
  23. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 4765–4774. [Google Scholar]
  24. Petsiuk, V.; Das, A.; Saenko, K. RISE: Randomized Input Sampling for Explanation of Black-box Models. In Proceedings of the British Machine Vision Conference (BMVC), Newcastle upon Tyne, UK, 3–6 September 2018; BMVA Press: Guildford, UK, 2018. [Google Scholar]
  25. Jin, W.; Li, X.; Hamarneh, G. One map does not fit all: Evaluating saliency map explanation on multi-modal medical images. arXiv 2021, arXiv:2107.05047. [Google Scholar] [CrossRef]
  26. Wiśniewski, M.; Giulivi, L.; Boracchi, G. SE3D: A framework for saliency method evaluation in 3D imaging. In Proceedings of the 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 27–30 October 2024; pp. 89–95. [Google Scholar] [CrossRef]
  27. Brima, Y.; Atemkeng, M. Saliency-driven explainable deep learning in medical imaging: Bridging visual explainability and statistical quantitative analysis. BioData Min. 2024, 17, 18. [Google Scholar] [CrossRef]
  28. Alzheimer’s Disease Neuroimaging Initiative. Available online: https://adni.loni.usc.edu/ (accessed on 7 August 2025).
  29. Alzheimer’s Disease Neuroimaging Initiative. UCSF FreeSurfer Methods Summary. Available online: https://adni.bitbucket.io/reference/docs/UCSFFRESFR/UCSFFreeSurferMethodsSummary.pdf (accessed on 15 August 2025).
  30. Ebrahimi, A.; Luo, S.; Chiong, R.; Alzheimer’s Disease Neuroimaging Initiative. Deep sequence modelling for Alzheimer’s disease detection using MRI. Comput. Biol. Med. 2021, 134, 104537. [Google Scholar] [CrossRef] [PubMed]
  31. Guan, Z.; Kumar, R.; Fung, Y.R.; Wu, Y.; Fiterau, M. A Comprehensive Study of Alzheimer’s Disease Classification Using Convolutional Neural Networks. arXiv 2019, arXiv:1904.07950. [Google Scholar] [CrossRef]
  32. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar] [CrossRef]
  33. Singh, D.; Dyrba, M. Comparison of CNN Architectures for Detecting Alzheimer’s Disease using Relevance Maps. In Bildverarbeitung für die Medizin 2023; Deserno, T.M., Handels, H., Maier, A., Maier-Hein, K., Palm, C., Tolxdorff, T., Eds.; Springer Vieweg: Wiesbaden, Germany, 2023. [Google Scholar] [CrossRef]
  34. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
  35. Carreira, J.; Zisserman, A. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4724–4733. [Google Scholar] [CrossRef]
  36. Ebrahimi-Ghahnavieh, A.; Luo, S.; Chiong, R. Transfer learning for Alzheimer’s disease detection on MRI images. In Proceedings of the 2019 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT), Bali, Indonesia, 1–3 July 2019; pp. 133–138. [Google Scholar] [CrossRef]
  37. Tamal. xai-for-ad. GitHub. 2025. Available online: https://github.com/tamal3472/xai-for-ad (accessed on 27 November 2025).
  38. SHARCNET. SHARCNET: Shared Hierarchical Academic Research Computing Network. 2024. Available online: https://www.sharcnet.ca (accessed on 3 October 2025).
  39. Adebayo, J.; Gilmer, J.; Muelly, M.; Goodfellow, I.; Hardt, M.; Kim, B. Sanity Checks for Saliency Maps. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS), NeurIPS’18, Red Hook, NY, USA, 2–7 December 2018; pp. 9525–9536. [Google Scholar]
  40. Pearson, K. VII. Mathematical contributions to the theory of evolution—III. Regression, heredity, and panmixia. Philos. Trans. R. Soc. Lond. Ser. A 1896, 187, 253–318. [Google Scholar] [CrossRef]
  41. Spearman, C. The Proof and Measurement of Association between Two Things. Am. J. Psychol. 1904, 15, 72–101. [Google Scholar] [CrossRef]
  42. Tzourio-Mazoyer, N.; Landeau, B.; Papathanassiou, D.; Crivello, F.; Etard, O.; Delcroix, N.; Mazoyer, B.; Joliot, M. Automated Anatomical Labeling of Activations in SPM Using a Macroscopic Anatomical Parcellation of the MNI MRI Single-Subject Brain. NeuroImage 2002, 15, 273–289. [Google Scholar] [CrossRef]
  43. Rathore, S.; Habes, M.; Iftikhar, M.A.; Shacklett, A.; Davatzikos, C. A review on neuroimaging-based classification studies and associated feature extraction methods for Alzheimer’s disease and its prodromal stages. NeuroImage 2017, 155, 530–548. [Google Scholar] [CrossRef]
  44. Schallner, L.; Rabold, J.; Scholz, O.; Schmid, U. Effect of Superpixel Aggregation on Explanations in LIME—A Case Study with Biological Data. In Communications in Computer and Information Science, Proceedings of the Machine Learning and Knowledge Discovery in Databases, Würzburg, Germany, 16–20 September 2019; Cellier, P., Driessens, K., Eds.; Springer International Publishing: Cham, Switzerland, 2020; Volume 1167, pp. 147–158. [Google Scholar] [CrossRef]
  45. Tempel, F.; Ihlen, E.A.F.; Adde, L.; Strümke, I. Explaining Human Activity Recognition with SHAP: Validating insights with perturbation and quantitative measures. Comput. Biol. Med. 2025, 188, 109838. [Google Scholar] [CrossRef]
  46. Hendrycks, D.; Dietterich, T. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. arXiv 2019, arXiv:1903.12261. [Google Scholar] [CrossRef]
  47. Convit, A.; Leon, M.J.D.; Tarshish, C.; Santi, S.D.; Tsui, W.; Rusinek, H.; George, A. Specific Hippocampal Volume Reductions in Individuals at Risk for Alzheimer’s Disease. Neurobiol. Aging 1997, 18, 131–138. [Google Scholar] [CrossRef]
  48. Clerx, L.; van Rossum, I.A.; Burns, L.; Knol, D.L.; Scheltens, P.; Verhey, F.; Aalten, P.; Lapuerta, P.; van de Pol, L.; van Schijndel, R.; et al. Measurements of medial temporal lobe atrophy for prediction of Alzheimer’s disease in subjects with mild cognitive impairment. Neurobiol. Aging 2013, 34, 2003–2013. [Google Scholar] [CrossRef]
  49. Desikan, R.S.; Cabral, H.J.; Settecase, F.; Hess, C.P.; Dillon, W.P.; Glastonbury, C.M.; Weiner, M.W.; Schmansky, N.J.; Salat, D.H.; Fischl, B. Automated MRI measures predict progression to Alzheimer’s disease. Neurobiol. Aging 2010, 31, 1364–1374. [Google Scholar] [CrossRef] [PubMed]
  50. Miller, E.K.; Cohen, J.D. An Integrative Theory of Prefrontal Cortex Function. Annu. Rev. Neurosci. 2001, 24, 167–202. [Google Scholar] [CrossRef]
  51. Morgan, C.T. The Cerebral Cortex of Man: A Clinical Study of Localization of Function. Science 1950, 112, 567. [Google Scholar] [CrossRef]
  52. Stoodley, C.J.; Valera, E.M.; Schmahmann, J.D. Functional topography of the cerebellum for motor and cognitive tasks: An fMRI study. NeuroImage 2012, 59, 1560–1570. [Google Scholar] [CrossRef]
  53. Chauveau, L.; Kuhn, E.; Palix, C.; Felisatti, F.; Ourry, V.; de La Sayette, V.; Chételat, G.; de Flores, R. Medial Temporal Lobe Subregional Atrophy in Aging and Alzheimer’s Disease: A Longitudinal Study. Front. Aging Neurosci. 2021, 13, 750154. [Google Scholar] [CrossRef] [PubMed]
  54. Braak, H.; Braak, E. Neuropathological staging of Alzheimer-related changes. Acta Neuropathol. 1991, 82, 239–259. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Workflow Diagram.
Figure 2. Raw images before and after processing.
Figure 3. Baseline 3D CNN architecture. C: Channels; ConvBlock: 3D convolutions with batch normalization, ReLU, and maxpooling; FC: Fully connected layer.
Figure 4. Custom DenseNet Architecture. C: Channels; ConvBlock: 3D convolutions with maxpooling, batch normalization, and ReLU; DB: Dense Block; TR: Transition Layer.
Figure 5. ResNet-18 Architecture. C: Channels; ConvBlock: 3D convolutions with batch normalization, ReLU, and maxpooling; L (2 × Res): Layer with two residual blocks; FC: Fully connected layer.
Figure 6. Hybrid Swin Architecture. C: Channels; ConvBlock: 3D convolutions with batch normalization and maxpooling; ResNet (L1–L3): Layers 1 to 3 from the ResNet-18 architecture; Interpol. + Conv (1 × 1 × 1): Interpolation followed by 1 × 1 × 1 convolution; AA. Pool: Adaptive average pooling.
Figure 7. Class-wise recall performance.
Figure 8. Gradient-based Interpretation Methods Visualization for DenseNet: (a) Output for AD Image Sample; (b) Output for CN Image Sample.
Figure 9. Model-Agnostic Interpretation Method Visualization for DenseNet: (a) Output for AD Image Sample; (b) Output for CN Image Sample.
Figure 10. Gradient-based Interpretation Method Visualization for Pre-trained ResNet-18: (a) Output for AD Image Sample; (b) Output for CN Image Sample.
Figure 11. Model-Agnostic Interpretation Method Visualization for Pre-trained ResNet-18: (a) Output for AD Image Sample; (b) Output for CN Image Sample.
Figure 12. Gradient-based Interpretation Method Visualization for the Hybrid Swin Transformer: (a) Output for AD Image Sample; (b) Output for CN Image Sample.
Figure 13. Layer-Dependent Gradient-based Interpretation Method Visualization for the Feature Extractor Part of the Hybrid Swin Transformer: (a) Output for AD Image Sample; (b) Output for CN Image Sample.
Figure 14. Model-Agnostic Interpretation Method Visualization for the Hybrid Swin Transformer: (a) Output for AD Image Sample; (b) Output for CN Image Sample.
Figure 15. Interpretation Methods Faithfulness Check (Pearson Correlation Coefficient) Under Full Model Parameter Re-initialization: (a) Results for DenseNet; (b) Results for Pre-trained ResNet-18.
Figure 16. Interpretation Methods Faithfulness Check (Spearman ρ) Under Full Model Parameter Re-initialization: (a) Results for DenseNet; (b) Results for Pre-trained ResNet-18.
Information 16 01058 g016
Figure 17. Robustness of Interpretation Methods Using DenseNet Under Input Perturbations (Gaussian Noise): (a) For AD sample (b) For CN sample.
Figure 17. Robustness of Interpretation Methods Using DenseNet Under Input Perturbations (Gaussian Noise): (a) For AD sample (b) For CN sample.
Information 16 01058 g017
Figure 18. Robustness of Interpretation Methods Using Pre-Trained ResNet-18 Under Input Perturbations (Gaussian Noise): (a) For AD sample (b) For CN sample.
Figure 18. Robustness of Interpretation Methods Using Pre-Trained ResNet-18 Under Input Perturbations (Gaussian Noise): (a) For AD sample (b) For CN sample.
Information 16 01058 g018
Figure 19. Robustness of Interpretation Methods Using DenseNet Under Input Perturbations (Gaussian Blurring): (a) For AD sample (b) For CN sample.
Figure 19. Robustness of Interpretation Methods Using DenseNet Under Input Perturbations (Gaussian Blurring): (a) For AD sample (b) For CN sample.
Information 16 01058 g019
Figure 20. Robustness of Interpretation Methods Using Pre-Trained ResNet-18 Under Input Perturbations (Gaussian Blurring): (a) For AD sample (b) For CN sample.
Figure 20. Robustness of Interpretation Methods Using Pre-Trained ResNet-18 Under Input Perturbations (Gaussian Blurring): (a) For AD sample (b) For CN sample.
Information 16 01058 g020
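Similarly, the robustness probes behind Figures 17, 18, 19 and 20 amount to recomputing saliency on perturbed inputs and correlating it with the clean-input map. A minimal sketch, again assuming the hypothetical `explain_fn` wrapper and a NumPy image volume `x`; the perturbation strengths are illustrative defaults, not our experimental settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import spearmanr

def perturbation_robustness(explain_fn, model, x,
                            noise_sigma=0.05, blur_sigma=1.0):
    """Compare saliency on a clean scan against saliency on noisy and
    blurred variants of the same scan; higher correlation = more robust."""
    rng = np.random.default_rng(0)
    base = np.asarray(explain_fn(model, x)).ravel()
    noisy = x + rng.normal(0.0, noise_sigma, size=x.shape)   # Gaussian noise
    blurred = gaussian_filter(x, sigma=blur_sigma)           # Gaussian blurring
    rho_noise = spearmanr(base, np.asarray(explain_fn(model, noisy)).ravel())[0]
    rho_blur = spearmanr(base, np.asarray(explain_fn(model, blurred)).ravel())[0]
    return rho_noise, rho_blur
```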
Table 1. Performance summary of different models on the test set (WA = Weighted Average).
Model | Accuracy (%) | F1 (WA) | F1 (Class CN/AD)
Baseline CNN (scratch) | 81.33 | 0.81 | 0.80/0.83
DenseNet (scratch) | 92.00 | 0.92 | 0.93/0.90
ResNet-18 (scratch) | 80.67 | 0.81 | 0.82/0.80
ResNet-18 (pre-trained) | 95.33 | 0.95 | 0.96/0.94
Hybrid Swin Transformer | 92.67 | 0.93 | 0.94/0.91
Table 2. Top 6 Relevance Areas for AD Classification (region, % relevance).
DenseNet:
- Grad-CAM: Frontal_Mid (5.97), Temporal_Mid (5.83), Temporal_Inf (4.22), Precentral (4.18), Frontal_Sup (4.05), Postcentral (3.90)
- Grad-CAM++: Temporal_Mid (5.46), Temporal_Inf (4.77), Frontal_Mid (4.52), Precuneus (4.20), Postcentral (3.60), Precentral (3.33)
- HiResCAM: Frontal_Mid (6.75), Temporal_Mid (5.47), Temporal_Inf (5.06), Frontal_Sup (4.21), Precentral (4.03), Postcentral (3.84)
- Backpropagation: Frontal_Mid (7.60), Temporal_Mid (5.50), Frontal_Sup (5.08), Precentral (4.67), Temporal_Inf (3.96), Temporal_Sup (3.78)
- Guided Backpropagation: Frontal_Mid (7.77), Temporal_Mid (5.39), Frontal_Sup (5.09), Precentral (4.70), Temporal_Inf (3.81), Temporal_Sup (3.79)
ResNet-18 (Pre-trained):
- Grad-CAM: Frontal_Sup_Medial (7.95), Frontal_Mid (6.74), Frontal_Sup (6.07), Frontal_Inf_Orb (5.32), Cingulum_Ant (4.80), Temporal_Pole_Sup (3.93)
- Grad-CAM++: Frontal_Mid (19.16), Frontal_Sup_Medial (12.73), Frontal_Sup (11.83), Frontal_Inf_Orb (6.53), Frontal_Inf_Tri (6.16), Cingulum_Ant (5.93)
- HiResCAM: Frontal_Sup_Medial (15.72), Frontal_Mid (15.06), Frontal_Sup (12.51), Cingulum_Ant (6.80), Frontal_Inf_Orb (5.30), Frontal_Sup_Orb (4.36)
- Backpropagation: Frontal_Mid (15.09), Precentral (7.23), Frontal_Sup (7.20), Frontal_Inf_Tri (5.94), Postcentral (5.79), Frontal_Sup_Medial (5.56)
- Guided Backpropagation: Frontal_Mid (15.75), Frontal_Sup (7.13), Precentral (7.00), Frontal_Inf_Tri (6.01), Postcentral (5.67), Frontal_Sup_Medial (5.55)
Table 3. Top 6 Relevance Areas for CN Classification (region, % relevance).
DenseNet:
- Grad-CAM: Frontal_Mid (6.76), Frontal_Sup (5.25), Cerebelum_Crus1 (4.56), Frontal_Sup_Medial (4.07), Cerebelum_Crus2 (3.96), Temporal_Inf (3.87)
- Grad-CAM++: Temporal_Mid (5.20), Precuneus (4.77), Temporal_Inf (4.48), Fusiform (3.76), Postcentral (3.41), Temporal_Sup (3.35)
- HiResCAM: Frontal_Sup_Medial (5.05), Frontal_Mid (4.81), Cingulum_Mid (4.64), Temporal_Mid (4.54), Cerebelum_Crus1 (4.43), Cerebelum_Crus2 (4.11)
- Backpropagation: Frontal_Mid (7.42), Temporal_Mid (4.91), Frontal_Sup (4.56), Precentral (3.95), Cingulum_Mid (3.83), Postcentral (3.78)
- Guided Backpropagation: Frontal_Mid (7.65), Temporal_Mid (4.84), Frontal_Sup (4.39), Precentral (3.99), Postcentral (3.83), Cingulum_Mid (3.77)
ResNet-18 (Pre-trained):
- Grad-CAM: Postcentral (7.23), Frontal_Mid (6.99), Precuneus (6.62), Precentral (5.46), Temporal_Mid (5.27), Frontal_Sup (4.10)
- Grad-CAM++: Postcentral (7.44), Frontal_Mid (7.33), Precentral (6.35), Precuneus (5.61), Temporal_Mid (5.04), Frontal_Sup (4.19)
- HiResCAM: Frontal_Mid (9.72), Precentral (7.69), Postcentral (7.39), Frontal_Sup (5.15), Supp_Motor_Area (4.11), Cingulum_Mid (4.03)
- Backpropagation: Frontal_Mid (9.21), Precentral (6.05), Insula (5.93), Frontal_Inf_Oper (5.42), Frontal_Inf_Tri (4.94), Postcentral (4.61)
- Guided Backpropagation: Frontal_Mid (8.61), Precentral (6.17), Insula (5.84), Frontal_Inf_Oper (4.99), Frontal_Inf_Tri (4.94), Postcentral (4.81)