3.1. Model Performance Analysis
3.1.1. Architecture-Wise Performance Analysis
Comprehensive performance evaluation was conducted across three two-dimensional convolutional neural network architectures to assess classification accuracy, training dynamics, and computational efficiency for Alzheimer’s disease severity detection. The evaluation encompassed systematic comparison of EfficientNet-B4, ResNet-50, and MobileNet-V3 architectures trained for four-class severity classification across MildDemented, ModerateDemented, NonDemented, and VeryMildDemented categories.
The comparative training analysis revealed distinct convergence characteristics across all three architectures throughout the 20-epoch training period (Figure 1). EfficientNet-B4 demonstrated robust training behavior with systematic convergence, training loss decreasing from approximately 0.9 at epoch 1 to below 0.1 by epoch 20, while validation loss decreased from approximately 0.6 to around 0.1. Training accuracy improved from approximately 62% at epoch 1 to nearly 100% by epoch 20, with validation accuracy starting at approximately 74% and reaching 98% at the final epoch. The parallel trajectory of training and validation metrics indicated successful optimization without overfitting, with consistent convergence throughout the complete training duration.
MobileNet-V3 exhibited exceptional training efficiency, converging more rapidly than the other architectures. The loss curves demonstrated highly efficient optimization, with training loss decreasing from approximately 0.8 at epoch 1 to near zero by epoch 20, while validation loss decreased from approximately 0.4 to around 0.05. Training accuracy advanced from approximately 70% at epoch 1 to nearly 100% by epoch 20, while validation accuracy improved from approximately 84% initially to over 99% at the final epoch. The superior convergence rate and minimal gap between training and validation metrics throughout training indicated an architectural design well suited to this medical imaging classification task.
ResNet-50 demonstrated consistent training performance with steady convergence comparable to the other architectures. Training loss improved from approximately 0.85 at epoch 1 to below 0.05 by epoch 20, while validation loss decreased from approximately 0.55 to around 0.08. Training accuracy advanced from approximately 61% at epoch 1 to nearly 100% by epoch 20, while validation accuracy improved from approximately 76% initially to 98% at the final epoch. The stable convergence without significant fluctuations indicated robust optimization and appropriate model capacity for the dataset's complexity.
The comprehensive performance comparison revealed MobileNet-V3 as the superior architecture with the highest test accuracy of 99.18%, compared with 98.23% for EfficientNet-B4 and 98.04% for ResNet-50 (Figure 2). These differences, while numerically modest, can translate into meaningful gains in diagnostic capability when applied to large patient populations. The superior performance of MobileNet-V3 validated the effectiveness of its optimized architectural design for medical imaging applications requiring both high accuracy and computational efficiency.
The efficiency analysis revealed dramatic differences in computational requirements while highlighting MobileNet-V3’s exceptional parameter efficiency (Figure 3). MobileNet-V3 achieved the highest classification accuracy using only 4,207,156 parameters, representing 82% fewer parameters than ResNet-50’s 23,516,228 parameters and 76% fewer parameters than EfficientNet-B4’s 17,555,788 parameters. This efficiency advantage translates directly to reduced memory requirements, faster inference times, lower power consumption, and decreased deployment costs for clinical implementation. The superior parameter-to-performance ratio demonstrated MobileNet-V3’s architectural optimization for achieving maximum diagnostic accuracy with minimal computational overhead.
Training time analysis demonstrated substantial efficiency differences across architectures. MobileNet-V3 completed 20-epoch training in 1198.2 s, representing approximately 83% faster training than EfficientNet-B4’s 6987.9 s and 63% faster than ResNet-50’s 3253.4 s. This training efficiency advantage enables rapid model development, iterative improvement cycles, and frequent retraining with updated datasets essential for maintaining diagnostic accuracy as clinical protocols evolve.
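The parameter and training-time reductions quoted above follow directly from the reported figures; a minimal sketch (counts and times taken from the text):

```python
# Parameter counts and 20-epoch training times reported in the text.
params = {"MobileNet-V3": 4_207_156, "EfficientNet-B4": 17_555_788, "ResNet-50": 23_516_228}
train_s = {"MobileNet-V3": 1198.2, "EfficientNet-B4": 6987.9, "ResNet-50": 3253.4}

def reduction(small: float, large: float) -> float:
    """Fractional reduction of `small` relative to `large`."""
    return 1.0 - small / large

# MobileNet-V3 uses ~82% fewer parameters than ResNet-50 and ~76% fewer than EfficientNet-B4,
p_vs_resnet = reduction(params["MobileNet-V3"], params["ResNet-50"])
p_vs_effnet = reduction(params["MobileNet-V3"], params["EfficientNet-B4"])
# ... and trains ~83% faster than EfficientNet-B4 and ~63% faster than ResNet-50.
t_vs_effnet = reduction(train_s["MobileNet-V3"], train_s["EfficientNet-B4"])
t_vs_resnet = reduction(train_s["MobileNet-V3"], train_s["ResNet-50"])
```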
The detailed confusion matrix analysis provided comprehensive insight into per-class classification performance across all architectures (Figure 4). EfficientNet-B4 demonstrated strong overall performance with 98.2% accuracy, showing excellent classification for MildDemented cases with 1308 correct classifications out of 1320 total samples. ModerateDemented classification achieved perfect performance with all 1010 samples correctly identified. NonDemented classification showed 1390 correct predictions out of 1421 samples, while VeryMildDemented classification achieved 1301 correct predictions out of 1348 samples. The primary misclassification errors occurred between the NonDemented and VeryMildDemented classes, with 26 NonDemented cases misclassified as VeryMildDemented and 37 VeryMildDemented cases misclassified as NonDemented.
ResNet-50 confusion matrix analysis revealed 98.0% overall accuracy with excellent MildDemented classification showing 1310 correct predictions out of 1320 samples and perfect ModerateDemented performance with all 1010 samples correctly classified. NonDemented classification achieved 1372 correct predictions out of 1421 samples, while VeryMildDemented classification showed 1309 correct predictions out of 1348 samples. The misclassification pattern concentrated primarily in the NonDemented-VeryMildDemented boundary, with 40 NonDemented cases misclassified as VeryMildDemented and 29 VeryMildDemented cases misclassified as NonDemented.
MobileNet-V3 demonstrated superior classification performance with 99.2% overall accuracy, achieving exceptional results across all categories. MildDemented classification reached 1317 correct predictions out of 1320 samples, while ModerateDemented maintained perfect classification with all 1010 samples correctly identified. NonDemented classification achieved 1388 correct predictions out of 1421 samples, and VeryMildDemented classification demonstrated 1342 correct predictions out of 1348 samples. The minimal misclassification errors included only 30 NonDemented cases misclassified as VeryMildDemented and 5 VeryMildDemented cases misclassified as NonDemented, representing the lowest error rates among all architectures. Per-class precision, recall, and F1-scores for each Alzheimer’s disease severity class were then calculated from these confusion matrices and the detailed classification reports for the EfficientNet-B4, ResNet-50, and MobileNet-V3 architectures (Table 2).
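The reported accuracy and per-class recall values can be cross-checked against the confusion-matrix counts; a small sketch using MobileNet-V3's diagonal (correct) counts and class totals from the text:

```python
# MobileNet-V3 test-set confusion-matrix diagonals and per-class totals from the text.
classes = ["MildDemented", "ModerateDemented", "NonDemented", "VeryMildDemented"]
correct = [1317, 1010, 1388, 1342]   # diagonal entries (correct predictions)
totals  = [1320, 1010, 1421, 1348]   # row sums (true samples per class)

overall_accuracy = sum(correct) / sum(totals)           # matches the reported 99.18%
recalls = {c: n / t for c, n, t in zip(classes, correct, totals)}
# NonDemented recall (1388/1421 ~ 0.977) rounds to the 0.98 reported in Table 2.
```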
The comprehensive classification performance comparison revealed distinct architectural strengths across the severity categories (Table 2). EfficientNet-B4 demonstrated strong performance with an overall accuracy of 98%, achieving perfect precision and recall (1.00) for ModerateDemented cases and excellent classification for MildDemented with 0.99 precision and recall. Its performance was slightly lower for NonDemented (0.97 precision, 0.98 recall) and VeryMildDemented (0.97 precision and recall), reflecting the greater diagnostic complexity of differentiating healthy aging from early dementia stages.
ResNet-50 achieved an identical 98% overall accuracy, also delivering perfect performance (1.00 precision and recall) for ModerateDemented and strong results for MildDemented with 0.99 precision and recall. It demonstrated consistent classification for NonDemented and VeryMildDemented, both with 0.97 precision and recall, indicating a balanced capability across the full range of dementia severity without significant bias toward any specific class.
MobileNet-V3 exhibited the strongest overall performance, with an accuracy of 99%, achieving perfect precision and recall (1.00) for both MildDemented and ModerateDemented classes, ensuring flawless identification of critical pathological cases. The model also achieved perfect precision (1.00) and 0.98 recall for NonDemented, and 0.98 precision with perfect recall (1.00) for VeryMildDemented, demonstrating excellent sensitivity and specificity across both early and non-pathological categories.
The macro average and weighted average metrics for MobileNet-V3 showed 0.99 across precision, recall, and F1-score, further supporting its strong generalization and balanced performance. With an accuracy of 0.99 across 5099 test samples, MobileNet-V3 offers robust statistical evidence of its superior diagnostic capability across all Alzheimer’s disease severity levels.
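The macro and weighted averages are computed differently: the macro average weights every class equally, while the weighted average weights each class by its support. A sketch using the rounded MobileNet-V3 per-class F1-scores from Table 2 and the class supports reported above (exact values differ slightly because the table reports rounded scores):

```python
# Rounded MobileNet-V3 per-class F1-scores (Table 2) and class supports (test-set totals).
f1      = [1.00, 1.00, 0.99, 0.99]   # Mild, Moderate, NonDemented, VeryMild
support = [1320, 1010, 1421, 1348]   # 5099 test samples in total

macro_f1    = sum(f1) / len(f1)                                       # unweighted mean
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)  # support-weighted mean
# Both round to the 0.99 reported for MobileNet-V3.
```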
Combined with its lightweight design of 4.2 million parameters and the fastest training time of 1198.2 s reported above, MobileNet-V3 stands out as the most efficient and accurate model. This combination of high diagnostic performance and computational efficiency makes it the optimal choice for clinical deployment, particularly in healthcare environments requiring both precision and practical implementation feasibility.
3.1.2. Validation of Source Dataset
To evaluate the effectiveness of augmentation-based training and assess model performance on the source data from which the training set was derived, comprehensive evaluation was conducted on the complete original dataset containing 6400 brain MRI images. This validation approach tested whether training on the augmented dataset (33,984 images) successfully enhanced model capability to classify the original source images, providing insights into augmentation strategy effectiveness and model generalization to the foundational data distribution. The original source dataset comprised MildDemented (896 images), ModerateDemented (64 images), NonDemented (3200 images), and VeryMildDemented (2240 images), representing the unmodified base images from which the augmented training set was generated.
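The severity of the class imbalance in the source data, and the overall augmentation factor, follow directly from these counts; a quick sketch:

```python
# Source-dataset class counts and augmented-set size from the text.
source = {"MildDemented": 896, "ModerateDemented": 64,
          "NonDemented": 3200, "VeryMildDemented": 2240}
augmented_total = 33_984

total = sum(source.values())                              # 6400 source images
imbalance = max(source.values()) / min(source.values())   # NonDemented is 50x ModerateDemented
aug_factor = augmented_total / total                      # overall augmentation multiplier
```

The 50:1 imbalance between NonDemented and ModerateDemented is precisely the situation augmentation-based training is meant to mitigate.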
The source dataset evaluation revealed excellent model performance, with rankings consistent with the augmented-dataset training results (Figure 5). MobileNet-V3 demonstrated superior performance achieving 99.47% accuracy on the original source images, indicating highly successful transfer from augmented training data to the foundational image set. ResNet-50 achieved 98.98% accuracy, maintaining excellent classification capability when applied to the source data from which its training augmentations were derived. EfficientNet-B4 attained 97.30% accuracy, showing effective but comparatively lower performance on source images while still achieving clinically acceptable diagnostic accuracy.
The performance comparison between augmented dataset test results and original source dataset results provided valuable insights into augmentation effectiveness. MobileNet-V3 showed enhanced performance on source data (99.47%) compared to augmented test set performance (99.18%), suggesting that augmentation training improved the model’s ability to classify the foundational images. ResNet-50 demonstrated improved source dataset performance (98.98%) relative to augmented test results (98.04%), indicating successful augmentation-based learning transfer. EfficientNet-B4 showed decreased performance on source data (97.30%) compared to augmented test results (98.23%), suggesting greater reliance on augmentation-specific features during training.
The detailed confusion matrix analysis on source data validated model diagnostic capabilities when applied to the foundational image set (Figure 6). MobileNet-V3 demonstrated exceptional classification performance with near-perfect accuracy across all severity categories on source images. The model achieved outstanding MildDemented classification with 891 correct predictions out of 896 samples (99.4% class accuracy), perfect ModerateDemented classification with all 64 samples correctly identified (100% class accuracy), excellent NonDemented classification with 3192 correct predictions out of 3200 samples (99.75% class accuracy), and strong VeryMildDemented classification with 2219 correct predictions out of 2240 samples (99.1% class accuracy).
ResNet-50 demonstrated robust source dataset performance with 98.98% overall accuracy, achieving excellent classification for MildDemented (892/896 correct, 99.6% class accuracy) and perfect ModerateDemented classification (64/64 correct, 100% class accuracy). NonDemented classification showed 3193 correct predictions out of 3200 samples (99.78% class accuracy), while VeryMildDemented classification achieved 2186 correct predictions out of 2240 samples (97.6% class accuracy). The primary classification errors concentrated in the VeryMildDemented category, with minimal misclassification across other severity levels.
EfficientNet-B4 achieved 97.30% accuracy on source data, demonstrating effective but less optimal transfer from augmented training to source classification. MildDemented classification reached 869 correct predictions out of 896 samples (97.0% class accuracy), while ModerateDemented maintained perfect performance with all 64 samples correctly identified (100% class accuracy). NonDemented classification achieved 3141 correct predictions out of 3200 samples (98.16% class accuracy), and VeryMildDemented classification demonstrated 2133 correct predictions out of 2240 samples (95.2% class accuracy). The higher misclassification rate indicated greater dependence on augmentation-derived features compared to other architectures.
The comprehensive classification performance analysis on source data demonstrated successful augmentation-based training strategies across all architectures (Table 3). MobileNet-V3 achieved exceptional performance with perfect macro averages (1.00) for precision, recall, and F1-score, indicating optimal balanced performance when trained on augmented data and applied to source images. This performance level represents the highest achievable balanced classification metrics, confirming the effectiveness of augmentation strategies for improving model capability on foundational data.
The source dataset validation revealed the superior effectiveness of MobileNet-V3’s architectural design for augmentation-based training, achieving enhanced performance on source images (99.47%) compared to augmented test set results (99.18%). ResNet-50 demonstrated consistent improvement when applied to source data (98.98%) relative to augmented test performance (98.04%), indicating successful knowledge transfer from augmented training to foundational image classification. EfficientNet-B4 showed decreased source dataset performance (97.30%) compared to augmented test results (98.23%), suggesting architectural sensitivity to the transition from augmented training features to source image characteristics.
The critical clinical significance of perfect ModerateDemented classification across all architectures on source data validated the robustness of augmentation-based training for the most consequential diagnostic decisions. The consistent perfect identification of moderate-stage dementia cases across both augmented training scenarios and source dataset application provided strong evidence of reliable clinical diagnostic capability, supporting the deployment of these models for diagnostic applications in clinical practice.
3.2. Two-Dimensional Model Explainability Analysis
To validate model decision-making processes and provide clinical interpretability for the two-dimensional classification models, comprehensive XRAI attribution analysis was conducted across all three CNN architectures (EfficientNet-B4, ResNet-50, and MobileNet-V3) for each Alzheimer’s disease severity class. It is important to note that the confidence values shown alongside the XRAI maps represent the model’s softmax prediction probabilities and are not outputs of XRAI itself. The explainability analysis aimed to identify which brain regions most significantly influenced classification decisions, compare architectural approaches to feature detection, and assess the clinical relevance of model attention patterns across different dementia severity levels. Each architecture was evaluated using identical XRAI parameters to ensure fair comparison, with attribution maps generated for representative cases from MildDemented, ModerateDemented, NonDemented, and VeryMildDemented classes to capture the complete spectrum of disease progression. The analysis utilized pixel-level attribution scoring to quantify regional importance, enabling systematic comparison of architectural performance and identification of clinically meaningful attention patterns that could guide model selection for clinical deployment.
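XRAI builds on pixel-level attributions (typically integrated gradients) by aggregating them over image segments and ranking regions by their attribution density. A simplified numpy illustration of this region-ranking step (a sketch of the core idea only; the full algorithm also grows and merges segments iteratively):

```python
import numpy as np

def rank_regions(attributions: np.ndarray, segments: np.ndarray):
    """Rank segmentation regions by mean pixel attribution, highest first.

    attributions: (H, W) pixel-level attribution map.
    segments:     (H, W) integer label map assigning each pixel to a region.
    """
    labels = np.unique(segments)
    density = {int(l): attributions[segments == l].mean() for l in labels}
    return sorted(density, key=density.get, reverse=True)

# Toy example: a 4x4 "image" split into four quadrant regions.
attr = np.zeros((4, 4))
attr[:2, :2] = 1.0          # region 0 carries the highest attribution
attr[2:, 2:] = 0.5          # region 3 is second
seg = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])
order = rank_regions(attr, seg)   # region 0 ranks first, region 3 second
```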
XRAI analysis for the MildDemented class, showing model attention patterns for a mild dementia case across all three architectures. The top row shows XRAI attribution heat maps, while the bottom row displays attribution overlays on the original brain images with prediction confidence scores.
For the MildDemented case analysis (Figure 7), ResNet-50 demonstrated the most clinically appropriate attribution patterns with focused high-intensity regions (white and yellow areas) in specific cortical areas, suggesting precise detection of mild dementia pathological markers. The attribution heat map revealed concentrated attention on discrete brain regions with peak intensities around 0.0030–0.0035 as indicated by the color scale, demonstrating targeted focus on areas typically associated with early cognitive decline. This focused regional specificity makes ResNet-50 the most effective architecture for mild dementia detection, as it identifies specific pathological areas rather than providing generalized responses.
EfficientNet-B4 displayed uniform attribution distribution across brain tissue regions with values ranging from 0.1 to 0.4 on the color scale, showing consistent but non-specific attention that lacks the regional discrimination necessary for precise pathological localization. MobileNet-V3 exhibited moderate regional specificity with attribution values in the 0.002–0.008 range, displaying some concentrated regions (yellow-red intensity patterns) but with less focused precision compared to ResNet-50’s targeted approach.
The superior performance of ResNet-50 is evidenced by its ability to generate distinct high-attribution regions (white/yellow areas) that correspond to specific anatomical locations, demonstrating the model’s capacity to identify subtle pathological changes characteristic of mild dementia with greater precision than the more diffuse patterns exhibited by the other architectures.
XRAI analysis for the ModerateDemented class, showing model attention patterns for a moderate dementia case. All three architectures demonstrate heightened attribution intensity consistent with more pronounced pathological changes in moderate-stage dementia.
The ModerateDemented case revealed heightened model attention across all architectures, consistent with more pronounced pathological changes expected in moderate dementia stages (Figure 8). Each architecture demonstrated distinct attribution approaches with all models achieving perfect 1.000 confidence scores, indicating robust classification performance across different analytical methodologies.
EfficientNet-B4 showed concentrated attribution in specific cortical regions with peak intensities reaching 0.25–0.30 according to the color scale, displaying focused white regions indicating selective high-importance area detection. ResNet-50 demonstrated multiple discrete white and yellow regions distributed across cortical areas, maintaining attribution values around 0.0010–0.0012 with varied spatial coverage. MobileNet-V3 exhibited extensive high-attribution regions with large white and yellow areas and attribution values reaching 0.004 based on the color scale.
The architectural differences in attribution pattern generation demonstrate the diverse approaches these models employ for moderate dementia classification, with each showing distinct spatial distribution characteristics while maintaining equivalent classification accuracy. This diversity in attribution patterns provides complementary insights into model decision-making processes without favoring any single architectural approach.
XRAI analysis for the NonDemented class, showing model attention patterns for a healthy brain case. Attribution patterns remain unexpectedly prominent, indicating that the models actively identify normal anatomical features rather than showing minimal attribution for healthy tissue.
For the NonDemented case, all models demonstrated unexpectedly prominent attribution patterns rather than the minimal, diffuse patterns typically expected for healthy brain tissue (Figure 9). EfficientNet-B4 displayed distinct high-attribution regions with prominent white areas reaching maximum values of 2.00 on the color scale, indicating focused attention on specific brain regions despite the absence of pathological markers. ResNet-50 showed concentrated white regions in select cortical areas with attribution values reaching 0.005, demonstrating focused regional attention in the healthy brain case. MobileNet-V3 exhibited extensive yellow regions with attribution values reaching 0.012 according to the color scale, showing the most widespread high-attribution coverage among the three architectures.
The prominent attribution patterns observed in all models for the healthy brain case suggest that these architectures may be identifying normal anatomical features or tissue characteristics as important for classification, rather than showing the minimal, background-level attribution that might be expected for truly healthy tissue. This finding indicates that the models are actively detecting specific brain characteristics even in the absence of pathological changes, which may reflect their training on discriminative features that distinguish normal brain structure from various stages of dementia.
XRAI analysis for the VeryMildDemented class, showing model attention patterns for a very mild dementia case. Models demonstrate sensitivity to subtle pathological changes characteristic of early-stage cognitive decline, with varying architectural approaches to feature detection.
The VeryMildDemented case analysis revealed distinct architectural approaches to detecting subtle pathological changes characteristic of early-stage cognitive decline (Figure 10). EfficientNet-B4 showed the most conservative attribution approach with limited high-attribution regions and peak intensities around 0.5 according to the color scale, suggesting selective detection of specific early pathological markers with minimal background attribution. ResNet-50 demonstrated the most pronounced attribution response with extensive white regions in cortical areas and attribution values reaching 3.0 as indicated by the color scale, showing the highest sensitivity to very mild dementia markers among the three architectures. MobileNet-V3 exhibited intermediate attribution sensitivity with substantial yellow and white regions and values reaching 0.0175 based on the color scale, providing a balanced approach between the conservative EfficientNet-B4 and the highly sensitive ResNet-50 patterns.
The architectural differences in very mild dementia detection illustrate varying sensitivities to early pathological changes, with ResNet-50 showing the most aggressive detection approach through extensive high-attribution regions, EfficientNet-B4 providing focused but conservative detection, and MobileNet-V3 offering intermediate sensitivity. All models maintained perfect 1.000 confidence scores despite these attribution pattern differences, indicating robust classification performance across different analytical approaches for early-stage dementia detection.
The XRAI implementation revealed significant architectural differences in attribution scale ranges and spatial distribution patterns across severity classes. EfficientNet-B4 exhibited the widest dynamic range variations, with attribution scales from 0–0.6 for VeryMildDemented cases to 0–2.00 for NonDemented cases, demonstrating substantial response variability based on input characteristics. ResNet-50 showed extreme sensitivity variations ranging from 0–0.0012 for ModerateDemented cases to 0–3.0 for VeryMildDemented cases, indicating highly adaptive feature extraction mechanisms. MobileNet-V3 displayed more consistent scaling behavior with ranges from 0–0.008 to 0–0.012 across classes, suggesting stable attribution responses regardless of severity level.
Spatial attribution analysis revealed distinct architectural approaches to feature importance mapping. EfficientNet-B4 consistently generated focal high-attribution regions with discrete white and yellow areas, showing preference for concentrated attribution zones with clear demarcation between high-importance regions and background areas. ResNet-50 exhibited the most complex spatial distributions with extensive white region coverage, particularly in VeryMildDemented cases, demonstrating maximum sensitivity to subtle input variations through intricate attribution boundaries and varied regional characteristics. MobileNet-V3 produced intermediate spatial patterns with extensive yellow and white regions showing uniform coverage characteristics, maintaining balanced sensitivity across severity levels without extreme responses.
Comparative analysis demonstrated fundamental differences in architectural sensitivity and feature detection approaches across all severity classes. EfficientNet-B4 showed conservative attribution behavior with moderate peak intensities and selective regional focus, indicating threshold-based feature selection mechanisms with sharp intensity gradients. ResNet-50 exhibited the most aggressive attribution responses with extensive high-intensity regions and complex spatial distributions, capturing multiple levels of input information simultaneously through comprehensive feature extraction approaches. MobileNet-V3 maintained balanced attribution intensity behavior with consistent response levels across severity classes, demonstrating robust feature extraction mechanisms that provide stable performance across varied input conditions.
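Because the three architectures emit attributions on very different scales (e.g., 0–0.0012 for ResNet-50 on ModerateDemented versus 0–3.0 on VeryMildDemented), comparing spatial patterns across models is easier after per-map min-max normalization; a minimal numpy sketch of this comparison step (an illustrative convention, not the paper's reported pipeline):

```python
import numpy as np

def normalize_map(attr: np.ndarray) -> np.ndarray:
    """Min-max normalize an attribution map to [0, 1] so maps with
    different dynamic ranges can share a single color scale."""
    lo, hi = attr.min(), attr.max()
    if hi - lo < 1e-12:              # flat map: avoid division by zero
        return np.zeros_like(attr)
    return (attr - lo) / (hi - lo)

# Two toy maps on very different scales normalize to the same relative pattern.
resnet_like    = np.array([[0.0, 1.5], [3.0, 0.75]])       # 0-3.0 range
mobilenet_like = np.array([[0.0, 0.006], [0.012, 0.003]])  # 0-0.012 range
a, b = normalize_map(resnet_like), normalize_map(mobilenet_like)
```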
Class comparison summary using MobileNet-V3 XRAI analysis. The top row shows original brain images for each severity class, while the bottom row displays corresponding model attention patterns with prediction confidence scores, demonstrating systematic attention adaptation across the complete spectrum of Alzheimer’s disease severity.
The systematic comparison across severity classes using MobileNet-V3 in Figure 11 revealed distinct attribution pattern characteristics for each dementia stage, demonstrating adaptive model responses to varying input conditions. For the MildDemented case, MobileNet-V3 generated diverse attribution regions with prominent yellow areas in cortical zones and red regions indicating moderate attention levels, creating a heterogeneous pattern that suggests multifocal feature detection. The ModerateDemented case displayed more extensive yellow and orange attribution coverage with concentrated high-attention regions in upper cortical areas, indicating heightened model sensitivity to structural changes characteristic of moderate disease progression.
The NonDemented case exhibited widespread yellow and orange attribution patterns with substantial coverage across cortical regions, demonstrating that MobileNet-V3 actively identifies specific normal tissue characteristics rather than showing minimal attribution for healthy cases. This extensive attribution in healthy brain tissue indicates the model’s sophisticated approach to distinguishing normal anatomical features from pathological changes. The VeryMildDemented case showed balanced attribution distribution with prominent yellow regions in peripheral cortical areas and blue regions in central ventricular spaces, suggesting the model’s capability to detect subtle early pathological markers while maintaining spatial organization that respects anatomical boundaries.
Across all severity levels, MobileNet-V3 demonstrated consistent color mapping behavior with blue regions corresponding to low attribution values in central ventricular areas, yellow and orange regions indicating moderate to high attribution in cortical tissue zones, and red regions marking intermediate attention levels. The attribution overlays maintained consistent registration between original tissue structure and attribution distribution, indicating proper spatial correspondence between model attention and input anatomy while preserving anatomical boundary relationships across all severity classifications.
The comprehensive XRAI implementation demonstrated technical effectiveness across all three CNN architectures while maintaining perfect classification performance (1.000 confidence) across all tested cases. The framework generated consistent, reproducible attribution patterns with clear visual interpretability through consistent color mapping and spatial resolution that enables detailed analysis of model decision-making processes. The successful adaptation to the EfficientNet-B4, ResNet-50, and MobileNet-V3 architectures validates the implementation’s technical flexibility and robustness for diverse deep learning model interpretation applications in medical imaging, demonstrating effective integration between model inference and explainability analysis without compromising classification accuracy.
3.3. Clinical Web Application Validation
A comprehensive clinical validation of the web-based diagnostic interface was carried out to assess both its classification accuracy and explainability in realistic clinical use cases. The evaluation focused on Alzheimer’s disease across varying severity levels, particularly emphasizing the challenging MildDemented classification. The system, built on the optimized MobileNet-V3 Large architecture and deployed using the Gradio framework, allows clinicians to access the diagnostic tool directly through a web browser, without requiring specialized software or technical expertise.
Validation involved analysis of diagnostic workflow efficiency, classification confidence, and clinical interpretability of the XRAI attribution visualizations. Brain MRI images representing diverse neurodegenerative patterns were systematically tested, with automatic image preprocessing, including resizing to 224 × 224 pixels, RGB conversion, and normalization using dataset-specific parameters (mean: [0.2956, 0.2956, 0.2956]; std: [0.3059, 0.3059, 0.3058]), ensuring consistency with the model’s training configuration.
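The preprocessing pipeline described above can be sketched as follows; this is a minimal re-implementation with Pillow and numpy using the dataset statistics quoted in the text, and the deployed system may differ in interpolation and library details:

```python
import numpy as np
from PIL import Image

# Dataset-specific normalization statistics quoted in the text.
MEAN = np.array([0.2956, 0.2956, 0.2956])
STD  = np.array([0.3059, 0.3059, 0.3058])

def preprocess(img: Image.Image) -> np.ndarray:
    """Convert to RGB, resize to 224x224, scale to [0, 1], and normalize
    with the dataset statistics; returns a (3, 224, 224) float array."""
    img = img.convert("RGB").resize((224, 224))
    arr = np.asarray(img, dtype=np.float32) / 255.0   # (224, 224, 3) in [0, 1]
    arr = (arr - MEAN) / STD                          # per-channel normalization
    return arr.transpose(2, 0, 1)                     # channels-first for the model

# Toy uniform grayscale "slice" stands in for an uploaded MRI image.
x = preprocess(Image.new("L", (256, 256), color=128))
```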
In the MildDemented case (Figure 12), the system successfully classified a representative T1/T2 axial brain MRI image, showing clearly defined ventricles, cortical gray matter boundaries, and subcortical white matter features with 100.0% confidence. This high-confidence result demonstrates the model’s ability to accurately recognize early-stage neuroanatomical changes characteristic of mild dementia. The certainty of the classification outcome reduces ambiguity in clinical decision-making, offering reliable support for early-stage diagnosis.
Explainability was delivered through dual-mode XRAI visualizations: a heatmap and a salient region overlay. The heatmap, rendered with an inferno colormap, highlighted cortical and subcortical regions with varying attribution levels: bright yellow denoting peak importance, transitioning through orange-red to low-attribution purple zones. Notably, the frontal and parietal cortices, areas commonly affected in early dementia, showed the highest attribution, while central ventricular regions, less relevant for mild dementia detection, maintained low importance.
To aid clinical interpretation, the interface provides contextual explanations such as: “Heatmap highlights brain regions most influential for Alzheimer’s severity classification.” This built-in guidance allows clinicians to understand the rationale behind the model’s decision without requiring expertise in machine learning or attribution methods. The interface also supports a streamlined user experience with drag-and-drop image upload, automatic preprocessing, and rapid inference, typically returning results within 20 s on CPU.
The system’s performance in this MildDemented case confirms its clinical readiness, combining accurate classification with clear, interpretable outputs. The perfect confidence score and well-aligned attribution patterns reflect strong model generalization from training data to real-world cases. Further evaluation on a ModerateDemented case (
Figure 13) tested the model’s ability to detect more pronounced neurodegenerative changes. The input MRI showed expected features of moderate dementia, including enlarged ventricles, cortical thinning, and visible white matter degeneration.
The model achieved a perfect classification confidence of 100.0% for ModerateDemented, indicating robust recognition of the more advanced pathology. The XRAI attribution map showed more concentrated activation in superior cortical regions, especially frontal and parietal areas, while enlarged ventricles received appropriately low attribution. This pattern matched the expected structural progression of moderate dementia, where cortical atrophy intensifies and ventricular enlargement becomes more prominent.
Salient region extraction highlighted a single, clearly defined area in the superior cortex, reflecting the model’s specificity in isolating the most diagnostically significant changes. Compared to the more diffuse salient regions observed in the mild dementia case, this concentrated attention demonstrates the model’s ability to adapt its focus as pathological severity increases.
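Salient-region extraction of this kind is commonly implemented by keeping only the pixels above a top percentile of the attribution map. A minimal sketch, with a `top_pct` value chosen for illustration rather than taken from the system:

```python
import numpy as np

def salient_region_mask(attribution, top_pct=5.0):
    """Boolean mask of the top `top_pct` percent most important pixels."""
    cutoff = np.percentile(attribution, 100.0 - top_pct)
    return attribution >= cutoff

def overlay_salient(mri_gray, attribution, top_pct=5.0):
    """Show the MRI at full intensity only where the classifier's
    evidence is concentrated; dim the remaining tissue for context."""
    mask = salient_region_mask(attribution, top_pct)
    return np.where(mask, mri_gray, 0.2 * mri_gray)
```

Lowering `top_pct` yields the tighter, single-region overlays described for the moderate case; raising it reproduces the more diffuse regions seen in earlier stages.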
A comparative analysis between mild and moderate cases revealed that the model not only distinguishes severity levels but also adjusts its spatial reasoning accordingly, shifting from broader attribution in early stages to more focused patterns in advanced stages. Processing time and interface usability remained consistent across both cases, reinforcing the system’s reliability in clinical settings.
Additional validation was performed using VeryMildDemented and NonDemented cases to cover the full cognitive spectrum. The VeryMildDemented case (
Figure 14) presented subtle MRI features, such as minimal ventricular changes and early cortical irregularities. Even so, the system correctly classified it with 100.0% confidence. The corresponding heatmap showed widespread cortical attribution, especially in frontal, parietal, and temporal regions, consistent with the diffuse, network-wide changes characteristic of very early dementia.
In the NonDemented case (
Figure 15), the system again achieved 100.0% confidence, correctly identifying healthy neuroanatomical features such as normal ventricle size, preserved cortical thickness, and intact white matter integrity. The attribution map differed substantially from the pathological cases, showing moderate, systematically distributed cortical activations that reflect recognition of healthy tissue patterns. Salient region overlays confirmed this, highlighting preserved brain structures across multiple areas.
Together, these results confirm the model’s specificity and its ability to differentiate between normal and pathological brain images across all severity levels.
In conclusion, the web-based diagnostic interface demonstrated consistent, high-confidence classification performance across all four Alzheimer’s severity classes: NonDemented, VeryMildDemented, MildDemented, and ModerateDemented. The model’s explainability features, comprising intuitive heatmaps and focused salient region visualizations, provided clear insight into its decision-making, aligned with clinical understanding of disease progression. The user experience and rapid processing further support integration into real-world clinical workflows, making this system a strong candidate for AI-assisted dementia diagnosis and monitoring.