Article
Peer-Review Record

Patient-Tailored Dementia Diagnosis with CNN-Based Brain MRI Classification

Appl. Sci. 2025, 15(9), 4652; https://doi.org/10.3390/app15094652
by Zofia Knapińska and Jan Mulawka *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 24 March 2025 / Revised: 17 April 2025 / Accepted: 21 April 2025 / Published: 23 April 2025
(This article belongs to the Section Applied Biosciences and Bioengineering)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors
  1. The network architecture is the core part of the paper, and convolutional neural networks are common neural networks. The author's innovation in convolutional neural networks is not clearly reflected.
  2. The algorithm used by the author lacks a comprehensive comparison with other advanced algorithms in the field, such as time complexity, detection accuracy, etc.
  3. The introduction section lacks sufficient analysis on the research progress of deep learning methods in dementia diagnosis.

Author Response

We sincerely thank you for your thoughtful and constructive feedback, which helped us significantly improve the quality and clarity of the manuscript. Below, we provide detailed responses to each of the points raised, highlighting the corresponding revisions made.

Comment 1: The network architecture is the core part of the paper, and convolutional neural networks are common neural networks. The author's innovation in convolutional neural networks is not clearly reflected.

Response 1: We have significantly expanded the relevant sections in the Introduction (lines 160-175) to provide a deeper contextualization of our contributions relative to existing CNN-based methods for dementia diagnosis.
Specifically, we have outlined the following innovations of our proposed method:

  • Lightweight 2D-CNN Architecture: We implemented a low-complexity model designed to maintain high diagnostic accuracy while reducing computational and memory demands.
  • Single-Slice Classification Strategy: Rather than relying on full volumetric input or multiple slices, we introduced a novel approach based on identifying and using a single representative axial slice per subject, optimizing performance and mitigating redundancy.
  • Efficiency and Generalizability: Our architecture achieves high diagnostic performance with significantly reduced training time and data requirements, making it suitable for settings with limited computational resources or small datasets.
  • Real-Time Clinical Applicability: Due to the low computational burden and fast inference time, the model presents strong potential for integration into clinical workflows or decision-support systems.
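To make this design more concrete, a minimal illustrative sketch of such a lightweight 2D CNN is given below. This is a hedged example only: the Keras/TensorFlow framing, layer counts, filter sizes, and input resolution are placeholder assumptions, not the exact architecture specified in the manuscript (Sections 2.2.2 and 2.2.3).

```python
# Illustrative lightweight 2D CNN for 3-class slice classification
# (AD / MCI / CN). Layer counts, filter sizes and input resolution are
# placeholder assumptions, not the exact architecture from the manuscript.
from tensorflow.keras import layers, models

def build_lightweight_cnn(input_shape=(128, 128, 1), n_classes=3):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```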


To further contextualize our work, we have incorporated a detailed overview of current state-of-the-art deep learning methods, including those using hybrid CNN-attention models (e.g., Illakiya et al., 2022; Zhang et al., 2021), 3D CNNs (e.g., Kang et al., 2022), and landmark-centered patch-based strategies (e.g., Liu et al., 2023; Lian et al., 2021). While these approaches demonstrate strong performance, they often come with greater complexity, longer training times, and limited real-time feasibility.

Our approach is distinct in preserving strong diagnostic power while prioritizing simplicity, efficiency, and clinical scalability. The revised manuscript (lines 95-144) now better highlights this contribution in comparison to the literature.

Comment 2: The algorithm used by the author lacks a comprehensive comparison with other advanced algorithms in the field, such as time complexity, detection accuracy, etc.

Response 2: In the revised manuscript, we have addressed this in two ways:

  1. Quantitative Comparison with Pre-trained CNNs:
We added a detailed performance comparison between our custom-built models and several state-of-the-art transfer learning (TL) architectures, including ResNet50, Xception, MobileNet, and VGG16 (see Section 3.1 and Table 5). These TL-based models were fine-tuned on the same dataset to ensure fair comparison, and we used consistent output layers and training conditions to maintain reproducibility.
  2. Discussion of Time Complexity and Efficiency:
We expanded our methodological description (Sections 2.2.2 and 2.2.3) to include architectural differences (e.g., number of Conv2D layers and filters), and efficiency considerations. Our results demonstrate that the proposed lightweight model achieves competitive classification accuracy (AD vs. MCI vs. CN) with significantly reduced computational requirements compared to deeper, pre-trained models.
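As an illustration of this comparison setup, the sketch below attaches the same softmax output head and compile settings to each pre-trained backbone. It is a hedged example: the freezing strategy and hyperparameters shown are placeholders rather than the study's exact training configuration.

```python
# Illustrative transfer-learning comparison: each backbone receives the same
# softmax head and compile settings. The freezing strategy and hyperparameters
# are placeholders, not the study's exact training configuration.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50, VGG16, MobileNet, Xception

def build_tl_model(backbone_fn, input_shape=(224, 224, 3), n_classes=3):
    backbone = backbone_fn(include_top=False, weights="imagenet",
                           input_shape=input_shape, pooling="avg")
    backbone.trainable = False  # freeze (or partially unfreeze) for fine-tuning
    outputs = layers.Dense(n_classes, activation="softmax")(backbone.output)
    model = models.Model(inputs=backbone.input, outputs=outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

tl_models = {fn.__name__: build_tl_model(fn)
             for fn in (ResNet50, VGG16, MobileNet, Xception)}
```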

Additionally, in the Introduction (lines 113-137), we reviewed and cited recent studies that employed more complex architectures (e.g., Lian et al., 2021; Chen and Xia, 2023; Poloni and Ferrari, 2022), discussing their trade-offs in terms of performance and scalability. This comparison further supports the rationale behind our streamlined model choice.

Comment 3: The introduction section lacks sufficient analysis on the research progress of deep learning methods in dementia diagnosis.

Response 3: We have substantially revised the Introduction (lines 95-150) to provide a more comprehensive review of the current landscape in deep learning for dementia diagnosis, covering:

  1. Traditional Machine Learning Limitations: We discuss the dependence of ML methods on manual feature engineering and predefined biomarkers, which are both time-consuming and prone to variability.

  2. CNNs and Their Evolution: We provide an overview of 2D and 3D CNN approaches, their architectural improvements (e.g., attention modules, region-specific modeling), and their applicability to dementia classification.

  3. Recent Studies and Approaches: We now reference and discuss a wide range of relevant studies, including:
    • Liu et al. (2023): Anatomically guided patch-based learning to optimize memory use and reduce overfitting.
    • Chen and Xia (2023): Sparse regression-based ROI extraction for enhanced spatial precision.
    • Poloni and Ferrari (2022): Focused analysis of hippocampal structures for classification.
    • Lian et al. (2021): Multi-level feature fusion using hybrid 3D CNNs.
    • Zhang et al. (2021): Integration of multimodal 3D attention for AD and MCI differentiation.


These additions are intended to give readers a clearer sense of the state-of-the-art, highlight the motivations behind our approach, and position our work as an efficient and practical alternative within the broader research landscape.

We thank you once again for your thoughtful input, which helped us strengthen both the scientific depth and clarity of our manuscript. We believe the revisions now clearly articulate our contributions and adequately address your concerns.

Sincerely,


Zofia Knapinska


(On behalf of all authors)

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript investigates the application of convolutional neural network (CNN)-based brain MRI classification models in the early diagnosis and personalized management of dementia (including AD, MCI, and CN). The study utilizes T1-weighted MRI data from the ADNI database and performs classification using both a self-developed CNN model and transfer learning (TL) models such as ResNet50 and VGG16. The results show that TL models achieved an average accuracy of 97.64%, outperforming the self-developed model (89.75%). In addition, the study employs Grad-CAM to visualize key brain regions, demonstrating the interpretability of the models. Overall, this study has certain practical significance, but several issues remain. Major revisions are recommended before considering acceptance.

 

Major comments:

  1. Although T1-weighted MRI data are inherently three-dimensional, the authors only used 2D slices and built 2D CNN models. It is recommended to explore 3D models based on volumetric data to capture spatial information more comprehensively. Additionally, the selection of 50 axial middle slices appears arbitrary and lacks justification. The authors should clarify the rationale behind this choice and supplement experiments using other planes (e.g., coronal, sagittal) for comparison.
  2. The authors should provide access to the code used in the study and the processed ADNI data to enhance reproducibility. If ADNI data are restricted and cannot be shared publicly, the manuscript should clearly state the ethical approval process and whether informed consent was obtained from participants.
  3. The authors have manually assembled several custom CNN architectures, but the necessity of such designs is questionable based on model construction logic and experimental results. It is advised to compare the proposed models with more well-established deep learning networks (e.g., DenseNet, EfficientNet) as well as traditional non-deep learning methods.
  4. The manuscript should include additional performance metrics such as AUC, ACC to provide a more comprehensive evaluation of the model performance.

 

Minor comments:

  1. The number of references is insufficient, and there is a lack of review on related existing work in the field. Furthermore, citations for the models used in comparison are missing. It is recommended to supplement relevant literature to strengthen the background and context of the study.
  2. It is suggested to annotate relevant anatomical structures (e.g., hippocampus, lateral ventricles) in the Grad-CAM heatmaps to improve the interpretability and clinical relevance of the visualizations.
  3. The font size in all figures is too small, which negatively affects the reading experience. It is recommended to uniformly enlarge the text in figures to improve visual clarity.

Author Response

We sincerely thank you for your thorough and insightful comments, which have contributed meaningfully to improving the quality of our manuscript. Below, we address each point raised, explaining the revisions made and providing clarifications where necessary.

Major Comments

Comment 1: Although T1-weighted MRI data are inherently three-dimensional, the authors only used 2D slices and built 2D CNN models. It is recommended to explore 3D models based on volumetric data to capture spatial information more comprehensively. Additionally, the selection of 50 axial middle slices appears arbitrary and lacks justification. The authors should clarify the rationale behind this choice and supplement experiments using other planes (e.g., coronal, sagittal) for comparison.
Response 1: We greatly appreciate this valuable observation. We acknowledge that 3D CNNs offer richer spatial context and that using only 2D axial slices may lead to some loss of inter-slice continuity. In our revised manuscript, we have expanded the discussion significantly to reflect this limitation (see Section 4.1). We now also highlight the rationale for adopting a 2D approach at this stage, namely the need to reduce computational cost, facilitate model interpretability, and ensure robustness on relatively limited data (see Introduction and Section 4.1).
The selection of 50 central axial slices was made with the specific intention of maximizing the inclusion of clinically relevant brain regions. These middle slices most often capture early neurodegenerative changes typical of dementia, such as medial temporal atrophy, cortical thinning, and ventricular enlargement (further justified in Section 2.1.2).
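For illustration, the central-slice selection can be sketched as follows. The example assumes a nibabel/NIfTI workflow with the axial direction along the third array axis; it is not a verbatim excerpt of our preprocessing pipeline.

```python
# Illustrative selection of the 50 central axial slices from a T1-weighted
# volume. Assumes a NIfTI file readable with nibabel and the axial direction
# along the third array axis; the actual pipeline is described in Section 2.1.2.
import nibabel as nib
import numpy as np

def central_axial_slices(nifti_path, n_slices=50):
    volume = nib.load(nifti_path).get_fdata()      # assumed shape (X, Y, Z)
    z_dim = volume.shape[2]
    start = max(0, z_dim // 2 - n_slices // 2)
    stop = min(z_dim, start + n_slices)
    # Return the 2D axial slices centred on the middle of the volume
    return [np.asarray(volume[:, :, z]) for z in range(start, stop)]
```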
Furthermore, we agree with the reviewer that evaluating alternative planes is essential. While this study was designed as a proof-of-concept using axial slices, we have now included a dedicated paragraph (Section 4.1, lines 547-553) stating our intention to extend the framework by incorporating sagittal and coronal views and eventually migrating to volumetric 3D CNN architectures.

Comment 2: The authors should provide access to the code used in the study and the processed ADNI data to enhance reproducibility. If ADNI data are restricted and cannot be shared publicly, the manuscript should clearly state the ethical approval process and whether informed consent was obtained from participants.
Response 2: We have now added a GitHub repository link to the manuscript, which contains the full codebase, model architectures, training configuration files, and instructions for replication. The repository is available at: https://github.com/ZKnapinskaWUT/Patient-Tailored-Dementia-Diagnosis-with-CNN-Based-Brain-MRI-Classification
Regarding the data, we confirm that all MRI scans were sourced from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), following the necessary registration and approval process. Informed consent was obtained from all participants, and the data were anonymized in accordance with ADNI’s ethical policies. We have updated the manuscript to include a Data Availability Statement (lines 612-621) and a clear note on the ethical approval and informed consent procedures (lines 179-189). Due to ADNI's data usage policy, we are unable to share the processed dataset directly, but we have provided instructions for other researchers to obtain the same data through the official ADNI portal.

Comment 3: The authors have manually assembled several custom CNN architectures, but the necessity of such designs is questionable based on model construction logic and experimental results. It is advised to compare the proposed models with more well-established deep learning networks (e.g., DenseNet, EfficientNet) as well as traditional non-deep learning methods.
Response 3: The purpose of including custom CNN architectures was to explore lightweight, interpretable designs that could be potentially suitable for deployment in clinical settings with limited computational resources. However, we agree that benchmarking against state-of-the-art models is essential.
Accordingly, our manuscript already includes a comparative evaluation with several well-established transfer learning (TL) models: ResNet50, VGG16, MobileNet, and Xception (Table 5). These comparisons are discussed in both the “Results” and “Discussion” sections. The TL-based models consistently performed better than our custom designs, though the performance margin was relatively small, which supports the value of our lightweight alternatives in resource-constrained contexts.
While DenseNet and EfficientNet were not included in the original scope of the study, we will consider these models in our extended framework. As for traditional machine learning methods, we have referenced and reviewed approaches such as SVM classifiers and radiomics-inspired features extensively in the "Introduction". However, given the focus on DL-based end-to-end architectures, we did not reimplement traditional pipelines.

Comment 4: The manuscript should include additional performance metrics such as AUC, ACC to provide a more comprehensive evaluation of the model performance.
Response 4: We have revised the Results section to include additional metrics such as Accuracy (ACC) and Area Under the ROC Curve (AUC) (Table 5).
These metrics are now reported for each class (AD, MCI, CN) and across all evaluated models. We believe this provides a more balanced and informative picture of the models’ diagnostic utility and class-wise performance, particularly given the clinical relevance of distinguishing MCI from AD and CN.
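For clarity, a minimal sketch of how these metrics can be computed (scikit-learn-based, with one-vs-rest AUC per class) is given below; the variable names are placeholders and the snippet is illustrative rather than the exact evaluation script.

```python
# Illustrative computation of ACC and per-class (one-vs-rest) AUC for the
# three classes (AD / MCI / CN). Variable names are placeholders.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

def report_metrics(y_true, y_prob, class_names=("AD", "MCI", "CN")):
    y_true = np.asarray(y_true)    # integer labels, shape (n_samples,)
    y_prob = np.asarray(y_prob)    # predicted probabilities, shape (n_samples, 3)
    y_pred = np.argmax(y_prob, axis=1)
    print(f"ACC: {accuracy_score(y_true, y_pred):.4f}")
    for idx, name in enumerate(class_names):
        auc = roc_auc_score((y_true == idx).astype(int), y_prob[:, idx])
        print(f"AUC ({name} vs rest): {auc:.4f}")
```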

Minor Comments


Comment 5: The number of references is insufficient, and there is a lack of review on related existing work in the field. Furthermore, citations for the models used in comparison are missing. It is recommended to supplement relevant literature to strengthen the background and context of the study.
Response 5: We acknowledge this shortcoming. In the revised manuscript, we have significantly expanded the literature review in the "Introduction" section. We now include citations for all TL models used in our comparisons (line 258) and refer to recent benchmark studies on AD diagnosis using MRI. The background has been enhanced with a more comprehensive discussion of CNN variants, attention mechanisms, and hybrid models used in this domain.

Comment 6: It is suggested to annotate relevant anatomical structures (e.g., hippocampus, lateral ventricles) in the Grad-CAM heatmaps to improve the interpretability and clinical relevance of the visualizations.
Response 6: We acknowledge the valuable suggestion to annotate relevant anatomical structures (e.g., hippocampus, lateral ventricles) in the Grad-CAM heatmaps to enhance interpretability and clinical relevance. While this feature has not yet been implemented, we agree that such annotations would significantly improve the utility of the visualizations. As outlined in Section 4.3, "Future Development Potential", we envision future iterations of the model incorporating scalable brain atlas frameworks and diagnostic scales based on imaging biomarkers. This would enable accurate localization of neurodegenerative changes across the brain and allow mapping of heatmap features to clinically meaningful structures and dementia stages, thereby enriching both diagnostic and prognostic insights.

Comment 7: The font size in all figures is too small, which negatively affects the reading experience. It is recommended to uniformly enlarge the text in figures to improve visual clarity.
Response 7: We have updated all figures to ensure uniformly enlarged font sizes and enhanced resolution. These modifications aim to improve the clarity and visual impact of the illustrations.

We are grateful for the thoughtful and constructive feedback. The revisions made in response to the major and minor comments have strengthened the manuscript’s scientific rigor, clarity, and presentation. We hope the updated version addresses all concerns satisfactorily and meets the standards for publication.

Sincerely,


Zofia Knapinska


(On behalf of all co-authors)

Reviewer 3 Report

Comments and Suggestions for Authors

Overall, the text is very well-written and scientifically sound, with only minor improvements suggested to enhance readability and rigor.  

Starting with the introduction, if the focus is on the use of CNNs in dementia diagnosis, the discussion on current limitations and future perspectives could be expanded.  

In the later sections of the manuscript, the discussion on the limitations of 2D modeling could specify how a 3D model would improve the analysis (for example, by mentioning techniques such as 3D CNN or Voxel-Based Morphometry). I believe this would make the manuscript more comprehensive and in-depth.  

The section on Grad-CAM analysis is well-argued, but I think it could benefit from a comparison with other CNN interpretability techniques, such as SHAP or LIME, to provide a broader context on AI explainability methodologies.  

The description of atrophy patterns is accurate and clear, but I believe it could be improved by citing more specific quantitative biomarkers, such as cortical thickness or hippocampal volume, if possible.  

Also, in the concluding part, the discussion on CNN depth and the number of filters is correct, but it lacks quantitative details. What is the numerical performance of the models? Were metrics such as AUC-ROC, F1-score, or precision-recall used? I think these should be mentioned in this section if possible.  

The conclusion could more clearly emphasize the key findings and how they could be implemented in clinical practice. This would significantly enhance the value of the text.  

Finally, regarding the text itself, I believe there are minor modifications needed to improve fluency. Additionally, some concepts are repeated multiple times, making certain sections somewhat redundant.  

Overall, I believe that after some refinements, the manuscript is suitable for publication.

Author Response

We would like to sincerely thank you for your thoughtful and constructive feedback. We appreciate the time and expertise invested in reviewing our manuscript, and we have revised the text to incorporate the suggestions provided. Please find our detailed point-by-point responses below.

Comment 1: “Starting with the introduction, if the focus is on the use of CNNs in dementia diagnosis, the discussion on current limitations and future perspectives could be expanded.”
Response 1: We have significantly expanded the Introduction section (lines 80–150) to provide a more comprehensive overview of the current limitations and future directions in CNN-based dementia diagnosis. This includes a detailed discussion on the trade-offs between 2D and 3D modeling, the challenges posed by manual feature engineering in traditional machine learning methods, and the integration of attention mechanisms and multimodal approaches to enhance CNN performance and clinical relevance. We have also added several recent studies to illustrate emerging strategies and to emphasize the evolving trajectory of AI-based neuroimaging analysis.

Comment 2: “In the later sections of the manuscript, the discussion on the limitations of 2D modeling could specify how a 3D model would improve the analysis (for example, by mentioning techniques such as 3D CNN or Voxel-Based Morphometry). I believe this would make the manuscript more comprehensive and in-depth.”
Response 2: We have revised the Study Limitations section accordingly (lines 522–553). We now explicitly describe how 3D CNNs and voxel-based morphometry (VBM) can enhance the analysis by capturing volumetric atrophy patterns, preserving spatial continuity, and offering voxel-wise assessments of neurodegeneration. We highlight their potential to uncover subtle and dispersed structural changes that may not be visible in 2D representations. These additions help contextualize the limitations of our current approach and clearly outline the benefits of volumetric modeling for future research.

Comment 3: “The section on Grad-CAM analysis is well-argued, but I think it could benefit from a comparison with other CNN interpretability techniques, such as SHAP or LIME, to provide a broader context on AI explainability methodologies.”
Response 3: We have updated the Discussion section (lines 477–520) to include a comparative analysis of Grad-CAM with other interpretability techniques, including LIME, SHAP, and Shap-CAM. This revised section now outlines the strengths and limitations of each method in the context of medical imaging, especially regarding spatial resolution, computational demands, and clinical interpretability. We emphasize why Grad-CAM is particularly suited for CNN-based dementia models while also recognizing the potential of hybrid approaches that combine multiple techniques to enhance transparency and trustworthiness in clinical AI systems.
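For completeness, the Grad-CAM computation behind the heatmaps discussed here follows the standard gradient-weighted activation recipe sketched below; the Keras/TensorFlow framing, model handle, and convolutional layer name are placeholders rather than our exact implementation.

```python
# Illustrative Grad-CAM heatmap computation (Keras/TensorFlow style).
# `model`, `image` and the convolutional layer name are placeholders,
# not the manuscript's exact implementation.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_index):
    grad_model = tf.keras.Model(
        inputs=model.input,
        outputs=[model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)          # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))          # global-average-pooled gradients
    cam = tf.reduce_sum(conv_out[0] * weights[0], axis=-1)
    cam = tf.nn.relu(cam)                                 # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()    # normalised [0, 1] heatmap
```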

Comment 4: “The description of atrophy patterns is accurate and clear, but I believe it could be improved by citing more specific quantitative biomarkers, such as cortical thickness or hippocampal volume, if possible.”
Response 4: We have now included references to specific quantitative biomarkers, such as hippocampal volume and cortical thickness, in the revised Introduction and Results sections (Section 3.2. Proposed CDSS for Dementia Grad-CAM, lines 422-440). These biomarkers are acknowledged as robust, quantifiable indicators of early neurodegeneration and are increasingly used in AI-driven frameworks. This addition strengthens the scientific rigor of our manuscript and provides a clearer link between neuroanatomical changes and model predictions.

Comment 5: “Also, in the concluding part, the discussion on CNN depth and the number of filters is correct, but it lacks quantitative details. What is the numerical performance of the models? Were metrics such as AUC-ROC, F1-score, or precision-recall used? I think these should be mentioned in this section if possible.”
Response 5: In the revised Discussion section (lines 449-465), we now provide key evaluation metrics, including accuracy and AUC-ROC. F1-score and precision-recall values are extensively discussed in the Results section (lines 315-406 and Table 5). These metrics offer a clearer understanding of model performance, allowing for a more informed interpretation of the results and their potential clinical utility. Where applicable, we have referenced the performance of both our model and comparable state-of-the-art frameworks.

Comment 6: “The conclusion could more clearly emphasize the key findings and how they could be implemented in clinical practice. This would significantly enhance the value of the text.”
Response 6: We have revised the Conclusion section to better highlight the practical implications of our findings. We now summarize the key contributions of our model - such as its high classification accuracy, use of interpretability tools, and efficiency in resource-constrained environments - and explain how these features could be integrated into clinical workflows. We hope this revised conclusion offers a clearer bridge between research findings and real-world applications.

Comment 7: “Finally, regarding the text itself, I believe there are minor modifications needed to improve fluency. Additionally, some concepts are repeated multiple times, making certain sections somewhat redundant.”
Response 7: We have carefully reviewed the entire manuscript to improve the fluency and reduce redundancy. Repetitive phrases and duplicated concepts have been edited or removed to ensure the text flows more smoothly and maintains focus. Additionally, we have restructured some paragraphs for improved coherence and readability.

Once again, we thank you for your thoughtful feedback, which has significantly improved the quality of our manuscript. We hope that our revisions and responses address all the concerns raised and demonstrate our commitment to producing a scientifically sound and clinically relevant contribution to the field.

Sincerely,
Zofia Knapinska
(On behalf of all co-authors)

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The revisions have been made in accordance with the reviewer's comments.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have done well in revising their manuscript. My recommendation therefore is acceptance in present form.

Reviewer 3 Report

Comments and Suggestions for Authors

I would like to thank the authors for accepting the revisions and my comments on their interesting manuscript, which I had suggested. Based on the updated version of their work, I believe they have addressed the points I raised in an excellent manner, and I therefore consider it suitable for publication.
