Article
Peer-Review Record

GazeMap: Dual-Pathway CNN Approach for Diagnosing Alzheimer’s Disease from Gaze and Head Movements

Mathematics 2025, 13(11), 1867; https://doi.org/10.3390/math13111867
by Hyuntaek Jung 1, Shinwoo Ham 1, Hyunyoung Kil 2, Jung Eun Shin 3,* and Eun Yi Kim 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5: Anonymous
Submission received: 4 April 2025 / Revised: 5 May 2025 / Accepted: 26 May 2025 / Published: 3 June 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

With their paper on video-based gaze and head tracking for AD classification, the authors tackle a clinically highly relevant subject matter using a combined deep learning approach. The writing is concise and engaging, and the presentation of results is good. I commend the authors on applying their framework in an independent cohort of patients in a real-world scenario. However, there are several concerns regarding the contextualization and interpretability of results that need addressing before publication.

  1. The introduction provides only superficial coverage of video-oculography in clinical neuroscience. This is not a recent method but has been a major component of neurophysiological investigation for decades. I strongly recommend expanding this section by consulting with neurologists/clinical neuroscientists and incorporating established literature (e.g., https://www.imrpress.com/journal/JIN/17/4/10.31083/j.jin.2018.04.0416; https://academic.oup.com/brain/article/131/5/1268/424677; https://journals.lww.com/co-neurology/fulltext/2016/02000/eye_movements_in_neurodegenerative_diseases.12.aspx; https://www.frontiersin.org/journals/neurology/articles/10.3389/fneur.2017.00377/full).
  2. The authors should better situate their work within the rapidly emerging field of computer vision-enabled behavioral analyses. Relevant references include: https://pubmed.ncbi.nlm.nih.gov/36422668/; https://www.ahajournals.org/doi/10.1161/JAHA.123.030927; https://jamanetwork.com/journals/jamaneurology/fullarticle/2830246.
  3. The claims about head movement alterations in AD require more nuance. From a clinical perspective, I question the assumption that AD patients supplement eye movement deficits with head movements, as most AD cases do not exhibit limitations in gaze range or alignment that would drive such compensation. Please tone down this interpretation and better integrate head movement aspects throughout the paper. Consider incorporating literature on video-based head movement analysis in neurological disorders: https://pmc.ncbi.nlm.nih.gov/articles/PMC9082391/; https://www.nature.com/articles/s41746-024-01140-6; https://pubmed.ncbi.nlm.nih.gov/23825431/.
  4. The section on speech analysis seems tangential to the paper's focus. If contextualizing findings with other digital biomarkers, please be more concise and comprehensive (consider blood biomarkers, MRI, digital cognitive exams).
  5. The cohort descriptions lack critical details: How exactly was AD diagnosed and by whom/in what setting? What were the inclusion/exclusion criteria, particularly for the validation dataset captured outside hospitals?
  6. The authors should address how AD patients interacted with the technology and which factors might have influenced data collection, inclusion, and interpretation.
  7. What features did the CNN prioritize, and how do these relate to clinical heuristics? While the authors propose gazemaps, these are uncommon in eye tracking research. Please contextualize these findings with established clinical parameters. For clinical implementation, transparency and interpretability are essential—clinicians will not use a "black box" tool given the serious consequences of AD misdiagnosis.
  8. The justification for the gaze paradigms used requires improvement. Why did the authors not employ established paradigms like antisaccades? How well is their task validated in AD?
  9. Please provide a more explicit discussion of the study's limitations while highlighting the key strength of applying this tool in a real-world clinical population—a decisive step toward deploying computer vision frameworks in clinical practice.

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Minor Revisions 

Comments for author File: Comments.pdf

Comments on the Quality of English Language

Minor Revisions 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This paper presents an innovative framework for Alzheimer's disease detection that offers new ideas for non-invasive, low-cost early diagnosis. The experimental results show that the approach performs well on both the clinical and the real-world datasets, which gives it research value in the fields of medical image analysis and AI-aided diagnosis. However, after carefully reading and evaluating the manuscript, I find that it needs further improvement in several respects. The following suggestions are intended to help you improve the paper.

1. The clinical dataset contains only 112 participants (61 healthy controls, 51 AD patients), and the real-world dataset contains only 44 participants (21 healthy controls, 23 AD patients). These sample sizes are rather small; it would be better to enlarge the existing datasets in subsequent studies or to introduce other external datasets.

2. As mentioned in the paper, the accuracy of the model drops significantly on the real-world dataset, from 91.09% on the clinical dataset to 83.33%. I suggest analyzing the causes of this drop further, which would strengthen the explanatory value of the research.

3. Although the model uses a dual-pathway architecture to process the different modalities separately and then fuse their features, the current fusion strategy is relatively simple. I suggest exploring richer feature-interaction patterns, for example by introducing Transformer-based or cross-modal attention mechanisms.

4. The paper states that the eye movement characteristics of AD patients were observed through a red-dot tracking task. However, this experimental design is limited to a single visual tracking mode. Future experiments should include more diverse task scenarios, which may help in studying the abnormal eye movements of patients further.

5. When presenting the results, please consider including a confusion matrix, which shows at a glance which classes the model classifies well and which it tends to confuse. For Alzheimer's disease detection in particular, a confusion matrix makes the misclassification pattern immediately visible, for example whether the model is more likely to misclassify AD patients as HCs or HCs as AD patients.

6. False positives (healthy controls misclassified as having Alzheimer's disease) and false negatives (missed patients with Alzheimer's disease) were not analyzed. In Alzheimer's disease diagnosis, false negatives may cause patients to miss critical opportunities for early intervention. I recommend analyzing false negatives and false positives in follow-up studies, which would make the diagnostic model more reliable.

7. Section 5.1.1 states that gradient descent is used to optimize the model. Other optimizers, such as AdamW or gradient descent with momentum, may perform better. I suggest trying other optimization methods in follow-up research and running comparative experiments, which would help to optimize the model further.
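
Points 5 and 6 above can be illustrated with a minimal sketch of a binary confusion matrix and the resulting false-negative and false-positive rates. The function names and the toy labels are hypothetical and are not taken from the manuscript; the example only assumes that "AD" is treated as the positive class and "HC" as the negative class.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="AD"):
    """Return (tp, fn, fp, tn), treating `positive` as the disease class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            counts["tp" if pred == positive else "fn"] += 1
        else:
            counts["fp" if pred == positive else "tn"] += 1
    return counts["tp"], counts["fn"], counts["fp"], counts["tn"]


def error_rates(tp, fn, fp, tn):
    """Return (false-negative rate, false-positive rate)."""
    fnr = fn / (tp + fn) if (tp + fn) else 0.0  # AD patients that were missed
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # HCs flagged as AD
    return fnr, fpr


# Toy labels, purely illustrative (not data from the study).
y_true = ["AD", "AD", "AD", "HC", "HC", "HC", "HC", "AD"]
y_pred = ["AD", "HC", "AD", "HC", "AD", "HC", "HC", "AD"]

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
fnr, fpr = error_rates(tp, fn, fp, tn)
print(f"TP={tp} FN={fn} FP={fp} TN={tn} FNR={fnr:.2f} FPR={fpr:.2f}")
```

Reporting the four counts separately, rather than accuracy alone, is what lets a reader see whether errors fall mainly on missed AD patients or on healthy controls flagged as AD.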

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

Title: GazeMap: Dual-Pathway CNN Approach for Diagnosing Alzheimer’s Disease from Gaze and Head Movements

This study proposes a novel AD detection framework that integrates gaze and head movement analysis using a dual-pathway convolutional neural network (CNN). Unlike conventional methods relying on linguistic, speech, or neuroimaging data, this approach leverages noninvasive video-based tracking, offering a more accessible and cost-effective solution. The work is good and is presented in detail; however, the manuscript needs some corrections before it is ready for publication.

Remarks to the Author:  Please see the full comments.

1- The Abstract mentions a comparison with baseline models. What are these models? Please list them. How was the model performance validated?

2- The paper contains some grammatical and linguistic errors that hinder the reader's understanding, so the manuscript should be proofread carefully.

3- Please list the contributions of the proposed work at the end of the introduction section instead of mentioning them separately in sections 1 and 2 (2.2, and 3.2).

4- The introduction section provides a comprehensive foundation on the topic of this research based on various works; however, some other recent works need to be added to this section.

5- Any information, figure, equation, or dataset taken from a previous source must be cited with a reliable source unless it is the authors' own. Please check this issue throughout the manuscript.

6- Can the authors provide more explanation of the feature extraction process and define the extracted features?

In addition, please provide a figure showing the whole workflow of the proposed system.

7- It is mentioned that “To facilitate the CNN-based processing of gaze cues, we convert gaze angles into a gaze saliency map, inspired by optical flow visualization”. How does “optical flow visualization” facilitate the stated process? Please explain.

8- Why does the caption of Figure 2 repeat the information given at the beginning of section 3.1? I recommend changing the caption. Please check this issue throughout the manuscript.

9- Can the authors provide further details about the datasets used in this work (in-distribution and out-of-distribution), presented in tabular form?

10- Why was three-fold cross-validation used for performance evaluation? This point should be explained clearly.

11- It is stated that “To evaluate robustness against missing data, we randomly dropped feature sequences …...”.  How was this step performed and how was the robustness increased?

12- It is stated that “Finally, we vary the channel capacity of the subpathway (head pose) by modifying the ratio…..”. Larger ratios (1/1, 1/2) provide no further gains; what about smaller ratios? Please clarify this point.

13- I recommend condensing the conclusion section into one paragraph.
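
As background to point 10, the following is a minimal sketch of how a stratified three-fold split would partition a cohort of the size quoted in Reviewer 3's report (61 healthy controls, 51 AD patients). The function `stratified_kfold` is a hypothetical helper written for this illustration; it is not the authors' evaluation code, and it assumes one label per participant so that no participant appears in both the training and the test fold.

```python
import random


def stratified_kfold(labels, k=3, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions roughly preserved."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    # Shuffle within each class, then deal indices round-robin into the folds.
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f in range(k) if f != held_out for i in folds[f])
        yield train, test


# One label per participant; the counts are those quoted in the review.
labels = ["HC"] * 61 + ["AD"] * 51
splits = list(stratified_kfold(labels, k=3))

fold_sizes = [len(test) for _, test in splits]
ad_per_fold = [sum(labels[i] == "AD" for i in test) for _, test in splits]
print("test fold sizes:", fold_sizes)
print("AD participants per test fold:", ad_per_fold)
```

With a cohort this small, each of the three test folds holds fewer than 40 participants, which is one reason a reviewer may ask for the choice of fold count to be justified.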

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

Could you please answer the following questions:

 

  • Why did you classify the conventional methods that rely on linguistic data as invasive?
  • Gaze and head movements in other conditions, such as vascular dementia, Parkinson’s disease, frontotemporal dementia, and depression, can be the same. How does your proposed method distinguish patients suffering from Alzheimer’s disease?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have done a great job addressing all comments, turning the manuscript into a well-balanced and relevant contribution to the field. Bravo!

Reviewer 3 Report

Comments and Suggestions for Authors

Accept in present form

Reviewer 5 Report

Comments and Suggestions for Authors

Thank you very much for your valuable answers to the comments.
