Article
Peer-Review Record

Comparative Analysis of General-Purpose vs. Domain-Specific Multimodal Models for Diabetic Retinopathy Classification

Diagnostics 2026, 16(10), 1504; https://doi.org/10.3390/diagnostics16101504
by Mohammad Iqbal Nouyed 1, Mohammad Al-Mamun 2, Donald A. Adjeroh 3 and Gangqing Hu 1,4,5,*
Reviewer 2: Anonymous
Submission received: 19 March 2026 / Revised: 6 May 2026 / Accepted: 14 May 2026 / Published: 15 May 2026

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors
  • The proposed work appears to be an extension of the existing work by Ayhan et al. [13], aimed at demonstrating the credibility of prompt-based or general-purpose large multimodal models (LMMs) in comparison with domain-specific models.
  • Details of data augmentation should be consolidated and presented in a single section, as many of the parameters are common across models.
  • The term “accum_steps,” used in the context of learning rate computation, should be clearly defined.
  • The fine-tuning section in Figure 1 should be updated to improve clarity and provide a better understanding of the proposed methodology.
  • The dataset size after data augmentation and the details of the test dataset should be stated clearly. This is important for proper interpretation of the results. In previous works, a held-out test set (typically 10% of the data) is used in addition to 10-fold cross-validation.
  • Uniform evaluation metrics, including specificity, should be reported across all results (Tables 3 and 4) to enable fair and comprehensive comparison among the presented models.
  • Although EyeCLIP and MedSigLIP are designed for multimodal (image and text) inputs, the evaluation has been performed only on image data. The authors are requested to justify this choice.
  • Confidence intervals should be reported along with the results (mean ± std) to provide better insight into statistical significance.
  • Explainability analysis should be incorporated using techniques such as Grad-CAM to provide better insights into the model’s decision-making process.
  • The generalization capability of the proposed approach can be further evaluated using external datasets such as the DeepDRiD dataset (ISBI Challenge) [Liu et al., 2022].
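The confidence-interval suggestion above can be illustrated with a minimal sketch. It assumes per-fold scores from 10-fold cross-validation and uses a Student's t interval; the function name and the fold accuracies are hypothetical and are not taken from the manuscript. Because cross-validation folds share training data, the interval is only approximate.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom
# (10 folds). Hard-coded so the sketch needs no third-party library.
T_CRIT_95_DF9 = 2.262


def mean_ci_10fold(scores):
    """Return (mean, sample std, (lower, upper)) for 10 per-fold scores."""
    if len(scores) != 10:
        raise ValueError("this sketch assumes exactly 10 folds")
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    half_width = T_CRIT_95_DF9 * std / math.sqrt(len(scores))
    return mean, std, (mean - half_width, mean + half_width)


# Made-up per-fold accuracies, for illustration only.
fold_acc = [0.90, 0.92, 0.88, 0.91, 0.93, 0.89, 0.90, 0.94, 0.87, 0.91]
mean, std, (lower, upper) = mean_ci_10fold(fold_acc)
print(f"mean={mean:.3f} std={std:.3f} 95% CI=({lower:.3f}, {upper:.3f})")
```

Reporting the interval next to mean ± std makes it easier to see whether two models' results plausibly overlap.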

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors
  1. The small dataset (IDRiD, 516 images) may limit the generalizability and robustness of the conclusions. The authors should also evaluate on other DR datasets.
  2. The study focuses only on binary classification, ignoring multi-grade DR severity, which reduces clinical relevance.
  3. No statistical significance testing was provided to support performance differences between models.
  4. Prompt-based evaluation lacks reproducibility due to variability in LMM responses and sampling strategies.
  5. Explainability analysis is discussed but not quantitatively evaluated, limiting clinical interpretability.
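One way to address the significance-testing point above is a paired test on per-fold scores. The sketch below is an assumption for illustration, not the authors' method: it runs an exact two-sided sign-flip permutation test on the per-fold differences between two models evaluated on the same folds. The function name and the example scores are hypothetical, and since folds are not fully independent the p-value should be read as approximate.

```python
from itertools import product


def paired_sign_flip_pvalue(scores_a, scores_b):
    """Exact two-sided sign-flip permutation p-value for paired fold scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models need one score per shared fold")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    # Under the null hypothesis each paired difference is equally likely
    # to have either sign, so enumerate every sign assignment (2**n).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / total


# Made-up per-fold accuracies for two models, for illustration only.
model_a = [0.91, 0.93, 0.90, 0.92, 0.94, 0.90, 0.91, 0.95, 0.89, 0.92]
model_b = [0.89, 0.92, 0.88, 0.90, 0.93, 0.89, 0.90, 0.92, 0.88, 0.90]
print(f"p = {paired_sign_flip_pvalue(model_a, model_b):.4f}")
```

With 10 folds the enumeration covers only 1024 sign patterns, so the exact test is cheap and needs no distributional assumption about the score differences.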

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The reviewer appreciates the efforts put forth by the authors to address the suggestions and comments raised.

The authors are further advised to review the entire paper for any possible grammatical errors, such as the one in line 157:
“Although EyeCLIP and MedSigLIP are designed for multimodal (image + text) inputs, we deliberately used their visual encoders as the focus of the study in on (should be "is on") image classification.”

Author Response

Thank you for the comments on the writing. We have used a professional manuscript editing service (author services) to further improve the readability of the manuscript. All changes are highlighted in red.

Reviewer 2 Report

Comments and Suggestions for Authors

I'm satisfied with the revised manuscript. 

Author Response

We are glad to see that you are satisfied with our revision.
