Peer-Review Record

Comparative Analysis of General-Purpose vs. Domain-Specific Multimodal Models for Diabetic Retinopathy Classification

Diagnostics 2026, 16(10), 1504; https://doi.org/10.3390/diagnostics16101504

by Mohammad Iqbal Nouyed¹, Mohammad Al-Mamun², Donald A. Adjeroh³ and Gangqing Hu¹,⁴,⁵,*

Reviewer 1: Venkata Sainath Gupta Thadikemalla

Reviewer 2: Anonymous

Submission received: 19 March 2026 / Revised: 6 May 2026 / Accepted: 14 May 2026 / Published: 15 May 2026

(This article belongs to the Special Issue AI-Powered Ophthalmic Diagnostics: From Image Segmentation to Disease Classification)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The proposed work appears to be an extension of the existing work by Ayhan et al. [13], aimed at demonstrating the credibility of prompt-based or general-purpose large multimodal models (LMMs) in comparison with domain-specific models.
Details of data augmentation should be consolidated and presented in a single section, as many of the parameters are common across models.
The term “accum_steps,” used in the context of learning rate computation, should be clearly defined.
The fine-tuning section in Figure 1 should be updated to improve clarity and provide a better understanding of the proposed methodology.
The dataset size after data augmentation and the details of the test dataset should be stated clearly, as both are needed to interpret the results properly. In previous works, a held-out test set (typically 10% of the data) is used alongside 10-fold cross-validation.
Uniform evaluation metrics, including specificity, should be reported across all results (Tables 3 and 4) to enable fair and comprehensive comparison among the presented models.
Although EyeCLIP and MedSigLIP are designed for multimodal (image and text) inputs, the evaluation has been performed only on image data. The authors are requested to justify this choice.
Confidence intervals should be reported along with the results (mean ± std) to give better insight into their statistical significance.
Explainability analysis should be incorporated using techniques such as Grad-CAM to provide better insights into the model’s decision-making process.
The generalization capability of the proposed approach can be further evaluated using external datasets such as the DeepDRiD dataset (ISBI Challenge) [Liu et al., 2022].
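
As an illustration of the requests above for specificity and confidence intervals, the following is a minimal sketch rather than the authors' evaluation code. It assumes binary labels encoded as 0 (no DR) and 1 (DR) and one score per fold of a 10-fold cross-validation. The function names `specificity` and `mean_ci` are hypothetical, and the default critical value 2.262 is the two-sided 95% Student t value for 9 degrees of freedom, which applies only when there are exactly 10 folds.

```python
import math
import statistics


def specificity(y_true, y_pred):
    """Return TN / (TN + FP) for binary labels (assumed: 0 = no DR, 1 = DR)."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tn + fp == 0:
        raise ValueError("specificity is undefined without negative samples")
    return tn / (tn + fp)


def mean_ci(fold_scores, t_crit=2.262):
    """Return (mean, lower, upper) of a t-based confidence interval.

    t_crit defaults to the 95% two-sided value for 9 degrees of freedom,
    i.e. 10 folds; pass a different value for another fold count.
    """
    n = len(fold_scores)
    if n < 2:
        raise ValueError("at least two fold scores are required")
    mean = statistics.mean(fold_scores)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(fold_scores) / math.sqrt(n)
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width
```

Calling `mean_ci` on the per-fold values of each metric would give an interval that can be reported next to the mean ± std entries in Tables 3 and 4.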

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The limited dataset size (IDRiD, 516 images) may restrict the generalizability and robustness of the conclusions. Additional DR datasets should be used.
The study focuses only on binary classification, ignoring multi-grade DR severity, which reduces clinical relevance.
No statistical significance testing was provided to support performance differences between models.
Prompt-based evaluation lacks reproducibility due to variability in LMM responses and sampling strategies.
Explainability analysis is discussed but not quantitatively evaluated, limiting clinical interpretability.
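
One way to address the comment on statistical significance testing, sketched here under stated assumptions and not taken from the manuscript, is an exact McNemar test on the paired predictions of two models over the same test images. The function name `mcnemar_exact` and the input format (three equal-length label lists) are hypothetical.

```python
import math


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two classifiers on the same samples."""
    # b: model A correct, model B wrong; c: model A wrong, model B correct.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a difference
    # Under the null hypothesis, the discordant pairs follow Binomial(n, 0.5).
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A small p-value suggests the two models' error patterns differ beyond what chance would explain. With few discordant pairs, as is likely on a dataset of this size, the test has limited power.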

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The reviewer appreciates the efforts put forth by the authors to address the suggestions and comments raised.

The authors are further advised to review the entire paper for any possible grammatical errors, such as the one in line 157:
“Although EyeCLIP and MedSigLIP are designed for multimodal (image + text) inputs, we deliberately used their visual encoders as the focus of the study in on (should be "is on") image classification.”

Author Response

Thank you for the comments on the writing. We have used a professional manuscript editing service to further improve the readability of the manuscript. All changes have been highlighted in red.
