This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
Beyond Accuracy: A Multi-dimensional Cognitive Audit of Medical Large Vision–Language Models in Fundus Image Interpretation
1 Institute of Data Science, City University of Macau, Macau 999078, China
2 Institute of Artificial Intelligence, Putian University, Putian 351100, China
3 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(12), 6064; https://doi.org/10.3390/app16126064 (registering DOI)
Submission received: 9 May 2026 / Revised: 7 June 2026 / Accepted: 10 June 2026 / Published: 15 June 2026
Abstract
Reliance on standalone accuracy limits credible assessment of fundus-focused large vision–language models (LVLMs), as high scores often stem from linguistic shortcuts rather than real visual reasoning. This work develops the Cognitive Audit Framework (CAF), a four-module automated auditing pipeline that dissects model reasoning flaws: Visual–Linguistic Decoupling (textual dependency via modality ablation), Hierarchical Logical Consistency (lesion–diagnosis contradiction detection), Reasoning Fidelity Gap (chain-of-thought unfaithfulness scoring), and Contextual Robustness (positional bias under option permutation). Experiments on six 7B–31B LVLMs over FunBench reveal a notable gap between benchmark accuracy and reasoning quality: high accuracy coexists with measurable textual dependency, logical inconsistencies across diagnostic levels, limited chain-of-thought faithfulness, and non-trivial positional sensitivity. CAF serves as a reproducible complement to pure accuracy metrics for validating clinical competence of ophthalmic multimodal models.
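As a rough illustration of two of the four audit dimensions named above, the sketch below computes a text-only retention rate (one possible proxy for textual dependency under modality ablation) and an answer-consistency rate under option permutation (one possible proxy for positional bias). The abstract does not give CAF's metric definitions, so the formulas, the function names `textual_dependency` and `positional_consistency`, and the `answer_fn` callback are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import permutations


def textual_dependency(correct_with_image, correct_text_only):
    """Assumed proxy: of the items answered correctly with the image,
    the share that stays correct when the image is withheld."""
    base = sum(1 for w in correct_with_image if w)
    kept = sum(1 for w, t in zip(correct_with_image, correct_text_only) if w and t)
    return kept / base if base else 0.0


def positional_consistency(answer_fn, question, options):
    """Assumed proxy: fraction of option orderings in which the model
    selects its most frequently chosen option (compared by content).

    answer_fn is a hypothetical callable (question, ordered_options) -> chosen
    option text; it stands in for a real model query.
    """
    picks = [answer_fn(question, list(order)) for order in permutations(options)]
    modal_count = Counter(picks).most_common(1)[0][1]
    return modal_count / len(picks)


# Toy stand-ins for a model: one always picks the first listed option,
# the other always picks the same content regardless of position.
def first_option_model(question, options):
    return options[0]


def content_model(question, options):
    return "glaucoma"
```

Under these assumed definitions, a retention rate near 1 would suggest that answers rarely depend on the image, and a consistency rate well below 1 would suggest sensitivity to option order. With three options, the always-first toy model is consistent in only one third of orderings, whereas the content-driven toy model is fully consistent.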
Share and Cite
MDPI and ACS Style
Zhang, J.; Zheng, S.; Liu, X.; Gu, J. Beyond Accuracy: A Multi-dimensional Cognitive Audit of Medical Large Vision–Language Models in Fundus Image Interpretation. Appl. Sci. 2026, 16, 6064. https://doi.org/10.3390/app16126064
AMA Style
Zhang J, Zheng S, Liu X, Gu J. Beyond Accuracy: A Multi-dimensional Cognitive Audit of Medical Large Vision–Language Models in Fundus Image Interpretation. Applied Sciences. 2026; 16(12):6064. https://doi.org/10.3390/app16126064
Chicago/Turabian Style
Zhang, Jingling, Shuting Zheng, Xiangfei Liu, and Jia Gu. 2026. "Beyond Accuracy: A Multi-dimensional Cognitive Audit of Medical Large Vision–Language Models in Fundus Image Interpretation" Applied Sciences 16, no. 12: 6064. https://doi.org/10.3390/app16126064
APA Style
Zhang, J., Zheng, S., Liu, X., & Gu, J. (2026). Beyond Accuracy: A Multi-dimensional Cognitive Audit of Medical Large Vision–Language Models in Fundus Image Interpretation. Applied Sciences, 16(12), 6064. https://doi.org/10.3390/app16126064