Beyond Accuracy: A Multi-dimensional Cognitive Audit of Medical Large Vision–Language Models in Fundus Image Interpretation

Zhang, Jingling; Zheng, Shuting; Liu, Xiangfei; Gu, Jia

doi:10.3390/app16126064

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

¹ Institute of Data Science, City University of Macau, Macau 999078, China

² Institute of Artificial Intelligence, Putian University, Putian 351100, China

³ Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(12), 6064; https://doi.org/10.3390/app16126064 (registering DOI)

Submission received: 9 May 2026 / Revised: 7 June 2026 / Accepted: 10 June 2026 / Published: 15 June 2026

Abstract

Reliance on standalone accuracy limits the credible assessment of fundus-focused large vision–language models (LVLMs), because high scores often stem from linguistic shortcuts rather than genuine visual reasoning. This work develops the Cognitive Audit Framework (CAF), a four-module automated auditing pipeline that dissects flaws in model reasoning: Visual–Linguistic Decoupling (textual dependency, measured via modality ablation), Hierarchical Logical Consistency (detection of lesion–diagnosis contradictions), Reasoning Fidelity Gap (scoring of chain-of-thought unfaithfulness), and Contextual Robustness (positional bias under option permutation). Experiments with six LVLMs ranging from 7B to 31B parameters on FunBench reveal a notable gap between benchmark accuracy and reasoning quality: high accuracy coexists with measurable textual dependency, logical inconsistencies across diagnostic levels, limited chain-of-thought faithfulness, and non-trivial positional sensitivity. CAF serves as a reproducible complement to pure accuracy metrics for validating the clinical competence of ophthalmic multimodal models.

Keywords: large vision–language models; visual reasoning; cognitive audit framework; clinical comprehension
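Two of the four CAF modules are described in operational terms in the abstract: textual dependency via modality ablation and positional bias under option permutation. The sketch below shows one way such probes could be computed for multiple-choice items. The `Model` callable interface, the function names `textual_dependency` and `permutation_consistency`, and both metric definitions are hypothetical assumptions for illustration; they are not the authors' formulas or code.

```python
from itertools import permutations
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical model interface (an assumption, not an API from the paper):
# given an image (None for the text-only ablation), a question, and an
# ordered list of answer options, return the index of the chosen option.
Model = Callable[[Optional[object], str, Sequence[str]], int]

# One multiple-choice item: (image, question, options, index of correct option).
Item = Tuple[object, str, List[str], int]


def textual_dependency(model: Model, items: Sequence[Item]) -> float:
    """Modality-ablation probe (assumed definition): among items answered
    correctly with the image, the fraction still answered correctly when
    the image is withheld. Values near 1.0 suggest answers come from text."""
    correct_with_image = 0
    correct_without_image = 0
    for image, question, options, answer in items:
        if model(image, question, options) != answer:
            continue  # only audit items the model gets right with vision
        correct_with_image += 1
        if model(None, question, options) == answer:
            correct_without_image += 1
    if correct_with_image == 0:
        return 0.0
    return correct_without_image / correct_with_image


def permutation_consistency(model: Model, image: object, question: str,
                            options: Sequence[str]) -> float:
    """Option-permutation probe (assumed definition): the fraction of all
    orderings of the options under which the model selects the same option
    text it selects for the original ordering. 1.0 means no positional
    sensitivity; lower values indicate positional bias."""
    reference = options[model(image, question, options)]
    orders = [list(order) for order in permutations(options)]
    same = sum(
        1 for order in orders
        if order[model(image, question, order)] == reference
    )
    return same / len(orders)
```

For example, a stub model that always returns index 0 scores 1/3 on a three-option item (the originally first option stays first in 2 of the 6 orderings), whereas a model that locates its answer by option text scores 1.0. Enumerating all n! orderings is only practical for small option counts; sampling a subset of permutations would be a natural alternative for larger option sets.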

MDPI and ACS Style

Zhang, J.; Zheng, S.; Liu, X.; Gu, J. Beyond Accuracy: A Multi-dimensional Cognitive Audit of Medical Large Vision–Language Models in Fundus Image Interpretation. Appl. Sci. 2026, 16, 6064. https://doi.org/10.3390/app16126064