This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
Robustness of Large Vision Language Model Features Under Wireless Channel Degradation for Medical Visual Question Answering
by Merve Güllü 1,2,* and Necaattin Barışçı 3
1 R&D Department, Türk Telekom, Ankara 06080, Türkiye
2 Graduate School of Natural and Applied Sciences, Gazi University, Ankara 06560, Türkiye
3 Department of Computer Engineering, Gazi University, Ankara 06560, Türkiye
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(13), 6425; https://doi.org/10.3390/app16136425 (registering DOI)
Submission received: 23 May 2026 / Revised: 14 June 2026 / Accepted: 23 June 2026 / Published: 27 June 2026
Abstract
Deploying medical visual question answering (VQA) systems over wireless networks introduces a fundamental challenge: channel-induced image degradation may corrupt the visual representations extracted by large vision-language models (VLMs), leading to unreliable diagnostic decisions. We investigate the robustness of frozen LLaVA-1.6, BLIP-2, and BioViL-T hidden-state features under additive white Gaussian noise (AWGN), Rayleigh fading, and six combined JPEG-compression-plus-channel conditions (quality factors ) across signal-to-noise ratios (SNRs) from to dB. A lightweight MLP classifier is trained exclusively on clean features and evaluated on channel-degraded features, enabling controlled analysis of representation robustness without retraining. We introduce the Feature Robustness Score (FRS), defined as the difference between cosine similarity and normalized L2 drift of clean versus degraded features, together with a validation-set FRS threshold analysis as a label-free retraining criterion. A wavelet sub-band energy analysis further characterizes the spectral distribution of channel-induced feature drift. Experiments on PathVQA and VQA-RAD reveal four key findings: (1) LLaVA-1.6 features maintain cosine similarity above across all eight channel conditions and all SNR levels, with statistically significant MLP gains at every tested point (, McNemar’s test); (2) BLIP-2 and BioViL-T features are less stable but still support consistent MLP improvements, with BioViL-T performing competitively on VQA-RAD, suggesting domain alignment matters; (3) JPEG compression quality () has negligible impact on feature drift, establishing VLM features as JPEG quality-invariant; and (4) wavelet analysis confirms that channel noise primarily affects high-frequency detail bands while preserving low-frequency semantic content.
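The two channel models named in the abstract can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' code: it operates on a flattened, real-valued image, sets the noise variance from the mean signal power and the target SNR in dB, and models Rayleigh fading as independent per-sample gains with unit average power followed by optional zero-forcing equalization. The function names (`awgn_channel`, `rayleigh_channel`) are hypothetical, and the paper may use block fading, complex baseband symbols, or a different equalizer.

```python
import math
import random
from typing import List, Sequence


def mean_power(x: Sequence[float]) -> float:
    """Average signal power E[x^2] of a flattened image or signal."""
    if not x:
        raise ValueError("signal must be non-empty")
    return sum(v * v for v in x) / len(x)


def noise_sigma(x: Sequence[float], snr_db: float) -> float:
    """Noise standard deviation that yields the requested SNR (dB) for signal x."""
    return math.sqrt(mean_power(x) / (10.0 ** (snr_db / 10.0)))


def awgn_channel(x: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """y = x + n, with n ~ N(0, sigma^2) chosen from the target SNR."""
    sigma = noise_sigma(x, snr_db)
    return [v + rng.gauss(0.0, sigma) for v in x]


def rayleigh_channel(
    x: Sequence[float], snr_db: float, rng: random.Random, equalize: bool = True
) -> List[float]:
    """y = h*x + n with a per-sample Rayleigh gain h satisfying E[h^2] = 1.

    ASSUMPTION: fast (per-sample) flat fading on real-valued samples with
    optional zero-forcing equalization y / h; the paper's model may differ.
    """
    sigma = noise_sigma(x, snr_db)
    out = []
    for v in x:
        # Magnitude of a unit-power complex Gaussian: sqrt((g1^2 + g2^2) / 2)
        h = math.sqrt((rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2) / 2.0)
        y = h * v + rng.gauss(0.0, sigma)
        # Guard against division by a vanishing gain during equalization
        out.append(y / max(h, 1e-12) if equalize else y)
    return out
```

For 8-bit images one would typically scale pixels to [0, 1] before applying the channel and clip afterwards. For the combined conditions, applying the channel to the JPEG-decoded image is likewise an assumption rather than a detail stated in the abstract.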
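The Feature Robustness Score is defined in the abstract only as cosine similarity minus normalized L2 drift between clean and degraded features. The sketch below assumes the drift is normalized by the L2 norm of the clean feature vector, which the abstract does not specify. The `needs_retraining` helper, its use of the mean validation FRS, and any threshold passed to it are hypothetical illustrations of a label-free retraining criterion, not the paper's procedure.

```python
import math
from typing import Iterable, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def normalized_l2_drift(clean: Sequence[float], degraded: Sequence[float]) -> float:
    """L2 distance between clean and degraded features.

    ASSUMPTION: normalized by the L2 norm of the clean feature vector.
    """
    if len(clean) != len(degraded):
        raise ValueError("feature vectors must have the same length")
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(clean, degraded)))
    norm_clean = math.sqrt(sum(x * x for x in clean))
    if norm_clean == 0.0:
        raise ValueError("clean feature vector must be non-zero")
    return diff / norm_clean


def feature_robustness_score(clean: Sequence[float], degraded: Sequence[float]) -> float:
    """FRS = cosine similarity - normalized L2 drift (1.0 means no drift)."""
    return cosine_similarity(clean, degraded) - normalized_l2_drift(clean, degraded)


def needs_retraining(
    pairs: Iterable[Tuple[Sequence[float], Sequence[float]]], threshold: float
) -> bool:
    """HYPOTHETICAL label-free check: flag retraining when the mean FRS over
    (clean, degraded) validation pairs falls below the given threshold."""
    scores = [feature_robustness_score(c, d) for c, d in pairs]
    if not scores:
        raise ValueError("need at least one (clean, degraded) pair")
    return sum(scores) / len(scores) < threshold
```

For identical vectors the score is 1.0; for the orthogonal unit vectors [1, 0] and [0, 1] it is 0 - sqrt(2), about -1.414, so lower values indicate stronger drift.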
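A wavelet sub-band energy analysis of drift can be sketched with a single-level 2-D Haar transform, whose orthonormal scaling makes the four sub-band energies sum to the input energy. The abstract does not name the wavelet family, the decomposition depth, or exactly which array is transformed, so this sketch assumes a 2-D drift map with even dimensions (for example, a degraded-minus-clean image or spatial feature map); the Haar choice and the function names are illustrative. Libraries such as PyWavelets offer other wavelets and multi-level decompositions.

```python
from typing import Dict, List, Sequence


def haar_subbands(img: Sequence[Sequence[float]]) -> Dict[str, List[List[float]]]:
    """Single-level orthonormal 2-D Haar transform of an even-sized array.

    Returns LL (approximation) and three detail bands. LH/HL naming varies
    between libraries; here LH holds top-minus-bottom differences and HL holds
    left-minus-right differences within each 2x2 block.
    """
    if not img or not img[0]:
        raise ValueError("expected a non-empty 2-D array")
    h, w = len(img), len(img[0])
    if h % 2 or w % 2 or any(len(row) != w for row in img):
        raise ValueError("expected a rectangular array with even dimensions")
    bands: Dict[str, List[List[float]]] = {n: [] for n in ("LL", "LH", "HL", "HH")}
    for i in range(0, h, 2):
        rows: Dict[str, List[float]] = {n: [] for n in bands}
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top row of the 2x2 block
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom row of the block
            rows["LL"].append((a + b + c + d) / 2.0)
            rows["LH"].append((a + b - c - d) / 2.0)
            rows["HL"].append((a - b + c - d) / 2.0)
            rows["HH"].append((a - b - c + d) / 2.0)
        for n in bands:
            bands[n].append(rows[n])
    return bands


def subband_energy_fractions(img: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Share of total energy in each sub-band (sums to 1 for a non-zero input)."""
    energies = {
        n: sum(v * v for row in band for v in row)
        for n, band in haar_subbands(img).items()
    }
    total = sum(energies.values())
    if total == 0.0:
        raise ValueError("input has zero energy")
    return {n: e / total for n, e in energies.items()}
```

Because the transform is orthonormal, white noise spreads its energy roughly evenly over all four bands, so about three quarters of it lands in the detail bands, whereas a smooth, slowly varying drift concentrates its energy in LL.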
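The significance claims rely on McNemar's test, which compares two classifiers on the same test items using only the discordant pairs. The abstract does not state whether the exact (binomial) or the continuity-corrected chi-squared variant was used, nor which baseline the MLP is compared against, so this sketch implements both variants and takes per-item correctness flags for any two models as input; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def discordant_counts(
    correct_a: Sequence[bool], correct_b: Sequence[bool]
) -> Tuple[int, int]:
    """Return (b, c): items only model A got right, items only model B got right."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same items")
    b = sum(1 for a_ok, b_ok in zip(correct_a, correct_b) if a_ok and not b_ok)
    c = sum(1 for a_ok, b_ok in zip(correct_a, correct_b) if b_ok and not a_ok)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value: binomial test on b + c trials, p = 0.5."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def mcnemar_chi2(b: int, c: int) -> Tuple[float, float]:
    """Continuity-corrected chi-squared statistic and p-value (1 degree of freedom)."""
    n = b + c
    if n == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # Chi-squared (1 dof) survival function: P(X > s) = erfc(sqrt(s / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

For example, if the MLP alone answers 10 items correctly that the baseline misses and the reverse never happens (b = 10, c = 0), the exact two-sided p-value is 2/1024, about 0.002.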
Cite
MDPI and ACS Style
Güllü, M.; Barışçı, N. Robustness of Large Vision Language Model Features Under Wireless Channel Degradation for Medical Visual Question Answering. Appl. Sci. 2026, 16, 6425. https://doi.org/10.3390/app16136425
AMA Style
Güllü M, Barışçı N. Robustness of Large Vision Language Model Features Under Wireless Channel Degradation for Medical Visual Question Answering. Applied Sciences. 2026; 16(13):6425. https://doi.org/10.3390/app16136425
Chicago/Turabian Style
Güllü, Merve, and Necaattin Barışçı. 2026. "Robustness of Large Vision Language Model Features Under Wireless Channel Degradation for Medical Visual Question Answering." Applied Sciences 16, no. 13: 6425. https://doi.org/10.3390/app16136425
APA Style
Güllü, M., & Barışçı, N. (2026). Robustness of Large Vision Language Model Features Under Wireless Channel Degradation for Medical Visual Question Answering. Applied Sciences, 16(13), 6425. https://doi.org/10.3390/app16136425
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.