Robustness of Large Vision Language Model Features Under Wireless Channel Degradation for Medical Visual Question Answering

Güllü, Merve; Barışçı, Necaattin

doi:10.3390/app16136425

This is an early access version, the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

by Merve Güllü ¹,²,* and Necaattin Barışçı ³

¹ R&D Department, Türk Telekom, Ankara 06080, Türkiye

² Graduate School of Natural and Applied Sciences, Gazi University, Ankara 06560, Türkiye

³ Department of Computer Engineering, Gazi University, Ankara 06560, Türkiye

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(13), 6425; https://doi.org/10.3390/app16136425 (registering DOI)

Submission received: 23 May 2026 / Revised: 14 June 2026 / Accepted: 23 June 2026 / Published: 27 June 2026

(This article belongs to the Special Issue Deep Learning and Its Applications in Natural Language Processing)

Abstract

Deploying medical visual question answering (VQA) systems over wireless networks introduces a fundamental challenge: channel-induced image degradation may corrupt the visual representations extracted by large vision-language models (VLMs), leading to unreliable diagnostic decisions. We investigate the robustness of frozen LLaVA-1.6, BLIP-2, and BioViL-T hidden-state features under additive white Gaussian noise (AWGN), Rayleigh fading, and six combined JPEG-compression-plus-channel conditions (quality factors $q \in \{20, 50, 70\}$) across signal-to-noise ratios (SNRs) from $-5$ to $+20$ dB. A lightweight MLP classifier is trained exclusively on clean features and evaluated on channel-degraded features, enabling controlled analysis of representation robustness without retraining. We introduce the Feature Robustness Score (FRS), defined as the difference between cosine similarity and normalized L2 drift of clean versus degraded features, together with a validation-set FRS threshold analysis as a label-free retraining criterion. A wavelet sub-band energy analysis further characterizes the spectral distribution of channel-induced feature drift. Experiments on PathVQA and VQA-RAD reveal four key findings: (1) LLaVA-1.6 features maintain cosine similarity above $0.98$ across all eight channel conditions and all SNR levels, with statistically significant MLP gains at every tested point ($p < 0.05$, McNemar's test); (2) BLIP-2 and BioViL-T features are less stable but still support consistent MLP improvements, with BioViL-T performing competitively on VQA-RAD, suggesting domain alignment matters; (3) JPEG compression quality ($q = 20, 50, 70$) has negligible impact on feature drift, establishing VLM features as JPEG quality-invariant; and (4) wavelet analysis confirms that channel noise primarily affects high-frequency detail bands while preserving low-frequency semantic content.
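The FRS definition given in the abstract (cosine similarity minus normalized L2 drift between clean and degraded features) can be illustrated with a minimal sketch. The abstract does not state how the L2 drift is normalized, so the code below assumes division by the norm of the clean feature vector. The function names, the `perturb` helper, and the random 512-dimensional stand-in vector are all hypothetical and not taken from the paper. The paper degrades images before feature extraction, whereas this toy example perturbs a feature vector directly, only to produce a degraded input for the score.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalized_l2_drift(clean, degraded):
    """L2 distance between the vectors, divided by the clean vector's norm.

    ASSUMPTION: the abstract only says "normalized L2 drift"; dividing by
    ||clean||_2 is one plausible choice, not necessarily the authors'.
    """
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(clean, degraded)))
    return diff / math.sqrt(sum(x * x for x in clean))


def feature_robustness_score(clean, degraded):
    """FRS = cosine similarity - normalized L2 drift, as worded in the abstract."""
    return cosine_similarity(clean, degraded) - normalized_l2_drift(clean, degraded)


def perturb(vector, scale, rng):
    """Hypothetical helper: add zero-mean Gaussian noise directly to a vector."""
    return [x + rng.gauss(0.0, scale) for x in vector]


rng = random.Random(0)
# Stand-in for a frozen VLM hidden-state vector (random, for illustration only).
clean = [rng.gauss(0.0, 1.0) for _ in range(512)]

for scale in (0.01, 0.1, 0.5):
    degraded = perturb(clean, scale, rng)
    print(f"noise scale {scale}: FRS = {feature_robustness_score(clean, degraded):.3f}")
```

Under this assumed normalization, identical features give an FRS of 1, and the score falls as the degraded features drift away from the clean ones, which is the behavior a threshold-based retraining criterion would rely on.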

Keywords: medical visual question answering; large vision-language models; LLaVA; BLIP-2; BioViL-T; wireless channel; AWGN; Rayleigh fading; JPEG compression; feature robustness score; wavelet analysis; PathVQA; VQA-RAD
