This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

QuantFT-VL: Harmonizing Quantization and LoRA for Efficient Mobile Vision–Language Model Fine-Tuning

Fangyuan Jin, Hui Lin, Lu Zhang and Yiwei Chen
1 Science and Technology on Underwater Test and Control Laboratory, Dalian 116023, China
2 Marine Engineering College, Dalian Maritime University, Dalian 116026, China
3 Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
4 School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Suzhou 215163, China
* Author to whom correspondence should be addressed.
Algorithms 2026, 19(5), 364; https://doi.org/10.3390/a19050364
Submission received: 25 March 2026 / Revised: 23 April 2026 / Accepted: 30 April 2026 / Published: 4 May 2026

Abstract

Vision–language models (VLMs) are increasingly deployed in resource-constrained environments, yet efficient fine-tuning remains challenging because post-training quantization often degrades the effectiveness of low-rank adaptation. This paper revisits that mismatch in the context of MobileVLM 1.7B and presents QuantFT-VL, a novel initialization strategy applied after the quantization phase so that it aligns seamlessly with the LoRA technique. The key idea is to initialize LoRA using a low-rank approximation of the quantization residual instead of the default zero-initialization used in QLoRA-style pipelines. After quantizing a pretrained weight matrix W into Q, we compute the residual W − Q and use truncated singular value decomposition to initialize the LoRA factors A and B so that the starting adapted weight Q + ABᵀ better matches the full-precision model. This residual-aware initialization reduces the discrepancy introduced by quantization and leads to faster and more stable optimization. Experiments on six standard VLM benchmarks show that QuantFT-VL consistently improves over QLoRA and, in the best setting, recovers performance close to or better than full-precision LoRA. On two RTX 3090 GPUs, QuantFT-VL improves the average benchmark score by 3.27 percentage points over QLoRA while preserving the memory and speed advantages of quantized fine-tuning.
Keywords: efficient fine-tuning; low-rank adaptation; mobile vision–language model; quantization-aware adaptation; quantized fine-tuning; residual-aware initialization
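
The following is a minimal sketch of the residual-aware initialization described in the abstract, assuming a PyTorch linear layer whose quantized weights can be dequantized back to a dense matrix Q. The function name `init_lora_from_residual` and the stand-in for the quantize/dequantize round trip are illustrative assumptions, not the authors' implementation.

```python
import torch

def init_lora_from_residual(W: torch.Tensor, Q: torch.Tensor, rank: int):
    """Initialize LoRA factors A, B from a truncated SVD of the quantization
    residual W - Q, so that Q + A @ B.T approximates the full-precision W."""
    residual = (W - Q).float()                    # quantization error to absorb
    U, S, Vh = torch.linalg.svd(residual, full_matrices=False)
    sqrt_S = torch.diag(S[:rank].sqrt())          # split singular values across A and B
    A = U[:, :rank] @ sqrt_S                      # shape: (out_features, rank)
    B = (sqrt_S @ Vh[:rank, :]).T                 # shape: (in_features, rank)
    return A, B

# Usage example with random weights standing in for a pretrained matrix and its
# dequantized counterpart (the real pipeline would use the quantizer's output):
if __name__ == "__main__":
    torch.manual_seed(0)
    W = torch.randn(256, 128)
    Q = W + 0.01 * torch.randn_like(W)            # stand-in for dequantize(quantize(W))
    A, B = init_lora_from_residual(W, Q, rank=16)
    err_zero_init = (W - Q).norm()                # QLoRA-style start: adapters are zero
    err_residual = (W - (Q + A @ B.T)).norm()     # residual-aware start
    print(f"||W - Q||           = {err_zero_init:.4f}")
    print(f"||W - (Q + A B^T)|| = {err_residual:.4f}")
```

Under this sketch, the second printed norm is never larger than the first, which illustrates why the residual-aware starting point sits closer to the full-precision model than the zero-initialized adapters used in QLoRA-style pipelines.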

Share and Cite

MDPI and ACS Style

Jin, F.; Lin, H.; Zhang, L.; Chen, Y. QuantFT-VL: Harmonizing Quantization and LoRA for Efficient Mobile Vision–Language Model Fine-Tuning. Algorithms 2026, 19, 364. https://doi.org/10.3390/a19050364


