This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

QuantFT-VL: Harmonizing Quantization and LoRA for Efficient Mobile Vision–Language Model Fine-Tuning

Fangyuan Jin, Hui Lin, Lu Zhang and Yiwei Chen
1 Science and Technology on Underwater Test and Control Laboratory, Dalian 116023, China
2 Marine Engineering College, Dalian Maritime University, Dalian 116026, China
3 Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
4 School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Suzhou 215163, China
* Author to whom correspondence should be addressed.
Algorithms 2026, 19(5), 364; https://doi.org/10.3390/a19050364
Submission received: 25 March 2026 / Revised: 23 April 2026 / Accepted: 30 April 2026 / Published: 4 May 2026

Abstract

Vision–language models (VLMs) are increasingly deployed in resource-constrained environments, yet efficient fine-tuning remains challenging because post-training quantization often degrades the effectiveness of low-rank adaptation. This paper revisits that mismatch in the context of MobileVLM 1.7B and presents QuantFT-VL, a novel initialization strategy applied after the quantization phase so that it aligns seamlessly with the LoRA technique. The key idea is to initialize LoRA using a low-rank approximation of the quantization residual instead of the default zero-initialization used in QLoRA-style pipelines. After quantizing a pretrained weight matrix W into Q, we compute the residual W − Q and use truncated singular value decomposition to initialize the LoRA factors A and B so that the starting adapted weight Q + ABᵀ better matches the full-precision model. This residual-aware initialization reduces the discrepancy introduced by quantization and leads to faster and more stable optimization. Experiments on six standard VLM benchmarks show that QuantFT-VL consistently improves over QLoRA and, in the best setting, recovers performance close to or better than full-precision LoRA. On two RTX 3090 GPUs, QuantFT-VL improves the average benchmark score by 3.27 percentage points over QLoRA while preserving the memory and speed advantages of quantized fine-tuning.
Keywords: efficient fine-tuning; low-rank adaptation; mobile vision–language model; quantization-aware adaptation; quantized fine-tuning; residual-aware initialization
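
The following is a minimal sketch of the residual-aware initialization described in the abstract, assuming a PyTorch linear layer whose quantized weights can be dequantized back to a dense matrix Q. The function name `init_lora_from_residual` and the stand-in for the quantize/dequantize round trip are illustrative assumptions, not the authors' implementation.

```python
import torch

def init_lora_from_residual(W: torch.Tensor, Q: torch.Tensor, rank: int):
    """Initialize LoRA factors A, B from a truncated SVD of the quantization
    residual W - Q, so that Q + A @ B.T approximates the full-precision W."""
    residual = (W - Q).float()                    # quantization error to absorb
    U, S, Vh = torch.linalg.svd(residual, full_matrices=False)
    sqrt_S = torch.diag(S[:rank].sqrt())          # split singular values across A and B
    A = U[:, :rank] @ sqrt_S                      # shape: (out_features, rank)
    B = (sqrt_S @ Vh[:rank, :]).T                 # shape: (in_features, rank)
    return A, B

# Usage example with random weights standing in for a pretrained matrix and its
# dequantized counterpart (the real pipeline would use the quantizer's output):
if __name__ == "__main__":
    torch.manual_seed(0)
    W = torch.randn(256, 128)
    Q = W + 0.01 * torch.randn_like(W)            # stand-in for dequantize(quantize(W))
    A, B = init_lora_from_residual(W, Q, rank=16)
    err_zero_init = (W - Q).norm()                # QLoRA-style start: adapters are zero
    err_residual = (W - (Q + A @ B.T)).norm()     # residual-aware start
    print(f"||W - Q||           = {err_zero_init:.4f}")
    print(f"||W - (Q + A B^T)|| = {err_residual:.4f}")
```

Under this sketch, the second printed norm is never larger than the first, which illustrates why the residual-aware starting point sits closer to the full-precision model than the zero-initialized adapters used in QLoRA-style pipelines.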

Share and Cite

MDPI and ACS Style

Jin, F.; Lin, H.; Zhang, L.; Chen, Y. QuantFT-VL: Harmonizing Quantization and LoRA for Efficient Mobile Vision–Language Model Fine-Tuning. Algorithms 2026, 19, 364. https://doi.org/10.3390/a19050364


