Journal of Imaging
  • Article
  • Open Access

28 November 2025

VT-MFLV: Vision–Text Multimodal Feature Learning V Network for Medical Image Segmentation

College of Publishing, University of Shanghai for Science and Technology, Shanghai 200093, China
J. Imaging 2025, 11(12), 425; https://doi.org/10.3390/jimaging11120425
This article belongs to the Special Issue Image Segmentation Techniques: Current Status and Future Directions (2nd Edition)

Abstract

Existing multimodal segmentation methods face limitations in effectively leveraging medical text to guide visual feature learning: they often suffer from insufficient multimodal fusion and inadequate accuracy in fine-grained lesion segmentation. To address these challenges, the Vision–Text Multimodal Feature Learning V Network (VT-MFLV) is proposed. The model exploits the complementarity between medical images and text to enhance multimodal fusion and thereby improve the recognition of critical lesions. VT-MFLV introduces three key modules: a Diagnostic Image–Text Residual Multi-Head Semantic Encoding (DIT-RMHSE) module that preserves critical semantic cues while reducing preprocessing complexity; a Fine-Grained Multimodal Fusion Local Attention Encoding (FG-MFLA) module that strengthens local cross-modal interaction; and an Adaptive Global Feature Compression and Focusing (AGCF) module that emphasizes clinically relevant lesion regions. Experiments are conducted on two publicly available pulmonary infection datasets. On the MosMedData dataset, VT-MFLV achieves Dice and mIoU scores of 75.61 ± 0.32% and 63.98 ± 0.29%; on the QaTa-COV19 dataset, it achieves Dice and mIoU scores of 83.34 ± 0.36% and 72.09 ± 0.30%, both reaching state-of-the-art performance.
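The abstract names the three modules but this page does not include their internal designs. As a purely illustrative aid, the sketch below shows one plausible PyTorch realization of fine-grained vision–text fusion via cross-attention, in the spirit of what FG-MFLA is described as doing; every class name, dimension, and design choice here is an assumption for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossModalFusionBlock(nn.Module):
    """Illustrative sketch (not the paper's FG-MFLA): image patch tokens
    attend to text token embeddings, with a residual connection so that
    visual features are refined by the text rather than replaced."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
        # img_tokens: (B, N_img, dim) visual patch features
        # txt_tokens: (B, N_txt, dim) encoded diagnostic-text tokens
        fused, _ = self.cross_attn(query=img_tokens, key=txt_tokens, value=txt_tokens)
        return self.norm(img_tokens + fused)  # residual cross-modal refinement

if __name__ == "__main__":
    block = CrossModalFusionBlock(dim=256)
    img = torch.randn(2, 196, 256)  # e.g., a 14x14 patch grid
    txt = torch.randn(2, 32, 256)   # e.g., 32 tokens from a report
    print(block(img, txt).shape)    # torch.Size([2, 196, 256])
```

The residual form mirrors the abstract's emphasis on preserving critical semantic cues while adding cross-modal interaction; how VT-MFLV actually structures its local attention windows is only specified in the full paper.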
