Article

Multimodal Prompt-Guided Bidirectional Fusion for Referring Remote Sensing Image Segmentation

MOE Key Laboratory of Optoelectronic Imaging Technology and System, Beijing Institute of Technology, Beijing 100081, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(10), 1683; https://doi.org/10.3390/rs17101683
Submission received: 21 March 2025 / Revised: 2 May 2025 / Accepted: 8 May 2025 / Published: 10 May 2025

Abstract

Multimodal feature alignment is a key challenge in referring remote sensing image segmentation (RRSIS). The complex spatial relationships and multi-scale targets in remote sensing images call for efficient cross-modal mapping and fine-grained feature alignment. Existing approaches typically rely on cross-attention for multimodal fusion, which increases model complexity. To address this, we introduce the concept of prompt learning to RRSIS and propose a parameter-efficient multimodal prompt-guided bidirectional fusion (MPBF) architecture. MPBF combines early and late fusion strategies. In the early fusion stage, it deeply fuses linguistic and visual features through cross-modal prompt coupling. In the late fusion stage, to handle the multi-scale nature of remote sensing targets, a scale refinement module is proposed to capture diverse scale representations, and a vision–language alignment module is employed to establish pixel-level multimodal semantic associations. Comparative experiments and ablation studies on a public dataset demonstrate that MPBF significantly outperforms existing state-of-the-art methods with relatively small computational overhead, highlighting its effectiveness and efficiency for RRSIS. Further application experiments on a custom dataset confirm the method’s practicality and robustness in real-world scenarios.
Keywords: remote sensing; referring image segmentation; prompt learning; bidirectional fusion
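The cross-modal prompt coupling described in the abstract can be pictured, at a very high level, as learnable prompt tokens attached to each modality's token stream, where each modality's prompts are updated with a summary of the other modality's features before the next encoder stage. The sketch below is purely illustrative and is not the paper's implementation: the function name `couple_prompts`, the mean-pooled summary, the projection matrices, and all dimensions are assumptions introduced here for the sake of example.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64   # shared embedding dimension (illustrative)
P = 4    # number of learnable prompt tokens per modality (illustrative)

# Token sequences from each modality's encoder (illustrative shapes)
visual_tokens = rng.standard_normal((196, D))   # e.g. 14x14 image patches
text_tokens   = rng.standard_normal((12, D))    # e.g. a 12-token expression

# Learnable prompt tokens, one set per modality
visual_prompts = rng.standard_normal((P, D))
text_prompts   = rng.standard_normal((P, D))

def couple_prompts(own_prompts, other_tokens, proj):
    """Refresh one modality's prompts with pooled context from the other."""
    summary = other_tokens.mean(axis=0)   # (D,) pooled cross-modal context
    return own_prompts + summary @ proj   # project and inject into prompts

# Small projection matrices (would be learned parameters in practice)
W_v = rng.standard_normal((D, D)) * 0.01
W_t = rng.standard_normal((D, D)) * 0.01

# Bidirectional coupling: each side's prompts absorb the other's context
visual_prompts = couple_prompts(visual_prompts, text_tokens, W_t)
text_prompts   = couple_prompts(text_prompts, visual_tokens, W_v)

# Coupled prompts are prepended to each stream for the next encoder stage
visual_in = np.concatenate([visual_prompts, visual_tokens], axis=0)
text_in   = np.concatenate([text_prompts, text_tokens], axis=0)
print(visual_in.shape, text_in.shape)  # (200, 64) (16, 64)
```

The appeal of a prompt-based scheme, as the abstract argues, is parameter efficiency: only the small prompt and projection parameters carry cross-modal information, rather than full cross-attention layers over all token pairs.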

Share and Cite

MDPI and ACS Style

Li, Y.; Jin, W.; Qiu, S.; Sun, Q. Multimodal Prompt-Guided Bidirectional Fusion for Referring Remote Sensing Image Segmentation. Remote Sens. 2025, 17, 1683. https://doi.org/10.3390/rs17101683

AMA Style

Li Y, Jin W, Qiu S, Sun Q. Multimodal Prompt-Guided Bidirectional Fusion for Referring Remote Sensing Image Segmentation. Remote Sensing. 2025; 17(10):1683. https://doi.org/10.3390/rs17101683

Chicago/Turabian Style

Li, Yingjie, Weiqi Jin, Su Qiu, and Qiyang Sun. 2025. "Multimodal Prompt-Guided Bidirectional Fusion for Referring Remote Sensing Image Segmentation" Remote Sensing 17, no. 10: 1683. https://doi.org/10.3390/rs17101683

APA Style

Li, Y., Jin, W., Qiu, S., & Sun, Q. (2025). Multimodal Prompt-Guided Bidirectional Fusion for Referring Remote Sensing Image Segmentation. Remote Sensing, 17(10), 1683. https://doi.org/10.3390/rs17101683

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
