This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
Multimodal Prompt-Guided Bidirectional Fusion for Referring Remote Sensing Image Segmentation
by Yingjie Li, Weiqi Jin *, Su Qiu and Qiyang Sun
MOE Key Laboratory of Optoelectronic Imaging Technology and System, Beijing Institute of Technology, Beijing 100081, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(10), 1683; https://doi.org/10.3390/rs17101683
Submission received: 21 March 2025 / Revised: 2 May 2025 / Accepted: 8 May 2025 / Published: 10 May 2025
Abstract
Multimodal feature alignment is a key challenge in referring remote sensing image segmentation (RRSIS). The complex spatial relationships and multi-scale targets in remote sensing images call for efficient cross-modal mapping and fine-grained feature alignment. Existing approaches typically rely on cross-attention for multimodal fusion, which increases model complexity. To address this, we introduce prompt learning to RRSIS and propose a parameter-efficient multimodal prompt-guided bidirectional fusion (MPBF) architecture. MPBF combines early and late fusion strategies. In the early fusion stage, it performs deep fusion of linguistic and visual features through cross-modal prompt coupling. In the late fusion stage, to handle the multi-scale nature of remote sensing targets, a scale refinement module is proposed to capture diverse scale representations, and a vision–language alignment module is employed to establish pixel-level multimodal semantic associations. Comparative experiments and ablation studies on a public dataset show that MPBF significantly outperforms existing state-of-the-art methods with relatively small computational overhead, highlighting its effectiveness and efficiency for RRSIS. Further application experiments on a custom dataset confirm the method's practicality and robustness in real-world scenarios.
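To make the two-stage design described above concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: every class name, hyperparameter, and design detail (PromptCoupledFusion, ScaleRefinement, vision_language_alignment, 8 prompt tokens, dilation rates 1/2/4, 256-dimensional features) is an illustrative assumption about what prompt-guided bidirectional fusion with scale refinement and pixel-level vision–language alignment could look like, rather than a description of MPBF itself.

# Hypothetical sketch (not the authors' code): prompt-guided bidirectional
# fusion with a scale-refinement and pixel-level vision-language alignment head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptCoupledFusion(nn.Module):
    """Early fusion: shared learnable prompts couple the two modalities."""
    def __init__(self, dim=256, num_prompts=8, num_heads=8):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.vis_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, vis_tokens, txt_tokens):
        # vis_tokens: (B, Nv, C) flattened image patches; txt_tokens: (B, Nt, C)
        b = vis_tokens.size(0)
        p = self.prompts.unsqueeze(0).expand(b, -1, -1)           # (B, P, C)
        # Prompts summarize one modality and are prepended to the other,
        # so information flows in both directions through a small prompt set.
        p_from_txt, _ = self.txt_attn(p, txt_tokens, txt_tokens)  # language -> prompts
        p_from_vis, _ = self.vis_attn(p, vis_tokens, vis_tokens)  # vision   -> prompts
        vis_out = torch.cat([p_from_txt, vis_tokens], dim=1)      # prompts guide vision
        txt_out = torch.cat([p_from_vis, txt_tokens], dim=1)      # prompts guide language
        return vis_out, txt_out

class ScaleRefinement(nn.Module):
    """Late fusion: parallel dilated convolutions capture multi-scale context."""
    def __init__(self, dim=256, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(dim, dim, 3, padding=d, dilation=d) for d in dilations]
        )
        self.project = nn.Conv2d(dim * len(dilations), dim, 1)

    def forward(self, feat):                                      # feat: (B, C, H, W)
        return self.project(torch.cat([b(feat) for b in self.branches], dim=1))

def vision_language_alignment(pixel_feat, sentence_emb):
    """Pixel-level alignment: cosine similarity between each pixel and the sentence."""
    pixel_feat = F.normalize(pixel_feat, dim=1)                    # (B, C, H, W)
    sentence_emb = F.normalize(sentence_emb, dim=1)                # (B, C)
    return torch.einsum("bchw,bc->bhw", pixel_feat, sentence_emb)  # mask logits

if __name__ == "__main__":
    vis = torch.randn(2, 1024, 256)     # 32 x 32 patch grid, 256-dim features
    txt = torch.randn(2, 20, 256)       # 20 word tokens from a language encoder
    fused_vis, fused_txt = PromptCoupledFusion()(vis, txt)
    grid = fused_vis[:, 8:].transpose(1, 2).reshape(2, 256, 32, 32)  # drop prompts, back to 2-D
    mask = vision_language_alignment(ScaleRefinement()(grid), fused_txt.mean(dim=1))
    print(fused_vis.shape, mask.shape)  # (2, 1032, 256), (2, 32, 32)

Run as a script, the sketch prints the fused token and mask shapes, showing how a small set of prompt tokens mediates the bidirectional exchange before pixel-level alignment produces the segmentation logits; the learnable prompts replace full cross-attention between all visual and textual tokens, which is the parameter-efficiency argument the abstract makes.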
Share and Cite
MDPI and ACS Style
Li, Y.; Jin, W.; Qiu, S.; Sun, Q.
Multimodal Prompt-Guided Bidirectional Fusion for Referring Remote Sensing Image Segmentation. Remote Sens. 2025, 17, 1683.
https://doi.org/10.3390/rs17101683
AMA Style
Li Y, Jin W, Qiu S, Sun Q.
Multimodal Prompt-Guided Bidirectional Fusion for Referring Remote Sensing Image Segmentation. Remote Sensing. 2025; 17(10):1683.
https://doi.org/10.3390/rs17101683
Chicago/Turabian Style
Li, Yingjie, Weiqi Jin, Su Qiu, and Qiyang Sun.
2025. "Multimodal Prompt-Guided Bidirectional Fusion for Referring Remote Sensing Image Segmentation" Remote Sensing 17, no. 10: 1683.
https://doi.org/10.3390/rs17101683
APA Style
Li, Y., Jin, W., Qiu, S., & Sun, Q.
(2025). Multimodal Prompt-Guided Bidirectional Fusion for Referring Remote Sensing Image Segmentation. Remote Sensing, 17(10), 1683.
https://doi.org/10.3390/rs17101683
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.