Article

A Hybrid Framework for Referring Image Segmentation: Dual-Decoder Model with SAM Complementation

by Haoyuan Chen 1,2, Sihang Zhou 1, Kuan Li 2, Jianping Yin 2 and Jian Huang 1,*
1 College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
2 School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523808, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(19), 3061; https://doi.org/10.3390/math12193061
Submission received: 11 August 2024 / Revised: 19 September 2024 / Accepted: 25 September 2024 / Published: 30 September 2024

Abstract

In the realm of human–robot interaction, the integration of visual and verbal cues has become increasingly significant. This paper focuses on the challenges and advancements in referring image segmentation (RIS), a task that involves segmenting images based on textual descriptions. Traditional approaches to RIS have primarily focused on pixel-level classification. These methods, although effective, often overlook the interconnectedness of pixels, which can be crucial for interpreting complex visual scenes. Furthermore, while the PolyFormer model has shown impressive performance in RIS, its large number of parameters and high training data requirements pose significant challenges. These factors restrict its adaptability and optimization on standard consumer hardware, hindering further enhancements in subsequent research. Addressing these issues, our study introduces a novel two-branch decoder framework with SAM (segment anything model) for RIS. This framework incorporates an MLP decoder and a KAN decoder with a multi-scale feature fusion module, enhancing the model’s capacity to discern fine details within images. The framework’s robustness is further bolstered by an ensemble learning strategy that consolidates the insights from both the MLP and KAN decoder branches. More importantly, we collect the segmentation target’s edge coordinates and bounding box coordinates as input cues for the SAM model. This strategy leverages SAM’s zero-shot learning capabilities to refine and optimize the segmentation outcomes. Our experimental findings, based on the widely recognized RefCOCO, RefCOCO+, and RefCOCOg datasets, confirm the effectiveness of this method. The results not only achieve state-of-the-art (SOTA) segmentation performance but are also supported by ablation studies that highlight the contribution of each component to the overall improvement.
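To make the pipeline described above concrete, the following is a minimal NumPy sketch of two of its ideas: averaging the per-pixel outputs of the two decoder branches (a simple stand-in for the paper's ensemble strategy), and deriving a bounding-box prompt for SAM from the resulting mask. The function names, the averaging rule, and the logit tensors are illustrative assumptions, not the authors' implementation; the actual SAM call is omitted.

```python
import numpy as np

def _sigmoid(x):
    # Map raw decoder logits to per-pixel foreground probabilities.
    return 1.0 / (1.0 + np.exp(-x))

def ensemble_masks(mlp_logits, kan_logits, threshold=0.5):
    # Hypothetical ensemble: average the two branches' probabilities,
    # then binarize. The paper's exact fusion rule may differ.
    prob = (_sigmoid(mlp_logits) + _sigmoid(kan_logits)) / 2.0
    return prob > threshold

def mask_to_box(mask):
    # Derive a (x_min, y_min, x_max, y_max) box prompt from a binary
    # mask; this is the kind of cue that could be passed to SAM.
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy example: both branches agree on a 2x2 foreground region.
mlp = np.full((4, 4), -5.0)
kan = np.full((4, 4), -5.0)
mlp[1:3, 1:3] = 5.0
kan[1:3, 1:3] = 5.0
mask = ensemble_masks(mlp, kan)
box = mask_to_box(mask)  # (1, 1, 2, 2)
```

In a full system, `box` (and, analogously, sampled edge coordinates from the mask boundary) would be supplied as prompts to a SAM predictor to refine the coarse ensemble mask.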
Keywords: referring image segmentation; two-branch decoder; ensemble learning; SAM; KAN


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
