TSFANet: Trans-Mamba Hybrid Network with Semantic Feature Alignment for Remote Sensing Salient Object Detection
Abstract
1. Introduction
- To achieve comprehensive feature extraction, we designed the Trans-Mamba Semantic-Detail Dual-Stream Collaborative Module (TSDSM), a novel dual-stream architecture that synergistically combines CNNs-Transformer and CNNs-Mamba branches. This hybrid structure effectively leverages the Transformer's global modeling capability and Mamba's local processing advantages, enabling more accurate salient object detection in complex remote sensing scenes.
- For effective alignment of local details and global semantic features, we constructed the Adaptive Semantic Correlation Refinement Module (ASCRM). This module models the correlation between semantic and local features, using matrix reshaping and SoftMax activation to accurately capture the spatial layout of salient regions, thereby improving the precision of optical remote sensing salient object detection (a simplified sketch of this correlation step follows this list).
- To better integrate semantic features with local details, we designed the Semantic-Guided Adjacent Feature Fusion Module (SGAFF). This module extracts the overall semantic layout with a global attention mechanism and fuses it with local features to enrich the semantic information. Guided by this global semantic information, SGAFF effectively filters out background noise, highlights target details, and improves detection accuracy.
- Extensive experiments on three public RSI-SOD datasets demonstrate that our method consistently outperforms 26 state-of-the-art approaches. Detailed ablation studies verify that our proposed modules effectively address the challenges of complex backgrounds, scale variations, and irregular topologies in RSI-SOD tasks.
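To make the ASCRM contribution concrete, the following is a minimal, self-contained sketch of a correlation-refinement step of the kind described above. It is illustrative only and does not reproduce the authors' exact module: the class name, the 1 × 1 query/key/value projections, and the channel and resolution choices are assumptions; the sketch simply shows how semantic and local feature maps can be reshaped into matrices, correlated, and re-weighted with SoftMax before a residual refinement.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrelationRefinement(nn.Module):
    """Illustrative semantic-local correlation block (not the authors' exact ASCRM).

    Computes a SoftMax-normalized affinity between a low-resolution semantic map
    and high-resolution local features, then redistributes semantic context over
    the spatial positions of the local features.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)  # from local features
        self.key = nn.Conv2d(channels, channels, kernel_size=1)    # from semantic features
        self.value = nn.Conv2d(channels, channels, kernel_size=1)  # from semantic features

    def forward(self, local_feat: torch.Tensor, semantic_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = local_feat.shape
        # Upsample the semantic map to the local resolution before correlation.
        semantic_feat = F.interpolate(semantic_feat, size=(h, w), mode="bilinear", align_corners=False)
        q = self.query(local_feat).flatten(2).transpose(1, 2)      # (B, HW, C)
        k = self.key(semantic_feat).flatten(2)                     # (B, C, HW)
        v = self.value(semantic_feat).flatten(2).transpose(1, 2)   # (B, HW, C)
        # Matrix reshaping + SoftMax: spatial correlation between local and semantic positions.
        attn = torch.softmax(q @ k / (c ** 0.5), dim=-1)           # (B, HW, HW)
        refined = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return refined + local_feat                                # residual refinement


# Example: fuse a 1/16-scale semantic map into 1/4-scale local features.
local = torch.randn(2, 64, 32, 32)
semantic = torch.randn(2, 64, 8, 8)
out = CorrelationRefinement(64)(local, semantic)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

In this reading, the SoftMax-normalized affinity matrix plays the role of the semantic correlation that spreads global context over the spatial positions of salient regions.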
2. Related Works
2.1. State Space Models
2.2. SOD in Natural Scene Images
2.3. SOD in Optical RSIs
3. Methodology
3.1. Overall Structure
Algorithm 1: Training Framework for TSFANet
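The pseudocode of Algorithm 1 is not reproduced here, so the snippet below gives a generic PyTorch-style training loop of the kind such a framework typically wraps. The toy model, the BCE placeholder loss (the actual loss is defined in Section 3.5), the Adam optimizer, and all hyperparameter values are illustrative assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def train_loop(model, train_set, epochs=2, lr=1e-4, device="cpu"):
    """Generic supervised training loop; hyperparameters are illustrative placeholders."""
    loader = DataLoader(train_set, batch_size=4, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device).train()
    for epoch in range(epochs):
        running = 0.0
        for image, mask in loader:                                       # optical RSI and saliency ground truth
            image, mask = image.to(device), mask.to(device)
            pred = model(image)                                          # predicted saliency logits, (B, 1, H, W)
            loss = F.binary_cross_entropy_with_logits(pred, mask)        # placeholder for the paper's loss (Section 3.5)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            running += loss.item()
        print(f"epoch {epoch + 1}: mean loss = {running / len(loader):.4f}")

# Toy stand-in for TSFANet and a random dataset, just to make the loop executable.
toy_model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
toy_data = TensorDataset(torch.randn(16, 3, 64, 64), (torch.rand(16, 1, 64, 64) > 0.5).float())
train_loop(toy_model, toy_data)
```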
3.2. Trans-Mamba Semantic-Detail Dual-Stream Collaborative Module
3.2.1. CNNs-Transformer
3.2.2. CNNs-Mamba
3.3. Adaptive Semantic Correlation Refinement
3.4. Semantic-Guided Adjacent Feature Fusion
3.5. Loss Function
4. Experiments and Analysis
4.1. Dataset Description
- ORSSD: The first publicly available RSI-SOD dataset, ORSSD, comprises 800 optical RSI images with corresponding annotations. Of these, 600 images are designated for training, and 200 images for testing.
- EORSSD: An extended version of ORSSD, EORSSD incorporates more challenging scenarios to better assess model robustness. It includes 1400 training samples and 600 testing samples.
- ORSI-4199: The most diverse saliency detection dataset in terms of scene complexity, ORSI-4199 contains 2000 training samples and 2199 testing samples (the three splits are summarized in the short snippet below).
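A short snippet summarizing the three benchmark splits; only the sample counts come from the descriptions above, and directory layout and file handling are intentionally omitted.

```python
# Split bookkeeping for the three RSI-SOD benchmarks used in Section 4.
RSI_SOD_SPLITS = {
    "ORSSD":     {"train": 600,  "test": 200},
    "EORSSD":    {"train": 1400, "test": 600},
    "ORSI-4199": {"train": 2000, "test": 2199},
}

for name, split in RSI_SOD_SPLITS.items():
    total = split["train"] + split["test"]
    print(f"{name}: {split['train']} train / {split['test']} test ({total} images)")
```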
4.2. Implementation Details and Evaluation Metrics
4.3. Comparison with State-of-the-Art Methods
4.3.1. Quantitative Comparison
4.3.2. Qualitative Comparison
4.4. Ablation Study
4.4.1. Effect of TSDSM
4.4.2. Effect of ASCRM
4.4.3. Effect of SGAFF
4.4.4. Effect of the Loss Function
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Itti, L.; Koch, C.; Niebur, E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 1254–1259. [Google Scholar] [CrossRef]
- Borji, A. What is a salient object? A dataset and a baseline model for salient object detection. IEEE Trans. Image Process. 2014, 24, 742–756. [Google Scholar] [CrossRef]
- Li, C.; Yuan, Y.; Cai, W.; Xia, Y.; Dagan Feng, D. Robust saliency detection via regularized random walks ranking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2710–2717. [Google Scholar]
- Yuan, Y.; Li, C.; Kim, J.; Cai, W.; Feng, D.D. Reversion correction and regularized random walk ranking for saliency detection. IEEE Trans. Image Process. 2017, 27, 1311–1322. [Google Scholar] [CrossRef] [PubMed]
- Liu, N.; Zhang, N.; Wan, K.; Shao, L.; Han, J. Visual Saliency Transformer. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 4702–4712. [Google Scholar]
- Lv, Z.; Huang, H.; Li, X.; Zhao, M.; Benediktsson, J.A.; Sun, W.; Falco, N. Land cover change detection with heterogeneous remote sensing images: Review, progress, and perspective. Proc. IEEE 2022, 110, 1976–1991. [Google Scholar] [CrossRef]
- Sarkar, A.; Chowdhury, T.; Murphy, R.R.; Gangopadhyay, A.; Rahnemoonfar, M. Sam-vqa: Supervised attention-based visual question answering model for post-disaster damage assessment on remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4702716. [Google Scholar] [CrossRef]
- Han, Y.; Liao, J.; Lu, T.; Pu, T.; Peng, Z. KCPNet: Knowledge-driven context perception networks for ship detection in infrared imagery. IEEE Trans. Geosci. Remote Sens. 2022, 61, 5000219. [Google Scholar] [CrossRef]
- Qin, X.; Zhang, Z.; Huang, C.; Gao, C.; Dehghan, M.; Jagersand, M. BASNet: Boundary-aware salient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7479–7489. [Google Scholar]
- Deng, Z.; Hu, X.; Zhu, L.; Xu, X.; Qin, J.; Han, G.; Heng, P.A. R3Net: Recurrent residual refinement network for saliency detection. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 13–19 July 2018; pp. 684–690. [Google Scholar]
- Li, G.; Bai, Z.; Liu, Z.; Zhang, X.; Ling, H. Salient object detection in optical remote sensing images driven by transformer. IEEE Trans. Image Process. 2023, 32, 5257–5269. [Google Scholar] [CrossRef]
- Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Liu, Y. Vmamba: Visual state space model. arXiv 2024, arXiv:2401.10166. [Google Scholar]
- Wang, Q.; Liu, Y.; Xiong, Z.; Yuan, Y. Hybrid feature aligned network for salient object detection in optical remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5624915. [Google Scholar] [CrossRef]
- Gao, Y.; Huang, J.; Sun, X.; Jie, Z.; Zhong, Y.; Ma, L. Matten: Video Generation with Mamba-Attention. arXiv 2024, arXiv:2405.03025. [Google Scholar]
- Hatamizadeh, A.; Kautz, J. MambaVision: A Hybrid Mamba-Transformer Vision Backbone. arXiv 2024, arXiv:2407.08083. [Google Scholar]
- Gong, H.; Kang, L.; Wang, Y.; Wan, X.; Li, H. nnmamba: 3D biomedical image segmentation, classification and landmark detection with state space model. arXiv 2024, arXiv:2402.03526. [Google Scholar]
- Sheng, J.; Zhou, J.; Wang, J.; Ye, P.; Fan, J. DualMamba: A Lightweight Spectral-Spatial Mamba-Convolution Network for Hyperspectral Image Classification. arXiv 2024, arXiv:2406.07050. [Google Scholar] [CrossRef]
- Wang, Z.; Ma, C. Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation. arXiv 2024, arXiv:2402.10887. [Google Scholar]
- Huang, T.; Pei, X.; You, S.; Wang, F.; Qian, C.; Xu, C. LocalMamba: Visual state space model with windowed selective scan. arXiv 2024, arXiv:2403.09338. [Google Scholar]
- Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision Mamba: Efficient visual representation learning with bidirectional state space model. arXiv 2024, arXiv:2401.09417. [Google Scholar]
- Wang, C.; Tsepa, O.; Ma, J.; Wang, B. Graph-mamba: Towards long-range graph sequence modeling with selective state spaces. arXiv 2024, arXiv:2402.00789. [Google Scholar]
- Tang, Y.; Dong, P.; Tang, Z.; Chu, X.; Liang, J. VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting. arXiv 2024, arXiv:2403.16536. [Google Scholar]
- Li, W.; Hong, X.; Fan, X. Spikemba: Multi-modal spiking saliency mamba for temporal video grounding. arXiv 2024, arXiv:2404.01174. [Google Scholar]
- Deng, R.; Gu, T. CU-Mamba: Selective State Space Models with Channel Learning for Image Restoration. arXiv 2024, arXiv:2404.11778. [Google Scholar]
- Wang, Z.; Zheng, J.Q.; Ma, C.; Guo, T. Vmambamorph: A visual mamba-based framework with cross-scan module for deformable 3D image registration. arXiv 2024, arXiv:2404.05105. [Google Scholar]
- Yue, Y.; Li, Z. Medmamba: Vision mamba for medical image classification. arXiv 2024, arXiv:2403.03849. [Google Scholar]
- He, X.; Cao, K.; Yan, K.; Li, R.; Xie, C.; Zhang, J.; Zhou, M. Pan-Mamba: Effective pan-sharpening with State Space Model. arXiv 2024, arXiv:2402.12192. [Google Scholar] [CrossRef]
- Shi, Y.; Xia, B.; Jin, X.; Wang, X.; Zhao, T.; Xia, X.; Xiao, X.; Yang, W. Vmambair: Visual state space model for image restoration. arXiv 2024, arXiv:2403.11423. [Google Scholar] [CrossRef]
- Kim, J.; Han, D.; Tai, Y.W.; Kim, J. Salient region detection via high-dimensional color transform. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 883–890. [Google Scholar]
- Zhu, W.; Liang, S.; Wei, Y.; Sun, J. Saliency optimization from robust background detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2814–2821. [Google Scholar]
- Feng, M.; Lu, H.; Ding, E. Attentive feedback network for boundary-aware salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 1623–1632. [Google Scholar]
- Li, G.; Yu, Y. Visual saliency based on multiscale deep features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5455–5463. [Google Scholar]
- Zhang, P.; Wang, D.; Lu, H.; Wang, H.; Ruan, X. Amulet: Aggregating multi-level convolutional features for salient object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 202–211. [Google Scholar]
- Pang, Y.; Zhao, X.; Zhang, L.; Lu, H. Multi-scale interactive network for salient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9413–9422. [Google Scholar]
- Zhang, Z.; Li, S.; Li, H. C2SNet: Contour-to-Saliency Network for Salient Object Detection. IEEE Trans. Image Process. 2020, 29, 3076–3088. [Google Scholar]
- Zhao, J.; Liu, J.J.; Fan, D.P.; Cao, Y.; Yang, J.; Cheng, M.M. EGNet: Edge guidance network for salient object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8779–8788. [Google Scholar]
- Zhou, H.; Xie, X.; Lai, J.H.; Chen, Z.; Yang, L. Interactive two-stream decoder for accurate and fast saliency detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9138–9147. [Google Scholar]
- Lee, M.S.; Shin, W.; Han, S.W. TRACER: Extreme attention guided salient object tracing network. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Online, 22 February–1 March 2022. [Google Scholar]
- Liu, N.; Han, J.; Yang, M.H. Picanet: Learning pixel-wise contextual attention for saliency detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3089–3098. [Google Scholar]
- Zhao, T.; Wu, X. Pyramid feature attention network for saliency detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3085–3094. [Google Scholar]
- Ma, M.; Xia, C.; Xie, C.; Chen, X.; Li, J. Boosting broader receptive fields for salient object detection. IEEE Trans. Image Process. 2023, 32, 1026–1038. [Google Scholar] [CrossRef]
- Zhao, J.; Wang, J.; Shi, J.; Jiang, Z. Sparsity-guided saliency detection for remote sensing images. J. Appl. Remote Sens. 2015, 9, 095055. [Google Scholar] [CrossRef]
- Li, C.; Cong, R.; Hou, J.; Zhang, S.; Qian, Y.; Kwong, S. Nested network with two-stream pyramid for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9156–9166. [Google Scholar] [CrossRef]
- Zhang, Q.; Cong, R.; Li, C.; Cheng, M.M.; Fang, Y.; Cao, X.; Zhao, Y.; Kwong, S. Dense attention fluid network for salient object detection in optical remote sensing images. IEEE Trans. Image Process. 2021, 30, 1305–1317. [Google Scholar] [CrossRef]
- Tu, Z.; Wang, C.; Li, C.; Fan, M.; Zhao, H.; Luo, B. ORSI salient object detection via multiscale joint region and boundary model. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5607913. [Google Scholar] [CrossRef]
- Zhou, X.; Shen, K.; Liu, Z.; Gong, C.; Zhang, J.; Yan, C. Edge-aware multiscale feature integration network for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5605315. [Google Scholar] [CrossRef]
- Li, G.; Liu, Z.; Lin, W.; Ling, H. Multi-content complementation network for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5614513. [Google Scholar] [CrossRef]
- Huang, Z.; Chen, H.; Liu, B.; Wang, Z. Semantic-guided attention refinement network for salient object detection in optical remote sensing images. Remote Sens. 2021, 13, 2163. [Google Scholar] [CrossRef]
- Li, G.; Liu, Z.; Lin, D.; Ling, H. Adjacent context coordination network for salient object detection in optical remote sensing images. IEEE Trans. Cybern. 2023, 53, 526–538. [Google Scholar] [CrossRef] [PubMed]
- Zhuge, M.; Fan, D.P.; Liu, N.; Zhang, D.; Xu, D.; Shao, L. Salient object detection via integrity learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 3738–3752. [Google Scholar] [CrossRef]
- Zhao, J.; Jia, Y.; Ma, L.; Yu, L. Recurrent adaptive graph reasoning network with region and boundary interaction for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5630720. [Google Scholar] [CrossRef]
- Feng, D.; Chen, H.; Liu, S.; Liao, Z.; Shen, X.; Xie, Y.; Zhu, J. Boundary-semantic collaborative guidance network with dual-stream feedback mechanism for salient object detection in optical remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4706317. [Google Scholar] [CrossRef]
- Gong, A.; Nie, J.; Niu, C.; Yu, Y.; Li, J.; Guo, L. Edge and skeleton guidance network for salient object detection in optical remote sensing images. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 7109–7120. [Google Scholar] [CrossRef]
- Ma, F.; Zhang, F.; Xiang, D.; Yin, Q.; Zhou, Y. Fast task-specific region merging for SAR image segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5222316. [Google Scholar] [CrossRef]
- Gao, L.; Liu, B.; Fu, P.; Xu, M. Adaptive spatial tokenization transformer for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5602915. [Google Scholar] [CrossRef]
- Liu, K.; Zhang, B.; Lu, J.; Yan, H. Towards integrity and detail with ensemble learning for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5624615. [Google Scholar] [CrossRef]
- Yan, R.; Yan, L.; Geng, G.; Cao, Y.; Zhou, P.; Meng, Y. ASNet: Adaptive semantic network based on transformer-CNN for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5608716. [Google Scholar] [CrossRef]
- Liu, Y.; Xu, M.; Xiao, T.; Tang, H.; Hu, Y.; Nie, L. Heterogeneous feature collaboration network for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5635114. [Google Scholar] [CrossRef]
- Han, P.; Zhao, B.; Li, X. Progressive feature interleaved fusion network for remote-sensing image salient object detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5500414. [Google Scholar] [CrossRef]
- Di, L.; Zhang, B.; Wang, Y. Multi-scale and multi-dimensional weighted network for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5625114. [Google Scholar] [CrossRef]
- Ma, F.; Zhang, F.; Yin, Q.; Xiang, D.; Zhou, Y. Fast SAR image segmentation with deep task-specific superpixel sampling and soft graph convolution. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5214116. [Google Scholar] [CrossRef]
- Fang, W.; Fu, Y.; Sheng, V.S. FPS-U2Net: Combining U2Net and Multi-level Aggregation Architecture for Fire Point Segmentation in Remote Sensing Images. Comput. Geosci. 2024, 189, 105628. [Google Scholar] [CrossRef]
- Zhu, Q.; Cai, Y.; Fang, Y.; Yang, Y.; Chen, C.; Fan, L.; Nguyen, A. Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model. arXiv 2024, arXiv:2404.01705. [Google Scholar] [CrossRef]
- Chen, S.; Tan, X.; Wang, B.; Hu, X. Reverse attention for salient object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 234–250. [Google Scholar]
- Wu, Z.; Su, L.; Huang, Q. Cascaded partial decoder for fast and accurate salient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3907–3916. [Google Scholar]
- Liu, J.J.; Hou, Q.; Cheng, M.M.; Feng, J.; Jiang, J. A simple pooling-based design for real-time salient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3917–3926. [Google Scholar]
- Qin, X.; Zhang, Z.; Huang, C.; Dehghan, M.; Zaiane, O.R.; Jagersand, M. U2-net: Going deeper with nested u-structure for salient object detection. Pattern Recognit. 2020, 106, 107404. [Google Scholar] [CrossRef]
- Cheng, M.M.; Gao, S.H.; Borji, A.; Tan, Y.Q.; Lin, Z.; Wang, M. A highly efficient model to study the semantics of salient object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 8006–8021. [Google Scholar] [CrossRef]
- Giglietto, F.; Righetti, N.; Rossi, L.; Marino, G. COORNET: An Integrated Approach to Surface Problematic Content, Malicious Actors, and Coordinated Networks. Aoir Sel. Pap. Internet Res. 2021, 21, 13–16. [Google Scholar] [CrossRef]
- Chen, X.; Zhang, N.; Li, L.; Yao, Y.; Deng, S.; Tan, C.; Huang, F.; Si, L.; Chen, H. Good visual guidance makes a better extractor: Hierarchical visual prefix for multimodal entity and relation extraction. arXiv 2022, arXiv:2205.03521. [Google Scholar]
- Liu, Y.; Zhang, X.Y.; Bian, J.W.; Zhang, L.; Cheng, M.M. Samnet: Stereoscopically attentive multi-scale network for lightweight salient object detection. IEEE Trans. Image Process. 2021, 30, 3804–3814. [Google Scholar] [CrossRef]
- Li, J.; Pan, Z.; Liu, Q.; Wang, Z. Stacked U-shape network with channel-wise attention for salient object detection. IEEE Trans. Multimed. 2020, 23, 1397–1409. [Google Scholar] [CrossRef]
- Fang, C.; Tian, H.; Zhang, D.; Zhang, Q.; Han, J.; Han, J. Densely nested top-down flows for salient object detection. Sci. China Inf. Sci. 2022, 65, 182103. [Google Scholar] [CrossRef]
- Liu, Y.; Zhang, D.; Liu, N.; Xu, S.; Han, J. Disentangled capsule routing for fast part-object relational saliency. IEEE Trans. Image Process. 2022, 31, 6719–6732. [Google Scholar] [CrossRef]
- Xu, B.; Liang, H.; Liang, R.; Chen, P. Locate globally, segment locally: A progressive architecture with knowledge review network for salient object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 3004–3012. [Google Scholar]
- Zhou, X.; Shen, K.; Weng, L.; Cong, R.; Zheng, B.; Zhang, J.; Yan, C. Edge-guided recurrent positioning network for salient object detection in optical remote sensing images. IEEE Trans. Cybern. 2022, 53, 539–552. [Google Scholar] [CrossRef] [PubMed]
- Fang, W.; Fu, Y.; Sheng, V.S. Dual Backbone Interaction Network For Burned Area Segmentation in Optical Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2024, 21, 6008805. [Google Scholar] [CrossRef]
- Fu, Y.; Fang, W.; Sheng, V.S. Burned Area Segmentation in Optical Remote Sensing Images Driven by U-shaped Multi-stage Masked Autoencoder. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 10770–10780. [Google Scholar] [CrossRef]
- Ge, Y.; Liang, T.; Ren, J.; Chen, J.; Bi, H. Enhanced salient object detection in remote sensing images via dual-stream semantic interactive network. Vis. Comput. 2024, 44, 5153–5169. [Google Scholar] [CrossRef]
- Park, J.; Woo, S.; Lee, J.Y.; Kweon, I.S. BAM: Bottleneck Attention Module. arXiv 2018, arXiv:1807.06514. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Li, Y.; Yao, T.; Pan, Y.; Mei, T. Contextual Transformer Networks for Visual Recognition. arXiv 2021, arXiv:2107.12292. [Google Scholar] [CrossRef] [PubMed]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11531–11539. [Google Scholar]
- Pan, J.; Canton Ferrer, C.; McGuinness, K.; O’Connor, N.E.; Torres, J.; Sayrol, E.; Giro-i Nieto, X. SalGAN: Visual Saliency Prediction with Generative Adversarial Networks. arXiv 2018, arXiv:1701.01081. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010. [Google Scholar]
Quantitative comparison on the ORSSD dataset.
Methods | Publication | Type | Backbone | Sα ↑ | Fβ ↑ | Eξ ↑ | MAE ↓
---|---|---|---|---|---|---|---
PicaNet | 2018 CVPR | C-NSI | ResNet | 0.8689 | 0.7922 | 0.9005 | 0.0260 |
RAS | 2018 ECCV | C-NSI | ResNet | 0.8829 | 0.8229 | 0.9306 | 0.0169 |
R3Net | 2018 IJCAI | C-NSI | ResNet | 0.9009 | 0.8615 | 0.9238 | 0.0170 |
PoolNet | 2019 CVPR | C-NSI | ResNet | 0.8098 | 0.7051 | 0.8513 | 0.0469 |
BASNet | 2019 CVPR | C-NSI | U-Net | 0.8931 | 0.8231 | 0.9206 | 0.0277 |
CPDNet | 2019 CVPR | C-NSI | ResNet | 0.8829 | 0.8363 | 0.9133 | 0.0171 |
EGNet | 2019 ICCV | C-NSI | VGG | 0.8725 | 0.7603 | 0.8959 | 0.0217 |
U2Net | 2020 PR | C-NSI | U-Net | 0.8716 | 0.7962 | 0.9014 | 0.0222 |
VIT | 2021 ICCV | T-NSI | Transformer | 0.9174 | 0.8613 | 0.9415 | 0.0125 |
CSNet | 2020 TIP | C-RSI | ResNet | 0.8928 | 0.8437 | 0.9181 | 0.0181 |
HVPNet | 2021 TGRS | C-RSI | ResNet | 0.8585 | 0.7431 | 0.8601 | 0.0227 |
SAMNet | 2021 TGRS | C-RSI | ResNet | 0.8721 | 0.7559 | 0.8690 | 0.0221 |
CoorNet | 2021 TGRS | C-RSI | ResNet | 0.9392 | 0.9188 | 0.9746 | 0.0098 |
MCCNet | 2021 TGRS | C-RSI | VGG | 0.9428 | 0.9241 | 0.9758 | 0.0087 |
SUCA | 2021 TGRS | C-RSI | ResNet | 0.8989 | 0.8398 | 0.9391 | 0.0145 |
ACCoNet | 2022 TCYB | C-RSI | VGG | 0.9424 | 0.9157 | 0.9754 | 0.0088 |
DNTD | 2022 TGRS | C-RSI | ResNet | 0.8696 | 0.8163 | 0.9065 | 0.0218 |
MJRBM | 2022 TGRS | C-RSI | ResNet | 0.9194 | 0.8749 | 0.9418 | 0.0163 |
ICON | 2022 TGRS | C-RSI | ResNet | 0.9251 | 0.8851 | 0.9637 | 0.0116 |
EMFINet | 2022 TGRS | C-RSI | ResNet | 0.9365 | 0.9038 | 0.9671 | 0.0109 |
DPORTNet | 2023 TGRS | C-RSI | ResNet | 0.8823 | 0.8327 | 0.9119 | 0.0221 |
PA-KRN | 2023 TGRS | C-RSI | ResNet | 0.9231 | 0.8909 | 0.9620 | 0.0139 |
ERPNet | 2023 TGRS | C-RSI | ResNet | 0.9247 | 0.8926 | 0.9566 | 0.0135 |
GeleNet | 2023 TIP | T-RSI | PVT | 0.9075 | 0.8619 | 0.9463 | 0.0133 |
HFANet | 2022 TGRS | H-RSI | Hybrid | 0.9393 | 0.9165 | 0.9711 | 0.0092 |
VMamba | 2024 arXiv | - | Mamba | 0.7060 | 0.5047 | 0.6684 | 0.0584 |
Samba | 2024 arXiv | - | Mamba | 0.8628 | 0.8077 | 0.9060 | 0.0327 |
DBINet | 2024 GRSL | H-RSI | Hybrid | 0.8818 | 0.8220 | 0.9297 | 0.0173 |
DCNet | 2024 JSTARS | H-RSI | Hybrid | 0.8821 | 0.8359 | 0.9128 | 0.0177 |
DSINet | 2024 Vis Comput | H-RSI | Hybrid | 0.9387 | 0.9007 | 0.9709 | 0.0093 |
TSFANet | Ours | H-RSI | Hybrid | 0.9446 | 0.9174 | 0.9817 | 0.0077 |
Quantitative comparison on the EORSSD dataset.
Methods | Publication | Type | Backbone | Sα ↑ | Fβ ↑ | Eξ ↑ | MAE ↓
---|---|---|---|---|---|---|---
PicaNet | 2018 CVPR | C-NSI | ResNet | 0.8797 | 0.7762 | 0.8902 | 0.0119 |
RAS | 2018 ECCV | C-NSI | ResNet | 0.8847 | 0.8052 | 0.9265 | 0.0115 |
R3Net | 2018 IJCAI | C-NSI | ResNet | 0.8974 | 0.8268 | 0.9154 | 0.0124 |
PoolNet | 2019 CVPR | C-NSI | ResNet | 0.8556 | 0.7486 | 0.8742 | 0.0169 |
BASNet | 2019 CVPR | C-NSI | U-Net | 0.9031 | 0.8207 | 0.9266 | 0.0114 |
CPDNet | 2019 CVPR | C-NSI | ResNet | 0.8488 | 0.7672 | 0.8794 | 0.0156 |
EGNet | 2019 ICCV | C-NSI | VGG | 0.8605 | 0.7065 | 0.8631 | 0.0109 |
U2Net | 2020 PR | C-NSI | U-Net | 0.8619 | 0.7457 | 0.8581 | 0.0167 |
VIT | 2021 ICCV | T-NSI | Transformer | 0.9183 | 0.8407 | 0.9366 | 0.0074 |
CSNet | 2020 TIP | C-RSI | ResNet | 0.8399 | 0.7812 | 0.8860 | 0.0142 |
HVPNet | 2021 TGRS | C-RSI | ResNet | 0.8697 | 0.7430 | 0.8552 | 0.0112 |
SAMNet | 2021 TGRS | C-RSI | ResNet | 0.8597 | 0.7286 | 0.8543 | 0.0133 |
CoorNet | 2021 TGRS | C-RSI | ResNet | 0.9298 | 0.8890 | 0.9646 | 0.0083 |
MCCNet | 2021 TGRS | C-RSI | VGG | 0.9323 | 0.8874 | 0.9685 | 0.0066 |
SUCA | 2021 TGRS | C-RSI | ResNet | 0.8985 | 0.8168 | 0.9251 | 0.0097 |
ACCoNet | 2022 TCYB | C-RSI | VGG | 0.9285 | 0.8823 | 0.9655 | 0.0074 |
DNTD | 2022 TGRS | C-RSI | ResNet | 0.8954 | 0.8176 | 0.9196 | 0.0114 |
MJRBM | 2022 TGRS | C-RSI | ResNet | 0.9200 | 0.8504 | 0.9354 | 0.0099 |
ICON | 2022 TGRS | C-RSI | ResNet | 0.9196 | 0.8632 | 0.9619 | 0.0073 |
EMFINet | 2022 TGRS | C-RSI | ResNet | 0.9299 | 0.8751 | 0.9601 | 0.0084 |
DPORTNet | 2023 TGRS | C-RSI | ResNet | 0.8937 | 0.7945 | 0.8907 | 0.0152 |
PA-KRN | 2023 TGRS | C-RSI | ResNet | 0.9186 | 0.8621 | 0.9537 | 0.0104 |
ERPNet | 2023 TGRS | C-RSI | ResNet | 0.9201 | 0.8565 | 0.9399 | 0.0089 |
GeleNet | 2023 TIP | T-RSI | PVT | 0.8849 | 0.8138 | 0.9273 | 0.0090 |
HFANet | 2022 TGRS | H-RSI | Hybrid | 0.9390 | 0.8951 | 0.9678 | 0.0070 |
VMamba | 2024 arXiv | - | Mamba | 0.7423 | 0.5104 | 0.6991 | 0.0273 |
Samba | 2024 arXiv | - | Mamba | 0.8976 | 0.8397 | 0.9416 | 0.0124 |
DBINet | 2024 GRSL | H-RSI | Hybrid | 0.8830 | 0.8152 | 0.9258 | 0.0129 |
DCNet | 2024 JSTARS | H-RSI | Hybrid | 0.8482 | 0.8148 | 0.8753 | 0.0161 |
DSINet | 2024 Vis Comput | H-RSI | Hybrid | 0.9315 | 0.8776 | 0.9571 | 0.0076 |
TSFANet | Ours | H-RSI | Hybrid | 0.9323 | 0.8874 | 0.9685 | 0.0060 |
Quantitative comparison on the ORSI-4199 dataset.
Methods | Publication | Type | Backbone | Sα ↑ | Fβ ↑ | Eξ ↑ | MAE ↓
---|---|---|---|---|---|---|---
PicaNet | 2018 CVPR | C-NSI | ResNet | 0.8359 | 0.7711 | 0.8860 | 0.0418 |
RAS | 2018 ECCV | C-NSI | ResNet | 0.8326 | 0.7879 | 0.8962 | 0.0433 |
R3Net | 2018 IJCAI | C-NSI | ResNet | 0.8586 | 0.8220 | 0.9116 | 0.0353 |
PoolNet | 2019 CVPR | C-NSI | ResNet | 0.7926 | 0.7133 | 0.8491 | 0.0643 |
BASNet | 2019 CVPR | C-NSI | U-Net | 0.8369 | 0.7862 | 0.8863 | 0.0500 |
CPDNet | 2019 CVPR | C-NSI | ResNet | 0.8361 | 0.8035 | 0.8943 | 0.0403 |
EGNet | 2019 ICCV | C-NSI | VGG | 0.8464 | 0.7959 | 0.8943 | 0.0455 |
U2Net | 2020 PR | C-NSI | U-Net | 0.8384 | 0.7915 | 0.8970 | 0.0433 |
VIT | 2021 ICCV | T-NSI | Transformer | 0.8764 | 0.8354 | 0.9288 | 0.0307 |
CSNet | 2020 TIP | C-RSI | ResNet | 0.8215 | 0.7540 | 0.8549 | 0.0545 |
HVPNet | 2021 TGRS | C-RSI | ResNet | 0.8468 | 0.7932 | 0.8930 | 0.0438 |
SAMNet | 2021 TGRS | C-RSI | ResNet | 0.8407 | 0.7920 | 0.8908 | 0.0452 |
CoorNet | 2021 TGRS | C-RSI | ResNet | 0.8621 | 0.8421 | 0.9204 | 0.0382 |
MCCNet | 2021 TGRS | C-RSI | VGG | 0.8747 | 0.8541 | 0.9346 | 0.0332 |
SUCA | 2021 TGRS | C-RSI | ResNet | 0.8795 | 0.8500 | 0.9352 | 0.0320 |
ACCoNet | 2022 TCYB | C-RSI | VGG | 0.8775 | 0.8531 | 0.9341 | 0.0330 |
DNTD | 2022 TGRS | C-RSI | ResNet | 0.8446 | 0.8113 | 0.9041 | 0.0441 |
MJRBM | 2022 TGRS | C-RSI | ResNet | 0.8593 | 0.8226 | 0.9102 | 0.0390 |
ICON | 2022 TGRS | C-RSI | ResNet | 0.8753 | 0.8577 | 0.9435 | 0.0299 |
EMFINet | 2022 TGRS | C-RSI | ResNet | 0.8674 | 0.8390 | 0.9254 | 0.0346 |
DPORTNet | 2023 TGRS | C-RSI | ResNet | 0.7475 | 0.7609 | 0.8676 | 0.0586 |
PA-KRN | 2023 TGRS | C-RSI | ResNet | 0.8492 | 0.8237 | 0.9164 | 0.0399 |
ERPNet | 2023 TGRS | C-RSI | ResNet | 0.8673 | 0.8286 | 0.9147 | 0.0373 |
GeleNet | 2023 TIP | T-RSI | PVT | 0.7665 | 0.6887 | 0.8216 | 0.0711 |
HFANet | 2022 TGRS | H-RSI | Hybrid | 0.8766 | 0.8534 | 0.9334 | 0.0330 |
VMamba | 2024 arXiv | - | Mamba | 0.7228 | 0.5853 | 0.7448 | 0.0829 |
Samba | 2024 arXiv | - | Mamba | 0.8467 | 0.8211 | 0.9132 | 0.0446 |
DBINet | 2024 GRSL | H-RSI | Hybrid | 0.8318 | 0.7870 | 0.8952 | 0.0441 |
DCNet | 2024 JSTARS | H-RSI | Hybrid | 0.8358 | 0.8032 | 0.8939 | 0.0410 |
DSINet | 2024 Vis Comput | H-RSI | Hybrid | 0.8644 | 0.8554 | 0.9284 | 0.0297 |
TSFANet | Ours | H-RSI | Hybrid | 0.8876 | 0.8617 | 0.9453 | 0.0277 |
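The comparison tables above report four saliency metrics per dataset; taking the metric columns as the standard Sα, Fβ, Eξ, and MAE used in RSI-SOD evaluation, the two simplest of these, MAE and the adaptive-threshold F-measure, can be computed as in the illustrative sketch below. This is not the evaluation code used in the paper; the adaptive threshold choice and β² = 0.3 are common conventions assumed here.

```python
import numpy as np

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a [0,1] saliency map and a binary ground truth."""
    return float(np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean())

def f_measure(pred: np.ndarray, gt: np.ndarray, beta2: float = 0.3) -> float:
    """F-measure with the conventional beta^2 = 0.3, using an adaptive threshold."""
    thr = min(2.0 * pred.mean(), 1.0)          # common adaptive-threshold choice
    binary = (pred >= thr).astype(np.float64)
    tp = (binary * gt).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / (gt.sum() + 1e-8)
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8))

# Example with a random saliency map and mask.
pred = np.random.rand(256, 256)
gt = (np.random.rand(256, 256) > 0.5).astype(np.float64)
print(mae(pred, gt), f_measure(pred, gt))
```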
Model | Params (M) ↓ | FLOPs (G) ↓ | Model | Params (M) ↓ | FLOPs (G) ↓
---|---|---|---|---|---|
PicaNet | 47.219 | 59.786 | RAS | 45.31 | 60.19 |
R3Net | 71.89 | 70.99 | PoolNet | 75.67 | 80.83 |
BASNet | 46.10 | 36.57 | CPDNet | 69.99 | 75.69 |
EGNet | 85.25 | 90.91 | U2Net | 44.01 | 115.313 |
VIT | 126.62 | 75.44 | – | – | – |
CSNet | 65.58 | 70.12 | HVPNet | 100.42 | 120.11 |
SAMNet | 95.66 | 114.96 | CoorNet | 88.98 | 104.69 |
MCCNet | 67.65 | 358.77 | SUCA | 117.34 | 171.66 |
ACCoNet | 167.01 | 177.21 | DNTD | 94.89 | 103.79 |
MJRBM | 85.94 | 90.25 | ICON | 94.79 | 89.91 |
EMFINet | 102.45 | 132.47 | DPORTNet | 93.73 | 88.68 |
PA-KRN | 87.62 | 73.65 | ERPNet | 72.34 | 87.55 |
GeleNet | 48.94 | 25.45 | HFANet | 320.31 | 78.46 |
VMamba | 66.80 | 121.31 | Samba | 99.16 | 123.57 |
DBINet | 57.61 | 47.15 | DCNet | 101.19 | 82.09 |
DSINet | 109.89 | 89.69 | TSFANet | 334.82 | 115.24 |
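The parameter counts in the table above correspond to the number of trainable weights in each network, while FLOPs are normally measured with a profiler (e.g., thop or fvcore) on a fixed-size input. The short sketch below shows the parameter count only, using a toy model as a stand-in, since the real networks are not available here.

```python
import torch
import torch.nn as nn

def count_parameters_m(model: nn.Module) -> float:
    """Number of trainable parameters in millions, as reported in the table above."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Toy model just to make the snippet runnable; the compared networks are far larger.
toy = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 1, 1))
print(f"{count_parameters_m(toy):.3f} M parameters")
# FLOPs (G) would additionally require profiling a forward pass on a fixed-size
# input (e.g., 1 x 3 x H x W); that step is omitted here.
```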
Ablation study of the proposed modules (✓ indicates the module is enabled).
No. | TSDSM | ASCRM | SGAFF | Sα ↑ | Fβ ↑ | Eξ ↑ | MAE ↓
---|---|---|---|---|---|---|---
0 | | | | 0.8766 | 0.8534 | 0.9334 | 0.0330
1 | ✓ | | | 0.8792 | 0.8573 | 0.9369 | 0.0313
2 | | ✓ | | 0.8779 | 0.8550 | 0.9354 | 0.0321
3 | | | ✓ | 0.8788 | 0.8567 | 0.9362 | 0.0317
4 | ✓ | ✓ | | 0.8858 | 0.8603 | 0.9418 | 0.0287
5 | ✓ | | ✓ | 0.8849 | 0.8596 | 0.9403 | 0.0292
6 | | ✓ | ✓ | 0.8835 | 0.8590 | 0.9388 | 0.0296
7 | ✓ | ✓ | ✓ | 0.8876 | 0.8617 | 0.9453 | 0.0277
Encoder | Sα ↑ | Fβ ↑ | Eξ ↑ | MAE ↓
---|---|---|---|---|
Baseline | 0.8835 | 0.8590 | 0.9388 | 0.0296 |
+ CNN | 0.8839 | 0.8593 | 0.9391 | 0.0294 |
+ Transformer | 0.8853 | 0.8602 | 0.9417 | 0.0283 |
+ Mamba | 0.8847 | 0.8599 | 0.9412 | 0.0287 |
+ CNN-Transformer | 0.8872 | 0.8614 | 0.9446 | 0.0279 |
+ CNN-Mamba | 0.8868 | 0.8609 | 0.9440 | 0.0280 |
+ TSDSM | 0.8876 | 0.8617 | 0.9453 | 0.0277 |
No. | Sα ↑ | Fβ ↑ | Eξ ↑ | MAE ↓
---|---|---|---|---|
No.1 | 0.8849 | 0.8596 | 0.9403 | 0.0292 |
No.2 | 0.8849 | 0.8597 | 0.9404 | 0.0291 |
No.3 | 0.8850 | 0.8599 | 0.9406 | 0.0290 |
No.4 | 0.8851 | 0.8600 | 0.9410 | 0.0288 |
No.5 | 0.8855 | 0.8602 | 0.9428 | 0.0286 |
No.6 | 0.8864 | 0.8605 | 0.9442 | 0.0283 |
No.7 | 0.8866 | 0.8607 | 0.9444 | 0.0282 |
No.8 | 0.8873 | 0.8614 | 0.9451 | 0.0279 |
No.9 | 0.8871 | 0.8610 | 0.9449 | 0.0280 |
No.10 | 0.8876 | 0.8617 | 0.9453 | 0.0277 |
Module | Sα ↑ | Fβ ↑ | Eξ ↑ | MAE ↓
---|---|---|---|---|
Baseline | 0.8766 | 0.8534 | 0.9334 | 0.0330 |
+AAM | 0.8770 | 0.8539 | 0.9338 | 0.0327 |
+BAM | 0.8767 | 0.8535 | 0.9335 | 0.0329 |
+CAM | 0.8767 | 0.8536 | 0.9336 | 0.0329 |
+CBAM | 0.8769 | 0.8538 | 0.9338 | 0.0328 |
+CoT-Attention | 0.8772 | 0.8540 | 0.9341 | 0.0326 |
+ECA-Attention | 0.8773 | 0.8542 | 0.9342 | 0.0325 |
+NLAM | 0.8775 | 0.8544 | 0.9350 | 0.0324 |
+RSAM | 0.8776 | 0.85475 | 0.9351 | 0.0324 |
+Self-Attention | 0.8778 | 0.8548 | 0.9353 | 0.0322 |
+ASCRM | 0.8779 | 0.8550 | 0.9354 | 0.0321 |
Fusion Strategy | Sα ↑ | Fβ ↑ | Eξ ↑ | MAE ↓
---|---|---|---|---|
Baseline | 0.8766 | 0.8534 | 0.9334 | 0.0330 |
+ Element-wise Summation | 0.8769 | 0.8538 | 0.9339 | 0.0328 |
+ Element-wise Multiplication | 0.8774 | 0.8542 | 0.9343 | 0.0325 |
+ Channel Concatenation | 0.8779 | 0.8557 | 0.9359 | 0.0321 |
+ SGAFF | 0.8788 | 0.8567 | 0.9362 | 0.0317 |
Loss Function | Sα ↑ | Fβ ↑ | Eξ ↑ | MAE ↓
---|---|---|---|---|
F-measure Loss | 0.8734 | 0.8566 | 0.8802 | 0.0395 |
CT Loss | 0.8745 | 0.8491 | 0.8871 | 0.0389 |
BCE Loss | 0.8748 | 0.8543 | 0.9097 | 0.0367 |
IG Loss | 0.8755 | 0.8587 | 0.9212 | 0.0314 |
F-measure Loss + IoU Loss | 0.8761 | 0.8609 | 0.9385 | 0.0292 |
CT Loss + IoU Loss | 0.8852 | 0.8603 | 0.9418 | 0.0295 |
BCE Loss + IoU Loss | 0.8868 | 0.8610 | 0.9444 | 0.0281 |
IG Loss + IoU Loss | 0.8876 | 0.8617 | 0.9453 | 0.0277 |
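The loss ablation above names several standard terms; as one concrete point of reference, the "BCE Loss + IoU Loss" combination can be written as in the sketch below. The exact form of the IG Loss used in the full model is defined in Section 3.5 and is not reproduced here; the smoothing constant and the per-image IoU averaging are assumptions.

```python
import torch
import torch.nn.functional as F

def bce_iou_loss(pred_logits: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy plus soft IoU loss, corresponding to the
    'BCE Loss + IoU Loss' row of the table above (illustrative sketch)."""
    bce = F.binary_cross_entropy_with_logits(pred_logits, mask)
    prob = torch.sigmoid(pred_logits)
    inter = (prob * mask).sum(dim=(2, 3))
    union = (prob + mask - prob * mask).sum(dim=(2, 3))
    iou = 1.0 - (inter + 1.0) / (union + 1.0)           # smoothed soft-IoU loss per image
    return bce + iou.mean()

# Example with a random 320x320 prediction and mask batch.
pred = torch.randn(4, 1, 320, 320)
mask = (torch.rand(4, 1, 320, 320) > 0.5).float()
print(bce_iou_loss(pred, mask).item())
```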