Semantic Attention and Structured Model for Weakly Supervised Instance Segmentation in Optical and SAR Remote Sensing Imagery
Abstract
1. Introduction
- We propose SASM-Net for weakly supervised instance segmentation in optical and SAR remote sensing imagery. The segmentation branch of this network uses spatial relationship modeling to establish weak supervision constraints, so that instance masks can be predicted accurately without pixel-level labels.
- We introduce an MSFE module that builds equivalent feature scales through hierarchical, residual-like connections during feature extraction. This gives efficient multi-scale feature extraction that copes with the large scale variation of targets in remote sensing imagery.
- We construct an SAE module comprising a semantic information prediction stream and an attention enhancement stream, which strengthens the activation of instances and reduces interference from cluttered backgrounds in remote sensing imagery.
- We propose an SMG module that helps the SAE module build supervision containing edge information during training. This mitigates the weak sensitivity to edges caused by the lack of fine-grained pixel-level labels and improves the model's perception of target edges.
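To make the idea of a box-derived weak supervision constraint concrete, the following is a minimal sketch of one common form, a projection consistency term in the style of box-supervised methods such as BoxInst [50]. It is an illustrative assumption, not the loss used by SASM-Net: the function names (`project`, `box_projections`, `dice_loss`, `projection_loss`), the Dice formulation, and the pure-Python grid representation are all hypothetical choices made for readability.

```python
from typing import List, Tuple

Grid = List[List[float]]          # mask probabilities, indexed as mask[y][x]
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1); x1 and y1 are exclusive


def project(mask: Grid) -> Tuple[List[float], List[float]]:
    """Max-project a mask onto the x axis (per column) and the y axis (per row)."""
    proj_y = [max(row) for row in mask]
    proj_x = [max(col) for col in zip(*mask)]
    return proj_x, proj_y


def box_projections(box: Box, height: int, width: int) -> Tuple[List[float], List[float]]:
    """Projections that a mask tightly enclosed by `box` would produce."""
    x0, y0, x1, y1 = box
    proj_x = [1.0 if x0 <= x < x1 else 0.0 for x in range(width)]
    proj_y = [1.0 if y0 <= y < y1 else 0.0 for y in range(height)]
    return proj_x, proj_y


def dice_loss(pred: List[float], target: List[float], eps: float = 1e-6) -> float:
    """1 - Dice coefficient between two 1-D profiles (0 when they coincide)."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def projection_loss(mask: Grid, box: Box) -> float:
    """Penalize masks whose axis projections disagree with the box extent."""
    height, width = len(mask), len(mask[0])
    pred_x, pred_y = project(mask)
    box_x, box_y = box_projections(box, height, width)
    return dice_loss(pred_x, box_x) + dice_loss(pred_y, box_y)


if __name__ == "__main__":
    box = (1, 1, 3, 3)
    inside = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
    leaking = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    print(projection_loss(inside, box))   # mask touches all four box sides
    print(projection_loss(leaking, box))  # one pixel falls outside the box
```

A term of this kind only requires the bounding box, which is why it can stand in for pixel-level supervision. It constrains the extent of the mask but not its shape inside the box, so methods in this family typically pair it with a further term that relates neighboring pixels.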
2. Related Work
2.1. Supervised Instance Segmentation
2.2. Weakly Supervised Instance Segmentation
3. Methodology
3.1. Overview
3.2. Multi-Scale Feature Extraction Module
3.3. Semantic Attention Enhancement Module
3.4. Structured Model Guidance Module
3.5. Segmentation Branch
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.3. Evaluation Metrics
4.4. Experimental Results on the NWPU VHR-10 Instance Segmentation Dataset
- Weakly supervised paradigm methods: We divide the compared weakly supervised methods into two types: adaptations of fully supervised methods and dedicated weakly supervised instance segmentation methods. Adaptations of fully supervised methods train the original fully supervised networks by directly using the object-level (bounding box) labels from the annotations as pixel-level labels. Because they require only the labeling effort of bounding box annotation, we classify them as weakly supervised paradigm methods. Dedicated weakly supervised methods are designed explicitly for bounding box labels and include BoxInst [50], DiscoBox [28], DBIN [51], and MGWI-Net [49]. For DBIN, we exclude the domain adaptation component, as it is beyond the scope of this paper.
- Fully supervised paradigm methods: Fully supervised methods perform instance segmentation by training with finely annotated pixel-level labels, which incurs high labeling costs. We select several representative fully supervised methods for comparison with the proposed SASM-Net.
- Hybrid supervised paradigm methods: To further compare with our proposed method, we also design a series of hybrid supervised methods. Specifically, we combine a portion of pixel-level labels with object-level labels for network training. The labeling cost of this paradigm falls between those of the weakly supervised and fully supervised paradigms.
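The label handling behind the three paradigms can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' data pipeline: `box_to_mask` shows how a bounding box can be used directly as a pixel-level label for the adapted fully supervised methods, and `split_supervision` shows one possible way to assign a given fraction of the training set to pixel-level labels for the hybrid paradigm. Both function names, the per-image (rather than per-instance) split, and the seeded shuffle are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def box_to_mask(box: Box, height: int, width: int) -> List[List[int]]:
    """Fill the bounding box to obtain a coarse pseudo pixel-level label."""
    x0, y0, x1, y1 = box
    return [
        [1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
        for y in range(height)
    ]


def split_supervision(image_ids: List[str], pixel_fraction: float,
                      seed: int = 0) -> Dict[str, str]:
    """Assign each image either pixel-level or box-level supervision.

    pixel_fraction = 0.0 corresponds to the weakly supervised setting,
    1.0 to the fully supervised setting, and values in between to the
    hybrid setting (assumed here to be a per-image split).
    """
    if not 0.0 <= pixel_fraction <= 1.0:
        raise ValueError("pixel_fraction must lie in [0, 1]")
    ids = sorted(image_ids)            # fixed order so the split is reproducible
    random.Random(seed).shuffle(ids)
    n_pixel = round(pixel_fraction * len(ids))
    return {img: ("pixel" if i < n_pixel else "box") for i, img in enumerate(ids)}


if __name__ == "__main__":
    for row in box_to_mask((1, 0, 3, 2), height=3, width=4):
        print(row)
    plan = split_supervision([f"img_{i}" for i in range(8)], pixel_fraction=0.25)
    print(sum(kind == "pixel" for kind in plan.values()), "of", len(plan), "images use pixel-level labels")
```

Filling the box labels every background pixel inside it as foreground, which is consistent with the low AP75 values reported for the adapted fully supervised methods in the results tables.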
4.5. Experimental Results on the SSDD Dataset
4.6. Ablation Study
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Chen, X.; Ma, M.; Li, Y.; Cheng, W. Fusing Deep Features by Kernel Collaborative Representation for Remote Sensing Scene Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 12429–12439.
- Jing, W.; Zhang, M.; Tian, D. Improved U-Net Model for Remote Sensing Image Classification Method Based on Distributed Storage. J. Real-Time Image Process. 2021, 18, 1607–1619.
- Zhang, J.; Liu, J.; Pan, B.; Chen, Z.; Xu, X.; Shi, Z. An Open Set Domain Adaptation Algorithm via Exploring Transferability and Discriminability for Remote Sensing Image Scene Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–12.
- Li, B.; Xie, X.; Wei, X.; Tang, W. Ship Detection and Classification from Optical Remote Sensing Images: A Survey. Chin. J. Aeronaut. 2021, 34, 145–163.
- Geng, J.; Xu, Z.; Zhao, Z.; Jiang, W. Rotated Object Detection of Remote Sensing Image Based on Binary Smooth Encoding and Ellipse-Like Focus Loss. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
- Yang, L.; Yuan, G.; Zhou, H.; Liu, H.; Chen, J.; Wu, H. RS-YOLOX: A High-Precision Detector for Object Detection in Satellite Remote Sensing Images. Appl. Sci. 2022, 12, 8707.
- Alam, M.; Wang, J.-F.; Guangpei, C.; Yunrong, L.; Chen, Y. Convolutional Neural Network for the Semantic Segmentation of Remote Sensing Images. Mob. Netw. Appl. 2021, 26, 200–215.
- Wang, J.-X.; Chen, S.-B.; Ding, C.H.Q.; Tang, J.; Luo, B. Semi-Supervised Semantic Segmentation of Remote Sensing Images with Iterative Contrastive Network. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
- Zhao, S.; Feng, Z.; Chen, L.; Li, G. DANet: A Semantic Segmentation Network for Remote Sensing of Roads Based on Dual-ASPP Structure. Electronics 2023, 12, 3243.
- Yang, Z.; Wu, Q.; Zhang, F.; Zhang, X.; Chen, X.; Gao, Y. A New Semantic Segmentation Method for Remote Sensing Images Integrating Coordinate Attention and SPD-Conv. Symmetry 2023, 15, 1037.
- Su, H.; Wei, S.; Liu, S.; Liang, J.; Wang, C.; Shi, J.; Zhang, X. HQ-ISNet: High-Quality Instance Segmentation for Remote Sensing Imagery. Remote Sens. 2020, 12, 989.
- Chen, L.; Fu, Y.; You, S.; Liu, H. Efficient Hybrid Supervision for Instance Segmentation in Aerial Images. Remote Sens. 2021, 13, 252.
- Zhao, D.; Zhu, C.; Qi, J.; Qi, X.; Su, Z.; Shi, Z. Synergistic Attention for Ship Instance Segmentation in SAR Images. Remote Sens. 2021, 13, 4384.
- Fan, F.; Zeng, X.; Wei, S.; Zhang, H.; Tang, D.; Shi, J.; Zhang, X. Efficient Instance Segmentation Paradigm for Interpreting SAR and Optical Images. Remote Sens. 2022, 14, 531.
- Wei, S.; Zeng, X.; Zhang, H.; Zhou, Z.; Shi, J.; Zhang, X. LFG-Net: Low-Level Feature Guided Network for Precise Ship Instance Segmentation in SAR Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17.
- Uijlings, J.R.R.; van de Sande, K.E.A.; Gevers, T.; Smeulders, A.W.M. Selective Search for Object Recognition. Int. J. Comput. Vis. 2013, 104, 154–171.
- Pont-Tuset, J.; Arbelaez, P.; Barron, J.T.; Marques, F.; Malik, J. Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 128–140.
- Zhou, Y.; Zhu, Y.; Ye, Q.; Qiu, Q.; Jiao, J. Weakly Supervised Instance Segmentation Using Class Peak Response. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3791–3800.
- Laradji, I.H.; Vazquez, D.; Schmidt, M. Where are the Masks: Instance Segmentation with Image-Level Supervision. arXiv 2019, arXiv:1907.01430.
- Ahn, J.; Cho, S.; Kwak, S. Weakly Supervised Learning of Instance Segmentation with Inter-Pixel Relations. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2204–2213.
- Zhu, Y.; Zhou, Y.; Xu, H.; Ye, Q.; Doermann, D.; Jiao, J. Learning Instance Activation Maps for Weakly Supervised Instance Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3111–3120.
- Ge, W.; Guo, S.; Huang, W.; Scott, M.R. Label-PEnet: Sequential Label Propagation and Enhancement Networks for Weakly Supervised Instance Segmentation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27–28 October 2019; pp. 3345–3354.
- Arun, A.; Jawahar, C.V.; Kumar, M.P. Weakly Supervised Instance Segmentation by Learning Annotation Consistent Instances. In Proceedings of the European Conference on Computer Vision (ECCV), 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 254–270.
- Khoreva, A.; Benenson, R.; Hosang, J.; Hein, M.; Schiele, B. Simple Does It: Weakly Supervised Instance and Semantic Segmentation. arXiv 2016, arXiv:1603.07485.
- Wang, X.; Feng, J.; Hu, B.; Ding, Q.; Ran, L.; Chen, X.; Liu, W. Weakly-Supervised Instance Segmentation via Class-Agnostic Learning with Salient Images. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10225–10235.
- Lee, J.; Yi, J.; Shin, C.; Yoon, S. BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2643–2651.
- Hsu, C.-C.; Hsu, K.-J.; Tsai, C.-C.; Lin, Y.-Y.; Chuang, Y.-Y. Weakly Supervised Instance Segmentation Using the Bounding Box Tightness Prior. In Proceedings of the 2019 Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; pp. 6586–6597.
- Lan, S.; Yu, Z.; Choy, C.; Radhakrishnan, S.; Liu, G.; Zhu, Y.; Davis, L.S.; Anandkumar, A. DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 3386–3396.
- He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 386–397.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into High Quality Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162.
- Chen, K.; Ouyang, W.; Loy, C.C.; Lin, D.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; et al. Hybrid Task Cascade for Instance Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4969–4978.
- Liu, J.-J.; Hou, Q.; Cheng, M.-M.; Wang, C.; Feng, J. Improving Convolutional Networks with Self-Calibrated Convolutions. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10093–10102.
- Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT: Real-Time Instance Segmentation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27–28 October 2019; pp. 9156–9165.
- Chen, H.; Sun, K.; Tian, Z.; Shen, C.; Huang, Y.; Yan, Y. BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8570–8578.
- Tian, Z.; Shen, C.; Chen, H. Conditional Convolutions for Instance Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 254–270.
- Wang, X.; Kong, T.; Shen, C.; Jiang, Y.; Li, L. SOLO: Segmenting Objects by Locations. In Proceedings of the European Conference on Computer Vision (ECCV), 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 649–665.
- Wang, X.; Zhang, R.; Kong, T.; Li, L.; Shen, C. SOLOv2: Dynamic and Fast Instance Segmentation. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; pp. 17721–17732.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27–28 October 2019; p. 9626.
- Gao, S.-H.; Cheng, M.-M.; Zhao, K.; Zhang, X.-Y.; Yang, M.-H.; Torr, P. Res2Net: A New Multi-Scale Backbone Architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 652–662.
- Zhang, T.; Zhang, X.; Zhu, P.; Tang, X.; Li, C.; Jiao, L.; Zhou, H. Semantic Attention and Scale Complementary Network for Instance Segmentation in Remote Sensing Images. IEEE Trans. Cybern. 2022, 52, 10999–11013.
- Krähenbühl, P.; Koltun, V. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. arXiv 2012, arXiv:1210.5644.
- Hao, S.; Wang, G.; Gu, R. Weakly Supervised Instance Segmentation Using Multi-Prior Fusion. Comput. Vis. Image Underst. 2021, 211, 103261.
- Milletari, F.; Navab, N.; Ahmadi, S.-A. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571.
- Su, H.; Wei, S.; Yan, M.; Wang, C.; Shi, J.; Zhang, X. Object Detection and Instance Segmentation in Remote Sensing Imagery Based on Precise Mask R-CNN. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 1454–1457.
- Cheng, G.; Zhou, P.; Han, J. Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV), 13th European Conference, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- Chen, M.; Zhang, Y.; Chen, E.; Hu, Y.; Xie, Y.; Pan, Z. Meta-Knowledge Guided Weakly Supervised Instance Segmentation for Optical and SAR Image Interpretation. Remote Sens. 2023, 15, 2357.
- Tian, Z.; Shen, C.; Wang, X.; Chen, H. BoxInst: High-Performance Instance Segmentation with Box Annotations. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 5443–5452.
- Li, Y.; Xue, Y.; Li, L.; Zhang, X.; Qian, X. Domain Adaptive Box-Supervised Instance Segmentation Network for Mitosis Detection. IEEE Trans. Med. Imaging 2022, 41, 2469–2485.







| Paradigm | Method | Ppixe. | AP | AP50 | AP75 | APS | APM | APL | Tspee. |
|---|---|---|---|---|---|---|---|---|---|
| Hybrid supervised | YOLACT [34] | 25% | 15.2 | 41.2 | 7.8 | 7.7 | 16.8 | 12.6 | - |
| Hybrid supervised | YOLACT [34] | 50% | 22.5 | 49.7 | 17.0 | 9.6 | 19.9 | 31.5 | - |
| Hybrid supervised | YOLACT [34] | 75% | 27.5 | 54.4 | 27.4 | 12.1 | 25.9 | 34.2 | - |
| Hybrid supervised | Mask R-CNN [29] | 25% | 25.7 | 59.4 | 18.8 | 16.9 | 25.3 | 29.3 | - |
| Hybrid supervised | Mask R-CNN [29] | 50% | 35.5 | 70.8 | 31.3 | 24.6 | 34.2 | 39.9 | - |
| Hybrid supervised | Mask R-CNN [29] | 75% | 49.3 | 82.6 | 51.7 | 36.9 | 47.0 | 53.9 | - |
| Hybrid supervised | CondInst [36] | 25% | 23.9 | 59.8 | 14.8 | 19.8 | 23.7 | 25.3 | - |
| Hybrid supervised | CondInst [36] | 50% | 34.5 | 73.4 | 27.6 | 23.7 | 34.1 | 35.9 | - |
| Hybrid supervised | CondInst [36] | 75% | 49.5 | 85.1 | 50.3 | 35.9 | 48.6 | 53.7 | - |
| Fully supervised | YOLACT [34] | 100% | 35.6 | 68.4 | 36.4 | 14.8 | 33.3 | 56.0 | - |
| Fully supervised | Mask R-CNN [29] | 100% | 58.8 | 86.6 | 65.2 | 47.1 | 57.5 | 62.4 | - |
| Fully supervised | CondInst [36] | 100% | 58.5 | 90.1 | 62.9 | 29.4 | 56.8 | 71.3 | - |
| Weakly supervised | Adaptations of fully supervised methods | | | | | | | | |
| Weakly supervised | YOLACT [34] | 0 | 9.8 | 32.9 | 1.3 | 4.4 | 11.3 | 8.0 | 61.0 |
| Weakly supervised | Mask R-CNN [29] | 0 | 19.8 | 54.7 | 9.7 | 7.8 | 19.4 | 24.6 | 74.1 |
| Weakly supervised | CondInst [36] | 0 | 17.1 | 50.5 | 6.7 | 10.7 | 17.7 | 18.5 | 94.3 |
| Weakly supervised | Dedicated weakly supervised methods | | | | | | | | |
| Weakly supervised | BoxInst [50] | 0 | 47.6 | 78.9 | 49.0 | 33.8 | 43.9 | 55.5 | 94.3 |
| Weakly supervised | DiscoBox [28] | 0 | 46.2 | 79.7 | 47.4 | 29.4 | 42.9 | 57.1 | 90.9 |
| Weakly supervised | DBIN [51] | 0 | 48.3 | 80.2 | 50.5 | 34.5 | 46.1 | 57.0 | 99.0 |
| Weakly supervised | MGWI-Net [49] | 0 | 51.6 | 81.3 | 53.3 | 37.6 | 48.2 | 59.1 | 96.2 |
| Weakly supervised | SASM-Net | 0 | 53.1 | 82.4 | 55.2 | 38.6 | 49.9 | 61.0 | 107.5 |

| Paradigm | Method | Rpixe. | AI | BD | GTF | VC | SH | TC | HB | ST | BC | BR | Npara. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hybrid supervised | YOLACT [34] | 25% | 0 | 39.1 | 14.2 | 12.2 | 1.5 | 7.6 | 12.2 | 49.8 | 10.3 | 1.4 | - |
| Hybrid supervised | YOLACT [34] | 50% | 0.1 | 55.0 | 49.9 | 11.7 | 4.6 | 17.0 | 19.3 | 52.5 | 13.6 | 1.2 | - |
| Hybrid supervised | YOLACT [34] | 75% | 0.7 | 64.0 | 62.9 | 19.4 | 5.9 | 19.8 | 17.4 | 60.7 | 16.9 | 7.2 | - |
| Hybrid supervised | Mask R-CNN [29] | 25% | 0.1 | 34.2 | 37.4 | 12.5 | 4.7 | 39.3 | 22.5 | 56.6 | 42.0 | 7.8 | - |
| Hybrid supervised | Mask R-CNN [29] | 50% | 8.8 | 49.7 | 50.8 | 28.1 | 14.9 | 44.1 | 30.6 | 59.4 | 52.1 | 16.7 | - |
| Hybrid supervised | Mask R-CNN [29] | 75% | 27.6 | 71.2 | 68.6 | 40.5 | 30.4 | 63.2 | 33.0 | 70.1 | 64.7 | 23.9 | - |
| Hybrid supervised | CondInst [36] | 25% | 0 | 37.5 | 35.7 | 18.2 | 3.0 | 36.6 | 18.2 | 53.8 | 32.5 | 3.5 | - |
| Hybrid supervised | CondInst [36] | 50% | 14.8 | 49.0 | 45.6 | 23.0 | 12.8 | 45.6 | 28.4 | 57.1 | 56.8 | 11.3 | - |
| Hybrid supervised | CondInst [36] | 75% | 30.9 | 68.4 | 64.5 | 41.5 | 31.8 | 68.4 | 30.3 | 65.0 | 66.0 | 27.8 | - |
| Fully supervised | YOLACT [34] | 100% | 8.2 | 70.5 | 70.8 | 22.7 | 21.5 | 24.3 | 34.8 | 63.4 | 26.5 | 13.5 | - |
| Fully supervised | Mask R-CNN [29] | 100% | 35.3 | 78.8 | 84.8 | 46.1 | 50.2 | 72.0 | 48.1 | 80.9 | 64.2 | 28.0 | - |
| Fully supervised | CondInst [36] | 100% | 26.7 | 77.7 | 89.1 | 46.2 | 46.1 | 69.7 | 46.8 | 73.4 | 74.0 | 35.4 | - |
| Weakly supervised | Adaptations of fully supervised methods | | | | | | | | | | | | |
| Weakly supervised | YOLACT [34] | 0 | 0 | 20.7 | 12.1 | 4.8 | 0.1 | 9.6 | 2.1 | 33.5 | 14.9 | 0.1 | 34.8 |
| Weakly supervised | Mask R-CNN [29] | 0 | 0 | 33.3 | 34.2 | 8.0 | 2.3 | 21.5 | 16.4 | 48.6 | 26.9 | 6.6 | 63.3 |
| Weakly supervised | CondInst [36] | 0 | 0 | 30.7 | 26.8 | 6.6 | 1.1 | 19.2 | 14.2 | 46.1 | 23.1 | 3.1 | 53.5 |
| Weakly supervised | Dedicated weakly supervised methods | | | | | | | | | | | | |
| Weakly supervised | BoxInst [50] | 0 | 12.5 | 76.6 | 89.7 | 38.0 | 47.9 | 65.5 | 11.3 | 75.4 | 58.9 | 6.8 | 53.5 |
| Weakly supervised | DiscoBox [28] | 0 | 12.0 | 77.7 | 91.5 | 33.7 | 42.8 | 64.3 | 10.6 | 74.6 | 57.9 | 6.0 | 65.0 |
| Weakly supervised | DBIN [51] | 0 | 14.0 | 77.1 | 91.2 | 37.8 | 48.6 | 67.8 | 13.0 | 75.2 | 61.9 | 5.4 | 55.6 |
| Weakly supervised | MGWI-Net [49] | 0 | 17.0 | 77.3 | 91.9 | 41.0 | 50.8 | 71.2 | 15.7 | 76.5 | 64.6 | 10.9 | 53.7 |
| Weakly supervised | SASM-Net | 0 | 19.6 | 78.6 | 92.7 | 42.6 | 51.7 | 72.4 | 14.5 | 77.0 | 66.8 | 11.3 | 58.1 |

| Paradigm | Method | Rpixe. | AP | AP50 | AP75 | APS | APM | Tspee. |
|---|---|---|---|---|---|---|---|---|
| Hybrid supervised | YOLACT [34] | 25% | 17.4 | 59.0 | 1.5 | 19.7 | 21.0 | - |
| Hybrid supervised | YOLACT [34] | 50% | 28.6 | 76.7 | 9.0 | 32.1 | 34.2 | - |
| Hybrid supervised | YOLACT [34] | 75% | 39.0 | 79.9 | 32.5 | 40.3 | 45.5 | - |
| Hybrid supervised | Mask R-CNN [29] | 25% | 22.8 | 72.4 | 6.3 | 27.2 | 28.5 | - |
| Hybrid supervised | Mask R-CNN [29] | 50% | 39.3 | 86.2 | 28.0 | 42.7 | 44.4 | - |
| Hybrid supervised | Mask R-CNN [29] | 75% | 54.6 | 90.2 | 63.0 | 56.6 | 57.1 | - |
| Hybrid supervised | CondInst [36] | 25% | 18.6 | 65.7 | 2.7 | 22.1 | 23.7 | - |
| Hybrid supervised | CondInst [36] | 50% | 38.4 | 87.4 | 28.6 | 41.3 | 43.8 | - |
| Hybrid supervised | CondInst [36] | 75% | 54.1 | 93.0 | 59.6 | 54.6 | 56.8 | - |
| Fully supervised | YOLACT [34] | 100% | 44.6 | 86.6 | 41.0 | 45.3 | 48.5 | - |
| Fully supervised | Mask R-CNN [29] | 100% | 64.2 | 94.9 | 80.1 | 62.0 | 64.7 | - |
| Fully supervised | CondInst [36] | 100% | 63.0 | 95.9 | 78.4 | 63.7 | 63.6 | - |
| Weakly supervised | Adaptations of fully supervised methods | | | | | | | |
| Weakly supervised | YOLACT [34] | 0 | 12.4 | 49.4 | 0.6 | 15.9 | 17.3 | 43.9 |
| Weakly supervised | Mask R-CNN [29] | 0 | 15.5 | 61.0 | 1.6 | 20.2 | 21.1 | 50.8 |
| Weakly supervised | CondInst [36] | 0 | 14.8 | 59.1 | 1.4 | 17.7 | 19.6 | 63.7 |
| Weakly supervised | Dedicated weakly supervised methods | | | | | | | |
| Weakly supervised | BoxInst [50] | 0 | 49.9 | 90.1 | 52.7 | 50.6 | 52.3 | 64.1 |
| Weakly supervised | DiscoBox [28] | 0 | 48.4 | 90.2 | 50.4 | 47.2 | 50.6 | 60.6 |
| Weakly supervised | DBIN [51] | 0 | 50.6 | 91.7 | 52.8 | 51.3 | 52.0 | 65.4 |
| Weakly supervised | MGWI-Net [49] | 0 | 53.0 | 92.4 | 57.1 | 53.7 | 54.9 | 64.9 |
| Weakly supervised | SASM-Net | 0 | 54.6 | 93.0 | 60.8 | 56.6 | 57.9 | 69.9 |

| Method | MSFE | SAE | SMG | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | | | | 49.8 | 80.7 | 51.0 | 35.2 | 46.1 | 57.4 |
| Models | | | | 50.7 | 81.0 | 52.1 | 36.3 | 47.3 | 58.9 |
| Models | | | | 51.6 | 81.2 | 53.1 | 36.8 | 48.7 | 59.9 |
| Models | | | | 52.2 | 81.7 | 54.4 | 37.6 | 48.8 | 60.4 |
| SASM-Net | | | | 53.1 | 82.4 | 55.2 | 38.6 | 49.9 | 61.0 |

| Method | MSFE | SAE | SMG | AP | AP50 | AP75 | APS | APM |
|---|---|---|---|---|---|---|---|---|
| Baseline | | | | 51.8 | 91.9 | 54.0 | 52.7 | 53.2 |
| Models | | | | 52.9 | 92.1 | 55.9 | 54.8 | 54.7 |
| Models | | | | 53.5 | 92.3 | 57.5 | 55.7 | 56.7 |
| Models | | | | 53.9 | 92.2 | 59.4 | 55.3 | 57.0 |
| SASM-Net | | | | 54.6 | 93.0 | 60.8 | 56.6 | 57.9 |

| Dataset | Method | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|---|
| NWPU VHR-10 | Baseline | 50.9 | 80.6 | 52.2 | 36.0 | 47.4 | 59.0 |
| NWPU VHR-10 | Post-processing | 51.1 | 80.2 | 52.9 | 35.7 | 47.9 | 59.8 |
| NWPU VHR-10 | Implicit guidance | 52.2 | 81.7 | 54.4 | 37.6 | 48.8 | 60.4 |
| SSDD | Baseline | 52.6 | 91.7 | 56.2 | 54.1 | 55.2 | - |
| SSDD | Post-processing | 53.0 | 91.8 | 57.1 | 54.4 | 56.1 | - |
| SSDD | Implicit guidance | 53.9 | 92.2 | 59.4 | 55.3 | 57.0 | - |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chen, M.; Xu, K.; Chen, E.; Zhang, Y.; Xie, Y.; Hu, Y.; Pan, Z. Semantic Attention and Structured Model for Weakly Supervised Instance Segmentation in Optical and SAR Remote Sensing Imagery. Remote Sens. 2023, 15, 5201. https://doi.org/10.3390/rs15215201