Scale in Scale for SAR Ship Instance Segmentation
Abstract
1. Introduction
- SISNet is proposed, delving into high-quality SAR ship instance segmentation based on Mask R-CNN.
- In SISNet, four SIS modes, i.e., the input mode, backbone mode, RPN mode, and ROI mode, are proposed. In SISNet, two additional improvements, i.e., redesigned FPN and redesigned DH, are proposed.
- To verify the effectiveness of SISNet, we conduct extensive experiments on the public SSDD and HRSID datasets. SISNet offers state-of-the-art performance.
2. Methodology
2.1. Input Mode
2.2. Backbone Mode
2.3. RPN Mode
2.4. ROI Mode
2.5. Redesigned FPN
2.6. Redesigned DH
3. Experiments
3.1. Dataset
3.2. Experimental Details
3.3. Evaluation Criteria
4. Results
4.1. Quantitative Results
- Each technique (the redesigned FPN, the redesigned DH, and the four SIS modes) is useful: the accuracy increases steadily as each technique is progressively inserted into the baseline, confirming the theoretical analysis in Section 2. The detection AP is improved by 9.9% on SSDD and by 5.4% on HRSID; the segmentation AP is improved by 7.3% on SSDD and by 4.1% on HRSID. As expected, detection speed is sacrificed. The accuracy–speed trade-off is a long-standing topic, which we will consider in future work.
- The accuracy gains of the different techniques vary, but each contributes to the performance improvement to some degree, so the accuracy still exhibits an upward trend. The accuracy sensitivity of the whole SISNet to each technique is examined in Section 5 by installing and removing each one in turn.
- The detection accuracies are universally higher than the segmentation ones on both SSDD and HRSID, because segmentation is the more challenging task: it must delineate ships at the pixel level.
- The accuracies on HRSID are universally lower than those on SSDD because HRSID contains more complex SAR images, so more effort should be devoted to HRSID in the future.
- SISNet surpasses the other competitive models dramatically. The second-best model is HTC [73]; still, its detection AP is lower than SISNet's by 5.1% on SSDD and by 3.3% on HRSID, and its segmentation AP is lower by 4.4% on SSDD and by 2.5% on HRSID. This fully demonstrates the state-of-the-art performance of SISNet.
- The detection speed of SISNet is inferior to that of the others, but it offers greater accuracy gains; this shortcoming needs to be addressed in the future. Despite this, SISNet is still preferable, because even when only three techniques (the redesigned FPN, the redesigned DH, and the input SIS mode) are used, SISNet already outperforms the others: its detection AP on SSDD is 67.7%, better than that of HTC (66.8%) by 0.9%, and its segmentation AP on SSDD is 61.4%, better than that of HTC (60.7%) by 0.7%. In this configuration, the detection speed of SISNet is 3.31 FPS, which is comparable to the others to some degree.
- YOLACT offers the fastest detection speed, since it is a one-stage model, but its accuracy is too poor to meet application requirements. Its performance is far lower than SISNet's: on SSDD, its detection AP is 54.0% versus SISNet's 71.9%, and its segmentation AP is 48.4% versus SISNet's 65.1%. The same is true on HRSID.
- SISNet's model size is 909 MB, and its parameter count is 118.10 M. This seems acceptable given that the model size of HTC reaches 733 MB and HQ-ISNet has 98.79 M parameters. Although SISNet has the highest complexity, its accuracy compensates for it: its segmentation AP exceeds HTC's by 4.4% on SSDD and by 2.5% on HRSID. Thus, SISNet may still be cost-effective.
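The AP values compared above follow the COCO-style metric. As a rough illustration of how such numbers arise (our own sketch on hypothetical boxes, not the paper's evaluation code), IoU matching and AP at a single threshold can be computed as:

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def ap_at_iou(dets, gts, thr=0.5):
    """Average precision at one IoU threshold (e.g. AP50 when thr=0.5).
    dets: list of (score, box); gts: list of boxes, single class."""
    dets = sorted(dets, key=lambda d: -d[0])   # rank by confidence
    matched = set()
    tp, fp, precisions, recalls = 0, 0, [], []
    for score, box in dets:
        # greedily match to the best unmatched ground truth above the threshold
        best, best_iou = None, thr
        for i, g in enumerate(gts):
            if i not in matched and iou(box, g) >= best_iou:
                best, best_iou = i, iou(box, g)
        if best is None:
            fp += 1
        else:
            matched.add(best)
            tp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / len(gts))
    # area under the precision-recall curve (all-point integration)
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```

The full COCO AP additionally averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05; APS, APM, and APL restrict the ground truths to small, medium, and large objects.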
4.2. Qualitative Results
- SISNet offers a higher detection rate than HTC (see the blue ellipse regions). In the #1 image of Figure 18, HTC missed a ship, while SISNet detected it smoothly. In the #3 image of Figure 19, HTC missed three ships parked at port, while SISNet missed only one. The same is true for the other images. This benefits from the combined action of the proposed improvements.
- SISNet offers a lower false alarm rate than HTC (see the orange ellipse regions). In the #1 image of Figure 19, HTC generated four false alarms, whereas SISNet generated one. In the #2 image of Figure 19, HTC generated one false alarm, whereas SISNet suppressed it. The same is true on other images. This is because SISNet can receive more background context information via the adopted ROI-mode SIS, boosting its foreground–background discrimination capacity.
- SISNet offers better detection performance for small ships. In the #6 image of Figure 18, eleven small ships were missed by HTC, whereas SISNet detected three of them. This is because our redesigned FPN eases the spatial feature loss of small ships. SAR small-ship detection remains a challenging topic because small ships provide few features, but SISNet handles this task well.
- SISNet offers better detection performance of large ships. In the #4 image of Figure 18, the positioning accuracy of HTC was poorer than that of SISNet. Moreover, HTC resulted in two extra false alarms arising from repeated detections. The same situation also occurred on the #4 and #5 images of Figure 19. The redesigned DH and the PA branch in the redesigned FPN both play a vital role in detecting large ships. The multi-cascaded regressors of the former can progressively refine the positioning of large ships. More spatial location information is transmitted to the pyramid top by the latter, which can improve the representativeness of high-level features.
- SISNet offers better detection performance for densely parallel-parked ships. In the #3 image of Figure 18, although the ship hulls overlap, SISNet can still detect and then segment them, whereas HTC misses most of them. Detecting densely parallel-parked ships remains a challenging topic due to mutual interference, but SISNet handles this task well.
- SISNet offers better detection performance for inshore ships. In all inshore scenes, SISNet detected more ships than HTC while also avoiding more false alarms. Inshore ship detection remains a challenging topic because of more complex backgrounds and serious interference from ship-like facilities on land, but SISNet handles this task well.
- SISNet offers more credible detection results (see the yellow ellipse regions). In the #5 image of Figure 18, the box confidence of HTC is 0.99, which is still lower than that of SISNet (1.0). In the #6 image of Figure 18, the box confidences of three small ships from HTC are all inferior to those of SISNet (i.e., 0.74 < 0.87, 0.70 < 0.98, and 0.96 < 0.99). This is because the triple structure of the redesigned DH decouples the classification and regression tasks, enabling superior classification performance. Thus, SISNet enables higher-quality SAR ship detection.
- SISNet offers better segmentation performance. In the #4 image of Figure 18, the pixels of one ship were split into three separate regions by HTC, but this did not occur with SISNet. In the #6 image of Figure 18, some scattered island pixels were misjudged as ship pixels by HTC, but this also did not occur with SISNet. This is because the ROI-mode SIS enables the network to observe more of the surroundings and thereby suppress pixel-level false alarms.
- SISNet offers superb multi-scale/cross-scale detection–segmentation performance. Regardless of very small ships or rather large ones, SISNet can always detect them. This benefits from the multi-scale image pyramid of the input-mode SIS, the more robust feature extraction of the backbone-mode SIS, the optimized proposals of the RPN-mode SIS, and the more robust multi-level features of the redesigned FPN.
- In short, SISNet offers state-of-the-art SAR ship instance segmentation performance.
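The background-context effect attributed to the ROI-mode SIS above rests on enlarging each region of interest so that pooled features also cover the surroundings. A minimal sketch of that idea (our own illustration; `expand_roi` and the expansion factor are assumptions, not the paper's exact implementation):

```python
def expand_roi(box, factor, img_w, img_h):
    """Scale a box (x1, y1, x2, y2) about its center by `factor`, clipping to
    image bounds, so that subsequent RoIAlign pooling also sees background
    context around the ship."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    w, h = (box[2] - box[0]) * factor, (box[3] - box[1]) * factor
    return (max(0.0, cx - w / 2), max(0.0, cy - h / 2),
            min(float(img_w), cx + w / 2), min(float(img_h), cy + h / 2))

# e.g. a 40x20 proposal doubled in size within a 512x512 image:
# expand_roi((100, 100, 140, 120), 2.0, 512, 512)
```

Feeding such enlarged regions to the pooling stage is one common way to improve foreground-background discrimination, since the classifier then sees sea clutter or port structures around each candidate.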
5. Ablation Study
5.1. Ablation Study on Input Mode
5.2. Ablation Study on Backbone Mode
5.3. Ablation Study on RPN Mode
5.4. Ablation Study on ROI Mode
5.5. Ablation Study on Redesigned FPN
5.6. Ablation Study on Redesigned DH
6. Discussion
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhang, T.; Zhang, X.; Shi, J. HyperLi-Net: A hyper-light deep learning network for high-accurate and high-speed ship detection from synthetic aperture radar imagery. ISPRS J. Photogramm. Remote Sens. 2020, 167, 123–153.
- Xu, X.; Zhang, X.; Shao, Z.; Shi, J.; Wei, S.; Zhang, T.; Zeng, T. A Group-Wise Feature Enhancement-and-Fusion Network with Dual-Polarization Feature Enrichment for SAR Ship Detection. Remote Sens. 2022, 14, 5276.
- Zhang, T.; Zeng, T.; Zhang, X. Synthetic Aperture Radar (SAR) Meets Deep Learning. Remote Sens. 2023, 15, 303.
- Chen, S.W.; Cui, X.C.; Wang, X.S. Speckle-free SAR image ship detection. IEEE Trans. Image Process. 2021, 30, 5969–5983.
- Zhang, T.; Zhang, X. Injection of traditional hand-crafted features into modern CNN-based models for SAR ship classification: What, why, where, and how. Remote Sens. 2021, 13, 2091.
- Zeng, X.; Wei, S.; Shi, J. A Lightweight Adaptive RoI Extraction Network for Precise Aerial Image Instance Segmentation. IEEE Trans. Instrum. Meas. 2021, 70, 1–17.
- Xu, X.; Zhang, X.; Zhang, T.; Yang, Z.; Shi, J.; Zhan, X. Shadow-Background-Noise 3D Spatial Decomposition Using Sparse Low-Rank Gaussian Properties for Video-SAR Moving Target Shadow Enhancement. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
- Zhang, T.; Zhang, X. A mask attention interaction and scale enhancement network for SAR ship instance segmentation. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
- Zhang, T.; Zhang, X. Integrate Traditional Hand-Crafted Features into Modern CNN-based Models to Further Improve SAR Ship Classification Accuracy. In Proceedings of the 2021 7th Asia-Pacific Conference on Synthetic Aperture Radar (APSAR), Kuta, Bali Island, Indonesia, 1–3 November 2021; pp. 1–6.
- Ai, J.; Luo, Q.; Yang, X. Outliers-Robust CFAR Detector of Gaussian Clutter Based on the Truncated-Maximum-Likelihood-Estimator in SAR Imagery. IEEE Trans. Intell. Transp. Syst. 2019, 21, 2039–2049.
- Liu, T.; Zhang, J.; Gao, G. CFAR Ship Detection in Polarimetric Synthetic Aperture Radar Images Based on Whitening Filter. IEEE Trans. Geosci. Remote Sens. 2019, 58, 58–81.
- Zhu, J.; Qiu, X.; Pan, Z. Projection Shape Template-Based Ship Target Recognition in TerraSAR-X Images. IEEE Geosci. Remote Sens. Lett. 2016, 14, 222–226.
- Wang, C.; Bi, F.; Chen, L. A novel threshold template algorithm for ship detection in high-resolution SAR images. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Beijing, China, 10–15 July 2016; pp. 100–103.
- Liu, Y.; Zhao, J.; Qin, Y. A novel technique for ship wake detection from optical images. Remote Sens. Environ. 2021, 258, 112375.
- Zhang, T.; Zhang, X. High-speed ship detection in SAR images based on a grid convolutional neural network. Remote Sens. 2019, 11, 1206.
- Zhang, T.; Zhang, X. A polarization fusion network with geometric feature embedding for SAR ship classification. Pattern Recognit. 2021, 123, 108365.
- Zhang, X.; Zhang, T.; Shi, J.; Wei, S. High-speed and high-accurate SAR ship detection based on a depthwise separable convolution neural network. J. Radars 2019, 8, 841–851.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2015; pp. 91–99.
- Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Zhang, T.; Zhang, X. Squeeze-and-excitation Laplacian pyramid network with dual-polarization feature fusion for ship classification in SAR images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5.
- Zhang, T.; Zhang, X.; Shi, J.; Wei, S. ShipDeNet-18: An only 1 MB with only 18 convolution layers light-weight deep learning network for SAR ship detection. In Proceedings of the IGARSS 2020–2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 1221–1224.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Zhang, T.; Zhang, X.; Shi, J.; Wei, S. High-speed ship detection in SAR images by improved YOLOv3. In Proceedings of the 2019 16th International Computer Conference on Wavelet Active Media Technology and Information Processing, Chengdu, China, 14–15 December 2019; pp. 149–152.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards Balanced Learning for Object Detection. arXiv 2019, arXiv:1904.02701.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Wu, Y.; Chen, Y.; Yuan, L. Rethinking Classification and Localization for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 10183–10192.
- Duan, K.; Bai, S.; Xie, L. CenterNet: Keypoint Triplets for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 6568–6577.
- Li, J.; Qu, C.; Shao, J. Ship detection in SAR images based on an improved faster R-CNN. In Proceedings of the 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA), Beijing, China, 13–14 November 2017; pp. 1–6.
- Zhang, T.; Zhang, X.; Ke, X. HOG-ShipCLSNet: A novel deep learning network with hog feature fusion for SAR ship classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5210322.
- Zhang, T.; Zhang, X. A full-level context squeeze-and-excitation ROI extractor for SAR ship instance segmentation. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4506705.
- Kang, M.; Ji, K.; Leng, X.; Lin, Z. Contextual Region-Based Convolutional Neural Network with Multilayer Fusion for SAR Ship Detection. Remote Sens. 2017, 9, 860.
- Lin, Z.; Ji, K.; Leng, X. Squeeze and Excitation Rank Faster R-CNN for Ship Detection in SAR Images. IEEE Geosci. Remote Sens. Lett. 2019, 16, 751–755.
- Deng, Z.; Sun, H.; Zhou, S.; Zhao, J.; Lei, L.; Zou, H. Multi-scale object detection in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 2018, 145, 3–22.
- Zhao, J.; Guo, W.; Zhang, Z. A coupled convolutional neural network for small and densely clustered ship detection in SAR images. Sci. China Inf. Sci. 2018, 62, 1–16.
- Cui, Z.; Li, Q.; Cao, Z.; Liu, N. Dense attention pyramid networks for multi-scale ship detection in SAR images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8983–8997.
- Zhao, Y.; Zhao, L.; Xiong, B. Attention Receptive Pyramid Network for Ship Detection in SAR Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 2738–2756.
- Fu, J.; Sun, X.; Wang, Z. An Anchor-Free Method Based on Feature Balancing and Refinement Network for Multiscale Ship Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2020, 59, 1331–1344.
- Gao, F.; He, Y.; Wang, J.; Hussain, A.; Zhou, H. Anchor-free Convolutional Network with Dense Attention Feature Aggregation for Ship Detection in SAR Images. Remote Sens. 2020, 12, 2619.
- Xu, X.; Zhang, X.; Zhang, T. Lite-YOLOv5: A Lightweight Deep Learning Detector for On-Board Ship Detection in Large-Scene Sentinel-1 SAR Images. Remote Sens. 2022, 14, 1018.
- Chen, S.; Zhan, R.; Wang, W. Learning Slimming SAR Ship Object Detector Through Network Pruning and Knowledge Distillation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 1267–1282.
- Zhang, T.; Zhang, X.; Shi, J.; Wei, S. Depthwise Separable Convolution Neural Network for High-Speed SAR Ship Detection. Remote Sens. 2019, 11, 2483.
- Zhang, T.; Zhang, X.; Shi, J. Balance scene learning mechanism for offshore and inshore ship detection in SAR images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4004905.
- Jiang, J.; Fu, X.; Qin, R.; Wang, X.; Ma, Z. High-Speed Lightweight Ship Detection Algorithm Based on YOLO-V4 for Three-Channels RGB SAR Image. Remote Sens. 2021, 13, 1909.
- Wang, J.; Lu, C.; Jiang, W. Simultaneous Ship Detection and Orientation Estimation in SAR Images Based on Attention Module and Angle Regression. Sensors 2018, 18, 2851.
- Jin, L.; Liu, G. An Approach on Image Processing of Deep Learning Based on Improved SSD. Symmetry 2021, 13, 495.
- Wang, Y.; Wang, C.; Zhang, H. Combining a single shot multibox detector with transfer learning for ship detection using sentinel-1 SAR images. Remote Sens. Lett. 2018, 9, 780–788.
- Zhang, X.; Wang, H.; Xu, C. A lightweight feature optimizing network for ship detection in SAR image. IEEE Access 2019, 7, 141662–141678.
- Yang, R.; Wang, G.; Pan, Z.; Lu, H.; Zhang, H.; Jia, X. A novel false alarm suppression method for CNN-based SAR ship detector. IEEE Geosci. Remote Sens. Lett. 2020, 18, 1401–1405.
- Wang, Y.; Wang, C.; Zhang, H.; Dong, Y.; Wei, S. Automatic Ship Detection Based on RetinaNet Using Multi-Resolution Gaofen-3 Imagery. Remote Sens. 2019, 11, 531.
- Chen, S.; Zhang, J.; Zhan, R. R2FA-Det: Delving into High-Quality Rotatable Boxes for Ship Detection in SAR Images. Remote Sens. 2020, 12, 2031.
- Shao, Z.; Zhang, X.; Zhang, T.; Xu, X.; Zeng, T. RBFA-Net: A Rotated Balanced Feature-Aligned Network for Rotated SAR Ship Detection and Classification. Remote Sens. 2022, 14, 3345.
- Zhang, T.; Zhang, X.; Shi, J. Balanced feature pyramid network for ship detection in synthetic aperture radar images. In Proceedings of the 2020 IEEE Radar Conference (RadarConf20), Florence, Italy, 21–25 September 2020; pp. 1–5.
- Wei, S.; Su, H.; Ming, J.; Wang, C.; Yan, M.; Kumar, D.; Shi, J.; Zhang, X. Precise and Robust Ship Detection for High-Resolution SAR Imagery Based on HR-SDNet. Remote Sens. 2020, 12, 167.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703.
- Huang, J.; Niu, Y.; Gan, W. Ship Detection based on SAR Imaging Supervised by Noisy Ship Direction. In Proceedings of the 2021 4th International Conference on Pattern Recognition and Artificial Intelligence, Yibin, China, 20–22 August 2021; pp. 372–377.
- Guo, H.; Yang, X.; Wang, N. A CenterNet++ model for ship detection in SAR images. Pattern Recognit. 2021, 112, 107787.
- Cui, Z.; Wang, X.; Liu, N. Ship detection in large-scale SAR images via spatial shuffle-group enhance attention. IEEE Trans. Geosci. Remote Sens. 2020, 59, 379–391.
- Zhang, T.; Zhang, X.; Ke, X. LS-SSDD-v1.0: A deep learning dataset dedicated to small ship detection from large-scale Sentinel-1 SAR images. Remote Sens. 2020, 12, 2997.
- Fan, Q.; Chen, F.; Cheng, M.; Lou, S.; Xiao, R.; Zhang, B.; Wang, C.; Li, J. Ship Detection Using a Fully Convolutional Network with Compact Polarimetric SAR Images. Remote Sens. 2019, 11, 2171.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the MICCAI 2015, Munich, Germany, 5–9 October 2015.
- Wu, X.; Sahoo, D.; Hoi, S.C.H. Recent advances in deep learning for object detection. Neurocomputing 2020, 396, 39–64.
- Zhang, T.; Zhang, X. HTC+ for SAR ship instance segmentation. Remote Sens. 2022, 14, 2395.
- Li, J.; Guo, C.; Gou, S. Ship segmentation on high-resolution SAR image by a 3D dilated multiscale U-Net. In Proceedings of the IGARSS 2020–2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 2575–2578.
- Jin, K.; Chen, Y.; Xu, B. A patch-to-pixel convolutional neural network for small ship detection with PolSAR images. IEEE Trans. Geosci. Remote Sens. 2020, 58, 6623–6638.
- Su, H.; Wei, S.; Liu, S.; Liang, J.; Wang, C.; Shi, J.; Zhang, X. HQ-ISNet: High-Quality Instance Segmentation for Remote Sensing Imagery. Remote Sens. 2020, 12, 989.
- Wei, S.; Zeng, X.; Qu, Q.; Wang, M.; Su, H.; Shi, J. HRSID: A High-Resolution SAR Images Dataset for Ship Detection and Instance Segmentation. IEEE Access 2020, 8, 120234–120254.
- Zhang, T.; Zhang, X.; Li, J.; Xu, X.; Wang, B.; Zhan, X.; Xu, Y.; Ke, X.; Zeng, T.; Su, H.; et al. SAR Ship Detection Dataset (SSDD): Official Release and Comprehensive Data Analysis. Remote Sens. 2021, 13, 3690.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Huang, Z.; Huang, L.; Gong, Y. Mask Scoring R-CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 6409–6418.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: High Quality Object Detection and Instance Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1483–1498.
- Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J.; Ouyang, W.; et al. Hybrid Task Cascade for Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4969–4978.
- Zhang, T.; Xu, X.; Zhang, X. SAR ship instance segmentation based on hybrid task cascade. In Proceedings of the 2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 17–19 December 2021; pp. 530–533.
- Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400.
- Zhou, Z.; Guan, R.; Cui, Z. Scale Expansion Pyramid Network for Cross-Scale Object Detection in SAR Images. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 5291–5294.
- Gao, S.H.; Cheng, M.M.; Zhao, K. Res2Net: A new multi-scale backbone architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 652–662.
- Wang, J.; Chen, K.; Xu, R. CARAFE: Content-Aware ReAssembly of FEatures. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 3007–3016.
- Liu, S.; Qi, L.; Qin, H. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Zhang, T.; Zhang, X.; Ke, X. Quad-FPN: A Novel Quad Feature Pyramid Network for SAR Ship Detection. Remote Sens. 2021, 13, 2771.
- Zhang, T.; Zhang, X.; Shi, J.; Wei, S. A HOG Feature Fusion Method to Improve CNN-Based SAR Ship Classification Accuracy. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 5311–5314.
- Kosub, S. A note on the triangle inequality for the Jaccard distance. Pattern Recognit. Lett. 2019, 120, 36–38.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Dong, W.; Zhang, T.; Qu, J. Laplacian pyramid dense network for hyperspectral pansharpening. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–13.
- Xie, S.; Girshick, R.; Dollár, P. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S. Rethinking the inception architecture for computer vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- Han, L.; Zhao, X.; Ye, W. Asymmetric and square convolutional neural network for SAR ship detection from scratch. In Proceedings of the 2020 5th International Conference on Biomedical Signal and Image Processing, Suzhou, China, 21–23 August 2020; pp. 80–85.
- Cao, Y.; Xu, J.; Lin, S. GCNet: Non-local networks meet squeeze-excitation networks and beyond. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 15–20 June 2019; pp. 1971–1980.
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; PMLR: Lille, France, 2015; Volume 37, pp. 448–456.
- Wang, X.; Girshick, R.; Gupta, A. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.
- Bolya, D.; Zhou, C.; Xiao, F. YOLACT: Real-time instance segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 9156–9165.
- Ke, X.; Zhang, X.; Zhang, T. GCBANet: A Global Context Boundary-Aware Network for SAR Ship Instance Segmentation. Remote Sens. 2022, 14, 2165.
- Han, L.; Ran, D.; Ye, W. Multi-size Convolution and Learning Deep Network for SAR Ship Detection from Scratch. IEEE Access 2020, 8, 158996.
- Shi, W.; Caballero, J.; Huszár, F. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883.
- Everingham, M.; Eslami, S.M.; Van Gool, L. The Pascal Visual Object Classes Challenge: A Retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- Zhang, T.; Zhang, X.; Liu, C.; Shi, J.; Wei, S.; Ahmad, I.; Zhan, X.; Zhou, Y.; Pan, D.; Li, J.; et al. Balance Learning for Ship Detection from Synthetic Aperture Radar Remote Sensing Imagery. ISPRS J. Photogramm. Remote Sens. 2021, 182, 190–207.
- De Boer, P.T.; Kroese, D.P.; Mannor, S. A Tutorial on the Cross-Entropy Method. Ann. Oper. Res. 2005, 134, 19–67.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Hosang, J.; Benenson, R.; Schiele, B. Learning Non-maximum Suppression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4507–4515.
- Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv 2019, arXiv:1906.07155.
- Rossi, L.; Karimi, A.; Prati, A. A Novel Region of Interest Extraction Layer for Instance Segmentation. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 2203–2209.
- Zhao, D.; Zhu, C.; Qi, J.; Qi, X.; Su, Z.; Shi, Z. Synergistic Attention for Ship Instance Segmentation in SAR Images. Remote Sens. 2021, 13, 4384.
- Dai, J.; Qi, H.; Xiong, Y. Deformable Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 764–773.
- Radosavovic, I.; Kosaraju, R.P.; Girshick, R. Designing Network Design Spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 10425–10433.
- Zhang, H.; Wu, C.; Zhang, Z. ResNeSt: Split-Attention Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 2736–2746.
- Zhang, T.; Zhang, X. ShipDeNet-20: An only 20 convolution layers and <1-MB lightweight SAR ship detector. IEEE Geosci. Remote Sens. Lett. 2020, 18, 1234–1238.
- Distill. Deconvolution and Checkerboard Artifacts. Available online: https://distill.pub/2016/deconv-checkerboard/ (accessed on 10 October 2016).
Block | Input Size | Output Size |
---|---|---|
Input Mode * | L × L × 1 | αL × αL × 1; L × L × 1; βL × βL × 1 |
Backbone Mode | L × L × 1 | (L/4) × (L/4) × 256; (L/8) × (L/8) × 512; (L/16) × (L/16) × 1024; (L/32) × (L/32) × 2048 |
Redesigned FPN | (L/4) × (L/4) × 256; (L/8) × (L/8) × 512; (L/16) × (L/16) × 1024; (L/32) × (L/32) × 2048 | (L/4) × (L/4) × 256; (L/8) × (L/8) × 256; (L/16) × (L/16) × 256; (L/32) × (L/32) × 256 |
RPN Mode | (L/4) × (L/4) × 256; (L/8) × (L/8) × 256; (L/16) × (L/16) × 256; (L/32) × (L/32) × 256 | 1000 × 5 |
ROI Mode | (L/4) × (L/4) × 256; (L/8) × (L/8) × 256; (L/16) × (L/16) × 256; (L/32) × (L/32) × 256; 1000 × 5 | 1000 × 256 × 7 × 7 |
Redesigned DH | 1000 × 256 × 7 × 7 | 1000 × 2; 1000 × 8; 28 × 28 × 1 |
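The feature-map sizes listed above can be traced with a short script. This is a minimal sketch in plain Python (no deep-learning framework); the strides 4/8/16/32 and the channel widths follow the ResNet-101/FPN convention used in the table, and the function names are illustrative, not part of the released code.

```python
# Sketch: trace the per-stage feature-map sizes for a square input of side L.
# Strides and channel widths follow the table above (ResNet-101 backbone,
# redesigned FPN mapping every level to 256 channels).

def backbone_shapes(L):
    """C2-C5 outputs of the backbone (strides 4, 8, 16, 32)."""
    return [(L // s, L // s, c) for s, c in
            [(4, 256), (8, 512), (16, 1024), (32, 2048)]]

def fpn_shapes(L):
    """The redesigned FPN keeps the spatial sizes and unifies channels to 256."""
    return [(L // s, L // s, 256) for s in (4, 8, 16, 32)]

L = 512
print(backbone_shapes(L))  # [(128, 128, 256), (64, 64, 512), (32, 32, 1024), (16, 16, 2048)]
print(fpn_shapes(L))       # [(128, 128, 256), (64, 64, 256), (32, 32, 256), (16, 16, 256)]
```

The RPN then emits 1000 proposals (1000 × 5), each pooled by ROIAlign to 1000 × 256 × 7 × 7 for the redesigned DH, consistent with the last three rows of the table.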
Redesign FPN? | Redesign DH? | SIS Mode? | Detection Task (%) | Segmentation Task (%) | Model Size (MB) | #Para (M) | FPS | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Input | Backbone | RPN | ROI | AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | |||||
-- | -- | -- | -- | -- | -- | 91.5 | 75.4 | 62.0 | 64.4 | 19.7 | 62.0 | 88.5 | 72.1 | 57.2 | 60.8 | 27.4 | 57.8 | 480 | 62.74 | 11.05 |
✓ | 94.9 | 77.0 | 65.5 | 63.8 | 18.8 | 64.6 | 93.8 | 72.5 | 59.9 | 58.6 | 25.1 | 59.6 | 514 | 66.35 | 10.55 | |||||
+3.4 | +1.6 | +3.5 | −0.6 | −0.9 | +2.6 | +5.3 | +0.4 | +2.7 | −2.2 | −2.3 | +1.8 | |||||||||
✓ | ✓ | 94.9 | 77.5 | 66.8 | 64.5 | 40.9 | 66.0 | 93.7 | 74.3 | 60.5 | 58.5 | 50.2 | 59.9 | 797 | 103.47 | 6.82 | ||||
+3.4 | +2.1 | +4.8 | +0.1 | +21.2 | +4.0 | +5.2 | +2.2 | +3.3 | −2.3 | +22.8 | +2.1 | |||||||||
✓ | ✓ | ✓ | 95.4 | 80.2 | 68.4 | 66.5 | 41.1 | 67.7 | 94.1 | 76.2 | 62.5 | 57.8 | 54.4 | 61.4 | 797 | 103.47 | 3.31 | |||
+3.9 | +4.8 | +6.4 | +2.1 | +21.4 | +5.7 | +5.6 | +4.1 | +5.3 | −3.0 | +27.0 | +3.6 | |||||||||
✓ | ✓ | ✓ | ✓ | 95.9 | 80.6 | 69.4 | 66.9 | 43.5 | 68.5 | 94.1 | 77.6 | 62.4 | 59.9 | 57.4 | 61.8 | 803 | 104.13 | 2.90 | ||
+4.4 | +5.2 | +7.4 | +2.5 | +23.8 | +6.5 | +5.6 | +5.5 | +5.2 | −0.9 | +30.0 | +4.0 | |||||||||
✓ | ✓ | ✓ | ✓ | ✓ | 96.4 | 84.8 | 71.1 | 69.8 | 49.4 | 70.5 | 94.4 | 79.6 | 63.4 | 62.5 | 60.1 | 63.1 | 822 | 106.69 | 2.42 | |
+4.9 | +9.4 | +9.1 | +5.4 | +29.7 | +8.5 | +5.9 | +7.5 | +6.2 | +1.7 | +32.7 | +5.3 | |||||||||
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 909 | 118.10 | 1.84 |
+5.7 | +11.6 | +9.7 | +8.7 | +34.9 | +9.9 | +6.1 | +8.8 | +7.3 | +6.6 | +35.1 | +7.3 | |||||||||
Method | Backbone Network | Detection Task (%) | Segmentation Task (%) | Model Size (MB) | #Para (M) | FPS | ||||||||||||||
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | |||||||||
Mask R-CNN [59] | ResNet-50-FPN | 91.6 | 68.4 | 59.8 | 53.4 | 12.1 | 58.1 | 90.2 | 67.6 | 58.6 | 53.0 | 27.4 | 57.1 | 351 | 45.88 | 12.35 | ||||
Mask R-CNN [59] † | ResNet-50-FPN | 94.0 | 75.7 | 65.7 | 55.2 | 22.8 | 63.2 | 92.4 | 70.5 | 60.6 | 53.0 | 29.6 | 58.6 | 351 | 45.88 | 8.26 | ||||
Mask R-CNN [59] | ResNet-101-FPN | 91.5 | 75.4 | 62.0 | 64.4 | 19.7 | 62.0 | 88.5 | 72.1 | 57.2 | 60.8 | 27.4 | 57.8 | 480 | 62.74 | 11.05 | ||||
Mask R-CNN [59] † | ResNet-101-FPN | 93.1 | 74.7 | 64.5 | 64.3 | 19.6 | 64.1 | 91.9 | 75.0 | 60.7 | 60.7 | 31.3 | 60.6 | 480 | 62.74 | 7.48 | ||||
Mask Scoring R-CNN [60] | ResNet-50-FPN | 90.2 | 63.4 | 57.4 | 51.0 | 13.2 | 55.6 | 88.8 | 63.9 | 56.2 | 51.0 | 23.1 | 54.7 | 453 | 59.27 | 13.11 | ||||
Mask Scoring R-CNN [60] † | ResNet-50-FPN | 90.4 | 74.2 | 63.9 | 49.7 | 11.8 | 60.2 | 89.3 | 71.1 | 60.4 | 50.8 | 31.3 | 57.8 | 453 | 59.27 | 7.44 | ||||
Mask Scoring R-CNN [60] | ResNet-101-FPN | 91.0 | 75.1 | 61.9 | 66.0 | 15.7 | 62.4 | 89.4 | 73.2 | 58.0 | 61.4 | 22.6 | 58.6 | 604 | 79.00 | 12.88 | ||||
Mask Scoring R-CNN [60] † | ResNet-101-FPN | 94.8 | 77.2 | 65.7 | 65.7 | 23.4 | 65.3 | 92.6 | 74.6 | 60.9 | 60.2 | 30.8 | 60.4 | 604 | 79.00 | 7.25 | ||||
Cascade Mask R-CNN [61] | ResNet-50-FPN | 89.7 | 65.5 | 58.6 | 49.3 | 8.2 | 56.2 | 88.7 | 65.3 | 57.5 | 50.4 | 27.4 | 55.7 | 586 | 77.06 | 12.47 | ||||
Cascade Mask R-CNN [61] † | ResNet-50-FPN | 92.7 | 74.8 | 64.7 | 56.2 | 7.1 | 62.4 | 91.2 | 75.8 | 61.6 | 54.9 | 22.7 | 59.9 | 586 | 77.06 | 8.41 | ||||
Cascade Mask R-CNN [61] | ResNet-101-FPN | 89.6 | 75.2 | 62.4 | 66.0 | 12.0 | 63.0 | 87.5 | 70.5 | 56.3 | 58.8 | 22.6 | 56.6 | 732 | 95.79 | 10.55 | ||||
Cascade Mask R-CNN [61] † | ResNet-101-FPN | 93.5 | 76.5 | 65.5 | 66.4 | 38.1 | 65.4 | 91.5 | 73.1 | 60.2 | 59.7 | 50.3 | 60.0 | 732 | 95.79 | 5.80 | ||||
HTC [62] | ResNet-101-FPN | 93.6 | 76.3 | 65.2 | 68.4 | 27.5 | 65.6 | 91.7 | 73.1 | 58.7 | 61.6 | 34.8 | 59.3 | 733 | 95.92 | 11.60 | ||||
HTC [62] † | ResNet-101-FPN | 94.8 | 78.5 | 66.7 | 68.7 | 40.6 | 66.8 | 93.1 | 72.9 | 60.4 | 62.0 | 43.4 | 60.7 | 733 | 95.92 | 5.52 | ||||
PANet [67] | ResNet-101-FPN | 93.4 | 75.4 | 63.4 | 65.5 | 40.8 | 63.3 | 91.1 | 74.0 | 59.3 | 61.0 | 52.1 | 59.6 | 507 | 66.28 | 13.65 | ||||
PANet [67] † | ResNet-101-FPN | 93.8 | 76.3 | 66.4 | 64.1 | 30.6 | 65.4 | 92.4 | 75.0 | 60.6 | 60.2 | 38.3 | 60.4 | 507 | 66.28 | 7.48 | ||||
YOLACT [82] | ResNet-101-FPN | 90.6 | 61.2 | 56.9 | 48.2 | 12.6 | 54.0 | 88.0 | 52.1 | 47.3 | 53.5 | 40.2 | 48.4 | 410 | 53.72 | 15.47 | ||||
GRoIE [93] | ResNet-101-FPN | 91.5 | 71.6 | 62.2 | 59.8 | 8.7 | 61.2 | 89.8 | 72.7 | 58.6 | 58.7 | 21.8 | 58.3 | 509 | 66.53 | 9.67 | ||||
GRoIE [93] † | ResNet-101-FPN | 94.0 | 74.1 | 64.3 | 62.1 | 24.9 | 63.5 | 92.1 | 75.7 | 60.7 | 60.2 | 35.1 | 60.4 | 509 | 66.53 | 4.64 | ||||
HQ-ISNet [56] | HRNetV2_W18 | 91.0 | 76.3 | 64.7 | 66.6 | 26.0 | 64.9 | 89.3 | 73.6 | 58.2 | 60.4 | 37.2 | 58.6 | 479 | 62.75 | 8.59 | ||||
HQ-ISNet [56] † | HRNetV2_W18 | 92.2 | 77.2 | 65.9 | 65.7 | 26.3 | 65.6 | 91.3 | 75.9 | 59.7 | 58.9 | 35.0 | 59.4 | 479 | 62.75 | 4.07 | ||||
HQ-ISNet [56] | HRNetV2_W32 | 90.7 | 77.3 | 65.6 | 66.9 | 23.2 | 65.5 | 90.4 | 75.5 | 58.9 | 61.1 | 37.3 | 59.3 | 630 | 82.55 | 8.00 | ||||
HQ-ISNet [56] † | HRNetV2_W32 | 93.2 | 77.7 | 65.8 | 67.6 | 33.4 | 66.0 | 91.4 | 74.8 | 59.0 | 61.3 | 60.2 | 59.5 | 630 | 82.55 | 3.87 | ||||
HQ-ISNet [56] | HRNetV2_W40 | 87.8 | 75.3 | 62.6 | 67.8 | 27.9 | 63.6 | 86.0 | 72.6 | 56.7 | 61.3 | 50.2 | 57.6 | 754 | 98.79 | 7.73 | ||||
HQ-ISNet [56] † | HRNetV2_W40 | 92.2 | 75.1 | 63.8 | 64.9 | 38.6 | 63.8 | 90.6 | 74.0 | 59.6 | 59.3 | 57.2 | 59.5 | 754 | 98.79 | 3.57 | ||||
SA R-CNN [94] | ResNet-50-GCB-FPN | 92.1 | 75.2 | 63.8 | 64.0 | 7.0 | 63.2 | 90.4 | 73.3 | 59.6 | 60.3 | 20.2 | 59.4 | 411 | 53.75 | 13.65 | ||||
SA R-CNN [94] † | ResNet-50-GCB-FPN | 93.1 | 74.8 | 65.5 | 61.7 | 27.6 | 64.1 | 92.1 | 74.0 | 61.1 | 58.8 | 31.2 | 60.3 | 411 | 53.75 | 8.00 | ||||
SISNet (Ours) | ResNet-101-SIS-FPN | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 909 | 118.10 | 1.84 | ||||
+2.4 | +8.5 | +5.0 | +4.4 | +13.8 | +5.1 | +1.5 | +5.0 | +3.4 | +5.4 | +2.3 | +4.4 |
Redesign FPN? | Redesign DH? | SIS Mode? | Detection Task (%) | Segmentation Task (%) | Model Size (MB) | #Para (M) | FPS | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Input | Backbone | RPN | ROI | AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | ||||||
-- | -- | -- | -- | -- | -- | 87.7 | 75.5 | 66.1 | 68.4 | 14.1 | 65.1 | 85.7 | 65.2 | 54.3 | 62.5 | 13.3 | 54.8 | 480 | 62.74 | 7.07 | |
✓ | 90.2 | 77.1 | 67.9 | 66.3 | 14.6 | 66.6 | 87.5 | 66.8 | 55.4 | 62.1 | 12.4 | 55.6 | 514 | 66.35 | 6.81 | ||||||
+2.5 | +1.6 | +1.8 | −2.1 | +0.5 | +1.5 | +1.8 | +1.6 | +1.1 | −0.4 | −0.9 | +0.8 | ||||||||||
✓ | ✓ | 89.8 | 77.3 | 68.8 | 68.5 | 21.8 | 67.6 | 87.2 | 65.1 | 55.2 | 61.3 | 17.7 | 55.3 | 797 | 103.47 | 4.42 | |||||
+2.1 | +1.8 | +2.7 | +0.1 | +7.7 | +2.5 | +1.5 | −0.1 | +0.9 | −1.2 | +4.4 | +0.5 | ||||||||||
✓ | ✓ | ✓ | 90.2 | 78.5 | 69.7 | 67.5 | 22.9 | 68.3 | 88.2 | 65.6 | 55.6 | 61.5 | 17.0 | 55.7 | 797 | 103.47 | 2.15 | ||||
+2.5 | +3.0 | +3.6 | −0.9 | +8.8 | +3.2 | +2.5 | +0.4 | +1.3 | −1.0 | +3.7 | +0.9 | ||||||||||
✓ | ✓ | ✓ | ✓ | 91.1 | 78.7 | 70.3 | 68.4 | 26.2 | 69.1 | 88.5 | 65.8 | 55.7 | 61.5 | 19.9 | 55.8 | 803 | 104.13 | 1.86 | |||
+3.4 | +3.2 | +4.2 | +0.0 | +12.1 | +4.0 | +2.8 | +0.6 | +1.4 | −1.0 | +6.6 | +1.0 | ||||||||||
✓ | ✓ | ✓ | ✓ | ✓ | 91.0 | 78.9 | 70.6 | 69.0 | 28.6 | 69.3 | 89.0 | 67.3 | 56.5 | 62.1 | 22.3 | 56.5 | 822 | 106.69 | 1.55 | ||
+3.3 | +3.4 | +4.5 | +0.6 | +14.5 | +4.2 | +3.3 | +2.1 | +2.2 | −0.4 | +9.0 | +1.7 | ||||||||||
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 92.4 | 80.7 | 71.5 | 70.8 | 38.3 | 70.5 | 90.2 | 70.1 | 58.5 | 65.4 | 28.0 | 58.9 | 909 | 118.10 | 1.04 | |
+4.7 | +5.2 | +5.4 | +2.4 | +24.2 | +5.4 | +4.5 | +4.9 | +4.2 | +2.9 | +14.7 | +4.1 | ||||||||||
Method | Backbone Network | Detection Task (%) | Segmentation Task (%) | Model Size (MB) | #Para (M) | FPS | |||||||||||||||
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | ||||||||||
Mask R-CNN [59] | ResNet-50-FPN | 86.4 | 70.2 | 63.1 | 61.9 | 8.4 | 61.6 | 84.1 | 61.6 | 52.4 | 59.1 | 10.5 | 52.6 | 351 | 45.88 | 8.41 | |||||
Mask R-CNN [59] † | ResNet-50-FPN | 88.4 | 73.8 | 66.1 | 66.6 | 14.8 | 64.9 | 85.9 | 64.1 | 53.8 | 61.0 | 16.1 | 54.1 | 351 | 45.88 | 5.06 | |||||
Mask R-CNN [59] | ResNet-101-FPN | 87.7 | 75.5 | 66.1 | 68.4 | 14.1 | 65.1 | 85.7 | 65.2 | 54.3 | 62.5 | 13.3 | 54.8 | 480 | 62.74 | 7.07 | |||||
Mask R-CNN [59] † | ResNet-101-FPN | 89.2 | 76.0 | 67.3 | 66.2 | 13.5 | 65.8 | 87.3 | 67.5 | 56.3 | 61.9 | 12.5 | 56.3 | 480 | 62.74 | 4.79 | |||||
Mask Scoring R-CNN [60] | ResNet-50-FPN | 88.1 | 73.8 | 66.1 | 66.3 | 14.1 | 64.9 | 85.6 | 64.1 | 53.8 | 61.2 | 14.6 | 54.0 | 453 | 59.27 | 9.12 | |||||
Mask Scoring R-CNN [60] † | ResNet-50-FPN | 88.4 | 74.0 | 66.3 | 66.9 | 14.0 | 65.2 | 85.7 | 64.3 | 54.1 | 61.3 | 15.0 | 54.3 | 453 | 59.27 | 5.06 | |||||
Mask Scoring R-CNN [60] | ResNet-101-FPN | 87.6 | 75.4 | 66.5 | 67.4 | 13.4 | 65.2 | 85.1 | 65.9 | 54.5 | 61.5 | 12.9 | 54.9 | 604 | 79.00 | 8.24 | |||||
Mask Scoring R-CNN [60] † | ResNet-101-FPN | 89.4 | 76.8 | 67.4 | 67.5 | 17.3 | 66.3 | 87.3 | 67.5 | 56.0 | 63.0 | 14.8 | 56.3 | 604 | 79.00 | 4.64 | |||||
Cascade Mask R-CNN [61] | ResNet-50-FPN | 84.7 | 69.8 | 61.7 | 60.5 | 4.8 | 60.1 | 82.5 | 61.2 | 51.4 | 57.6 | 7.8 | 51.5 | 586 | 77.06 | 8.68 | |||||
Cascade Mask R-CNN [61] † | ResNet-50-FPN | 86.5 | 71.5 | 63.3 | 61.9 | 4.3 | 62.0 | 84.1 | 63.1 | 53.2 | 58.6 | 6.0 | 53.2 | 586 | 77.06 | 4.11 | |||||
Cascade Mask R-CNN [61] | ResNet-101-FPN | 85.4 | 74.4 | 66.0 | 69.0 | 17.1 | 65.1 | 83.4 | 62.9 | 52.2 | 62.2 | 17.0 | 52.8 | 732 | 95.79 | 6.75 | |||||
Cascade Mask R-CNN [61] † | ResNet-101-FPN | 88.2 | 76.8 | 67.9 | 68.2 | 16.5 | 66.9 | 85.9 | 65.0 | 54.4 | 61.6 | 16.1 | 54.7 | 732 | 95.79 | 3.71 | |||||
HTC [62] | ResNet-101-FPN | 86.0 | 77.1 | 67.6 | 69.0 | 28.1 | 66.6 | 84.9 | 66.5 | 54.7 | 63.8 | 19.2 | 55.2 | 733 | 95.92 | 7.42 | |||||
HTC [62] † | ResNet-101-FPN | 86.8 | 78.0 | 68.2 | 69.7 | 27.6 | 67.2 | 85.7 | 68.7 | 55.9 | 64.4 | 25.0 | 56.4 | 733 | 95.92 | 3.53 | |||||
PANet [67] | ResNet-101-FPN | 88.0 | 75.7 | 66.5 | 68.2 | 22.1 | 65.4 | 86.0 | 66.2 | 54.7 | 62.8 | 17.8 | 55.1 | 507 | 66.28 | 8.74 | |||||
PANet [67] † | ResNet-101-FPN | 89.5 | 77.1 | 67.6 | 68.5 | 33.5 | 66.6 | 87.5 | 66.9 | 55.6 | 63.3 | 25.1 | 56.1 | 507 | 66.28 | 4.79 | |||||
YOLACT [82] | ResNet-101-FPN | 74.4 | 53.3 | 51.7 | 34.9 | 3.3 | 47.9 | 71.1 | 41.9 | 39.5 | 46.1 | 7.3 | 39.6 | 410 | 53.72 | 10.02 | |||||
GRoIE [93] | ResNet-101-FPN | 87.8 | 75.5 | 66.5 | 67.2 | 21.8 | 65.4 | 85.8 | 66.9 | 54.9 | 63.5 | 19.7 | 55.4 | 509 | 66.53 | 6.19 | |||||
GRoIE [93] † | ResNet-101-FPN | 88.2 | 76.6 | 67.3 | 66.6 | 18.5 | 65.8 | 87.0 | 67.2 | 55.9 | 63.4 | 19.9 | 56.3 | 509 | 66.53 | 2.97 | |||||
HQ-ISNet [56] | HRNetV2_W18 | 86.1 | 75.6 | 67.1 | 66.3 | 8.9 | 66.0 | 84.2 | 64.3 | 53.2 | 59.7 | 10.7 | 53.4 | 479 | 62.75 | 5.50 | |||||
HQ-ISNet [56] † | HRNetV2_W18 | 87.3 | 76.2 | 67.8 | 66.1 | 13.5 | 66.6 | 85.0 | 65.4 | 54.5 | 59.4 | 13.9 | 54.4 | 479 | 62.75 | 2.60 | |||||
HQ-ISNet [56] | HRNetV2_W32 | 86.9 | 76.3 | 67.8 | 68.3 | 16.8 | 66.7 | 85.0 | 65.8 | 54.2 | 61.7 | 13.4 | 54.6 | 630 | 82.55 | 5.12 | |||||
HQ-ISNet [56] † | HRNetV2_W32 | 87.1 | 77.7 | 68.7 | 68.0 | 10.4 | 67.5 | 85.3 | 67.0 | 55.0 | 61.2 | 12.0 | 55.0 | 630 | 82.55 | 2.48 | |||||
HQ-ISNet [56] | HRNetV2_W40 | 86.2 | 76.3 | 67.9 | 68.6 | 11.7 | 66.7 | 84.3 | 64.9 | 53.9 | 61.9 | 12.8 | 54.2 | 754 | 98.79 | 4.95 | |||||
HQ-ISNet [56] † | HRNetV2_W40 | 86.9 | 77.7 | 68.6 | 68.1 | 16.1 | 67.5 | 85.1 | 67.0 | 55.3 | 61.6 | 14.0 | 55.4 | 754 | 98.79 | 2.28 | |||||
SA R-CNN [94] | ResNet-50-GCB-FPN | 88.3 | 75.2 | 66.4 | 65.4 | 10.2 | 65.2 | 86.2 | 66.7 | 54.9 | 60.9 | 12.3 | 55.2 | 411 | 53.75 | 8.74 | |||||
SA R-CNN [94] † | ResNet-50-GCB-FPN | 89.8 | 76.6 | 67.7 | 64.4 | 12.4 | 66.0 | 87.8 | 66.7 | 56.3 | 61.3 | 11.8 | 56.2 | 411 | 53.75 | 5.12 | |||||
SISNet (Ours) | ResNet-101-SIS-FPN | 92.4 | 80.7 | 71.5 | 70.8 | 38.3 | 70.5 | 90.2 | 70.1 | 58.5 | 65.4 | 28.0 | 58.9 | 909 | 118.10 | 1.04 | |||||
+2.6 | +2.7 | +2.8 | +1.1 | +4.8 | +3.3 | +2.4 | +1.4 | +2.2 | +1.0 | +3.0 | +2.5 |
Input-Mode SIS? | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | ||
x * | 96.3 | 81.7 | 68.3 | 71.2 | 56.8 | 68.8 | 94.4 | 76.7 | 61.5 | 63.7 | 65.0 | 62.0 | 4.46 |
✓† | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
320 | 416 | 512 | 608 | 704 | Detection Task (%) | Segmentation Task (%) | #Para | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | |||||||
✓ | 94.2 | 78.9 | 64.9 | 69.9 | 61.3 | 65.7 | 92.0 | 64.8 | 53.7 | 59.8 | 52.5 | 55.1 | 118.10 | 5.95 | ||||
✓ | 94.4 | 80.7 | 67.8 | 70.9 | 58.4 | 68.3 | 93.4 | 74.6 | 58.7 | 61.4 | 60.1 | 59.4 | 118.10 | 5.40 | ||||
✓ | 96.3 | 81.7 | 68.3 | 71.2 | 56.8 | 68.8 | 94.4 | 76.7 | 61.5 | 63.7 | 65.0 | 62.0 | 118.10 | 4.46 | ||||
✓ | 96.3 | 86.9 | 70.8 | 71.4 | 59.6 | 70.9 | 93.6 | 82.7 | 64.0 | 65.7 | 62.5 | 64.3 | 118.10 | 3.74 | ||||
✓ | 95.5 | 85.5 | 70.9 | 71.6 | 55.1 | 71.0 | 94.3 | 80.3 | 64.9 | 65.4 | 55.0 | 65.1 | 118.10 | 3.31 | ||||
✓ | ✓ | 95.9 | 80.0 | 68.0 | 69.9 | 66.7 | 68.2 | 94.1 | 73.3 | 58.8 | 61.9 | 62.5 | 59.6 | 118.10 | 3.36 | |||
✓ | ✓ | 95.4 | 83.6 | 69.9 | 71.1 | 60.9 | 70.1 | 93.6 | 77.3 | 61.9 | 62.1 | 62.5 | 61.9 | 118.10 | 2.86 | |||
✓ | ✓ | 96.4 | 84.6 | 70.9 | 71.9 | 53.3 | 70.8 | 94.5 | 79.4 | 63.7 | 65.1 | 65.0 | 64.0 | 118.10 | 2.70 | |||
✓ | ✓ | 96.6 | 88.5 | 72.2 | 75.0 | 57.1 | 72.7 | 94.6 | 83.2 | 65.8 | 67.2 | 62.5 | 66.1 | 118.10 | 2.30 | |||
✓ | ✓ | ✓ | 96.1 | 83.3 | 69.6 | 70.9 | 68.3 | 69.7 | 94.3 | 76.1 | 60.9 | 61.8 | 62.5 | 61.2 | 118.10 | 2.09 | ||
✓ | ✓ | ✓ | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 118.10 | 1.84 | ||
✓ | ✓ | ✓ | 96.5 | 86.5 | 71.8 | 73.0 | 50.2 | 71.9 | 94.6 | 82.0 | 65.3 | 66.2 | 50.1 | 65.4 | 118.10 | 1.60 | ||
✓ | ✓ | ✓ | ✓ | 97.1 | 84.5 | 70.7 | 72.8 | 60.9 | 70.9 | 94.9 | 78.2 | 61.7 | 63.7 | 60.0 | 62.1 | 118.10 | 1.45 | |
✓ | ✓ | ✓ | ✓ | 97.2 | 87.4 | 72.4 | 74.6 | 57.6 | 72.6 | 94.6 | 82.2 | 65.3 | 65.6 | 65.0 | 65.4 | 118.10 | 1.33 |
800 | 896 | 992 | 1024 | 1120 | 1216 * | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | |||||||
✓ | 96.5 | 87.5 | 71.1 | 72.2 | 42.7 | 71.2 | 94.7 | 82.8 | 66.5 | 65.9 | 55.0 | 66.0 | 3.22 | |||||
✓ | 96.5 | 85.8 | 71.6 | 71.3 | 60.1 | 71.4 | 94.7 | 83.7 | 66.5 | 64.5 | 62.5 | 66.0 | 3.18 | |||||
✓ | 96.4 | 86.1 | 71.5 | 72.9 | 57.9 | 71.8 | 94.6 | 83.4 | 67.0 | 65.6 | 60.0 | 66.4 | 2.73 | |||||
✓ | 96.5 | 86.7 | 72.1 | 72.3 | 55.1 | 71.9 | 95.4 | 82.4 | 66.8 | 65.8 | 55.1 | 66.5 | 2.67 | |||||
✓ | 95.5 | 86.4 | 72.0 | 72.8 | 57.6 | 71.9 | 94.6 | 83.3 | 67.5 | 65.7 | 55.0 | 67.0 | 2.39 | |||||
✓ | 96.5 | 87.9 | 72.5 | 70.8 | 40.3 | 71.9 | 94.6 | 83.9 | 67.6 | 65.1 | 40.2 | 66.9 | 2.22 |
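The two tables above sweep the input scale at training (320–704) and testing (800–1216). A keep-aspect-ratio rescale toward a target short-side scale can be sketched as follows; this is a generic COCO-style resizing rule for illustration, not necessarily the exact preprocessing of SISNet, and the 1216 cap simply mirrors the largest test scale in the table.

```python
def rescale(height, width, target, max_side=1216):
    """Resize so the shorter side equals `target`, while capping the longer
    side at `max_side` (keep-aspect-ratio rescale). Returns (new_h, new_w)."""
    scale = target / min(height, width)
    if max(height, width) * scale > max_side:
        scale = max_side / max(height, width)
    return round(height * scale), round(width * scale)

print(rescale(600, 800, 512))    # shorter side scaled to 512 -> (512, 683)
print(rescale(400, 1600, 800))   # longer side hits the 1216 cap -> (304, 1216)
```

Under this rule, larger target scales enlarge small ships in pixel terms, which is consistent with the APS gains observed as the scale increases, at the cost of FPS.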
Backbone-Mode SIS? | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | ||
x * | 96.2 | 84.2 | 71.1 | 70.8 | 52.6 | 70.7 | 94.5 | 79.5 | 63.4 | 63.9 | 50.0 | 63.4 | 1.93 |
✓† | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
Backbone | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | ||
ResNet-101 [72] | 96.2 | 84.2 | 71.1 | 70.8 | 52.6 | 70.7 | 94.5 | 79.5 | 63.4 | 63.9 | 50.0 | 63.4 | 1.93 |
ResNeXt-101-32x4d [75] | 96.4 | 84.9 | 71.0 | 73.7 | 61.7 | 71.2 | 94.3 | 78.9 | 62.9 | 64.4 | 60.0 | 63.2 | 2.09 |
ResNeXt-101-32x8d [75] | 96.6 | 85.1 | 71.7 | 72.4 | 70.0 | 71.7 | 94.6 | 78.0 | 62.9 | 64.0 | 75.0 | 63.0 | 2.05 |
ResNeXt-101-64x4d [75] | 96.6 | 86.2 | 72.0 | 72.5 | 60.1 | 71.8 | 94.6 | 79.9 | 64.4 | 65.2 | 60.0 | 64.5 | 1.92 |
ResNeXt-101-64x4d-SA [96] | 97.2 | 85.7 | 71.3 | 72.5 | 71.7 | 71.4 | 94.4 | 81.3 | 63.9 | 64.3 | 70.0 | 64.1 | 1.38 |
ResNeXt-101-64x4d-DCN [97] | 97.4 | 86.1 | 72.1 | 74.4 | 61.3 | 72.4 | 95.4 | 80.0 | 63.9 | 66.0 | 62.5 | 64.3 | 1.12 |
RegNetX-400MF [98] | 96.0 | 82.7 | 70.0 | 69.0 | 57.6 | 69.6 | 93.9 | 75.8 | 61.7 | 59.4 | 60.0 | 61.0 | 2.05 |
RegNetX-800MF [98] | 97.2 | 85.6 | 71.7 | 71.3 | 45.2 | 71.3 | 94.5 | 78.5 | 63.4 | 62.4 | 40.2 | 63.0 | 2.42 |
RegNetX-1.6GF [98] | 97.2 | 83.4 | 71.1 | 70.1 | 65.0 | 70.7 | 94.3 | 78.5 | 63.0 | 62.1 | 60.1 | 62.7 | 2.23 |
RegNetX-3.2GF [98] | 96.6 | 85.9 | 71.5 | 72.4 | 47.7 | 71.5 | 94.6 | 80.8 | 63.8 | 63.4 | 50.2 | 63.5 | 2.21 |
RegNetX-4.0GF [98] | 97.2 | 85.1 | 71.7 | 73.8 | 65.4 | 72.0 | 94.6 | 80.3 | 64.1 | 64.6 | 65.0 | 64.1 | 2.08 |
HRNetV2-W18 [47] | 96.5 | 85.0 | 71.9 | 72.0 | 55.4 | 71.8 | 94.6 | 80.0 | 63.2 | 64.0 | 60.0 | 63.4 | 1.66 |
HRNetV2-W32 [47] | 96.2 | 85.5 | 71.6 | 71.9 | 60.0 | 71.6 | 94.5 | 80.4 | 64.5 | 64.5 | 62.5 | 64.4 | 1.62 |
HRNetV2-W40 [47] | 97.3 | 86.6 | 72.7 | 72.5 | 52.7 | 72.3 | 95.5 | 79.0 | 64.4 | 65.1 | 55.0 | 64.4 | 1.48 |
Res2Net-101 [65] | 96.4 | 85.0 | 71.4 | 73.8 | 66.3 | 71.6 | 94.5 | 78.8 | 63.2 | 64.5 | 62.5 | 63.4 | 1.86 |
ResNeSt-101 [109] | 97.4 | 86.5 | 72.8 | 71.5 | 57.6 | 72.1 | 96.2 | 81.0 | 64.8 | 63.1 | 50.1 | 64.3 | 1.87 |
ResNet-101-SIS | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
RPN-Mode SIS? | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | ||
x * | 97.0 | 83.7 | 71.2 | 71.0 | 61.7 | 71.0 | 95.2 | 79.0 | 63.9 | 64.3 | 55.0 | 64.0 | 1.95 |
✓† | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
3 × 3 | 3 × 1 | 1 × 3 | GCB | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | |||||
✓ | 97.0 | 83.7 | 71.2 | 71.0 | 61.7 | 71.0 | 95.2 | 79.0 | 63.9 | 64.3 | 55.0 | 64.0 | 2.05 | |||
✓ | 96.3 | 84.5 | 71.8 | 72.0 | 42.9 | 71.4 | 94.5 | 79.8 | 63.3 | 63.0 | 52.5 | 63.1 | 1.98 | |||
✓ | 96.2 | 85.1 | 71.2 | 72.2 | 62.1 | 71.1 | 94.4 | 77.5 | 62.6 | 63.6 | 60.0 | 62.8 | 1.99 | |||
✓ | ✓ | 96.4 | 86.4 | 71.9 | 72.3 | 54.6 | 71.6 | 94.4 | 79.4 | 63.4 | 63.9 | 65.0 | 63.4 | 1.90 | ||
✓ | ✓ | 96.7 | 84.2 | 71.5 | 71.6 | 52.9 | 71.3 | 94.9 | 79.2 | 63.5 | 63.9 | 60.0 | 63.6 | 1.89 | ||
✓ | ✓ | 96.3 | 85.7 | 71.9 | 72.4 | 55.4 | 71.5 | 94.5 | 78.8 | 63.7 | 63.8 | 52.5 | 63.6 | 1.85 | ||
✓ | ✓ | ✓ | x * | 96.4 | 85.3 | 71.0 | 72.7 | 57.3 | 71.2 | 94.5 | 78.7 | 63.0 | 64.4 | 62.5 | 63.3 | 1.81 |
✓ | ✓ | ✓ | x◆ | 96.3 | 86.3 | 71.0 | 71.8 | 62.5 | 70.9 | 94.4 | 79.6 | 63.5 | 63.6 | 65.0 | 63.5 | 1.79 |
✓ | ✓ | ✓ | ✓† | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
ROI-Mode SIS? | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | ||
x * | 96.4 | 84.8 | 71.1 | 69.8 | 49.4 | 70.5 | 94.4 | 79.6 | 63.4 | 62.5 | 60.1 | 63.1 | 2.42 |
✓† | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
ROI (1.0) 1 | ROI C1(2.0) 2 | ROI C2(3.0) 3 | DRSE | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | |||||
✓ | 96.4 | 84.8 | 71.1 | 69.8 | 49.4 | 70.5 | 94.4 | 79.6 | 63.4 | 62.5 | 60.1 | 63.1 | 2.42 | |||
✓ | 96.3 | 86.3 | 71.9 | 72.3 | 62.5 | 71.2 | 94.6 | 78.9 | 63.9 | 62.3 | 62.5 | 63.4 | 2.44 | |||
✓ | 97.2 | 84.6 | 71.9 | 72.2 | 50.9 | 71.5 | 95.1 | 79.2 | 63.4 | 62.8 | 50.1 | 63.2 | 2.55 | |||
✓ | ✓ | 96.2 | 85.2 | 71.3 | 73.5 | 66.7 | 71.5 | 94.4 | 79.9 | 63.3 | 63.9 | 62.5 | 63.6 | 2.32 | ||
✓ | ✓ | 96.4 | 85.4 | 71.8 | 73.2 | 46.8 | 71.8 | 95.3 | 80.2 | 63.8 | 63.4 | 55.1 | 63.6 | 2.02 | ||
✓ | ✓ | 96.2 | 86.8 | 71.2 | 73.3 | 50.1 | 71.4 | 94.5 | 79.9 | 64.0 | 64.1 | 60.1 | 63.9 | 2.07 | ||
✓ | ✓ | ✓ | x * | 96.5 | 86.1 | 71.2 | 72.4 | 57.9 | 71.4 | 94.5 | 78.5 | 63.4 | 63.9 | 60.0 | 63.4 | 1.88 |
✓ | ✓ | ✓ | ✓† | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
Base | λ | μ | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | ||||
1.0 | 1.5 | 2.0 | 96.3 | 85.5 | 71.4 | 72.9 | 56.7 | 71.5 | 94.5 | 78.5 | 63.1 | 64.6 | 60.0 | 63.4 | 2.21 |
1.0 | 1.5 | 2.5 | 96.4 | 87.2 | 72.0 | 72.8 | 55.1 | 71.9 | 94.5 | 80.9 | 63.7 | 64.0 | 50.1 | 63.7 | 2.00 |
1.0 | 1.5 | 3.0 | 96.2 | 84.2 | 70.9 | 71.8 | 60.4 | 70.8 | 94.4 | 80.1 | 63.2 | 64.6 | 60.0 | 63.5 | 1.97 |
1.0 | 2.0 | 2.5 | 96.7 | 86.0 | 71.0 | 70.8 | 63.3 | 70.8 | 94.5 | 80.6 | 64.5 | 63.6 | 62.8 | 64.2 | 1.96 |
1.0 | 2.0 | 3.0 | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
1.0 | 2.0 | 3.5 | 97.0 | 84.9 | 72.4 | 72.4 | 60.9 | 72.0 | 95.1 | 78.8 | 64.0 | 65.2 | 60.5 | 64.3 | 1.68 |
1.0 | 2.5 | 3.0 | 96.2 | 85.3 | 71.1 | 72.5 | 56.0 | 71.2 | 94.4 | 80.8 | 63.3 | 63.7 | 65.0 | 63.3 | 1.67 |
1.0 | 2.5 | 3.5 | 96.2 | 86.2 | 72.1 | 71.4 | 67.6 | 71.7 | 94.6 | 80.3 | 63.6 | 63.5 | 65.0 | 63.4 | 1.56 |
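In the table above, the base ROI keeps scale 1.0 while λ and μ set the enlargement factors of the two context ROIs (best at 2.0 and 3.0). Enlarging a proposal box about its center can be sketched as below; the helper assumes corner-format boxes (x1, y1, x2, y2) and is an illustrative stand-in, not the authors' implementation.

```python
def scale_roi(x1, y1, x2, y2, factor):
    """Enlarge an ROI about its center by `factor` to obtain a context ROI."""
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2            # box center
    w, h = (x2 - x1) * factor, (y2 - y1) * factor    # scaled width/height
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

print(scale_roi(10, 10, 30, 30, 2.0))  # (0.0, 0.0, 40.0, 40.0)
```

In practice the scaled box would also be clipped to the image boundary before ROIAlign; that step is omitted here for brevity.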
ROI (1.0) | ROIC1f (1.5) 1 | ROIC1 (2.0) | ROIC1b (2.5) 2 | ROIC2 (3.0) | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | ||||||
✓ | ✓ | ✓ | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 | ||
✓ | ✓ | ✓ | ✓ | 96.2 | 86.2 | 70.8 | 72.1 | 55.9 | 70.9 | 94.3 | 77.4 | 62.5 | 63.2 | 60.0 | 62.5 | 1.53 | |
✓ | ✓ | ✓ | ✓ | 96.3 | 85.7 | 71.0 | 72.6 | 58.4 | 71.2 | 94.3 | 78.8 | 63.5 | 63.8 | 60.0 | 63.5 | 1.57 | |
✓ | ✓ | ✓ | ✓ | ✓ | 97.0 | 85.7 | 71.4 | 72.4 | 62.1 | 71.3 | 95.4 | 80.8 | 63.9 | 64.1 | 62.5 | 63.9 | 1.47 |
Base | λ | μ | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | ||||
1.0 | 2.0 | 3.0 | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
1.0 | 0.7 | 0.5 | 96.3 | 83.8 | 71.2 | 70.8 | 50.2 | 70.8 | 95.3 | 78.8 | 63.5 | 63.7 | 55.1 | 63.6 | 1.90 |
Redesigned FPN? | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | ||
x * | 95.0 | 83.2 | 69.1 | 73.1 | 48.8 | 69.7 | 92.5 | 80.2 | 62.2 | 64.4 | 65.0 | 62.7 | 2.82 |
✓† | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
Redesigned FPN | Detection Task (%) | Segmentation Task (%) | FPS | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
P1 | P6 | PA | CARAFE | AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | |
-- | ✓ | 95.0 | 83.2 | 69.1 | 73.1 | 48.8 | 69.7 | 92.5 | 80.2 | 62.2 | 64.4 | 65.0 | 62.7 | 2.82 | ||
✓ | ✓ | 96.4 | 86.2 | 71.3 | 71.6 | 56.3 | 71.3 | 94.6 | 78.3 | 63.6 | 63.3 | 65.0 | 63.6 | 1.54 | ||
✓ | 96.9 | 84.8 | 71.1 | 72.5 | 48.0 | 71.2 | 94.4 | 79.3 | 63.2 | 63.9 | 60.5 | 63.3 | 1.94 | |||
✓ | ✓ | x * | 96.3 | 84.5 | 71.9 | 71.4 | 57.9 | 71.4 | 95.1 | 80.8 | 64.1 | 63.4 | 65.0 | 63.9 | 1.90 | |
✓ | ✓ | x◆ | 94.9 | 83.1 | 69.6 | 68.7 | 58.1 | 69.3 | 94.0 | 79.8 | 62.7 | 63.0 | 62.5 | 62.8 | 1.86 | |
✓ | ✓ | ✓† | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
Type | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | ||
FPN [14] | 95.0 | 83.2 | 69.1 | 73.1 | 48.8 | 69.7 | 92.5 | 80.2 | 62.2 | 64.4 | 65.0 | 62.7 | 2.82 |
B-FPN [19] | 93.2 | 82.2 | 67.8 | 69.5 | 55.0 | 68.0 | 92.3 | 76.6 | 61.6 | 63.1 | 60.0 | 61.8 | 2.80 |
HR-FPN [47] | 94.3 | 82.1 | 68.3 | 71.1 | 57.1 | 68.8 | 92.5 | 78.6 | 61.5 | 63.2 | 65.0 | 61.8 | 2.70 |
CARAFE-FPN [66] | 93.4 | 81.8 | 68.4 | 68.6 | 50.9 | 68.1 | 91.9 | 77.4 | 61.7 | 61.6 | 55.7 | 61.5 | 2.76 |
PA-FPN [67] | 94.2 | 81.3 | 68.2 | 72.0 | 58.8 | 69.1 | 92.4 | 77.5 | 61.2 | 64.5 | 55.0 | 61.9 | 2.61 |
SS-FPN [100] | 96.4 | 86.3 | 70.7 | 71.7 | 63.0 | 70.7 | 94.5 | 78.5 | 62.9 | 63.8 | 65.0 | 63.1 | 2.47 |
Quad-FPN [70] | 96.4 | 86.3 | 71.7 | 72.7 | 55.2 | 71.5 | 94.5 | 79.7 | 63.7 | 63.7 | 55.1 | 63.5 | 1.65 |
Redesigned FPN | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
Redesigned DH? | Detection Task (%) | Segmentation Task (%) | FPS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | ||
x * | 95.9 | 82.2 | 68.7 | 66.8 | 51.8 | 67.9 | 95.6 | 76.9 | 62.1 | 60.5 | 47.6 | 61.7 | 2.15 |
✓† | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
Redesigned DH | Detection Task (%) | Segmentation Task (%) | FPS | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Triple Structure | Cascaded | AP50 | AP75 | APS | APM | APL | AP | AP50 | AP75 | APS | APM | APL | AP | |
-- | -- | 95.9 | 82.2 | 68.7 | 66.8 | 51.8 | 67.9 | 95.6 | 76.9 | 62.1 | 60.5 | 47.6 | 61.7 | 2.15 |
✓ | 96.2 | 82.9 | 69.5 | 67.6 | 40.2 | 68.7 | 95.2 | 79.4 | 63.7 | 63.3 | 42.2 | 63.5 | 1.94 | |
✓ | ✓ | 97.2 | 87.0 | 71.7 | 73.1 | 54.6 | 71.9 | 94.6 | 80.9 | 64.5 | 67.4 | 62.5 | 65.1 | 1.84 |
Redesign FPN? | Redesign DH? | SIS? | Detection (%) | FPS | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Input | Backbone | RPN | ROI | AP50 | AP75 | APS | APM | APL | AP | ||||
Faster R-CNN | -- | -- | -- | -- | -- | -- | 90.3 | 75.2 | 61.4 | 66.4 | 21.5 | 62.1 | 13.65 |
✓ | 95.8 | 77.5 | 66.6 | 64.3 | 25.2 | 65.4 | 12.21 | ||||||
✓ | ✓ | 95.1 | 77.8 | 66.5 | 65.6 | 33.6 | 66.1 | 9.67 | |||||
✓ | ✓ | ✓ | 95.5 | 79.5 | 68.4 | 63.9 | 20.6 | 67.1 | 4.94 | ||||
✓ | ✓ | ✓ | ✓ | 95.7 | 81.0 | 69.0 | 67.6 | 41.4 | 68.3 | 4.00 | |||
✓ | ✓ | ✓ | ✓ | ✓ | 95.9 | 82.4 | 69.5 | 67.3 | 37.1 | 68.6 | 3.41 | ||
✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 96.3 | 83.3 | 70.5 | 71.7 | 41.9 | 70.3 | 2.34 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Shao, Z.; Zhang, X.; Wei, S.; Shi, J.; Ke, X.; Xu, X.; Zhan, X.; Zhang, T.; Zeng, T. Scale in Scale for SAR Ship Instance Segmentation. Remote Sens. 2023, 15, 629. https://doi.org/10.3390/rs15030629