Combinational Fusion and Global Attention of the Single-Shot Method for Synthetic Aperture Radar Ship Detection
Abstract
1. Introduction
2. Related Work
3. Approach
3.1. Global Attention Module
3.2. Combinational Fusion
3.3. Reducing Convolution Computation
3.4. Anchor Design
3.5. Normalization Parameter Setting
3.6. Mixed Loss Function Design
3.7. Training and Inference
4. Experiments
4.1. Evaluation Metrics
4.2. Comparative Experiment
4.2.1. Test on VOC2007 Dataset
4.2.2. Test on SSDD Dataset
4.2.3. Test on NWPU VHR-10 Dataset
4.3. Error Analysis and Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CF-SSD | Combinational Fusion Single-Shot Multi-Box Detector |
| GAM | Global Attention Module |
| CF | Combinational Fusion |
| SAR | Synthetic Aperture Radar |
| FCOS | Fully Convolutional One-Stage Object Detection |
| YOLO | You Only Look Once |
| SSD | Single-Shot Multi-Box Detector |
| CNN | Convolutional Neural Network |
| FPN | Feature Pyramid Network |
| R-CNN | Regions with CNN Features |
| DSSD | Deconvolutional Single-Shot Detector |
| RSSD | Rainbow Single-Shot Detector |
| FSSD | Feature Fusion Single-Shot Multi-Box Detector |
| FMSSD | Feature-Merged Single-Shot Detector |
| FCN | Fully Convolutional Network |
| ION | Inside-Outside Net |
| PANet | Path Aggregation Network |
| BPN | Bidirectional Pyramid Network |
| BiFPN | Bidirectional Feature Pyramid Network |
| EFPN | Extended Feature Pyramid Network |
| CBAM | Convolutional Block Attention Module |
| SE | Squeeze and Excitation |
| RPN | Region Proposal Network |
| FPS | Frames Per Second |
| ReLU | Rectified Linear Unit |
| BN | Batch Normalization |
| IOU | Intersection Over Union |
| TP | True Positive |
| FP | False Positive |
| FN | False Negative |
| NMS | Non-Maximum Suppression |
| SGD | Stochastic Gradient Descent |
| GPU | Graphics Processing Unit |
References
- Liu, W.; Ma, L.; Chen, H. Arbitrary-Oriented Ship Detection Framework in Optical Remote-Sensing Images. IEEE Geosci. Remote Sens. Lett. 2018, 15, 937–941. [Google Scholar] [CrossRef]
- Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask r-cnn. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [Green Version]
- Bell, S.; Zitnick, C.L.; Bala, K.; Girshick, R. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2874–2883. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolo9000: Better, faster, stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016. [Google Scholar]
- Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
- Qin, X.; Zhou, S.; Zou, H.; Gao, G. A CFAR Detection Algorithm for Generalized Gamma Distributed Background in High-Resolution SAR Images. IEEE Geosci. Remote Sens. Lett. 2013, 10, 806–810. [Google Scholar] [CrossRef]
- Yin, W.; Diao, W.; Wang, P.; Gao, X.; Li, Y.; Sun, X. PCAN—Part-Based Context Attention Network for Thermal Power Plant Detection in Remote Sensing Imagery. Remote Sens. 2021, 13, 1243. [Google Scholar] [CrossRef]
- Cui, Z.; Li, Q.; Cao, Z.; Liu, N. Dense Attention Pyramid Networks for Multi-Scale Ship Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8983–8997. [Google Scholar] [CrossRef]
- Fu, C.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional single shot detector. arXiv 2017, arXiv:1701.06659. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
- Everingham, M.; Van Gool, L.; Williams, C.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef] [Green Version]
- Jeong, J.; Park, H.; Kwak, N. Enhancement of SSD by concatenating feature maps for object detection. arXiv 2017, arXiv:1705.09587. [Google Scholar]
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 99, 2999–3007. [Google Scholar]
- Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-shot refinement neural network for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4203–4212. [Google Scholar]
- Wu, X.; Zhang, D.; Zhu, J.; Hoi, S.C.H. Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection. Neurocomputing 2020, 401, 1–9. [Google Scholar] [CrossRef] [Green Version]
- Li, Z.; Zhou, F. FSSD: Feature Fusion Single Shot Multibox Detector. arXiv 2017, arXiv:1712.00960. [Google Scholar]
- Liu, S.; Di, H.; Wang, Y. Receptive Field Block Net for Accurate and Fast Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Leng, J.; Liu, Y. Single-shot augmentation detector for object detection. Neural Comput. Appl. 2020, 33, 3583–3596. [Google Scholar] [CrossRef]
- Zheng, P.; Bai, H.Y.; Li, W.; Guo, H.W. Small target detection algorithm in complex background. J. Zhejiang Univ. Eng. Sci. 2020, 54, 1–8. [Google Scholar]
- Chang, Y.-L.; Anagaw, A.; Chang, L.; Wang, Y.; Hsiao, C.-Y.; Lee, W.-H. Ship Detection Based on YOLOv2 for SAR Imagery. Remote Sens. 2019, 11, 786. [Google Scholar] [CrossRef] [Green Version]
- Jin, K.; Chen, Y.; Xu, B.; Yin, J.; Wang, X.; Yang, J. A Patch-to-Pixel Convolutional Neural Network for Small Ship Detection with PolSAR Images. IEEE Trans. Geosci. Remote Sens. 2020, 58, 6623–6638. [Google Scholar] [CrossRef]
- Wei, S.; Su, H.; Ming, J.; Wang, C.; Yan, M.; Kumar, D.; Shi, J.; Zhang, X. Precise and Robust Ship Detection for High-Resolution SAR Imagery Based on HR-SDNet. Remote Sens. 2020, 12, 167. [Google Scholar] [CrossRef] [Green Version]
- Tang, G.; Zhuge, Y.; Claramunt, C.; Men, S. N-YOLO: A SAR Ship Detection Using Noise-Classifying and Complete-Target Extraction. Remote Sens. 2021, 13, 871. [Google Scholar] [CrossRef]
- Chen, L.; Shi, W.; Deng, D. Improved YOLOv3 Based on Attention Mechanism for Fast and Accurate Ship Detection in Optical Remote Sensing Images. Remote Sens. 2021, 13, 660. [Google Scholar] [CrossRef]
- Zhao, Y.; Zhao, L.; Xiong, B.; Kuang, G. Attention Receptive Pyramid Network for Ship Detection in SAR Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 2738–2756. [Google Scholar] [CrossRef]
- Yu, L.; Wu, H.; Zhong, Z.; Zheng, L.; Deng, Q.; Hu, H. TWC-Net: A SAR Ship Detection Using Two-Way Convolution and Multiscale Feature Mapping. Remote Sens. 2021, 13, 2558. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Liu, Y.; Wang, Y.; Wang, S.; Liang, T.; Zhou, Q.; Tang, Z.; Ling, H. CBNet: A Novel Composite Backbone Network Architecture for Object Detection. arXiv 2019, arXiv:1909.03625. [Google Scholar] [CrossRef]
- Iandola, F.; Moskewicz, M.; Karayev, S.; Girshick, R.; Darrell, T.; Keutzer, K. Densenet: Implementing efficient convnet descriptor pyramids. arXiv 2014, arXiv:1404.1869. [Google Scholar]
- Deng, C.; Wang, M.; Liu, L.; Liu, Y. Extended Feature Pyramid Network for Small Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020. [Google Scholar]
- Zhu, M.; Han, K.; Yu, C.; Wang, Y. Dynamic Feature Pyramid Networks for Object Detection. arXiv 2020, arXiv:2012.00779. [Google Scholar]
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020. [Google Scholar]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826. [Google Scholar]
- Li, J.; Qu, C.; Shao, J. Ship detection in SAR images based on an improved faster R-CNN. In Proceedings of the 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA), Beijing, China, 13–14 November 2017; pp. 1–6. [Google Scholar]
- NWPU VHR-10 Dataset. Available online: http://www.escience.cn/people/gongcheng/NWPU-VHR-10.html (accessed on 21 October 2021).
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In Proceedings of the 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 5–8 December 2016. [Google Scholar]
- Zhu, Y.; Zhao, C.; Wang, J.; Zhao, X.; Wu, Y.; Lu, H. CoupleNet: Coupling global structure with local parts for object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4126–4134. [Google Scholar]
- Liu Wang, X.L. Single-stage object detection using filter pyramid and atrous convolution. J. Image Graph. 2020, 25, 0102–0112. [Google Scholar]
- Han, X.; Zhong, Y.; Zhang, L. An efficient and robust integrated geospatial object detection framework for high spatial res-olution remote sensing imagery. Remote Sens. 2017, 9, 666. [Google Scholar] [CrossRef] [Green Version]
- Xu, Z.; Xin, X.; Wang, L.; Yang, R.; Pu, F. Deformable convnet with aspect ratio constrained NMS for object detection in remote sensing imagery. Remote Sens. 2017, 9, 1312. [Google Scholar] [CrossRef] [Green Version]
- Ren, Y.; Zhu, C.; Xiao, S. Deformable faster R-CNN with aggregating multi-layer features for partially occluded object detection in optical remote sensing images. Remote Sens. 2018, 10, 1470. [Google Scholar] [CrossRef] [Green Version]
- Chen, S.; Zhan, R.; Zhang, J. Geospatial object detection in remote sensing imagery based on multiscale single-shot detector with activated semantics. Remote Sens. 2018, 10, 820. [Google Scholar] [CrossRef] [Green Version]
- Guo, W.; Yang, W.; Zhang, H.; Hua, G. Geospatial object detection in high resolution satellite images based on multi-scale convolutional neural network. Remote Sens. 2018, 10, 131. [Google Scholar] [CrossRef] [Green Version]
- Wang, P.; Sun, X.; Diao, W.; Fu, K. FMSSD: Feature-Merged Single-Shot Detection for Multiscale Objects in Large-Scale Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3377–3390. [Google Scholar] [CrossRef]
Anchor coverage by the ratio of object area to image area:

| Anchor Setting | <0.004 | 0.004–0.01 | 0.01–0.05 | ≥0.05 |
|---|---|---|---|---|
| SSDD dataset | ✓ | ✓ | ✓ | ✓ |
| Default prior anchor | ✕ | ✕ | ✓ | ✓ |
| Adjusted prior anchor | ✓ | ✓ | ✓ | ✓ |
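The table above bins ground-truth ships by the ratio of object area to image area. As a minimal sketch of that binning, assuming pixel-coordinate (xmin, ymin, xmax, ymax) boxes and a hypothetical `ratio_histogram` helper (not code from the paper):

```python
from collections import Counter

# Ratio-of-object-area-to-image-area bins used in the table above.
RATIO_BINS = [(0.0, 0.004), (0.004, 0.01), (0.01, 0.05), (0.05, float("inf"))]

def ratio_histogram(boxes, image_w, image_h):
    """boxes: iterable of (xmin, ymin, xmax, ymax) in pixels for one image."""
    counts = Counter()
    image_area = float(image_w * image_h)
    for xmin, ymin, xmax, ymax in boxes:
        ratio = (xmax - xmin) * (ymax - ymin) / image_area
        for lo, hi in RATIO_BINS:
            if lo <= ratio < hi:
                counts[(lo, hi)] += 1
                break
    return counts

# Example: a 40 x 30 px ship in a 500 x 500 px chip has ratio 0.0048,
# i.e., it falls in the 0.004-0.01 bin.
print(ratio_histogram([(100, 100, 140, 130)], 500, 500))
```

Running such a histogram over the dataset and over the boxes each anchor configuration can match is one way to check the ✓/✕ coverage pattern shown above.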
Method | Backbone | Input Size | FPS | mAP |
---|---|---|---|---|
Faster RCNN [4] | VGG16 | 600 × 1000 | 7 | 0.732 |
Faster RCNN [4] | ResNet101 | 600 × 1000 | 2.4 | 0.764 |
ION [5] | VGG16 | 600 × 1000 | 1.25 | 0.765 |
R-FCN [46] | ResNet101 | 600 × 1000 | 9 | 0.805 |
R-FCN Cascade [2] | ResNet101 | 600 × 1000 | 7 | 0.810 |
CoupleNet [47] | ResNet101 | 600 × 1000 | 7 | 0.817 |
YOLOv2 [7] | Darknet19 | 352 × 352 | 81 | 0.737 |
YOLOv3 [8] | ResNet34 | 320 × 320 | − | 0.801 |
SSD300 [9] | VGG16 | 300 × 300 | 46 | 0.772 |
DSSD320 [14] | ResNet101 | 320 × 320 | 9.5 | 0.786 |
RSSD300 [17] | VGG16 | 300 × 300 | 35 | 0.785 |
FSSD300 [21] | VGG16 | 300 × 300 | 36 | 0.788 |
RefineDet320 [19] | VGG16 | 320 × 320 | 40 | 0.800 |
RFBNet300 [22] | VGG16 | 300 × 300 | − | 0.807 |
AFP-SSD [48] | VGG16 | 300 × 300 | 21 | 0.793 |
F_SE_SSD [24] | VGG16 | 300 × 300 | 35 | 0.804 |
BPN320 [20] | VGG16 | 320 × 320 | 32 | 0.803 |
CF-SSD300 | ResNet50 | 300 × 300 | 33 | 0.809 |
Component | mAP |
---|---|
Original SSD | 0.8822 |
SSD | 0.8871 |
SSD + CF | 0.8994 |
SSD + CF + Mixed loss | 0.9011 |
SSD + GAM + CF + Mixed loss | 0.9030 |
SSD + SE + CF + Mixed loss | 0.9003 |
SSD + SA + CF + Mixed loss | 0.9002 |
Method | Input Size | Backbone | FPS | mAP |
---|---|---|---|---|
SSD [9] | 300 × 300 | VGG16 | 49 | 0.887 |
SSD+FPN | 300 × 300 | ResNet50 | 40 | 0.896 |
FSSD [21] | 300 × 300 | VGG16 | 38 | 0.894 |
RetinaNet384+FPN [18] | 384 × 384 | ResNet50 | 24 | 0.878 |
RetinaNet480+FPN [18] | 480 × 480 | ResNet50 | 19 | 0.896 |
Faster RCNN [4] | 320 × 320 | ResNet50 | 5 | 0.888 |
FCOS+FPN [10] | 384 × 384 | ResNet50 | 16 | 0.901 |
CF-SSD | 300 × 300 | ResNet50 | 35 | 0.903 |
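The single-shot comparisons above report speed in FPS, whereas the NWPU VHR-10 table below reports per-image inference time; the two are reciprocals (e.g., 0.084 s per image corresponds to roughly 12 FPS). A minimal timing sketch, with a hypothetical `measure_fps` helper rather than the authors' benchmark code:

```python
import time

def measure_fps(run_inference, images, warmup=5):
    """run_inference: any callable taking one image; images: an iterable of inputs."""
    images = list(images)
    for img in images[:warmup]:
        run_inference(img)                      # warm-up passes, excluded from timing
    start = time.perf_counter()
    for img in images:
        run_inference(img)
    latency = (time.perf_counter() - start) / len(images)   # seconds per image
    return 1.0 / latency                         # frames per second

if __name__ == "__main__":
    # Dummy "model" that sleeps 28 ms per image, giving roughly 35 FPS.
    print(round(measure_fps(lambda img: time.sleep(0.028), range(50))))
```

For a real GPU model, the device should be synchronized (e.g., torch.cuda.synchronize() in PyTorch) before reading the clock, otherwise the measured latency is optimistic.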
Method | Input Size | Backbone | Inference Time (s) | mAP |
---|---|---|---|---|
R-P-Faster RCNN [49] | 512 × 512 | VGG16 | 0.155 | 0.765 |
SSD512 [9] | 512 × 512 | VGG16 | 0.061 | 0.784 |
Deformable R-FCN [50] | 512 × 512 | ResNet101 | 0.201 | 0.791 |
Faster RCNN [4] | 600 × 1000 | VGG16 | 0.16 | 0.809 |
Deformable Faster RCNN [51] | 600 × 1000 | VGG16 | − | 0.844 |
RetinaNet512 [18] | 512 × 512 | ResNet101 | 0.17 | 0.882 |
RDAS512 [52] | 512 × 512 | VGG16 | 0.057 | 0.895 |
Multi-scale CNN [53] | 512 × 512 | VGG16 | 0.11 | 0.896 |
YOLOv3 [8] | 512 × 512 | Darknet53 | 0.047 | 0.896 |
FMSSD [54] | 512 × 512 | VGG16 | − | 0.904 |
CF-SSD512 | 512 × 512 | ResNet50 | 0.084 | 0.906 |
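The mAP values across the comparison tables are derived from per-detection TP/FP decisions at an IoU threshold (see IOU, TP, FP, FN in the abbreviations). A minimal sketch of those two ingredients, assuming axis-aligned (xmin, ymin, xmax, ymax) boxes and VOC-style greedy matching; the `iou` and `count_tp_fp_fn` helpers are illustrative, not the authors' evaluation code:

```python
def iou(a, b):
    """a, b: (xmin, ymin, xmax, ymax) boxes; returns intersection over union."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def count_tp_fp_fn(detections, ground_truths, iou_thr=0.5):
    """detections: list of (score, box); ground_truths: list of boxes.
    Greedy one-to-one matching in descending score order (VOC style)."""
    matched = set()
    tp = fp = 0
    for _, det in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_gt = 0.0, None
        for gi, gt in enumerate(ground_truths):
            if gi in matched:
                continue                 # each ground truth may match at most once
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_iou, best_gt = overlap, gi
        if best_iou >= iou_thr:
            tp += 1
            matched.add(best_gt)
        else:
            fp += 1
    fn = len(ground_truths) - len(matched)
    return tp, fp, fn
```

Sweeping the detection score threshold then gives precision = TP / (TP + FP) and recall = TP / (TP + FN), the precision-recall curve, and its area (AP) per class; mAP is the mean AP over classes.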
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).