A-BFPN: An Attention-Guided Balanced Feature Pyramid Network for SAR Ship Detection
Abstract
1. Introduction
- (1) Two-stage detection methods. The representative two-stage methods include Faster R-CNN [15], the feature pyramid network (FPN) [17], Mask R-CNN [18], the region-based fully convolutional network (R-FCN) [19], and Cascade R-CNN [20]. The detection process consists of two steps: first, a series of proposal boxes is generated from candidate regions; then, those proposal boxes are classified and their positions are further regressed. R-CNN [21] was the first method to apply deep learning to object detection. Later, inspired by SPP-Net [22], Fast R-CNN [23] introduced a region-of-interest (RoI) pooling layer, which improved both processing speed and detection accuracy. Subsequently, Faster R-CNN improved on Fast R-CNN by using a region proposal network (RPN) instead of selective search to extract proposals. It is worth noting that the emergence of Faster R-CNN is an important milestone for two-stage detectors: it is composed of a backbone network, an RPN, and a bounding box regression network, and the introduction of the RPN significantly improves detection accuracy, although it also increases the cost of inference time. Mask R-CNN then followed, introducing a fully convolutional network (FCN) to generate a mask branch and proposing RoIAlign to correct the pixel misalignment caused by RoI pooling. Building on Faster R-CNN, R-FCN greatly improves detection speed through shared network computation. Cascade R-CNN alleviates the quality mismatch between training and inference through a multi-stage architecture and effectively refines the RPN candidate regions. In addition, to address multi-scale variation in object detection, many researchers have proposed multi-scale feature extraction modules. FPN builds a rich multi-scale feature pyramid from a single-resolution input image, where each pyramid level detects targets of a different scale; it integrates feature information across multiple layers and has been widely adopted by subsequent algorithms (a minimal sketch of its top-down fusion is given after this list). In [24], atrous spatial pyramid pooling (ASPP) captures objects and image context at multiple scales by applying atrous convolution at multiple sampling rates. In [25], AugFPN further taps the potential of multi-scale features by integrating three simple and effective components: consistent supervision, residual feature enhancement, and soft RoI selection. In [26], Libra R-CNN addresses imbalance in training through IoU-balanced sampling, balanced L1 loss, and a balanced feature pyramid.
- (2) One-stage detection methods. The representative one-stage methods include YOLO [13], SSD [14], RetinaNet [27], CornerNet [28], and FCOS [29]. Unlike two-stage detectors, one-stage detectors do not require an RPN; they classify and regress targets directly at each position of the feature map. The YOLO algorithm treats detection as a regression problem and uses a single CNN to realize the whole detection process: it divides the input image into grids, and the grid cell containing an object's center is responsible for predicting that object's location and category. However, each grid cell predicts only one class of objects. SSD detects objects of different sizes on multi-scale feature maps, introducing an anchor mechanism and multi-scale feature extraction layers to remedy YOLO's coarse grid and its low detection accuracy on small objects. RetinaNet overcomes the imbalance between positive and negative samples by using the focal loss (sketched after this list). More recently, anchor-free models that require no prior knowledge to design anchors have been proposed, including keypoint-based algorithms such as CornerNet and anchor-point-based algorithms such as FCOS. Compared with the two-stage detection methods based on R-CNN, one-stage detection methods improve detection speed, but at the cost of accuracy.
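To make the FPN fusion mentioned in (1) concrete, the following is a minimal PyTorch sketch of the top-down pathway: lateral 1 × 1 convolutions project each backbone level to a common width, coarser maps are upsampled and added to finer ones, and a 3 × 3 convolution smooths each output. This is an illustrative sketch under common ResNet channel-width assumptions, not the network used in this paper.

```python
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    """Minimal FPN top-down pathway (Lin et al. [17]), for illustration only."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # Lateral 1x1 convs project every backbone level to a common width.
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, 1) for c in in_channels)
        # 3x3 convs smooth the merged maps to reduce upsampling artifacts.
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, padding=1)
            for _ in in_channels)

    def forward(self, feats):  # feats: [C2, C3, C4, C5], finest to coarsest
        p = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(p) - 1, 0, -1):  # upsample-and-add, coarse to fine
            p[i - 1] = p[i - 1] + F.interpolate(p[i], size=p[i - 1].shape[-2:])
        return [sm(x) for sm, x in zip(self.smooth, p)]  # [P2, P3, P4, P5]
```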
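Likewise, the focal loss that RetinaNet in (2) uses against foreground-background imbalance follows directly from its definition FL(p_t) = -α_t(1 - p_t)^γ log(p_t) [27]. Below is a binary sketch; α = 0.25 and γ = 2 are the defaults reported in [27], and this is not the loss function of this paper.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: (1 - p_t)^gamma down-weights easy examples so the
    rare positive (e.g., ship) locations dominate the gradient."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class weighting
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()
```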
2. Methods
2.1. Problem Formulation and Method Overview
2.2. Enhanced Refinement Module
2.3. Channel Attention-Guided Fusion Network
2.4. Loss Function
3. Results
3.1. Dataset Description and Settings
3.2. Evaluation Criteria
3.3. Results on SSDD
4. Discussion
4.1. Detection Results of Different Methods
4.2. Comparison with the Existing Methods
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Li, Z.; Wu, J.; Huang, Y.; Sun, Z.; Yang, J. Ground-moving target imaging and velocity estimation based on mismatched compression for bistatic forward-looking SAR. IEEE Trans. Geosci. Remote Sens. 2016, 54, 3277–3291.
- Zhang, P.; Xu, H.; Tian, T.; Gao, P.; Li, L.; Zhao, T.; Zhang, N.; Tian, J. SEFEPNet: Scale Expansion and Feature Enhancement Pyramid Network for SAR Aircraft Detection with Small Sample Dataset. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2022, 15, 3365–3375.
- Hong, Z.; Yang, T.; Tong, X.; Zhang, Y.; Jiang, S.; Zhou, R.; Han, Y.; Wang, J.; Yang, S.; Liu, S. Multi-Scale Ship Detection from SAR and Optical Imagery Via A More Accurate YOLOv3. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2021, 14, 6083–6101.
- Du, L.; Dai, H.; Wang, Y.; Xie, X.; Wang, Z. Target discrimination based on weakly supervised learning for high-resolution SAR images in complex scenes. IEEE Trans. Geosci. Remote Sens. 2020, 58, 461–472.
- Brusch, S.; Lehner, S.; Fritz, T.; Soccorsi, M.; Soloviev, A.; van Schie, B. Ship surveillance with TerraSAR-X. IEEE Trans. Geosci. Remote Sens. 2011, 49, 1092–1103.
- Robey, F.; Fuhrmann, D.; Kelly, E. A CFAR adaptive matched filter detector. IEEE Trans. Aerosp. Electron. Syst. 1992, 28, 208–216.
- Qin, X.; Zhou, S.; Zou, H.; Gao, G. A CFAR detection algorithm for generalized gamma distributed background in high-resolution SAR images. IEEE Geosci. Remote Sens. Lett. 2013, 10, 806–810.
- Wang, C.; Wang, Z.; Zhang, H.; Zhang, B.; Wu, F. A PolSAR ship detector based on a multi-polarimetric-feature combination using visual attention. Int. J. Remote Sens. 2014, 35, 7763–7774.
- Atteia, G.; Collins, M. On the use of compact polarimetry SAR for ship detection. ISPRS J. Photogramm. Remote Sens. 2013, 80, 1–9.
- Wang, C.; Bi, F.; Chen, L.; Chen, J. A novel threshold template algorithm for ship detection in high-resolution SAR images. In Proceedings of the IEEE International Geoscience Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 100–103.
- Zhu, J.; Qiu, X.; Pan, Z.; Zhang, Y.; Lei, B. Projection shape template-based ship target recognition in TerraSAR-X images. IEEE Geosci. Remote Sens. Lett. 2017, 14, 222–226.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A. SSD: Single shot MultiBox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
- Li, J.; Qu, C.; Shao, J. Ship detection in SAR images based on an improved faster R-CNN. In Proceedings of the Conference on SAR in Big Data Era-Models, Methods and Applications (BIGSARDATA), Beijing, China, 13–14 November 2017; pp. 1–6.
- Zhang, T.; Zhang, X.; Ke, X.; Zhan, X.; Shi, J.; Wei, S.; Pan, D.; Li, J.; Su, H.; Zhou, Y.; et al. LS-SSDD-v1.0: A Deep Learning Dataset Dedicated to Small Ship Detection from Large-Scale Sentinel-1 SAR Images. Remote Sens. 2020, 12, 2997.
- Lin, T.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. In Proceedings of the Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 379–387.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 1440–1448.
- Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Guo, C.; Fan, B.; Zhang, Q.; Xiang, S.; Pan, C. AugFPN: Improving Multi-Scale Feature Learning for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12592–12601.
- Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards balanced learning for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 821–830.
- Lin, T.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Law, H.; Deng, J. CornerNet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 765–781.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9626–9635.
- He, J.; Wang, Y.; Liu, H.; Wang, N.; Wang, J. A novel automatic PolSAR ship detection method based on superpixel-level local information measurement. IEEE Geosci. Remote Sens. Lett. 2018, 15, 384–388.
- Kapur, J.; Sahoo, P.; Wong, A. A new method for gray-level picture thresholding using the entropy of the histogram. Comput. Vis. Graph. Image Process. 1985, 29, 273–285.
- Eldhuset, K. An automatic ship and ship wake detection system for spaceborne SAR images in coastal regions. IEEE Trans. Geosci. Remote Sens. 1996, 34, 1010–1019.
- Shi, Z.; Yu, X.; Jiang, Z.; Li, B. Ship detection in high-resolution optical imagery based on anomaly detector and local shape feature. IEEE Trans. Geosci. Remote Sens. 2014, 52, 4511–4523.
- Fan, Q.; Chen, F.; Cheng, M.; Lou, S.; Xiao, R.; Zhang, B.; Wang, C.; Li, J. Ship detection using a fully convolutional network with compact polarimetric SAR images. Remote Sens. 2019, 11, 2171.
- Kang, M.; Ji, K.; Leng, X.; Lin, Z. Contextual region-based convolutional neural network with multilayer fusion for SAR ship detection. Remote Sens. 2017, 9, 860.
- Guo, H.; Yang, X.; Wang, N.; Song, B.; Gao, X. A rotational Libra R-CNN method for ship detection. IEEE Trans. Geosci. Remote Sens. 2020, 58, 5772–5781.
- Li, D.; Liang, Q.; Liu, H.; Liu, Q.; Liu, H.; Liao, G. A novel multidimensional domain deep learning network for SAR ship detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13.
- Yu, Y.; Yang, X.; Li, J.; Gao, X. A cascade rotated anchor-aided detector for ship detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14.
- Lin, Z.; Ji, K.; Leng, X.; Kuang, G. Squeeze and excitation rank faster R-CNN for ship detection in SAR images. IEEE Geosci. Remote Sens. Lett. 2019, 16, 751–755.
- Woo, S.; Park, J.; Lee, J.; Kweon, I. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19.
- Zhang, T.; Zhang, X.; Shi, J.; Wei, S. HyperLi-Net: A hyper-light deep learning network for high-accurate and high-speed ship detection from synthetic aperture radar imagery. ISPRS J. Photogramm. Remote Sens. 2020, 167, 123–153.
- Du, Y.; Du, L.; Li, L. An SAR Target Detector Based on Gradient Harmonized Mechanism and Attention Mechanism. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
- Su, N.; He, J.; Yan, Y.; Zhao, C.; Xing, X. SII-Net: Spatial Information Integration Network for Small Target Detection in SAR Images. Remote Sens. 2022, 14, 422.
- Cui, Z.; Li, Q.; Cao, Z.; Liu, N. Dense attention pyramid networks for multi-scale ship detection in SAR images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8983–8997.
- Fu, J.; Sun, X.; Wang, Z.; Fu, K. An anchor-free method based on feature balancing and refinement network for multiscale ship detection in SAR images. IEEE Trans. Geosci. Remote Sens. 2021, 59, 1331–1344.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787.
- Wei, S.; Su, H.; Ming, J.; Wang, C.; Yan, M.; Kumar, D.; Shi, J.; Zhang, X. Precise and robust ship detection for high-resolution SAR imagery based on HR-SDNet. Remote Sens. 2020, 12, 167.
- Zhang, T.; Zhang, X.; Ke, X. Quad-FPN: A novel quad feature pyramid network for SAR ship detection. Remote Sens. 2021, 13, 2771.
- Zhang, X.; Huo, C.; Xu, N.; Jiang, H.; Cao, Y.; Ni, L.; Pan, C. Multitask Learning for Ship Detection from Synthetic Aperture Radar Images. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2021, 14, 8048–8062.
| Sensors | Polarization | Resolution | Position |
|---|---|---|---|
| TerraSAR-X, Sentinel-1, RadarSat-2 | VH, HV, VV, HH | 1–15 m | Offshore, inshore |
| Method | ERM | CAFN | mAP | F1-Score | P | R |
|---|---|---|---|---|---|---|
| BFPN | | | 0.994 | 0.988 | 0.989 | 0.987 |
| BFPN with ERM | √ | | 0.996 | 0.985 | 0.984 | 0.987 |
| BFPN with CAFN | | √ | 0.993 | 0.987 | 0.991 | 0.983 |
| Proposed method with ERM and CAFN | √ | √ | 0.995 | 0.988 | 0.989 | 0.987 |
| Method | ERM | CAFN | mAP | F1-Score | P | R |
|---|---|---|---|---|---|---|
| BFPN | | | 0.835 | 0.796 | 0.778 | 0.814 |
| BFPN with ERM | √ | | 0.867 | 0.834 | 0.969 | 0.732 |
| BFPN with CAFN | | √ | 0.865 | 0.815 | 0.868 | 0.767 |
| Proposed method with ERM and CAFN | √ | √ | 0.883 | 0.836 | 0.935 | 0.756 |
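As a quick consistency check on these ablation tables, the F1-Score column agrees with the usual harmonic mean of precision and recall, F1 = 2PR/(P + R), which we assume is the definition used here:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.778, 0.814), 3))  # 0.796, the BFPN row above
print(round(f1_score(0.935, 0.756), 3))  # 0.836, the proposed-method row above
```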
| Method | t (s) | FPS | Parameters (M) |
|---|---|---|---|
| BFPN | 0.123 | 8.13 | 60.40 |
| BFPN with ERM | 0.128 | 7.81 | 60.43 |
| BFPN with CAFN | 0.131 | 7.63 | 60.43 |
| Proposed method with ERM and CAFN | 0.133 | 7.52 | 60.46 |
| Method | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|
| FPN | 0.584 | 0.945 | 0.659 | 0.545 | 0.637 | 0.519 |
| BFPN | 0.594 | 0.954 | 0.662 | 0.563 | 0.647 | 0.584 |
| Proposed method | 0.596 | 0.968 | 0.699 | 0.579 | 0.654 | 0.591 |
| Methods | Entire mAP | Entire P | Entire R | Offshore mAP | Offshore P | Offshore R | Inshore mAP | Inshore P | Inshore R |
|---|---|---|---|---|---|---|---|---|---|
| Faster R-CNN [12] | 0.908 | 0.934 | 0.887 | 0.984 | 0.976 | 0.986 | 0.718 | 0.823 | 0.703 |
| FPN [17] | 0.945 | 0.955 | 0.876 | 0.988 | 0.976 | 0.981 | 0.810 | 0.869 | 0.697 |
| BFPN [26] | 0.954 | 0.962 | 0.893 | 0.994 | 0.989 | 0.987 | 0.835 | 0.778 | 0.814 |
| HR-SDNet [47] | 0.908 | 0.964 | 0.909 | 0.985 | 0.986 | 0.986 | 0.736 | 0.907 | 0.744 |
| DAPN [44] | 0.905 | 0.855 | 0.913 | 0.974 | 0.975 | 0.975 | 0.732 | 0.641 | 0.779 |
| Quad-FPN [48] | 0.952 | 0.895 | 0.857 | 0.993 | 0.973 | 0.994 | 0.846 | 0.747 | 0.877 |
| SER Faster R-CNN [39] | 0.915 | 0.861 | 0.922 | 0.982 | 0.968 | 0.983 | 0.745 | 0.663 | 0.790 |
| RIF [37] | 0.962 | 0.946 | 0.932 | 0.992 | 0.985 | 0.982 | 0.852 | 0.903 | 0.762 |
| Proposed method | 0.968 | 0.975 | 0.944 | 0.995 | 0.989 | 0.987 | 0.883 | 0.935 | 0.756 |
| Methods | Entire mAP | Entire P | Entire R | Offshore mAP | Offshore P | Offshore R | Inshore mAP | Inshore P | Inshore R |
|---|---|---|---|---|---|---|---|---|---|
| Faster R-CNN [12] | 0.630 | 0.735 | 0.658 | 0.846 | 0.815 | 0.874 | 0.253 | 0.491 | 0.291 |
| FPN [17] | 0.748 | 0.737 | 0.777 | 0.899 | 0.828 | 0.919 | 0.467 | 0.559 | 0.536 |
| BFPN [26] | 0.736 | 0.735 | 0.767 | 0.901 | 0.815 | 0.920 | 0.383 | 0.778 | 0.419 |
| HR-SDNet [47] | 0.688 | 0.849 | 0.705 | 0.883 | 0.875 | 0.899 | 0.348 | 0.760 | 0.378 |
| MTL-Det [49] | 0.717 | - | - | 0.887 | - | - | 0.387 | - | - |
| SII-Net [43] | 0.761 | 0.682 | 0.793 | 0.916 | 0.819 | 0.934 | 0.469 | 0.461 | 0.554 |
| Proposed method | 0.766 | 0.850 | 0.736 | 0.921 | 0.921 | 0.889 | 0.471 | 0.770 | 0.545 |