SEFPN: Scale-Equalizing Feature Pyramid Network for Object Detection
Abstract
1. Introduction
2. Related Work
2.1. Deep Object Detector
2.2. Feature Fusion
3. Framework
3.1. Overall
3.2. Multi-Level Libra
3.3. Multi-Block Libra
3.4. Refine Method
4. Experiments
4.1. Dataset and Evaluation Metrics
4.2. Implementation Details
4.3. Main Results
4.4. Computing Costs
4.5. Ablation Study
4.5.1. Effectiveness of the Block Number
4.5.2. Effectiveness of Batch Normalization Method
4.5.3. Effectiveness of Refine Method
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A Survey of the Recent Architectures of Deep Convolutional Neural Networks. arXiv 2019, arXiv:1901.06032.
2. Zou, Z.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. arXiv 2019, arXiv:1905.05055.
3. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
4. Newell, A.; Yang, K.; Deng, J. Stacked hourglass networks for human pose estimation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016.
5. Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J.; Ouyang, W. Hybrid task cascade for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
6. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
7. Girshick, R. Fast R-CNN. In Proceedings of the International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
8. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
9. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016.
10. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018.
11. Cao, G.; Xie, X.; Yang, W.; Liao, Q.; Shi, G.; Wu, J. Feature-fused SSD: Fast detection for small objects. In Proceedings of the Ninth International Conference on Graphic and Image Processing (ICGIP 2017), Qingdao, China, 14–16 October 2017; International Society for Optics and Photonics: Qingdao, China, 2018; Volume 10615, p. 106151.
12. Fu, C.-Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional Single Shot Detector. arXiv 2017, arXiv:1701.06659.
13. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016.
14. Kong, T.; Sun, F.; Liu, H.; Jiang, Y.; Li, L.; Shi, J. FoveaBox: Beyond anchor-based object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 29, 7389–7398.
15. Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 128, 642–656.
16. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
17. Guo, C.; Fan, B.; Zhang, Q.; Xiang, S.; Pan, C. AugFPN: Improving Multi-Scale Feature Learning for Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
18. Zhu, C.; He, Y.; Savvides, M. Feature selective anchor-free module for single-shot object detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
19. Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9759–9768.
20. Wang, X.; Zhang, S.; Yu, Z.; Feng, L.; Zhang, W. Scale-Equalizing Pyramid Convolution for Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
21. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
22. Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards Balanced Learning for Object Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
23. Singh, B.; Davis, L.S. An Analysis of Scale Invariance in Object Detection—SNIP. In Proceedings of the 2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
24. Singh, B.; Najibi, M.; Davis, L.S. SNIPER: Efficient Multi-Scale Training. In Proceedings of the 2018 NIPS, Vancouver Convention Center, Vancouver, BC, Canada, 8–14 November 2019.
25. Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep Learning for Generic Object Detection: A Survey. arXiv 2018, arXiv:1809.02165.
26. Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-Shot Refinement Neural Network for Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
27. Ghiasi, G.; Lin, T.Y.; Le, Q.V. NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
28. Liu, S.; Huang, D.; Wang, Y. Learning Spatial Fusion for Single-Shot Object Detection. arXiv 2019, arXiv:1911.09516.
29. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
30. Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea, 27–28 October 2019.
31. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
32. Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab Detection Toolbox and Benchmark. 2019. Available online: https://github.com/open-mmlab/mmdetection (accessed on 1 August 2021).
33. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. arXiv 2016, arXiv:1612.08242.
34. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. arXiv 2017, arXiv:1708.02002.
35. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
36. Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Chen, C.L.; Lin, D. CARAFE: Content-Aware ReAssembly of FEatures. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
Method | Backbone | Epoch | AP | AP_50 | AP_75 | AP_S | AP_M | AP_L |
---|---|---|---|---|---|---|---|---|
One-stage | ||||||||
FoveaBox * [14] | ResNet50 | 1× | 36.2 | 56.1 | 38.5 | 20.4 | 39.7 | 46.2 |
FoveaBox * [14] | ResNet101 | 1× | 38.3 | 58.3 | 40.9 | 21.4 | 42.4 | 50.0 |
FoveaBox * [14] | ResNet101 | 2× | 38.9 | 58.7 | 42.0 | 22.1 | 42.8 | 50.6 |
YOLOv2 [33] | Darknet19 | - | 21.6 | 44.0 | 19.2 | 5.0 | 22.4 | 35.5 |
YOLOv3 [10] | Darknet53 | - | 33.0 | 57.9 | 34.4 | 18.3 | 35.4 | 41.9 |
RetinaNet [34] | ResNet101 | - | 39.1 | 59.1 | 42.3 | 21.8 | 42.7 | 50.2 |
FCOS * [35] | ResNet50 | 1× | 37.0 | 56.6 | 39.4 | 20.8 | 39.8 | 46.4 |
SSD512 [13] | ResNet101 | - | 31.2 | 50.4 | 33.3 | 10.2 | 34.5 | 49.8 |
CARAFE [36] | ResNet50 | - | 38.1 | 60.7 | 41.0 | 22.8 | 41.2 | 46.9 |
Two-stage | ||||||||
Faster RCNN * [7] | ResNet50 | 1× | 36.5 | 58.7 | 39.1 | 21.5 | 39.7 | 44.6 |
Faster RCNN * [7] | ResNet101 | 1× | 38.9 | 60.9 | 42.3 | 22.4 | 42.4 | 48.3 |
Faster RCNN * [7] | ResNet101 | 2× | 39.7 | 61.4 | 43.3 | 22.3 | 42.9 | 50.4 |
Ours | ||||||||
FoveaBox w/SEFPN | ResNet50 | 1× | 37.3 | 58.0 | 39.6 | 22.0 | 41.0 | 47.7 |
FoveaBox w/SEFPN | ResNet101 | 1× | 39.1 | 59.5 | 41.7 | 22.9 | 43.0 | 50.6 |
FoveaBox w/SEFPN | ResNet101 | 2× | 39.9 | 61.0 | 42.8 | 23.3 | 43.8 | 52.6 |
Faster RCNN w/SEFPN | ResNet50 | 1× | 37.3 | 58.6 | 40.6 | 22.2 | 40.3 | 47.7 |
Faster RCNN w/SEFPN | ResNet101 | 1× | 39.6 | 60.6 | 43.4 | 22.8 | 43.5 | 52.0 |
Faster RCNN w/SEFPN | ResNet101 | 2× | 39.7 | 60.2 | 43.6 | 22.3 | 43.2 | 52.5 |
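
To make the comparison in the table above easier to read, the AP difference between each SEFPN row and the matching row without SEFPN (same detector, backbone and schedule) can be tabulated. The snippet below is only an illustrative sketch: the values are copied from the table, while the dictionary layout and the name `gains` are assumptions made for this example and are not part of the paper or its code.

```python
# AP values copied from the results table above, keyed by
# (detector, backbone, schedule). Each value is
# (AP without SEFPN, AP with SEFPN). "1x"/"2x" stand for the 1×/2× schedules.
ap = {
    ("FoveaBox", "ResNet50", "1x"): (36.2, 37.3),
    ("FoveaBox", "ResNet101", "1x"): (38.3, 39.1),
    ("FoveaBox", "ResNet101", "2x"): (38.9, 39.9),
    ("Faster RCNN", "ResNet50", "1x"): (36.5, 37.3),
    ("Faster RCNN", "ResNet101", "1x"): (38.9, 39.6),
    ("Faster RCNN", "ResNet101", "2x"): (39.7, 39.7),
}

# Absolute AP gain in points, rounded to the table's one-decimal precision.
gains = {key: round(with_sefpn - without, 1)
         for key, (without, with_sefpn) in ap.items()}

for (detector, backbone, schedule), gain in gains.items():
    print(f"{detector:12s} {backbone:10s} {schedule}: {gain:+.1f} AP")
```

On these numbers the gain ranges from 0.0 to 1.1 AP; the only configuration with no change in overall AP is Faster RCNN with ResNet101 on the 2× schedule.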
Method | FPS | FLOPs (G) | Params (M) |
---|---|---|---|
Faster RCNN w/FPN | 5.3 | 207.07 | 41.53 |
Faster RCNN w/SEFPN | 4.8 | 207.17 | 41.57 |
FoveaBox w/FPN | 7.3 | 206.3 | 36.19 |
FoveaBox w/SEFPN | 6.5 | 211.9 | 36.46 |
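
The cost figures above can be turned into relative overheads with a few lines of arithmetic. The following is a minimal sketch under stated assumptions: the numbers are taken from the table, and the helper `relative_change` and the `costs` layout are hypothetical names introduced only for this illustration.

```python
# Values copied from the cost table above: (FPS, FLOPs in G, Params in M).
costs = {
    "Faster RCNN": {"FPN": (5.3, 207.07, 41.53), "SEFPN": (4.8, 207.17, 41.57)},
    "FoveaBox": {"FPN": (7.3, 206.3, 36.19), "SEFPN": (6.5, 211.9, 36.46)},
}


def relative_change(base, new):
    """Return the percentage change from base to new."""
    return 100.0 * (new - base) / base


# For each detector, the relative change of (FPS, FLOPs, Params)
# when FPN is replaced by SEFPN.
overhead = {}
for detector, rows in costs.items():
    overhead[detector] = tuple(
        relative_change(base, new)
        for base, new in zip(rows["FPN"], rows["SEFPN"])
    )
    fps, flops, params = overhead[detector]
    print(f"{detector}: FPS {fps:+.1f}%, FLOPs {flops:+.2f}%, Params {params:+.2f}%")
```

Read this way, replacing FPN with SEFPN costs roughly 9 to 11% of throughput in these measurements, while FLOPs grow by less than 3% and parameters by less than 1%.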
Block | AP | AP_50 | AP_75 | AP_S | AP_M | AP_L |
---|---|---|---|---|---|---|
1 | 37.0 | 57.5 | 39.2 | 22.1 | 40.9 | 47.1 |
2 | 37.3 | 57.9 | 39.3 | 21.8 | 41.1 | 48.5 |
4 | 36.9 | 57.8 | 38.9 | 21.7 | 41.0 | 47.2 |
Type | AP | AP_50 | AP_75 | AP_S | AP_M | AP_L |
---|---|---|---|---|---|---|
None | 37.0 | 57.5 | 39.2 | 21.2 | 40.9 | 47.1 |
BN_pre | 36.1 | 57.2 | 38.0 | 21.0 | 40.3 | 46.3 |
BN_post | 37.3 | 58.0 | 39.6 | 22.0 | 41.0 | 47.7 |
Type | AP | AP_50 | AP_75 | AP_S | AP_M | AP_L |
---|---|---|---|---|---|---|
None | 36.9 | 57.2 | 39.2 | 21.0 | 40.7 | 47.5 |
NonLocal | 37.3 | 57.9 | 39.3 | 21.8 | 41.1 | 48.5 |
GC Block | 37.2 | 57.9 | 39.2 | 22.6 | 41.1 | 47.2 |
GC Block+ | 36.9 | 57.4 | 39.2 | 21.9 | 40.9 | 47.4 |
Conv | 37.1 | 57.1 | 39.5 | 20.5 | 40.7 | 48.4 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Z.; Qiu, X.; Li, Y. SEFPN: Scale-Equalizing Feature Pyramid Network for Object Detection. Sensors 2021, 21, 7136. https://doi.org/10.3390/s21217136