Novel Joint Object Detection Algorithm Using Cascading Parallel Detectors
Abstract
1. Introduction
- We propose a new object detection algorithm, RCGrid R-CNN, built from cascaded detector modules, each containing an anchor-based branch and an anchor-free branch in parallel. This design improves both anchor box prediction and object classification.
- We design a new model with the anchor-based and anchor-free branches in parallel, which improves the accuracy of both anchor boxes and object classification over Grid R-CNN. In the anchor-based branch, GA-RPN is employed to obtain more accurate anchor shapes. In the anchor-free branch, FSAF runs in parallel with the anchor-based branch to obtain more precise anchor box predictions. Finally, we combine the object features extracted by the two branches to improve object classification.
- Using this two-branch model as the detector module, we cascade multiple detector modules for regression and classification, and improve anchor box and classification quality by gradually increasing the IoU threshold.
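The effect of a gradually increasing IoU threshold can be illustrated with a minimal sketch. It only shows how the set of positive proposals shrinks from one cascade stage to the next; the box refinement that a real detector module performs between stages is omitted. The function names and the thresholds 0.5, 0.6, 0.7 (the values commonly associated with Cascade R-CNN [25]) are assumptions for illustration, not the settings confirmed for RCGrid R-CNN.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positives_per_stage(
    proposals: Sequence[Box],
    gt: Box,
    thresholds: Sequence[float] = (0.5, 0.6, 0.7),  # assumed values
) -> List[List[int]]:
    """For each stage threshold, list the proposal indices counted as positive."""
    return [
        [i for i, p in enumerate(proposals) if iou(p, gt) >= t]
        for t in thresholds
    ]


# Hypothetical ground-truth box and proposals of decreasing overlap.
gt_box = (0.0, 0.0, 10.0, 10.0)
candidate_boxes = [
    (0.0, 0.0, 10.0, 10.0),  # IoU 1.00
    (0.0, 0.0, 10.0, 6.5),   # IoU 0.65
    (0.0, 0.0, 10.0, 5.5),   # IoU 0.55
    (0.0, 0.0, 10.0, 4.0),   # IoU 0.40
]
stages = positives_per_stage(candidate_boxes, gt_box)
print(stages)  # [[0, 1, 2], [0, 1], [0]]
```

Each later stage keeps only the better-localized proposals as positives, which is why the stages are trained in sequence rather than with a single high threshold from the start.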
2. Related Work
3. Model
- In the anchor-based branch, the shape prediction of GA-RPN is combined with the location prediction of Grid R-CNN to improve the predicted anchor box shape, which yields more accurate anchor boxes;
- Simultaneously, the FSAF branch (the anchor-free branch), running in parallel with the anchor-based branch, is employed to select more appropriate anchor boxes and object features;
- The detector modules are cascaded to process anchor boxes and image features, achieving more accurate detection by gradually increasing the IoU threshold during training.
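The two branch-level ideas above can be sketched in a few lines. The first function follows the shape transform reported in the guided anchoring paper [8], where a predicted offset is scaled by the feature stride and an empirical factor; the second follows the online feature selection idea of FSAF [16], which assigns an instance to the pyramid level with the smallest loss. The function names, the level names, the loss values, and `sigma = 8` are assumptions for illustration and are not confirmed as the RCGrid R-CNN settings.

```python
import math
from typing import Dict, Tuple


def decode_anchor_shape(
    dw: float, dh: float, stride: int, sigma: float = 8.0
) -> Tuple[float, float]:
    """Map predicted offsets (dw, dh) to an anchor width and height.

    Uses w = sigma * stride * exp(dw) and h = sigma * stride * exp(dh),
    the transform described for guided anchoring; sigma is an assumed
    empirical scale factor.
    """
    return sigma * stride * math.exp(dw), sigma * stride * math.exp(dh)


def select_feature_level(loss_per_level: Dict[str, float]) -> str:
    """FSAF-style selection: return the pyramid level with the lowest loss."""
    return min(loss_per_level, key=loss_per_level.get)


# With zero offsets the anchor is a square of side sigma * stride.
width, height = decode_anchor_shape(0.0, 0.0, stride=16)
print(width, height)  # 128.0 128.0

# Hypothetical per-level losses for a single object instance.
best_level = select_feature_level({"P3": 0.9, "P4": 0.4, "P5": 0.7})
print(best_level)  # P4
```

Predicting offsets in log space keeps the decoded width and height positive and lets one output range cover objects of very different sizes.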
3.1. Anchor Box Prediction And Selection
3.2. Cascading Detectors
3.3. Implementation Details
4. Experiments
4.1. Experimental Procedure
Ablation Experiment
4.2. Results of Contrast Experiment with Baseline Models and Analysis
4.2.1. Pascal VOC2007 Experimental Results
4.2.2. COCO2017 Dataset Experimental Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- [1] Zou, Z.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. arXiv 2019, arXiv:1905.05055.
- [2] Szegedy, C.; Toshev, A.; Erhan, D. Deep neural networks for object detection. Adv. Neural Inf. Process. Syst. 2013, 26, 2553–2561.
- [3] Chen, Z.M.; Wei, X.S.; Wang, P.; Guo, Y. Multi-label image recognition with graph convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5177–5186.
- [4] Sathyanarayana, A.; Sadjadi, S.O.; Hansen, J.H. Leveraging sensor information from portable devices towards automatic driving maneuver recognition. In Proceedings of the 2012 15th International IEEE Conference on Intelligent Transportation Systems, Anchorage, AK, USA, 16–19 September 2012; pp. 660–665.
- [5] Jin, L.; Li, S.; La, H.M.; Zhang, X.; Hu, B. Dynamic task allocation in multi-robot coordination for moving target tracking: A distributed approach. Automatica 2019, 100, 75–81.
- [6] Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 9759–9768.
- [7] Kong, T.; Sun, F.; Liu, H.; Jiang, Y.; Li, L.; Shi, J. FoveaBox: Beyound Anchor-Based Object Detection. IEEE Trans. Image Process. 2020, 29, 7389–7398.
- [8] Wang, J.; Chen, K.; Yang, S.; Loy, C.C.; Lin, D. Region proposal by guided anchoring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 2965–2974.
- [9] Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
- [10] Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High performance visual tracking with siamese region proposal network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8971–8980.
- [11] Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective search for object recognition. Int. J. Comput. Vis. 2013, 104, 154–171.
- [12] Henderson, P.; Ferrari, V. End-to-end training of object class detectors for mean average precision. In Asian Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 198–213.
- [13] Lu, X.; Li, B.; Yue, Y.; Li, Q.; Yan, J. Grid R-CNN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 7363–7372.
- [14] Sun, Q.S.; Zeng, S.G.; Liu, Y.; Heng, P.A.; Xia, D.S. A new method of feature fusion and its application in image recognition. Pattern Recognit. 2005, 38, 2437–2448.
- [15] Bochinski, E.; Senst, T.; Sikora, T. Extending IOU based multi-object tracking by visual information. In Proceedings of the 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 27–30 November 2018; pp. 1–6.
- [16] Zhu, C.; He, Y.; Savvides, M. Feature selective anchor-free module for single-shot object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 840–849.
- [17] Hosang, J.; Benenson, R.; Schiele, B. Learning non-maximum suppression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4507–4515.
- [18] Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- [19] Everingham, M.; Eslami, S.A.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The Pascal visual object classes challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
- [20] Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint triplets for object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, South Korea, 27 October–3 November 2019; pp. 6569–6578.
- [21] Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, South Korea, 27 October–3 November 2019; pp. 9627–9636.
- [22] Li, Z.; Peng, C.; Yu, G.; Zhang, X.; Deng, Y.; Sun, J. Light-Head R-CNN: In Defense of Two-Stage Object Detector. arXiv 2017, arXiv:1711.07264.
- [23] Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- [24] Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- [25] Cai, Z.; Vasconcelos, N. Cascade R-CNN: High Quality Object Detection and Instance Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2019.
- [26] Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. Adv. Neural Inf. Process. Syst. 2018, 31, 8778–8788.
- [27] Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv 2019, arXiv:1906.07155.
- [28] He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- [29] Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- [30] Fu, C.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional Single Shot Detector. arXiv 2017, arXiv:1701.06659.
- [31] Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-shot refinement neural network for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4203–4212.
- [32] Kosiorek, A.; Sabour, S.; Teh, Y.W.; Hinton, G.E. Stacked capsule autoencoders. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 15512–15522.

Method | Backbone | AP |
---|---|---|
Grid R-CNN [13] | ResNet-50 | 35.9 |
SGrid R-CNN | ResNet-50 | 36.5 |
FGrid R-CNN | ResNet-50 | 36.3 |
CGrid R-CNN | ResNet-50 | 36.8 |

Method | Backbone | AP |
---|---|---|
Grid R-CNN w FPN [13] | ResNet-50 | 55.3 |
Cascade R-CNN w FPN [25] | ResNet-50 | 54.9 |
RCGrid R-CNN w FPN | ResNet-50 | 55.5 |

Method | Backbone | AP | AP_50 | AP_75 | AP_S | AP_M | AP_L |
---|---|---|---|---|---|---|---|
Faster R-CNN w FPN [9] | ResNet-101 | 39.5 | 61.2 | 43.1 | 22.7 | 43.7 | 50.8 |
Grid R-CNN w FPN [13] | ResNet-101 | 41.2 | 60.3 | 44.4 | 23.4 | 45.8 | 54.1 |
Cascade R-CNN w FPN [25] | ResNet-101 | 42.7 | 61.6 | 46.6 | 23.8 | 46.2 | 57.4 |
RCGrid R-CNN w FPN | ResNet-101 | 43.1 | 61.4 | 46.9 | 23.9 | 46.6 | 58.0 |

Method | Backbone | AP | AP_50 | AP_75 | AP_S | AP_M | AP_L |
---|---|---|---|---|---|---|---|
SSD-513 [23] | ResNet-101 | 31.2 | 50.4 | 33.3 | 10.2 | 34.5 | 49.8 |
DSSD-513 [30] | ResNet-101 | 33.2 | 53.3 | 35.2 | 13.0 | 35.4 | 51.1 |
RefineDet-512 [31] | ResNet-101 | 36.4 | 57.5 | 39.5 | 16.6 | 39.9 | 51.4 |
Faster R-CNN++ [9] | ResNet-101 | 34.9 | 55.7 | 37.4 | 15.6 | 38.7 | 50.9 |
Faster R-CNN w FPN [9] | ResNet-101 | 36.2 | 59.1 | 39.0 | 18.2 | 39.0 | 48.2 |
Grid R-CNN w FPN [13] | ResNet-101 | 41.5 | 60.9 | 44.5 | 23.3 | 44.9 | 53.1 |
GA-RPN w FPN [8] | ResNet-101 | 39.8 | 59.2 | 43.5 | 21.8 | 40.1 | 48.0 |
Cascade R-CNN w FPN [25] | ResNet-101 | 42.8 | 62.1 | 46.3 | 23.7 | 45.5 | 55.2 |
RCGrid R-CNN w FPN | ResNet-101 | 43.3 | 62.0 | 46.8 | 23.9 | 46.0 | 55.9 |

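The margins in the last table can be read off with a short calculation. The sketch below copies the overall AP values (first AP column) of three ResNet-101 rows from that table and reports the gain of RCGrid R-CNN over each baseline; the dictionary and the helper name `ap_gain` are illustrative only.

```python
# Overall AP values copied from the final table (ResNet-101 backbone).
overall_ap = {
    "Grid R-CNN w FPN": 41.5,
    "Cascade R-CNN w FPN": 42.8,
    "RCGrid R-CNN w FPN": 43.3,
}


def ap_gain(method: str, baseline: str, table: dict) -> float:
    """AP difference between a method and a baseline, rounded to one decimal."""
    return round(table[method] - table[baseline], 1)


for name in ("Grid R-CNN w FPN", "Cascade R-CNN w FPN"):
    gain = ap_gain("RCGrid R-CNN w FPN", name, overall_ap)
    print(f"RCGrid R-CNN vs {name}: +{gain} AP")
# RCGrid R-CNN vs Grid R-CNN w FPN: +1.8 AP
# RCGrid R-CNN vs Cascade R-CNN w FPN: +0.5 AP
```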
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Zhou, Z.; Lai, Q.; Ding, S.; Liu, S. Novel Joint Object Detection Algorithm Using Cascading Parallel Detectors. Symmetry 2021, 13, 137. https://doi.org/10.3390/sym13010137