Pedestrian Detection by Novel Axis-Line Representation and Regression Pattern
Abstract
1. Introduction
- (1) We propose a detection pattern, ALR, which replaces the traditional 4-d bounding box with a simpler 3-d axis-line representation and regression strategy in order to obtain cleaner and stronger internal information about pedestrians in road scenes. We also propose a line-box transformation method so that the results fit the benchmark annotations. The idea of ALR can be introduced into both anchor-free and anchor-based methods.
- (2) We propose a deformable convolution base-offset initialization strategy that gives a better-aligned receptive field and further improves detection performance by forcing the aspect ratio of the deformable convolution kernel close to the pedestrian aspect ratio.
- (3) Experiments on two benchmark datasets (Caltech-USA and CityPersons) demonstrate the effectiveness and generalization of the proposed ALR pattern in both anchor-free and anchor-based methods.
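
To make the first contribution concrete, the following is a minimal sketch of an axis-line representation and a line-box transformation. It is not the authors' implementation. It assumes the three axis-line values are the horizontal position of a vertical line plus its top and bottom coordinates, and that a box is recovered with a fixed width-to-height ratio of 0.41, the ratio commonly used for full-body boxes in Caltech and CityPersons annotations. The names `AxisLine`, `line_to_box`, `box_to_line`, and `ASPECT_RATIO` are hypothetical.

```python
from dataclasses import dataclass

# Assumed width-to-height ratio of a pedestrian box. 0.41 is the fixed ratio
# commonly used for Caltech/CityPersons full-body boxes; it is an assumption
# here, not a value taken from this text.
ASPECT_RATIO = 0.41


@dataclass(frozen=True)
class AxisLine:
    """Hypothetical 3-d axis line: one x coordinate and two y coordinates."""
    x: float         # horizontal position of the vertical axis line
    y_top: float     # y coordinate of the top end (image y grows downward)
    y_bottom: float  # y coordinate of the bottom end

    @property
    def height(self) -> float:
        return self.y_bottom - self.y_top


def line_to_box(line: AxisLine, aspect_ratio: float = ASPECT_RATIO):
    """Expand an axis line into an (x1, y1, x2, y2) box of fixed aspect ratio."""
    half_width = 0.5 * aspect_ratio * line.height
    return (line.x - half_width, line.y_top, line.x + half_width, line.y_bottom)


def box_to_line(box) -> AxisLine:
    """Collapse an (x1, y1, x2, y2) box onto its vertical center line."""
    x1, y1, x2, y2 = box
    return AxisLine(x=0.5 * (x1 + x2), y_top=y1, y_bottom=y2)


if __name__ == "__main__":
    line = AxisLine(x=100.0, y_top=50.0, y_bottom=150.0)
    box = line_to_box(line)
    print(box)               # box centered on x=100 spanning y=50..150
    print(box_to_line(box))  # recovers the original axis line
```

Under these assumptions the box width carries no independent information, which is why three values are enough to describe a detection.
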
2. Related Work
2.1. Generic Object Detection
2.2. Pedestrian Detection
3. Proposed Method
3.1. Introducing ALR into the Anchor-Free Method
3.1.1. Axis-Line Representation and Regression
3.1.2. Deformable Convolution Base-Offset Initialization Strategy
3.1.3. Line-Box Transformation Method
3.2. Introducing ALR into the Anchor-Based Method
3.2.1. Axis-Line Encoder and Decoder
3.2.2. Loss Calculation Manner
4. Experiments
4.1. Datasets and Evaluation Metric
4.1.1. Caltech-USA Dataset
4.1.2. CityPersons Dataset
4.1.3. Evaluation Metric
4.2. Implementation Details
4.3. Detection Results of RPDet-Based Models
4.3.1. Overall Performance
4.3.2. Ablation Study
4.3.3. Influence of Deformable Convolution Base-Offset
4.4. Detection Results of FRCNN-Based Models
4.4.1. Overall Performance
4.4.2. Comparison with Other Methods
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Method | Reasonable | All | Near | Medium | Bare | Partial | Heavy |
|---|---|---|---|---|---|---|---|
| RPDet | 13.0 | 65.7 | 2.2 | 47.7 | 10.3 | 34.4 | 59.1 |
| | 12.2 | 64.6 | 4.6 | 47.6 | 10.0 | 34.9 | 57.3 |
| | 11.9 | 64.0 | 4.3 | 46.4 | 9.9 | 41.5 | 59.4 |
| | 10.2 | 64.9 | 3.7 | 48.1 | 8.7 | 35.7 | 56.5 |
| RPDet+ALR (ours) | 9.4 (+3.6) | 63.2 (+2.5) | 1.8 (+0.4) | 44.8 (+2.9) | 7.1 (+3.2) | 33.9 (+0.5) | 56.4 (+2.7) |

Method | Reasonable | All | Large | Middle | Small | Bare | Partial | Heavy |
---|---|---|---|---|---|---|---|---|
RPDet | 22.1 | 53.2 | 13.2 | 13.8 | 32.6 | 13.6 | 24.8 | 73.8 |
RPDet+ALR (ours) | 17.5 (+4.6) | 49.7 (+3.5) | 10.9 (+2.3) | 7.5 (+6.3) | 24.7 (+7.9) | 9.3 (+4.3) | 18.1 (+6.7) | 69.5 (+4.3) |

| Method | | K-Ratio | Reasonable | All | Near | Medium | Bare | Partial | Heavy |
|---|---|---|---|---|---|---|---|---|---|
| RPDet+ALR (ours) | 1 | 1.0 | 10.2 | 64.9 | 3.7 | 48.1 | 8.7 | 35.7 | 56.5 |
| | 2 | 0.6 | 9.8 | 64.6 | 2.8 | 48.0 | 8.2 | 38.4 | 58.3 |
| | 3 | 0.42 | 9.7 | 64.3 | 2.1 | 47.1 | 7.9 | 37.8 | 57.3 |
| | 4 | 0.33 | 9.4 | 63.2 | 1.8 | 44.8 | 7.1 | 33.9 | 56.4 |
| | 5 | 0.27 | 11.7 | 65.0 | 3.9 | 48.9 | 10.4 | 33.9 | 58.3 |
| | 6 | 0.23 | 12.5 | 65.8 | 3.7 | 49.6 | 10.6 | 36.8 | 61.8 |
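
The table above does not explain how its first two numeric columns relate. One reading, offered here only as an inference from the numbers, is that the ratio is the width-to-height ratio of a sampling grid that is 3 points wide and stretched to 2k+1 rows, with the result truncated to two decimals. The sketch below checks that reading against the listed values; `index_to_ratio`, `stretched_ratio_percent`, and `kernel_width` are hypothetical names.

```python
# Values copied from the first two numeric columns of the table above:
# row index -> listed ratio.
index_to_ratio = {1: 1.0, 2: 0.6, 3: 0.42, 4: 0.33, 5: 0.27, 6: 0.23}


def stretched_ratio_percent(k: int, kernel_width: int = 3) -> int:
    """Width/height of a kernel_width x (2k+1) grid, in whole percent.

    Integer floor division truncates the ratio, avoiding float rounding.
    The 3 x (2k+1) grid shape is an assumption, not a fact from the source.
    """
    return (100 * kernel_width) // (2 * k + 1)


# Compare the assumed formula with every listed ratio.
matches = {
    k: stretched_ratio_percent(k) == round(ratio * 100)
    for k, ratio in index_to_ratio.items()
}

if __name__ == "__main__":
    for k, ok in matches.items():
        print(k, stretched_ratio_percent(k) / 100, "matches" if ok else "differs")
```

If this reading holds, the best row in the table (ratio 0.33) corresponds to a grid three times as tall as it is wide.
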

Method | Reasonable | All | Near | Medium | Bare | Partial | Heavy |
---|---|---|---|---|---|---|---|
FRCNN | 8.6 | 63.9 | 1.9 | 47.2 | 8.2 | 28.1 | 62.1 |
FRCNN+ALR (ours) | 6.5 (+2.1) | 62.4 (+1.5) | 1.8 (+0.1) | 46.1 (+1.1) | 5.7 (+2.5) | 20.0 (+8.1) | 56.2 (+5.9) |

Method | Reasonable | All | Large | Middle | Small | Bare | Partial | Heavy |
---|---|---|---|---|---|---|---|---|
FRCNN | 13.9 | 47.0 | 8.4 | 6.4 | 23.6 | 7.8 | 14.7 | 67.5 |
FRCNN+ALR (ours) | 12.5 (+1.4) | 46.1 (+0.9) | 7.5 (+0.9) | 7.5 (−1.1) | 19.5 (+4.1) | 7.1 (+0.7) | 13.0 (+1.7) | 67.1 (+0.4) |
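
The parenthesized values in the comparison tables can be reproduced as baseline minus ours, so a positive value means the ALR variant scores lower than the baseline. The sign convention implies that lower is better; the surviving text does not name the metric. A short sketch using the two rows above, where `gains` is a hypothetical helper:

```python
# Column names and scores copied from the table above.
columns = ["Reasonable", "All", "Large", "Middle", "Small", "Bare", "Partial", "Heavy"]
frcnn = [13.9, 47.0, 8.4, 6.4, 23.6, 7.8, 14.7, 67.5]
frcnn_alr = [12.5, 46.1, 7.5, 7.5, 19.5, 7.1, 13.0, 67.1]


def gains(baseline, ours):
    """Per-column change, baseline minus ours, rounded to one decimal.

    Positive means the second method has the lower (better) score,
    assuming a lower-is-better metric.
    """
    return [round(b - o, 1) for b, o in zip(baseline, ours)]


deltas = dict(zip(columns, gains(frcnn, frcnn_alr)))

if __name__ == "__main__":
    for name, delta in deltas.items():
        print(f"{name}: {delta:+.1f}")
```

The only negative entry is the Middle column, where the baseline scores lower than the ALR variant.
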

Method | Backbone | Reasonable |
---|---|---|
DeepParts | AlexNet | 12.9 |
MS-CNN | VGG16 | 9.5 |
CompACT-Deep | VGG16 | 9.2 |
FRCNN (baseline) | ResNet50 | 8.6 |
ATT-part | VGG16 | 8.1 |
F-DNN+SS | VGG16 | 7.6 |
SA-FRCNN | VGG16 | 7.5 |
RPN+BF | VGG16 | 7.3 |
Repulsion Loss | ResNet101 | 4.0 |
FRCNN+ALR (ours) | ResNet50 | 6.5 |

Method | Backbone | Reasonable |
---|---|---|
ATT-vbb | VGG16 | 16.4 |
Adapted FRCNN | VGG16 | 15.4 |
TLL+MRF | ResNet50 | 14.4 |
FRCNN (baseline) | ResNet50 | 13.9 |
Repulsion Loss | ResNet101 | 13.7 |
OR-CNN | VGG16 | 12.8 |
FRCNN+ALR (ours) | ResNet50 | 12.5 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, M.; Liu, Q. Pedestrian Detection by Novel Axis-Line Representation and Regression Pattern. Sensors 2021, 21, 3312. https://doi.org/10.3390/s21103312