A Lightweight Pedestrian Detection Engine with Two-Stage Low-Complexity Detection Network and Adaptive Region Focusing Technique
Abstract
1. Introduction
- A two-stage low-complexity neural network for pedestrian detection is proposed, which significantly reduces the number of parameters and operations of the detection network while maintaining high detection accuracy;
- An adaptive region focusing technique is proposed, which further reduces the computational complexity by removing redundant computation when detecting pedestrians in video streams;
- The proposed lightweight pedestrian detection engine has been implemented on a Xilinx Zynq-7020 FPGA to evaluate its performance and power consumption. A minimal sketch of the overall pipeline follows this list.
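To make the overall flow concrete, the following is a rough sketch of how a two-stage cascade with region focusing could be organized in a video loop. The function names (first_stage, second_stage, focus_regions), the confidence threshold, and the box-expansion heuristic are illustrative assumptions introduced here, not the exact method of this paper.

```python
# Minimal sketch (assumptions: first_stage/second_stage are the two sub-networks;
# region focusing is approximated by re-detecting only inside regions derived
# from the previous frame's detections).

def detect_frame(frame, first_stage, second_stage, focus_boxes=None, conf_thr=0.5):
    """Run the two-stage cascade on one frame (or only inside focus regions)."""
    regions = focus_boxes if focus_boxes else [(0, 0, frame.shape[1], frame.shape[0])]
    detections = []
    for (x, y, w, h) in regions:
        crop = frame[y:y + h, x:x + w]
        # Stage 1: cheap network proposes candidate pedestrian windows.
        for (px, py, pw, ph, score) in first_stage(crop):
            if score < conf_thr:
                continue
            # Stage 2: slightly deeper network verifies and refines the box.
            ok, (rx, ry, rw, rh) = second_stage(crop[py:py + ph, px:px + pw])
            if ok:
                detections.append((x + px + rx, y + py + ry, rw, rh))
    return detections

def focus_regions(detections, frame_shape, margin=0.3):
    """Adaptive region focusing (illustrative): expand last frame's boxes by a
    margin and reuse them as the only regions searched in the next frame."""
    H, W = frame_shape[:2]
    regions = []
    for (x, y, w, h) in detections:
        dx, dy = int(w * margin), int(h * margin)
        nx, ny = max(0, x - dx), max(0, y - dy)
        regions.append((nx, ny, min(W - nx, w + 2 * dx), min(H - ny, h + 2 * dy)))
    return regions or None  # fall back to full-frame detection when nothing was found
```

In a video loop, frame t+1 would call detect_frame with focus_regions computed from frame t's detections, and a full-frame pass would be scheduled periodically so that newly appearing pedestrians are not missed; the exact fallback policy is an assumption here.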
2. Related Work
3. Proposed Pedestrian Detection Engine
3.1. Proposed Two-Stage Low-Complexity Pedestrian Detection Network
3.2. Proposed Adaptive Region Focusing Technique
4. FPGA Design
5. Experimental Results
5.1. Training and Testing Dataset
5.2. Training
- (1) Pedestrian classification: pedestrian classification distinguishes whether the image inside a candidate frame is a pedestrian or background, so it is a binary classification task. We train it with the cross-entropy loss, computed for each sample.
- (2) Frame regression: frame regression reduces the positional gap between the ground-truth frame and the predicted frame. Each frame is described by four pieces of information: left border, upper border, height, and width. We therefore adopt a Euclidean-distance loss.
- (3) Joint loss function: since the network must complete the two tasks at the same time, neither (9) nor (10) alone can serve as the loss function, so a joint loss function combining both terms is introduced. (The standard forms of these losses are sketched after this list.)
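The corresponding equations (9)–(11) are not reproduced above. As a minimal sketch, the standard forms of the losses described there are given below, assuming a sample index i, a pedestrian label y_i in {0, 1}, a predicted pedestrian probability p_i, box vectors t_i and t̂_i (left border, upper border, height, width), and a balancing weight λ; this notation is introduced here for illustration and may differ from the paper's own equations.

```latex
% Sketch of the standard loss forms described above (notation introduced here).
% (a) Binary cross-entropy for pedestrian/background classification of sample i:
L^{\mathrm{cls}}_i = -\bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\bigr]

% (b) Euclidean-distance (L2) loss between the predicted box \hat{t}_i and the
%     ground-truth box t_i = (\text{left}, \text{top}, \text{height}, \text{width}):
L^{\mathrm{reg}}_i = \bigl\lVert \hat{t}_i - t_i \bigr\rVert_2^{\,2}

% (c) Joint loss over N training samples, with \lambda balancing the two tasks
%     (regression applied only to positive samples):
L = \frac{1}{N}\sum_{i=1}^{N} \Bigl( L^{\mathrm{cls}}_i + \lambda\, y_i\, L^{\mathrm{reg}}_i \Bigr)
```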
5.3. Evaluation of Detection Accuracy
5.4. Evaluation of Computational Complexity
5.5. Evaluation of Speed and Power Consumption
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; pp. 886–893.
- Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object Detection with Discriminatively Trained Part-Based Models. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1627–1645.
- Wojek, C.; Schiele, B. A performance evaluation of single and multi-feature people detection. In Pattern Recognition; Lecture Notes in Computer Science; Rigoll, G., Ed.; Springer: Berlin/Heidelberg, Germany, 2008; Volume 5096, pp. 82–91.
- Dollar, P.; Appel, R.; Belongie, S.; Perona, P. Fast Feature Pyramids for Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 1532–1545.
- He, Y.; Qin, Q.; Vychodil, J. A Pedestrian Detection Method Using SVM and CNN Multistage Classification. J. Inf. Hiding Multim. Signal Process. 2018, 9, 51–60.
- Zhang, X.; Cheng, L.; Li, B.; Hu, H. Too Far to See? Not Really!—Pedestrian Detection With Scale-Aware Localization Policy. IEEE Trans. Image Process. 2018, 27, 3703–3715.
- Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N. A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; Volume 9908, pp. 354–370.
- Lin, C.; Lu, J.; Wang, G.; Zhou, J. Graininess-Aware Deep Feature Learning for Pedestrian Detection. In European Conference on Computer Vision; Lecture Notes in Computer Science; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer: Cham, Switzerland, 2018; Volume 11213, pp. 745–761.
- Gao, Z.; Li, S.; Chen, J.; Li, Z. Pedestrian Detection Method Based on YOLO Network. Comput. Eng. 2018, 44, 215–219.
- Peng, Q.; Luo, W.; Hong, G.; Feng, M.; Xia, Y.; Yu, L.; Hao, X.; Wang, X.; Li, M. Pedestrian Detection for Transformer Substation Based on Gaussian Mixture Model and YOLO. In Proceedings of the 2016 8th International Conference on Intelligent Human-Machine Systems and Cybernetics, Hangzhou, China, 27–28 August 2016; pp. 562–565.
- Byeon, Y.-H.; Kwak, K.-C. A Performance Comparison of Pedestrian Detection Using Faster RCNN and ACF. In Proceedings of the 2017 6th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), Hamamatsu, Japan, 9–13 July 2017; pp. 858–863.
- Xiaoqian, Y.; Yujuan, S.; Liangliang, L. Pedestrian detection based on improved Faster RCNN algorithm. In Proceedings of the 2019 IEEE/CIC International Conference on Communications in China (ICCC), Changchun, China, 11–13 August 2019; pp. 346–351.
- Felzenszwalb, P.; McAllester, D.; Ramanan, D. A discriminatively trained, multiscale, deformable part model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Wang, X.; Han, T.X.; Yan, S. An HOG-LBP Human Detector with Partial Occlusion Handling. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 32–39.
- Sabzmeydani, P.; Mori, G. Detecting pedestrians by learning shapelet features. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1328–1335.
- Schwartz, W.R.; Kembhavi, A.; Harwood, D.; Davis, L.S. Human Detection Using Partial Least Squares Analysis. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 24–31.
- Wang, S.; Cheng, J.; Liu, H.; Wang, F.; Zhou, H. Pedestrian Detection via Body Part Semantic and Contextual Information With DNN. IEEE Trans. Multimed. 2018, 20, 3148–3159.
- Liu, T.; Stathaki, T. Enhanced Pedestrian Detection using Deep Learning based Semantic Image Segmentation. In Proceedings of the 2017 22nd International Conference on Digital Signal Processing, London, UK, 23–25 August 2017.
- Lin, C.-Y.; Xie, H.-X.; Zheng, H. PedJointNet: Joint Head-Shoulder and Full body Deep Network for Pedestrian Detection. IEEE Access 2019, 7, 47687–47697.
- Cao, J.; Pang, Y.; Han, J.; Gao, B.; Li, X. Taking a Look at Small-Scale Pedestrians and Occluded Pedestrians. IEEE Trans. Image Process. 2020, 29, 3143–3152.
- Yin, R. Multi-resolution generative adversarial networks for tiny-scale pedestrian detection. In Proceedings of the 2019 IEEE International Conference on Image Processing, Taipei, Taiwan, 22–25 September 2019; pp. 1665–1669.
- Li, X.; Liu, Y.; Chen, Z.; Zhou, J.; Wu, Y. Fused discriminative metric learning for low resolution pedestrian detection. In Proceedings of the 2018 25th IEEE International Conference on Image Processing, Athens, Greece, 7–10 October 2018; pp. 958–962.
- Kruthiventi, S.S.S.; Sahay, P.; Biswal, R. Low-light pedestrian detection from rgb images using multi-modal knowledge distillation. In Proceedings of the 2017 24th IEEE International Conference on Image Processing, Beijing, China, 17–20 September 2017; pp. 4207–4211.
- Xu, D.; Ouyang, W.; Ricci, E.; Wang, X.; Sebe, N. Learning Cross-Modal Deep Representations for Robust Pedestrian Detection. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4236–4244.
- Chuanyi, H.; Jinlei, Z.; Feng, L.; Shengkai, W.; Houjin, C. Design of lightweight pedestrian detection network in railway scenes. J. Phys. Conf. Ser. 2020, 1544, 012053.
- Huang, R.; Pedoeem, J.; Chen, C. YOLO-LITE: A Real-Time Object Detection Algorithm Optimized for Non-GPU Computers. In Proceedings of the 2018 IEEE International Conference on Big Data, Seattle, WA, USA, 10–13 December 2018; pp. 2503–2510.
- Dollar, P.; Wojek, C.; Schiele, B.; Perona, P. Pedestrian Detection: A Benchmark. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; Volumes 1–4; pp. 304–311.
- Luo, P.; Tian, Y.; Wang, X.; Tang, X. Switchable Deep Network for Pedestrian Detection. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 899–906.
- Tian, Y.; Luo, P.; Wang, X.; Tang, X. Deep Learning Strong Parts for Pedestrian Detection. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1904–1912.
- Angelova, A.; Krizhevsky, A.; Vanhoucke, V. Pedestrian Detection with a Large-Field-Of-View Deep Network. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation, Seattle, WA, USA, 26–30 May 2015; pp. 704–711.
- Li, J.; Liang, X.; Shen, S.; Xu, T.; Feng, J.; Yan, S. Scale-Aware Fast R-CNN for Pedestrian Detection. IEEE Trans. Multimed. 2018, 20, 985–996.
- Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Occlusion-Aware R-CNN: Detecting Pedestrians in a Crowd. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Volume 11207, pp. 657–674.
- Liu, W.; Liao, S.; Hu, W.; Liang, X.; Chen, X. Learning Efficient Single-Stage Pedestrian Detectors by Asymptotic Localization Fitting. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Volume 11218, pp. 643–659.
- Liu, W.; Liao, S.; Ren, W.; Hu, W.; Yu, Y. High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5182–5191.
- Zhang, Y.; Yi, P.; Zhou, D.; Yang, X.; Yang, D.; Zhang, Q.; Wei, X. CSANet: Channel and Spatial Mixed Attention CNN for Pedestrian Detection. IEEE Access 2020, 8, 76243–76252.
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Lin, C.; Lu, J.; Zhou, J. Multi-Grained Deep Feature Learning for Robust Pedestrian Detection. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 3608–3621.
- Zeng, X.; Ouyang, W.; Wang, X. Multi-stage Contextual Deep Learning for Pedestrian Detection. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 121–128.
- Paisitkriangkrai, S.; Shen, C.; van den Hengel, A. Strengthening the Effectiveness of Pedestrian Detection with Spatially Pooled Features. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2014; pp. 546–561.
- Hosang, J.; Omran, M.; Benenson, R.; Schiele, B. Taking a deeper look at pedestrians. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 4073–4082.
Stage | Layer | Convolution Kernel Size (CK) | Convolution Stride (CS) | Max Pooling Kernel Size (PK) | Max Pooling Stride (PS)
---|---|---|---|---|---
First stage | 1 | 6 × 4 | 1 | 2 × 2 | 2
First stage | 2 | 5 × 3 | 1 | / | /
First stage | 3 | 5 × 3 | 1 | / | /
First stage | 4 | 4 × 2 | 1 | / | /
First stage | 5 | 4 × 1 | 1 | / | /
First stage | 6 | 3 × 1 | 1 | / | /
Second stage | 1 | 5 × 3 | 1 | 2 × 2 | 2
Second stage | 2 | 5 × 2 | 1 | 2 × 2 | 2
Second stage | 3 | 6 × 2 | 1 | 2 × 2 | 2
Second stage | 4 | 4 × 2 | 1 | / | /
Second stage | 5 | 3 × 1 | 1 | / | /
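The table fixes the kernel sizes, strides, and pooling of the two stages but not the channel widths, padding, or input resolution. As a minimal sketch under those assumptions, the PyTorch-style code below assembles the two convolutional stacks from the listed hyperparameters; the channel counts (8, 16, 24, ...) are chosen arbitrarily for illustration and are not from the paper.

```python
# Sketch of the two convolutional stacks from the table above.
# Channel counts and the absence of padding are assumptions; only the kernel
# sizes, the stride of 1, and the 2x2/stride-2 max pooling follow the table.
import torch.nn as nn

def conv_block(c_in, c_out, kernel, pool=False):
    layers = [nn.Conv2d(c_in, c_out, kernel_size=kernel, stride=1), nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return layers

# First stage: six convolution layers, max pooling only after layer 1.
first_stage = nn.Sequential(
    *conv_block(3,  8,  (6, 4), pool=True),    # layer 1: CK 6x4, PK 2x2, PS 2
    *conv_block(8,  16, (5, 3)),               # layer 2: CK 5x3
    *conv_block(16, 16, (5, 3)),               # layer 3: CK 5x3
    *conv_block(16, 24, (4, 2)),               # layer 4: CK 4x2
    *conv_block(24, 24, (4, 1)),               # layer 5: CK 4x1
    *conv_block(24, 32, (3, 1)),               # layer 6: CK 3x1
)

# Second stage: five convolution layers, max pooling after layers 1-3.
second_stage = nn.Sequential(
    *conv_block(3,  16, (5, 3), pool=True),    # layer 1: CK 5x3, PK 2x2, PS 2
    *conv_block(16, 32, (5, 2), pool=True),    # layer 2: CK 5x2, PK 2x2, PS 2
    *conv_block(32, 48, (6, 2), pool=True),    # layer 3: CK 6x2, PK 2x2, PS 2
    *conv_block(48, 64, (4, 2)),               # layer 4: CK 4x2
    *conv_block(64, 64, (3, 1)),               # layer 5: CK 3x1
)
```

With stride-1, unpadded convolutions and a single pooling layer in the first stage, each stack shrinks a small input window by only a few pixels per layer, which keeps the parameter and operation counts low, consistent with the complexity figures reported later.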
Resource | LUTs | Registers | DSPs | Block RAM
---|---|---|---|---
Utilization | 87,049 | 93,981 | 396 | 14,652 Kb
Category | IoU
---|---
Positive | ≥ 0.65
Partial | 0.4 ≤ IoU < 0.65
Negative | < 0.3
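Read literally, the thresholds above partition candidate windows by their IoU with a ground-truth box. The sketch below shows that mapping, assuming boxes are given as (left, top, width, height) tuples; the function names and box format are illustrative assumptions, not from the paper.

```python
# Assign a training-sample category from the IoU thresholds in the table above.
# Boxes are (left, top, width, height); names/format are illustrative assumptions.

def iou(box_a, box_b):
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def sample_category(pred_box, gt_box):
    score = iou(pred_box, gt_box)
    if score >= 0.65:
        return "positive"
    if 0.4 <= score < 0.65:
        return "partial"
    if score < 0.3:
        return "negative"
    return None  # IoU in [0.3, 0.4): not covered by the table above
```

Note that windows with IoU in [0.3, 0.4) fall into none of the three categories, matching the table; presumably such ambiguous samples are simply excluded from training.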
Methods | Parameters | Operations | Precision | Miss Rate |
---|---|---|---|---|
OURS-CR | 0.73M | 1.04B | 85.18% | 25.16% |
OURS-SR | 0.73M | 2.75B | 85.18% | 25.16% |
SDN [28] 2 | \ | \ | 84.18% | 37.87% |
DeepParts [29] 2 | 187.10M | 6.81B | 88.70% | 11.89% |
LFOV [30] 2 | 135.27M | 0.64B | 84.64% | 35.85% |
SA-FastRCNN [31] 2 | 266.84M | 41.35B | 88.39% | 9.68% |
MS-CNN [7] 2 | ~217M | \ | 88.66% | 9.95% |
OR-CNN [32] 2 | 138.34M | 30.94B | \ | 4.1% |
ALFNet [33] 2 | 48.4M | 5.07B | \ | 22.5% |
CSP [34] 2 | ~31.23M | ~67.03B | \ | 4.5% |
CSANet [35] 2 | ~22.66M | ~11.54B | \ | 3.88% |
YOLOv3-Tiny 3 | 7.86M | 5.56B | 83.74% 1 | 40.66% 1 |
YOLOv3 [36] 3 | 61.57M | 65.86B | 88.51% 1 | 14.63% 1 |
YOLOV4 [37] 3 | 64.03M | 62.25B | 88.66% 1 | 10.21% 1 |
YOLOV5s 3 | 7.3M | 17.0B | 88.16% 1 | 25.65% 1 |
MDFL [38] 2 | ~276.69M | ~61.88B | \ | 31.46%
MultiSDP [39] 2 | \ | \ | 82.51% | 45.39% |
DBN-Mut [40] 2 | \ | \ | 79.48% | 48.22% |
SCF+AlexNet [41] 2 | 233M | 727M | 88.39% | 23.32% |
Methods | FPS | Power (W) |
---|---|---|
OURS-CR | 16.3 | 0.59 |
OURS-SR | 8.6 | 0.68 |
YOLOv3-Tiny | 12.8 | 0.95 |
YOLOv3 [36] | 5.3 | 2.43 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).