Event-Based Pedestrian Detection Using Dynamic Vision Sensors
Abstract
1. Introduction
- We propose an online pedestrian detector for asynchronous event streams. The approach identifies pedestrians directly from the event stream data collected by a DVS;
- We propose a novel event-to-frame encoding method that encodes the event stream more effectively. Compared with previous methods, it better integrates the inherent characteristics of the events and improves pedestrian detection performance;
- We construct an asynchronous feature extraction scheme that reuses intermediate features to further reduce the amount of computation. This asynchronous encoding mechanism fits well with the inherent characteristics of asynchronous event streams;
- We collected and annotated a custom pedestrian detection dataset using the DAVIS346 event sensor, and we used this dataset to evaluate the proposed event-to-frame encoding method and the asynchronous pedestrian detection framework.
2. Methodology
2.1. Event-Frame Construction
2.1.1. Event-Stream Encoding Based on Frequency
2.1.2. Event-Stream Encoding Based on Surface of Active Events
2.1.3. Event-Stream Encoding Based on LIF Neuron Model
2.1.4. Our Proposed Event-Stream Encoding Method
Algorithm 1 Neighborhood suppression time surface (NSTS)
Input: Event e = (x, y, t, p)
Output: Time surface S(x, y, p)
Initialization: S(x, y, p) ← 0 for all (x, y, p)
Initialization: T(x, y, p) ← 0 for all (x, y, p)
For each incoming event (x, y, t, p), extract S(x + i, y + j, p) (−R ≤ i ≤ R, −R ≤ j ≤ R) and update S:
    if t − T(x, y, p) ≥ T_thr then
        T(x, y, p) ← t
        for each S(x + i, y + j, p) do
            S(x + i, y + j, p) ← S(x + i, y + j, p) − 1
        end for
    end if
    S(x, y, p) ← 0
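The update rule of Algorithm 1 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that polarity p is encoded as 0 or 1, that the neighborhood is indexed as (x + i, y + j), and that neighborhoods are clipped at the image border; none of these choices is stated in the algorithm listing. The class name `NSTS` and the parameters `radius` and `t_thr` are hypothetical names for R and T_thr.

```python
import numpy as np


class NSTS:
    """Sketch of the neighborhood suppression time surface (Algorithm 1).

    Assumptions (not from the source): p is 0 or 1, arrays are indexed
    as [p, y, x], and neighborhoods are clipped at the image border.
    """

    def __init__(self, height, width, radius=1, t_thr=0.0):
        self.radius = radius  # R in Algorithm 1
        self.t_thr = t_thr    # T_thr in Algorithm 1
        # S: time surface; T: timestamp of the last accepted event per pixel.
        self.S = np.zeros((2, height, width), dtype=np.int64)
        self.T = np.zeros((2, height, width), dtype=np.float64)

    def update(self, x, y, t, p):
        """Process one event (x, y, t, p) and return the updated surface."""
        _, h, w = self.S.shape
        if t - self.T[p, y, x] >= self.t_thr:
            self.T[p, y, x] = t
            # Decrement every pixel in the (2R + 1) x (2R + 1) neighborhood.
            y0, y1 = max(0, y - self.radius), min(h, y + self.radius + 1)
            x0, x1 = max(0, x - self.radius), min(w, x + self.radius + 1)
            self.S[p, y0:y1, x0:x1] -= 1
        # The pixel that fired is always reset to zero.
        self.S[p, y, x] = 0
        return self.S


if __name__ == "__main__":
    surface = NSTS(height=5, width=5, radius=1, t_thr=10.0)
    surface.update(x=2, y=2, t=20.0, p=1)
    print(surface.S[1])
```

With this reading, an event that arrives sooner than `t_thr` after the last accepted event at the same pixel only resets that pixel and leaves its neighbors unchanged.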
2.2. Object Detection Based on CNN
2.2.1. Grids Partition Detection Model
2.2.2. Asynchronous Event Frame Detection Model
3. Experiments and Discussion
3.1. Datasets
3.2. Event-Frame Construction
3.3. Pedestrian Detection Performance
3.3.1. Comparison of Different Event Frame Encoding Methods
3.3.2. Comparison of Different CNN Detection Schemes
- (1) Grids partition detection model
- (2) Asynchronous event frame detection model
3.4. Discussion
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Mao, J.; Xiao, T.; Jiang, Y.; Cao, Z. What can help pedestrian detection? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3127–3136.
- Ye, M.; Shen, J.; Lin, G.; Xiang, T.; Shao, L.; Hoi, S.C. Deep learning for person re-identification: A survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell. 2021.
- Zhu, M.; Wu, Y. A Parallel Convolutional Neural Network for Pedestrian Detection. Electronics 2020, 9, 1478.
- Jung, J.; Bae, S.-H. Real-time road lane detection in urban areas using LiDAR data. Electronics 2018, 7, 276.
- Guo, Z.; Huang, Y.; Hu, X.; Wei, H.; Zhao, B. A Survey on Deep Learning Based Approaches for Scene Understanding in Autonomous Driving. Electronics 2021, 10, 471.
- Gallego, G.; Delbruck, T.; Orchard, G.; Bartolozzi, C.; Scaramuzza, D. Event-based Vision: A Survey. arXiv 2019, arXiv:1904.08405.
- Leñero-Bardallo, J.A.; Serrano-Gotarredona, T.; Linares-Barranco, B. A 3.6 μs Latency Asynchronous Frame-Free Event-Driven Dynamic-Vision-Sensor. IEEE J. Solid-State Circuits 2011, 46, 1443–1455.
- Lakshmi, A.; Chakraborty, A.; Thakur, C.S. Neuromorphic vision: From sensors to event-based algorithms. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1310.
- Haessig, G.; Benosman, R. A sparse coding multi-scale precise-timing machine learning algorithm for neuromorphic event-based sensors. In Proceedings of the Micro-and Nanotechnology Sensors Systems, and Applications X, Orlando, FL, USA, 15–19 April 2018; p. 106391U.
- Chen, N.F. Pseudo-labels for supervised learning on dynamic vision sensor data, applied to object detection under ego-motion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 644–653.
- Li, J.; Dong, S.; Yu, Z.; Tian, Y.; Huang, T. Event-based vision enhanced: A joint detection framework in autonomous driving. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 8–12 July 2019; pp. 1396–1401.
- Jiang, Z.; Xia, P.; Huang, K.; Stechele, W.; Chen, G.; Bing, Z.; Knoll, A. Mixed frame-/event-driven fast pedestrian detection. In Proceedings of the International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 8332–8338.
- Chen, G.; Cao, H.; Ye, C.; Zhang, Z.; Liu, X.; Mo, X.; Qu, Z.; Conradt, J.; Röhrbein, F.; Knoll, A. Multi-cue event information fusion for pedestrian detection with neuromorphic vision sensors. Front. Neurorobotics 2019, 13, 10.
- Mueggler, E.; Bartolozzi, C.; Scaramuzza, D. Fast event-based corner detection. In Proceedings of the British Machine Vision Conference (BMVC), London, UK, 4–7 September 2017.
- Mohamed, S.A.; Haghbayan, M.-H.; Heikkonen, J.; Tenhunen, H.; Plosila, J. Towards real-time edge detection for event cameras based on lifetime and dynamic slicing. In Proceedings of the Joint European-US Workshop on Applications of Invariance in Computer Vision, Ponta Delgada, Portugal, 9–14 October 1993; pp. 584–593.
- Miao, S.; Chen, G.; Ning, X.; Zi, Y.; Ren, K.; Bing, Z.; Knoll, A. Neuromorphic Vision Datasets for Pedestrian Detection, Action Recognition, and Fall Detection. Front. Neurorobotics 2019, 13.
- Li, H.; Li, G.; Ji, X.; Shi, L. Deep representation via convolutional neural network for classification of spatiotemporal event streams. Neurocomputing 2018, 299, 1–9.
- Fang, W. Leaky Integrate-and-Fire Spiking Neuron with Learnable Membrane Time Parameter. arXiv 2020, arXiv:2007.05785.
- Sironi, A.; Brambilla, M.; Bourdis, N.; Lagorce, X.; Benosman, R. HATS: Histograms of averaged time surfaces for robust event-based object classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1731–1740.
- Yan, C.; Tu, Y.; Wang, X.; Zhang, Y.; Hao, X.; Zhang, Y.; Dai, Q. Stat: Spatial-temporal attention mechanism for video captioning. IEEE Trans. Multimed. 2019, 22, 229–241.
- Choi, E.; Bahadori, M.T.; Sun, J.; Kulas, J.; Schuetz, A.; Stewart, W. Retain: An interpretable predictive model for healthcare using reverse time attention mechanism. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 3504–3512.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V. Searching for mobilenetv3. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 1314–1324.
- Berner, R.; Brandli, C.; Yang, M.; Liu, S.-C.; Delbruck, T. A 240 × 180 120 db 10 mw 12us-latency sparse output vision sensor for mobile applications. In Proceedings of the International Image Sensors Workshop, Snowbird, UT, USA, 12–16 June 2013; pp. 41–44.
- Loshchilov, I.; Hutter, F. Sgdr: Stochastic gradient descent with warm restarts. arXiv 2016, arXiv:1608.03983.
Backbone Model | Input Network Resolution | Receptive Field Size | Parameters | FLOPs | FPS (GPU Tesla V100) | FPS (CPU Xeon W-2145) |
---|---|---|---|---|---|---|
Darknet53 | 352 × 352 | 725 × 725 | 40.58 M | 35.3 G | 93.5 | 3.4 |
MobileNetV3 | 352 × 352 | 639 × 639 | 2.51 M | 307.2 M | 76.7 | 27.6 |
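
The trade-off between the two backbones can be made explicit with a few ratios. The short sketch below only restates the figures from the backbone table; the dictionary keys are hypothetical names chosen for illustration.

```python
# Figures copied from the backbone comparison table (352 x 352 input).
darknet53 = {"params_m": 40.58, "flops_g": 35.3, "fps_gpu": 93.5, "fps_cpu": 3.4}
mobilenetv3 = {"params_m": 2.51, "flops_g": 0.3072, "fps_gpu": 76.7, "fps_cpu": 27.6}

# How many times larger or more expensive Darknet53 is than MobileNetV3.
param_ratio = darknet53["params_m"] / mobilenetv3["params_m"]
flops_ratio = darknet53["flops_g"] / mobilenetv3["flops_g"]

# Throughput of MobileNetV3 relative to Darknet53 on each device.
cpu_fps_ratio = mobilenetv3["fps_cpu"] / darknet53["fps_cpu"]
gpu_fps_ratio = mobilenetv3["fps_gpu"] / darknet53["fps_gpu"]

print(f"parameters: {param_ratio:.1f}x, FLOPs: {flops_ratio:.1f}x")
print(f"CPU FPS ratio: {cpu_fps_ratio:.2f}, GPU FPS ratio: {gpu_fps_ratio:.2f}")
```

On these numbers, MobileNetV3 is far smaller and faster on the CPU, while Darknet53 remains faster on the GPU.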
Encoding Methods | AP (%) | Detect Time (ms) |
---|---|---|
Frequency [10] | 81.48 | 47.58 |
SAE [14,15] | 82.81 | |
LIF [18] | 78.92 | |
NSTS (proposed) | 84.08 | |
NSTS–SAE (proposed) | 86.37 | |
Encoding Methods | EBD AP (%) | EBD Detect Time (ms) | EBD–GP (Proposed) AP (%) | EBD–GP (Proposed) Detect Time (ms) |
---|---|---|---|---|
Frequency [10] | 81.48 | 47.58 | 81.52 | 41.89 |
SAE [14,15] | 82.81 | | 82.44 | |
LIF [18] | 78.92 | | 78.97 | |
NSTS (proposed) | 84.08 | | 83.96 | |
NSTS–SAE (proposed) | 86.37 | | 86.03 | |
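
To read the EBD versus EBD–GP comparison at a glance, the sketch below computes the AP change per encoding and the reduction in detection time. It uses only the values listed in the table; detection times are taken from the Frequency row, the only row where they are listed, and the variable names are hypothetical.

```python
# AP (%) per encoding method as (EBD, EBD-GP), copied from the table.
ap = {
    "Frequency": (81.48, 81.52),
    "SAE": (82.81, 82.44),
    "LIF": (78.92, 78.97),
    "NSTS": (84.08, 83.96),
    "NSTS-SAE": (86.37, 86.03),
}

# Detect time (ms) as listed in the Frequency row.
ebd_ms, ebd_gp_ms = 47.58, 41.89

# AP change when moving from EBD to EBD-GP, in percentage points.
ap_delta = {name: round(gp - ebd, 2) for name, (ebd, gp) in ap.items()}

# Absolute and relative reduction in detection time.
saved_ms = round(ebd_ms - ebd_gp_ms, 2)
relative_reduction = saved_ms / ebd_ms

for name, delta in ap_delta.items():
    print(f"{name}: {delta:+.2f} AP points")
print(f"detect time: {saved_ms:.2f} ms saved ({relative_reduction:.1%})")
```

The AP changes stay within a few tenths of a percentage point in either direction, while the listed detection time drops by roughly 12%.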
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wan, J.; Xia, M.; Huang, Z.; Tian, L.; Zheng, X.; Chang, V.; Zhu, Y.; Wang, H. Event-Based Pedestrian Detection Using Dynamic Vision Sensors. Electronics 2021, 10, 888. https://doi.org/10.3390/electronics10080888