Adaptive Multi-Pedestrian Tracking by Multi-Sensor: Track-to-Track Fusion Using Monocular 3D Detection and MMW Radar
Abstract
1. Introduction
- An improved 3D monocular multi-pedestrian tracking-by-detection method is implemented, with its measurement noise modeled from the detection uncertainty reported by the 3D pedestrian detection neural network (a minimal illustrative sketch follows this list).
- A novel track-to-track fusion strategy is proposed to integrate the pedestrian tracks obtained by the MMW radar and the monocular camera. The adaptive multi-pedestrian tracking strategy automatically detects the occurrence of challenging weather, low-illumination, and cluttered conditions and handles them accordingly. In addition, the track-to-track architecture allows each sensor to track pedestrians more accurately on its own before fusion.
- The performance of the proposed tracking strategy is evaluated in both normal and challenging scenarios using the optimal sub-pattern assignment (OSPA) metric. The superiority of the fusion approach is demonstrated both qualitatively and quantitatively.
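As a concrete illustration of the first contribution, the sketch below shows a position-only Kalman measurement update in which the measurement-noise covariance R is rebuilt for every detection from the uncertainty reported by the monocular 3D detector (MonoLoco-style networks [55] output such a per-detection localization uncertainty). This is a minimal sketch under our own assumptions, not the paper's exact implementation: the constant-velocity state layout and the name `sigma_xy` are illustrative.

```python
# Minimal sketch (not the paper's exact implementation): a Kalman
# measurement update whose measurement-noise covariance R is rebuilt
# per detection from the 3D detector's reported uncertainty.
# `sigma_xy` is a hypothetical per-detection standard deviation (m),
# e.g., the aleatoric uncertainty output by a MonoLoco-style network.
import numpy as np

def kf_update(x, P, z, sigma_xy):
    """One position-only Kalman update for a constant-velocity track.

    x : (4,) state [px, py, vx, vy]
    P : (4, 4) state covariance
    z : (2,) measured BEV position from the monocular detector
    sigma_xy : (2,) detector-reported standard deviations for (px, py)
    """
    H = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])   # observe position only
    R = np.diag(np.square(sigma_xy))        # adaptive measurement noise
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x + K @ (z - H @ x)
    P_new = (np.eye(4) - K @ H) @ P
    return x_new, P_new
```

Tying R to the per-detection uncertainty makes the tracker trust nearby, confident detections more than distant, uncertain ones, rather than assuming a single fixed noise level for the whole scene.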
2. Related Work
2.1. Radar Pedestrian Tracking
2.2. Monocular 3D Pedestrian Detection
2.3. Fusion of Radar and Camera
2.4. Tracking Evaluation Metrics
3. Methods
3.1. MMW Radar Pedestrian Tracking
3.2. Monocular Vision Pedestrian Detection and Tracking
3.2.1. Bird’s-Eye View Monocular Detection
3.2.2. Tracking by Detection
3.3. Track-to-Track Fusion Strategy
3.3.1. Extrinsic Calibration
3.3.2. Fusion Strategy
Algorithm 1: Track-to-track fusion strategy for pedestrian tracking
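The body of Algorithm 1 is not reproduced in this extract. For orientation, the sketch below shows one standard track-to-track combination step, covariance intersection, which fuses two state estimates of the same pedestrian without knowing the cross-correlation between the radar and camera trackers. It is a generic illustration, not the paper's algorithm: the radar-to-camera track association and the adaptive handling of clutter and low illumination described in Section 3.3.2 are omitted.

```python
# Illustrative only: a generic covariance-intersection fusion of two
# track estimates (e.g., one from the MMW radar tracker, one from the
# monocular tracker). Consistent even with unknown cross-correlation.
import numpy as np

def covariance_intersection(x_a, P_a, x_b, P_b, omega=None):
    """Fuse two (state, covariance) track estimates of the same target.

    If omega is None, choose the weight that minimizes the trace of the
    fused covariance by a coarse 1-D grid search.
    """
    Pa_inv, Pb_inv = np.linalg.inv(P_a), np.linalg.inv(P_b)
    if omega is None:
        candidates = np.linspace(0.01, 0.99, 99)
        traces = [np.trace(np.linalg.inv(w * Pa_inv + (1 - w) * Pb_inv))
                  for w in candidates]
        omega = candidates[int(np.argmin(traces))]
    # Fused information is a convex combination of the two informations.
    P_fused = np.linalg.inv(omega * Pa_inv + (1 - omega) * Pb_inv)
    x_fused = P_fused @ (omega * Pa_inv @ x_a + (1 - omega) * Pb_inv @ x_b)
    return x_fused, P_fused, omega
```

A design note on this family of methods: because each sensor maintains its own track before fusion, a degraded sensor (e.g., the camera at night) can simply receive a smaller weight instead of corrupting a joint measurement-level filter.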
3.4. Evaluation Metric of the Tracking Performance
4. Experimental Results
4.1. Test Setup
4.2. Monocular 3D Localization Noise Model Validation
4.3. Tracking Performance with the Proposed Fusion Strategy
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
LiDAR | Light Detection and Ranging
GNN | Global Nearest Neighbour
JPDA | Joint Probabilistic Data Association
MHT | Multiple Hypothesis Tracking
MMW | Millimeter Wave
PHD | Probability Hypothesis Density
GM-PHD | Gaussian Mixture Probability Hypothesis Density
CNN | Convolutional Neural Network
ROI | Region of Interest
EKF | Extended Kalman Filter
UKF | Unscented Kalman Filter
HOG | Histogram of Oriented Gradients
OSPA | Optimal Sub-Pattern Assignment
GOSPA | Generalized Optimal Sub-Pattern Assignment
MOTP | Multiple Object Tracking Precision
MOTA | Multiple Object Tracking Accuracy
MIMO | Multiple-Input Multiple-Output
FMCW | Frequency-Modulated Continuous Wave
IF | Intermediate Frequency
SNR | Signal-to-Noise Ratio
CFAR | Constant False Alarm Rate
BEV | Bird's-Eye View
References
1. Vo, B.N.; Mallick, M.; Bar-Shalom, Y.; Coraluppi, S.; Osborne, R.; Mahler, R.; Vo, B.T. Multitarget tracking. In Wiley Encyclopedia of Electrical and Electronics Engineering; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2015.
2. Feichtenhofer, C.; Pinz, A.; Zisserman, A. Detect to Track and Track to Detect. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3057–3065.
3. Andriluka, M.; Roth, S.; Schiele, B. People-tracking-by-detection and people-detection-by-tracking. In Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Anchorage, AK, USA, 24–26 June 2008.
4. Radosavljevic, Z. A study of a target tracking method using Global Nearest Neighbor algorithm. Vojnoteh. Glas. 2006, 54, 160–167.
5. Fortmann, T.E.; Bar-Shalom, Y.; Scheffe, M. Sonar Tracking of Multiple Targets Using Joint Probabilistic Data Association. IEEE J. Ocean. Eng. 1983, 8, 173–184.
6. Reid, D.B. An Algorithm for Tracking Multiple Targets. IEEE Trans. Autom. Control 1979, 24, 843–854.
7. Kalman, R.E. A new approach to linear filtering and prediction problems. J. Fluids Eng. Trans. ASME 1960, 82, 35–45.
8. Julier, S.J.; Uhlmann, J.K. New extension of the Kalman filter to nonlinear systems. In Signal Processing, Sensor Fusion, and Target Recognition VI; International Society for Optics and Photonics: Bellingham, WA, USA, 1997; Volume 3068, pp. 182–193.
9. Welch, G.; Bishop, G. An Introduction to the Kalman Filter; Department of Computer Science, University of North Carolina: Chapel Hill, NC, USA, 2006.
10. Blackman, S.S.; Popoli, R. Design and Analysis of Modern Tracking Systems; Artech House: Norwood, MA, USA, 1999.
11. Blackman, S.S. Multiple hypothesis tracking for multiple target tracking. IEEE Aerosp. Electron. Syst. Mag. 2004, 19, 5–18.
12. Mahler, R.P. Multitarget Bayes Filtering via First-Order Multitarget Moments. IEEE Trans. Aerosp. Electron. Syst. 2003, 39, 1152–1178.
13. Arnold, E.; Al-Jarrah, O.Y.; Dianati, M.; Fallah, S.; Oxtoby, D.; Mouzakitis, A. A Survey on 3D Object Detection Methods for Autonomous Driving Applications. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3782–3795.
14. Qian, R.; Lai, X.; Li, X. 3D Object Detection for Autonomous Driving: A Survey. arXiv 2021, arXiv:2106.10823.
15. Van Berlo, B.; Elkelany, A.; Ozcelebi, T.; Meratnia, N. Millimeter Wave Sensing: A Review of Application Pipelines and Building Blocks. IEEE Sens. J. 2021, 21, 10332–10368.
16. Wang, Z.; Wu, Y.; Niu, Q. Multi-Sensor Fusion in Automated Driving: A Survey. IEEE Access 2020, 8, 2847–2868.
17. Mahler, R.P. Statistical Multisource-Multitarget Information Fusion; Artech House, Inc.: Norwood, MA, USA, 2007.
18. Song, H.; Choi, W.; Kim, H. Robust Vision-Based Relative-Localization Approach Using an RGB-Depth Camera and LiDAR Sensor Fusion. IEEE Trans. Ind. Electron. 2016, 63, 3725–3736.
19. Samal, K.; Kumawat, H.; Saha, P.; Wolf, M.; Mukhopadhyay, S. Task-driven RGB-Lidar Fusion for Object Tracking in Resource-Efficient Autonomous System. IEEE Trans. Intell. Veh. 2021, 8858, 1–11.
20. Zhao, X.; Sun, P.; Xu, Z.; Min, H.; Yu, H. Fusion of 3D LIDAR and Camera Data for Object Detection in Autonomous Vehicle Applications. IEEE Sens. J. 2020, 20, 4901–4913.
21. Yang, B.; Guo, R.; Liang, M.; Casas, S.; Urtasun, R. RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12363 LNCS, pp. 496–512.
22. Otto, C.; Gerber, W.; León, F.P.; Wirnitzer, J. A joint integrated probabilistic data association filter for pedestrian tracking across blind regions using monocular camera and radar. In Proceedings of the IEEE Intelligent Vehicles Symposium, Madrid, Spain, 3–7 June 2012; pp. 636–641.
23. Liu, F.; Sparbert, J.; Stiller, C. IMMPDA vehicle tracking system using asynchronous sensor fusion of radar and vision. In Proceedings of the IEEE Intelligent Vehicles Symposium, Eindhoven, The Netherlands, 4–6 June 2008.
24. Dimitrievski, M.; Jacobs, L.; Veelaert, P.; Philips, W. People Tracking by Cooperative Fusion of RADAR and Camera Sensors. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference, ITSC, Auckland, New Zealand, 27–30 October 2019; pp. 509–514.
25. Xu, D.; Anguelov, D.; Jain, A. PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 244–253.
26. Zhong, Z.; Liu, S.; Mathew, M.; Dubey, A. Camera radar fusion for increased reliability in ADAS applications. In Proceedings of the IS and T International Symposium on Electronic Imaging Science and Technology, Burlingame, CA, USA, 28 January–1 February 2018; pp. 1–4.
27. Kim, D.Y.; Jeon, M. Data fusion of radar and image measurements for multi-object tracking via Kalman filtering. Inf. Sci. 2014, 278, 641–652.
28. Nobis, F.; Geisslinger, M.; Weber, M.; Betz, J.; Lienkamp, M. A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection. In Proceedings of the 2019 Symposium on Sensor Data Fusion: Trends, Solutions, Applications, SDF 2019, Bonn, Germany, 15–17 October 2019.
29. Cho, H.; Seo, Y.W.; Kumar, B.V.; Rajkumar, R.R. A multi-sensor fusion system for moving object detection and tracking in urban driving environments. In Proceedings of the IEEE International Conference on Robotics and Automation, Hong Kong, China, 31 May–7 June 2014; pp. 1836–1843.
30. Chen, B.; Pei, X.; Chen, Z. Research on target detection based on distributed track fusion for intelligent vehicles. Sensors 2020, 20, 56.
31. Liu, Z.; Cai, Y.; Wang, H.; Chen, L.; Gao, H.; Jia, Y.; Li, Y. Robust Target Recognition and Tracking of Self-Driving Cars With Radar and Camera Information Fusion Under Severe Weather Conditions. IEEE Trans. Intell. Transp. Syst. 2021, 1–14.
32. Zhang, R.; Cao, S. Extending reliability of mmwave radar tracking and detection via fusion with camera. IEEE Access 2019, 7, 137065–137079.
33. Lee, K.H.; Kanzawa, Y.; Derry, M.; James, M.R. Multi-Target Track-to-Track Fusion Based on Permutation Matrix Track Association. In Proceedings of the IEEE Intelligent Vehicles Symposium, Changshu, China, 26–30 June 2018; pp. 465–470.
34. Remmas, W.; Chemori, A.; Kruusmaa, M. Diver tracking in open waters: A low-cost approach based on visual and acoustic sensor fusion. J. Field Robot. 2021, 38, 494–508.
35. Tian, Q.; Wang, K.I.; Salcic, Z. An INS and UWB fusion-based gyroscope drift correction approach for indoor pedestrian tracking. Sensors 2020, 20, 4476.
36. Nabati, R.; Qi, H. CenterFusion: Center-based radar and camera fusion for 3D object detection. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021, Waikoloa, HI, USA, 3–8 January 2021; pp. 1526–1535.
37. Wang, Z.; Miao, X.; Huang, Z.; Luo, H. Research of target detection and classification techniques using millimeter-wave radar and vision sensors. Remote Sens. 2021, 13, 1064.
38. Zhang, W.; Zhou, H.; Sun, S.; Wang, Z.; Shi, J.; Loy, C.C. Robust multi-modality multi-object tracking. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019.
39. Chang, S.; Zhang, Y.; Zhang, F.; Zhao, X.; Huang, S.; Feng, Z.; Wei, Z. Spatial attention fusion for obstacle detection using mmwave radar and vision sensor. Sensors 2020, 20, 956.
40. Wang, X.; Xu, L.; Sun, H.; Xin, J.; Zheng, N. On-Road Vehicle Detection and Tracking Using MMW Radar and Monovision Fusion. IEEE Trans. Intell. Transp. Syst. 2016, 17, 2075–2084.
41. Chen, X.; Kundu, K.; Zhang, Z.; Ma, H.; Fidler, S.; Urtasun, R. Monocular 3D Object Detection for Autonomous Driving. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2147–2156.
42. John, V.; Mita, S. RVNet: Deep Sensor Fusion of Monocular Camera and Radar for Image-Based Obstacle Detection in Challenging Environments. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11854 LNCS, pp. 351–364.
43. Cai, X.; Giallorenzo, M.; Sarabandi, K. Machine Learning-Based Target Classification for MMW Radar in Autonomous Driving. IEEE Trans. Intell. Veh. 2021, 6, 678–689.
44. Pegoraro, J.; Rossi, M. Real-Time People Tracking and Identification from Sparse mm-Wave Radar Point-Clouds. IEEE Access 2021, 9, 78504–78520.
45. Lim, H.S.; Park, H.M.; Lee, J.E.; Kim, Y.H.; Lee, S. Lane-by-lane traffic monitoring using 24.1 GHz FMCW radar system. IEEE Access 2021, 9, 14677–14687.
46. Held, P.; Steinhauser, D.; Koch, A.; Brandmeier, T.; Schwarz, U.T. A Novel Approach for Model-Based Pedestrian Tracking Using Automotive Radar. IEEE Trans. Intell. Transp. Syst. 2021, 1–14.
47. Davey, S.J.; Rutten, M.G.; Cheung, B. A Comparison of Detection Performance for Several Track-before-Detect Algorithms. EURASIP J. Adv. Signal Process. 2007, 2008, 1–10.
48. Zhao, P.; Lu, C.X.; Wang, J.; Chen, C.; Wang, W.; Trigoni, N.; Markham, A. MID: Tracking and identifying people with millimeter wave radar. In Proceedings of the 15th Annual International Conference on Distributed Computing in Sensor Systems, DCOSS 2019, Santorini Island, Greece, 29–31 May 2019; pp. 33–40.
49. Fiscante, N.; Addabbo, P.; Clemente, C.; Biondi, F.; Giunta, G.; Orlando, D. A track-before-detect strategy based on sparse data processing for air surveillance radar applications. Remote Sens. 2021, 13, 662.
50. Weng, X.; Kitani, K. Monocular 3D object detection with pseudo-LiDAR point cloud. In Proceedings of the 2019 International Conference on Computer Vision Workshop, ICCVW 2019, Seoul, Korea, 27–28 October 2019; pp. 857–866.
51. Zhou, Y.; He, Y.; Zhu, H.; Wang, C.; Li, H.; Jiang, Q. Monocular 3D Object Detection: An Extrinsic Parameter Free Approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7556–7566.
52. Hu, H.N.; Cai, Q.Z.; Wang, D.; Lin, J.; Sun, M.; Kraehenbuehl, P.; Darrell, T.; Yu, F. Joint monocular 3D vehicle detection and tracking. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 5389–5398.
53. Qin, Z.; Wang, J.; Lu, Y. MonoGRNet: A geometric reasoning network for monocular 3D object localization. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, Sanur, Bali, Indonesia, 8–12 December 2019; pp. 8851–8858.
54. Xiang, Y.; Schmidt, T.; Narayanan, V.; Fox, D. PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv 2017, arXiv:1711.00199.
55. Bertoni, L.; Kreiss, S.; Alahi, A. MonoLoco: Monocular 3D pedestrian localization and uncertainty estimation. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 6860–6870.
56. Bai, J.; Li, S.; Huang, L.; Chen, H. Robust Detection and Tracking Method for Moving Object Based on Radar and Camera Data Fusion. IEEE Sens. J. 2021, 21, 10761–10774.
57. Schuhmacher, D.; Vo, B.T.; Vo, B.N. A consistent metric for performance evaluation of multi-object filters. IEEE Trans. Signal Process. 2008, 56, 3447–3457.
58. Fridling, B.E.; Drummond, O.E. Performance evaluation methods for multiple-target-tracking algorithms. In Signal and Data Processing of Small Targets 1991; International Society for Optics and Photonics: Bellingham, WA, USA, 1991; Volume 1481, pp. 371–383.
59. Rahmathullah, A.S.; Garcia-Fernandez, A.F.; Svensson, L. Generalized optimal sub-pattern assignment metric. In Proceedings of the 20th International Conference on Information Fusion, Fusion 2017, Xi'an, China, 10–13 July 2017; pp. 1–8.
60. Bernardin, K.; Stiefelhagen, R. Evaluating multiple object tracking performance: The CLEAR MOT metrics. EURASIP J. Image Video Process. 2008, 2008, 246309.
61. Weng, X.; Wang, J.; Held, D.; Kitani, K. 3D multi-object tracking: A baseline and new evaluation metrics. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Las Vegas, NV, USA, 25–29 October 2020; pp. 10359–10366.
62. Kreiss, S.; Bertoni, L.; Alahi, A. PifPaf: Composite fields for human pose estimation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11969–11978.
63. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2961–2969.
Fusion Method | Work | Highlights
---|---|---
Single-modality | Otto et al. [22] | continue tracking across blind regions
Single-modality | Liu et al. [23] | combine interacting multiple models
Single-modality | Dimitrievski et al. [24] | calculate joint likelihood of radar and camera
Single-modality | Lee et al. [33] | update sequentially by all sensor observations
Multi-modality | Nobis et al. [28] | use ROIs to guide detection by other sensors
Multi-modality | Cho et al. [29] | camera determines class, radar provides location
Multi-modality | Bai et al. [56] | detect targets independently, track with GM-PHD
Multi-modality | Nabati et al. [36] | use radar feature maps to complement images
Multi-modality | Zhang et al. [38] | use end-to-end neural network with feature extractors
Combination | Wang et al. [40] | use both ROIs and single-modality fusion
MMW Radar Parameter | Value
---|---
Working frequency | 76–81 GHz
Max working range | 15 m
Range resolution | 0.09 m
Field of view |
Azimuth resolution |
Update rate | 15 Hz
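As a sanity check on the listed figures (our own arithmetic, not stated in the extract): an FMCW radar's range resolution is set by its swept chirp bandwidth $B$ rather than its carrier frequency,

$$\Delta R = \frac{c}{2B} \quad\Rightarrow\quad B = \frac{c}{2\,\Delta R} = \frac{3\times 10^{8}\ \mathrm{m/s}}{2 \times 0.09\ \mathrm{m}} \approx 1.67\ \mathrm{GHz},$$

which fits comfortably inside the 76–81 GHz working band.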
Test Scenario and Fusion Strategy | OSPA
---|---
Single pedestrian |
MMW radar tracking | 0.197
Monocular camera 3D tracking | 0.383
Fusion by [32] | 0.222
Proposed fusion strategy | 0.185
Multiple pedestrians |
MMW radar tracking | 1.164
Monocular camera 3D tracking | 0.970
Fusion by [32] | 0.793
Proposed fusion strategy | 0.624
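For reference, below is a minimal implementation of the OSPA metric reported in the table, following its definition in Schuhmacher et al. [57]. The order p and cutoff c used in Section 4 are not given in this extract, so the defaults here (p = 1, c = 2.0 m) are placeholders.

```python
# Minimal OSPA distance (order p, cutoff c) between two sets of 2-D
# positions, per Schuhmacher et al. [57]. Inputs are (k, 2) arrays of
# ground-truth and estimated pedestrian positions at one time step.
import numpy as np
from scipy.optimize import linear_sum_assignment

def ospa(X, Y, c=2.0, p=1):
    m, n = len(X), len(Y)
    if m == 0 and n == 0:
        return 0.0
    if m == 0 or n == 0:
        return float(c)                    # pure cardinality error
    if m > n:
        X, Y, m, n = Y, X, n, m            # ensure m <= n
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    D = np.minimum(D, c) ** p              # cutoff the per-pair distances
    row, col = linear_sum_assignment(D)    # optimal sub-pattern assignment
    cost = D[row, col].sum() + (c ** p) * (n - m)
    return float((cost / n) ** (1.0 / p))
```

Lower OSPA is better: the assignment term penalizes localization error of matched tracks, while the c^p (n − m) term penalizes cardinality mismatch, so missed or spurious tracks raise the score even when the matched tracks are accurate.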