Online Multiple Object Tracking Using Min-Cost Flow on Temporal Window for Autonomous Driving
Abstract
1. Introduction
- By leveraging the feature extraction capability of convolutional neural networks (CNNs) and incorporating metric learning, we design a three-channel neural network with ResNet50 as the backbone and triplet loss as the learning objective. The network extracts object appearance features with high discriminability. In parallel, we employ a Kalman filter (KF) with a constant-acceleration (CA) motion model to optimize and predict object bounding boxes. Together, these yield a robust object representation.
- The trajectories within the temporal window are divided into active and inactive trajectories. Affinities between each category of trajectory and the detections are computed from appearance and motion features. Data association is then performed by constructing a sparse affinity network and solving the min-cost flow problem, which reduces ID switches.
- Extensive experiments were conducted on the KITTI MOT dataset and on sequences from a real-world campus scenario. An ablation study confirms the effectiveness of the key modules, and comparisons with existing homogeneous, vision-based methods under state-of-the-art evaluation measures show that our method achieves competitive tracking performance.
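As a concrete illustration of the metric-learning objective in the first contribution, the triplet loss can be sketched as follows. The margin value and the plain-NumPy formulation are illustrative assumptions; in the paper, the embeddings would come from the three-channel ResNet50 network rather than being handcrafted vectors.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on embedding vectors: pull same-identity pairs
    together and push different-identity pairs apart by at least
    `margin` (the margin value here is an illustrative choice)."""
    d_ap = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)

# A well-separated triplet incurs zero loss; a hard one is penalized.
a = np.array([0.0, 0.0])
easy = triplet_loss(a, np.array([0.1, 0.0]), np.array([1.0, 0.0]))  # 0.0
hard = triplet_loss(a, np.array([0.1, 0.0]), np.array([0.2, 0.0]))  # 0.1
```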
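The KF prediction step under a constant-acceleration motion model can be sketched as below. The 6-dimensional state layout `[x, y, vx, vy, ax, ay]` and the simplified isotropic process noise are assumptions for demonstration, not the paper's exact configuration.

```python
import numpy as np

def ca_transition(dt):
    """State transition matrix for a constant-acceleration model.

    Assumed state layout: [x, y, vx, vy, ax, ay].
    """
    F = np.eye(6)
    F[0, 2] = F[1, 3] = dt           # position += velocity * dt
    F[2, 4] = F[3, 5] = dt           # velocity += acceleration * dt
    F[0, 4] = F[1, 5] = 0.5 * dt**2  # position += 0.5 * acceleration * dt^2
    return F

def kf_predict(x, P, dt, q=1e-2):
    """One Kalman prediction step: propagate state and covariance."""
    F = ca_transition(dt)
    Q = q * np.eye(6)  # simplified isotropic process noise
    return F @ x, F @ P @ F.T + Q

# Object at (0, 0) moving +x at 10 units/s, accelerating 2 units/s^2:
x = np.array([0.0, 0.0, 10.0, 0.0, 2.0, 0.0])
x_pred, P_pred = kf_predict(x, np.eye(6), dt=0.1)
# x_pred[0] = 0 + 10*0.1 + 0.5*2*0.1**2 = 1.01
```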
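The min-cost flow association in the second contribution can be illustrated with a small sketch. The affinity values, node names, integer cost scaling, and the use of `networkx` are assumptions for demonstration; the paper solves the flow problem on a sparse affinity network built over the temporal window, with separate handling of active and inactive trajectories.

```python
import networkx as nx

def associate(affinity):
    """Match trajectories to detections by solving a min-cost flow problem.

    `affinity` maps (trajectory, detection) pairs to a similarity in (0, 1];
    pairs below a gating threshold are simply absent, keeping the graph sparse.
    A full tracker would also add skip edges for unmatched tracks/detections.
    """
    G = nx.DiGraph()
    tracks = {t for t, _ in affinity}
    dets = {d for _, d in affinity}
    n = min(len(tracks), len(dets))
    G.add_node("S", demand=-n)  # source pushes n units of flow
    G.add_node("T", demand=n)   # sink absorbs them
    for t in tracks:
        G.add_edge("S", t, capacity=1, weight=0)
    for d in dets:
        G.add_edge(d, "T", capacity=1, weight=0)
    for (t, d), a in affinity.items():
        # higher affinity -> lower cost; scale to integers for the solver
        G.add_edge(t, d, capacity=1, weight=int(round((1.0 - a) * 1000)))
    flow = nx.min_cost_flow(G)
    return [(t, d) for (t, d) in affinity if flow[t].get(d, 0) == 1]

pairs = associate({("trk1", "det1"): 0.9, ("trk1", "det2"): 0.4,
                   ("trk2", "det2"): 0.8})
# -> [("trk1", "det1"), ("trk2", "det2")]
```

Because each track and detection has unit capacity, the optimal flow is a one-to-one matching that minimizes total association cost across the whole window at once, which is what curbs ID switches compared with greedy frame-by-frame matching.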
2. Related Work
2.1. Tracking by Detection
2.2. Joint Detection and Tracking
3. Method
3.1. Feature Extraction
3.1.1. Appearance Feature Extraction
3.1.2. Motion Model
3.2. Data Association by Min-Cost Flow on Temporal Window
3.2.1. Affinity Metrics
- (1) Appearance Affinity Metric
- (2) Motion Affinity Metric
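The two affinity terms named above can be sketched as follows. Cosine similarity between appearance embeddings and bounding-box IoU between a KF-predicted box and a detection are common choices and are assumptions here, not necessarily the paper's exact formulas.

```python
import numpy as np

def appearance_affinity(f1, f2):
    """Cosine similarity between two appearance embedding vectors."""
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    return float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2)))

def motion_affinity(box_a, box_b):
    """IoU between a predicted box and a detection, both (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Identical embeddings give affinity 1.0; overlapping boxes give IoU in (0, 1).
app = appearance_affinity([1.0, 0.0], [1.0, 0.0])   # 1.0
iou = motion_affinity((0, 0, 2, 2), (1, 1, 3, 3))   # 1/7
```

The two terms are typically combined (e.g., as a weighted sum) into a single edge cost for the affinity network.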
3.2.2. Data Association by Min-Cost Flow on Temporal Window
4. Experiments
4.1. Datasets
4.1.1. KITTI MOT Dataset
4.1.2. Real-World Campus Scenario Sequences
4.2. MOT Evaluation Metrics
4.3. Object Appearance Feature Extraction Network Implementation
4.4. Ablation Study
4.5. Comparison with the State-of-the-Art Methods
4.6. Visually Intuitive Evaluation
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Guo, G.; Zhao, S. 3D multi-object tracking with adaptive cubature kalman filter for autonomous driving. IEEE Trans. Intell. Veh. 2023, 8, 512–519. [Google Scholar] [CrossRef]
- Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. Bytetrack: Multi-object tracking by associating every detection box. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022. [Google Scholar] [CrossRef]
- Chan, S.; Jia, Y.; Zhou, X.; Bai, C.; Chen, S.; Zhang, X. Online multiple object tracking using joint detection and embedding network. Pattern Recognit. 2022, 130, 108793. [Google Scholar] [CrossRef]
- Abudayyeh, D.; Almomani, M.; Almomani, O.; Alsoud, H.; Alsalman, F. Perceptions of autonomous vehicles: A case study of Jordan. World Electr. Veh. J. 2023, 14, 133. [Google Scholar] [CrossRef]
- Alqarqaz, M.; Bani Younes, M.; Qaddoura, R. An Object Classification Approach for Autonomous Vehicles Using Machine Learning Techniques. World Electr. Veh. J. 2023, 14, 41. [Google Scholar] [CrossRef]
- Liu, Y.; Li, G.; Hao, L.; Yang, Q.; Zhang, D. Research on a Lightweight Panoramic Perception Algorithm for Electric Autonomous Mini-Buses. World Electr. Veh. J. 2023, 14, 179. [Google Scholar] [CrossRef]
- Tian, W.; Lauer, M.; Chen, L. Online multi-object tracking using joint domain information in traffic scenarios. IEEE Trans. Intell. Transp. Syst. 2020, 21, 374–384. [Google Scholar] [CrossRef]
- Karunasekera, H.; Wang, H.; Zhang, H. Multiple object tracking with attention to appearance, structure, motion and size. IEEE Access 2019, 7, 104423–104434. [Google Scholar] [CrossRef]
- Mykheievskyi, D.; Borysenko, D.; Porokhonskyy, V. Learning local feature descriptors for multiple object tracking. In Proceedings of the Asian Conference on Computer Vision (ACCV), Kyoto, Japan, 30 November–4 December 2020. [Google Scholar] [CrossRef]
- Gonzalez, N.F.; Ospina, A.; Calvez, P. SMAT: Smart multiple affinity metrics for multiple object tracking. In Proceedings of the International Conference on Image Analysis and Recognition (ICIAR), Póvoa de Varzim, Portugal, 24–26 June 2020. [Google Scholar] [CrossRef]
- Pang, J.; Qiu, L.; Li, X.; Chen, H.; Li, Q.; Darrell, T.; Yu, F. Quasi-dense similarity learning for multiple object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021. [Google Scholar] [CrossRef]
- Qin, W.; Du, H.; Zhang, X.; Ren, X. End to end multi-object tracking algorithm applied to vehicle tracking. In Proceedings of the Asia Conference on Algorithms, Computing and Machine Learning (CACML), Hangzhou, China, 19–25 November 2022. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
- Wang, G.; Gu, R.; Liu, Z.; Hu, W.; Song, M.; Hwang, J. Track without appearance: Learn box and tracklet embedding with local and global motion patterns for vehicle tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021. [Google Scholar] [CrossRef]
- Wang, H.; Li, Z.; Li, Y.; Nai, K.; Wen, M. Sture: Spatial–temporal mutual representation learning for robust data association in online multi-object tracking. Comput. Vis. Image Underst. 2022, 220, 103433. [Google Scholar] [CrossRef]
- Yang, F.; Wang, Z.; Wu, Y.; Sakti, S.; Nakamura, S. Tackling multiple object tracking with complicated motions–Re–designing the integration of motion and appearance. Image Vis. Comput. 2022, 124, 104514. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar] [CrossRef]
- Pramanik, A.; Pal, S.; Maiti, J.; Mitra, P. Granulated rcnn and multi-class deep sort for multi-object detection and tracking. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 6, 171–181. [Google Scholar] [CrossRef]
- Bochkovskiy, A.; Wang, C.; Liao, H. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar] [CrossRef]
- Liang, S.; Wu, H.; Zhen, L.; Hua, Q.; Garg, S.; Kaddoum, G.; Hassan, M.M.; Yu, K. Edge yolo: Real-time intelligent object detection system based on edge-cloud cooperation in autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25345–25360. [Google Scholar] [CrossRef]
- Xu, H.; Dong, X.; Wu, W.; Yu, B.; Zhu, H. A two-stage pillar feature-encoding network for pillar-based 3D object detection. World Electr. Veh. J. 2023, 14, 146. [Google Scholar] [CrossRef]
- Luiten, J.; Fischer, T.; Leibe, B. Track to reconstruct and reconstruct to track. IEEE Robot. Autom. Lett. 2020, 5, 1803–1810. [Google Scholar] [CrossRef]
- Marinello, N.; Proesmans, M.; Gool, L.V. Triplettrack: 3D object tracking using triplet embeddings and LSTM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022. [Google Scholar] [CrossRef]
- Chu, P.; Ling, H. Famnet: Joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar] [CrossRef]
- Guo, S.; Wang, J.; Wang, X.; Tao, D. Online multiple object tracking with cross-task synergy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar] [CrossRef]
- Zhou, X.; Koltun, V.; Krähenbühl, P. Tracking objects as points. In Proceedings of the European Conference on Computer Vision (ECCV), Virtual Platform, 23–28 August 2020. [Google Scholar] [CrossRef]
- Tokmakov, P.; Li, J.; Burgard, W.; Gaidon, A. Learning to track with object permanence. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021. [Google Scholar] [CrossRef]
- Kong, J.; Mo, E.; Jiang, M.; Liu, T. Motfr: Multiple object tracking based on feature recoding. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 7746–7757. [Google Scholar] [CrossRef]
- Liu, Y.; Bai, T.; Tian, Y.; Wang, Y.; Wang, J.; Wang, X.; Wang, F. Segdq: Segmentation assisted multi-object tracking with dynamic query-based transformers. Neurocomputing 2022, 481, 91–101. [Google Scholar] [CrossRef]
- Cai, J.; Xu, M.; Li, W.; Xiong, Z.; Xia, W.; Tu, Z.; Soatto, S. Memot: Multi-object tracking with memory. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022. [Google Scholar] [CrossRef]
- Wei, H.; Huang, Y.; Hu, F.; Zhao, B.; Guo, Z.; Zhang, R. Motion Estimation Using Region-Level Segmentation and Extended Kalman Filter for Autonomous Driving. Remote Sens. 2021, 13, 1828. [Google Scholar] [CrossRef]
- Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017. [Google Scholar] [CrossRef]
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016. [Google Scholar] [CrossRef]
- Wang, C.; Wang, Y.; Wang, Y.; Wu, C.; Yu, G. muSSP: Efficient min-cost flow algorithm for multi-object tracking. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef]
- Bernardin, K.; Stiefelhagen, R. Evaluating multiple object tracking performance: The CLEAR MOT metrics. EURASIP J. Image Video Process. 2008, 2008, 246309. [Google Scholar] [CrossRef]
- Li, Y.; Huang, C.; Nevatia, R. Learning to associate: HybridBoosted multi-target tracker for crowded scene. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009. [Google Scholar] [CrossRef]
- Luiten, J.; Ošep, A.; Dendorfer, P.; Torr, P.; Geiger, A.; Leal-Taixé, L.; Leibe, B. HOTA: A higher order metric for evaluating multi-object tracking. Int. J. Comput. Vis. 2021, 129, 548–578. [Google Scholar] [CrossRef] [PubMed]
Different Versions | HOTA | DetA | AssA | MOTA | MT (%) | ML (%) | IDSW
---|---|---|---|---|---|---|---
seg + app_T + motion + ACT (ours) | 78.20 | 75.36 | 82.24 | 85.70 | 92.72 | 2.84 | 11
w/o seg + app_T + motion + ACT | 76.32 | 72.18 | 80.89 | 81.76 | 88.39 | 5.22 | 18
seg + app_E + motion + ACT | 70.74 | 69.29 | 73.51 | 71.98 | 74.16 | 9.63 | 49
seg + app_T + ACT | 74.81 | 70.87 | 79.26 | 79.25 | 86.47 | 4.75 | 26
seg + motion + ACT | 72.69 | 68.53 | 77.30 | 77.63 | 79.23 | 3.26 | 32
seg + app_T + motion + w/o ACT | 75.95 | 71.74 | 80.42 | 82.42 | 90.82 | 4.28 | 21
Method | HOTA | DetA | AssA | MOTA | MT (%) | ML (%) | IDSW | Runtime (ms)
---|---|---|---|---|---|---|---|---
FAMNET [26] | 52.56 | 61.00 | 45.51 | 75.92 | 52.46 | 9.69 | 521 | 1500 + D
JCSTD [7] | 65.94 | 65.37 | 67.03 | 80.24 | 57.08 | 7.85 | 173 | 70 + D
MASS [8] | 68.25 | 72.92 | 64.46 | 84.64 | 74.00 | 2.92 | 353 | 10 + D
Quasi-Dense [11] | 68.45 | 72.44 | 65.49 | 84.93 | 69.54 | 3.85 | 313 | 70 + D
MOTSFusion [24] | 68.74 | 72.19 | 66.16 | 84.24 | 72.77 | 2.92 | 415 | 440 + D
LGM [14] | 73.14 | 74.61 | 72.31 | 87.60 | 85.08 | 2.46 | 448 | 80 + D
SMAT [10] | 71.88 | 72.13 | 72.13 | 83.64 | 62.77 | 6.00 | 198 | 100 + D
TripletTrack [25] | 73.58 | 73.18 | 74.66 | 84.32 | 69.85 | 3.85 | 322 | 100 + D
Ours | 73.93 | 73.35 | 75.81 | 86.49 | 78.52 | 3.17 | 126 | 63 + D
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wei, H.; Huang, Y.; Zhang, Q.; Guo, Z. Online Multiple Object Tracking Using Min-Cost Flow on Temporal Window for Autonomous Driving. World Electr. Veh. J. 2023, 14, 243. https://doi.org/10.3390/wevj14090243