Pixel-Guided Association for Multi-Object Tracking
Abstract
1. Introduction
2. Related Works
2.1. Conventional Online Propagation and Association in MOT
2.2. Transformer-Based Multi-Object Tracking
3. Proposed Approach
3.1. Transformer-Based Propagation
3.2. Long-Term Discriminative Appearance Matching
3.3. Training Objects
4. Experiments
4.1. Implementation Details
4.2. Metrics
4.3. Ablation Studies
5. Experiments with Public MOT Benchmarks
Comparison with the Baseline Approach
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Sadeghian, A.; Alahi, A.; Savarese, S. Tracking the untrackable: Learning to track multiple cues with long-term dependencies. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
- Rezatofighi, S.H.; Milan, A.; Zhang, Z.; Shi, Q.; Dick, A.R.; Reid, I.D. Joint Probabilistic Data Association Revisited. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 3047–3055.
- Xiang, Y.; Alahi, A.; Savarese, S. Learning to Track: Online Multi-object Tracking by Decision Making. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4705–4713.
- Daniel, S.; Jürgen, B. Multi-Pedestrian Tracking with Clusters. In Proceedings of the 2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Washington, DC, USA, 16–19 November 2021; pp. 1–10.
- Daniel, S.; Jürgen, B. Improving Multiple Pedestrian Tracking by Track Management and Occlusion Handling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10953–10962.
- Jiangmiao, P.; Linlu, Q.; Xia, L.; Haofeng, C.; Qi, L.; Trevor, D.; Fisher, Y. Quasi-Dense Similarity Learning for Multiple Object Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 164–173.
- Kim, C.; Li, F.; Rehg, J.M. Multi-object Tracking with Neural Gating Using Bilinear LSTM. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 208–224.
- Choi, W. Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 3029–3037.
- Xing, J.; Ai, H.; Lao, S. Multi-object tracking through occlusions by local tracklets filtering and global tracklets association with detection responses. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Miami, FL, USA, 20–25 June 2009; pp. 1200–1207.
- Hornakova, A.; Henschel, R.; Rosenhahn, B.; Swoboda, P. Lifted Disjoint Paths with Application in Multiple Object Tracking. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020.
- Zamir, A.R.; Dehghan, A.; Shah, M. GMCP-Tracker: Global Multi-object Tracking Using Generalized Minimum Clique Graphs. In Proceedings of the European Conference on Computer Vision (ECCV), Florence, Italy, 7–13 October 2012; pp. 343–356.
- Andrea, H.; Timo, K.; Paul, S.; Michal, R.; Bodo, R.; Roberto, H. Making Higher Order MOT Scalable: An Efficient Approximate Solver for Lifted Disjoint Paths. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 6310–6320.
- Xu, Y.; Ban, Y.; Delorme, G.; Gan, C.; Rus, D.; Alameda-Pineda, X. TransCenter: Transformers with Dense Queries for Multiple-Object Tracking. arXiv 2021, arXiv:2103.15145.
- Yifu, Z.; Chunyu, W.; Xinggang, W.; Wenjun, Z.; Wenyu, L. Fairmot: On the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vis. 2021, 129, 3069–3087.
- Philipp, B.; Tim, M.; Laura, L.T. Tracking Without Bells and Whistles. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 941–951.
- Xingyi, Z.; Vladlen, K.; Philipp, K. Tracking Objects as Points. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 474–490.
- Nicolai, W.; Alex, B.; Dietrich, P. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649.
- Nicolas, C.; Francisco, M.; Gabriel, S.; Nicolas, U.; Alexander, K.; Sergey, Z. End-to-End Object Detection with Transformers. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 3–7 May 2021.
- Chu, P.; Wang, J.; You, Q.; Ling, H.; Liu, Z. TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking. arXiv 2021, arXiv:2104.00194.
- Fangao, Z.; Bin, D.; Yuang, Z.; Tiancai, W.; Xiangyu, Z.; Yichen, W. MOTR: End-to-End Multiple-Object Tracking with TRansformer. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022.
- Sun, P.; Cao, J.; Jiang, Y.; Zhang, R.; Xie, E.; Yuan, Z.; Wang, C.; Luo, P. TransTrack: Multiple-Object Tracking with Transformer. arXiv 2020, arXiv:2012.15460.
- Yifu, Z.; Peize, S.; Yi, J.; Dongdong, Y.; Fucheng, W.; Zehuan, Y.; Ping, L.; Wenyu, L.; Xinggang, W. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022.
- Boragule, A.; Jeon, M. Joint Cost Minimization for Multi-object Tracking. In Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017.
- Zhou, X.; Jiang, P.; Wei, Z.; Dong, H.; Wang, F. Online Multi-Object Tracking with Structural Invariance Constraint. In Proceedings of the British Machine Vision Conference (BMVC), Newcastle, UK, 3–6 September 2018.
- Dicle, C.; Camps, O.I.; Sznaier, M. The Way They Move: Tracking Multiple Targets with Similar Appearance. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1–8 December 2013; pp. 2304–2311.
- Yoon, K.; Song, Y.M.; Jeon, M. Multiple hypothesis tracking algorithm for multi-target multi-camera tracking with disjoint views. IET Image Process. 2018, 12, 1175–1184.
- Kim, C.; Li, F.; Ciptadi, A.; Rehg, J.M. Multiple Hypothesis Tracking Revisited. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
- Liu, Q.; Chen, D.; Chu, Q.; Yuan, L.; Liu, B.; Zhang, L.; Yu, N. Online multi-object tracking with unsupervised re-identification learning and occlusion estimation. Neurocomputing 2022, 483, 333–347.
- Bastani, F.; He, S.; Madden, S. Self-Supervised Multi-Object Tracking with Cross-input Consistency. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Virtual, 6–14 December 2021.
- Yoon, J.H.; Lee, C.R.; Yang, M.H.; Yoon, K. Online multi-object tracking via structural constraint event aggregation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
- Yoon, J.H.; Yang, M.H.; Lim, J.; Yoon, K.J. Bayesian multi-object tracking using motion context from multiple objects. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 6–9 January 2015.
- Yoon, Y.C.; Boragule, A.; Song, Y.; Yoon, K.; Jeon, M. Online Multi-Object Tracking with Historical Appearance Matching and Scene Adaptive Detection Filtering. In Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 27–30 November 2018; pp. 1–6.
- Kalman, R. A new approach to linear filtering and prediction problems. Trans. ASME–J. Basic Eng. 1960, 82, 35–45.
- Vo, B.T.; See, C.M.S.; Ma, N.; Ng, W.T. Multi-Sensor Joint Detection and Tracking with the Bernoulli Filter. IEEE Trans. Aerosp. Electron. Syst. 2012, 48, 1385–1402.
- Bae, S.H.; Yoon, K. Confidence-Based Data Association and Discriminative Deep Appearance Learning for Robust Online Multi-Object Tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 595–610.
- Zewen, L.; Fan, L.; Wenjie, Y.; Shouheng, P.; Jun, Z. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021, 1–21.
- Deep learning in video multi-object tracking: A survey. Neurocomputing 2020, 381, 61–88.
- Wang, Y.; Kitani, K.; Weng, X. Joint Object Detection and Multi-Object Tracking with Graph Neural Networks. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 13708–13715.
- Lu, Z.; Rathod, V.; Ronny, V.; Jonathan, H. RetinaTrack: Online Single Stage Joint Detection and Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 14656–14666.
- Qianyu, Z.; Xiangtai, L.; Lu, H.; Yibo, Y.; Guangliang, C.; Yunhai, T.; Lizhuang, M.; Dacheng, T. TransVOD: End-to-end Video Object Detection with Spatial-Temporal Transformers. arXiv 2022, arXiv:2201.05047.
- Meinhardt, T.; Kirillov, A.; Leal-Taixe, L.; Feichtenhofer, C. TrackFormer: Multi-Object Tracking with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–20 June 2022.
- Zhao, Z.; Wu, Z.; Zhuang, Y.; Li, B.; Jia, J. Tracking Objects as Pixel-wise Distributions. arXiv 2022, arXiv:2207.05518.
- Kaiming, H.; Xiangyu, Z.; Shaoqing, R.; Jian, S. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention Mask Transformer for Universal Image Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–20 June 2022.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Kaiser, Ł.; Polosukhin, I. Attention is All you Need. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2017; Volume 30.
- Cheng, B.; Schwing, A.G.; Kirillov, A. Per-Pixel Classification is Not All You Need for Semantic Segmentation. arXiv 2021, arXiv:2107.06278.
- Milan, A.; Leal-Taixé, L.; Reid, I.D.; Roth, S.; Schindler, K. MOT16: A Benchmark for Multi-Object Tracking. arXiv 2016, arXiv:1603.00831.
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Bo, P.; Yizhuo, L.; Yifan, Z.; Muchen, L.; Cewu, L. TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
- Fang, K.; Xiang, Y.; Li, X.; Savarese, S. Recurrent Autoregressive Networks for Online Multi-object Tracking. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 466–475.
- Jieming, Y.; Hongwei, G.; Jinlong, Y.; Yubing, T.; Shuzhi, S. Online Multi-Object Tracking Using Multi-Function Integration and Tracking Simulation Training. Appl. Intell. 2022, 52, 1268–1288.
- Ioannis, P.; Abhijit, S.; Anuj, K. A Graph Convolutional Neural Network Based Approach for Traffic Monitoring Using Augmented Detections with Optical Flow. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; IEEE: New York, NY, USA, 2021; pp. 2980–2986.
- Peng, C.; Heng, F.; Chiu, T.; Haibin, L. Online Multi-Object Tracking With Instance-Aware Tracker and Dynamic Model Refreshment. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision, Waikoloa Village, HI, USA, 7–11 January 2019; pp. 161–170.
- Yihong, X.; Aljosa, O.; Yutong, B.; Radu, H.; Laura, L.T.; Xavier, A.P. How To Train Your Deep Multi-Object Tracker. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 6787–6796.
- Pavel, T.; Jie, L.; Wolfram, B.; Adrien, G. Learning to Track with Object Permanence. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10840–10849.
- Wang, Q.; Zheng, Y.; Pan, P.; Xu, Y. Multiple Object Tracking With Correlation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 3876–3886.
- Bing, S.; Andrew, B.; Xinyu, L.; Davide, M.; Joseph, T. SiamMOT: Siamese Multi-Object Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021.
- Feng, W.; Hu, Z.; Wu, W.; Yan, J.; Ouyang, W. Multi-Object Tracking with Multiple Cues and Switcher-Aware Classification. arXiv 2019, arXiv:1901.06129.
Sequences | Training | Validation |
---|---|---|
MOT15 Sequences {1, 2, 3, 4, 5, 6, 7, 10, 11} | 50% | 50% |
MOT17 Sequences {2, 4, 5, 9, 10, 11, 13} | 50% | 50% |
Cityscapes | 16 Sequences | 5 Sequences |
CrowdHuman | 15,000 Frames | 4370 Frames |
Tracker | MOTA%↑ | IDF1%↑ | MT%↑ | ML%↓ | FP↓ | FN↓ | IDSW↓ |
---|---|---|---|---|---|---|---|
Private Detector | | | | | | | |
GSDT [39] | 60.7 | 64.6 | 47.0 | 10.5 | 7334 | 16,358 | 477 |
FairMOT [14] | 60.6 | 64.7 | 47.6 | 11.0 | 7854 | 15,785 | 591 |
Tube_TK [50] | 58.4 | 53.1 | 39.3 | 18.0 | 5756 | 18,961 | 854 |
RAR15 [51] | 56.5 | 61.0 | 45.1 | 14.6 | 9386 | 16,921 | 428 |
Public Detector | | | | | | | |
MFI_TST [52] | 49.2 | 52.4 | 210 | 176 | 8707 | 21,594 | 912 |
GNNMATCH [53] | 46.7 | 43.2 | 157 | 203 | 6643 | 25,311 | 820 |
KCF [54] | 38.9 | 44.5 | 120 | 227 | 7321 | 29,501 | 720 |
TrctrD15 [55] | 44.1 | 46.0 | 124 | 192 | 6085 | 26,917 | 1347 |
Pixel-Guided | 40.6 | 51.9 | 294 | 86 | 15,027 | 17,352 | 1129 |
Tracker | MOTA%↑ | IDF1%↑ | MT%↑ | ML%↓ | FP↓ | FN↓ | IDSW↓ |
---|---|---|---|---|---|---|---|
Private Detector | | | | | | | |
FairMOT [14] | 73.7 | 72.3 | 19.5 | 36.6 | 12,201 | 248,047 | 2072 |
PermaTrack [56] | 73.8 | 68.9 | 43.8 | 17.2 | 28,998 | 115,104 | 3699 |
CorrTracker [57] | 76.5 | 73.6 | 47.6 | 12.7 | 29,808 | 99,510 | 3369 |
ByteTrack [23] | 80.3 | 77.3 | 53.2 | 14.5 | 25,491 | 83,721 | 2196 |
Public Detector | | | | | | | |
SiamMOT [58] | 65.9 | 63.3 | 34.6 | 23.9 | 14,076 | 200,672 | 2583 |
CenterTrack [16] | 67.8 | 64.7 | 34.6 | 24.6 | 18,498 | 160,332 | 3039 |
QuasiDense [6] | 68.7 | 66.3 | 40.6 | 21.9 | 26,589 | 146,643 | 3378 |
LSST17 [59] | 52.7 | 57.9 | 421 | 863 | 22,512 | 241,936 | 2167 |
Tracktor [15] | 53.5 | 52.3 | 459 | 861 | 12,201 | 248,047 | 2072 |
TransCtr [13] | 68.8 | 61.4 | 867 | 564 | 22,860 | 149,188 | 4102 |
ByteTrack [23] | 67.4 | 70.0 | 730 | 735 | 9939 | 172,636 | 1331 |
Pixel-Guided | 69.7 | 68.4 | 903 | 615 | 26,871 | 140,457 | 3639 |
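
The MOTA column can be cross-checked against the FP, FN and IDSW columns. The following is a minimal sketch, assuming the standard CLEAR MOT definition MOTA = 1 − (FN + FP + IDSW) / GT and assuming that the table above reports the MOT17 test set with 564,228 ground-truth boxes in total. Neither assumption is stated in the table itself, and the names `mota_percent` and `ASSUMED_MOT17_TEST_GT` are hypothetical, introduced only for illustration.

```python
def mota_percent(fp: int, fn: int, idsw: int, num_gt: int) -> float:
    """Return MOTA in percent: 100 * (1 - (FN + FP + IDSW) / GT)."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 100.0 * (1.0 - (fn + fp + idsw) / num_gt)


# ASSUMPTION: total number of ground-truth boxes in the MOT17 test set.
# This value is not given in the table and is used here for illustration only.
ASSUMED_MOT17_TEST_GT = 564_228

# (FP, FN, IDSW) copied from two public-detector rows of the table above.
rows = {
    "Pixel-Guided": (26_871, 140_457, 3_639),
    "ByteTrack [23]": (9_939, 172_636, 1_331),
}

for name, (fp, fn, idsw) in rows.items():
    score = mota_percent(fp, fn, idsw, ASSUMED_MOT17_TEST_GT)
    print(f"{name}: MOTA = {score:.1f}%")
```

Under these assumptions, the two rows evaluate to 69.7% and 67.4%, which agrees with the MOTA values listed for them. Note that MOTA weights false positives, misses and identity switches equally, so a tracker with more FP can still score higher if it recovers enough FN.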
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Boragule, A.; Jang, H.; Ha, N.; Jeon, M. Pixel-Guided Association for Multi-Object Tracking. Sensors 2022, 22, 8922. https://doi.org/10.3390/s22228922