CDTracker: Coarse-to-Fine Feature Matching and Point Densification for 3D Single-Object Tracking
Abstract
1. Introduction
- We propose a novel 3D SOT network, dubbed CDTracker, that copes with abrupt appearance changes and sparse point clouds.
- We introduce a coarse-to-fine feature matching module based on a hybrid similarity learning mechanism, which combines cosine embedding and attention assignment for feature matching in 3D SOT.
- We introduce a relatively dense sampling module that segments and retains more points of interest, thereby further improving tracking performance.
2. Related Work
3. Method
3.1. Feature Extraction
3.2. Coarse-to-Fine Feature Matching
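As a rough illustration of how cosine embedding and attention assignment can be combined, the following NumPy sketch first shortlists template points by cosine similarity (a coarse step) and then aggregates the shortlisted features with softmax attention (a fine step). This is not the CDTracker architecture: the function names, the top-`k` shortlist, the feature sizes, and the random inputs are all assumptions made for illustration only.

```python
import numpy as np


def cosine_similarity(search, template, eps=1e-8):
    """Pairwise cosine similarity between search (Ns, C) and template (Nt, C) features."""
    s = search / (np.linalg.norm(search, axis=1, keepdims=True) + eps)
    t = template / (np.linalg.norm(template, axis=1, keepdims=True) + eps)
    return s @ t.T  # shape (Ns, Nt), values in [-1, 1]


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def coarse_to_fine_match(search, template, k=4):
    """Hypothetical two-stage matching (an assumption, not the paper's module).

    Coarse: keep the k template points most cosine-similar to each search point.
    Fine:   scaled dot-product attention over those k candidates only.
    """
    c = search.shape[1]
    k = min(k, template.shape[0])
    sim = cosine_similarity(search, template)
    cand_idx = np.argsort(-sim, axis=1)[:, :k]            # (Ns, k) candidate indices
    cand = template[cand_idx]                              # (Ns, k, C) candidate features
    logits = np.einsum("nc,nkc->nk", search, cand) / np.sqrt(c)
    weights = softmax(logits, axis=1)                      # each row sums to 1
    matched = np.einsum("nk,nkc->nc", weights, cand)       # (Ns, C) aggregated features
    return matched, cand_idx, weights


rng = np.random.default_rng(0)
template_feat = rng.normal(size=(64, 32))    # illustrative sizes only
search_feat = rng.normal(size=(128, 32))
matched, cand_idx, weights = coarse_to_fine_match(search_feat, template_feat, k=4)
print(matched.shape, cand_idx.shape, weights.shape)  # (128, 32) (128, 4) (128, 4)
```

Restricting the attention to a cosine-based shortlist is only one plausible way to chain the two similarity measures; a learned network would replace the raw features with trained embeddings.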
3.2.1. Coarse Feature Matching
3.2.2. Fine Feature Matching
3.3. Relatively Dense Sampling
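The introduction describes this module as one that segments points and retains more points of interest. A minimal sketch of that idea, assuming a per-point foreground score is already available from some segmentation head, is to keep the highest-scoring points instead of sampling at random. The function name, the idealised scores, and the point counts below are hypothetical and are not taken from the paper.

```python
import numpy as np


def sample_by_foreground_score(points, fg_scores, m):
    """Keep the m points with the highest foreground score (illustrative assumption)."""
    points = np.asarray(points, dtype=float)
    fg_scores = np.asarray(fg_scores, dtype=float)
    if points.shape[0] != fg_scores.shape[0]:
        raise ValueError("points and fg_scores must have the same length")
    m = min(m, points.shape[0])
    # Stable sort so that ties keep their original order.
    keep = np.argsort(-fg_scores, kind="stable")[:m]
    return points[keep], keep


rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(1024, 3))
is_object = np.zeros(1024, dtype=bool)
is_object[:100] = True                       # pretend the first 100 points lie on the target
fg_scores = np.where(is_object, 0.9, 0.1)    # idealised segmentation output

_, keep = sample_by_foreground_score(points, fg_scores, m=128)
random_keep = rng.choice(1024, size=128, replace=False)
print("object points kept (score-based):", int(is_object[keep].sum()))
print("object points kept (random):", int(is_object[random_keep].sum()))
```

With these idealised scores the score-based selection keeps every object point, whereas random sampling keeps only a fraction of them, which is the intuition behind retaining a denser set of target points when the cloud is sparse.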
4. Experiment
4.1. Experiment Setting
4.2. Results
4.3. Ablation Study
4.4. Limitations
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zheng, C.; Yan, X.; Zhang, H.; Wang, B.; Cheng, S.; Cui, S.; Li, Z. Beyond 3d siamese tracking: A motion-centric paradigm for 3d single object tracking in point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8111–8120.
- Giancola, S.; Zarzar, J.; Ghanem, B. Leveraging shape completion for 3d siamese tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1359–1368.
- Qi, H.; Feng, C.; Cao, Z.; Zhao, F.; Xiao, Y. P2b: Point-to-box network for 3d object tracking in point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6329–6338.
- Zheng, C.; Yan, X.; Gao, J.; Zhao, W.; Zhang, W.; Li, Z.; Cui, S. Box-aware feature enhancement for single object tracking on point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 13199–13208.
- Zhou, C.; Luo, Z.; Luo, Y.; Liu, T.; Pan, L.; Cai, Z.; Zhao, H.; Lu, S. Pttr: Relational 3d point cloud object tracking with transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8531–8540.
- Shan, J.; Zhou, S.; Fang, Z.; Cui, Y. Ptt: Point-track-transformer module for 3d single object tracking in point clouds. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 1310–1316.
- Hui, L.; Wang, L.; Cheng, M.; Xie, J.; Yang, J. 3D Siamese voxel-to-BEV tracker for sparse point clouds. Adv. Neural Inf. Process. Syst. 2021, 34, 28714–28727.
- Hui, L.; Wang, L.; Tang, L.; Lan, K.; Xie, J.; Yang, J. 3d siamese transformer network for single object tracking on point clouds. In Computer Vision–ECCV 2022, Proceedings of the 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Proceedings, Part II; Springer: Cham, Switzerland, 2022; pp. 293–310.
- Zhao, K.; Zhao, H.; Wang, Z.; Peng, J.; Hu, Z. Object Preserving Siamese Network for Single Object Tracking on Point Clouds. arXiv 2023, arXiv:2301.12057.
- Xu, T.X.; Guo, Y.C.; Lai, Y.K.; Zhang, S.H. CXTrack: Improving 3D point cloud tracking with contextual information. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1084–1093.
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? the kitti vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
- Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2446–2454.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems; MIT: Cambridge, MA, USA, 2017; Volume 30.
- Qi, C.R.; Litany, O.; He, K.; Guibas, L.J. Deep hough voting for 3d object detection in point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9277–9286.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Luo, Z.; Zhou, C.; Pan, L.; Zhang, G.; Liu, T.; Luo, Y.; Zhao, H.; Liu, Z.; Lu, S. Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer. arXiv 2022, arXiv:2208.05216.
- Nie, J.; He, Z.; Yang, Y.; Gao, M.; Zhang, J. Glt-t: Global-local transformer voting for 3d single object tracking in point clouds. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 1957–1965.
- Nie, J.; He, Z.; Yang, Y.; Bao, Z.; Gao, M.; Zhang, J. OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, Macao, China, 19–25 August 2023; pp. 1285–1293.
- Xia, Y.; Wu, Q.; Li, W.; Chan, A.B.; Stilla, U. A lightweight and detector-free 3d single object tracker on point clouds. IEEE Trans. Intell. Transp. Syst. 2023, 24, 5543–5554.
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph cnn for learning on point clouds. ACM Trans. Graph. 2019, 38, 1–12.
- Natali, M.; Biasotti, S.; Patanè, G.; Falcidieno, B. Graph-based representations of point clouds. Graph. Model. 2011, 73, 151–164.
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. Pointpillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12697–12705.
- Zhou, Y.; Tuzel, O. Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4490–4499.
- Chen, R.; Wu, J.; Luo, Y.; Xu, G. PointMM: Point Cloud Semantic Segmentation CNN under Multi-Spatial Feature Encoding and Multi-Head Attention Pooling. Remote Sens. 2024, 16, 1246.
- Shi, M.; Zhang, F.; Chen, L.; Liu, S.; Yang, L.; Zhang, C. Position-Feature Attention Network-Based Approach for Semantic Segmentation of Urban Building Point Clouds from Airborne Array Interferometric SAR. Remote Sens. 2024, 16, 1141.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
- Chen, X.; Li, D.; Liu, M.; Jia, J. CNN and Transformer Fusion for Remote Sensing Image Semantic Segmentation. Remote Sens. 2023, 15, 4455.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; pp. 213–229.
- Quan, H.; Lai, H.; Gao, G.; Ma, J.; Li, J.; Chen, D. Pairwise CNN-Transformer Features for Human–Object Interaction Detection. Entropy 2024, 26, 205.
- Gong, H.; Mu, T.; Li, Q.; Dai, H.; Li, C.; He, Z.; Wang, W.; Han, F.; Tuniyazi, A.; Li, H.; et al. Swin-transformer-enabled YOLOv5 with attention mechanism for small object detection on satellite images. Remote Sens. 2022, 14, 2861.
- Chen, X.; Yan, B.; Zhu, J.; Wang, D.; Yang, X.; Lu, H. Transformer tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8126–8135.
- Yang, J.; Pan, Z.; Liu, Y.; Niu, B.; Lei, B. Single object tracking in satellite videos based on feature enhancement and multi-level matching strategy. Remote Sens. 2023, 15, 4351.
- Zhao, H.; Jiang, L.; Jia, J.; Torr, P.H.; Koltun, V. Point transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 16259–16268.
- Estrella-Ibarra, L.F.; León-Cuevas, A.d.; Tovar-Arriaga, S. Nested Contrastive Boundary Learning: Point Transformer Self-Attention Regularization for 3D Intracranial Aneurysm Segmentation. Technologies 2024, 12, 28.
- Mao, J.; Xue, Y.; Niu, M.; Bai, H.; Feng, J.; Liang, X.; Xu, H.; Xu, C. Voxel transformer for 3d object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3164–3173.
- Pan, X.; Xia, Z.; Song, S.; Li, L.E.; Huang, G. 3d object detection with pointformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7463–7472.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Kristan, M.; Matas, J.; Leonardis, A.; Vojíř, T.; Pflugfelder, R.; Fernandez, G.; Nebehay, G.; Porikli, F.; Čehovin, L. A novel performance evaluation methodology for single-target trackers. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 2137–2155.
- Yang, Y.; Deng, Y.; Nie, J.; Zhang, J. BEVTrack: A Simple Baseline for Point Cloud Tracking in Bird’s-Eye-View. arXiv 2023, arXiv:2309.02185.
Methods | Car | Pedestrian | Cyclist | Van |
---|---|---|---|---|
SC3D [2] | 41.3/57.9 | 18.2/37.8 | 41.5/70.4 | 40.4/47.0 |
P2B [3] | 56.2/72.8 | 28.7/49.6 | 32.1/44.7 | 40.8/48.4 |
BAT [4] | 60.5/77.7 | 42.1/70.1 | 33.7/45.4 | 52.4/67.0 |
PTT [6] | 67.8/81.8 | 44.9/72.0 | 37.2/47.3 | 43.6/52.5 |
PTTR [5] | 65.2/77.4 | 50.9/81.6 | 65.1/90.5 | 52.5/61.8 |
V2B [7] | 70.5/81.3 | 48.3/73.5 | 40.8/49.7 | 50.1/58.0 |
STNet [8] | 70.6/82.3 | 48.0/71.9 | 64.6/91.7 | 54.0/62.5 |
DMT [19] | 66.4/79.4 | 48.1/77.9 | 70.4/93.6 | 53.3/65.6 |
OSP2B [18] | 67.5/82.3 | 53.6/85.1 | 65.6/90.5 | 56.3/66.2 |
GLT-T [17] | 68.2/82.1 | 52.4/78.8 | 68.9/92.1 | 52.6/62.9 |
CDTracker (Ours) | 71.7/83.1 | 49.2/75.9 | 71.2/93.2 | 51.7/62.3 |
Methods | Vehicle (Easy) | Vehicle (Medium) | Vehicle (Hard) | Vehicle (Mean) | Pedestrian (Easy) | Pedestrian (Medium) | Pedestrian (Hard) | Pedestrian (Mean) |
---|---|---|---|---|---|---|---|---|
P2B [3] | 57.1/65.4 | 52.0/60.7 | 47.9/58.5 | 52.6/61.7 | 18.1/30.8 | 17.8/30.0 | 17.7/29.3 | 17.9/30.1 |
BAT [4] | 61.0/68.3 | 53.3/60.9 | 48.9/57.8 | 54.7/62.7 | 19.3/32.6 | 17.8/29.8 | 17.2/28.3 | 18.2/30.3 |
V2B [7] | 64.5/71.5 | 55.1/63.2 | 52.0/62.0 | 57.6/65.9 | 27.9/43.9 | 22.5/36.2 | 20.1/33.1 | 23.7/37.9 |
STNet [8] | 66.7/73.8 | 58.6/67.4 | 55.4/65.6 | 60.6/69.2 | 27.6/42.9 | 23.2/37.2 | 22.3/36.7 | 24.5/39.1 |
CDTracker (Ours) | 66.8/74.5 | 56.6/65.3 | 55.9/66.2 | 60.1/69.0 | 27.2/42.3 | 24.8/39.8 | 23.1/38.8 | 25.1/40.4 |
Design 1 | Design 2 | M | Car | Cyclist |
---|---|---|---|---|
 | | M = 512 | 70.6/82.3 | 64.6/91.7 |
√ | M = 512 | 70.7/82.3 | 70.5/93.1 | |
√ | M = 512 | 70.9/82.1 | 70.2/93.0 | |
√ | √ | M = 512 | 71.7/83.1 | 71.2/93.2 |
√ | √ | M = 128 | 66.1/77.9 | - |
Feature Dimension C | Success | Precision |
---|---|---|
C = 16 | 62.8 | 76.2 |
C = 32 | 71.7 | 83.1 |
C = 64 | 70.6 | 82.7 |
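The paired values in the tables above appear to be Success/Precision, the two columns named in the feature-dimension table. The section does not define them, so the following sketch assumes the convention common in 3D single-object tracking: Success is the area under the curve of the fraction of frames whose box IoU exceeds a threshold swept over [0, 1], and Precision is the analogous area for centre-distance error thresholds swept over [0, 2] m. The function names, the number of thresholds, and the toy inputs are assumptions.

```python
import numpy as np


def _normalised_auc(curve, thresholds):
    """Trapezoidal area under the curve, rescaled to the range [0, 100]."""
    curve = np.asarray(curve, dtype=float)
    widths = np.diff(thresholds)
    area = np.sum((curve[1:] + curve[:-1]) / 2.0 * widths)
    return 100.0 * area / (thresholds[-1] - thresholds[0])


def success(ious, n_thresholds=21):
    """AUC of the fraction of frames with IoU >= threshold (assumed convention)."""
    ious = np.asarray(ious, dtype=float)
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    curve = [(ious >= t).mean() for t in thresholds]
    return _normalised_auc(curve, thresholds)


def precision(center_errors, max_error=2.0, n_thresholds=21):
    """AUC of the fraction of frames with centre error <= threshold, up to max_error metres."""
    errors = np.asarray(center_errors, dtype=float)
    thresholds = np.linspace(0.0, max_error, n_thresholds)
    curve = [(errors <= t).mean() for t in thresholds]
    return _normalised_auc(curve, thresholds)


# Toy per-frame values, purely illustrative.
frame_ious = [0.9, 0.8, 0.4, 0.0]
frame_errors_m = [0.1, 0.2, 0.9, 3.0]
print(f"Success: {success(frame_ious):.1f}  Precision: {precision(frame_errors_m):.1f}")
```

Both functions take per-frame quantities as input; computing the 3D box IoU itself is outside the scope of this sketch.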
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Y.; Pu, C.; Qi, Y.; Yang, J.; Wu, X.; Niu, M.; Wei, M. CDTracker: Coarse-to-Fine Feature Matching and Point Densification for 3D Single-Object Tracking. Remote Sens. 2024, 16, 2322. https://doi.org/10.3390/rs16132322