KOM-SLAM: A GNN-Based Tightly Coupled SLAM and Multi-Object Tracking Framework
Abstract
1. Introduction
1. We propose KOM-SLAM, a GNN-based, tightly coupled SLAM and multi-object tracking framework in which the GNN jointly learns associations between keypoints and objects. A soft assignment mechanism backpropagates the pose estimation loss through the GNN. To the best of our knowledge, this is the first learning-based system that tightly integrates SLAM and multi-object tracking.
2. We embed the ego motion and the spatial distance between each keypoint and the ego pose into the network, enabling dynamic adjustment of the keypoint matching range.
3. We validate KOM-SLAM on the KITTI Tracking dataset, demonstrating improved performance in both pose estimation and object tracking.
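The soft assignment of contribution (1) and the distance-based matching range of contribution (2) can be illustrated with a minimal sketch: descriptor similarities are gated by a spatial radius and then normalised with Sinkhorn iterations into a differentiable, (approximately) doubly stochastic assignment matrix. This is an illustrative reimplementation under assumed names (`sinkhorn`, `gated_soft_assignment`, `max_range`), not the paper's actual network.

```python
import numpy as np

def sinkhorn(scores, n_iters=20):
    """Alternate row/column normalisation in log-space, turning a raw
    score matrix into an (approximately) doubly stochastic soft-assignment
    matrix. Because each step is differentiable, a loss on downstream pose
    estimation can be backpropagated through the assignment."""
    log_p = scores.astype(float).copy()
    for _ in range(n_iters):
        # row normalisation: each row sums to 1
        log_p -= np.log(np.sum(np.exp(log_p), axis=1, keepdims=True))
        # column normalisation: each column sums to 1
        log_p -= np.log(np.sum(np.exp(log_p), axis=0, keepdims=True))
    return np.exp(log_p)

def gated_soft_assignment(desc_a, desc_b, pos_a, pos_b, max_range):
    """Soft assignment between two keypoint sets; candidate pairs outside
    a spatial radius are suppressed before normalisation (gating), which
    mimics restricting the keypoint matching range."""
    scores = desc_a @ desc_b.T                              # descriptor similarity
    dist = np.linalg.norm(pos_a[:, None, :] - pos_b[None, :, :], axis=-1)
    scores = np.where(dist <= max_range, scores, -1e9)      # gate distant pairs
    return sinkhorn(scores)
```

Shrinking `max_range` (in the paper, adjusted dynamically from ego motion) drives the assignment probability of spatially distant pairs to zero while the remaining scores are renormalised.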
2. Related Work
2.1. Coupled SLAM and Multi-Object Tracking
2.2. GNN-Based Matching
3. Method
3.1. System Architecture
3.2. Graph Neural Network
3.3. Matching Score Matrix Calculation
3.4. Correspondence Association and Joint Optimization
3.5. Training Loss
4. Experiments
4.1. Experimental Details
4.2. Odometry Estimation
4.3. Multi-Object Tracking
4.4. Ablation Study
4.4.1. Ego Pose Estimation Ablation
4.4.2. Object Tracking Association Ablation
4.4.3. Data Density and Time Gap Analysis
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2017, 33, 1255–1262. [Google Scholar] [CrossRef]
- Shan, T.; Englot, B.; Meyers, D.; Wang, W.; Ratti, C.; Rus, D. LIO-SAM: Tightly-Coupled Lidar Inertial Odometry via Smoothing and Mapping. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 5135–5142. [Google Scholar]
- Weng, X.; Wang, Y.; Man, Y.; Kitani, K.M. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6499–6508. [Google Scholar]
- Nagy, M.; Werghi, N.; Hassan, B.; Dias, J.; Khonji, M. RobMOT: 3D Multi-Object Tracking Enhancement Through Observational Noise and State Estimation Drift Mitigation in LiDAR Point Clouds. IEEE Trans. Intell. Transp. Syst. 2025, 26, 16047–16059. [Google Scholar] [CrossRef]
- Bescos, B.; Campos, C.; Tardós, J.D.; Neira, J. DynaSLAM II: Tightly-Coupled Multi-Object Tracking and SLAM. IEEE Robot. Autom. Lett. 2021, 6, 5191–5198. [Google Scholar] [CrossRef]
- Tian, R.; Zhang, Y.; Yang, L.; Zhang, J.; Coleman, S.; Kerr, D. DynaQuadric: Dynamic Quadric SLAM for Quadric Initialization, Mapping, and Tracking. IEEE Trans. Intell. Transp. Syst. 2024, 25, 17234–17246. [Google Scholar] [CrossRef]
- Shen, Y.; Li, H.; Yi, S.; Chen, D.; Wang, X. Person Re-Identification with Deep Similarity-Guided Graph Neural Network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 486–504. [Google Scholar]
- Li, Y.; Gu, C.; Dullien, T.; Vinyals, O.; Kohli, P. Graph Matching Networks for Learning the Similarity of Graph Structured Objects. In Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 3835–3845. [Google Scholar]
- Sarlin, P.-E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperGlue: Learning Feature Matching with Graph Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4938–4947. [Google Scholar]
- Cetintas, O.; Brasó, G.; Leal-Taixé, L. Unifying Short and Long-Term Tracking with Graph Hierarchies. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 22877–22887. [Google Scholar]
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-Supervised Interest Point Detection and Description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 224–236. [Google Scholar]
- Zhang, J.; Singh, S. LOAM: Lidar Odometry and Mapping in Real-Time. In Proceedings of the Robotics: Science and Systems, Berkeley, CA, USA, 12–16 July 2014; pp. 1–9. [Google Scholar]
- Kannapiran, S.; Bendapudi, N.; Yu, M.-Y.; Parikh, D.; Berman, S.; Vora, A.; Pandey, G. Stereo Visual Odometry with Deep Learning-Based Point and Line Feature Matching Using an Attention Graph Neural Network. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 3491–3498. [Google Scholar]
- Cui, J.; Chen, J.; Li, L. SAGE-ICP: Semantic Information-Assisted ICP. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 8537–8543. [Google Scholar]
- Koledić, K.; Cvišić, I.; Marković, I.; Petrović, I. MOFT: Monocular Odometry Based on Deep Depth and Careful Feature Selection and Tracking. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 6175–6181. [Google Scholar]
- Zunair, H.; Khan, S.; Hamza, A.B. RSUD20K: A dataset for road scene understanding in autonomous driving. In Proceedings of the 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 27–30 October 2024; pp. 708–714. [Google Scholar]
- Runz, M.; Buffier, M.; Agapito, L. MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects. In Proceedings of the 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Munich, Germany, 16–20 October 2018; pp. 10–20. [Google Scholar]
- Ballester, I.; Fontán, A.; Civera, J.; Strobl, K.H.; Triebel, R. DOT: Dynamic Object Tracking for Visual SLAM. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 11705–11711. [Google Scholar]
- Huang, J.; Yang, S.; Zhao, Z.; Lai, Y.-K.; Hu, S.-M. ClusterSLAM: A SLAM Backend for Simultaneous Rigid Body Clustering and Motion Estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5875–5884. [Google Scholar]
- Huang, J.; Yang, S.; Mu, T.-J.; Hu, S.-M. ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2168–2177. [Google Scholar]
- Moosmann, F.; Stiller, C. Joint Self-Localization and Tracking of Generic Objects in 3D Range Data. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, 6–10 May 2013; pp. 1146–1152. [Google Scholar]
- Shi, J.; Wang, W.; Qi, M.; Li, X.; Yan, Y. DYNAM-LVIO: A Dynamic-Object-Aware Lidar Visual Inertial Odometry in Dynamic Urban Environments. IEEE Trans. Instrum. Meas. 2024, 73, 1–19. [Google Scholar] [CrossRef]
- Yang, S.; Scherer, S. CubeSLAM: Monocular 3-D Object SLAM. IEEE Trans. Robot. 2019, 35, 925–938. [Google Scholar] [CrossRef]
- Gonzalez, M.; Marchand, E.; Kacete, A.; Royan, J. TwistSLAM: Constrained SLAM in Dynamic Environment. IEEE Robot. Autom. Lett. 2022, 7, 6846–6853. [Google Scholar] [CrossRef]
- Qiu, Y.; Wang, C.; Wang, W.; Henein, M.; Scherer, S. AIRDOS: Dynamic SLAM Benefits from Articulated Objects. In Proceedings of the 2022 IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 8047–8053. [Google Scholar]
- Tian, X.; Zhu, Z.; Zhao, J.; Tian, G.; Ye, C. DL-SLOT: Tightly-Coupled Dynamic Lidar SLAM and 3D Object Tracking Based on Collaborative Graph Optimization. IEEE Trans. Intell. Veh. 2023, 9, 1017–1027. [Google Scholar] [CrossRef]
- Zhu, Z.; Zhao, J.; Huang, K.; Tian, X.; Lin, J.; Ye, C. LIMOT: A Tightly-Coupled System for Lidar-Inertial Odometry and Multi-Object Tracking. IEEE Robot. Autom. Lett. 2024, 9, 6600–6607. [Google Scholar] [CrossRef]
- Li, X.; Yan, Z.; Feng, S.; Xia, C.; Li, S.; Zhou, Y. LIO-LOT: Tightly-Coupled Multi-Object Tracking and Lidar-Inertial Odometry. IEEE Trans. Intell. Transp. Syst. 2024, 26, 742–756. [Google Scholar] [CrossRef]
- Ying, Z.; Li, H. IMM-SLAMMOT: Tightly-Coupled SLAM and IMM-Based Multi-Object Tracking. IEEE Trans. Intell. Veh. 2023, 9, 3964–3974. [Google Scholar] [CrossRef]
- Liu, Y.; Liu, J.; Hao, Y.; Deng, B.; Meng, Z. A Switching-Coupled Backend for Simultaneous Localization and Dynamic Object Tracking. IEEE Robot. Autom. Lett. 2021, 6, 1296–1303. [Google Scholar] [CrossRef]
- Lin, Y.-K.; Lin, W.-C.; Wang, C.-C. Asynchronous State Estimation of Simultaneous Ego-Motion Estimation and Multiple Object Tracking for Lidar-Inertial Odometry. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 10616–10622. [Google Scholar]
- Wang, Y.; Kitani, K.; Weng, X. Joint Object Detection and Multi-Object Tracking with Graph Neural Networks. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 13708–13715. [Google Scholar]
- Weng, X.; Yuan, Y.; Kitani, K. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. IEEE Robot. Autom. Lett. 2021, 6, 4640–4647. [Google Scholar] [CrossRef]
- Brasó, G.; Leal-Taixé, L. Learning a Neural Solver for Multiple Object Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 6247–6257. [Google Scholar]
- Bilgi, H.Ç.; Alatan, A.A. Bi-Directional Tracklet Embedding for Multi-Object Tracking. In Proceedings of the 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 27–30 October 2024; pp. 4035–4041. [Google Scholar]
- Chen, H.; Li, N.; Li, D.; Lv, J.; Zhao, W.; Zhang, R.; Xu, J. Multiple Object Tracking in Satellite Video with Graph-Based Multi-Clue Fusion Tracker. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5639914. [Google Scholar]
- Gao, Y.; Xu, H.; Li, J.; Wang, N.; Gao, X. Multi-Scene Generalized Trajectory Global Graph Solver with Composite Nodes for Multiple Object Tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 1842–1850. [Google Scholar]
- Lindenberger, P.; Sarlin, P.-E.; Pollefeys, M. LightGlue: Local Feature Matching at Light Speed. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 4–6 October 2023; pp. 17627–17638. [Google Scholar]
- Chen, H.; Luo, Z.; Zhang, J.; Zhou, L.; Bai, X.; Hu, Z.; Tai, C.-L.; Quan, L. Learning to Match Features with Seeded Graph Matching Network. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 6301–6310. [Google Scholar]
- Shi, Y.; Cai, J.-X.; Shavit, Y.; Mu, T.-J.; Feng, W.; Zhang, K. ClusterGNN: Cluster-Based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12517–12526. [Google Scholar]
- Sun, J.; Shen, Z.; Wang, Y.; Bao, H.; Zhou, X. LoFTR: Detector-Free Local Feature Matching with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8922–8931. [Google Scholar]
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 770–779. [Google Scholar]
- Moré, J.J. The Levenberg-Marquardt Algorithm: Implementation and Theory. In Numerical Analysis: Proceedings of the Biennial Conference Held at Dundee, Dundee, UK, 28 June–1 July 1977; Springer: Berlin/Heidelberg, Germany, 2006; pp. 105–116. [Google Scholar]
- Pineda, L.; Fan, T.; Monge, M.; Venkataraman, S.; Sodhi, P.; Chen, R.T.; Ortiz, J.; DeTone, D.; Wang, A.; Anderson, S.; et al. Theseus: A Library for Differentiable Nonlinear Optimization. Adv. Neural Inf. Process. Syst. 2022, 35, 3801–3818. [Google Scholar]
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision Meets Robotics: The KITTI Dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef]
- Bescos, B.; Fácil, J.M.; Civera, J.; Neira, J. DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes. IEEE Robot. Autom. Lett. 2018, 3, 4076–4083. [Google Scholar] [CrossRef]
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11621–11631. [Google Scholar]
- Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2446–2454. [Google Scholar]
- Teed, Z.; Deng, J. DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras. Adv. Neural Inf. Process. Syst. 2021, 34, 16558–16569. [Google Scholar]
- Teed, Z.; Deng, J. DeepV2D: Video to Depth with Differentiable Structure from Motion. arXiv 2018, arXiv:1812.04605. [Google Scholar]



*Per-frame relative pose error on KITTI Tracking sequences (odometry estimation, Section 4.2).*

| seq | ORB-SLAM2 [1] RPEt (m/f) | ORB-SLAM2 [1] RPER (°/f) | DynaSLAM [46] RPEt (m/f) | DynaSLAM [46] RPER (°/f) | DynaSLAM2 [5] RPEt (m/f) | DynaSLAM2 [5] RPER (°/f) | KOM-SLAM RPEt (m/f) | KOM-SLAM RPER (°/f) |
|---|---|---|---|---|---|---|---|---|
| 0000 | 0.04 | 0.06 | 0.04 | 0.06 | 0.04 | 0.06 | 0.04 | 0.07 |
| 0001 | 0.05 | 0.04 | 0.05 | 0.04 | 0.05 | 0.04 | 0.04 | 0.03 |
| 0002 | 0.04 | 0.03 | 0.04 | 0.03 | 0.04 | 0.02 | 0.03 | 0.00 |
| 0003 | 0.07 | 0.04 | 0.07 | 0.04 | 0.06 | 0.04 | 0.05 | 0.02 |
| 0004 | 0.07 | 0.06 | 0.07 | 0.06 | 0.07 | 0.06 | 0.06 | 0.07 |
| 0005 | 0.06 | 0.03 | 0.06 | 0.03 | 0.06 | 0.03 | 0.05 | 0.01 |
| 0006 | 0.02 | 0.04 | 0.02 | 0.04 | 0.02 | 0.01 | 0.01 | 0.01 |
| 0007 | 0.05 | 0.07 | 0.05 | 0.07 | 0.05 | 0.07 | 0.04 | 0.03 |
| 0008 | 0.08 | 0.04 | 0.08 | 0.04 | 0.10 | 0.04 | 0.07 | 0.03 |
| 0009 | 0.06 | 0.05 | 0.06 | 0.05 | 0.06 | 0.06 | 0.04 | 0.02 |
| 0010 | 0.07 | 0.04 | 0.07 | 0.04 | 0.07 | 0.03 | 0.06 | 0.02 |
| 0011 | 0.04 | 0.03 | 0.04 | 0.03 | 0.04 | 0.03 | 0.03 | 0.04 |
| 0013 | 0.04 | 0.05 | 0.04 | 0.05 | 0.04 | 0.04 | 0.03 | 0.02 |
| 0014 | 0.03 | 0.08 | 0.03 | 0.08 | 0.03 | 0.08 | 0.03 | 0.05 |
| 0018 | 0.05 | 0.03 | 0.05 | 0.03 | 0.05 | 0.02 | 0.04 | 0.00 |
| 0019 | 0.05 | 0.03 | 0.05 | 0.03 | 0.05 | 0.02 | 0.03 | 0.02 |
| 0020 | 0.11 | 0.07 | 0.05 | 0.04 | 0.07 | 0.04 | 0.04 | 0.01 |
| mean | 0.055 | 0.046 | 0.051 | 0.045 | 0.053 | 0.041 | 0.041 | 0.028 |
| std | 0.021 | 0.016 | 0.015 | 0.015 | 0.018 | 0.019 | 0.014 | 0.020 |
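The RPEt and RPER columns above follow the standard per-frame relative pose error: compare ground-truth and estimated frame-to-frame motions and report the translational and rotational residuals. A minimal sketch of this generic KITTI-style metric (assuming 4×4 homogeneous camera-to-world poses; this is not the authors' evaluation script):

```python
import numpy as np

def relative_pose_error(gt_poses, est_poses):
    """Mean per-frame relative pose error between two trajectories.
    Both inputs are sequences of 4x4 homogeneous pose matrices."""
    t_errs, r_errs = [], []
    for i in range(len(gt_poses) - 1):
        # frame-to-frame motion of ground truth and estimate
        gt_rel = np.linalg.inv(gt_poses[i]) @ gt_poses[i + 1]
        est_rel = np.linalg.inv(est_poses[i]) @ est_poses[i + 1]
        err = np.linalg.inv(gt_rel) @ est_rel        # residual motion
        t_errs.append(np.linalg.norm(err[:3, 3]))    # metres per frame
        # rotation angle recovered from the trace of the residual rotation
        cos_a = np.clip((np.trace(err[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
        r_errs.append(np.degrees(np.arccos(cos_a)))  # degrees per frame
    return np.mean(t_errs), np.mean(r_errs)
```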
*Per-object multi-object tracking results: true positives (TP) and MOTP in the 2D image plane, bird's-eye view (BV), and 3D (Section 4.3).*

| seq/obj.id/class | DynaSLAM2 [5] 2D TP | DynaSLAM2 [5] 2D MOTP | DynaSLAM2 [5] BV TP | DynaSLAM2 [5] BV MOTP | DynaSLAM2 [5] 3D TP | DynaSLAM2 [5] 3D MOTP | KOM-SLAM 2D TP | KOM-SLAM 2D MOTP | KOM-SLAM BV TP | KOM-SLAM BV MOTP | KOM-SLAM 3D TP | KOM-SLAM 3D MOTP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 03/1/car | 50.00 | 71.79 | 39.34 | 56.61 | 38.53 | 48.20 | 95.90 | 86.85 | 95.90 | 75.52 | 97.54 | 66.19 |
| 05/31/car | 28.96 | 60.30 | 14.48 | 46.84 | 11.45 | 34.20 | 82.09 | 83.63 | 81.42 | 76.97 | 83.78 | 66.15 |
| 10/0/car | 81.63 | 73.51 | 70.41 | 47.60 | 68.37 | 40.28 | 98.29 | 89.54 | 98.29 | 76.98 | 98.63 | 68.84 |
| 11/0/car | 72.65 | 74.78 | 61.66 | 50.74 | 52.58 | 47.35 | 97.85 | 87.71 | 97.85 | 84.42 | 98.12 | 74.04 |
| 11/35/car | 53.17 | 65.25 | 19.05 | 31.95 | 6.35 | 26.02 | 77.34 | 84.09 | 71.09 | 84.74 | 76.56 | 74.43 |
| 18/2/car | 86.36 | 74.81 | 67.05 | 45.47 | 62.12 | 34.80 | 93.18 | 89.84 | 93.18 | 80.92 | 93.18 | 74.46 |
| 18/3/car | 53.33 | 70.94 | 21.75 | 41.45 | 16.84 | 35.80 | 85.96 | 86.20 | 85.61 | 77.06 | 85.96 | 64.50 |
| 19/63/car | 35.26 | 63.50 | 29.48 | 45.69 | 26.48 | 33.89 | 54.91 | 90.89 | 53.76 | 84.65 | 55.49 | 77.93 |
| 19/72/car | 29.11 | 62.59 | 29.43 | 55.48 | 29.43 | 39.81 | 12.34 | 75.67 | 12.03 | 73.30 | 12.03 | 61.74 |
| 20/0/car | 63.68 | 78.54 | 43.78 | 45.00 | 31.84 | 46.15 | 85.00 | 90.33 | 85.50 | 74.98 | 86.00 | 65.06 |
| 20/12/car | 42.77 | 76.77 | 37.64 | 49.29 | 36.23 | 40.81 | 91.91 | 88.24 | 90.05 | 73.67 | 91.76 | 62.03 |
| 20/122/car | 34.90 | 78.76 | 34.51 | 48.05 | 29.02 | 44.43 | 90.98 | 83.21 | 77.25 | 72.15 | 91.37 | 60.27 |
| mean | 52.65 | 70.96 | 39.05 | 47.01 | 34.10 | 39.31 | 80.48 | 86.35 | 78.94 | 77.95 | 80.87 | 67.97 |
| std | 19.01 | 6.20 | 17.82 | 6.13 | 28.30 | 6.36 | 23.49 | 4.09 | 23.47 | 4.40 | 23.69 | 5.63 |
*Ego pose estimation ablation: per-frame RPE (Section 4.4.1).*

| seq | No Keypoint–Object Layer RPEt (m/f) | No Keypoint–Object Layer RPER (°/f) | No GNN RPEt (m/f) | No GNN RPER (°/f) | No Gating Layer RPEt (m/f) | No Gating Layer RPER (°/f) | No Soft Assignment RPEt (m/f) | No Soft Assignment RPER (°/f) | KOM-SLAM RPEt (m/f) | KOM-SLAM RPER (°/f) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0000 | 0.04 | 0.07 | 0.04 | 0.10 | 0.04 | 0.08 | 0.04 | 0.09 | 0.04 | 0.07 |
| 0001 | 0.04 | 0.15 | 0.04 | 0.01 | 0.04 | 0.09 | 0.04 | 0.02 | 0.04 | 0.03 |
| 0002 | 0.03 | 0.00 | 0.03 | 0.06 | 0.03 | 0.03 | 0.03 | 0.01 | 0.03 | 0.00 |
| 0003 | 0.06 | 0.02 | 0.06 | 0.04 | 0.06 | 0.06 | 0.06 | 0.03 | 0.05 | 0.02 |
| 0004 | 0.06 | 0.08 | 0.06 | 0.07 | 0.07 | 0.07 | 0.07 | 0.10 | 0.06 | 0.07 |
| 0005 | 0.05 | 0.10 | 0.05 | 0.01 | 0.05 | 0.03 | 0.05 | 0.14 | 0.05 | 0.01 |
| 0006 | 0.01 | 0.06 | 0.01 | 0.06 | 0.01 | 0.01 | 0.02 | 0.11 | 0.01 | 0.01 |
| 0007 | 0.04 | 0.10 | 0.04 | 0.18 | 0.04 | 0.29 | 0.05 | 0.09 | 0.04 | 0.03 |
| 0008 | 0.07 | 0.03 | 0.07 | 0.07 | 0.07 | 0.05 | 0.07 | 0.04 | 0.07 | 0.03 |
| 0009 | 0.04 | 0.03 | 0.04 | 0.05 | 0.04 | 0.03 | 0.04 | 0.05 | 0.04 | 0.02 |
| 0010 | 0.06 | 0.15 | 0.06 | 0.04 | 0.06 | 0.03 | 0.06 | 0.05 | 0.06 | 0.02 |
| 0011 | 0.03 | 0.15 | 0.03 | 0.12 | 0.03 | 0.03 | 0.04 | 0.20 | 0.03 | 0.04 |
| 0013 | 0.03 | 0.08 | 0.03 | 0.05 | 0.03 | 0.04 | 0.03 | 0.06 | 0.03 | 0.02 |
| 0014 | 0.03 | 0.09 | 0.03 | 0.09 | 0.03 | 0.07 | 0.03 | 0.10 | 0.03 | 0.05 |
| 0018 | 0.04 | 0.11 | 0.04 | 0.03 | 0.04 | 0.03 | 0.04 | 0.03 | 0.04 | 0.00 |
| 0019 | 0.03 | 0.17 | 0.03 | 0.11 | 0.03 | 0.11 | 0.03 | 0.05 | 0.03 | 0.02 |
| 0020 | 0.04 | 0.03 | 0.04 | 0.00 | 0.04 | 0.02 | 0.04 | 0.01 | 0.04 | 0.01 |
| mean | 0.041 | 0.084 | 0.041 | 0.064 | 0.042 | 0.062 | 0.043 | 0.069 | 0.041 | 0.028 |
| std | 0.015 | 0.050 | 0.015 | 0.044 | 0.015 | 0.062 | 0.014 | 0.049 | 0.014 | 0.020 |
*Object tracking association ablation: MOTA and IDF1 (Section 4.4.2).*

| seq | Naive Approach MOTA | Naive Approach IDF1 | No GNN MOTA | No GNN IDF1 | No Keypoint–Object Layer MOTA | No Keypoint–Object Layer IDF1 | KOM-SLAM MOTA | KOM-SLAM IDF1 |
|---|---|---|---|---|---|---|---|---|
| 0000 | 1.00 | 0.97 | 0.99 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 |
| 0001 | 0.94 | 0.91 | 0.96 | 0.91 | 0.99 | 0.94 | 0.99 | 0.96 |
| 0002 | 0.91 | 0.90 | 0.92 | 0.68 | 0.98 | 0.88 | 0.99 | 0.91 |
| 0003 | 0.59 | 0.59 | 0.97 | 0.89 | 0.97 | 0.95 | 1.00 | 1.00 |
| 0004 | 0.70 | 0.70 | 0.99 | 0.97 | 0.98 | 0.98 | 1.00 | 0.99 |
| 0005 | 0.77 | 0.74 | 0.97 | 0.92 | 0.97 | 0.95 | 1.00 | 0.99 |
| 0006 | 0.25 | 0.25 | 0.99 | 0.95 | 0.99 | 0.85 | 0.99 | 0.87 |
| 0007 | 0.95 | 0.95 | 0.99 | 0.96 | 1.00 | 0.99 | 1.00 | 1.00 |
| 0008 | 0.32 | 0.32 | 0.97 | 0.80 | 0.85 | 0.71 | 1.00 | 0.92 |
| 0009 | 0.83 | 0.80 | 0.97 | 0.87 | 0.98 | 0.93 | 0.99 | 0.94 |
| 0010 | 0.72 | 0.70 | 0.94 | 0.88 | 0.96 | 0.96 | 1.00 | 1.00 |
| 0011 | 0.87 | 0.87 | 0.96 | 0.83 | 0.98 | 0.93 | 0.99 | 0.95 |
| 0013 | 1.00 | 0.99 | 0.95 | 0.85 | 0.91 | 0.84 | 0.95 | 0.92 |
| 0014 | 0.89 | 0.88 | 0.89 | 0.82 | 0.90 | 0.85 | 0.92 | 0.88 |
| 0018 | 0.73 | 0.69 | 0.99 | 0.81 | 0.96 | 0.87 | 0.99 | 0.88 |
| 0019 | 0.99 | 0.99 | 0.95 | 0.75 | 0.96 | 0.75 | 0.97 | 0.82 |
| 0020 | 0.92 | 0.92 | 0.98 | 0.92 | 0.99 | 0.91 | 0.99 | 0.97 |
| mean | 0.79 | 0.77 | 0.96 | 0.88 | 0.96 | 0.90 | 0.99 | 0.94 |
| std | 0.22 | 0.21 | 0.027 | 0.077 | 0.039 | 0.079 | 0.021 | 0.054 |
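MOTA and IDF1 in the table above are the standard CLEAR-MOT accuracy and identity-F1 metrics. For reference, their textbook definitions in code (generic formulas, not the paper's evaluation pipeline):

```python
def mota(num_gt, num_fn, num_fp, num_idsw):
    """CLEAR-MOT accuracy: 1 minus the rate of misses (FN), false
    positives (FP) and identity switches (IDSW) over all ground-truth
    object instances. Can be negative when errors exceed num_gt."""
    return 1.0 - (num_fn + num_fp + num_idsw) / num_gt

def idf1(idtp, idfp, idfn):
    """IDF1: F1 score over identity-consistent matches, i.e. the harmonic
    mean of identity precision and identity recall."""
    return 2.0 * idtp / (2.0 * idtp + idfp + idfn)
```

Unlike MOTA, which counts every identity switch once, IDF1 rewards keeping the same identity over long stretches, which is why the two metrics rank the ablations slightly differently.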
*Data density and time gap analysis: per-frame RPE with sparser keypoints and a doubled frame interval (Section 4.4.3).*

| seq | Sparser Keypoints RPEt (m/f) | Sparser Keypoints RPER (°/f) | Every 2 Frames RPEt (m/f) | Every 2 Frames RPER (°/f) | KOM-SLAM RPEt (m/f) | KOM-SLAM RPER (°/f) |
|---|---|---|---|---|---|---|
| 0000 | 0.04 | 0.08 | 0.07 | 0.28 | 0.04 | 0.07 |
| 0001 | 0.04 | 0.02 | 0.08 | 0.14 | 0.04 | 0.03 |
| 0002 | 0.03 | 0.06 | 0.04 | 0.05 | 0.03 | 0.00 |
| 0003 | 0.06 | 0.13 | 0.09 | 0.12 | 0.05 | 0.02 |
| 0004 | 0.07 | 0.08 | 0.12 | 0.31 | 0.06 | 0.07 |
| 0005 | 0.05 | 0.01 | 0.06 | 0.01 | 0.05 | 0.01 |
| 0006 | 0.02 | 0.01 | 0.04 | 0.10 | 0.01 | 0.01 |
| 0007 | 0.05 | 0.08 | 0.14 | 0.33 | 0.04 | 0.03 |
| 0008 | 0.07 | 0.05 | 0.09 | 0.13 | 0.07 | 0.03 |
| 0009 | 0.05 | 0.05 | 0.10 | 0.25 | 0.04 | 0.02 |
| 0010 | 0.06 | 0.09 | 0.08 | 0.09 | 0.06 | 0.02 |
| 0011 | 0.03 | 0.01 | 0.05 | 0.13 | 0.03 | 0.04 |
| 0013 | 0.03 | 0.05 | 0.05 | 0.19 | 0.03 | 0.02 |
| 0014 | 0.03 | 0.09 | 0.06 | 0.14 | 0.03 | 0.05 |
| 0018 | 0.04 | 0.00 | 0.08 | 0.04 | 0.04 | 0.00 |
| 0019 | 0.03 | 0.07 | 0.06 | 0.07 | 0.03 | 0.02 |
| 0020 | 0.04 | 0.07 | 0.09 | 0.06 | 0.04 | 0.01 |
| mean | 0.043 | 0.055 | 0.076 | 0.14 | 0.041 | 0.028 |
| std | 0.015 | 0.034 | 0.026 | 0.094 | 0.014 | 0.020 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Liu, J.; Tian, Y.; Gu, Y.; Kamijo, S. KOM-SLAM: A GNN-Based Tightly Coupled SLAM and Multi-Object Tracking Framework. Sensors 2026, 26, 128. https://doi.org/10.3390/s26010128

