Abstract
Coupled simultaneous localization and mapping (SLAM) and multi-object tracking have been studied extensively in recent years. Although existing methods achieve promising results, they mostly associate keypoints and objects across frames separately, which limits their robustness in complex dynamic scenes. To overcome this limitation, we propose KOM-SLAM, a tightly coupled SLAM and multi-object tracking framework based on a Graph Neural Network (GNN), which jointly learns keypoint and object associations across frames while estimating ego-poses in a differentiable manner. The framework constructs a spatiotemporal graph over keypoints and object detections for association, and employs a multilayer perceptron (MLP) followed by a sigmoid activation to adaptively adjust association thresholds based on ego-motion and spatial context. We apply soft assignment to keypoint matches to keep pose estimation differentiable, so the pose loss can directly supervise association learning. Experiments on the KITTI Tracking dataset demonstrate that our method improves both localization and object tracking performance.
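As a rough illustration of the two mechanisms named above, the following is a minimal sketch, not the authors' implementation: the module names (`AdaptiveGate`, `soft_assign`, `weighted_pose_loss`), feature dimensions, the temperature parameter, and the use of a squared 3D residual under a rotation-translation pose are all assumptions made for exposition.

```python
import torch
import torch.nn as nn

class AdaptiveGate(nn.Module):
    """Hypothetical MLP + sigmoid head that maps ego-motion and spatial
    context features to a per-edge association threshold in (0, 1)."""
    def __init__(self, ctx_dim: int = 8, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (E, ctx_dim) features of candidate association edges
        return self.mlp(context).squeeze(-1)  # (E,) adaptive thresholds

def soft_assign(scores: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Row-wise softmax over matching scores: a differentiable
    relaxation of hard one-to-one keypoint assignment."""
    return torch.softmax(scores / tau, dim=-1)  # (N, M) soft weights

def weighted_pose_loss(weights: torch.Tensor,
                       pts_prev: torch.Tensor,
                       pts_curr: torch.Tensor,
                       R: torch.Tensor,
                       t: torch.Tensor) -> torch.Tensor:
    """Soft-assignment-weighted pose residual, so the pose loss
    back-propagates into the association weights.
    weights: (N, M), pts_prev: (N, 3), pts_curr: (M, 3)."""
    pred = pts_prev @ R.T + t                        # (N, 3) transformed points
    diff = pred[:, None, :] - pts_curr[None, :, :]   # (N, M, 3) pairwise errors
    residual = (diff ** 2).sum(-1)                   # (N, M) squared distances
    return (weights * residual).sum() / weights.sum()
```

Because the softmax relaxation keeps every operation differentiable, gradients of the pose residual flow through `weights` back into the score network, which is what allows the pose loss to supervise association learning directly.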