Abstract
Vehicle tracking is essential for autonomous driving, traffic surveillance, and intelligent transportation systems, yet most existing trackers rely on frame-level training that neglects temporal dependencies. This mismatch between training and testing leads to error propagation, mislocalization in challenging frames, and failure to re-identify vehicles after occlusion. We present a reinforcement learning (RL)-based sequence-level training framework that formulates tracking as a sequential decision process and directly optimizes evaluation metrics consistent with those used at test time. By exploiting temporal dependencies between decisions, our approach improves robustness in difficult frames and under occlusion; a temporal data augmentation strategy based on sliding-window sampling further improves generalization across diverse motion patterns. Experiments on challenging benchmarks indicate that our method provides better robustness and temporal continuity than frame-level training approaches, suggesting the benefits of sequence-level learning for vehicle tracking.