HiPro-AD: Sparse Trajectory Transformer for End-to-End Autonomous Driving with Hybrid Spatiotemporal Attention

Bing Chen; Gaopeng Wang; Jiandong Yang; Shaoliang Huang; Xinhe Qian; Bin Huang; Guanlun Guo

doi:10.3390/s26010185

,

and

¹

Shandong Electric Power Engineering Consulting Institute Corp., Ltd., Jinan 250100, China

²

Hubei Longzhong Laboratory, Xiangyang 441106, China

³

School of Automotive Engineering, Wuhan University of Technology, Wuhan 430070, China

⁴

Sdic Qinzhou Second Electric Power Co., Ltd., Qinzhou 535000, China

Sensors2026, 26(1), 185;https://doi.org/10.3390/s26010185
(registering DOI)

This article belongs to the Special Issue AI-Driving for Autonomous Vehicles

Version Notes

Order Reprints

Review Reports

Abstract

End-to-end (E2E) autonomous driving offers a promising alternative to traditional modular pipelines by mapping raw sensor data directly to vehicle controls, thereby mitigating error propagation. However, prevalent approaches largely rely on dense Bird’s-Eye-View (BEV) feature maps, which incur high computational overhead and necessitate complex post-processing for trajectory generation. To address these limitations, we propose HiPro-AD, a proposal-centric sparse E2E planning framework that fundamentally diverges from dense BEV paradigms. HiPro-AD integrates an efficiency-oriented IM-ResNet-34 encoder with a novel STFormer. This transformer dynamically fuses multi-view spatial features and historical temporal context via a proposal-anchored mechanism, focusing computation strictly on regions relevant to sparse trajectory proposals. Furthermore, trajectory selection is refined by a Pairwise Ranking Scorer, which identifies the optimal plan from diverse candidates based on relative quality. On the NAVSIM benchmark, HiPro-AD achieves a PDMS of 92.6 using only camera input, surpassing prior dense BEV and multimodal methods. On the closed-loop Bench2Drive benchmark, it attains a 37.31% success rate and a driving score of 65.48 with a latency of 67 ms, demonstrating real-time capability. These results validate the efficiency and robustness of our sparse paradigm in complex driving scenarios.

Keywords:

autonomous driving; bird’s eye view; trajectory planning

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Article metric data becomes available approximately 24 hours after publication online.