Sensors
  • Article
  • Open Access

26 December 2025

HiPro-AD: Sparse Trajectory Transformer for End-to-End Autonomous Driving with Hybrid Spatiotemporal Attention

1 Shandong Electric Power Engineering Consulting Institute Corp., Ltd., Jinan 250100, China
2 Hubei Longzhong Laboratory, Xiangyang 441106, China
3 School of Automotive Engineering, Wuhan University of Technology, Wuhan 430070, China
4 Sdic Qinzhou Second Electric Power Co., Ltd., Qinzhou 535000, China
Sensors 2026, 26(1), 185; https://doi.org/10.3390/s26010185
(registering DOI)
This article belongs to the Special Issue AI-Driving for Autonomous Vehicles

Abstract

End-to-end (E2E) autonomous driving offers a promising alternative to traditional modular pipelines by mapping raw sensor data directly to vehicle controls, thereby mitigating error propagation. However, prevalent approaches largely rely on dense Bird’s-Eye-View (BEV) feature maps, which incur high computational overhead and necessitate complex post-processing for trajectory generation. To address these limitations, we propose HiPro-AD, a proposal-centric sparse E2E planning framework that fundamentally diverges from dense BEV paradigms. HiPro-AD integrates an efficiency-oriented IM-ResNet-34 encoder with a novel STFormer. This transformer dynamically fuses multi-view spatial features and historical temporal context via a proposal-anchored mechanism, focusing computation strictly on regions relevant to sparse trajectory proposals. Furthermore, trajectory selection is refined by a Pairwise Ranking Scorer, which identifies the optimal plan from diverse candidates based on relative quality. On the NAVSIM benchmark, HiPro-AD achieves a PDMS of 92.6 using only camera input, surpassing prior dense BEV and multimodal methods. On the closed-loop Bench2Drive benchmark, it attains a 37.31% success rate and a driving score of 65.48 with a latency of 67 ms, demonstrating real-time capability. These results validate the efficiency and robustness of our sparse paradigm in complex driving scenarios.
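
The abstract does not say how the Pairwise Ranking Scorer is trained or applied, so the following is only a minimal sketch of the general idea of ranking candidates by relative quality. It assumes each trajectory proposal already has a scalar score, uses a logistic (RankNet-style) pairwise loss, and picks the proposal with the highest total pairwise preference. All function names here are hypothetical and are not taken from HiPro-AD.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pairwise_ranking_loss(score_better: float, score_worse: float) -> float:
    """Logistic pairwise loss: -log(sigmoid(score_better - score_worse)).

    Assumption: the scorer is trained so that the better trajectory of a
    pair receives the higher score. The abstract does not confirm this.
    """
    margin = score_better - score_worse
    # Stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def preference_matrix(scores: Sequence[float]) -> List[List[float]]:
    """prefer[i][j] is the modelled probability that proposal i beats proposal j."""
    return [[sigmoid(si - sj) for sj in scores] for si in scores]


def select_proposal(prefer: Sequence[Sequence[float]]) -> int:
    """Return the index of the proposal with the largest summed preference."""
    n = len(prefer)
    if n == 0:
        raise ValueError("at least one proposal is required")
    totals = [sum(prefer[i][j] for j in range(n) if j != i) for i in range(n)]
    return max(range(n), key=totals.__getitem__)


if __name__ == "__main__":
    # Hypothetical scores for three candidate trajectories.
    candidate_scores = [0.2, 1.5, -0.3]
    best = select_proposal(preference_matrix(candidate_scores))
    print("selected proposal index:", best)
```

A pairwise formulation only needs to know which of two candidates is better, not an absolute quality value for each, which is one common reason to prefer it when candidates are diverse.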
