Occlusion-Aware Interactive End-to-End Autonomous Driving for Right-of-Way Conflicts
Abstract
1. Introduction
- A novel approach is introduced that fuses visual features into a hierarchical dynamic graph structure, enabling 3D topological reasoning about occluded regions through primitive geometric decomposition;
- A hierarchical interaction framework is proposed that leverages graph attention mechanisms to explicitly model dependencies among the multi-modal trajectories of multiple agents, generating physically and socially compatible motion plans;
- The effectiveness of the proposed method is validated through benchmark testing and a real-world field test conducted in Suzhou, China, covering both occlusion and complex interaction scenes.
2. Related Work
2.1. End-to-End Autonomous Driving
2.2. Occlusion Handling
2.3. Multi-Modal Trajectory Interaction
3. Materials and Methods
3.1. Scene Representation with Occlusion Awareness
3.1.1. Perception Feature Representation
3.1.2. Vectorized Occlusion Region Modeling
- The start/end BEV coordinates of the i-th occlusion edge;
- The occlusion height at the edge's start and end points (see the encoding sketch below).
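To make the vectorized form concrete, the following minimal sketch encodes an occlusion region as a closed BEV polygon whose edges each carry start/end coordinates and start/end occlusion heights. The names (`OcclusionEdge`, `polygon_to_edges`) and the dataclass layout are illustrative assumptions, not the paper's actual interface.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative container for one occlusion edge: start/end BEV points plus
# the occlusion heights sampled at those two points (names are assumptions).
@dataclass
class OcclusionEdge:
    start_xy: Tuple[float, float]   # (x, y) of the edge start in BEV coordinates
    end_xy: Tuple[float, float]     # (x, y) of the edge end in BEV coordinates
    start_height: float             # occlusion height at the start point (m)
    end_height: float               # occlusion height at the end point (m)

def polygon_to_edges(polygon_xy: List[Tuple[float, float]],
                     heights: List[float]) -> List[OcclusionEdge]:
    """Turn a closed occlusion polygon (BEV vertices plus per-vertex heights)
    into the edge-wise vectorized form described above."""
    assert len(polygon_xy) == len(heights) and len(polygon_xy) >= 3
    edges = []
    n = len(polygon_xy)
    for i in range(n):
        j = (i + 1) % n  # wrap around so the last edge closes the polygon
        edges.append(OcclusionEdge(polygon_xy[i], polygon_xy[j],
                                   heights[i], heights[j]))
    return edges

# Example: an occlusion region cast by a parked truck roughly 2.5 m tall.
region = [(10.0, 2.0), (14.0, 2.0), (14.0, 6.0), (10.0, 6.0)]
edges = polygon_to_edges(region, [2.5, 2.5, 2.5, 2.5])
print(len(edges), edges[0])
```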
3.1.3. Agent and Map Feature Representation
- Agent features: one D-dimensional vector per agent, where D includes attributes like position, orientation, and category confidence;
- Map features: vectorized structures such as lane dividers, road boundaries, and crosswalks (a tensor-layout sketch follows below).
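As a rough illustration of how these agent and map features could be laid out as tensors, the sketch below assumes a simple layout; the shapes, the choice of D = 6, and the attribute ordering are assumptions made for exposition, not the model's exact design.

```python
import numpy as np

num_agents, num_map_elems, pts_per_polyline = 32, 64, 20

# Agent features: one D-dimensional vector per agent. Here D = 6 is an
# illustrative choice: (x, y, heading, vx, vy, category confidence).
agent_features = np.zeros((num_agents, 6), dtype=np.float32)

# Map features: each element is a vectorized polyline (lane divider, road
# boundary, crosswalk, ...) given as ordered BEV points plus a class id.
map_points = np.zeros((num_map_elems, pts_per_polyline, 2), dtype=np.float32)
map_classes = np.zeros((num_map_elems,), dtype=np.int64)  # e.g., 0 = divider, 1 = boundary, 2 = crosswalk

print(agent_features.shape, map_points.shape, map_classes.shape)
```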
3.1.4. Query-Based Semantic Feature Extraction
3.1.5. Scene Feature Fusion and Instance-Level Reasoning
3.2. Joint Network for Multi-Modal Trajectory Prediction and Planning
- (1) Mode-to-Scene Cross-Attention: Incorporates scene-level features (e.g., map, occlusion, and historical trajectories) into each mode embedding to enhance environmental awareness;
- (2) Mode-to-Time Cross-Attention: Captures temporal dynamics in trajectory evolution (e.g., acceleration and lane changes) with time-aware cross-attention;
- (3) Agent Self-Attention: Models interactions between agents (e.g., yielding, following, and merging) within the same mode. A minimal sketch of these three stages follows this list.
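The three attention stages above can be sketched roughly as follows. The tensor layout [agents, modes, feature dim], the module names, and the use of PyTorch's `nn.MultiheadAttention` are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class ModeInteractionBlock(nn.Module):
    """Illustrative block: mode-to-scene, mode-to-time, then agent self-attention."""

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.mode_to_scene = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mode_to_time = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.agent_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, mode_q, scene_tokens, time_tokens):
        # mode_q:       [A, M, D]  one query per agent and trajectory mode
        # scene_tokens: [S, D]     shared map / occlusion / history features
        # time_tokens:  [A, T, D]  per-agent temporal features
        A = mode_q.size(0)

        # (1) Mode-to-Scene: every mode query attends to the shared scene tokens.
        scene = scene_tokens.unsqueeze(0).expand(A, -1, -1)       # [A, S, D]
        x, _ = self.mode_to_scene(mode_q, scene, scene)           # [A, M, D]

        # (2) Mode-to-Time: each agent's modes attend to that agent's time tokens.
        x, _ = self.mode_to_time(x, time_tokens, time_tokens)     # [A, M, D]

        # (3) Agent self-attention within each mode: agents become the sequence
        # dimension, modes become the batch dimension.
        x = x.transpose(0, 1)                                     # [M, A, D]
        x, _ = self.agent_self(x, x, x)
        return x.transpose(0, 1)                                  # [A, M, D]

# Tiny smoke test with random features: 8 agents, 6 modes, 10 past time steps.
block = ModeInteractionBlock()
out = block(torch.randn(8, 6, 128), torch.randn(50, 128), torch.randn(8, 10, 128))
print(out.shape)  # torch.Size([8, 6, 128])
```

In a full model, each attention stage would typically be wrapped with residual connections, layer normalization, and feed-forward layers, as in standard transformer decoders.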
3.3. Training and Optimization
3.3.1. Vectorized Scene Learning
- Vectorized Map Modeling: We used the Manhattan distance as the regression metric to measure the geometric deviation between predicted and ground-truth map points. In addition, focal loss [27] was introduced as a classification objective to enhance the model’s focus on key semantic map elements, such as lane dividers, stop lines, and crosswalks. The overall loss for this module combines the regression and classification terms (a minimal sketch is given after this list).
- Vectorized Dynamic Obstacle Modeling: This subtask models the vectorized trajectories of dynamic targets in the traffic environment (e.g., vehicles or pedestrians) to achieve a structured representation of their behavior, supervised by a corresponding trajectory loss term.
- Vectorized Occlusion Region Modeling: This module learns the spatial distribution of potential occlusion regions, which helps improve the model’s reasoning about occluded or invisible targets, and contributes a dedicated occlusion loss term.
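As a rough sketch of the vectorized map loss described above (Manhattan-distance point regression combined with focal-loss classification [27]), the snippet below assumes a simple formulation; the weighting scheme, focal parameters, and function names are illustrative, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha: float = 0.25, gamma: float = 2.0):
    """Binary (one-vs-all) focal loss over per-element class logits, as in [27]."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-ce)                                    # probability of the true label
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()

def map_loss(pred_pts, gt_pts, pred_logits, gt_onehot, w_reg=1.0, w_cls=1.0):
    # Manhattan (L1) distance between predicted and ground-truth map points.
    reg = (pred_pts - gt_pts).abs().sum(dim=-1).mean()
    # Focal classification over semantic map classes (divider, stop line, crosswalk, ...).
    cls = focal_loss(pred_logits, gt_onehot)
    return w_reg * reg + w_cls * cls

# Smoke test: 64 map elements, 20 points each, 3 semantic classes.
pred_pts, gt_pts = torch.randn(64, 20, 2), torch.randn(64, 20, 2)
pred_logits = torch.randn(64, 3)
gt_onehot = F.one_hot(torch.randint(0, 3, (64,)), num_classes=3).float()
print(map_loss(pred_pts, gt_pts, pred_logits, gt_onehot))
```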
3.3.2. Multi-Modal Motion Prediction and Planning Constraint Modeling
3.3.3. Overall End-to-End Training Objective
4. Experiments
4.1. Experimental Settings
4.2. Performance Evaluation on Open Datasets
- L2 Error: This refers to the Euclidean distance between the predicted and actual trajectories at three different time steps (1 s, 2 s, and 3 s). The lower the L2 error, the more accurate the predicted trajectory. The “Avg.” column represents the average error across these time steps.
- Collision Rate: This measures the percentage of episodes in which the ego vehicle collides with obstacles. It is calculated at the 1 s, 2 s, and 3 s marks, with a lower value indicating fewer collisions. The “Avg.” column provides the average collision rate across all time steps (a computation sketch for the L2 and collision metrics follows this list).
- Driving Score: This composite score evaluates the overall performance of the vehicle across various driving tasks, including lane-keeping, speed regulation, and interaction with other vehicles. It aggregates multiple factors that contribute to the vehicle’s ability to safely and efficiently complete driving tasks.
- Success Rate: This measures the percentage of scenarios in which the ego vehicle successfully completes the task without any major failures, such as collisions or off-road excursions. A higher success rate reflects better overall performance in task execution.
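A minimal sketch of how the L2 error and collision rate could be computed from planned and ground-truth ego trajectories is given below. The 2 Hz sampling assumption, the array shapes, and the simple distance-threshold collision check are illustrative; benchmark implementations typically check bounding-box overlap instead.

```python
import numpy as np

def l2_and_collision(pred, gt, obstacles, hz: int = 2, radius: float = 1.0):
    """pred, gt: [N, T, 2] ego positions; obstacles: [N, T, K, 2] obstacle positions.
    Returns L2 error (m) and collision rate (%) at 1 s / 2 s / 3 s plus their averages."""
    results = {}
    l2_vals, col_vals = [], []
    for horizon_s in (1, 2, 3):
        t = horizon_s * hz - 1                                 # index of that horizon
        l2 = np.linalg.norm(pred[:, t] - gt[:, t], axis=-1).mean()
        dists = np.linalg.norm(obstacles[:, t] - pred[:, t, None], axis=-1)
        col = 100.0 * (dists.min(axis=-1) < radius).mean()     # % of samples in collision
        results[f"L2@{horizon_s}s"], results[f"Col@{horizon_s}s"] = l2, col
        l2_vals.append(l2)
        col_vals.append(col)
    results["L2@avg"], results["Col@avg"] = np.mean(l2_vals), np.mean(col_vals)
    return results

# Smoke test with random data: 100 samples, 3 s at 2 Hz, 5 obstacles per frame.
rng = np.random.default_rng(0)
print(l2_and_collision(rng.normal(size=(100, 6, 2)),
                       rng.normal(size=(100, 6, 2)),
                       rng.normal(size=(100, 6, 5, 2))))
```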
4.3. Performance Evaluation on Real-World Dataset
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Fresh. Safety Considerations for Autonomous Vehicles. 2024. Available online: https://www.sgpjbg.com.cn/baogao/169884.html (accessed on 11 September 2025).
2. Li, M.; Li, G.; Sun, C.; Yang, J.; Li, H.; Li, J.; Li, F. A shared-road-rights driving strategy based on resolution guidance for right-of-way conflicts. Electronics 2024, 13, 3214.
3. Kim, M.J.; Pertsch, K.; Karamcheti, S.; Xiao, T.; Balakrishna, A.; Nair, S.; Rafailov, R.; Foster, E.; Lam, G.; Sanketi, P.; et al. OpenVLA: An open-source vision-language-action model. arXiv 2024, arXiv:2406.09246.
4. Hu, Y.; Yang, J.; Chen, L.; Li, K.; Sima, C.; Zhu, X.; Chai, S.; Du, S.; Lin, T.; Wang, W.; et al. Planning-oriented autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 17853–17862.
5. Mescheder, L.; Oechsle, M.; Niemeyer, M.; Nowozin, S.; Geiger, A. Occupancy networks: Learning 3D reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4460–4470.
6. Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11621–11631.
7. Sun, W.; Lin, X.; Shi, Y.; Zhang, C.; Wu, H.; Zheng, S. SparseDrive: End-to-end autonomous driving via sparse scene representation. In Proceedings of the 2025 IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, 19–23 May 2025; pp. 8795–8801.
8. Jiang, B.; Chen, S.; Xu, Q.; Liao, B.; Chen, J.; Zhou, H.; Zhang, Q.; Liu, W.; Huang, C.; Wang, X. VAD: Vectorized scene representation for efficient autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 8340–8350.
9. Zhang, Y.; Qian, D.; Li, D.; Pan, Y.; Chen, Y.; Liang, Z.; Zhang, Z.; Zhang, S.; Li, H.; Fu, M.; et al. GraphAD: Interaction scene graph for end-to-end autonomous driving. arXiv 2024, arXiv:2403.19098.
10. Chen, S.; Jiang, B.; Gao, H.; Liao, B.; Xu, Q.; Zhang, Q.; Huang, C.; Liu, W.; Wang, X. VADv2: End-to-end vectorized autonomous driving via probabilistic planning. arXiv 2024, arXiv:2402.13243.
11. Li, Z.; Li, K.; Wang, S.; Lan, S.; Yu, Z.; Ji, Y.; Li, Z.; Zhu, Z.; Kautz, J.; Wu, Z.; et al. Hydra-MDP: End-to-end multimodal planning with multi-target hydra-distillation. arXiv 2024, arXiv:2406.06978.
12. Zheng, W.; Song, R.; Guo, X.; Zhang, C.; Chen, L. GenAD: Generative end-to-end autonomous driving. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer Nature: Cham, Switzerland, 2024; pp. 87–104.
13. Lange, B.; Li, J.; Kochenderfer, M.J. Scene Informer: Anchor-based occlusion inference and trajectory prediction in partially observable environments. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 14138–14145.
14. Tian, X.; Jiang, T.; Yun, L.; Mao, Y.; Yang, H.; Wang, Y.; Wang, Y.; Zhao, H. Occ3D: A large-scale 3D occupancy prediction benchmark for autonomous driving. Adv. Neural Inf. Process. Syst. 2023, 36, 64318–64330.
15. Saxena, R.; Schuster, R.; Wasenmuller, O.; Stricker, D. PWOC-3D: Deep occlusion-aware end-to-end scene flow estimation. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 324–331.
16. Zhang, Z.; Fisac, J.F. Safe occlusion-aware autonomous driving via game-theoretic active perception. arXiv 2021, arXiv:2105.08169.
17. Narksri, P.; Darweesh, H.; Takeuchi, E.; Ninomiya, Y.; Takeda, K. Occlusion-aware motion planning with visibility maximization via active lateral position adjustment. IEEE Access 2022, 10, 57759–57782.
18. Shao, H.; Wang, L.; Chen, R.; Waslander, S.L.; Li, H.; Liu, Y. ReasonNet: End-to-end driving with temporal and global reasoning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 13723–13733.
19. Brewitt, C.; Tamborski, M.; Wang, C.; Albrecht, S.V. Verifiable goal recognition for autonomous driving with occlusions. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 11210–11217.
20. Huang, Z.; Liu, H.; Lv, C. GameFormer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 3903–3913.
21. Wang, Y.; Tang, C.; Sun, L.; Rossi, S.; Xie, Y.; Peng, C.; Hannagan, T.; Sabatini, S.; Poerio, N.; Tomizuka, M.; et al. Optimizing diffusion models for joint trajectory prediction and controllable generation. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer Nature: Cham, Switzerland, 2024; pp. 324–341.
22. Hu, H.; Wang, Q.; Zhang, Z.; Li, Z.; Gao, Z. Holistic transformer: A joint neural network for trajectory prediction and decision-making of autonomous vehicles. Pattern Recognit. 2023, 141, 109592.
23. Liao, H.; Li, X.; Li, Y.; Kong, H.; Wang, C.; Wang, B.; Guan, Y.; Tam, K.; Li, Z. CDSTraj: Characterized diffusion and spatial-temporal interaction network for trajectory prediction in autonomous driving. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, Jeju, Republic of Korea, 3–9 August 2024; pp. 7331–7339.
24. Huang, Z.; Liu, H.; Wu, J.; Lv, C. Differentiable integrated motion prediction and planning with learnable cost function for autonomous driving. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 15222–15236.
25. Ye, T.; Jing, W.; Hu, C.; Huang, S.; Gao, L.; Li, F.; Wang, J.; Guo, K.; Xiao, W.; Mao, W.; et al. FusionAD: Multi-modality fusion for prediction and planning tasks of autonomous driving. arXiv 2023, arXiv:2308.01006.
26. Zhou, J.; Olofsson, B.; Frisk, E. Interaction-aware motion planning for autonomous vehicles with multi-modal obstacle uncertainty predictions. IEEE Trans. Intell. Veh. 2023, 9, 1305–1319.
27. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
28. Zhou, Z.; Wen, Z.; Wang, J.; Li, Y.H.; Huang, Y.K. QCNeXt: A next-generation framework for joint multi-agent trajectory prediction. arXiv 2023, arXiv:2306.10508.
29. Jia, X.; Yang, Z.; Li, Q.; Zhang, Z.; Yan, J. Bench2Drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving. Adv. Neural Inf. Process. Syst. 2024, 37, 819–844.
30. Jia, X.; You, J.; Zhang, Z.; Yan, J. DriveTransformer: Unified transformer for scalable end-to-end autonomous driving. arXiv 2025, arXiv:2503.07656.
31. Wang, T.; Zhang, C.; Qu, X.; Li, K.; Liu, W.; Huang, C. DiffAD: A unified diffusion modeling approach for autonomous driving. arXiv 2025, arXiv:2503.12170.
| Method | L2 1 s (m) ↓ | L2 2 s (m) ↓ | L2 3 s (m) ↓ | L2 Avg. (m) ↓ | Coll. 1 s (%) ↓ | Coll. 2 s (%) ↓ | Coll. 3 s (%) ↓ | Coll. Avg. (%) ↓ |
|---|---|---|---|---|---|---|---|---|
| UniAD [4] | 0.48 | 0.76 | 1.65 | 1.03 | 0.05 | 0.17 | 1.27 | 0.71 |
| VAD [8] | 0.41 | 0.70 | 1.05 | 0.72 | 0.07 | 0.17 | 0.41 | 0.22 |
| SparseDrive-B [7] | 0.29 | 0.55 | 0.91 | 0.58 | 0.01 | 0.02 | 0.13 | 0.06 |
| OAIAD (Ours) | 0.23 | 0.51 | 0.83 | 0.50 | 0.01 | 0.01 | 0.09 | 0.04 |
| Method | Merging (%) ↑ | Overtaking (%) ↑ | Emergency Brake (%) ↑ | Give Way (%) ↑ | Traffic Sign (%) ↑ | Multi-Ability Mean (%) ↑ | Driving Score ↑ | Success Rate (%) ↑ |
|---|---|---|---|---|---|---|---|---|
| UniAD [4] | 14.1 | 17.78 | 21.67 | 10 | 14.21 | 15.55 | 45.81 | 16.36 |
| VAD [8] | 8.11 | 24.44 | 18.64 | 20 | 19.15 | 18.07 | 42.35 | 15 |
| DriveTransformer-Large [30] | 17.57 | 35 | 48.36 | 40 | 52.1 | 38.6 | 60.45 | 30 |
| DiffAD [31] | 30 | 35.55 | 46.66 | 40 | 46.32 | 38.79 | 67.92 | 38.64 |
| OAIAD (Ours) | 32.68 | 34.46 | 63.47 | 40 | 42.39 | 42.6 | 68.73 | 48.86 |
| ID | Occlusion Feature | Interactive Planner | L2 1 s (m) ↓ | L2 2 s (m) ↓ | L2 3 s (m) ↓ | L2 Avg. (m) ↓ | Coll. 1 s (%) ↓ | Coll. 2 s (%) ↓ | Coll. 3 s (%) ↓ | Coll. Avg. (%) ↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ✕ | ✕ | 0.41 | 0.70 | 1.05 | 0.72 | 0.07 | 0.17 | 0.41 | 0.22 |
| 2 | ✓ | ✕ | 0.36 | 0.63 | 0.81 | 0.60 | 0.04 | 0.13 | 0.31 | 0.16 |
| 3 | ✕ | ✓ | 0.33 | 0.59 | 0.84 | 0.58 | 0.05 | 0.14 | 0.28 | 0.15 |
| 4 | ✓ | ✓ | 0.23 | 0.51 | 0.83 | 0.50 | 0.01 | 0.01 | 0.09 | 0.04 |