Knowledge Distillation-Enhanced Behavior Transformer for Decision-Making of Autonomous Driving
Abstract
1. Introduction
- Propose the Behavior Transformer as a policy network for autonomous driving decision-making in reinforcement learning (RL). It takes temporal sequences of historical observations and actions as input and uses the Transformer’s attention mechanism and contextual learning capability to improve decision-making accuracy and generalization.
- Develop a knowledge distillation framework with a teacher–student paradigm to improve RL training efficiency. The teacher model acquires expert knowledge through imitation learning (IL), and the student model accelerates its learning through knowledge transferred from the teacher. An adaptive decaying coefficient gradually reduces the teacher’s influence, so that the student can develop its own capabilities and ultimately surpass the teacher’s performance.
- Evaluate the Knowledge Distillation-Enhanced Behavior Transformer (KD-BeT) on the CARLA NoCrash benchmark, where it achieves state-of-the-art decision-making accuracy and generalization, with success rates of 100%, 94%, and 85% in the empty, regular, and dense testing scenarios.
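The decaying teacher influence described in the second contribution can be illustrated with a minimal Python sketch. It is not the authors' implementation: the linear decay schedule, the use of a KL divergence over a discrete action distribution, and the names `distill_coef`, `kl_divergence`, and `student_loss` are all assumptions made for illustration. Only the initial coefficient (0.1) and the number of RL iterations (2000) are taken from the hyperparameter table.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distill_coef(iteration, total_iterations=2000, initial=0.1):
    """Hypothetical linear decay of the teacher's weight from `initial` to 0.

    The paper only states that the coefficient decays adaptively; the
    linear shape here is an assumption.
    """
    frac = min(max(iteration / total_iterations, 0.0), 1.0)
    return initial * (1.0 - frac)

def student_loss(rl_loss, teacher_probs, student_probs, iteration, total_iterations=2000):
    """RL loss plus a decaying penalty for deviating from the teacher's action distribution."""
    lam = distill_coef(iteration, total_iterations)
    return rl_loss + lam * kl_divergence(teacher_probs, student_probs)
```

Early in training the penalty pulls the student toward the teacher; once the coefficient reaches zero the student is optimized on the RL objective alone, which is what allows it to move beyond the teacher's policy.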
2. Method Framework for Decision-Making of Autonomous Driving Based on KD-BeT
3. Partially Observable Markov Decision Process for Decision-Making of Autonomous Driving
3.1. Partially Observable Markov Decision Process
- $\mathcal{S}$: State space, representing the set of all possible system states.
- $\mathcal{A}$: Action space, representing the set of all possible actions the agent can take.
- $P(s' \mid s, a)$: State transition probability, representing the probability of transitioning from state $s$ to state $s'$ after taking action $a$.
- $R(s, a)$: Reward function, representing the reward received when taking action $a$ in state $s$.
- $\Omega$: Observation space, representing the set of possible observations the agent can receive.
- $O(o \mid s', a)$: Observation probability, representing the probability of observing $o$ after taking action $a$ and transitioning to a new state $s'$.
- $\gamma$: Discount factor, used to discount the influence of future rewards.
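The components listed above can be made concrete with a small Python sketch. The `POMDP` container, its dictionary-based tables, and the toy state, action, and observation names are hypothetical and chosen only for illustration; the default discount factor of 0.99 is the value from the hyperparameter table. The key point is that `step` returns an observation rather than the true next state, and that future rewards are weighted by powers of the discount factor.

```python
import random
from dataclasses import dataclass

def _sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes, probs = zip(*dist.items())
    return rng.choices(outcomes, weights=probs, k=1)[0]

@dataclass
class POMDP:
    transition: dict   # (state, action) -> {next_state: probability}
    reward: dict       # (state, action) -> float
    observation: dict  # (action, next_state) -> {observation: probability}
    gamma: float = 0.99

    def step(self, state, action, rng=random):
        """Sample the hidden next state and the observation the agent actually sees."""
        next_state = _sample(self.transition[(state, action)], rng)
        obs = _sample(self.observation[(action, next_state)], rng)
        return next_state, obs, self.reward[(state, action)]

def discounted_return(rewards, gamma=0.99):
    """Sum of rewards weighted by gamma**t, accumulated from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Toy example with deterministic tables (hypothetical names).
toy = POMDP(
    transition={("clear", "go"): {"blocked": 1.0}},
    reward={("clear", "go"): 1.0},
    observation={("go", "blocked"): {"brake_lights": 1.0}},
)
```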
3.2. Observation Representation
3.3. Action Representation
3.4. Reward Function
4. Knowledge Distillation-Enhanced Behavior Transformer for Decision-Making of Autonomous Driving
4.1. Behavior Transformer for Sequential Decision Making
4.2. Knowledge Distillation-Enhanced Behavior Transformer
4.2.1. Imitation Learning for Teacher Model
4.2.2. Reinforcement Learning for Student Model with Knowledge Distillation
4.3. Knowledge Distillation-Enhanced Behavior Transformer Algorithm
Algorithm 1 Knowledge Distillation-Enhanced Behavior Transformer (KD-BeT)
Require: expert demonstration dataset, batch size, IL iterations, learning rate, RL iterations, timesteps per episode, context length K, online environment
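The context length K required by the algorithm determines how much history the policy sees at each decision step. A minimal sketch of such a sliding window is given below, assuming the history is stored as (observation, action) pairs and left-padded at the start of an episode. The `ContextBuffer` class and its padding behavior are illustrative assumptions, not the authors' code; the default K = 5 is the value from the hyperparameter table.

```python
from collections import deque

class ContextBuffer:
    """Keeps the most recent K (observation, action) pairs for a sequence policy."""

    def __init__(self, context_length=5):
        self.k = context_length
        # deque with maxlen drops the oldest pair automatically once K is exceeded
        self.pairs = deque(maxlen=context_length)

    def push(self, observation, action):
        """Record the observation seen and the action taken at one timestep."""
        self.pairs.append((observation, action))

    def window(self, pad=None):
        """Return exactly K entries, left-padded with `pad` early in an episode."""
        missing = self.k - len(self.pairs)
        return [pad] * missing + list(self.pairs)

    def reset(self):
        """Clear the history at the start of a new episode."""
        self.pairs.clear()
```

A fixed-length window keeps the Transformer's input size constant, so the same network can be queried at every timestep of an episode.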
5. Experiment
5.1. Experimental Settings
5.2. Comparison with State of the Art
5.3. Ablation Study
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix B
References
1. Jain, A.; Del Pero, L.; Grimmett, H.; Ondruska, P. Autonomy 2.0: Why is self-driving always 5 years away? arXiv 2021, arXiv:2107.08142.
2. Jiang, B.; Chen, S.; Xu, Q.; Liao, B.; Chen, J.; Zhou, H.; Zhang, Q.; Liu, W.; Huang, C.; Wang, X. Vad: Vectorized scene representation for efficient autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 8340–8350.
3. Chen, S.; Jiang, B.; Gao, H.; Liao, B.; Xu, Q.; Zhang, Q.; Huang, C.; Liu, W.; Wang, X. Vadv2: End-to-end vectorized autonomous driving via probabilistic planning. arXiv 2024, arXiv:2402.13243.
4. Hu, Y.; Yang, J.; Chen, L.; Li, K.; Sima, C.; Zhu, X.; Chai, S.; Du, S.; Lin, T.; Wang, W.; et al. Planning-oriented autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 17853–17862.
5. Fu, Y.; Li, C.; Yu, F.R.; Luan, T.H.; Zhang, Y. A decision-making strategy for vehicle autonomous braking in emergency via deep reinforcement learning. IEEE Trans. Veh. Technol. 2020, 69, 5876–5888.
6. Hoel, C.J.; Wolff, K.; Laine, L. Automated speed and lane change decision making using deep reinforcement learning. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2148–2155.
7. Tang, X.; Huang, B.; Liu, T.; Lin, X. Highway decision-making and motion planning for autonomous driving via soft actor-critic. IEEE Trans. Veh. Technol. 2022, 71, 4706–4717.
8. Ozcelik, M.B.; Agin, B.; Caldiran, O.; Sirin, O. Decision Making for Autonomous Driving in a Virtual Highway Environment based on Generative Adversarial Imitation Learning. In Proceedings of the 2023 Innovations in Intelligent Systems and Applications Conference (ASYU), Sivas, Turkey, 11–13 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
9. Tian, H.; Wei, C.; Jiang, C.; Li, Z.; Hu, J. Personalized lane change planning and control by imitation learning from drivers. IEEE Trans. Ind. Electron. 2022, 70, 3995–4006.
10. Kamran, D.; Ren, Y.; Lauer, M. High-level decisions from a safe maneuver catalog with reinforcement learning for safe and cooperative automated merging. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 804–811.
11. Valiente, R.; Razzaghpour, M.; Toghi, B.; Shah, G.; Fallah, Y.P. Prediction-Aware and Reinforcement Learning-Based Altruistic Cooperative Driving. IEEE Trans. Intell. Transp. Syst. 2023, 25, 2450–2465.
12. Zhang, J.; Chang, C.; Zeng, X.; Li, L. Multi-agent DRL-based lane change with right-of-way collaboration awareness. IEEE Trans. Intell. Transp. Syst. 2022, 24, 854–869.
13. Nilsson, J.; Brännström, M.; Coelingh, E.; Fredriksson, J. Lane change maneuvers for automated vehicles. IEEE Trans. Intell. Transp. Syst. 2016, 18, 1087–1096.
14. Wang, X.; Qi, X.; Wang, P.; Yang, J. Decision making framework for autonomous vehicles driving behavior in complex scenarios via hierarchical state machine. Auton. Intell. Syst. 2021, 1, 10.
15. Noh, S. Decision-making framework for autonomous driving at road intersections: Safeguarding against collision, overly conservative behavior, and violation vehicles. IEEE Trans. Ind. Electron. 2018, 66, 3275–3286.
16. Du, Y.; Wang, Y.; Chan, C.Y. Autonomous lane-change controller. In Proceedings of the 2015 IEEE Intelligent Vehicles Symposium (IV), Seoul, Republic of Korea, 28 June–1 July 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 386–393.
17. Karlsson, J.; Murgovski, N.; Sjöberg, J. Optimal trajectory planning and decision making in lane change maneuvers near a highway exit. In Proceedings of the 2019 18th European Control Conference (ECC), Naples, Italy, 25–28 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3254–3260.
18. Xu, D.; Ding, Z.; He, X.; Zhao, H.; Moze, M.; Aioun, F.; Guillemard, F. Learning from naturalistic driving data for human-like autonomous highway driving. IEEE Trans. Intell. Transp. Syst. 2020, 22, 7341–7354.
19. Zhang, Z.; Zhang, L.; Deng, J.; Wang, M.; Wang, Z.; Cao, D. An enabling trajectory planning scheme for lane change collision avoidance on highways. IEEE Trans. Intell. Veh. 2021, 8, 147–158.
20. Nguyen, N.T.; Schilling, L.; Angern, M.S.; Hamann, H.; Ernst, F.; Schildbach, G. B-spline path planner for safe navigation of mobile robots. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 339–345.
21. Xuezhi, C. Automatic vertical parking path planning based on clothoid curve and stanley algorithm. In Proceedings of the 2022 IEEE 5th International Conference on Information Systems and Computer Aided Education (ICISCAE), Dalian, China, 23–25 September 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 761–766.
22. Xiong, R.; Li, L.; Zhang, C.; Ma, K.; Yi, X.; Zeng, H. Path tracking of a four-wheel independently driven skid steer robotic vehicle through a cascaded NTSM-PID control method. IEEE Trans. Instrum. Meas. 2022, 71, 1–11.
23. Li, J.; Wang, J.; Peng, H.; Hu, Y.; Su, H. Fuzzy-torque approximation-enhanced sliding mode control for lateral stability of mobile robot. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 2491–2500.
24. Huang, Y.; Ding, H.; Zhang, Y.; Wang, H.; Cao, D.; Xu, N.; Hu, C. A motion planning and tracking framework for autonomous vehicles based on artificial potential field elaborated resistance network approach. IEEE Trans. Ind. Electron. 2019, 67, 1376–1386.
25. Zhang, Z.; Zhang, L.; Wang, C.; Wang, M.; Cao, D.; Wang, Z. Integrated decision making and motion control for autonomous emergency avoidance based on driving primitives transition. IEEE Trans. Veh. Technol. 2022, 72, 4207–4221.
26. Nair, S.H.; Govindarajan, V.; Lin, T.; Meissen, C.; Tseng, H.E.; Borrelli, F. Stochastic mpc with multi-modal predictions for traffic intersections. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 635–640.
27. Yuan, K.; Shu, H.; Huang, Y.; Zhang, Y.; Khajepour, A.; Zhang, L. Mixed local motion planning and tracking control framework for autonomous vehicles based on model predictive control. IET Intell. Transp. Syst. 2019, 13, 950–959.
28. Zhou, Z.; Yang, Z.; Zhang, Y.; Huang, Y.; Chen, H.; Yu, Z. A comprehensive study of speed prediction in transportation system: From vehicle to traffic. Iscience 2022, 25, 103909.
29. Gao, H.; Hu, C.; Xie, G.; Han, C. Discretionary cut-in driving behavior risk assessment based on naturalistic driving data. IEEE Intell. Transp. Syst. Mag. 2021, 14, 29–40.
30. Wang, Z.; Gao, P.; He, Z.; Zhao, L. A CGAN-based Model for Human-like Driving Decision Making. In Proceedings of the 2021 IEEE Wireless Communications and Networking Conference (WCNC), Nanjing, China, 29 March–1 April 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
31. Le Mero, L.; Yi, D.; Dianati, M.; Mouzakitis, A. A survey on imitation learning techniques for end-to-end autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 14128–14147.
32. Codevilla, F.; Müller, M.; López, A.; Koltun, V.; Dosovitskiy, A. End-to-end driving via conditional imitation learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4693–4700.
33. Codevilla, F.; Santana, E.; López, A.M.; Gaidon, A. Exploring the limitations of behavior cloning for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9329–9338.
34. Xiao, Y.; Codevilla, F.; Gurram, A.; Urfalioglu, O.; López, A.M. Multimodal end-to-end autonomous driving. IEEE Trans. Intell. Transp. Syst. 2020, 23, 537–547.
35. Chen, D.; Zhou, B.; Koltun, V.; Krähenbühl, P. Learning by cheating. In Proceedings of the Conference on Robot Learning (PMLR), Virtual, 16–18 November 2020; pp. 66–75.
36. Li, G.; Ji, Z.; Li, S.; Luo, X.; Qu, X. Driver behavioral cloning for route following in autonomous vehicles using task knowledge distillation. IEEE Trans. Intell. Veh. 2022, 8, 1025–1033.
37. Teng, S.; Chen, L.; Ai, Y.; Zhou, Y.; Xuanyuan, Z.; Hu, X. Hierarchical interpretable imitation learning for end-to-end autonomous driving. IEEE Trans. Intell. Veh. 2022, 8, 673–683.
38. Wang, L.; Fernandez, C.; Stiller, C. High-level decision making for automated highway driving via behavior cloning. IEEE Trans. Intell. Veh. 2022, 8, 923–935.
39. Menner, M.; Berntorp, K.; Zeilinger, M.N.; Di Cairano, S. Inverse learning for data-driven calibration of model-based statistical path planning. IEEE Trans. Intell. Veh. 2020, 6, 131–145.
40. He, X.; Yang, H.; Hu, Z.; Lv, C. Robust lane change decision making for autonomous vehicles: An observation adversarial reinforcement learning approach. IEEE Trans. Intell. Veh. 2022, 8, 184–193.
41. Cai, P.; Mei, X.; Tai, L.; Sun, Y.; Liu, M. High-speed autonomous drifting with deep reinforcement learning. IEEE Robot. Autom. Lett. 2020, 5, 1247–1254.
42. Wu, J.; Huang, Z.; Lv, C. Uncertainty-aware model-based reinforcement learning: Methodology and application in autonomous driving. IEEE Trans. Intell. Veh. 2022, 8, 194–203.
43. Li, G.; Qiu, Y.; Yang, Y.; Li, Z.; Li, S.; Chu, W.; Green, P.; Li, S.E. Lane change strategies for autonomous vehicles: A deep reinforcement learning approach based on transformer. IEEE Trans. Intell. Veh. 2022, 8, 2197–2211.
44. Chen, D.; Koltun, V.; Krähenbühl, P. Learning to drive from a world on rails. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 15590–15599.
45. Zhao, Y.; Wu, K.; Xu, Z.; Che, Z.; Lu, Q.; Tang, J.; Liu, C.H. Cadre: A cascade deep reinforcement learning framework for vision-based autonomous urban driving. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 3481–3489.
46. Chekroun, R.; Toromanoff, M.; Hornauer, S.; Moutarde, F. Gri: General reinforced imitation and its application to vision-based autonomous driving. Robotics 2023, 12, 127.
47. Coelho, D.; Oliveira, M.; Santos, V. RLfOLD: Reinforcement Learning from Online Demonstrations in Urban Autonomous Driving. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 11660–11668.
48. Vaswani, A. Attention is all you need. arXiv 2017, arXiv:1706.03762.
49. Chitta, K.; Prakash, A.; Jaeger, B.; Yu, Z.; Renz, K.; Geiger, A. Transfuser: Imitation with transformer-based sensor fusion for autonomous driving. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 12878–12895.
50. Shao, H.; Wang, L.; Chen, R.; Li, H.; Liu, Y. Safety-enhanced autonomous driving using interpretable sensor fusion transformer. In Proceedings of the Conference on Robot Learning (PMLR), Atlanta, GA, USA, 6–9 November 2023; pp. 726–737.
51. Shao, H.; Wang, L.; Chen, R.; Waslander, S.L.; Li, H.; Liu, Y. Reasonnet: End-to-end driving with temporal and global reasoning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 13723–13733.
52. Wang, P.; Zhu, M.; Lu, H.; Zhong, H.; Chen, X.; Shen, S.; Wang, X.; Wang, Y. Bevgpt: Generative pre-trained large model for autonomous driving prediction, decision-making, and planning. arXiv 2023, arXiv:2310.10357.
53. Xu, Z.; Zhang, Y.; Xie, E.; Zhao, Z.; Guo, Y.; Wong, K.Y.K.; Li, Z.; Zhao, H. Drivegpt4: Interpretable end-to-end autonomous driving via large language model. IEEE Robot. Autom. Lett. 2024, 9, 8186–8193.
54. Toromanoff, M.; Wirbel, E.; Moutarde, F. End-to-end model-free reinforcement learning for urban driving using implicit affordances. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 7153–7162.
55. Wu, P.; Jia, X.; Chen, L.; Yan, J.; Li, H.; Qiao, Y. Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline. Adv. Neural Inf. Process. Syst. 2022, 35, 6119–6132.
56. Zhang, Z.; Liniger, A.; Dai, D.; Yu, F.; Van Gool, L. End-to-end urban driving by imitating a reinforcement learning coach. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 15222–15232.
57. Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An open urban driving simulator. In Proceedings of the Conference on Robot Learning (PMLR), Mountain View, CA, USA, 13–15 November 2017; pp. 1–16.
58. Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable Reinforcement Learning Implementations. J. Mach. Learn. Res. 2021, 22, 1–8.
Parameter | Value |
---|---|
Imitation Learning | |
Optimizer | Adam |
Learning Rate | $5 \times 10^{-4}$ |
Batch Size | 256 |
Iterations | 50 |
Teacher Policy Hidden Size | [512, 512, 256, 128] |
Reinforcement Learning | |
Learning Rate | $1 \times 10^{-5}$ |
Minibatch Size | 256 |
Iterations | 2000 |
Epochs per Iteration | 20 |
Time Steps per Iteration | 12,288 |
Context Length K | 5 |
Discount Factor | 0.99 |
MLP Ratio | 2.0 |
Number of Attention Heads | 4 |
Number of Blocks | 4 |
Embedding Dimension | 192 |
GAE Coefficient | 0.9 |
Clip Range | 0.2 |
Entropy Coefficient | 0.01 |
Exploration Coefficient | 0.05 |
Distillation Coefficient | 0.1 |
Target KL | 0.01 |
Max Gradient Norm | 0.5 |
Reward Coefficient | 5, 0.5, 1, 5, 0.1 |
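For readers who want to reuse these settings, the table can be transcribed into a typed configuration, as in the sketch below. The class and field names are hypothetical; the values are copied from the table, reading the two learning rates as 5e-4 and 1e-5. The derived quantities assume that every collected time step is used once per epoch and that the embedding is split evenly across attention heads, which is conventional but not stated in the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ILConfig:
    optimizer: str = "Adam"
    learning_rate: float = 5e-4
    batch_size: int = 256
    iterations: int = 50
    teacher_hidden_sizes: tuple = (512, 512, 256, 128)

@dataclass(frozen=True)
class RLConfig:
    learning_rate: float = 1e-5
    minibatch_size: int = 256
    iterations: int = 2000
    epochs_per_iteration: int = 20
    timesteps_per_iteration: int = 12_288
    context_length: int = 5
    discount_factor: float = 0.99
    mlp_ratio: float = 2.0
    num_attention_heads: int = 4
    num_blocks: int = 4
    embedding_dim: int = 192
    gae_coefficient: float = 0.9
    clip_range: float = 0.2
    entropy_coefficient: float = 0.01
    exploration_coefficient: float = 0.05
    distillation_coefficient: float = 0.1
    target_kl: float = 0.01
    max_grad_norm: float = 0.5
    reward_coefficients: tuple = (5, 0.5, 1, 5, 0.1)

il = ILConfig()
rl = RLConfig()

# Derived quantities (assumptions noted in the text above).
minibatches_per_epoch = rl.timesteps_per_iteration // rl.minibatch_size
max_updates_per_iteration = minibatches_per_epoch * rl.epochs_per_iteration
total_env_steps = rl.iterations * rl.timesteps_per_iteration
head_dim = rl.embedding_dim // rl.num_attention_heads
mlp_hidden_dim = int(rl.embedding_dim * rl.mlp_ratio)
```

Under these assumptions each iteration yields 48 minibatches per epoch and at most 960 gradient updates; the count is an upper bound because a target KL threshold is typically used to stop an iteration's updates early.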
Success Rate ↑ (%)
Method | Source | Training: Empty | Training: Regular | Training: Dense | Testing: Empty | Testing: Regular | Testing: Dense |
---|---|---|---|---|---|---|---|
CILRS [33] | ICCV 19 | 97 | 83 | 42 | 66 | 56 | 24 |
LBC [35] | CoRL 20 | 89 | 87 | 75 | 36 | 36 | 12 |
WOR [44] | ICCV 21 | 98 | 100 | 96 | 78 | 82 | 66 |
CADRE [45] | AAAI 22 | 95 | 92 | 82 | 78 | 72 | 52 |
GRIAD [46] | Robotics 23 | 98 | 98 | 94 | 69 | 63 | 52 |
RLfOLD [47] | AAAI 24 | 100 | 94 | 90 | 100 | 86 | 66 |
KD-BeT | Ours | 100 | 96 | 93 | 100 | 94 | 85 |
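The generalization gap implied by the success rates above can be checked with a few lines of Python. The numbers are copied from the table; the summary statistics (an unweighted mean over the three traffic densities, and the difference between training and testing means) are a choice made for this illustration and are not metrics reported by the authors.

```python
# Success rates (%) as (empty, regular, dense), copied from the table above.
results = {
    "CILRS":  {"train": (97, 83, 42),  "test": (66, 56, 24)},
    "LBC":    {"train": (89, 87, 75),  "test": (36, 36, 12)},
    "WOR":    {"train": (98, 100, 96), "test": (78, 82, 66)},
    "CADRE":  {"train": (95, 92, 82),  "test": (78, 72, 52)},
    "GRIAD":  {"train": (98, 98, 94),  "test": (69, 63, 52)},
    "RLfOLD": {"train": (100, 94, 90), "test": (100, 86, 66)},
    "KD-BeT": {"train": (100, 96, 93), "test": (100, 94, 85)},
}

def mean(values):
    return sum(values) / len(values)

# Unweighted mean success rate on the testing scenarios.
test_mean = {name: mean(r["test"]) for name, r in results.items()}

# Training-minus-testing mean: a smaller gap suggests better generalization.
gap = {name: mean(r["train"]) - mean(r["test"]) for name, r in results.items()}

best_on_test = max(test_mean, key=test_mean.get)
smallest_gap = min(gap, key=gap.get)
dense_margin = results["KD-BeT"]["test"][2] - results["RLfOLD"]["test"][2]
```

On these figures KD-BeT has the highest mean testing success rate (93.0 versus 84.0 for RLfOLD) and the smallest training-to-testing gap, with the largest margin in dense traffic (19 percentage points over RLfOLD). On the training scenarios, WOR is higher in the regular and dense settings.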
Method | Success Rate (↑) | Driving Score (↑) | Route Compl. (↑) | Infrac. Score (↑) | Collision Others (↓) | Collision Vehicle (↓) | Red Light Infraction (↓) | Vehicle Blocked (↓) |
---|---|---|---|---|---|---|---|---|
KD-BeT w/o | ||||||||
KD-BeT w/o | ||||||||
KD-BeT w/o | ||||||||
KD-BeT |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, R.; Fan, Y.; Li, Y.; Zhang, D.; Gao, F.; Gao, Z.; Yang, Z. Knowledge Distillation-Enhanced Behavior Transformer for Decision-Making of Autonomous Driving. Sensors 2025, 25, 191. https://doi.org/10.3390/s25010191