A Rule-Guided Distributional Soft Actor–Critic Algorithm for Safe Lane-Changing in Complex Driving Scenarios
Abstract
1. Introduction
1.1. Motivation
1.2. Literature Review
1.3. Contribution
1.4. Paper Organization
2. Problem Formulation and Background
3. Proposed Approach
3.1. Framework Overview
3.2. Rule-Guided Controller for Safe and Efficient Lane Changing
3.3. Curriculum-Aware Replay Sampling with Decaying Rule Ratio
3.4. Safety Shield for Policy Enforcement
4. Training Details
4.1. State and Action
4.2. Reward Function
5. Simulation
5.1. Traffic Scenario and Vehicle Modeling
5.2. Parameter Settings
5.3. Simulation Cases
5.3.1. Case 1: Regular Lane Change (RLC)
5.3.2. Case 2: Lane Merging Scenario (LMS)
6. Results and Discussion
6.1. Quantitative Analysis




6.2. Qualitative Analysis
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| SAC | Soft Actor–Critic |
| DSAC-T | Distributional Soft Actor–Critic with Three Refinements |
| LMS | Lane Merging Scenario |
| RLC | Regular Lane Change |
References
- Johnson, C.B.J. Car Accident Statistics for 2025. 2024. Available online: https://vegasvalleylaw.com/blog/car-crash-statistics/ (accessed on 15 January 2026).
- Zhao, N.; Zhang, J.; Wang, B.; Lu, Y.; Zhang, K.; Su, R. A Data-Driven Long-Term Prediction Method of Mandatory and Discretionary Lane Change Based on Transformer. In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC); IEEE: New York, NY, USA, 2023; pp. 2390–2395.
- Huang, Y.; Gu, Y.; Yuan, K.; Yang, S.; Liu, T.; Chen, H. Human Knowledge Enhanced Reinforcement Learning for Mandatory Lane-Change of Autonomous Vehicles in Congested Traffic. IEEE Trans. Intell. Veh. 2024, 9, 3509–3519.
- Guo, J.; Harmati, I. Lane-changing decision modelling in congested traffic with a game theory-based decomposition algorithm. Eng. Appl. Artif. Intell. 2022, 107, 104530.
- Chakraborty, S.; Cui, L.; Ozbay, K.; Jiang, Z.-P. Automated lane changing control in mixed traffic: An adaptive dynamic programming approach. Transp. Res. Part B Methodol. 2024, 187, 103026.
- He, S.; Zeng, J.; Zhang, B.; Sreenath, K. Rule-Based Safety-Critical Control Design Using Control Barrier Functions with Application to Autonomous Lane Change. In 2021 American Control Conference (ACC); IEEE: New York, NY, USA, 2021.
- Cao, W.; Zhao, H. Lane change algorithm using rule-based control method based on look-ahead concept for the scenario when emergency vehicle approaching. Artif. Life Robot. 2022, 27, 818–827.
- Asano, S.; Ishihara, S. Safe, Smooth, and Fair Rule-Based Cooperative Lane Change Control for Sudden Obstacle Avoidance on a Multi-Lane Road. Appl. Sci. 2022, 12, 8528.
- Qin, Z.; Ji, A.; Sun, Z.; Wu, G.; Hao, P.; Liao, X. Game Theoretic Application to Intersection Management: A Literature Review. IEEE Trans. Intell. Veh. 2024, 1–19.
- Elvik, R. A review of game-theoretic models of road user behaviour. Accid. Anal. Prev. 2014, 62, 388–396.
- Ji, A.; Levinson, D. A review of game theory models of lane changing. Transp. A Transp. Sci. 2020, 16, 1628–1647.
- Ali, Y.; Zheng, Z.; Haque, M.M.; Wang, M. A game theory-based approach for modelling mandatory lane-changing behaviour in a connected environment. Transp. Res. Part C Emerg. Technol. 2019, 106, 220–242.
- Lopez, V.G.; Lewis, F.L.; Liu, M.; Wan, Y.; Nageshrao, S.; Filev, D. Game-Theoretic Lane-Changing Decision Making and Payoff Learning for Autonomous Vehicles. IEEE Trans. Veh. Technol. 2022, 71, 3609–3620.
- Zare, M.; Kebria, P.M.; Khosravi, A.; Nahavandi, S. A Survey of Imitation Learning: Algorithms, Recent Developments, and Challenges. IEEE Trans. Cybern. 2024, 54, 7173–7186.
- Le Mero, L.; Yi, D.; Dianati, M.; Mouzakitis, A. A Survey on Imitation Learning Techniques for End-to-End Autonomous Vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 14128–14147.
- Guo, L.; Liu, X. Lane-Changing Decision Making for Autonomous Vehicles via Behavior Cloning and Decision Tree. In 2023 China Automation Congress (CAC); IEEE: New York, NY, USA, 2023; pp. 8648–8652.
- Xiao, D.; Wang, B.; Sun, Z.; He, X. Behavioral Cloning Based Model Generation Method for Reinforcement Learning. In 2023 China Automation Congress (CAC); IEEE: New York, NY, USA, 2023; pp. 6776–6781.
- Zhao, R.; Li, Y.; Fan, Y.; Gao, F.; Tsukada, M.; Gao, Z. A Survey on Recent Advancements in Autonomous Driving Using Deep Reinforcement Learning: Applications, Challenges, and Solutions. IEEE Trans. Intell. Transp. Syst. 2024, 25, 19365–19398.
- Ladosz, P.; Weng, L.; Kim, M.; Oh, H. Exploration in deep reinforcement learning: A survey. Inf. Fusion 2022, 85, 1–22.
- Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep Reinforcement Learning: A Brief Survey. IEEE Signal Process. Mag. 2017, 34, 26–38.
- Balhara, S.; Gupta, N.; Alkhayyat, A.; Bharti, I.; Malik, R.Q.; Mahmood, S.N.; Abedi, F. A survey on deep reinforcement learning architectures, applications and emerging trends. IET Commun. 2022, 19, e12447.
- Zhao, L.; Farhi, N.; Christoforou, Z.; Haddadou, N. Imitation of Real Lane-Change Decisions Using Reinforcement Learning. IFAC-PapersOnLine 2021, 54, 203–209.
- Sharma, A.K.; Choudhary, A.; Chaudhary, R.; Bhardwaj, A.; Aslam, A.M. Adaptive Trajectory Planning in Autonomous Vehicles: A Hierarchical Reinforcement Learning Approach with Soft Actor-Critic. In 2024 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS); IEEE: New York, NY, USA, 2024; pp. 1–6.
- Liu, J.; Feng, Y.; Jing, S.; Hui, F. Deep Reinforcement Learning-Based Lane-Changing Trajectory Planning for Connected and Automated Vehicles. In 2023 9th International Conference on Mechanical and Electronics Engineering (ICMEE); IEEE: New York, NY, USA, 2023; pp. 406–412.
- Katzilieris, K.; Kampitakis, E.; Vlahogianni, E.I. Dynamic Lane Reversal: A reinforcement learning approach. In 2023 8th International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS); IEEE: New York, NY, USA, 2023; pp. 1–6.
- Liu, Z. Learning Personalized Discretionary Lane-Change Initiation for Fully Autonomous Driving Based on Reinforcement Learning. In 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC); IEEE: Toronto, ON, Canada, 2020.
- Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A.; Abbeel, P.; et al. Soft Actor-Critic Algorithms and Applications. arXiv 2018, arXiv:1812.05905.
- Lim, D.; Joe, I. A DRL-Based Task Offloading Scheme for Server Decision-Making in Multi-Access Edge Computing. Electronics 2023, 12, 3882.
- Duan, J.; Wang, W.; Xiao, L.; Gao, J.; Li, S.E.; Liu, C. Distributional Soft Actor-Critic With Three Refinements. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 3935–3946.





| Description | Triggering Condition | Value | Design Purpose |
|---|---|---|---|
| Forward Driving | | +2.0 | Encourages an appropriate driving speed |
| Collision Penalty | A collision is detected | −10.0 | Strongly penalizes collisions |
| Off-Road Penalty | The vehicle moves outside the drivable area | −10.0 | Keeps the vehicle on the road |
| Illegal Lane Penalty | The current lane ID is −5, −6, or −7 | −10.0 | Prevents illegal or oncoming-lane driving |
| Initiate Lane Change | In lane ID = −3 with 0 < steer < 0.7 and v > 0.1 | +2.0 | Encourages initiating a lane change out of lane −3 |
| Static Steering Penalty | Vehicle speed < 0.05 with non-zero steering | −2.0 | Discourages turning the wheel while stationary |
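As a minimal sketch of how the shaping terms in the table above could be combined into a single per-step reward, the snippet below mirrors the table's conditions and values. The `State` fields and the forward-driving condition (which the table leaves unspecified) are assumptions for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Assumed per-step observation fields; names are illustrative only."""
    collided: bool
    off_road: bool
    lane_id: int
    speed: float   # m/s
    steer: float   # normalized steering command


def step_reward(s: State) -> float:
    """Sum the shaping terms from the reward table."""
    r = 0.0
    if s.collided:
        r -= 10.0  # collision penalty
    if s.off_road:
        r -= 10.0  # off-road penalty
    if s.lane_id in (-5, -6, -7):
        r -= 10.0  # illegal / oncoming-lane penalty
    if s.lane_id == -3 and 0.0 < s.steer < 0.7 and s.speed > 0.1:
        r += 2.0   # reward for initiating a lane change out of lane -3
    if s.speed < 0.05 and s.steer != 0.0:
        r -= 2.0   # static steering penalty
    if s.speed > 0.1 and not s.collided and not s.off_road:
        r += 2.0   # forward-driving reward (triggering condition assumed)
    return r
```

Keeping the penalties an order of magnitude larger than the shaping rewards, as in the table, makes safety violations dominate any short-term gain from aggressive maneuvers.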
| Parameter | Value | Description |
|---|---|---|
| Time step | | Fixed simulation update interval |
| Max steps per episode | 1000 | Maximum duration of each episode |
| Traffic vehicle distribution | 7–13 m | Vehicles positioned on both the ego and target lanes |
| Max ego vehicle spawn attempts | 100 | Avoids repeated failures during ego spawning |
| Replay buffer size | 1,000,000 | Maximum number of stored transitions |
| Batch size | 512 | Mini-batch size for gradient updates |
| Discount factor | 0.99 | Temporal reward decay |
| Target smoothing coefficient | | For updating target networks |
| Policy learning rate | | Learning rate of the actor network |
| Q-network learning rate | | Learning rate of the critic networks |
| Entropy coefficient | 0.2 | Controls the entropy regularization strength |
| Policy update delay | 2 steps | The actor is updated every 2 critic updates |
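Two mechanics implied by the table above can be sketched briefly: Polyak soft updates of the target networks using the smoothing coefficient tau, and the policy update delay of 2 (the actor trained every second critic update). The table leaves tau's value unspecified, so the 0.005 below is an assumed placeholder (a common SAC default), and the scalar "networks" stand in for real parameter tensors.

```python
def soft_update(target: list, source: list, tau: float) -> None:
    """Polyak averaging: target <- tau * source + (1 - tau) * target."""
    for i in range(len(target)):
        target[i] = tau * source[i] + (1.0 - tau) * target[i]


tau = 0.005          # assumed value; not given in the parameter table
policy_delay = 2     # from the table: actor updated every 2 critic updates

target_q = [0.0, 0.0]   # stand-ins for target critic parameters
online_q = [1.0, 1.0]   # stand-ins for online critic parameters

actor_updates = 0
for step in range(1, 5):            # four critic updates
    soft_update(target_q, online_q, tau)
    if step % policy_delay == 0:
        actor_updates += 1          # actor would be trained here
```

With a small tau, the target networks trail the online critics by an exponentially weighted average, which stabilizes the bootstrapped Q-targets; the update delay further decouples actor and critic learning rates.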
| Method | Regular Lane Change (RLC) | Lane Merging Scenario (LMS) |
|---|---|---|
| SAC | 78.58% | 75.35% |
| DSAC-T | 83.56% | 74.26% |
| Rule-Guidance + DSAC-T | 86.11% | 77.36% |
| Rule-Guidance + DSAC-T + Safe Aware | 88.45% | 79.45% |
| Rate | Rule-Guidance + DSAC-T + Safe Aware | SAC | TD3 |
|---|---|---|---|
| 0.2 | 88.21% | 84.31% | 72.65% |
| 0.4 | 71.54% | 68.99% | 65.36% |
| 0.7 | 27.45% | 10.32% | 16.96% |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Cui, S.; Li, H.; Su, Y.; Huang, J.; Cheng, K.; Li, H. A Rule-Guided Distributional Soft Actor–Critic Algorithm for Safe Lane-Changing in Complex Driving Scenarios. Vehicles 2026, 8, 58. https://doi.org/10.3390/vehicles8030058

