SBF-DRL: A Multi-Vehicle Safety Enhancement Framework Based on Deep Reinforcement Learning with Integrated Safety Barrier Function
Abstract
1. Introduction
- We bypass formal verification of the AC (a black-box learned controller) by introducing runtime safety assurance as the mechanism that ensures safety.
- We design the MC via an MDP formulation, combining safety constraints with DRL policy optimization to achieve a more effective trade-off between safety and efficiency.
- We test and evaluate SBF-DRL on a highway driving task under two environment configurations: a single autonomous vehicle and multiple autonomous vehicles.
2. Related Work
3. SBF-DRL Framework System Design
4. Meta-Controller
4.1. Markov Decision Process Modeling
- The state space is a vector of environment and vehicle states. The MDP state space is the concatenation of the state features of the observed vehicles, i.e., the autonomous vehicles and their nearby neighbors. Each observed vehicle's state is modeled as a 5-tuple of features $[\text{presence}, x, y, v_x, v_y]$, where presence is a 0–1 variable indicating whether the vehicle is included in the feature list (the ego vehicle is always included; a surrounding vehicle is included if it falls within a user-defined perception distance, subject to a cap $N$ on the total number of observed vehicles); $x$ and $y$ are its coordinates; and $v_x$ and $v_y$ are its speeds in the $x$ and $y$ directions, respectively. We set $N = 5$ in our experiments, the default value in highway-env, so the initial state input of the AC for each autonomous vehicle has $5 \times 5 = 25$ dimensions, encoding the overall system state of the five observed vehicles during training and validation (a minimal encoding sketch follows this list).
- The action space determines whether the SC or the AC holds control, via FSC and RSC. FSC corresponds to “switching to SC control”: when the MC detects that the current state violates the safety constraints, it transfers control from the AC to the SC. RSC corresponds to “switching back to AC control”: when the MC verifies that the current state satisfies the safe-recovery conditions, it transfers control from the SC back to the AC.
- $P(s' \mid s, a)$ is the transition probability function: the probability of reaching the next state $s'$ after taking action $a$ in state $s$.
- $\gamma \in [0, 1]$ denotes the discount factor.
- $R(s, a, s')$ is the reward received for a state transition; the reward function guides the agent toward the desired policy behavior.
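To make the MDP concrete, the sketch below shows one plausible Python encoding of the 25-dimensional state vector and the FSC/RSC switching rule. The `VehicleState` container, the `perception_dist` default, and the `mc_switch` helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class VehicleState:
    x: float
    y: float
    vx: float
    vy: float

N_VEHICLES = 5            # observed vehicles in total (highway-env default)
FEATURES = 5              # [presence, x, y, vx, vy]

def encode_state(ego, neighbors, perception_dist=100.0):
    """Build the 5 x 5 = 25-dimensional MDP state vector."""
    rows = [[1.0, ego.x, ego.y, ego.vx, ego.vy]]      # ego is always present
    for v in neighbors:
        if len(rows) == N_VEHICLES:
            break                                      # cap on observed vehicles
        if abs(v.x - ego.x) <= perception_dist:
            rows.append([1.0, v.x, v.y, v.vx, v.vy])
    while len(rows) < N_VEHICLES:
        rows.append([0.0] * FEATURES)                  # absent slot: presence = 0
    return np.asarray(rows, dtype=np.float32).flatten()

FSC, RSC = 0, 1            # the MC's two switching actions

def mc_switch(ac_in_control, state_is_safe, recovery_ok):
    """FSC hands control AC -> SC on a safety violation;
    RSC hands control SC -> AC once the recovery conditions hold."""
    if ac_in_control and not state_is_safe:
        return FSC
    if not ac_in_control and recovery_ok:
        return RSC
    return None            # no switch this step
```

Padding absent slots with zeros keeps the input dimension fixed at 25 regardless of how many neighbors are actually perceived.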
4.2. Iterative Generation of Safe Deep Reinforcement Learning Controllers Based on Barrier Functions
- (1) Learner component
- (2) Validator component (an iteration sketch follows this list)
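The learner-validator pairing suggests a counterexample-guided iteration: the learner proposes a candidate barrier function, the validator searches for states that violate the barrier conditions, and any counterexamples are fed back until the candidate is certified. The sketch below shows that generic loop under assumed interfaces (`learner.fit`, `validator.find_violation`); the paper's actual components may differ.

```python
def iterate_barrier_synthesis(learner, validator, max_rounds=50):
    """Counterexample-guided loop for generating a certified barrier function."""
    counterexamples = []
    candidate = learner.fit(counterexamples)
    for _ in range(max_rounds):
        ce = validator.find_violation(candidate)
        if ce is None:
            return candidate              # no violation found: certified
        counterexamples.append(ce)        # refine with the new counterexample
        candidate = learner.fit(counterexamples)
    raise RuntimeError("no certified barrier function within budget")
```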
```text
Algorithm 1: Barrier-Function-Guided DDQN for Training the MC Agent
 1: Initialize Q-network parameters θ and target-network parameters θ⁻; initialize an empty experience pool D;
 2: for episode = 1, 2, …, M do
 3:   for t = 0, 1, …, T do
 4:     Observe the current state s_t and compute the barrier function value in that state;
 5:     Select an action a_t from the action set via the ε-greedy policy;
 6:     if a_t meets the BF safety requirements then
 7:       Execute a_t, obtaining reward r_t and next state s_{t+1};
 8:       Add (s_t, a_t, r_t, s_{t+1}) to D;
 9:     else
10:       Execute the safe substitute action a'_t instead, obtaining r_t and s_{t+1};
11:       Add (s_t, a'_t, r_t, s_{t+1}) to D;
12:     end
13:     Randomly sample a mini-batch of experience from D;
14:     Update θ and θ⁻ via Equation (4);
15:   end
16: end
```
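For concreteness, below is a minimal Python sketch of this training loop. It assumes a gymnasium-style `env` with flattened 25-dimensional observations, a `barrier_fn(state, action)` safety predicate, and a `safe_action(state)` fallback; none of these names, the network sizes, or the hyperparameters come from the paper, and the standard Double-DQN target stands in for the paper's Equation (4).

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small Q-network over the 25-dim state and the {FSC, RSC} actions."""
    def __init__(self, obs_dim=25, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))

    def forward(self, x):
        return self.net(x)

def train_mc(env, barrier_fn, safe_action, M=500, T=40, gamma=0.99,
             eps=0.1, batch=64, lr=1e-3, tau=0.005):
    q, q_tgt = QNet(), QNet()
    q_tgt.load_state_dict(q.state_dict())
    opt = torch.optim.Adam(q.parameters(), lr=lr)
    pool = deque(maxlen=100_000)                      # experience pool D

    for episode in range(M):
        s, _ = env.reset()
        for t in range(T):
            # epsilon-greedy selection (Algorithm 1, line 5)
            if random.random() < eps:
                a = env.action_space.sample()
            else:
                a = q(torch.as_tensor(s, dtype=torch.float32)).argmax().item()
            # barrier-function safety filter (Algorithm 1, lines 6-11)
            if not barrier_fn(s, a):
                a = safe_action(s)
            s2, r, done, trunc, _ = env.step(a)
            pool.append((s, a, r, s2, float(done)))
            s = s2

            if len(pool) >= batch:
                bs, ba, br, bs2, bd = (
                    torch.as_tensor(np.array(x), dtype=torch.float32)
                    for x in zip(*random.sample(pool, batch)))
                # Double-DQN target: r + gamma * Q_tgt(s', argmax_a Q(s', a))
                with torch.no_grad():
                    a_star = q(bs2).argmax(1, keepdim=True)
                    y = br + gamma * (1 - bd) * q_tgt(bs2).gather(1, a_star).squeeze(1)
                pred = q(bs).gather(1, ba.long().unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(pred, y)
                opt.zero_grad(); loss.backward(); opt.step()
                # soft update of target parameters
                for p, pt in zip(q.parameters(), q_tgt.parameters()):
                    pt.data.mul_(1 - tau).add_(p.data, alpha=tau)
            if done or trunc:
                break
    return q
```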
5. Experiment and Analysis
5.1. Experimental Setup
- MDDQN: This policy trains the AC with DDQN [29], one of the most widely used model-free DRL algorithms, based on the reward function. DDQN is an off-policy algorithm for discrete action spaces. MC safety constraints are not applied during testing.
- GAT-MDDQN: This policy trains the AC with DDQN combined with a graph-attention (GAT) state-representation network, based on the same reward function. MC safety constraints are not applied during testing (a minimal attention sketch follows this list).
- GAT-MDDQN-MC: This policy uses GAT-MDDQN for experimental testing together with the MC for safety assurance; it is the SBF-DRL policy proposed in this paper. The overall structure is shown in Figure 1.
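As a rough illustration of the GAT state representation used by the two GAT-based policies, the sketch below implements a single-head graph-attention layer over the five observed vehicles. The layer dimensions, the fully connected vehicle graph, and the PyTorch formulation are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VehicleGATLayer(nn.Module):
    """Single-head graph attention over per-vehicle feature rows."""
    def __init__(self, in_dim=5, out_dim=32):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared projection
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scorer

    def forward(self, x):                 # x: [n_vehicles, in_dim]
        h = self.W(x)                     # projected features [n, d]
        n = h.size(0)
        # attention logits for every ordered (i, j) vehicle pair
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))       # [n, n]
        alpha = torch.softmax(e, dim=1)                   # weights over neighbors
        return torch.relu(alpha @ h)                      # aggregated features

obs = torch.randn(5, 5)                   # [presence, x, y, vx, vy] per vehicle
ego_embedding = VehicleGATLayer()(obs)[0] # row 0 = ego vehicle
```

The attention weights let the network emphasize the neighbors most relevant to the ego vehicle's decision, rather than treating all observed vehicles uniformly.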
- Cumul. Reward: Cumulative reward per episode.
- Episode Avg Speed: Average speed (m/s) of the ego vehicle in each episode.
- Episode Length: Number of time steps in each episode.
- Collision Rate: Proportion of episodes that end in a collision.
- Distance Traveled: Total distance (m) the ego vehicle travels in each episode in the Highway environment (a computation sketch for these metrics follows this list).
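The sketch below shows how these five metrics can be computed from per-episode logs, assuming each episode is recorded as a dict with `rewards`, `speeds` (m/s), and a `collided` flag, and a 1 s simulation step; the log format is an assumption, not the paper's tooling.

```python
import numpy as np

def evaluate(episodes, dt=1.0):
    """Aggregate the five evaluation metrics over a list of episode logs."""
    return {
        "cumul_reward": np.mean([sum(ep["rewards"]) for ep in episodes]),
        "collision_rate": np.mean([ep["collided"] for ep in episodes]),
        "episode_length": np.mean([len(ep["rewards"]) for ep in episodes]),
        "avg_speed": np.mean([np.mean(ep["speeds"]) for ep in episodes]),
        "distance": np.mean([sum(ep["speeds"]) * dt for ep in episodes]),
    }
```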
5.2. Experimental Results and Analysis
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Wu, J.; Huang, C.; Huang, H.; Lv, C.; Wang, Y.; Wang, F.-Y. Recent advances in reinforcement learning-based autonomous driving behavior planning: A survey. Transp. Res. Part C Emerg. Technol. 2024, 164, 104654.
- Wang, H.; Shao, W.; Sun, C.; Yang, K.; Cao, D.; Li, J. A Survey on an Emerging Safety Challenge for Autonomous Vehicles: Safety of the Intended Functionality. Engineering 2024, 33, 17–34.
- Grigorescu, S.; Trasnea, B.; Cocias, T.; Macesanu, G. A survey of deep learning techniques for autonomous driving. J. Field Robot. 2020, 37, 362–386.
- El Sallab, A.; Abdou, M.; Perot, E.; Yogamani, S. Deep reinforcement learning framework for autonomous driving. arXiv 2017, arXiv:1704.02532.
- Perot, E.; Jaritz, M.; Toromanoff, M.; De Charette, R. End-to-end driving in a realistic racing game with deep reinforcement learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 3–4.
- Wang, P.; Chan, C.Y.; de La Fortelle, A. A reinforcement learning based approach for automated lane change maneuvers. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 1379–1384.
- Lillicrap, T.P. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Mirchevska, B.; Pek, C.; Werling, M.; Althoff, M.; Boedecker, J. High-level decision making for safe and reasonable autonomous lane changing using reinforcement learning. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 2156–2162.
- Chen, D.; Hajidavalloo, M.R.; Li, Z.; Chen, K.; Wang, Y.; Jiang, L.; Wang, Y. Deep multi-agent reinforcement learning for highway on-ramp merging in mixed traffic. IEEE Trans. Intell. Transp. Syst. 2023, 24, 11623–11638.
- Foerster, J.; Farquhar, G.; Afouras, T.; Nardelli, N.; Whiteson, S. Counterfactual multi-agent policy gradients. Proc. AAAI Conf. Artif. Intell. 2018, 32.
- Desjardins, C.; Chaib-Draa, B. Cooperative adaptive cruise control: A reinforcement learning approach. IEEE Trans. Intell. Transp. Syst. 2011, 12, 1248–1260.
- Chen, S.; Wang, M.; Song, W.; Yang, Y.; Fu, M. Multi-agent reinforcement learning-based decision making for twin-vehicles cooperative driving in stochastic dynamic highway environments. IEEE Trans. Veh. Technol. 2023, 72, 12615–12627.
- Santini, S.; Salvi, A.; Valente, A.S.; Pescape, A.; Segata, M.; Cigno, R.L. Platooning maneuvers in vehicular networks: A distributed and consensus-based approach. IEEE Trans. Intell. Veh. 2018, 4, 59–72.
- Fabiani, F.; Grammatico, S. Multi-vehicle automated driving as a generalized mixed-integer potential game. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1064–1073.
- Ye, F.; Cheng, X.; Wang, P.; Chan, C.Y.; Zhang, J. Automated lane change policy using proximal policy optimization-based deep reinforcement learning. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 19 October–13 November 2020; pp. 1746–1752.
- Hu, W.; Li, X.; Hu, J.; Song, X.; Dong, X.; Kong, D.; Xu, Q.; Ren, C. A rear anti-collision decision-making methodology based on deep reinforcement learning for autonomous commercial vehicles. IEEE Sens. J. 2022, 22, 16370–16380.
- Tang, X.; Huang, B.; Liu, T.; Lin, X. Highway decision-making and motion planning for autonomous driving via soft actor-critic. IEEE Trans. Veh. Technol. 2022, 71, 4706–4717.
- Kamran, D.; Engelgeh, T.; Busch, M.; Fischer, J.; Stiller, C. Minimizing safety interference for safe and comfortable automated driving with distributional reinforcement learning. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 1236–1243.
- Jafari, R.; Ashari, A.E.; Huber, M. CHAMP: Integrated Logic with Reinforcement Learning for Hybrid Decision Making for Autonomous Vehicle Planning. In Proceedings of the 2023 American Control Conference (ACC), San Diego, CA, USA, 31 May–2 June 2023; pp. 3310–3315.
- Peng, Y.F.; Tan, G.Z.; Si, H.W. RTA-IR: A runtime assurance framework for behavior planning based on imitation learning and responsibility-sensitive safety model. Expert Syst. Appl. 2023, 232, 120824.
- Shalev-Shwartz, S.; Shammah, S.; Shashua, A. On a formal model of safe and scalable self-driving cars. arXiv 2017, arXiv:1708.06374.
- Hwang, S.; Lee, K.; Jeon, H.; Kum, D. Autonomous vehicle cut-in algorithm for lane-merging scenarios via policy-based reinforcement learning nested within finite-state machine. IEEE Trans. Intell. Transp. Syst. 2022, 23, 17594–17606.
- Yang, Z.; Pei, X.; Xu, J.; Zhang, X.; Xi, W. Decision-making in autonomous driving by reinforcement learning combined with planning & control. In Proceedings of the 2022 6th CAA International Conference on Vehicular Control and Intelligence (CVCI), Nanjing, China, 28–30 October 2022; pp. 1–6.
- Peng, Y.; Tan, G.; Si, H.; Li, J. DRL-GAT-SA: Deep reinforcement learning for autonomous driving planning based on graph attention networks and simplex architecture. J. Syst. Archit. 2022, 126, 102505.
- Wang, J.; Wu, J.; Li, Y. The driving safety field based on driver–vehicle–road interactions. IEEE Trans. Intell. Transp. Syst. 2015, 16, 2203–2214.
- Bouton, M.; Nakhaei, A.; Isele, D.; Fujimura, K.; Kochenderfer, M.J. Reinforcement learning with iterative reasoning for merging in dense traffic. In Proceedings of the IEEE International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, 20–23 September 2020.
- Lazarus, C.; Lopez, J.G.; Kochenderfer, M.J. Runtime safety assurance using reinforcement learning. In Proceedings of the 2020 IEEE/AIAA 39th Digital Avionics Systems Conference (DASC), San Antonio, TX, USA, 11–15 October 2020; pp. 1–9.
- Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double Q-learning. Proc. AAAI Conf. Artif. Intell. 2016, 30.
- IEEE Vehicular Technology Society. White Paper: Literature Review on Kinematic Properties of Road Users for Use on Safety-Related Models for Automated Driving Systems; IEEE: Piscataway, NJ, USA, 2022; pp. 1–35.
- Leurent, E. An Environment for Autonomous Driving Decision-Making. 2018. Available online: https://github.com/eleurent/highway-env (accessed on 2 December 2025).




| Method | Cumul. Reward | Collision Rate | Episode Length | Episode Avg Speed (m/s) | Distance Traveled (m) |
|---|---|---|---|---|---|
| GAT-MDDQN-SR | 636.81 | 6.7% | 38.48 | 23.62 | 919 |
| GAT-MDDQN-DRL | 654.90 | 5.6% | 38.92 | 23.93 | 932 |
| GAT-MDDQN-SF | 657.92 | 4.4% | 39.14 | 23.89 | 935 |
| GAT-MDDQN-RSS | 663.98 | 2.0% | 39.47 | 23.76 | 938 |
| GAT-MDDQN-MC | 667.91 | 1.8% | 39.56 | 23.80 | 942 |
| Algorithm | Cumul. Reward | Collision Rate | Episode Length | Episode Avg Speed (m/s) | Distance Traveled (m) |
|---|---|---|---|---|---|
| MDDQN | 34.54 | 4.6% | 39.15 | 21.89 | 857 |
| GAT-MDDQN | 35.39 | 5.5% | 39.24 | 23.76 | 932 |
| GAT-MDDQN-MC | 35.46 | 0.1% | 39.95 | 22.37 | 894 |
| MDDQN | 61.94 | 5.7% | 39.03 | 21.84 | 852 |
| GAT-MDDQN | 64.19 | 8.6% | 38.94 | 23.68 | 922 |
| GAT-MDDQN-MC | 64.85 | 0.1% | 39.97 | 22.72 | 908 |
| MDDQN | 147.63 | 21.2% | 36.63 | 22.22 | 814 |
| GAT-MDDQN | 156.69 | 15.5% | 37.85 | 23.72 | 898 |
| GAT-MDDQN-MC | 162.85 | 1.2% | 39.76 | 23.04 | 916 |
| MDDQN | 284.76 | 36.7% | 34.37 | 22.90 | 787 |
| GAT-MDDQN | 312.45 | 15.6% | 38.55 | 23.83 | 919 |
| GAT-MDDQN-MC | 329.42 | 1.1% | 39.74 | 23.37 | 929 |
| MDDQN | 348.32 | 57.0% | 27.77 | 23.24 | 645 |
| GAT-MDDQN | 477.71 | 10.9% | 38.15 | 23.90 | 912 |
| GAT-MDDQN-MC | 498.45 | 1.6% | 39.60 | 23.66 | 937 |
| MDDQN | 447.75 | 62.6% | 26.96 | 23.04 | 621 |
| GAT-MDDQN | 654.92 | 5.6% | 38.87 | 23.93 | 930 |
| GAT-MDDQN-MC | 667.91 | 1.8% | 39.56 | 23.80 | 942 |
| Algorithm | Cumul. Reward | Collision Rate | Episode Length | Episode Avg Speed (m/s) | Distance Traveled (m) |
|---|---|---|---|---|---|
| MDDQN | 526.13 | 46.7% | 32.23 | 23.24 | 749 |
| GAT-MDDQN | 672.51 | 0.2% | 39.94 | 23.98 | 957 |
| GAT-MDDQN-MC | 677.52 | 0.0% | 40.00 | 23.92 | 958 |
| MDDQN | 517.82 | 53.6% | 31.71 | 23.06 | 731 |
| GAT-MDDQN | 665.29 | 1.1% | 39.82 | 23.96 | 954 |
| GAT-MDDQN-MC | 675.77 | 0.3% | 39.96 | 23.90 | 955 |
| MDDQN | 447.75 | 62.6% | 26.96 | 23.04 | 621 |
| GAT-MDDQN | 654.92 | 5.6% | 38.87 | 23.93 | 930 |
| GAT-MDDQN-MC | 667.91 | 1.8% | 39.56 | 23.80 | 942 |
| MDDQN | 416.55 | 69.1% | 24.93 | 23.24 | 579 |
| GAT-MDDQN | 617.88 | 15.5% | 37.05 | 23.89 | 885 |
| GAT-MDDQN-MC | 656.42 | 2.6% | 39.30 | 23.59 | 927 |
| MDDQN | 504.90 | 57.9% | 30.52 | 22.90 | 698 |
| GAT-MDDQN | 584.33 | 31.2% | 35.13 | 23.85 | 838 |
| GAT-MDDQN-MC | 656.72 | 1.7% | 39.63 | 23.36 | 926 |
| MDDQN | 472.19 | 60.9% | 28.79 | 22.75 | 655 |
| GAT-MDDQN | 576.92 | 36.1% | 34.72 | 23.80 | 826 |
| GAT-MDDQN-MC | 652.91 | 1.8% | 39.52 | 23.26 | 919 |
