Applying Dual Deep Deterministic Policy Gradient Algorithm for Autonomous Vehicle Decision-Making in IPG-Carmaker Simulator
Abstract
1. Introduction
- Development of a robust and efficient decision-making policy for executing highway driving maneuvers, including lane keeping and double lane changing, using the Duel-DDPG algorithm to ensure safe and responsive vehicle behavior.
- Integration of realistic traffic conditions within the IPG CarMaker simulation platform, enabling comprehensive testing of lane-keeping and lane-changing maneuvers under diverse, high-fidelity traffic scenarios and environmental uncertainty.
2. Methods
- Value-based methods, like the DQN algorithm, approximate the value of states or state-action pairs.
- Policy-based methods, such as REINFORCE and PPO algorithms, learn the policy directly.
- Actor-Critic methods, including DDPG and TD3, combine both approaches by employing two networks: an actor network that selects actions and a critic network that evaluates their value (a minimal sketch of this structure follows this list).
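As a minimal sketch of the actor-critic structure described above, the example below defines a deterministic actor and a Q-value critic in PyTorch. The layer sizes, class names, and dimensions are placeholders chosen for illustration, not the networks used in this work.

```python
# Minimal actor-critic sketch (PyTorch); layer sizes are illustrative only.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a state to a bounded continuous action."""
    def __init__(self, state_dim, action_dim, hidden=100, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Q-function: maps a (state, action) pair to a scalar value estimate."""
    def __init__(self, state_dim, action_dim, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```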
2.1. Simulation Environment
2.2. Duel Deep Deterministic Policy Gradient (Duel-DDPG) Algorithm
| Algorithm 1. Duel-DDPG algorithm framework |
| Initialize Q-function parameters φ and policy parameters θ |
| Initialize the target network parameters to match those of the primary networks: φ′ ← φ, θ′ ← θ |
| Repeat for each environment step: |
| Observe state s and select action a = μθ(s) + exploration noise |
| Execute a in the environment; observe reward r, next state s′, and terminal signal d |
| Store transition (s, a, r, s′, d) in replay buffer D |
| Randomly sample a batch of transitions B = {(s, a, r, s′, d)} from D |
| Compute targets: y = r + γ (1 − d) Qφ′(s′, μθ′(s′)) |
| Update the Q-function by one step of gradient descent on (1/|B|) Σ(s,a,r,s′,d)∈B (Qφ(s, a) − y)² |
| Update the policy by one step of gradient ascent on (1/|B|) Σs∈B Qφ(s, μθ(s)) |
| Update the target networks with φ′ ← τ φ + (1 − τ) φ′ and θ′ ← τ θ + (1 − τ) θ′ |
| End |
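To make the update rules in Algorithm 1 concrete, the following sketch implements one training step in PyTorch, assuming the Actor and Critic modules from the earlier sketch are available and that sampled batches arrive as tensors. The variable names, dimensions, and optimizer setup are illustrative assumptions, not the authors' implementation.

```python
import copy
import torch
import torch.nn.functional as F

# Illustrative single update step following Algorithm 1 (standard DDPG-style update).
# Reuses the Actor/Critic sketch above; dimensions and optimizers are placeholders.
state_dim, action_dim = 8, 2
actor, critic = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
actor_targ, critic_targ = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, tau = 0.99, 0.01  # discount factor and target soft-update rate

def update(s, a, r, s_next, done):
    """One gradient step on a sampled batch (tensors shaped [batch, dim] / [batch, 1])."""
    # Targets: y = r + gamma * (1 - done) * Q_targ(s', mu_targ(s'))
    with torch.no_grad():
        y = r + gamma * (1.0 - done) * critic_targ(s_next, actor_targ(s_next))

    # Q-function: one step of gradient descent on the mean squared Bellman error
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Policy: one step of gradient ascent on Q(s, mu(s)), i.e. descent on its negative
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft (Polyak) target update: theta' <- tau * theta + (1 - tau) * theta'
    with torch.no_grad():
        for net, net_targ in ((critic, critic_targ), (actor, actor_targ)):
            for p, p_targ in zip(net.parameters(), net_targ.parameters()):
                p_targ.mul_(1.0 - tau).add_(tau * p)
```

In this convention the soft-update rate τ corresponds to the "target soft update rate" in the hyperparameter table, so τ = 0.01 moves each target network one percent of the way toward its learned counterpart at every step.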
2.3. State
2.4. Action
2.5. Reward Function
3. Results and Discussion
Considering Uncertainty
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| DRL | Deep Reinforcement Learning |
| DDPG | Deep Deterministic Policy Gradient |
| Duel-DDPG | Duel Deep Deterministic Policy Gradient |

| Hyperparameter | Value |
|---|---|
| Discount factor | 0.99 |
| Number of episodes | 10,000 |
| Period | 0.1 s |
| Actor network learning rate | 0.0001 |
| Critic network learning rate | 0.001 |
| Actor hidden layer | 100 |
| Critic hidden layer | 100 |
| Training epochs | 10 |
| Target soft update rate | 0.01 |
| Batch size | 1000 |
| Experience replay buffer size | 100,000 |
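For convenience, the hyperparameters in the table above can be gathered into a single configuration object. The sketch below shows one possible grouping; the field names are chosen for this example only.

```python
from dataclasses import dataclass

@dataclass
class DuelDDPGConfig:
    """Training hyperparameters mirroring the table above (field names are illustrative)."""
    discount_factor: float = 0.99
    num_episodes: int = 10_000
    period_s: float = 0.1            # control/sampling period in seconds
    actor_lr: float = 1e-4
    critic_lr: float = 1e-3
    actor_hidden: int = 100
    critic_hidden: int = 100
    training_epochs: int = 10
    target_soft_update_rate: float = 0.01
    batch_size: int = 1000
    replay_buffer_size: int = 100_000

config = DuelDDPGConfig()
```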
| Parameter | Value | Unit |
|---|---|---|
| Temperature | 20 | °C |
| Air Density | 1.2 | kg/m³ |
| Air Pressure | 101 | kPa |
| Humidity | 60 | % |
| Solar Radiation | 400 | W/m² |
| Cloud Speed | 71 | km/h |
| Cloud Angle | 106 | deg |
| Visibility in Fog | 150 | m |
| Rainfall Rate | 5 | mm/h |
| Wind Speed | 10 | km/h |
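Purely as an illustration of how these scenario conditions might be passed to a simulation wrapper, the dictionary below mirrors the table. The key names, units, and any wrapper consuming them are assumptions for this sketch and do not correspond to IPG CarMaker's actual configuration interface.

```python
# Hypothetical scenario-condition dictionary mirroring the environmental table above.
# Keys and units are illustrative; IPG CarMaker scenario files use their own
# parameter names and formats.
scenario_conditions = {
    "temperature_C": 20,
    "air_density_kg_per_m3": 1.2,
    "air_pressure_kPa": 101,
    "humidity_percent": 60,
    "solar_radiation_W_per_m2": 400,
    "cloud_speed_kmh": 71,
    "cloud_angle_deg": 106,
    "fog_visibility_m": 150,
    "rainfall_rate_mm_per_h": 5,
    "wind_speed_kmh": 10,
}
```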
| Longitudinal Acceleration (m/s²) | Mean | Standard Deviation |
|---|---|---|
| Normal Conditions | 1.10 | 0.70 |
| Rainy Conditions | 1.01 | 0.61 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Published by MDPI on behalf of the World Electric Vehicle Association. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

