A Comparative Study of Deep Reinforcement Learning Algorithms for Urban Autonomous Driving: Addressing the Geographic and Regulatory Challenges in CARLA
Abstract
1. Introduction
2. Related Works
- This work presents the first application of CrossQ and TQC in autonomous driving, conducting a comprehensive comparative analysis of their performance alongside that of existing algorithms such as DDPG, PPO, SAC, and TD3. Evaluations are performed within a closed-loop simulation environment that incorporates diverse geographic variations and complex traffic regulations. The analysis also includes a detailed investigation of failure factors, from which key insights for algorithmic improvements are derived.
- A set of comprehensive evaluation metrics is proposed to assess driving performance under varying geographic and regulatory conditions. These metrics capture multiple dimensions of performance, including the effectiveness, stability, efficiency, and comfort of driving.
- A set of adaptive reward and penalty functions is designed to facilitate effective policy training under dynamic driving scenarios that reflect complex geographic variations and real-world traffic regulations, enhancing the robustness and generalization of DRL-based decision-making.
3. System Model
3.1. Policy Network Model
3.2. DRL Module
3.3. VAE Encoder
3.4. Reward Function
3.4.1. Reward
3.4.2. Penalty
3.4.3. Total Reward Function
4. Evaluation Metrics
- Travel Distance: The distance traveled by the agent during episode i is calculated by cumulatively summing the distance covered at each step up to the final time step, as illustrated in Figure 3.
- Route Completion: This metric represents the proportion of the intended route, generated from the start to the destination in episode i, that the ego vehicle successfully traveled from the starting point. It is computed according to Equation (32).
- Speed Mean: This metric denotes the average speed of the ego vehicle over the entire duration of episode i and is computed as shown in Equation (33).
- Centerline Deviation Mean: This metric quantifies how far the vehicle deviates from the lane centerline during driving and is computed according to Equation (34). It is based on the average of the Centerline Deviation values at each time step. A smaller value indicates stable lane keeping, whereas a larger value suggests a higher likelihood of lane departure.
- Episode Reward: Episode Reward represents the total cumulative reward obtained by the agent during episode i and is computed according to Equation (35).
- Episode Reward Mean and Step Reward Mean: The average reward metric is defined on two temporal scales. The first is the Episode Reward Mean, computed by summing the total rewards obtained across all N episodes and dividing by the number of episodes; it represents the average reward per episode and is given by Equation (36). The second is the Step Reward Mean, computed by dividing the total episode reward by the total number of steps in episode i; it represents the average reward per step and is given by Equation (37).
- Reward Standard Deviation: This metric represents the variability of the rewards obtained by the agent during episode i and is defined as shown in Equation (38). A lower value indicates that the learned policy is more stable and acquires rewards more consistently.
- Success Rate: For each episode i, a binary success indicator is defined according to Equation (39), which determines whether the ego vehicle reached its destination. An episode is considered successful (indicator equal to 1) if the final vehicle position lies within 5 m of the goal position; otherwise, it is considered a failure (indicator equal to 0). The overall Success Rate is then computed by averaging the success indicators across all N episodes, as defined in Equation (40). A minimal computation sketch covering these metrics follows this list.
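The underlying Equations (32)–(40) are not reproduced in this listing. As a complement, the following Python sketch computes the metrics from per-step episode logs as described above; the variable names, the travel-distance/route-length approximation of Route Completion, and everything except the stated 5 m success threshold are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def episode_metrics(positions, speeds, center_devs, rewards, goal, route_length):
    """Per-episode metrics from step-wise logs (sketch, not Eqs. (32)-(40) verbatim).

    positions    : (T, 2) ego (x, y) positions per step [m]
    speeds       : (T,) ego speeds per step
    center_devs  : (T,) centerline deviations per step [m]
    rewards      : (T,) step rewards
    goal         : (2,) goal position [m]
    route_length : length of the planned route from start to destination [m]
    """
    positions = np.asarray(positions, dtype=float)
    step_dists = np.linalg.norm(np.diff(positions, axis=0), axis=1)

    travel_distance = float(step_dists.sum())                    # Travel Distance
    # Route Completion approximated as distance traveled over planned route length.
    route_completion = min(travel_distance / route_length, 1.0)
    speed_mean = float(np.mean(speeds))                          # Speed Mean
    center_dev_mean = float(np.mean(center_devs))                # Centerline Deviation Mean
    episode_reward = float(np.sum(rewards))                      # Episode Reward
    step_reward_mean = episode_reward / len(rewards)             # Step Reward Mean
    reward_std = float(np.std(rewards))                          # Reward Standard Deviation
    # Binary success indicator: final position within 5 m of the goal.
    success = float(np.linalg.norm(positions[-1] - np.asarray(goal)) <= 5.0)

    return {
        "travel_distance": travel_distance,
        "route_completion": route_completion,
        "speed_mean": speed_mean,
        "center_dev_mean": center_dev_mean,
        "episode_reward": episode_reward,
        "step_reward_mean": step_reward_mean,
        "reward_std": reward_std,
        "success": success,
    }

def aggregate(per_episode):
    """Across N episodes: Episode Reward Mean and Success Rate."""
    return {
        "episode_reward_mean": float(np.mean([m["episode_reward"] for m in per_episode])),
        "success_rate": float(np.mean([m["success"] for m in per_episode])),
    }
```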
5. Experimental Setup
5.1. CARLA Simulation Map
5.2. Hyperparameter Setup for DRL Training
6. Simulation Results
6.1. Evaluation of Driving Performance
6.2. Evaluation of Reward-Based Performances
6.3. Evaluation of Driving Stability, Efficiency, and Comfort
6.4. Analysis of Penalty and Failure Factors
7. Conclusions
8. Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NeurIPS), Lake Tahoe, NV, USA, 3–8 December 2012. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009. [Google Scholar]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012. [Google Scholar]
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020. [Google Scholar]
- Jun, W.; Lee, S. A Comparative Study and Optimization of Camera-Based BEV Segmentation for Real-Time Autonomous Driving. Sensors 2025, 25, 2300. [Google Scholar] [CrossRef] [PubMed]
- Kwak, D.; Yoo, J.; Son, M.; Park, M.; Choi, D.; Lee, S. Rethinking Real-Time Lane Detection Technology for Autonomous Driving. J. Korean Inst. Commun. Inf. Sci. 2023, 48, 589–599. [Google Scholar] [CrossRef]
- Cheong, Y.; Jun, W.; Lee, S. Study on Point Cloud Based 3D Object Detection for Autonomous Driving. J. Korean Inst. Commun. Inf. Sci. 2024, 49, 31–40. [Google Scholar] [CrossRef]
- Jun, W.; Yoo, J.; Lee, S. Synthetic Data Enhancement and Network Compression Technology of Monocular Depth Estimation for Real-Time Autonomous Driving System. Sensors 2024, 24, 4205. [Google Scholar] [CrossRef]
- Jun, W.; Lee, S. Optimal Configuration of Multi-Task Learning for Autonomous Driving. Sensors 2023, 23, 9729. [Google Scholar] [CrossRef]
- Son, M.; Won, Y.; Lee, S. Optimizing Large Language Models: A Deep Dive into Effective Prompt Engineering Techniques. Appl. Sci. 2025, 15, 1430. [Google Scholar] [CrossRef]
- Son, M.; Lee, S. Advancing Multimodal Large Language Models: Optimizing Prompt Engineering Strategies for Enhanced Performance. Appl. Sci. 2025, 15, 3992. [Google Scholar] [CrossRef]
- Yang, Z.; Jia, X.; Li, H.; Yan, J. LLM4Drive: A Survey of Large Language Models for Autonomous Driving. arXiv 2023, arXiv:2311.01043. [Google Scholar]
- Cui, C.; Ma, Y.; Cao, X.; Ye, W.; Zhou, Y.; Liang, K.; Chen, J.; Lu, J.; Yang, Z.; Liao, K.; et al. A Survey on Multimodal Large Language Models for Autonomous Driving. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–6 January 2024; pp. 958–979. [Google Scholar]
- Choi, J.; Park, Y.; Jun, W.; Lee, S. Research Trends Focused on End-to-End Learning Technologies for Autonomous Vehicles. J. Korean Inst. Commun. Inf. Sci. 2024, 49, 1614–1630. [Google Scholar] [CrossRef]
- Ye, F.; Zhang, S.; Wang, P.; Chan, C. A Survey of Deep Reinforcement Learning Algorithms for Motion Planning and Control of Autonomous Vehicles. In Proceedings of the 2021 IEEE Intelligent Vehicles Symposium (IV), Nagoya, Japan, 11–13 July 2021. [Google Scholar]
- Zhao, R.; Li, Y.; Fan, Y.; Gao, F.; Tsukada, M.; Gao, Z. A Survey on Recent Advancements in Autonomous Driving Using Deep Reinforcement Learning: Applications, Challenges, and Solutions. IEEE Trans. Intell. Transp. Syst. 2024, 25, 19365–19398. [Google Scholar] [CrossRef]
- Zhu, Z.; Zhao, H. A Survey of Deep RL and IL for Autonomous Driving Policy Learning. IEEE Trans. Intell. Transp. Syst. 2021, 23, 14043–14065. [Google Scholar] [CrossRef]
- Dinneweth, J.; Boubezoul, A.; Mandiau, R.; Espié, S. Multi-agent reinforcement learning for autonomous vehicles: A survey. Auton. Intell. Syst. 2022, 2, 27. [Google Scholar] [CrossRef]
- Wu, J.; Huang, C.; Huang, H.; Lv, C.; Wang, Y.; Wang, F. Recent advances in reinforcement learning-based autonomous driving behavior planning: A survey. Transp. Res. Part C Emerg. Technol. 2024, 164, 104654. [Google Scholar] [CrossRef]
- Ross, S.; Gordon, G.J.; Bagnell, J.A. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS 2011), Fort Lauderdale, FL, USA, 11–13 April 2011; Volume 15, pp. 627–635. [Google Scholar]
- Ross, S.; Bagnell, D. Efficient Reductions for Imitation Learning. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS 2010), Chia Laguna Resort, Italy, 13–15 May 2010; Volume 9, pp. 661–668. [Google Scholar]
- Wu, Y.-H.; Charoenphakdee, N.; Bao, H.; Tangkaratt, V.; Sugiyama, M. Imitation Learning from Imperfect Demonstration. In Proceedings of the 36th International Conference on Machine Learning (ICML 2019), Long Beach, CA, USA, 9–15 June 2019; PMLR: London, UK, 2019; Volume 97, pp. 6818–6827. [Google Scholar]
- Codevilla, F.; Muller, M.; Lopez, A.; Koltun, V.; Dosovitskiy, A. End-to-End Driving via Conditional Imitation Learning. In Proceedings of the International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4693–4700. [Google Scholar]
- Ho, J.; Ermon, S. Generative Adversarial Imitation Learning (GAIL). In Proceedings of the 30th Conference on Neural Information Processing Systems (NeurIPS), Barcelona, Spain, 5–10 December 2016. [Google Scholar]
- Chen, L.; Wu, P.; Chitta, K.; Jaeger, B.; Geiger, A.; Li, H. End-to-end Autonomous Driving: Challenges and Frontiers. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10164–10183. [Google Scholar] [CrossRef]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with Deep Reinforcement Learning. In Proceedings of the NIPS Deep Learning Workshop 2013, Lake Tahoe, NV, USA, 6–7 December 2013. [Google Scholar]
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.P.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous Methods for Deep Reinforcement Learning. arXiv 2016, arXiv:1602.01783. [Google Scholar]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
- Schulman, J.; Levine, S.; Moritz, P.; Jordan, M.I.; Abbeel, P. Trust Region Policy Optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015. [Google Scholar]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep Reinforcement Learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
- Fujimoto, S.; Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; PMLR: London, UK, 2018; pp. 1587–1596. [Google Scholar] [CrossRef]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; PMLR: London, UK, 2018; pp. 1861–1870. [Google Scholar] [CrossRef]
- Bhatt, A.; Palenicek, D.; Belousov, B.; Argus, M.; Amiranashvili, A.; Brox, T.; Peters, J. CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity. arXiv 2019, arXiv:1902.05605. [Google Scholar]
- Kuznetsov, A.; Laskin, M.; Kostrikov, I.; Abbeel, P. Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics. In Proceedings of the 37th International Conference on Machine Learning (ICML), Vienna, Austria, 12–18 July 2020; PMLR: London, UK, 2020. [Google Scholar] [CrossRef]
- Li, D.; Okhrin, O. Modified DDPG car-following model with a real-world human driving experience with CARLA simulator. Transp. Res. Part C Emerg. Technol. 2023, 147, 103987. [Google Scholar] [CrossRef]
- Hossain, J. Autonomous Driving with Deep Reinforcement Learning in CARLA Simulation. arXiv 2023, arXiv:2306.11217. [Google Scholar]
- Sharma, R.; Garg, P. Optimizing Autonomous Driving with Advanced Reinforcement Learning: Evaluating DQN and PPO. In Proceedings of the 5th International Conference on Smart Electronics and Communication (ICOSEC), Tiruchirappalli, India, 18–20 September 2024. [Google Scholar]
- Pérez-Gil, Ó.; Barea, R.; López-Guillén, E.; Bergasa, L.M.; Gómez-Huélamo, C.; Gutiérrez, R.; Díaz-Díaz, A. Deep reinforcement learning based control for Autonomous Vehicles in CARLA. Multimed. Tools Appl. 2022, 81, 3553–3576. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An Open Urban Driving Simulator. In Proceedings of the 1st Annual Conference on Robot Learning (CoRL), Mountain View, CA, USA, 13–15 November 2017; PMLR: London, UK, 2017; Volume 78, pp. 1–16. [Google Scholar] [CrossRef]
- Pomerleau, D.A. ALVINN: An Autonomous Land Vehicle in a Neural Network. In Proceedings of the 1st International Conference on Neural Information Processing Systems (NIPS), Denver, CO, USA, 27–30 December 1988; pp. 305–313. [Google Scholar]
- Bojarski, M.; Testa, D.D.; Dworakowski, D.; Firner, B.; Flepp, B.; Goyal, P.; Jackel, L.D.; Monfort, M.; Muller, U.; Zhang, J.; et al. End to End Learning for Self-Driving Cars. arXiv 2016, arXiv:1604.07316. [Google Scholar]
- Rhinehart, N.; McAllister, R.; Levine, S. Deep Imitative Models for Flexible Inference, Planning, and Control. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 8001–8009. [Google Scholar]
- Kim, S.; Shin, J. Deep Imitation Learning for End-to-End Autonomous Driving: Integration of Carla and OpenAI Gym. In Proceedings of the 5th International Conference on Smart Electronics and Communication (ICOSEC), Tiruchirappalli, India, 18–20 September 2024. [Google Scholar]
- Shi, J.; Zhang, T.; Zhan, J.; Chen, S.; Xin, J.; Zheng, N. Efficient Lane-changing Behavior Planning via Reinforcement Learning with Imitation Learning Initialization. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Anchorage, AK, USA, 4–7 June 2023. [Google Scholar]
- Lu, Y.; Fu, J.; Tucker, G.; Pan, X.; Bronstein, E.; Roelofs, R.; Sapp, B.; White, B.; Faust, A.; Whiteson, S.; et al. Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios. arXiv 2022, arXiv:2212.11419. [Google Scholar]
- Nehme, G.; Deo, T.Y. Safe Navigation: Training Autonomous Vehicles using Deep Reinforcement Learning in CARLA. arXiv 2023, arXiv:2311.10735. [Google Scholar]
- Khlifi, A.; Othmani, M.; Kherallah, M. A Novel Approach to Autonomous Driving Using Double Deep Q-Network-Based Deep Reinforcement Learning. World Electr. Veh. J. 2025, 16, 138. [Google Scholar] [CrossRef]
- Terapaptommakol, W.; Phaoharuhansa, D.; Koowattanasuchat, P.; Rajruangrabin, J. Design of Obstacle Avoidance for Autonomous Vehicle Using Deep Q-Network and CARLA Simulator. World Electr. Veh. J. 2022, 13, 239. [Google Scholar] [CrossRef]
- Cai, P.; Wang, S.; Wang, H.; Liu, M. Carl-Lead: Lidar-based End-to-End Autonomous Driving with Contrastive Deep Reinforcement Learning. arXiv 2021, arXiv:2109.08473v1. [Google Scholar]
- Elallid, B.B.; Benamar, N.; Mrani, N.; Rachidi, T. DQN-based Reinforcement Learning for Vehicle Control of Autonomous Vehicles Interacting With Pedestrians. In Proceedings of the 2022 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT), Sakhir, Bahrain, 22–23 November 2022. [Google Scholar]
- Li, G.; Yang, Y.; Li, S.; Qu, X.; Lyu, N.; Li, S.E. Decision making of autonomous vehicles in lane change scenarios: Deep reinforcement learning approaches with risk awareness. Transp. Res. Part C Emerg. Technol. 2022, 134, 103452. [Google Scholar] [CrossRef]
- Liang, X.; Wang, T.; Yang, L.; Xing, E. CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; Volume 11206, pp. 584–599. [Google Scholar]
- Činčurak, D.; Grbić, R.; Vranješ, M.; Vranješ, D. Autonomous Vehicle Control in CARLA Simulator using Reinforcement Learning. In Proceedings of the 66th International Symposium ELMAR-2024, Zadar, Croatia, 16–18 September 2024. [Google Scholar]
- Perez-Gil, O.; Barea, R.; Lopez-Guillen, E.; Bergasa, L.M.; Gomez-Huelamo, C.; Gutierrez, R.; Diaz, A. Deep Reinforcement Learning Based Control Algorithms: Training and Validation Using the ROS Framework in CARLA Simulator for Self-Driving Applications. In Proceedings of the 2021 IEEE Intelligent Vehicles Symposium (IV), Nagoya, Japan, 11–17 July 2021; IEEE: New York, NY, USA; pp. 958–963. [Google Scholar]
- Ahmed, M.; Abobakr, A.; Lim, C.P.; Nahavandi, S. Policy-Based Reinforcement Learning for Training Autonomous Driving Agents in Urban Areas With Affordance Learning. IEEE Trans. Intell. Transp. Syst. 2022, 23, 12562–12571. [Google Scholar] [CrossRef]
- Matsioris, G.; Theocharous, A.; Tsourveloudis, N.; Doitsidis, L. Towards developing a framework for autonomous electric vehicles using CARLA: A Validation using the Deep Deterministic Policy Gradient algorithm. In Proceedings of the 2024 32nd Mediterranean Conference on Control and Automation (MED), Chania, Greece, 11–14 June 2024. [Google Scholar]
- Huang, C.; Zhang, R.; Ouyang, M.; Wei, P.; Lin, J.; Su, J.; Lin, L. Deductive Reinforcement Learning for Visual Autonomous Urban Driving Navigation. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 5379–5391. [Google Scholar] [CrossRef] [PubMed]
- Anzalone, L.; Barra, S.; Nappi, M. Reinforced Curriculum Learning for Autonomous Driving in CARLA. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; IEEE: Piscataway, NJ, USA, 2021. [Google Scholar]
- Gutiérrez-Moreno, R.; Barea, R.; López-Guillén, E.; Araluce, J.; Bergasa, L.M. Reinforcement Learning-Based Autonomous Driving at Intersections in CARLA Simulator. Sensors 2022, 22, 8373. [Google Scholar] [CrossRef]
- Wang, L.; Liu, J.; Shao, H.; Wang, W.; Chen, R.; Liu, Y.; Waslander, S.L. Efficient Reinforcement Learning for Autonomous Driving with Parameterized Skills and Priors. In Proceedings of the Robotics: Science and Systems (RSS), Daegu, Republic of Korea, 10–14 July 2023; PMLR: Cambridge, MA, USA, 2023. [Google Scholar]
- Siboo, S.; Bhattacharyya, A.; Raj, R.N.; Ashwin, S.H. An Empirical Study of DDPG and PPO-Based Reinforcement Learning Algorithms for Autonomous Driving. IEEE Access 2023, 11, 125094–125108. [Google Scholar] [CrossRef]
- Pei, X.; Mo, S.; Chen, Z.; Yang, B. Lane Changing of Autonomous Vehicle Based on TD3 algorithm in Human-machine Hybrid Driving Environment. China J. Highw. Transp. 2021, 34, 246–254. [Google Scholar]
- Elallid, B.B.; Alaoui, H.E.; Benamar, N. Deep Reinforcement Learning for Autonomous Vehicle Intersection Navigation. In Proceedings of the 2023 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT), Sakhir, Bahrain, 20–23 November 2023; IEEE: New York, NY, USA, 2023. [Google Scholar]
- Liu, Y.; Gao, Y.; Zhang, Q.; Ding, D.; Zhao, D. Multi-task Safe Reinforcement Learning for Navigating Intersections in Dense Traffic. J. Frankl. Inst. 2023, 360, 13737–13760. [Google Scholar] [CrossRef]
- Wang, K.; She, C.; Li, Z.; Yu, T.; Li, Y.; Sakaguchi, K. Roadside Units Assisted Localized Automated Vehicle Maneuvering: An Offline Reinforcement Learning Approach. arXiv 2024, arXiv:2405.03935. [Google Scholar]
- Liu, Q.; Dang, F.; Wang, X.; Ren, X. Autonomous Highway Merging in Mixed Traffic Using Reinforcement Learning and Motion Predictive Safety Controller. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; pp. 1063–1069. [Google Scholar]
- Elallid, B.B.; Benamar, N.; Bagaa, M.; Hadjadj-Aoul, Y. Enhancing Autonomous Driving Navigation Using Soft Actor-Critic. Future Internet 2024, 16, 238. [Google Scholar] [CrossRef]
- Aghdasian, A.J.; Ardakani, A.H.; Aqabakee, K.; Abdollahi, F. Autonomous Driving using Residual Sensor Fusion and Deep Reinforcement Learning. In Proceedings of the 2023 11th RSI International Conference on Robotics and Mechatronics (ICRoM), Tehran, Iran, 13–15 December 2023; pp. 265–270. [Google Scholar]
- Abdollahian, S.A.; Fazel, S.; Hadadi, M.R.; Aghdasian, A.J. Enhancing Autonomous Vehicle Control through Sensor Fusion, NARX-based Reinforcement Learning with Soft Actor-Critic (SAC) in CARLA Simulator. In Proceedings of the RoboCup 2024: Robot World Cup XXVII, Cham, Switzerland, 15–22 July 2024; pp. 224–235. [Google Scholar]
- Yao, F.; Sun, C.; Lu, B.; Wang, B.; Yu, H. Mixture of Experts Framework Based on Soft Actor-Critic algorithm for Highway Decision-Making of Connected and Automated Vehicles. Chin. J. Mech. Eng. 2025, 38, 1. [Google Scholar] [CrossRef]
- Hart, P.E.; Nilsson, N.J.; Raphael, B. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 100–107. [Google Scholar] [CrossRef]
- Feng, Y.; Zhang, W.; Zhu, J. Application of an improved A* algorithm for the path analysis of urban multi-type transportation systems. Appl. Sci. 2023, 13, 13090. [Google Scholar] [CrossRef]
- Wang, S.; Wang, Z.; Wang, X.; Liang, Q.; Meng, L. Intelligent vehicle driving decision-making model based on variational AutoEncoder network and deep reinforcement learning. Expert Syst. Appl. 2025, 268, 126319. [Google Scholar] [CrossRef]
- Azizpour, M.; da Roza, F.; Bajcinca, N. End-to-end autonomous driving controller using semantic segmentation and variational autoencoder. In Proceedings of the 2020 7th International Conference on Control, Decision and Information Technologies (CoDIT), Prague, Czech Republic, 29 June–2 July 2020; IEEE: New York, NY, USA, 2020; Volume 1, pp. 1075–1080. [Google Scholar]
- Diaz-Diaz, A.; Aranda, M.; Barea, R.; Bergasa, L.M.; Arroyo, R. HD Maps: Exploiting OpenDRIVE Potential for Path Planning and Map Monitoring. In Proceedings of the 2022 IEEE Intelligent Vehicles Symposium (IV), Aachen, Germany, 5–9 June 2022; IEEE: New York, NY, USA, 2022; pp. 1211–1217. [Google Scholar] [CrossRef]
- Jung, S.; Jun, W.; Oh, T.; Lee, S. Performance Analysis of Motion Planning for Outdoor Autonomous Delivery Robot. In Proceedings of the 2024 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 6–8 January 2024; IEEE: New York, NY, USA, 2024; pp. 1–6. [Google Scholar]
- Hafner, D.; Lillicrap, T.; Ba, J.; Norouzi, M. Mastering Diverse Domains through World Models. arXiv 2023, arXiv:2301.04104. [Google Scholar]
- Riedmiller, M.; Hafner, R.; Lampe, T.; Neunert, M.; Degrave, J.; de Wiele, T.V.; Mnih, V.; Heess, N.; Springenberg, T. Learning by Playing: Solving Sparse Reward Tasks from Scratch. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; PMLR: Cambridge, MA, USA, 2018; Volume 80, pp. 4344–4353. [Google Scholar]
- Chi, C.; Xu, Z.; Feng, S.; Cousineau, E.; Du, Y.; Burchfiel, B.; Tedrake, R.; Song, S. Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. Int. J. Robot. Res. 2024; Online First. [Google Scholar] [CrossRef]
- Reed, S.; Żołna, K.; Parisotto, E.; Colmenarejo, S.G.; Novikov, A.; Barth-Maron, G.; Giménez, M.; Sulsky, Y.; Kay, J.; Springenberg, J.T.; et al. A Generalist Agent. arXiv 2022, arXiv:2205.06175. [Google Scholar]
RL Category | Representative Works | Advantages | Limitations |
---|---|---|---|
Imitation Learning (IL) | [25,26,42,43,44,45] | Simple training via expert data; good initial policy generation | Poor generalization; covariate shift; error compounding |
Hybrid (IL + RL) | [46,47] | Combines strengths of IL (stability) and RL (adaptability) | Still sensitive to initial policy; intermediate complexity |
Value-based DRL (DQN and variants) | [28,48,49] | Effective for discrete action learning; well understood | Not suitable for continuous control; limited fine-grained behavior |
Deterministic Actor–Critic (DDPG, TD3) | [32,37,64] | Supports continuous action; efficient policy learning | Q-value overestimation risk; limited multi-modality |
Stochastic Actor–Critic (PPO, SAC) | [30,68,69] | Better stability and exploration; robust to noise | PPO is on-policy and sample-inefficient; SAC may still overestimate |
Regularized Actor–Critic (CrossQ, TQC) | [35,36] | Conservative updates; reduced Q-value noise; robust convergence | Relatively new; limited use in autonomous driving |
DRL Algorithm | Experience Reusability | Key Characteristics |
---|---|---|
DDPG | Off-Policy | Deterministic Actor–Critic Architecture |
TD3 | Off-Policy | Twin Q-Networks, Delayed Policy Updates |
SAC | Off-Policy | Maximum Entropy Framework with Soft Q-Function |
TQC | Off-Policy | Distributional RL, Quantile-Based Q-Value Estimation |
CrossQ | Off-Policy | Batch-Normalized Critics, No Target Networks
PPO | On-Policy | Policy Update Clipping for Stability |
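To make the comparison in this table concrete, the sketch below instantiates all six algorithms through a single interface. It assumes the Stable-Baselines3 and sb3-contrib libraries (recent sb3-contrib releases provide TQC and CrossQ) and a hypothetical Gym-style CARLA wrapper supplied by `make_env`; it is not the authors' training code, and the hyperparameters of Section 5.2 are not reproduced here.

```python
# Minimal sketch: six DRL algorithms trained through one common API.
# Assumes Stable-Baselines3 plus sb3-contrib (for TQC and CrossQ) and a
# hypothetical Gym-style CARLA environment returned by make_env().
from stable_baselines3 import DDPG, TD3, SAC, PPO
from sb3_contrib import TQC, CrossQ

ALGOS = {
    "DDPG":   DDPG,    # off-policy, deterministic actor-critic
    "TD3":    TD3,     # off-policy, twin critics + delayed policy updates
    "SAC":    SAC,     # off-policy, maximum-entropy soft actor-critic
    "TQC":    TQC,     # off-policy, truncated distributional quantile critics
    "CrossQ": CrossQ,  # off-policy, batch-normalized critics, no target networks
    "PPO":    PPO,     # on-policy, clipped policy updates
}

def train_all(make_env, total_timesteps=200_000):
    """Train each algorithm on the same environment factory and return the models."""
    models = {}
    for name, algo in ALGOS.items():
        env = make_env()  # e.g. a CARLA wrapper such as CarlaDrivingEnv(town="Town03")
        model = algo("MlpPolicy", env, verbose=0)
        model.learn(total_timesteps=total_timesteps)
        models[name] = model
    return models
```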
Reward and Penalty | Value | Terminology
---|---|---
Reward | (22) | Speed Reward (Green Light or No Traffic Light)
Reward | (23) | Speed Reward (Red Light)
Reward | (26) | Speed Reward (Yellow Light)
Reward | (27) | Centerline Distance Reward
Reward | (28) | Centerline Standard Deviation Reward
Reward | (29) | Vehicle Heading Reward
Penalty | −10 | Vehicle Stopped
Penalty | | Off-Track
Penalty | | Too Fast
Penalty | | Red Light Violation
Penalty | | Collision
Total | (31) | Total Reward Function
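Since Equations (22)–(31) are only referenced by number in this table, the sketch below shows just the structure of the total per-step reward: state-dependent reward terms plus event-triggered penalties. The term functions and every penalty magnitude except the listed −10 for a stopped vehicle are placeholders, not the paper's values.

```python
# Illustrative structure only: how per-step reward terms (Eqs. (22)-(29)) and
# event-triggered penalties combine into the total reward of Eq. (31).
# All penalty magnitudes except -10 (vehicle stopped) are placeholders.
PENALTIES = {
    "vehicle_stopped":     -10.0,  # value listed in the table
    "off_track":           -10.0,  # placeholder magnitude
    "too_fast":            -10.0,  # placeholder magnitude
    "red_light_violation": -10.0,  # placeholder magnitude
    "collision":           -10.0,  # placeholder magnitude
}

def total_reward(state, speed_reward, centerline_reward, heading_reward, events):
    """Sum the reward terms for the current state, then add triggered penalties.

    speed_reward(state) selects among the green/red/yellow-light cases
    (Eqs. (22), (23), (26)); centerline_reward covers Eqs. (27)-(28);
    heading_reward covers Eq. (29). `events` is a set of triggered penalty names.
    """
    r = speed_reward(state) + centerline_reward(state) + heading_reward(state)
    r += sum(PENALTIES[e] for e in events if e in PENALTIES)
    return r
```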
Town | Map Size (m) | Cross Intersections | T-Junctions | Roundabouts | Max Lanes | Tunnels | Traffic Lights | Highway
---|---|---|---|---|---|---|---|---
Town01 | 342 × 413 | 0 | 12 | 0 | 1 | 0 | 36 | No |
Town02 | 205 × 208 | 0 | 9 | 0 | 1 | 0 | 24 | No |
Town03 | 438 × 483 | 5 | 14 | 2 | 2 | 1 | 38 | No |
Town04 | 816 × 914 | 8 | 21 | 0 | 4 | 0 | 43 | Yes |
Town05 | 430 × 486 | 13 | 8 | 0 | 3 | 0 | 54 | Yes |
Traffic Light Phase Duration (s) | Town01 | Town02 | Town03 | Town04 | Town05
---|---|---|---|---|---
Red | 32.0 | 32.0 | 47.0 | 32.0 | 47.0 |
Green | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 |
Yellow | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 |
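In CARLA, such phase durations can be enforced through the standard Python API. The snippet below is a sketch that applies the per-town timings from this table uniformly to every traffic light in a loaded map; the host/port values and the uniform assignment are assumptions about the setup, not details stated in the paper.

```python
import carla

# Phase durations (seconds) per town, as listed in the table above.
LIGHT_TIMING = {
    "Town01": {"red": 32.0, "green": 10.0, "yellow": 3.0},
    "Town02": {"red": 32.0, "green": 10.0, "yellow": 3.0},
    "Town03": {"red": 47.0, "green": 10.0, "yellow": 3.0},
    "Town04": {"red": 32.0, "green": 10.0, "yellow": 3.0},
    "Town05": {"red": 47.0, "green": 10.0, "yellow": 3.0},
}

def apply_light_timing(town="Town03", host="localhost", port=2000):
    """Load a town and set every traffic light to the listed phase durations."""
    client = carla.Client(host, port)
    client.set_timeout(10.0)
    world = client.load_world(town)
    timing = LIGHT_TIMING[town]
    for light in world.get_actors().filter("traffic.traffic_light*"):
        light.set_red_time(timing["red"])
        light.set_green_time(timing["green"])
        light.set_yellow_time(timing["yellow"])
```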
Town | Episode 01 | Episode 02 | Episode 03 | Episode 04 | Total | Mean
---|---|---|---|---|---|---
Town01 | 930 | 759 | 784 | 279 | 2752 | 688.0 |
Town02 | 175 | 264 | 313 | 156 | 908 | 227.0 |
Town03 | 545 | 433 | 341 | 399 | 1718 | 429.5 |
Town04 | 196 | 352 | 558 | 387 | 1493 | 373.3 |
Town05 | 418 | 388 | 341 | 340 | 1487 | 371.8 |
Metric | DDPG | TD3 | SAC | TQC | CrossQ | PPO
---|---|---|---|---|---|---
Policy size (MB) | 0.62 | 0.62 | 0.62 | 0.62 | 0.63 | 0.62 |
Target policy size (MB) | 0.62 | 0.62 | 0.0 | 0.0 | 0.0 | 0.0 |
Critic size (MB) | 0.62 | 1.24 | 1.24 | 1.30 | 1.25 | 0.62 |
Target critic size (MB) | 0.62 | 1.24 | 1.24 | 1.30 | 0.0 | 0.0 |
Total size (MB) | 2.48 | 3.72 | 3.10 | 3.21 | 1.88 | 1.24 |
Latency (ms) | 1.05 | 1.29 | 1.46 | 1.80 | 1.41 | 1.10 |
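The policy sizes and per-step latencies reported here can be measured roughly as follows. This is a sketch assuming PyTorch policy and critic modules (as used by common DRL libraries) and CPU inference on a single observation; it is not the authors' measurement code.

```python
import time
import torch

def module_size_mb(module: torch.nn.Module) -> float:
    """Size of a module's parameters and buffers in megabytes."""
    n_bytes = sum(p.numel() * p.element_size() for p in module.parameters())
    n_bytes += sum(b.numel() * b.element_size() for b in module.buffers())
    return n_bytes / (1024 ** 2)

@torch.no_grad()
def mean_inference_latency_ms(policy: torch.nn.Module, obs_dim: int,
                              n_trials: int = 1000, warmup: int = 100) -> float:
    """Average single-observation forward-pass latency in milliseconds (CPU)."""
    policy.eval()
    obs = torch.zeros(1, obs_dim)
    for _ in range(warmup):          # warm up caches / lazy allocations
        policy(obs)
    start = time.perf_counter()
    for _ in range(n_trials):
        policy(obs)
    return (time.perf_counter() - start) / n_trials * 1000.0
```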
Algorithm | Travel Distance (m) | Route Completion | Success Rate
---|---|---|---
DDPG | 1888 | 0.23 | 0.00 |
TD3 | 6435 | 0.77 | 0.60 |
SAC | 6960 | 0.83 | 0.65 |
TQC | 7614 | 0.91 | 0.80 |
CrossQ | 1843 | 0.22 | 0.00 |
PPO | 5800 | 0.69 | 0.40 |
Algorithm | Episode Reward Mean | Step Reward Mean | Reward Standard Deviation
---|---|---|---
DDPG | 64.5 | 0.21 | 0.71 |
TD3 | 693.8 | 0.46 | 0.31 |
SAC | 823.9 | 0.52 | 0.33 |
TQC | 977.7 | 0.62 | 0.28 |
CrossQ | 144.7 | 0.35 | 0.68 |
PPO | 644.1 | 0.46 | 0.40 |
Algorithm | Centerline Deviation Mean | Speed Mean
---|---|---
DDPG | 0.41 | 16.69 |
TD3 | 0.19 | 14.54 |
SAC | 0.16 | 13.24 |
TQC | 0.11 | 14.37 |
CrossQ | 0.18 | 14.73 |
PPO | 0.20 | 13.52 |
Failure Factor | DDPG | TD3 | SAC | TQC | CrossQ | PPO | Total (Mean)
---|---|---|---|---|---|---|---
Vehicle stopped | 0.63 | 0.18 | 0.18 | 0.07 | 0.69 | 0.37 | 0.35 |
Off-track | 0.08 | 0.03 | 0.01 | 0.01 | 0.00 | 0.02 | 0.03 |
Too fast | 0.00 | 0.12 | 0.09 | 0.10 | 0.29 | 0.07 | 0.11 |
Red light violation | 0.19 | 0.12 | 0.14 | 0.13 | 0.02 | 0.25 | 0.14 |
Collision | 0.10 | 0.11 | 0.02 | 0.01 | 0.00 | 0.00 | 0.04 |
Total (mean) | 0.20 | 0.11 | 0.09 | 0.06 | 0.20 | 0.14 |
Failure Factor | Town01 | Town02 | Town03 | Town04 | Town05
---|---|---|---|---|---
Vehicle stopped | 0.32 | 0.30 | 0.52 | 0.38 | 0.26 |
Off-track | 0.03 | 0.02 | 0.06 | 0.00 | 0.02 |
Too fast | 0.03 | 0.01 | 0.14 | 0.04 | 0.12 |
Red light violation | 0.28 | 0.09 | 0.08 | 0.18 | 0.30 |
Collision | 0.03 | 0.02 | 0.03 | 0.12 | 0.01 |
Total (mean) | 0.14 | 0.09 | 0.16 | 0.14 | 0.14 |