An Efficiency Enhancing Methodology for Multiple Autonomous Vehicles in an Urban Network Adopting Deep Reinforcement Learning
Abstract
1. Introduction
- An advanced DRL method integrates a multilayer perceptron (MLP) with reinforcement learning through the PPO algorithm to optimize the DRL policy and to evaluate the efficiency of leading autonomous vehicles in an urban network under real traffic volumes across AV penetration rates (a minimal sketch of such an MLP policy follows this list). The leading autonomous vehicle experiment outperformed the other experiments in terms of the DRL policy, mobility, and energy.
- Hyperparameters for PPO with a clipped objective are proposed to enhance automatic feature extraction and to yield better performance in the urban network.
- The degree to which traffic congestion in the urban network is mitigated depends on the AV penetration rate; the proposed method becomes more effective as the penetration rate increases.
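To make the policy representation concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of an MLP policy with three hidden layers of 256 units, matching the hidden-layer setting listed in the hyperparameter tables below; the observation and action dimensions (`obs_dim`, `act_dim`) are placeholders.

```python
import torch
import torch.nn as nn

class MLPPolicy(nn.Module):
    """Gaussian policy with a 256 x 256 x 256 MLP backbone (illustrative sketch only)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mu = nn.Linear(hidden, act_dim)                 # mean of the action distribution
        self.log_std = nn.Parameter(torch.zeros(act_dim))   # state-independent log std

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        h = self.backbone(obs)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())
```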
2. Research Methodology
2.1. Car-Following Model
2.2. Proximal Policy Optimization (PPO)
Algorithm 1. PPO with a clipped objective.
1: Input: initial policy parameters θ0, clipping threshold ε
2: for k = 0, 1, 2, … do
3: Gather a set of trajectories by running the stochastic policy
4: Estimate advantages using the generalized advantage estimation (GAE) technique
5: Compute the policy update
6: by taking K steps of minibatch SGD (via Adam)
7: end for
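As a hedged illustration of step 5, the standard PPO clipped surrogate objective can be written in a few lines of PyTorch; the log-probabilities and advantages are assumed to come from a rollout buffer, and this sketch is not the authors' exact training code.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective L^CLIP, returned as a loss to minimize."""
    ratio = torch.exp(new_log_probs - old_log_probs)                  # probability ratio r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```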
2.3. Deep Reinforcement Learning Method Architecture
3. Hyperparameter Tuning and Performance Evaluation Metrics
- Average speed: the mean speed of all vehicles in the urban network.
- Fuel consumption: the mean fuel consumption of all vehicles in the urban network.
- Emissions: the mean emissions of all vehicles in the urban network, namely nitrogen oxides (NOx) and hydrocarbons (HC). A sketch of how these network-wide means can be computed follows this list.
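The following is a minimal sketch of how such network-wide means could be collected from a microscopic traffic simulator. It assumes a SUMO/TraCI setup, which is an assumption rather than a statement of the authors' exact measurement code; the TraCI vehicle queries used here are standard API calls, and the returned units follow SUMO's defaults.

```python
import traci  # SUMO's TraCI Python client (assumed simulation backend)

def network_means():
    """Mean speed, fuel consumption, and NOx/HC emissions over all vehicles currently in the network."""
    veh_ids = traci.vehicle.getIDList()
    if not veh_ids:
        return None
    n = len(veh_ids)
    return {
        "avg_speed": sum(traci.vehicle.getSpeed(v) for v in veh_ids) / n,
        "avg_fuel": sum(traci.vehicle.getFuelConsumption(v) for v in veh_ids) / n,
        "avg_nox": sum(traci.vehicle.getNOxEmission(v) for v in veh_ids) / n,
        "avg_hc": sum(traci.vehicle.getHCEmission(v) for v in veh_ids) / n,
    }
```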
4. Experiments and Results
4.1. Simulation Scenarios
4.2. Simulation Results
4.2.1. Performance of Deep Reinforcement Learning Policy
4.2.2. Efficiency of the Leading Autonomous Vehicles Regarding the Flattening Velocity
4.2.3. Efficiency of the Leading Autonomous Vehicles Regarding Mobility and Energy
4.2.4. Comparison of Leading Autonomous Vehicle Experiments
4.2.5. Comparison of Deep Reinforcement Learning’s Hyperparameters
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Car-following model parameters:

| Parameter | Value |
|---|---|
| Desired speed (m/s) | 15 |
| Time gap (s) | 1.0 |
| Minimum gap (m) | 2.0 |
| Acceleration exponent | 4.0 |
| Acceleration (m/s²) | 1.0 |
| Comfortable deceleration (m/s²) | 1.5 |
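As a worked illustration of how these parameters enter the car-following model of Section 2.1, the sketch below implements the standard Intelligent Driver Model (IDM) acceleration equation; mapping the table entries onto the IDM symbols is an inference from the parameter names, not an explicit statement in the table.

```python
import math

# Values taken from the table above (assumed IDM parameters):
# v0 desired speed, T time gap, s0 minimum gap, delta acceleration exponent,
# a maximum acceleration, b comfortable deceleration
V0, T, S0, DELTA, A_MAX, B_COMF = 15.0, 1.0, 2.0, 4.0, 1.0, 1.5

def idm_acceleration(v, v_lead, gap):
    """IDM acceleration for a follower with speed v, leader speed v_lead, and bumper-to-bumper gap."""
    dv = v - v_lead                                              # approaching rate
    s_star = S0 + v * T + v * dv / (2.0 * math.sqrt(A_MAX * B_COMF))
    return A_MAX * (1.0 - (v / V0) ** DELTA - (s_star / gap) ** 2)
```

For example, `idm_acceleration(10.0, 10.0, 30.0)` yields a mild positive acceleration, since the follower is below the desired speed and the gap exceeds the desired gap.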
Proposed PPO hyperparameters:

| Parameter | Value |
|---|---|
| Number of training iterations | 200 |
| Time horizon per training iteration | 6000 |
| Hidden layers | 256 × 256 × 256 |
| GAE lambda | 1.0 |
| Clip parameter | 0.2 |
| Step size | 5 × 10⁻⁴ |
| Value function clip parameter | 10 × 10³ |
| Number of SGD iterations | 10 |
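Since the table lists a GAE lambda, the following is a brief sketch of generalized advantage estimation as used in step 4 of Algorithm 1; the reward and value arrays are placeholders for one rollout trajectory, and this is an illustrative implementation rather than the authors' code.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=1.0):
    """GAE over one trajectory; `values` holds state values with len(rewards) + 1 entries."""
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]   # TD residual
        running = delta + gamma * lam * running                   # discounted sum of residuals
        advantages[t] = running
    return advantages
```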
PPO with adaptive KL penalty hyperparameters:

| Parameter | Value |
|---|---|
| Number of training iterations | 200 |
| Time horizon per training iteration | 6000 |
| Gamma | 0.99 |
| Hidden layers | 256 × 256 × 256 |
| Lambda | 0.95 |
| Kullback–Leibler target | 0.01 |
| Number of SGD iterations | 10 |
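For contrast with the clipped objective, the adaptive KL penalty variant compared in Section 4.2.5 adjusts a penalty coefficient β so that the measured KL divergence tracks the target listed above (0.01). The sketch below follows the coefficient-update rule from the original PPO paper and is illustrative only.

```python
def update_kl_coefficient(beta, measured_kl, kl_target=0.01):
    """Adaptive KL penalty: grow beta when KL overshoots the target, shrink it when KL undershoots."""
    if measured_kl > 1.5 * kl_target:
        beta *= 2.0
    elif measured_kl < kl_target / 1.5:
        beta /= 2.0
    return beta
```

The penalized objective then subtracts `beta * KL(pi_old, pi_new)` from the surrogate advantage term, whereas the clipped variant needs no such coefficient tuning.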
Average speed and average reward over AV penetration rates:

| AV Penetration Rate | Average Speed (m/s) | Average Reward |
|---|---|---|
| 0% (all manual vehicles) | 6.16 | 11,647.90 |
| 20% (mixed automation) | 7.31 | 43,863.63 |
| 40% (mixed automation) | 7.46 | 81,232.67 |
| 60% (mixed automation) | 7.67 | 101,690.6 |
| 80% (mixed automation) | 7.74 | 123,731.6 |
| 100% (full automation) | 7.81 | 135,611.7 |
Comparison of the leading autonomous vehicle and leading manual vehicle experiments:

| AV Penetration Rate | Average Speed (m/s), Leading AV Experiment | Average Speed (m/s), Leading Manual Vehicle Experiment | Average Reward, Leading AV Experiment | Average Reward, Leading Manual Vehicle Experiment |
|---|---|---|---|---|
| 20% | 7.31 | 6.79 | 43,863.63 | 39,709.56 |
| 40% | 7.46 | 6.90 | 81,232.67 | 75,080.93 |
| 60% | 7.67 | 7.14 | 101,690.6 | 87,551.11 |
| 80% | 7.74 | 7.51 | 123,731.6 | 116,557.88 |
Comparison of average reward between the proposed hyperparameters and the PPO with adaptive KL penalty hyperparameters:

| AV Penetration Rate | Average Reward, Proposed Hyperparameters | Average Reward, PPO with Adaptive KL Penalty Hyperparameters |
|---|---|---|
| 20% | 43,863.63 | 36,656.48 |
| 40% | 81,232.67 | 63,854.03 |
| 60% | 101,690.6 | 93,752.99 |
| 80% | 123,731.6 | 101,102.62 |
| 100% | 135,611.7 | 113,793.09 |