Application of Hybrid Deep Reinforcement Learning for Managing Connected Cars at Pedestrian Crossings: Challenges and Research Directions
Abstract
1. Introduction
2. Related Works
2.1. V2P Interaction
2.2. Hybrid Action Space
3. Modeling of the Problem
3.1. Simple Environment Model: Single-Lane Crossing with V2P Interaction
- Decision-making
- Speed control
3.2. Generalized Environment Model: Multiple Lane Crossing with Several Pedestrians and Vehicles
4. Addressing the Problem with H-DRL
4.1. Our Scalable MDP
4.2. H-DRL Model
5. Simulation and Results
5.1. Methodology for Evaluating the Performance
5.2. Results for a Single Vehicle/Pedestrian Interaction
5.3. Limitations and Perspectives
- The addition of new vehicles changes the actions required of the AVs and can therefore shift the distribution of training scenarios. For instance, if one vehicle prevents a pedestrian from crossing, another vehicle may be inclined to deny the pedestrian priority, since the pedestrian’s waiting time is unaffected by its decision. Moreover, convergence of the discrete part of the decision model requires a balanced distribution of outcomes, particularly between yield and no-yield decisions. Consequently, further design work on scenario construction is needed to ensure robustness and adaptability (see the sampling sketch after this list).
- The interdependency of actions among agents strongly influences the interaction between individual vehicles and pedestrians. For instance, if one vehicle prevents a pedestrian from crossing, another vehicle that is yielding must adjust its speed according to the pedestrian’s reaction to the first vehicle. Performance estimates must therefore account for these couplings; we suggest scoring the overall success of the scenario or conducting direct comparisons with other methods.
- The actions of vehicles can occasionally be misunderstood by pedestrians, potentially creating safety hazards. Similarly, a vehicle’s signal may be perceived by several pedestrians simultaneously, producing unintended communication effects; conversely, a pedestrian may perceive one vehicle yielding while failing to notice that another is not. Designing appropriate reward functions and signaling models is essential to incentivize cooperation among vehicles and improve communication effectiveness (a reward sketch follows the sampling example below).
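To make the first point concrete, the snippet below sketches one possible way to balance the training distribution between yield and no-yield episodes. It is a minimal illustration in Python; the balancing criterion, the gap bounds, and all identifiers are our assumptions rather than the generator used in this work (only the pedestrian speed range is taken from the parameter table below).

```python
import random

def sample_scenario(target_yield_ratio: float = 0.5,
                    observed_yield_ratio: float = 0.5) -> dict:
    """Draw an initial state, biased toward the under-represented decision
    so that the discrete yield/no-yield outcomes stay balanced."""
    # A long gap to the crosswalk tends to force a "yield" decision,
    # a short gap a "no-yield" one; oversample whichever is lacking.
    favor_yield = observed_yield_ratio < target_yield_ratio
    gap = random.uniform(30.0, 60.0) if favor_yield else random.uniform(5.0, 30.0)
    return {
        "pedestrian_speed": random.uniform(0.75, 1.75),  # m/s, from the table below
        "vehicle_gap_to_crosswalk": gap,                 # m, hypothetical bounds
        "n_vehicles": random.randint(1, 3),              # hypothetical fleet size
    }
```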
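The third point can likewise be illustrated by a reward term that penalizes conflicting signals shown to the same pedestrian. This is a sketch under stated assumptions: the weights, the 30 m perception radius, and the field names are hypothetical; only the signal values {−1, 0, 1} correspond to the Light parameter in the table below.

```python
from dataclasses import dataclass

@dataclass
class VehicleObs:
    yield_signal: int        # -1 = deny priority, 0 = none, 1 = yield (cf. "Light")
    gap_to_crosswalk: float  # longitudinal distance to the crossing, in m
    speed: float             # current speed, in m/s

def cooperative_reward(vehicles: list[VehicleObs],
                       pedestrian_waiting_time: float,
                       collision: bool) -> float:
    """Penalize ambiguous communication: a pedestrian who simultaneously
    perceives a yielding and a non-yielding vehicle gets a mixed message."""
    if collision:
        return -100.0  # dominant safety penalty
    visible = {v.yield_signal for v in vehicles if v.gap_to_crosswalk < 30.0}
    conflict_penalty = -5.0 if {-1, 1} <= visible else 0.0
    waiting_penalty = -0.1 * pedestrian_waiting_time     # pedestrian efficiency
    flow_reward = sum(0.01 * v.speed for v in vehicles)  # traffic efficiency
    return conflict_penalty + waiting_penalty + flow_reward
```

Under such a shaping, denying priority is not penalized per se; only displaying contradictory signals to the same pedestrian is.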
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Agent Name | Parameter Name | Initial State Range | Action Range | Type | Units |
---|---|---|---|---|---|
Pedestrian i, with i ∈ [1, Np] | Speed (2D) | [0.75, 1.75] | Ø | float | m/s |
 | Position (2D) | [0, 4] × [−3.0, −0.5] | Ø | float | m |
 | Safety Factor C1 | C2 | Ø | float | m |
 | In Crosswalk | False | Ø | boolean | Ø |
 | After Crosswalk | False | Ø | boolean | Ø |
Car j, with j ∈ [1, Nc] | Acceleration | 0 | [−4, 2] | float | m/s² |
 | Speed | Sl | Ø | float | m/s |
 | Deviation from speed limit | 0 | Ø | float | m/s |
 | Position | C2 | Ø | float | m |
 | Light | 0 | {−1, 0, 1} | Ø | Ø |
 | Line | ⟦0; Nl − 1⟧ | Ø | integer | Ø |
Crosswalk/Road | Size | [2.5, 3.0] × Nl | Ø | float | m |
 | Number of Lines | Nl | Ø | integer | Ø |
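For reference, the parameter table above translates directly into an environment state. The Python dataclasses below are a minimal sketch of that encoding; the field names, the choice of a 1D road axis for vehicles, and the example values for Sl and C2 are our assumptions.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Pedestrian:
    # Initial ranges taken from the table; the axis assignment is assumed.
    speed: float = field(default_factory=lambda: random.uniform(0.75, 1.75))  # m/s
    x: float = field(default_factory=lambda: random.uniform(0.0, 4.0))        # m
    y: float = field(default_factory=lambda: random.uniform(-3.0, -0.5))      # m
    in_crosswalk: bool = False
    after_crosswalk: bool = False

@dataclass
class Car:
    speed: float               # initialized to the speed limit Sl
    position: float            # initialized to the spawn distance C2
    acceleration: float = 0.0  # continuous action, bounded to [-4, 2] m/s^2
    light: int = 0             # discrete signal in {-1, 0, 1}
    lane: int = 0              # in [0, Nl - 1]

# Example reset, assuming Sl = 13.9 m/s (50 km/h) and C2 = 50 m:
ped = Pedestrian()
car = Car(speed=13.9, position=50.0)
```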