# Design of Unsignalized Roundabouts Driving Policy of Autonomous Vehicles Using Deep Reinforcement Learning


## Abstract


## 1. Introduction

#### 1.1. Literature Review

#### 1.2. Contribution

## 2. Roundabout Scenario

#### 2.1. Roundabout Simulation Scenario Construction

#### 2.2. Roundabout Obstacle Vehicles Control

#### 2.3. Ego-Vehicle Control

## 3. IP-SAC Algorithm Framework

#### 3.1. Algorithm Framework Design

#### 3.2. Interval Prediction Model

#### 3.3. Self-Attention Network

#### 3.4. State, Action, and Reward Setting

## 4. Simulation Results and Analysis

#### 4.1. Simulation Test Results

#### 4.2. Ablation Experiment

#### 4.3. Visual Analysis of Simulation Output Results

#### 4.4. Verification in CARLA

## 5. Discussions

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References


**Figure 12.** Comparison of the SAC policy and the IP-SAC policy with an oncoming vehicle on the right. (**a**) SAC policy. (**b**) IP-SAC policy.

**Figure 13.** The IP-SAC policy controls the ego vehicle through the roundabout. (**a**) The ego vehicle enters the roundabout. (**b**) The ego vehicle follows the vehicle ahead. (**c**) The ego vehicle focuses on incoming vehicles. (**d**) The ego vehicle drives carefully. (**e**) The ego vehicle slows down at the merge point. (**f**) With no obstacle vehicles ahead, the ego vehicle accelerates. (**g**) The ego vehicle exits the roundabout. (**h**) The ego vehicle is completely out of the roundabout.

**Figure 15.** IP-SAC test in the CARLA roundabout. (**a**) Ego-vehicle front camera data. (**b**) Ego-vehicle lidar data. (**c**) Projection of other obstacle vehicles' locations and map information. (**d**) Ego-vehicle driving speed curve.

| Platform | Low Dimensionality | Simulation Accuracy | Easy Development |
|---|---|---|---|
| TORCS | × | √ | × |
| CARLA | × | √ | √ |
| SMARTS | √ | √ | × |
| SUMO | √ | × | √ |
| Driver-Gym | × | √ | × |
| Highway-ENV | √ | √ | √ |

| Parameter | Value |
|---|---|
| Maximum acceleration $\omega$ | 6.0 m/s^{2} |
| Constant velocity parameter $\delta$ | 4.0 |
| Safety time interval $T$ | 1.5 s |
| Maximum deceleration $b$ | −5.0 m/s^{2} |
| Minimum relative distance ${d}_{0}$ | 10.0 m |
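The parameters above match those of the standard Intelligent Driver Model (IDM), which is the usual longitudinal controller for obstacle vehicles in Highway-ENV. A minimal sketch of the IDM acceleration law with these values follows; the desired speed of 12 m/s is an assumption taken from the speed range used in the reward setting, and the function name is illustrative.

```python
import math

A_MAX = 6.0    # maximum acceleration (m/s^2), table symbol omega
DELTA = 4.0    # constant-velocity exponent, table symbol delta
T_SAFE = 1.5   # safety time interval (s), table symbol T
B_MAX = 5.0    # magnitude of maximum deceleration (m/s^2), table symbol b
D0 = 10.0      # minimum relative distance (m), table symbol d0

def idm_acceleration(v, v_lead, gap, v_desired=12.0):
    """IDM longitudinal acceleration of a following vehicle.

    v: own speed (m/s), v_lead: leader speed (m/s), gap: bumper gap (m).
    """
    dv = v - v_lead  # closing speed toward the leader
    # Desired dynamic gap: standstill distance + time-headway term + braking term.
    d_star = D0 + v * T_SAFE + v * dv / (2.0 * math.sqrt(A_MAX * B_MAX))
    return A_MAX * (1.0 - (v / v_desired) ** DELTA - (d_star / gap) ** 2)
```

With a far-away leader a stopped vehicle accelerates at nearly `A_MAX`; with a stopped leader close ahead the returned acceleration is strongly negative, i.e., the vehicle brakes.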

| Parameter | Value |
|---|---|
| Pre-training steps | 1000 |
| Maximum steps in a single round | 500 |
| Batch size | 256 |
| Replay buffer size | 1,000,000 |
| Discount factor | 0.99 |
| Learning rate | 0.0003 |
| Optimizer | Adam |
| Fully connected hidden layers | [128, 128] |
| Self-attention network encoding layer | [64, 64] |
| Self-attention network decoding layer | [64, 64] |
| Self-attention network heads | 2 |
| Self-attention network normalization factor | 32 |

| Interval Prediction with Soft Actor-Critic (IP-SAC) | |
|---|---|
| Input: attention matrix, accessible area. | |
| 1: | Initialize the networks and parameters. |
| 2: | **for** each training epoch **do**: |
| 3: | **for** each environment step **do**: |
| 4: | The interval prediction model computes the potential feasible paths. |
| 5: | Adjust the reward function according to the prediction results. |
| 6: | Output action a(t) according to the SAC policy. |
| 7: | After performing a(t), the environment transitions to state s(t + 1) and returns reward r(t). |
| 8: | Save the training sample: sample(t) = {s(t), a(t), r(t), s(t + 1)}. |
| 9: | Randomly sample a mini-batch from the replay buffer, compute the gradients, and update the neural network parameters. |
| 10: | **end for** |
| 11: | **end for** |
| 12: | Save the network parameters. |
| Output: roundabout driving policy ${\pi}_{new}^{*}$. | |
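The control flow in the algorithm above can be sketched as a runnable skeleton. Here the interval predictor, the SAC policy, and the environment transition are stubs (all names are illustrative), so only the loop structure and the replay-buffer handling of steps 1–12 are actually represented; the buffer and batch sizes are taken from the hyperparameter table.

```python
import random
from collections import deque

BUFFER_SIZE = 1_000_000  # replay buffer size from the hyperparameter table
BATCH_SIZE = 256         # mini-batch size from the hyperparameter table

def interval_predict(state):
    # Stub: the real model computes reachable intervals of obstacle vehicles.
    return {"feasible": True}

def adjust_reward(reward, prediction):
    # Stub for step 5: penalize actions whose predicted path is infeasible.
    return reward if prediction["feasible"] else reward - 1.0

def sac_policy(state):
    # Stub for step 6: the real actor samples from a squashed Gaussian.
    return random.uniform(-1.0, 1.0)

def env_step(state, action):
    # Stub transition: returns (next_state, reward, done).
    return state + 1, 0.1, False

def train(epochs, steps_per_epoch):
    buffer = deque(maxlen=BUFFER_SIZE)
    for _ in range(epochs):                       # line 2
        state = 0
        for _ in range(steps_per_epoch):          # line 3
            prediction = interval_predict(state)  # line 4
            action = sac_policy(state)            # line 6
            next_state, reward, done = env_step(state, action)  # line 7
            reward = adjust_reward(reward, prediction)           # line 5
            buffer.append((state, action, reward, next_state))   # line 8
            if len(buffer) >= BATCH_SIZE:         # line 9
                batch = random.sample(buffer, BATCH_SIZE)
                # Gradient updates of the actor and critic networks go here.
            if done:
                break
            state = next_state
    return buffer                                 # line 12: save parameters
```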

| Reward Category | Reward Subfunction |
|---|---|
| Crash penalty | ${r}_{1}=\left\{\begin{array}{ll}-\left(10+\frac{{v}_{t}}{12}\right), & \mathrm{crashed}\\ 0, & \mathrm{otherwise}\end{array}\right.$ |
| Lane change penalty | ${r}_{2}=-0.5$ |
| Collision area penalty | ${r}_{3}=-1$ |
| Speed reward | ${r}_{4}=\frac{{v}_{t}-2}{12},\ {v}_{t}\in \left(0,12\ \mathrm{m/s}\right)$ (the speed ${v}_{t}$ is mapped linearly from $\left(0,12\ \mathrm{m/s}\right)$ to $\left[-2,10\right]$ and scaled by $\frac{1}{12}$) |
| Time penalty | ${r}_{5}=-t/100$ |
| Finish reward | ${r}_{6}=5$ |
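The subfunctions in the table sum to the per-step reward. A minimal sketch of that sum follows; note that the form of $r_4$ (speed mapped linearly from $(0, 12\ \mathrm{m/s})$ to $[-2, 10]$, then scaled by $1/12$) is a reconstruction of the table entry, and the function and flag names are illustrative.

```python
def step_reward(v_t, t, crashed=False, changed_lane=False,
                in_collision_area=False, finished=False):
    """Sum of the reward subfunctions r1..r6 for one environment step."""
    r = 0.0
    if crashed:
        r -= 10.0 + v_t / 12.0       # r1: crash penalty, grows with speed
    if changed_lane:
        r -= 0.5                     # r2: lane-change penalty
    if in_collision_area:
        r -= 1.0                     # r3: collision-area penalty
    if 0.0 < v_t < 12.0:
        r += (v_t - 2.0) / 12.0      # r4: speed reward (reconstructed form)
    r -= t / 100.0                   # r5: time penalty
    if finished:
        r += 5.0                     # r6: finish reward
    return r
```

Under this reconstruction, cruising slower than 2 m/s yields a negative speed term, which pushes the policy toward moving rather than waiting indefinitely at the roundabout entry.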

| Parameter | Value |
|---|---|
| Training experience accumulation steps | 1000 |
| Maximum steps in a single round | 500 |
| Mini-batch size | 256 |
| Update frequency | 2 |
| Replay buffer size | 1 × 10^{6} |
| Discount factor | 0.99 |
| Policy network learning rate | 3 × 10^{−4} |
| Activation function | ReLU |
| Optimizer | Adam |
| Fully connected hidden layers | [128, 128] |
| Self-attention network encoding layer | [64, 64] |
| Self-attention network decoding layer | [64, 64] |
| Self-attention network heads | 2 |
| Self-attention network normalization factor | 32 |

| Method | Success | Collision | Timeout |
|---|---|---|---|
| SAC | 0.56 | 0.25 | 0.19 |
| SAC + interval prediction | 0.73 | 0.14 | 0.13 |
| SAC + self-attention network | 0.72 | 0.22 | 0.06 |
| IP-SAC | 0.83 | 0.15 | 0.02 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wang, Z.; Liu, X.; Wu, Z.
Design of Unsignalized Roundabouts Driving Policy of Autonomous Vehicles Using Deep Reinforcement Learning. *World Electr. Veh. J.* **2023**, *14*, 52.
https://doi.org/10.3390/wevj14020052
