# Variable Speed Limit Control for the Motorway–Urban Merging Bottlenecks Using Multi-Agent Reinforcement Learning


## Abstract


## 1. Introduction

## 2. Methodology

#### 2.1. Reinforcement Learning

#### 2.2. Proximal Policy Optimization

#### 2.3. Multi-Agent Proximal Policy Optimization

## 3. Simulation Environment

## 4. Results and Discussion

#### 4.1. Training Setting of MAPPO Algorithm

#### 4.2. Traffic Performance Measurements

#### 4.3. Sustainability Measurements

## 5. Conclusions

## Abbreviations

Abbreviation | Definition |
---|---|
ITS | Intelligent Transportation System |
VSL | Variable Speed Limit |
DRL | Deep Reinforcement Learning |
DQN | Deep Q-Network |
MAPPO | Multi-Agent Proximal Policy Optimization |
SUMO | Simulation of Urban Mobility |
RL | Reinforcement Learning |
ML | Machine Learning |
DL | Deep Learning |
DNN | Deep Neural Networks |
NN | Neural Networks |
PG | Policy Gradient |
AC | Actor–Critic |
A3C | Asynchronous Advantage Actor–Critic |
TRPO | Trust Region Policy Optimization |
PPO | Proximal Policy Optimization |
MARL | Multi-Agent Reinforcement Learning |
CTDE | Centralized Training with Decentralized Execution |
GAE | Generalized Advantage Estimation |
TraCI | Traffic Control Interface |
OSM | OpenStreetMap |
TTT | Total Time Spent |
CAV | Connected and Automated Vehicles |
V2V | Vehicle-to-Vehicle |
HBEFA 3 | Handbook Emission Factors for Road Transport, version 3 |

**Figure 2.** Schematic diagram of variable speed limit control at the merging area, where the motorway transitions into the urban road network.

**Figure 6.** Occupancy of different road sections during the simulation period without VSL control.

**Figure 10.** Learning curve for the objective of reducing carbon dioxide emissions over 600 training iterations.

Hyperparameter | Value |
---|---|
Number of training iterations | 600 |
Learning rate | 0.0005 |
Number of agents | 7 |
PPO clip parameter $\epsilon$ | 0.2 |
Discount factor $\gamma$ | 0.99 |
GAE parameter $\lambda$ | 0.95 |
Time steps per update | 120 |
Number of PPO epochs per update | 15 |
Hidden layers | 64 × 64 × 64 |
Hidden-layer activation function | ReLU |
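Several of the hyperparameters above feed two standard PPO computations: generalized advantage estimation (discount factor 0.99, λ = 0.95) and the clipped surrogate objective (clip parameter 0.2). The plain-Python sketch below illustrates both under stated assumptions and is not the authors' implementation. The config class and function names (`MappoConfig`, `compute_gae`, `clipped_surrogate`) are hypothetical. It assumes a single agent's rollout (for example, the 120 time steps per update) with no terminal state inside the rollout, so the last value estimate is bootstrapped. It also assumes that, under CTDE, the value estimates would come from a centralized critic.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MappoConfig:
    """Values copied from the hyperparameter table; field names are hypothetical."""
    iterations: int = 600
    learning_rate: float = 5e-4
    num_agents: int = 7
    clip_eps: float = 0.2
    gamma: float = 0.99
    gae_lambda: float = 0.95
    steps_per_update: int = 120
    ppo_epochs: int = 15
    hidden_sizes: tuple = (64, 64, 64)


def compute_gae(rewards: List[float], values: List[float], last_value: float,
                gamma: float, lam: float) -> List[float]:
    """Generalized advantage estimation for one rollout (assumed non-terminal).

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = delta_t + gamma * lam * A_{t+1}
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value  # bootstrap from the value after the final step
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(new_logp: List[float], old_logp: List[float],
                      advantages: List[float], clip_eps: float) -> float:
    """Mean PPO clipped surrogate objective (to be maximized)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For example, with rewards `[1.0, 1.0]`, zero value estimates, and a bootstrapped value of 0, `compute_gae` returns advantages of about `[1.9405, 1.0]`. A probability ratio of 1.5 with a positive advantage is clipped to 1.2, which limits how far a single update can move each agent's policy.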

