A Multi-Agent Regional Traffic Signal Control System Integrating Traffic Flow Prediction and Graph Attention Networks
Abstract
1. Introduction
- A prediction-guided MADDPG framework for regional signal control. We propose TG-MADDPG, which couples short-term traffic flow prediction with multi-agent actor–critic learning for regional signal control. The predicted future traffic state is incorporated into the target construction to guide value learning, enabling more forward-looking decision making beyond purely reactive control based on historical and current states.
- Topology-aware coordination with graph attention. To capture complex spatial dependencies among intersections, we introduce a graph attention network (GAT) to dynamically encode the road network topology and learn adaptive influence weights over neighboring intersections, thereby enhancing coordination and improving cooperative control in large-scale networks.
- Reward design and comprehensive evaluation. We design a reward derived from the normalized pressure difference to better align local signal operations with regional coordination objectives (a minimal reward sketch follows this list). The proposed control scheme is evaluated on the SUMO simulation platform using both synthetic road networks with simulated traffic and real-world road networks with measured traffic flow. Multiple performance metrics are employed to assess control performance, demonstrating the effectiveness and generalization capability of the proposed approach.
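The exact reward is defined in Section 3.3 and Equation (9); the following is only a minimal sketch of one plausible reading of a normalized pressure-difference reward. The helper names, the `capacity` normalizer, and the sign convention are our assumptions, not values from the paper.

```python
import numpy as np

def normalized_pressure(in_queues, out_queues, capacity=40.0):
    """Pressure of one intersection: total incoming minus total outgoing
    queue length, scaled into a bounded range by a lane-capacity constant.
    `capacity` is a hypothetical normalizer, not a value from the paper."""
    return (np.sum(in_queues) - np.sum(out_queues)) / capacity

def pressure_reward(prev_obs, curr_obs):
    """Reward as the reduction in normalized pressure over one control step:
    positive when the chosen phase relieves local pressure."""
    return normalized_pressure(*prev_obs) - normalized_pressure(*curr_obs)
```

Under this convention, an agent is rewarded for discharging its incoming queues faster than they accumulate, which is what ties local phase choices to regional decongestion.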
2. Related Work
3. Preliminaries
3.1. State Space
3.2. Action Space
3.3. Reward Function
4. Methodology
4.1. Traffic Flow Prediction Model
4.2. Graph Attention Network Model
4.3. TG-MADDPG Framework
- Step 1:
- Construct an Actor network and a Critic network for each agent, initialize their parameters, and synchronize the parameters of the corresponding target networks. Initialize hyperparameters including the experience replay buffer, learning rates, and discount factor. Also initialize the graph attention network weights and the attention-coefficient function.
- Step 2:
- Each agent obtains the initial environmental state of its intersection together with observations from neighboring intersections, and fuses them into a joint state representation.
- Step 3:
- Based on the current joint state, compute the enhanced node features and normalized attention coefficients (a minimal attention sketch follows this step list). Then generate the action for the current time step, execute it, observe the environment's feedback, and obtain the next state and the corresponding reward.
- Step 4:
- Store the obtained experience tuple in the experience replay buffer. During the training phase, randomly sample a mini-batch of experiences from the buffer and update the Critic network parameters by minimizing the loss function.
- Step 5:
- Update the online Actor network parameters using the policy gradient method. Then synchronize the parameters of the target policy network and the target value network via a soft update mechanism (see the update sketch after Algorithm 1).
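Step 3's attention computation corresponds to Equations (17) and (19) in the paper, which follow the standard GAT formulation of Veličković et al.; the paper's exact parameterization may differ. Below is a minimal single-head sketch in PyTorch, where `W`, `a`, and the adjacency mask are assumptions:

```python
import torch
import torch.nn.functional as F

def gat_attention(h, adj, W, a, slope=0.2):
    """Single-head GAT layer over the intersection graph.
    h:   (N, F_in) node features, one row per intersection
    adj: (N, N) 0/1 adjacency mask (should include self-loops)
    W:   (F_in, F_out) shared linear projection
    a:   (2 * F_out,) attention vector
    Returns enhanced node features and normalized attention coefficients."""
    Wh = h @ W                                    # project features: (N, F_out)
    f_out = Wh.size(1)
    # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]), computed pairwise via broadcasting
    e = F.leaky_relu((Wh @ a[:f_out]).unsqueeze(1)
                     + (Wh @ a[f_out:]).unsqueeze(0), negative_slope=slope)
    e = e.masked_fill(adj == 0, float("-inf"))    # attend only to road-network neighbors
    att = torch.softmax(e, dim=1)                 # normalized attention coefficients
    return att @ Wh, att                          # enhanced features, attention weights
```

With multiple heads (the experiments use 4), the per-head outputs would be concatenated or averaged before being passed to the actor and critic.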
| Algorithm 1: TG-MADDPG Pseudocode |
| 1 Initialize Actor network $\mu_{\theta_i}$ and Critic network $Q_{\omega_i}$ with parameters $\theta_i$, $\omega_i$ for each agent $i$. |
| 2 Initialize target network weights: $\theta'_i \leftarrow \theta_i$, $\omega'_i \leftarrow \omega_i$. |
| 3 Initialize experience replay buffer $\mathcal{D}$, learning rates, discount factor $\gamma$, and other hyperparameters. |
| 4 Initialize Graph Attention Network weights and attention coefficient function. |
| 5 For episode = 1 to M do |
| 6 Initialize environment and obtain initial intersection state and neighborhood observations |
| 7 Compute joint state representation using Equation (2). |
| 8 For t = 1 to T do |
| 9 Compute enhanced features and normalized attention coefficients using Equations (17) and (19). |
| 10 Select action $a_t$ based on the current state $s_t$. |
| 11 Execute $a_t$, observe the next state $s_{t+1}$, and compute reward $r_t$ using Equation (9). |
| 12 Obtain the traffic flow prediction state $\hat{s}_{t+1}$ from the prediction module. |
| 13 Store the experience tuple $\langle s_t, a_t, r_t, s_{t+1}, \hat{s}_{t+1} \rangle$ in replay buffer $\mathcal{D}$. |
| 14 Sample a random minibatch $S$ from $\mathcal{D}$. |
| 15 For each sampled transition, compute target actions with the target actors: $a'_j = \mu'_{\theta'_j}(s'_j)$. |
| 16 Similarly, compute predicted-state actions with the target actors: $\hat{a}_j = \mu'_{\theta'_j}(\hat{s}_j)$. |
| 17 Compute target Q-value using Equation (23). |
| 18 Update Critic network by minimizing loss using Equation (22). |
| 19 Update Actor network using policy gradient method according to Equation (21). |
| 20 Soft-update target network parameters via Equation (24). |
| 21 End for |
| 22 End for |
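The pseudocode refers to Equations (21)-(24) without reproducing them here. The following is a hedged sketch of lines 15-20, assuming the predicted-state Q-value is blended into the standard MADDPG target with a weight `beta`; the actual combination is defined by the paper's Equation (23), and `beta`, the tensor layout, and feeding the shared state to each target actor are all simplifying assumptions:

```python
import torch
import torch.nn.functional as F

def critic_update(critic, target_critic, target_actors, batch, optimizer,
                  gamma=0.99, beta=0.5):
    """One TG-MADDPG-style critic step. `batch` holds joint tensors:
    s, a, r, s_next, s_pred (the prediction module's future state).
    `beta` is a hypothetical blending weight, not the paper's formula."""
    s, a, r, s_next, s_pred = batch
    with torch.no_grad():
        # Target actions from observed next state and from predicted state
        a_next = torch.cat([mu(s_next) for mu in target_actors], dim=-1)
        a_pred = torch.cat([mu(s_pred) for mu in target_actors], dim=-1)
        # Blend the two bootstrapped values (stand-in for Equation (23))
        q_next = target_critic(s_next, a_next)
        q_pred = target_critic(s_pred, a_pred)
        y = r + gamma * ((1 - beta) * q_next + beta * q_pred)
    loss = F.mse_loss(critic(s, a), y)   # Equation (22)-style TD loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def soft_update(target_net, online_net, tau=0.01):
    """Polyak averaging for target networks (Equation (24)-style soft update)."""
    for tp, p in zip(target_net.parameters(), online_net.parameters()):
        tp.data.mul_(1 - tau).add_(tau * p.data)
```

The point of the blended target is that value estimates are pulled partly toward where traffic is predicted to go, not only toward what was just observed.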
5. Experiment
5.1. Experiment Environment
5.2. Datasets
5.3. Experimental Results
5.3.1. Traffic Signal Control Experiment
- Comparison between GMADDPG and TG-MADDPG: Integrating predictive state information improves average cumulative reward, average waiting time, and average queue length by 5-12%, with the effect more pronounced during peak periods. This validates the rationale for reconstructing the Q-value target: by incorporating predictive information, the agent moves beyond the limits of purely real-time observation and acquires a capacity for forward-looking decision-making, handling traffic flow peaks better and curbing congestion propagation.
- Comparison between MADDPG and GMADDPG: Introducing the graph attention network yields a 3-8% performance improvement, indicating that explicitly modeling spatial dependencies between intersections strengthens cooperative control. TG-MADDPG further shows that dynamic weight allocation adapts to the spatiotemporal dynamics of traffic flow better than static allocation does, effectively mitigating decision homogenization in regional traffic control.
5.3.2. Traffic Flow Prediction Experiment
5.3.3. Prediction Errors on Training Stability
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
| TG-MADDPG Parameter | Value | Simulation Parameter | Value |
|---|---|---|---|
| Batch size | 64 | Intersections | 9 |
| Actor learning rate | 1 × 10⁻⁴ | | 10 s |
| Critic learning rate | 1 × 10⁻³ | | 50 s |
| Discount factor | 0.99 | Yellow light | 3 s |
| Episodes | 100 | Acceleration | 2 m/s² |
| Replay buffer size | 10,000 | Probability of turning left | 15% |
| Simulation duration | 3600 s | Probability of going straight | 60% |
| Optimizer | Adam | Probability of turning right | 25% |
| GAT attention heads | 4 | Maximum speed | 13.89 m/s |
| GAT hidden layer dimension | 64 | Minimum vehicle gap | 2.5 m |
| Model | Avg. Cumulative Reward (Off-Peak) | Avg. Cumulative Reward (Peak) | Avg. Waiting Time (Off-Peak) | Avg. Waiting Time (Peak) | Avg. Queue Length (Off-Peak) | Avg. Queue Length (Peak) |
|---|---|---|---|---|---|---|
| IQL | −31.31 | −39.12 | 94.57 | 129.63 | 106.12 | 131.62 |
| MADDPG | −29.28 | −36.52 | 92.64 | 120.97 | 103.98 | 122.83 |
| GMADDPG | −27.73 | −33.34 | 88.48 | 111.52 | 95.31 | 118.64 |
| TG-MADDPG | −24.59 | −29.29 | 81.97 | 99.92 | 84.96 | 109.09 |
| Model | Avg. Cumulative Reward (Off-Peak) | Avg. Cumulative Reward (Peak) | Avg. Waiting Time (Off-Peak) | Avg. Waiting Time (Peak) | Avg. Queue Length (Off-Peak) | Avg. Queue Length (Peak) |
|---|---|---|---|---|---|---|
| IQL | −37.07 | −40.46 | 88.12 | 135.51 | 100.37 | 151.32 |
| MADDPG | −36.46 | −39.34 | 86.31 | 131.20 | 98.69 | 146.53 |
| GMADDPG | −32.77 | −36.42 | 80.26 | 120.58 | 88.47 | 135.84 |
| TG-MADDPG | −30.58 | −33.46 | 73.39 | 113.08 | 82.78 | 124.81 |
| Model | RMSE | MAE | R² (%) |
|---|---|---|---|
| WT-GWO-CNN-LSTM | 1.19 | 0.84 | 94.50 |
| VMD-CNN-LSTM | 1.54 | 1.15 | 90.78 |
| CNN-GRU | 1.83 | 1.34 | 88.06 |
| Model | RMSE | MAE | R² (%) |
|---|---|---|---|
| WT-CNN-LSTM | 1.36 | 0.99 | 92.41 |
| GWO-CNN-LSTM | 1.92 | 1.46 | 87.59 |
| CNN-LSTM | 2.03 | 1.59 | 85.98 |
| WT-GWO-CNN-LSTM | 1.19 | 0.84 | 94.50 |
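For completeness, the three assessment indicators above can be computed as follows. These are the standard definitions; R² is expressed as a percentage to match the tables.

```python
import numpy as np

def prediction_metrics(y_true, y_pred):
    """RMSE, MAE, and R^2 (in %) for traffic flow predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))              # root mean squared error
    mae = np.mean(np.abs(err))                     # mean absolute error
    ss_res = np.sum(err ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2) # total sum of squares
    r2 = 100.0 * (1.0 - ss_res / ss_tot)           # coefficient of determination, %
    return rmse, mae, r2
```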