Coordinated Multi-Intersection Traffic Signal Control Using a Policy-Regulated Deep Q-Network
Abstract
1. Introduction
- (1) A structured multi-intersection state representation is constructed, integrating waiting time, queue length, and movement-level pressure, so that agents can better capture both local congestion and upstream–downstream interactions.
- (2) A hybrid reward function is designed to combine delay reduction, pressure balancing, and adjacent-intersection influence, thereby encouraging each agent to consider not only its own performance but also the impact on neighboring intersections and network-wide coordination.
- (3) A policy-regulation and Q-alignment mechanism is introduced, in which an explicit, differentiable policy function is trained to align with the Q-value–induced target distribution, smoothing action-selection noise and improving interpretability. Together with a parameter-sharing multi-agent structure, this mechanism improves learning stability and ensures consistent behavior across intersections.
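To make contribution (3) concrete, the following is a minimal sketch of one way to align an explicit policy with the Q-value–induced target distribution; the softmax temperature `tau` and the cross-entropy form are illustrative assumptions rather than the exact formulation derived in Section 3.1.

```python
import torch
import torch.nn.functional as F

def policy_alignment_loss(q_values: torch.Tensor,
                          policy_logits: torch.Tensor,
                          tau: float = 1.0) -> torch.Tensor:
    """Cross-entropy between the Q-induced target distribution and the policy.

    q_values:      (batch, n_actions) outputs of the Q-network, treated as
                   fixed targets (detached from the graph).
    policy_logits: (batch, n_actions) outputs of the explicit policy G_theta.
    tau:           softmax temperature (illustrative assumption).
    """
    # Softmax over Q-values gives a smooth target instead of a hard argmax,
    # which damps action-selection noise during training.
    target = F.softmax(q_values.detach() / tau, dim=-1)
    log_policy = F.log_softmax(policy_logits, dim=-1)
    return -(target * log_policy).sum(dim=-1).mean()
```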
2. Multi-Intersection Signal Coordination Model
2.1. State Space
2.2. Action Space
2.3. Reward Function
- (1) Waiting Time
- (2) Queue Length
- (3) Influence of Adjacent Intersections
- (4) Overall Reward and Objective
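As a minimal sketch of how components (1)–(4) can be combined, assuming illustrative weights `w1`–`w3` and a mean-pressure neighbor term (the exact formulation is given in Section 2.3):

```python
def hybrid_reward(delta_wait: float, queue: float, pressure: float,
                  neighbor_pressures: list[float],
                  w1: float = 1.0, w2: float = 0.5, w3: float = 0.25) -> float:
    """Illustrative hybrid reward for one intersection.

    delta_wait:         reduction in accumulated waiting time since the last
                        decision step (positive = improvement).
    queue:              total queue length Q_i(t) at the intersection.
    pressure:           intersection pressure P_i(t).
    neighbor_pressures: pressures of adjacent intersections, coupling the
                        agent to its upstream/downstream neighbors.
    """
    neighbor_term = (sum(neighbor_pressures) / len(neighbor_pressures)
                     if neighbor_pressures else 0.0)
    # Reward delay reduction; penalize residual queues, local pressure
    # imbalance, and congestion pushed onto neighboring intersections.
    return w1 * delta_wait - w2 * (queue + abs(pressure)) - w3 * neighbor_term
```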
3. Signal Coordination Control Algorithm Based on PRA-DQN
3.1. Policy-Regulated and Aligned DQN (PRA-DQN) Algorithm
- (1) Explicit Policy Function
- (2) Policy–Q Alignment Mechanism
- (3) Q-Network Optimization
- (4) Combined Learning Process
- Value learning: updating the Q-network parameters based on sampled transitions.
- Policy adaptation: updating the policy parameters θ so that the explicit policy G aligns with the Q-value–based optimal action.
- (5) Practical Training Procedure
Algorithm 1. Policy-Regulated and Aligned DQN (PRA-DQN): initialization of the Q-network, target network, explicit policy, and replay buffer, followed by the combined value-learning and policy-alignment loop (a minimal sketch is given below).
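Since only the caption of the pseudocode box is reproduced here, the following minimal sketch illustrates one training step consistent with the hyperparameters of Table 2 (Adam at 0.001 for the Q-network, SGD at 0.01 for G, a 10,000-transition replay buffer, batch size 32, a 500-step target-update interval). The class name, tensor conventions, temperature `tau`, and the hard target copy (Table 2 describes a soft update) are simplifying assumptions.

```python
import copy
import random
from collections import deque

import torch
import torch.nn.functional as F

class PRADQNAgent:
    """Minimal sketch of PRA-DQN: value learning plus policy alignment."""

    def __init__(self, q_net, policy_net, gamma=0.99, buffer_size=10_000,
                 batch_size=32, target_update=500, tau=1.0):
        self.q_net = q_net
        self.target_net = copy.deepcopy(q_net)      # target parameters theta^-
        self.policy_net = policy_net                # explicit policy G_theta
        self.opt_q = torch.optim.Adam(q_net.parameters(), lr=1e-3)
        self.opt_pi = torch.optim.SGD(policy_net.parameters(), lr=1e-2)
        self.buffer = deque(maxlen=buffer_size)
        self.gamma, self.batch_size = gamma, batch_size
        self.target_update, self.tau, self.steps = target_update, tau, 0

    def store(self, s, a, r, s2):
        """Append one transition; a must be a long tensor, r a float tensor."""
        self.buffer.append((s, a, r, s2))

    def learn(self):
        if len(self.buffer) < self.batch_size:
            return
        batch = random.sample(self.buffer, self.batch_size)
        s, a, r, s2 = map(torch.stack, zip(*batch))

        # (3) Value learning: one-step TD target from the target network.
        q_sa = self.q_net(s).gather(1, a.view(-1, 1)).squeeze(1)
        with torch.no_grad():
            td_target = r + self.gamma * self.target_net(s2).max(dim=1).values
        loss_q = F.mse_loss(q_sa, td_target)
        self.opt_q.zero_grad()
        loss_q.backward()
        self.opt_q.step()

        # (2) Policy adaptation: align G_theta with the Q-induced distribution.
        with torch.no_grad():
            target_dist = F.softmax(self.q_net(s) / self.tau, dim=-1)
        log_pi = F.log_softmax(self.policy_net(s), dim=-1)
        loss_pi = -(target_dist * log_pi).sum(dim=-1).mean()
        self.opt_pi.zero_grad()
        loss_pi.backward()
        self.opt_pi.step()

        # Periodic target refresh (hard copy shown for simplicity).
        self.steps += 1
        if self.steps % self.target_update == 0:
            self.target_net.load_state_dict(self.q_net.state_dict())
```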
3.2. Neural Network Architecture Design
- (1) Input Feature Organization
- (2) Convolutional Feature Extraction
- (3) Fully Connected Layers and Output
- (4) Loss Function and Optimization
- (5) Experience Replay
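A minimal sketch of a Q-network consistent with items (1)–(3) and the settings later listed in Table 2 (two 3 × 3 convolutional layers, max pooling, ReLU, two fully connected layers). The channel counts, grid size, and hidden width are assumptions, since the text fixes only the layer types.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Two 3x3 conv layers with ReLU and max pooling, then two FC layers.
    Channel counts, grid size, and hidden width are illustrative assumptions."""

    def __init__(self, in_channels=4, grid=8, hidden=128, n_actions=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        flat = 32 * (grid // 4) * (grid // 4)   # spatial size after two poolings
        self.head = nn.Sequential(
            nn.Linear(flat, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),       # one Q-value per signal phase
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(start_dim=1))
```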
3.3. Parameter Sharing Mechanism
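Only the heading survives here, but the mechanism named in contribution (3), namely all intersection agents querying and updating a single shared parameter set, can be sketched as follows (reusing the hypothetical QNetwork and PRADQNAgent classes from the sketches above).

```python
import torch

# One shared Q-network and one shared policy network serve every
# intersection, so experience from any agent updates the same parameters.
shared_q = QNetwork()                      # shared value network
shared_pi = QNetwork()                     # same shape reused for policy logits
learner = PRADQNAgent(shared_q, shared_pi)

def select_phases(states: dict[int, torch.Tensor]) -> dict[int, int]:
    """Greedy phase choice for each intersection via the shared Q-network."""
    with torch.no_grad():
        return {i: int(shared_q(s.unsqueeze(0)).argmax(dim=1))
                for i, s in states.items()}
```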
3.4. Signal Control Based on Parameter-Sharing PRA-DQN Algorithm
4. Simulation Experiments and Results Analysis
4.1. Simulation Environment
4.2. Road Network Configuration
4.3. Traffic Flow Simulation Setup
4.4. Simulation Parameter Settings
4.5. Simulation Setup
Benchmark Algorithms
- (1) DQN-based Coordinated Signal Control (DQN-CoC): This method approximates the Q-value function with a deep neural network and uses an ε-greedy strategy for action selection. In coordinated mode, agents share information and jointly optimize signal timings to improve network-wide performance.
- (2) DQN-based Independent Signal Control (DQN-IC): Like DQN-CoC, this method relies on a deep neural network and ε-greedy exploration, but each agent decides solely from its own local state without sharing information, serving as a baseline for assessing the benefits of coordination.
- (3) Stochastic Control (STOCHASTIC): The signal phase is selected at random, without regard to the traffic state or historical patterns, providing a lower-bound benchmark for the learning-based strategies.
- (4) Max-Pressure Control (MAXPRESSURE): This rule-based controller selects the phase with the highest pressure, computed as the density difference between incoming and outgoing lanes, aiming to resolve local flow imbalance and reduce congestion (see the sketch after this list).
- (5) Max-Wave Pressure Control (MAXWAVE): This algorithm estimates shockwave propagation and computes wave-front pressure; the phase with the highest wave pressure is selected to mitigate spillback and delays.
- (6) Fixed-Time Control: This classical method assigns predetermined cycle lengths and green splits at all intersections. By synchronizing signal timings, it aims to create a green-wave effect and serves as the reference for evaluating the adaptive control methods.
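To make the MAXPRESSURE baseline in item (4) concrete, here is a minimal sketch of pressure-based phase selection, assuming each phase is described by the (incoming, outgoing) lane pairs it serves and that densities are the normalized ρ values of Section 2:

```python
def max_pressure_phase(phases: dict[int, list[tuple[str, str]]],
                       density: dict[str, float]) -> int:
    """Pick the phase with the highest total pressure.

    phases:  phase id -> list of (incoming_lane, outgoing_lane) movements
             served by that phase.
    density: lane id -> normalized density rho in [0, 1].
    """
    def pressure(movements):
        # Movement pressure = incoming density minus outgoing density;
        # a positive value means the upstream lane is more congested.
        return sum(density[l_in] - density[l_out] for l_in, l_out in movements)

    return max(phases, key=lambda p: pressure(phases[p]))

# Example: phase 1 serves a congested north-south movement.
phases = {1: [("N_in", "S_out")], 2: [("E_in", "W_out")]}
density = {"N_in": 0.8, "S_out": 0.2, "E_in": 0.3, "W_out": 0.3}
assert max_pressure_phase(phases, density) == 1
```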
4.6. Analysis of Simulation Results
Relative to the Fixed-Time baseline, the proposed PRA-DQN-CoC reduces:
- maximum queue length by 21.17%,
- average queue length by 18.75%,
- and average waiting time by 17.71%.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- INRIX. 2025 Global Traffic Scorecard. 2025. Available online: https://inrix.com/scorecard/ (accessed on 10 December 2025).
- INRIX. Traffic Is Back: Insights from the 2025 INRIX Global Traffic Scorecard. 2025. Available online: https://inrix.com/blog/traffic-is-back-insights-from-the-2025-inrix-global-traffic-scorecard (accessed on 10 December 2025).
- Wang, X.; Jerome, Z.; Wang, Z.; Zhang, C.; Shen, S.; Kumar, V.V.; Bai, F.; Krajewski, P.; Deneau, D.; Jawad, A.; et al. Traffic light optimization with low penetration rate vehicle trajectory data. Nat. Commun. 2024, 15, 1306.
- Agarwal, A.; Sahu, D.; Mohata, R.; Jeengar, K.; Nautiyal, A.; Saxena, D.K. Dynamic traffic signal control for heterogeneous traffic conditions using Max Pressure and Reinforcement Learning. Expert Syst. Appl. 2024, 254, 124416.
- Wang, L.; Zhang, G.; Yang, Q.; Han, T. An adaptive traffic signal control scheme with Proximal Policy Optimization based on deep reinforcement learning for a single intersection. Eng. Appl. Artif. Intell. 2025, 149, 110440.
- Macioszek, E.; Kurek, A. Road traffic distribution on public holidays and workdays on selected road transport network elements. Transp. Probl. 2021, 16, 127–138.
- Laval, J.A. Traffic Flow as a Simple Fluid: Toward a Scaling Theory of Urban Congestion. Transp. Res. Rec. 2024, 2678, 376–386.
- Kerner, B.S. Introduction to Modern Traffic Flow Theory and Control: The Long Road to Three-Phase Traffic Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009.
- Jha, A.; Wiesenfeld, K.; Lee, G.; Laval, J. Simple traffic model as a space-time clustering phenomenon. Phys. Rev. E 2025, 112, 054104.
- Chen, X.; Wu, S.; Shi, C.; Huang, Y.; Yang, Y.; Ke, R.; Zhao, J. Sensing Data Supported Traffic Flow Prediction via Denoising Schemes and ANN: A Comparison. IEEE Sens. J. 2020, 20, 14317–14328.
- Chen, X.; Li, Z.; Yang, Y.; Qi, L.; Ke, R. High-Resolution Vehicle Trajectory Extraction and Denoising from Aerial Videos. IEEE Trans. Intell. Transp. Syst. 2021, 22, 3190–3202.
- Shabestary, S.M.A.; Abdulhai, B. Adaptive Traffic Signal Control with Deep Reinforcement Learning and High Dimensional Sensory Inputs: Case Study and Comprehensive Sensitivity Analyses. IEEE Trans. Intell. Transp. Syst. 2022, 23, 20021–20035.
- Chu, K.-F.; Lam, A.Y.S.; Li, V.O.K. Traffic Signal Control Using End-to-End Off-Policy Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2022, 23, 7184–7195.
- Ma, D.; Zhou, B.; Song, X.; Dai, H. A Deep Reinforcement Learning Approach to Traffic Signal Control with Temporal Traffic Pattern Mining. IEEE Trans. Intell. Transp. Syst. 2022, 23, 11789–11800.
- Chen, D.; Xu, T.; Ma, S.; Gao, X.; Zhao, G. Research on Intelligent Signal Timing Optimization of Signalized Intersection Based on Deep Reinforcement Learning Using Floating Car Data. Transp. Res. Rec. 2024, 2678, 1126–1147.
- Hu, T.; Li, Z. A multi-agent deep reinforcement learning approach for traffic signal coordination. IET Intell. Transp. Syst. 2024, 18, 1428–1444.
- Zhang, W.; Yan, C.; Li, X.; Fang, L.; Wu, Y.-J.; Li, J. Distributed Signal Control of Arterial Corridors Using Multi-Agent Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2023, 24, 178–190.
- Park, S.; Han, E.; Park, S.; Jeong, H.; Yun, I. Deep Q-network-based traffic signal control models. PLoS ONE 2021, 16, e0256405.
- Jiang, S.; Huang, Y.; Jafari, M.; Jalayer, M. A Distributed Multi-Agent Reinforcement Learning with Graph Decomposition Approach for Large-Scale Adaptive Traffic Signal Control. IEEE Trans. Intell. Transp. Syst. 2022, 23, 14689–14701.
- Gu, H.; Wang, S.; Ma, X.; Jia, D.; Mao, G.; Lim, E.G.; Wong, C.P.R. Large-Scale Traffic Signal Control Using Constrained Network Partition and Adaptive Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2024, 25, 7619–7632.
- Ran, Q.; Liang, C.; Liu, P. A safe lane-changing strategy for autonomous vehicles based on deep Q-networks and prioritized experience replay. Digit. Transp. Saf. 2025, 4, 170–174.
- Wang, T.; Cao, J.; Hussain, A. Adaptive Traffic Signal Control for large-scale scenario with Cooperative Group-based Multi-agent reinforcement learning. Transp. Res. Part C Emerg. Technol. 2021, 125, 103046.
- Wang, X.; Taitler, A.; Smirnov, I.; Sanner, S.; Abdulhai, B. eMARLIN: Distributed Coordinated Adaptive Traffic Signal Control with Topology-Embedding Propagation. Transp. Res. Rec. 2024, 2678, 189–202.
- Li, Y.; Pu, Z.; Liu, P.; Qian, T.; Hu, Q.; Zhang, J.; Wang, Y. Efficient predictive control strategy for mitigating the overlap of EV charging demand and residential load based on distributed renewable energy. Renew. Energy 2025, 240, 122154.
- Yang, G.; Wen, X.; Chen, F. Multi-Agent Deep Reinforcement Learning with Graph Attention Network for Traffic Signal Control in Multiple-Intersection Urban Areas. Transp. Res. Rec. 2025, 2679, 880–898.
- Wang, T.; Zhu, Z.; Zhang, J.; Tian, J.; Zhang, W. A large-scale traffic signal control algorithm based on multi-layer graph deep reinforcement learning. Transp. Res. Part C Emerg. Technol. 2024, 162, 104582.
- Yazdani, M.; Sarvi, M.; Bagloee, S.A.; Nassir, N.; Price, J.; Parineh, H. Intelligent vehicle pedestrian light (IVPL): A deep reinforcement learning approach for traffic signal control. Transp. Res. Part C Emerg. Technol. 2023, 149, 103991.
- Yu, J.; Laharotte, P.-A.; Han, Y.; Leclercq, L. Decentralized signal control for multi-modal traffic network: A deep reinforcement learning approach. Transp. Res. Part C Emerg. Technol. 2023, 154, 104281.
- Hu, W.X.; Ishihara, H.; Chen, C.; Shalaby, A.; Abdulhai, B. Deep Reinforcement Learning Two-Way Transit Signal Priority Algorithm for Optimizing Headway Adherence and Speed. IEEE Trans. Intell. Transp. Syst. 2023, 24, 7920–7931.
- Long, M.; Zou, X.; Zhou, Y.; Chung, E. Deep reinforcement learning for transit signal priority in a connected environment. Transp. Res. Part C Emerg. Technol. 2022, 142, 103814.
- Guo, J.; Cheng, L.; Wang, S. CoTV: Cooperative Control for Traffic Light Signals and Connected Autonomous Vehicles Using Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2023, 24, 10501–10512.
- Song, L.; Fan, W.D. Performance of State-Shared Multiagent Deep Reinforcement Learning Controlled Signal Corridor with Platooning-Based CAVs. J. Transp. Eng. Part A Syst. 2023, 149, 04023072.
- Wang, L.; Zhang, W.; Yan, Z. Vehicle-Infrastructure Cooperation Framework for Vehicle Navigation and Traffic Signal Control using Deep Reinforcement Learning. Transp. Res. Rec. 2025, 2680, 568–583.
- Li, Y.; Zhang, H.; Zhang, Y. Traffic Signal and Autonomous Vehicle Control Model: An Integrated Control Model for Connected Autonomous Vehicles at Traffic-Conflicting Intersections Based on Deep Reinforcement Learning. J. Transp. Eng. Part A Syst. 2025, 151, 04024107.
- Ying, Z.; Cao, S.; Liu, X.; Ma, Z.; Ma, J.; Deng, R.H. PrivacySignal: Privacy-Preserving Traffic Signal Control for Intelligent Transportation System. IEEE Trans. Intell. Transp. Syst. 2022, 23, 16290–16303.
- Kumar, N.; Mittal, S.; Garg, V.; Kumar, N. Deep Reinforcement Learning-Based Traffic Light Scheduling Framework for SDN-Enabled Smart Transportation System. IEEE Trans. Intell. Transp. Syst. 2022, 23, 2411–2421.
- Zhou, S.; Chen, X.; Li, C.; Chang, W.; Wei, F.; Yang, L. Intelligent Road Network Management Supported by 6G and Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2025, 26, 17235–17243.
- Yang, J.; Zhang, J.; Wang, H. Urban Traffic Control in Software Defined Internet of Things via a Multi-Agent Deep Reinforcement Learning Approach. IEEE Trans. Intell. Transp. Syst. 2021, 22, 3742–3754.
- Sun, Z.; Jia, X.; Cai, Y.; Ji, A.; Lin, X.; Liu, L.; Wang, W.; Tu, Y. Joint control of traffic signal phase sequence and timing: A deep reinforcement learning method. Digit. Transp. Saf. 2025, 4, 118–126.
- Zhu, Y.; Lv, Y.; Lin, S.; Xu, J. A Stochastic Traffic Flow Model-Based Reinforcement Learning Framework for Advanced Traffic Signal Control. IEEE Trans. Intell. Transp. Syst. 2025, 26, 714–723.
- Mao, F.; Li, Z.; Lin, Y.; Li, L. Mastering Arterial Traffic Signal Control with Multi-Agent Attention-Based Soft Actor-Critic Model. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3129–3144.
- Huang, L.; Qu, X. Improving traffic signal control operations using proximal policy optimization. IET Intell. Transp. Syst. 2023, 17, 588–601.
- Luo, H.; Bie, Y.; Jin, S. Reinforcement Learning for Traffic Signal Control in Hybrid Action Space. IEEE Trans. Intell. Transp. Syst. 2024, 25, 5225–5241.
- Wang, Z.; Yang, K.; Li, L.; Lu, Y.; Tao, Y. Traffic signal priority control based on shared experience multi-agent deep reinforcement learning. IET Intell. Transp. Syst. 2023, 17, 1363–1379.
- Xu, D.; Li, C.; Wang, D.; Gao, G. Robustness Analysis of Discrete State-Based Reinforcement Learning Models in Traffic Signal Control. IEEE Trans. Intell. Transp. Syst. 2023, 24, 1727–1738.
- Laval, J.; Zhou, H. Congested Urban Networks Tend to Be Insensitive to Signal Settings: Implications for Learning-Based Control. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24904–24917.
- Jiang, Q.; Qin, M.; Zhang, H.; Zhang, X.; Sun, W. BlindLight: High Robustness Reinforcement Learning Method to Solve Partially Blinded Traffic Signal Control Problem. IEEE Trans. Intell. Transp. Syst. 2024, 25, 16625–16641.
- Shen, J. Hierarchical reinforcement learning-based traffic signal control. Sci. Rep. 2025, 15, 32862.
- Zhou, Y.; Liu, S.; Qing, Y.; Zheng, T.; Chen, K.; Song, J.; Song, M. CADP: Towards Better Centralized Learning for Decentralized Execution in MARL. In Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, Detroit, MI, USA, 19–23 May 2025; pp. 2838–2840.
- Chu, T.S.; Wang, J.; Codecà, L.; Li, Z.J. Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1086–1095.
- Alonso, B.; Musolino, G.; Rindone, C.; Vitetta, A. Estimation of a fundamental diagram with heterogeneous data sources: Experimentation in the City of Santander. ISPRS Int. J. Geo-Inf. 2023, 12, 418.
| Symbol | Definition | Unit | Aggregation/Measurement Rule |
|---|---|---|---|
| i | Index of intersections | — | — |
| N | Number of intersections in the network | — | — |
| L_i | Set of incoming lanes at intersection i | — | — |
| l | Index of an incoming lane of intersection i | — | — |
| Δt | Decision interval for signal control | s | Fixed to 10 s per decision step (followed by a 3 s yellow clearance) |
| q_l(t) | Queue length on lane l at time t | veh | Instantaneous count per decision step; measured on lane l (same detection segment as other lane-level variables) |
| w_l(t) | Total waiting time on lane l at time t | s | Accumulated waiting time of vehicles on lane l at each decision step |
| n_l(t) | Number of approaching vehicles on lane l at time t | veh | Instantaneous vehicle count per decision step; not time-aggregated flow |
| v_l(t) | Total (summed) speed of vehicles on lane l at time t | m/s | Sum of instantaneous vehicle speeds on lane l at each decision step |
| Q_i(t) | Total queue length at intersection i | veh | Sum of q_l(t) over the incoming lanes of intersection i |
| W_i(t) | Total waiting time at intersection i | s | Sum of w_l(t) over the incoming lanes of intersection i |
| N_i(t) | Total number of approaching vehicles at intersection i | veh | Sum of n_l(t) over the incoming lanes of intersection i |
| V_i(t) | Total (summed) speed at intersection i | m/s | Sum of v_l(t) over the incoming lanes of intersection i |
| C_l | Maximum capacity of lane l | veh | Used for normalizing lane density; the maximum number of vehicles that lane l can accommodate (under the adopted lane capacity definition) |
| ρ_l(t) | Normalized density of lane l at time t | — | Vehicle count on lane l normalized by the capacity C_l (dimensionless) |
| m | Traffic movement at intersection i | — | A movement corresponds to a specific incoming–outgoing lane pair governed by a phase |
| M_i | Set of all possible movements at intersection i | — | — |
| l_in, l_out | Incoming and outgoing lanes of movement m | — | — |
| p_m(t) | Pressure of movement m at time t | — | Defined using incoming/outgoing lane densities (see Section 2.3); computed per decision step |
| P_i(t) | Intersection pressure at intersection i | — | Aggregated from the movement pressures p_m(t) at intersection i (Section 2.3) |
| a_i(t) | Action (phase selection) at intersection i | — | One action selected per decision step |
| A | Action set (four-phase scheme) | — | — |
| s_i(t) | State vector of intersection i | — | A four-dimensional state constructed from the intersection-level variables above (Section 2.1) |
| r_i(t) | Reward of intersection i | — | Computed from waiting-time-related and pressure-related components (Section 2.3) |
| Q(s, a) | Q-value function | — | — |
| G_θ | Adjustable policy function (PRA-DQN) | — | — |
| θ, θ⁻ | Trainable parameters and target network parameters | — | — |
| γ | Discount factor | — | — |
| α | Learning rate | — | — |
| Parameter | Description | Value |
|---|---|---|
| Learning rate (Q-network) | Step size for updating Q-network weights (Adam) | 0.001 |
| Learning rate (policy function G) | SGD learning rate for policy alignment | 0.01 |
| Discount factor γ | Weight of future rewards | 0.99 |
| Replay buffer size | Capacity of stored transitions | 10,000 |
| Batch size | Number of samples per update | 32 |
| Target network update frequency | Soft update interval | 500 steps |
| Convolutional layers | CNN feature extraction layers | 2 |
| Kernel size | Size of convolution kernels | 3 × 3 |
| Pooling | Type of pooling | MaxPooling |
| Activation function | Nonlinear activation | rectified linear unit (ReLU) |
| Fully connected layers | Number of FC layers | 2 |
| Parameter | Description | Value |
|---|---|---|
| Number of episodes | Total training episodes | 300 |
| Steps per episode | Simulation steps per episode | 400 |
| Pre-training steps | Steps for warm-up/memory initialization | 6000 |
| Discount factor γ | Same as Table 2 | 0.99 |
| ε-initial | Initial exploration rate | 1.0 |
| ε-final | Minimum exploration rate | 0.05 |
| ε-decay | Linear decay steps | 220 |
| Phase duration | Decision interval | 10 s |
| Yellow time | Yellow phase duration | 3 s |
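The exploration schedule above (linear decay from 1.0 to 0.05 over 220 steps) corresponds to a simple helper such as the following; whether the decay counter runs over decision steps or episodes is not stated here, so the sketch treats it generically.

```python
def epsilon(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
            decay_steps: int = 220) -> float:
    """Linearly decayed exploration rate, clipped at its final value."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

assert epsilon(0) == 1.0
assert abs(epsilon(110) - 0.525) < 1e-9   # halfway point
assert abs(epsilon(500) - 0.05) < 1e-9    # clipped after 220 steps
```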
| Model | Maximum Queue Length (veh) | Optimization Ratio | Average Queue Length (veh) | Optimization Ratio | Average Waiting Time (s) | Optimization Ratio |
|---|---|---|---|---|---|---|
| PRA-DQN-CoC | 17.60 | 21.17% | 16.30 | 18.75% | 8.60 | 17.71% |
| DQN-CoC | 19.20 | 13.88% | 17.50 | 12.50% | 9.30 | 11.43% |
| DQN-IC | 20.10 | 9.87% | 18.30 | 8.50% | 9.80 | 6.67% |
| STOCHASTIC | 21.50 | 3.59% | 19.80 | 1.00% | 10.20 | 2.86% |
| MAXWAVE | 18.70 | 16.14% | 17.10 | 14.50% | 9.00 | 14.29% |
| MAXPRESSURE | 20.00 | 10.31% | 18.00 | 10.00% | 9.70 | 7.62% |
| Fixed-Time | 22.30 | — | 20.00 | — | 10.50 | — |

Optimization ratios are computed relative to the Fixed-Time baseline.