Traffic Signal Control via Reinforcement Learning: A Review on Applications and Innovations
Abstract
1. Introduction
1.1. Motivation
1.2. Literature Analysis Approach
- Study Selection: To ensure the scientific robustness of this review, a thorough selection methodology was applied, drawing exclusively from Scopus- and Web of Science-indexed peer-reviewed journals and conferences. An initial set of over 250 publications was screened based on abstracts, from which the most pertinent studies were shortlisted for full analysis. The selection process adhered to multiple quality criteria: (a) Citation Threshold: Only studies with at least 30 citations, excluding self-citations, were considered to guarantee recognized academic impact, verified through Scopus at the time of review. In addition, given their recent publication, the studies from 2023 and 2024 constituted an exception, with their threshold set at 10 citations. (b) Topical Focus: Only research directly addressing RL-driven traffic signal control was included, excluding unrelated works on traffic estimation, non-RL-based signal systems, or broader urban mobility management. (c) Peer-Review Status: Only fully peer-reviewed articles and top-tier conference papers from publishers like Elsevier, IEEE, MDPI, and Springer were selected, with pre-prints and non-peer-reviewed studies omitted. (d) Methodological Transparency: Papers were required to clearly document their RL setup, control objectives, evaluation metrics, and comparative benchmarking. (e) Methodological Diversity: A balanced representation of value-based, policy-based, and actor-critic RL research applications was maintained, covering varying signal traffic control cases.
- Keyword Strategy: A targeted keyword strategy was designed to maintain focus on reinforcement learning applications for traffic signal control. Primary search terms included “Reinforcement Learning for Traffic Signal Control”, “RL-based Traffic Light Optimization”, “Adaptive Traffic Signal Control using RL”, “Deep Reinforcement Learning for Traffic Signal Management”, and “RL-based Adaptive Traffic Management”. Care was given to select phrases closely aligned with signal control tasks, avoiding broader traffic management or routing topics.
- Data Categorization: Each selected study was carefully classified across multiple dimensions to enable thorough and structured analysis. The categorization included the RL methodology type and specific algorithms employed, the agent structure (single-agent vs. multi-agent control), the nature of the reward function, the performance metrics used for evaluation, and the baseline control strategies adopted for comparison. Furthermore, the scale of the traffic environment—whether single intersections or larger networks—was documented. Finally, the simulation platforms utilized for model training and validation were systematically recorded to assess the realism and comparability of the reported results.
- Quality Assessment: A systematic evaluation process was applied to ensure the inclusion of impactful and credible research. Specifically, citation metrics were used as a primary filter, requiring a minimum of 30 citations (excluding self-citations), with data sourced from Scopus to guarantee reliability. Studies published in high-impact journals and leading conference proceedings were favored. Beyond citation counts, the academic profile of the authors was assessed, focusing on their contributions to the fields of reinforcement learning and traffic signal control. Special emphasis was placed on authors with proven expertise, evidenced by publications in top-tier venues, contributions to the development or refinement of RL techniques for control problems, and affiliations with prominent transportation research centers or AI laboratories. This multi-faceted evaluation helped to prioritize studies offering substantial and credible advancements in RL-based traffic signal control.
- Findings Synthesis: Key insights were systematically organized into thematic categories to enable clear comparisons across different RL methodologies applied to traffic signal control. This structured synthesis not only contributed to a comprehensive understanding of the field’s development but also enabled the generation of statistical analyses across different study attributes, as well as the identification of emerging trends and persistent challenges within the domain. Furthermore, it provided a solid foundation for formulating informed future research directions, highlighting gaps in current approaches and proposing potential advancements in RL-driven traffic signal control optimization.
1.3. Previous Work
1.4. Contribution and Novelty
1.5. Paper Structure
2. Traffic Signal Control Approaches
2.1. Primary TSC Control Types
- Fixed-Time (Pre-Timed) Control: This conventional approach operates on predetermined schedules, assigning fixed durations to signal phases based on historical traffic patterns [61]. While effective in environments with consistent flows due to its simplicity, it lacks adaptability to real-time conditions. Consequently, it performs poorly during disruptions such as accidents, special events, or sudden demand surges. Subcategories include [61,62,63]:
- Isolated Fixed-Time Control: Each intersection follows a standalone schedule with no inter-coordination.
- Coordinated Fixed-Time Control: Intersections are synchronized to enable green wave progression along corridors.
- Time-of-Day Control: Schedules vary by time period but remain non-responsive to live traffic conditions.
Despite its straightforwardness, FT control often leads to inefficiencies in dynamic traffic scenarios [64].
- Actuated Control: These systems adjust signal timings based on real-time inputs from sensors such as loop detectors, cameras, or infrared devices [65]. They are typically classified as [66,67]:
- Semi-Actuated Control: Detection is installed on minor roads, allowing the major road to remain green until a vehicle is detected on the minor approach [68].
- Fully Actuated Control: Sensors monitor all approaches, enabling dynamic adjustments from all directions based on traffic conditions. This setup suits areas with unpredictable traffic and enhances flow compared to fixed-time systems [69] (a minimal sketch of such gap-based green-extension logic follows this list).
However, actuated systems operate locally and lack coordination across intersections, limiting their network-level optimization capabilities [58].
- Adaptive Control: Adaptive traffic signal control (ATSC) systems represent the most advanced form of signal management. They continuously monitor traffic in real time and apply algorithmic models to optimize signals across networks [70], aiming to reduce delays and adapt to fluctuating demand [39]. Subclasses include model-based and model-free systems: model-based systems are preferred for structured environments, while model-free methods excel in complex, dynamic scenarios.
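To make the actuated logic above concrete, the following is a minimal sketch of a gap-out green-extension rule of the kind used in fully actuated control. The function name, constants, and thresholds are illustrative assumptions, not taken from a specific controller standard.

```python
# Minimal sketch of a fully actuated gap-out rule (illustrative values).
MIN_GREEN = 5.0    # seconds: shortest admissible green
MAX_GREEN = 45.0   # seconds: longest admissible green
GAP_OUT = 3.0      # seconds: detector headway that terminates the green

def should_terminate_green(elapsed_green: float, time_since_last_actuation: float) -> bool:
    """Decide whether the current green phase should end."""
    if elapsed_green < MIN_GREEN:
        return False                                # always honor the minimum green
    if elapsed_green >= MAX_GREEN:
        return True                                 # force termination at the maximum
    return time_since_last_actuation >= GAP_OUT     # gap-out: the approach has cleared
```

In practice, such rules run per approach and interact with coordination constraints; RL-based adaptive control can be viewed as replacing these hand-tuned thresholds with a learned policy.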
2.2. General Concept of RL Control in TSC
- Environment: The traffic network constitutes the environment in which the RL agent operates. It may include intersections, vehicle and pedestrian flows, sensors, and external disturbances like weather or sudden congestion. The RL agent does not control this environment directly but learns to respond to its dynamics by optimizing signal timings.
- Sensors: Real-time monitoring is enabled through loop detectors, video feeds, GPS data, LiDAR, and connected vehicle systems. Such sensors capture observations—such as queue lengths, arrival rates, and pedestrian activity—that inform the RL agent’s decisions.
- RL Agent(s): Acting as the system’s intelligence, the RL agent evaluates incoming traffic data and determines optimal signal phases. Depending on the method used (value-based, policy-based, or actor-critic), it learns to minimize congestion, delays, and emissions through continuous interaction with the environment.
- Control Decision Application: Once an action is selected, the system implements signal adjustments—such as extending green times, skipping phases, modifying cycle lengths, or coordinating multiple intersections—based on the agent’s output.
- Environment Update: After applying the control decisions, the environment evolves. Vehicles move, congestion shifts, and new traffic enters, resulting in a new state that becomes the input for the agent’s next decision.
- Reward Computation: The system evaluates the agent’s decision using predefined metrics. Typical reward functions assess reductions in waiting time, queue lengths, fuel consumption, emissions, and improvements in pedestrian safety and traffic balance.
- Reward Signal Feedback: The computed reward is fed back into the RL algorithm, refining its policy. Through this iterative process, the agent improves its control strategy, enabling real-time, adaptive signal management (a minimal sketch of this interaction loop follows the list).
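A minimal sketch of this interaction loop is given below, using a toy two-phase intersection with invented queue dynamics; the environment class, its dynamics, and the queue-based reward are illustrative assumptions rather than a specific benchmark.

```python
import random

class ToyIntersectionEnv:
    """Toy two-phase intersection: the state is the pair of queue lengths.
    The served phase discharges vehicles while the other accumulates random
    arrivals; the reward penalizes the total queue (illustrative dynamics)."""

    def reset(self):
        self.queues = [0, 0]
        return tuple(self.queues)

    def step(self, phase: int):
        served = min(self.queues[phase], 3)              # up to 3 vehicles discharge
        self.queues[phase] -= served
        self.queues[1 - phase] += random.randint(0, 2)   # arrivals on the red approach
        reward = -sum(self.queues)                       # reward computation step
        return tuple(self.queues), reward, False

# One pass of the loop: observe, act, environment update, reward feedback.
env = ToyIntersectionEnv()
state = env.reset()
action = 0                                   # an RL agent would select this phase
next_state, reward, done = env.step(action)  # environment evolves, reward returned
```

A learning agent would repeat this cycle, using the reward signal to refine its phase-selection policy.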
3. Mathematical Framework of Reinforcement Learning
3.1. The General Concept of RL
3.2. Common RL Algorithms for Traffic Signal Control
3.2.1. Value-Based Algorithms
3.2.2. Policy-Based Algorithms
3.2.3. Actor-Critic Algorithms
- Deep Deterministic Policy Gradient: The DDPG methodology is best suited to continuous control tasks, such as adjusting green durations in real time. It updates the critic and actor as follows:
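Since the original displays are omitted here, the canonical DDPG updates are restated: the critic parameters $\theta^{Q}$ minimize a temporal-difference loss against target networks $Q'$ and $\mu'$, while the actor parameters $\theta^{\mu}$ follow the deterministic policy gradient:

$$L(\theta^{Q}) = \mathbb{E}\Big[\big(r_t + \gamma\, Q'(s_{t+1}, \mu'(s_{t+1})) - Q(s_t, a_t)\big)^{2}\Big], \qquad \nabla_{\theta^{\mu}} J \approx \mathbb{E}\Big[\nabla_{a} Q(s, a)\big|_{a=\mu(s)}\, \nabla_{\theta^{\mu}} \mu(s)\Big]$$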
- Soft Actor-Critic: The SAC methodology integrates entropy into the reward function, encouraging exploration and improving robustness in multi-agent settings [100]. Its objective includes an entropy term:
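As the original display is omitted, the standard entropy-regularized objective is restated, where the return is augmented by a policy-entropy bonus weighted by a temperature $\alpha$:

$$J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}}\big[\, r(s_t, a_t) + \alpha\, \mathcal{H}(\pi(\cdot \mid s_t)) \,\big]$$

Here $\mathcal{H}$ denotes the policy entropy; larger values of $\alpha$ encourage exploration.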
- Advantage Actor-Critic: A2C improves training by incorporating an advantage function to reduce variance:
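In its standard form, restated here since the original display is omitted, the advantage replaces the raw return in the policy gradient, which lowers the variance of the update:

$$A(s_t, a_t) = Q(s_t, a_t) - V(s_t), \qquad \nabla_{\theta} J = \mathbb{E}\big[\nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, A(s_t, a_t)\big]$$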
4. Attribute Tables and Summaries of RL Applications
4.1. Attribute Tables and Summaries Description
- Ref.: contains the reference of each application, listed in the first column;
- Year: contains the publication year for each research application;
- Method: contains the specific RL algorithmic methodology applied in each application;
- Agent: contains the agent type of the concerned methodology (single-agent or multi-agent RL approach);
- Baseline: illustrates the comparison methods used to evaluate the proposed RL approach, such as fixed-time (FT), actuated control (AC), max-pressure (MP), CoLight (CL), PressLight (PL), MetaLight (ML), FRAP, SOTL, or other RL-based strategies;
- City Network: contains the main urban traffic network or city where simulations were conducted; each country is abbreviated in parentheses, while simulated traffic networks that do not correspond to a specific city are abbreviated as “synthetic”;
- Junctions: illustrates the number of intersections involved in the study (if multiple scenarios or city networks were evaluated, values for different networks are separated by a “/”);
- Simulation: illustrates the traffic simulation platform used to test and validate the RL method, such as SUMO, VISSIM, Aimsun, CityFlow, or others;
- Data: describes the utilized data that the RL algorithms were trained, tested, and validated on: real data from actual traffic networks are denoted as “real”, simulated and synthetic data as “sim”, and cases where both types of data were utilized are denoted as “both”;
- Cit.: contains the number of citations of each research application, according to Scopus;
- Author: contains the name of the author along with the reference application;
- Summary: contains a brief description of the research work.
4.2. Value-Based RL Applications
4.3. Policy-Based RL Applications
4.4. Actor-Critic RL Applications
4.5. Hybrid RL Applications
5. Evaluation
- Methodology and Type: Defines the core structure and features of each RL algorithm.
- Agent Architectures: Explores prevailing agent architectures, emphasizing multi-agent RL trends.
- Reward Functions: Analyzes reward design variations across implementations.
- Performance Indexes: Identifies commonly used evaluation metrics and their characteristics.
- Baseline Control Types: Reviews baseline strategies used to benchmark RL performance.
- Intersection Sizes: Assesses intersection modeling complexity, revealing scalability challenges.
- Simulation Tools: Lists the platforms used for simulating RL-based traffic control.
5.1. Methodologies and Types
5.2. Agent Architectures
5.3. Reward Functions
5.4. Performance Indexes
5.5. Baseline Control Types
5.6. Intersection Sizes
- Single-scale: 1 intersection;
- Minimal-scale: 2–6 intersections;
- Moderate-scale: 7–30 intersections;
- Large-scale: more than 30 intersections.
5.7. Practical Applicability
- Computational Efficiency: This reflects both training and inference complexity. A high score concerns studies that utilized lightweight models (e.g., shallow networks or simplified tabular Q-learning) and demonstrated fast convergence, often within a few episodes or minutes of simulated time. Such models are typically suitable for real-time deployment or hardware-constrained environments. For example, simple DQN architectures applied to four-phase intersections with prioritized experience replay often fall into this category (a minimal sketch of such a lightweight Q-network follows this list). A medium score indicates moderate model complexity, such as deep actor-critic methods that require longer training times but remain computationally feasible on standard GPUs or cloud-based environments. These may include multi-agent PPO or A3C models applied to mid-sized grids. A low rating is reserved for models involving large neural networks, extensive hyperparameter tuning, or unstable learning dynamics (e.g., convergence issues, reward oscillations). Such models may include, for instance, DDPG-based methods with continuous actions across dozens of agents, which are computationally intensive and often unsuitable for real-time use without high-performance computing infrastructure.
- Scalability: The scalability attribute assesses how well the RL approach extends from small-scale to large-scale or city-wide deployments. A high rating concerns RL applications validated on large urban networks—e.g., large-scale networks with 30+ intersections or city-level topologies—often by using decentralized or hierarchical multi-agent structures that minimize communication overhead. Examples include MARL–DQN variants tested on synthetic or real urban networks with asynchronous updates. A medium score is assigned to studies that demonstrated applicability to moderate-scale scenarios, such as 3 × 3 or 5 × 5 intersection grids—typically moderate-scale complexity considering 7–30 intersections—where scalability was addressed but not rigorously tested or discussed. Centralized training with decentralized execution (CTDE) frameworks often fall here. A low score indicates evaluation on isolated intersections or minimal-node networks, without explicit consideration of inter-agent coordination or traffic spill-back effects: single and minimal-scale networks of 1–6 intersections. Such applications tend to lack the architectural features or experimental evidence necessary for generalizing to larger, more complex road networks.
- Generalization Ability: Generalization measures robustness to unseen traffic scenarios, layouts, or non-stationary environments. A high score applies to RL cases that explicitly evaluate transfer learning, domain randomization, or training under variable traffic demands—indicating adaptability beyond a single training case. Examples include studies using multi-scenario training or that successfully transfer policies to different intersection geometries or demand profiles. A medium rating applies to cases where models are trained with minor variations in traffic input or tested on different seed initializations, but without formal generalization analysis. RL methods that use fixed synthetic routes or rely on a single SUMO simulation file usually fall into this category. A low rating applies to RL applications trained and evaluated solely in static, deterministic settings, with no consideration for transferability, noise resilience, or policy degradation under new conditions.
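For reference, below is a minimal sketch of the kind of lightweight Q-network that would rate highly on computational efficiency: a shallow network for a four-phase intersection. The observation size and layer widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SmallQNetwork(nn.Module):
    """Shallow Q-network for a four-phase intersection: input is a small
    observation vector (e.g., per-lane queue lengths), output is one
    Q-value per signal phase. Sizes are illustrative assumptions."""

    def __init__(self, obs_dim: int = 8, n_phases: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_phases),   # one Q-value per candidate phase
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Greedy phase selection from a dummy observation:
q_net = SmallQNetwork()
obs = torch.zeros(1, 8)
phase = q_net(obs).argmax(dim=1).item()   # index of the phase to actuate
```

With roughly five thousand parameters, inference for such a model is trivially real-time, which is what separates the high-efficiency category from the large-network, low-rated one.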
5.8. Simulation Tools
- SUMO: A widely adopted open-source, microscopic, multi-modal traffic simulator offering scalability, adaptability, and robust modeling capabilities (e.g., lane changing, multi-lane configurations, emissions analysis) [159]. SUMO’s modular structure and tools (e.g., netconvert, duarouter) streamline network preparation and traffic scenario simulation, making it ideal for RL-based signal control studies (a minimal TraCI control sketch follows this list).
- Vissim: A commercial, high-resolution traffic simulation platform from PTV Group [160]. Known for accurate multimodal traffic modeling, advanced driver behavior algorithms, and scenario management, Vissim is suited for evaluating transport infrastructure and integrating RL-based control with high realism.
- Aimsun: A commercial predictive traffic analytics platform supporting adaptive control and intelligent transportation systems (ITS) [161]. Aimsun enables performance assessment, network evolution forecasting, and real-time optimization, making it suitable for RL applications in large-scale digital mobility solutions.
- CityFlow: An open-source, high-performance traffic simulator designed for scalability in RL-based traffic control [162]. CityFlow features optimized data structures and algorithms for city-wide simulations, outperforming SUMO in speed while supporting flexible configurations and seamless RL integration [162].
- Other: A limited number of studies employed alternative simulators. For instance, in [103], researchers used Paramics [163], known for detailed micro-simulations and behavior modeling. Meanwhile, in [134], a Unity3D-based custom simulator [164] was used, offering scenario flexibility for advanced sensor integration and agent modeling.
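To illustrate how such simulators are typically driven in RL studies, a minimal SUMO/TraCI control loop is sketched below. The configuration file, traffic-light ID, and lane IDs are placeholder assumptions, while the traci calls shown (start, simulationStep, setPhase, getLastStepHaltingNumber, close) belong to SUMO's Python API.

```python
import traci

# Placeholder network artifacts; replace with your own scenario files/IDs.
SUMO_CMD = ["sumo", "-c", "intersection.sumocfg"]
TLS_ID = "tls_0"
INCOMING_LANES = ["lane_n_0", "lane_s_0", "lane_e_0", "lane_w_0"]

traci.start(SUMO_CMD)
try:
    for step in range(3600):
        traci.simulationStep()                       # advance the simulation one step
        halted = sum(traci.lane.getLastStepHaltingNumber(lane)
                     for lane in INCOMING_LANES)     # queue-based observation
        if step % 10 == 0:
            # An RL agent would map the observation to a phase here;
            # a simple alternation stands in for the learned policy.
            traci.trafficlight.setPhase(TLS_ID, (step // 10) % 2)
finally:
    traci.close()
```

Equivalent control loops can be written for CityFlow and the commercial platforms through their respective APIs.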
6. Discussion
6.1. Current Trends
- Dominance of Value-Based RL: Value-based methods, particularly DQN and its variants, remain the most widely used due to their alignment with the discrete nature of traffic signal phases and their stable convergence properties [108,111,113,124]. Their origins in game-based RL tasks made them well-suited for modeling discrete decisions like phase switching, supporting simplified policy updates and lower training overhead [103,107,128]. Enhancements like double and dueling DQN address overestimation and feature learning, improving the capture of spatiotemporal traffic patterns [119,128].
- Growing Adoption of Hybrid and Domain-Specific Techniques: A marked shift toward combining RL paradigms—value-based, policy-based, fuzzy logic, and MPC—has emerged to better handle real-world traffic complexities [147,152,154,157]. These hybrids leverage complementary strengths such as fuzzy logic’s interpretability or model-based components’ predictive accuracy. Embedding domain-specific priors (e.g., road topology, priority rules) helps align RL with engineering logic and enhances performance in dynamic settings [145,155].
- Advanced Neural Modules as Key Enablers: Neural enhancements like GNNs, attention mechanisms, and temporal encoders (e.g., LSTMs) have been instrumental in capturing spatial and temporal dependencies in multi-intersection networks [104,121,148]. GNNs facilitate inter-agent communication through relational state embeddings, while attention layers highlight critical lanes or intersections within complex urban grids [135,136,145].
- Prevalence of Multi-Agent Architectures: The decentralized nature of urban traffic has driven widespread adoption of multi-agent RL [120,149]. Treating each intersection as an agent allows localized decision making with shared coordination logic via mechanisms such as neighborhood masking and graph-based message passing, enabling scale-out from grids to entire cities [105,119].
- Reliance on Classic Traffic Metrics: Most studies anchor their reward functions to traditional indicators like waiting time, queue length, or average delay [103,133]. These metrics offer a clear bridge to conventional traffic management practices, facilitating both validation and interpretability in comparative analysis [107,135].
- Algorithm Choice Driven by Objective Complexity, Not Scale: The selected RL framework often correlates more with task complexity (e.g., safety, sustainability) than with network size [114,145]. Sophisticated methods appear even in single-intersection studies, while simpler techniques persist in large-scale scenarios when congestion is the primary concern [136,158].
6.2. Challenge Identification
- Algorithmic Efficiency: As noted in the evaluation section, value-based RL algorithms such as QL and DQN frequently suffer from sample inefficiency and instability during training, particularly in environments with extensive state-action spaces or complex intersection networks [105,107,108,111]. This type of limitation is exacerbated as the scale of deployment expands, necessitating richer state representations and advanced architectures—such as DDQN, dueling DQN, or hierarchical frameworks—which, although addressing some issues, bring additional complexities [138,139,148,158]. Furthermore, policy-based approaches, while beneficial for continuous control tasks, remain computationally demanding due to their high sample complexity and training variance, restricting broader adoption in discrete, complex traffic environments [133,134,135]. Additionally, actor-critic methods, such as A2C and DDPG, are limited by issues related to training stability, especially under partially observable or highly dynamic conditions, reducing their practical applicability in complex traffic signal control applications [137,140,141].
- Reward Function Design: Reward function design is another critical challenge identified in the literature, with most current studies focusing on simplified queue- or waiting-time-based metrics, which frequently fail to capture essential dimensions like emissions, safety, and multimodal traffic dynamics comprehensively [103,121,143,148,149]. Although recent efforts toward multi-objective and hybrid reward functions attempt to address such limitations, balancing competing objectives remains a significant challenge, requiring sophisticated algorithms capable of navigating complex trade-offs [118,127,132,136].
- Limited Safety-Critical Aspects: Additionally, few studies have comprehensively addressed safety-critical aspects, such as crash or conflict avoidance, within real-world contexts. Current RL methodologies tend to prioritize efficiency metrics like travel time reduction and throughput optimization, often neglecting thorough validation of safety considerations. This oversight indicates a notable gap, as the integration of safety constraints is essential for gaining public trust and ensuring regulatory compliance, especially in dense urban environments where the potential for accidents and conflicts remains high [117,129].
- Computational Complexity and Real-Time Responsiveness: Advanced RL models such as deep neural networks, graph-based methods, GAT-based hybrids, DDPG architectures, and multi-agent frameworks introduce significant computational complexity, which often conflicts with stringent real-time decision-making requirements. Ensuring that RL-based traffic signal decisions can be computed and implemented promptly under realistic temporal constraints remains a critical bottleneck. Such computational demands potentially restrict the real-world applicability and scalability of these sophisticated algorithms, emphasizing the need for efficient, lightweight RL solutions [139,140,142,148,153].
- Continuous Retraining Demand: In addition, ongoing maintenance and the requirement for continuous retraining of RL systems to adapt to evolving traffic patterns and urban conditions represent a considerable long-term commitment in terms of resources and technical expertise, adding complexity to sustainable deployment [114,154,155].
- Transferability and Scalability to Other TSC Frameworks: RL models frequently exhibit limited transferability, demonstrating a notable degradation in performance when applied to different urban scenarios or varied traffic conditions [112,142,156]. This highlights challenges regarding robustness and the generalization capabilities of existing RL frameworks, especially in unseen environments. Especially in the case of multi-agent control applications, the coordination among agents under real-world communication delays and partial observability makes scaling up even more difficult. Concurrently, scalability remains a significant obstacle, as expanding RL systems from single intersections to large urban networks introduces complexities such as non-stationarity, increased inter-agent communication overhead, and difficulties maintaining convergence stability. Thus, enhancing both transferability and scalability is critical for practical RL deployment in diverse and dynamic scenarios [115,119,120,148].
- Transition to Real-World Deployment: The limitations outlined above strongly inhibit real-life implementations, which remain few in number. Traffic simulations, although sophisticated, often oversimplify real-world complexities, including unpredictable driver behaviors, diverse vehicle types, and unforeseen incidents, making simulation-to-reality transfer problematic [159,162]. Additionally, heterogeneous traffic patterns across cities, the need for fast and reliable communication, and the difficulty of transferring models trained in idealized environments to dynamic, noisy real-world conditions further inhibit deployment [159,161,162]. As a result, simulation-only validation proves inadequate and potentially unsafe for managing real-life TSC scenarios.
- Standardization and Stakeholder Trust: Other practical deployment challenges include issues related to standardization, as existing urban infrastructures often vary significantly, complicating the integration and uniform application of RL-based systems. This challenge is compounded by regulatory and infrastructure heterogeneity across networks, which further complicates standardization. Moreover, real-world implementations face additional acceptance barriers due to trust and transparency concerns among city planners, policymakers, and the public, necessitating explainable and interpretable RL solutions, which are currently scarce [116,157].
- Interpretability Issues: Finally, the lack of interpretability in RL-based decisions hinders stakeholders and policymakers from trusting and adopting such technologies, as shown in [116,117,157]. Developing transparent and explainable models remains a critical yet unresolved issue in RL deployment within urban traffic management, as indicated in the literature.
6.3. Future Directions
- Algorithmic Efficiency Enhancement: Future research should focus on developing more sample-efficient, stable RL algorithms by leveraging distributed RL architectures, hybrid implementations, prioritized experience replay, and lightweight actor-critic architectures. Improving algorithmic efficiency will directly lead to faster convergence under large-scale and dynamic urban traffic conditions, minimizing training costs and making RL-based traffic control more viable for real-world, resource-constrained deployments.
- Expanding Multi-Objective Reward Functions: Work should additionally prioritize multi-objective frameworks that balance diverse goals such as mobility, safety, sustainability, and fairness. Hierarchical or compositional reward structures could enable RL agents to make trade-offs intelligently, aligning operational behavior with broader urban policy objectives. This direction will lead to traffic control strategies that are not only efficient but also socially equitable and environmentally responsible.
- Embedding Safety and Ethics into RL Policies: Future RL frameworks should integrate explicit safety constraints and ethical considerations into the policy training phase. Approaches like constrained reinforcement learning and risk-aware optimization will ensure the proactive avoidance of accidents and the prioritization of vulnerable users. Advancing safety-centric RL will be critical for gaining public trust and securing regulatory approvals for city-wide deployments.
- Reducing Computational Complexity and Improving Real-Time Responsiveness: To address the real-time control requirements, research must focus on model compression techniques, such as pruning, quantization, and knowledge distillation, combined with efficient online decision-making architectures. Achieving lower computational complexity will facilitate the deployment of RL controllers on edge devices or existing urban infrastructure, opening the door for broader, cost-effective adoption.
- Continuous Retraining through Lifelong Learning: Future RL traffic systems should incorporate lifelong learning and online adaptation mechanisms that allow incremental updates without catastrophic forgetting. By enabling agents to continually learn from new traffic patterns and events, the need for expensive and disruptive offline retraining cycles will be minimized, ultimately supporting the long-term sustainability and robustness of urban traffic management systems.
- Enhancing Transferability and Scalability: Research efforts should emphasize meta-learning, domain adaptation, and modular policy design, aiming to create RL models that generalize effectively across different traffic conditions, intersection topologies, and city layouts. Enhancing transferability and scalability will allow RL systems to move beyond isolated deployments and towards managing complex, heterogeneous urban areas at the city scale.
- Closing the Simulation-to-Reality Gap: Emphasis must shift toward robust sim-to-real transfer techniques such as domain randomization, real-time adaptation layers, and robust policy optimization under uncertainty. Strengthening sim-to-real capabilities will ensure that RL agents trained in synthetic environments can operate safely and reliably under real-world conditions, significantly accelerating the deployment timeline of RL-controlled traffic systems.
- Establishing Standardization Protocols and Stakeholder Trust: Future research should propose standardized evaluation metrics, deployment protocols, and validation benchmarks for RL-based traffic systems. Simultaneously, fostering participatory design involving city stakeholders and transparent performance reporting will be critical for ensuring acceptance, facilitating cross-city model transfer, and enabling systematic scaling of RL-based urban mobility solutions.
- Boosting Explainability and Interpretability: Developing inherently interpretable RL policies or applying explainable AI techniques like attention visualization, causal analysis, or symbolic policy extraction will be beneficial. This will empower traffic engineers, policymakers, and the public to understand, trust, and regulate RL systems, creating transparent governance frameworks and making RL-driven traffic control socially acceptable.
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
A2C | Advantage actor-critic |
AC | Actuated control |
ANN | Artificial neural network |
ATSC | Adaptive traffic signal control |
CAV | Connected and autonomous vehicles |
CL | CoLight |
CV | Connected vehicles |
D3QN | Dueling double deep Q-network |
DDPG | Deep deterministic policy gradient |
DDQN | Double deep Q-network |
DHW | Domestic hot water |
DQN | Deep Q-network |
DRL | Deep reinforcement learning |
ESS | Energy storage system |
FLC | Fuzzy logic control |
FT | Fixed-time |
GA | Genetic algorithm |
GAT | Graph attention network |
GCNN | Graph convolutional neural network |
ITS | Intelligent transportation systems |
LSTM | Long short-term memory |
MA2C | Multi-agent actor-critic |
MARL | Multi-agent reinforcement learning |
MCTS | Monte Carlo tree search |
MDP | Markov decision process |
ML | MetaLight |
MP | Max-pressure |
MPC | Model predictive control |
PG | Policy gradient |
PL | PressLight |
QL | Q-learning |
RBC | Rule-based control |
RL | Reinforcement learning |
SAC | Soft actor-critic |
SOTL | Self-organizing traffic lights |
TD3 | Twin delayed deep deterministic policy gradient |
TSC | Traffic signal control |
References
- Pergantis, E.N.; Sangamnerkar, A.S. Sensors, Storage, and Algorithms for Practical Optimal Controls in Residential Buildings. ASHRAE Trans. 2023, 129, 437–444.
- Pergantis, E.N.; Dhillon, P.; Premer, L.D.R.; Lee, A.H.; Ziviani, D.; Kircher, K.J. Humidity-aware model predictive control for residential air conditioning: A field study. Build. Environ. 2024, 266, 112093.
- D’Agostino, D.; D’Auria, M.; Minelli, F.; Minichiello, F. Multi-criteria Decision-Making for Thermal Insulation of an Existing Office Building Considering Environmental, Energy, and Economic Performance. In Sustainability in Energy and Buildings 2023; Springer: Singapore, 2024; pp. 167–177.
- D’Agostino, D.; Minelli, F.; Minichiello, F. New genetic algorithm-based workflow for multi-objective optimization of Net Zero Energy Buildings integrating robustness assessment. Energy Build. 2023, 284, 112841.
- Minelli, F.; D’Agostino, D.; Migliozzi, M.; Minichiello, F.; D’Agostino, P. PhloVer: A modular and integrated tracking photovoltaic shading device for sustainable large urban spaces—Preliminary study and prototyping. Energies 2023, 16, 5786.
- Takayanagi, M.; Yoshino, M.; Kikuchi, G.; Kanke, T.; Suzuki, N. Autonomous adaptive control of manufacturing parameters based on local regression modeling. Behaviormetrika 2024, 51, 499–513.
- Shirazi, B.; Mahdavi, I.; Solimanpur, M. Intelligent decision support system for the adaptive control of a flexible manufacturing system with machine and tool flexibility. Int. J. Prod. Res. 2012, 50, 3288–3314.
- Rudomanenko, A.; Chernyadev, N.; Vorotnikov, S. Adaptive control system for industrial robotic manipulator. In Proceedings of the Modern Problems of Robotics: Second International Conference, MPoR 2020, Moscow, Russia, 25–26 March 2020; Revised Selected Papers 2. pp. 153–164.
- Karatzinis, G.D.; Michailidis, P.; Michailidis, I.T.; Kapoutsis, A.C.; Kosmatopoulos, E.B.; Boutalis, Y.S. Coordinating heterogeneous mobile sensing platforms for effectively monitoring a dispersed gas plume. Integr. Comput.-Aided Eng. 2022, 29, 411–429.
- Salavasidis, G.; Kapoutsis, A.C.; Chatzichristofis, S.A.; Michailidis, P.; Kosmatopoulos, E.B. Autonomous trajectory design system for mapping of unknown sea-floors using a team of AUVs. In Proceedings of the 2018 European Control Conference (ECC), Limassol, Cyprus, 12–15 June 2018; pp. 1080–1087.
- Keroglou, C.; Kansizoglou, I.; Michailidis, P.; Oikonomou, K.M.; Papapetros, I.T.; Dragkola, P.; Michailidis, I.T.; Gasteratos, A.; Kosmatopoulos, E.B.; Sirakoulis, G.C. A survey on technical challenges of assistive robotics for elder people in domestic environments: The aspida concept. IEEE Trans. Med. Robot. Bionics 2023, 5, 196–205.
- Van der Pol, E.; Oliehoek, F.A. Coordinated deep reinforcement learners for traffic light control. In Proceedings of the Learning, Inference and Control of Multi-Agent Systems (at NIPS 2016), Barcelona, Spain, 10 December 2016; Volume 8, pp. 21–38.
- Zhang, K.; Cui, Z.; Ma, W. A survey on reinforcement learning-based control for signalized intersections with connected automated vehicles. Transp. Rev. 2024, 44, 1187–1208.
- Hashmi, H.T.; Ud-Din, S.; Khan, M.A.; Khan, J.A.; Arshad, M.; Hassan, M.U. Traffic Flow Optimization at Toll Plaza Using Proactive Deep Learning Strategies. Infrastructures 2024, 9, 87.
- Guerrieri, M.; Parla, G. Deep learning and yolov3 systems for automatic traffic data measurement by moving car observer technique. Infrastructures 2021, 6, 134.
- El-Tantawy, S.; Abdulhai, B.; Abdelgawad, H. Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): Methodology and large-scale application on downtown Toronto. IEEE Trans. Intell. Transp. Syst. 2013, 14, 1140–1150.
- Zhao, D.; Dai, Y.; Zhang, Z. Computational intelligence in urban traffic signal control: A survey. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2011, 42, 485–494.
- Efthymiou, I.P.; Egleton, T.E. Artificial intelligence for sustainable smart cities. In Handbook of Research on Applications of AI, Digital Twin, and Internet of Things for Sustainable Development; IGI Global: Hershey, PA, USA, 2023; pp. 1–11.
- Choosakun, A.; Chaiittipornwong, Y.; Yeom, C. Development of the cooperative intelligent transport system in Thailand: A prospective approach. Infrastructures 2021, 6, 36.
- Qadri, S.S.S.M.; Gökçe, M.A.; Öner, E. State-of-art review of traffic signal control methods: Challenges and opportunities. Eur. Transp. Res. Rev. 2020, 12, 1–23.
- Pell, A.; Meingast, A.; Schauer, O. Trends in real-time traffic simulation. Transp. Res. Procedia 2017, 25, 1477–1484.
- Michailidis, I.T.; Michailidis, P.; Rizos, A.; Korkas, C.; Kosmatopoulos, E.B. Automatically fine-tuned speed control system for fuel and travel-time efficiency: A microscopic simulation case study. In Proceedings of the 2017 25th Mediterranean Conference on Control and Automation (MED), Valletta, Malta, 3–6 July 2017; pp. 915–920.
- Zheng, X.; Wu, F.; Chen, W.; Naghizade, E.; Khoshelham, K. Show me a safer way: Detecting anomalous driving behavior using online traffic footage. Infrastructures 2019, 4, 22.
- Michailidis, I.T.; Kapoutsis, A.C.; Korkas, C.D.; Michailidis, P.T.; Alexandridou, K.A.; Ravanis, C.; Kosmatopoulos, E.B. Embedding autonomy in large-scale IoT ecosystems using CAO and L4G-CAO. Discov. Internet Things 2021, 1, 1–22.
- Michailidis, I.T.; Manolis, D.; Michailidis, P.; Diakaki, C.; Kosmatopoulos, E.B. A decentralized optimization approach employing cooperative cycle-regulation in an intersection-centric manner: A complex urban simulative case study. Transp. Res. Interdiscip. Perspect. 2020, 8, 100232.
- Wang, S.; Xie, X.; Huang, K.; Zeng, J.; Cai, Z. Deep reinforcement learning-based traffic signal control using high-resolution event-based data. Entropy 2019, 21, 744.
- Kotsi, A.; Politis, I.; Chaniotakis, E.; Mitsakis, E. A Multi-Player Framework for Sustainable Traffic Optimization in the Era of Digital Transportation. Infrastructures 2024, 10, 6.
- Ištoka Otković, I.; Deluka-Tibljaš, A.; Zečević, Đ.; Šimunović, M. Reconstructing Intersection Conflict Zones: Microsimulation-Based Analysis of Traffic Safety for Pedestrians. Infrastructures 2024, 9, 215.
- Šurdonja, S.; Maršanić, R.; Deluka-Tibljaš, A. Speed Analyses of Intersections Reconstructed into Roundabouts: A Case Study from Rijeka, Croatia. Infrastructures 2024, 9, 159.
- Olayode, I.O.; Severino, A.; Tartibu, L.K.; Arena, F.; Cakici, Z. Performance evaluation of a hybrid PSO enhanced ANFIS model in prediction of traffic flow of vehicles on freeways: Traffic data evidence from South Africa. Infrastructures 2021, 7, 2.
- Amézquita-López, J.; Valdés-Atencio, J.; Angulo-García, D. Understanding traffic congestion via network analysis, agent modeling, and the trajectory of urban expansion: A coastal city case. Infrastructures 2021, 6, 85.
- Pergantis, E.N.; Premer, L.D.R.; Lee, A.H.; Ziviani, D.; Groll, E.A.; Kircher, K.J. Latent and Sensible Model Predictive Controller Demonstration in a House During Cooling Operation. ASHRAE Trans. 2024, 130, 177–185.
- Wan, C.H.; Hwang, M.C. Adaptive traffic signal control methods based on deep reinforcement learning. In Intelligent Transport Systems for Everyone’s Mobility; Springer: Singapore, 2019; pp. 195–209.
- Pergantis, E.N.; Al Theeb, N.; Dhillon, P.; Ore, J.P.; Ziviani, D.; Groll, E.A.; Kircher, K.J. Field demonstration of predictive heating control for an all-electric house in a cold climate. Appl. Energy 2024, 360, 122820.
- Wei, H.; Zheng, G.; Yao, H.; Li, Z. Intellilight: A reinforcement learning approach for intelligent traffic light control. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 2496–2505.
- Wang, H.; Zhu, J.; Gu, B. Model-based deep reinforcement learning with traffic inference for traffic signal control. Appl. Sci. 2023, 13, 4010.
- Ye, B.L.; Wu, W.; Ruan, K.; Li, L.; Chen, T.; Gao, H.; Chen, Y. A survey of model predictive control methods for traffic signal control. IEEE/CAA J. Autom. Sin. 2019, 6, 623–640.
- Michailidis, I.T.; Manolis, D.; Michailidis, P.; Diakaki, C.; Kosmatopoulos, E.B. Autonomous self-regulating intersections in large-scale urban traffic networks: A Chania City case study. In Proceedings of the 2018 5th International Conference on Control, Decision and Information Technologies (CoDIT), Thessaloniki, Greece, 10–13 April 2018; pp. 853–858.
- Spall, J.C.; Chin, D.C. A model-free approach to optimal signal light timing for system-wide traffic control. In Proceedings of the 1994 33rd IEEE Conference on Decision and Control, Lake Buena Vista, FL, USA, 14–16 December 1994; Volume 2, pp. 1868–1875.
- Lazaridis, C.R.; Michailidis, I.; Karatzinis, G.; Michailidis, P.; Kosmatopoulos, E. Evaluating reinforcement learning algorithms in residential energy saving and comfort management. Energies 2024, 17, 581.
- Michailidis, P.; Michailidis, I.; Vamvakas, D.; Kosmatopoulos, E. Model-free HVAC control in buildings: A review. Energies 2023, 16, 7124.
- D’Agostino, D.; De Falco, F.; Minelli, F.; Minichiello, F. New robust multi-criteria decision-making framework for thermal insulation of buildings under conflicting stakeholder interests. Appl. Energy 2024, 376, 124262.
- Vamvakas, D.; Michailidis, P.; Korkas, C.; Kosmatopoulos, E. Review and evaluation of reinforcement learning frameworks on smart grid applications. Energies 2023, 16, 5326.
- Michailidis, P.; Michailidis, I.; Kosmatopoulos, E. Review and Evaluation of Multi-Agent Control Applications for Energy Management in Buildings. Energies 2024, 17, 4835.
- Dogru, O.; Xie, J.; Prakash, O.; Chiplunkar, R.; Soesanto, J.; Chen, H.; Velswamy, K.; Ibrahim, F.; Huang, B. Reinforcement learning in process industries: Review and perspective. IEEE/CAA J. Autom. Sin. 2024, 11, 283–300.
- Nian, R.; Liu, J.; Huang, B. A review on reinforcement learning: Introduction and applications in industrial process control. Comput. Chem. Eng. 2020, 139, 106886.
- Espinosa-Leal, L.; Chapman, A.; Westerlund, M. Autonomous industrial management via reinforcement learning. J. Intell. Fuzzy Syst. 2020, 39, 8427–8439.
- Chen, X.; Hu, J.; Chen, Z.; Lin, B.; Xiong, N.; Min, G. A reinforcement learning-empowered feedback control system for industrial internet of things. IEEE Trans. Ind. Inform. 2021, 18, 2724–2733.
- Meyes, R.; Tercan, H.; Roggendorf, S.; Thiele, T.; Büscher, C.; Obdenbusch, M.; Brecher, C.; Jeschke, S.; Meisen, T. Motion planning for industrial robots using reinforcement learning. Procedia CIRP 2017, 63, 107–112.
- Orr, J.; Dutta, A. Multi-agent deep reinforcement learning for multi-robot applications: A survey. Sensors 2023, 23, 3625.
- Singh, B.; Kumar, R.; Singh, V.P. Reinforcement learning in robotic applications: A comprehensive survey. Artif. Intell. Rev. 2022, 55, 945–990.
- Wang, Y.; Damani, M.; Wang, P.; Cao, Y.; Sartoretti, G. Distributed reinforcement learning for robot teams: A review. Curr. Robot. Rep. 2022, 3, 239–257.
- Noaeen, M.; Naik, A.; Goodman, L.; Crebo, J.; Abrar, T.; Abad, Z.S.H.; Bazzan, A.L.; Far, B. Reinforcement learning in urban network traffic signal control: A systematic literature review. Expert Syst. Appl. 2022, 199, 116830.
- Xiao, Y.; Liu, J.; Wu, J.; Ansari, N. Leveraging deep reinforcement learning for traffic engineering: A survey. IEEE Commun. Surv. Tutorials 2021, 23, 2064–2097.
- Borges, F.d.S.P.; Fonseca, A.P.; Garcia, R.C. Deep reinforcement learning model to mitigate congestion in real-time traffic light networks. Infrastructures 2021, 6, 138.
- Lai, X.; Yang, Z.; Xie, J.; Liu, Y. Reinforcement learning in transportation research: Frontiers and future directions. Multimodal Transp. 2024, 100164.
- Wang, B.; He, Z.; Sheng, J.; Chen, Y. Deep reinforcement learning for traffic light timing optimization. Processes 2022, 10, 2458.
- Yau, K.L.A.; Qadir, J.; Khoo, H.L.; Ling, M.H.; Komisarczuk, P. A survey on reinforcement learning models and algorithms for traffic signal control. ACM Comput. Surv. (CSUR) 2017, 50, 1–38.
- Miletić, M.; Ivanjko, E.; Gregurić, M.; Kušić, K. A review of reinforcement learning applications in adaptive traffic signal control. IET Intell. Transp. Syst. 2022, 16, 1269–1285.
- Zhao, H.; Dong, C.; Cao, J.; Chen, Q. A survey on deep reinforcement learning approaches for traffic signal control. Eng. Appl. Artif. Intell. 2024, 133, 108100.
- Tang, K.; Boltze, M.; Nakamura, H.; Tian, Z. Global Practices on Road Traffic Signal Control: Fixed-Time Control at Isolated Intersections; Elsevier: Amsterdam, The Netherlands, 2019.
- Muralidharan, A.; Pedarsani, R.; Varaiya, P. Analysis of fixed-time control. Transp. Res. Part B Methodol. 2015, 73, 81–90.
- Liu, Y.; Li, H.; Lu, R.; Zuo, Z.; Li, X. An overview of finite/fixed-time control and its application in engineering systems. IEEE/CAA J. Autom. Sin. 2022, 9, 2106–2120.
- Ferrara, A.; Sacone, S.; Siri, S. An overview of traffic control schemes for freeway systems. In Freeway Traffic Modelling and Control; Springer: Berlin/Heidelberg, Germany, 2018; pp. 193–234.
- Kalašová, A.; Hájnik, A.; Kubal’ák, S.; Beňuš, J.; Harantová, V. The impact of actuated control on the environment and the traffic flow. J. Appl. Eng. Sci. 2022, 20, 305–314.
- Dabiri, S.; Kompany, K.; Abbas, M. Introducing a cost-effective approach for improving the arterial traffic performance operating under the semi-actuated coordinated signal control. Transp. Res. Rec. 2018, 2672, 71–80.
- Zheng, X.; Recker, W.; Chu, L. Optimization of control parameters for adaptive traffic-actuated signal control. J. Intell. Transp. Syst. 2010, 14, 95–108.
- Lin, F.B. Knowledge base on semi-actuated traffic-signal control. J. Transp. Eng. 1991, 117, 398–417.
- Xu, H.; Zhang, K.; Zhang, D.; Zheng, Q. Traffic-responsive control technique for fully-actuated coordinated signals. IEEE Trans. Intell. Transp. Syst. 2021, 23, 5460–5469.
- Mannion, P.; Duggan, J.; Howley, E. An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In Autonomic Road Transport Support Systems; Springer: Berlin/Heidelberg, Germany, 2016; pp. 47–66.
- Van den Berg, M.; Hegyi, A.; De Schutter, B.; Hellendoorn, H. Integrated traffic control for mixed urban and freeway networks: A model predictive control approach. Eur. J. Transp. Infrastruct. Res. 2007, 7.
- Mitsakis, E.; Salanova, J.M.; Giannopoulos, G. Combined dynamic traffic assignment and urban traffic control models. Procedia-Soc. Behav. Sci. 2011, 20, 427–436.
- Abdelghany, K.F.; Valdes, D.M.; Abdelfatah, A.S.; Mahmassani, H.S. Real-time dynamic traffic assignment and path-based signal coordination; application to network traffic management. Transp. Res. Rec. 1999, 1667, 67–76.
- Ashtiani, F.; Fayazi, S.A.; Vahidi, A. Multi-intersection traffic management for autonomous vehicles via distributed mixed integer linear programming. In Proceedings of the 2018 Annual American Control Conference (ACC), Milwaukee, WI, USA, 27–29 June 2018; pp. 6341–6346.
- Fayazi, S.A.; Vahidi, A. Mixed-integer linear programming for optimal scheduling of autonomous vehicle intersection crossing. IEEE Trans. Intell. Veh. 2018, 3, 287–299.
- Ng, K.M.; Reaz, M.B.I.; Ali, M.A.M. A review on the applications of Petri nets in modeling, analysis, and control of urban traffic. IEEE Trans. Intell. Transp. Syst. 2013, 14, 858–870.
- Papadopoulos, G.; Bastas, A.; Vouros, G.A.; Crook, I.; Andrienko, N.; Andrienko, G.; Cordero, J.M. Deep reinforcement learning in service of air traffic controllers to resolve tactical conflicts. Expert Syst. Appl. 2024, 236, 121234.
- Shashi, F.I.; Sultan, S.M.; Khatun, A.; Sultana, T.; Alam, T. A study on deep reinforcement learning based traffic signal control for mitigating traffic congestion. In Proceedings of the 2021 IEEE 3rd Eurasia Conference on Biomedical Engineering, Healthcare and Sustainability (ECBIOS), Tainan, Taiwan, 28–30 May 2021; pp. 288–291.
- Koukol, M.; Zajíčková, L.; Marek, L.; Tuček, P. Fuzzy logic in traffic engineering: A review on signal control. Math. Probl. Eng. 2015, 2015, 979160.
- Mohanaselvi, S.; Shanpriya, B. Application of fuzzy logic to control traffic signals. AIP Conf. Proc. 2019, 2112, 020045.
- Mao, T.; Mihăită, A.S.; Chen, F.; Vu, H.L. Boosted genetic algorithm using machine learning for traffic control optimization. IEEE Trans. Intell. Transp. Syst. 2021, 23, 7112–7141.
- Lee, J.; Abdulhai, B.; Shalaby, A.; Chung, E.H. Real-time optimization for adaptive traffic signal control using genetic algorithms. J. Intell. Transp. Syst. 2005, 9, 111–122.
- Wu, B.; Wang, D. Traffic signal networks control optimize with PSO algorithm. In Proceedings of the 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), Changsha, China, 13–15 August 2016; pp. 230–234.
- Zhao, H.; Han, G.; Niu, X. The signal control optimization of road intersections with slow traffic based on improved PSO. Mob. Netw. Appl. 2020, 25, 623–631.
- Garcia-Nieto, J.; Olivera, A.C.; Alba, E. Optimal cycle program of traffic lights with particle swarm optimization. IEEE Trans. Evol. Comput. 2013, 17, 823–839.
- Li, D.; De Schutter, B. Distributed model-free adaptive predictive control for urban traffic networks. IEEE Trans. Control. Syst. Technol. 2021, 30, 180–192.
- Michailidis, P.; Michailidis, I.; Kosmatopoulos, E. Reinforcement Learning for Optimizing Renewable Energy Utilization in Buildings: A Review on Applications and Innovations. Energies 2025, 18, 1724.
- Wang, Y.; Yang, X.; Liang, H.; Liu, Y. A review of the self-adaptive traffic signal control system based on future traffic environment. J. Adv. Transp. 2018, 2018, 1096123.
- Mohammadi, P.; Nasiri, A.; Darshi, R.; Shirzad, A.; Abdollahipour, R. Achieving Cost Efficiency in Cloud Data Centers Through Model-Free Q-Learning. In Proceedings of the International Conference on Electrical and Electronics Engineering, Melbourne, Australia, 11–12 September 2024; pp. 457–468.
- Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Zeng, J.; Hu, J.; Zhang, Y. Adaptive traffic signal control with deep recurrent Q-learning. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 1215–1220.
- Ducrocq, R.; Farhi, N. Deep reinforcement Q-learning for intelligent traffic signal control with partial detection. Int. J. Intell. Transp. Syst. Res. 2023, 21, 192–206.
- Guerrero-Ibanez, A.; Contreras-Castillo, J.; Buenrostro, R.; Marti, A.B.; Muñoz, A.R. A policy-based multi-agent management approach for intelligent traffic-light control. In Proceedings of the 2010 IEEE Intelligent Vehicles Symposium, La Jolla, CA, USA, 21–24 June 2010; pp. 694–699.
- Li, Y.; He, J.; Gao, Y. Intelligent traffic signal control with deep reinforcement learning at single intersection. In Proceedings of the 2021 7th International Conference on Computing and Artificial Intelligence, Tianjin, China, 23–26 April 2021; pp. 399–406.
- Huang, L.; Qu, X. Improving traffic signal control operations using proximal policy optimization. IET Intell. Transp. Syst. 2023, 17, 592–605.
- Wei, H.; Liu, X.; Mashayekhy, L.; Decker, K. Mixed-autonomy traffic control with proximal policy optimization. In Proceedings of the 2019 IEEE Vehicular Networking Conference (VNC), Los Angeles, CA, USA, 4–6 December 2019; pp. 1–8.
- Mao, F.; Li, Z.; Lin, Y.; Li, L. Mastering arterial traffic signal control with multi-agent attention-based soft actor-critic model. IEEE Trans. Intell. Transp. Syst. 2022, 24, 3129–3144.
- Lillicrap, T. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Ding, F.; Ma, G.; Chen, Z.; Gao, J.; Li, P. Averaged Soft Actor-Critic for Deep Reinforcement Learning. Complexity 2021, 2021, 6658724.
- Gregurić, M.; Vujić, M.; Alexopoulos, C.; Miletić, M. Application of deep reinforcement learning in traffic signal control: An overview and impact of open traffic data. Appl. Sci. 2020, 10, 4011.
- Ayeelyan, J.; Lee, G.H.; Hsu, H.C.; Hsiung, P.A. Advantage actor-critic for autonomous intersection management. Vehicles 2022, 4, 1391–1412.
- Li, L.; Lv, Y.; Wang, F.Y. Traffic signal timing via deep reinforcement learning. IEEE/CAA J. Autom. Sin. 2016, 3, 247–254.
- Nishi, T.; Otaki, K.; Hayakawa, K.; Yoshimura, T. Traffic signal control based on reinforcement learning with graph convolutional neural nets. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 877–883.
- Jin, J.; Ma, X. Hierarchical multi-agent control of traffic lights based on collective learning. Eng. Appl. Artif. Intell. 2018, 68, 236–248.
- Aziz, H.A.; Zhu, F.; Ukkusuri, S.V. Learning-based traffic signal control algorithms with neighborhood information sharing: An application for sustainable mobility. J. Intell. Transp. Syst. 2018, 22, 40–52.
- Guo, M.; Wang, P.; Chan, C.Y.; Askary, S. A reinforcement learning approach for intelligent traffic signal control at urban intersections. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 4242–4247.
- Liang, X.; Du, X.; Wang, G.; Han, Z. A deep q learning network for traffic lights’ cycle control in vehicular networks. IEEE Trans. Veh. Technol. 2019, 68, 1243–1253.
- Tan, T.; Bao, F.; Deng, Y.; Jin, A.; Dai, Q.; Wang, J. Cooperative deep reinforcement learning for large-scale traffic grid signal control. IEEE Trans. Cybern. 2019, 50, 2687–2700.
- Zhang, R.; Ishikawa, A.; Wang, W.; Striner, B.; Tonguz, O.K. Using reinforcement learning with partial vehicle detection for intelligent traffic signal control. IEEE Trans. Intell. Transp. Syst. 2020, 22, 404–415.
- Zhou, P.; Chen, X.; Liu, Z.; Braud, T.; Hui, P.; Kangasharju, J. DRLE: Decentralized reinforcement learning at the edge for traffic light control in the IoV. IEEE Trans. Intell. Transp. Syst. 2020, 22, 2262–2273.
- Zang, X.; Yao, H.; Zheng, G.; Xu, N.; Xu, K.; Li, Z. Metalight: Value-based meta-reinforcement learning for traffic signal control. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 1153–1160.
- Rasheed, F.; Yau, K.L.A.; Noor, R.M.; Wu, C.; Low, Y.C. Deep reinforcement learning for traffic signal control: A review. IEEE Access 2020, 8, 208016–208044.
- Padakandla, S.; KJ, P.; Bhatnagar, S. Reinforcement learning algorithm for non-stationary environments. Appl. Intell. 2020, 50, 3590–3606.
- Chen, C.; Wei, H.; Xu, N.; Zheng, G.; Yang, M.; Xiong, Y.; Xu, K.; Li, Z. Toward a thousand lights: Decentralized deep reinforcement learning for large-scale traffic signal control. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3414–3421.
- Essa, M.; Sayed, T. Self-learning adaptive traffic signal control for real-time safety optimization. Accid. Anal. Prev. 2020, 146, 105713.
- Gong, Y.; Abdel-Aty, M.; Yuan, J.; Cai, Q. Multi-objective reinforcement learning approach for improving safety at intersections with adaptive traffic signal control. Accid. Anal. Prev. 2020, 144, 105655.
- Joo, H.; Ahmed, S.H.; Lim, Y. Traffic signal control for smart cities using reinforcement learning. Comput. Commun. 2020, 154, 324–330.
- Wang, X.; Ke, L.; Qiao, Z.; Chai, X. Large-scale traffic signal control using a novel multiagent reinforcement learning. IEEE Trans. Cybern. 2020, 51, 174–187.
- Xu, M.; Wu, J.; Huang, L.; Zhou, R.; Wang, T.; Hu, D. Network-wide traffic signal control based on the discovery of critical nodes and deep reinforcement learning. J. Intell. Transp. Syst. 2020, 24, 1–10.
- Devailly, F.X.; Larocque, D.; Charlin, L. IG-RL: Inductive graph reinforcement learning for massive-scale traffic signal control. IEEE Trans. Intell. Transp. Syst. 2021, 23, 7496–7507.
- Mushtaq, A.; Haq, I.U.; Imtiaz, M.U.; Khan, A.; Shafiq, O. Traffic flow management of autonomous vehicles using deep reinforcement learning and smart rerouting. IEEE Access 2021, 9, 51005–51019.
- Wang, T.; Cao, J.; Hussain, A. Adaptive Traffic Signal Control for large-scale scenario with Cooperative Group-based Multi-agent reinforcement learning. Transp. Res. Part C Emerg. Technol. 2021, 125, 103046.
- Haddad, T.A.; Hedjazi, D.; Aouag, S. A deep reinforcement learning-based cooperative approach for multi-intersection traffic signal control. Eng. Appl. Artif. Intell. 2022, 114, 105019.
- Kolat, M.; Kővári, B.; Bécsi, T.; Aradi, S. Multi-agent reinforcement learning for traffic signal control: A cooperative approach. Sustainability 2023, 15, 3479.
- Bouktif, S.; Cheniki, A.; Ouni, A.; El-Sayed, H. Deep reinforcement learning for traffic signal control with consistent state and reward design approach. Knowl.-Based Syst. 2023, 267, 110440.
- Yazdani, M.; Sarvi, M.; Bagloee, S.A.; Nassir, N.; Price, J.; Parineh, H. Intelligent vehicle pedestrian light (IVPL): A deep reinforcement learning approach for traffic signal control. Transp. Res. Part C Emerg. Technol. 2023, 149, 103991. [Google Scholar] [CrossRef]
- Su, Z.; Chow, A.H.; Fang, C.; Liang, E.; Zhong, R. Hierarchical control for stochastic network traffic with reinforcement learning. Transp. Res. Part B Methodol. 2023, 167, 196–216. [Google Scholar] [CrossRef]
- Du, W.; Ye, J.; Gu, J.; Li, J.; Wei, H.; Wang, G. Safelight: A reinforcement learning method toward collision-free traffic signal control. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 14801–14810. [Google Scholar]
- Li, D.; Zhu, F.; Wu, J.; Wong, Y.D.; Chen, T. Managing mixed traffic at signalized intersections: An adaptive signal control and CAV coordination system based on deep reinforcement learning. Expert Syst. Appl. 2024, 238, 121959. [Google Scholar] [CrossRef]
- Vieira, M.A.; Galvão, G.; Vieira, M.; Louro, P.; Vestias, M.; Vieira, P. Enhancing urban intersection efficiency: Visible light communication and learning-based control for traffic signal optimization and vehicle management. Symmetry 2024, 16, 240. [Google Scholar] [CrossRef]
- Zhang, G.; Chang, F.; Jin, J.; Yang, F.; Huang, H. Multi-objective deep reinforcement learning approach for adaptive traffic signal control system with concurrent optimization of safety, efficiency, and decarbonization at intersections. Accid. Anal. Prev. 2024, 199, 107451. [Google Scholar] [CrossRef]
- Mousavi, S.S.; Schukat, M.; Howley, E. Traffic light control using deep policy-gradient and value-function-based reinforcement learning. IET Intell. Transp. Syst. 2017, 11, 417–423. [Google Scholar] [CrossRef]
- Garg, D.; Chli, M.; Vogiatzis, G. Deep reinforcement learning for autonomous traffic light control. In Proceedings of the 2018 3rd IEEE International Conference on Intelligent Transportation Engineering (ICITE), Singapore, 3–5 September 2018; pp. 214–218. [Google Scholar]
- Oroojlooy, A.; Nazari, M.; Hajinezhad, D.; Silva, J. Attendlight: Universal attention-based reinforcement learning model for traffic signal control. Adv. Neural Inf. Process. Syst. 2020, 33, 4079–4090. [Google Scholar]
- Guo, J.; Cheng, L.; Wang, S. CoTV: Cooperative control for traffic light signals and connected autonomous vehicles using deep reinforcement learning. IEEE Trans. Intell. Transp. Syst. 2023, 24, 10501–10512. [Google Scholar] [CrossRef]
- Aslani, M.; Mesgari, M.S.; Wiering, M. Adaptive traffic signal control with actor-critic methods in a real-world traffic network with different traffic disruption events. Transp. Res. Part C Emerg. Technol. 2017, 85, 732–752. [Google Scholar] [CrossRef]
- Chu, T.; Wang, J.; Codecà, L.; Li, Z. Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1086–1095. [Google Scholar] [CrossRef]
- Li, Z.; Yu, H.; Zhang, G.; Dong, S.; Xu, C.Z. Network-wide traffic signal control optimization using a multi-agent deep reinforcement learning. Transp. Res. Part C Emerg. Technol. 2021, 125, 103059. [Google Scholar] [CrossRef]
- Li, Z.; Xu, C.; Zhang, G. A deep reinforcement learning approach for traffic signal control optimization. arXiv 2021, arXiv:2107.06115. [Google Scholar]
- Guo, Y.; Ma, J. DRL-TP3: A learning and control framework for signalized intersections with mixed connected automated traffic. Transp. Res. Part C Emerg. Technol. 2021, 132, 103416. [Google Scholar] [CrossRef]
- Yang, S.; Yang, B.; Kang, Z.; Deng, L. IHG-MA: Inductive heterogeneous graph multi-agent reinforcement learning for multi-intersection traffic signal control. Neural Netw. 2021, 139, 265–277. [Google Scholar] [CrossRef] [PubMed]
- Damadam, S.; Zourbakhsh, M.; Javidan, R.; Faroughi, A. An intelligent IoT based traffic light management system: Deep reinforcement learning. Smart Cities 2022, 5, 1293–1311. [Google Scholar] [CrossRef]
- Wu, Q.; Wu, J.; Shen, J.; Du, B.; Telikani, A.; Fahmideh, M.; Liang, C. Distributed agent-based deep reinforcement learning for large scale traffic signal control. Knowl.-Based Syst. 2022, 241, 108304. [Google Scholar] [CrossRef]
- Mo, Z.; Li, W.; Fu, Y.; Ruan, K.; Di, X. CVLight: Decentralized learning for adaptive traffic signal control with connected vehicles. Transp. Res. Part C Emerg. Technol. 2022, 141, 103728. [Google Scholar] [CrossRef]
- Su, H.; Zhong, Y.D.; Chow, J.Y.; Dey, B.; Jin, L. EMVLight: A multi-agent reinforcement learning framework for an emergency vehicle decentralized routing and traffic signal control system. Transp. Res. Part C Emerg. Technol. 2023, 146, 103955. [Google Scholar] [CrossRef]
- Darmoul, S.; Elkosantini, S.; Louati, A.; Said, L.B. Multi-agent immune networks to control interrupted flow at signalized intersections. Transp. Res. Part C Emerg. Technol. 2017, 82, 290–313. [Google Scholar] [CrossRef]
- Wei, H.; Xu, N.; Zhang, H.; Zheng, G.; Zang, X.; Chen, C.; Zhang, W.; Zhu, Y.; Xu, K.; Li, Z. Colight: Learning network-level cooperation for traffic signal control. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1913–1922. [Google Scholar]
- Kim, D.; Jeong, O. Cooperative traffic signal control with traffic flow prediction in multi-intersection. Sensors 2019, 20, 137. [Google Scholar] [CrossRef]
- Wu, T.; Zhou, P.; Liu, K.; Yuan, Y.; Wang, X.; Huang, H.; Wu, D.O. Multi-agent deep reinforcement learning for urban traffic light control in vehicular networks. IEEE Trans. Veh. Technol. 2020, 69, 8243–8256. [Google Scholar] [CrossRef]
- Xu, B.; Wang, Y.; Wang, Z.; Jia, H.; Lu, Z. Hierarchically and cooperatively learning traffic signal control. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 669–677. [Google Scholar]
- Kumar, N.; Rahman, S.S.; Dhakad, N. Fuzzy inference enabled deep reinforcement learning-based traffic light control for intelligent transportation system. IEEE Trans. Intell. Transp. Syst. 2020, 22, 4919–4928. [Google Scholar] [CrossRef]
- Wang, M.; Wu, L.; Li, J.; He, L. Traffic signal control with reinforcement learning based on region-aware cooperative strategy. IEEE Trans. Intell. Transp. Syst. 2021, 23, 6774–6785. [Google Scholar] [CrossRef]
- Abdoos, M.; Bazzan, A.L. Hierarchical traffic signal optimization using reinforcement learning and traffic prediction with long-short term memory. Expert Syst. Appl. 2021, 171, 114580. [Google Scholar] [CrossRef]
- Guo, G.; Wang, Y. An integrated MPC and deep reinforcement learning approach to trams-priority active signal control. Control Eng. Pract. 2021, 110, 104758. [Google Scholar] [CrossRef]
- Wang, M.; Wu, L.; Li, M.; Wu, D.; Shi, X.; Ma, C. Meta-learning based spatial-temporal graph attention network for traffic signal control. Knowl.-Based Syst. 2022, 250, 109166. [Google Scholar] [CrossRef]
- Tunc, I.; Soylemez, M.T. Fuzzy logic and deep Q learning based control for traffic lights. Alex. Eng. J. 2023, 67, 343–359. [Google Scholar] [CrossRef]
- Wang, T.; Zhu, Z.; Zhang, J.; Tian, J.; Zhang, W. A large-scale traffic signal control algorithm based on multi-layer graph deep reinforcement learning. Transp. Res. Part C Emerg. Technol. 2024, 162, 104582. [Google Scholar] [CrossRef]
- Krajzewicz, D. Traffic simulation with SUMO–simulation of urban mobility. In Fundamentals of Traffic Simulation; Springer: Berlin/Heidelberg, Germany, 2010; pp. 269–293. [Google Scholar]
- Fellendorf, M.; Vortisch, P. Microscopic traffic flow simulator VISSIM. In Fundamentals of Traffic Simulation; Springer: Berlin/Heidelberg, Germany, 2010; pp. 63–93. [Google Scholar]
- Casas, J.; Ferrer, J.L.; Garcia, D.; Perarnau, J.; Torday, A. Traffic simulation with aimsun. In Fundamentals of Traffic Simulation; Springer: Berlin/Heidelberg, Germany, 2010; pp. 173–232. [Google Scholar]
- Zhang, H.; Feng, S.; Liu, C.; Ding, Y.; Zhu, Y.; Zhou, Z.; Zhang, W.; Yu, Y.; Jin, H.; Li, Z. Cityflow: A multi-agent reinforcement learning environment for large scale city traffic scenario. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 3620–3624. [Google Scholar]
- Sykes, P. Traffic simulation with paramics. In Fundamentals of Traffic Simulation; Springer: Berlin/Heidelberg, Germany, 2010; pp. 131–171. [Google Scholar]
- Messaoudi, F.; Simon, G.; Ksentini, A. Dissecting games engines: The case of Unity3D. In Proceedings of the 2015 International Workshop on Network and Systems Support for Games (NetGames), Zagreb, Croatia, 3–4 December 2015; pp. 1–6. [Google Scholar]

Evaluation Aspect | [57] | [58] | [59] | [60] | Current |
---|---|---|---|---|---|
RL Methodology Types | x | x | x | x | x |
Agent Architectures | | | x | x | x |
Reward Function Analysis | | x | x | x | x |
Baseline Control Approaches | | x | x | x | x |
Performance Indexes | x | x | x | x | x |
Intersection Size | | | x | x | x |
Practical Applicability | | | | x | x |
Simulation Environments | | x | x | x | x |
Statistical Analysis | | | | | x |
Existing Challenges | x | x | x | x | x |
Trend Identification | | | x | x | x |
Future Directions | | | x | x | x |
Systematic Evaluation | No | No | Small | Medium | Large |

Ref. | Year | Method | Agent | Baseline | City Network | Junctions | Simulation | Data | Cit. |
---|---|---|---|---|---|---|---|---|---|
[103] | 2016 | DQN | Single | QL | Synthetic | 1 | PARAMICS | Sim | 482 |
[104] | 2018 | NFQI | Single | NFQI | Synthetic | 6 | SUMO | Sim | 119 |
[105] | 2018 | SARSA | Multi | AC | Stockholm (SE) | 3 | SUMO | Sim | 59 |
[106] | 2018 | RMART | Multi | FT/QL/SARSA | Tippecanoe (US) | 18 | Vissim | Sim | 50 |
[107] | 2019 | DQN | Single | FT/Other | Synthetic | 1 | SUMO | Sim | 60 |
[108] | 2019 | DQN | Single | FT/DQN | Synthetic | 1 | SUMO | Sim | 429 |
[109] | 2019 | DQN | Multi | NAQL/RBC | Synthetic | 24 | SUMO | Sim | 160 |
[110] | 2020 | DQN | Single | FT/AC | Luxembourg (LU) | 1 | SUMO | Sim | 100 |
[111] | 2020 | DQN/dDQN | Multi | FT/AC/PPO/DQN | Synthetic | 25/100/225 | SUMO | Sim | 51 |
[112] | 2020 | DQN | Single | FT/RL/SOTL | Hangzhou (CN) | 24 | CityFlow | Real | 128 |
[113] | 2020 | DQN | Multi | FT/QL | Sunway (MY) | 7 | SUMO | Sim | 57 |
[114] | 2020 | Context QL | Single | QL/RUQL | Synthetic | 1 | Vissim | Sim | 75 |
[115] | 2020 | DQN | Multi | FT/FRAP | Manhattan (US) | 2510 | CityFlow | Both | 262 |
[116] | 2020 | QL | Single | AC | Surrey (CA) | 2 | Vissim | Real | 50 |
[117] | 2020 | 3DQN | Single | AC/Other RL | Seminole (US) | 1 | Aimsun | Real | 45 |
[118] | 2020 | QL | Single | QL | Synthetic | 1 | SUMO | Sim | 76 |
[119] | 2020 | Co-DQL | Multi | RL | Qingdao (CN) | 47/57 | SUMO | Sim | 120 |
[120] | 2020 | DRQN | Multi | FT/QL/DQN | Synthetic | 20/50/100 | SUMO | Sim | 59 |
[121] | 2021 | IQL | Multi | FT/IQL/RBC | Manhattan (US) | 3971 | SUMO | Sim | 50 |
[122] | 2021 | DQN | Single | FT | Islamabad (PK) | 1 | SUMO | Sim | 41 |
[123] | 2021 | CGB-MAQL | Multi | FT/Other RL | Hangzhou (CN) | 16 | CityFlow | Real | 95 |
[124] | 2022 | DQN | Multi | DQN/Other RL | Synthetic | 4/6 | SUMO | Sim | 37 |
[125] | 2023 | QL | Multi | FT/AC | Synthetic | 6 | SUMO | Sim | 26 |
[126] | 2023 | DDQN | Single | FT/PS/Other | Synthetic | 1 | SUMO | Sim | 45 |
[127] | 2023 | DDQN | Single | AC | Melbourne (AU) | 1 | Vissim | Real | 30 |
[128] | 2023 | DQN | Single | FT/Other | Bloomsbury (UK) | 25/15 | SUMO | Sim | 25 |
[129] | 2023 | 3DQN | Single | FT/Other | Cologne (DE) | 1 | SUMO | Both | 21 |
[130] | 2024 | DQN | Multi | FT/DQN/RBC | Synthetic | 1 | SUMO | Sim | 28 |
[131] | 2024 | DQN | Multi | FT | Lisbon (PT) | 2 | SUMO | Sim | 10 |
[132] | 2024 | D3QN | Single | FT/AC/D3QN/Other | Changsha (CN) | 1 | SUMO | Real | 16 |
Author | Summary |
---|---|
Liang et al. [108] | Introduced a single-agent DQN framework where intersections were modeled as grids with vehicle position and speed as state inputs. Actions adjusted signal phase duration in 5 s increments, and the reward was defined as the change in cumulative waiting time between cycles. The model combined dueling and double Q-learning with prioritized experience replay (a minimal sketch of this value-based update loop follows the table). SUMO simulations showed a 26.7% reduction in average waiting time (AWT) compared to FT, ATSC, and standard DQN. |
Tan et al. [109] | Proposed CODER, a hierarchical MARL system with regional agents using DQN for local control and a global agent for coordination. The state was based on queue length per direction, and actions toggled NS/EW green lights. Reward penalized high queue length and sparse movement. In a 4 × 6 SUMO network, CODER achieved 30% congestion reduction over linear QL and RBC. |
Zhang et al. [110] | Developed a DQN adaptive TSC system (PD-ITSC) using low-rate vehicle detection via V2I. The state included detected vehicle speeds, queue length, distance to stop line, and signal phase; actions involved switching or holding phases, while the reward function minimized vehicle delay. Trained in SUMO, it reduced AWT by up to 30% and remained robust even with just 20% detection rates. |
Zhou et al. [111] | Presented DRLE, a distributed DQN/dDQN-based MARL architecture operating at three levels: intersection, intra-edge, and inter-edge servers. Input states included the number of halted vehicles, speed lag, and traffic light status; agents performed binary signal switching, while the reward penalized stops and speed loss. SUMO results showed 65.66% faster convergence vs. PPO. |
Zang et al. [112] | Proposed MetaLight, a meta-RL system combining MAML and FRAP++ (DQN-based) to enable quick policy adaptation across networks. The agent used lane-based queue length as its state and selected traffic light phases, with the reward focused on queue length minimization. Evaluated in CityFlow on four real-world datasets, MetaLight reduced travel time by up to 22.57%. |
Rasheed et al. [113] | Developed a MARL DQN for disturbed environments integrating weather data and red timings in the state. Agents coordinated to dynamically adapt phase splits, with reward based on reduced wait and queue length. Trained in SUMO, it reduced queue by 75%, wait time by 70%, and increased throughput by 70% over baselines. |
Padakandla et al. [114] | Introduced Context QL to adapt to non-stationary traffic by integrating change detection into a QL agent. The state included queue length and signal phase, while actions modified green durations. Reward optimized flow under shifting conditions. In VISSIM, it reduced queue length by 21.4% vs. QL and 45.3% vs. RUQL. |
Joo et al. [118] | Presented a single-agent QL framework using queue length and throughput as input and selecting from three fixed phases. The reward minimized queue length deviation and optimized vehicle throughput. Trained in SUMO across 3- to 6-way intersections, the agent reduced queue length by 63% and waiting time by 40% vs. basic QL. |
Chen et al. [115] | Proposed MPLight, a scalable DQN-based MARL with parameter sharing for large-scale urban networks. Agents used queue length as state and chose among eight phases; the reward was pressure-based. Trained on 2510 real-world intersections in CityFlow, MPLight reduced travel time by 19.2% and increased throughput by 3.08% vs. GCN, FRAP, and FT. |
Essa et al. [116] | Introduced a QL approach for adaptive signal control focused on safety using CV data. The state captured real-time conflicts, platoon ratio, and shockwaves, with binary actions (green extension/switching). Reward minimized rear-end risks. VISSIM simulations using real data showed a 40% reduction in rear-end conflicts vs. actuated signals. |
Gong et al. [117] | Developed a safety-enhanced ATSC framework using 3DQN combining efficiency and crash risk via a weighted reward. The state included high-res traffic data, and the agent selected signal phases. In AIMSUN, the model reduced delays by 25.93% and crash risk by 8.89% compared to actuated and RL efficiency-only controllers. |
Devailly et al. [121] | Proposed IG-RL, a decentralized MARL using independent QL with GCNs for spatial traffic patterns. The state was a dynamic lane-based graph, with actions for phase switching, and the system transferred learned policies to unseen networks. In SUMO, it reduced trip time by 20–30% and delay by 15–40% across 3971 Manhattan signals. |
Wang et al. [119] | Presented Co-DQL, a decentralized MARL combining IDQL with UCB exploration, mean-field approximation, and reward allocation. States included queue length and waiting time; actions controlled phase sequences. In a 49-intersection SUMO setup, Co-DQL cut delay by 22–38% and increased throughput by 15% vs. DDPG and MA2C. |
Xu et al. [120] | Used a DRQN with LSTM to learn from temporal traffic patterns using flow, speed, and signal status. Actions included phase holding or switching, and reward aimed to reduce delay with fairness. In SUMO’s 100-node network, travel time dropped 23.6% vs. FT and outperformed DQN, QL, and SOTL baselines. |
Mushtaq et al. [122] | Proposed a DQN system for AV networks integrated with smart re-routing. The state included vehicle speed, position, and queue length, while action space modified signal timing. SUMO training demonstrated a 31% congestion reduction vs. FT, showing potential scalability in AV-rich environments. |
Wang et al. [123] | Developed CGB-MAQL, a group-based MARL method with spatial reward discounting, pheromone coordination, and k-NN state inputs. Agents controlled adaptive phase sequences. In SUMO-based Monaco and Harbin city models, the approach reduced waiting times by up to 78.84% vs. FT and IQL. |
Haddad et al. [124] | Introduced a cooperative MADQN with neighbor-informed state/action/reward sharing. Input from real-time sensors drives phase decisions. SUMO simulations on 2 × 2 and 2 × 3 networks showed 26.8% lower AWT, 20.54% shorter average queue length, and 18.25% less CO2 than baseline MARL methods. |
Kolat et al. [125] | Designed a MARL DQN with a custom reward minimizing halting std-dev per intersection. State included halting percentage and intersection ID; actions toggled NS/EW green. SUMO (six nodes) showed a 13% travel time and 11% fuel reduction over baseline control. |
Bouktif et al. [126] | Used DDQN with PER and a standardized state/reward design across vehicle count, queue length, and waiting time. Actions selected among four predefined phases; the reward was the negative of the selected state metric. SUMO results showed up to 8.3% travel time reduction over baselines. |
Yazdani et al. [127] | Developed IVPL, a DDQN-based system for joint vehicle and pedestrian control. State included vehicle occupancy, speed, pedestrian volume, and signal status; reward penalized all user delays. In VISSIM, it reduced total delays by 5%, pedestrian delays by 13%, and vehicle travel time by 9% vs. baselines. |
Su et al. [128] | Integrated DQN-based perimeter control with max-pressure signal control in a hierarchical RL framework. Upper-level DQN regulated inflow, while lower-level MP managed internal flow. In SUMO, throughput increased by 100.5% and waiting time decreased by 49.64% vs. baseline control. |
Du et al. [129] | Presented SafeLight, a safety-aware RL system using 3DQN with multi-objective loss and reward shaping. States included queue length and vehicle position; actions supported cyclic/acyclic controls. In SUMO using real/synthetic data, it cut collisions by 99% and waiting time by 30% vs. FT. |
Li et al. [130] | Proposed a DQN system integrating adaptive signal control with connected and automated vehicles (CAVs) coordination. States represented traffic mix and phase data; CAVs bypassed signals via platooning. SUMO showed travel time reduction of 37.33% and 15.95% less fuel use under high CAV rates. |
Vieira et al. [131] | Used MARL with visible-light-based V2X localization for dynamic vehicle/pedestrian signal timing. States captured queue, location, and waiting information; actions selected phases. In a Lisbon SUMO scenario, the method reduced vehicle wait by 30% and pedestrian delay by 30% vs. FT. |
Zhang et al. [132] | Proposed a D3QN-based multi-objective RL system optimizing safety, delay, and emissions using DTSE state encoding. Actions selected one of four signal phases, while reward balanced conflicts, delay, and CO2. In SUMO, it reduced wait by 85.36%, CO2 by 35.58%, and conflicts by 55.39% vs. baselines. |
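To ground the value-based designs summarized above, the following minimal sketch illustrates the control loop they share: an epsilon-greedy choice over signal phases and a one-step temporal-difference update with a replay buffer and target network. It is an illustrative sketch only; the state size, phase count, and transition layout are assumptions for demonstration, not the configuration of any surveyed study.

```python
# Minimal DQN-style phase-selection loop (illustrative sketch; all sizes,
# names, and the transition layout are assumptions, not a surveyed system).
import random
from collections import deque

import torch
import torch.nn as nn

N_FEATURES, N_PHASES = 8, 4  # assumed per-intersection state size and phases

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FEATURES, 64), nn.ReLU(), nn.Linear(64, N_PHASES))

    def forward(self, x):
        return self.net(x)

q_net, target_net = QNet(), QNet()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)  # stores (state, action, reward, next_state)
GAMMA, EPSILON = 0.95, 0.1

def select_phase(state):
    """Epsilon-greedy choice over signal phases."""
    if random.random() < EPSILON:
        return random.randrange(N_PHASES)
    with torch.no_grad():
        return int(q_net(torch.tensor(state, dtype=torch.float32)).argmax())

def train_step(batch_size=32):
    """One TD update toward r + gamma * max_a' Q_target(s', a')."""
    if len(replay) < batch_size:
        return
    s, a, r, s2 = zip(*random.sample(replay, batch_size))
    s = torch.tensor(s, dtype=torch.float32)
    s2 = torch.tensor(s2, dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64).unsqueeze(1)
    r = torch.tensor(r, dtype=torch.float32)
    q_sa = q_net(s).gather(1, a).squeeze(1)  # Q(s, a) for taken actions
    with torch.no_grad():
        td_target = r + GAMMA * target_net(s2).max(1).values
    loss = nn.functional.mse_loss(q_sa, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the studies above, the reward would typically be a negated waiting time, queue length, or pressure term, and refinements such as double/dueling heads and prioritized replay (e.g., Liang et al. [108]) modify this same loop rather than replace it.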
Ref. | Year | Method | Agent | Baseline | City Network | Junctions | Simulation | Data | Cit. |
---|---|---|---|---|---|---|---|---|---|
[133] | 2017 | PG | Single | FT/ANN | Synthetic | 1 | SUMO | Sim | 253 |
[134] | 2018 | PG | Single | FT | Synthetic | 1 | Unity3D | Sim | 53 |
[135] | 2020 | REINFORCE | Single | FT/MP/FRAP | Atlanta (US) | 9 | SUMO | Real | 55 |
[136] | 2023 | PPO | Multi | PL/Other | Dublin (IE) | 31 | SUMO | Sim | 37 |
Author | Summary |
---|---|
Mousavi et al. [133] | Proposed a CNN-based policy-gradient (PG) model using raw traffic camera images to output phase-selection probabilities. Trained via A2C, the system avoided queue length instabilities and directly optimized delay reduction. It achieved a 67% drop in cumulative delay and 72% queue length reduction compared to baseline control. |
Garg et al. [134] | Introduced a vision-based PG method using pixel-level image input to dynamically control red/green phases. The single-agent RL framework optimized traffic throughput and was trained in a custom Unity3D traffic simulator. It performed comparably to FT signals with expected benefits in non-stationary traffic. |
Oroojlooy et al. [135] | Developed AttendLight, a REINFORCE-based PG system with attention modules allowing generalization across varied intersection layouts (a minimal REINFORCE update is sketched after this table). Input states reflected lane/phase configurations, and the reward minimized intersection pressure. Trained in CityFlow, it reduced travel time by 46% vs. FT and 9% over FRAP. |
Guo et al. [136] | Proposed CoTV, a PPO-based MARL for coordinating traffic lights and CAVs using real-time V2X data. States captured vehicle flow and signal status; binary actions switched phases. SUMO results from Dublin scenarios showed 30% shorter travel time and 28% lower CO2 emissions vs. PressLight and actuated control. |
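Unlike the value-based methods, the policy-gradient studies above optimize the phase-selection policy directly. A minimal REINFORCE update over one finished episode, with a simple return-normalization baseline, might look as follows; the feature and phase counts and the episode format are assumptions for illustration.

```python
# Illustrative REINFORCE (Monte-Carlo policy-gradient) update for phase
# selection; all sizes and names are assumptions, not a surveyed API.
import torch
import torch.nn as nn

N_FEATURES, N_PHASES = 8, 4
policy = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(),
                       nn.Linear(64, N_PHASES))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(states, actions, rewards, gamma=0.99):
    """One policy-gradient step over a finished episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):             # discounted returns, back to front
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # baseline
    logits = policy(torch.tensor(states, dtype=torch.float32))
    log_probs = torch.log_softmax(logits, dim=1)
    chosen = log_probs[range(len(actions)), actions]  # log pi(a_t | s_t)
    loss = -(chosen * returns).mean()       # ascend the expected return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

PPO-based designs such as CoTV [136] replace this plain gradient with a clipped surrogate objective, which stabilizes training when the policy is updated over many epochs per batch.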
Ref. | Year | Method | Agent | Baseline | City Network | Junctions | Simulation | Data | Cit. |
---|---|---|---|---|---|---|---|---|---|
[137] | 2017 | A-CATs | Single | FT/QL/BQL | Tehran (IR) | 50 | Aimsun | Real | 183 |
[138] | 2019 | MA2C | Multi | QL/A2C | Monaco (MC) | 30 | SUMO | Sim | 653 |
[139] | 2021 | DDPG | Multi | FT/MP/DQN/DDPG | Montgomery (US) | 100 | SUMO | Both | 106 |
[140] | 2021 | DDPG | Multi | FT/DQN/DDPG | Montgomery (US) | N/A | SUMO | Real | 20 |
[141] | 2021 | DDPG | Multi | AC/DP | Synthetic | 1 | Vissim | Sim | 35 |
[142] | 2021 | SAC | Multi | MP/CL/ML/MA2C | Chengdu (CN) | 15/20/121 | SUMO | Sim | 45 |
[143] | 2022 | A2C | Multi | FT | Shiraz (IR) | 6 | SUMO | Real | 38 |
[144] | 2022 | A2C/A3C | Multi | FT/QL/DQN | N/A (CN) | 27 | CityFlow | Real | 49 |
[145] | 2022 | A2C | Multi | DQN/PL/MP | Penn State (US) | 4/4/25 | SUMO | Sim | 38 |
[146] | 2023 | A2C | Multi | FT/PL/MP/CL | Manhattan (US) | 48/16/25 | SUMO | Both | 25 |
Author | Summary |
---|---|
Aslani et al. [137] | Proposed A-CATs, an actor-critic controller using RBF approximation to manage real-world traffic disruptions in Tehran. This framework handled sensor noise and stochastic events better than QL and BQL. Compared to FT control, it reduced travel time by 59.2%, fuel by 19.7%, and emissions by 41.5%. |
Chu et al. [138] | Presented MA2C, a decentralized A2C-based MARL for ATSC, integrating neighbor fingerprints and spatial discounting (a minimal one-step version of the underlying A2C update is sketched after this table). Trained in SUMO on a 5 × 5 grid and the Monaco network, it reduced queue length by 35%, delay by 28%, and congestion peaks by 40% vs. independent A2C and QL. |
Li et al. [140] | Used MADDPG for centralized training with decentralized execution, with global critics and local actors. States included vehicle wait time and volume; actions involved flexible phase switching. Trained in SUMO (Montgomery County), it lowered queue length (0.8 vs. 2.3–9.8), delay (5.2 s vs. 9–21 s), and increased throughput (6118 vs. <5600). |
Li et al. [139] | Developed KS-DDPG, extending MARL DDPG with agent communication for shared state learning. Inputs included traffic volume and signal phase, while actions modified signal timing. SUMO tests on synthetic and real networks showed 28.9% queue length and 35.1% delay reduction vs. FT, MaxPressure, DQN, DDPG, and MADDPG. |
Guo et al. [141] | Introduced DRL-TP3 using DDPG and LSTM to optimize signal timing with CAV trajectory prediction. States included speed, position, and acceleration. In VISSIM, it reduced delay by 89.4% and raised throughput by 25.2% vs. actuated and DP-based methods. |
Yang et al. [142] | Proposed IHG-MA, an SAC-based MARL model using inductive heterogeneous GNNs for state abstraction. TN-HetG captured signal, lane, and vehicle data; actions controlled lane-wise phases. In SUMO, it reduced delay by 30.5%, queue length by 41.8%, and travel time by 29.6% vs. baselines. |
Wu et al. [144] | Developed Nash-A2C and Nash-A3C, integrating Nash equilibrium into A2C/A3C for large-scale traffic control. Input states used queue length, while rewards minimized delay. In CityFlow with 27 intersections, Nash-A3C reduced congestion by 22.1% and network delay by 9.7%. |
Damadam et al. [143] | Used real-time IoT sensor data with A2C in Shiraz City. The fingerprint-based MARL optimized light phase switching to reduce queue length and waiting time. SUMO simulations using real data showed 37.8% queue length and 31.5% wait time reduction vs. baseline controls. |
Mo et al. [145] | Proposed CVLight using Asym-A2C, where the actor used connected vehicle (CV) data and the critic included both CV and non-CV information. States included vehicle count, delay, and phase duration, while the reward concerned intersection pressure. SUMO results showed 6.2 s delay at 10% CV penetration, beating PressLight (6.8 s) and DQN (8.7 s). |
Su et al. [146] | Introduced EMVLight, a multi-agent A2C with policy sharing for emergency vehicle (EMV) routing and general traffic. The agent prioritized pressure-based rewards to favor EMVs while balancing network flow. SUMO with real/synthetic maps showed 42.6% faster EMV travel and 23.5% less overall delay vs. multiple baselines. |
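Common to the actor-critic studies above is the pairing of a policy (actor) with a learned state-value baseline (critic). A minimal one-step advantage actor-critic (A2C) update is sketched below; the feature sizes and single-intersection framing are assumptions, and the decentralized variants (e.g., MA2C [138]) wrap this update with neighbor fingerprints and spatial discounting rather than changing its core.

```python
# Illustrative one-step advantage actor-critic (A2C) update; sizes and the
# single-agent framing are assumptions for demonstration.
import torch
import torch.nn as nn

N_FEATURES, N_PHASES = 8, 4
actor = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(),
                      nn.Linear(64, N_PHASES))
critic = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(),
                       nn.Linear(64, 1))
optimizer = torch.optim.Adam(
    list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

def a2c_step(state, action, reward, next_state, gamma=0.95):
    """Update actor and critic from a single (s, a, r, s') transition."""
    s = torch.tensor(state, dtype=torch.float32)
    s2 = torch.tensor(next_state, dtype=torch.float32)
    v, v2 = critic(s), critic(s2).detach()
    advantage = reward + gamma * v2 - v          # one-step TD advantage
    log_prob = torch.log_softmax(actor(s), dim=-1)[action]
    actor_loss = -log_prob * advantage.detach()  # policy gradient w/ baseline
    critic_loss = advantage.pow(2)               # TD-error regression
    loss = (actor_loss + 0.5 * critic_loss).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

DDPG- and SAC-based designs in the table keep this actor-critic split but use deterministic or entropy-regularized policies, respectively, which suits continuous or soft phase-timing actions.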
Ref. | Year | Method | Agent | Baseline | City Network | Junctions | Simulation | Data | Cit. |
---|---|---|---|---|---|---|---|---|---|
[147] | 2017 | RL/AIN | Multi | FT/Other | Synthetic | 6 | Vissim | Sim | 43 |
[148] | 2019 | DQN/GAT | Multi | MP/GCN | Manhattan (US) | 196/16/12 | CityFlow | Both | 271 |
[149] | 2019 | DQN/LSTM | Multi | QL/DQN | Synthetic | 16 | SUMO | Sim | 60 |
[150] | 2020 | DDPG/LSTM | Multi | FT/DDPG/DQN | Synthetic | 2-16 | SUMO | Sim | 209 |
[151] | 2021 | DQN/PPO | Multi | PL/CL | Jinan (CN) | 12/16/48/33 | CityFlow | Real | 43 |
[152] | 2020 | DQN/FLC | Single | FLC/ANN/RBC | Gwalior (IN) | - | SUMO | Sim | 132 |
[153] | 2021 | A2C/GAT | Multi | A2C/IQL/SOTL | Monaco (MC) | 30 | SUMO | Both | 42 |
[154] | 2021 | QL/LSTM | Multi | FT/QL | Synthetic | 16 | Aimsun | Sim | 50 |
[155] | 2021 | PPO/MPC | Multi | FT/PPO/Other | Dalian (CN) | 9 | SUMO | Both | 38 |
[156] | 2022 | DQN/LSTM | Multi | FT/CL/GAT/Other | Hangzhou (CN) | 16/12/16 | CityFlow | Both | 38 |
[157] | 2023 | DQN/FLC | Single | FT/QL/FLC | Synthetic | 1 | SUMO | Sim | 35 |
[158] | 2024 | DDQN/GAT | Multi | CL/MP/IGRL | Qingdao (CN) | 45/57/12 | SUMO | Sim | 10 |
Author | Summary |
---|---|
Darmoul et al. [147] | Proposed INAMAS, a hierarchical framework combining RL with artificial immune networks for adaptive control. Agents learned phase sequencing offline and adjusted online based on queue length, wait time, and time-of-day states. VISSIM results showed 32.07% and 27.85% delay reduction in high and extreme congestion vs. FT and LQF-MWM. |
Wei et al. [148] | Developed CoLight, a MARL model integrating DQN and graph attention networks (GATs) for spatial-temporal coordination. States included phase and vehicle count per lane; actions selected signal phases. In CityFlow (NY, Jinan, Hangzhou), it reduced travel time by 19.89% vs. MaxPressure and improved efficiency 6.98–11.69%. |
Kim et al. [149] | Enhanced DQN with prioritized experience replay, double/dueling DQN, LSTM traffic prediction, and neighbor sharing in a 4 × 4 SUMO grid. States used vehicle/lane counts and LSTM forecasts; actions controlled green phases. The approach outperformed QL and DQN baselines in delay, waiting time, and queue length under realistic traffic conditions. |
Wu et al. [150] | Used multi-agent R-DDPG and LSTM to improve coordination under partial observability. States included queue length, speed, and pedestrian/bus data; actions toggled binary phase. SUMO simulations (≤16 ints.) reduced queue length by 22.08%, vehicle delay by 9.57%, and pedestrian wait by 19.66%. |
Kumar et al. [152] | Combined DQN with fuzzy inference for single-agent phase switching using vehicle grid state. Reward minimized wait, queue length, and emissions. SUMO results showed 14.21% CO2 reduction, shorter AWT, and higher throughput than fuzzy and NN controllers. |
Xu et al. [151] | Proposed HiLight, a hybrid DQN and PPO hierarchical MARL optimizing sub-policy and high-level control. States included phase and vehicle counts, while actions considered the phase choice. Trained in CityFlow (Jinan, Hangzhou, Manhattan), HiLight reduced travel time 9.6–20.1% and improved throughput 2.5–10.5% vs. CoLight and PressLight. |
Wang et al. [153] | Introduced RACS, a hybrid, A2C and GAT, framework that enhanced coordination via neighboring state attention. Tested on Monaco and 5 × 5 grid, it reduced wait time by 22% vs. IA2C, 14% vs. IQL-DNN, and outperformed other baselines in convergence speed and stability. |
Abdoos et al. [154] | Used a hierarchical hybrid QL and LSTM prediction model for traffic signal coordination in a 16-intersection AIMSUN network. States and actions concerned queue length, delay, and green time, while the reward targeted delay reduction. The approach achieved 23.7% lower delay vs. FT and standard QL control. |
Guo et al. [155] | Proposed PPOMA, combining PPO with MPC for tram priority and general traffic. States included real-time speed/position, while actions adjusted signal timing. SUMO results showed >90% tram stop reduction and lower wait times vs. baseline control. |
Wang et al. [156] | Presented MetaSTGAT, combining GAT, LSTM, and meta-learning with DQN for adaptive control. A meta-learner updated the GAT/LSTM weights dynamically (a minimal sketch of such neighbor attention follows the table). On CityFlow—using synthetic and real data—travel time dropped by 19.3% vs. CoLight and other baselines. |
Tunc et al. [157] | Developed DQ-FLSI, combining DQN for phase and FLC for green duration; states used adaptive lane-based vehicle positions. SUMO results showed average queue length = 8.88 vehicles, delay = 13.32 h, and CO2 = 228.7 kg, outperforming FT, fuzzy, and standard RL controllers. |
Wang et al. [158] | Introduced MGMQ, combining DDQN with GAT and GraphSAGE using multi-layer graphs and action masks for phase control. Tested on SUMO (45 intersections), it reduced delay by 40%, queue length by 26%, and congestion by 35%, outperforming baseline control. |
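Several of the graph-based hybrids above (CoLight [148], MetaSTGAT [156], MGMQ [158]) share one structural idea: each intersection's embedding is refined by attention over its neighbors before Q-values or policies are computed. The following minimal sketch shows such a neighbor-attention block; the embedding size, the residual combination, and all names are assumptions rather than any paper's exact architecture.

```python
# Illustrative neighbor-attention aggregation in the spirit of the GAT-based
# hybrids above; dimensions, names, and the residual form are assumptions.
import torch
import torch.nn as nn

EMBED_DIM = 16  # assumed per-intersection embedding size

class NeighborAttention(nn.Module):
    """Mix a node's embedding with attention-weighted neighbor embeddings."""
    def __init__(self, dim=EMBED_DIM):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, own, neighbors):
        # own: (dim,) embedding of this intersection;
        # neighbors: (n, dim) embeddings of adjacent intersections.
        scores = self.key(neighbors) @ self.query(own) / EMBED_DIM ** 0.5
        weights = torch.softmax(scores, dim=0)        # attention over neighbors
        return own + weights @ self.value(neighbors)  # residual combination

# Dummy usage: one intersection with four neighbors (random embeddings).
attention = NeighborAttention()
own = torch.randn(EMBED_DIM)
neighbors = torch.randn(4, EMBED_DIM)
print(attention(own, neighbors).shape)  # torch.Size([16])
```

In the surveyed systems, the refined embedding would feed a DQN or DDQN head for phase selection, and stacking such layers widens each agent's effective receptive field across the road network.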