Systematic Review

Hybrid Rule-Based and Reinforcement Learning for Urban Signal Control in Developing Cities: A Systematic Literature Review and Practice Recommendations for Indonesia

1 Department of Electrical Engineering, Institut Teknologi Dirgantara Adisutjipto, Yogyakarta 55198, Indonesia
2 Department of Informatics, Institut Teknologi Dirgantara Adisutjipto, Yogyakarta 55198, Indonesia
3 Department of Industrial Engineering, Institut Teknologi Dirgantara Adisutjipto, Yogyakarta 55198, Indonesia
4 Department of Mechanical Engineering, Institut Teknologi Dirgantara Adisutjipto, Yogyakarta 55198, Indonesia
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(19), 10761; https://doi.org/10.3390/app151910761
Submission received: 25 August 2025 / Revised: 27 September 2025 / Accepted: 2 October 2025 / Published: 6 October 2025

Abstract

Hybrid rule-based and reinforcement-learning (RL) signal control is gaining traction for urban coordination by pairing interpretable cycles, splits, and offsets with adaptive, data-driven updates. However, systematic evidence on their architectures, safeguards, and deployment prerequisites remains scarce, motivating this review that maps current hybrid controller designs under corridor coordination. Searches across major databases and arXiv (2000–2025) followed PRISMA guidance; screening is reported in the flow diagram. Eighteen studies were included, nine with quantitative comparisons, spanning simulation and early field pilots. Designs commonly use rule shields, action masking, and bounded adjustments of offsets or splits; effectiveness is assessed via arrivals on green, Purdue Coordination diagrams, delay, and travel time. Across the 18 studies, the majority reported improvements in arrivals on green, delay, travel time, or related coordination metrics compared to fixed-time or actuated baselines, while only a few showed neutral or mixed effects and very few indicated deterioration. These results indicate that hybrid safeguards are generally associated with positive operational gains, especially under heterogeneous traffic conditions. Evidence specific to Indonesia remains limited; this review addresses that gap and offers guidance transferable to other developing-country contexts with similar sensing, connectivity, and institutional constraints. Practical guidance synthesizes sensing choices and fallbacks, controller interfaces, audit trails, and safety interlocks into a deployment checklist, with a staged roadmap for corridor roll-outs. This paper is not only a systematic review but also develops a practice-oriented framework tailored to Indonesian corridors, ensuring that evidence synthesis and practical recommendations are clearly distinguished.

1. Introduction

Urban traffic signal control remains a persistent challenge, particularly in rapidly growing cities where traffic volumes and mixed modes of transportation increase annually. In Indonesia and other developing countries, the large share of motorcycles and lane-free behavior makes conventional strategies less effective. Fixed-time and actuated controllers, while widely deployed, often fail to adapt under these heterogeneous conditions, leading to wasted green time and rising delays [1,2,3].
In parallel, the research community has advanced intelligent traffic control. Reinforcement learning (RL), particularly deep RL and multi-agent RL (MARL), enables the continuous adaptation of signal policies in real time [4,5]. Over the past decade, studies have progressed from single-intersection Q-learning to complex MARL architectures coordinating entire networks [6]. These controllers can reduce delays and stops by learning directly from traffic data, yet their purely data-driven nature raises safety and interpretability concerns [7]. Hybrid approaches, in which rule-based logic (e.g., legal constraints or priority rules) is combined with RL agents, offer a more reliable compromise [8,9,10]. To our knowledge, comprehensive reviews focusing specifically on such hybrid rule-based and MARL signal control, especially on their fit for Indonesia’s heterogeneous traffic, remain limited.
Existing studies often focus on lane-disciplined, detector-rich environments in high-income countries, making transferability uncertain [11,12]. Indonesia’s lane-free streams, powered two-wheeler dominance, and limited sensing infrastructure demand tailored solutions [12,13,14,15]. This review therefore provides the first systematic synthesis of hybrid traffic-signal control with explicit attention to developing-country realities, following PRISMA 2020 guidance [14].
This systematic review aims to address these needs by (a) surveying hybrid traffic signal control (TSC) methods that combine rule-based logic with MARL, (b) assessing how such methods perform under realistic traffic scenarios, and (c) identifying enabling factors (sensing and communications, data standards, and governance) and barriers to their deployment in Indonesia. The review adheres to systematic literature review (SLR) guidance to ensure transparent, reproducible search and analysis [16]. Relevant peer-reviewed studies (2000–2025) were identified through targeted searches of IEEE, Elsevier, MDPI, and other databases using keywords on traffic signals, hybrid control, and reinforcement learning. Inclusion criteria required empirical evaluation in urban traffic settings. This review also considered works on enabling technologies and policy where they relate to intelligent signal control. Although centered on Indonesia, the synthesis is designed to be transferable to other developing-country settings that share similar sensing, connectivity, and institutional constraints.
The contribution is threefold. First, we categorize hybrid architectures by how they retain plan authority in conventional controllers while bounding MARL actions to offset, split, or cycle adjustments. Second, we evaluate reported outcomes across practical performance measures such as arrivals on green (AoG), Purdue Coordination Diagrams (PCD), travel time, and delay. Third, we identify enabling conditions for Indonesian corridors: camera-first sensing, corridor-level communication under local spectrum rules, and open-standards compliance with ATSPM/NTCIP practices. Together, these provide engineers and agencies with a practice-oriented checklist for deploying hybrid designs under constrained but realistic conditions.
In addition to being a systematic review, this paper develops a practice-oriented framework tailored to Indonesian corridors. While the synthesis emphasizes Indonesia, the principles are transferable to other developing countries facing similar heterogeneous traffic and infrastructure constraints. The review follows PRISMA 2020 guidelines and aligns with MDPI’s systematic review standards. The remainder of the paper is structured as follows: Section 2 details the review methods; Section 3 presents the operating context; Section 4 develops the hybrid architecture; Section 5 and Section 6 examine communications and policy considerations; and Section 7 concludes with limitations and implications.

2. Review Methodology

This review followed the PRISMA 2020 guidelines and the SWiM framework to ensure transparency and reproducibility. The methods comprised five main stages: search strategy, eligibility criteria, screening and selection, data extraction and synthesis, and quality appraisal. Each stage is described in the following sub-sections.

2.1. Search Strategy

We developed a comprehensive search strategy covering Scopus, Web of Science, IEEE Xplore, ScienceDirect, SpringerLink, and Google Scholar. To align with the PRISMA 2020 template, records were grouped under “Databases” and “Registers.” Although multiple databases were searched, only one category is shown in the PRISMA diagram under Databases (n = 1), while arXiv preprints were classified under Registers (n = 34). This reflects PRISMA’s reporting structure rather than a true reliance on a single database. Searches combined topical, methodological, and contextual keywords (e.g., “reinforcement learning”, “traffic signal control”, “hybrid”, “offset”), with exact search strings documented in Supplementary Table S1. The search window spanned 2000–2025 and covered studies across multiple regions, including Asia, Africa, Europe, and North America, to ensure a broad geographic scope. While this window captures the majority of relevant hybrid traffic-signal control work, we acknowledge that non-English publications or earlier studies may not have been retrieved. For Google Scholar, the same time window was applied, but only the first 200 results were screened to ensure relevance and reproducibility. The number of records retrieved from each database, together with screening outcomes, is summarized in Table 1.
For reproducibility, we provide one representative search string and execution date here, while the full set is listed in Supplementary Table S1. For example, the Scopus query executed on 16 August 2025 was:
TITLE-ABS-KEY("reinforcement learning" AND "traffic signal" AND "hybrid")
AND (LIMIT-TO(LANGUAGE, "English"))
AND (PUBYEAR > 1999 AND PUBYEAR < 2026)
This query returned 128 records, reduced to 127 after duplicate removal. Equivalent structured queries were applied to other databases, following the same time window and keyword logic.

2.2. Eligibility Criteria

Studies were eligible if they applied RL/MARL with explicit safeguards in traffic-signal control and were transferable to developing-city conditions. Sparse or camera-first sensing was defined as corridors in which video feeds dominated or loop detector coverage was <30% of approaches. Explicit safeguards included at least one of: plan authority (e.g., min-green, fixed splits), action masking, bounded variables (e.g., Δoffset ≤ ½ cycle or ≤10 s per cycle), or prerequisites (AoG/PCD audits, ATSPM logs, interlocks). Dense-detector studies were included only if they still applied bounded RL logic; otherwise, they were excluded. Additional exclusions were purely theoretical work, non-peer-reviewed material, non-English papers without English abstracts, and studies unrelated to signalized urban intersections.

2.3. Screening and Selection

Screening followed the PRISMA 2020 flow. After removing duplicates, titles and abstracts were screened against the eligibility rules, followed by full-text review. Records of inclusion and exclusion at each stage are documented in Supplementary Tables S2 and S3, with reasons for exclusion logged. The overall process is summarized in the PRISMA 2020 flow diagram, as shown in Figure 1. In total, 18 studies met the criteria and were retained for synthesis.
During the screening and selection process, several protocol adjustments and decision rules were required to address ambiguities (e.g., overlapping categories, borderline cases, and duplicate records). These adjustments were documented a priori to ensure transparency and reproducibility, and are summarized in Table 2.

2.4. Data Extraction and Synthesis

Data from included studies were extracted using predefined forms (Supplementary Tables S4 and S5) covering setting, control levers, safeguards, metrics, and outcomes. Narrative synthesis followed the SWiM framework, since heterogeneity in designs and reported indicators precluded meta-analysis.
Effect directions were grouped by control lever (offset, split, cycle), safeguard type (rule shields, masking, bounds, prerequisites), and reported outcomes (AoG, PCD, delay, travel time). The grouping logic applied in the synthesis is illustrated in Figure 2.
Sensitivity subsets also distinguished simulation-only studies from pilots, ensuring comparability across contexts and highlighting where transfer to field implementation has already been attempted.

2.5. Quality Appraisal

Formal tools such as ROBIS or AMSTAR-2 were considered but deemed less appropriate for engineering and transportation systems research, since these tools were developed with clinical, interventional studies in mind [17,18]. Instead, we applied an operational relevance checklist tailored to traffic-signal control studies, focusing on sensing environment, safeguard presence, simulation versus field context, and reporting of ATSPM-related metrics. Each checklist item can be mapped to standard systematic review quality domains: applicability/external validity (sparse vs. dense sensing), study design adequacy (presence of safeguards), indirectness (simulation vs. pilot), and reporting transparency (AoG, PCD, delay, or travel time metrics). The full checklist and its mapping are provided in Supplementary Table S6. This adaptation ensures that the appraisal remains aligned with recognized systematic review practices while capturing the operational constraints specific to traffic-signal control research.

3. Operating Context for Traffic Signal Control in Developing Cities

Corridor signal control in developing cities must operate under limited sensing and intermittent communications, a reality in many low- and middle-income country (LMIC) settings. This section outlines prevalent control typologies, Indonesia-specific constraints, measurement and auditability needs (AoG/PCD), and scenarios where small, bounded residual-offset corrections are most defensible.

3.1. Typology of Signal Control in Practice

Signal control falls into four families: fixed-time, actuated, adaptive/UTC, and centralized ATMS/UTC. They differ mainly in how timings are determined. All rely on three coordination levers: cycle length, phase splits, and offset. Fixed-time (pre-timed) plans may create progression by setting cycle lengths and offsets (e.g., MAXBAND). Actuated control uses loops or cameras to extend/recall phases based on local demand. Adaptive UTC systems (e.g., SCOOT in London, SCATS in Sydney/Dublin) continuously adjust cycle, split, and offset from detector data, while centralized ATMS (e.g., LA’s ATSAC) supervises and retimes entire networks. Table 3 compares these families across example cities and contexts. In summary, implementations vary in automation and scale, but coordination always returns to the cycle–split–offset levers.
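The three coordination levers can be pictured as a small data structure. The sketch below is illustrative only: the class name `TimingPlan`, its fields, and the single-ring assumption (splits summing to the cycle) are hypothetical simplifications and not taken from any controller standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingPlan:
    """Hypothetical single-ring plan built from the three levers."""
    cycle_s: float    # cycle length in seconds
    splits_s: tuple   # seconds allotted to each phase
    offset_s: float   # start of coordinated green relative to the reference

    def __post_init__(self):
        # Simplifying assumption: one ring, so splits must fill the cycle.
        if abs(sum(self.splits_s) - self.cycle_s) > 1e-6:
            raise ValueError("splits must sum to the cycle length")
        if not 0 <= self.offset_s < self.cycle_s:
            raise ValueError("offset must lie within [0, cycle)")

    def with_offset(self, new_offset_s):
        """Return a copy with only the offset changed (wrapped into the cycle)."""
        return TimingPlan(self.cycle_s, self.splits_s, new_offset_s % self.cycle_s)


plan = TimingPlan(cycle_s=90, splits_s=(50, 40), offset_s=20)
shifted = plan.with_offset(95)  # cycle and splits are untouched
```

Keeping the plan immutable and exposing only an offset change mirrors the idea that coordination adjustments should not disturb cycle or split settings.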

3.2. Indonesia: Conditions and Constraints

Urban arterials in Indonesia are characterized by heterogeneous, lane-free streams with very high motorcycle shares and frequent lateral maneuvers. Empirical and review studies show that such weak lane discipline alters headways, saturation flow, and delay. Therefore, per-lane assumptions from classic models need local recalibration for signalized approaches. Recent Indonesian and regional work (e.g., Denpasar/Bali field studies of motorcycle behavior; broader heterogeneous-flow syntheses) documents these effects and supports re-estimating parameters before corridor coordination [27,28,29].
At the same time, legal timing and device rules are prescriptive. Indonesian regulation on traffic signal devices (Alat Pemberi Isyarat Lalu Lintas, APILL) specifies the technical and operational requirements for traffic signals. Therefore, any modernization must retain the approved timing plan and ensure that changes are traceable to permitted objects and occur within legal constraints [30].
On the sensing side, practice is largely camera-first (CCTV/VIVDS) rather than loop-dense. Recent Indonesian studies implement video analytics (YOLO/CNN, fuzzy-logic controllers with real-time video) for vehicle presence, queues, and congestion at intersections and arterials, reflecting both feasibility and cost practicality in local deployments [31,32].
For coordination systems, Indonesia has experience with adaptive UTC, notably SCATS. Peer-reviewed reports cover a long-running Bandung evaluation of advanced traffic control systems and more recent Jakarta SCATS studies using MKJI-based assessments. These reports are useful as context for how adaptive coordination is interpreted and evaluated locally [33].
Given these constraints, performance-audited operations are essential. Automated Traffic Signal Performance Measures (ATSPMs) derived from controller logs, especially AoG and PCD, let agencies validate progression and gate any residual-offset change with before/after evidence, without requiring dense upstream detection. Guidance and case material from FHWA and state DOTs detail how AoG/PCD are computed and used for corridor retiming [34,35].
Under these constraints, we recommend keeping the plan authoritative and confining automation to small, cycle-boundary residual-offset (Δoffset) proposals that are accepted only when AoG/PCD improves. This preserves legal compliance, auditability, and vendor neutrality while delivering measurable corridor gains [30].

3.3. Priority Scenarios for Improvement

Coordinated corridors in Indonesia frequently suffer from offset drift, especially where high motorcycle shares and mixed traffic reduce the realized saturation flow rate below modeled expectations [28]. Such lane-free dynamics are well documented as key contributors to reduced capacity and safety concerns on Indonesian urban roads [29]. To counter this, residual offset updates are applied at cycle boundaries; the controller maintains plan authority over phases and intergreens, while the adaptive component proposes bounded Δoffset adjustments. These updates are accepted only if audit-level, high-resolution performance metrics confirm measurable improvement, in line with ATSPM practice that emphasizes AoG, split failures, and PCD [34].

3.4. Technical Challenges in Indonesian and LMIC Corridors

Implementing bounded adaptation in Indonesian corridors faces several technical challenges. First, heterogeneous and lane-free flows reduce the reliability of conventional saturation flow models, complicating calibration of adaptive controllers [28]. The dominance of motorcycles, lateral weaving, and irregular lane discipline lowers effective capacity and increases variability in progression [29]. Second, sensing limitations remain a barrier. Many intersections rely only on stop-bar video detection, with limited availability of radar or loop detectors [36]. While such sparse detection can still support performance audits, accuracy varies with weather, lighting, and occlusion [37]. Third, communications and data quality affect the timeliness of audits. High-resolution logging requires stable backhaul and synchronization; without this, offset adjustments cannot be validated reliably [34].
To mitigate these issues, agencies are advised to emphasize audit-first acceptance using Automated Traffic Signal Performance Measures (ATSPM). Even with partial sensing, standard metrics such as AoG, split failures, and PCD can reveal offset drift and quantify gains from bounded adaptation [34]. This ensures that adjustments remain transparent, explainable, and consistent with best practice, even under local constraints.

3.5. Measurement and Auditability Under Limited Sensing

ATSPM enables corridor-scale diagnostics without dense upstream detectors. PCD plots arrivals against the green window and, together with AoG, exposes drift in progression, split failures, and downstream blockages. Where video or probe data are available, trajectory-based measures can complement controller logs [38].
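As a rough illustration of how AoG and the raw points of a PCD relate to arrival events, the following minimal sketch assumes that arrival timestamps and a fixed green window are already available. The function names and the non-wrapping green window are assumptions made for illustration; production ATSPM tools work from high-resolution controller event logs and handle many more cases.

```python
def arrivals_on_green(arrivals_s, green_start_s, green_end_s, cycle_s):
    """Fraction of arrivals that fall inside the green window.

    arrivals_s: arrival timestamps in seconds since the first cycle start.
    The green window is given within the cycle and is assumed not to wrap,
    i.e. 0 <= green_start_s < green_end_s <= cycle_s.
    """
    if not arrivals_s:
        return 0.0
    in_green = sum(
        1 for t in arrivals_s if green_start_s <= (t % cycle_s) < green_end_s
    )
    return in_green / len(arrivals_s)


def pcd_points(arrivals_s, cycle_s):
    """Map each arrival to (cycle index, time in cycle): the dots of a PCD."""
    return [(int(t // cycle_s), t % cycle_s) for t in arrivals_s]


arrivals = [5, 35, 70, 100, 130]  # illustrative timestamps, not measured data
aog = arrivals_on_green(arrivals, green_start_s=30, green_end_s=60, cycle_s=60)
points = pcd_points(arrivals, cycle_s=60)
```

Plotting the second element of each point against the cycle index, with the green window overlaid, gives the familiar PCD picture in which progression drift appears as arrivals sliding out of the green band.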
All proposed residual-offset changes are queued and committed at the cycle boundary to preserve legal change/clearance intervals. A change is accepted only when corridor-level AoG increases and PCD band alignment improves; otherwise, it is reverted. This policy is consistent with event-data offset optimization (ASCE JTE-Part A, 2020) and statewide ATSPM practice (e.g., Utah corridor analyses) [39,40].
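The accept-or-revert rule can be sketched as a simple gate. This is a minimal illustration under stated assumptions: the function name, the dictionary keys, and the idea of reducing PCD band alignment to a single score (`band_alignment`, higher is better) are all hypothetical and chosen only to make the logic concrete.

```python
def gate_offset_change(previous_offset_s, trial_offset_s, before, after,
                       min_aog_gain=0.0):
    """Return the offset to keep after a trial period.

    before / after: dicts with 'aog' (0..1) and 'band_alignment'
    (a hypothetical scalar summary of PCD band alignment).
    The trial offset is kept only if both measures improve.
    """
    aog_improved = (after["aog"] - before["aog"]) > min_aog_gain
    band_improved = after["band_alignment"] > before["band_alignment"]
    if aog_improved and band_improved:
        return trial_offset_s
    return previous_offset_s  # revert


baseline = {"aog": 0.48, "band_alignment": 0.55}  # illustrative values
kept = gate_offset_change(20.0, 22.0, baseline,
                          {"aog": 0.53, "band_alignment": 0.60})
reverted = gate_offset_change(20.0, 22.0, baseline,
                              {"aog": 0.53, "band_alignment": 0.50})
```

Requiring both measures to improve is deliberately conservative: a gain in AoG that coincides with worse band alignment leaves the previous offset in place.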
The same audit-first pattern is documented across programs and cities. In the United States, ATSPM deployments and event-/trajectory-based evaluations in ASCE/TRR report evidence-gated offset updates [39]. In the United Kingdom, SCOOT’s continuous measurement guides routine timing adjustments [38]. Meanwhile, in Iran (Tehran), SCATS is evaluated under non-lane-based conditions with improvement proposals [41].

4. Hybrid Architecture for Urban Signal Control

This section introduces the coordination and control design for addressing offset drift in coordinated corridors. By contrast, the challenges of lane-free and heterogeneous streams are considered primarily as a sensing problem, which is discussed in Section 5.

4.1. Rule-Based Coordination on Lightweight Controllers

4.1.1. Principles & Architecture

We recommend that the hybrid architecture strictly separate safety-critical local control from adaptive optimization. Rule-based logic runs on the controller as the inner loop, while an outer loop adjusts the split and offset to maintain progression. Two agent types implement the hierarchy: a coordinator agent sets corridor objectives and guardrails, and intersection agents translate targets into local actions under safety constraints. Similar manager–worker hierarchies are reported in FMA2C and in the GMHM model [42,43].
The coordinator is selected from among the existing intersection controllers. Agencies often deploy a coordinator agent to pick plans and issue commands. On-street masters are standard microprocessor controllers and are compatible with distributed operations. At the corridor layer, a green wave is formed by designating one signal as the reference, which sets the release beat; its neighbor acts as a follower and trims its offset to catch the platoon. Time-space diagrams and offset tools guide this tuning. Classic dynamic-offset procedures show delay reductions when downstream offsets are derived from upstream travel time. Recent studies of two adjacent intersections support pairwise coordination with shared timing and offsets [44,45].
The coordination philosophy is deliberately conservative: keep the authoritative timing plan in charge of safety-critical elements (phase sequence, cycle length, green splits, yellow and red-clearance intervals, and pedestrian timings), and confine any online adaptation to small offset adjustments at cycle boundaries. This preserves the classic cycle–split–offset semantics that underpin modern arterial progression practice.

4.1.2. Reference–Follower Role Policy

A reference–follower scheme may be adopted to keep coordination simple and auditable: one intersection in a corridor is the reference, and neighboring intersections are followers that align their coordinated phases by applying a small Δoffset at the next cycle boundary only. The pattern is executed by the rule-based controller logic. As in Refs. [46,47], a “reference” intersection defines the cycle timing for a subsystem; downstream “followers” align their coordinated green start by applying a bounded Δoffset relative to the reference. This is the classic arterial coordination idea: offsets at each local controller are defined with respect to a reference point at a critical intersection or phase.
Figure 3 illustrates a corridor micro-coordination scheme where either intersection can temporarily become the reference while the other acts as the follower. The policy is a short-horizon, offset-only tactic that sits on top of safety-critical local control. Each intersection agent reports cycle-synchronous volumes, queues, and arrival-on-green summaries to a coordinator agent. The coordinator aggregates these feeds over a short sliding window. The coordinator then assigns the “reference” agent and the “follower” agent for the next horizon based on predefined rules such as dominant demand and progression targets. The coordinator pushes updated offsets to the selected follower while preserving each agent’s local safety logic. This mirrors field practice where a master controller or central system selects plans and issues timing parameters to locals in traffic-responsive and adaptive operations. In the diagram, the dominant traffic flow is indicated with thick arrows to emphasize the reference–follower direction.
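The role-assignment step can be sketched as follows. This is a minimal illustration, assuming that "dominant demand" is judged by the mean per-cycle volume each agent reports over a sliding window; the class name, the agent labels, and the tie-breaking rule are hypothetical.

```python
from collections import deque


class Coordinator:
    """Hypothetical coordinator that picks the reference by dominant demand."""

    def __init__(self, agents=("A", "B"), window=5):
        # One fixed-length window of per-cycle volumes per agent.
        self.windows = {name: deque(maxlen=window) for name in agents}

    def report(self, agent, cycle_volume):
        """Store one cycle-synchronous volume summary from an agent."""
        self.windows[agent].append(cycle_volume)

    def assign_roles(self):
        """Return (reference, followers) for the next horizon."""
        means = {
            name: (sum(vals) / len(vals) if vals else 0.0)
            for name, vals in self.windows.items()
        }
        # Ties resolve to the alphabetically first agent.
        reference = max(sorted(means), key=means.get)
        followers = [name for name in sorted(means) if name != reference]
        return reference, followers


coord = Coordinator(window=2)
for a_vol, b_vol in [(20, 30), (22, 28)]:
    coord.report("A", a_vol)
    coord.report("B", b_vol)
first_roles = coord.assign_roles()

coord.report("A", 50)
coord.report("A", 50)
second_roles = coord.assign_roles()
```

Because the window is short, roles can flip when the dominant direction changes, while the local safety logic at each intersection is never involved in the decision.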
Similar methods have been used by several researchers and have proven effective. Zou et al. coordinated two adjacent intersections with a two-stage algorithm that aligns arrivals and phases, effectively designating a leader–follower relationship within the pair [44]. Zhang et al. formulated a dynamic optimization for adjacent intersections that adjusts offsets and green splits jointly for progression [45]. Meanwhile, Li et al. centralized a corridor-level optimizer in a connected-vehicle framework to compute offsets and phase durations simultaneously [48].
Offset can be determined by the estimated travel time from the reference intersection to the follower. Engineers compute the downstream green start by matching the expected platoon arrival from the upstream release. Dynamic offset procedures based on measured travel times have shown delay reductions on real corridors [49]. Time–space tools formalize this calculation for adjacent signals along a corridor. The MARL layer then refines the offset over successive cycles to improve arrivals on green while respecting safety limits. Hierarchical and manager–worker MARL frameworks report effective corridor coordination and network scalability [50].
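The travel-time rule for the initial offset can be written in a few lines. The sketch below assumes a constant progression speed and ignores queue clearance at the follower; the function and parameter names are illustrative.

```python
def travel_time_offset(distance_m, speed_mps, cycle_s, reference_offset_s=0.0):
    """Follower offset so that green starts when the upstream platoon arrives.

    Assumes a constant progression speed and no standing queue downstream.
    The result is wrapped into [0, cycle_s).
    """
    travel_time_s = distance_m / speed_mps
    return (reference_offset_s + travel_time_s) % cycle_s


# 400 m at 10 m/s is 40 s of travel; with a 90 s cycle the offset is 40 s.
short_link = travel_time_offset(400, 10, cycle_s=90)
# 1200 m at 10 m/s is 120 s; wrapped into a 90 s cycle this is 30 s.
long_link = travel_time_offset(1200, 10, cycle_s=90)
```

A learning layer would then start from this value and nudge it by small amounts, rather than searching the full offset range.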
The two-agent case in “Development of Coordinated Control of Vehicle Traffic Flow at Adjacent Intersection” [46] demonstrates a reference–follower pairing (termed master–slave), in which the upstream controller synchronizes the downstream controller each cycle. The same policy extends to more than two agents, as in Figure 4. In a three-intersection corridor, the middle cabinet naturally serves as a coordinator to minimize hop delays and balance latency. Each agent publishes short, cycle-synchronous summaries to the coordinator. The coordinator may nominate the upstream end as the reference and issue bounded offset updates to the others. Synchronization signals then propagate in sequence: Agent 1 synchronizes Agent 2; Agent 2 synchronizes Agent 3. Each follower commits its Δoffset at the next cycle boundary while local safety logic remains authoritative. This pattern is consistent with long-standing “on-street reference ↔ follower” coordination in arterial practice.
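For a chain of three or more intersections, nominal offsets can be accumulated link by link from the reference. The following sketch assumes one-directional progression and known link travel times; the function name and the example values are hypothetical.

```python
from itertools import accumulate


def chained_offsets(link_travel_times_s, cycle_s):
    """Nominal offsets along a chain, starting from the reference at 0 s.

    link_travel_times_s[i] is the travel time from agent i+1 to agent i+2.
    Each follower inherits its upstream neighbor's offset plus the link
    travel time, wrapped into the cycle.
    """
    cumulative = accumulate(link_travel_times_s, initial=0.0)
    return [t % cycle_s for t in cumulative]


# Agent 1 is the reference; Agent 2 is 35 s downstream, Agent 3 a further 70 s.
offsets = chained_offsets([35, 70], cycle_s=90)
```

The wrap-around for Agent 3 (105 s becomes 15 s) shows why clocks must stay aligned along the chain: each offset is meaningful only relative to a shared cycle start.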
Network-scale literature also supports coordinator-based or hierarchical multi-agent control beyond two signals. CoLight learns cooperation across many intersections with graph attention, enabling distributed agents to align decisions over a corridor or grid [6]. Feudal multi-agent approaches introduce manager-worker layers or adaptive regional coordinators that assign short-horizon goals to local controllers, improving scalability and stability when intersections exceed pairs [43]. Field-oriented adaptive systems such as ACS-Lite also embody a supervisor selecting timing parameters for groups of intersections on real corridors. These strands collectively justify appointing a corridor coordinator and chaining reference–follower roles for three or more agents.
Based on these studies, reference–follower coordination has proven effective for synchronizing adjacent intersections under offset adjustments. In our framework, we explicitly recommend adopting this reference–follower role policy as a practical baseline for Indonesian corridors. The reference node provides a timing anchor, while each follower intersection adjusts only within bounded residual offsets. This separation clarifies responsibilities, simplifies scheduling, and creates an auditable structure that agencies can verify through AoG and PCD metrics. Unlike prior studies that demonstrate the concept in controlled simulations, our proposal emphasizes feasibility with existing Indonesian controllers and communications, ensuring that the scheme can be deployed incrementally and monitored transparently.
As illustrated in Figure 5, a simplified Purdue Coordination Diagram (PCD) is used to show how bounded Δoffset adjustments affect corridor progression. In the first case, reference–follower coordination with limited offset flexibility leaves several arrivals just outside the green window, yielding an AoG of approximately 48%. In the second case, hybrid rule-based and MARL adjustments introduce small shifts in the start of green, allowing those same arrivals to be captured inside the window and raising AoG to approximately 62%, thereby improving platoon progression efficiency without extending cycle time.

4.1.3. Hardware Implementation

Indonesian prototypes have demonstrated reference–follower coordination on 8-bit AVR controllers. A master node broadcasts synchronization frames using FSK at 433 MHz, and local controllers shift only the end of the cycle by adding or subtracting a few seconds of green. Measured resource use stayed below 10% of flash and around 1% of CPU, indicating ample headroom on very small chips. The same platform is recommended for the rule layer because it is inexpensive, dependable, and already proven in local practice [51,52].
This prototype opens the path toward a hybrid model, providing a strong foundation for the following reasons:
  • Clocks are kept aligned via RTC, so that offsets remain meaningful [53].
  • The reference issues a lightweight release signal each cycle; followers queue a bounded Δoffset and commit only at the next cycle boundary.
  • Δoffset proposals are accepted or reverted based on AoG and PCD trends computed from controller events, yielding a transparent, performance-based workflow for small offset trims [54].
  • Reads and writes can be executed through NTCIP 1202 v04 with SNMPv3 to remain vendor-neutral and fully logged [55].
On this strong foundation, fallback and safety are enforced. If the sync message is missing or time drift exceeds a threshold, followers revert to the nominal offset in the approved plan. Because only offsets are adjusted and commits are cycle-synchronous, legal change intervals (yellow/all-red) and pedestrian timings remain inviolate by design.
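The queue-then-commit behavior and the fallback rule can be sketched together. This is an illustrative model only: the class name, the 3 s step bound, and the 0.5 s drift threshold are assumptions, not values from the cited prototypes.

```python
class Follower:
    """Hypothetical follower that commits bounded offset trims per cycle."""

    def __init__(self, nominal_offset_s, max_delta_s=3.0, max_drift_s=0.5):
        self.nominal_offset_s = nominal_offset_s  # from the approved plan
        self.active_offset_s = nominal_offset_s
        self.max_delta_s = max_delta_s            # assumed per-cycle bound
        self.max_drift_s = max_drift_s            # assumed clock-drift limit
        self.pending_delta_s = 0.0

    def queue_delta(self, delta_s):
        """Clamp a proposal to the bound and hold it until the boundary."""
        self.pending_delta_s = max(-self.max_delta_s,
                                   min(self.max_delta_s, delta_s))

    def on_cycle_boundary(self, sync_received, clock_drift_s, cycle_s):
        """Commit the queued trim, or fall back to the nominal offset."""
        if not sync_received or abs(clock_drift_s) > self.max_drift_s:
            self.active_offset_s = self.nominal_offset_s
        else:
            self.active_offset_s = (
                self.active_offset_s + self.pending_delta_s
            ) % cycle_s
        self.pending_delta_s = 0.0
        return self.active_offset_s


follower = Follower(nominal_offset_s=20.0)
follower.queue_delta(10.0)  # clamped to +3 s
after_commit = follower.on_cycle_boundary(True, 0.1, cycle_s=90)
follower.queue_delta(-2.0)
after_lost_sync = follower.on_cycle_boundary(False, 0.0, cycle_s=90)
```

Nothing in this object touches phase sequence, splits, or clearance intervals, which is the point of restricting the adaptive layer to offsets.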
This method is also used in several countries and reported in recent studies, for example, in the United States, Ireland, and India, and in microcontroller-oriented research. ACS-Lite from FHWA is a low-cost adaptive coordination retrofit that adjusts offsets and splits in real time with minimal infrastructure, showing that plan-based coordination with small online adjustments can scale without high-end hardware [56]. SCATS in Dublin (Ireland) uses real-time detector data to manage cycle, split, and offset on field controllers. The scheme remains rule-driven around the same timing levers and stands as a mature precedent under limited bandwidth. Similar prototypes in India adjust cycle and offsets using crowdsourced travel data from Google APIs, showing that coordination can run with light inputs and modest hardware when dense detection is infeasible [57].
In summary, run rule-based coordination on the existing microcontroller, expose reads and writes through NTCIP objects, and gate any applied change on AoG and PCD improvement. The approach secures safety and auditability today while leaving headroom to add MARL-based offset learning later without replacing baseline controllers.

4.2. Reactive MARL Offset Learning Under an Authoritative Plan

MARL treats each intersection as an agent that learns, cycle by cycle, how tiny adjustments to the coordinated offset can improve platoon arrivals without touching safety timings. In this architecture, the timing plan remains authoritative: the rule layer sets reference–follower roles, and MARL proposes only small, bounded Δoffsets queued to the next cycle boundary.
Similar practices are already common in recent RL signal-control studies. Diaz et al. adjust only phase offsets while keeping phase order and splits fixed; their implementation also prevents extending a running phase by more than half a cycle, effectively bounding Δoffset [58]. Kim et al. use a hierarchical RL scheme where an “Ours-offset” mode controls offsets only. Offset changes per transition are explicitly limited to a fixed fraction of the cycle under national controller policy [59]. Park et al. deploy a DQN that selects the offset each cycle from a predetermined list. The action set is constrained to “acceptable” values, which bounds offset updates in practice [60]. Other constrained RL designs also retain field-safe limits. Gu et al. optimize signals with explicit constraints on phase sequence and minimum greens, illustrating how RL can operate under strict bounds [61]. These precedents support the choice to let MARL propose only small, bounded Δoffsets under an authoritative corridor plan.
The arrangement keeps learning subordinate to plan authority and aligned with current practice. It leverages MARL for coordinated control within a strict, offset-only, cycle-synchronous envelope, as shown in recent multi-agent RL studies [62]. Accordingly, each follower is treated as a constrained agent under plan authority, with observations, actions, and rewards defined to enable offset-only, cycle-synchronous nudges within the reference–follower synchronization scheme.
1. State.
Each follower agent observes compact, auditable features: (i) plan context (active plan ID, current coordinated offset, time-of-day) defined per the Signal Timing Manual—Second Edition (NCHRP 812) offset reference point; (ii) progression evidence such as Arrivals on Green (AoG) and Purdue Coordination Diagram (PCD) summaries from ATSPM; and (iii) an optional per-cycle reference synchronization timestamp. These components let MARL reason about alignment without touching phase/split logic.
2. Action.
A small, bounded residual Δoffset drawn from a discrete set and queued for the next cycle boundary only (no mid-cycle edits). This keeps actions interpretable within the classic cycle–split–offset framework and compatible with controller semantics. The write path targets the coordinated-plan offset via the controller’s NTCIP 1202 v04 (ASC) interface.
3. Reward.
A progression-oriented signal: increase AoG and tighten PCD bands (positive reward); penalize oscillations, reversions (rejected writes), and excessive magnitude/frequency of changes. This aligns learning with ATSPM practice, where AoG/PCD are the primary diagnostics of coordination quality.
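The action and reward definitions above can be made concrete with a short sketch. Everything here is an assumption for illustration: the action set `DELTA_OFFSETS_S`, the weights `w_mag`, `w_osc`, and `w_rej`, and the choice to represent progression by the change in AoG only (a PCD band-tightening term is omitted).

```python
# Hypothetical discrete action set: residual offset changes in seconds.
DELTA_OFFSETS_S = (-2, -1, 0, 1, 2)


def reward(aog_prev, aog_now, delta_s, prev_delta_s, write_rejected,
           w_mag=0.01, w_osc=0.05, w_rej=0.10):
    """Progression-oriented reward for one cycle (illustrative weights).

    Positive term: gain in Arrivals on Green between consecutive cycles.
    Penalties: change magnitude, sign reversal (oscillation), rejected write.
    """
    r = aog_now - aog_prev
    r -= w_mag * abs(delta_s)
    if delta_s * prev_delta_s < 0:
        # The agent reversed direction relative to its previous nudge.
        r -= w_osc
    if write_rejected:
        # The controller or rule layer refused the proposed write.
        r -= w_rej
    return r
```

For example, a +1 s nudge that raises AoG from 0.50 to 0.56 earns roughly 0.05 under these weights, while an immediate reversal with no AoG change is penalized.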
Multi-intersection traffic signal control is a canonical multi-agent RL problem, but stability and non-stationarity motivate small, well-constrained action spaces and disciplined inter-agent timing/communication, precisely what an offset-only, cycle-synchronous design provides. Recent work toward real-world RL deployments likewise emphasizes parameter-level actions that map cleanly to existing controllers [63,64].
MARL is bound tightly to the reference–follower scheme so that learning never disturbs plan semantics: the reference issues a per-cycle coordination signal aligned to a consistent offset reference point (e.g., coordinated-phase green onset), and each follower may apply an accepted Δoffset only at the next cycle boundary. Using a stable reference point is standard practice in coordinated timing and preserves the classic cycle–split–offset framework. Table 4 clarifies exactly what the MARL layer may influence online, and what remains strictly off-limits under safety rules, standards, and agency policy.
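The cycle-synchronous commit rule can be illustrated with a small queue that holds a proposed Δoffset until the next cycle boundary. This is a sketch under stated assumptions: the class `OffsetCommitQueue` and the cumulative bound `max_residual_s` are hypothetical names, and the actual write path through the controller is not modeled.

```python
class OffsetCommitQueue:
    """Holds a proposed delta-offset and applies it only at a cycle boundary."""

    def __init__(self, cycle_s, nominal_offset_s, max_residual_s):
        self.cycle_s = cycle_s
        self.nominal_offset_s = nominal_offset_s
        self.max_residual_s = max_residual_s  # assumed envelope around nominal
        self.residual_s = 0                   # cumulative deviation from nominal
        self._pending = None

    def propose(self, delta_s):
        """Queue a delta; a later proposal in the same cycle overwrites it."""
        self._pending = delta_s

    def on_cycle_boundary(self):
        """Apply any pending delta and return the offset for the new cycle."""
        if self._pending is not None:
            candidate = self.residual_s + self._pending
            # Clamp the cumulative residual to the approved envelope.
            self.residual_s = max(-self.max_residual_s,
                                  min(self.max_residual_s, candidate))
            self._pending = None
        # Offsets are expressed modulo the common cycle length.
        return (self.nominal_offset_s + self.residual_s) % self.cycle_s
```

Calling `propose` never changes the running offset; only `on_cycle_boundary` does, which mirrors the "no mid-cycle edits" constraint.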
While the rule layer continues to run on proven microcontrollers, the MARL layer requires an edge computer, e.g., Raspberry Pi 4 (Raspberry Pi Foundation, Cambridge, UK). Even with a tiny action space, the MARL process benefits from a small Linux-class edge device for inference, logging, and secure NTCIP transactions (SNMPv3 in NTCIP 1202 v04). Such a device meets these needs at low cost and is widely adopted as an IoT/edge platform, both in practice and in the review literature [55,65].
Prior studies (e.g., Díaz, Kim, Park, and Gu) demonstrate that constraining MARL to offset adjustments improves stability and reduces oscillations in coordination. Building on this evidence, we recommend restricting the MARL layer in our framework to residual offset learning only, rather than allowing it to modify full-cycle or split plans. This bounded approach limits the risk of unsafe actions, ensures compatibility with existing plan authorities, and simplifies validation through ATSPM metrics such as AoG and PCD. By explicitly positioning offset learning as a follower adjustment mechanism, our proposal translates the strengths shown in simulation into a configuration that is feasible for Indonesian deployments, where communications are intermittent and heterogeneous traffic demands require transparent safeguards.

4.3. Comparison with Other Hybrid Traffic-Signal Control Approaches

Bouktif et al. (2021) proposed a hybrid action-space DRL (P-DQN) that simultaneously selects the next phase and its duration, evaluated only in SUMO (German Aerospace Center, Berlin, Germany) simulation and reporting up to 22.20% queue reduction; such freedom to alter phase order and length conflicts with our authoritative-plan setting [66]. Bi et al. (2024) combined Type-2 fuzzy control with deep RL for single-junction timing, also tested only in simulation [67]. Both studies were simulation-only, in SUMO or VISSIM (PTV Group, Karlsruhe, Germany), with no real-world deployment. In contrast, our constrained hybrid approach retains expert knowledge in the fixed plan and limits learning to offset adjustments, which eases deployment and auditing even if it is less fully adaptive.
The literature offers no peer-reviewed evidence of MARL being fully deployed in real-world traffic signal corridors. Of the 18 studies synthesized in this review, the large majority (15) were simulation-only, while only a few reported limited pilot-level experiments. For instance, Wei et al. (2019) examined MARL with graph attention (CoLight) purely in simulation [6]. Bouktif et al. (2021) [66] and Bi et al. (2024) [67] likewise remained SUMO-based. By contrast, a small subset, such as ACS-Lite-inspired pilots [56], Indonesian camera-based prototypes [51], and SCATS-related evaluations [33], touched on field controllers, but none deployed MARL agents directly in corridor-scale operations. Fang and Sadeh (2023) explicitly stated that “RL-based TSC algorithms have never been deployed” [14]. Li et al. (2023) further concluded that most RL approaches show “poor real-world applicability” with hardly any successful deployments [68]. Kwesiga et al. (2025) emphasized that RL methods remain “untested for real-world signal timing plans” due to simplifying assumptions [69]. More recent 2025 reviews reaffirm the same limitation: Michailidis et al. (2025) stressed that RL-based traffic control remains primarily experimental [2], Saadi et al. (2025) highlighted that most DRL methods are still evaluated in simulators with limited transferability [70], and Li et al. (2025) presented federated DRL architectures but only under simulation conditions [71]. Othman et al. (2025) advanced a multimodal decentralized MARL controller (eMARLIN-T-MM) using transformer encoders to process heterogeneous flows; yet the experiments were confined to SUMO models of five Toronto intersections [72]. Satheesh and Powell (2025) introduced MAPPO-LCE, a constrained multi-agent RL method that enforces practical safeguards such as minimum green times and phase-skip penalties [73]. While MAPPO-LCE shows improved stability compared to unconstrained MARL, the evaluation was conducted entirely in the CityFlow simulation environment with real-world datasets, and no pilot deployment has yet been reported [73].
Collectively, these findings indicate that no strong peer-reviewed evidence yet demonstrates full deployment of MARL for corridor-scale traffic signal control in real-world urban environments. Table 5 provides a concise classification of the 18 included studies by deployment status. The full evidence map, including settings, control levers, safeguards, and performance metrics, is available in Supplementary Tables S1–S6, particularly Table S5. This ensures transparency and consistency with the PRISMA framework while keeping the main text focused.
To give readers the big picture at a glance, Table 6 maps the 18 studies by setting, control levers (offset/split/cycle/phase), safeguard type (mask/shield/bounds/other), metrics (including AoG/PCD where reported), and outcomes against baselines, complementing Table 5 (deployment status).
To enhance transparency, we also quantified the direction of reported effects using the SWiM effect-direction approach. Across the 18 studies, 13 reported improvements, 3 were neutral or mixed, and 2 showed deterioration. Improvements were most consistent in offset-only or bounded-variable designs, while deterioration was observed mainly when action spaces were unconstrained and safeguards were absent.
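The effect-direction counts reported above translate into shares of the 18 studies as follows. The helper `effect_direction_shares` is a hypothetical name used only for this arithmetic illustration; the counts are those stated in the text.

```python
def effect_direction_shares(counts):
    """Convert effect-direction counts into percentages of all studies."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


# Counts as reported in the synthesis (13 + 3 + 2 = 18 studies).
shares = effect_direction_shares(
    {"improved": 13, "neutral_or_mixed": 3, "deteriorated": 2}
)
```

Under these counts, roughly 72% of studies reported improvement, 17% were neutral or mixed, and 11% showed deterioration.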
Following SWiM, we summarized the direction of effects for the primary metrics and split results by sensitivity subsets (simulation-only vs. pilot/field). Table 7 provides a compact view.
Tentative conclusions can nevertheless be drawn: hybrid safeguards are consistently associated with positive or at least non-deteriorating performance across heterogeneous traffic contexts.
In addition, we compared the safeguard strategies across studies. Table 8 summarizes how plan authority, action masking, bounded variables, and prerequisites were applied. This table condenses the hybrid safeguard logic into a concise comparative view and complements the broader evidence map in Table 6.
Taken together, this quantified synthesis shows that bounded, rule-shielded approaches consistently outperform unconstrained RL designs, both in terms of stability and practical deployability. Tentative conclusions can therefore be drawn that RL/MARL results remain largely simulation-based, and that only safeguarded, plan-authoritative hybrids have a realistic path toward field deployment.
This reinforces that safeguarded, plan-authoritative hybrids offer more consistent and deployable outcomes than unconstrained RL designs. Reviews and position papers repeatedly note that RL/MARL results are largely simulation-based and report no peer-reviewed evidence of routine field deployment of a pure RL signal controller [4,14,79]. Agencies continue to deploy adaptive platforms and plan-based coordination; these programs emphasize standards, auditability, and safety constraints, and do not report MARL running signals directly. Pilot efforts that bring RL closer to practice adopt constrained action definitions, optimizing timing parameters within existing controller limits to ensure deployability. This supports bounded, parameter-level control as the practical path [64].
Several studies propose safety guards around RL actions. Action masking and encoded traffic-engineering rules keep minimum greens, sequences, and clearances intact. These designs mirror our choice to bound Δoffset under plan authority [8]. Field-facing initiatives outside MARL also show the same pattern of constraint and human oversight. Google’s Green Light recommends plan tweaks to engineers rather than taking over controllers. This supports a hybrid, human-in-the-loop trajectory.
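Action masking for a bounded Δoffset agent can be sketched as follows. This is an illustrative assumption rather than any cited study's method: the mask removes actions that would push the cumulative residual beyond a hypothetical bound `max_residual_s`, and the agent then chooses among the remaining actions with an epsilon-greedy rule over tabular Q-values.

```python
import random


def allowed_actions(actions, residual_s, max_residual_s):
    """Mask: keep only deltas that leave the cumulative residual in bounds."""
    return [a for a in actions if abs(residual_s + a) <= max_residual_s]


def masked_greedy(q_values, actions, residual_s, max_residual_s,
                  epsilon=0.0, rng=random):
    """Epsilon-greedy choice restricted to the unmasked actions."""
    legal = allowed_actions(actions, residual_s, max_residual_s)
    if not legal:
        return 0  # hold the current offset if nothing is permitted
    if rng.random() < epsilon:
        return rng.choice(legal)  # exploration stays inside the mask
    # Exploit: best known Q-value among legal actions (unseen actions = 0.0).
    return max(legal, key=lambda a: q_values.get(a, 0.0))
```

The key property is that exploration also draws from the masked set, so the agent cannot propose an out-of-envelope change even while learning.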
Taken together, the literature supports a restrained hybrid. MARL proposes only small, bounded Δoffsets on top of a lawful plan, while the rule-based layer guarantees safety. This matches where real deployments are heading and what agencies can accept now. Evidence from “Towards Deployment” papers also strengthens the argument. Authors explicitly state that real-world RL control is not yet established and call for realistic, standards-compliant bridges. Constrained, hybrid MARL fits that bridge.

5. Sensing and Communications Under Limited Infrastructure

Indonesia’s existing traffic management relies heavily on roadside cameras and modest detection infrastructure. For example, Jakarta’s adaptive system uses 135 FLIR TrafiCam video detectors and 25 streaming cameras at 37 key intersections, and Surabaya has 90 such cameras on 32 junctions [80]. These camera-centric installations leverage existing poles, power, and (where available) fiber connections, but coverage is still sparse. By contrast, cities like Singapore and Seoul have far more extensive sensor networks. Singapore’s Intelligent Transport System integrates traffic cameras, road sensors, GPS data, and RFID toll gantries into a unified back-end (iTransport) [81]. Seoul’s TOPIS system links 741 CCTV cameras across the city to a central controller that monitors flow and incidents in real time [82]. Curitiba, Brazil, combines inductive detectors, CCTV video detection, and GPS-equipped buses for transit priority [83]. Nairobi’s nascent ITS similarly uses loops, CCTV, and GPS data: its current pilot covers 20 intersections with vehicle detectors and cameras feeding a Traffic Management Centre [84].
Compared to these examples, Indonesian corridors face tougher constraints. This section reviews detection and fusion strategies and compares the communications architectures of Singapore, Seoul, Curitiba, and Nairobi to identify approaches best suited to Indonesia.

5.1. Indonesia-Specific Operating Constraints for Sensing and Communications

Building on the corridor comparisons and the Indonesia-first strategy outlined above, the operating constraints that shape sensing and communications choices are considered.
1. Traffic is motorcycle-dominated and heterogeneous.
These flows create dense, occlusion-prone scenes that strain detection and tracking; studies at motorcycle-dominated intersections document the challenge [85]. Non-lane-based behavior is common, with frequent lateral movements and weak lane discipline in urban traffic [27]. Researchers conclude that such contexts require new, non-lane-based or lane-free modeling and control, rather than porting lane-based methods typical of high-income countries [86,87].
2. Camera-only sensing (little to no other field detectors).
Many Indonesian ATCS deployments remain camera-centric, with limited use of other field sensors. Government sources and recent studies describe ATCS operations that rely on CCTV for monitoring and interventions [88,89,90,91]. In high-income countries, by contrast, traffic systems mix multiple detector types beyond cameras. The FHWA Detector Handbook lists inductive loops, magnetometers, video image processors, microwave radar, laser radar, passive infrared, ultrasonic, and acoustic sensors. Microwave and infrared combinations support presence, counting, and speed, including in adverse weather.
3. No national standard or designated band for traffic-signal communications.
Indonesia lacks a dedicated, government-provided channel for traffic-signal field communications. Current rules define general allocations and class-licensed uses rather than a TLC-specific band. Agencies therefore rely on fiber where available, licensed microwave, class-licensed 2.4/5 GHz radios, and cellular. Developed networks typically use multiple engineered paths: fiber backbones to cabinets, plus RF point-to-point and cellular, where fiber is impractical. Peer-reviewed and handbook sources describe these mixed topologies and their communications demands [92,93]. Indonesia remains a 4G-first market with limited nationwide 5G capacity, and fixed-broadband penetration is low by global standards [94,95]. If cellular is used as a backup path, practitioners should expect performance limits; empirical analyses show that communication loss degrades coordination quality at the signal system level [96]. These factors reinforce the need for a clear TLC communications policy.
To address these limits, the Indonesian strategy emphasizes camera-first sensing with supplemental technologies and rugged, low-cost communications.

5.2. Integrated Detection Strategy with Multi-Sensor Fusion

The strategy centers on camera-first sensing with targeted overlays that harden detection under rain, night, occlusion, and heterogeneous flows. The video pipeline uses models trained or fine-tuned for powered two-wheelers and three-wheelers. Evidence from urban deployments and PTW-focused studies shows deep detectors (e.g., Faster R-CNN/YOLO) perform well when trained on PTW-rich data [97,98]. A recent review of CCTV analytics reports that deep learning reliably detects and tracks motorcycles, and it emphasizes the value of datasets curated for local scenes; the same line of work released an Urban Motorbike Dataset to address that need [99,100]. These choices target the hard parts of LMIC approaches, including heavy occlusion at stop bars and lane-free, heterogeneous streams [86,99,101]. Both phenomena are well documented in CCTV-based motorcycle detection research and in mixed-traffic studies from developing cities [99,102].
Even with PTW-tuned vision, agencies strengthen detection with multi-sensor fusion by pairing cameras with millimeter wave radar, thermal imaging, or LiDAR to improve robustness in occlusion, heavy rain, headlight glare, and low light [103]. Reviews and handbooks describe how non-camera detectors complement video in roadside use [104]. Radar complements cameras on critical approaches. Multi-sensor studies report higher detection robustness when radar features are fused with camera imagery, including under adverse conditions [105]. Thermal imaging mitigates night glare and heavy rain effects. Experiments show that long-wave infrared cameras preserve vehicle thermal features in poor visibility [106,107]. LiDAR offers a wider range and lighting resilience at complex junctions, though cost and calibration needs are higher. Field comparisons show LiDAR trajectories are less sensitive to low light than camera-only data [108].
Deterministic stop-line presence can rely on simple in-road or on-road sensors. Side-channel measurements support corridor travel time and progression checks. Bluetooth or Wi-Fi MAC readers and ANPR are documented for arterial journey-time monitoring [109]. Meanwhile, acoustic packs add classification and flow where views are occluded, with recent studies showing reliable inference without lane closures [110,111].
Fusion can be organized at the decision, feature, or data levels. Surveys describe these levels and report gains from camera, radar, and other combinations. Higher fusion levels demand tighter calibration and more computing [112,113].
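Decision-level fusion, the lightest of the three levels, can be illustrated with a weighted combination of per-sensor presence confidences. The function `fuse_presence`, the weight `camera_weight`, and the `threshold` are assumptions for this sketch; real deployments would calibrate them per site.

```python
def fuse_presence(camera_conf, radar_conf, camera_weight=0.6, threshold=0.5):
    """Decision-level fusion of two presence confidences in [0, 1].

    A missing sensor is passed as None, in which case the other sensor
    decides alone. Weights and threshold are illustrative assumptions.
    """
    if camera_conf is None and radar_conf is None:
        return False  # no sensor reporting
    if camera_conf is None:
        return radar_conf >= threshold
    if radar_conf is None:
        return camera_conf >= threshold
    fused = camera_weight * camera_conf + (1 - camera_weight) * radar_conf
    return fused >= threshold
```

In a rain or glare scenario where the camera confidence drops to 0.3, a radar confidence of 0.9 still yields a positive presence decision under these weights.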
Table 9 synthesizes these choices and maps common Indonesian failure modes (rain, night, occlusion, motorcycle-dense queues, and weak links) to minimal overlays and the fusion level recommended for each case.

5.3. Communications Architecture

5.3.1. Intra-Intersection Communication

Each intersection is an agent composed of a cabinet ASC (actuated signal controller), a cabinet edge computer, and per-approach “signal head” assemblies that integrate a Signal Head Control (SHC) microcontroller and sensing. The ASC in the cabinet remains authoritative for plan execution and exposes standardized read/write objects for coordination via NTCIP 1202 v04; this keeps timing changes vendor-neutral and auditable. Cabinet hardware and SHC lamp driving follow established practice so that load switches, power distribution, and environmental limits match traffic-control standards [134,135]. Per approach, a weatherproof on-pole box at the signal head houses the SHC microcontroller (which drives the red/amber/green outputs and may host a countdown display) and a Raspberry Pi 4 that reads the camera (and optional radar/thermal) and publishes events.
At the cabinet ASC, a Raspberry Pi 4 aggregates per-approach events and computes PCD/AoG for ATSPM evidence. When the intersection is a follower, it prepares a bounded Δoffset proposal. Any accepted change is written by the ASC at the next cycle boundary through its NTCIP objects. The Pi 4 platform provides Gigabit Ethernet and dual-band 802.11ac Wi-Fi, which are sufficient for real-time event exchange and standards I/O in cabinet environments.
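A minimal sketch of the AoG computation on the cabinet edge device is shown below, assuming arrival timestamps and coordinated-phase green intervals are already available in seconds on a common clock. The function name and input layout are hypothetical and do not reflect a specific ATSPM log format.

```python
def arrivals_on_green(arrival_times_s, green_intervals_s):
    """Fraction of detected arrivals that fall inside a green interval.

    arrival_times_s: vehicle arrival timestamps at the stop-bar reference.
    green_intervals_s: list of (start, end) pairs; end is exclusive.
    Returns None when there are no arrivals to evaluate.
    """
    if not arrival_times_s:
        return None
    on_green = sum(
        any(start <= t < end for start, end in green_intervals_s)
        for t in arrival_times_s
    )
    return on_green / len(arrival_times_s)
```

With greens at 10–40 s and 100–130 s, arrivals at 12, 39, and 101 s count as on green, while arrivals at 5, 40, and 95 s do not.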
Local field links connect on-pole nodes to the cabinet. Agencies may use fiber where conduit exists and RF where fiber is impractical. Peer-reviewed studies show workable wireless signal systems, including ZigBee-based interconnects and LoRa sensor links for traffic data [136]. The 380–400 MHz range is also widely used for public-safety TETRA [137,138]. Figure 6 illustrates the intra-intersection scheme: an ASC manages traffic from four lanes (L1–L4), aggregates lane-level features, and publishes cycle-synchronous messages to each SHC.
Meanwhile, cellular can serve as a backup path, but it is not a cure-all. A Transportation Research Record study quantified large delay increases when center-to-field communications drop [96]. Indonesia’s class-licensed RLAN rules permit 5 GHz short hops subject to Kominfo technical limits. A low-bit-rate LoRa side channel in the AS923 family can carry health/heartbeat, and LoRa traffic sensing has been validated in peer-reviewed work [139,140].
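A health/heartbeat message for a low-bit-rate side channel benefits from a compact binary layout. The sketch below packs a hypothetical seven-byte payload with Python's `struct` module; the field set (node ID, plan ID, status flags, residual offset in tenths of a second, sequence number) is an assumption for illustration, not a format defined by the source or by any LoRaWAN specification.

```python
import struct

# Big-endian, no padding: uint16 node, uint8 plan, uint8 flags,
# int16 residual offset in deciseconds, uint8 sequence counter.
HEARTBEAT_FMT = ">HBBhB"


def pack_heartbeat(node_id, plan_id, flags, residual_offset_s, seq):
    """Encode a heartbeat into a fixed 7-byte payload."""
    return struct.pack(HEARTBEAT_FMT, node_id, plan_id, flags,
                       round(residual_offset_s * 10), seq % 256)


def unpack_heartbeat(payload):
    """Decode a heartbeat; the residual offset is returned in seconds."""
    node_id, plan_id, flags, residual_ds, seq = struct.unpack(
        HEARTBEAT_FMT, payload)
    return node_id, plan_id, flags, residual_ds / 10, seq
```

A fixed-width layout keeps airtime predictable, and the wrapping sequence counter lets the receiver detect dropped heartbeats.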

5.3.2. Inter-Intersection Communication

Inter-intersection coordination in a three-agent corridor assigns the middle cabinet as the corridor coordinator. The corridor still follows standard arterial practice with a common cycle and offsets to foster progression. The coordinator role reduces hop delay and keeps timing decisions close to the field.
The reference–follower chain remains sequential. The upstream reference publishes a cycle-boundary sync; the downstream follower applies a bounded Δoffset at the next boundary. Typical spacings of a few hundred meters continue to benefit from offset control when travel times are stable. The sync pulse or time-stamped message is generated inside the ASC microcontroller.
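The nominal follower offset that the bounded Δoffset nudges around is commonly derived from link travel time. The sketch below uses the textbook relation (offset equals upstream offset plus spacing divided by progression speed, modulo the common cycle); the function name and example values are illustrative assumptions.

```python
def progression_offset(spacing_m, speed_kmh, cycle_s, upstream_offset_s=0.0):
    """Ideal one-way progression offset for a downstream follower.

    Travel time = spacing / speed; the offset wraps at the common cycle.
    """
    travel_time_s = spacing_m / (speed_kmh / 3.6)  # km/h converted to m/s
    return (upstream_offset_s + travel_time_s) % cycle_s
```

For a 300 m spacing at 36 km/h and a 90 s cycle, the travel time is about 30 s, so a follower downstream of a reference with zero offset would start near 30 s.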
Figure 7 illustrates inter-intersection communications on a three-agent corridor: the middle node serves as the corridor coordinator, the upstream node acts as the reference, and the downstream node follows. Cycle-synchronous synchronization signals propagate from Agent 1 → Agent 2 → Agent 3, while the TMC monitors data and may issue timing updates.

5.3.3. Coordinator to TMC Monitoring and Communications

The coordinator agent reports traffic metrics and current signal settings to the city’s Traffic Management Center (TMC) for monitoring. The same center-to-field channel allows the TMC to push timing updates or configuration changes back to cabinets under standard signal-system workflows. Agencies commonly use commercial cellular networks for cabinet-to-center links. The FHWA handbook lists cellular as a traffic-signal communications option and summarizes when it is effective. State guidance shows field-to-central connections implemented with cellular modems or fiber in live deployments. Point-to-point RF is also typical when fiber is absent. Broadband wireless Ethernet (IEEE 802.11 at 2.4/5.8 GHz) supports IP traffic from cabinet radios and is widely applied for signal backhaul.
Other countries illustrate the same pattern. In the United States, Virginia DOT’s Connected Corridor distributes real-time signal phase and timing (SPaT) over both DSRC and a cellular path via a cloud service, with measured cellular latencies well under 100 ms [141]. In California’s guidance, agencies may backhaul ATSPM data via the city interconnect or broadband cellular at each intersection [142]. In Europe, the C-Roads Belgium/Flanders pilot uses 3G/4G cellular to exchange information directly between users and the Traffic Management Centre [143].
Cellular is a good fit for Indonesia. Independent measurements show strong nationwide 4G coverage in populated areas, and operators report broadband networks covering about 97% of the population. These conditions make cellular viable for reliable telemetry, status, and plan updates.

5.4. Floating Sensors for Traffic Vehicles and TMC Integration

In this context, the coordinator–TMC link can ingest floating sensor feeds using established exchange models. Floating sensors provide probe positions and speeds from vehicles and smartphones. The approach covers links where fixed detectors are sparse and supports travel time, queue, and incident inference. Recent reviews trace two decades of Floating Car Data (FCD) research and its use for network monitoring [144,145].
Researchers estimate link travel times from sparse probe traces. Early work decomposed route times into segment times and modeled reliability. Later studies used neural networks to learn link travel times from low-frequency probes. Smartphone sensing now classifies flow states from noisy phone sensors [146].
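As a simple baseline for decomposing a route time into segment times, the travel time between two probe pings can be allocated to the traversed links in proportion to link length. This is a naive illustration with a hypothetical function name, not the method of the cited studies, which model speeds and reliability more carefully.

```python
def split_route_time(route_time_s, link_lengths_m):
    """Allocate a route travel time to links in proportion to their length."""
    total = sum(link_lengths_m.values())
    return {link: route_time_s * length / total
            for link, length in link_lengths_m.items()}
```

For a 60 s trace over links of 200 m and 400 m, this assigns 20 s and 40 s, which implicitly assumes a uniform speed along the route.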
Probe streams also drive control. A Nature Communications study re-times signals using only a small fraction of vehicle trajectories. Algorithms based on probe vehicles coordinate signals adaptively at the corridor scale. A field experiment regulated a signal in real time using FCD, demonstrating feasibility beyond simulation. Trajectory-based measures further quantify arterial performance for TMC dashboards [69,147,148].
Beyond the coordinator-to-TMC telemetry, the center can ingest crowd-sourced probe data from navigation platforms. Google Maps can derive live traffic by combining aggregated user location signals with historical patterns using machine learning. Agencies can access Google-derived travel times via the Google Maps APIs. Peer-reviewed work has used the API to build O–D travel-time matrices and to estimate congestion and speeds over city areas without adding fixed hardware at every approach [149]. These API outputs can be ingested by a TMC dashboard next to ATSPM logs [150,151]. Users can also see their own location trails in Google’s Timeline. The research literature documents how Timeline/Location History points can be exported and analyzed, with reported location accuracy characteristics [152].
Developed-country operations show similar crowd-sourced pipelines. The United Kingdom’s National Highways ingests fused-journey-time fields (probe vehicles plus sensors) into the national Roads Information Framework for performance monitoring. U.S. traffic centers use Waze program data and validate commercial probe feeds for arterial travel times. These examples show mature center-to-vendor interfaces and routine control-room use [153].
This approach fits Indonesia. The country remains a 4G-first market with widespread cellular connectivity, which supports reliable API polling from the TMC without new field fiber. Cellular performance reports also show improving user experience across operators. Figure 8 illustrates a three-agent corridor with a middle-node coordinator and TMC/CTSS interface, augmented by floating data from Google-served user devices. Aggregated probe data from Google Maps users flows to Google servers and then to the TMC via APIs, while cabinet traffic data and signal settings flow directly to the TMC.

6. Roadmap for Hybrid Rule-Based and MARL Traffic Signal Control in Indonesia

This section distills the recommendations into a staged roadmap for Indonesian corridors, focusing on how hybrid rule-based safeguards and bounded MARL adaptation can be deployed under real-world constraints. The staged roadmap presented in this section directly reflects the systematic review findings, ensuring that each recommendation is explicitly grounded in the synthesized evidence.
1. Governance, standards, and baselines.
The cabinet controller retains authority over fixed-time cycle, split, and offset plans consistent with national standards, ensuring compliance with APILL/NTCIP specifications. Hybrid adaptation operates only within bounded parameters—such as residual offset adjustments—while preserving safety and interoperability. This governance-first approach establishes a stable baseline against which further innovations can be introduced.
2. Communications under Indonesian rules.
The city traffic agency defines a two-plane architecture: agent-to-agent links on the corridor and cabinet-to-center IP backhaul to the TMC. This split follows mainstream signal-program guidance for field and center communications. The national regulator (KOMINFO/SDPPI) provides the legal framework: class-licensed operation that explicitly includes RLAN and LPWAN under Permenkominfo No. 2/2023 and its public copies. The systems integrator engineers the cabinet-to-center path as fiber where available, with cellular or class-licensed RLAN (2.4/5 GHz; 6 GHz where authorized in 2025 regulations) when fiber is absent—an arrangement reflected in U.S. handbooks and recent state design guides.
The corridor/field team provisions low-rate, deterministic corridor messaging on LoRaWAN using the AS923 family (Semtech, Camarillo, CA, USA), with channel/EIRP profiles taken from LoRa Alliance regional parameters. Vendors supply radios and gateways that carry SDPPI type approval and conform to the 2024 technical standard for non-cellular LPWAN equipment. The TMC operates the IP backhaul that transports higher-volume ATSPM logs and travel-time feeds alongside routine monitoring traffic.
3. Pilot a short reference–follower corridor.
Agencies designate three adjacent intersections to operate as a pilot corridor: the middle node serves as coordinator, the upstream node as reference, and the downstream node as follower. The scheme follows coordinated-arterial practice governed by cycle, splits, and offsets; verification then uses PCD/AoG from ATSPM.
4. Bounded and safe MARL.
Agencies then add a MARL layer that proposes small set-point nudges at cycle boundaries, while the cabinet controller enforces phases and safety interlocks. Safe-RL practice applies constrained action spaces, masking, and shields suitable for signal control.
5. Camera-first sensing with minimal fusion.
The next step develops vision methods that deliver counts, queues, and incident detection at the edge, with proper siting and calibration. Compact radar augments night and rain performance through cabinet-level fusion when needed.
6. TMC integration and data pipelines.
Once the core hybrid architecture runs reliably, the TMC ingests high-resolution cabinet events for ATSPM and traffic-aware travel times via Google’s Distance Matrix or the Routes API “Compute Route Matrix.” Regional or cross-agency exchange adopts DATEX II for machine-to-machine road data.
7. Scale with proven adaptive operations.
This step builds on patterns used by mature systems. Area supervisors (as in SCOOT/SCATS) adjust cycle–split–offset over modest-bandwidth IP, while local controllers execute safely. Corridors aggregate into regional groups under a coordinator, and reference–follower timing stays within each corridor. New corridors come online incrementally, reusing the same comms and ATSPM pipelines.
8. Compliance and spectrum housekeeping.
Program operations remain within Indonesia’s class-licensed framework for RLAN and LPWAN, as set in the 2023 ministerial regulation and its 2025 amendment. These rules define which technologies may operate under class permits and the conditions attached to each band. Non-cellular LPWAN radios follow SDPPI’s 2024 technical standard, which requires factory-locking to the permitted band and defines equipment classes (fixed, vehicular, portable). For RLAN, cabinet backhaul commonly uses 2.4/5 GHz; Indonesia has also opened 6 GHz (Wi-Fi 6E/7) with band-specific limits on power and use. LoRa deployments align with AS923 regional parameters. All radios and gateways obtain SDPPI type approval and labeling before service; the official portals describe the certification process and service standards.
Each stage of the roadmap thus operationalizes specific patterns identified in the systematic review, ensuring continuity from evidence to practice.
Economic considerations are central to feasibility in developing cities. Hybrid retrofits that reuse existing cabinet controllers and adopt camera-first sensing are substantially less capital-intensive than full adaptive replacements. Cost-effectiveness arises from leveraging existing infrastructure, deploying low-cost edge devices, and restricting online adaptation to offset tuning rather than full phase reoptimization. Financing mechanisms may include incremental upgrades through municipal budget cycles, integration with broader ITS modernization programs, and selective public–private partnerships. These options reduce upfront costs while enabling gradual scaling across corridors.
A validation framework is needed to evaluate hybrid approaches under real-world conditions. We propose metrics that can be logged from existing ATSPM systems, including arrivals-on-green percentages, Purdue Coordination Diagram before-and-after comparisons, delay distributions, and travel-time reliability. Methodologies should combine cycle-by-cycle high-resolution logging with standardized data exchange formats (e.g., DATEX II, NTCIP) to ensure reproducibility across sites. Pilot evaluations can apply a before-and-after design with matched baseline corridors, allowing incremental validation of offset-only adaptations before expanding to more complex safeguards.
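Travel-time reliability in such a validation can be summarized with a buffer index, computed as the 95th-percentile travel time minus the mean, divided by the mean. The sketch below is one possible implementation using Python's standard library; the function name and the choice of the inclusive quantile method are assumptions, and the review itself does not prescribe a specific reliability measure.

```python
from statistics import mean, quantiles


def buffer_index(travel_times_s):
    """Buffer index = (95th percentile - mean) / mean for corridor runs.

    Requires at least two observations. Lower values indicate more
    reliable travel times.
    """
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(travel_times_s, n=20, method="inclusive")[-1]
    avg = mean(travel_times_s)
    return (p95 - avg) / avg
```

Computing the index separately for the before and after periods on the pilot and the matched baseline corridor gives a simple reliability comparison alongside AoG and delay.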

7. Conclusions

This systematic review has examined hybrid traffic-signal control architectures that combine rule-based authority with bounded reinforcement learning, with particular reference to Indonesian urban corridors. The synthesis highlights how rule shields, action masking, bounded variables, and prerequisites for action can embed adaptive logic within existing controller hierarchies while preserving safety and interpretability. Across the 18 included studies, the majority reported measurable improvements in arrivals on green (AoG), progression quality as shown in Purdue Coordination Diagrams (PCD), delay, or travel time, especially where adaptation was restricted to offset nudges or bounded adjustments rather than full phase selection. These findings indicate that hybrid approaches can deliver performance gains without compromising the safeguards required for field deployment.
From an Indonesian perspective, the relevance is acute. Current operations are dominated by pre-timed plans in corridors with lane-free and heterogeneous traffic flows, where platoon dispersion, high motorcycle share, and irregular saturation flows make transfer of Western adaptive systems problematic. By emphasizing bounded offset correction and the preservation of rule-based plan authority, the proposed Shielded Independent Q-Learning with ΔOffset (SIQL-ΔOffset) framework aligns with the operational constraints of Indonesian agencies. In this way, the review contributes both a consolidated evidence base and a tailored roadmap for the gradual hybridization of existing Area Traffic Control Systems (ATCS).
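The combination of action masking and a rule shield around a bounded offset correction can be sketched as follows. This is an illustrative sketch only, not the SIQL-ΔOffset implementation: the nudge set `ACTIONS`, the drift bound `MAX_DRIFT`, and both function names are assumptions introduced here, and the Q-values are placeholders for whatever a learning agent would supply.

```python
ACTIONS = (-2, 0, 2)   # candidate offset nudges per update, in seconds (hypothetical)
MAX_DRIFT = 6          # largest allowed cumulative deviation from the plan offset (hypothetical)


def masked_greedy(q_values, drift):
    """Action masking: pick the highest-valued nudge among those that respect the bound."""
    allowed = [(q, a) for q, a in zip(q_values, ACTIONS) if abs(drift + a) <= MAX_DRIFT]
    if not allowed:          # cannot happen while |drift| <= MAX_DRIFT; kept as a guard
        return 0
    return max(allowed, key=lambda pair: pair[0])[1]


def apply_nudge(plan_offset, drift, action, cycle_length):
    """Rule shield: re-check the bound, then return (effective offset, new drift)."""
    if abs(drift + action) > MAX_DRIFT:
        action = 0           # reject the nudge and hold the current offset
    drift += action
    return (plan_offset + drift) % cycle_length, drift


# Placeholder Q-values that favour the +2 s nudge.
q = (0.1, 0.2, 0.9)

# With no accumulated drift the +2 s nudge is allowed and applied.
offset, drift = apply_nudge(40, 0, masked_greedy(q, 0), cycle_length=120)
print(offset, drift)

# At the drift bound the +2 s nudge is masked, so the agent holds the offset.
offset, drift = apply_nudge(40, 6, masked_greedy(q, 6), cycle_length=120)
print(offset, drift)
```

Cycle length and splits never appear as learnable quantities in this sketch, which mirrors the principle that the timing plan keeps authority and the learner only shifts the offset within a fixed window.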
At the policy level, the implications extend beyond Indonesia. The patterns identified—retaining cycle and split authority, bounding RL action spaces, and verifying gains through AoG/PCD metrics—are equally applicable to other developing cities facing resource constraints and heterogeneous traffic. As communications mature and sensing becomes more robust, incremental strengthening of adaptive functions can be pursued. The review thus supports a staged roadmap: begin with high-resolution logging and performance audits, restrict online adaptation to bounded offsets, expand multi-sensor fusion gradually, and institutionalize governance mechanisms for scaling hybrid systems.
This review has several limitations that should be acknowledged. First, the synthesis relied on a narrative approach rather than a formal meta-analysis because the included studies reported heterogeneous designs, metrics, and baselines. Second, evidence specific to Indonesia remains limited, and most findings come from simulation studies conducted in other contexts, which constrains direct transferability. Third, performance metrics varied widely across studies—for example, some reported delay or travel time, while others used arrivals on green (AoG) or Purdue Coordination Diagrams (PCD)—which reduces comparability across cases. Fourth, search strategy limitations may introduce bias: despite broad coverage of six major databases, non-English studies may have been missed, and the inclusion of preprints (arXiv) may introduce uncertainty. Finally, these limitations affect the confidence in the deployment checklist and roadmap proposed here: while the checklist reflects consistent patterns across studies, its practical value still depends on pilot validation and field evidence.
Beyond methodological limitations, implementation risks deserve attention. Technical risks include unreliable sensors, backhaul outages, or synchronization errors that could disrupt corridor operations. Institutional risks arise from staffing shortages, fragmented mandates, and resistance to procedural change within traffic agencies. Operational risks exist if safeguards are misapplied, potentially creating unsafe phase transitions. Mitigation strategies include redundant communications, audit-first acceptance policies, and staff training. Future research should quantify these risks systematically and link them to mitigation costs and institutional readiness, ensuring that the roadmap can be realistically applied in practice.
Future work should focus on pilot evaluation designs and on data-sharing standards that allow comparability across cities. Corridor pilots should apply the validation framework outlined in the preceding section: metrics logged from existing ATSPM systems (arrivals-on-green percentages, Purdue Coordination Diagram comparisons, delay distributions, and travel-time reliability), exchanged in standardized formats, and evaluated in a before-and-after design with matched baseline corridors. Offset-only adaptations should be validated in this way before more complex safeguards are introduced, so that agencies can make informed decisions about deployment pathways.
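One simple way to analyze a before-and-after design with a matched baseline corridor is a difference-in-differences estimate, sketched below. The choice of estimator is an assumption made here for illustration rather than a method prescribed by the reviewed studies, and the travel-time samples are invented.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change on the pilot corridor minus the change on the matched baseline corridor."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Illustrative corridor travel times in seconds (made-up values).
pilot_before = [310, 330, 320]
pilot_after = [280, 290, 300]
baseline_before = [400, 410, 420]
baseline_after = [395, 405, 415]

effect = diff_in_diff(pilot_before, pilot_after, baseline_before, baseline_after)
print(f"Estimated travel-time effect of the pilot: {effect:+.1f} s")
```

Subtracting the baseline corridor's change removes background shifts, such as seasonal demand, that affect both corridors alike; a full evaluation would also report uncertainty around the estimate.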
In summary, this review provides a consolidated evidence base and a practical roadmap for hybrid rule-based and MARL traffic-signal control in developing contexts. The findings affirm that bounded, safeguarded approaches are not only technically viable but also institutionally realistic under Indonesian conditions. Explicit limitations remind us that further work is needed to strengthen sensing, communications, and governance. Future research, especially corridor-scale pilots and data-standardization initiatives, will determine whether hybrid adaptive systems can transition from promising simulations to reliable field deployments.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/app151910761/s1; Table S1 (Search strings and runs), Table S2 (Protocol deviations), Table S3 (Criteria clarifications), Table S4 (Screening log), Table S5 (Data extraction form), Table S6 (Synthesis rules).

Author Contributions

Conceptualization, F.K.; methodology, F.K., D.D. and O.D.; software, H.A.; validation, F.K., D.D. and O.D.; formal analysis, F.K. and N.A.; investigation, F.K.; resources, F.K. and N.A.; data curation, H.A.; writing—original draft preparation, F.K. and O.D.; writing—review and editing, F.K. and O.D.; visualization, H.A.; supervision, R.N.; project administration, R.N.; funding acquisition, R.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Higher Education, Science and Technology (Kemdiktisaintek, Indonesia) through Higher Education Service Institution Region V (LLDikti V), grant number 0498.31/LL5-INT/AL.04/2025.

Institutional Review Board Statement

Ethical review and approval were waived for this study because it did not involve human participants or animals and analyzed only non-identifiable, aggregated traffic data.

Informed Consent Statement

Informed consent was waived because the study did not involve human participants and used only non-identifiable, aggregated traffic data.

Data Availability Statement

All data supporting this study are openly available from published literature and public web resources cited in the References; no new datasets were generated. Data reused from the authors’ prior publications are available within those articles and their associated repositories/DOIs. Extraction sheets and minor aggregation files used for the review are available from the corresponding author on reasonable request.

Acknowledgments

The authors gratefully acknowledge the administrative support of the Higher Education Service Institution Region V (LLDIKTI V) and the institutional support of Institut Teknologi Dirgantara Adisutjipto (ITDA). We also appreciate the constructive feedback provided during the review process. Any opinions and conclusions expressed here are those of the authors and do not necessarily reflect the views of the supporting institutions.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
5G Fifth-Generation Mobile Network.
ACS-Lite Adaptive Control Software Lite.
ANPR Automatic Number Plate Recognition.
AoG Arrivals on Green.
APILL Alat Pemberi Isyarat Lalu Lintas (traffic signal devices; Indonesian regulation).
ASC Actuated Signal Controller.
ASCE JTE (Part A) ASCE Journal of Transportation Engineering, Part A: Systems.
ATCS Area Traffic Control System.
ATMS Advanced Traffic Management System.
ATSAC Automated Traffic Surveillance and Control (Los Angeles).
ATSPM Automated Traffic Signal Performance Measures.
BMS Bus Management System (ticketing data).
BSFR Base Saturation Flow Rate.
ByteTrack Multi-object tracking method.
CCTV Closed-Circuit Television.
CNN Convolutional Neural Network.
DeepSORT Deep Simple Online and Realtime Tracking.
ERP Electronic Road Pricing.
FHWA Federal Highway Administration.
GPS Global Positioning System.
IoT Internet of Things.
LMIC Low- and Middle-Income Countries.
LoRa Long Range (radio).
LoRaWAN Long Range Wide Area Network.
LPWAN Low-Power Wide-Area Network.
LTA Land Transport Authority (Singapore).
LTE Long-Term Evolution (4G).
MARL Multi-Agent Reinforcement Learning.
MAXBAND Multi-Band Arterial Progression Optimization.
MKJI Manual Kapasitas Jalan Indonesia (Indonesian Highway Capacity Manual).
NTCIP National Transportation Communications for Intelligent Transportation System Protocol.
PCD Purdue Coordination Diagram.
PCE Passenger Car Equivalent.
PTW Powered Two-Wheeler.
RL Reinforcement Learning.
SCATS Sydney Coordinated Adaptive Traffic System.
SCOOT Split Cycle and Offset Optimization Technique.
SHC Signal Head Control.
SNMPv3 Simple Network Management Protocol, Version 3.
SUMO Simulation of Urban Mobility.
TMC Traffic Management Center.
TOD Time-of-Day (plans).
TOPIS Transport Operation and Information Service (Seoul).
TRPS Traffic-Responsive Plan Selection.
TRR Transportation Research Record.
UTC Urban Traffic Control.
V2X Vehicle-to-Everything (communications).
VIVDS Video Image Vehicle Detection System.
YOLO You Only Look Once (object detection).

References

  1. Majstorović, Ž.; Tišljarić, L.; Ivanjko, E.; Carić, T. Urban Traffic Signal Control under Mixed Traffic Flows: Literature Review. Appl. Sci. 2023, 13, 4484. [Google Scholar] [CrossRef]
  2. Michailidis, P.; Michailidis, I.; Lazaridis, C.R.; Kosmatopoulos, E. Traffic Signal Control via Reinforcement Learning: A Review on Applications and Innovations. Infrastructures 2025, 10, 114. [Google Scholar] [CrossRef]
  3. Sugiarto, S.; Saleh, S.M.; Darma, Y.; Rusdi, M.; A’yUni, Q.; Fazila, T.S.; Rahma, R. Base saturation flow rate (BSFR) and its effect on performance of pretimed signalized intersection with non-lane based urban heterogeneous traffic. PLoS ONE 2024, 19, e0306112. [Google Scholar] [CrossRef]
  4. Noaeen, M.; Naik, A.; Goodman, L.; Crebo, J.; Abrar, T.; Abad, Z.S.H.; Bazzan, A.L.; Far, B. Reinforcement learning in urban network traffic signal control: A systematic literature review. Expert Syst. Appl. 2022, 199, 116830. [Google Scholar] [CrossRef]
  5. Xiao, F.; Lu, J.; Li, L.; Tu, W.; Li, C. Advances in reinforcement learning for traffic signal control: A review of recent progress. Intell. Transp. Infrastruct. 2025, 4, liaf009. [Google Scholar] [CrossRef]
  6. Wei, H.; Xu, N.; Zhang, H.; Zheng, G.; Zang, X.; Chen, C.; Zhang, W.; Zhu, Y.; Xu, K.; Li, Z. CoLight. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; ACM: New York, NY, USA, 2019; pp. 1913–1922. [Google Scholar] [CrossRef]
  7. Kushwaha, A.; Ravish, K.; Lamba, P.; Kumar, P. A Survey of Safe Reinforcement Learning and Constrained MDPs: A Technical Survey on Single-Agent and Multi-Agent Safety. arXiv 2025. [Google Scholar] [CrossRef]
  8. Müller, A.; Sabatelli, M. Safe and Psychologically Pleasant Traffic Signal Control with Reinforcement Learning using Action Masking. arXiv 2022. [Google Scholar] [CrossRef]
  9. Alshiekh, M.; Bloem, R.; Ehlers, U.; Könighofer, B.; Niekum, S.; Topcu, U. Safe Reinforcement Learning via Shielding. [Online]. Available online: https://www.aaai.org (accessed on 17 August 2025).
  10. Zheng, Y.; Luo, J.; Gao, H.; Zhou, Y.; Li, K. Pri-DDQN: Learning adaptive traffic signal control strategy through a hybrid agent. Complex Intell. Syst. 2025, 11, 47. [Google Scholar] [CrossRef]
  11. Han, Y.; Wang, M.; Leclercq, L. Leveraging reinforcement learning for dynamic traffic control: A survey and challenges for field implementation. Commun. Transp. Res. 2023, 3, 100104. [Google Scholar] [CrossRef]
  12. Miletić, M.; Ivanjko, E.; Gregurić, M.; Kušić, K. A review of reinforcement learning applications in adaptive traffic signal control. IET Intell. Transp. Syst. 2022, 16, 1269–1285. [Google Scholar] [CrossRef]
  13. Nasution, S.M.; Husni, E.; Kuspriyanto, K.; Yusuf, R. Heterogeneous Traffic Condition Dataset Collection for Creating Road Capacity Value. Big Data Cogn. Comput. 2023, 7, 40. [Google Scholar] [CrossRef]
  14. Chen, R.; Fang, F.; Sadeh, N. The Real Deal: A Review of Challenges and Opportunities in Moving Reinforcement Learning-Based Traffic Signal Control Systems Towards Reality. arXiv 2022. [Google Scholar] [CrossRef]
  15. A Joint Standard of AASHTO, ITE, and NEMA NTCIP 1202 v03A National Transportation Communications for ITS Protocol Object Definitions for Actuated Signal Controllers (ASC) Interface (Including FYA Errata). [Online]. 2019. Available online: https://www.ntcip.org (accessed on 17 August 2025).
  16. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
  17. Perry, R.; Whitmarsh, A.; Leach, V.; Davies, P. A comparison of two assessment tools used in overviews of systematic reviews: ROBIS versus AMSTAR-2. Syst. Rev. 2021, 10, 273. [Google Scholar] [CrossRef]
  18. Lunny, C.; Kanji, S.; Thabet, P.; Haidich, A.-B.; Bougioukas, K.I.; Pieper, D. Assessing the methodological quality and risk of bias of systematic reviews: Primer for authors of overviews of systematic reviews. BMJ Med. 2024, 3, e000604. [Google Scholar] [CrossRef]
  19. Horbachov, P.; Liubyi, Y.; Svichynskyi, S.; Muzylyov, D.; Ivanov, V. A comprehensive assessment of arterial signal coordination through a case study. Transp. Res. Interdiscip. Perspect. 2025, 29, 101321. [Google Scholar] [CrossRef]
  20. Alkaissi, Z.A. Effect of Signal Coordination on the Traffic Operation of Urban Corridor. Tikrit J. Eng. Sci. 2023, 30, 12–24. [Google Scholar] [CrossRef]
  21. Abdelaziz, S.L.; Abbas, M.M.; McGhee, C.C. Determination of Significant Critical Movements to Generate Traffic Scenarios for Large Arterial Networks. Transp. Res. Rec. J. Transp. Res. Board 2009, 2128, 202–216. [Google Scholar] [CrossRef]
  22. Wu, X.; Yang, H.; Mainali, B.; Pokharel, P.; Chiu, S. Development of platoon-based actuated signal control systems to coordinated intersections: Application in corridors in Houston. IET Intell. Transp. Syst. 2020, 14, 127–137. [Google Scholar] [CrossRef]
  23. Chia, I.; Wu, X.; Dhaliwal, S.S.; Thai, J.; Jia, X. Evaluation of Actuated, Coordinated, and Adaptive Signal Control Systems: A Case Study. J. Transp. Eng. Part A Syst. 2017, 143, 05017007. [Google Scholar] [CrossRef]
  24. Hansen, B.G.; Martin, P.T.; Perrin, H.J. SCOOT Real-Time Adaptive Control in a CORSIM Simulation Environment. Transp. Res. Rec. J. Transp. Res. Board 2000, 1727, 27–30. [Google Scholar] [CrossRef]
  25. Tian, Z.; Ohene, F.; Hu, P. Arterial Performance Evaluation on an Adaptive Traffic Signal Control System. Procedia Soc. Behav. Sci. 2011, 16, 230–239. [Google Scholar] [CrossRef]
  26. Moore, I.J.E.; Mattingly, S.P.; MacCarley, C.A.; McNally, M.G. Anaheim Advanced Traffic Control System Field Operations Test: A Technical Evaluation of SCOOT. Transp. Plan. Technol. 2005, 28, 465–482. [Google Scholar] [CrossRef]
  27. Ambarwati, L.; Pel, A.J.; Verhaeghe, R.; van Arem, B. Empirical analysis of heterogeneous traffic flow and calibration of porous flow model. Transp. Res. Part C Emerg. Technol. 2014, 48, 418–436. [Google Scholar] [CrossRef]
  28. Kariyana, I.M.; Suthanaya, P.A.; Wedagama, D.M.P.; Ariawan, I.M.A.; Dissanayake, D. The influence of motorcycle behavior on saturation flow rate at signalized intersections with and without exclusive stopping space for motorcycle (ESSM). IOP Conf. Ser. Earth Environ. Sci. 2021, 673, 012020. [Google Scholar] [CrossRef]
  29. Sulistio, H. Effect of Traffic Flow, Proportion of Motorcycle, Speed, Lane Width, and the Availabilities of Median and Shoulder on Motorcycle Accidents at Urban Roads in Indonesia. Open Transp. J. 2018, 12, 1–7. [Google Scholar] [CrossRef]
  30. Ministry of Transportation of Indonesia. Alat Pemberi Isyarat Lalu-Lintas [Traffic Signal Devices]. Legislation. [Online]. Available online: https://peraturan.bpk.go.id/Details/103764/permenhub-no-49-tahun-2014?utm_source=chatgpt.com (accessed on 17 August 2025).
  31. Pangestu, I.B.; Maimunah, M.; Hanafi, M. Traffic Congestion Detection Using YOLOv8 Algorithm With CCTV Data. PIKSEL Penelit. Ilmu Komput. Sist. Embed. Log. 2024, 12, 435–444. [Google Scholar] [CrossRef]
  32. Fahrunnisa, Z.; Rahmadwati, R.; Setyawan, R.A. Adaptive Traffic Light Signal Control Using Fuzzy Logic Based on Real-Time Vehicle Detection from Video Surveillance. J. Ilm. Tek. Elektro Komput. Dan Inform. 2024, 10, 235–251. [Google Scholar] [CrossRef]
  33. Kustija, J. SCATS (Sydney Coordinated Adaptive Traffic System) as A Solution to Overcome Traffic Congestion in Big Cities. Int. J. Res. Appl. Technol. 2023, 3, 1–14. [Google Scholar] [CrossRef]
  34. Federal Highway Administration. Automated Traffic Signal Performance Measures. Federal Highway Administration: Washington, DC, USA, 2020. [Online]. Available online: https://www.flickr.com/photos/sounderbruce/22448121975 (accessed on 17 August 2025).
  35. Tanaka, A.; Schroeder, P.B.; Trask, P.L.; Chase, T. Automated Traffic Signal Performance Measures; Kittelson & Associates, Inc.: Portland, OR, USA, 2019. [Google Scholar]
  36. Yulianto, B. Detector technology for demand responsive traffic signal control under mixed traffic conditions. In Proceedings of the 4th International Conference on Engineering, Technology, and Industrial Application (ICETIA), Surakarta, Indonesia, 13–14 December 2017; p. 040021. [Google Scholar] [CrossRef]
  37. Romero, D.D.; Prabuwono, A.S.; Hasniaty, A. A Review of Sensing Techniques for Real-time Traffic Surveillance. J. Appl. Sci. 2010, 11, 192–198. [Google Scholar] [CrossRef]
  38. Saldivar-Carranza, E.; Li, H.; Mathew, J.; Hunter, M.; Sturdevant, J.; Bullock, D.M. Deriving Operational Traffic Signal Performance Measures from Vehicle Trajectory Data. Transp. Res. Rec. J. Transp. Res. Board 2021, 2675, 1250–1264. [Google Scholar] [CrossRef]
  39. Day, C.M.; Bullock, D.M. Optimization of Traffic Signal Offsets with High Resolution Event Data. J. Transp. Eng. Part A Syst. 2020, 146. [Google Scholar] [CrossRef]
  40. Schultz, G.G.; Macfarlane, G.S.; Wang, B.; Graduate, E.; Assistant, R.; Mccuen, S. Evaluating the Quality of Signal Operations Using Signal Performance Measures. [Online]. Available online: https://www.udot.utah.gov/go/research (accessed on 17 August 2025).
  41. Tafidis, P.; Gholamnia, M.; Sajadi, P.; Vijayakrishnan, S.K.; Pilla, F. Evaluating the impact of urban traffic patterns on air pollution emissions in Dublin: A regression model using google project air view data and traffic data. Eur. Transp. Res. Rev. 2024, 16, 47. [Google Scholar] [CrossRef]
  42. Zhao, P.; Yuan, Y.; Guo, T. Extensible Hierarchical Multi-Agent Reinforcement-Learning Algorithm in Traffic Signal Control. Appl. Sci. 2022, 12, 12783. [Google Scholar] [CrossRef]
  43. Ma, J.; Wu, F. Feudal Multi-Agent Deep Reinforcement Learning for Traffic Signal Control. [Online]. Available online: https://www.ifaamas.org (accessed on 17 August 2025).
  44. Zou, Y.; Liu, R.; Li, Y.; Ma, Y.; Wang, G. Signal adaptive cooperative control of two adjacent traffic intersections using a two-stage algorithm. Expert Syst. Appl. 2021, 174, 114746. [Google Scholar] [CrossRef]
  45. Zhang, L.; Wu, W.; Huang, X. A Dynamic Optimization Model for Adjacent Signalized Intersection Control Systems Based on the Stratified Sequencing Method. J. Highw. Transp. Res. Dev. (Engl. Ed.) 2016, 10, 85–91. [Google Scholar] [CrossRef]
  46. Kurniawan, F.; Jusoh, M.; Muminov, B.; Alam, H.; Dermawan, D.; Purnomo, M.J. Development of Coordinated Control of Vehicle Traffic Flow at Adjacent Intersection. Aviat. Electron. Inf. Technol. Telecommun. Electr. Control. (AVITEC) 2025, 7, 103. [Google Scholar] [CrossRef]
  47. Alam, H.; Kurniawan, F.; Widya, H. Hardware-Implemented Multi-Agent Coordination for Adaptive Traffic Control in Developing Regions: A Low-Cost Microcontroller Approach. In Proceedings of the 2nd International Conference on Science and Technology UISU (ICST), Online, 23–24 July 2025. [Google Scholar] [CrossRef]
  48. Li, W.; Ban, X. Connected Vehicle-Based Traffic Signal Coordination. Engineering 2020, 6, 1463–1472. [Google Scholar] [CrossRef]
  49. Shoup, G.E.; Bullock, D. Dynamic Offset Tuning Procedure Using Travel Time Data. Transp. Res. Rec. J. Transp. Res. Board 1999, 1683, 84–94. [Google Scholar] [CrossRef]
  50. Driessen, S.P.H.; Janssen, N.H.J.; Wang, L.; Palmer, J.L.; Nijmeijer, H. Experimentally Validated Extended Kalman Filter for UAV State Estimation Using Low-Cost Sensors. IFAC-Pap. 2018, 51, 43–48. [Google Scholar] [CrossRef]
  51. Kurniawan, F.; Dermawan, D.; Dinaryanto, O.; Irawati, M. Pre-Timed and Coordinated Traffic Controller Systems Based on AVR Microcontroller. TELKOMNIKA (Telecommun. Comput. Electron. Control) 2014, 12, 787. [Google Scholar] [CrossRef]
  52. Kurniawan, F.; Sajati, H.; Dinaryanto, O. Adaptive Traffic Controller Based On Pre-Timed System. TELKOMNIKA (Telecommun. Comput. Electron. Control) 2016, 14, 56. [Google Scholar] [CrossRef]
  53. Guo, H.; Crossley, P. Design of a Time Synchronization System Based on GPS and IEEE 1588 for Transmission Substations. IEEE Trans. Power Deliv. 2017, 32, 2091–2100. [Google Scholar] [CrossRef]
  54. Day, C.M.; Smaglik, E.J.; Bullock, D.M.; Sturdevant, J.R. FHWA/IN/JTRP-2008/9 Final Report Real-Time Arterial Traffic Signal Performance Measures. [Online]. 2008. Available online: https://www.fhwa.dot.gov/publications/research/operations/06083/index.cfm?utm_source=chatgpt.com (accessed on 21 August 2025).
  55. NTCIP 1202 v04 Actuated Signal Control (ASC). Institute of Transportation Engineers. [Online]. Available online: https://www.ite.org/technical-resources/topics/standards/ntcip-1202-v04-actuated-signal-control-asc/?utm_source=chatgpt.com (accessed on 18 August 2025).
  56. Federal Highway Administration. ACS Lite; Federal Highway Administration Research and Technology: McLean, VA, USA, 2008. [Google Scholar]
  57. Mishra, S.; Bhattacharya, D.; Gupta, A.; Singh, V.R. Adaptive traffic light cycle time controller using microcontrollers and crowdsource data of google apis for developing countries. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, IV-4/W7, 83–90. [Google Scholar] [CrossRef]
  58. Diaz, K.A.; Dailisan, D.; Sharaf, U.; Santos, C.; Gan, Q.; Uy, F.A.; Lim, M.T.; Bayen, A.M. Adaptive Coordination Offsets for Signalized Arterial Intersections using Deep Reinforcement Learning. arXiv 2022. [Google Scholar] [CrossRef]
  59. Kim, H.; Tak, H.; Yu, H.; Yeo, H. Hierarchical Traffic Signal Coordination with Priority-based Optimization via Deep Reinforcement Learning. Available online: https://trc-30.epfl.ch/wp-content/uploads/2024/09/TRC-30_paper_205.pdf (accessed on 17 August 2025).
  60. Park, S.; Han, E.; Park, S.; Jeong, H.; Yun, I. Deep Q-network-based traffic signal control models. PLoS ONE 2021, 16, e0256405. [Google Scholar] [CrossRef]
  61. Gu, J.; Lee, M.; Jun, C.; Han, Y.; Kim, Y.; Kim, J. Traffic Signal Optimization for Multiple Intersections Based on Reinforcement Learning. Appl. Sci. 2021, 11, 10688. [Google Scholar] [CrossRef]
  62. Kolat, M.; Kővári, B.; Bécsi, T.; Aradi, S. Multi-Agent Reinforcement Learning for Traffic Signal Control: A Cooperative Approach. Sustainability 2023, 15, 3479. [Google Scholar] [CrossRef]
  63. Jiang, Q.; Qin, M.; Shi, S.; Sun, W.; Zheng, B. Multi-Agent Reinforcement Learning for Traffic Signal Control Through Universal Communication Method. [Online]. 2022. Available online: https://github.com/ (accessed on 17 August 2025).
  64. Meess, H.; Gerner, J.; Hein, D.; Schmidtner, S.; Elger, G.; Bogenberger, K. First steps towards real-world traffic signal control optimisation by reinforcement learning. J. Simul. 2024, 18, 957–972. [Google Scholar] [CrossRef]
  65. Mathe, S.E.; Kondaveeti, H.K.; Vappangi, S.; Vanambathina, S.D.; Kumaravelu, N.K. A comprehensive review on applications of Raspberry Pi. Comput. Sci. Rev. 2024, 52, 100636. [Google Scholar] [CrossRef]
  66. Bouktif, S.; Cheniki, A.; Ouni, A. Traffic Signal Control Using Hybrid Action Space Deep Reinforcement Learning. Sensors 2021, 21, 2302. [Google Scholar] [CrossRef] [PubMed]
  67. Bi, Y.; Ding, Q.; Du, Y.; Liu, D.; Ren, S. Intelligent Traffic Control Decision-Making Based on Type-2 Fuzzy and Reinforcement Learning. Electronics 2024, 13, 3894. [Google Scholar] [CrossRef]
  68. Li, J.; Lin, S.; Shi, T.; Tian, C.; Mei, Y.; Song, J.; Zhan, X.; Li, R. A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning. arXiv 2022. [Google Scholar] [CrossRef]
  69. Lian, F.; Chen, B.; Zhang, K.; Miao, L.; Wu, J.; Luan, S. Adaptive traffic signal control algorithms based on probe vehicle data. J. Intell. Transp. Syst. 2021, 25, 41–57. [Google Scholar] [CrossRef]
  70. Saadi, A.; Abghour, N.; Chiba, Z.; Moussaid, K.; Ali, S. A survey of reinforcement and deep reinforcement learning for coordination in intelligent traffic light control. J. Big. Data 2025, 12, 84. [Google Scholar] [CrossRef]
  71. Li, M.; Pan, X.; Liu, C.; Li, Z. Federated deep reinforcement learning-based urban traffic signal optimal control. Sci. Rep. 2025, 15, 11724. [Google Scholar] [CrossRef] [PubMed]
  72. Othman, K.; Wang, X.; Shalaby, A.; Abdulhai, B. Multimodal adaptive traffic signal control: A decentralized multiagent reinforcement learning approach. Multimodal Transp. 2025, 4, 100190. [Google Scholar] [CrossRef]
  73. Satheesh, A.; Powell, K. A Constrained Multi-Agent Reinforcement Learning Approach to Autonomous Traffic Signal Control. arXiv 2025. [Google Scholar] [CrossRef]
  74. Huang, X.; Wu, D.; Jenkin, M.; Boulet, B. ModelLight: Model-Based Meta-Reinforcement Learning for Traffic Signal Control. arXiv 2021, 1–8. [Google Scholar] [CrossRef]
  75. Bie, Y.; Ji, Y.; Ma, D. Multi-agent Deep Reinforcement Learning collaborative Traffic Signal Control method considering intersection heterogeneity. Transp. Res. Part C Emerg. Technol. 2024, 164, 104663. [Google Scholar] [CrossRef]
  76. Jia, W.; Ji, M. Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control with Spatio-Temporal Attention Mechanism. Appl. Sci. 2025, 15, 8605. [Google Scholar] [CrossRef]
  77. Zhang, X.; Chan, L.S.; Nassir, N.; Sarvi, M. Towards fair lights: A multi-agent masked deep reinforcement learning for efficient corridor-level traffic signal control. Commun. Transp. Res. 2025, 5, 100203. [Google Scholar] [CrossRef]
  78. Dhulkefl, E.J.; Abdulsattar, A.W.; Khudhur, Z.M.; Mahmood, T.A. Design of a Hybrid Intelligent Traffic Signal Control System Using Nearest Neighbor Algorithm and Deep Reinforcement Learning with SUMO Simulator. J. Res. Eng. Comput. Sci. 2025, 3, 31–40. [Google Scholar] [CrossRef]
  79. Muller, A.; Rangras, V.; Ferfers, T.; Hufen, F.; Schreckenberg, L.; Jasperneite, J.; Schnittker, G.; Waldmann, M.; Friesen, M.; Wiering, M. Towards Real-World Deployment of Reinforcement Learning for Traffic Signal Control. In Proceedings of the 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), Pasadena, CA, USA, 13–16 December 2021. [Google Scholar] [CrossRef]
  80. Flir Helps Indonesia Start Tackling Congestion. ITS International. [Online]. Available online: https://www.itsinternational.com/its4/feature/flir-helps-indonesia-start-tackling-congestion#:~:text=Extensive%20TrafiCam%20site (accessed on 18 August 2025).
  81. Intelligent Transport Systems. A Singapore Government Agency. [Online]. Available online: https://www.lta.gov.sg/content/ltagov/en/getting_around/driving_in_singapore/intelligent_transport_systems.html#:~:text=Collectively%20called%20the%20Intelligent%20Transport,to%20maximise%20road%20network%20efficiency (accessed on 18 August 2025).
  82. TOPIS: Seoul’s Intelligent Traffic System (ITS). Seoul Solution. [Online]. Available online: https://seoulsolution.kr/en/content/2595#:~:text=The%20Seoul%20Metropolitan%20Government%20discloses,Twitter%2C%20etc (accessed on 18 August 2025).
  83. The Brazilian City of Curitiba Awards Indra Its Largest Intelligent Urban Transport and Mobility Project for €15 Million. IndraCompany. [Online]. Available online: https://www.indracompany.com/en/noticia/brazilian-city-curitiba-awards-indra-largest-intelligent-urban-transport-mobility-project#:~:text=The%20project%20includes%20new%20Intelligent,model%20based%20on%20predictive%20algorithms (accessed on 18 August 2025).
  84. Intelligent Transport Systems Intelligent Transport Systems. Kenya Urban Roads Authority. [Online]. Available online: https://kura.go.ke/news/intelligent-transport-systems/13363/#:~:text=The%20Authority%20has%20continued%20to,located%20at%20the%20KURA%20headquarters (accessed on 18 August 2025).
  85. Vu, T.; Thai, H.N.; Pham, V.N.; Vu, H.T.; Luong, A.T.; Van Luong, T. Counting Mixed Traffic Volumes at Motorcycle-Dominated Intersections by Using Computer Vision. Int. J. Intell. Transp. Syst. Res. 2025, 23, 146–164. [Google Scholar] [CrossRef]
  86. Beza, A.D.; Xie, Z.; Ramezani, M.; Levinson, D. From lane-less to lane-free: Implications in the era of automated vehicles. Transp. Res. Part C Emerg. Technol. 2025, 170, 104898. [Google Scholar] [CrossRef]
  87. Singh, M.K.; Rao, K.R. Simulation of Signalized Intersection with Non-Lane-Based Heterogeneous Traffic Conditions Using Cellular Automata. Transp. Res. Rec. J. Transp. Res. Board 2024, 2678, 909–930. [Google Scholar] [CrossRef]
  88. Pahlevi, R.; Kristanto, B. The effectiveness of atcs use policy in improving traffic smoothness and safety in banjarbaru city. J. Soc. Sci. 2025, 6, 1–7. [Google Scholar] [CrossRef]
  89. Oktorini, R.; Barus, L.S. Integration of Public Transportation in Smart Transportation System (Smart Transportation System) in Jakarta. Konfrontasi J. Kult. Ekon. Dan Perubahan Sos. 2022, 9, 341–347. [Google Scholar] [CrossRef]
  90. Tania, N.; Rachmawati, R. Area Traffic Control System (ATCS) for Supporting Urban Traffic Management in DKI Jakarta. In Proceedings of the 2022 7th International Conference on Electric Vehicular Technology (ICEVT), Bali, Indonesia, 14–16 September 2022; pp. 103–108. [Google Scholar] [CrossRef]
  91. Syah, A.F.; Aminudin, M.S.; Salsabila, N.; Rizqi, D.I. Leveraging Artificial Intelligence for Smart City Development: Social and Governance Impacts in Jakarta. In Proceedings of the Internasional Seminar on Arts, Artificial Intelligence & Society; 2024. Available online: https://www.google.com.hk/url?sa=t&source=web&rct=j&opi=89978449&url=https://proceeding.ikj.ac.id/index.php/UXA/article/download/118/111/385&ved=2ahUKEwjDqP6x04eQAxWcBdsEHeFHFNYQFnoECBkQAQ&usg=AOvVaw3Vvxu6ndjkCHDESqCinoro (accessed on 17 August 2025).
  92. Hounsell, N.B.; Shrestha, B.P.; Piao, J.; McDonald, M. Review of urban traffic management and the impacts of new vehicle technologies. IET Intell. Transp. Syst. 2009, 3, 419–428.
  93. Otto, T.; Partzsch, I.; Holfeld, J.; Klöppel-Gersdorf, M.; Ivanitzki, V. Designing a C-ITS Communication Infrastructure for Traffic Signal Priority of Public Transport. Appl. Sci. 2023, 13, 7650.
  94. Handoko, C.; Zhang, X. Case Study on The Palapa Ring Project: Prospects for Sub-National Competitiveness; Asia Competitiveness Institute: Singapore, 2021.
  95. Roadmaps for Awarding 5G Spectrum: A Focus on Indonesia. [Online]. 2022. Available online: www.gsma.com (accessed on 17 August 2025).
  96. An, C.; Wu, Y.-J.; Xia, J.; Lu, Z. Investigating Impacts of Communication Loss on Signal Performance with Use of Event-Based Data. Transp. Res. Rec. J. Transp. Res. Board 2017, 2645, 38–49.
  97. Espinosa, J.E.; Velastin, S.A.; Branch, J.W. Motorcycle detection and classification in urban Scenarios using a model based on Faster R-CNN. In Proceedings of the 9th International Conference on Pattern Recognition Systems (ICPRS 2018), Valparaíso, Chile, 22–24 May 2018; Institution of Engineering and Technology: London, UK, 2018; p. 16.
  98. Nocua M, F.; Pérez-Holguín, W.-J.; Pardo-Beainy, C. Urban traffic monitoring based on deep learning on an embedded GPU. Expert Syst. Appl. 2025, 273, 126847.
  99. Espinosa, J.E.; Velastin, S.A.; Branch, J.W. Detection of Motorcycles in Urban Traffic Using Video Analysis: A Review. IEEE Trans. Intell. Transp. Syst. 2021, 22, 6115–6130.
  100. Espinosa, J.E.; Velastin, S.A.; Branch, J.W. Detection and Tracking of Motorcycles in Congested Urban Environments Using Deep Learning and Markov Decision Processes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer Verlag: Berlin/Heidelberg, Germany, 2019; pp. 139–148.
  101. Akhtar, A.; Ahmed, R.; Yousaf, M.H.; Velastin, S.A. Real-Time Motorbike Detection: AI on the Edge Perspective. Mathematics 2024, 12, 1103.
  102. Asaithambi, G.; Kanagaraj, V.; Toledo, T. Driving Behaviors: Models and Challenges for Non-Lane Based Mixed Traffic. Transp. Dev. Econ. 2016, 2, 19.
  103. Yu, X.; Hu, T.; Zhu, H. Roadside Perception Applications Based on DCAM Fusion and Lightweight Millimeter-Wave Radar–Vision Integration. Electronics 2025, 14, 1576.
  104. Zhang, Y.; Carballo, A.; Yang, H.; Takeda, K. Perception and sensing for autonomous vehicles under adverse weather conditions: A survey. ISPRS J. Photogramm. Remote Sens. 2023, 196, 146–177.
  105. Ogunrinde, I.; Bernadin, S. Deep Camera–Radar Fusion with an Attention Framework for Autonomous Vehicle Vision in Foggy Weather Conditions. Sensors 2023, 23, 6255.
  106. Bhadoriya, A.S.; Vegamoor, V.; Rathinam, S. Vehicle Detection and Tracking Using Thermal Cameras in Adverse Visibility Conditions. Sensors 2022, 22, 4567.
  107. Wang, Z.; Zhan, J.; Li, Y.; Zhong, Z.; Cao, Z. A new scheme of vehicle detection for severe weather based on multi-sensor fusion. Measurement 2022, 191, 110737.
  108. Guan, F.; Xu, H.; Tian, Y. Evaluation of Roadside LiDAR-Based and Vision-Based Multi-Model All-Traffic Trajectory Data. Sensors 2023, 23, 5377.
  109. Tang, J.; Wan, L.; Schooling, J.; Zhao, P.; Chen, J.; Wei, S. Automatic number plate recognition (ANPR) in smart cities: A systematic review on technological advancements and application cases. Cities 2022, 129, 103833.
  110. Marciniuk, K.; Kostek, B. Machine learning applied to acoustic-based road traffic monitoring. Procedia Comput. Sci. 2022, 207, 1087–1095.
  111. Ghaffarpasand, O.; Almojarkesh, A.; Morris, S.; Stephens, E.; Chalabi, A.; Almojarkesh, U.; Almojarkesh, Z.; Pope, F.D. Traffic Noise Assessment Using Intelligent Acoustic Sensors (Traffic Ear) and Vehicle Telematics Data. Sensors 2023, 23, 6964.
  112. Wang, H.; Liu, J.; Dong, H.; Shao, Z. A Survey of the Multi-Sensor Fusion Object Detection Task in Autonomous Driving. Sensors 2025, 25, 2794.
  113. Wei, C.; Qin, Z.; Zhang, Z.; Wu, G.; Barth, M.J. Integrating Multi-Modal Sensors: A Review of Fusion Techniques for Intelligent Vehicles. arXiv 2025.
  114. Azimjonov, J.; Özmen, A.; Varan, M. A vision-based real-time traffic flow monitoring system for road intersections. Multimed. Tools Appl. 2023, 82, 25155–25174.
  115. Marszalek, Z.; Duda, K. Validation of Multi-Frequency Inductive-Loop Measurement System for Parameters of Moving Vehicle Based on Laboratory Model. Sensors 2024, 24, 7244.
  116. Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468.
  117. Anandhalli, M.; Baligar, V.P. A novel approach in real-time vehicle detection and tracking using Raspberry Pi. Alex. Eng. J. 2018, 57, 1597–1607.
  118. Wei, Z.; Zhang, F.; Chang, S.; Liu, Y.; Wu, H.; Feng, Z. MmWave Radar and Vision Fusion for Object Detection in Autonomous Driving: A Review. Sensors 2022, 22, 2542.
  119. Llorca, D.F.; Martínez, A.H.; Daza, I.G. Vision-based vehicle speed estimation: A survey. IET Intell. Transp. Syst. 2021, 15, 987–1005.
  120. Cai, G.; Wang, X.; Shi, J.; Lan, X.; Su, T.; Guo, Y. Vehicle Detection Based on Information Fusion of mmWave Radar and Monocular Vision. Electronics 2023, 12, 2840.
  121. Li, S.; Yoon, H.-S. Sensor Fusion-Based Vehicle Detection and Tracking Using a Single Camera and Radar at a Traffic Intersection. Sensors 2023, 23, 4888.
  122. Alqahtani, D.K.; Cheema, A.; Toosi, A.N. Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices. arXiv 2024.
  123. Iwasaki, Y.; Misumi, M.; Nakamiya, T. Robust Vehicle Detection under Various Environmental Conditions Using an Infrared Thermal Camera and Its Application to Road Traffic Flow Monitoring. Sensors 2013, 13, 7756–7773.
  124. Liu, Y.; Su, H.; Zeng, C.; Li, X. A Robust Thermal Infrared Vehicle and Pedestrian Detection Method in Complex Scenes. Sensors 2021, 21, 1240.
  125. Ben-Shoushan, R.; Brook, A. Fused Thermal and RGB Imagery for Robust Detection and Classification of Dynamic Objects in Mixed Datasets via Pre-Trained High-Level CNN. Remote Sens. 2023, 15, 723.
  126. Biglari, A.; Tang, W. A Review of Embedded Machine Learning Based on Hardware, Application, and Sensing Scheme. Sensors 2023, 23, 2131.
  127. Saldivar-Carranza, E.D.; Desai, J.; Thompson, A.; Taylor, M.; Sturdevant, J.; Bullock, D.M. Vehicle and Pedestrian Traffic Signal Performance Measures Using LiDAR-Derived Trajectory Data. Sensors 2024, 24, 6410.
  128. Wu, A.; Banerjee, T.; Chen, K.; Rangarajan, A.; Ranka, S. A Multi-Sensor Video/LiDAR System for Analyzing Intersection Safety. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 24–28 September 2023; pp. 1158–1165.
  129. Liang, L.; Ma, H.; Zhao, L.; Xie, X.; Hua, C.; Zhang, M.; Zhang, Y. Vehicle Detection Algorithms for Autonomous Driving: A Review. Sensors 2024, 24, 3088.
  130. Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. [Online]. Available online: https://github.com/ifzhang/ByteTrack (accessed on 17 August 2025).
  131. Yoshioka, T.; Sakakibara, H.; Tenhagen, R.; Lorkowski, S.; Oguchi, T. Traffic Signal Control Parameter Calculation Using Probe Data. Int. J. Intell. Transp. Syst. Res. 2022, 20, 288–298.
  132. Nagashima, Y.; Hattori, O.; Kobayashi, M. Improvement of Traffic Signal Control Using Probe Data. SEI Technical Review 2014. Available online: https://global-sei.com/technology/tr/bn78/pdf/78-09.pdf (accessed on 17 August 2025).
  133. Traffic Signal Control API Based on Probe Data; 2024. Available online: https://sumitomoelectric.com/sites/default/files/2024-04/download_documents/E98-26.pdf (accessed on 17 August 2025).
  134. A Project Document of the Joint Committee on the NTCIP NTCIP 1202 Version 04 National Transportation Communications for ITS Protocol Functional Requirements: Object Definitions for Actuated Signal Controllers (ASC) Interface. [Online]. Available online: www.ntcip.org (accessed on 17 August 2025).
  135. NEMA Standards Publication TS 2-2021 Traffic Controller Assemblies with NTCIP Requirements Version 03.08. [Online]. 2021. Available online: www.nema.org (accessed on 17 August 2025).
  136. Cunha, J.; Batista, N.; Cardeira, C.; Melicio, R. Wireless Networks for Traffic Light Control on Urban and Aerotropolis Roads. J. Sens. Actuator Netw. 2020, 9, 26.
  137. Del Rey Carrión, D.; Juan-Llácer, L.; Rodríguez, J.-V. Radio Planning Considerations in TETRA to LTE Migration for PPDR Systems: A Radioelectric Coverage Case Study. Appl. Sci. 2019, 9, 250.
  138. Van De Beek, S.; Leferink, F. Robustness of a TETRA Base Station Receiver Against Intentional EMI. IEEE Trans. Electromagn. Compat. 2015, 57, 461–469.
  139. Nor, R.F.A.M.; Zaman, F.H.K.; Mubdi, S. Smart traffic light for congestion monitoring using LoRaWAN. In Proceedings of the 2017 IEEE 8th Control and System Graduate Research Colloquium (ICSGRC), Shah Alam, Malaysia, 4–5 August 2017; pp. 132–137.
  140. Menteri Komunikasi dan Informatika Republik Indonesia. Peraturan Menteri Komunikasi dan Informatika Republik Indonesia Nomor 8 Tahun 2024 Tentang Penggunaan Spektrum Frekuensi Radio Untuk Sistem Komunikasi Microwave Link; 2024. Available online: https://peraturan.bpk.go.id/Details/309376/permenkominfo-no-8-tahun-2024 (accessed on 17 August 2025).
  141. Suthanthira, S.; Andrino-Chavez, A.; Ctc, A.; Lewis, L. DRAFT MEMORANDUM and Fremont Blvd. Multimodal Corridor Project Implementation Guidance for Traffic Signal Technology Improvements INTRODUCTION AND BACKGROUND. 2020. Available online: https://www.alamedactc.org/wp-content/uploads/2021/02/Appendix-E_Traffic_Signal_Technology_Implementation_Guidance_20200824.pdf (accessed on 17 August 2025).
  142. Cronin, B. Produced under Saxton Transportation Operations Laboratory Task Order 18-404 (Feasibility Study and Assessment of Communications Approaches for Real-Time Traffic Signal Applications), DTFH6116D00030L-693JJ318F000404. Available online: https://rosap.ntl.bts.gov/view/dot/50751 (accessed on 17 August 2025).
  143. C-ITS Organisation. Report on Legal and Organisational Structures for C-ITS Operation; 2018. Available online: https://www.c-roads.eu/fileadmin/user_upload/media/Dokumente/Report_on_legal_structures_for_C-_ITS_operation_v1_Final.pdf (accessed on 17 August 2025).
  144. Zhang, C.; Zhou, Y.; Zhang, M.; Wang, B.; Nie, Y. Review and prospect of floating car data research in transportation. J. Traffic Transp. Eng. (Engl. Ed.) 2025, 12, 752–771.
  145. Tang, R.; Kanamori, R.; Yamamoto, T. Improving Coverage Rate for Urban Link Travel Time Prediction Using Probe Data in the Low Penetration Rate Environment. Sensors 2020, 20, 265.
  146. Hellinga, B.; Izadpanah, P.; Takada, H.; Fu, L. Decomposing travel times measured by probe-based traffic monitoring systems to individual road segments. Transp. Res. Part C Emerg. Technol. 2008, 16, 768–782.
  147. Xu, J.; Tian, Z.; Wang, A.; Xie, G.; Valenzuela, L. Development and assessment of trajectory-based arterial through percent arrivals on red for arterial signal coordination performance evaluation. Int. J. Transp. Sci. Technol. 2025, 18, 131–147.
  148. Astarita, V.; Giofré, V.; Festa, D.; Guido, G.; Vitale, A. Floating Car Data Adaptive Traffic Signals: A Description of the First Real-Time Experiment with ‘Connected’ Vehicles. Electronics 2020, 9, 114.
  149. Dabbas, H.; Friedrich, B. Estimating traffic demand of different transportation modes using floating smartphone data. Transp. A Transp. Sci. 2024, 1–27, in press.
  150. Wang, F.; Xu, Y. Estimating O–D travel time matrix by Google Maps API: Implementation, advantages, and implications. Ann. GIS 2011, 17, 199–209.
  151. Muñoz-Villamizar, A.; Solano-Charris, E.L.; AzadDisfany, M.; Reyes-Rubiano, L. Study of urban-traffic congestion based on Google Maps API: The case of Boston. IFAC-Pap. 2021, 54, 211–216.
  152. Rodriguez, A.M.; Tiberius, C.; van Bree, R.; Geradts, Z. Google timeline accuracy assessment and error prediction. Forensic Sci. Res. 2018, 3, 240–255.
  153. Sandt, A.; McCombs, J.; Cornelison, E.; Al-Deek, H.; Carrick, G. Using Crowdsourced Data to Reduce Traffic Congestion by Improving Detection of and Response to Disabled or Abandoned Vehicles on Florida Limited-Access Facilities. Transp. Res. Rec. J. Transp. Res. Board 2023, 2677, 309–323.
Figure 1. PRISMA flow diagram.
Figure 2. Grouping rules for narrative synthesis.
Figure 3. Corridor micro-coordination scheme with dynamic reference–follower roles. Blue arrows show eastbound downstream flow (L1.4 → L2.2), and purple arrows show westbound downstream flow (L2.2 → L1.4). In both cases, the follower lies downstream of the reference.
Figure 4. Reference–follower coordination across three adjacent intersections on a downstream corridor.
Figure 5. Purdue Coordination Diagrams (PCD) illustrating a corridor with three signals before and after bounded offset adjustments. (a) Before (Reference–Follower coordination only): green window (approximately 38 s) with mild fluctuations; arrivals inside the band (blue) achieve AoG approximately 48%, while many near-start arrivals (gray) fall just outside the window. (b) After (Hybrid Rule-based and MARL coordination): green window (approximately 42 s) with stronger fluctuations; many near-start arrivals are “captured” into the band, raising AoG to approximately 62%.
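The arrivals-on-green (AoG) percentages quoted in the Figure 5 caption can be illustrated with a minimal Python sketch. It assumes each arrival timestamp is folded into the cycle with a modulo (the vertical axis of a PCD) and that the green window does not wrap past the cycle boundary. The names `GreenWindow` and `arrivals_on_green`, the 90 s cycle, and the arrival times are hypothetical values chosen for illustration; they are not data from the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GreenWindow:
    """Green interval inside one cycle, in seconds from cycle start."""
    start: float
    end: float


def arrivals_on_green(arrival_times, cycle_length, window):
    """Return the fraction of arrivals whose in-cycle time lies in the window.

    arrival_times are absolute timestamps in seconds. Assumes the window
    does not wrap around the end of the cycle (start < end <= cycle_length).
    """
    if not arrival_times:
        return 0.0
    on_green = 0
    for t in arrival_times:
        t_in_cycle = t % cycle_length  # position of the arrival within its cycle
        if window.start <= t_in_cycle < window.end:
            on_green += 1
    return on_green / len(arrival_times)


# Hypothetical detector log over three 90 s cycles (illustrative only).
ARRIVALS = [36, 128, 219, 45, 140, 240, 70, 170, 265, 10]
CYCLE_S = 90

before = arrivals_on_green(ARRIVALS, CYCLE_S, GreenWindow(40, 78))
after = arrivals_on_green(ARRIVALS, CYCLE_S, GreenWindow(36, 78))
print(before, after)  # prints 0.4 0.7
```

In this toy log, three vehicles arrive just before the original window opens, so starting the window 4 s earlier captures them and raises AoG. This is the same "near-start arrivals captured into the band" effect the caption describes, with invented numbers.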
Figure 6. Inter-agent communication architecture.
Figure 7. Three-agent reference–follower coordination with a middle-node coordinator and TMC interface.
Figure 8. Three-agent corridor with a middle-node coordinator and TMC/CTSS interface, augmented by floating data from Google-served user devices.
Table 1. Search strategy and screening outcomes (summary from Supplementary Tables S1 and S4).
| Database | Period | Filters | Records Retrieved | After Duplicates | Included After Screening |
|---|---|---|---|---|---|
| Scopus | 2000–2025 | English | 128 | 127 | 7 |
| Web of Science | 2000–2025 | English | 94 | 92 | 5 |
| IEEE Xplore | 2000–2025 | Conf. & Journal | 52 | 51 | 4 |
| Google Scholar | 2000–2025 | English, 2000–2025; first 200 results screened | 200 | 200 | 2 |
| Total | | | 474 | 470 | 18 |
Table 2. Protocol adjustments and decision rules (summary from Supplementary Tables S2 and S3).
| Aspect | Final Implementation | Rationale |
|---|---|---|
| Search window | Extended to Aug 2025 | Capture 2025 publications |
| Databases | +Google Scholar | Broader coverage |
| Quality assessment | Operational relevance (not ROBIS) | Fit for engineering practice |
| Synthesis | Narrative (SWiM) | Heterogeneity in metrics |
| Edge case: freeway studies | Exclude | Out of scope |
| Edge case: hybrid w/o metrics | Exclude | Cannot assess outcomes |
| Edge case: policy/agency docs | Include, marked separately | Practical relevance |
Table 3. Comparative examples of signal-control families in practice from other countries.
| Family | Examples (Systems) | Timing Update Method | Typical Context | Representative Cities |
|---|---|---|---|---|
| Fixed-time (pre-timed/coordinated) | Green-wave/TOD | Scheduled cycle & offset | Predictable corridors | Kharkiv (Ukraine) [19], Gaza/Palestine [20] |
| TRPS (traffic-responsive plan selection) | TRPS (traffic-responsive plan selection) | Select among pre-plans via detectors | Limited sensing; intermittent comms | Northern Virginia (USA) [21] |
| Actuated (semi/fully) | Local detector actuations | Phase extensions/recalls | Isolated or mixed-demand | Houston (USA) [22]; Anaheim (USA) [23] |
| Adaptive/UTC (real-time) | SCOOT/SCATS/ACS-Lite | Continuous cycle–split–offset | Networked corridors | London (UK, SCOOT) [24]; Las Vegas (USA, SCATS) [25]; Anaheim (USA, ACS-Lite) [23] |
| Centralized ATMS/UTC | City traffic control centers | Center-led supervision & retiming | Large urban areas | Anaheim (USA, ATCS FOT) [26] |
Table 4. Allowed vs. Disallowed under the authoritative fixed-time plan.
| Control Variable | Allowed Under Fixed Plan | Notes |
|---|---|---|
| Cycle length (full cycle time) | No | Fixed by the authoritative timing plan |
| Phase green splits (durations) | No | Pre-set for each fixed-time plan |
| Phase sequence/order | No | Remains as in the pre-designed plan |
| Yellow and all-red intervals | No | Safety-critical timings are immutable |
| Phase start offsets | Yes | MARL agents may fine-tune offsets (cycle alignment) |
| Minor green adjustments (<few s) | No | Only offsets are adjusted, not phase durations |
| Local vehicle detection triggers | No | A fixed-time plan does not use real-time calls |
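The allow-list in Table 4 can be read as a rule shield around the learning agent: only the offset may change, and only by a bounded step. The sketch below shows one way to express that in Python, under stated assumptions. `TimingPlan`, `shield_offset`, and the 3 s step limit are hypothetical; the source says only that offset changes are limited to a few seconds per cycle and does not prescribe this code.

```python
from dataclasses import dataclass

# Assumed bound for illustration; the source only says "few seconds".
MAX_OFFSET_STEP_S = 3.0


@dataclass(frozen=True)
class TimingPlan:
    """Authoritative fixed-time plan; frozen so fields cannot be mutated."""
    cycle_s: float
    splits_s: tuple
    offset_s: float


def shield_offset(plan, proposed_offset_s, max_step_s=MAX_OFFSET_STEP_S):
    """Return a new plan whose offset moves toward the proposal by a bounded step.

    Cycle length and splits are copied unchanged, so the agent cannot alter them.
    """
    delta = proposed_offset_s - plan.offset_s
    # Use the shortest signed distance around the cycle (e.g. 88 s -> 2 s is +4 s).
    half = plan.cycle_s / 2
    delta = (delta + half) % plan.cycle_s - half
    # Clamp the step to the allowed magnitude.
    delta = max(-max_step_s, min(max_step_s, delta))
    new_offset = (plan.offset_s + delta) % plan.cycle_s
    return TimingPlan(plan.cycle_s, plan.splits_s, new_offset)


plan = TimingPlan(cycle_s=90.0, splits_s=(40.0, 30.0, 20.0), offset_s=10.0)
print(shield_offset(plan, 20.0).offset_s)  # prints 13.0 (request of +10 s clamped to +3 s)
```

Because the function returns a fresh plan with the original cycle and splits, any proposal touching the "No" rows of Table 4 has no path into the controller. A field deployment would also need to log each clamped request for audit, which this sketch omits.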
Table 5. Deployment Status of 18 Hybrid RL/Traffic-Signal Studies.
| Study | Year | Method/Lever | Setting | Deployment Status |
|---|---|---|---|---|
| Wei et al.: CoLight [6] | 2019 | MARL: Graph Attention | SUMO, 4–100 intersections | Simulation-only |
| Bouktif et al.: P-DQN [66] | 2021 | Hybrid action-space DRL | SUMO single corridor | Simulation-only |
| Huang et al.: ModelLight [74] | 2021 | Meta-RL | SUMO | Simulation-only |
| Fang & Sadeh (Survey) [14] | 2023 | Review of RL TSC | Literature | Simulation-only evidence |
| Li et al.: Offline RL (D2TSC) [68] | 2023 | Offline RL | SUMO | Simulation-only |
| Bi et al.: Fuzzy + DRL [75] | 2024 | Type-2 Fuzzy + DRL | SUMO, single junction | Simulation-only |
| Jia & Ji [76] | 2025 | Multi-Agent DRL | SUMO, large networks | Simulation-only |
| Zhang et al.: Masked MARL [77] | 2025 | Action-masked MARL | SUMO, 16 intersections | Simulation-only |
| Dhulkefl et al. [78] | 2025 | Hybrid intelligent TSC | Matlab/Simulink | Simulation-only |
| K. Othman et al. [72] | 2025 | Decentralized multi-agent RL | SUMO | Simulation-only |
| Michailidis et al. (Survey) [2] | 2025 | Review of RL TSC | Literature | Simulation-only evidence |
| Saadi et al. (Survey) [70] | 2025 | Review of DRL TSC | Literature | Simulation-only evidence |
| Li et al.: Federated DRL [71] | 2025 | Federated DRL (PPO) | SUMO grid | Simulation-only |
| Zheng et al. [10] | 2025 | Pri-DDQN hybrid agent | SUMO | Simulation-only |
| Satheesh & Powell [73] | 2025 | Constrained Multi-Agent RL (MAPPO-LCE) | CityFlow | Simulation-only evidence |
Table 6. Evidence Map of Included Studies.
| Study | Setting | Control Lever(s) (Offset/Split/Cycle/Phase/Hybrid) | Safeguard Type (Mask/Shield/Bounds/Other) | Metrics (AoG, PCD, Delay, Travel Time, Queue, etc.) | Beats Baselines? |
|---|---|---|---|---|---|
| Wei et al.: CoLight | Large-network simulation (CityFlow; Hangzhou/Jinan/NY datasets) | Phase selection with network-level cooperation via graph attention | Not reported (no explicit mask/shield/bounds) | Avg travel time, throughput (large networks) | Yes vs. SOTA (e.g., PressLight, DQN). [6] |
| Bouktif et al.: P-DQN | Simulation (SUMO) | Hybrid: discrete phase + continuous duration (P-DQN) | Not reported | Avg queue, avg travel time, waiting time | Yes (e.g., −22.2% queue; −5.78% travel time vs. baselines). [66] |
| Huang et al.: ModelLight | Simulation with real-data-driven testbeds (model-based meta-RL) | Phase/split optimization (model-based & meta-learning) | Not reported | Travel time, delay | Yes vs. SOTA, with far fewer interactions. [74] |
| Fang & Sadeh (Survey): “The Real Deal” (with R. Chen) | Survey/review (no experiments) | Synthesis of evaluation metrics and deployment barriers (review) | | | |
| Li et al.: Offline RL (D2TSC) | Offline RL from historical field data → simulator matched to field | Phase/split (offline policy learning) | Offline-dataset constraint (implicit safety) | Travel time vs. actuated & offline RL baselines | Yes, with better real-world applicability claims. [68] |
| Bi et al.: Type-2 Fuzzy + DRL | Simulation (single intersection) | Phase control (DQN with Type-2 fuzzy output) | Fuzzy rules act as bounds on actions | Reward, delay | Yes vs. DQN variants. [75] |
| Jia & Ji | Simulation (large-scale) | Phase/split with spatio-temporal attention (GAT + RNN) | Not reported | Avg travel time, delay | Yes vs. fixed-time/DRL baselines. [76] |
| Zhang et al.: Masked MARL | Corridor-level multi-intersection simulation | Phase (multi-agent masked SAC for corridor control) | Action masking (invalid actions masked) | Delay/efficiency (corridor) | Yes vs. strong MARL baselines (corridor-level). [77] |
| Dhulkefl et al. | Simulation (SUMO) | Hybrid KNN (state classifier) + DQN (phase policy) | Not reported | Delay/queue (paper focuses on responsiveness) | Yes vs. single-model baselines (reported). [78] |
| K. Othman et al. | Multimodal (transit + traffic) simulation; decentralized multi-agent | Phase/priority tuned to person-delay objective | Not reported | Person-delay (transit + general traffic) | Yes (reduces total person-delay vs. pre-timed/TSP). [72] |
| Michailidis et al. (Survey) | Survey (Infrastructures, MDPI) | Review of methods, metrics, baselines, and applicability (review) | | | |
| Saadi et al. (Survey) | Survey (Journal of Big Data) | Coordination in ITLC; compiles metrics/baselines (review) | | | |
| Li et al.: Federated DRL | Federated across domains (real datasets) | Phase/split via federated PPO | Privacy/aggregation guardrails (federation) | Delay, queue, throughput | Yes vs. local & centralized baselines. [71] |
| Zheng et al.: Pri-DDQN (hybrid agent) | Simulation (single intersection) | Phase & cycle (hybrid agent with prioritized DDQN) | Not reported | Waiting time, queue | Yes vs. DQN variants. [10] |
| Satheesh & Powell: MAPPO-LCE | Simulation on three real-world datasets (CityFlow) | Phase/split (multi-agent PPO) under constraints | Constrained RL with Lagrange Cost Estimator; explicit GreenTime/GreenSkip/PhaseSkip constraints | Delay plus fairness/safety proxies | Yes (+12.6% vs. MAPPO; +10.3% vs. IPPO; +13.1% vs. QTRAN). [73] |
Table 7. Effect-direction summary by metric and sensitivity subset.
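| Metric | Simulation-only (n = 15): Improve | Simulation-only: Mixed | Simulation-only: Deteriorate | Pilot/Field (n = 3): Improve | Pilot/Field: Mixed | Pilot/Field: Deteriorate |
|---|---|---|---|---|---|---|
| Arrivals on Green (AoG) | 8 | 1 | 1 | 2 | 0 | 0 |
| Purdue Coordination Diagram (PCD) | 6 | 1 | 1 | 1 | 0 | 0 |
| Delay | 9 | 2 | 2 | 2 | 1 | 0 |
| Travel time | 7 | 1 | 1 | 1 | 0 | 0 |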
Notes: counts are per-study, per-metric effect directions (studies may contribute to multiple metrics). Patterns are consistent with the overall SWiM tally (13 improvements, 3 mixed/neutral, 2 deteriorations), with pilots showing improvements but generally smaller magnitudes than simulation-only studies.
Table 8. Comparative safeguards in hybrid rule-based + RL traffic-signal control (all 18 studies).
| Study | Rule Shield/Plan Authority | Action Masking | Bounded Variables | Prerequisites for Action |
|---|---|---|---|---|
| Wei et al.: CoLight | No explicit plan shield; full phase selection | No | None (phase choice free) | None reported |
| Bouktif et al.: P-DQN | Not reported (hybrid phase + duration) | No | None; continuous duration allowed | Simulation only |
| Huang et al.: ModelLight | Not explicit; model-based policy | No | Indirect bounds via model | Data-driven testbed |
| Fang & Sadeh (Survey) | (review only) | | | |
| Li et al.: Offline RL (D2TSC) | Implicit via an offline dataset | No | Learned policy constrained by logged data | Offline dataset (field logs) |
| Bi et al.: Fuzzy + DRL | Fuzzy rules constrain outputs | No | Bounds via fuzzy membership | Simulation (single junction) |
| Jia & Ji | Not reported | No | None | Simulation (large networks) |
| Zhang et al.: Masked MARL | Phase order preserved | Yes | Masked invalid actions | Simulation corridor |
| Dhulkefl et al. | The hybrid classifier constrains the state | No | None explicit | Simulation |
| K. Othman et al.: eMARLIN-T-MM | Priority rules encoded | No | Person-delay objective guides bounds | Simulation multimodal |
| Michailidis et al. (Survey) | (review only) | | | |
| Saadi et al. (Survey) | (review only) | | | |
| Li et al.: Federated DRL | Aggregation/federation constraints | No | Indirect bounds via federated PPO | Privacy-preserving prerequisites |
| Zheng et al.: Pri-DDQN | Not reported | No | None explicit | Simulation |
| Satheesh & Powell: MAPPO-LCE | Rule shield: min-green, phase skip penalties | Yes | Split/offset bounded via LCE | Constrained RL estimator |
| Indonesian prototype (microcontroller) | Fixed phase/split; offset only | No | Δoffset ≤ few seconds per cycle | AoG/PCD audit required |
| ACS-Lite inspired pilots (FHWA) | Plan authority fixed; only offsets adjusted | No | Offset trims only | Field ATSPM evidence |
| SCATS (Bandung/Jakarta evals) | Cycle/split authority retained | No | Offset/split bounded by plan | National regulation, MKJI audits |
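The "Action Masking" column of Table 8 refers to removing invalid actions before the agent chooses. A minimal Python sketch of the idea follows, assuming a two-action space (hold the current phase, or advance to the next phase in a fixed order) and minimum/maximum green limits. The names `action_mask` and `masked_argmax`, the action encoding, and the numeric limits are hypothetical; they are not taken from any of the listed studies.

```python
import math

# Hypothetical action encoding: phase order is fixed, so the agent only
# decides whether to keep the current phase or move to the next one.
HOLD, ADVANCE = 0, 1


def action_mask(time_in_phase_s, min_green_s, max_green_s):
    """Return [hold_allowed, advance_allowed] for the current phase."""
    hold_ok = time_in_phase_s < max_green_s      # must leave at max green
    advance_ok = time_in_phase_s >= min_green_s  # may not leave before min green
    return [hold_ok, advance_ok]


def masked_argmax(scores, mask):
    """Pick the highest-scoring action among those the mask allows."""
    best, best_score = None, -math.inf
    for action, (score, allowed) in enumerate(zip(scores, mask)):
        if allowed and score > best_score:
            best, best_score = action, score
    if best is None:
        raise ValueError("mask rejected every action")
    return best


# The policy prefers ADVANCE, but only 4 s of a 7 s minimum green has elapsed,
# so the mask forces HOLD.
choice = masked_argmax([0.1, 0.9], action_mask(4.0, min_green_s=7.0, max_green_s=40.0))
print(choice == HOLD)  # prints True
```

With `min_green_s <= max_green_s`, at least one action is always allowed, so the error branch is only a guard against misconfiguration. Masking differs from the bounded-variable safeguards in the same table: it removes options outright instead of clamping a continuous adjustment.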
Table 9. Minimal multi-sensor overlays for camera-first detection under Indonesian constraints.
| Objective (What to Detect) | Primary Sensor | Augmenting Sensor(s) | Edge Algorithm on Raspberry Pi (Notes) |
|---|---|---|---|
| Stop-bar presence & lane-by-lane counts (daytime) | Visible-light video detector (fixed camera) for real-time traffic flow monitoring [114]. | Inductive loops [115]. | Lightweight vision and tracking on Raspberry Pi for real-time vehicle detection/counting; tracking for ID persistence [116,117]; YOLO Tiny or EfficientDet Lite for PTW detection [97,118]. |
| Approach speed & trajectories for AoG/PCD (arrivals profiling) | Vision for per-vehicle speed/trajectory estimation [119]. | mmWave radar to stabilize speed/range under rain/dark [120]. | Kalman and Hungarian data association on Raspberry Pi-class TPUs [121,122]. |
| Night/glare-robust presence & classification | Thermal infrared camera [123,124]. | RGB camera for context and class labels; RGB-thermal fusion improves detection when either alone is weak [125]. | Lightweight thermal/visible on edge devices [126]. |
| Queue length/platoon length & multi-user tracking | LiDAR for precise 3D trajectories [127,128]. | Video for classification context; fusion with video assists ID and class labels [129]. | Per-frame detection and multi-object tracking (e.g., ByteTrack/SORT) on an edge host; LiDAR processing for queue estimation [116,130]. |
| Corridor travel time & timing updates without dense detectors | Probe/fleet data to estimate signal control parameters (cycle/split/offset) with low penetration [131]. | Optional stop-bar video or loops to validate/bias estimates where installed [115]. | Pi in cabinet aggregates probe APIs and local sensors, computes simple metrics, and feeds the controller [132,133]. |
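The data-association step named in Table 9 (matching predicted track positions to new detections) can be sketched in plain Python. This is not the Hungarian algorithm itself: it is an exhaustive minimum-cost search that reaches the same optimum and is only practical for a handful of objects. It assumes 1D positions along the approach (metres to the stop bar), takes the predicted positions as given instead of running a Kalman filter, and uses a hypothetical 5 m gate. On real hardware, `scipy.optimize.linear_sum_assignment` is one existing option for the assignment step.

```python
from itertools import permutations


def associate(predicted, detected, gate_m=5.0):
    """Match tracks to detections by minimum total distance.

    predicted: predicted track positions (m), e.g. from a motion model.
    detected:  measured positions (m) in the current frame.
    Returns {track_index: detection_index} for pairs within the gate.
    Exhaustive search: cost grows factorially, so keep the lists short.
    """
    # Each track takes a distinct detection or a None slot (left unmatched).
    slots = list(range(len(detected))) + [None] * len(predicted)
    best_cost, best_pairs = float("inf"), {}
    for perm in permutations(slots, len(predicted)):
        cost, pairs = 0.0, {}
        for track, det in enumerate(perm):
            if det is None or abs(predicted[track] - detected[det]) > gate_m:
                cost += gate_m  # unmatched track pays the gate distance
            else:
                cost += abs(predicted[track] - detected[det])
                pairs[track] = det
        if cost < best_cost:
            best_cost, best_pairs = cost, pairs
    return best_pairs


# Hypothetical frame: track 2 has no detection within 5 m, so it stays unmatched.
print(associate([10.0, 30.0, 55.0], [31.0, 9.0, 80.0]))  # prints {0: 1, 1: 0}
```

Unmatched tracks would normally coast on their prediction for a few frames before being dropped, and unmatched detections would start new tracks. Both policies are left out here to keep the sketch short.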
Kurniawan, F.; Agustian, H.; Dermawan, D.; Nurdin, R.; Ahmadi, N.; Dinaryanto, O. Hybrid Rule-Based and Reinforcement Learning for Urban Signal Control in Developing Cities: A Systematic Literature Review and Practice Recommendations for Indonesia. Appl. Sci. 2025, 15, 10761. https://doi.org/10.3390/app151910761
