Hybrid Rule-Based and Reinforcement Learning for Urban Signal Control in Developing Cities: A Systematic Literature Review and Practice Recommendations for Indonesia
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This is a well-structured systematic literature review that addresses an important gap in traffic signal control research for developing countries, particularly Indonesia. The paper demonstrates methodological rigor following PRISMA 2020 guidelines and provides practical recommendations for implementation. However, several areas require improvement to strengthen the contribution.
- Expand Literature Base: Consider a broader geographic scope or longer time window to capture more relevant studies.
- Strengthen Evidence-Recommendation Link: Create explicit mapping between systematic review findings and implementation recommendations.
- Add Economic Analysis: Include discussion of cost-effectiveness and financing mechanisms for developing country deployments.
- Validation Framework: Propose metrics and methodologies for evaluating the proposed hybrid approaches in real-world settings.
- Risk Assessment: Provide more comprehensive analysis of implementation risks and mitigation strategies.
Author Response
Comments 1: Expand Literature Base: Consider a broader geographic scope or longer time window to capture more relevant studies.
Response 1: Thank you for pointing this out. In the revised manuscript, we clarified that the review applied a broad time window (2000–2025) and covered multiple regions (Asia, Africa, Europe, and North America). This is now explicitly stated in the Search Strategy section (lines 114–116): “The search window spanned 2000–2025 and covered studies across multiple regions, including Asia, Africa, Europe, and North America, to ensure a broad geographic scope.”
Comments 2: Strengthen Evidence-Recommendation Link: Create explicit mapping between systematic review findings and implementation recommendations.
Response 2: Thank you for this valuable suggestion. We revised Section 6 (Roadmap) to explicitly map systematic review findings to recommendations. The text now states that the staged roadmap directly reflects the synthesis tables and grouped evidence. This appears around lines 815–817: “The staged roadmap presented in this section directly reflects the systematic review findings, ensuring that each recommendation is explicitly grounded in the synthesized evidence.”
Comments 3: Add Economic Analysis: Include discussion of cost-effectiveness and financing mechanisms for developing country deployments.
Response 3: We appreciate this important point. The revised version now includes a discussion of cost-effectiveness and financing strategies for developing-country deployments. This is integrated in Section 6 (Discussion/Policy), lines 877–883. The text highlights that hybrid retrofits are less capital-intensive, can leverage existing infrastructure, and may be financed via phased budgets or Public–Private Partnership (PPP) schemes. “Hybrid retrofits that reuse existing cabinet controllers and adopt camera-first sensing are substantially less capital-intensive than full adaptive replacements. Cost-effectiveness arises from leveraging existing infrastructure, deploying low-cost edge devices, and restricting online adaptation to offset tuning rather than full phase reoptimization. Financing mechanisms may include incremental upgrades through municipal budget cycles, integration with broader ITS modernization programs, and selective public–private partnerships.”
Comments 4: Validation Framework: Propose metrics and methodologies for evaluating the proposed hybrid approaches in real-world settings.
Response 4: Thank you for pointing this out. We added a clear validation framework in Future Work (lines 945–948). It proposes measurable indicators such as AoG, PCD before–after comparisons, delay distributions, and travel-time reliability, with methodology based on high-resolution logging and standardized data formats. “We propose metrics that can be logged from existing ATSPM systems, including arrivals-on-green percentages, Purdue Coordination Diagram before–after comparisons, delay distributions, and travel-time reliability.”
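To make the proposed validation metrics concrete, the following is a minimal sketch of how arrivals-on-green (AoG) could be computed from high-resolution signal logs. The `GreenInterval` class, the `arrivals_on_green` helper, and the sample timestamps are hypothetical illustrations, not code or data from the reviewed studies or any specific ATSPM system.

```python
from dataclasses import dataclass

@dataclass
class GreenInterval:
    start: float  # seconds since start of logging
    end: float

def arrivals_on_green(arrival_times: list[float],
                      greens: list[GreenInterval]) -> float:
    """Percentage of vehicle arrivals that fall inside any green interval."""
    if not arrival_times:
        return 0.0
    on_green = sum(
        1 for t in arrival_times
        if any(g.start <= t < g.end for g in greens)
    )
    return 100.0 * on_green / len(arrival_times)

# Hypothetical high-resolution log excerpt for one approach
greens = [GreenInterval(10.0, 40.0), GreenInterval(70.0, 100.0)]
arrivals = [5.0, 12.0, 35.0, 55.0, 80.0]
print(f"AoG: {arrivals_on_green(arrivals, greens):.1f}%")  # 3 of 5 -> 60.0%
```

Logging the same quantity before and after a change yields the before–after comparison the response describes; the PCD is essentially the underlying scatter of arrival time within the cycle against time of day from which this percentage is derived.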
Comments 5: Risk Assessment: Provide more comprehensive analysis of implementation risks and mitigation strategies.
Response 5: Thank you for highlighting this gap. The revised manuscript now contains a dedicated Implementation Risks paragraph (around lines 934–940). It discusses technical risks (sensor failures, communication outages), institutional risks (staffing, mandates), and operational risks (misapplied safeguards), along with mitigation strategies such as redundant communications, audit-first acceptance, and staff training. “Technical risks include unreliable sensors, backhaul outages, or synchronization errors that could disrupt corridor operations. Institutional risks arise from staffing shortages, fragmented mandates, and resistance to procedural change within traffic agencies. Operational risks exist if safeguards are misapplied, potentially creating unsafe phase transitions. Mitigation strategies include redundant communications, audit-first acceptance policies, and staff training.”
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
Dear Authors,
Below you will find a review of this work, structured by sections:
Abstract
Please add a couple of sentences that clearly indicate the effect and direction of the summary (for example, “A, B, C, ... studies reported improved AoG or reduced delay vs. baseline”) to make the abstract’s key outcome more explicit.
Introduction
Please improve the novelty paragraph by introducing a comparison with prior RL-only surveys and stating exactly what this SLR adds (for example, operational safeguards, auditability, standards, and Indonesian implementation constraints).
Methods
Risk-of-bias or quality appraisal: ROBIS was replaced with an “operational relevance” checklist. Provide justification and a mapping to standard SLR quality domains; otherwise, the authors should add a supplemental risk-of-bias instrument.
PRISMA counts indicate Databases (n=1) vs. arXiv Registers (n=34) before de-duplication; reconcile this with the stated multi-database strategy and discuss potential bias from heavy reliance on preprints.
SWiM is appropriate in your paper, but please add a compact effect-direction table (improve/mixed/no-change) for the primary metrics with the sensitivity subsets you already defined.
Results/Evidence Synthesis
The authors should quantify the direction and prevalence of improvements (for example, count studies reporting AoG and analyze whether it increased or decreased) using the SWiM effect-direction approach and the predefined sensitivity subsets.
Create a small comparative table summarizing the hybrid patterns (rule shields, action masking, bounded variables, prerequisites) across included studies.
Conclusions
Explicitly list limitations (sensor reliability, backhaul dependence, institutional capacity).
Add a paragraph on future work (related to pilot evaluation designs, data-sharing standards).
Author Response
Comments 1: Abstract. Please add a couple of sentences that clearly indicate the effect and direction of the summary (for example, “A, B, C, ... studies reported improved AoG or reduced delay vs. baseline”) to make the abstract’s key outcome more explicit.
Response 1: Thank you for pointing this out. We revised the abstract to make the direction and effect of outcomes explicit. “Across the 18 studies, the majority reported improvements in arrivals on green, delay, travel time, or related coordination metrics compared to fixed-time or actuated baselines, while only a few showed neutral or mixed effects and very few indicated deterioration.”
Comments 2: Introduction. Please improve the novelty paragraph by introducing a comparison with prior RL-only surveys and stating exactly what this SLR adds.
Response 2: We appreciate this helpful suggestion. The novelty paragraph has been expanded by contrasting this review with RL-only surveys and clarifying the unique contributions. “To our knowledge, comprehensive reviews focusing specifically on such hybrid rule-based and MARL signal control, especially on their fit for Indonesia’s heterogeneous traffic, remain limited.”
Comments 3: Methods. Risk-of-bias or quality appraisal: ROBIS was replaced with an “operational relevance” checklist. Provide justification and a mapping to standard SLR quality domains; otherwise, the authors should add a supplemental risk-of-bias instrument.
Response 3: Thank you for raising this important point. We justified replacing ROBIS with an operational relevance checklist and mapped items to standard quality domains. “Formal tools such as ROBIS or AMSTAR-2 were considered but deemed less appropriate for engineering and transportation systems research, since these tools were developed with clinical, interventional studies in mind [18][19]. Instead, we applied an operational relevance checklist tailored to traffic-signal control studies, focusing on sensing environment, safeguard presence, simulation versus field context, and reporting of ATSPM-related metrics. Each checklist item can be mapped to standard systematic review quality domains: applicability/external validity (sparse vs dense sensing), study design adequacy (presence of safeguards), indirectness (simulation vs pilot), and reporting transparency (AoG, PCD, delay, or travel time metrics).”
Comments 4: PRISMA counts indicate Databases (n=1) vs. arXiv Registers (n=34) before de-duplication; reconcile this with the stated multi-database strategy and discuss potential bias from heavy reliance on preprints.
Response 4: We acknowledge this observation. The discrepancy in PRISMA counts has been clarified by explaining that the PRISMA template groups multiple bibliographic databases under “Databases” (n=1) and lists arXiv separately under “Registers” (n=34). We also acknowledged the potential bias from including preprints. This is now explicitly stated in the Methods – Search Strategy, lines 107–111: “To align with the PRISMA 2020 template, records were grouped under “Databases” and “Registers.” Although multiple databases were searched, only one category is shown in the PRISMA diagram under Databases (n=1), while arXiv preprints were classified under Registers (n=34). This reflects PRISMA’s reporting structure rather than a true reliance on a single database.”
“Fourth, search strategy limitations may introduce bias: despite broad coverage of six major databases, non-English studies may have been missed, and the inclusion of preprints (arXiv) may introduce uncertainty.”
Comments 5: SWiM is appropriate in your paper, but please add a compact effect-direction table (improve/mixed/no-change) for the primary metrics with the sensitivity subsets you already defined.
Response 5: We appreciate the reviewer’s suggestion. A compact effect-direction summary table has been added using the SWiM framework. “To enhance transparency, we also quantified the direction of reported effects using the SWiM effect-direction approach. Across the 18 studies, 13 reported improvements, 3 were neutral or mixed, and 2 showed deterioration. Improvements were most consistent in offset-only or bounded-variable designs, while deterioration was observed mainly when action spaces were unconstrained and safeguards were absent. Effect-direction summary. Following SWiM, we summarized the direction of effects for the primary metrics and split results by sensitivity subsets (simulation-only vs pilot/field). Table 7 provides a compact view. Table 7. Effect-direction summary by metric and sensitivity subset…”
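As a small illustration of the SWiM effect-direction bookkeeping described in this response, the sketch below tallies directions overall and by sensitivity subset. The study records shown are placeholders for format only, not the actual 18 included studies.

```python
from collections import Counter

# Placeholder per-study records: (study_id, sensitivity_subset, direction)
# direction is "improve", "mixed", or "worsen"; subset is "simulation" or "pilot"
records = [
    ("S01", "simulation", "improve"),
    ("S02", "pilot", "improve"),
    ("S03", "simulation", "mixed"),
    ("S04", "simulation", "worsen"),
]

def effect_direction_summary(records):
    """Tally effect directions overall and within each sensitivity subset."""
    overall = Counter(direction for _, _, direction in records)
    by_subset = {}
    for _, subset, direction in records:
        by_subset.setdefault(subset, Counter())[direction] += 1
    return overall, by_subset

overall, by_subset = effect_direction_summary(records)
print(overall)    # Counter({'improve': 2, 'mixed': 1, 'worsen': 1})
print(by_subset)  # {'simulation': Counter({...}), 'pilot': Counter({'improve': 1})}
```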
Comments 6: Results/Evidence Synthesis. The authors should quantify the direction and prevalence of improvements (for example, count studies reporting AoG and analyze whether it increased or decreased) using the SWiM effect-direction approach and the predefined sensitivity subsets, and create a small comparative table summarizing the hybrid patterns (rule shields, action masking, bounded variables, prerequisites) across included studies.
Response 6: Thank you for highlighting this. The revised Results section now quantifies the prevalence of improvements and includes a comparative table of hybrid safeguard patterns. “Across the 18 studies, 13 reported improvements, 3 were neutral or mixed, and 2 showed deterioration … Table 7 summarizes how plan authority, action masking, bounded variables, and prerequisites were applied.”
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
The paper is a systematic review of hybrid traffic signal control that combines rule-based logic with reinforcement learning, with practice recommendations tailored to Indonesia’s context of mixed traffic, sensing limits, and governance needs; its contribution is mapping architectures and safeguards across 18 studies and translating them into a practical deployment checklist and roadmap for corridors in developing cities.
The authors should clearly state the paper’s exact scope and keep it consistent, because parts read like both a systematic review and a design proposal; please say plainly if this is “SLR only” or “SLR plus practice framework,” and signpost where evidence ends and where recommendations begin so readers do not get confused.
The authors should strengthen PRISMA transparency by putting the full flow diagram and checklist in the main text, giving the exact database search strings and dates, and using one consistent search window across the whole paper, so others can repeat the review without guesswork.
The authors should make inclusion and exclusion rules more concrete with simple thresholds, for example define what “sparse/camera‑first sensing” means, what “explicit safeguards” include, and when dense-detector studies are still in scope, so screening is clear and repeatable.
The authors should add a short “evidence map” table in the main paper that lists the 18 studies with setting, control lever (offset/split/cycle), safeguard type (mask/shield/bounds), metrics (AoG, PCD, delay, travel time), and whether results beat baselines, so readers see the big picture at a glance.
I suggest to the authors to separate literature findings from the proposed architecture more gently by labeling the “reference–follower, offset‑only MARL” as the authors’ recommendation, and only linking claimed benefits to specific included studies or stating clearly when it reflects expert opinion.
I suggest to the authors to be careful with real‑world claims about RL by stating which of the 18 studies are simulation‑only and which are pilots, and avoid broad statements about field operations unless each point is tied to a cited study in the evidence map.
The authors should add a simple, honest limitations paragraph in the Conclusion noting that the synthesis is narrative (no meta‑analysis), Indonesia‑specific evidence is still thin, metrics vary across studies, and search choices may introduce bias, and briefly say how this affects confidence in the checklist.
I suggest to the authors to streamline writing and reduce repetition: say the “audit‑first” and AoG/PCD points once, fix minor grammar and placeholders, and shorten long sentences in the Introduction and Methods so the guidance feels clear and friendly to city engineers.
I suggest to the authors to include one small corridor example (2–3 signals) that walks through before/after AoG/PCD and shows the exact bounded offset tweaks over a few cycles, turning the checklist into something practical people can picture using on the ground next week.
The authors should move a few key supplemental items (core parts of Tables S1–S6, screening logs, and synthesis grouping rules) into brief main text tables or figures, because readers need these to understand how the review was done and how the conclusions were formed.
Author Response
Comments 1: The authors should clearly state the paper’s exact scope and keep it consistent, because parts read like both a systematic review and a design proposal; please say plainly if this is “SLR only” or “SLR plus practice framework,” and signpost where evidence ends and where recommendations begin so readers do not get confused.
Response 1: We thank the reviewer for this suggestion. We have explicitly clarified that our paper is both a systematic literature review and a practice framework. For example, the Abstract now states (lines 34–36): “This paper is not only a systematic review but also develops a practice-oriented framework tailored to Indonesian corridors, ensuring that evidence synthesis and practical recommendations are clearly distinguished.” This statement (also reflected at the end of the Introduction) makes the dual scope clear to readers.
Comments 2: The authors should strengthen PRISMA transparency by putting the full flow diagram and checklist in the main text, giving the exact database search strings and dates, and using one consistent search window across the whole paper, so others can repeat the review without guesswork.
Response 2: We appreciate the suggestion to improve reproducibility. The revised Methods (Section 2) now explicitly note in lines 146–151: “Screening followed the PRISMA 2020 flow. After removing duplicates, titles and abstracts were screened against the eligibility rules, followed by full-text review. Records of inclusion and exclusion at each stage are documented in Supplementary Tables S2–S3, with reasons for exclusion logged. The overall process is summarized in the PRISMA 2020 flow diagram, as shown in Figure 1. In total, 18 studies met the criteria and were retained for synthesis.” We also now explicitly state in Section 2.1 (Search Strategy, lines 128–130) the exact Scopus query executed on 16 August 2025. This addition provides an exact search string with date, strengthening PRISMA transparency and showing that a consistent 2000–2025 window was applied. The exact database search strings and run dates are fully documented in Supplementary Table S1, ensuring the review process can be replicated without ambiguity.
Comments 3: The authors should make inclusion and exclusion rules more concrete with simple thresholds, for example define what “sparse/camera‑first sensing” means, what “explicit safeguards” include, and when dense-detector studies are still in scope, so screening is clear and repeatable.
Response 3: Thank you for raising this important point. We rewrote Section 2.2 Eligibility Criteria to define thresholds and added a concise inclusion/exclusion table.
Comments 4: The authors should add a short “evidence map” table in the main paper that lists the 18 studies with setting, control lever (offset/split/cycle), safeguard type (mask/shield/bounds), metrics (AoG, PCD, delay, travel time), and whether results beat baselines, so readers see the big picture at a glance.
Response 4: Thank you for this helpful suggestion. We created a concise evidence map summarizing all 18 studies in the main text.
Comments 5: I suggest to the authors to separate literature findings from the proposed architecture more gently by labeling the “reference–follower, offset‑only MARL” as the authors’ recommendation, and only linking claimed benefits to specific included studies or stating clearly when it reflects expert opinion.
Response 5: Thank you for this observation. We revised Section 4.1.2 and Section 4.2 to insert explicit transition sentences marking where literature summaries end and recommendations begin.
Comments 6: I suggest to the authors to be careful with real‑world claims about RL by stating which of the 18 studies are simulation‑only and which are pilots, and avoid broad statements about field operations unless each point is tied to a cited study in the evidence map.
Response 6: Thank you for pointing this out. To avoid overstating field evidence, we classified all studies explicitly as simulation-only or pilot studies.
Comments 7: The authors should add a simple, honest limitations paragraph in the Conclusion noting that the synthesis is narrative (no meta‑analysis), Indonesia‑specific evidence is still thin, metrics vary across studies, and search choices may introduce bias, and briefly say how this affects confidence in the checklist.
Response 7: Thank you for this important comment. We added a clear Limitations paragraph at the end of the Conclusions, now explicitly stated in Section 7 (Limitations), lines 915–917. The revised wording confirms the narrative nature of the synthesis, notes the scarcity of Indonesia-specific evidence, highlights heterogeneity of metrics, and explains potential biases, thereby addressing the reviewer’s concerns.
Comments 8: I suggest to the authors to streamline writing and reduce repetition: say the “audit‑first” and AoG/PCD points once, fix minor grammar and placeholders, and shorten long sentences in the Introduction and Methods so the guidance feels clear and friendly to city engineers.
Response 8: The revised manuscript now presents the audit-first and AoG/PCD definitions once and references them later. For example, Section 4 (Hybrid Architecture) introduces these concepts at the outset (lines 1–5), and later sections use only the shorthand (AoG/PCD) without repeating full definitions. Long sentences in the Introduction and Methods have been broken up, and placeholders were corrected. These changes confirm the requested streamlining.
Comments 9: I suggest to the authors to include one small corridor example (2–3 signals) that walks through before/after AoG/PCD and shows the exact bounded offset tweaks over a few cycles, turning the checklist into something practical people can picture using on the ground next week.
Response 9: Thank you for the constructive suggestion. We now explicitly state in the Results (lines 375–392):
“As illustrated in Figure 5, a simplified Purdue Coordination Diagram (PCD) is used to show how bounded Δoffset adjustments affect corridor progression. In the before case… AoG ≈ 48%. In the after case… raising AoG to ≈ 62%.”
This corridor example makes the checklist more tangible and practical. It is reinforced by Figure 5, which contrasts before/after conditions for a three-signal corridor.
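To help readers picture the bounded Δoffset tweaks behind this corridor example, here is a minimal sketch of the clamping logic such a safeguard implies: each proposed per-cycle offset change is limited to a fixed step and wrapped into the cycle. The ±4 s step bound and 90 s cycle length are illustrative assumptions, not values taken from the manuscript.

```python
def apply_bounded_offset(current_offset: float,
                         proposed_delta: float,
                         cycle_length: float,
                         max_step: float = 4.0) -> float:
    """Clamp a proposed per-cycle offset change to +/-max_step seconds,
    then wrap the new offset into [0, cycle_length)."""
    delta = max(-max_step, min(max_step, proposed_delta))
    return (current_offset + delta) % cycle_length

# Illustrative walk-through over three cycles: an agent proposes
# aggressive shifts, but the safeguard caps each change at 4 s.
offset = 20.0
for proposed in (9.0, -2.5, 6.0):
    offset = apply_bounded_offset(offset, proposed, cycle_length=90.0)
    print(f"offset -> {offset:.1f} s")
# Prints 24.0 (clamped from +9), then 21.5, then 25.5 (clamped from +6)
```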
Comments 10: The authors should move a few key supplemental items (core parts of Tables S1–S6, screening logs, and synthesis grouping rules) into brief main text tables or figures, because readers need these to understand how the review was done and how the conclusions were formed.
Response 10: Thank you for pointing this out. We have moved condensed versions of key supplemental materials into the main text so that readers can immediately see how the review was conducted and how the conclusions were derived, while still keeping the detailed versions in Supplementary Tables S1–S6.
This is now explicitly stated as follows:
- Table 1 (Section 2.1, line 121) now summarizes the database search and screening outcomes (records retrieved, duplicates removed, and final studies included), complementing the PRISMA flow diagram.
- Table 2 (Section 2.3, line 158) presents the main protocol adjustments and decision rules, showing how deviations from the a priori protocol and edge cases were handled consistently.
- Table 8 (Section 4.3, line 527) provides a comparative safeguards summary across all 18 studies:
“In addition, we compared the safeguard strategies across studies. Table 8 summarizes how plan authority, action masking, bounded variables, and prerequisites were applied. This table condenses the hybrid safeguard logic into a concise comparative view and complements the broader evidence map.”
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The paper has improved in response to the previous comments.
Reviewer 2 Report
Comments and Suggestions for Authors
I would like to thank the authors for carefully considering my previous comments. The revised manuscript now includes protocol registration, improved quality appraisal, quantitative effect-direction synthesis, and explicit limitations. These changes have substantially strengthened the paper.
Comments on the Quality of English Language
Minor English improvements are required.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have addressed all comments from the initial submission.
The manuscript can be accepted in its current form.