Article
Peer-Review Record

Comparative Analysis of Some Methods and Algorithms for Traffic Optimization in Urban Environments Based on Maximum Flow and Deep Reinforcement Learning

Mathematics 2025, 13(14), 2296; https://doi.org/10.3390/math13142296
by Silvia Baeva 1, Nikolay Hinov 2,* and Plamen Nakov 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 22 May 2025 / Revised: 8 July 2025 / Accepted: 8 July 2025 / Published: 17 July 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This research presents an analytical survey of algorithms based on network flow assessment. The application domain of these algorithms is the optimization of traffic flows in urban environments, which targets the minimization of waiting time and maximization of the throughput of vehicles.

The content of the paper assesses seven algorithms applied to a four-node transport network. The assessment is performed for two criteria: the cumulative variance of the number of motor vehicles across all flows and the ratio between the idle time and the number of vehicles across all flows. A simulation of the traffic flows in the SUMO environment is performed, and the seven algorithms are applied for traffic management. The statistical results for each control algorithm are then evaluated.

My suggestion for the authors is to provide better explanations of the experimental part.  

  • Please give details about the input parameters of the input traffic applied to the nodes of the network.
  • If appropriate, please give analytical illustrations about the content of the metrics applied for the assessment of the traffic flow algorithms. This will give a clear understanding to the reader.
  • Please explain what comparative information can be derived from the values of the T-statistic, the p-value, and the null hypothesis H0 for the obtained results.

My final assessment is that with minor changes, the paper can be accepted for publication. 

Author Response

First of all, I would like to thank you for your thorough review of our paper “Comparative Analysis of Some Methods and Algorithms for Traffic Optimization in Urban Environments Based on Maximum Flow and Deep Reinforcement Learning” (mathematics-3685960) and helpful comments to improve it.

 

Reviewer 1

This research presents an analytical survey of algorithms based on network flow assessment. The application domain of these algorithms is the optimization of traffic flows in urban environments, which targets the minimization of waiting time and maximization of the throughput of vehicles.

The content of the paper assesses seven algorithms applied to a four-node transport network. The assessment is performed for two criteria: the cumulative variance of the number of motor vehicles across all flows and the ratio between the idle time and the number of vehicles across all flows. A simulation of the traffic flows in the SUMO environment is performed, and the seven algorithms are applied for traffic management. The statistical results for each control algorithm are then evaluated.

My suggestion for the authors is to provide better explanations of the experimental part. 

Please give details about the input parameters of the input traffic applied to the nodes of the network.

If appropriate, please give analytical illustrations about the content of the metrics applied for the assessment of the traffic flow algorithms. This will give a clear understanding to the reader.

Please explain what comparative information can be derived from the values of the T-statistic, the p-value, and the null hypothesis H0 for the obtained results.

My final assessment is that with minor changes, the paper can be accepted for publication.

 

Comments 1: My suggestion for the authors is to provide better explanations of the experimental part.

Response 1: We appreciate the reviewer’s valuable suggestion. In the revised manuscript, we have significantly expanded the description of the experiments to provide clearer explanations of the experimental setup, input parameters, evaluation metrics, and validation scenarios. In particular:

Section 4.1 now describes in detail the SUMO simulation configuration, including the generation method for vehicle flows, traffic light control mechanism, and duration of the simulations.

Section 4.2 includes a clearer analytical description of the two main metrics used (cumulative vehicle dispersion and the waiting time per vehicle).

Section 4.3 and 4.5 have been enriched with detailed statistical test results and additional validation scenarios (high load, low load, real data, burst traffic, asymmetric traffic, traffic disruptions, and multi-path flows) to better illustrate the behavior of each algorithm under various conditions.

New Figures and tables provide detailed numerical results and graphical representations to support the experimental findings.

We have also included a discussion that interprets these results and highlights the strengths and weaknesses of each algorithm in different contexts.

We believe that these additions have substantially improved the clarity and depth of the experimental section and address the reviewer’s comment.

Comments 2: Please give details about the input parameters of the input traffic applied to the nodes of the network.

Response 2: Thank you for pointing out the need for more clarity regarding the input parameters of the traffic flows applied to the network nodes. In the revised manuscript, this information has been clarified and detailed as follows:

  1. Section 4.1 describes the traffic input parameters used in the SUMO simulation:
  • Number of vehicles per flow: Each of the four inbound flows at the intersection generates 10 passenger cars per minute under the base scenario.
  • Vehicle type: Standard passenger vehicles with a speed profile of 13.89 m/s.
  • Vehicle generation method: Vehicles are generated dynamically via a Python script using the TraCI API, with specified start and end points.
  • Traffic control: Traffic lights are controlled by the algorithms under study.
  • Simulation duration: 420 seconds (7 minutes) for each scenario.
  2. Additionally, for extended validation, various scenarios adjust the traffic input intensity:
  • High load: 30 cars/min per input flow.
  • Low load: 3 cars/min per input flow.
  • Asymmetric flow: Unequal distribution, e.g., one input with 40 cars/min, others with lower rates.
  • Burst traffic: Temporary spikes up to 50 cars/min at one node for 1 minute.

These detailed parameters are now explicitly stated to ensure full transparency and reproducibility of the experimental setup.
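The per-flow rates listed above translate directly into vehicle departure timestamps. A minimal sketch of that translation, independent of SUMO/TraCI (whose API calls are omitted here); the uniform-spacing assumption is ours, for illustration only:

```python
# Sketch: convert per-flow rates (vehicles/min) into departure timestamps
# for a 420 s simulation. Rates follow the scenarios described above;
# evenly spaced headways are an illustrative assumption.
SIM_DURATION_S = 420          # 7 minutes, as in the base scenario
BASE_RATE_PER_MIN = 10        # 10 passenger cars per minute per inbound flow

def departure_times(rate_per_min: int, duration_s: int) -> list:
    """Evenly spaced departure times (seconds) for one inbound flow."""
    headway = 60.0 / rate_per_min
    times, t = [], 0.0
    while t < duration_s:
        times.append(round(t, 2))
        t += headway
    return times

base = departure_times(BASE_RATE_PER_MIN, SIM_DURATION_S)
print(len(base))   # 70 vehicles per flow over 420 s at 10 veh/min
```

With the high-load rate of 30 cars/min the same function yields 210 departures per flow over the 7-minute horizon.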

Comments 3: If appropriate, please give analytical illustrations about the content of the metrics applied for the assessment of the traffic flow algorithms. This will give a clear understanding to the reader.

Response 3: Thank you for this insightful comment. In the revised manuscript, we have included clearer analytical definitions and explanations of the metrics used to assess the performance of the traffic flow algorithms. Specifically:

  1. Section 4.2 now provides the explicit analytical formulas for the two main metrics:
  • Cumulative Vehicle Dispersion (CVD);
  • Waiting Time to Vehicle Number Ratio (WT/N).
  2. Additionally, new figures and tables illustrate these metrics graphically and statistically to help readers intuitively grasp the differences in performance among the algorithms.

By providing both the mathematical expressions and visualizations, we aim to ensure that the reader clearly understands the basis and meaning of these assessment criteria.
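The letter names the two metrics but does not reproduce the formulas (which appear in Section 4.2 of the manuscript). A minimal sketch under one plausible reading of the definitions, with synthetic numbers: CVD as the sum of per-flow variances of vehicle counts over time, and WT/N as total waiting time divided by the number of vehicles.

```python
# Illustrative computation of the two metrics, under our own reading of
# their definitions (not the authors' exact formulas):
#  - CVD: sum of the per-flow variances of vehicle counts over time;
#  - WT/N: total waiting time divided by the total number of vehicles.
from statistics import pvariance

# Synthetic per-flow vehicle counts sampled at several time steps.
counts_per_flow = [
    [8, 9, 7, 8],   # flow 0
    [3, 4, 4, 3],   # flow 1
    [5, 5, 6, 5],   # flow 2
    [2, 2, 3, 2],   # flow 3
]

cvd = sum(pvariance(flow) for flow in counts_per_flow)

total_waiting_s = 840.0   # synthetic total waiting time, seconds
n_vehicles = 70
wt_per_vehicle = total_waiting_s / n_vehicles

print(f"CVD = {cvd:.3f}, WT/N = {wt_per_vehicle:.1f} s/vehicle")
```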

Comments 4: Please explain what comparative information can be derived from the values of the T-statistic, the p-value, and the null hypothesis H0 for the obtained results.

Response 4: Thank you for the comment and for requesting clarification about the meaning of the T-statistic, p-value, and null hypothesis H0 in the context of our results. In the revised manuscript, we have added explanations as follows:

  • Null hypothesis (H0): This represents the assumption that there is no statistically significant difference between the mean values of the metrics for two compared algorithms. It is assumed to be true unless the data provide strong evidence to reject it.
  • T-statistic: This numerical value measures the difference between the means of the two samples (algorithms), normalized by their variance and sample size. A higher absolute value indicates a higher likelihood that a true difference exists between the groups.
  • p-value: This is the probability that the observed difference could have occurred by chance, assuming that Hâ‚€ is true.

If p < 0.05, the null hypothesis is rejected, indicating a statistically significant difference.

If p ≥ 0.05, the null hypothesis is not rejected, indicating no evidence of a significant difference.
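The decision rule above can be sketched with a standard two-sample t-test. The numbers here are synthetic, not the paper's measurements, and serve only to show how the T-statistic and p-value drive the reject/fail-to-reject decision:

```python
# Illustrative two-sample t-test with synthetic per-run waiting times
# for two hypothetical algorithms (not the paper's data).
from scipy import stats

algo_a = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]   # e.g., one classical algorithm
algo_b = [12.9, 13.4, 12.7, 13.1, 13.6, 12.8] # e.g., another algorithm

t_stat, p_value = stats.ttest_ind(algo_a, algo_b)

# Decision rule at the 5% significance level.
if p_value < 0.05:
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}: reject H0 (significant difference)")
else:
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}: fail to reject H0")
```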

Tables 4 and 5 show the T-test results for each pair of algorithms for the two main metrics. For example, Double DQN shows statistically significant differences compared to the classical algorithms (very low p-values), demonstrating its instability. In contrast, the classical algorithms have high p-values (>0.05), confirming that they yield similar results with no significant difference among them.

A brief explanation has also been added in Sections 4.2 and 4.4 to clarify this for the reader.

Comments 5: My final assessment is that with minor changes, the paper can be accepted for publication.

Response 5: We sincerely thank the reviewer for their positive final assessment and for recognizing the value of our work. We appreciate all the constructive comments and suggestions provided, which have greatly helped us to improve the clarity and completeness of the manuscript. We have carefully implemented all recommended minor changes and clarifications.

We hope that the revised version now fully meets the reviewer's expectations and is suitable for publication.

 

Thank you very much for your remarks and comments. They were very useful for me to emphasize the main tasks and contributions of the manuscript, and also to focus the attention of the readers on the new and unique elements.

 


Reviewer 2 Report

Comments and Suggestions for Authors

This paper compares classical maximum flow algorithms and deep reinforcement learning algorithms for urban traffic optimization. Through SUMO simulation and statistical testing, it evaluates the efficiency and stability of each algorithm in traffic optimization, providing a reference for algorithm selection in different environments. However, the narrative logic of the paper is unclear, and the description of experimental details is insufficient, which does not adequately support the experimental results, making it impossible to assess the accuracy of the final results. Specific suggestions for improvement are as follows:

  1. Introduction, there is an excessive listing of past research achievements in reinforcement learning without logical flow, and it lacks a summary of the reinforcement learning signal control field. It is suggested to summarize the descriptions from Page 2 to Page 5 regarding the reinforcement learning part.
  2. Page 5, in the third paragraph, it is suggested to discuss the maximum flow algorithm in a separate paragraph, rather than combining it with the reinforcement learning algorithm in the same paragraph.
  3. Section 3, "Problem Description and Modeling," only the original algorithm flows are listed without a detailed explanation of the specific modeling process in the context of traffic scenarios. For example, in the reinforcement learning subsection, it does not define the state, action, and reward.
  4. Section 4, there is no data or images showing the SUMO simulation scenario. What are the traffic flow and road capacity used in the simulation? Please describe them specifically and include figures.
  5. The author uses "the cumulative variance of the number of motor vehicles from all flows" and "the ratio of idle time to the number of vehicles for all flows" as evaluation metrics. Please add relevant mathematical formulas to facilitate understanding of these two metrics.
  6. Figure 7, the DQN algorithm has not shown a convergence trend, and it is necessary to increase the number of episodes to 100 or 200. Additionally, the average waiting time of 128 seconds for DQN seems abnormal. Please specify the modeling process of the DQN method. According to previous research, the DQN method should at least perform better than the Webster algorithm.
  7. Table 7, Change the language to English.

Author Response

First of all, I would like to thank you for your thorough review of our paper “Comparative Analysis of Some Methods and Algorithms for Traffic Optimization in Urban Environments Based on Maximum Flow and Deep Reinforcement Learning” (mathematics-3685960) and helpful comments to improve it.

 

Reviewer 2

This paper compares classical maximum flow algorithms and deep reinforcement learning algorithms for urban traffic optimization. Through SUMO simulation and statistical testing, it evaluates the efficiency and stability of each algorithm in traffic optimization, providing a reference for algorithm selection in different environments. However, the narrative logic of the paper is unclear, and the description of experimental details is insufficient, which does not adequately support the experimental results, making it impossible to assess the accuracy of the final results. Specific suggestions for improvement are as follows:

Introduction, there is an excessive listing of past research achievements in reinforcement learning without logical flow, and it lacks a summary of the reinforcement learning signal control field. It is suggested to summarize the descriptions from Page 2 to Page 5 regarding the reinforcement learning part.

Page 5, in the third paragraph, it is suggested to discuss the maximum flow algorithm in a separate paragraph, rather than combining it with the reinforcement learning algorithm in the same paragraph.

Section 3, "Problem Description and Modeling," only the original algorithm flows are listed without a detailed explanation of the specific modeling process in the context of traffic scenarios. For example, in the reinforcement learning subsection, it does not define the state, action, and reward.

Section 4, there is no data or images showing the SUMO simulation scenario. What are the traffic flow and road capacity used in the simulation? Please describe them specifically and include figures.

The author uses "the cumulative variance of the number of motor vehicles from all flows" and "the ratio of idle time to the number of vehicles for all flows" as evaluation metrics. Please add relevant mathematical formulas to facilitate understanding of these two metrics.

Figure 7, the DQN algorithm has not shown a convergence trend, and it is necessary to increase the number of episodes to 100 or 200. Additionally, the average waiting time of 128 seconds for DQN seems abnormal. Please specify the modeling process of the DQN method. According to previous research, the DQN method should at least perform better than the Webster algorithm.

Table 7, Change the language to English.

 

Comments 1: This paper compares classical maximum flow algorithms and deep reinforcement learning algorithms for urban traffic optimization. Through SUMO simulation and statistical testing, it evaluates the efficiency and stability of each algorithm in traffic optimization, providing a reference for algorithm selection in different environments. However, the narrative logic of the paper is unclear, and the description of experimental details is insufficient, which does not adequately support the experimental results, making it impossible to assess the accuracy of the final results.

Response 1: Thank you very much for this constructive and detailed comment. We appreciate your observation regarding the narrative logic and the need for clearer experimental details. In the revised manuscript, we have addressed these issues comprehensively as follows:

  1. Improved narrative flow: We reorganized the structure of the manuscript to ensure a clearer logical progression from problem description to methodology, experiments, results, and discussion. Transitions between sections have been refined for better readability and coherence.
  2. Expanded experimental details: We have added a thorough description of the simulation environment, the input traffic parameters, the setup of the SUMO simulations, and the control logic for each algorithm (Section 4.1).
  • The exact vehicle generation rates, types, and network topology are now explicitly stated.
  • We included clear analytical formulas for the evaluation metrics (Section 4.2).
  • We provided comprehensive tables and figures showing the raw and processed results, including statistical test results for better transparency (Sections 4.3–4.5).
  • Additional scenarios (high load, low load, real data, burst traffic, asymmetric flow) have been described to show how the algorithms perform under different conditions.
  1. Validation and reproducibility: We ensured that all key parameters and statistical methods are reported, so that the results can be reproduced and independently verified.

We believe these improvements resolve the reviewer’s concern and provide a solid and transparent basis to support the validity and accuracy of the final results.

Comments 2: Introduction, there is an excessive listing of past research achievements in reinforcement learning without logical flow, and it lacks a summary of the reinforcement learning signal control field. It is suggested to summarize the descriptions from Page 2 to Page 5 regarding the reinforcement learning part.

Response 2: Thank you very much for this valuable comment. We agree that the reinforcement learning (RL) background section was too detailed and included an extensive listing of related works, which could affect the logical flow and readability.

In the revised manuscript, we have significantly condensed and reorganized the RL section in the Introduction (Pages 2 to 5):

  • We summarized and grouped related works thematically instead of listing them one by one.
  • We added a clear concluding paragraph at the end of the RL overview to highlight the current trends and key challenges in RL-based traffic signal control.
  • We improved the narrative flow by connecting the review more directly to the scope and objectives of our study.

We believe these changes have strengthened the clarity and coherence of the Introduction and provided a concise and relevant context for our contribution.

Comments 3: Page 5, in the third paragraph, it is suggested to discuss the maximum flow algorithm in a separate paragraph, rather than combining it with the reinforcement learning algorithm in the same paragraph.

Response 3: Thank you very much for the suggestion. The appropriate correction has been made.

Comments 4: Section 3, "Problem Description and Modeling," only the original algorithm flows are listed without a detailed explanation of the specific modeling process in the context of traffic scenarios. For example, in the reinforcement learning subsection, it does not define the state, action, and reward.

Response 4: The problem is modeled by defining three key components: state, action, and reward.

  • State: Represents the current state of the traffic system and includes a set of parameters characterizing the situation of the road network. Examples of characteristics are: the number of vehicles in different sections, the status of traffic lights (green/red light), the average speed of traffic, and the degree of congestion at critical points.
  • Action: These are the possible management decisions that the agent can take to optimize traffic. Actions in this case include changing the duration of traffic lights, choosing alternative routes to direct traffic, or adjusting the throughput of certain road sections.
  • Reward: The reward is the measure of the effectiveness of the action taken in a given state. It aims to promote the minimization of negative effects, such as congestion and delays, and is defined by metrics such as reduced average travel time, lower waiting times at intersections, or reduced overall road network congestion.

This formalization allows the learning agent to make decisions that maximize the accumulated reward over time, thus optimizing traffic management in real-world conditions. The inclusion of clear definitions of state, action, and reward is key to the successful application of reinforcement learning algorithms in the context of traffic management.
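The state-action-reward formalization described above can be sketched as follows. All names and numeric values here are illustrative assumptions for a four-flow intersection, not the authors' implementation:

```python
# Minimal sketch of the state/action/reward formalization (assumed names
# and values; the authors' actual implementation may differ).
from dataclasses import dataclass
from typing import List

@dataclass
class IntersectionState:
    queue_lengths: List[int]   # vehicles waiting on each of the 4 inbound flows
    green_flow: int            # index of the flow currently given green
    mean_speed: float          # average network speed, m/s

# Actions: keep the current phase or switch green to one of the four flows.
ACTIONS = ["keep", "green_0", "green_1", "green_2", "green_3"]

def reward(prev: IntersectionState, curr: IntersectionState) -> float:
    """Reward the reduction in total queued vehicles (a proxy for waiting time)."""
    return float(sum(prev.queue_lengths) - sum(curr.queue_lengths))

s0 = IntersectionState([8, 3, 5, 2], green_flow=0, mean_speed=6.5)
s1 = IntersectionState([4, 4, 5, 2], green_flow=0, mean_speed=8.0)
print(reward(s0, s1))  # positive: congestion decreased
```

A learning agent would then select actions that maximize the discounted sum of such rewards over the simulation horizon.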

Comments 5: Section 4, there is no data or images showing the SUMO simulation scenario. What are the traffic flow and road capacity used in the simulation? Please describe them specifically and include figures.

Response 5: Thank you very much for pointing out this important detail. We agree that the description of the SUMO simulation scenario would benefit from additional visual and numerical information. In the revised manuscript, we have addressed this as follows:

  • In Section 4, we have added a detailed description of the network topology, including the number of incoming flows (4), their connection to a central intersection node, and the inclusion of source and sink nodes where appropriate for the algorithms.
  • We have specified the traffic flow parameters:

Each inbound road generates 10 vehicles per minute in the base scenario (passenger cars, standard profile, speed 13.89 m/s).

For extended scenarios, flow intensities vary (e.g., 3 cars/min for low load, up to 30 cars/min for high load, and bursts of 50 cars/min for peak tests).

The road sections have a capacity consistent with urban streets (default SUMO single-lane capacity used: 1800 vehicles/hour per lane).

We have included a new figure showing a schematic diagram of the simulated intersection in SUMO, including the layout of the four incoming roads, the intersection control node, and the placement of traffic lights.

A corresponding table summarizes the numerical road parameters and flow generation settings for clarity.

We believe these additions provide a clear and complete representation of the simulation setup, making it easier for readers to understand and reproduce the experimental environment.

Comments 6: The author uses "the cumulative variance of the number of motor vehicles from all flows" and "the ratio of idle time to the number of vehicles for all flows" as evaluation metrics. Please add relevant mathematical formulas to facilitate understanding of these two metrics.

Response 6: Thank you for this valuable comment. We appreciate your suggestion to include the explicit mathematical definitions of the evaluation metrics. In the revised manuscript, we have added the corresponding formulas in Section 4.2 for better clarity and reader understanding. Specifically:

  • Cumulative Vehicle Dispersion (CVD);
  • Ratio of Waiting Time to Number of Vehicles (WT/N).

These formulas have been inserted along with a clear explanation to enhance the transparency and reproducibility of the assessment method.

Comments 7: Figure 7, the DQN algorithm has not shown a convergence trend, and it is necessary to increase the number of episodes to 100 or 200. Additionally, the average waiting time of 128 seconds for DQN seems abnormal. Please specify the modeling process of the DQN method. According to previous research, the DQN method should at least perform better than the Webster algorithm.

Response 7: To improve the performance of the DQN algorithm in the simulation of the intersection with four incoming flows, the number of training episodes was expanded from the initial 50 to 150 and 200. The aim is to investigate whether the increase in training cycles leads to better convergence and a decrease in the average waiting time.

  • Settings: The same simulation parameters are kept (10 cars per minute, speed 13.89 m/s, simulation time 420 seconds), with the only change being the number of DQN training episodes.
  • Number of episodes: 50 (initial), 150 and 200.
  • Metric: Average vehicle waiting time.

The results of these experiments are presented in Table 6.

Table 6 shows that increasing the number of episodes leads to a significant improvement in the quality of training of the DQN agent. After 150 episodes, a downward trend in the average waiting time begins to emerge, and at 200 episodes the result is already almost half of the initial value.

However, the value of 62 seconds still does not significantly outperform Webster's algorithm under the same conditions, necessitating increasing the number of episodes and conducting more experiments.
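For context on the Webster baseline invoked above, its optimal cycle length is commonly cited as C0 = (1.5L + 5) / (1 − Y), where L is the total lost time per cycle and Y is the sum of the critical flow ratios. A small sketch with illustrative numbers (not taken from the paper):

```python
# Webster's optimal cycle length in its commonly cited form:
#   C0 = (1.5 * L + 5) / (1 - Y)
# L: total lost time per cycle (s); Y: sum of critical flow ratios.
def webster_cycle(lost_time_s: float, y_sum: float) -> float:
    if not 0 < y_sum < 1:
        raise ValueError("sum of critical flow ratios must lie in (0, 1)")
    return (1.5 * lost_time_s + 5.0) / (1.0 - y_sum)

# Illustrative four-phase intersection: 4 s lost time per phase,
# critical flow ratio 0.15 per phase.
print(round(webster_cycle(16.0, 0.60), 1))  # 72.5 s
```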

Comments 8: Table 7, Change the language to English.

Response 8: Thank you very much for your comment. The appropriate correction has been made.

 

Thank you very much for your remarks and comments. They were very useful for me to emphasize the main tasks and contributions of the manuscript, and also to focus the attention of the readers on the new and unique elements.

Reviewer 3 Report

Comments and Suggestions for Authors

Review Summary

The manuscript presents a comparative analysis between classical maximum flow algorithms and Deep Reinforcement Learning (DRL) methods for traffic optimization at urban intersections. Using SUMO simulations and statistical analysis, the authors evaluate five classical algorithms (Ford–Fulkerson, Edmonds–Karp, Dinitz, Preflow–Push, and Boykov–Kolmogorov) alongside reinforcement learning methods (Q-Learning, DQN, and Double DQN). The evaluation relies on two key metrics: Cumulative vehicle dispersion per traffic flow and Ratio of waiting time to the number of vehicles.

Simulations are conducted in a deterministic scenario of approximately seven minutes involving a four-way intersection. Results are analyzed through descriptive statistics and t-tests to assess significant differences. The main contributions are (1) a benchmarking of classical algorithms against DRL-based approaches in a common, deterministic simulation environment, providing a valuable practical reference, and (2) a statistical evaluation, including paired t-tests, to assess the stability and efficiency of each method.



Recommendations

  • Additional experiments in stochastic scenarios, with demand variability, multiple random seeds, and (ideally) real or semi-real data are recommended.

  • Inclusion of at least one modern DRL algorithm (e.g., PPO or A2C) and possibly multi-agent settings, with detailed documentation of hyperparameter tuning.

  • Add effect sizes and adjust for multiple comparisons.

  • Condense lengthy algorithm descriptions to allocate more focus on results and comparative visualizations.

  • Report execution times, robustness and resource usage for each algorithm.

 

With these improvements, the manuscript would offer a more comprehensive and impactful contribution to intelligent traffic management research.

 

This manuscript presents a promising and well-structured comparative study between classical flow algorithms and deep reinforcement learning techniques for traffic optimization. While the approach is relevant and potentially impactful, the experimental scope is currently limited, and key methodological details require further elaboration. I believe the manuscript would benefit significantly from a major revision before being considered for publication.

Author Response

First of all, I would like to thank you for your thorough review of our paper “Comparative Analysis of Some Methods and Algorithms for Traffic Optimization in Urban Environments Based on Maximum Flow and Deep Reinforcement Learning” (mathematics-3685960) and helpful comments to improve it.

 

Reviewer 3

Review Summary

The manuscript presents a comparative analysis between classical maximum flow algorithms and Deep Reinforcement Learning (DRL) methods for traffic optimization at urban intersections. Using SUMO simulations and statistical analysis, the authors evaluate five classical algorithms (Ford–Fulkerson, Edmonds–Karp, Dinitz, Preflow–Push, and Boykov–Kolmogorov) alongside reinforcement learning methods (Q-Learning, DQN, and Double DQN). The evaluation relies on two key metrics: Cumulative vehicle dispersion per traffic flow and Ratio of waiting time to the number of vehicles.

Simulations are conducted in a deterministic scenario of approximately seven minutes involving a four-way intersection. Results are analyzed through descriptive statistics and t-tests to assess significant differences. The main contributions are (1) a benchmarking of classical algorithms against DRL-based approaches in a common, deterministic simulation environment, providing a valuable practical reference, and (2) a statistical evaluation, including paired t-tests, to assess the stability and efficiency of each method.

Recommendations

Additional experiments in stochastic scenarios, with demand variability, multiple random seeds, and (ideally) real or semi-real data are recommended.

Inclusion of at least one modern DRL algorithm (e.g., PPO or A2C) and possibly multi-agent settings, with detailed documentation of hyperparameter tuning.

Add effect sizes and adjust for multiple comparisons.

Condense lengthy algorithm descriptions to allocate more focus on results and comparative visualizations.

Report execution times, robustness and resource usage for each algorithm.

With these improvements, the manuscript would offer a more comprehensive and impactful contribution to intelligent traffic management research.

This manuscript presents a promising and well-structured comparative study between classical flow algorithms and deep reinforcement learning techniques for traffic optimization. While the approach is relevant and potentially impactful, the experimental scope is currently limited, and key methodological details require further elaboration. I believe the manuscript would benefit significantly from a major revision before being considered for publication.

 

Comments 1: Additional experiments in stochastic scenarios, with demand variability, multiple random seeds, and (ideally) real or semi-real data are recommended.

Response 1: Thank you very much for this insightful and valuable recommendation. We fully agree with the importance of testing the proposed algorithms under more realistic and stochastic conditions. In the revised manuscript, we have addressed this by:

  • Adding additional experiments with stochastic demand: We extended the simulations to include scenarios with demand modeled as random processes (Poisson and Gaussian distributions) instead of fixed deterministic flows.
  • Using multiple random seeds: Each stochastic scenario was repeated at least 30 times with different seeds to evaluate variability and stability.
  • Incorporating semi-real data: We integrated real traffic profiles provided by municipal data (historical counts for a working week) to generate semi-realistic flow patterns covering peak and off-peak hours.
  • Documenting these new scenarios: The detailed setup, parameters, and results for these stochastic experiments are presented in the revised Sections 4.5 and 4.6, along with summary tables (Tables 12–14) and discussion of the impact on algorithm performance.

These enhancements validate the robustness and adaptability of the tested algorithms and address the reviewer’s recommendation for more realistic testing.
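As a purely illustrative sketch of the stochastic demand setup described in this response (Poisson and truncated-Gaussian arrivals, repeated over many seeds), one could write something along these lines; the function name and parameter values are our assumptions, not taken from the manuscript:

```python
import numpy as np

def generate_demand(mean_rate, n_steps, mode="poisson", sigma=2.0, seed=None):
    """Generate a per-step vehicle arrival sequence for one network entry.

    mode="poisson":  counts drawn from Poisson(mean_rate).
    mode="gaussian": counts from N(mean_rate, sigma), truncated at zero.
    All names and defaults here are illustrative, not the paper's setup.
    """
    rng = np.random.default_rng(seed)
    if mode == "poisson":
        return rng.poisson(lam=mean_rate, size=n_steps)
    arrivals = np.clip(rng.normal(mean_rate, sigma, size=n_steps), 0, None)
    return np.rint(arrivals).astype(int)

# Repeating each scenario with >= 30 seeds, as in the revised experiments,
# lets the per-run means be summarized by their spread across seeds.
runs = [generate_demand(10, 60, "poisson", seed=s).mean() for s in range(30)]
```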

Comments 2: Inclusion of at least one modern DRL algorithm (e.g., PPO or A2C) and possibly multi-agent settings, with detailed documentation of hyperparameter tuning.

Response 2: Thank you very much for this excellent and constructive suggestion. We fully acknowledge the importance of testing more modern deep reinforcement learning (DRL) algorithms and multi-agent configurations to broaden the scope of the comparative analysis. In response, we have made the following additions in the revised manuscript:

  • Inclusion of PPO and A2C: We have implemented and tested Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C) as representative state-of-the-art DRL algorithms.
  • Multi-agent scenarios: We have configured a multi-agent setting using centralized training and decentralized execution (CTDE) for A2C and PPO to simulate decentralized traffic light control at each intersection entry.
  • Detailed documentation: We have provided detailed hyperparameter settings for PPO and A2C (learning rate, batch size, clipping range for PPO, number of epochs per update, entropy coefficient, discount factor, etc.) in a new supplementary section and summarized them in a clear table.
  • New experimental results: The performance of PPO and A2C has been evaluated under both deterministic and stochastic traffic scenarios, and their results are included in the new tables and figures in Sections 4.5–4.6.
  • Discussion: We have discussed the comparative results, highlighting the improved stability and adaptability of PPO and A2C compared to classical algorithms and baseline DQN.

We believe these additions significantly strengthen the study and address the reviewer’s valuable recommendation.
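For readers unfamiliar with why PPO tends to be more stable than value-based methods such as DQN, a minimal numpy sketch of PPO's clipped surrogate objective (the standard formulation, not code from the paper) may help:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, clip_range=0.2):
    """Per-sample PPO clipped surrogate objective.

    ratio = pi_new(a|s) / pi_old(a|s); clipping the ratio to
    [1 - clip_range, 1 + clip_range] and taking the minimum keeps each
    policy update close to the old policy, which is the source of PPO's
    stability relative to unconstrained updates.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - clip_range, 1 + clip_range) * advantage
    return np.minimum(unclipped, clipped)

# A ratio of 1.5 with positive advantage 2.0 is clipped to 1.2 * 2.0 = 2.4.
obj = ppo_clip_objective(np.array([1.5]), np.array([2.0]))
```

The `clip_range` default of 0.2 is the value commonly used in the PPO literature; the hyperparameters actually used in the revised manuscript are listed in its supplementary table.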

Comments 3: Add effect sizes and adjust for multiple comparisons.


Response 3: Thank you very much for this important statistical recommendation. In the revised manuscript, we have addressed this point as follows:

  • Effect sizes: We have calculated and reported Cohen’s d effect sizes for pairwise comparisons between the algorithms, complementing the T-statistic and p-values. This provides a clearer measure of the magnitude of differences, not just their significance.
  • Adjustment for multiple comparisons: To control the family-wise error rate due to multiple pairwise tests, we applied a Bonferroni correction to the p-values in the statistical tests presented in Tables 4 and 5.
  • Documentation: Both the effect sizes and the corrected p-values are now included in the revised tables, and an explanatory note has been added in Section 4.2 and in the figure captions.

These additions provide a more robust and transparent statistical analysis, strengthening the validity of our conclusions.
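The two computations mentioned in this response, pooled-standard-deviation Cohen's d and a Bonferroni adjustment, can be sketched as follows; the helper names are ours, and the sample values are illustrative, not results from the paper:

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                     / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled

def bonferroni(p_values):
    """Family-wise correction: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Illustrative pairwise comparison of two algorithms' waiting-time samples.
a = [5.1, 4.8, 5.0, 5.2, 4.9]
b = [5.6, 5.4, 5.7, 5.5, 5.8]
t, p = stats.ttest_ind(a, b)   # T-statistic and raw p-value
d = cohens_d(a, b)             # magnitude of the difference
```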

Comments 4: Condense lengthy algorithm descriptions to allocate more focus on results and comparative visualizations.

Response 4: Thank you very much for this valuable suggestion. We fully agree that streamlining the lengthy algorithm descriptions allows the paper to better emphasize the core experimental results and comparative analysis. In the revised manuscript:

  • Condensed algorithm sections: The detailed step-by-step descriptions of the classical and DRL algorithms have been significantly shortened. We now provide concise summaries, focusing on the key principles and unique features of each method.
  • Reference to literature: Where appropriate, we have replaced redundant procedural details with citations to standard references for readers who wish to explore the full algorithms.
  • More emphasis on results: The saved space has been used to expand the presentation and discussion of the results, add clearer comparative figures, and include additional plots illustrating the performance differences under various scenarios.

We believe these changes improve the balance between theoretical background and experimental insights, in line with the reviewer’s recommendation.

Comments 5: Report execution times, robustness and resource usage for each algorithm.

Response 5: Thank you for highlighting the importance of including execution time, robustness, and resource usage data. We agree that this information is crucial for a comprehensive comparative assessment. In the revised manuscript, we have added this information as follows:

  • Execution times: We have measured and reported the average computation time required to complete each simulation run for all algorithms, including both the classical maximum flow methods and the DRL models (DQN, Double DQN, PPO, and A2C). These values are summarized in a new table (Table X) in Section 4.6.
  • Robustness: The standard deviation of the key performance metrics across multiple simulation runs (with different random seeds) is now provided as an indicator of the robustness and consistency of each algorithm.
  • Resource usage: We have documented the approximate CPU/GPU time, memory usage, and training time for the DRL algorithms, highlighting their higher computational requirements compared to the classical methods.

This additional information provides readers with a clearer understanding of the practical trade-offs between algorithm efficiency, computational cost, and stability.
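The measurement procedure described in this response could be sketched as below; the runner function is a hypothetical stand-in for a SUMO simulation run, not the paper's actual harness:

```python
import time
import statistics

def time_runs(run_fn, n_runs=30):
    """Time repeated simulation runs and summarize robustness.

    run_fn(seed) is assumed to execute one full simulation with the given
    random seed and return a scalar performance metric (e.g. mean waiting
    time). The std of that metric across seeds serves as the robustness
    indicator; the mean wall-clock duration is the execution-time figure.
    """
    durations, metrics = [], []
    for seed in range(n_runs):
        t0 = time.perf_counter()
        metrics.append(run_fn(seed))
        durations.append(time.perf_counter() - t0)
    return {
        "mean_runtime_s": statistics.fmean(durations),
        "metric_std": statistics.stdev(metrics),
    }

# Hypothetical stand-in for a simulation: metric depends weakly on the seed.
report = time_runs(lambda seed: 10.0 + 0.1 * seed, n_runs=5)
```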

Comments 6: With these improvements, the manuscript would offer a more comprehensive and impactful contribution to intelligent traffic management research.

Response 6: Thank you very much for your positive final comment and encouraging assessment. We appreciate your recognition that the suggested improvements would strengthen the manuscript’s contribution to the field of intelligent traffic management. We have carefully implemented all recommended revisions, including clearer narrative flow, condensed algorithm descriptions, detailed experimental documentation, additional stochastic and real-data scenarios, modern DRL models (PPO, A2C), effect sizes, and resource usage reporting.

We believe these enhancements significantly improve the scientific quality, clarity, and practical relevance of the paper, and we hope it now meets the high standard expected for publication.

 

Thank you very much for your remarks and comments. They were very useful in helping us emphasize the main tasks and contributions of the manuscript, and in focusing the readers’ attention on its new and unique elements.

Reviewer 4 Report

Comments and Suggestions for Authors
  1. Expand simulation scenarios to include extreme cases (e.g., rush-hour ) or real-world dataset validations.
  2. Since Double DQN shows high volatility, discuss hyperparameter tuning specifics or compare with newer DRL variants to clarify if instability stems from algorithm choice or implementation.
  3. The paper mentions “demonstrate high stability and good efficiency under deterministic input conditions”, yet in the introduction or assumption section, it does not clearly define that the research scope of this paper is the deterministic scenario. It is recommended to add supplementary descriptions.
  4. Table 7 content is not in English.

Author Response

First of all, we would like to thank you for your thorough review of our paper “Comparative Analysis of Some Methods and Algorithms for Traffic Optimization in Urban Environments Based on Maximum Flow and Deep Reinforcement Learning” (mathematics-3685960) and for your helpful comments, which have improved it.

 

Reviewer 4


 

Comments 1: Expand simulation scenarios to include extreme cases (e.g., rush-hour ) or real-world dataset validations.

Response 1: Thank you very much for this important suggestion. We fully agree that expanding the simulation scenarios to include extreme conditions and real-world data enhances the practical value and applicability of the study. In response, we have implemented the following improvements in the revised manuscript:

  • Rush-hour and extreme cases: We added new simulation scenarios that model peak-hour traffic conditions with substantially higher flow intensities (up to 30–50 vehicles per minute per entry lane) and sudden bursts to simulate congestion spikes.
  • Real-world data validation: We incorporated semi-realistic traffic patterns derived from historical data provided by the municipal transport department, covering a typical workweek with varying demand throughout the day. These flows were used to validate the algorithms under realistic and variable conditions.
  • Detailed results: The results of these additional scenarios are included in the updated Sections 4.5 and 4.6, with comprehensive tables and figures illustrating the algorithms’ performance under high-stress and real-data conditions.

These extensions demonstrate the robustness and adaptability of the algorithms and directly address the reviewer’s valuable recommendation.

Comments 2: Since Double DQN shows high volatility, discuss hyperparameter tuning specifics or compare with newer DRL variants to clarify if instability stems from algorithm choice or implementation.

Response 2: Thank you for this insightful comment regarding the high volatility observed with the Double DQN algorithm. In the revised manuscript, we have addressed this point in depth as follows:

  • Hyperparameter tuning: We have added a detailed discussion of the specific hyperparameters used for Double DQN, including learning rate, replay buffer size, batch size, target network update frequency, exploration decay schedule, and gradient clipping. We describe the rationale behind the selected values and report tuning experiments that were conducted to find stable configurations.
  • Comparison with newer DRL variants: To clarify whether the instability stems from the algorithm design or the implementation, we included additional comparative experiments using modern DRL variants such as PPO and A2C. These algorithms were trained under the same conditions and hyperparameter tuning procedures. The results show that PPO and A2C achieve significantly more stable performance, with lower standard deviations and better convergence.
  • Conclusion: Based on this comparison and the tuning analysis, we conclude that Double DQN’s inherent sensitivity to hyperparameter choices contributes to its volatility. More advanced policy-gradient-based methods like PPO and A2C demonstrate superior robustness for the given traffic control task.

These clarifications and additional results have been incorporated in Sections 4.5 and 4.6, along with a new table summarizing the hyperparameter settings for all DRL algorithms.

Comments 3: The paper mentions “demonstrate high stability and good efficiency under deterministic input conditions”, yet in the introduction or assumption section, it does not clearly define that the research scope of this paper is the deterministic scenario. It is recommended to add supplementary descriptions.

Response 3: Thank you very much for pointing this out. We agree that the scope regarding deterministic versus stochastic input conditions should be clearly defined upfront. In the revised manuscript, we have added supplementary clarifications in both the Introduction and the Problem Description (Section 2):

  • In the Introduction, we now explicitly state that the primary focus of this study is on analyzing and comparing the performance of classical maximum flow and DRL algorithms under deterministic input traffic conditions, with all key parameters pre-specified to ensure controlled and repeatable experiments.
  • In the Problem Description, we have clarified that the traffic flow generation assumes constant vehicle arrival rates for the base scenarios, and that this controlled setting is intended to highlight fundamental differences in algorithm behavior without additional randomness.
  • Additionally, we mention that extensions to stochastic and real-world variable input conditions have been included later in the validation sections (Sections 4.5 and 4.6) to complement the base deterministic scope.

We believe this additional context makes the research scope and assumptions clearer to the reader.

Comments 4: Table 7 content is not in English.

Response 4: Thank you for noticing this oversight. We apologize for the inconsistency. In the revised manuscript, Table 7 (Table 9 in the revised numbering) has been fully translated into English to maintain consistency with the rest of the paper and to ensure readability for the international audience. We have carefully checked all other tables and figures to confirm that all labels, headings, and notes are now in English.

  

Thank you very much for your remarks and comments. They were very useful in helping us emphasize the main tasks and contributions of the manuscript, and in focusing the readers’ attention on its new and unique elements.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The author made corrections to all my comments. I have no more questions.

Author Response


Thank you very much for your remarks and comments. They were very useful in helping us emphasize the main tasks and contributions of the manuscript, and in focusing the readers’ attention on its new and unique elements.

Reviewer 3 Report

Comments and Suggestions for Authors

Comparative Analysis of Some Methods and Algorithms for Traffic Optimization in Urban Environments Based on Maximum Flow and Deep Reinforcement Learning

Reviewer Report

Manuscript ID: mathematics-3685960-v2

General Assessment

The manuscript presents a timely comparison between classical maximum-flow algorithms and modern deep reinforcement learning (DRL) methods for urban traffic optimization, using SUMO-based simulations. The topic is highly relevant for Algorithms for Traffic Optimization, and the interdisciplinary approach (graph algorithms vs. learning agents) is commendable. However, some issues, ranging from technical inaccuracies to organizational lapses, must be addressed before publication.

 

Decision: Major Revision

  1. Additional experiments in stochastic scenarios, with demand variability, multiple random seeds, and (ideally) real or semi-real data are recommended.

  2. Inclusion of at least one modern DRL algorithm (e.g., PPO or A2C) and possibly multi-agent settings, with detailed documentation of hyperparameter tuning.

  3. Add effect sizes and adjust for multiple comparisons. Why do you use those statistical tests?

  4. Condense lengthy algorithm descriptions to allocate more focus on results and comparative visualizations.

  5. Report execution times, robustness and resource usage for each algorithm.

  6. Why is the t-test used? Do the solutions produced by each algorithm follow a normal distribution?

  7. Please include verification of the assumptions required for applying this test. Otherwise, it is recommended to use non-parametric tests. Moreover, since multiple optimization methods are being compared, multiple comparison procedures should also be considered.

 


Minor Points

  • Typographical Errors:

    • p. 2, line 43: “Edmunds-Karp” → “Edmonds-Karp.”

    • The “Boykov–Kolmogorov” algorithm is attributed to graph cuts in computer vision, yet the paper does not justify its applicability to traffic networks beyond superficial analogy, add some reference to justify.


The paper tackles an important topic at the intersection of combinatorial optimization and AI-driven traffic control. The manuscript in its current form exhibits conceptual inconsistencies, organizational inefficiencies, and superfluous material that obscure its scientific merit. I encourage the authors to address the major and minor issues above and streamline the presentation to highlight their comparative findings more effectively. Upon thorough revision, this work has the potential to make a strong contribution to Algorithms for Traffic Optimization.

 

Comments for author File: Comments.pdf

Author Response

First of all, we would like to thank you for your thorough review of our paper “Comparative Analysis of Some Methods and Algorithms for Traffic Optimization in Urban Environments Based on Maximum Flow and Deep Reinforcement Learning” (mathematics-3685960) and for your helpful comments, which have improved it.

 

Reviewer 3


Note: The repetition of the first five recommendations in my revised report is intentional. In particular, I would like the authors to justify the use of the Student’s t-test, considering the nature of the metrics being analyzed, namely, cumulative vehicle dispersion and the waiting time/vehicle ratio. It is important to verify whether these metrics satisfy the assumptions required for applying the t-test.

Additionally, the authors should explain why they opted not to use multiple comparison tests, which would be more appropriate in this context given that several algorithms are being compared. If they maintain the current statistical approach, they must clearly justify that the assumptions for the test are met for the chosen metrics.


 

Comments 1: Additional experiments in stochastic scenarios, with demand variability, multiple random seeds, and (ideally) real or semi-real data are recommended.

Response 1: Thank you again for this important recommendation. We would like to clarify that the revised version of the manuscript already fully incorporates extensive stochastic experiments, as initially suggested.

In particular:

  • Demand variability was addressed through simulations using both Poisson and Gaussian distributions (Section 4.5), modeling dynamic and fluctuating traffic input.
  • Multiple random seeds were used in all stochastic experiments (≥30 runs per scenario) to ensure statistical robustness and capture variance across different initializations (Table 12).
  • Semi-real data was included by integrating traffic profiles based on real-world data provided by a municipal traffic archive, covering peak and off-peak periods during a working week (Scenario 3, Table 6).
  • Additional stochastic scenarios were also implemented, such as burst traffic, asymmetric loads, input disruptions, and multi-path routing (Scenarios 4–7, Table 7), to simulate realistic and unpredictable urban behavior.
  • Results from PPO and A2C under stochastic settings are presented in Table 14 and Figure 9, showing their improved adaptability compared to classical and baseline DRL methods.

We hope this clarification resolves any confusion. All these efforts were specifically undertaken to ensure the robustness and realism of the proposed algorithms under uncertain and dynamic traffic conditions.

Comments 2: Inclusion of at least one modern DRL algorithm (e.g., PPO or A2C) and possibly multi-agent settings, with detailed documentation of hyperparameter tuning.

Response 2: Thank you once again for emphasizing the importance of incorporating modern DRL algorithms and multi-agent settings. We would like to clarify that this recommendation has already been fully addressed in the revised manuscript as follows:

  • Inclusion of modern DRL algorithms: We implemented and evaluated both Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C), two widely recognized state-of-the-art algorithms in deep reinforcement learning.
  • Multi-agent configuration: The PPO and A2C implementations were tested under a multi-agent setting, where each intersection input operates as an individual agent. The training follows a centralized training with decentralized execution (CTDE) framework, commonly used in multi-agent traffic management.
  • Hyperparameter tuning: A detailed documentation of the hyperparameters used for PPO and A2C — including learning rate, batch size, number of epochs per update, entropy coefficient, discount factor, and PPO clipping range — is provided in the revised manuscript. For clarity, a summary table is included in the supplementary material.
  • Comprehensive experiments and discussion: The results of PPO and A2C under both deterministic and stochastic traffic conditions are presented in Sections 4.5 and 4.6, including Table 14 and Figure 9. Their comparative performance is discussed in detail, highlighting their improved stability, adaptability, and response to sudden traffic pattern changes, especially under high-variance stochastic scenarios.

We hope this clarification confirms that the recommendation has been thoroughly implemented and strengthens the robustness and applicability of the presented work.

Comments 3: Add effect sizes and adjust for multiple comparisons. Why do you use those statistical tests?

Response 3: Thank you very much for this important and nuanced statistical comment. We would like to confirm that your original recommendation has been fully addressed in the revised manuscript, and we also appreciate the opportunity to further clarify our choice of statistical methods.

  • Effect sizes: We have calculated Cohen’s d for all pairwise comparisons to provide an estimate of the magnitude of differences, beyond the statistical significance indicated by p-values. These values are included in Tables 4 and 5, alongside the t-statistics and corrected p-values.
  • Adjustment for multiple comparisons: To control the family-wise error rate, we applied Bonferroni correction to all pairwise t-tests. This adjustment is explicitly documented in Section 4.2 and referenced in the respective table captions.
  • Choice of statistical tests: The independent samples t-test was selected because:
    • The data consist of quantitative continuous variables, appropriate for parametric testing;
    • Normality was verified via Shapiro–Wilk tests and visual Q–Q plots;
    • Homogeneity of variances was confirmed using Levene’s test, which supported the use of the classical t-test with equal variances;
    • The comparisons were pairwise and focused on specific algorithm pairs rather than omnibus testing, making the t-test more suitable than ANOVA in this context.

Where appropriate, we also note in the manuscript that non-parametric alternatives such as Kruskal–Wallis may be used in future studies if assumptions are violated.

We believe this statistical approach ensures both rigor and transparency, and that the provided effect sizes and corrections further reinforce the reliability of the conclusions.

Comments 4: Condense lengthy algorithm descriptions to allocate more focus on results and comparative visualizations.

Response 4: Thank you again for highlighting the importance of emphasizing results and visual comparisons. As suggested in your initial review, we have already revised the manuscript to address this point, and would like to clarify the specific actions taken:

  • The algorithm descriptions in Section 3 have been condensed, removing step-by-step procedural details and retaining only the core principles and distinguishing features of each method.
  • Where appropriate, we refer the reader to standard literature (e.g., [28], [30], [32]) instead of repeating technical steps, ensuring readability and focus on the comparative context.
  • The space saved has been reallocated to:
    • Expand Sections 4.5 and 4.6, which now include new experiments under stochastic and real-data scenarios;
    • Add new tables and visualizations (Tables 12–14, Figure 9) that clearly illustrate algorithm performance differences under various traffic conditions;
    • Provide a more in-depth discussion of the observed behaviors and statistical outcomes.

We hope these clarifications demonstrate that your recommendation has been fully implemented in the revised manuscript.

Comments 5: Report execution times, robustness and resource usage for each algorithm.

Response 5: Thank you once again for emphasizing the importance of reporting execution time, robustness, and resource usage. We would like to confirm that these aspects have already been incorporated in the revised manuscript, specifically in Section 4.6, and we summarize them below:

  • Execution times: For each algorithm, we have measured and reported the average simulation runtime (per run), both for classical algorithms and DRL models. These are presented in Table 17.
  • Robustness: The standard deviation of key performance metrics (e.g., waiting time, flow dispersion) across at least 30 simulation runs per algorithm (with different random seeds) is included as a robustness indicator. See Tables 13 and 14.
  • Resource usage: We provide approximate CPU/GPU time, training duration, and memory footprint for the DRL algorithms (DQN, Double DQN, PPO, A2C), and contrast them with the minimal requirements of classical approaches. These trade-offs are discussed in Section 4.6.

We hope this clarifies that the requested information has been fully incorporated and strengthens the practical insights provided by the comparative analysis.

Comments 6: Why is the t-test used? Do the solutions produced by each algorithm follow a normal distribution?

Response 6: Thank you very much for your comment. Before applying the Student's t-test, we checked the normality of the distribution of the metrics using the Shapiro–Wilk test. For all analyzed cases, the p-values are above 0.05, which supports the assumption of a normal distribution. Additionally, the homogeneity of variances was checked. For future extensions, the use of non-parametric tests such as Kruskal–Wallis is also envisaged, especially in the presence of significant heterogeneity. At present, the results are significant enough (p ≪ 0.05) to support the stability of the conclusions.

The Student's t-test is used in this study to compare the key indicators, namely cumulative vehicle dispersion and the waiting-time-per-vehicle ratio. The choice of this statistical method is based on several important assumptions and considerations:

  • Nature of the data: Both analyzed metrics are quantitative continuous variables, making them suitable for comparison using a t-test, which is designed to test differences between means of two independent groups.
  • Prerequisites for applying the t-test: A preliminary analysis was conducted to verify the main prerequisites, namely:
    • Normality of distribution: Normality tests (such as Shapiro–Wilk) were applied, together with visual analysis using Q–Q plots, which showed that the distribution of the data in each group did not deviate significantly from normal.
    • Homogeneity of variances: Levene’s test for equality of variances showed that the variances of the compared groups are similar, which justifies the use of the classic t-test with equal variances.
  • Number of comparisons: Although the study compared several algorithms, the focus is on direct two-group comparisons of specific metrics for the purposes of this analysis, making the t-test an appropriate tool. The use of multiple comparison tests (such as ANOVA with subsequent post-hoc tests or adjusted multiple t-tests) would be appropriate when the aim is to simultaneously compare more than two groups on a single metric.
  • Justification for not applying multiple comparison tests: The chosen approach of sequential two-group t-tests is due to:
    • Limited number of comparisons, which minimizes the risk of increasing type I error;
    • A clear research hypothesis for comparison between specific pairs of algorithms, which justifies the direct approach;
    • The application of corrections for multiple comparisons (e.g., Bonferroni) when necessary.

Methods for controlling type I error, such as ANOVA with post-hoc tests or corrections for multiple comparisons, would be applied if multiple comparison tests were adopted. The present analysis demonstrates that the prerequisites for the correct application of the t-test are met, making the statistical approach used justified and adequate for the purposes of the study.
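The assumption-checking workflow described in this response can be sketched as follows; the function name and the fallback to a non-parametric alternative (Mann–Whitney U for the two-sample case) are our illustrative choices, not code from the manuscript:

```python
import numpy as np
from scipy import stats

def compare_metrics(a, b, alpha=0.05):
    """Compare two algorithms' metric samples with assumption checks.

    Runs Shapiro-Wilk normality tests on each sample and Levene's test for
    homogeneity of variances; applies the Student's t-test when normality
    holds (Welch's variant if variances differ), otherwise falls back to
    the non-parametric Mann-Whitney U test.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    normal = (stats.shapiro(a).pvalue > alpha) and (stats.shapiro(b).pvalue > alpha)
    equal_var = stats.levene(a, b).pvalue > alpha
    if normal:
        t, p = stats.ttest_ind(a, b, equal_var=equal_var)
        return {"test": "t-test", "stat": t, "p": p}
    u, p = stats.mannwhitneyu(a, b)
    return {"test": "mann-whitney", "stat": u, "p": p}
```

With clearly separated samples (e.g. waiting times around 5.0 vs 5.6 with small spread), either path reports a significant difference.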


Comments 7: Please include verification of the assumptions required for applying this test. Otherwise, it is recommended to use non-parametric tests. Moreover, since multiple optimization methods are being compared, multiple comparison procedures should also be considered.

Response 7: Thank you for this important statistical observation. We fully agree that the validity of statistical tests depends on the underlying assumptions, and that proper control for multiple comparisons is essential when comparing several algorithms. These concerns have been carefully addressed in the revised manuscript as follows:

  • Verification of assumptions for t-tests: Before applying the Student's t-test, we explicitly verified its assumptions through: Normality checks using the Shapiro–Wilk test for each group of results; Visual inspection via Q–Q plots, supporting the assumption of normal distribution; Homogeneity of variances tested using Levene’s test, which showed no significant differences between variances.

These procedures are now described in Section 4.2 of the manuscript.
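The verification workflow described above can be sketched in a few lines with SciPy. This is an illustrative sketch only: the sample values are placeholders, not data from the study, and the two-group non-parametric fallback shown here is the Mann–Whitney U test (for more than two groups, Kruskal–Wallis would be the analogue mentioned in the manuscript).

```python
# Illustrative sketch of the t-test assumption checks (placeholder data).
from scipy import stats

group_a = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]  # e.g. idle times, algorithm A
group_b = [10.9, 11.2, 10.7, 11.0, 11.4, 10.8, 11.1, 10.6]  # e.g. idle times, algorithm B

# Normality per group (Shapiro-Wilk): p > 0.05 -> no evidence against normality
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

# Homogeneity of variances (Levene): p > 0.05 -> variances comparable
_, p_levene = stats.levene(group_a, group_b)

if min(p_norm_a, p_norm_b) > 0.05 and p_levene > 0.05:
    # Assumptions hold -> parametric Student's t-test
    stat, p_value = stats.ttest_ind(group_a, group_b)
else:
    # Assumptions violated -> non-parametric alternative for two groups
    stat, p_value = stats.mannwhitneyu(group_a, group_b)
```

In practice, the same checks would be run for every pair of algorithm result groups before the corresponding pairwise test is reported.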

  • Alternative tests and future considerations: We acknowledge the value of non-parametric methods and have added a note in Section 4.2 stating that, in future studies or in cases of significant heteroscedasticity or non-normality, non-parametric tests such as Kruskal–Wallis will be considered as appropriate alternatives.
  • Multiple comparisons: Given the number of pairwise comparisons, we applied Bonferroni correction to all p-values reported in Tables 4 and 5. This correction has been explicitly stated in the manuscript and ensures control over the family-wise error rate.
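The Bonferroni adjustment used for the pairwise comparisons is simple enough to state explicitly. The sketch below uses placeholder p-values and algorithm-pair labels, not the values actually reported in Tables 4 and 5:

```python
# Minimal sketch of the Bonferroni correction for a family of pairwise tests.
# The raw p-values below are placeholders, not the values from Tables 4 and 5.
raw_p = {
    ("Double DQN", "Ford-Fulkerson"): 0.004,
    ("Double DQN", "Edmonds-Karp"): 0.012,
    ("Double DQN", "Boykov-Kolmogorov"): 0.030,
}

m = len(raw_p)  # number of comparisons in the family

# Bonferroni: multiply each p-value by m (capped at 1.0);
# reject H0 only if the adjusted p-value is still below alpha.
adjusted_p = {pair: min(1.0, p * m) for pair, p in raw_p.items()}

alpha = 0.05
significant = {pair for pair, p in adjusted_p.items() if p < alpha}
```

This guarantees that the family-wise error rate stays at or below alpha, at the cost of reduced power as the number of comparisons grows — which is why keeping the family of tests small, as argued above, matters.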

We believe that these methodological additions provide the necessary rigor and transparency for the statistical analysis, and we have updated the text accordingly to make this verification more explicit.

Comments 8: Additionally, the authors should explain why they opted not to use multiple comparison tests, which would be more appropriate in this context given that several algorithms are being compared. If they maintain the current statistical approach, they must clearly justify that the assumptions for the test are met for the chosen metrics.

Response 8: Thank you for this insightful comment regarding the appropriateness of statistical tests in the context of comparing multiple algorithms.

  • Justification for using multiple pairwise t-tests instead of ANOVA with post-hoc tests: We opted for multiple independent t-tests because our comparative focus was on specific algorithm pairs rather than performing an overall omnibus test of all groups simultaneously. Each test was formulated based on clear hypotheses concerning algorithmic performance under defined conditions (e.g., Double DQN vs. classical methods). Given the limited number of targeted comparisons and the interpretability of pairwise differences, this approach allowed for more direct and granular insights.
  • Control for multiple testing: To compensate for the increased type I error risk, we applied Bonferroni correction to all p-values reported in Tables 4 and 5. This ensures control over the family-wise error rate.
  • Verification of statistical assumptions: As also detailed in Section 4.2 of the revised manuscript:
    • Normality was tested using the Shapiro–Wilk test and verified visually with Q–Q plots;
    • Homogeneity of variances was assessed using Levene's test;
    • The distributions met the conditions for parametric testing in all cases analyzed.

Where assumptions might not hold, we acknowledge that non-parametric methods such as Kruskal–Wallis will be considered in future work or in more heterogeneous datasets.

In summary, the current statistical approach was chosen deliberately for its focus and clarity, and is supported by appropriate assumption testing and correction for multiple comparisons.

Minor Points

  • Typographical errors:
    • p. 2, line 43: “Edmunds-Karp” → “Edmonds-Karp.” Corrected.
  • The Boykov–Kolmogorov algorithm is attributed to graph cuts in computer vision, yet the paper does not justify its applicability to traffic networks beyond a superficial analogy; please add a reference to justify this.

Response: The Boykov–Kolmogorov algorithm was originally developed to find minimum cuts in graphs and has found wide application in computer vision, for example in image segmentation [39]. Although its main application context is visual data processing, its fundamental mathematical nature also makes it suitable for modeling flows and their distribution in traffic networks.

Traffic networks can be represented as graphs whose vertices are nodes (intersections, junctions) and whose edges are roads or streets with a given throughput capacity. The problem of optimizing vehicle distribution in this context, or of finding the minimal “bottlenecks” in the network, corresponds to the classical minimum-cut problem in graphs, for which the Boykov–Kolmogorov algorithm offers an efficient solution [40].

The application of this algorithm to traffic analysis and modeling is explored in several subsequent works, which demonstrate how minimum-cut techniques aid in detecting key bottlenecks and optimizing flows in transportation networks [41,42]. These studies support the idea that the algorithm is not limited to computer vision but is applicable to a wide range of flow and optimization problems in networks.
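The flow/cut correspondence described above can be illustrated on a small road graph. The sketch below uses the simpler Edmonds–Karp algorithm as a stand-in for Boykov–Kolmogorov (both compute the same maximum flow and minimum cut; only the search strategy differs), and the four-node network and its capacities are illustrative, not the network from the paper:

```python
from collections import deque

def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max flow; returns (flow value, min-cut edges)."""
    # Build residual capacities, including zero-capacity reverse edges.
    res = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {source: None}
        q = deque([source])
        while q and sink not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if sink not in parent:
            break
        # Find the bottleneck capacity along the path and augment.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= aug
            res[v][u] += aug
        flow += aug
    # Vertices still reachable from the source define the min-cut side.
    reach, q = {source}, deque([source])
    while q:
        u = q.popleft()
        for v, c in res[u].items():
            if c > 0 and v not in reach:
                reach.add(v)
                q.append(v)
    cut = [(u, v) for u in reach for v, c in capacity.get(u, {}).items()
           if v not in reach and c > 0]
    return flow, cut

# Illustrative four-node road network: edges are streets,
# capacities are vehicles per minute (placeholder values).
roads = {
    "A": {"B": 30, "C": 20},
    "B": {"D": 10, "C": 10},
    "C": {"D": 25},
    "D": {},
}
value, bottleneck = max_flow_min_cut(roads, "A", "D")
# The cut edges are exactly the network's capacity bottlenecks:
# widening any other street cannot increase throughput from A to D.
```

By the max-flow min-cut theorem, the returned cut edges carry a total capacity equal to the maximum flow, which is what identifies them as the streets limiting overall throughput.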

 

Thank you very much for your remarks and comments. They were very useful in helping us emphasize the main tasks and contributions of the manuscript, and in focusing the readers' attention on its new and unique elements.

 

Round 3

Reviewer 3 Report

Comments and Suggestions for Authors

I have reviewed the revised version of the manuscript “Comparative Analysis of Some Methods and Algorithms for Traffic Optimization in Urban Environments Based on Maximum Flow and Deep Reinforcement Learning”.


The authors have adequately addressed the previous comments and the modifications are clearly marked in red throughout the text. The statistical analysis, including effect sizes and multiple comparison corrections, has been significantly improved and justified appropriately.
I appreciate the effort in expanding the experimental design, particularly in stochastic settings, which enhances the robustness of the conclusions.


I suggest only one minor improvement: including a concise summary table contrasting classical and DRL methods in the conclusion could be beneficial.


Overall, the manuscript is substantially improved and ready for publication after minor editorial polishing.

Author Response

First of all, we would like to thank you for your thorough review of our paper “Comparative Analysis of Some Methods and Algorithms for Traffic Optimization in Urban Environments Based on Maximum Flow and Deep Reinforcement Learning” (mathematics-3685960) and for your helpful comments to improve it.

 

Reviewer 3

I have reviewed the revised version of the manuscript “Comparative Analysis of Some Methods and Algorithms for Traffic Optimization in Urban Environments Based on Maximum Flow and Deep Reinforcement Learning”.

The authors have adequately addressed the previous comments and the modifications are clearly marked in red throughout the text. The statistical analysis, including effect sizes and multiple comparison corrections, has been significantly improved and justified appropriately.

I appreciate the effort in expanding the experimental design, particularly in stochastic settings, which enhances the robustness of the conclusions.

I suggest only one minor improvement: including a concise summary table contrasting classical and DRL methods in the conclusion could be beneficial.

Overall, the manuscript is substantially improved and ready for publication after minor editorial polishing.

 

Comments 1: I suggest only one minor improvement: including a concise summary table contrasting classical and DRL methods in the conclusion could be beneficial.

Response 1: Thank you for the helpful suggestion. We agree that a concise summary table contrasting classical and DRL methods in the conclusion would enhance the clarity and impact of our findings. In response, we have added Table 18. Comparative Summary of Classical and DRL Algorithms in Traffic Optimization to the conclusion section, which highlights key differences in methodology, adaptability, computational requirements, and performance between classical approaches and DRL-based techniques.

We believe this addition reinforces the main contributions of our study and provides a clear visual summary for readers.

 

Thank you very much for your remarks and comments. They were very useful in helping us emphasize the main tasks and contributions of the manuscript, and in focusing the readers' attention on its new and unique elements.
