Peer-Review Record

Multi-Agent Reinforcement Learning-Based Control Method for Pedestrian Guidance Using the Mojiko Fireworks Festival Dataset

Electronics 2025, 14(6), 1062; https://doi.org/10.3390/electronics14061062

by Masato Kiyama^1,*, Motoki Amagasaki^1 and Toshiaki Okamoto^1,2

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Haodong Chen

Submission received: 12 February 2025 / Revised: 3 March 2025 / Accepted: 5 March 2025 / Published: 7 March 2025

(This article belongs to the Special Issue AI-Based Pervasive Application Services)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper presents a multi-agent reinforcement learning (MARL)-based approach for pedestrian guidance at congested events using the Mojiko Fireworks Festival dataset. The authors implement a Multi-Agent Deep Deterministic Policy Gradient (MA-DDPG) and propose an enhancement to Q-function learning. The study evaluates congestion reduction and agent cooperation in crowd flow management, demonstrating that their method effectively mitigates congestion compared to existing approaches like MA-DDPG and MAT.

1. The choice of learning rates, batch sizes, and update frequencies for different models is not well justified. The authors mention tuning experiments but do not provide a systematic analysis of how different hyperparameter settings impacted results. Include an ablation study or sensitivity analysis to demonstrate how hyperparameter variations affect performance.
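A one-at-a-time sensitivity sweep of the kind requested in this comment can be sketched as follows. This is a minimal illustration only: `run_trial` is a hypothetical stand-in for a full training run, its toy response surface and the listed values are assumptions, and none of the numbers come from the paper.

```python
import math
import random
import statistics


def run_trial(learning_rate, batch_size, seed):
    """Hypothetical stand-in for one training run.

    Returns a congestion score (lower is better) from a toy response
    surface whose optimum is at learning_rate=1e-3, batch_size=64.
    """
    rng = random.Random(seed)
    penalty = (math.log10(learning_rate) + 3.0) ** 2
    penalty += 0.1 * (math.log2(batch_size) - 6.0) ** 2
    return 1.0 + penalty + rng.gauss(0.0, 0.05)


def sensitivity(baseline, name, values, seeds=(0, 1, 2, 3, 4)):
    """Vary one hyperparameter, keep the rest at baseline, average over seeds."""
    rows = []
    for value in values:
        config = dict(baseline, **{name: value})
        scores = [run_trial(seed=s, **config) for s in seeds]
        rows.append((value, statistics.mean(scores), statistics.stdev(scores)))
    return rows


baseline = {"learning_rate": 1e-3, "batch_size": 64}
lr_rows = sensitivity(baseline, "learning_rate", [1e-4, 1e-3, 1e-2])
for value, mean, std in lr_rows:
    print(f"learning_rate={value:g}: mean={mean:.3f} std={std:.3f}")
```

Reporting the mean and spread over several seeds for each setting shows how strongly the result depends on one parameter, which a single best configuration cannot show.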

2. The study is based on simulated pedestrian behavior from the dataset. It is unclear whether the findings would generalize to real-world events with dynamic crowd behavior and external factors (e.g., environmental influences and unexpected human decisions). If feasible, test the model in a real-world setting or use real-world pedestrian movement data to validate the findings.

3. The method involves multiple agents and deep reinforcement learning, which can be computationally expensive. There is no discussion of training time, hardware requirements, or scalability. Provide details on computational efficiency, including runtime, GPU/CPU usage, and memory consumption.

4. The study reports performance improvements but does not analyze failure cases or situations where the model underperforms. Include qualitative examples or visualizations showing where and why the model struggles, especially at route confluences where congestion remains high.

5. Safety risks in reinforcement learning-based pedestrian guidance are not discussed. There is no mention of practical deployment considerations, such as how event organizers could integrate this system. Include a discussion of ethical risks, potential biases in simulations and real-world deployment challenges.

6. Figures (e.g., Figure 6 and Figure 7) should have clearer labels and captions to enhance readability.

Author Response

Comments 1: The choice of learning rates, batch sizes, and update frequencies for different models is not well justified. The authors mention tuning experiments but do not provide a systematic analysis of how different hyperparameter settings impacted results. Include an ablation study or sensitivity analysis to demonstrate how hyperparameter variations affect performance.
Response 1: We thank the reviewer for this valuable suggestion. We added the following paragraph to page 8, paragraph 188:
We performed hyperparameter optimization for each method using Optuna \cite{Optuna}, an automatic hyperparameter optimization software framework, to systematically search for the parameter configurations that most effectively alleviate congestion. This approach allowed us to efficiently explore the parameter space and identify optimal settings that maximize performance across our evaluation metrics.
Comments 2: The study is based on simulated pedestrian behavior from the dataset. It is unclear whether the findings would generalize to real-world events with dynamic crowd behavior and external factors (e.g., environmental influences and unexpected human decisions). If feasible, test the model in a real-world setting or use real-world pedestrian movement data to validate the findings.
Response 2: We acknowledge the reviewer's concern about generalizability to real-world scenarios. While our current study relies on simulated pedestrian behavior from the dataset, implementing and testing the model in real-world settings presents significant practical challenges. Real-world validation would require extensive infrastructure, regulatory approvals, and safety considerations that are beyond the scope of this initial research.
We added the following paragraph to page 12, paragraph 265:
Validating these findings with real-world pedestrian data or controlled field experiments to assess model performance under truly dynamic crowd behaviors, extreme congestion situations, safety risk, and unpredictable factors not fully captured in simulations.
Comments 3: The method involves multiple agents and deep reinforcement learning, which can be computationally expensive. There is no discussion of training time, hardware requirements, or scalability. Provide details on computational efficiency, including runtime, GPU/CPU usage, and memory consumption.
Response 3: We thank the reviewer for this important point about computational aspects. While we agree that a detailed computational analysis would be valuable, we had to prioritize our research focus on algorithmic effectiveness rather than hardware optimization in this initial study. The computational efficiency largely depends on specific hardware configurations and implementation optimizations, which can vary significantly across different deployment scenarios. Our current implementation was designed as a proof-of-concept to validate the effectiveness of our approach rather than for computational efficiency.
We added the following paragraph to page 12, paragraph 263:
A detailed analysis of computational requirements and optimization remains an important direction.
Comments 4: The study reports performance improvements but does not analyze failure cases or situations where the model underperforms. Include qualitative examples or visualizations showing where and why the model struggles, especially at route confluences where congestion remains high.
Response 4: In our research, we confirmed that the proposed method works well in all test scenarios and is particularly effective in reducing congestion. In the experimental results, we did not observe any situations where the model clearly “stopped working”. However, as you pointed out, it is important to understand the behavior of the model in more extreme congestion situations and complex traffic patterns. In future research, we plan to test our model under extreme congestion situations.
Comments 5: Safety risks in reinforcement learning-based pedestrian guidance are not discussed. There is no mention of practical deployment considerations, such as how event organizers could integrate this system. Include a discussion of ethical risks, potential biases in simulations and real-world deployment challenges.
Response 5: We sincerely thank the reviewer for raising these crucial points about safety, ethics, and practical deployment considerations. We have expanded our future work.
We added the following paragraph to page 12, paragraph 269:
Future deployment must address ethical concerns including: privacy implications of sensor systems, potential biases in crowd management algorithms, and clear responsibility frameworks for safety outcomes.
Comments 6: Figures (e.g., Figure 6 and Figure 7) should have clearer labels and captions to enhance readability.
Response 6: We have revised Figures 6 and 7.

Reviewer 2 Report

Comments and Suggestions for Authors

The paper introduces a multi-agent reinforcement learning method for pedestrian guidance, particularly focusing on reducing congestion at events. The authors employ Multi-Agent Deep Deterministic Policy Gradient (MA-DDPG) and propose an enhanced method for learning the Q-function for actors within the MA-DDPG framework. They evaluate their method using the Mojiko Fireworks Festival dataset, comparing congestion levels with existing approaches.

Main Question Addressed: The research primarily addresses how to effectively reduce pedestrian congestion at large-scale events through improved guidance systems using multi-agent reinforcement learning techniques. The key is to provide guidance at multiple points while ensuring guides cooperate based on congestion information at each location.

Originality and Relevance: The topic is highly relevant, given the increasing concerns about safety at public events due to congestion. While reinforcement learning has been applied to crowd simulation, the paper specifically tackles the cooperative aspect of pedestrian guidance using multi-agent reinforcement learning, which is a valuable contribution. The paper addresses the gap in applying machine learning techniques such as reinforcement learning to guide pedestrian flow at events.

Contribution to the Subject Area: The paper enhances the application of MA-DDPG by improving the learning of Q-functions for actors. Unlike other methods that assume each agent has the same reward, this research allows for different rewards to be assigned to different agents, making it more adaptable to real-world scenarios. The method facilitates cooperative learning using local information by comparing the Q-function output from the critic to the Q-function of the actor from the information of all agents.

The authors could explore the sensitivity of the model to different hyperparameter settings. A more detailed discussion on how the hyperparameters were tuned would strengthen the methodology.

While the ϵ-greedy method is used for exploration, a comparison with other exploration techniques like Gumbel-Softmax could provide additional insights.
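The difference between the two exploration styles mentioned in this comment can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the action values are made up, and `gumbel_max` is the discrete Gumbel-max sampling that Gumbel-Softmax relaxes into a differentiable form, not the relaxation itself.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def gumbel_max(logits, temperature, rng):
    """Sample an action by adding Gumbel(0, 1) noise to logits / temperature."""
    noisy = []
    for logit in logits:
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() can return exactly 0.0
            u = rng.random()
        noisy.append(logit / temperature - math.log(-math.log(u)))
    return max(range(len(noisy)), key=lambda a: noisy[a])


def action_frequencies(sampler, n_actions, n_samples):
    """Empirical selection frequency of each action under a sampler."""
    counts = [0] * n_actions
    for _ in range(n_samples):
        counts[sampler()] += 1
    return [c / n_samples for c in counts]


rng = random.Random(0)
q_values = [0.1, 0.5, 0.2]  # hypothetical action values for one guide
eps_freq = action_frequencies(lambda: epsilon_greedy(q_values, 0.1, rng), 3, 10_000)
gum_freq = action_frequencies(lambda: gumbel_max(q_values, 0.1, rng), 3, 10_000)
print("epsilon-greedy:", eps_freq)
print("gumbel-max:    ", gum_freq)
```

Both samplers choose the best action most of the time, but ϵ-greedy spreads its exploration uniformly over the remaining actions, whereas the Gumbel-based sampler explores in proportion to the estimated values.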

The paper could benefit from a discussion of the limitations of the social force model used in the simulator and how these limitations might affect the results.
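For context on this comment, a generic social force update in the style of Helbing's model can be sketched as follows. The simulator's actual formulation and parameters are not given in this record, so every constant below (`desired_speed`, `tau`, `strength`, `falloff`, `radius`) is an illustrative assumption.

```python
import math


def social_force_step(positions, velocities, goals, dt=0.1, desired_speed=1.3,
                      tau=0.5, strength=2.0, falloff=0.3, radius=0.25):
    """Advance all pedestrians by one explicit Euler step.

    Each pedestrian feels a driving force that relaxes its velocity toward
    desired_speed in the goal direction, plus an exponential repulsion from
    every other pedestrian. Walls and obstacles are omitted in this sketch.
    """
    new_positions, new_velocities = [], []
    for i, ((x, y), (vx, vy), (gx, gy)) in enumerate(zip(positions, velocities, goals)):
        dx, dy = gx - x, gy - y
        dist = math.hypot(dx, dy) or 1.0
        fx = (desired_speed * dx / dist - vx) / tau
        fy = (desired_speed * dy / dist - vy) / tau
        for j, (ox, oy) in enumerate(positions):
            if j == i:
                continue
            rx, ry = x - ox, y - oy
            d = math.hypot(rx, ry) or 1e-9
            magnitude = strength * math.exp((2 * radius - d) / falloff)
            fx += magnitude * rx / d
            fy += magnitude * ry / d
        vx, vy = vx + fx * dt, vy + fy * dt
        new_velocities.append((vx, vy))
        new_positions.append((x + vx * dt, y + vy * dt))
    return new_positions, new_velocities


# A lone pedestrian accelerates toward the desired speed.
pos, vel, goal = [(0.0, 0.0)], [(0.0, 0.0)], [(10.0, 0.0)]
for _ in range(50):
    pos, vel = social_force_step(pos, vel, goal)
print(pos[0], vel[0])
```

The sketch makes the model's simplifications visible: behavior is a sum of smooth pairwise forces, with no grouping, hesitation, or route choice, which is the kind of limitation the comment asks the authors to discuss.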

Consistency of Conclusions: The conclusions are largely consistent with the evidence presented. The results indicate that the proposed method reduces congestion and promotes better agent cooperation compared to MA-DDPG and MAT. The method's ability to maintain an A rating in Level of Service (LOS) even near the goal is a strong indicator of its effectiveness. The authors state that each agent was able to cooperate without causing congestion in parts of the system when their method was used.

Appropriateness of References: The references appear appropriate, covering essential works in reinforcement learning, crowd simulation, and relevant datasets. The inclusion of specific papers on MA-DDPG, DQN, and the social force model is fitting.

The figures are generally helpful in illustrating the concepts and experimental setup. However, increasing the resolution and size of Figure 8 might improve readability.

Tables 6 and 7 could benefit from statistical significance testing to confirm the observed differences between the methods. Including p-values or confidence intervals would enhance the robustness of the findings.
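The kind of testing suggested here can be sketched with a percentile bootstrap confidence interval and a two-sided permutation test, both using only the Python standard library. The per-run congestion scores below are placeholder numbers chosen for illustration; they are not results from the paper, and the function names are hypothetical.

```python
import random
import statistics


def bootstrap_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means between a and b."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Placeholder per-seed congestion scores (lower is better), NOT from the paper.
baseline = [0.52, 0.49, 0.55, 0.51, 0.53, 0.50, 0.54, 0.48]
proposed = [0.41, 0.44, 0.39, 0.43, 0.40, 0.42, 0.45, 0.38]

ci_low, ci_high = bootstrap_ci(proposed)
p_value = permutation_p_value(baseline, proposed)
print(f"proposed mean 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"permutation p-value: {p_value:.4f}")
```

Both procedures need several independent runs per method (different random seeds), which is the sample-size requirement the comment implies.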

Consider adding a table that summarizes the advantages and disadvantages of each method (MAT, MA-DDPG, Ours) for quick comparison.

How does the method perform with different crowd densities and event types?

What are the computational costs associated with the proposed method, and how do they compare to existing approaches?

How can the proposed method be adapted to handle unexpected events or changes in pedestrian behavior?

What are the ethical considerations in using such a system for crowd management, especially regarding privacy and autonomy?

Provide a more detailed discussion on the hyperparameter tuning process.

Include statistical significance testing in the results section.

Discuss the limitations of the social force model and its potential impact on the results.

Address the computational costs and scalability of the proposed method.

Consider the ethical implications of using the system in real-world scenarios.

Author Response

Comments 1: The authors could explore the sensitivity of the model to different hyperparameter settings. A more detailed discussion on how the hyperparameters were tuned would strengthen the methodology.
Response 1: We thank the reviewer for this valuable suggestion. We added the following paragraph to page 8, paragraph 189:
We performed hyperparameter optimization for each method using Optuna \cite{Optuna}, an automatic hyperparameter optimization software framework, to systematically search for the parameter configurations that most effectively alleviate congestion. This approach allowed us to efficiently explore the parameter space and identify optimal settings that maximize performance across our evaluation metrics.
Comments 2: While the ϵ-greedy method is used for exploration, a comparison with other exploration techniques like Gumbel-Softmax could provide additional insights.
Response 2: In this study, we adopted the ϵ-greedy method as a simple and effective method, but comparing it with other exploration methods such as Gumbel-Softmax could certainly provide useful insights. In the current implementation, the ϵ-greedy method provides sufficient performance, but we would like to consider a comparative analysis of exploration strategies as a future research topic. We added the following paragraph to page 12, paragraph 272:
Exploring alternative exploration strategies beyond $\epsilon$-greedy, such as Gumbel-Softmax, to potentially enhance the learning process and convergence properties of our approach.
Comments 3: The figures are generally helpful in illustrating the concepts and experimental setup. However, increasing the resolution and size of Figure 8 might improve readability.
Response 3: We have revised Figure 8.
Comments 4: Tables 6 and 7 could benefit from statistical significance testing to confirm the observed differences between the methods. Including p-values or confidence intervals would enhance the robustness of the findings.
Response 4: We agree that the results in Tables 6 and 7 need to be tested for statistical significance. However, due to the limitations of the number of simulation runs and computing resources in this study, it is currently difficult to conduct a test with a sufficient sample size. In future versions, we plan to increase the number of trials and conduct statistical validation, including p-values and confidence intervals, to improve the robustness of the results. We hope that you will interpret the current results as indicative of a trend.
Comments 5: Consider adding a table that summarizes the advantages and disadvantages of each method (MAT, MA-DDPG, Ours) for quick comparison.
Response 5: Thank you for this suggestion. We believe the comprehensive comparisons between our method, MAT, and MA-DDPG are already well-documented throughout the paper, particularly in Section 5 where we present detailed experimental results with quantitative metrics and analysis. We feel this approach maintains the paper's flow while providing readers with the comparative information they need without redundancy.
Comments 6: How does the method perform with different crowd densities and event types?
Response 6: We thank the reviewer for this insightful question about generalizability across different crowd densities and event types. Our current evaluation focused on the Mojiko Fireworks Festival scenario from the MAS-BENCH dataset, which represents a specific crowd density pattern and event type.
We added the following paragraph to page 12, paragraph 265:
Validating these findings with real-world pedestrian data or controlled field experiments to assess model performance under truly dynamic crowd behaviors, extreme congestion situations, safety risk, and unpredictable factors not fully captured in simulations.
Comments 7: What are the computational costs associated with the proposed method, and how do they compare to existing approaches?
Response 7: We thank the reviewer for this important point about computational aspects. While we agree that a detailed computational analysis would be valuable, we had to prioritize our research focus on algorithmic effectiveness rather than hardware optimization in this initial study. The computational efficiency largely depends on specific hardware configurations and implementation optimizations, which can vary significantly across different deployment scenarios. Our current implementation was designed as a proof-of-concept to validate the effectiveness of our approach rather than for computational efficiency.
We added the following paragraph to page 12, paragraph 263:
A detailed analysis of computational requirements and optimization remains an important direction.
Comments 8: How can the proposed method be adapted to handle unexpected events or changes in pedestrian behavior?
Response 8: We thank the reviewer for this very interesting question regarding adaptability to unexpected situations.
Because the current method is evaluated in a simulation environment based on a social force model, its handling of unexpected events and sudden changes in pedestrian behavior has not been fully verified.
As in our response to Comments 6, we will address this limitation in future work by investigating how our approach can be adapted to handle unexpected events and rapid changes in pedestrian behavior patterns in real-world scenarios.
Comments 9: What are the ethical considerations in using such a system for crowd management, especially regarding privacy and autonomy?
Response 9: We sincerely thank the reviewer for raising these crucial points about safety, ethics, and practical deployment considerations. We have expanded our future work.
We added the following paragraph to page 12, paragraph 269:
Future deployment must address ethical concerns including: privacy implications of sensor systems, potential biases in crowd management algorithms, and clear responsibility frameworks for safety outcomes.
Comments 10: Provide a more detailed discussion on the hyperparameter tuning process.
Response 10: We added the following paragraph to page 8, paragraph 189:
We performed hyperparameter optimization for each method using Optuna \cite{Optuna}, an automatic hyperparameter optimization software framework, to systematically search for the parameter configurations that most effectively alleviate congestion. This approach allowed us to efficiently explore the parameter space and identify optimal settings that maximize performance across our evaluation metrics.
Comments 11: Include statistical significance testing in the results section.
Response 11: We appreciate the reviewer's suggestion regarding statistical significance testing. In our study, we employed Optuna for hyperparameter optimization to identify the most effective parameter configurations for our models. This approach systematically explores the parameter space to find optimal performance rather than comparing different methods through multiple trials that would necessitate statistical significance testing.
Since our focus was on determining the best possible performance of our proposed approach through comprehensive optimization rather than establishing comparative statistical differences between methods, we did not incorporate traditional significance testing. The performance metrics reported represent the outcomes from these optimized configurations, which we believe provide clear evidence of the effectiveness of our approach for the given scenario.
Comments 12: Discuss the limitations of the social force model and its potential impact on the results.
Response 12: We added the following paragraph to page 12, paragraph 275:
Social force model has limitations in its ability to perfectly reproduce actual pedestrian behavior, which may affect the results. Therefore, we explore a more sophisticated model\cite{greilcrowds}.
Comments 13: Address the computational costs and scalability of the proposed method.
Response 13: We thank the reviewer for this important point about computational aspects. While we agree that a detailed computational analysis would be valuable, we had to prioritize our research focus on algorithmic effectiveness rather than hardware optimization in this initial study. The computational efficiency largely depends on specific hardware configurations and implementation optimizations, which can vary significantly across different deployment scenarios. Our current implementation was designed as a proof-of-concept to validate the effectiveness of our approach rather than for computational efficiency.
We added the following paragraph to page 12, paragraph 263:
A detailed analysis of computational requirements and optimization remains an important direction.
Comments 14: Consider the ethical implications of using the system in real-world scenarios.
Response 14: We added the following paragraph to page 12, paragraph 269:
Future deployment must address ethical concerns including: privacy implications of sensor systems, potential biases in crowd management algorithms, and clear responsibility frameworks for safety outcomes.

Reviewer 3 Report

Comments and Suggestions for Authors

The proposed multi-agent reinforcement learning (MARL) method effectively addresses pedestrian congestion. However, how does the method compare in highly dynamic environments where crowd behavior shifts unpredictably? Please add a discussion in the experimental section about whether integrating real-time sensor feedback improves adaptability.

How well does the method generalize to different event settings, such as stadium evacuations or urban transit hubs? Would transfer learning approaches help adapt the model to new environments?

The reinforcement learning framework uses MA-DDPG with an enhanced Q-function learning mechanism. Have alternative MARL algorithms, such as QMIX or MADDPG with centralized training and decentralized execution (CTDE), been considered for comparison? Please add this comparison.

The study evaluates congestion levels and travel times but does not explicitly assess pedestrian safety beyond the LOS metric. Please add additional safety-focused metrics, such as collision rates or stress indicators from crowd behavior models, to provide further validation.

The work builds on existing MARL and pedestrian simulation research, but additional references to related optimization and control strategies could strengthen its positioning. Research on multi-modal machine learning for adaptive human interaction systems, such as those found in smart manufacturing and human-robot collaboration, could offer useful insights. Relevant works include:

https://doi.org/10.1007/s10845-023-02152-x

The proposed Q-function learning for actors improves cooperation among agents, leading to more balanced congestion reduction. Would an explicit reward-sharing mechanism or communication between agents further enhance coordination?

The study uses the ϵ-greedy exploration strategy. Did alternative exploration techniques, such as entropy-based exploration or curiosity-driven learning, improve performance in preliminary tests?

The experiments compare the method against MA-DDPG and MAT. However, how does the approach compare to rule-based heuristics used in real-world crowd control? Please add a discussion of real-world application.

Author Response

Comments 1: The proposed multi-agent reinforcement learning (MARL) method effectively addresses pedestrian congestion. However, how does the method compare in highly dynamic environments where crowd behavior shifts unpredictably? Please add a discussion in the experimental section about whether integrating real-time sensor feedback improves adaptability.
Response 1: While we recognize the importance of evaluating our method in highly dynamic environments with unpredictable crowd behavior shifts, conducting additional experiments was beyond the scope of the current work. We have acknowledged this limitation and added the following paragraph to page 12, paragraph 265:
Validating these findings with real-world pedestrian data or controlled field experiments to assess model performance under truly dynamic crowd behaviors, extreme congestion situations, safety risk, and unpredictable factors not fully captured in simulations.
Comments 2: How well does the method generalize to different event settings, such as stadium evacuations or urban transit hubs? Would transfer learning approaches help adapt the model to new environments?
Response 2: Thank you for this important question about generalizability. Our method's ability to transfer to different event settings involves several considerations:
1. Generalization capabilities: Our approach should theoretically generalize to various crowded scenarios like stadium evacuations or transit hubs, as the underlying principles of agent navigation and congestion management remain consistent. Our method is designed to be environment-agnostic, focusing on relative positions and movements rather than absolute spatial coordinates.
2. Current limitations: We acknowledge that our current experiments focus on specific environmental configurations. Different settings involve unique challenges:
- Stadium evacuations feature highly directional flows and bottlenecks
- Urban environments include diverse agent types (pedestrians, cyclists, vehicles)
3. Transfer learning potential: We believe transfer learning approaches would indeed be valuable for adapting our models to new environments. Specifically:
- Pre-trained agent policies could serve as initialization for fine-tuning in new scenarios
- Our method should transfer well as it focuses on agent relationships rather than specific environmental features
Comments 3: The reinforcement learning framework uses MA-DDPG with an enhanced Q-function learning mechanism. Have alternative MARL algorithms, such as QMIX or MADDPG with centralized training and decentralized execution (CTDE), been considered for comparison? Please add this comparison.
Response 3: Thank you for this insightful suggestion regarding comparison with alternative MARL algorithms. We appreciate the recommendation to include QMIX and CTDE-based MADDPG approaches in our evaluation.
While we recognize the value of such comparisons, we face significant computational and time constraints that prevent us from conducting full re-experiments with these alternative architectures for this revision.
Comments 4: The study evaluates congestion levels and travel times but does not explicitly assess pedestrian safety beyond the LOS metric. Please add additional safety-focused metrics, such as collision rates or stress indicators from crowd behavior models, to provide further validation.
Response 4: Thank you for this insightful suggestion regarding safety metrics. While we recognize the importance of pedestrian safety evaluation beyond the LOS metric, there are several methodological constraints in our current experimental framework that limit our ability to incorporate additional safety metrics at this stage.
Comments 5: The work builds on existing MARL and pedestrian simulation research, but additional references to related optimization and control strategies could strengthen its positioning. Research on multi-modal machine learning for adaptive human interaction systems, such as those found in smart manufacturing and human-robot collaboration, could offer useful insights. Relevant works include:
https://doi.org/10.1007/s10845-023-02152-x
Response 5: Thank you for this suggestion. We have incorporated references to multi-modal machine learning for adaptive human interaction systems. We added the reference to page 1, paragraph 25:
Comments 6: The proposed Q-function learning for actors improves cooperation among agents, leading to more balanced congestion reduction. Would an explicit reward-sharing mechanism or communication between agents further enhance coordination?
Response 6: Thank you for this insightful question. We believe the strength of our approach lies precisely in allowing agents to learn coordination implicitly through the Q-function learning, without requiring explicit reward-sharing or communication mechanisms.
While explicit communication and reward-sharing could potentially enhance coordination, we intentionally focused on a learning-based approach where agents autonomously discover effective collaborative strategies.
Comments 7: The study uses the ϵ-greedy exploration strategy. Did alternative exploration techniques, such as entropy-based exploration or curiosity-driven learning, improve performance in preliminary tests?
Response 7: In this study, we used the ϵ-greedy exploration strategy as our primary exploration mechanism due to its simplicity and established effectiveness for our task domain. We acknowledge that exploration strategy selection is an important aspect of reinforcement learning systems.
We added the following paragraph to page 12, paragraph 272:
Exploring alternative exploration strategies beyond $\epsilon$-greedy, such as Gumbel-Softmax, to potentially enhance the learning process and convergence properties of our approach.
Comments 8: The experiments compare the method against MA-DDPG and MAT. However, how does the approach compare to rule-based heuristics used in real-world crowd control? Please add a discussion of real-world application.
Response 8: As mentioned in the introduction, this research focuses mainly on improving learning-based multi-agent navigation methods. Therefore, we focused on comparing our methods with the most advanced learning-based methods (MA-DDPG and MAT).
Comparing with the rule-based heuristics you mentioned is a very important point, but they are based on a different design philosophy and are considered to be outside the primary scope of this research.
We added the following paragraph to page 12, paragraph 265:
Validating these findings with real-world pedestrian data or controlled field experiments to assess model performance under truly dynamic crowd behaviors, extreme congestion situations, safety risk, and unpredictable factors not fully captured in simulations.
