by Wenxin Li¹, Yongxin Feng¹,* and Fan Zhou¹, et al.

Reviewer 1: Anonymous Reviewer 2: Anonymous Reviewer 3: Anonymous Reviewer 4: Anonymous Reviewer 5: Sooyoung Jang

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper presents a novel framework for overlapping coalition formation in post-disaster rescue UAV swarms, integrating hypergraph attention networks, hierarchical value decomposition, and Monte Carlo Tree Search. The proposed SGRL-TS method demonstrates promising results in simulation experiments. Overall, I consider the quality and contributions of the work good. However, some aspects of the methodology and results are missing and must be addressed before this work is strong enough to be considered for publication. I suggest that the authors address the following comments to improve the quality and clarity of the methods and results presentation:

  1. The literature review section should incorporate citations from top-tier industry journals to underscore the innovation.
  2. In Equation (4) on Page 8, the denominator of the task fulfillment metric f1 uses the L2-norm ||L_(T_m)|| of the resource demand vector. Since the dimensions of this vector represent different physical quantities with incompatible units, employing the norm may introduce normalization bias. The authors should justify the rationality of this approach or consider alternative dimensionally consistent calculation methods.
  3. The hypergraph representation G=(V,E) is introduced on Page 8, and the bidirectional Hypergraph Attention Network (HAN) is described. However, two aspects remain unclear. First, the dynamic update process of node (UAV) and hyperedge (task) features based on the evolving coalition structure is not sufficiently explained. Second, the theoretical rationale for why bidirectional attention effectively captures high-order UAV-task-coalition relationships needs elaboration. Specifically, how does task-to-node attention reflect "task selectivity," and how does node-to-task attention embody "contextual suitability"?
  4. The MCTS process uses SHIELD to estimate the global value of partial coalition structures (where only the first m tasks are allocated). A critical implementation detail is missing: how is a partial structure encoded and fed into the HAN and SHIELD models, which were designed for a complete set of M tasks? The treatment of unassigned tasks, specifically how their hyperedge features are initialized and how the model handles this "missing" information, must be clarified. Masking or zero-padding strategies should be discussed if applicable.
  5. The MCTS pruning rules are mentioned on Page 17 but lack implementation specifics. In a combinatorial search space, exhaustively checking all constraints in Eq. (11) at each node expansion is computationally expensive and could hinder real-time performance. The authors should describe the efficient constraint-checking techniques they employ to demonstrate the practical feasibility of their MCTS approach.
  6. In Table 5 (Page 26), the definitions of the evaluation metrics "AUC" and "Iter@95%" are ambiguous. "AUC" should be explicitly defined—specifying which curve it corresponds to. For "Iter@95%," the criterion for the "95% steady-state threshold" must be clearly explained. Is it defined relative to the final converged average performance or a moving average within a time window? Precise definitions are essential for interpreting the results and ensuring reproducibility.
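The unit-mismatch concern in comment 2 can be made concrete with a small sketch. It assumes, for illustration only, that f1 divides the L2-norm of the demand actually covered by the L2-norm of the demand vector; the manuscript's exact numerator is not reproduced here, and the function names and numbers are hypothetical. Expressing one dimension in different units (energy in Wh instead of kWh) changes the L2-based score, whereas a mean of per-dimension coverage ratios is unit-invariant.

```python
import numpy as np


def f1_l2(supply, demand):
    """Assumed L2-based fulfillment: ||min(supply, demand)|| / ||demand||."""
    covered = np.minimum(supply, demand)
    return float(np.linalg.norm(covered) / np.linalg.norm(demand))


def f1_ratio(supply, demand):
    """Unit-invariant alternative: mean of per-dimension coverage ratios."""
    covered = np.minimum(supply, demand)
    return float(np.mean(covered / demand))


# Hypothetical two-dimensional demand: [payload in kg, energy in kWh].
demand = np.array([10.0, 2.0])
supply = np.array([5.0, 2.0])  # half the payload, all of the energy

# Same physical quantities, with energy expressed in Wh instead of kWh.
scale = np.array([1.0, 1000.0])

print(f1_l2(supply, demand), f1_l2(supply * scale, demand * scale))
print(f1_ratio(supply, demand), f1_ratio(supply * scale, demand * scale))
```

Under the assumed form, the L2 score moves from about 0.53 to nearly 1.0 purely because of the unit change, while the per-dimension mean stays at 0.75. Per-dimension normalization is only one possible dimensionally consistent alternative; weighted variants would be equally admissible.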

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper studies overlapping coalition formation for heterogeneous UAV swarms in post-disaster rescue, framing allocation as a multi-objective combinatorial problem that balances task fulfillment, coalition synchronization deviation, and operational cost under energy, communication, and deadline constraints. A hypergraph attention representation encodes high-order relations among UAVs, tasks, and coalitions; a structure-conditioned hierarchical value decomposition then aggregates individual and coalition information and models cross-task cooperation and competition. Monte Carlo Tree Search with feasible-region pruning uses the learned global value as a heuristic, while demonstration replay and value distillation feed search outcomes back into policy learning. Simulations across resource regimes indicate higher utility and stronger energy efficiency, including up to an 11.4 percent utility improvement over the best baseline and more than a 228 percent energy efficiency gain versus a non-overlapping variant. The novelty lies in unifying overlapping coalitions, hypergraph-based structure modeling, global value estimation with cross terms, and closed-loop search and learning. Contributions include a structure-aware policy framework, a hierarchical value model, and a practical search feedback mechanism for constrained rescue settings.


The claim to novelty appears overstated, and the individual technical contributions are not sufficiently disentangled. Prior studies have already advanced task-driven OCF with PGG-TS, enabling aircraft to join multiple coalitions and outperform non-overlapping CF. Hypergraph-based communication or coordination in MARL has been reported, and GAT methods for UAV contexts also exist. The manuscript combines OCF, hypergraph attention, hierarchical value decomposition, and MCTS, yet it does not isolate the incremental effect of each module through component-wise ablations; the results primarily report trade-offs under search budgets, which weakens the case for distinct contributions.

Practicality and deployability seem constrained, since the evaluation relies entirely on simulation with non-public data. No field trials are presented, and there is no hardware-in-the-loop validation or assessment against real post-disaster communication traces. Such omissions may impair external validity when channel conditions, latency, and energy consumption diverge from the modeling assumptions.

The multi-objective design and baseline selection may introduce bias, and systematic sensitivity checks are absent. The objective blends task completion, synchronization deviation, and operational cost with feasibility constraints. Weight choices, normalization spans, and alternative priority scenarios are not probed, yet these factors could invert the rankings between the proposed method and the baselines. The set of comparators appears to omit recent hypergraph MARL approaches, which may yield an optimistic appraisal of performance.

Computational burden and real-time feasibility present a salient risk, particularly in emergency operations. The proposed architecture integrates hypergraph attention, structure-conditioned value decomposition, intra-coalition and inter-coalition aggregation, and budgeted MCTS. Even with feasible-region pruning and learned heuristics, inference costs may scale sharply with the number of UAVs, tasks, and overlap levels. The reported trade-offs indicate marked dependence on search budgets, so end-to-end latency and concrete timing constraints should be quantified before any deployment.

Connections to prior literature remain tenuous for the post-disaster setting, and direct comparisons are lacking. Many works already exploit OCF for UAV or crowdsensing applications and use graph or hypergraph structures for coordination. The paper does not benchmark against representative sequential OCF or hypergraph MARL methods under identical rescue metrics, which weakens the argument for relative superiority.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

UAV swarms are now used for various tasks, including military operations, post-disaster support, payload delivery, and communication support. Conventional collaboration techniques allocate each UAV to a single task, thereby constraining the inter-task exchange of resources. To tackle this issue, the study proposes overlapping coalition formation (OCF) for UAV swarms, allowing one UAV to engage in several coalitions, which facilitates capacity reuse and minimizes idle time. The integration of overlapping coalitions and structure-sensitive policy learning enhances capacity reutilization across tasks, mitigates bottlenecks under strict limits, and reinforces global collaboration for concurrent tasks in post-disaster scenarios. The article is well structured and contains all the required information. The first part contains the introduction, the second part the related literature survey, the third part the proposed model and problem statement, the fourth part the proposed algorithm, the fifth part the experimental setup and results, and the sixth part the conclusion. Only future research trends are missing. I therefore suggest adding a future research trends section just before the conclusion before the article is accepted for publication.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The article presents a framework combining hypergraph attention modeling, hierarchical reinforcement learning, and Monte Carlo Tree Search (MCTS) for overlapping coalition formation (OCF) in heterogeneous UAV swarms. However, several issues should be addressed to improve clarity.

The novelty of this work with respect to the applied methods is not clearly demonstrated. The authors should clearly explain how their SGRL-TS approach differs from and improves upon existing graph-based deep reinforcement learning methods, such as QMIX-Graph or Graph-VDN.

The explanation of feasible-region pruning in MCTS is brief. More information is needed on how constraints such as energy, bandwidth, and latency are quantified and how pruning limits are selected.
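The constraint-checking question raised here (and in Reviewer 1's comment 5) can be illustrated with a minimal sketch. It assumes, hypothetically, that energy is a per-UAV budget shared across every coalition the UAV joins, that bandwidth is a per-coalition link capacity, and that each candidate's arrival time is precomputed; the names `PartialStructure`, `can_assign`, and `feasible_children` are illustrative and not taken from the manuscript. Keeping residual budgets as running totals lets each node expansion test a candidate in constant time instead of re-evaluating the full constraint set.

```python
from dataclasses import dataclass, field


@dataclass
class PartialStructure:
    """Hypothetical MCTS node state for a partial overlapping coalition structure.

    Residual budgets are kept as running totals, so checking one candidate
    assignment costs O(1) instead of re-evaluating every constraint.
    """
    energy_left: list      # residual energy per UAV (shared across coalitions)
    bandwidth_left: list   # residual link capacity per task coalition
    assignments: dict = field(default_factory=dict)  # task -> set of UAV ids

    def can_assign(self, uav, task, energy_cost, link_rate, arrival, deadline):
        if uav in self.assignments.get(task, set()):
            return False                       # UAV already in this coalition
        if self.energy_left[uav] < energy_cost:
            return False                       # energy budget would be exceeded
        if self.bandwidth_left[task] < link_rate:
            return False                       # coalition link capacity exceeded
        return arrival <= deadline             # reject late arrivals

    def assign(self, uav, task, energy_cost, link_rate):
        # Overlap is allowed: the same UAV may join several task coalitions,
        # and its energy is debited cumulatively.
        self.energy_left[uav] -= energy_cost
        self.bandwidth_left[task] -= link_rate
        self.assignments.setdefault(task, set()).add(uav)


def feasible_children(state, task, candidates):
    """Keep only candidate UAVs whose assignment to `task` stays feasible."""
    return [c["uav"] for c in candidates
            if state.can_assign(c["uav"], task, c["energy"], c["rate"],
                                c["arrival"], c["deadline"])]


# Toy example: two UAVs, one task; UAV 1 lacks the energy to join.
state = PartialStructure(energy_left=[5.0, 1.0], bandwidth_left=[3.0])
cands = [
    {"uav": 0, "energy": 2.0, "rate": 1.0, "arrival": 4.0, "deadline": 10.0},
    {"uav": 1, "energy": 2.0, "rate": 1.0, "arrival": 4.0, "deadline": 10.0},
]
print(feasible_children(state, 0, cands))  # [0]
```

How the pruning limits themselves are chosen (for example, safety margins on residual energy) is exactly what the comment asks the authors to specify; the sketch only shows why incremental bookkeeping keeps the per-expansion check cheap.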

The combination of reinforcement learning and MCTS is reasonable, but the interaction between the two is not clear. How often is the policy updated from MCTS results? How are demonstration replay and Q-value distillation scheduled? These design choices are critical for reproducibility.

Finally, the contribution of each component in the framework (HAN, SHIELD, and MCTS) is not evaluated separately. Demonstrating performance differences when removing or replacing modules would better justify the architecture.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

Please refer to the attached PDF.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The proposed SGRL-TS framework demonstrates strong innovation and is supported by comprehensive experiments. The authors have satisfactorily addressed all the key concerns raised in the previous review round, with significantly improved methodological details and reproducibility of the results. Therefore, I recommend acceptance.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have incorporated the reviewer comments and suggestions and have substantially revised and improved the paper. I believe the paper is ready to be published in its current form. Thank you for the effort.

Reviewer 4 Report

Comments and Suggestions for Authors

Thank you for considering all my queries. The revised manuscript is improved.

Reviewer 5 Report

Comments and Suggestions for Authors

Great Job!