DDWCN: A Dual-Stream Dynamic Strategy Modeling Network for Multi-Agent Elastic Collaboration
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Overall, the article is well written. Still, in my opinion, its quality can easily be enhanced further.
A more concise conclusion could be a benefit.
The same applies to the organization of the text.
Specific comments and suggestions
- How is the training phase defined? In terms of duration (time steps)? In terms of a “testing interval”? In the latter case, what is the test condition?
- Please rigorously define the “win rate.”
In this context, please also define “optimal performance” and the other terms you employ to compare models' outputs.
- In the StarCraft environment, the number of units changes not only due to death, but also as a result of “production.” Do the authors consider only scenarios in which the simulation starts with a ready team of units without further “production”? Please define.
- How are the weights in the ICN module defined?
- In the description of the DWFN module, the authors mention neural networks: “…a gate structure composed of neural networks that…” (Line 275) Isn’t it only one neural network?
Why have the authors preferred a sigmoid function for the second activation? Have the authors tried other options?
- Please standardize the terminology. For example: “The information compensation strength…” (Line 465) is denoted “lambda”; the same symbol is also used where it “is the residual compensation coefficient.”
Suggestion: once defined, “lambda” can be used subsequently without the definition.
- “The VDN algorithm exhibits… a consistent upward trend during the training phase.” (Lines 602-604) According to Figure 8, there is no consistent upward trend in the first phase. Or, maybe, the first phase is not clearly defined?
Minor suggestions
- Please replace “chapter” with “section.”
- “…this paper organizes the architecture of its three core…” (Line 192) Maybe “outlines”?
- “Dual-flow Action Modeling Network” (Line 195) Previously, the authors used the term “Dual-stream.” Please unify the terminology.
- “…upward movement, leftward movement, and cessation of movement.” (Line 201) Why are only these limited degrees of freedom considered? What about movement to the right?
- In Figure 1, in the unit-oriented stream, are the two vectors “h” the same? If they are not, why do they have the same subscripts?
- “The ICN boasts several advantages…” (Line 314) I suggest editing this sentence, given that the following lines describe why the implementation of ICN is needed, and not the obtained pluses.
- I suggest “Simulation Experiments” instead of “Simulation Experiment.”
- “Among these, StarCraft II…” (Line 432) What are the other options?
- “…a deficiency in both rapidity weight processing.” (Lines 499-500) Please edit this sentence.
- “…and began to rise steadily and robustness.” (Lines 501-502) Please edit this sentence.
- “Nevertheless, the win rate can be sustained at approximately 90% in the final convergence.” (Line 532) Maybe, the authors should note that the success rate is 90% of that of the best-performing model.
- “As illustrated in Figure 9, the win rate comparison curves for the four algorithms are presented. DDWCN, VDN, QMIX, and QTRAN.” (Lines 625-626) Please edit this sentence.
Author Response
1. Major Comments
Comments 1:How is the training phase defined? In terms of duration (time steps)? In terms of a “testing interval”? In the latter case, what is the test condition?
Response 1:In the revised manuscript, we have clarified that a training stage is defined as 10,000 timesteps, which corresponds to the testing interval shown in Table 1. After each stage, testing is conducted under the same environmental settings as in training, with fixed agent parameters, no further learning, and a purely greedy policy. This ensures that the evaluation accurately reflects the performance learned up to that point.
Comments 2: Please rigorously define the “win rate.” In this context, please also define “optimal performance” and the other terms you employ to compare models' outputs.
Response 2:In the revised manuscript, we have clarified that win rate is the primary evaluation metric in our experiments, defined as the ratio of the number of games won by our agents to the total number of evaluation games, expressed as a percentage. Optimal performance refers to the highest average win rate achieved during the entire training process. This clarification has been added to the experimental setup section to ensure clarity and consistency in performance evaluation.
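The two definitions in Response 2 can be written down as a short sketch. The function names and data layout are hypothetical, and one point is an assumption: “average win rate” is read here as the mean across independent runs at each test point, which the response does not state explicitly.

```python
def win_rate(outcomes):
    """Win rate in percent: games won divided by total evaluation games.

    `outcomes` is a list of booleans, one per evaluation game.
    """
    if not outcomes:
        raise ValueError("at least one evaluation game is required")
    return 100.0 * sum(outcomes) / len(outcomes)


def optimal_performance(runs):
    """Highest average win rate reached at any test point during training.

    `runs` holds one win-rate curve per independent run, all of equal length;
    averaging across runs is an assumption made for this sketch.
    """
    averaged = [sum(point) / len(point) for point in zip(*runs)]
    return max(averaged)
```

With three wins in four evaluation games, `win_rate` returns 75.0. For two runs with curves `[10.0, 50.0, 40.0]` and `[30.0, 70.0, 60.0]`, the averaged curve is `[20.0, 60.0, 50.0]`, so `optimal_performance` returns 60.0.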
Comments 3:In the StarCraft environment, the number of units changes not only due to death, but also as a result of “production.” Do the authors consider only scenarios in which the simulation starts with a ready team of units without further “production”? Please define.
Response 3:In our current experimental setup, all scenarios start with a fixed number of units, and no further “production” events occur during the simulation. This setting was chosen to ensure that DDWCN, VDN, QMIX, and QTRAN are evaluated under the same initial unit conditions, thereby avoiding additional variables introduced by unit production. We acknowledge that scenarios involving both unit death and production could provide a more comprehensive assessment of algorithm performance, and we will consider incorporating such settings in our future research to further enhance the applicability of our approach.
Comments 4:How are the weights in the ICN module defined?
Response 4:In the revised manuscript, we have clarified that the weight parameters of the ICN are randomly initialized at the start of training and subsequently optimized via backpropagation together with the other network parameters of DDWCN. The bias terms are initialized to zero and updated through backpropagation during training.
Comments 5:In the description of the DWFN module, the authors mention neural networks: “…a gate structure composed of neural networks that…” (Line 275) Isn’t it only one neural network? Why have the authors preferred a sigmoid function for the second activation? Have the authors tried other options?
Response 5:Although the gate structure in DWFN is implemented using neural networks, it is specialized for generating dynamic fusion weights for attack and non-attack actions, rather than serving as a general-purpose feature extractor. This design captures differences in semantic importance between the two types of actions and assigns adaptive weights accordingly. Therefore, we intentionally define it as a “gate structure” to emphasize its specific role and semantic-level regulation.
The Sigmoid function is chosen for the second activation because it strictly constrains the output to the [0, 1] range, making it well-suited for normalized fusion weights and ensuring interpretability and stability in relative importance assignment. Its smooth nonlinearity helps avoid large gradient fluctuations during early training, enhancing convergence stability. Preliminary experiments with alternative activations (Tanh, Softmax) showed that Sigmoid achieved superior stability, interpretability, and overall performance, leading to its adoption in the final design.
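A gate of the kind described in Response 5 might look like the sketch below. It is illustrative only and does not reproduce the DWFN: the layer sizes, the ReLU used as the first activation, the uniform random initialization with zero biases, and the choice of one output weight per action type are all assumptions. Note also that the sigmoid mathematically maps into the open interval (0, 1); the endpoints are reached only through floating-point rounding.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(x, weights, biases):
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_out, n_in, rng):
    """Random weights and zero biases (an assumed initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def gate(features, layer1, layer2):
    """Two-layer gate: ReLU (assumed), then sigmoid as the second activation."""
    hidden = [max(0.0, h) for h in dense(features, *layer1)]
    return [sigmoid(z) for z in dense(hidden, *layer2)]


rng = random.Random(0)
layer1 = init_layer(3, 4, rng)  # 4 input features -> 3 hidden units
layer2 = init_layer(2, 3, rng)  # 3 hidden units -> 2 fusion weights
w_attack, w_non_attack = gate([0.2, -1.0, 0.5, 0.3], layer1, layer2)
```

One design difference from Softmax is visible here: each sigmoid output is bounded independently, so the two weights need not sum to 1, whereas Softmax would couple them.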
Comments 6: Please standardize the terminology. For example: “The information compensation strength…” (Line 465) is denoted “lambda”; the same symbol is also used where it “is the residual compensation coefficient.” Suggestion: once defined, “lambda” can be used subsequently without the definition.
Response 6:In response, we have standardized the terminology throughout the manuscript by removing the phrase “The information compensation strength” and, after the first definition of the symbol λ (lambda), consistently using this symbol in all subsequent occurrences. This ensures uniform terminology and avoids redundant definitions.
Comments 7: “The VDN algorithm exhibits… a consistent upward trend during the training phase.” (Lines 602-604) According to Figure 8, there is no consistent upward trend in the first phase. Or, maybe, the first phase is not clearly defined?
Response 7:We appreciate the reviewer’s careful observation. We agree that in Figure 8, the win rate curve of VDN does not exhibit a consistently monotonic upward trend in the early phase. In response, we have revised the corresponding description in the manuscript to state that the curve shows initial fluctuations followed by a gradual upward trend, rather than describing it as consistently increasing. This change ensures that the textual description aligns accurately with the visual data presented in Figure 8.
2. Minor Comments
Comments 1:Please replace “chapter” with “section.”
Response 1:We have replaced all non-standard uses of “chapter” with “section” throughout the manuscript to conform to academic writing conventions and standardized terminology.
Comments 2:“…this paper organizes the architecture of its three core…” (Line 192) Maybe “outlines”?
Response 2:In the original manuscript, the word “organizes” was intended to describe the presentation of the architecture of the three core submodules of DDWCN. However, we agree that “outlines” is a more accurate and appropriate term in this context, as it better conveys the intent to provide an overview rather than to structurally organize. Accordingly, we have replaced “organizes” with “outlines” in the revised manuscript.
Comments 3:“Dual-flow Action Modeling Network” (Line 195) Previously, the authors used the term “Dual-stream.” Please unify the terminology.
Response 3:In the revised manuscript, we have standardized the terminology by replacing all instances of “Dual-flow” with “Dual-stream” to ensure consistency throughout the text.
Comments 4:“…upward movement, leftward movement, and cessation of movement.” (Line 201) Why are only these limited degrees of freedom considered? What about movement to the right?
Response 4:In the revised manuscript, we have updated the description to clarify that the movement actions are not limited to upward, leftward, and cessation of movement, but also include rightward movement and other possible actions, ensuring a more comprehensive representation of the agents’ degrees of freedom.
Comments 5:In Figure 1, in the unit-oriented stream, are the two vectors “h” the same? If they are not, why do they have the same subscripts?
Response 5:In the revised manuscript, we have updated Figure 1 to clarify the representation of the two “h” vectors in the unit-oriented stream, ensuring that their subscripts are correctly distinguished to reflect their differences.
Comments 6:“The ICN boasts several advantages…” (Line 314) I suggest editing this sentence, given that the following lines describe why the implementation of ICN is needed, and not the obtained pluses.
Response 6: We have revised the sentence to emphasize the theoretical motivations for incorporating the ICN module, rather than presenting it as a list of advantages. The updated version now explicitly clarifies that the introduction of ICN is driven by two key considerations: (1) capturing the discrepancies between Q-values before and after fusion to mitigate information loss, and (2) regulating excessive variations in Q-value information to maintain the stability of the fusion mechanism. This modification aligns the description more closely with the logical flow of the method section.
Comments 7:I suggest “Simulation Experiments” instead of “Simulation Experiment.”
Response 7:We have revised “Simulation Experiment” to “Simulation Experiments” accordingly in the manuscript to ensure accuracy and consistency.
Comments 8:“Among these, StarCraft II…” (Line 432) What are the other options?
Response 8:We appreciate the reviewer’s suggestion. The original sentence “Among these, StarCraft II is a classic elastic collaboration scenario, and its training environment meets our training and testing requirements.” has been revised to “StarCraft II serves as a canonical testbed, and its training environment meets our training and testing requirements.” This change streamlines the expression while retaining the emphasis on StarCraft II’s role as the primary evaluation environment in our study.
Comments 9:“…a deficiency in both rapidity weight processing.” (Lines 499-500) Please edit this sentence.
Response 9:In the revised manuscript, we have refined the sentence to: “Furthermore, the overall convergence speed was found to be insufficient, reflecting limitations in both the rapid adjustment and efficient processing of weights.” This change improves clarity, ensures grammatical accuracy, and provides a more precise academic expression.
Comments 10: “…and began to rise steadily and robustness.” (Lines 501-502) Please edit this sentence.
Response 10:The original sentence, “The win rate of DDWCN-NO remained at a low level during the early stages of training and began to rise steadily…” has been revised to “During the early stages of training, the win rate of DDWCN-NO remains at a low level and only gradually increases in the mid stages, eventually achieving a moderate degree of stability and robustness.” This modification clarifies the timing of the win rate increase and more accurately reflects the observed experimental results.
Comments 11:“Nevertheless, the win rate can be sustained at approximately 90% in the final convergence.” (Line 532) Maybe, the authors should note that the success rate is 90% of that of the best-performing model.
Response 11:To address the potential ambiguity, we have revised the sentence to explicitly indicate that the value refers to the absolute win rate. The updated sentence now reads: “Nevertheless, its absolute win rate can still be maintained at approximately 90% during the final convergence phase.” This modification clarifies the intended meaning and avoids any possible misinterpretation.
Comments 12: “As illustrated in Figure 9, the win rate comparison curves…” Please edit this sentence.
Response 12: The sentence has been revised to improve clarity and conciseness. The updated version is: “As illustrated in Figure 9, the win rate comparison curves for the four algorithms—DDWCN, VDN, QMIX, and QTRAN—are presented, enabling a direct performance comparison among them.”
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
1. This work addresses the problems faced by conventional algorithms such as VDN, QMIX, and QTRAN in the field of multi-agent reinforcement learning, as they have shown limited ability to model elastic collaboration scenarios.
2. A novel multi-agent collaboration mechanism is proposed: the Dual-Stream Dynamic Weight Compensation Network (DDWCN).
3. The proposed method employs a dual-stream action modeling network for action classification and processing, integrating an information compensation network and a dynamic weight fusion network.
4. The DDWCN improves robustness; however, it is useful to discuss the conditions under which the network's generalization ability for complex collaborative tasks presents significant results.
5. Experiments on various benchmark tasks in StarCraft II have validated the effectiveness of the DDWCN; however, the efficiency of the proposed mechanism, expressed in terms of reliability, needs to be discussed.
6. References are appropriate to the context of the phenomenon studied.
7. Specific scenarios and conditions of multi-agent collaboration should be noted, in which the proposed approach can be applied scalably and reliably.
Author Response
Comments 1:This work addresses the problems faced by conventional algorithms such as VDN, QMIX, and QTRAN in the field of multi-agent reinforcement learning, as they have shown limited ability to model elastic collaboration scenarios.
Response 1:Thank you for your positive comment. We agree with your assessment. Therefore, we have retained this motivation clearly in the Introduction section, emphasizing that our work is specifically aimed at overcoming the limitations of conventional algorithms in modeling elastic collaboration scenarios.
Comments 2:A novel multi-agent collaboration mechanism is proposed: the Dual-Stream Dynamic Weight Compensation Network (DDWCN).
Response 2:We appreciate the reviewer’s recognition of our proposed Dual-Stream Dynamic Weight Compensation Network (DDWCN) as a novel multi-agent collaboration mechanism.
Comments 3:The proposed method employs a dual-stream action modeling network for action classification and processing, integrating an information compensation network and a dynamic weight fusion network.
Response 3:We thank the reviewer for accurately summarizing the core architecture of DDWCN. This design enables the model to effectively capture semantic differences between action types and to maintain robustness through information compensation and adaptive weight fusion.
Comments 4:The DDWCN improves robustness; however, it is useful to discuss the conditions under which the network's generalization ability for complex collaborative tasks presents significant results.
Response 4:We agree with your suggestion. Therefore, we have added concise discussions on scalability and reliability at the end of Section 4.3’s first paragraph (Page 15, Lines 602–604) and Section 4.3.3 (Page 18, Lines 685–687). The first addition provides a general statement on generalization ability across different scenarios, while the second highlights this capability in the most complex heterogeneous collaboration case (3s5z).
Comments 5:Experiments on various benchmark tasks in StarCraft II have validated the effectiveness of the DDWCN; however, the efficiency of the proposed mechanism, expressed in terms of reliability, needs to be discussed.
Response 5:We appreciate your valuable suggestion. Accordingly, we have added a sentence at the end of Section 5 (Page 18, Lines 731–735) to explicitly highlight the observed stability of the win rate across multiple independent runs under identical experimental settings. This underscores that the coordinated design of the DAMN, DWFN, and ICN modules inherently enhances reliability and efficiency in dynamic collaborative environments.
Comments 6: References are appropriate to the context of the phenomenon studied.
Response 6: Thank you for acknowledging the appropriateness of our references. No change was necessary.
Comments 7: Specific scenarios and conditions of multi-agent collaboration should be noted, in which the proposed approach can be applied scalably and reliably.
Response 7: We agree with your suggestion. Thus, we have added the following sentence to the final paragraph of the Conclusions section (Page 18, Lines 735–738):
“Beyond the tested StarCraft II environments, the proposed approach can be readily scaled to large-scale multi-agent collaboration tasks, such as multi-UAV cooperative reconnaissance, distributed sensor networks, and dynamic resource allocation in communication systems.” This addition broadens the applicability discussion to a wider range of real-world and virtual multi-agent collaborative scenarios, addressing scalability and reliability concerns.
Author Response File: Author Response.pdf