Causal Correction and Compensation Network for Robotics: Applications and Validation in Continuous Control
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors

The manuscript introduces an interpretable reinforcement learning framework built on the Causal World Model, featuring three key contributions. First, it employs causal discovery methods and a Structural Causal Model to learn and formally represent the environment's causal structure, enabling more informed state-transition and reward modeling. Second, it replaces traditional attention mechanisms with a novel Graph Neural Network–based Neighborhood Causal Model (GNN-NCM) that performs multi-layer message passing to automatically compute causal influence weights. Third, it proposes the C²-Net algorithm, which integrates these causal weights into dynamic action correction and compensation, significantly enhancing robustness, interpretability, policy performance, and convergence speed, as demonstrated through extensive experiments on both standard continuous control tasks and a complex multi-agent simulation platform. However, the results are mainly correlational, lacking a causal framework and uncertainty quantification. The interdisciplinary approach is superficial, citing methods without integrating them. The conclusions overreach, relying on unsupported inference.
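For orientation, the sketch below illustrates the idea summarized above: GNN-style message passing over a learned causal graph produces influence weights that modulate an action correction step. All class names, shapes, and the blending rule are hypothetical illustrations chosen for this example, not the authors' implementation.

```python
# Minimal, illustrative sketch (all names, shapes, and the blending rule are
# assumptions for this example, not the authors' C2-Net implementation).
import torch
import torch.nn as nn

class NeighborhoodCausalGNN(nn.Module):
    """Toy GNN that propagates state-node features over a soft causal graph
    and emits per-dimension causal influence weights for the action."""
    def __init__(self, n_nodes: int, hidden: int = 32, action_dim: int = 6, layers: int = 2):
        super().__init__()
        self.adj = nn.Parameter(torch.rand(n_nodes, n_nodes))   # learnable soft causal adjacency
        self.embed = nn.Linear(1, hidden)
        self.msg = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(layers)])
        self.readout = nn.Linear(n_nodes * hidden, action_dim)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.embed(state.unsqueeze(-1)))         # one feature vector per state node
        a = torch.softmax(self.adj, dim=-1)                     # row-normalized edge weights
        for layer in self.msg:
            h = torch.relu(layer(a @ h))                        # multi-layer message passing
        return torch.sigmoid(self.readout(h.flatten()))         # causal influence weights in (0, 1)

def corrected_action(actor_action, weights, compensation):
    """Correction-and-compensation step: blend the actor's action with a compensation term."""
    return (1 - weights) * actor_action + weights * compensation

gnn = NeighborhoodCausalGNN(n_nodes=11, action_dim=3)
state = torch.randn(11)
w = gnn(state)
act = corrected_action(torch.tanh(torch.randn(3)), w, torch.zeros(3))
print(w.shape, act.shape)
```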
Specific Comments Before Acceptance
- Improve the definition of the research gap by systematically synthesizing the recent five years of literature and explicitly contrasting this work with comparable state-of-the-art studies.
- Increase methodological transparency by providing details on all operational parameters, instrumentation models, and environmental conditions.
- Clarify the experimental design rationale with explicit justification for the selected parameters, sample sizes, and control conditions.
- Add a causal explanatory model instead of relying solely on empirical correlations.
- Revise figures and tables to include complete axis units, standardized statistical notation, and clearer legends.
- Report statistical significance tests (e.g., p-values, effect sizes) for all comparative results where applicable.
- The contributions need to be redefined.
The English in the revised manuscript falls below Applied Sciences' publication standards. Although the meaning is understandable, frequent grammatical errors, awkward phrasing, and lexical issues hinder clarity. Long, complex sentences with excessive clauses make key points hard to follow.
Author Response
Dear Reviewer,
We sincerely appreciate your valuable comments and suggestions, which have greatly helped us improve the quality of the manuscript. Please see the attachment for our detailed response.
Best regards.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors

The article is well written and focuses on deep learning methods. The authors present several algorithms and apply them to a humanoid robot. In my opinion, the description of the robot should be improved. Readers will not find any information about the mathematical model of the robot, yet the authors use concepts such as "state vector" and "velocity". These concepts are very general, and readers do not know what they mean in this case. Moreover, the authors should describe the tasks in more detail. For example, Figures 13 and 14 present the robot's movements without and with correction, but we do not know what the reference motion pattern is. There is no detailed description of how the described algorithms are connected to the robot's motion parameters, which makes analysis of the research results difficult.
Author Response
Dear Reviewer,
We sincerely appreciate your valuable comments and suggestions, which have greatly helped us improve the quality of the manuscript. Please see the attachment for our detailed response.
Best regards.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors

The revised version of the manuscript shows that the authors have effectively addressed the reviewers' comments. The paper's structure has been reorganized to improve clarity, the methodology section now includes adequate details and parameter specifications to ensure transparency, and the reference list has been updated with more recent and relevant sources. The figures and tables have been enhanced to increase clarity, and the conclusions have been rewritten to highlight contributions, acknowledge limitations, and suggest future research directions. These changes mark significant progress compared to the earlier version.
However, a minor issue still persists. While the discussion of the results has improved, it remains more descriptive than analytical and would benefit from a more critical comparison with related studies.
In summary, only minor adjustments are needed before the manuscript can be considered fully prepared for acceptance.
Comments on the Quality of English Language

Although the overall English style has been refined, a final review could further enhance fluency and readability.
Author Response
Dear Reviewer:
We sincerely thank the reviewer for the constructive feedback and for recognizing the substantial improvements made in the revised manuscript. We carefully addressed the remaining concerns as follows:
Comment 1:
However, a minor issue still persists. While the discussion of the results has improved, it remains more descriptive than analytical and would benefit from a more critical comparison with related studies.
Response 1:
Thank you for pointing this out. We have revised the "Discussion" section to move beyond descriptive summaries and provide more analytical insights. We added a comparison between our findings and related studies in reinforcement learning and causal inference. This comparison highlights how C²-Net differs from prior approaches, not only in improving performance but also in enhancing interpretability and robustness. We also discuss the implications of the observed increase in policy variance and its potential trade-offs compared to existing methods.
This change can be found on page 19, line 560: "Figure 16 summarizes the results across all environments. Here, reward_raw denotes the mean return before correction, and reward_crr represents the mean return after applying C²-Net. Overall, C²-Net consistently improves the agent's average return across all tasks. For example, in the Hopper environment the mean return increased by 14.6% with SAC and 20.2% with PPO, with Cohen's d values indicating a medium-to-large effect. Similar trends are observed in Walker2d (SAC: +18.5%, PPO: +44.9%), Humanoid (SAC: +22.3%, PPO: +9.92%), and AzureLoong (+3.6%). These effect sizes range from small to large according to Cohen's convention, underscoring the contribution of causal correction across diverse benchmarks.
However, the T-shaped error bars indicate that the standard deviation also increased in certain scenarios. For instance, in the Hopper-SAC task, the standard deviation rose from 0.04503 to 0.05872, and in the AzureLoong environment, it increased from 0.004507 to 0.007859. This suggests that although causal correction improves mean performance, it also introduces greater policy variability. Such an increase is expected, as C²-Net explicitly models the causal dependencies among states, actions, and rewards, providing a principled basis for the agent's action selection. By guiding the correction process with reward information, the agent can explore multiple near-optimal behaviors under uncertainty or perturbation, thereby enhancing system robustness and adaptability. At the same time, the increased variance highlights the inherent trade-off between exploratory behavior and policy stability.
Conventional reinforcement learning methods, such as PPO and SAC, primarily regulate policy variance through reward shaping or entropy regularization. In contrast, C²-Net provides an alternative and potentially more principled approach by explicitly modeling causal dependencies among states, actions, and rewards. While this causally driven exploration helps explain both the improvement in average performance and the observed variance, it also highlights a potential trade-off between adaptability and stability. Specifically, greater variance may yield diverse behavioral trajectories beneficial for long-term exploration, yet it can also increase the difficulty of convergence and control in safety-critical environments."
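For reference, the Cohen's d values quoted above are conventionally computed from the pooled standard deviation of the two return samples. The sketch below shows the standard calculation; the episode returns used here are illustrative placeholders, not values from the paper.

```python
import numpy as np

def cohens_d(a, b):
    """Effect size between two independent samples using the pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (b.mean() - a.mean()) / np.sqrt(pooled_var)

# Illustrative episode returns (not from the paper): raw vs. corrected policy
reward_raw = np.array([3021.0, 3105.0, 2980.0, 3050.0, 3110.0])
reward_crr = np.array([3460.0, 3520.0, 3390.0, 3475.0, 3530.0])

print(f"mean improvement: {100 * (reward_crr.mean() / reward_raw.mean() - 1):.1f}%")
print(f"Cohen's d: {cohens_d(reward_raw, reward_crr):.2f}")
```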
Comment 2:
Although the overall English style has been refined, a final review could further enhance fluency and readability.
Response 2:
We agree. To meet the journal's standards for academic English, we have carefully polished the manuscript for fluency, conciseness, and readability.
We believe these additional revisions address the reviewer’s concerns, and we respectfully submit the manuscript for final consideration.