A Residual PPO Method for Shipboard Helicopter Landing Control
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Although the article has a strong technical foundation, making corrections in line with the following points will increase the scientific value of the study.
The study focuses only on the "deck-fixed" holding phase. However, the article lacks any prediction or discussion of how this method would perform in much higher-risk and more dynamic phases such as descent and touchdown.
The simulations were performed for specific wind directions and specific sea conditions. There is no explanation about the limitations in robustness tests under more extreme sea conditions or sudden wind gusts.
Author Response
Comments 1: The study focuses only on the "deck-fixed" holding phase. However, the article lacks any prediction or discussion of how this method would perform in much higher-risk and more dynamic phases such as descent and touchdown.
Response 1: Thank you for this important comment. We agree that evaluating only the deck-fixed holding phase is not sufficient to assess the usefulness of the proposed method during more dynamic shipboard recovery phases. In the revised manuscript, we have therefore extended the simulation study from hover-and-wait station keeping to an additional descent-and-landing strong-disturbance condition. Two landing scenes are now considered: one based on the original disturbance setting extended to a 10 s descent-and-landing task, and one more stringent moving-deck descent scene. The revised manuscript evaluates Residual PPO, Pure PPO, pure INDI, SHMPC, and DOB/CTSMC under these descent-and-landing conditions. Terminal landing classes are defined using final deck-relative position, velocity, and attitude errors, and a terminal score is also reported to quantify landing quality.
These revisions can be found in Section 4.4, "Descent-and-Landing Strong-Disturbance Condition," Page 16, Lines 481-489; Page 16, Lines 490-501; Page 17, Lines 502-515; Page 18, Lines 516-531; and Tables 5-6 on Page 19.
Revised text in the manuscript:
"The preceding results evaluate the hover-and-wait strong-disturbance condition. To examine whether the same learning-enhanced architecture remains useful during descent, an additional descent-and-landing strong-disturbance condition was evaluated."
Revised text in the manuscript:
"The compared controllers are Residual PPO, Pure PPO, pure INDI, SHMPC, and DOB/CTSMC."
Revised text in the manuscript:
"Residual PPO achieves the highest Desired rate, 90.0%, while keeping 98.3% of trials within the Adequate envelope."
Comments 2: The simulations were performed for specific wind directions and specific sea conditions. There is no explanation about the limitations in robustness tests under more extreme sea conditions or sudden wind gusts.
Response 2: Thank you for pointing this out. We agree that the original robustness discussion was too narrow. In the revised manuscript, we have clarified that the current WOD sweep and Monte Carlo simulations are performed within a bounded simulation envelope and should not be interpreted as demonstrating robustness under all possible sea states, wind directions, or abrupt gust transients. We now explicitly frame the work as a numerical-simulation-based feasibility study and state that broader robustness evaluation under more extreme sea states, sudden gusts, expanded disturbance sets, and higher-fidelity airwake/deck-motion models remains necessary before stronger operational conclusions can be made.
The WOD-sweep evaluation is reported in Section 4.3, Page 14, Lines 436-445, and Page 15, Lines 466-480. The added limitation discussion can be found in Section 5, "Limitations and Future Work," Page 19, Lines 532-542, and Page 20, Lines 543-556. The corresponding conclusion is provided on Page 20, Lines 565-577.
Revised text in the manuscript:
"This study is a numerical-simulation-based feasibility investigation. The simulation framework uses simplified modelling assumptions, including a simplified disturbance model, a quasi-steady Cheeseman–Bennett ground-effect correction, an analytic multisine deck-motion model, and a CETI aerodynamic surrogate model."
Revised text in the manuscript:
"Future work should therefore combine higher-fidelity airwake, dynamic-inflow, inclined- and moving-ground, and touchdown-contact models with broader sea-state, gust, deck-motion, out-of-distribution airwake, and actuator-failure campaigns."
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript addresses a relevant problem in shipboard helicopter recovery, namely deck-relative station keeping during the waiting phase before touchdown. The proposed architecture, combining a split-channel INDI outer loop, a reduced-order DI inner loop, and a bounded residual PPO policy, is conceptually sound. A strength of the work is that the learned policy does not replace the model-based controller, but provides residual corrections around an interpretable baseline.
The numerical results are coherent and suggest that the residual architecture improves tracking performance, Desired/Adequate occupancy, success-hold rate, and command usage compared with both the baseline controller and a Pure PPO controller. The Monte Carlo analysis and WOD sweep provide useful evidence within the adopted simulation framework.
However, substantial revisions are needed. First, the novelty should be clarified. The manuscript combines established components, including INDI, DI, PPO, and residual reinforcement learning. The main contribution appears to be mainly applicative: applying a baseline-preserving residual PPO architecture to the waiting phase of shipboard helicopter recovery. This should be stated more explicitly and with a more balanced tone.
Second, the claims about robustness and operational relevance should be moderated. The study relies on simplified disturbance and deck-motion models, including a CETI airwake surrogate, a quasi-steady Cheeseman-Bennett ground-effect correction, and analytic multisine deck motion. These assumptions are acceptable for a preliminary control-oriented study, but they do not support broad operational claims. The paper should be framed more clearly as a simulation-based feasibility study.
Third, the comparison with the state of the art should be strengthened. The comparison against the baseline controller and Pure PPO is useful, but it does not demonstrate superiority over stronger alternatives such as MPC, shrinking-horizon MPC, disturbance-observer-based control, robust/adaptive control, or constrained control allocation. The authors should either include stronger benchmarks or explicitly justify why they are outside the scope of the present work.
Fourth, the ground-effect modelling assumptions deserve more discussion. The current quasi-steady correction does not capture dynamic wake inflow effects, inclined-ground effects, or moving-deck-induced inflow variations. The authors may consider discussing recent work on rotor ground effect over inclined and moving surfaces, for example: Pasquali, Claudio, Jacopo Serafini, Giovanni Bernardini, Joseph Milluzzo, and Massimo Gennaretti. “Numerical-Experimental Correlation of Hovering Rotor Aerodynamics in Ground Effect.” Aerospace Science and Technology, 2020; and Pasquali, Claudio, Massimo Gennaretti, Giovanni Bernardini, and Jacopo Serafini. “State-Space Dynamic Inflow Modelling for Hovering Rotors in Fixed- and Moving-Ground Effect.” Aerospace Science and Technology, 2023.
Fifth, the manuscript should discuss safety and stability more explicitly. Since reinforcement learning is introduced into a safety-critical aerospace control problem, the authors should address closed-loop stability, actuator saturation, residual-command bounds, failure modes, out-of-distribution behavior, and possible safety monitors or fallback mechanisms. Even if formal guarantees are not provided, this limitation should be acknowledged.
Finally, the reproducibility of the results should be improved. The authors should specify what is included in the release bundle, including code, trained policies, simulation configuration files, random seeds, evaluation scripts, disturbance realizations, and plotting scripts. The Monte Carlo results would also benefit from confidence intervals, standard deviations, interquartile ranges, or statistical tests comparing the controllers.
Overall, the paper is relevant and promising, but it requires major revision before the contribution can be considered fully convincing.
Author Response
Comments 1: The novelty should be clarified. The manuscript combines existing components, including INDI, DI, PPO, and residual reinforcement learning. The main contribution appears primarily application-oriented: applying a baseline-preserving residual PPO architecture to the waiting phase of shipboard helicopter recovery. This should be stated more explicitly and in a more balanced way.
Response 1: Thank you for pointing this out. We agree with this comment. We have revised the manuscript to state the novelty more clearly and more cautiously. The revised manuscript now emphasizes that INDI, DI, PPO, and residual reinforcement learning are established building blocks, and that the contribution of this work lies in their deck-relative, baseline-preserving integration for shipboard near-deck station keeping and landing. This revision can be found in the Abstract, Page 1, Paragraph 1, Lines 10–17; Introduction, Page 3, Paragraph 2, Lines 87–94; the contribution list, Page 3, Lines 96–107; and Conclusions, Page 21, Paragraph 1, Lines 595–605.
Revised text in the manuscript:
“Although INDI, DI, PPO, and residual learning are established building blocks, their coupling with deck-relative state representation, residual authority allocation, and task-specific evaluation forms the control algorithm studied in this work.”
Comments 2: Claims about robustness and operational relevance should be moderated. The study relies on simplified disturbance and deck-motion models, including a CETI airwake surrogate, a quasi-steady Cheeseman-Bennett ground-effect correction, and analytic multisine deck motion. These assumptions are acceptable for a preliminary control-oriented study, but they cannot support broad operational claims. The paper should be more clearly framed as a simulation-based feasibility study.
Response 2: Thank you for this important suggestion. We have revised the manuscript to explicitly frame the work as a numerical-simulation-based feasibility investigation, rather than as a validation of operational readiness. We also clarified that the CETI surrogate, quasi-steady Cheeseman-Bennett correction, and analytic multisine deck-motion model are simplified assumptions suitable for controlled comparative evaluation, but not sufficient to support broad all-scenario operational conclusions. These changes can be found in Limitations and Future Work, Page 20, Paragraph 1, Lines 559–565, and in Conclusions, Page 22, Paragraph 2, Lines 621–625.
Revised text in the manuscript:
“This study is a numerical-simulation-based feasibility investigation. The simulation process adopts simplified modelling assumptions, including a simplified disturbance model, a quasi-steady Cheeseman-Bennett ground-effect correction, an analytic multisine deck-motion model, and a CETI aerodynamic surrogate model.”
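As a rough illustration of what an analytic multisine deck-motion model looks like, the sketch below sums a few sinusoids for a single heave channel. The function name `deck_heave` and all amplitude, frequency, and phase values are hypothetical; this record does not list the components used in the manuscript.

```python
import math

def deck_heave(t, components):
    """Heave displacement (m) at time t (s) as a sum of sinusoids.

    components: iterable of (amplitude_m, omega_rad_s, phase_rad) tuples.
    """
    return sum(a * math.sin(w * t + p) for a, w, p in components)

# Hypothetical components, chosen only for illustration.
EXAMPLE_COMPONENTS = [(0.8, 0.6, 0.0), (0.3, 1.1, 1.0)]

if __name__ == "__main__":
    for k in range(5):
        t = 2.0 * k
        print(f"t={t:4.1f} s  heave={deck_heave(t, EXAMPLE_COMPONENTS):+.3f} m")
```

Because the motion is a closed-form function of time, the same deck trajectory can be regenerated exactly for every controller in a comparison, which is presumably why such a model suits controlled comparative evaluation.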
Comments 3: Comparison with the state of the art should be strengthened. Comparisons with the baseline controller and pure PPO are useful, but do not show superiority over stronger alternatives such as MPC, shrinking-horizon MPC, disturbance-observer-based control, robust/adaptive control, or constrained control allocation. The authors should include stronger benchmarks or clearly state why these benchmarks are outside the scope of the present study.
Response 3: Thank you for this helpful comment. We have strengthened the state-of-the-art comparison in two ways. First, we expanded the literature discussion to include disturbance-observer-based control, fixed-time and state-constrained control, MPC, shrinking-horizon MPC, and variable-horizon MPC. This revision can be found in the Introduction, Page 2, Paragraph 3, Lines 55–65, and Page 3, Lines 66–68. Second, we added descent-and-landing benchmark comparisons against pure INDI, DOB/CTSMC, and SHMPC, in addition to the baseline controller and Pure PPO. These new comparisons are reported in Section 4.4, Page 17, Paragraph 2, Lines 503–514; Page 18, Lines 528–534; Page 19, Lines 538–557; and Tables 5–6 on Page 20.
Revised text in the manuscript:
“The compared controllers are Residual PPO, Pure PPO, pure INDI, SHMPC, and DOB/CTSMC.”
Comments 4: The ground-effect modeling assumptions deserve more discussion. The current quasi-steady correction does not capture dynamic wake-inflow effects, inclined-ground effects, or moving-deck-induced inflow variations. The authors may consider discussing recent studies on rotor ground effect over inclined and moving surfaces, for example Pasquali et al., Aerospace Science and Technology, 2020, and Pasquali et al., Aerospace Science and Technology, 2023.
Response 4: Thank you for recommending these relevant references. We have expanded the discussion of the ground-effect model and added the suggested Pasquali et al. studies to the revised manuscript. We now explicitly state that the quasi-steady Cheeseman-Bennett correction cannot capture dynamic wake-inflow effects, inclined-ground effects, or moving-deck-induced inflow dynamics. We also clarify that these effects are important directions for higher-fidelity future modeling. This revision can be found in Section 2.3, Ground-Effect Model, Page 6, Paragraph 2, Lines 181–192, and in the updated reference list.
Revised text in the manuscript:
“The quasi-steady correction model adopted here cannot accurately capture dynamic wake-inflow effects, inclined-ground effects, or moving-deck-induced inflow dynamics, and therefore cannot fully reproduce rotor ground-effect aerodynamics under complex operating conditions.”
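For context, the classical hover form of the Cheeseman-Bennett correction gives the in-ground-effect thrust ratio at constant power as 1 / (1 - (R / 4z)^2), where R is the rotor radius and z the rotor height above the surface. The sketch below evaluates that expression. The function name and the lower clamp on z/R are assumptions added here to keep the expression away from its singularity at z = R/4; the exact form and limits used in the manuscript are not given in this record.

```python
def cheeseman_bennett_thrust_ratio(height_m, rotor_radius_m, min_height_ratio=0.5):
    """Quasi-steady hover thrust ratio T_IGE / T_OGE at constant power.

    The clamp at min_height_ratio * R is an illustrative safeguard, not part
    of the original correction; the formula is singular at z = R / 4.
    """
    if rotor_radius_m <= 0.0:
        raise ValueError("rotor radius must be positive")
    z = max(height_m, min_height_ratio * rotor_radius_m)
    return 1.0 / (1.0 - (rotor_radius_m / (4.0 * z)) ** 2)

if __name__ == "__main__":
    radius = 8.0  # hypothetical rotor radius in metres
    for z_over_r in (0.5, 1.0, 2.0, 4.0):
        ratio = cheeseman_bennett_thrust_ratio(z_over_r * radius, radius)
        print(f"z/R={z_over_r:3.1f}  T_IGE/T_OGE={ratio:.4f}")
```

The ratio depends only on the instantaneous height, which makes the limitation quoted above concrete: nothing in this expression responds to deck inclination, deck velocity, or inflow dynamics.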
Comments 5: The manuscript should discuss safety and stability more explicitly. Since reinforcement learning is introduced into a safety-critical aerospace control problem, the authors should address closed-loop stability, actuator saturation, residual command bounds, failure modes, out-of-distribution behavior, and possible safety monitors or fallback mechanisms. Even without formal guarantees, this limitation should be acknowledged.
Response 5: Thank you for this important safety-related comment. We have added a dedicated discussion of residual command bounds, actuator constraints, safety monitoring, emergency handling, and current limitations in formal stability analysis. The revised manuscript explains that the residual command is bounded by normalized action clipping and physical channel scaling, and that actuator displacement and rate limits are included in the simulation. We also added a description of the basic emergency go-around mechanism and explicitly acknowledge that formal closed-loop stability proof, systematic failure-mode classification, and redundant backup-control design remain future work. These revisions can be found in Limitations and Future Work, Page 20, Paragraph 2, Lines 566–570, and Page 21, Paragraphs 1–2, Lines 571–593.
Revised text in the manuscript:
“Nevertheless, this paper has not yet conducted a systematic quantitative closed-loop stability analysis, multi-condition failure-mode classification, or deep design of redundant backup mechanisms. These topics should be treated as important directions for future work.”
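The bounding chain described in this response (normalized action clipping, physical channel scaling, then actuator displacement and rate limits) can be sketched for a single control channel as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the order of operations, and all numeric values are hypothetical.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))

def bounded_command(u_base, action, scale, u_prev, u_min, u_max, rate_limit, dt):
    """Combine a baseline command with a bounded residual for one channel.

    Assumed order (hypothetical): clip the normalized policy action to [-1, 1],
    scale it to physical units, add it to the baseline command, then apply
    actuator displacement limits followed by a rate limit.
    """
    residual = scale * clip(action, -1.0, 1.0)       # |residual| <= scale
    u_cmd = clip(u_base + residual, u_min, u_max)    # displacement limit
    max_step = rate_limit * dt                       # largest change per step
    return u_prev + clip(u_cmd - u_prev, -max_step, max_step)

if __name__ == "__main__":
    # Hypothetical numbers: an oversized policy action is reduced to the
    # residual authority (0.2) before it reaches the actuator model.
    print(bounded_command(u_base=0.0, action=5.0, scale=0.2, u_prev=0.0,
                          u_min=-1.0, u_max=1.0, rate_limit=10.0, dt=0.1))
```

With this structure the residual can never shift the command by more than `scale` relative to the baseline, regardless of what the policy outputs, which is the property the response relies on when it describes the residual as bounded.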
Comments 6: Reproducibility should be improved. The authors should clearly state what is included in the released package, including code, trained policies, simulation configuration files, random seeds, evaluation scripts, disturbance realizations, and plotting scripts. The Monte Carlo results should also include confidence intervals, standard deviations, interquartile ranges, or statistical tests comparing the controllers.
Response 6: Thank you for this suggestion. We have improved reproducibility reporting in both the manuscript and the accompanying submission materials. The current submission materials include the manuscript source, generated tables, figure artifacts, and evaluation summaries supporting the reported results. The simulation code, trained-policy/checkpoint files, detailed configuration files, random seeds, evaluation and plotting scripts, disturbance and deck-motion implementations, and additional intermediate training artifacts are currently being organized for a subsequent public release; before that release, these materials are available from the corresponding author upon reasonable request. We also added statistical uncertainty and paired comparison results to the revised manuscript. Specifically, Table 5 now reports Wilson 95% confidence intervals for categorical outcomes and standard deviations/interquartile ranges for continuous terminal metrics, while Table 6 reports paired statistical tests against Residual PPO. These revisions can be found in Section 4.4, Page 19, Paragraph 3, Lines 543–557; Tables 5–6, Page 20; and the Data Availability Statement, Page 22, Lines 633–635.
Revised text in the manuscript:
“The table reports Wilson 95% confidence intervals for Desired, Adequate, and exceedance rates, together with standard deviations and interquartile ranges for continuous metrics.”
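For readers who want to reproduce the categorical uncertainty figures, a Wilson score interval for a binomial proportion can be computed as in the sketch below. The function name is hypothetical, and the trial count in the usage example is an assumption made for illustration; this record does not state the number of trials behind the reported rates.

```python
import math

def wilson_interval(successes, trials, z=1.959964):
    """Wilson score interval for a binomial proportion.

    z = 1.959964 is the standard normal quantile for a two-sided 95% interval.
    Returns (lower, upper), both clamped to [0, 1].
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)

if __name__ == "__main__":
    # Hypothetical counts: 54 Desired outcomes out of 60 trials.
    low, high = wilson_interval(54, 60)
    print(f"Desired rate 90.0%, Wilson 95% CI: [{low:.3f}, {high:.3f}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and remains usable when the observed rate is close to 0% or 100%, which matters for high Desired and Adequate rates.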
Reviewer 3 Report
Comments and Suggestions for Authors
(1) For the abstract and conclusion of the paper, please adopt as many quantitative descriptions as possible and present qualitative conclusions.
(2) The proposed method can significantly improve the trajectory tracking accuracy of shipboard helicopters in deck-fixed relative station-keeping. Is it reliable to verify the method only by simulation?
(3) What about the generalization ability and broad applicability of the method proposed in this paper?
(4) Among the numerous formulas in the manuscript, please clarify which ones are cited from references and which ones are derived by the authors. The authors shall supplement and complete the content and provide necessary citations accordingly.
Author Response
Comments 1: For the abstract and conclusion of the paper, please adopt as many quantitative descriptions as possible and present qualitative conclusions.
Response 1: Thank you for this valuable suggestion. We agree that quantitative results make the abstract and conclusion clearer and more scientifically informative. We have revised both the Abstract and Conclusions to include more numerical descriptions of the main findings, including the residual authority range, Desired/Adequate rates, success-hold rate, WOD-sweep performance, and descent-and-landing statistics. These revisions can be found in the Abstract, Page 1, Lines 13-16, and in the Conclusions, Page 20, Lines 565-577.
Revised text in the manuscript:
"With approximately 20–30% residual authority, it achieved 90.0% Desired landing rates in both tested descent-and-landing scenes."
Revised text in the manuscript:
"Across the 0°, 45°, and 90° WOD sweep, it maintained 84.8–86.0% Desired rates, 95.6–96.6% Adequate rates, 96.2–97.0% success rates, and 0.246–0.263 m XYZ RMS errors."
Comments 2: The proposed method can significantly improve the trajectory tracking accuracy of shipboard helicopters in deck-fixed relative station-keeping. Is it reliable to verify the method only by simulation?
Response 2: Thank you for raising this important issue. We agree that simulation alone is not sufficient to prove operational reliability for a safety-critical shipboard helicopter landing system. In the revised manuscript, we have therefore moderated the claims and explicitly framed the work as a numerical-simulation-based feasibility study. The simulation results are used to provide controlled comparative evidence under the adopted J-GenHel-based model, CETI disturbance surrogate, deck-motion model, and Monte Carlo/WOD evaluation protocol. However, we now clearly state that broader sea states, touchdown/contact dynamics, hardware constraints, and formal safety properties still require higher-fidelity and experimental validation before operational reliability can be claimed.
These revisions can be found in Limitations and Future Work, Page 19, Lines 532-542, Page 20, Lines 543-556, and Conclusions, Page 20, Lines 565-577.
Revised text in the manuscript:
"This study is a numerical-simulation-based feasibility investigation. The simulation framework uses simplified modelling assumptions, including a simplified disturbance model, a quasi-steady Cheeseman–Bennett ground-effect correction, an analytic multisine deck-motion model, and a CETI aerodynamic surrogate model."
Revised text in the manuscript:
"These findings support the proposed residual-control formulation for the simulated envelope, while broader sea states, touchdown/contact dynamics, hardware constraints, and formal safety properties still require higher-fidelity and experimental validation."
Comments 3: What about the generalization ability and broad applicability of the method proposed in this paper?
Response 3: Thank you for this helpful comment. We have revised the manuscript to discuss generalization more explicitly. In the current study, generalization is evaluated in three limited ways: randomized Monte Carlo trials, wind-over-deck sweeps at 0°, 45°, and 90°, and an additional descent-and-landing condition including a more stringent moving-deck descent scene. These tests show that the residual architecture generalizes better than Pure PPO within the tested simulation envelope, especially because the learned policy is bounded and preserves the nominal INDI/DI baseline pathway. However, we also clarify that these results do not prove broad applicability across all sea states, ship types, wind fields, or aircraft configurations. Broader applicability will require expanded disturbance sets, higher-fidelity airwake and ground-effect models, and additional validation.
These revisions can be found in Section 4.3, Page 15, Lines 466-480; Section 4.4, Page 16, Lines 481-489, Page 17, Lines 509-515, and Page 18, Lines 516-531; Limitations and Future Work, Page 20, Lines 551-556; and Conclusions, Page 20, Lines 565-577.
Revised text in the manuscript:
"Residual PPO maintains 84.8–86.0% Desired rates and the lowest RMS tracking error across all three WOD settings, while also using less command activity."
Revised text in the manuscript:
"Future work should therefore combine higher-fidelity airwake, dynamic-inflow, inclined- and moving-ground, and touchdown-contact models with broader sea-state, gust, deck-motion, out-of-distribution airwake, and actuator-failure campaigns."
Comments 4: Among the numerous formulas in the manuscript, please clarify which ones are cited from references and which ones are derived by the authors. The authors shall supplement and complete the content and provide necessary citations accordingly.
Response 4: Thank you for pointing this out. We have revised the manuscript to clarify the source and role of the main formula groups and added necessary citations. Specifically, the J-GenHel flight-dynamics model and reduced-order control-oriented model are based on the publicly available J-GenHel/GenHel modeling lineage and related validation literature. The CETI shaping-filter disturbance model is cited as a control-oriented airwake surrogate. The Cheeseman–Bennett ground-effect correction is explicitly cited. The ship-deck multisine motion model is cited as a simplified representative deck-motion model. The INDI/DI baseline and dynamic-inversion inner-loop equations are based on established rotorcraft INDI/NDI formulations, while the residual command interface, reward construction, command-usage metric, and task-specific terminal scoring are formulated by the authors for this study. The PPO objective and GAE equations are cited to the original PPO and GAE references.
These clarifications and citations can be found in Section 2.1, Page 3, Lines 102-109, and Page 4, Lines 110-134; Section 2.2, Page 5, Lines 135-162; Section 2.3, Page 6, Lines 163-185; Section 2.4, Page 6, Lines 186-204, and Page 7, Lines 205-209; Section 3.1, Page 7, Lines 211-225; Sections 3.2-3.4, Page 7, Lines 235-263, and Pages 8-11, Lines 264-360; and Section 4.1, Pages 12-13, Lines 371-420.
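As background for the cited GAE equations, the standard backward recursion can be sketched as below. The sketch assumes a single rollout segment with no episode termination inside it. The function name and the discount and lambda values are hypothetical and are not taken from the manuscript.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout segment.

    rewards[t] and values[t] refer to step t; last_value is the value estimate
    of the state reached after the final step (bootstrap term).
    Assumes no episode boundary inside the segment.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future residuals.
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages

if __name__ == "__main__":
    # Toy numbers, chosen only to exercise the recursion.
    print(gae_advantages([1.0, 0.5, 0.2], [0.4, 0.3, 0.1], last_value=0.0))
```

Setting `lam` to 0 reduces each advantage to the one-step temporal-difference residual, while `lam` equal to 1 gives the discounted return minus the value estimate.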
Revised text in the manuscript:
"The INDI/DI and PPO blocks follow established formulations, but their coupling here defines a task-specific control algorithm for moving-deck operation."
Revised text in the manuscript:
"Following PPO, policy updates are performed on rollout batches collected under the current closed-loop controller."
Revised text in the manuscript:
"Advantage estimation is computed by generalized advantage estimation (GAE)."
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have substantially revised the manuscript and have addressed my previous comments in a satisfactory manner. The novelty is now presented more cautiously and appropriately, with the contribution framed as a deck-relative, baseline-preserving residual PPO architecture that integrates established components rather than as a fundamentally new control method. The revised manuscript also more clearly acknowledges the simplified nature of the simulation framework, including the CETI disturbance surrogate, the quasi-steady Cheeseman-Bennett ground-effect correction, and the analytic multisine deck-motion model.
I also appreciate the strengthened comparison with additional structured baselines, including INDI, DOB/CTSMC, and SHMPC, as well as the inclusion of confidence intervals, dispersion statistics, and paired statistical comparisons. These additions make the evidence more convincing and improve the balance of the manuscript.
The discussion of safety, stability limitations, residual command bounds, actuator constraints, and future work has also improved. Although the study remains a simulation-based feasibility investigation and some aspects of reproducibility still depend on future public release or availability upon request, I consider the current revision sufficient for publication. I recommend acceptance.
