Open Access Article

Peer-Review Record

A Residual PPO Method for Shipboard Helicopter Landing Control

Aerospace 2026, 13(6), 516; https://doi.org/10.3390/aerospace13060516

by Xiao Chang and Jianliang Ai^*

Reviewer 1: Mehmet Konar

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 23 April 2026 / Revised: 20 May 2026 / Accepted: 29 May 2026 / Published: 31 May 2026

(This article belongs to the Section Aeronautics)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Although the article has a strong technical foundation, addressing the following points would increase the scientific value of the study.

The study focuses only on the "deck-fixed" holding phase. However, the article lacks any prediction or discussion of how this method would perform in much higher-risk and more dynamic phases such as descent and touchdown.

The simulations were performed for specific wind directions and specific sea conditions. There is no discussion of the limitations of the robustness tests under more extreme sea conditions or sudden wind gusts.

Author Response

Comments 1: The study focuses only on the "deck-fixed" holding phase. However, the article lacks any prediction or discussion of how this method would perform in much higher-risk and more dynamic phases such as descent and touchdown.

Response 1: Thank you for this important comment. We agree that evaluating only the deck-fixed holding phase is not sufficient to assess the usefulness of the proposed method during more dynamic shipboard recovery phases. In the revised manuscript, we have therefore extended the simulation study from hover-and-wait station keeping to an additional descent-and-landing strong-disturbance condition. Two landing scenes are now considered: one based on the original disturbance setting extended to a 10 s descent-and-landing task, and one more stringent moving-deck descent scene. The revised manuscript evaluates Residual PPO, Pure PPO, pure INDI, SHMPC, and DOB/CTSMC under these descent-and-landing conditions. Terminal landing classes are defined using final deck-relative position, velocity, and attitude errors, and a terminal score is also reported to quantify landing quality.

These revisions can be found in Section 4.4, "Descent-and-Landing Strong-Disturbance Condition," Pages 16-18, Lines 481-531, and in Tables 5-6 on Page 19.

Revised text in the manuscript:
"The preceding results evaluate the hover-and-wait strong-disturbance condition. To examine whether the same learning-enhanced architecture remains useful during descent, an additional descent-and-landing strong-disturbance condition was evaluated."

Revised text in the manuscript:
"The compared controllers are Residual PPO, Pure PPO, pure INDI, SHMPC, and DOB/CTSMC."

Revised text in the manuscript:
"Residual PPO achieves the highest Desired rate, 90.0%, while keeping 98.3% of trials within the Adequate envelope."
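As an illustration of how terminal landing classes and the Desired/Adequate rates quoted above could be computed from final deck-relative errors, the following is a minimal sketch. The threshold values, the "Unsatisfactory" label, and all function names are assumptions made for illustration; the manuscript's actual envelope limits and terminal score are not given in this record. The sketch also assumes that the Adequate envelope contains the Desired one, so the Adequate rate counts both classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Upper bounds on final deck-relative errors (hypothetical values)."""
    pos_m: float     # position error bound, metres
    vel_mps: float   # velocity error bound, metres per second
    att_deg: float   # attitude error bound, degrees


# Hypothetical thresholds, NOT taken from the manuscript.
DESIRED = Envelope(pos_m=0.5, vel_mps=0.5, att_deg=2.0)
ADEQUATE = Envelope(pos_m=1.0, vel_mps=1.0, att_deg=5.0)


def inside(env: Envelope, pos_err: float, vel_err: float, att_err: float) -> bool:
    """True if all three terminal errors lie within the envelope."""
    return (abs(pos_err) <= env.pos_m
            and abs(vel_err) <= env.vel_mps
            and abs(att_err) <= env.att_deg)


def classify_terminal(pos_err: float, vel_err: float, att_err: float) -> str:
    """Assign one terminal class per trial, tightest envelope first."""
    if inside(DESIRED, pos_err, vel_err, att_err):
        return "Desired"
    if inside(ADEQUATE, pos_err, vel_err, att_err):
        return "Adequate"
    return "Unsatisfactory"


def occupancy(trials):
    """Return (Desired rate, within-Adequate rate) over a list of error triples."""
    labels = [classify_terminal(*t) for t in trials]
    n = len(labels)
    desired_rate = sum(lab == "Desired" for lab in labels) / n
    adequate_rate = sum(lab in ("Desired", "Adequate") for lab in labels) / n
    return desired_rate, adequate_rate
```

Under this reading, a statement such as "90.0% Desired, 98.3% within Adequate" corresponds to the two values returned by `occupancy` over the Monte Carlo landing trials.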

Comments 2: The simulations were performed for specific wind directions and specific sea conditions. There is no discussion of the limitations of the robustness tests under more extreme sea conditions or sudden wind gusts.

Response 2: Thank you for pointing this out. We agree that the original robustness discussion was too narrow. In the revised manuscript, we have clarified that the current WOD sweep and Monte Carlo simulations are performed within a bounded simulation envelope and should not be interpreted as demonstrating robustness under all possible sea states, wind directions, or abrupt gust transients. We now explicitly frame the work as a numerical-simulation-based feasibility study and state that broader robustness evaluation under more extreme sea states, sudden gusts, expanded disturbance sets, and higher-fidelity airwake/deck-motion models remains necessary before stronger operational conclusions can be made.

The WOD-sweep evaluation is reported in Section 4.3, Page 14, Lines 436-445, and Page 15, Lines 466-480. The added limitation discussion can be found in Section 5, "Limitations and Future Work," Page 19, Lines 532-542, and Page 20, Lines 543-556. The corresponding conclusion is provided on Page 20, Lines 565-577.

Revised text in the manuscript:
"This study is a numerical-simulation-based feasibility investigation. The simulation framework uses simplified modelling assumptions, including a simplified disturbance model, a quasi-steady Cheeseman–Bennett ground-effect correction, an analytic multisine deck-motion model, and a CETI aerodynamic surrogate model."

Revised text in the manuscript:
"Future work should therefore combine higher-fidelity airwake, dynamic-inflow, inclined- and moving-ground, and touchdown-contact models with broader sea-state, gust, deck-motion, out-of-distribution airwake, and actuator-failure campaigns."
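To make two of the simplified modelling assumptions named above more concrete, the sketch below shows the classical hover form of the Cheeseman–Bennett ground-effect correction and a generic sum-of-sines deck-motion signal. This is an illustration under stated assumptions, not the authors' implementation: the function names, the constant-power hover form of the correction, and the (amplitude, frequency, phase) parameterisation of the multisine are choices made here, and the manuscript's exact formulation may differ.

```python
import math


def cheeseman_bennett_thrust_ratio(height_m: float, rotor_radius_m: float) -> float:
    """Quasi-steady hover ground-effect ratio T_IGE / T_OGE at constant power.

    Classical form: 1 / (1 - (R / (4 z))**2), with rotor radius R and rotor
    height z above the surface. The expression is singular at z = R / 4,
    so heights at or below that value are rejected here.
    """
    if height_m <= 0.0:
        raise ValueError("rotor height must be positive")
    ratio = rotor_radius_m / (4.0 * height_m)
    if ratio >= 1.0:
        raise ValueError("height too low for the quasi-steady correction")
    return 1.0 / (1.0 - ratio ** 2)


def multisine_heave(t: float, components) -> float:
    """Analytic multisine deck heave: sum of a_i * sin(w_i * t + phi_i).

    `components` is an iterable of (amplitude_m, omega_rad_s, phase_rad)
    tuples; the values used by the authors are not given in this record.
    """
    return sum(a * math.sin(w * t + phi) for a, w, phi in components)
```

Because the correction depends only on the instantaneous height, it cannot represent the dynamic-inflow, inclined-ground, or moving-deck effects that the quoted future-work statement lists.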

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript addresses a relevant problem in shipboard helicopter recovery, namely deck-relative station keeping during the waiting phase before touchdown. The proposed architecture, combining a split-channel INDI outer loop, a reduced-order DI inner loop, and a bounded residual PPO policy, is conceptually sound. A strength of the work is that the learned policy does not replace the model-based controller, but provides residual corrections around an interpretable baseline.
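The idea of a bounded residual that corrects, rather than replaces, a model-based baseline can be sketched as follows. This is a generic illustration, assuming a tanh squashing of the raw policy output and per-channel limits; the names `bounded_residual` and `residual_command`, the squashing function, and the final actuator clamp are assumptions and are not taken from the manuscript.

```python
import math


def bounded_residual(raw_action, max_residual):
    """Squash raw policy outputs so each residual stays within +/- its bound."""
    return [m * math.tanh(a) for a, m in zip(raw_action, max_residual)]


def residual_command(baseline_cmd, raw_action, max_residual, cmd_min, cmd_max):
    """Add the bounded residual to the baseline command, then clamp.

    baseline_cmd : per-channel command from the model-based controller
    raw_action   : unbounded per-channel policy output
    max_residual : per-channel residual authority (hypothetical values)
    cmd_min/max  : per-channel actuator limits (hypothetical values)
    """
    residual = bounded_residual(raw_action, max_residual)
    return [min(max(b + r, lo), hi)
            for b, r, lo, hi in zip(baseline_cmd, residual, cmd_min, cmd_max)]
```

With a zero policy output the command reduces to the baseline, which is one way to read the "baseline-preserving" property: the learned term can shift the command by at most `max_residual` on each channel.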

The numerical results are coherent and suggest that the residual architecture improves tracking performance, Desired/Adequate occupancy, success-hold rate, and command usage compared with both the baseline controller and a Pure PPO controller. The Monte Carlo analysis and WOD sweep provide useful evidence within the adopted simulation framework.

However, substantial revisions are needed. First, the novelty should be clarified. The manuscript combines established components, including INDI, DI, PPO, and residual reinforcement learning. The main contribution appears to be primarily application-oriented: applying a baseline-preserving residual PPO architecture to the waiting phase of shipboard helicopter recovery. This should be stated more explicitly and with a more balanced tone.

Second, the claims about robustness and operational relevance should be moderated. The study relies on simplified disturbance and deck-motion models, including a CETI airwake surrogate, a quasi-steady Cheeseman-Bennett ground-effect correction, and analytic multisine deck motion. These assumptions are acceptable for a preliminary control-oriented study, but they do not support broad operational claims. The paper should be framed more clearly as a simulation-based feasibility study.

Third, the comparison with the state of the art should be strengthened. The comparison against the baseline controller and Pure PPO is useful, but it does not demonstrate superiority over stronger alternatives such as MPC, shrinking-horizon MPC, disturbance-observer-based control, robust/adaptive control, or constrained control allocation. The authors should either include stronger benchmarks or explicitly justify why they are outside the scope of the present work.

Fourth, the ground-effect modelling assumptions deserve more discussion. The current quasi-steady correction does not capture dynamic wake inflow effects, inclined-ground effects, or moving-deck-induced inflow variations. The authors may consider discussing recent work on rotor ground effect over inclined and moving surfaces, for example: Pasquali, Claudio, Jacopo Serafini, Giovanni Bernardini, Joseph Milluzzo, and Massimo Gennaretti. “Numerical-Experimental Correlation of Hovering Rotor Aerodynamics in Ground Effect.” Aerospace Science and Technology, 2020; and Pasquali, Claudio, Massimo Gennaretti, Giovanni Bernardini, and Jacopo Serafini. “State-Space Dynamic Inflow Modelling for Hovering Rotors in Fixed- and Moving-Ground Effect.” Aerospace Science and Technology, 2023.

Fifth, the manuscript should discuss safety and stability more explicitly. Since reinforcement learning is introduced into a safety-critical aerospace control problem, the authors should address closed-loop stability, actuator saturation, residual-command bounds, failure modes, out-of-distribution behavior, and possible safety monitors or fallback mechanisms. Even if formal guarantees are not provided, this limitation should be acknowledged.

Finally, the reproducibility of the results should be improved. The authors should specify what is included in the release bundle, including code, trained policies, simulation configuration files, random seeds, evaluation scripts, disturbance realizations, and plotting scripts. The Monte Carlo results would also benefit from confidence intervals, standard deviations, interquartile ranges, or statistical tests comparing the controllers.
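One common way to attach a confidence interval to a Monte Carlo success or occupancy rate, as requested above, is the Wilson score interval for a binomial proportion. The sketch below is a minimal standard-library version; it is offered as an example of the kind of statistic the reviewer asks for, not as the method the authors used, and any trial counts passed to it are the caller's own.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximately 95% two-sided interval.
    Returns (lower, upper), both clipped to [0, 1].
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    ) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)
```

For example, `wilson_interval(54, 60)` gives the interval for 54 successes in 60 trials (a hypothetical count, not one reported in this record). Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] for rates close to 100%, which matters when comparing controllers with high success rates.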

Overall, the paper is relevant and promising, but it requires major revision before the contribution can be considered fully convincing.

Author Response

Comments 1: The novelty should be clarified. The manuscript combines existing components, including INDI, DI, PPO, and residual reinforcement learning. The main contribution appears primarily application-oriented: applying a baseline-preserving residual PPO architecture to the waiting phase of shipboard helicopter recovery. This should be stated more explicitly and in a more balanced way.

Response 1: Thank you for pointing this out. We agree with this comment. We have revised the manuscript to state the novelty more clearly and more cautiously. The revised manuscript now emphasizes that INDI, DI, PPO, and residual reinforcement learning are established building blocks, and that the contribution of this work lies in their deck-relative, baseline-preserving integration for shipboard near-deck station keeping and landing. This revision can be found in the Abstract, Page 1, Paragraph 1, Lines 10–17; Introduction, Page 3, Paragraph 2, Lines 87–94; the contribution list, Page 3, Lines 96–107; and Conclusions, Page 21, Paragraph 1, Lines 595–605.