Enhancing Robust Adaptive Dynamic Positioning of Full-Actuated Surface Vessels: Reinforcement Learning Approach for Unknown Hydrodynamics

Li, Jiqiang; Huang, Wanjin; Huang, Chenfeng; Zhang, Guoqing

doi:10.3390/jmse13050993

Open AccessArticle

Enhancing Robust Adaptive Dynamic Positioning of Full-Actuated Surface Vessels: Reinforcement Learning Approach for Unknown Hydrodynamics

Navigation College, Dalian Maritime University, Dalian 116026, China

^*

Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2025, 13(5), 993; https://doi.org/10.3390/jmse13050993

Submission received: 15 April 2025 / Revised: 13 May 2025 / Accepted: 19 May 2025 / Published: 21 May 2025

(This article belongs to the Special Issue Optimal Maneuvering and Control of Ships—2nd Edition)

Download

Browse Figures

Versions Notes

Abstract

In this article, a robust adaptive dynamic position-control problem is addressed for full-actuated surface vessels under coupled uncertainties from unmodeled hydrodynamic effects and time-varying external disturbances. To obtain a high-performance dynamic position controller, a reinforcement learning (RL) weights law involving actor and critic networks is designed without knowledge of the model dynamics and the disturbance parameters. This can enhance the robustness of the closed-loop control system. Furthermore, dynamic surface control is integrated to diminish the design complexity resulting from the derivative of the kinematics, while ensuring semi-global uniformly ultimately bounded (SGUUB) stability through Lyapunov-based synthesis. Simulations are carried out to evaluate the superiority and feasibility of the proposed algorithm.

Keywords:

surface vessels; reinforcement learning; robust adaptive control; dynamic surface control

1. Introduction

In recent years, artificial intelligence has achieved an exploratory application in the field of marine engineering [1]. Some new approaches have been introduced into the autonomous control of marine vehicles for the high-efficiency implementation of engineering missions, such as path following, formation control, and dynamic positioning (DP), to name but a few. Among these tasks, especially, is the DP of fully actuated surface vehicles demanding high-precision operation under unpredictable external disturbances and model hydrodynamic parameters [2]. To this end, neural networks and a Takagi–Sugeno (T-S) fuzzy system [3] are utilized for addressing uncertainties [4], which involves a larger number of computational parameters. An explicit fact is that the RL strategy [5,6] should be further studied to enhance the control performance of the DP system [7], highlighting the need for more adaptive solutions.

In the literature, a large number of advanced DP control algorithms have been presented for fully actuated surface vehicles. By dynamically adjusting control parameters in real time, adaptive control methods [8,9] enhance the robustness and stability of systems to effectively address variations in system parameters and external disturbances [10]. In controller design using event-triggered approaches [11], computational resource utilization is optimized and operational efficiency improved by designing appropriate triggering conditions that reduce the update frequency of control signals, while maintaining system performance. Additionally, through the accurate modeling and compensation of nonlinear dead zones, dead-zone compensation control [12] is used to mitigate the impact of actuator nonlinearities within the system. Along similar lines, by predicting and compensating for communication and actuation delays, time-delay compensation control successfully enhances system’s reliability and response speed. Despite these advancements, the aforementioned methods do not achieve the expected results satisfactorily, and they still have significant limitations when dealing with uncertainties in the hydrodynamic model and time-varying external disturbances.

It should be pointed that the above-mentioned control strategies do not concern the strong nonlinear properties resulting from low-velocity operations, such as the complex hydrodynamic parameters between the hull and the ocean environment. Around these issues, although traditional nonlinear control approaches have achieved satisfactory performance in several applications, such as the controller design based on Lyapunov stability analysis, they still have limitations for complex, high-dimensional, and dynamical changes in nonlinear systems. To overcome this, with their strong approximation capabilities, neural networks are used in system modeling and control strategy design [13,14], which possess strong fault tolerance and robustness [15]. Similarly, through utilizing fuzzy rules to handle uncertainties, the fuzzy control approaches in [16] simplify complexity with linear systems. Currently, as an important branch of the artificial intelligence field, machine learning [17] has a broad application prospect. Furthermore, RL is an automatic learning approach [18] that interacts with the environment to adjust behavioral strategies through reward and punishment signals, as described in [19]. In particular, compared to traditional methods, where the system explored the simulated environment through trial and error, RL enables the direct acquisition of the optimal control strategy through interaction with the environment, without the need for a preexisting hydrodynamic model. It allows for real-time adaptation of the strategy through online learning, effectively handling time-varying disturbances and parameter uncertainties, and thereby enhancing positioning accuracy, energy efficiency, and resilience to interference [20]. Despite the fact that RL has been widely applied in various fields, it is largely untapped in DP. Through its adaptive learning ability, RL can optimize control strategies in real time in a dynamically changing environment, and it is particularly suitable for addressing challenges such as hydrodynamic uncertainty and external disturbances. The training efficiency and stability of the algorithm have been improved, and the dynamic response and disturbance suppression ability of the system have also been enhanced. By approximating solutions to the Hamilton–Jacobi–Bellman (HJB) equation, RL can address the nonlinearities inherent in DP systems [21], which offers a promising solution to their control challenges. RL provides a solution for DP systems that goes beyond traditional methods through model-free learning, online adaptation, complex disturbance suppression capabilities, and reduced training data requirements. Its unique advantages in scenarios of model uncertainty, time-varying disturbances, and multi-objective optimization make it an ideal choice for DP control in a highly dynamic marine environment.

Motivated by the above observations, this paper proposes an enhancing robust adaptive DP control algorithm for fully actuated surface vehicles by employing the A-C RL mechanism and dynamic surface control technique. The major contributions of this article are twofold:

(1): Compared with conventional neural networks [13], the core innovation lies in a RL framework where actor–critic mechanisms are presented for the DP system: the actor network explicitly estimates multi-source model uncertainties, while the critic network dynamically optimizes the coupled interactions between positioning accuracy and environmental disturbances via a value function;
(2): By constructing a gain-related adaptive law, a high-efficiency thrust allocation algorithm of the multi-actuators of the fully actuated vessels is proposed, where the complexity calculation can be avoided by employing the dynamic surface control technique. The proposed methodology significantly enhances operational autonomy while maintaining compatibility with standard marine control hardware platforms.

2. Problem for Formulation and Preliminaries

2.1. Preliminaries

Throughout this paper,

diag {y_{1}, \dots, y_{n}}

denotes the main diagonal matrix, with the diagonal elements being

y_{1}, \dots, y_{n}

. For a given matrix

X = [x_{i, j}] \in ℝ^{m \times n}

where

ℝ^{m \times n}

represents the set of all

m \times n

matrices consisting of real numbers,

m

and

n

represent the number of rows and columns of the matrix, respectively,

{‖X‖}_{F}^{2} = tr \{X^{T} X\} = \sum_{i = 1}^{m} \sum_{j = 1}^{n} x_{i, j}^{2}

.

|\cdot|

indicates the matrices of the absolute operator of a scalar.

‖\cdot‖

is the Euclidean norm of a vector, and

{‖\cdot‖}_{F}

denotes the Frobenius norm and usually refers to the square of the Frobenius norm of the matrix, which is the sum of the squares of all elements.

tr \{X\}

is the sum of the main diagonal elements of

X

. In mathematics, the trace of a matrix is mathematically equivalent to the square of the Frobenius norm.

(\cdot)

represents a certain variable or parameter; here, the “·” represents a placeholder indicating the name of a specific variable or parameter.

\hat{(\cdot)}

is the estimate of

(\cdot)

. The estimation error

\tilde{(\cdot)} = \hat{(\cdot)} - (\cdot)

.

2.2. Dynamic Model of Marine Vessels

The purpose of the DP of vessels at sea is to accurately control the vessel propulsion system, which can adjust the position of the vessel in real time to meet the high stability requirements of offshore operations and ensure the accuracy and stability of the working location. According to seakeeping and maneuvering theory, the model of the marine vessel for 3 degrees of freedom (DOFs) can be expressed as

\dot{η} = R (ψ) μ

(1)

M \dot{μ} + N (μ) μ = τ + τ_{w}

(2)

τ = T (β) κ (n) μ_{p}

(3)

where

η = {[x, y, ψ]}^{T} \in ℝ^{3}

represents the position and heading angle vector in the geodetic coordinate system,

μ = {[u, v, r]}^{T} \in ℝ^{3}

represents the velocity and rotational angular velocity vector in the hull coordinate system, and the rotational matrix between body-fixed and earth-fixed coordinate frames is expressed as Equation (4). Note that

R^{T} (ψ) = R^{- 1} (ψ)

,

‖R‖ = 1

.

R (ψ) = [\begin{matrix} \cos (ψ) & \sin (ψ) & 0 \\ - \sin (ψ) & \cos (ψ) & 0 \\ 0 & 0 & 1 \end{matrix}]

(4)

In Equation (2),

M \in ℝ^{3 \times 3}

is the inertial mass matrix, and

N (μ) μ

are a nonlinear fluid mechanics function, which includes Coriolis forces and centripetal forces (moments), as well as nonlinear damping forces.

τ = {[τ_{u}, τ_{v}, τ_{r}]}^{T}

represents the controlling force and torque provided by the driving device, and in Equation (1) their detailed distribution models are given.

τ_{w} = {[τ_{w u}, τ_{w v}, τ_{w r}]}^{T}

is a bounded disturbance caused by the disturbance of wind, wave, and current environment.

M = [\begin{matrix} m - X_{\dot{u}} & 0 & 0 \\ 0 & m - Y_{\dot{v}} & m x_{G} - Y_{\dot{r}} \\ 0 & m x_{G} - Y_{\dot{r}} & I_{z} - N_{\dot{r}} \end{matrix}]

(5)

N (μ) μ = [\begin{matrix} - X_{u} - X_{|u| u} |u| u + Y_{\dot{v}} v |r| + Y_{\dot{r}} r r \\ - X_{\dot{u}} u r - Y_{v} v - Y_{r} r - Y_{|v| v} |v| v - Y_{|v| r} |v| r \\ (X_{\dot{u}} - Y_{\dot{v}}) u v - Y_{\dot{r}} u r - N_{v} v - N_{r} r - N_{|v| v} |v| v - N_{|v| r} |v| r \end{matrix}]

(6)

I_{z}

represents the moment of inertia along the body-fixed

z_{b}

axis.

Y_{\dot{v}}

,

Y_{\dot{r}}

, and

N_{\dot{r}}

characterize added mass and inertia moment. Around

N (μ) μ, X_{u}, X_{|u| u}, Y_{\dot{v}}

, etc., are all hydrodynamic force derivatives.

In Equation (3),

T (β) \in ℝ^{3 \times q}

is the thrust allocation matrix.

q

is the number of collocated thrusters.

β

is the angle of the equivalent.

κ (n) = diag \{κ_{1} (n_{1}), \dots, κ_{q} (n_{q})\} \in ℝ^{q \times q}

, is the unknown coefficient moment matrix that depends on the propeller rotation speed

n_{i}

, where

i = 1, \dots, q

. and

κ_{i} (n_{i})

is the elements of the

κ (n)

. In the actual use environment, the change of wake flow and the influence of hull movement make the performance of the propeller more difficult to predict and quantify.

μ_{p} = {[|p_{1}| p_{1}, \dots, |p_{q}| p_{q}]}^{T}

is the actual controllable input, being the pitch ratio of thrusters

p_{i} \in [- 1, 1]

, where

i = 1, \dots, q

.

Assumption 1.

As the mass matrix,

M

is positive, definite, and invertible. Due to the inherent port–starboard symmetry and approximate fore–aft symmetry of surface vessels, this assumption is typically satisfied.

Assumption 2.

In Equation (2),

τ_{w}

represents a bounded disturbance, i.e., there exists a positive constant vector

τ_{w} = {[τ_{w u}, τ_{w v}, τ_{w r}]}^{T} > 0

. Note that

τ_{w}

is an unknown vector solely for the purposes of analysis.

Lemma 1.

In practical engineering, the thrust generated by a propulsor is always limited. Therefore, for a given vessel, the force coefficient of the propulsor is a constant that must satisfy the condition

0 < \underline{κ_{i}} \leq κ_{i} (n_{i}) \leq \bar{κ_{i}}

, where

i = 1, \dots, q

. Both

\underline{κ_{i}}

and

\bar{κ_{i}}

are unknown and are used solely for stability analysis.

2.3. NN Function Approximation

In a control system, for any given nonlinear continuous function

f (x)

satisfying the initial condition

f (0) = 0

, the NN serves as an efficient approximator to model

f (x)

[22]. From [23,24,25], the

f (x)

is expressed approximately as

f (x) = W^{* T} ϕ (x) + ε (x)

(7)

where the matrix

W^{* T}

represents the ideal weight matrix;

ε (x)

denotes the approximation error; and

ϕ (x)

represents the basis function vector, described by

ϕ_{i} (x) = \frac{1}{\sqrt{2 π} ζ_{i}} \exp (- \frac{{(x - ο_{r i})}^{T} (x - ο_{r i})}{ζ_{i}^{2}}), i = 1, \dots, l

(8)

with

ο_{r i} \in ℝ^{n}

representing the center vector and

ζ_{i}

denoting the width of the Gaussian function.

3. Robust Adaptive Neural Cooperative Controller

In this section, an RL approach based on the actor–critic framework is proposed to develop an adaptive control method [26] for the DP system of vessels. This method is capable of adapting to complex and dynamic marine environmental changes through continuous learning and the optimization of control policies. Under Assumptions 1 and 2, a robust neural adaptive control scheme is designed for the DP system, as described by Equations (1) and (2), utilizing dynamic surface control (DSC) techniques. The schematic diagram and detailed block diagram of the proposed DP system is illustrated in Figure 1 and Figure 2, providing a comprehensive overview of the system architecture.

3.1. Control Design

Let

η_{d}

be the target position; the entire control synthesis consists of the following several steps. Define the position error vector as

η_{e} = η - η_{d}

(9)

Taking the time derivative of Equation (9), one has

{\dot{η}}_{e} = R (ψ) μ - {\dot{η}}_{d}

(10)

Based on the position error dynamics Equation (10), the immediate control

α_{μ} \in ℝ^{3}

can be directly selected as

α_{μ} = R^{- 1} (ψ) (- K_{η} η_{e} + {\dot{η}}_{d})

(11)

where

K_{η}

is a positive definite diagonal matrix. Obviously, the differential expression of

α_{μ}

is very complicated and difficult to solve. In order to solve this problem, DSC technology [14,27] is applied, and the first-order filter is introduced in Equation (12) with the time constant matrix

t_{μ} = diag \{t_{u}, t_{v}, t_{r}\}

.

α_{μ} - β_{μ} = t_{μ} {\dot{β}}_{μ}, α_{μ} (0) = β_{μ} (0)

(12)

The vector

β_{μ}

as the output is the reference signal for the velocity vector

μ

. Meanwhile, ne can define the error vector as

q_{μ} = α_{μ} - β_{μ}

(13)

and the velocity vector as

μ_{e} = μ - β_{μ}

(14)

The derivative of

q_{μ}

is obtained by Equations (13) and (14)

\begin{array}{l} {\dot{q}}_{μ} & = {\dot{α}}_{μ} - {\dot{β}}_{μ} \\ = Ω_{μ} (ψ, r, η_{e}, {\dot{η}}_{e}) - t_{μ}^{- 1} q_{μ} \end{array}

(15)

where

Ω_{μ} (\cdot) = {[Ω_{u} (\cdot), Ω_{v} (\cdot), Ω_{r} (\cdot)]}^{T}

is a vector with its elements being bounded continuous functions. The position error derivative expression is further obtained from Equations (10) and (15) as

{\dot{η}}_{e} = R (ψ) (α_{μ} - q_{μ} + μ_{e}) - {\dot{η}}_{d}

(16)

The primary control variable

α_{μ p} = {[α_{u p}, α_{v p}, α_{r p}]}^{T}

in the proposed design is chosen as Equation (22) for the desired thrusts term

T (β) κ (n) μ_{p}

.

Due to system complexity and dynamic environmental/operational conditions, the servo system governing propeller thrust faces substantial uncertainties, which include load variations, environmental disturbances, and fluctuations in system parameters [25], inducing variations in the thrust coefficient and compromising the stability of the propulsion system [28]. To address these challenges, a robust adaptive control strategy is proposed for adjusting controller parameters in real time to adapt to changes in propeller dynamics or the vessel’s state [29]. The estimation of

1 / κ_{i} (n_{i})

is accomplished using

{\hat{λ}}_{k}

[30]. The control law, expressed as

p = {[p_{1}, \dots, p_{q}]}^{T}

, is derived based on Equations (17) and (18), with the corresponding adaptive law detailed in Equation (19). This strategy ensures enhanced stability and operational efficiency under varying conditions.

p = sgn (μ_{p}) * \sqrt{|μ_{p}|}

(17)

μ_{p} = diag \{{\hat{λ}}_{1}, \dots, {\hat{λ}}_{q}\} T^{†} (\cdot) α_{μ p}

(18)

{\dot{\hat{λ}}}_{k} = χ_{k} (\sum_{i = u, v, r} \sum_{j = u, v, r} T_{k, j}^{†} T_{i, k} i_{e} α_{j p}) - χ_{k} σ k ({\hat{λ}}_{k} - λ_{k} (0)), k = 1, \dots, q

(19)

α_{μ p} = - K_{μ} μ_{e} + {\hat{W}}_{a}^{T} ϕ_{a} + {\dot{β}}_{μ} - R (ψ) η_{e}

(20)

where

K_{μ}, χ_{k} = {[χ_{1}, \dots, χ_{q}]}^{T}

and

σ_{k} = {[σ_{1}, \dots, σ_{q}]}^{T}

are the positive parameters; especially,

σ_{k} ({\hat{λ}}_{k} - {\hat{λ}}_{k} (0))

could protect the estimation

{\hat{λ}}_{k}

from drifting divergence.

T^{†}

denotes the pseudo inverse of

T

. According to the adaptive law Equation (20), the gain correlation coefficient is added to compensate for the influence of model uncertainty and interference [31].

3.2. Critic and Actor NN Design

This section introduces the RL [32] utilizing the critic and actor method to control the DP of vessels [33,34]. At the same time, in order to construct the Bellman error of nonlinear systems, the long-term strategy performance index function [35] is defined as follows:

J_{c} (t) = \int_{t}^{\infty} γ^{\frac{t - x}{ω}} h (η_{e}) d x

(21)

where

ω > 0

is integral reinforcement interval, and

γ

is the discount rate that reduces the weight of this cost incurred further in the future. Based on the delay characteristics of the system, a smaller value (such as 0.9) was selected through the debugging of parameters in order to pay more attention to the long-term return.

J_{c} (t) = {[J_{c 1} (t), J_{c 2} (t), J_{c 3} (t)]}^{T}

, which integrates future information that is not known at the current time related to the position error vector.

h_{i} = \{\begin{matrix} 0, |η_{e i}| \leq C_{i} \\ 1, |η_{e i}| \geq C_{i} \end{matrix}, i = 1, 2, 3

(22)

where

C_{i}

denotes a small positive threshold associated with tracking accuracy, typically set as 0.2, and

h_{i}

and

η_{e i}

express the

i^{th}

parameters of

h (η_{e})

and

η_{e}

, respectively. Under the current control strategy,

h_{i} = 1

indicates a decrease in tracking performance and an increase in long-term performance indicators, while

h_{i} = 0

indicates the opposite, which reflects a significant tracking error under the current control strategy.

At the same time, the long-term strategy performance index function at time

t - ω

is defined as follows:

h_{c} (t) = {[h_{c 1} (t), h_{c 2} (t), h_{c 3} (t)]}^{T}

,

h_{c} (t) = \int_{t - ω}^{t} γ^{\frac{t - x}{ω}} h (η_{e}) d x

.

However, for the highly nonlinear and coupled system described in Equation (3), the long-term utility function

J_{c} (t)

incorporates future information that is unavailable at the current time, making it challenging to solve directly, even for linear systems. Only a limited class of nonlinear systems with specific functional designs and appropriate parameters can achieve the explicit evolution of

J_{c} (t)

.

J_{c} (t)

is particularly difficult to solve. To address this issue, the critic NN, which is utilized as the approximation, is expressed as

J_{c} (t) \approx W_{c}^{* T} Δ ϕ_{c} (t)

where

W_{c}^{*}

is the ideal weight matrix of the critic NN,

η_{e}

is the input vector of the critic NN, and

ϕ_{c}

is the given critic NN vector. The actor NN operates based on the control law, driven by the critic NN’s evaluation of control performance [36].

From Equation (7), the actor NN

f (μ) = W_{a}^{*} ϕ_{a} (μ) + ε_{a} (μ)

is designed. The update mechanism aims to maintain closed-loop stability and optimize the performance index

J_{c} (t)

. However, since

W_{c}^{*}

W_{a}^{*}

is unknown,

{\hat{W}}_{c}

and

{\hat{W}}_{a}

are utilized to approximate

J_{c} (t)

and

f (μ)

in real time, respectively.

As outlined in [37], the strategic utility function is constructed as

ϖ_{a} (t) = {\hat{W}}_{a} ϕ_{a} (μ) + {\hat{J}}_{c} (μ) + μ_{e}

, where

μ_{e}

is the input vector of the actor NN system at time

t

,

{\hat{W}}_{a}

represents the ideal weight matrix of the actor NN, and

ε_{a} (μ)

is an NN approximation error. Meanwhile, the temporal difference error

ϖ_{c}

is defined as

ϖ_{c} (t) = {\hat{W}}_{c}^{T} Δ ϕ_{c} (t) + h_{c}

, where

Δ ϕ_{c} = ϕ_{c} (η_{e} (t)) - γ ϕ_{c} (η_{e} (t - ω))

. Positive constants

K_{h_{c}}

,

K_{W_{c}}

,

K_{ϕ_{c}}

,

K_{ε_{c}}

,

K_{W_{a}}

,

K_{ϕ_{a}}

and

K_{ε_{a}}

exist such that

{‖h_{c}‖}_{F} \leq K_{h_{c}}

,

{‖W_{c}^{*}‖}_{F} \leq K_{W_{c}}

,

{‖ϕ_{c} (η_{e})‖}_{F} \leq K_{ϕ_{c}}

,

{‖ε_{c} (η_{e})‖}_{F} \leq K_{ε_{c}}

,

{‖Δ ϕ_{c}‖}_{F} \leq (1 + γ) K_{ϕ_{c}}

,

{‖W_{a}^{*}‖}_{F} \leq K_{W_{a}}

,

{‖ϕ_{a} (μ)‖}_{F} \leq K_{ϕ_{a}}

,

{‖ε_{a} (μ)‖}_{F} \leq K_{ε_{a}}

. Using a gradient descent algorithm, the adaptive control rate of critic and actor NN part are respectively defined as

{\dot{\hat{W}}}_{c j} = - γ_{c} \frac{\partial (ϖ_{c}^{T} ϖ_{c})}{{\hat{W}}_{c j}} = - γ_{c} Δ ϕ_{c} (t) {[{\hat{W}}_{c j}^{T} Δ ϕ_{c} (t) ρ_{j} + h_{c k}]}^{T}

(23)

{\dot{\hat{W}}}_{a i} = - γ_{a} \frac{\partial (ϖ_{a}^{T} ϖ_{a})}{{\hat{W}}_{a i}} = - γ_{a} ϕ_{a} (ϕ_{a}^{T} {\hat{W}}_{a i} ρ_{i} + μ_{e}^{T} + ϕ_{c}^{T} {\hat{W}}_{c j} ρ_{j})

(24)

where

ρ_{i}

,

ρ_{j}

are gain coefficients,

{\hat{W}}_{c} = {[{\hat{W}}_{c x}, {\hat{W}}_{c y}, {\hat{W}}_{c ψ}]}^{T}

,

{\hat{W}}_{a} = {[{\hat{W}}_{a u}, {\hat{W}}_{a v}, {\hat{W}}_{a r}]}^{T}

,

k = 1, 2, 3

,

i = u, v, r

, and

j = x, y, ψ

. By repeatedly adjusting the size of the gain coefficient in the simulation, the stability and convergence during the training process are guaranteed.

To prevent one network form dominating while the other stagnates, the mitigation of uncoordinated updates between actor and critic NN is solved by the inclusion of gain coefficients.

γ_{c} = diag (γ_{c x}, γ_{c y}, γ_{c ψ}) > 0

and

γ_{a} = diag (γ_{a u}, γ_{a v}, γ_{a r}) > 0

represent the self-defined learning rate matrix. Then, one can define the weight error as

{\tilde{W}}_{c} = {\hat{W}}_{c} - W_{c}^{*}

and

{\tilde{W}}_{a} = {\hat{W}}_{a} - W_{a}^{*}

. Neural networks play a crucial role in the training and performance of networks. To accelerate the training, adjust the network architecture to meet the requirements and ensure that in the actor–critic framework, the critic network can effectively estimate the value function, while the actor network can stably learn the optimal strategy.

Remark 1.

NNs are widely recognized as effective tools for the adaptive approximation of complex unknowns, such as unmodeled dynamics and uncertainties, within intelligent learning-based control frameworks. RL techniques, particularly those employing actor–critic architectures, are designed to optimize long-term rewards through a continuous interaction with online control performance. This performance is inherently influenced by both internal control strategies and external disturbances, which collectively shape the system’s behavior and adaptability. Through multiple repeated experiments, the parameters are evaluated, and the balance of each parameter is adjusted. Due to the complex design of the dynamic adjustment strategy, the hyperparameters need to be adjusted multiple times. Relying on empirical adjustment and repeated trial and error, the values of the parameters are finally determined in Equation (39).

3.3. Stability Analysis

Theorem 1.

Consider a closed-loop system composed of vessel dynamics, a robust neural control law, an actor–critic NN, and adaptive laws. Assume that the initial conditions are satisfied:

η_{e}^{T} (0) η_{e} (0) + μ_{e}^{T} (0) μ_{e} (0) + q_{e}^{T} (0) q_{e} (0) + {\tilde{W}}_{a}^{T} (0) γ_{a}^{- 1} {\tilde{W}}_{a} (0) + {\tilde{W}}_{c}^{T} (0) γ_{c}^{- 1} {\tilde{W}}_{c} (0) \leq 0

and there exist adjustable parameters

K_{η}, K_{μ}, t_{μ}, λ_{k}, χ_{k}, γ_{c}, γ_{a}

. Then, the closed-loop system consisting of the vessel model Equations (1) and (2), the control law Equations (11) and (20), and the adaptive law Equation (19) is stable. The proposed adaptive NN controller, integrating RL, ensures the following closed-loop properties for all initial conditions:

(1): Semi-Global Uniform Ultimate Boundedness: The state variables of the closed-loop system are semi-globally uniformly ultimately bounded;
(2): Tightly Concentrated Errors: The tracking position error $η_{e}$ , tracking velocity error $μ_{e}$ , and the NN weight errors ${\tilde{W}}_{c}$ and ${\tilde{W}}_{c}$ are kept within a tight region;
(3): The optimal control strategy in RL is determined by approximating the ideal unknown weights. An estimator of actor and critic weights is designed to allow the NN to update recursively in parallel.

The proof is as follows: According to the characteristics of vessel DP and control design, the following Lyapunov function is used:

V = \frac{1}{2} η_{e}^{T} η_{e} + \frac{1}{2} μ_{e}^{T} M μ_{e} + \frac{1}{2} q_{e}^{T} q_{e} + \frac{1}{2} \sum_{k = 1}^{q} \frac{κ (n_{k}) {\tilde{λ}}_{k}^{2}}{χ_{k}} + \frac{1}{2} tr (γ_{a}^{- 1} {\tilde{W}}_{a}^{T} {\tilde{W}}_{a}) + \frac{1}{2} tr (γ_{c}^{- 1} {\tilde{W}}_{c}^{T} {\tilde{W}}_{c})

(25)

The time derivative of Equation (25) is

\dot{V} = η_{e}^{T} {\dot{η}}_{e} + μ_{e}^{T} M {\dot{μ}}_{e} + q_{e}^{T} {\dot{q}}_{e} + \sum_{k = 1}^{q} \frac{κ (n_{k}) {\tilde{λ}}_{k} {\dot{\hat{λ}}}_{k}}{χ_{k}} + tr (γ_{a}^{- 1} {\tilde{W}}_{a}^{T} \dot{\hat{W_{a}}}) + tr (γ_{c}^{- 1} {\tilde{W}}_{c}^{T} {\dot{\hat{W}}}_{c})

(26)

Similar to the former, we further have

\begin{array}{l} η_{e}^{T} {\dot{η}}_{e} & = η_{e}^{T} (R (ψ) μ - {\dot{η}}_{d}) \\ \leq - K_{η} η_{e}^{T} η_{e} + η_{e}^{T} R (ψ) μ_{e} + {‖η_{e}‖}_{F}^{2} + \frac{1}{4} {‖q_{e}‖}_{F}^{2} \end{array}

(27)

According to Equations (4) and (14) and Young’s inequality,

μ_{e}^{T} M {\dot{μ}}_{e} = μ_{e}^{T} [τ + τ_{ω} - N (μ) μ] - μ_{e}^{T} q_{e} t_{μ}^{- 1} M

(28)

where the

N (μ) μ - τ_{ω}

term is treated with an

f (μ) = {\hat{W}}_{a}^{T} ϕ_{a} (μ) + ε_{a} (μ)

approximation. And according to Equation (28),

\begin{array}{l} μ_{e}^{T} M {\dot{μ}}_{e} & = μ_{e}^{T} [τ - {\hat{W}}_{a}^{T} ϕ_{a} (μ) - ε_{a} (μ) - q_{e} t_{μ}^{- 1} M] \\ = - μ_{e}^{T} [K_{μ} μ_{e} + ε_{a} (μ) - T κ (\cdot) {\tilde{λ}}_{k} T^{†} α_{μ p}] \\ \leq - (K_{μ} - \frac{1}{4} I) μ_{e}^{T} μ_{e} + μ_{e}^{T} T κ (\cdot) {\tilde{λ}}_{k} T^{†} α_{μ p} + {‖{\bar{ε}}_{a} (μ)‖}_{F}^{2} \end{array}

(29)

Moreover, we also have

\begin{array}{l} q_{μ}^{T} {\dot{q}}_{μ} & = q_{μ}^{T} ({\dot{α}}_{μ} - {\dot{β}}_{μ}) \\ \leq - q_{μ}^{T} t_{μ}^{- 1} q_{μ} + \sum_{i = u, v, r} \frac{K_{Ω_{i}}^{2} q_{i}^{2}}{2 b} + \frac{3}{2} \end{array}

(30)

Through the designed adaptive control law, Equation (19) is further obtained by

\sum_{k = 1}^{q} \frac{κ (n_{k}) {\tilde{λ}}_{k} {\dot{\hat{λ}}}_{k}}{χ_{k}} = \sum_{k = 1}^{q} κ (\cdot) {\tilde{λ}}_{k} [\sum_{i = u, v, r} \sum_{j = u, v, r} T_{k, j}^{†} T_{i, k} i_{e} α_{j p} - σ_{k} ({\hat{λ}}_{k} - {\dot{\hat{λ}}}_{k} (0))]

(31)

The first term cancels out the one of

T κ (\cdot) {\tilde{λ}}_{k} T^{†} α_{μ p} μ_{e}

, and the second term is determined by Young’s inequality:

T κ (\cdot) {\tilde{λ}}_{k} T^{†} α_{μ p} μ_{e} + \sum_{k = 1}^{q} \frac{κ (n_{k}) {\tilde{λ}}_{k} {\dot{\hat{λ}}}_{k}}{χ_{k}} \leq - \frac{1}{2} \sum_{k = 1}^{q} κ (n_{k}) σ_{k} \{{\tilde{λ}}_{k}^{2} - [{\hat{λ}}_{k} - {\hat{λ}}_{k} (0)]\}

(32)

Finally, according to Equation (24), the error of RL weight is analyzed as

tr (γ_{a}^{- 1} {\tilde{W}}_{a}^{T} {\dot{\hat{W}}}_{a}) \leq - \frac{1}{4} K_{ϕ_{a}}^{2} tr ({\tilde{W}}_{a}^{T} {\tilde{W}}_{a}) + μ_{e}^{T} μ_{e} + K_{ϕ_{c}}^{2} tr ({\tilde{W}}_{c}^{T} {\tilde{W}}_{c}) + K_{ϕ_{a}}^{2} K_{W_{a}}^{2} + K_{ϕ_{c}}^{2} K_{W_{a}}^{2}

(33)

tr (γ_{c}^{- 1} {\tilde{W}}_{c}^{T} {\dot{\hat{W}}}_{c}) \leq - \frac{1}{2} {(1 + γ)}^{2} K_{ϕ_{c}}^{2} tr ({\tilde{W}}_{c}^{T} {\tilde{W}}_{c}) + \frac{1}{2} K_{h_{c}}^{2} + {(1 + γ)}^{2} K_{ϕ_{c}}^{2} K_{W_{c}}^{2}

(34)

To sum up, the final form is obtained by combining from Equation (27) to Equation (34):

\begin{array}{l} \dot{V} & \leq - (K_{η} - I) η_{e}^{T} η_{e} - (K_{μ} - \frac{5}{4} I) μ_{e}^{T} μ_{e} - \sum_{i = u, v, r} (t_{μ}^{- 1} - \frac{1}{4} - \frac{K_{Ω_{i}}^{2}}{2 b}) q_{i}^{2} \\ - \frac{1}{2} \sum_{k = 1}^{q} κ (n_{k}) σ_{k} {\tilde{λ}}_{k}^{2} - \frac{1}{4} K_{ϕ_{c}}^{2} [{(1 + γ)}^{2} - 2] tr ({\tilde{W}}_{c}^{T} {\tilde{W}}_{c}) - \frac{1}{4} K_{ϕ_{a}}^{2} tr ({\tilde{W}}_{a}^{T} {\tilde{W}}_{a}) \\ + \frac{3}{2} b + {\bar{ε}}_{a}^{2} + \frac{1}{2} {\sum_{k = 1}^{q} κ (n_{k}) σ_{k} [{\hat{λ}}_{k} - {\hat{λ}}_{k} (0)]}^{2} + \frac{1}{2} K_{h_{c}}^{2} + K_{ϕ_{a}}^{2} K_{W_{a}}^{2} + K_{ϕ_{c}}^{2} K_{W_{a}}^{2} + {(1 + γ)}^{2} K_{ϕ_{c}}^{2} K_{W_{c}}^{2} \\ \leq - (λ_{\min} \{K_{η}\} - 1) η_{e}^{T} η_{e} - (λ_{\min} \{K_{μ}\} - \frac{5}{4}) μ_{e}^{T} μ_{e} - \sum_{i = u, v, r} (t_{μ}^{- 1} - \frac{1}{4} - \frac{K_{Ω_{i}}^{2}}{2 b}) q_{i}^{2} \\ - \frac{1}{2} \sum_{k = 1}^{q} κ (n_{k}) σ_{k} {\tilde{λ}}_{k}^{2} - \frac{1}{4} K_{ϕ_{c}}^{2} [{(1 + γ)}^{2} - 2] tr ({\tilde{W}}_{c}^{T} {\tilde{W}}_{c}) - \frac{1}{4} K_{ϕ_{a}}^{2} tr ({\tilde{W}}_{a}^{T} {\tilde{W}}_{a}) + ϱ \end{array}

(35)

The design parameters are appropriately selected, satisfying

\begin{array}{l} a_{1} \leq λ_{\min} \{K_{η}\} - 1, a_{2} \leq λ_{\min} \{K_{μ}\} - \frac{5}{4}, \\ a_{3} \leq \sum_{i = u, v, r} (t_{μ}^{- 1} - \frac{1}{4} - \frac{K_{Ω_{i}}^{2}}{2 b}), \\ a_{4} \leq \frac{1}{2} K_{ϕ_{c}}^{2} [{(1 + γ)}^{2} - 2], a_{5} \leq \frac{1}{4} K_{ϕ_{a}}^{2} \end{array}

(36)

where

\begin{array}{l} a & = \min \{a_{1}, a_{2}, a_{3}, a_{4}, a_{5}\} \\ ϱ & = \frac{3}{2} b + {\bar{ε}}_{a}^{2} + \frac{1}{2} {\sum_{k = 1}^{q} κ (n_{k}) σ_{k} [{\hat{λ}}_{k} - {\hat{λ}}_{k} (0)]}^{2} \\ + \frac{1}{2} K_{h_{c}}^{2} + K_{ϕ_{a}}^{2} K_{W_{a}}^{2} + K_{ϕ_{c}}^{2} K_{W_{a}}^{2} + {(1 + γ)}^{2} K_{ϕ_{c}}^{2} K_{W_{c}}^{2} \end{array}

(37)

By adjusting the following parameter matrix, all tracking errors of the closed-loop control system are converged to a compact set. Further, by adjusting the parameter

K_{η}, K_{μ}, t_{μ}^{- 1}, γ

, the following conditions can be obtained. Integrating two sides of Equation (35), one can obtain

V (t) \leq ϱ / 2 a + (V (0) - ϱ / 2 a) \exp (- 2 a t)

. The

V (t)

can converge into

ϱ / 2 a

with

t \to \infty

, and the bounded variable

ϱ

can be small enough through turning the control parameters appropriately. Therefore, all the state variables in the closed-loop control system are SGUUB.

4. Numerical Simulation

In this section, two representative case studies are compared to illustrate the effectiveness and merits of the control scheme, i.e., a comparative experiment with the result in [38], and in order to verify the DP adaptive control method based on integral RL, MATLAB 2024a is used as the simulation platform to conduct experiments on the marine environment model. The DP system model used in the experiment includes a thruster, servo system, control system, and external disturbance model. And the environmental disturbances are as follows: the wind is at level 6, the number of wave frequency segments is 10, and the wave velocity is

10 \times 1852 / 3600

. The nominal physical parameters [39] of the vessels are as follows in Equation (38):

\begin{array}{l} X_{\dot{u}} = - 0.7212 \times 10^{6}, Y_{\dot{v}} = - 3.6921 \times 10^{6} \\ Y_{\dot{r}} = - 1.0234 \times 10^{6}, I_{z} - N_{\dot{r}} = 3.7454 \times 10^{9} \\ X_{u} = 5.0242 \times 10^{4}, Y_{v} = 2.7229 \times 10^{5} \\ Y_{r} = - 4.3933 \times 10^{6}, Y_{|v| v} = 1.7860 \times 10^{4} \\ X_{|u| u} = 1.0179 \times 10^{3}, Y_{|v| r} = - 3.0068 \times 10^{5} \\ N_{v} = - 4.3821 \times 10^{6}, N_{r} = 4.1894 \times 10^{8} \\ N_{|v| v} = - 2.4684 \times 10^{5}, N_{|v| r} = 6.5759 \times 10^{6} \end{array}

(38)

With the initial states

η (0) = {[10 m, 10 m, 140 \deg]}^{T}

and

μ (0) = {[10 m / s, 10 m / s, 140 \deg / s]}^{T}

, the desired position and heading are chosen as

η_{d} = {[0 m, 0 m, 156 \deg]}^{T}

. The design parameters utilized for the model are chosen as in Equation (39):

\begin{array}{l} K_{η} = diag \{0.4, 0.4, 0.2\}, K_{μ} diag \{0.4, 0.4, 0.2\}, \\ χ_{k} = [0.1, 0.1, 0.05, 0.05, 0.28, 0.32, 0.21], \\ σ_{k} = [3.2, 3.2, 4.1, 3.5, 1..8, 1.2, 3.1], \\ t_{μ} = diag \{0.01, 0.01, 0.01\}, \\ γ_{c} = diag \{0.01, 0.01, 0.01\}, \\ γ_{a} = diag \{0.01, 0.01, 0.01\}, \\ ρ_{x} = 230, ρ_{y} = 220, ρ_{ψ} = 200, \\ ρ_{u} = 66, ρ_{v} = 58, ρ_{r} = 59, \\ C_{x} = 0.3, C_{y} = 0.3, C_{ψ} = 0.3, γ = 0.9, ω = 8 \end{array}

(39)

The comparative experimental results are illustrated in Figure 3, Figure 4, Figure 5 and Figure 6. In Figure 3, with its trajectory exhibiting a narrower error range and smoother curve, the vessel’s position ultimately stabilizes within the desired attitude domain. As the simulation progresses, the system demonstrates a high accuracy, with the proposed algorithm outperforming in terms of control and positioning precision. The trajectory (the proposed curve) converges tightly within the desired error boundary. In contrast, the other curve shows persistent oscillations with a maximum deviation.

The comparison of specific position parameters

(x, y, ψ)

in Figure 4 reveals that both algorithms attain fast convergence within the target region. However, the algorithm from [38], exhibits significant fluctuations in attitude and velocity errors, whereas the proposed algorithm maintains smoother variations, achieving a precise targeting of the desired attitude. Figure 5 presents a remarkable demonstration of control robustness, with the proposed algorithm reducing speed variations by an order of magnitude compared to the oscillatory behavior characteristic of conventional methods [38]. Under substantial external disturbances, the proposed algorithm reduces the speed fluctuations of the vessel to the target point more stably. Based on the adaptive control algorithm’s ability, the outcome highlights the RL optimizing the system’s response to disturbance, reducing the necessity for abrupt adjustments in propeller output to accommodate environmental changes. During the initial stage, there is a significant transient overshoot in [38], while the proposed algorithm has an overshoot of no more than 0.1

m / s

. Safety improvement: The steady-state fluctuation range of

u

and

v

is reduced, indicating that the vessel is less susceptible to drift caused by wind and waves during positioning operations. Its low fluctuation and high-precision characteristics are particularly suitable for high-demand marine engineering scenarios. In the comparison in Figure 6, the proposed algorithm demonstrates smoother temporal variations in force and torque, minimizing sudden adjustments or fluctuations in control inputs. For DP systems requiring high precision and stability, the proposed method significantly enhances operational stability and control efficiency, thereby improving overall system performance metrics. The attenuation rate

a

determines the speed at which the system state tends to be stable. It can be seen from the theoretical analysis that the convergence effect is good. In Figure 3 and Figure 4, both the velocity vector and the position vector have converged to zero in the domain near 100 s, and the dynamic indicators such as the overshoot and the adjustment time perform well. RL and adaptive control methods are adopted to further improve the convergence speed and enhance the transient response characteristics of the system. This scheme ensures that the system reaches a stable process quickly and smoothly, reduces over-rush and slows oscillation to make the oscillation amplitude smaller, and improves the stability of the system.

Figure 7 depicts the pitch rate of the control propeller, in the simulation experiments, with all values falling within reasonable ranges. The two subplots represent the pitch ratio and the azimuth angle of the azimuth thruster, respectively [30]. The azimuth angle of the propeller directly affects the anti-drift ability of the vessel by controlling the thrust direction of the thruster. Its dynamic adjustment can align the thrust vector in reverse with environmental disturbances (such as wind loads), significantly enhancing the positioning stability. Multiple thrusters are combined through differentiated azimuth angles to form a cooperative thrust vector field, achieving high-precision torque control. This azimuth–pitch joint control mechanism is optimized in real time through an adaptive algorithm. It can not only adjust the thrust phase in advance, according to the environmental interference prediction model, to avoid overdraft but also dynamically reconstruct the thrust distribution when the thruster fails. Figure 8 illustrates significant variations in the long-term utility function. Over time, the control system progressively minimizes the heading angle error, rendering changes negligible over time, thus achieving the required performance and stability. The primary objective of both networks is to minimize the utility function, optimizing the control strategy and reducing the impact of wind and wave variations to ensure high-precision positioning.

Figure 9 presents the parameters of the robust adaptive control, where the incorporation of the RL network significantly smooths their fluctuations, reflecting improved control performance. Subsequently, both position and velocity vector fluctuations remain minimal, maintaining precise positioning and robustness against disturbances in complex environments. Notably, due to the larger parameter space required for policy learning, the weight norm of the actor network is slightly larger than that of the critic network, reflecting its higher structural complexity. In contrast, the critic, which estimates the value function, typically operates within a less complex action space, leading to a smaller weight norm. This observation underscores the distinct structural and training demands of the two networks during the learning process. Figure 10 and Figure 11 present the cost function trends and norm of the current weights for the two networks. With more fluctuations in the early learning phase, the critics cost quantifies the error in estimating the long-term utility. The norm of the actor weights updates and focuses on optimizing the strategy to enhance future expected rewards, showing greater variation due to uncertainty in the training environment. The primary objective of the actor–critic RL method in tuning the action–critic NNs is to minimize the cost function

V_{a} = ϖ_{a}^{T} ϖ_{a}

and

V_{c} = ϖ_{c}^{T} ϖ_{c}

, effectively tracking the target heading and maintaining small heading errors despite environmental disturbances. This method dynamically adjusts the control strategy based on historical data and environmental feedback, demonstrating a high accuracy in practical applications. The critic network evaluates the long-term value function and directs the actor to focus on actions that minimize cumulative future errors. The actor network learns a strategy that achieves a balance between tracking accuracy and energy efficiency. The proposed RL framework ensures adaptability to non-stationary disturbances, while avoiding the excessive punishment of control efforts. It is achieved through collaborative actor-driven strategy optimization and critic-guided value learning.

In Figure 12, in the area where the position and velocity error values rapidly decrease and then return to zero, even when the system is confronted with disturbances under harsh sea conditions, it can still rapidly converge and maintain stability and robustness.

Through the RL process, the agent learns the most energy-efficient control strategy while maintaining control precision, particularly in unstable sea conditions, where the system optimally manages propeller operations to avoid excessive adjustments. It could be heavily impacted by harsh sea conditions in [38], and its capacity to mitigate such disturbances was relatively weak. In contrast, the single parameter of the proposed algorithm can be finely adjusted, enabling the gradual stabilization of the critic network’s value estimation. With adequate training and sufficient interaction with the environment, the algorithm demonstrates improved performance.

5. Conclusions

Even though robust adaptive control boasts strong stability and the ability to handle uncertainties, when facing extreme conditions, especially sea condition changing rapidly, its judgement is slow, and its energy consumption is high. By combining RL with traditional robust control, it performs well in dynamic maritime environments, significantly reducing position and heading errors, while optimizing energy efficiency and ensuring real-time adaptability, which guarantees smaller error fluctuations and enhanced stability. Moreover, it addresses the problems of response delay and performance degradation in extreme conditions. This synergy improves control accuracy and stability, making it highly effective in challenging maritime operations.

Author Contributions

Conceptualization, J.L., W.H. and C.H.; methodology, W.H.; software, W.H.; validation, J.L. and C.H.; formal analysis, J.L.; investigation, J.L.; writing—original draft preparation, W.H.; writing—review and editing, J.L. and G.Z.; supervision, J.L.; project administration, G.Z.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

The paper is partially supported by the National Excellent Youth Science Fund of China (52322111), the National Natural Science Foundation of China (52171291), the Dalian Science and Technology Program for Distinguished Young Scholars (2022RJ07), and the Fundamental Research Funds for the Central Universities (3132023137, 3132023502). The authors would like to thank the anonymous reviewers for their valuable comments.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Zhai, M.; Wu, W.; Tsai, S. The Effects of Artificial Intelligence Orientation on Inefficient Investment: Firm-Level Evidence from China’s Energy Enterprises. Energy Econ. 2025, 141, 108048. [Google Scholar] [CrossRef]
Alagili, O.; Fernando, E.; Ahmed, S.; Imtiaz, S.; Murrant, K.; Gash, B.; Islam, M.; Zaman, H. Experimental Investigations of An Energy-Efficient Dynamic Positioning Controller for Different Sea Conditions. Ocean Eng. 2024, 299, 117297. [Google Scholar] [CrossRef]
Xie, W.; Liang, J.; Wang, Z.; Yang, J. Non-Monotonic Lyapunov Function Based Membership Function Dependent Robust Control of Takagi-Sugeno Fuzzy Systems. Eng. Appl. Artif. Intell. 2025, 152, 110785. [Google Scholar] [CrossRef]
Hatami, E.; Salarieh, H. Adaptive Critic-Based Neuro-Fuzzy Controller for Dynamic Position of Ships. Sci. Iran. 2015, 22, 272–280. [Google Scholar]
Wan, L.; Fu, L.; Li, C.; Li, K. Flexible Job Shop Scheduling via Deep Reinforcement Learning with Meta-Path-Based Heterogeneous Graph Neural Network. Knowl.-Based Syst. 2024, 296, 111940. [Google Scholar] [CrossRef]
Wang, D.; Zhao, H.; Zhang, L.; Chen, K. Learning to Dispatch for Flexible Job Shop Scheduling Based on Deep Reinforcement Learning via Graph Gated Channel Transformation. IEEE Access 2024, 12, 50935–50948. [Google Scholar]
Zhao, L.; Bai, Y. Unlocking the Ocean 6G: A Review of Path-Planning Techniques for Maritime Data Harvesting Assisted by Autonomous Marine Vehicles. J. Mar. Sci. Eng. 2024, 12, 126. [Google Scholar] [CrossRef]
Liu, Z.; Zhang, O.; Gao, Y.; Zhao, Y.; Sun, Y.; Liu, J. Adaptive Neural Network-Based Fixed-Time Control for Trajectory Tracking of Robotic Systems. IEEE Trans. Circuits Syst. II Express Briefs 2023, 70, 241–245. [Google Scholar] [CrossRef]
Li, J.; Zhang, G.; Cabecinhas, D.; Pascoal, A.; Zhang, W. Prescribed Performance Path Following Control of USVs via An Output-Based Threshold Rule. IEEE Trans. Veh. Technol. 2024, 73, 6171–6182. [Google Scholar] [CrossRef]
Ma, D.; Chen, X.; Ma, W.; Zheng, H.; Qu, F. Neural network Model Based Reinforcement Learning Control for AUV 3-D Path Following. IEEE Trans. Intell. Veh. 2014, 9, 893–904. [Google Scholar] [CrossRef]
Li, J.; Zhang, G.; Zhang, X.; Zhang, W. Integrating Dynamic Event-Triggered and Sensor-Tolerant Control: Application to USV-UAVs Cooperative Formation System for Maritime Parallel Search. IEEE Trans. Intell. Transp. Syst. 2025, 25, 3986–3998. [Google Scholar] [CrossRef]
Wang, Y.; Bai, W.; Zhang, W.; Chen, S.; Zhao, Y. Optimal Course Tracking Control of USV with Input Dead Zone Based on Adaptive Fuzzy Dynamic Programing. Proc. Inst. Mech. Eng. Part J. Syst. Control Eng. 2024. [CrossRef]
Zhang, G.; Sun, Z.; Li, J.; Huang, J.; Bin, Q. Iterative Learning Control for Path-following of ASV with the Ice Floes Auto-select Avoidance Mechanism. IEEE Trans. Intell. Transp. Syst. 2025; early access. [Google Scholar] [CrossRef]
Qu, C.; Cheng, L.; Ga Gao, S.; Huang, X. Experience Replay Enhances Excitation Condition of Neural-Network Adaptive Control Learning. J. Guid. Control Dyn. 2025, 48, 496–507. [Google Scholar] [CrossRef]
Ning, J.; Ma, Y.; Chen, C. Event-Triggered-Based Distributed Formation Cooperative Tracking Control of Under-Actuated Unmanned Surface Vehicles with Input and State Quantization. IEEE Trans. Intell. Transp. Syst. 2025, 26, 7081–7097. [Google Scholar] [CrossRef]
Gao, Y.; Su, S.; Zong, Y.; Zhang, L.; Guo, X. Adaptive Fuzzy Gain-Scheduling Robust Control for Stability of Quadrotors. Appl. Math. Model. 2025, 138, 115816. [Google Scholar] [CrossRef]
He, Y.; Liu, Y.; Yang, L.; Qu, X. Deep Adaptive Control: Deep Reinforcement Learning-Based Adaptive Vehicle Trajectory Control Algorithms for Different Risk Levels. IEEE Trans. Intell. Veh. 2024, 9, 1654–1666. [Google Scholar] [CrossRef]
Wang, S.; Li, J.; Jiao, Q.; Ma, F. Design Patterns of Deep Reinforcement Learning Models for Job Shop Scheduling Problems. J. Intell. Manuf. 2024. [CrossRef]
Yang, Y.; Geng, S.; Yue, D.; Gorbachev, S.; Korovin, I. Event-Triggered Approximately Optimized Formation Control of Multi-Agent Systems with Unknown Disturbances via Simplified Reinforcement Learning. Appl. Math. Comput. 2025, 489, 129149. [Google Scholar] [CrossRef]
Abreu, M.; Reis, L.; Lau, N. Addressing Imperfect Symmetry: A Novel Symmetry-Learning Actor-Critic Extension. Neurocomputing 2025, 614, 128771. [Google Scholar] [CrossRef]
Tagliaferri, F.; Viola, I. A Real-Time Strategy Decision Program for Sailing Yacht Races. Ocean Eng. 2017, 134, 129–139. [Google Scholar] [CrossRef]
Ning, J.; Wang, Y.; Chen, C.; Li, T. Neural Network Observer Based Adaptive Trajectory Tracking Control Strategy of Unmanned Surface Vehicle with Event-Triggered Mechanisms and Signal Quantization. IEEE Trans. Emerg. Top. Comput. Intell. 2025; early access. [Google Scholar] [CrossRef]
Zhang, G.; Yin, S.; Li, J.; Zhang, W.; Zhang, W. Game-Based Event-Triggered Control for Unmanned Surface Vehicle: Algorithm Design and Harbor Experiment. IEEE Trans. Cybern. 2025; early access. [Google Scholar] [CrossRef]
He, W.; Dong, Y.; Sun, C. Adaptive Neural Impedance Control of a Robotic Manipulator with Input Saturation. IEEE Trans. Syst. Man Cybern. Syst. 2016, 46, 334–344. [Google Scholar] [CrossRef]
Liu, Z.; Zhao, Y.; Zhang, O.; Chen, W.; Wang, J.; Gao, Y.; Liu, J. A Novel Faster Fixed-Time Adaptive Control for Robotic Systems with Input Saturation. IEEE Trans. Ind. Electron. 2024, 71, 5215–5223. [Google Scholar] [CrossRef]
Shen, H.; Wu, J.; Wang, Y.; Wang, J. Reinforcement Learning-Based Robust Tracking Control for Unknown Markov Jump Systems and Its Application. IEEE Trans. Circuits Syst. II-Express Briefs 2024, 71, 1211–1215. [Google Scholar] [CrossRef]
Li, H.; Zhang, T. Neural Adaptive Dynamic Event-Triggered Practical Fixed-Time Dynamic Surface Control for Non-Strict Feedback Nonlinear Systems. Int. J. Adapt. Control Signal Process. 2022, 36, 3066–3086. [Google Scholar] [CrossRef]
Liu, A.; Wang, D.; Qiao, J. An Advanced Robust Integral Reinforcement Learning Scheme with The Fuzzy Inference System. Int. J. Robust Nonlinear Control 2024, 34, 11745–11759. [Google Scholar] [CrossRef]
Chen, Y.; Ding, J.; Chen, Y.; Yan, D. Nonlinear Robust Adaptive Control of Universal Manipulators Based on Desired Trajectory. Appl. Sci. 2024, 15, 2219. [Google Scholar] [CrossRef]
Chwa, D. Tracking Control of Differential Drive Wheeled Mobile Robots Using a Backstepping Like Feedback Linearization. IEEE Trans. Syst. Man Cybern. Part A-Syst. Hum. 2010, 40, 1285–1295. [Google Scholar] [CrossRef]
Zhang, G.; Cai, Y.; Zhang, W. Robust Neural Control for Dynamic Positioning Ships with The Optimum-Seeking Guidance. IEEE Trans. Syst. Man Cybern. Syst. 2017, 47, 1500–1509. [Google Scholar] [CrossRef]
Zheng, L.; Fiez, T.; Alumbaugh, Z.; Chasnov, B.; Ratliff, L. Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms. IEEE Trans. Neural Netw. Learn. Syst. 2022, 36, 9217–9224. [Google Scholar] [CrossRef]
Zhang, G.; Li, Z.; Li, J.; Zhang, W.; Bin, Q. Prescribed Performance Path Following Control for Rotor-Assisted Vehicles via an Improved Reinforcement Learning Mechanism. IEEE Trans. Neural Netw. Learn. Syst. 2025; early access. [Google Scholar] [CrossRef]
Xu, B.; Yang, C.; Shi, Z. Reinforcement Learning Output Feedback NN Control Using Deterministic Learning Technique. IEEE Trans. N Neural Netw. Learn. Syst. 2014, 25, 635–641. [Google Scholar]
Qin, C.; Zhu, T.; Jiang, K.; Wu, Y. Integral Reinforcement Learning-Based Dynamic Event-Triggered Safety Control for Multiplayer Stackelberg-Nash Games with Time-Varying State Constraints. Eng. Appl. Artif. Intell. 2024, 133, 108317. [Google Scholar] [CrossRef]
Zheng, Z.; Ruan, L.; Zhu, M.; Guo, X. Reinforcement Learning Control for Underactuated Surface Vessel with Output Error Constraints and Uncertainties. IEEE Trans. Veh. Technol. 2020, 399, 479–490. [Google Scholar] [CrossRef]
Hou, Y.; Lin, M.; Anjidani, M.; Nik, H. Robust Optimal Control of Point-Feet Biped Robots Using a Reinforcement Learning Approach. IETE J. Res. 2024, 70, 7831–7846. [Google Scholar] [CrossRef]
Song, W.; Zuo, Y.; Tong, S. Fuzzy Optimal Event-Triggered Control for Dynamic Positioning of Unmanned Surface Vehicle. IEEE Trans. Syst. Man Cybern.-Syst. 2025, 55, 2302–2311. [Google Scholar] [CrossRef]
Zhang, G.; Xing, Y.; Zhang, W.; Li, J. Prescribed Performance Control for USV-UAV via a Robust Bounded Compensating Technique. IEEE Trans. Control. Netw. Syst. 2025; early access. [Google Scholar] [CrossRef]

Figure 1. Schematic diagram of control effect of vessel DP system.

Figure 2. Flowchart of the entire process of the vessel DP system based on reinforcement learning under time-varying interference.

Figure 3. Comparison of position control performance under different algorithms [38].

Figure 4. Comparison of the vessel position vector variables [38].

Figure 5. Comparison of the vessel velocity vector variables [38].

Figure 6. Comparison of the vessel control inputs

(τ_{u}, τ_{v}, τ_{r})

[38].

Figure 6. Comparison of the vessel control inputs

(τ_{u}, τ_{v}, τ_{r})

[38].

Figure 7. Schematic diagram of the timing sequence for adjusting the pitch and azimuth angles of the propeller during the analysis of the parameter response of the thruster and the steady-state performance

p_{i}, i = 1, \dots, 6

.

Figure 7. Schematic diagram of the timing sequence for adjusting the pitch and azimuth angles of the propeller during the analysis of the parameter response of the thruster and the steady-state performance

p_{i}, i = 1, \dots, 6

.

Figure 8. The adaptive parameter

{\hat{λ}}_{k}, k = 1, \dots, 7

dynamic adjustment analysis diagram under the proposed algorithm.

Figure 8. The adaptive parameter

{\hat{λ}}_{k}, k = 1, \dots, 7

dynamic adjustment analysis diagram under the proposed algorithm.

Figure 9. Long-term effect strategy function

{\hat{J}}_{c i}, i = 1, 2, 3

time evolution analysis.

Figure 9. Long-term effect strategy function

{\hat{J}}_{c i}, i = 1, 2, 3

time evolution analysis.

Figure 10. The variation in the cost function based on actor–critic RL.

Figure 11. Adaptive weight matrix norm variation curve based on actor–critic framework.

Figure 12. Norm of the position and velocity errors.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, J.; Huang, W.; Huang, C.; Zhang, G. Enhancing Robust Adaptive Dynamic Positioning of Full-Actuated Surface Vessels: Reinforcement Learning Approach for Unknown Hydrodynamics. J. Mar. Sci. Eng. 2025, 13, 993. https://doi.org/10.3390/jmse13050993

AMA Style

Li J, Huang W, Huang C, Zhang G. Enhancing Robust Adaptive Dynamic Positioning of Full-Actuated Surface Vessels: Reinforcement Learning Approach for Unknown Hydrodynamics. Journal of Marine Science and Engineering. 2025; 13(5):993. https://doi.org/10.3390/jmse13050993

Chicago/Turabian Style

Li, Jiqiang, Wanjin Huang, Chenfeng Huang, and Guoqing Zhang. 2025. "Enhancing Robust Adaptive Dynamic Positioning of Full-Actuated Surface Vessels: Reinforcement Learning Approach for Unknown Hydrodynamics" Journal of Marine Science and Engineering 13, no. 5: 993. https://doi.org/10.3390/jmse13050993

APA Style

Li, J., Huang, W., Huang, C., & Zhang, G. (2025). Enhancing Robust Adaptive Dynamic Positioning of Full-Actuated Surface Vessels: Reinforcement Learning Approach for Unknown Hydrodynamics. Journal of Marine Science and Engineering, 13(5), 993. https://doi.org/10.3390/jmse13050993

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Enhancing Robust Adaptive Dynamic Positioning of Full-Actuated Surface Vessels: Reinforcement Learning Approach for Unknown Hydrodynamics

Abstract

1. Introduction

2. Problem for Formulation and Preliminaries

2.1. Preliminaries

2.2. Dynamic Model of Marine Vessels

2.3. NN Function Approximation

3. Robust Adaptive Neural Cooperative Controller

3.1. Control Design

3.2. Critic and Actor NN Design

3.3. Stability Analysis

4. Numerical Simulation

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI