Research on Path-Following Technology of a Single-Outboard-Motor Unmanned Surface Vehicle Based on Deep Reinforcement Learning and Model Predictive Control Algorithm

Cui, Bin; Chen, Yuanming; Hong, Xiaobin; Luo, Hao; Chen, Guanqiao

doi:10.3390/jmse12122321

by Bin Cui ¹,², Yuanming Chen ³, Xiaobin Hong ¹,*, Hao Luo ¹ and Guanqiao Chen ¹

¹ School of Mechanical & Automotive Engineering, South China University of Technology, Guangzhou 510641, China

² Guangzhou Shipyard International Company Limited, Guangzhou 511462, China

³ School of Civil Engineering & Transportation, South China University of Technology, Guangzhou 510641, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2024, 12(12), 2321; https://doi.org/10.3390/jmse12122321

Submission received: 9 November 2024 / Revised: 9 December 2024 / Accepted: 16 December 2024 / Published: 18 December 2024

(This article belongs to the Section Ocean Engineering)


Abstract

Path following is one of the key technologies for unmanned surface vehicles (USVs). This paper proposes a path-tracking control method for a single-outboard-motor USV based on the Deep Deterministic Policy Gradient (DDPG) algorithm and the model predictive control (MPC) algorithm. First, the motion model and outboard motor model of the USV are analyzed. Subsequently, simulation and real ship experiments provide a comprehensive performance comparison between the proposed DDPG-MPC method and the traditional ALOS-PID method. The results indicate that for straight path tracking, the DDPG-MPC algorithm achieves 37% and 21% reductions in the average cross error and heading angle error, respectively, compared to the ALOS-PID algorithm. The real ship experiments further validate the DDPG-MPC algorithm’s advantages in real-world environments. Specifically, under disturbances like wind, waves, and currents, the maximum cross error of the DDPG-MPC algorithm is one-third that of the ALOS-PID algorithm. Additionally, the DDPG-MPC algorithm sustains a higher and more stable longitudinal velocity over extended periods, while the ALOS-PID algorithm shows greater instability and variability. Overall, the findings confirm the feasibility and effectiveness of the proposed approach, highlighting its potential for enhancing path-tracking control performance in single-outboard-motor USVs.

Keywords:

path following; unmanned surface vehicle; deep reinforcement learning; deep deterministic policy gradient; model predictive control

1. Introduction

Unmanned surface vehicles (USVs) are increasingly valued for their intelligence, maneuverability, and adaptability in various applications such as water patrols, hydrological surveys, maritime operations, rescue missions, and military usage [1,2,3,4]. Path following, a crucial technology for the autonomous navigation of USVs, focuses on maximizing efficiency and ensuring safety by maintaining adherence to a planned path within accepted error margins, as opposed to trajectory tracking, which involves strict timing at specific points [5,6,7]. Path-following technology principally involves path guidance and heading control components [8,9,10]. Among the many methods studied, line of sight (LOS) and vector field (VF) are prevalent [11,12]. These methods facilitate the control of underactuated vessels by translating the path-following challenge into a heading management task using well-elaborated guidance laws.

Several advancements have enhanced the LOS method’s efficacy. Wang et al. [13] introduced a predictor-based fixed-time LOS guidance law tailored for USVs navigating under unknown disturbances, featuring global fixed-time convergence that guarantees quick stabilization and robust performance regardless of the initial conditions. Liu et al. [14] developed an adaptive LOS (ALOS) that adjusts to changes in sideslip, maintaining trajectory precision under varying conditions. Zhou et al. [15] enhanced this approach with their integral–differential LOS (IDLOS), which incorporates adaptive elements to speed convergence and minimize overshoot, optimizing response times and control precision. Further advancements came from Tong et al. [16], who proposed a finite-time, error-constrained LOS suitable for navigating curved paths under uncertain circumstances, expanding the versatility of LOS applications. Complementarily, Fossen et al. [17] crafted a nonlinear ALOS method that effectively counters drift caused by environmental disturbances such as wind and currents, ensuring uniform semi-global exponential stability (USGES) for straight-line paths. In comparison, ALOS guidance [17] stands out in dynamic environments due to its rapid adaptability and robust stability, while the finite-time LOS law [16] offers finite-time convergence and cross-track error constraints, ideal for underactuated systems facing unpredictable challenges. Wang’s predictor-based fixed-time LOS [13], in turn, provides a resilient framework ensuring swift and precise control. Despite their higher complexity and computational demands, these advanced LOS methods excel in scenarios requiring an immediate response and exceptional accuracy, with ALOS proving particularly effective in resource-limited settings. Hong et al. [18] refined LOS by integrating a variable “acceptance circle” radius to enhance flexibility during waypoint transitions, offering a significant improvement over traditional methods.
In contrast, VF methods, developed initially for micro aerial vehicles, have been adapted for USVs [19,20,21], with comparative studies suggesting these methods provide excellent path adherence but may cause yaw instability [22]. Collectively, these adaptations balance robustness, convergence speed, and computational efficiency, pushing forward the development of unmanned ship path-following technology. However, the current research landscape in this domain is not without its limitations. The existing research and practical application of unmanned surface ship heading guidance laws based on the LOS method and the VF method have inherent defects [23]. The LOS method grapples with the challenge of striking a balance between the response time and steady-state error in forward distance settings [24]. Moreover, when transitioning between different desired paths, the convergence speed of the associated controller cannot be assured [25]. Conversely, the VF method is plagued by rudder oscillation in the horizontal direction [26].

Recent research [27,28,29,30,31] has incorporated deep reinforcement learning into USV path-following strategies, highlighting novel advancements in control methodologies. Zhao et al. [6] developed an enhanced deep Q-network (DQN) for the simplified control of three-degree-of-freedom USV models, revealing significant potential for adaptive learning in maritime navigation scenarios. Zhong et al. [32] introduced a control approach utilizing a Deep Deterministic Policy Gradient (DDPG) algorithm designed around a composite state space and dynamic reward function, aimed at reducing the impact of large inertia and state lag on the training of RL agents. Han et al. [33] developed a system using distributed deep reinforcement learning and adaptive neural networks for straight-line path tracking and formation control in USVs; it employs radial basis function neural networks (RBF NNs) to approximate the hydrodynamic forces and unknown external disturbances affecting the vehicles and demonstrates high-precision tracking control in simulation. Furthermore, Zhu et al. [34] proposed an intelligent path-tracking control strategy for underactuated USVs with unknown model parameters, combining LOS guidance and Q-learning to enhance path-following accuracy under uncertain dynamics. These advances underscore both the growing versatility and the challenges of applying deep reinforcement learning: they showcase its potential for sophisticated maritime navigational tasks while highlighting inherent limitations such as computational complexity, data requirements, and response time to dynamic environmental changes.

However, a considerable part of the research still relies on differential twin propeller thrusters configured at the stern and cannot be directly converted to a single-outboard-motor setup [27,28,29,30,31], which highlights the need for specific solutions for different propulsion types. This paper introduces a novel path-following control method for a single-outboard-motor USV that combines deep reinforcement learning with model predictive control. This approach addresses the shortcomings of traditional LOS and VF approaches and is tailored to the specific propulsion characteristics of a single-outboard-motor USV. By implementing deep reinforcement learning for path tracking in a single-outboard-motor USV mission, we significantly improve the path-following capabilities. At the same time, our reinforcement learning method DDPG-MPC also improves the versatility of reinforcement learning-based path tracking in complex ocean environments, contributing innovation to autonomous USV path tracking. By employing this cutting-edge strategy, we extend the practical applicability of reinforcement learning to different operating conditions and improve the resilience and adaptability of USVs to a wider range of environmental challenges.

2. Kinematic Modeling

2.1. Kinematic Modeling of USV

The kinematic modeling of USVs involves multiple factors such as thruster forces and moments, viscous hydrodynamics, and external disturbances, and it exhibits highly nonlinear, coupled characteristics, making it a very complex process [35]. Two aspects need to be balanced. First, to prevent the model from becoming too complex to guarantee real-time performance in subsequent control operations, reasonable assumptions and simplifications need to be made during modeling. Second, the simplification must not neglect key system characteristics of the model.

The kinematic modeling of USVs generally involves two coordinate systems, namely the world coordinate system $O X_{w} Y_{w} Z_{w}$ and the body-fixed coordinate system $O X_{b} Y_{b} Z_{b}$ [36], as shown in Figure 1. In this paper, the world coordinate system is set as the NED coordinate system. In this system, the six-degrees-of-freedom (DOF) motion vector of the USV can be represented as $η = {[x, y, z, φ, θ, ψ]}^{T}$, where $x$, $y$, and $z$ denote displacements along the $X_{w}$, $Y_{w}$, and $Z_{w}$ axes, respectively, and $φ$, $θ$, and $ψ$ represent the roll, pitch, and yaw angles, corresponding to rotations around the $X_{w}$, $Y_{w}$, and $Z_{w}$ axes.

In the body-fixed coordinate system, this paper defines the origin $O$ as the center of gravity of the USV, $X_{b}$ as the forward direction, $Y_{b}$ as the starboard direction, and $Z_{b}$ as the downward direction of the USV. The rotations around the $X_{b}$, $Y_{b}$, and $Z_{b}$ axes produce roll rate $p$, pitch rate $q$, and yaw rate $r$, respectively. The translations along the $X_{b}$, $Y_{b}$, and $Z_{b}$ axes generate surge speed $u$, sway speed $v$, and heave speed $w$. In the body-fixed coordinate system, the velocity of the USV can be represented by the vector $υ = {[u, v, w, p, q, r]}^{T}$.

In the study of USV dynamics, more emphasis is placed on the planar motion state of the USV. Therefore, the motions in roll, pitch, and heave are generally ignored. As a result, the motion of the USV can be represented by a three-degree-of-freedom motion model based on Lagrangian mechanics, as shown in Equation (1).

M \dot{υ} + C (υ) υ + D (υ) υ = τ

(1)

where $M$ represents the inertia matrix, $C (υ)$ represents the Coriolis and centripetal force matrix, and $D (υ)$ represents the hydrodynamic damping matrix. The above terms can be calculated using the following formulas:

M = [\begin{matrix} m - X_{\dot{u}} & 0 & 0 \\ 0 & m - Y_{\dot{v}} & m x_{g} - Y_{\dot{r}} \\ 0 & m x_{g} - Y_{\dot{r}} & I_{z} - N_{\dot{r}} \end{matrix}]

(2)

C (υ) = [\begin{matrix} 0 & 0 & - (m - Y_{\dot{v}}) v - (m x_{g} - Y_{\dot{r}}) r \\ 0 & 0 & (m - X_{\dot{u}}) u \\ (m - Y_{\dot{v}}) v + (m x_{g} - Y_{\dot{r}}) r & - (m - X_{\dot{u}}) u & 0 \end{matrix}]

(3)

D (υ) = [\begin{matrix} - X_{u} & 0 & 0 \\ 0 & - Y_{v} & - Y_{r} \\ 0 & - N_{v} & - N_{r} \end{matrix}]

(4)

where $X_{\dot{u}}$, $Y_{\dot{v}}$, $Y_{\dot{r}}$, $N_{\dot{r}}$, $X_{u}$, $Y_{v}$, $Y_{r}$, $N_{v}$, and $N_{r}$ are the hydrodynamic coefficients of the USV. The vector $υ$ represents the three-degree-of-freedom velocity of the USV in the body-fixed coordinate system, as shown in Equation (5).

υ = {[\begin{matrix} u & v & r \end{matrix}]}^{T}

(5)

The vector $τ$ represents the forces and moment exerted by the propeller on the USV, as shown in Equation (6).

τ = {[\begin{matrix} τ_{u} & τ_{v} & τ_{r} \end{matrix}]}^{T}

(6)

Here, $τ_{u}$ is the longitudinal thrust, $τ_{v}$ is the lateral thrust, and $τ_{r}$ is the yaw moment. These three quantities can be calculated using Equation (7).

\begin{cases} τ_{u} = T \cos δ \\ τ_{v} = T \sin δ \\ τ_{r} = x_{l} T \sin δ \end{cases}

(7)

where $T$ represents the thrust of the outboard motor, $δ$ represents the thrust angle of the outboard motor, and $x_{l}$ represents the distance from the rotation axis of the outboard motor to the center of rotation of the USV.

A USV propelled by a single outboard motor has only two control inputs, the thrust and the thrust angle of the outboard motor, whereas three degrees of freedom must be controlled; the USV is therefore an underactuated system. Based on the underactuated characteristics of a single-outboard-motor USV, the following definitions are made in this paper [37,38]: $m_{11} = m - X_{\dot{u}}$, $m_{22} = m - Y_{\dot{v}}$, $m_{23} = m x_{g} - Y_{\dot{r}}$, $m_{32} = m x_{g} - N_{\dot{v}}$, $m_{33} = I_{z} - N_{\dot{r}}$, $d_{11} = - X_{u}$, $d_{22} = - Y_{v}$, $d_{23} = - Y_{r}$, $d_{32} = - N_{v}$, and $d_{33} = - N_{r}$. Equation (1) can then be transformed into Equation (8).

[\begin{matrix} m_{11} \dot{u} \\ m_{22} \dot{v} + m_{23} \dot{r} \\ m_{32} \dot{v} + m_{33} \dot{r} \end{matrix}] + [\begin{matrix} d_{11} u - m_{22} v r - m_{23} r^{2} \\ d_{22} v + d_{23} r + m_{11} u r \\ m_{22} v u + m_{23} r u + d_{32} v - m_{11} u v + d_{33} r \end{matrix}] = [\begin{matrix} τ_{u} \\ 0 \\ τ_{r} \end{matrix}]

(8)

Assuming that the center of gravity of the hull coincides with the geometric center and that the USV is symmetric in the fore–aft and port–starboard directions, we have $x_{g} = 0$, $Y_{\dot{r}} = 0$, $N_{\dot{v}} = 0$, $Y_{r} = 0$, and $N_{v} = 0$, allowing Equation (8) to be simplified to Equation (9).

\begin{cases} \dot{u} = \frac{m_{22}}{m_{11}} v r - \frac{d_{11}}{m_{11}} u + \frac{1}{m_{11}} τ_{u} \\ \dot{v} = - \frac{m_{11}}{m_{22}} u r - \frac{d_{22}}{m_{22}} v \\ \dot{r} = \frac{m_{11} - m_{22}}{m_{33}} u v - \frac{d_{33}}{m_{33}} r + \frac{1}{m_{33}} τ_{r} \end{cases}

(9)
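As an illustration of how Equations (7) and (9) fit together, the following minimal sketch integrates the simplified three-degree-of-freedom model with a forward Euler step. All numerical values in `UsvParams`, and the helper names themselves, are hypothetical placeholders chosen for this example; they are not parameters of the vessel studied in this paper.

```python
import math
from dataclasses import dataclass


@dataclass
class UsvParams:
    """Lumped coefficients of Equation (9); every value here is hypothetical."""
    m11: float = 120.0  # m - X_udot
    m22: float = 180.0  # m - Y_vdot
    m33: float = 60.0   # I_z - N_rdot
    d11: float = 40.0   # -X_u
    d22: float = 90.0   # -Y_v
    d33: float = 50.0   # -N_r
    x_l: float = 1.5    # lever arm x_l of Equation (7)


def thrust_to_forces(T, delta, x_l):
    """Equation (7): resolve thrust T at thrust angle delta [rad]."""
    return T * math.cos(delta), T * math.sin(delta), x_l * T * math.sin(delta)


def usv_derivatives(u, v, r, tau_u, tau_r, p):
    """Right-hand side of Equation (9)."""
    du = (p.m22 / p.m11) * v * r - (p.d11 / p.m11) * u + tau_u / p.m11
    dv = -(p.m11 / p.m22) * u * r - (p.d22 / p.m22) * v
    dr = ((p.m11 - p.m22) / p.m33) * u * v - (p.d33 / p.m33) * r + tau_r / p.m33
    return du, dv, dr


def euler_step(state, T, delta, p, dt):
    """Advance (u, v, r) by one forward Euler step of length dt [s]."""
    u, v, r = state
    # The lateral component tau_v is not used, consistent with Equation (8).
    tau_u, _tau_v, tau_r = thrust_to_forces(T, delta, p.x_l)
    du, dv, dr = usv_derivatives(u, v, r, tau_u, tau_r, p)
    return u + dt * du, v + dt * dv, r + dt * dr
```

With a zero thrust angle and constant thrust, the surge speed in this sketch settles at $T / d_{11}$, which is a quick sanity check on the signs in Equation (9).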

2.2. Kinematic Modeling of the Outboard Motor

As can be seen from Equation (7), the longitudinal thrust, lateral thrust, and steering moment exerted by the outboard motor on the USV are determined by the thrust $T$ and thrust angle $δ$ of the outboard motor. Therefore, it is necessary to establish a thrust model and a servo model for the single outboard motor.

2.2.1. The Outboard Motor Model

The outboard motor has the advantage of being easy to retract, allowing it to be stowed away when an amphibious USV comes ashore, thereby avoiding interference with the ground. This makes it suitable as a propulsion system for amphibious USVs. The thrust generated by the outboard motor is determined by the propeller’s rotational speed, and its thrust mathematical model is given by Equation (10) [39]:

T = (1 - t_{P}) ρ n^{2} D_{P}^{4} K_{T} (J_{p})

(10)

where $ρ$ represents the seawater density, $n$ denotes the propeller rotational speed, and $D_{P}$ indicates the propeller diameter. Additionally, $t_{P}$ stands for the thrust deduction coefficient, $K_{T} (J_{p})$ represents the thrust coefficient, and $J_{p}$ indicates the advance coefficient. The calculation methods for these coefficients are described in detail below.

The thrust deduction coefficient $t_{P}$ can be approximated using the Holtrop empirical formula [40], which is suitable for single-propeller propulsion systems, as shown in Equation (11).

t_{P} = 0.001979 \frac{L}{(B - B C_{p 1})} + 1.0585 C_{10} - 0.000524 - 0.1418 \frac{D_{P}^{2}}{B d} + 0.0015 C_{s t e r n}

(11)

where $L$ represents the waterline length, $B$ denotes the beam of the vessel, and $d$ indicates the draft. The coefficient $C_{p 1}$ can be calculated using Equation (12).

C_{p 1} = 1.45 C_{P} - 0.315 - 0.225 L_{c b}

(12)

where $C_{P}$ represents the prismatic coefficient and $L_{c b}$ denotes the longitudinal position of the center of buoyancy forward of the half-length point. $C_{10}$ is a coefficient related to the length-to-beam ratio, which can be calculated using Equation (13). $C_{s t e r n}$ is a coefficient related to the stern shape of the vessel, which can be determined using Equation (14).

C_{10} = \begin{cases} B / L, & L / B > 5.2 \\ 0.25 - \frac{0.003328402}{B / L - 0.134615385}, & L / B \leq 5.2 \end{cases}

(13)

C_{s t e r n} = \begin{cases} - 10, & \text{V-shaped hull} \\ 0, & \text{conventional hull} \\ + 10, & \text{U-shaped hull with a Hogner stern} \end{cases}

(14)
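Equations (11) to (14) can be evaluated directly once the hull particulars are known. The sketch below transcribes them as written above; the function names, the dictionary keys for the stern type, and the sample dimensions in the usage note are assumptions made for illustration only.

```python
# Stern-shape coefficient of Equation (14); the keys are illustrative labels.
C_STERN = {"v_shaped": -10.0, "conventional": 0.0, "u_shaped_hogner": 10.0}


def c10(L, B):
    """Equation (13): coefficient related to the length-to-beam ratio."""
    if L / B > 5.2:
        return B / L
    return 0.25 - 0.003328402 / (B / L - 0.134615385)


def c_p1(C_P, L_cb):
    """Equation (12)."""
    return 1.45 * C_P - 0.315 - 0.225 * L_cb


def thrust_deduction(L, B, d, D_P, C_P, L_cb, stern="conventional"):
    """Equation (11): coefficient t_P for a single-propeller vessel."""
    return (
        0.001979 * L / (B - B * c_p1(C_P, L_cb))
        + 1.0585 * c10(L, B)
        - 0.000524
        - 0.1418 * D_P ** 2 / (B * d)
        + 0.0015 * C_STERN[stern]
    )
```

For example, `thrust_deduction(8.0, 2.0, 0.4, 0.3, 0.6, 0.0)` evaluates the formula for a hypothetical 8 m hull with a conventional stern. The two branches of `c10` meet at $L / B = 5.2$, so the coefficient is continuous across the switch.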

The thrust coefficient $K_{T}$ can be calculated using multiple regression analysis [41], as illustrated in Equation (15).

K_{T} = \sum_{i = 0}^{n_{1}} \sum_{j = 0}^{n_{2}} A_{i j} {(\frac{P}{D})}^{i} J_{P}^{j}

(15)

where $A_{i j}$ are the regression coefficients, $P / D$ is the propeller pitch ratio, $J_{P} = \frac{(1 - w_{p}) u}{n D_{p}}$ represents the advance coefficient, and $w_{p}$ denotes the wake coefficient, which can be determined using the Papmel formula, as shown in Equation (16).

w_{p} = 0.165 C_{b}^{x} \sqrt{\frac{\sqrt[3]{\nabla}}{D_{p}}} - Δ w

(16)

where $\nabla$ represents the displacement volume, $x$ is a coefficient related to the propeller installation position, as shown in Equation (17), and $Δ w$ represents the correction value for the wake coefficient, which can be expressed using Equation (18).

x = \begin{cases} 1, & \text{propeller positioned on the centerline} \\ 2, & \text{propeller positioned on the side} \end{cases}

(17)

Δ w = \begin{cases} 0.1 (F_{n} - 0.2), & F_{n} > 0.2 \\ 0, & F_{n} \leq 0.2 \end{cases}

(18)

where $F_{n} = V / \sqrt{g L}$ is the Froude number and $g$ is the gravitational acceleration.
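The chain from vessel speed to thrust in Equations (10) and (15) to (18) can be sketched as follows. The regression coefficients $A_{i j}$ are not listed in this section, so the table passed to `thrust_coefficient` is a hypothetical input; the function names and the assumption that $n$ is given in revolutions per second are likewise illustrative.

```python
import math


def wake_coefficient(C_b, volume, D_P, V, L, x=1, g=9.81):
    """Equations (16) to (18): wake coefficient w_p with Froude correction."""
    F_n = V / math.sqrt(g * L)
    delta_w = 0.1 * (F_n - 0.2) if F_n > 0.2 else 0.0
    return 0.165 * C_b ** x * math.sqrt(volume ** (1.0 / 3.0) / D_P) - delta_w


def advance_coefficient(u, n, D_P, w_p):
    """J_P = (1 - w_p) u / (n D_P), with n assumed to be in rev/s."""
    return (1.0 - w_p) * u / (n * D_P)


def thrust_coefficient(J_P, pitch_ratio, A):
    """Equation (15): A[i][j] multiplies (P/D)^i * J_P^j (A is hypothetical)."""
    return sum(
        A[i][j] * pitch_ratio ** i * J_P ** j
        for i in range(len(A))
        for j in range(len(A[i]))
    )


def propeller_thrust(rho, n, D_P, K_T, t_P):
    """Equation (10): T = (1 - t_P) rho n^2 D_P^4 K_T."""
    return (1.0 - t_P) * rho * n ** 2 * D_P ** 4 * K_T
```

A typical call sequence would compute `w_p`, then `J_P`, then `K_T`, and finally the thrust, mirroring the order in which the equations are introduced above.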

2.2.2. The Servo Model

The outboard motor servo can be represented as a first-order inertial element, following the approach of Meziou et al. [42], as shown in Equation (19).

\dot{δ} = - \frac{1}{T_{d}} δ + \frac{1}{T_{d}} δ_{d}

(19)

where $T_{d}$ represents the time constant, $δ$ denotes the current propulsion angle, and $δ_{d}$ indicates the target propulsion angle.
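A first-order lag such as Equation (19) has a closed-form solution when the target angle is held constant over a time step, which gives a simple discrete update. The sketch below uses that exact discretization; the function name and the numerical values in the usage note are assumptions for illustration.

```python
import math


def servo_step(delta, delta_d, T_d, dt):
    """One step of Equation (19), assuming delta_d is constant over dt."""
    a = math.exp(-dt / T_d)
    return a * delta + (1.0 - a) * delta_d
```

Starting from `delta = 0.0` with `delta_d = 1.0`, `T_d = 0.5`, and `dt = 0.1`, five calls advance the model by one time constant, at which point the angle has covered $1 - e^{-1}$ (about 63%) of the commanded step.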

3. Path-Following Control Algorithm

This paper presents a design strategy that divides the path-following control framework of a single-outboard-motor USV into two key components: path-following guidance law and heading control. To overcome the limitations of conventional algorithms, it introduces a novel method integrating DDPG with MPC, as illustrated in Figure 2. This method computes the desired heading angle $ψ_{d}$ from a pre-planned path and employs an MPC-based heading controller to adjust the USV’s heading angle, enabling effective path following.

3.1. Path-Following Guidance Law Based on DDPG Algorithm

3.1.1. Design of the Markov Decision Process

In reinforcement learning, constructing a Markov decision model is a key element. The algorithm’s efficiency and convergence heavily rely on accurately defining the state space, action space, and reward function [43]. Hence, to develop a high-performance guidance law for USV path following, it is crucial to design a suitable Markov decision model tailored to the specific requirements of single-outboard-motor USVs.

The state space is defined as follows:

S = [u, r, β, \dot{β}, ε, \dot{ε}, Δ ψ_{d}^{p}]

(20)

where $u$ represents the longitudinal velocity of the USV, $r$ denotes the USV’s yaw rate, $β$ is the deviation between the USV’s heading angle and the desired path angle, $ε$ is the cross error between the desired path and the USV’s real-time position, $\dot{β}$ and $\dot{ε}$ are the rates of change of $β$ and $ε$, and $Δ ψ_{d}^{p}$ is the action taken in the previous time step, which is the desired change in the heading angle of the USV.

In path-following control, greater emphasis is placed on the longitudinal velocity of the USV, while lateral drift velocity is often overlooked. To ensure the path-following guidance law can generate a real-time optimal desired heading angle based on the USV’s longitudinal velocity $u$, this study incorporates the longitudinal velocity into the state space. The state space includes the deviation $β$ between the current heading angle and the desired path angle, as well as the cross-track error $ε$ between the desired path and the USV’s real-time position, reflecting the discrepancy between the actual and desired positions. Furthermore, to evaluate whether the USV’s state changes align with the trend of reducing tracking errors, the state space also integrates the rate of change $\dot{β}$ in the deviation between the heading and path angles, the rate of change $\dot{ε}$ in the cross-track error, and the action taken in the previous time step $Δ ψ_{d}^{p}$.

The action space is defined as follows:

A = [Δ ψ_{d}]

(21)

where $Δ ψ_{d}$ represents the change in the desired heading angle of the USV, with $Δ ψ_{d} \in [- \frac{π}{2}, \frac{π}{2}]$. The USV’s desired heading angle $ψ_{d}$ is then calculated as $ψ_{d} = ψ + Δ ψ_{d}$, where $ψ$ denotes the actual heading angle of the USV.
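The mapping from an action to a desired heading can be sketched in a few lines. Clipping the action to the stated range follows the definition above; wrapping the result to $(-π, π]$ is an additional assumption of this sketch, as are the function names.

```python
import math

D_PSI_MAX = math.pi / 2  # bound on the action, from its stated range


def wrap_angle(angle):
    """Wrap an angle to (-pi, pi] (an assumption, not stated in the text)."""
    return math.atan2(math.sin(angle), math.cos(angle))


def desired_heading(psi, d_psi):
    """psi_d = psi + d_psi, with d_psi clipped to [-pi/2, pi/2]."""
    d_psi = max(-D_PSI_MAX, min(D_PSI_MAX, d_psi))
    return wrap_angle(psi + d_psi)
```

Because the action is a heading change relative to the current heading rather than an absolute angle, the commanded heading can never differ from the actual heading by more than 90 degrees in a single step.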

The reward function is defined as follows:

r = \begin{cases} r_{ε} + r_{β} + r_{Δ ψ_{d}}, & | β | < \frac{π}{2} \\ r_{β}, & | β | \geq \frac{π}{2} \end{cases}

(22)

The definitions of each term are as follows:

r_{ε} = e^{- k_{ε} | ε |}

(23)

r_{β} = e^{- k_{β} | β |}

(24)

r_{Δ ψ_{d}} = e^{- k_{Δ ψ_{d}} | Δ ψ_{d} |}

(25)

where $k_{ε}$, $k_{β}$, and $k_{Δ ψ_{d}}$ are constants calibrated through multiple experiments: $k_{ε} = 0.7$, $k_{β} = 4.82$, and $k_{Δ ψ_{d}} = 0.3$.

Equation (23) describes the reward function $r_{ε}$ for the cross error $ε$ between the actual position of the USV and the desired path. As shown in Figure 3a, when the cross error $ε$ is 0, $r_{ε}$ reaches its maximum of 1; as $| ε |$ increases, $r_{ε}$ decreases rapidly, and when $| ε |$ reaches 8 m (approximately the length of a boat), $r_{ε}$ is close to 0. This exponential form encourages the agent to actively minimize the cross error during its interaction with the environment, thereby achieving more accurate path tracking.

Equation (24) is the reward function $r_{β}$ related to the deviation $β$ between the USV’s heading angle and the desired path angle. The graph of this function is shown in Figure 3b. When the angle deviation $β$ is 0, $r_{β}$ reaches its maximum of 1. To discourage the USV from heading away from the desired path, $r_{β}$ remains near 0 whenever $| β | > \frac{π}{2}$.

Equation (25) defines the reward function $r_{Δ ψ_{d}}$ associated with changes in the USV’s desired heading angle $Δ ψ_{d}$. Illustrated in Figure 3c, this function is carefully designed to minimize abrupt alterations in the desired heading direction. By doing so, it ensures a smooth and continuous adjustment process, enhancing the stability and precision of the USV’s navigation path. This approach ultimately contributes to more efficient and reliable path tracking.

Equation (22) is the final reward function $r$, obtained by combining the individual reward functions. When $| β | \geq \frac{π}{2}$, the deviation between the USV’s heading angle and the desired path angle is large, and the heading angle should be corrected as soon as possible to prevent the deviation between the USV’s actual position and the desired path from increasing further; therefore, $r = r_{β}$ is chosen. When $| β | < \frac{π}{2}$, all three reward functions are considered, and $r = r_{ε} + r_{β} + r_{Δ ψ_{d}}$ is chosen.
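The reward of Equations (22) to (25) is compact enough to write out directly. The sketch below uses the constants reported above; the function and constant names are illustrative.

```python
import math

K_EPS, K_BETA, K_DPSI = 0.7, 4.82, 0.3  # constants reported for Eqs. (23)-(25)


def reward(eps, beta, d_psi):
    """Equation (22): eps in metres, beta and d_psi in radians."""
    r_eps = math.exp(-K_EPS * abs(eps))      # Equation (23)
    r_beta = math.exp(-K_BETA * abs(beta))   # Equation (24)
    r_dpsi = math.exp(-K_DPSI * abs(d_psi))  # Equation (25)
    if abs(beta) < math.pi / 2:
        return r_eps + r_beta + r_dpsi
    return r_beta
```

The reward is bounded by 3, reached only with zero cross error, zero heading deviation, and no change in the desired heading. At $| ε | = 8$ m the cross-error term is $e^{-5.6} \approx 0.004$, consistent with the description of Figure 3a.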

3.1.2. Implementation of the DDPG Algorithm

The DDPG algorithm is a type of deep reinforcement learning method built on the Actor–Critic framework [44]. Its structural principles are shown in Figure 4.

The DDPG agent consists of a policy network and a value network. The policy network, referred to as the Actor Network, takes the state vector $s_{t}$ as input and produces the action vector $μ_{θ} (s_{t})$ following the action policy, with $θ$ representing the weights of the Actor Network. Structurally, the Actor Network includes an input layer, two hidden layers, and an output layer. The input layer’s size aligns with the state vector’s dimension, while the output layer matches the action vector’s dimension. ReLU is used as the activation function in the hidden layers, and the output layer applies the tanh function to normalize the action values, which are then scaled to the actual action range.

The value network, known as the Critic Network, takes the state vector $s_{t}$ and the action vector $a_{t}$ generated by the Actor Network as inputs and produces the action value $Q_{ω} (s_{t}, a_{t})$, with $ω$ representing the weights of the Critic Network. The state vector is processed through two hidden layers, while the action vector passes through one hidden layer. These outputs are then combined via tensor addition, followed by an additional hidden layer and an output layer. ReLU serves as the activation function across all hidden layers. The training process of the DDPG agent includes the following steps:

➀: Randomly initialize the Actor Network $μ (s | θ)$ and Critic Network $Q (s, a | ω)$; initialize the target Critic Network and target Actor Network with $ω^{'} \leftarrow ω$ and $θ^{'} \leftarrow θ$; initialize the experience replay buffer $R$.
➁: Begin a new episode by initializing the USV’s state randomly in the environment as the initial state $s_{1}$. At each time step, the action policy $μ_{θ} (s_{t})$ evaluated at the current state vector $s_{t}$ is combined with exploration noise $N_{t}$ to produce the action vector $a_{t} = μ_{θ} (s_{t}) + N_{t}$, which defines the USV’s desired heading angle. The heading controller then tracks this angle, and the environment provides the updated state vector $s_{t + 1}$. Meanwhile, rewards are computed using environmental data, such as cross-track and heading angle errors. Interaction data $(s_{t}, a_{t}, r_{t}, s_{t + 1})$ generated at each time step are saved in the experience replay buffer $R$.
➂: Randomly sample $N$ interaction tuples from the experience replay buffer $R$ to update the networks. Update the Critic Network by minimizing the loss function $L = \frac{1}{N} \sum_{i} {(y_{i} - Q_{ω} (s_{i}, a_{i}))}^{2}$, where $y_{i} = r_{i} + γ {Q^{'}}_{ω^{'}} (s_{i + 1}, {μ^{'}}_{θ^{'}} (s_{i + 1}))$ is the target value computed with the target Critic Network and target Actor Network, and $γ$ is the discount factor. Update the Actor Network using the sampled policy gradient:

$\nabla_{θ} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q_{ω} (s, a) |_{s = s_{i}, a = μ_{θ} (s_{i})} \nabla_{θ} μ_{θ} (s) |_{s = s_{i}}$

(26)
➃: Perform soft updates on the parameters of the target Critic Network and target Actor Network, with the update formula as follows:

\begin{cases} ω^{'} \leftarrow τ ω + (1 - τ) ω^{'} \\ θ^{'} \leftarrow τ θ + (1 - τ) θ^{'} \end{cases}

(27)

with soft-update coefficient $τ \in (0, 1)$.

➄: Return to step 2 to continue learning at the next time step. When reset conditions are met or learning exceeds the set time steps, start a new episode.
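The arithmetic inside the training steps above can be sketched without a deep learning framework by treating network weights as flat lists of numbers. The sketch covers the noisy action of step ➁, the target value and critic loss of step ➂, and the soft update of Equation (27). Gaussian exploration noise, the default values of `gamma` and `tau`, and all function names are assumptions made for illustration; the gradient steps themselves are omitted.

```python
import random


def noisy_action(mu, sigma, low, high, rng=random):
    """Step 2: a_t = mu(s_t) + N_t, with Gaussian N_t (assumed) and clipping."""
    return max(low, min(high, mu + rng.gauss(0.0, sigma)))


def td_target(r, q_next, gamma=0.99):
    """Step 3: y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))."""
    return r + gamma * q_next


def critic_loss(targets, q_values):
    """Step 3: mean squared error between targets y_i and Q(s_i, a_i)."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


def soft_update(target_weights, online_weights, tau=0.005):
    """Equation (27): w' <- tau * w + (1 - tau) * w' for each weight."""
    return [
        tau * w + (1.0 - tau) * w_t
        for w, w_t in zip(online_weights, target_weights)
    ]
```

A small `tau` makes the target networks trail the online networks slowly, which keeps the target value `y_i` from shifting abruptly between updates.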

3.2. Heading Controller Design Based on MPC Algorithm

Assuming the longitudinal velocity $u$ of the USV remains constant, the linearized model of the USV can be written in state-space form, resulting in the system state-space equations:

\begin{array}{l} \dot{x} (t) = A x (t) + B δ (t) \\ y (t) = C x (t) \end{array}

(28)

where $x = {[\begin{matrix} v & r & ψ \end{matrix}]}^{T}$ is the system state vector, $δ$ is the input variable, i.e., the thrust angle, $A$ is the system matrix, $B$ is the control input matrix, $y = ψ$ is the system output variable, and $C = [\begin{matrix} 0 & 0 & 1 \end{matrix}]$ is the output matrix. Discretizing the continuous state-space equations yields the following:

\begin{array}{l} x_{m} (k + 1) = A_{m} x_{m} (k) + B_{m} δ (k) \\ y (k) = C_{m} x_{m} (k) \end{array}

(29)

where $A_{m} = A T + I$, $B_{m} = B T$, and $C_{m} = C$ (a forward Euler approximation), and $T$ here represents the sampling period.

To address steady-state errors, incremental state-space equations are applied by substituting state variables and control inputs with their respective changes. Taking the difference between consecutive time steps on both sides of the state-space equations results in the following:

\begin{array}{l} x_{m} (k + 1) - x_{m} (k) = A_{m} (x_{m} (k) - x_{m} (k - 1)) + B_{m} (δ (k) - δ (k - 1)) \\ y (k + 1) - y (k) = C_{m} (x_{m} (k + 1) - x_{m} (k)) \end{array}

(30)

Define $Δ x_{m} (k + 1) = x_{m} (k + 1) - x_{m} (k)$ and $Δ δ (k) = δ (k) - δ (k - 1)$, and substitute them to obtain the incremental state-space equations:

\begin{array}{l} [\begin{matrix} Δ x_{m} (k + 1) \\ y (k + 1) \end{matrix}] = [\begin{matrix} A_{m} & 0 \\ C_{m} A_{m} & 1 \end{matrix}] [\begin{matrix} Δ x_{m} (k) \\ y (k) \end{matrix}] + [\begin{matrix} B_{m} \\ C_{m} B_{m} \end{matrix}] Δ δ (k) \\ y (k) = [\begin{matrix} 0 & 0 & 0 & 1 \end{matrix}] [\begin{matrix} Δ x_{m} (k) \\ y (k) \end{matrix}] \end{array}

(31)

The matrix form can be written as follows:

\begin{array}{l} x (k + 1) = A_{k} x (k) + B_{k} Δ δ (k) \\ y (k) = C_{k} x (k) \end{array}

(32)
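The discretization of Equation (29) and the augmented matrices of Equations (31) and (32) can be assembled mechanically. The sketch below does so with plain nested lists and assumes a single input and a single output, as in the heading model; the function names and the numerical matrices in the usage note are placeholders rather than the identified model of the vessel.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def discretize(A, B, T):
    """Equation (29): A_m = A*T + I and B_m = B*T for sampling period T."""
    n = len(A)
    A_m = [[(1.0 if i == j else 0.0) + A[i][j] * T for j in range(n)] for i in range(n)]
    B_m = [[b * T for b in row] for row in B]
    return A_m, B_m


def augment(A_m, B_m, C_m):
    """Equations (31)-(32): augmented state [dx_m; y] driven by d_delta."""
    n = len(A_m)
    CA = mat_mul(C_m, A_m)[0]     # row vector C_m A_m
    CB = mat_mul(C_m, B_m)[0][0]  # scalar C_m B_m
    A_k = [row + [0.0] for row in A_m] + [CA + [1.0]]
    B_k = [row[:] for row in B_m] + [[CB]]
    C_k = [[0.0] * n + [1.0]]
    return A_k, B_k, C_k
```

For instance, with the placeholder matrices `A = [[-1, 0, 0], [0, -2, 0], [0, 1, 0]]`, `B = [[0], [1], [0]]`, `C = [[0, 0, 1]]`, and `T = 0.1`, `augment(*discretize(A, B, T), C)` returns a 4-by-4 `A_k` whose last row is `[0, 0.1, 1, 1]`. The trailing 1 is the integrator that carries the output $y (k)$ forward.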

If the control input sequence over the prediction horizon is available, future system outputs can be estimated using the system model and current state. Let N denote the prediction horizon. The equations for system state prediction are as follows:

\begin{cases} x (k + 1 | k) = A_{k} x (k) + B_{k} Δ δ (k) \\ x (k + 2 | k) = A_{k}^{2} x (k) + A_{k} B_{k} Δ δ (k) + B_{k} Δ δ (k + 1) \\ \dots \\ x (k + N | k) = A_{k}^{N} x (k) + A_{k}^{N - 1} B_{k} Δ δ (k) + \dots + A_{k} B_{k} Δ δ (k + N - 2) + B_{k} Δ δ (k + N - 1) \end{cases}

(33)

Substituting the output matrix $C_{k}$, the system output prediction equation is obtained as follows:

{\vec{y}}_{k + 1} = P x_{k} + H Δ {\vec{u}}_{k}

(34)

where ${\vec{y}}_{k + 1}$ represents the sequence of system outputs predicted at the current time step, and $Δ {\vec{u}}_{k}$ represents the sequence of control input increments over the prediction horizon. The matrices $P$ and $H$ are given by the following:

P = [\begin{matrix} C_{k} A_{k} \\ C_{k} A_{k}^{2} \\ ⋮ \\ C_{k} A_{k}^{N} \end{matrix}]

(35)

H = [\begin{matrix} C_{k} B_{k} & 0 & \dots & 0 \\ C_{k} A_{k} B_{k} & C_{k} B_{k} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ C_{k} A_{k}^{N - 1} B_{k} & C_{k} A_{k}^{N - 2} B_{k} & \dots & C_{k} B_{k} \end{matrix}]

(36)
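The prediction matrices of Equations (35) and (36) follow a regular pattern: row $i$ of $P$ is $C_{k} A_{k}^{i}$, and $H$ is lower triangular with entries $C_{k} A_{k}^{i - j} B_{k}$. The sketch below builds both for a single-input, single-output model stored as nested lists; the function names are illustrative.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def prediction_matrices(A_k, B_k, C_k, N):
    """Equations (35)-(36) for prediction horizon N (single input and output)."""
    n = len(A_k)
    powers = [[[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]]
    for _ in range(N):
        powers.append(mat_mul(powers[-1], A_k))  # powers[i] = A_k^i
    P = [mat_mul(C_k, powers[i])[0] for i in range(1, N + 1)]
    h = [mat_mul(mat_mul(C_k, powers[j]), B_k)[0][0] for j in range(N)]
    H = [[h[i - j] if j <= i else 0.0 for j in range(N)] for i in range(N)]
    return P, H


def predict_outputs(P, H, x, du):
    """Equation (34): stacked predicted outputs P x + H du."""
    return [
        sum(p * xi for p, xi in zip(P_row, x)) + sum(hh * d for hh, d in zip(H_row, du))
        for P_row, H_row in zip(P, H)
    ]
```

With the scalar toy system `A_k = [[2.0]]`, `B_k = [[1.0]]`, `C_k = [[1.0]]`, and `N = 3`, the result is `P = [[2], [4], [8]]` and `H = [[1, 0, 0], [2, 1, 0], [4, 2, 1]]`, which matches stepping the recursion of Equation (33) by hand.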

To achieve convergence of path-following errors while minimizing the variation in the thrust angle to reduce energy loss, the objective function for optimization is defined as follows:

\min J_{k} = \sum_{i = 1}^{N} {‖ y (k + i | k) - ψ_{d} ‖}_{Q}^{2} + \sum_{i = 1}^{N} {‖ Δ u (k + i - 1 | k) ‖}_{R}^{2}

(37)

where $N$ is the total number of optimization steps, $Q = (\begin{matrix} q_{1} & \dots & q_{N} \end{matrix})$ represents the output error weight matrix, and $R = (\begin{matrix} r_{1} & \dots & r_{N} \end{matrix})$ represents the input weight matrix.
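
To make the structure of this objective explicit, the cost can be expanded using the stacked prediction of Equation (34). The following is a sketch under stated assumptions: $\bar{Q}$ and $\bar{R}$ denote diagonal matrices built from the weights $q_{i}$ and $r_{i}$, $\bar{R}$ is positive definite, $\Psi_{d}$ stacks the desired heading $ψ_{d}$ over the horizon, and the weighted norm is read as $‖z‖_{W}^{2} = z^{T} W z$. These symbols are introduced here for illustration and do not appear in the original derivation.

```latex
\begin{aligned}
E_{k} &= \Psi_{d} - P x_{k} \\
J_{k} &= \left( H \Delta\vec{u}_{k} - E_{k} \right)^{T} \bar{Q} \left( H \Delta\vec{u}_{k} - E_{k} \right)
         + \Delta\vec{u}_{k}^{T} \bar{R} \, \Delta\vec{u}_{k} \\
      &= \Delta\vec{u}_{k}^{T} \left( H^{T} \bar{Q} H + \bar{R} \right) \Delta\vec{u}_{k}
         - 2 E_{k}^{T} \bar{Q} H \, \Delta\vec{u}_{k} + E_{k}^{T} \bar{Q} E_{k} \\
\Delta\vec{u}_{k}^{*} &= \left( H^{T} \bar{Q} H + \bar{R} \right)^{-1} H^{T} \bar{Q} E_{k}
         \quad \text{(when no constraint is active)}
\end{aligned}
```

The last line follows from setting the gradient of $J_{k}$ with respect to $Δ\vec{u}_{k}$ to zero. The term $E_{k}^{T} \bar{Q} E_{k}$ does not depend on the decision variable and can be dropped from the optimization.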

In practice, there are thrust angle saturation limits and thrust angle rate saturation limits for the outboard motor. These constraints must be considered during the study. The two types of constraints can be expressed as follows:

δ_{\min} \leq δ \leq δ_{\max}

(38)

Δ δ_{\min} \leq Δ δ \leq Δ δ_{\max}

(39)

where $δ_{\min}$ and $δ_{\max}$ represent the minimum and maximum thrust angles of the outboard motor, while $Δ δ_{\min}$ and $Δ δ_{\max}$ represent the minimum and maximum thrust angle rates.

The optimization problem can be reformulated into a standard quadratic programming (QP) form, allowing the optimal thrust angle increment sequence to be determined. By adding the first element of this sequence to the previous thrust angle, the control input for the current time step is generated, and the process repeats. This approach, known as the rolling optimization strategy, enables the desired heading angle control based on MPC.
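
The rolling optimization loop can be sketched in Python as follows. This is a simplified illustration, not the controller used in the experiments. It assumes a hypothetical two-state heading model obtained by Euler discretization of Equation (41) together with the kinematic relation that heading rate equals yaw rate, treats all angles as degrees, uses the horizon, step time, limits, and unit weights listed in Table 3, enforces the rate limit inside a small projected-gradient solver (`solve_box_qp`, a stand-in for a general QP solver), and enforces the angle limit only by saturating the applied input rather than over the whole horizon.

```python
import numpy as np

TS = 0.1                  # controller step in s ("dwell time", Table 3)
N = 10                    # prediction horizon (Table 3)
DELTA_MAX = 30.0          # thrust angle limit in deg (Table 3)
D_DELTA_MAX = 10.0 * TS   # 10 deg/s rate limit expressed per step, in deg

# Hypothetical heading model x_m = [r, psi]: Euler discretization of
# Eq. (41) plus psi_dot = r. Working in degrees is an assumption.
A_m = np.array([[1.0 - 1.5753 * TS, 0.0], [TS, 1.0]])
B_m = np.array([[0.3932 * TS], [0.0]])
C_m = np.array([[0.0, 1.0]])

# Augmented incremental model with state [delta x_m; y], as in Eq. (31)
A = np.block([[A_m, np.zeros((2, 1))], [C_m @ A_m, np.ones((1, 1))]])
B = np.vstack([B_m, C_m @ B_m])
C = np.array([[0.0, 0.0, 1.0]])

# Prediction matrices of Eqs. (35) and (36) for scalar input and output
P = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(1, N + 1)])
H = np.zeros((N, N))
for i in range(N):
    for j in range(i + 1):
        H[i, j] = (C @ np.linalg.matrix_power(A, i - j) @ B)[0, 0]
Q = np.eye(N)   # output weight 1 on every step (Table 3)
R = np.eye(N)   # input weight 1 on every step (Table 3)


def solve_box_qp(G, g, lower, upper, iters=200):
    """Minimize 0.5 z'Gz + g'z subject to lower <= z <= upper (projected gradient)."""
    step = 1.0 / np.linalg.eigvalsh(G).max()
    z = np.zeros_like(g)
    for _ in range(iters):
        z = np.clip(z - step * (G @ z + g), lower, upper)
    return z


def mpc_step(x_aug, delta_prev, psi_d):
    """Return the thrust angle to apply at the current step."""
    err = psi_d - P @ x_aug            # tracking error of the free response
    G = H.T @ Q @ H + R
    g = -H.T @ Q @ err
    d_delta = solve_box_qp(G, g, -D_DELTA_MAX, D_DELTA_MAX)
    # Apply only the first increment, then saturate the angle itself
    return float(np.clip(delta_prev + d_delta[0], -DELTA_MAX, DELTA_MAX))


def simulate(psi_d, steps):
    """Closed loop on the same hypothetical model used for prediction."""
    x_m, x_prev, delta = np.zeros(2), np.zeros(2), 0.0
    deltas, headings = [], []
    for _ in range(steps):
        x_aug = np.concatenate([x_m - x_prev, [x_m[1]]])
        delta = mpc_step(x_aug, delta, psi_d)
        x_prev, x_m = x_m, A_m @ x_m + B_m[:, 0] * delta
        deltas.append(delta)
        headings.append(x_m[1])
    return np.array(deltas), np.array(headings)


deltas, headings = simulate(psi_d=20.0, steps=20)
```

With a 0.1 s step, the 10°/s rate limit in Table 3 corresponds to a maximum increment of 1° per step, which is why the thrust angle ramps up gradually at the start of a large heading change. A full implementation would pass the angle limits to the QP solver as linear inequality constraints on the cumulative increments.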

4. Experimental Results and Analysis

This paper conducts simulation and experimental research on an independently developed amphibious USV propelled by a single outboard motor, as shown in Figure 5. The main technical parameters of the experimental USV are listed in Table 1, and those of the outboard motor are shown in Figure 6. The full-scale test was conducted in the harbor of Guangzhou Shipyard International Co., Ltd., in Nansha District, Guangzhou, China (latitude: about N 22.7023°, longitude: about E 113.6505°), under sea conditions of level 1 to level 2, a wave height of about 0.2 m, a sea surface wind force of about level 1 to level 2, and clear weather.

This section outlines the training process of the proposed DDPG-based path guidance law and examines the training results. To validate the efficiency and performance of the single-outboard-motor USV path-following algorithm combining DDPG and MPC, the high-performing ALOS-PID algorithm serves as the baseline in both the simulation and the real ship comparison experiments.

4.1. Agent Training and Results Analysis

The software environment for the agent experiment is Windows 10, with the program implemented in PyTorch. The hardware setup includes an AMD R7-5900X CPU and an NVIDIA GeForce RTX 3090Ti GPU. The hyperparameter settings for training the DDPG agent are shown in Table 2. The exploration noise $N_{t}$ is Ornstein–Uhlenbeck noise with a mean of 0 and a variance of 0.25.
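
A minimal sketch of such an exploration noise process is given below. It is an illustration under assumptions, not the training code: the stated variance of 0.25 is interpreted as a noise scale `sigma` of 0.5, the mean-reversion rate `theta = 0.15` is a commonly used default rather than a value from this paper, and the step `dt = 0.05` is taken from the training time interval. The class name `OUNoise` is hypothetical.

```python
import numpy as np


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process for action exploration."""

    def __init__(self, mu=0.0, sigma=0.5, theta=0.15, dt=0.05, seed=None):
        # sigma = sqrt(0.25) and theta = 0.15 are assumptions, see the text
        self.mu, self.sigma, self.theta, self.dt = mu, sigma, theta, dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        """Restart the process at its mean, e.g. at the start of an episode."""
        self.x = self.mu

    def sample(self):
        """Advance one step: mean-reverting drift plus Gaussian diffusion."""
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * np.sqrt(self.dt) * self.rng.standard_normal()
        self.x = self.x + drift + diffusion
        return self.x


noise = OUNoise(seed=0)
samples = np.array([noise.sample() for _ in range(1000)])
```

Unlike independent Gaussian noise, consecutive samples are correlated, so the perturbation added to the actor's output drifts smoothly instead of jittering from step to step.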

At the start of each training episode, a 400 m straight path is defined, and the USV’s initial position and heading angle are set randomly, with the position lying within a 100 m radius of the path’s starting point. The USV propeller thrust is randomly assigned a value in the range [4000, 6000] and remains constant throughout the episode, enabling the trained path guidance model to accommodate various USV postures and speeds. Training runs for 1000 episodes, each containing 1000 steps with a time interval of 0.05 s. Details of the MPC heading controller parameters are provided in Table 3.
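
The episode setup described above can be sketched as follows. The sampling distributions are assumptions made for illustration: the position is drawn uniformly over the disc, the heading uniformly over a full turn, and the thrust uniformly over its range. The function name `sample_initial_conditions` and the placement of the path start at the origin are hypothetical.

```python
import math
import random

PATH_LENGTH = 400.0               # m, straight training path
START_RADIUS = 100.0              # m, radius around the path start
THRUST_RANGE = (4000.0, 6000.0)   # constant thrust drawn once per episode
EPISODES, STEPS_PER_EPISODE, DT = 1000, 1000, 0.05


def sample_initial_conditions(rng, path_start=(0.0, 0.0)):
    """Draw one random initial pose and thrust for a training episode."""
    # The square root makes the draw uniform over the disc area
    rho = START_RADIUS * math.sqrt(rng.random())
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return {
        "x": path_start[0] + rho * math.cos(phi),
        "y": path_start[1] + rho * math.sin(phi),
        "heading": rng.uniform(-math.pi, math.pi),  # assumed range, in rad
        "thrust": rng.uniform(*THRUST_RANGE),
    }


rng = random.Random(0)
episodes = [sample_initial_conditions(rng) for _ in range(EPISODES)]
episode_duration = STEPS_PER_EPISODE * DT   # simulated seconds per episode
```

With 1000 steps of 0.05 s each, one episode covers 50 s of simulated time.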

Figure 7 illustrates the cumulative reward curve for each episode during training. Initially, the cumulative reward remained steady at around 200. After approximately 50 training episodes, it rose rapidly and stabilized at about 900. Following another 100 episodes, the cumulative reward showed continuous growth, indicating that the agent was progressively learning a strategy to achieve higher rewards. This suggests that the path-following guidance law effectively reduces cross-tracking and heading angle errors, eventually stabilizing the cumulative reward at around 2700. This confirms the validity of the designed reward function, as it ensures convergence of the cumulative reward during training. Subsequently, the performance of the trained path-following guidance law is evaluated through comparative experiments.

4.2. Simulation Comparative Experiment

This study employs the nonlinear grey-box model from MATLAB’s System Identification Toolbox to determine the parameters of the USV’s three-degree-of-freedom model, resulting in Equations (40) and (41).

{\begin{cases} \dot{u} = 0.9294 v r - 0.4114 u + 0.0003 τ_{u} \\ \dot{v} = - 0.8771 u r - 2.4674 v \\ \dot{r} = - 0.0533 u v - 4.2943 r + 0.0001 τ_{r} \end{cases}

(40)

\dot{r} = - 1.5753 r + 0.3932 δ

(41)
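
To give a feel for the identified coefficients, the sketch below integrates Equation (40) with a forward Euler scheme and reads off two simple properties of Equation (41). The input value 5000 (the middle of the thrust range used in training), the zero turning input, the 0.05 s step, and the function names are choices made for this example, and the relation between the propeller thrust and the surge input is assumed rather than taken from the paper.

```python
import numpy as np


def usv_derivatives(state, tau_u, tau_r):
    """Right-hand side of Eq. (40) for state = [u, v, r]."""
    u, v, r = state
    return np.array([
        0.9294 * v * r - 0.4114 * u + 0.0003 * tau_u,
        -0.8771 * u * r - 2.4674 * v,
        -0.0533 * u * v - 4.2943 * r + 0.0001 * tau_r,
    ])


def simulate(state, tau_u, tau_r, dt=0.05, steps=1000):
    """Forward Euler integration with constant inputs."""
    state = np.asarray(state, dtype=float)
    for _ in range(steps):
        state = state + dt * usv_derivatives(state, tau_u, tau_r)
    return state


# Straight-line acceleration from rest with a constant surge input
u_final, v_final, r_final = simulate([0.0, 0.0, 0.0], tau_u=5000.0, tau_r=0.0)

# Setting u_dot = 0 with v = r = 0 in Eq. (40) gives the steady surge speed
u_steady = 0.0003 * 5000.0 / 0.4114

# Eq. (41) is first order: steady yaw rate per unit thrust angle, and time constant
K = 0.3932 / 1.5753
T = 1.0 / 1.5753
```

With no turning input, sway and yaw rate stay at zero and the surge speed settles at about 3.65 in the model's speed units. For Equation (41), the steady-state gain is about 0.25 and the time constant about 0.63 s.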

A comparative simulation experiment of USV path following is conducted for a single straight path. The simulation model of the comparative simulation experiment is shown in Figure 8. The experimental path settings and the initial posture parameters of the USV are shown in Table 4. The parameter settings of the ALOS-PID algorithm are shown in Table 5. The simulation experiment results are shown in Figure 9, where the blue line represents the results of the DDPG-MPC algorithm proposed in this paper, and the red line represents the results of the ALOS-PID algorithm used for comparison. Figure 9a shows the comparison of the path-following trajectories of the two methods, Figure 9b shows the comparison of the cross error during the path-following process of the two methods, and Figure 9c shows the comparison of the heading angle error of the two methods.
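
The two error signals plotted in Figure 9 can be computed for a straight path segment as in the sketch below. The sign conventions are assumptions for illustration: angles are measured from the x axis toward the y axis, which reproduces the 90° path direction listed in Table 4, and the cross error is positive to the left of the path. The function names are hypothetical.

```python
import math


def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def path_errors(position, heading_deg, start, end):
    """Return (cross error in m, heading angle error in deg) for a segment."""
    path_angle = math.atan2(end[1] - start[1], end[0] - start[0])
    ex, ey = position[0] - start[0], position[1] - start[1]
    # Signed lateral offset from the line through start and end
    cross = -ex * math.sin(path_angle) + ey * math.cos(path_angle)
    heading_error = wrap_deg(heading_deg - math.degrees(path_angle))
    return cross, heading_error


# Initial pose from Table 4: USV at (40, 0), heading 0 deg, path (0, 0) to (0, 500)
cross_0, heading_0 = path_errors((40.0, 0.0), 0.0, (0.0, 0.0), (0.0, 500.0))
```

Under these conventions the initial pose of Table 4 starts with a cross error of 40 m and a heading angle error of 90° in magnitude.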

Figure 9 demonstrates that the ALOS-PID algorithm responds faster. Regarding the cross error, the ALOS-PID algorithm reduces the cross error to zero for the first time in approximately 15 s, while the DDPG-MPC algorithm achieves this in about 30 s. For the heading angle error, both algorithms take roughly the same time to reach zero initially. However, the ALOS-PID algorithm exhibits a larger overshoot compared to the DDPG-MPC algorithm. After the cross error reaches zero, the ALOS-PID algorithm’s cross error gradually increases, peaking at around 16 m, whereas the DDPG-MPC algorithm remains stable. Similarly, the heading angle error shows a comparable trend, with the ALOS-PID algorithm peaking at approximately 7.3° after the initial zero point. Additionally, the DDPG-MPC algorithm outperforms the ALOS-PID algorithm in steady-state performance, as the latter retains some steady-state error even after 50 s of tracking.

By analyzing the average errors of the sampling points for both algorithms throughout the path-following process, as presented in Table 6, it is evident that the DDPG-MPC algorithm outperforms the ALOS-PID algorithm in terms of the average cross error and heading angle error. Specifically, the average cross error of the DDPG-MPC algorithm is approximately 37% lower than that of the ALOS-PID algorithm, while its average heading angle error is about 21% smaller.

In addition, a comparative simulation experiment of USV path following on a multi-polyline path was conducted to compare the performance of the algorithms at path turning points. The experimental path points and the initial pose of the USV are shown in Table 7, and the simulation results are shown in Figure 10.

It can be observed from Figure 10 that when the USV navigates along the longitudinal long straight path segments, the performance of the DDPG-MPC and ALOS-PID algorithms is basically consistent with the results of the single straight path simulation experiment. However, when the path switches from a lateral short straight segment to a longitudinal long straight segment, the cross error of the DDPG-MPC algorithm is slightly larger than that of the ALOS-PID algorithm. Table 8 compares the average errors of the two algorithms on the multi-polyline path. The average cross error of the DDPG-MPC algorithm is about 8.5% smaller than that of the ALOS-PID algorithm, but its average heading angle error is about 4.6% higher. This can be attributed to the fact that the traditional ALOS-PID algorithm is prone to overshoot at right-angle turns. It returns to the path quickly by adopting a larger angle, which performs better on some indicators but causes subsequent trajectory oscillation. In contrast, the DDPG-MPC algorithm adopts a smoother adjustment strategy and uses a smaller angle to rejoin the path. Although this yields a larger error on some indicators, it avoids the oscillation and improves overall path stability and tracking smoothness, balancing the performance indicators on a complex path.

Based on the above simulation results, it can be concluded that the DDPG-MPC algorithm outperforms the ALOS-PID algorithm in average error, overshoot, and steady-state performance for single straight-path following, although the ALOS-PID algorithm responds faster initially. At path inflection points in multi-polyline path following, the cross error of the DDPG-MPC algorithm is slightly larger than that of the ALOS-PID algorithm, but this difference is small relative to the overall path-following process. These comparison results verify the effectiveness of the DDPG-MPC algorithm proposed in this paper in terms of overall performance.

4.3. Real Ship Comparative Experiment

After the effectiveness of the algorithm was verified through the comparative simulation experiments, the proposed algorithm was deployed on a real ship for comparative path-following experiments to further validate its performance in actual navigation. The experimental path points and the initial pose of the USV are shown in Table 9.

Figure 11 illustrates the comparison of path-following trajectories from the real ship experiment. The red line depicts the expected path, the blue line shows the trajectory of the ALOS-PID algorithm, and the yellow line represents the trajectory of the DDPG-MPC algorithm. Under environmental interference such as wind, waves, and currents, the trajectories of both algorithms differ from the simulation results, but both algorithms successfully complete the path-following task. Notably, the DDPG-MPC algorithm demonstrates superior overall path-following performance compared to the ALOS-PID algorithm.

Figure 12 compares the errors of the two algorithms during the experiment. The results show that the DDPG-MPC algorithm’s maximum cross error is around 10 m, while the ALOS-PID algorithm’s maximum cross error is approximately 30 m. As summarized in Table 10, the average cross error and heading angle error for each sampling point during the path-following process were computed. Both errors of the DDPG-MPC algorithm are lower than those of the ALOS-PID algorithm, with the average cross error decreased by 6.7% and the average heading angle error reduced by 60.0%.
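
The percentage figures quoted for the three experiments follow directly from the tabulated averages. The short Python check below recomputes them from the values in Tables 6, 8, and 10; the helper name `reduction_percent` and the dictionary layout are illustrative only.

```python
def reduction_percent(baseline, proposed):
    """Relative reduction of the proposed method versus the baseline, in percent."""
    return (baseline - proposed) / baseline * 100.0


# (ALOS-PID, DDPG-MPC) average errors: cross error in m, heading angle error in deg
tables = {
    "Table 6, single straight path": {"cross": (7.871, 4.955), "heading": (14.0, 11.1)},
    "Table 8, multi-polyline path": {"cross": (4.801, 4.394), "heading": (10.8, 11.3)},
    "Table 10, real ship": {"cross": (7.233, 6.747), "heading": (34.6, 13.8)},
}

reductions = {
    name: {metric: reduction_percent(*pair) for metric, pair in metrics.items()}
    for name, metrics in tables.items()
}

for name, metrics in reductions.items():
    print(f"{name}: cross {metrics['cross']:.1f}%, heading {metrics['heading']:.1f}%")
```

A negative value indicates an increase: on the multi-polyline path the heading angle error of the DDPG-MPC algorithm is about 4.6% higher than that of the ALOS-PID algorithm, while every other entry is a reduction.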

The three velocity components of the USV in the real ship experiment are shown in Figure 13. The USV controlled by the DDPG-MPC algorithm maintains a relatively high longitudinal speed over long periods, whereas the longitudinal speed of the USV controlled by the ALOS-PID algorithm is less stable and fluctuates more.

5. Conclusions

This paper proposes a path-following control method for a single-outboard-motor USV, incorporating DDPG and MPC algorithms. The research includes modeling and analysis of the USV’s motion model and the outboard motor model. A comprehensive performance comparison with the traditional ALOS-PID method was conducted through simulations and real ship experiments, demonstrating the feasibility and effectiveness of the proposed approach.

In simulation experiments, whether for a single straight path or a multi-polyline path, the DDPG-MPC algorithm proposed in this study outperformed the ALOS-PID algorithm in the average cross error and heading angle error. Specifically, on the single straight path, the DDPG-MPC algorithm achieved an average cross error of 4.955 m, approximately 37% lower than the ALOS-PID algorithm’s 7.871 m. Regarding the heading angle error, the DDPG-MPC algorithm recorded an average of 11.1°, about 21% lower than the ALOS-PID algorithm’s 14.0°. Furthermore, the DDPG-MPC algorithm demonstrated superior stability and response time while effectively reducing the thruster’s energy consumption.

Real ship experiments confirmed the practical advantages of the DDPG-MPC algorithm. Despite external disturbances like wind, waves, and currents, the DDPG-MPC algorithm’s maximum cross error was approximately 10 m, compared to 30 m for the ALOS-PID algorithm. During path following, the DDPG-MPC algorithm achieved an average cross error of 6.747 m, 6.7% lower than the ALOS-PID algorithm’s 7.233 m. In terms of the heading angle error, the DDPG-MPC algorithm recorded an average of 13.8°, significantly lower than the 34.6° of the ALOS-PID algorithm, representing a 60% reduction. Moreover, the DDPG-MPC algorithm maintained a more stable and higher longitudinal velocity over a longer duration, whereas the ALOS-PID algorithm showed less stability and greater variability in longitudinal velocity.

Under real sea conditions, the DDPG-MPC algorithm demonstrated strong adaptability and robustness, effectively handling interference factors like wind, waves, and currents to accomplish path-following tasks. From the experimental results and comparative analysis, it can be inferred that the DDPG-MPC algorithm proposed in this study offers notable advantages in the overall performance of path following for single-outboard-motor USVs and holds significant potential for broader application and promotion.

Future research will aim to further optimize the algorithm to improve its real-time performance and robustness, making it applicable to more complex real-world navigation scenarios. These advancements are anticipated to deliver safer, more efficient, and stable path-following solutions for the autonomous navigation of single-outboard-motor USVs.

Author Contributions

Conceptualization, X.H. and Y.C.; formal analysis, B.C. and Y.C.; funding acquisition, X.H.; investigation, B.C., H.L., and Y.C.; methodology, X.H. and Y.C.; project administration, X.H.; software, H.L. and G.C.; supervision, X.H.; validation, B.C. and Y.C.; writing—original draft, B.C., H.L., and G.C.; writing—review and editing, Y.C. and X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the 2024 Guangdong Provincial Marine Economy Development Special Project under grant GDNRC [2024]20.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Author Bin Cui was employed by the Guangzhou Shipyard International Company Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in the paper:

USVs	unmanned surface vehicles
DDPG	Deep Deterministic Policy Gradient
MPC	model predictive control
VF	vector field
LOS	line of sight
ALOS	adaptive LOS
IDLOS	integral–differential LOS
USGES	uniform semi-global exponential stability
DQN	deep Q-network
RBF NN	radial basis function neural networks
DOF	degrees of freedom
QP	quadratic programming

Figure 1. Coordinate system for kinematic modeling of USV.

Figure 2. Algorithm framework for path-following control of single-outboard-motor USV.

Figure 3. The reward function. (a) The reward function $r_{ε}$ related to the cross-track error $ε$ between the USV’s actual position and the desired tracking path. (b) The reward function $r_{β}$ related to the deviation $β$ between the USV’s heading angle and the desired path angle. (c) The reward function $r_{Δ ψ_{d}}$ related to the change in the desired heading angle $Δ ψ_{d}$ of the USV.

Figure 4. Framework of the DDPG algorithm.

Figure 5. Amphibious USV propelled by single outboard motor for experimentation.

Figure 6. Outboard engine parameters.

Figure 7. Cumulative reward value curve for each episode.

Figure 8. MATLAB path-following simulation model.

Figure 9. Simulation experiment results of path following of a single straight path. (a) The comparison of the path-following trajectories. (b) The comparison of the cross error. (c) The comparison of the heading angle error.

Figure 10. Simulation experiment results of path following of multi-polyline path. (a) Comparison of path-following trajectories. (b) Comparison of cross error. (c) Comparison of heading angle error.

Figure 11. Path-following trajectories of real ship experiments. (a) Comparison of path-following trajectories. (b) Aerial view of path-following experiment.

Figure 12. Comparison of path-following error in real ship experiment. (a) Comparison of cross error. (b) Comparison of heading angle error.

Figure 13. Comparison of the three-degree-of-freedom speeds of the USV in real ship experiment. (a) Controlled by the DDPG-MPC algorithm. (b) Controlled by the ALOS-PID algorithm.

Table 1. Main technical parameters of the amphibious USV for experiment.

Parameter	Value
Length overall	8 m
Extreme breadth	2.3 m
Load draft	0.55 m
Maximum speed	25 kn
Propeller diameter	$14^{″}$
Maximum propulsion angle	30°

Table 2. Hyperparameter settings of the training process of the DDPG agent.

Parameter	Value
Number of neurons in the hidden layer	256
Optimizer	Adam
Network learning rate	0.0001
Soft update factor $τ$	0.005
Reward discount factor $γ$	0.99
Batch size	256
Experience replay pool size	$1 × 10^{6}$
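
The roles of the soft update factor and the reward discount factor in Table 2 can be illustrated with the standard DDPG update rules. The sketch below uses plain NumPy arrays as stand-ins for network parameters; the dictionaries `online` and `target` and the function names are hypothetical, and the snippet is not taken from the training code of this paper.

```python
import numpy as np

TAU = 0.005    # soft update factor (Table 2)
GAMMA = 0.99   # reward discount factor (Table 2)


def soft_update(target, online, tau=TAU):
    """Move every target parameter a fraction tau toward its online counterpart."""
    for name in target:
        target[name] = tau * online[name] + (1.0 - tau) * target[name]


def td_target(reward, next_q, done, gamma=GAMMA):
    """Critic regression target; bootstrapping stops at terminal steps."""
    return reward + gamma * (1.0 - done) * next_q


# Stand-in parameters: the target starts at 0 and the online value is fixed at 1
online = {"w": np.ones((2, 2))}
target = {"w": np.zeros((2, 2))}
for _ in range(100):
    soft_update(target, online)
```

With a factor of 0.005, the target parameters close only about 39% of the gap to the online parameters after 100 updates, which is what keeps the critic's regression target slowly varying during training.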

Table 3. Parameter settings of the MPC heading controller.

Parameter	Value
Predicted step length $N_{p}$	10
Dwell time	0.1 s
Propulsion angle constraints	$- 30^{\circ} \leq δ \leq 30^{\circ}$
Propulsion angular velocity constraint	$- 10^{\circ} / s \leq Δ δ \leq 10^{\circ} / s$
Weight parameters $Q$	1
Weight parameters $R$	1

Table 4. Parameter settings for path-following experiment of a single straight path.

Parameter	Value
Starting point of the path	(0, 0)
End point of the path	(0, 500)
Direction angle of the path	90°
Initial position of the USV	(40, 0)
Initial heading angle of the USV	0°

Table 5. Parameter settings related to the ALOS-PID algorithm.

Parameter	Value
Front visual distance $Δ$ (m)	20
The adaptive gain coefficient $γ$	0.0013
Observer gain coefficient $K_{f}$	0.2
Determine radius $r_{s}$ (m)	8
$K_{p}$	0.83
$K_{i}$	0
$K_{d}$	1.1

Table 6. Comparison of average error of path following of a single straight path.

Algorithm	Average Cross Error (m)	Average Heading Angle Error (°)
ALOS-PID	7.871	14.0
DDPG-MPC (Ours)	4.955	11.1

Table 7. Parameter settings for path-following experiment of multi-polyline path.

Parameter	Value
Knuckle point of the path	(0, 0), (0, 300), (50, 300), (50, 0), (100, 0), (100, 300), (150, 300), (150, 0), (200, 0), (200, 300)
Initial position of the USV	(−30, −30)
Initial heading angle of the USV	0°

Table 8. Comparison of average error of path following of multi-polyline path.

Algorithm	Average Cross Error (m)	Average Heading Angle Error (°)
ALOS-PID	4.801	10.8
DDPG-MPC (Ours)	4.394	11.3

Table 9. Parameter settings for real ship comparative experiment.

Parameter	Value
The latitude and longitude of the path point	(N 22.704023, E 113.644455), (N 22.700570, E 113.646854), (N 22.694336, E 113.648297), (N 22.690591, E 113.645698), (N 22.689223, E 113.648798)
Initial position of the USV	The latitude and longitude: (N 22.704600, E 113.644300)
Initial heading angle of the USV	20.0°

Table 10. Comparison of average error of path following of real ship experiment.

Algorithm	Average Cross Error (m)	Average Heading Angle Error (°)
ALOS-PID	7.233	34.6
DDPG-MPC (Ours)	6.747	13.8

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
