Enhancing Trajectory Tracking Performance of Underwater Gliders Using Finite-Time Sliding Mode Control Within a Reinforcement Learning Framework

Wang, Guohui; Yu, Jianing; Yang, Yanan

doi:10.3390/jmse13050884

by Guohui Wang ¹, Jianing Yu ¹ and Yanan Yang ²,*

¹ School of Mechanical Engineering and Automation, Shanghai University, No. 333, Nanchen Road, Baoshan District, Shanghai 200444, China

² School of Mechanical Engineering, Tianjin University, No. 135 Yaguan Road, Jinnan District, Tianjin 300350, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2025, 13(5), 884; https://doi.org/10.3390/jmse13050884

Submission received: 25 March 2025 / Revised: 17 April 2025 / Accepted: 24 April 2025 / Published: 29 April 2025

(This article belongs to the Section Ocean Engineering)

Abstract

Underwater gliders, as autonomous underwater vehicles, are integral to oceanographic research, environmental monitoring, and military applications. Given the intricate and ever-changing underwater environment, the precise management of an underwater glider’s dive depth and pitch angle is imperative for optimal functionality. This study introduces a finite-time sliding mode control method for controlling the dive depth and pitch angle of underwater gliders. It incorporates a radial basis function neural network in a critic–actor reinforcement learning framework, enhancing navigational performance in difficult conditions. Sea trial data are used to create a dynamic model for the underwater glider, which is then used to design a control law. Sliding mode control is used to align the dive depth and pitch angle with the desired trajectory. Actor and critic neural networks are used to handle disturbances and evaluate error costs. By incorporating a standard deviation update technique into the actor and critic neural networks, along with weight updates, we improve controller stability and reduce errors in maintaining dive depth and pitch angle. Simulation results show that the proposed approach tracks trajectories more effectively than traditional SMC and reinforcement learning SMC methods, even in the presence of disturbances.

Keywords:

underwater glider; finite-time sliding mode control; reinforcement learning; standard deviation update; trajectory tracking

1. Introduction

Underwater exploration is a frontier field in marine science and technology, playing a vital role in resource discovery, environmental monitoring, and ecological research [1,2,3]. In the face of global climate change and rising demand for marine resources, this field presents both significant opportunities and complex challenges. Common platforms include Remotely Operated Vehicles (ROVs), Autonomous Underwater Vehicles (AUVs), and underwater gliders [4,5]. ROVs and AUVs, propelled by thrusters, offer high maneuverability and precise control, making them suitable for tasks such as maintenance, salvage operations, and scientific exploration. In contrast, underwater gliders achieve propulsion through variable buoyancy and hydrodynamic lift, enabling slower yet more energy-efficient movement [6]. Their low noise and extended range make them ideal for long-duration marine monitoring missions.

Despite these advantages, underwater gliders face challenges such as environmental disturbances and unmodeled dynamics [7,8], necessitating advanced control algorithms for precise and reliable operation. Early studies primarily employed conventional control strategies, including Proportional-Integral-Derivative (PID) and Linear Quadratic Regulator (LQR) control. Leonard et al. [9] developed an LQR controller based on a linearized system model, enabling accurate vertical trajectory tracking. Liu et al. [10] demonstrated the effectiveness of PID control in regulating the depth and pitch angle. Wu et al. later validated the applicability of the LQR method for basic motion control, though their results showed that heading adjustment from −1.5 to 1.5 rad required approximately 1000 s, while pitch angle control reached the target within 30 s.

As traditional methods struggle with dynamic and nonlinear underwater environments, modern control strategies (such as adaptive, robust, and sliding mode control) have been introduced. Adaptive control is particularly valued for its real-time parameter adjustment capability [11]. Wan et al. [12] applied an adaptive extended state observer in shallow-water navigation, while Sang et al. [13] implemented a fuzzy PID controller for heading tracking. Robust control methods, known for their disturbance rejection, have also gained traction. Nguyen et al. and García et al. [14,15] proposed robust adaptive controllers for heading stabilization, although these methods depended heavily on accurate system modeling.

In parallel, sliding mode control techniques have been explored for their effectiveness in trajectory tracking [16,17]. Zhang et al. [18] proposed an adaptive non-singular integral terminal sliding mode control (ANITSMC) method, which ensures finite-time convergence and robust performance under a bounded range of external disturbances. Zhou et al. [19] developed an adaptive robust sliding mode approach to counter parameter uncertainties, though the well-known chattering problem remains a concern. Zou et al. [20] further improved upon this by proposing a non-singular fast-terminal sliding mode controller for fixed-time tracking. Meanwhile, Gupta Roy et al. [21] presented variants of the sliding mode and active disturbance rejection control (ADRC), demonstrating advantages in energy efficiency and disturbance rejection.

In contemporary control system research, data-driven methodologies, including neural network control and reinforcement learning (RL), are increasingly recognized as state-of-the-art technologies for tackling complex control challenges. These innovative approaches demonstrate remarkable adaptability and intelligence, particularly within precision control systems such as underwater gliders. Recent studies have reported substantial advancements in this field. For instance, Juan et al. introduced a Heterogeneous Agent Asynchronous Policy Gradient (HAAPG) method grounded in model-free reinforcement learning, which achieves high-precision motion control by dynamically adjusting the rotation of a movable mass block to effectively counteract external disturbances [22]. Simultaneously, Wang and colleagues developed an adaptive sliding mode fault-tolerant controller that marked a significant advancement in addressing the modeling uncertainties of underwater gliders, achieving a mean integral absolute error of 0.29 and a convergence time of 24.64 s [23]. Notably, the advancement of neural network-based control technologies has garnered significant scholarly attention. Lei et al. successfully combined physical modeling principles with data-driven methodologies, thereby markedly improving the modeling accuracy of underwater gliders [24]. Building upon this foundation, Jeong and Gao introduced a neural network-based self-tuning PID controller and a model-free neural adaptive attitude control strategy, both of which exhibited robust control stability under uncertain conditions [25,26]. These methodologies are characterized by their ability to effectively learn and interpret nonlinear system behaviors, demonstrating enhanced perception and adaptability in complex dynamic systems [27]. Furthermore, the research conducted by Su and Zang underscores the potential of reinforcement learning in tasks related to attitude control and heading tracking [28,29]. 
By integrating traditional control strategies, such as PID control, with sophisticated machine learning algorithms, researchers are progressively addressing the limitations inherent in conventional control systems.

Recent advancements in sensing and communication technologies have markedly enhanced the performance of underwater glider control systems. In particular, the high-precision fiber-optic sensing technology introduced by Jawad Mirza et al. plays a pivotal role in augmenting the disturbance rejection capabilities and stability of data-driven control algorithms [30,31]. Complementary to this, Wang et al. have developed a cooperative data compression and communication strategy for underwater oil spill detection, employing support vector machines and density-based spatial clustering techniques [32]. Concurrently, significant progress has been achieved in the domain of trajectory tracking control strategies. A range of methodologies—including quadratic programming, PID control, and sliding mode control—have been rigorously investigated and implemented across various fields such as ground vehicles, unmanned aerial vehicles (UAVs), and surface vessels [33]. The fundamental principles underlying these methods—such as dynamic response optimization, enhanced disturbance rejection, and robust system design—offer substantial insights and advancements.

Based on the literature reviewed, current control methods for underwater gliders encounter several significant challenges, such as slow response times, limited disturbance rejection capabilities, and inadequate adaptability to dynamic environments. To address these challenges, this paper introduces an SMC strategy enhanced by Critic–Actor Reinforcement Learning with Standard Deviation updating (SD-RLSMC). The primary innovations of this approach include the following:

1: Precise hydrodynamic parameter identification and dynamic modeling: Hydrodynamic parameters are accurately identified using the Trust Region Reflective Algorithm, based on experimental data obtained from the South China Sea. A dynamic model is subsequently established, incorporating a depth correction term and nonlinear hydrodynamic effects. The model’s accuracy is validated by comparing the root-mean-square error (RMSE) between the proposed full-order model and experimental results.
2: Critic–actor reinforcement learning framework with radial basis function neural networks: Radial Basis Function (RBF) neural networks are embedded within a critic–actor reinforcement learning framework to construct a unified perturbation estimation model, enabling the real-time approximation of unmodeled dynamics and external disturbances. Compared to conventional sliding mode control, the proposed SD-RLSMC approach reduces the RMSE by approximately 42% in disturbance rejection scenarios and substantially enhances trajectory tracking performance.
3: Standard deviation adjustment: An adaptive standard deviation updating mechanism based on gradient descent is introduced to dynamically regulate the local approximation capability of RBF neural networks. This mechanism improves the approximation accuracy for complex nonlinear systems, enhances controller adaptability under varying environmental conditions, and reduces the required control effort.

The remainder of this paper is organized as follows: Section 2 presents the dynamic modeling and hydrodynamic parameter identification of the underwater glider. Section 3 introduces a finite-time sliding mode control strategy based on reinforcement learning. Section 4 evaluates the proposed controller’s performance through numerical simulations. Finally, Section 5 concludes the paper and outlines potential directions for future research.

2. Dynamic Modeling and Parameter Identification

In this section, a dynamic model of the underwater glider is proposed for controller design. A parameter identification methodology is applied to acquire the hydrodynamic coefficients, rotational inertia, and other relevant parameters. The model’s accuracy is confirmed through validation with additional data.

2.1. Model Description

Figure 1 displays a schematic diagram of an underwater glider, along with its vertical plane coordinate system. In accordance with Fossen’s definition, the movement and location of an underwater glider can be characterized with the following two coordinate systems: the inertial coordinate system and the body-fixed coordinate system [34].

To formulate a dynamic model for tracking vertical plane trajectories, the following assumptions shall be established [35]:

1: The center of buoyancy is typically considered to be fixed, a condition supported by the glider’s structural symmetry and inherent hydrostatic stability. However, in practical scenarios, minor shifts in the center of buoyancy may occur due to fluid disturbances or structural deformations. These shifts are generally small in magnitude, evolve gradually, and can therefore be treated as bounded, unmodeled dynamic perturbations.
2: The influence of actuator motion on mass distribution is negligible. The glider’s actuators, such as the buoyancy and attitude adjustment units, operate slowly and have a limited capacity for mass adjustments, resulting in minimal dynamic impact on the overall mass distribution. Any unmodeled perturbations can be incorporated into the bounded uncertainty analysis.
3: The pitch angle $θ$ is constrained between $- π / 2$ and $π / 2$ to prevent the occurrence of inverted or runaway gliders in the vertical plane. In practice, the physical limitations of the actuators, such as the operational range of the hydraulic pump, along with stability requirements, ensure that the pitch angle remains within this range. Exceeding this range may lead to control failures or complicate the dynamic behavior.

When we focus on the motion of an underwater glider in the vertical plane, a simplified dynamic model can be obtained according to the literature [20,36,37]:

\{\begin{matrix} \dot{z} (t) = - v_{1} (t) sin (θ (t)) + v_{3} (t) cos (θ (t)) \\ \dot{θ} (t) = q (t) \\ \dot{q} (t) = M + B r_{p 1} (t) \\ {\dot{v}}_{1} (t) = \frac{1}{M_{f 1} + m (t)} (F_{1} + F_{2} + F_{3}) \\ {\dot{v}}_{3} (t) = \frac{1}{M_{f 3} + m (t)} (F_{4} + F_{5}) \end{matrix}

(1)

where the following holds:

\begin{matrix} F_{1} & = - q (t) v_{3} (t) (M_{f 3} + m (t)) - \dot{q} (t) m_{p} r_{p 3} \\ F_{2} & = - D (t) cos (α (t)) + L (t) sin (α (t)) \\ F_{3} & = - m_{0} g sin (θ (t)) + q^{2} (t) (m_{p} r_{p 1} (t) + m_{b} (t) r_{b 1}) \\ F_{4} & = - D (t) sin (α (t)) - L (t) cos (α (t)) \\ F_{5} & = m_{0} g cos (θ (t)) + q (t) v_{1} (t) (M_{f 1} + m (t)) \\ M & = \frac{1}{E} (T_{2} (t) - m_{b} r_{b 1} g cos (θ (t)) - m_{p} r_{p 3} g sin (θ (t)) + v_{1} (t) v_{3} (t) (M_{f 3} - M_{f 1} - m_{p}) \\ & \quad - ({\dot{v}}_{1} (t) + v_{1} (t) q (t)) m_{p} r_{p 3} + {\dot{v}}_{3} (t) m_{b} r_{b 1} - v_{1} (t) q (t) m_{b} r_{b 1}) \\ B & = \frac{1}{E} (- m_{p} g cos (θ (t)) + {\dot{v}}_{3} (t) m_{p} - v_{1} (t) q (t) m_{p} - 2 q (t) m_{p} {\dot{r}}_{p 1} (t)) \end{matrix}

In addition, the velocities in the body-fixed coordinate system are represented by $v_{1}(t)$, $v_{3}(t)$, and $q(t)$. The values of $v_{1}(t)$ and $v_{3}(t)$ can be obtained through inertial measurement unit (IMU) sensors. The variables $z(t)$ and $θ(t)$ correspond to the submersion depth and pitch angle in the inertial coordinate system, respectively. The total mass of the glider is $m(t) = m_{rb} + m_{p} + m_{b}(t)$, where $m_{rb}$ represents the mass of the glider body. The quantity $m_{0} = m(t) - m_{d}$ represents the net buoyancy of the glider, where $m_{d}$ denotes the displacement mass, $m_{p}$ represents the attitude adjustment mass, and $g$ is the acceleration of gravity. The variable ballast mass is located at position $r_{b1}$ along the x-axis in the body-fixed coordinate system, while $r_{p3}$ indicates the position of the internal mass along the x-axis in the same coordinate system. Furthermore, $M_{f1}$ and $M_{f3}$ denote the added mass terms. The variable $r_{p1}(t)$ denotes the position of the attitude adjustment unit, while $m_{b}(t)$ represents the mass of the buoyancy adjustment unit. These two variables serve as the control inputs for the dynamics model. The hydraulic pump controls the injection (or extraction) of seawater into (or out of) the bladder, allowing for precise control over the buoyancy unit’s mass.

It is well documented in Ref. [38] that the buoyancy regulation mass, $m_{b}(t)$, is influenced by both seawater density and pressure. To simplify the calculation of the buoyancy, the effects of changes in seawater density and pressure are combined by incorporating a depth correction term into the buoyancy regulation mass. Consequently, the buoyancy regulation mass can be expressed as follows:

m_{b} (t) = \pm (ρ_{sea} V_{sea} + K_{h} z)

(2)

where $ρ_{sea}$ is the density of seawater, set in this research as $1022.70\ kg/m^{3}$, and the correction term $K_{h}$ accounts for the impact of depth on the buoyancy regulation. Moreover, the variables $α(t)$, $V(t)$, $D(t)$, $L(t)$, $T_{2}(t)$, and $E$ represent the angle of attack, speed, drag, lift, pitching moment, and combined moment of inertia about the pitch axis, respectively, which are mathematically expressed as follows:

\{\begin{matrix} α (t) = arctan (\frac{v_{3} (t)}{v_{1} (t)}) \\ V (t) = \sqrt{v_{1}^{2} (t) + v_{3}^{2} (t)} \\ D (t) = (K_{D 0} + K_{D} α^{2} (t)) V^{2} (t) \\ L (t) = (K_{L 0} + K_{L} α (t)) V^{2} (t) \\ T_{2} (t) = (K_{M 0} + K_{M} α (t) + K_{q} q (t)) V^{2} (t) \\ J_{2} = J_{rb 2} + J_{f 2} \\ E = J_{2} + m_{p} r_{p 3}^{2} + m_{b} (t) r_{b 1}^{2} \end{matrix}

(3)

where $K_{D0}$, $K_{D}$, $K_{L0}$, $K_{L}$, $K_{M0}$, $K_{M}$, and $K_{q}$ are the hydrodynamic parameters and $J_{2}$ is the inertial moment. These parameters, along with $K_{h}$, will be identified with sea trial data.
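The quasi-steady terms of Equation (3) can be evaluated directly from the body-frame velocities. The following is a minimal Python sketch under stated assumptions: the function name `hydrodynamics`, the dictionary keys, and the numbers in `DEMO_COEF` are hypothetical placeholders chosen for illustration and are not the identified values of this study.

```python
import math


def hydrodynamics(v1, v3, q, coef):
    """Evaluate alpha, V, D, L and T2 of Equation (3) for one time instant."""
    # The text uses arctan(v3 / v1); atan2 gives the same angle for v1 > 0
    # and avoids a division by zero when v1 == 0.
    alpha = math.atan2(v3, v1)
    speed_sq = v1 ** 2 + v3 ** 2
    drag = (coef["KD0"] + coef["KD"] * alpha ** 2) * speed_sq
    lift = (coef["KL0"] + coef["KL"] * alpha) * speed_sq
    moment = (coef["KM0"] + coef["KM"] * alpha + coef["Kq"] * q) * speed_sq
    return alpha, math.sqrt(speed_sq), drag, lift, moment


# Placeholder coefficients for illustration only (assumed, not identified).
DEMO_COEF = {
    "KD0": 7.0, "KD": 400.0,
    "KL0": 0.0, "KL": 450.0,
    "KM0": 0.0, "KM": -60.0, "Kq": -200.0,
}

alpha, speed, drag, lift, moment = hydrodynamics(0.3, 0.0, 0.0, DEMO_COEF)
```

With zero angle of attack and zero pitch rate, only the constant drag coefficient contributes, which gives a quick sanity check of the sign conventions.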

It is anticipated that the underwater glider will adhere to the desired trajectory in terms of the diving depth and pitch angle. Unlike the research by Zou [20], the current study doubles both the control inputs and outputs, leading to a more intricate model. This model exhibits significant nonlinear characteristics and dynamic behaviors, underscoring the necessity for an advanced controller design. Given the relatively low and predominantly stable velocity of the underwater glider (ranging from 0.5 to 0.8 knots), it is reasonable to disregard the impact of velocity variation in Equation (1). Moreover, since $J_{2} + m_{b} r_{b1}^{2} ≫ m_{b}(t) r_{b1}^{2}$, the inertia term $E$ can be simplified to $E = J_{2} + m_{b} r_{b1}^{2}$. Thus, Equation (1) can be expressed more succinctly as follows:

\{\begin{matrix} \dot{z} (t) & = - v_{1} (t) sin θ (t) + v_{3} (t) cos θ (t) \\ \dot{θ} (t) & = q (t) \\ \dot{q} (t) & = M (θ (t), \dot{θ} (t)) + B (θ (t), \dot{θ} (t)) u_{2} (t) \\ {\dot{v}}_{1} (t) & = \frac{1}{M_{f 1}} [\begin{matrix} - q (t) v_{3} (t) M_{f 3} - D (t) cos α (t) + L (t) sin α (t) \\ - m_{0} g sin θ (t) + q {(t)}^{2} (m_{p} r_{p 1} (t) + m_{b} (t) r_{b 1}) \end{matrix}] \\ {\dot{v}}_{3} (t) & = \frac{1}{M_{f 3}} [\begin{matrix} q (t) v_{1} (t) M_{f 1} - D (t) sin α (t) \\ - L (t) cos α (t) + m_{0} g cos θ (t) \end{matrix}] \end{matrix}

(4)

where the functions $M(\cdot)$ and $B(\cdot)$ are defined as follows:

\begin{matrix} M (θ (t), \dot{θ} (t)) & = \frac{1}{E} [T_{2} (t) - m_{b} r_{b 1} g cos θ (t) - m_{p} r_{p 3} g sin θ (t) \\ + v_{1} (t) v_{3} (t) (M_{f 3} - M_{f 1} - m_{p}) - v_{1} (t) q (t) m_{p} r_{p 3} - v_{1} (t) q (t) m_{b} r_{b 1}] \\ B (θ (t), \dot{θ} (t)) & = \frac{1}{E} (- m_{p} g cos θ (t) - m_{p} v_{1} (t) q (t)) \end{matrix}

2.2. Parameter Identification

Based on the analysis in Section 2.1, eleven parameters of the dynamics model need to be identified: $K_{h}$, $K_{D0}$, $K_{D}$, $K_{L0}$, $K_{L}$, $K_{M0}$, $K_{M}$, $K_{q}$, $J_{f2}$, $M_{f1}$, and $M_{f3}$. The parameter identification toolbox in MATLAB 2024a is used, and the whole process is depicted in Figure 2. The cost function $F(t)$ is minimized as follows [39]:

F (t) = \sum_{i = 0}^{n} {(z (t_{i}) - z_{s} (t_{i}))}^{2} + {(θ (t_{i}) - θ_{s} (t_{i}))}^{2}

(5)

where $z_{s}$ and $θ_{s}$ are the dive depth and pitch angle in the sea trial, respectively.
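The objective in Equation (5) can be written as a small function over sampled trajectories. This is a sketch of the cost only, assuming the model outputs and the sea trial records have already been resampled onto the same time stamps; the function name `identification_cost` is hypothetical, and the minimizer itself (the MATLAB toolbox used in the study) is not reproduced here.

```python
def identification_cost(z_model, theta_model, z_trial, theta_trial):
    """Sum of squared depth and pitch residuals over all samples, as in Equation (5)."""
    lengths = {len(z_model), len(theta_model), len(z_trial), len(theta_trial)}
    if len(lengths) != 1:
        raise ValueError("all four series must have the same number of samples")
    return sum(
        (z - zs) ** 2 + (th - ths) ** 2
        for z, th, zs, ths in zip(z_model, theta_model, z_trial, theta_trial)
    )


# Two samples: the depth is off by 1 m and the pitch by 0.2 rad at the second one.
cost = identification_cost([1.0, 2.0], [0.1, 0.2], [1.0, 1.0], [0.1, 0.0])
```

Equation (5) as written sums the depth and pitch residuals without weights, so the term with the larger numerical range dominates the fit.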

Notably, the trust-region-reflective algorithm [40] is adopted in the parameter identification to optimize the selection process. The procedure for parameter identification is outlined in Table 1:

2.3. Model Validation

Parameter identification is based on the data collected from the initial profile test of an underwater glider conducted in the South China Sea. For model validation, experimental data from profiles 28 to 39 are selected. Table 2 presents the results of parameter identification.

Based on the data obtained from profiles 28 to 39 of the sea trial, the experimental results are compared with those predicted by the simplified model (Equation (4)) and the full model (Equation (1)). Figure 3 displays the comparison results in terms of dive depth and pitch angle. The simulation results from both models are highly consistent with the sea trial data. The root mean square error (RMSE) is employed to quantify the accuracy of the two models in representing the real underwater glider, expressed as follows [41]:

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(6)

where $y_{i}$ represents the value measured in the sea trial and ${\hat{y}}_{i}$ represents the corresponding model prediction.
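Equation (6) translates into a few lines of code. The sketch below assumes two equally long, time-aligned sequences; the function name `rmse` is illustrative.

```python
import math


def rmse(reference, estimate):
    """Root-mean-square error between two equally long sequences (Equation (6))."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(reference, estimate))
    return math.sqrt(squared / len(reference))
```

For example, a model that is offset from the reference by a constant 2 m at every sample has an RMSE of exactly 2 m.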

Table 3 presents the RMSE indices for the dive depth and pitch angle of both models. The results show the comparable accuracy of both models, indicating that they accurately capture the underwater glider’s behaviors.

2.4. Standard Forms

Based on Section 2.3, the parameter identification results are integrated into the underwater glider dynamics model (Equation (4)), leading to the following simplified form:

\{\begin{matrix} \dot{η} (t) & = G (η (t)) φ (t) \\ \dot{φ} (t) & = M F_{1} (η (t), φ (t)) + M F_{2} (η (t), φ (t)) u (t) \end{matrix}

(7)

where the nonlinear vector function $F_{1}(t) \in R^{3}$ is defined as follows:

\begin{matrix} F_{1} (t) & = [\begin{matrix} f_{1} (t) \\ f_{2} (t) \\ f_{3} (t) \end{matrix}] \\ f_{1} (t) & = T_{2} (t) + 0.0019 cos θ (t) \cdot z (t) - 2.06 sin θ (t) + 158.53 v_{1} (t) v_{3} (t) - 0.21 v_{1} (t) q (t) \\ f_{2} (t) & = - 182.37 v_{3} (t) q (t) + 2.34 sin θ (t) + 0.0025 z (t) sin θ (t) \\ + L (t) sin α (t) - D (t) cos α (t) \\ f_{3} (t) & = 8.85 v_{1} (t) q (t) - 2.34 cos θ (t) - 0.0025 z (t) cos θ (t) \\ - L (t) cos α (t) + D (t) sin α (t) \end{matrix}

The input-related matrix $F_{2}(t) \in R^{3 \times 2}$ is given by the following:

F_{2} (t) = [\begin{matrix} - 10.52 cos θ (t) - 1.07 v_{1} (t) q (t) & - 147.15 cos θ (t) - 15 v_{1} (t) q (t) \\ - 14.03 sin θ (t) & 0 \\ 14.03 cos θ (t) & 0 \end{matrix}]

The constant mass-inertia matrix $M \in R^{3 \times 3}$ is as follows:

M = [\begin{matrix} 0.048 & 0 & 0 \\ 0 & 0.11 & 0 \\ 0 & 0 & 0.0055 \end{matrix}]

The kinematic transformation matrix $G(t) \in R^{2 \times 3}$ is as follows:

G (t) = [\begin{matrix} 0 & - sin θ (t) & cos θ (t) \\ 1 & 0 & 0 \end{matrix}]

By defining $l_{1}(t) = M F_{1}(η(t), φ(t))$, the dynamical model described in Equation (7) can be reformulated as follows:

\{\begin{matrix} \dot{η} (t) = G (η (t)) φ (t) \\ \dot{φ} (t) = M F_{2} (η (t), φ (t)) u (t) + l_{1} (t) \end{matrix}

(8)

Then, the variable substitutions $x_{1}(t) = η(t)$ and $x_{2}(t) = G(η(t)) φ(t)$ are introduced to simplify the controller design process. Subsequently, Equation (8) can be further expressed as follows:

\{\begin{matrix} {\dot{x}}_{1} (t) = x_{2} (t) \\ {\dot{x}}_{2} (t) = G_{2} (x_{1} (t)) M F_{2} (x_{1} (t), x_{2} (t)) u (t) + G_{2} (x_{1} (t)) l_{1} (t) + \dot{θ} G_{1} (x_{1} (t)) x_{2} (t) \end{matrix}

(9)

where $G_{1}(x_{1}(t)) = [\begin{matrix} 0 & sin(θ(t)) & cos(θ(t)) \\ 0 & 0 & 0 \end{matrix}]$, and $G_{2}(x_{1}(t)) = [\begin{matrix} 0 & - cos(θ(t)) & sin(θ(t)) \\ 1 & 0 & 0 \end{matrix}]$.

Let $F(t) = G_{2}(x_{1}(t)) M F_{2}(x_{1}(t), x_{2}(t))$, and $l_{2}(t) = G_{2}(x_{1}(t)) l_{1}(t) + \dot{θ} G_{1}(x_{1}(t)) x_{2}(t) + f(t)$. Here, $f(t)$ represents the combined uncertainties of the unmodeled portion and the model’s time-varying components. Representing the time-varying disturbances at the model inputs as $d(t)$, the dynamical model can be written in the standard form as follows:

\{\begin{matrix} {\dot{x}}_{1} (t) = x_{2} (t) \\ {\dot{x}}_{2} (t) = F (x_{1} (t), x_{2} (t)) (u (t) + d (t)) + l_{2} (t) \end{matrix}

(10)

Assumption 1 ([42,43]). The uncertainty component of the model, $l_{2}(t)$, and the external perturbation, $d(t)$, are collectively treated as an aggregate perturbation $Ld(t)$, where $Ld(t)$ is bounded and satisfies $∥Ld(t)∥ \leq Ld_{Max}$. Under this assumption, Equation (10) can be reformulated into the following expression:

\{\begin{matrix} {\dot{x}}_{1} (t) = x_{2} (t) \\ {\dot{x}}_{2} (t) = F (x_{1} (t), x_{2} (t)) u (t) + L d (t) \end{matrix}

(11)

The main objective of this study is to develop a sliding mode controller based on a reinforcement learning framework. This control strategy is intended to effectively regulate the dive depth and pitch angle within a specified timeframe, while also improving resilience to external disturbances and uncertainties in the system model.

3. Controller Design

The schematic of controller design is shown in Figure 4.

3.1. Prior Knowledge

Lemma 1 ([44]). Consider the nonlinear system $\dot{x} = f(x(t))$ with $x \in R^{n}$ and $x(0) = x_{0}$, where $f(x(t))$ represents a nonlinear function. If a Lyapunov function $V(x)$ exists, along with parameters $α > 0$, $β > 0$, $0 < p < 1$, and $q > 1$, such that $\dot{V}(x) ⩽ - (α V(x)^{p} + β V(x)^{q})$, then the system is fixed-time stable, and the convergence time $T$ satisfies the following:

T ⩽ \frac{1}{α (1 - p)} + \frac{1}{β (q - 1)}

(12)
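The bound in Equation (12) does not depend on the initial condition, which can be checked numerically. The sketch below evaluates the bound and integrates the worst case $\dot{V} = -(α V^{p} + β V^{q})$ with an explicit Euler scheme; the function names, step size, tolerance, and the parameter values in the usage lines are assumptions for illustration.

```python
def settling_time_bound(alpha, beta, p, q):
    """Upper bound on the convergence time from Equation (12)."""
    if not (alpha > 0 and beta > 0 and 0 < p < 1 and q > 1):
        raise ValueError("requires alpha > 0, beta > 0, 0 < p < 1 and q > 1")
    return 1.0 / (alpha * (1.0 - p)) + 1.0 / (beta * (q - 1.0))


def time_to_converge(v0, alpha, beta, p, q, dt=1e-4, tol=1e-6, t_max=100.0):
    """Euler-integrate dV/dt = -(alpha V^p + beta V^q) until V falls below tol."""
    v, t = v0, 0.0
    while v > tol and t < t_max:
        v = max(v - dt * (alpha * v ** p + beta * v ** q), 0.0)
        t += dt
    return t


bound = settling_time_bound(1.0, 1.0, 0.5, 1.5)          # 2 + 2 = 4 time units
elapsed = time_to_converge(100.0, 1.0, 1.0, 0.5, 1.5)    # stays below the bound
```

The term $β V^{q}$ dominates while $V > 1$ and the term $α V^{p}$ dominates once $V < 1$, which is why each contributes one summand to the bound.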

Lemma 2 (Young’s inequality, [45]). Suppose that $a$ and $b$ are vectors in $R^{n}$, and let $p$, $q$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$; we can obtain the following:

a \cdot b \leq {∥ a ∥}_{p} {∥ b ∥}_{q}

(13)

where ${∥a∥}_{p}$ denotes the $L^{p}$-norm of the vector $a$, and ${∥b∥}_{q}$ denotes the $L^{q}$-norm of the vector $b$.

3.2. Sliding Mode Controller Design

The desired trajectory is defined as $x_{1d} = {[\begin{matrix} z_{d} & θ_{d} \end{matrix}]}^{T}$. The state error is then $e_{1} = x_{1} - x_{1d}$, with components $e_{1i}$, and the sliding mode surface is denoted as $S = {[\begin{matrix} S_{1} & S_{2} \end{matrix}]}^{T}$, where each component is defined as follows:

S_{i} = {\dot{e}}_{1 i} + β_{1} {sig}^{ψ_{i} (ε_{1})} (e_{1 i}) + R_{i} (e_{1 i}), \quad i = 1, 2 .

(14)

where $ψ_{i}(ε_{1}) = 1 + 0.5 ε_{1} (1 + sgn(|e_{1i}| - 1))$, and $β_{1} > 0$ sets the speed of convergence when the tracking error is far from the equilibrium point. The expression of $R_{i}(e_{1i})$ is presented as follows:

R_{i} (e_{1 i}) = \{\begin{matrix} β_{2} {sig}^{ϕ_{i} (ε_{2})} (e_{1 i}), & |e_{1 i}| ⩾ δ \\ λ_{1} e_{1 i} + λ_{2} {sig}^{γ} (e_{1 i}), & |e_{1 i}| < δ \end{matrix}

(15)

where $λ_{1} = β_{2} (ϕ_{i}(ε_{2}) - γ) δ^{ϕ_{i}(ε_{2}) - 1} / (1 - γ)$; $λ_{2} = β_{2} (ϕ_{i}(ε_{2}) - 1) δ^{ϕ_{i}(ε_{2}) - γ} / (1 - γ)$; $ϕ_{i}(ε_{2}) = 1 - 0.5 ε_{2} (1 - sgn(|e_{1i}| - 1))$; $γ$ determines the sliding mode surface, with $γ > 0$; $δ$ is a very small positive constant; and $β_{2} > 0$ controls the rate of convergence of the tracking error near the equilibrium point. The function ${sig}^{a}(x)$ is defined as ${sig}^{a}(x) = {|x|}^{a} sgn(x)$.
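The switching exponents used in Equations (14) and (15) are easier to read in code. The following sketch implements ${sig}^{a}(x)$ together with $ψ_{i}$ and $ϕ_{i}$ for a scalar error; the function names and the values of $ε_{1}$, $ε_{2}$ in the comments are hypothetical.

```python
def sgn(x):
    """Sign function returning -1, 0 or 1."""
    return (x > 0) - (x < 0)


def sig(x, a):
    """sig^a(x) = |x|^a * sgn(x); keeps the sign of x for any exponent a > 0."""
    return abs(x) ** a * sgn(x)


def psi(e, eps1):
    """Exponent of the beta_1 term: 1 + eps1 when |e| > 1, and 1 when |e| < 1."""
    return 1.0 + 0.5 * eps1 * (1 + sgn(abs(e) - 1.0))


def phi(e, eps2):
    """Exponent of the beta_2 term: 1 when |e| > 1, and 1 - eps2 when |e| < 1."""
    return 1.0 - 0.5 * eps2 * (1 - sgn(abs(e) - 1.0))


# With eps1 = eps2 = 0.4 (assumed): far from the origin the beta_1 term is
# superlinear (exponent 1.4); close to it the beta_2 term is sublinear (exponent 0.6).
far_exponent = psi(2.0, 0.4)
near_exponent = phi(0.5, 0.4)
```

The pair of exponents therefore switches the dominant term at $|e_{1i}| = 1$: a power above one accelerates convergence for large errors, and a power below one does so for small errors.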

Differentiating Equation (14) yields the following:

{\dot{S}}_{i} = {\ddot{e}}_{1 i} + β_{1} ψ_{i} (ε_{1}) {|e_{1 i}|}^{ψ_{i} (ε_{1}) - 1} {\dot{e}}_{1 i} + {\dot{R}}_{i} (e_{1 i}), i = 1, 2 .

(16)

Setting $\dot{S} = 0$, the control law $u_{1}$ can be written as follows:

u_{1} = - F^{- 1} (L d (t) - {\ddot{x}}_{1 d} + β_{1} ψ (ε_{1}) {| e |}^{ψ (ε_{1}) - 1} \dot{e} + \dot{R} (e))

(17)

To enhance the convergence speed and alleviate the chattering problems associated with sliding mode control, the following convergence law is proposed:

{\dot{S}}_{i} = - K_{0} sgn (S_{i}) - K_{1} {sig}^{1 + 0.5 α_{1} (1 + sgn (|S_{i}| - 1))} (S_{i}) - K_{2} {sig}^{1 - 0.5 α_{2} (1 - sgn (|S_{i}| - 1))} (S_{i}), \quad i = 1, 2

(18)

where $K_{0}$, $K_{1}$, and $K_{2}$ denote positive constants. The control law for chattering suppression can be expressed as follows:

u_{2} = - F^{- 1} \dot{S}

(19)

where $\dot{S} = {[\begin{matrix} {\dot{S}}_{1} & {\dot{S}}_{2} \end{matrix}]}^{T}$. In summary, the integrated control law can be written as follows:

u = u_{1} + u_{2}

(20)
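The behaviour of the reaching law in Equation (18) can be illustrated on a single scalar sliding variable. The sketch below integrates $\dot{S}_{i}$ with explicit Euler steps; the gains, exponents, step size, and horizon are assumed values for illustration, not the settings used in this study.

```python
def sgn(x):
    """Sign function returning -1, 0 or 1."""
    return (x > 0) - (x < 0)


def sig(x, a):
    """sig^a(x) = |x|^a * sgn(x)."""
    return abs(x) ** a * sgn(x)


def reaching_rate(s, k0, k1, k2, a1, a2):
    """Right-hand side of Equation (18) for one component of S."""
    far = 1.0 + 0.5 * a1 * (1 + sgn(abs(s) - 1.0))    # > 1 only when |s| > 1
    near = 1.0 - 0.5 * a2 * (1 - sgn(abs(s) - 1.0))   # < 1 only when |s| < 1
    return -k0 * sgn(s) - k1 * sig(s, far) - k2 * sig(s, near)


def simulate_surface(s0, gains, dt=1e-3, t_end=5.0):
    """Explicit-Euler integration of the reaching law from the initial value s0."""
    s = s0
    for _ in range(int(t_end / dt)):
        s += dt * reaching_rate(s, **gains)
    return s


GAINS = dict(k0=0.1, k1=1.0, k2=1.0, a1=0.5, a2=0.5)  # assumed demo values
final_s = simulate_surface(5.0, GAINS)
```

In discrete time the $K_{0}\,sgn(S_{i})$ term leaves a small residual oscillation around zero whose size scales with $K_{0}$ and the step size, which is one reason to keep $K_{0}$ small relative to $K_{1}$ and $K_{2}$.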

When the model is inaccurately constructed, the effectiveness of the control law outlined in Equation (20) may be compromised by the presence of a nonlinear term and an unknown disturbance, $Ld(t)$, in the control law described in Equation (17).

To address this issue, we implement a critic–actor reinforcement learning strategy utilizing RBFNN. The critic network indirectly compensates for unknown disturbances by assessing the system’s performance metrics, which enables the dynamic adjustment of its weights. In contrast, the actor network directly generates error compensation signals. Its RBFNN architecture approximates the dynamic characteristics of disturbances through nonlinear mapping, integrating this estimation into the control law to actively mitigate the disturbances. Furthermore, we propose a real-time standard deviation updating strategy. The dynamic adjustment of the standard deviation optimizes the neural network’s local response capability, thereby enhancing its adaptability to the system’s nonlinear variations. This updating mechanism improves the network’s approximation accuracy and bolsters the control algorithm’s robustness against model uncertainties and external perturbations. Through online learning, the system continuously optimizes control performance, ensuring the precise management of underwater gliders in complex environments.

3.3. Critic Neural Network Design

This section presents the design of a critic neural network, which evaluates policy performance and optimizes the controller parameters. A method is proposed for updating the weights and standard deviations of the RBF neurons in real time, enhancing the network’s adaptability and robustness.

3.3.1. Weight Updates

Define the long-term cost function as follows [46,47]:

I (t) = \int_{t}^{\infty} e^{- \frac{τ - t}{χ}} φ (τ) d τ

(21)

where $χ$ is a constant that discounts future costs and $φ(t)$ represents the immediate cost function, which can be expressed as follows:

φ (t) = e_{1}^{T} D e_{1} + u^{T} R u

(22)

where $D$ and $R$ are positive definite matrices.

Define $I = ω_{c}^{* T} σ_{c}(e_{1}) + ε_{o}$, with the estimate $\hat{I} = {\hat{ω}}_{c}^{T} σ_{c}(e_{1})$, where $ω_{c}^{*}$ and ${\hat{ω}}_{c}$ are the true and estimated neural network weights, respectively, and $σ_{c}(\cdot)$ takes the form of a Gaussian function as follows:

σ_{c} (e) = exp (- \frac{\sum_{i = 1}^{r} {(e_{c} - μ_{i})}^{2}}{2 ξ_{c}^{2}})

(23)

Define the approximation error of the cost-to-go as follows:

γ (t) = φ (t) - \frac{1}{χ} \hat{I} (t) + \dot{\hat{I}} (t)

(24)

Using gradient descent, the update law for the evaluation network is designed as follows [46]:

{\dot{\hat{ω}}}_{c} (t) = - \frac{δ_{c}}{2} \frac{\partial (γ^{2} (t))}{\partial ω_{c}}

(25)

where $δ_{c} > 0$ is the learning rate of the critic neural network. By substituting Equation (24) into Equation (25), we obtain the following:

\begin{matrix} {\dot{\hat{ω}}}_{c} (t) & = - \frac{δ_{c}}{2} \frac{\partial (γ^{2} (t))}{\partial ω_{c}} \\ & = - δ_{c} γ (t) \frac{\partial (φ (t) - \frac{1}{χ} \hat{I} (t) + \dot{\hat{I}} (t))}{\partial ω_{c}} \\ & = - δ_{c} γ (t) (\frac{\partial φ (t)}{\partial ω_{c}} - \frac{1}{χ} \frac{\partial \hat{I} (t)}{\partial ω_{c}} + \frac{\partial \dot{\hat{I}} (t)}{\partial ω_{c}}) \\ & = - δ_{c} (φ (t) + {\hat{ω}}_{c}^{T} (t) (- \frac{1}{χ} σ_{c} (t) + \nabla σ_{c} (t) {\dot{e}}_{1} (t))) (- \frac{1}{χ} σ_{c} (t) + \nabla σ_{c} (t) {\dot{e}}_{1} (t)) \end{matrix}

(26)

Define

Θ (t) = - \frac{1}{χ} σ_{c} (t) + \nabla σ_{c} (t) {\dot{e}}_{1} (t)

, and then, Equation (26) can be rewritten as follows:

{\dot{\hat{ω}}}_{c} (t) = - δ_{c} (φ (t) + {\hat{ω}}_{c}^{T} (t) Θ (t)) Θ (t)

(27)

Define

ρ_{c} (t) = (φ (t) + {\hat{ω}}_{c}^{T} (t) Θ (t)) Θ (t)

, and the updating law based on the projection method for the critic neural network is designed as follows:

{\dot{\hat{ω}}}_{c} (t) = - δ_{c} ρ_{c} (t)

(28)
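The critic update in Equations (27) and (28) can be sketched in discrete time as follows. The explicit Euler step of size dt is an assumption, since the law is stated in continuous time. Θ is formed from the basis vector and its time derivative (the product of the basis gradient and the error rate). All numerical values and the function name are hypothetical.

```python
def critic_step(w, sigma, sigma_dot, phi, chi, delta_c, dt):
    """One explicit-Euler step of the critic law w_dot = -delta_c * (phi + w^T Theta) * Theta."""
    # Theta = -sigma / chi + d(sigma)/dt, evaluated component by component.
    theta = [-s / chi + sd for s, sd in zip(sigma, sigma_dot)]
    # Residual gamma = phi + w^T Theta, evaluated before the weights are updated.
    gamma = phi + sum(wi * th for wi, th in zip(w, theta))
    w_new = [wi - dt * delta_c * gamma * th for wi, th in zip(w, theta)]
    return w_new, gamma

# Hypothetical frozen operating point: constant basis outputs and immediate cost.
w = [0.0, 0.0]
history = []
for _ in range(500):
    w, gamma = critic_step(w, [1.0, 0.5], [0.0, 0.0],
                           phi=1.0, chi=2.0, delta_c=10.0, dt=0.01)
    history.append(abs(gamma))
```

With the operating point held fixed, each step scales the residual by 1 - dt δ_c ‖Θ‖², so the residual decays geometrically whenever that factor lies between 0 and 1. This only illustrates the gradient-descent mechanism and is not a substitute for the stability analysis in Appendix A.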

3.3.2. Standard Deviation Update

Previous studies often emphasize weight-update strategies [46,47], but these approaches frequently fail to address steady-state errors effectively. To enhance the generalization capability of the neural network, this study introduces a gradient-descent-based learning method for the real-time adjustment of the standard deviation [48], which tunes the receptive field of the RBF networks.

ξ_{c} = ξ_{c} - η_{c} \frac{\partial E}{\partial ξ_{c}}

(29)

where

η_{c}

is the learning rate of the standard deviation update. Let

E = \frac{1}{2} {∥e_{1}∥}^{2}

, and

\frac{\partial E}{\partial ξ_{c}}

can be written as follows:

\begin{matrix} \frac{\partial E}{\partial ξ_{c}} & = \frac{\partial E}{\partial e_{1}} \cdot \frac{\partial e_{1}}{\partial σ_{c}} \cdot \frac{\partial σ_{c}}{\partial ξ_{c}} \\ = - e_{c} σ_{c} (e) (- \frac{{∥e_{c} - μ_{i}∥}^{2}}{ξ_{c}^{3}}) \end{matrix}

(30)

Employing this method, the kernel width is adaptively expanded during phases of significant error to expedite the convergence of learning, while it is contracted during phases of minor error to enhance the accuracy of local approximation. This dynamically adjusted strategy not only effectively reduces modeling errors but also improves robustness against parameter uncertainty.
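The width update in Equations (29) and (30) can be transcribed literally for a scalar input and a single neuron, as in the sketch below. Treating e_c as the scalar network input and μ_i as the center of that neuron is an assumption; the step size, the sample values, and the function names are hypothetical.

```python
import math

def width_gradient(e, mu, xi):
    """Right-hand side of Equation (30) for a scalar input e, center mu, and width xi."""
    sq_dist = (e - mu) ** 2
    sigma = math.exp(-sq_dist / (2.0 * xi ** 2))
    return -e * sigma * (-sq_dist / xi ** 3)

def width_step(e, mu, xi, eta):
    """Equation (29): xi <- xi - eta * dE/dxi."""
    return xi - eta * width_gradient(e, mu, xi)

# Hypothetical sample: error 0.5, center at the origin, unit width.
grad = width_gradient(0.5, 0.0, 1.0)
xi_next = width_step(0.5, 0.0, 1.0, eta=0.1)
```

In this transcription the size of the correction scales with the error magnitude and with the squared distance from the center, and it vanishes when the input sits exactly on the center.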

3.4. Actor Neural Network Design

3.4.1. Weight Updates

This section utilizes an actor neural network to approximate the aggregate disturbance in Equation (17). Specifically,

f_{N N} = Ld (t)

. The core of this approach lies in transforming the complex problem of disturbance estimation into a learned optimization process. By incorporating this into the control law, it can be formulated as follows:

u_{1} = - F^{- 1} (f_{N N} - {\ddot{x}}_{1 d} + β_{1} ψ (ε_{1}) {| e |}^{ψ (ε_{1}) - 1} \dot{e} + \dot{R} (e))

(31)

By using RBF neural networks, the following expression can be used to define

f_{N N}

:

f_{N N} = [\begin{matrix} {\hat{ω}}_{z 1} σ_{z 1} (e_{z}) + {\hat{ω}}_{z 2} σ_{z 2} ({\dot{e}}_{z}) \\ {\hat{ω}}_{θ 1} σ_{θ 1} (e_{θ}) + {\hat{ω}}_{θ 2} σ_{θ 2} ({\dot{e}}_{θ}) \end{matrix}] = [\begin{matrix} {\hat{ω}}_{z}^{T} σ_{z} \\ {\hat{ω}}_{θ}^{T} σ_{θ} \end{matrix}]

(32)

The learning error of

f_{N N}

can be defined as follows:

μ_{a} = Δ ω_{z}^{T} σ_{z} + Δ ω_{θ}^{T} σ_{θ}

(33)

The actor neural network error is expressed as follows:

\begin{matrix} e_{a} & = μ_{a} + k_{i} (\hat{I} (t) - I_{d} (t)) \\ = μ_{a} + k_{i} (ω_{c}^{*} σ_{c} + ε_{c} - I_{d} (t)) \end{matrix}

(34)

Define

ϑ = tanh ({\hat{ω}}_{z}^{T} σ_{z} + {\hat{ω}}_{θ}^{T} σ_{θ} + k_{i} \hat{I})

.

In the actor neural network, we utilize the following modified linear activation function:

R (ϑ) = \{\begin{matrix} ϑ & if ϑ > 0 \\ e^{ϑ} - 1 & if ϑ \leq 0 \end{matrix}

(35)

The update law for the parameters of the neural network is then expressed as follows:

\{\begin{matrix} {\dot{\hat{ω}}}_{z} = - δ_{a} R (ϑ) σ_{z} \\ {\dot{\hat{ω}}}_{θ} = - δ_{a} R (ϑ) σ_{θ} \end{matrix}

(36)

Define

ρ_{z} = R (ϑ) σ_{z}

and

ρ_{θ} = R (ϑ) σ_{θ}

. The updating law of the actor neural network is designed as follows:

{\dot{\hat{ω}}}_{a} = - δ_{a} ρ_{a}, a = z, θ

(37)
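Equations (35) to (37) can be sketched in discrete time as shown below. The explicit Euler step is an assumption, as the update law is stated in continuous time. The gains, basis outputs, and function names are hypothetical.

```python
import math

def activation(v):
    """Equation (35): identity for v > 0, exp(v) - 1 otherwise."""
    return v if v > 0 else math.exp(v) - 1.0

def actor_step(w_z, w_th, sig_z, sig_th, I_hat, k_i, delta_a, dt):
    """One explicit-Euler step of Equation (36) for the depth and pitch weight vectors."""
    drive = math.tanh(
        sum(w * s for w, s in zip(w_z, sig_z))
        + sum(w * s for w, s in zip(w_th, sig_th))
        + k_i * I_hat
    )
    r = activation(drive)
    w_z_new = [w - dt * delta_a * r * s for w, s in zip(w_z, sig_z)]
    w_th_new = [w - dt * delta_a * r * s for w, s in zip(w_th, sig_th)]
    return w_z_new, w_th_new, r

# Hypothetical single-neuron example with zero initial weights.
w_z, w_th, r = actor_step([0.0], [0.0], [1.0], [0.5],
                          I_hat=1.0, k_i=1.0, delta_a=1.0, dt=0.01)
```

Because the argument of R passes through tanh, the driving term stays within (e^{-1} - 1, 1), which bounds the weight rate for any bounded basis output.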

3.4.2. Standard Deviation Update

Inspired by the critic neural network’s update rule, an update rule is proposed for the standard deviations, which follows the gradient descent method, expressed as follows:

ξ_{a} = ξ_{a} - η_{a} e_{a} σ_{a} (e) (- \frac{{∥e_{a} - μ_{i}∥}^{2}}{ξ_{a}^{3}}), a = z, θ

(38)

The stability analysis of the controller is shown in Appendix A.

4. Simulation Analysis

In this section, the proposed control law is applied to the dynamics model for the underwater glider to evaluate its practicality and effectiveness. The controller’s performance in rejecting disturbances and tracking trajectories is assessed through two distinct cases as follows:

1: Model uncertainty: This case examines the controller’s effectiveness in handling underwater glider dynamics with unmodeled components or time-varying hydrodynamic parameters, which introduce model uncertainty.
2: External disturbance suppression: The controller’s ability to suppress external disturbances is evaluated by applying two typical disturbances to the angular velocity q.

To evaluate the efficacy of the control strategies, simulations were executed in Simulink (MATLAB R2024a). The methods assessed are the SMC, RLSMC, and SD-RLSMC algorithms. The simulations employed the ode4 (Runge-Kutta) solver with a fixed step size of 0.0005 s over a total simulation duration of 400 s. The neural network architecture adopted in this study follows an actor–critic framework built on RBF networks. Within this framework, all disturbances are treated as unmodeled dynamics, denoted by the term

f_{N N} = L d (t)

, which is integrated into the control law to facilitate compensation. To ensure a fair comparison, the same initial conditions and relevant parameters are set for the simulation. The initial state of the underwater glider is defined as

{[\begin{matrix} z & θ & q & v_{1} & v_{3} \end{matrix}]}^{T} = {[\begin{matrix} 0 & 0 & 0 & 0 & 0 \end{matrix}]}^{T}

. The results of parameter identification are used to determine the hydrodynamic parameters, rotational inertia, and other relevant parameters, as shown in Table 2. The trajectories to be tracked can be expressed as follows:

θ_{d} (rad) = \{\begin{matrix} - 0.32 & 0 s \leq t < 100 s \\ 0.32 & 100 s \leq t < 200 s \\ - 0.32 & 200 s \leq t < 300 s \\ 0.32 & 300 s \leq t < 400 s \end{matrix}

(39)

The z-direction trajectory of the underwater glider is indirectly controlled with its velocity. A velocity of

v_{1 d} = 0.3 m / s

is assigned for horizontal movement. During descent, the velocity is set to

v_{3 d} = - 0.03 m / s

, and during ascent, it is set to

v_{3 d} = 0.03 m / s

. Consequently, the desired z-direction trajectory is represented as follows:

z_{d} = \int_{0}^{t} (- v_{1 d} sin (θ_{d}) + v_{3 d} cos (θ_{d})) d t

(40)
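The reference signals in Equations (39) and (40) can be generated as in the following sketch. It assumes that the intervals with a negative pitch reference are the descent phases, so that v_3d = -0.03 m/s there and v_3d = 0.03 m/s otherwise; the paper does not state this pairing explicitly. The function names and the integration step are hypothetical.

```python
import math

V1D = 0.3  # desired horizontal velocity in m/s

def theta_d(t):
    """Equation (39): pitch reference in rad, switching sign every 100 s."""
    return -0.32 if int(t // 100) % 2 == 0 else 0.32

def v3d(t):
    """Assumption: a negative pitch reference marks the descent phase."""
    return -0.03 if theta_d(t) < 0 else 0.03

def z_rate(t):
    """Integrand of Equation (40)."""
    th = theta_d(t)
    return -V1D * math.sin(th) + v3d(t) * math.cos(th)

def z_d(t_end, dt=0.5):
    """Left Riemann sum of Equation (40); exact here because the integrand is
    piecewise constant and the switching times are multiples of dt."""
    n = int(round(t_end / dt))
    return sum(z_rate(k * dt) for k in range(n)) * dt

depth_at_100 = z_d(100.0)
depth_at_200 = z_d(200.0)
```

Under this pairing the depth reference is a symmetric sawtooth: it changes at a constant rate of about 0.066 m/s during each 100 s interval and returns to zero at t = 200 s and t = 400 s.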

The controller parameters are listed in Table 4.

4.1. Case 1: Model Uncertainty

In the model parameter variation settings outlined in [20], parameter

K_{M 0}

was adjusted by up to 40% of its nominal value, while all other parameters experienced perturbations not exceeding 20%. To evaluate the robustness of the proposed control algorithm more thoroughly, we increase the relative deviation of the model parameters to 25% and make the deviation time-varying. This configuration presents a more challenging test of the control algorithm under significant uncertainty and more closely reflects the dynamic variations encountered in real-world operation. The general expression for the parameter variation is provided as follows:

K = (1 + ζ_{k} 25 %) K_{0}

(41)

where

0 \leq ζ_{k} \leq 1

characterizes the uncertainty of this parameter, and the frequency of change for

ζ_{k}

is set to 0.2 Hz.
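Equation (41) can be illustrated with the sketch below. The waveform of ζ_k is not specified beyond its range and its 0.2 Hz rate of change, so the sinusoid rescaled to [0, 1] used here is purely an assumption, as are the function names and the nominal value in the example.

```python
import math

def zeta(t, freq_hz=0.2):
    """Hypothetical uncertainty waveform: a 0.2 Hz sinusoid rescaled to [0, 1]."""
    return 0.5 * (1.0 + math.sin(2.0 * math.pi * freq_hz * t))

def perturbed(K0, t, rel_dev=0.25):
    """Equation (41): K = (1 + zeta_k * 25%) * K0."""
    return (1.0 + zeta(t) * rel_dev) * K0

# Hypothetical nominal parameter value of 2.0 sampled over two periods (10 s).
samples = [perturbed(2.0, 0.01 * k) for k in range(1001)]
```

With this choice the parameter sweeps between its nominal value and 1.25 times that value once every 5 s.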

Figure 5 systematically compares the following three trajectory control strategies for underwater gliders: SD-RLSMC, RLSMC, and SMC. The tracking performance is depicted in terms of the dive depth (Figure 5a) and pitch angle (Figure 5b). The findings suggest that all methods effectively track the depth trajectory; however, both SD-RLSMC and RLSMC exhibit larger errors around

t = 200

s compared to SMC. Nonetheless, the quantitative analysis presented in Table 5 indicates that SD-RLSMC achieves the lowest RMSE in dive depth at 0.0154, surpassing RLSMC (0.0285) and SMC (0.0228). Notably, the overshoot of SD-RLSMC is merely 0.01%, considerably lower than that of the other strategies. Similarly, for pitch angle control, SD-RLSMC attains an RMSE of 0.0430 with an overshoot of 9.77%, demonstrating superior accuracy and stability under dynamic conditions.

The controller outputs for the tracking of the intended trajectory are illustrated in Figure 6. It is evident that the actuator demonstrates significant vibrations when SMC is implemented. In contrast, the RLSMC and SD-RLSMC strategies significantly reduce vibrations, with SD-RLSMC exhibiting the lowest levels of vibration. This implies that SD-RLSMC is highly effective in reducing actuator strain and is distinguished by its exceptional energy efficiency. To quantitatively evaluate the control effort, the power associated with the control signals

d V

and

r_{p 1}

is computed. The control power is calculated using the following expression:

Power = \frac{1}{N} \sum u_{i}^{2}

(42)
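Equation (42) is the mean of the squared control samples, which the following sketch computes. The sample sequences and the function name are hypothetical and serve only to show how chattering raises the metric.

```python
def control_power(u):
    """Equation (42): mean of the squared control samples."""
    if not u:
        raise ValueError("control signal must contain at least one sample")
    return sum(x * x for x in u) / len(u)

# Two hypothetical signals with the same mean (0.1): one smooth, one chattering.
smooth = [0.1, 0.1, 0.1, 0.1]
chattering = [0.15, 0.05, 0.15, 0.05]
p_smooth = control_power(smooth)
p_chattering = control_power(chattering)
```

Since the mean square equals the squared mean plus the variance, any oscillation around the same average command increases the computed power.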

As shown in Table 6, the SMC strategy exhibits a significantly higher control effort in both the

d V

and

r_{p 1}

control signals compared to the other two strategies. Notably, the control effort of SD-RLSMC is slightly lower than that of RLSMC, demonstrating the effectiveness of SD-RLSMC in reducing actuator energy consumption.

Figure 7 presents the outcomes of the neural network estimations of the aggregated disturbances within the simplified model, employing both the RLSMC and SD-RLSMC methodologies. The figure demonstrates that the SD-RLSMC algorithm yields a comparatively more precise estimation of disturbances, whereas the RLSMC algorithm exhibits substantial deviation as simulation time advances.

4.2. Case 2: External Disturbance Suppression

Ocean currents and winds are integral components of the marine environment, with disturbances induced by these currents exerting a substantial influence on the movement of underwater vehicles. The velocity and direction of ocean currents exhibit significant variability across different geographical regions and seasonal periods, with surface currents generally demonstrating greater intensity compared to deep-sea currents. Despite this, there is presently no model that can precisely capture the intricate behavior of ocean currents. To address the impact of external currents on the dynamics of gliders, Zhou et al. [49] and Zou et al. [20] propose the introduction of perturbation signals into the q velocity component, as follows:

q_{dis 1} = 0.33 sin (0.1 t) + 0.25 cos (0.02 t) + 0.3 sin (0.1 t) cos (0.2 t)

(43)

q_{dis 2} = 0.02 sin (0.9 t) + 0.03 cos (1.1 t) - 0.05 sin (0.8 t)

(44)

From these two equations, it is apparent that the first perturbation is a large-amplitude, slowly varying signal used to characterize surface wave conditions. Conversely, the second perturbation is a small-amplitude, rapidly fluctuating signal, which represents the impact of underwater currents.
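The two perturbation signals in Equations (43) and (44) can be evaluated directly, as in the sketch below. The sampling grid over the 400 s simulation window and the function names are hypothetical.

```python
import math

def q_dis1(t):
    """Equation (43): large-amplitude, slowly varying perturbation."""
    return (0.33 * math.sin(0.1 * t) + 0.25 * math.cos(0.02 * t)
            + 0.3 * math.sin(0.1 * t) * math.cos(0.2 * t))

def q_dis2(t):
    """Equation (44): small-amplitude, rapidly varying perturbation."""
    return 0.02 * math.sin(0.9 * t) + 0.03 * math.cos(1.1 * t) - 0.05 * math.sin(0.8 * t)

# Sample both signals every 0.1 s over the 400 s window.
ts = [0.1 * k for k in range(4001)]
peak1 = max(abs(q_dis1(t)) for t in ts)
peak2 = max(abs(q_dis2(t)) for t in ts)
```

The coefficient sums bound the first signal by 0.88 and the second by 0.10 in magnitude, and the highest angular frequency rises from 0.2 rad/s in the first signal to 1.1 rad/s in the second.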

Figure 8a,b illustrates the trajectory tracking performance of the three algorithms when subjected to the large-amplitude, slowly time-varying disturbance signal, thereby offering a visual depiction of each algorithm’s control efficacy. The RMSE values presented in Table 7 quantitatively substantiate these visual observations. In terms of dive depth, while the tracking performance of the SD-RLSMC algorithm is marginally inferior to that of the RLSMC algorithm, its RMSE value is 0.0159, which is significantly lower than the RMSE values of RLSMC (0.1480) and SMC (0.0278). Regarding pitch angle tracking, as depicted in Figure 8b, the SMC algorithm exhibits pronounced oscillation, whereas the RLSMC algorithm fluctuates after each change of the pitch angle reference, stabilizing only after a delay and still displaying steady-state errors. Conversely, the SD-RLSMC algorithm demonstrates minimal oscillation and negligible steady-state error. Moreover, Table 7 indicates that the RMSE for SD-RLSMC is 0.0722, substantially lower than that of RLSMC (0.1161) and SMC (0.1340), thereby underscoring the superior tracking stability of the SD-RLSMC algorithm in these two metrics. Furthermore, the overshoot data presented in Table 7 corroborate this observation. The overshoot in pitch angle tracking for the SD-RLSMC method is markedly lower than that of the other two methods, indicating that the algorithm provides enhanced stability and accuracy during the response process by effectively mitigating overreactions and oscillations. This advantage likely arises from the adaptive standard deviation feature incorporated into the SD-RLSMC algorithm, which facilitates dynamic adjustments to the control strategy in response to disturbances, thereby significantly reducing tracking errors.

Figure 8c,d provides additional evidence of the trajectory tracking performance of the three algorithms when subjected to small-amplitude, rapidly varying signal perturbations, thereby corroborating the previous findings. In the context of dive depth and pitch angle tracking, the SD-RLSMC algorithm consistently outperforms the RLSMC and significantly exceeds the performance of the SMC. Notably, under these conditions, SD-RLSMC does not display any discernible steady-state error, further affirming its robust control capabilities. The overshoot calculations presented in Table 7 substantiate the continued superiority of SD-RLSMC in pitch angle tracking, underscoring the algorithm’s enhanced adaptability in managing small-amplitude, rapidly varying signal perturbations.

Figure 9 illustrates the controller outputs used to track the target trajectory. It is evident that significant vibrations occur when using SMC. In contrast, both RLSMC and SD-RLSMC strategies effectively reduce vibration levels, with SD-RLSMC exhibiting the lowest vibration. To quantitatively assess the control effort, we calculated the power of the control signals for the three algorithms under two perturbation scenarios, with the results presented in Table 8. The SD-RLSMC strategy demonstrates significantly lower control effort, in terms of both dV and rp1 power, compared to the other two strategies. Notably, the control effort of SD-RLSMC is slightly lower than that of RLSMC, indicating its effectiveness in reducing actuator energy consumption.

Table 9 presents performance metrics from the literature for an initial comparison. The algorithm in this paper is similar to existing control methods in structure, interference immunity, and system modeling, with no significant drawbacks. Although the literature uses different evaluation metrics, this study’s control performance metrics are comparably high, indicating the proposed method’s effectiveness and potential benefits.

5. Conclusions and Prospects

This study presents a finite-time sliding mode trajectory tracking control strategy for underwater gliders, termed SD-RLSMC, which incorporates a radial basis function neural network reinforcement learning method. Hydrodynamic and other pertinent parameters are identified from data collected during sea trials, which supports the development of a dynamic model suitable for controller design. The reinforcement learning framework incorporates a critic neural network for policy evaluation and optimization and actor neural networks for estimating model uncertainties, external environmental disturbances, and actuator disturbances. These estimates are integrated as feedback terms within the sliding mode control framework. To improve the generalization capability and robustness of the control system, a mechanism for updating the weights and standard deviations in both the critic and actor neural networks is proposed. The simulation results illustrate that the control strategy exhibits consistent and reliable performance across diverse conditions, encompassing unmodeled influences and external disturbances.

While the control method proposed in this paper presents several advantages, it also exhibits certain limitations. Firstly, when addressing large-amplitude, slow time-varying disturbances, the controller partially mitigates the steady-state error; however, control oscillations remain discernible, and jittering behavior is evident during the attitude adjustment phase. Secondly, the integration of multiple components, including sliding mode control and radial basis function neural networks, complicates the parameter tuning process, thereby increasing the difficulty of algorithm debugging in practical engineering applications.

Future research endeavors will focus on simplifying the control strategy, refining the algorithms governing weights and standard deviation updates, and crafting a trajectory tracking control algorithm tailored to underwater gliders navigating three-dimensional space. Furthermore, the efficacy of the proposed SD-RLSMC scheme will be assessed through experimentation on an actual underwater glider system to verify its practical utility.

Author Contributions

Methodology, Y.Y.; Software, G.W.; Validation, J.Y.; Writing—original draft, G.W.; Visualization, G.W.; Supervision, Y.Y.; Funding acquisition, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China, grant number 51575376.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Stability Analysis

Theorem A1.

For a nonlinear underwater glider model, described by Equation (11), with uncertain dynamics and external disturbances, when bounded disturbances are assumed, the estimation error of the complete set of disturbances can converge within a finite time to a region close to the origin that is considered acceptable, as supported by the control laws in Equations (17)–(20) and (31).

Proof.

Define the following Lyapunov function:

V = V_{1} + V_{2} + V_{3}

(A1)

where

V_{1} = \frac{1}{2} S^{T} S

,

V_{2} = \frac{1}{2 δ_{c}} {\tilde{ω}}_{c}^{T} {\tilde{ω}}_{c}

, and

V_{3} = \frac{1}{2 δ_{a}} ({\tilde{ω}}_{z}^{T} {\tilde{ω}}_{z} + {\tilde{ω}}_{θ}^{T} {\tilde{ω}}_{θ})

.

We differentiate

V_{1}

and derive the following:

{\dot{V}}_{1} = S^{T} ({\ddot{e}}_{1} + (β_{1} ψ {|e_{1}|}^{ψ - 1} + β_{2} ϕ {|e_{1}|}^{ϕ - 1}) {\dot{e}}_{1})

(A2)

By substituting the control laws Equations (17) and (18) into the aforementioned equation, we derive the following:

{\dot{V}}_{1} = S^{T} (- f_{NN} + l_{2} + (- K_{0} sgn (S) - K_{1} {sig}^{1 + 0.5 α_{1} (1 + sgn (| S | - 1))} (S) - K_{2} {sig}^{1 - 0.5 α_{2} (1 - sgn (| S | - 1))} (S)))

(A3)

We define

{\dot{V}}_{11} = S^{T} (l_{2} - f_{NN}), {\dot{V}}_{12} = - S^{T} (K_{0} sgn (S) + K_{1} {sig}^{1 + 0.5 α_{1} (1 + sgn (| S | - 1))} (S) + K_{2} {sig}^{1 - 0.5 α_{2} (1 - sgn (| S | - 1))} (S))

,

{\tilde{ω}}_{a} = {[\begin{matrix} {\tilde{ω}}_{z} & {\tilde{ω}}_{θ} \end{matrix}]}^{T}

, and

σ_{a} = {[\begin{matrix} σ_{z} & σ_{θ} \end{matrix}]}^{T}

.

According to Young’s inequality and

S^{T} {\tilde{ω}}_{a}^{T} σ_{a} \leq \frac{1}{2} S^{T} S + \frac{1}{2} {∥{\tilde{ω}}_{a}∥}_{F}^{2} {∥σ_{a}∥}^{2}

,

S^{T} ε \leq \frac{1}{2} S^{T} S + \frac{1}{2} {∥ ε ∥}^{2}

, we derive:

{\dot{V}}_{11} \leq S^{T} S + \frac{1}{2} {∥{\tilde{ω}}_{a}∥}_{F}^{2} {∥σ_{a}∥}^{2} + \frac{1}{2} {∥ ε ∥}^{2}

(A4)

By using Young’s inequality,

{\dot{V}}_{12}

can be written as follows:

\begin{matrix} {\dot{V}}_{12} & = - S^{T} (K_{0} sgn (S) + K_{1} {sig}^{1 + 0.5 α_{1} (1 + sgn (| S | - 1))} (S) + K_{2} {sig}^{1 - 0.5 α_{2} (1 - sgn (| S | - 1))} (S)) \\ & = - \sum_{i = 1}^{2} S_{i} (K_{0} sgn (S_{i}) + K_{1} {sig}^{1 + 0.5 α_{1} (1 + sgn (|S_{i}| - 1))} (S_{i}) + K_{2} {sig}^{1 - 0.5 α_{2} (1 - sgn (|S_{i}| - 1))} (S_{i})), i = z, θ \\ & = - (K_{0} \sum_{i = 1}^{2} |S_{i}| + K_{1} \sum_{i = 1}^{2} {|S_{i}|}^{2 + 0.5 α_{1} (1 + sgn (|S_{i}| - 1))} + K_{2} \sum_{i = 1}^{2} {|S_{i}|}^{2 - 0.5 α_{2} (1 - sgn (|S_{i}| - 1))}) \\ & \leq - K_{1} \sum_{i = 1}^{2} {|S_{i}|}^{2 + 0.5 α_{1} (1 + sgn (|S_{i}| - 1))} - K_{2} \sum_{i = 1}^{2} {|S_{i}|}^{2 - 0.5 α_{2} (1 - sgn (|S_{i}| - 1))} \\ & = - K_{1} {(S^{T} S)}^{\frac{2 + 0.5 α_{1} (1 + sgn (| S | - 1))}{2}} - K_{2} {(S^{T} S)}^{\frac{2 - 0.5 α_{2} (1 - sgn (| S | - 1))}{2}} \end{matrix}

(A5)

According to Equation (12), the expression for

{\dot{V}}_{12}

is either positively or negatively related to

S

, which can be categorized and analyzed as follows.

(1) When

| S | - 1 < 0

, Equation (A5) can be simplified as follows:

{\dot{V}}_{12} \leq - K_{1} (S^{T} S) - K_{2} {(S^{T} S)}^{1 - 0.5 α_{2}}

(A6)

(2) When

| S | - 1 > 0

, Equation (A5) can be simplified as follows:

{\dot{V}}_{12} \leq - K_{1} {(S^{T} S)}^{1 + 0.5 α_{1}} - K_{2} (S^{T} S)

(A7)

Following this, we can delve into the examination of

{\dot{V}}_{1}

in cases where

| S | - 1 < 0

, as follows:

\begin{matrix} {\dot{V}}_{1} & \leq (1 - K_{1}) (S^{T} S) - K_{2} {(S^{T} S)}^{1 - 0.5 α_{2}} + \frac{1}{2} {∥{\tilde{ω}}_{a}∥}_{F}^{2} {∥σ_{a}∥}^{2} + \frac{1}{2} {∥ ε ∥}^{2} \\ \leq (1 - K_{1}) 2 V_{1} - K_{2} {(2 V_{1})}^{1 - 0.5 α_{2}} + \frac{1}{2} {∥{\tilde{ω}}_{a}∥}_{F}^{2} {∥σ_{a}∥}^{2} + \frac{1}{2} {∥ ε ∥}^{2} \end{matrix}

(A8)

Due to

σ_{a} \in (0, 1]

,

{\dot{V}}_{1}

can be scaled to the following form:

{\dot{V}}_{1} \leq (1 - K_{1}) 2 V_{1} - K_{2} {(2 V_{1})}^{1 - 0.5 α_{2}} + \frac{1}{2} \sum_{i = 1}^{2} {∥{\tilde{ω}}_{a i}∥}^{2} + \frac{1}{2} \sum_{i = 1}^{2} {∥ε_{a i}∥}^{2}

(A9)

According to Lemma 2, it is known that

∥{\tilde{ω}}_{a}∥ \leq ∥ω_{a}∥ + ∥{\bar{ω}}_{a}∥

. Therefore, Equation (A9) can be further written as follows:

\begin{matrix} {\dot{V}}_{1} & \leq (1 - K_{1}) 2 V_{1} - K_{2} {(2 V_{1})}^{1 - 0.5 α_{2}} + \frac{1}{2} \sum_{i = 1}^{2} {∥{\tilde{ω}}_{a i}∥}^{2} + \frac{1}{2} \sum_{i = 1}^{2} {∥ε_{a i}∥}^{2} \\ \leq 2 (1 - K_{1}) V_{1} - 2^{1 - 0.5 α_{2}} K_{2} V_{1}^{1 - 0.5 α_{2}} + \sum_{i = 1}^{2} {∥ω_{a}∥}^{2} + \sum_{i = 1}^{2} {∥{\bar{ω}}_{a}∥}^{2} + \frac{1}{2} \sum_{i = 1}^{2} {∥ε_{a i}∥}^{2} \end{matrix}

(A10)

Similarly, it can be shown that when

| S | - 1 > 0

,

{\dot{V}}_{1}

can be scaled into the following form:

{\dot{V}}_{1} \leq 2 (1 - K_{2}) V_{1} - 2^{1 + 0.5 α_{1}} K_{1} V_{1}^{1 + 0.5 α_{1}} + \sum_{i = 1}^{2} {∥ω_{a}∥}^{2} + \sum_{i = 1}^{2} {∥{\bar{ω}}_{a}∥}^{2} + \frac{1}{2} \sum_{i = 1}^{2} {∥ε_{a i}∥}^{2}

(A11)

According to Lemma 2, we can obtain the expressions for

{\dot{V}}_{2}

and

{\dot{V}}_{3}

as follows:

\begin{matrix} {\dot{V}}_{2} & = {\tilde{ω}}_{c}^{T} ρ_{c} \leq {∥{\tilde{ω}}_{c}∥}_{F} ∥ρ_{c}∥ \\ = {∥{\tilde{ω}}_{c}∥}_{F} ∥(φ + {\hat{ω}}_{c}^{T} Θ) Θ∥ \\ \leq {∥{\tilde{ω}}_{c}∥}_{F} ∥φ + {\hat{ω}}_{c}^{T} Θ∥ ∥ Θ ∥ \\ \leq {∥{\tilde{ω}}_{c}∥}_{F} (∥ φ ∥ + ∥{\hat{ω}}_{c}∥ ∥ Θ ∥) ∥ Θ ∥ \\ \leq (∥ω_{c}∥ + ∥{\bar{ω}}_{c}∥) (∥ φ ∥ + ∥{\hat{ω}}_{c}∥ ∥ Θ ∥) ∥ Θ ∥ \end{matrix}

(A12)

\begin{matrix} {\dot{V}}_{3} & = \sum_{i = 1}^{2} {\tilde{ω}}_{a i}^{T} ρ_{a i} \\ \leq \sum_{i = 1}^{2} (∥ω_{a i}∥ + ∥{\bar{ω}}_{a i}∥) (tanh (\sum_{i = 1}^{2} {\hat{ω}}_{a i}^{T} σ_{a i} + k_{i} \hat{I}) σ_{a}) \\ \leq \sum_{i = 1}^{2} (∥ω_{a i}∥ + ∥{\bar{ω}}_{a i}∥) \cdot \sum_{i = 1}^{2} ∥σ_{a i}∥ \\ \leq \sum_{i = 1}^{2} (∥ω_{a i}∥ + ∥{\bar{ω}}_{a i}∥) \end{matrix}

(A13)

Based on the above analysis, we can combine Equations (A10)–(A13) to obtain the following:

(1) When

| S | - 1 < 0

\begin{matrix} \dot{V} \leq & 2 (1 - K_{1}) V_{1} - 2^{1 - 0.5 α_{2}} K_{2} V_{1}^{1 - 0.5 α_{2}} \\ + (∥ω_{c}∥ + ∥{\bar{ω}}_{c}∥) (∥ φ ∥ + ∥{\hat{ω}}_{c}∥ ∥ Θ ∥) ∥ Θ ∥ \\ + \sum_{i = 1}^{2} ∥ω_{a}∥ (1 + ∥ω_{a}∥) + \sum_{i = 1}^{2} ∥{\bar{ω}}_{a}∥ (1 + ∥{\bar{ω}}_{a}∥) + \frac{1}{2} \sum_{i = 1}^{2} {∥ε_{a i}∥}^{2} \end{matrix}

(A14)

(2) When

| S | - 1 > 0

\begin{matrix} \dot{V} \leq & 2 (1 - K_{2}) V_{1} - 2^{1 + 0.5 α_{1}} K_{1} V_{1}^{1 + 0.5 α_{1}} \\ + (∥ω_{c}∥ + ∥{\bar{ω}}_{c}∥) (∥ φ ∥ + ∥{\hat{ω}}_{c}∥ ∥ Θ ∥) ∥ Θ ∥ \\ + \sum_{i = 1}^{2} ∥ω_{a}∥ (1 + ∥ω_{a}∥) + \sum_{i = 1}^{2} ∥{\bar{ω}}_{a}∥ (1 + ∥{\bar{ω}}_{a}∥) + \frac{1}{2} \sum_{i = 1}^{2} {∥ε_{a i}∥}^{2} \end{matrix}

(A15)

Subsequently, we can organize the variable in accordance with the framework outlined in Lemma 1.

When

| S | - 1 < 0

\begin{matrix} \dot{V} \leq & 2 (1 - K_{1}) V - 2^{1 - 0.5 α_{2}} K_{2} V^{1 - 0.5 α_{2}} \\ - 2 (1 - K_{1}) V + 2^{1 - 0.5 α_{2}} K_{2} V^{1 - 0.5 α_{2}} \\ + 2 (1 - K_{1}) V_{1} - 2^{1 - 0.5 α_{2}} K_{2} V_{1}^{1 - 0.5 α_{2}} + (∥ω_{c}∥ + ∥{\bar{ω}}_{c}∥) (∥ φ ∥ + ∥{\hat{ω}}_{c}∥ ∥ Θ ∥) ∥ Θ ∥ \\ + \sum_{i = 1}^{2} ∥ω_{a}∥ (1 + ∥ω_{a}∥) + \sum_{i = 1}^{2} ∥{\bar{ω}}_{a}∥ (1 + ∥{\bar{ω}}_{a}∥) + \frac{1}{2} \sum_{i = 1}^{2} {∥ε_{a i}∥}^{2} \end{matrix}

(A16)

By substituting the expressions for

V_{1}

,

V_{2}

, and

V_{3}

into the above equation, we obtain the following:

\begin{matrix} \dot{V} \leq & 2 (1 - K_{1}) V - 2^{1 - 0.5 α_{2}} K_{2} V^{1 - 0.5 α_{2}} \\ - 2 (1 - K_{1}) (V_{1} + V_{2} + V_{3}) + 2^{1 - 0.5 α_{2}} K_{2} {(V_{1} + V_{2} + V_{3})}^{1 - 0.5 α_{2}} \\ + 2 (1 - K_{1}) V_{1} - 2^{1 - 0.5 α_{2}} K_{2} V_{1}^{1 - 0.5 α_{2}} \\ + (∥ω_{c}∥ + ∥{\bar{ω}}_{c}∥) (∥ φ ∥ + ∥{\hat{ω}}_{c}∥ ∥ Θ ∥) ∥ Θ ∥ \\ + \sum_{i = 1}^{2} ∥ω_{a}∥ (1 + ∥ω_{a}∥) + \sum_{i = 1}^{2} ∥{\bar{ω}}_{a}∥ (1 + ∥{\bar{ω}}_{a}∥) + \frac{1}{2} \sum_{i = 1}^{2} {∥ε_{a i}∥}^{2} \end{matrix}

(A17)

Let

Ξ = - 2 (1 - K_{1}) (V_{1} + V_{2} + V_{3}) + 2^{1 - 0.5 α_{2}} K_{2} {(V_{1} + V_{2} + V_{3})}^{1 - 0.5 α_{2}} + 2 (1 - K_{1}) V_{1} - 2^{1 - 0.5 α_{2}} K_{2} V_{1}^{1 - 0.5 α_{2}}

. According to Minkowski’s inequality [50], the following results can be obtained:

{(V_{1} + V_{2} + V_{3})}^{1 - 0.5 α_{2}} \leq V_{1}^{1 - 0.5 α_{2}} + V_{2}^{1 - 0.5 α_{2}} + V_{3}^{1 - 0.5 α_{2}}

(A18)

By substituting Equation (A18) into the expression for

Ξ

, we obtain the following:

Ξ \leq - 2 (1 - K_{1}) (V_{2} + V_{3}) + 2^{1 - 0.5 α_{2}} K_{2} (V_{2}^{1 - 0.5 α_{2}} + V_{3}^{1 - 0.5 α_{2}})

(A19)

By substituting the expression for V from Equation (A1) into the above equation, we obtain the following:

\begin{matrix} Ξ \leq & - 2 (1 - K_{1}) (\frac{1}{2 δ_{c}} ({\tilde{ω}}_{c}^{T} {\tilde{ω}}_{c} + \sum_{i = 1}^{2} {\tilde{ω}}_{a i}^{T} {\tilde{ω}}_{a i})) \\ + 2^{1 - 0.5 α_{2}} K_{2} ({(\frac{1}{2 δ_{c}} {\tilde{ω}}_{c}^{T} {\tilde{ω}}_{c})}^{1 - 0.5 α_{2}} + {(\frac{1}{2 δ_{c}} \sum_{i = 1}^{2} {\tilde{ω}}_{a i}^{T} {\tilde{ω}}_{a i})}^{1 - 0.5 α_{2}}) \\ \leq & \frac{K_{1} - 1}{δ_{c}} ({∥{\tilde{ω}}_{c}∥}^{2} + \sum_{i = 1}^{2} {∥{\tilde{ω}}_{a i}∥}^{2}) + \frac{K_{2}}{δ_{c}} ({∥{\tilde{ω}}_{c}∥}^{2 - α_{2}} + {(\sum_{i = 1}^{2} {∥{\tilde{ω}}_{a i}∥}^{2})}^{1 - 0.5 α_{2}}) \\ \leq & \frac{K_{1} - 1}{δ_{c}} ({∥{\tilde{ω}}_{c}∥}^{2} + \sum_{i = 1}^{2} {∥{\tilde{ω}}_{a i}∥}^{2}) + \frac{K_{2}}{δ_{c}} ({∥{\tilde{ω}}_{c}∥}^{2 - α_{2}} + \sum_{i = 1}^{2} {∥{\tilde{ω}}_{a i}∥}^{2 - α_{2}}) \\ \leq & (\frac{K_{1} - 1}{δ_{c}} {(∥ω_{c}∥ + ∥{\bar{ω}}_{c}∥)}^{2} + \frac{K_{2}}{δ_{c}} {(∥ω_{c}∥ + ∥{\bar{ω}}_{c}∥)}^{2 - α_{2}}) \\ + (\frac{K_{1} - 1}{δ_{c}} \sum_{i = 1}^{2} {(∥ω_{a i}∥ + ∥{\bar{ω}}_{a i}∥)}^{2} + \frac{K_{2}}{δ_{c}} \sum_{i = 1}^{2} {(∥ω_{a i}∥ + ∥{\bar{ω}}_{a i}∥)}^{2 - α_{2}}) \\ \leq & \frac{2 (K_{1} - 1)}{δ_{c}} ({∥ω_{c}∥}^{2} + {∥{\bar{ω}}_{c}∥}^{2}) + \frac{2 K_{2}}{δ_{c}} ({∥ω_{c}∥}^{2 - α_{2}} + {∥{\bar{ω}}_{c}∥}^{2 - α_{2}}) \\ + \frac{2 (K_{1} - 1)}{δ_{c}} \sum_{i = 1}^{2} ({∥ω_{a i}∥}^{2} + {∥{\bar{ω}}_{a i}∥}^{2}) + \frac{2 K_{2}}{δ_{c}} \sum_{i = 1}^{2} ({∥ω_{a i}∥}^{2 - α_{2}} + {∥{\bar{ω}}_{a i}∥}^{2 - α_{2}}) \end{matrix}

(A20)

Furthermore, by substituting

Ξ

into

\dot{V}

in Equation (A16), we obtain the following:

\begin{matrix} \dot{V} \leq & 2 (1 - K_{1}) V - 2^{1 - 0.5 α_{2}} K_{2} V^{1 - 0.5 α_{2}} \\ + \frac{2 (K_{1} - 1)}{δ_{c}} ({∥ω_{c}∥}^{2} + {∥{\bar{ω}}_{c}∥}^{2}) + \frac{2 K_{2}}{δ_{c}} ({∥ω_{c}∥}^{2 - α_{2}} + {∥{\bar{ω}}_{c}∥}^{2 - α_{2}}) \\ + \frac{2 (K_{1} - 1)}{δ_{c}} \sum_{i = 1}^{2} ({∥ω_{a i}∥}^{2} + {∥{\bar{ω}}_{a i}∥}^{2}) + \frac{2 K_{2}}{δ_{c}} \sum_{i = 1}^{2} ({∥ω_{a i}∥}^{2 - α_{2}} + {∥{\bar{ω}}_{a i}∥}^{2 - α_{2}}) \\ + (∥ω_{c}∥ + ∥{\bar{ω}}_{c}∥) (∥ φ ∥ + ∥{\hat{ω}}_{c}∥ ∥ Θ ∥) ∥ Θ ∥ \\ + \sum_{i = 1}^{2} ∥ω_{a i}∥ (1 + ∥ω_{a i}∥) + \sum_{i = 1}^{2} ∥{\bar{ω}}_{a i}∥ (1 + ∥{\bar{ω}}_{a i}∥) + \frac{1}{2} \sum_{i = 1}^{2} {∥ε_{a i}∥}^{2} \end{matrix}

(A21)

Let

\begin{matrix} Ω & = \frac{2 (K_{1} - 1)}{δ_{c}} ({∥ω_{c}∥}^{2} + {∥{\bar{ω}}_{c}∥}^{2}) + \frac{2 K_{2}}{δ_{c}} ({∥ω_{c}∥}^{2 - α_{2}} + {∥{\bar{ω}}_{c}∥}^{2 - α_{2}}) \\ + \frac{2 (K_{1} - 1)}{δ_{c}} \sum_{i = 1}^{2} ({∥ω_{a i}∥}^{2} + {∥{\bar{ω}}_{a i}∥}^{2}) + \frac{2 K_{2}}{δ_{c}} \sum_{i = 1}^{2} ({∥ω_{a i}∥}^{2 - α_{2}} + {∥{\bar{ω}}_{a i}∥}^{2 - α_{2}}) \\ + (∥ω_{c}∥ + ∥{\bar{ω}}_{c}∥) (∥ φ ∥ + ∥{\hat{ω}}_{c}∥ ∥ Θ ∥) ∥ Θ ∥ \\ + \sum_{i = 1}^{2} ∥ω_{a i}∥ (1 + ∥ω_{a i}∥) + \sum_{i = 1}^{2} ∥{\bar{ω}}_{a i}∥ (1 + ∥{\bar{ω}}_{a i}∥) + \frac{1}{2} \sum_{i = 1}^{2} {∥ε_{a i}∥}^{2} \end{matrix}

(A22)

Then, Equation (A21) can be scaled into the following form:

\dot{V} \leq - 2 (K_{1} - 1) V - 2^{1 - 0.5 α_{2}} K_{2} V^{1 - 0.5 α_{2}} + Ω

(A23)

Similarly, when

| S | - 1 > 0

, the following can be derived:

\dot{V} \leq - 2^{1 + 0.5 α_{1}} K_{1} V^{1 + 0.5 α_{1}} - 2 (K_{2} - 1) V + Ω

(A24)

Subsequently, the convergence time is determined by defining

l_{1} = 2^{1 + 0.5 α_{1}} K_{1}

,

l_{2} = 2 (K_{2} - 1)

,

l_{3} = 2 (K_{1} - 1)

, and

l_{4} = 2^{1 - 0.5 α_{2}} K_{2}

. Then, Equations (A23) and (A24) can be rewritten as follows:

\dot{V} = \{\begin{matrix} - l_{1} V^{1 + 0.5 α_{1}} - l_{2} V & | V | \geq 1 \\ - l_{3} V - l_{4} V^{1 - 0.5 α_{2}} & | V | < 1 \end{matrix}

(A25)

The above equations are transformed as follows: when

| V | \geq 1

, let

Z = | V |

, and when

| V | < 1

, let

Z = {| V |}^{0.5 α_{2}}

. Equation (A25) can then be written as follows:

\dot{Z} = \{\begin{matrix} - l_{1} Z^{1 + 0.5 α_{1}} - l_{2} Z & Z \geq 1 \\ - 0.5 α_{2} l_{3} Z - 0.5 α_{2} l_{4} & 0 < Z < 1 \end{matrix}

(A26)

By solving Equation (A26), the upper bound for the convergence time can be derived as follows:

lim_{Z_{0} \to \infty} T (Z_{0}) = lim_{Z_{0} \to \infty} (\int_{0}^{1} \frac{1}{0.5 α_{2} l_{3} Z + 0.5 α_{2} l_{4}} dZ + \int_{1}^{Z_{0}} \frac{1}{l_{1} Z^{1 + 0.5 α_{1}} + l_{2} Z} dZ)

(A27)

Based on the above equation, it is evident that the expression comprises the following two components: a definite integral over the interval

[0, 1]

and an integral with a variable upper limit over the interval

[1, Z_{0}]

. The calculation for the first component can be carried out directly, yielding the following result:

\int_{0}^{1} \frac{1}{0.5 α_{2} l_{3} Z + 0.5 α_{2} l_{4}} dZ = \frac{2}{α_{2} l_{3}} ln (\frac{l_{3} + l_{4}}{l_{4}})

(A28)

In the second part of the integral, it is necessary to examine its asymptotic behavior as

Z_{0} \to \infty

. With

κ_{1} = 0.5 α_{1} > 0

, the integrand

\frac{1}{l_{1} Z^{1 + κ_{1}} + l_{2} Z} \to 0

as

Z \to \infty

faster than 1/Z, indicating the convergence of the integral as

Z_{0} \to \infty

. For the purpose of determining the limit, this portion is a finite constant that can be evaluated numerically and is denoted as C. Substituting the expressions for

l_{3}

and

l_{4}

into Equation (A28) and adding C, the upper bound for the convergence time can be obtained as follows:

T_{1} = \frac{1}{α_{2} (K_{1} - 1)} ln (\frac{2 (K_{1} - 1) + 2^{1 - 0.5 α_{2}} K_{2}}{2^{1 - 0.5 α_{2}} K_{2}}) + C

(A29)
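The bound in Equation (A29) can be evaluated numerically as in the sketch below. It takes l_1 and l_3 as the positive quantities 2^{1 + 0.5 α_1} K_1 and 2 (K_1 - 1), which is the reading consistent with Equation (A29), and it requires K_1 > 1 and K_2 > 1. The tail constant C is computed by trapezoidal quadrature after the change of variable Z = exp(s). For l_2 > 0, the substitution u = Z^{0.5 α_1} also gives C = ln((l_1 + l_2)/l_1) / (0.5 α_1 l_2), which is used here only as a cross-check. The gain values and the function name are hypothetical.

```python
import math

def settling_time_terms(K1, K2, alpha1, alpha2, s_max=200.0, n=40000):
    """Return (inner, tail, tail_closed) for hypothetical gains with K1 > 1 and K2 > 1."""
    l1 = 2.0 ** (1.0 + 0.5 * alpha1) * K1
    l2 = 2.0 * (K2 - 1.0)
    l3 = 2.0 * (K1 - 1.0)
    l4 = 2.0 ** (1.0 - 0.5 * alpha2) * K2
    # Part over 0 < Z < 1, Equation (A28).
    inner = 2.0 / (alpha2 * l3) * math.log((l3 + l4) / l4)
    # Part over Z >= 1: with Z = exp(s), dZ / (l1 Z^(1+kappa) + l2 Z) = ds / (l1 exp(kappa s) + l2).
    kappa = 0.5 * alpha1
    h = s_max / n
    tail = 0.0
    for k in range(n + 1):
        weight = 0.5 if k in (0, n) else 1.0
        tail += weight / (l1 * math.exp(kappa * k * h) + l2)
    tail *= h
    # Cross-check from the substitution u = Z**kappa (valid for l2 > 0).
    tail_closed = math.log((l1 + l2) / l1) / (kappa * l2)
    return inner, tail, tail_closed

# Hypothetical gains chosen only to exercise the formulas.
inner, tail, tail_closed = settling_time_terms(K1=3.0, K2=2.0, alpha1=0.5, alpha2=0.5)
T1 = inner + tail
```

For these illustrative gains the two evaluations of C agree to within the quadrature error, and T_1 is approximately 1.28 in the time units of the model.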

Within the time period

T_{1}

,

S

tends to zero. According to Equation (14), the following can be observed:

\begin{matrix} \dot{e} & = - β_{1} {sig}^{1 + 0.5 ε_{1} (1 + sgn (| e | - 1))} (e) - β_{2} {| e |}^{1 - 0.5 ε_{2} (1 - sgn (| e | - 1))} sgn (e) \\ & = - β_{1} {| e |}^{1 + 0.5 ε_{1} (1 + sgn (| e | - 1))} sgn (e) - β_{2} {| e |}^{1 - 0.5 ε_{2} (1 - sgn (| e | - 1))} sgn (e) \end{matrix}

(A30)

When $| e | \geq 1$, the magnitude of the error obeys $\frac{d | e |}{dt} = - β_{1} {| e |}^{1 + ε_{1}} - β_{2} | e |$. Conversely, when $| e | < 1$, it obeys $\frac{d | e |}{dt} = - β_{1} | e | - β_{2} {| e |}^{1 - ε_{2}}$. Following the same derivation as for $T_{1}$, the upper bound on the convergence time for $e \to δ$ is as follows:

T_{2} = \frac{1}{ε_{2} β_{1}} \ln \left( \frac{β_{1} + β_{2}}{β_{2}} \right) + C_{2}

(A31)

where $C_{2} = \lim_{z_{0} \to \infty} \int_{1}^{z_{0}} \frac{1}{β_{1} z^{1 + ε_{1}} + β_{2} z} dz$.
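As a worked illustration, assume the SMC gains listed in Table 4 ($β_{1} = β_{2} = 1$, $ε_{1} = ε_{2} = 1$). Under that assumption the constant $C_{2}$ follows from a partial-fraction split, and the bound in Equation (A31) reduces to a single number that does not depend on the initial error:

```latex
\begin{aligned}
C_{2} &= \int_{1}^{\infty} \frac{dz}{z^{2} + z}
       = \int_{1}^{\infty} \left( \frac{1}{z} - \frac{1}{z + 1} \right) dz
       = \left[ \ln \frac{z}{z + 1} \right]_{1}^{\infty} = \ln 2, \\
T_{2} &= \frac{1}{1 \cdot 1} \ln \left( \frac{1 + 1}{1} \right) + \ln 2
       = 2 \ln 2 \approx 1.386 .
\end{aligned}
```

The value is expressed in the time unit of the error dynamics, which is not restated here.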

When $| e_{z} | < δ$, it follows that $| e | < δ < 1$. Thus, $sgn (| e | - 1) = - 1$ and $ϕ (ε_{2}) = 1 - ε_{2}$. Consequently, we obtain the following:

\begin{matrix} λ_{1} = \frac{β_{2} (1 - ε_{2} - γ) δ^{1 - ε_{2} - 1}}{1 - γ} \\ λ_{2} = \frac{β_{2} (1 - ε_{2} - 1) δ^{1 - ε_{2} - γ}}{1 - γ} \end{matrix}

(A32)

By substituting Equation (A32) into Equation (15), we obtain the expression for $R (e)$ as follows:

\begin{aligned} R (e) & = λ_{1} e + λ_{2} {sig}^{γ} (e) \\ & = \frac{β_{2} (1 - ε_{2} - γ) δ^{1 - ε_{2} - 1}}{1 - γ} e + \frac{β_{2} (1 - ε_{2} - 1) δ^{1 - ε_{2} - γ}}{1 - γ} {| e |}^{γ} sgn (e) \end{aligned}

(A33)

Thus, we can obtain:

\begin{aligned} | \dot{e} | & = β_{1} | e | + \frac{β_{2} (1 - ε_{2} - γ) δ^{- ε_{2}}}{1 - γ} | e | + \frac{β_{2} (- ε_{2}) δ^{1 - ε_{2} - γ}}{1 - γ} {| e |}^{γ} \\ & \leq β_{1} δ + \frac{β_{2} (1 - ε_{2} - γ)}{1 - γ} δ^{- ε_{2}} δ + \frac{β_{2} (- ε_{2})}{1 - γ} δ^{1 - ε_{2} - γ} δ^{γ} \\ & = β_{1} δ + β_{2} \left( 1 - \frac{2 ε_{2}}{1 - γ} \right) δ^{1 - ε_{2}} \\ & \leq β_{1} δ + β_{2} δ^{1 - ε_{2}} \end{aligned}

(A34)

The upper bound for the convergence time is expressed as follows:

T_{2} = \frac{1}{ε_{2} β_{1}} \ln \left( \frac{β_{1} + β_{2}}{β_{2}} \right) + C_{2}

(A35)
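The bound $T_{2}$ can be probed with a short simulation of the reduced error dynamics in Equation (A30). The sketch below is a minimal illustration, not the controller implementation: it assumes the sliding variable is already zero, uses the SMC gains and $δ = 0.01$ from Table 4, integrates with forward Euler, and evaluates $C_{2}$ in closed form via the substitution $u = z^{ε_{1}}$. The function names and the list of initial errors are hypothetical.

```python
import math


def sgn(x):
    """Sign function returning -1, 0, or 1."""
    return (x > 0) - (x < 0)


def error_rate(e, beta1, beta2, eps1, eps2):
    """Right-hand side of Equation (A30), valid while |e| >= delta."""
    s = sgn(abs(e) - 1.0)
    p1 = 1.0 + 0.5 * eps1 * (1.0 + s)
    p2 = 1.0 - 0.5 * eps2 * (1.0 - s)
    return -(beta1 * abs(e) ** p1 + beta2 * abs(e) ** p2) * sgn(e)


def time_to_reach(e0, delta, beta1, beta2, eps1, eps2, dt=1e-4, t_max=10.0):
    """Forward-Euler time at which |e| first drops to delta (None if never)."""
    e, t = e0, 0.0
    while t < t_max:
        if abs(e) <= delta:
            return t
        e += dt * error_rate(e, beta1, beta2, eps1, eps2)
        t += dt
    return None


def t2_bound(beta1, beta2, eps1, eps2):
    """Equation (A31), with C_2 evaluated in closed form."""
    c2 = math.log(1.0 + beta2 / beta1) / (eps1 * beta2)
    return math.log((beta1 + beta2) / beta2) / (eps2 * beta1) + c2


beta1 = beta2 = eps1 = eps2 = 1.0  # SMC gains from Table 4
delta = 0.01
bound = t2_bound(beta1, beta2, eps1, eps2)
times = {
    e0: time_to_reach(e0, delta, beta1, beta2, eps1, eps2)
    for e0 in (0.5, 2.0, 10.0, 100.0, -100.0)  # hypothetical initial errors
}
for e0, t in times.items():
    print(f"e0 = {e0:7.1f}: reach time {t:.3f} (bound {bound:.3f})")
```

Larger initial errors need a smaller step size, because the superlinear term makes forward Euler stiff far from the origin.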

Moreover, it can be inferred that the tracking errors will converge to small neighborhoods around zero within a fixed time period of $T = T_{1} + T_{2}$. This completes the proof. □

References

1. Tian, B.; Guo, J.; Song, Y.; Zhou, Y.; Xu, Z.; Wang, L. Research progress and prospects of gliding robots applied in ocean observation. J. Ocean Eng. Mar. Energy 2023, 9, 113–124.
2. Yuan, S.; Li, Y.; Bao, F.; Xu, H.; Yang, Y.; Yan, Q.; Zhong, S.; Yin, H.; Xu, J.; Huang, Z.; et al. Marine environmental monitoring with unmanned vehicle platforms: Present applications and future prospects. Sci. Total Environ. 2023, 858, 159741.
3. Jiang, Z.; Wu, H.; Wu, Q.; Yang, Y.; Tan, L.; Yan, S. Control parameter optimization based trajectory design of underwater gliders executing underwater fixed-point exploration missions. Ocean Eng. 2023, 279, 114127.
4. Joshi, B.; Xanthidis, M.; Roznere, M.; Burgdorfer, N.J.; Mordohai, P.; Li, A.Q.; Rekleitis, I. Underwater exploration and mapping. In Proceedings of the 2022 IEEE/OES Autonomous Underwater Vehicles Symposium (AUV), Singapore, 19–21 September 2022; pp. 1–7.
5. Alexandris, C.; Papageorgas, P.; Piromalis, D. Positioning Systems for Unmanned Underwater Vehicles: A Comprehensive Review. Appl. Sci. 2024, 14, 9671.
6. Liang, Y.; Wang, Y.; Zhang, L.; Wang, Y.; Yang, M.; Niu, W.; Yang, S. Conceptual design and analysis of a two-stage underwater glider for ultra-long voyage. Appl. Ocean Res. 2023, 138, 103639.
7. Yang, H.; Mahmoudian, N. Gliding in extreme waters: Dynamic Modeling and Nonlinear Control of an Agile Underwater Glider. arXiv 2024, arXiv:2402.06055.
8. Liu, Y.; Liu, J.; Pan, G.; Huang, Q.; Guo, L. Vibration Analysis and Isolator Component Design of the Power System in an Autonomous Underwater Glider. Int. J. Acoust. Vib. 2022, 27, 112–121.
9. Leonard, N.E.; Graver, J.G. Model-based feedback control of autonomous underwater gliders. IEEE J. Ocean. Eng. 2001, 26, 633–645.
10. Liu, Y.; Su, Z.; Luan, X.; Song, D.; Han, L. Motion analysis and fuzzy-PID control algorithm designing for the pitch angle of an underwater glider. J. Math. Comput. Sci. 2017, 17, 133–147.
11. Landau, I.D.; Lozano, R.; M’Saad, M.; Karimi, A. Adaptive Control: Algorithms, Analysis and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011.
12. Wan, L.; Zhang, D.; Sun, Y.; Qin, H.; Cao, Y.; Chen, G. Fast fixed-time vertical plane motion control of autonomous underwater gliders in shallow water. J. Frankl. Inst. 2022, 359, 10483–10509.
13. Sang, H.; Zhou, Y.; Sun, X.; Yang, S. Heading tracking control with an adaptive hybrid control for under actuated underwater glider. ISA Trans. 2018, 80, 554–563.
14. Nguyen, N.D.; Choi, H.s.; Jin, H.S.; Huang, J.; Lee, J.H. Robust Adaptive Depth Control of hybrid underwater glider in vertical plane. Adv. Technol. Innov. 2020, 5, 135–146.
15. García-Valdovinos, L.G.; Salgado-Jiménez, T.; Bandala-Sánchez, M.; Nava-Balanzar, L.; Hernández-Alvarado, R.; Cruz-Ledesma, J.A. Modelling, design and robust control of a remotely operated underwater vehicle. Int. J. Adv. Robot. Syst. 2014, 11, 1.
16. Ding, W.; Wei, D.; Diao, Y.; Yang, C.; Zhang, X.; Zhang, X.; Huang, H. Research on trajectory tracking control of ocean unmanned aerial vehicles based on disturbance observer and nonlinear sliding mode. Ocean Eng. 2024, 293, 116682.
17. Zeng, Z.; Lyu, C.; Bi, Y.; Jin, Y.; Lu, D.; Lian, L. Review of hybrid aerial underwater vehicle: Cross-domain mobility and transitions control. Ocean Eng. 2022, 248, 110840.
18. Zhang, X.; Zhou, H.; Fu, J.; Wen, H.; Yao, B.; Lian, L. Adaptive integral terminal sliding mode based trajectory tracking control of underwater glider. Ocean Eng. 2023, 269, 113436.
19. Zhou, H.; Xu, H.; Cao, J.; Fu, J.; Mao, Z.; Zeng, Z.; Yao, B.; Lian, L. Robust adaptive control of underwater glider for bottom sitting-oriented soft landing. Ocean Eng. 2024, 293, 116725.
20. Zou, H.; Zhang, G.; Hao, J. Nonsingular fast terminal sliding mode tracking control for underwater glider with actuator physical constraints. ISA Trans. 2024, 146, 249–262.
21. Roy, R.G.; Ghoshal, D. A novel adaptive second-order sliding mode controller for autonomous underwater vehicles. Adapt. Behav. 2021, 29, 39–54.
22. Juan, R.; Wang, T.; Liu, S.; Zhou, Y.; Ma, W.; Niu, W.; Gao, Z. High-precision motion control of underwater gliders based on reinforcement learning. Ocean Eng. 2024, 310, 118603.
23. Wang, J.; Chen, Y.; Gao, J.; Min, B.; Pan, G. Adaptive fault tolerant control of unmanned underwater glider with predefined-time stability. J. Frankl. Inst. 2025, 362, 107364.
24. Lei, L.; Gang, Y.; Jing, G. Physics-guided neural network for underwater glider flight modeling. Appl. Ocean Res. 2022, 121, 103082.
25. Jeong, S.k.; Choi, H.S.; Ji, D.H.; Kim, J.Y.; Hong, S.M.; Cho, H.J. A study on an accurate underwater location of hybrid underwater gliders using machine learning. J. Mar. Sci. Technol. 2020, 28, 7.
26. Gao, J.; Min, B.; Chen, Y.; Jing, A.; Wang, J.; Pan, G. Compound learning based event-triggered adaptive attitude control for underwater gliders with actuator saturation and faults. Ocean Eng. 2023, 280, 114651.
27. Emami, S.A.; Castaldi, P.; Banazadeh, A. Neural network-based flight control systems: Present and future. Annu. Rev. Control 2022, 53, 97–137.
28. Su, Z.q.; Zhou, M.; Han, F.f.; Zhu, Y.w.; Song, D.l.; Guo, T.t. Attitude control of underwater glider combined reinforcement learning with active disturbance rejection control. J. Mar. Sci. Technol. 2019, 24, 686–704.
29. Zang, W.; Yao, P.; Song, D. Standoff tracking control of underwater glider to moving target. Appl. Math. Model. 2022, 102, 1–20.
30. Mirza, J.; Kanwal, F.; Salaria, U.A.; Ghafoor, S.; Aziz, I.; Atieh, A.; Almogren, A.; Haq, A.U.; Kanwal, B. Underwater temperature and pressure monitoring for deep-sea SCUBA divers using optical techniques. Front. Phys. 2024, 12, 1417293.
31. Mirza, J.; Atieh, A.; Kanwal, B.; Ghafoor, S.; Almogren, A.; Kanwal, F.; Aziz, I. Relay aided UWOC-SMF-FSO based hybrid link for underwater wireless optical sensor network. Opt. Fiber Technol. 2025, 89, 104045.
32. Wang, Y.; Thanyamanta, W.; Bose, N. Cooperation and compressed data exchange between multiple gliders used to map oil spills in the ocean. Appl. Ocean Res. 2022, 118, 102999.
33. Shi, Y.; Dong, H.; He, C.R.; Chen, Y.; Song, Z. Mixed Vehicle Platoon Forming: A Multi-Agent Reinforcement Learning Approach. IEEE Internet Things J. 2025.
34. Fossen, T.I. Marine Control Systems: Guidance, Navigation, and Control of Ships, Rigs and Underwater Vehicles; Marine Cybernetics, Trondheim, Norway, Org. Number NO 985 195 005 MVA; Springer: Berlin/Heidelberg, Germany, 2002; ISBN 82-92356-00-2. Available online: www.marinecybernetics.com (accessed on 1 April 2025).
35. Wang, G.; Yang, Y.; Wang, S. Adaptive digital disturbance rejection controller design for underwater thermal vehicles. J. Mar. Sci. Eng. 2021, 9, 406.
36. Yang, Y.; Liu, Y.; Wang, Y.; Zhang, H.; Zhang, L. Dynamic modeling and motion control strategy for deep-sea hybrid-driven underwater gliders considering hull deformation and seawater density variation. Ocean Eng. 2017, 143, 66–78.
37. Hussain, N.A.A.; Arshad, M.R.; Mohd-Mokhtar, R. Underwater glider modelling and analysis for net buoyancy, depth and pitch angle control. Ocean Eng. 2011, 38, 1782–1791.
38. Liang, Y.; Zhang, L.; Yang, M.; Wang, Y.; Niu, W.; Yang, S. Dynamic behavior analysis and bio-inspired improvement of underwater glider with passive buoyancy compensation gas. Ocean Eng. 2022, 257, 111644.
39. Miniguano, H.; Barrado, A.; Lázaro, A.; Zumel, P.; Fernández, C. General parameter identification procedure and comparative study of Li-Ion battery models. IEEE Trans. Veh. Technol. 2019, 69, 235–245.
40. Tekin, M.; Karamangil, M.I. Development of dual polarization battery model with high accuracy for a lithium-ion battery cell under dynamic driving cycle conditions. Heliyon 2024, 10, e28454.
41. Hodson, T.O. Root mean square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. Discuss. 2022, 2022, 5481–5487.
42. Lee, T.H.; Harris, C.J. Adaptive Neural Network Control of Robotic Manipulators; World Scientific: Singapore, 1998; Volume 19.
43. Guo, Q.; Li, X.; Zuo, Z.; Shi, Y.; Jiang, D. Quasi-synchronization control of multiple electrohydraulic actuators with load disturbance and uncertain parameters. IEEE/ASME Trans. Mechatronics 2020, 26, 2048–2058.
44. Jiang, B.; Hu, Q.; Friswell, M.I. Fixed-time attitude control for rigid spacecraft with actuator saturation and faults. IEEE Trans. Control Syst. Technol. 2016, 24, 1892–1898.
45. Saeki, S. The L^p-conjecture and Young’s inequality. Ill. J. Math. 1990, 34, 614–627.
46. Ouyang, Y.; He, W.; Li, X. Reinforcement learning control of a single-link flexible robotic manipulator. IET Control Theory Appl. 2017, 11, 1426–1433.
47. Cao, S.; Sun, L.; Jiang, J.; Zuo, Z. Reinforcement learning-based fixed-time trajectory tracking control for uncertain robotic manipulators with input saturation. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 4584–4595.
48. Panda, S.; Panda, G. On the development and performance evaluation of improved radial basis function neural networks. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 3873–3884.
49. Zhou, H.; Wei, Z.; Zeng, Z.; Yu, C.; Yao, B.; Lian, L. Adaptive robust sliding mode control of autonomous underwater glider with input constraints for persistent virtual mooring. Appl. Ocean Res. 2020, 95, 102027.
50. Steele, J.M. The Cauchy-Schwarz Master Class: An Introduction to the Art of Mathematical Inequalities; Cambridge University Press: Cambridge, UK, 2004.

Figure 1. Schematic diagram of the underwater glider.

Figure 2. Schematic diagram of parameter identification.

Figure 3. Comparison between the full model simulation, simplified model simulation, and sea trial data. (a) Diving depth, (b) pitch angle.

Figure 4. Schematic of controller design.

Figure 5. Trajectory tracking of an underwater glider utilizing SD-RLSMC, RLSMC, and SMC. (a) Dive depth. (b) Pitch angle.

Figure 6. Actuator outputs of an underwater glider utilizing SD-RLSMC, RLSMC, and SMC. (a) Buoyancy adjustment. (b) Position of the attitude adjustment unit.

Figure 7. Estimation results of the actor NN for the unmodeled parts in SD-RLSMC and RLSMC. (a) The estimation results for $l_{1}$. (b) The estimation results for $l_{2}$.

Figure 8. Performance of SD-RLSMC, RLSMC, and SMC for depth and pitch angle tracking. (a) $q_{dis 1}$ for depth, (b) $q_{dis 1}$ for pitch angle, (c) $q_{dis 2}$ for depth, (d) $q_{dis 2}$ for pitch angle.

Figure 9. Buoyancy regulation and positions of the attitude adjustment units with SD-RLSMC, RLSMC, and SMC. (a) $q_{dis 1}$ for V, (b) $q_{dis 1}$ for $r_{r p 1}$, (c) $q_{dis 2}$ for V, (d) $q_{dis 2}$ for $r_{r p 1}$.

Table 1. Parameter identification procedure.

Step	Procedure
Step 1	Utilize Simulink software (v. 24.2) to construct a nonlinear simulation model of the underwater glider.
Step 2	Import the sea trial data into the parameter identification toolbox in Simulink.
Step 3	Choose the parameters to be estimated and define the upper and lower bounds for each parameter.
Step 4	Configure the optimization and parallel computing settings.
Step 5	Execute the estimation process, iterating as needed to determine the desired parameters and optimize the cost function.
Step 6	Verify the accuracy of the model by assessing the simulated data against the estimated parameters. If significant discrepancies are found, iterate this process or adjust the model in Step 2, considering alternative identification methods if necessary.
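Steps 3 to 5 amount to a bounded search for the parameter values that minimize a cost between simulated and recorded outputs. The sketch below illustrates that idea on a deliberately small stand-in problem and is not the Simulink workflow itself: the first-order model, the single parameter `k`, the synthetic "recorded" data, and the golden-section search are all assumptions made for the example.

```python
def simulate(k, n_steps=100, dt=0.1, u=1.0):
    """Euler simulation of the toy model dx/dt = -k * x + u, x(0) = 0."""
    x, out = 0.0, []
    for _ in range(n_steps):
        out.append(x)
        x += dt * (-k * x + u)
    return out


def cost(k, data):
    """Sum of squared errors between simulated and recorded outputs."""
    return sum((s - d) ** 2 for s, d in zip(simulate(k), data))


def golden_section(f, lo, hi, tol=1e-8):
    """Minimise a unimodal function f on the bounded interval [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # Minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2


true_k = 0.8                 # hypothetical parameter that generated the data
recorded = simulate(true_k)  # synthetic stand-in for sea trial data
# Lower and upper bounds play the role of the limits chosen in Step 3.
k_hat = golden_section(lambda k: cost(k, recorded), 0.1, 2.0)
print(k_hat)
```

With several parameters and noisy data the cost is generally no longer unimodal along one axis, which is one reason a dedicated estimation toolbox with a multivariable optimizer is used in practice.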

Table 2. Parameter identification results.

Parameters	Identification Results	Parameters	Identification Results
$K_{h}$	−0.000257	$K_{M}$	−124.60 $kg / rad$
$K_{D 0}$	6.33 $kg / m$	$K_{q}$	−134.02 $kg \cdot s / rad^{2}$
$K_{D}$	271.37 $kg / (m \cdot rad^{2})$	$J_{f 2}$	−3.71 $kg / m^{2}$
$K_{L 0}$	5.85 $kg / m$	$M_{f 1}$	−59.16 $kg$
$K_{L}$	136.48 $kg / m$	$M_{f 3}$	114.37 $kg$
$K_{M 0}$	2.10 $kg$

Table 3. RMSE for the dive depth and pitch angle with the full model and simplified model.

Parameters	Full Model	Simplified Model
z	33.8584	34.9584
$θ$	0.9710	0.9730

Table 4. Controller parameter values.

Controller Composition	Variable
Critic NN	$D = 0.9 [\begin{matrix} 1 \\ 1 \end{matrix}]$, $R = 0.1 [\begin{matrix} 1 \\ 1 \end{matrix}]$, $μ = [- 0.15 : 0.01 : 0.15]$, $ψ = 1$, $δ_{c} = 1$, $η_{c} = 0.001$
Actor NN	$μ = [- 0.15 : 0.01 : 0.15]$, $ψ = 1$, $δ_{a} = 0.25$, $k_{I} = 1$, $η_{a} = 0.001$
SMC Controller	$β_{1} = 1$, $β_{2} = 1$, $ε_{1} = 1$, $ε_{2} = 1$, $γ = 10$, $δ = 0.01$, $α_{1} = 1$, $α_{2} = 0.25$, $K_{0} = 0.01$, $K_{1} = 1$, $K_{2} = 1$
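The range notation for $μ$ in Table 4 describes 31 evenly spaced basis-function centres with a common width $ψ = 1$. A minimal sketch of how such a basis might be evaluated follows. The Gaussian kernel with a $2 ψ^{2}$ denominator, the scalar input, and the names `rbf_features` and `rbf_output` are assumptions for illustration; the network definition in the paper may differ in these details.

```python
import math

# Centres -0.15 : 0.01 : 0.15 and width psi = 1, as listed in Table 4.
CENTRES = [round(-0.15 + 0.01 * i, 2) for i in range(31)]
PSI = 1.0


def rbf_features(x, centres=CENTRES, psi=PSI):
    """Gaussian basis vector for a scalar input (kernel form is assumed)."""
    return [math.exp(-((x - mu) ** 2) / (2.0 * psi ** 2)) for mu in centres]


def rbf_output(x, weights):
    """Network output as the weighted sum of the basis functions."""
    return sum(w * phi for w, phi in zip(weights, rbf_features(x)))


features = rbf_features(0.0)
print(len(features), max(features))
```

Because the width is much larger than the centre spacing, neighbouring basis functions overlap heavily, so the output varies smoothly with the input.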

Table 5. RMSE and overshoot of the diving depth and pitch angle.

Parameter	Controllers	RMSE	Overshoot
z	SMC	0.0228	0.21%
	RLSMC	0.0285	0.02%
	SD-RLSMC	0.0154	0.01%
$θ$	SMC	0.0659	36.13%
	RLSMC	0.0514	30.74%
	SD-RLSMC	0.0430	9.77%
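The two metrics in Table 5 can be computed from a logged response as sketched below. The RMSE definition is standard. The overshoot definition (peak excursion beyond a constant target, as a percentage of the target) is an assumption, since the exact convention used for the table is not restated here, and the sample pitch response is invented for the example.

```python
import math


def rmse(actual, desired):
    """Root-mean-square error between two equally long sequences."""
    if len(actual) != len(desired):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - d) ** 2 for a, d in zip(actual, desired)) / len(actual))


def overshoot_percent(actual, target):
    """Peak excursion beyond a constant target, in percent of the target.

    Assumes a step-like response that approaches a positive target from below.
    """
    peak = max(actual)
    return max(0.0, (peak - target) / abs(target) * 100.0)


# Hypothetical pitch response (rad) to a 0.40 rad set point.
theta = [0.00, 0.25, 0.42, 0.44, 0.41, 0.40, 0.40]
theta_ref = [0.40] * len(theta)
print(rmse(theta, theta_ref), overshoot_percent(theta, 0.40))
```

For a time-varying reference trajectory, the overshoot would instead need to be measured relative to the reference at each sample.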

Table 6. Controller power.

Parameter	Controllers	Power
$d V$	SMC	0.7841
	RLSMC	0.1500
	SD-RLSMC	0.1573
$r_{r p 1}$	SMC	0.0036
	RLSMC	0.0011
	SD-RLSMC	0.0006

Table 7. RMSE and overshoot of the diving depth and pitch angle.

Parameter	Controllers	RMSE ($q_{dis 1}$)	Overshoot ($q_{dis 1}$)	RMSE ($q_{dis 2}$)	Overshoot ($q_{dis 2}$)
z	SMC	0.0278	0.12%	0.284	0.03%
	RLSMC	0.1480	0.03%	0.0401	0.02%
	SD-RLSMC	0.0159	0.05%	0.0145	0.02%
$θ$	SMC	0.1340	113.64%	0.1082	89.82%
	RLSMC	0.1161	63.04%	0.0591	31.79%
	SD-RLSMC	0.0722	20.39%	0.0304	16.21%

Table 8. Controller power.

Parameter	Controllers	Power ($q_{dis 1}$)	Power ($q_{dis 2}$)
$d V$	SMC	0.9582	0.9582
	RLSMC	0.3234	0.1852
	SD-RLSMC	0.2038	0.1346
$r_{r p 1}$	SMC	0.0044	0.0044
	RLSMC	0.0021	0.0008
	SD-RLSMC	0.0016	0.0006

Table 9. Comparison of performance for underwater glider control algorithms.

Reference	Performance Effect
Wang et al., 2025 [23]	Average integral absolute error: 0.29
Zou et al., 2024 [20]	Trajectory tracking error between $10^{- 6}$ and $10^{- 4}$
Juan et al., 2024 [22]	Average control errors in velocities along u, v, and w directions: 0.5195 ± 0.5452, 0.4703 ± 0.5859, and 0.2149 ± 0.3041
Lei et al., 2022 [24]	Pitch angle steady-state error less than 0.1 rad
This paper	For various disturbances: Z-direction RMSE between 0.0145 and 0.0159; RMSE in other directions between 0.003 and 0.007

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Wang, G.; Yu, J.; Yang, Y. Enhancing Trajectory Tracking Performance of Underwater Gliders Using Finite-Time Sliding Mode Control Within a Reinforcement Learning Framework. J. Mar. Sci. Eng. 2025, 13, 884. https://doi.org/10.3390/jmse13050884
